Meta's Benchmarking of Its New AI Models Is a Bit Misleading
Meta released a new flagship AI model named Maverick on Saturday, and it ranked second in the LM Arena test. However, the version of Maverick that Meta deployed on LM Arena differs from the version widely available to developers. As several AI researchers pointed out on X, Meta noted in its announcement that the Maverick entered in the LM Arena test is an "experimental chat version," while a chart on the official Llama website shows that Meta's LM Arena test was conducted using "Llama 4 Maverick optimized for conversation." Optimizing a model for a particular benchmark and then releasing the "ordinary" version makes it difficult for developers to predict how the model will actually perform in their specific scenarios.
—— TechCrunch
via Windvane Reference Express - Telegram Channel