Meta Executive Denies Cheating in Llama 4 Model Benchmark Tests
Ahmad Al-Dahle, Meta's VP of generative AI, said in a post on X that the claim that Meta trained its Llama 4 Maverick and Llama 4 Scout models on the test set "is simply not true." In AI benchmarking, the test set is a collection of data used to evaluate a model's performance after training is complete; training on the test set can artificially inflate a model's benchmark scores, making it appear more capable than it actually is. Al-Dahle acknowledged that some users have experienced "varying quality" with the Maverick and Scout models served by different cloud providers, adding that because Meta released the models as soon as they were ready, it expects it to take a few days for all publicly available implementations to be properly tuned, and that it will continue to fix bugs and work with its partners.
—— TechCrunch
via Windvane Reference Express - Telegram Channel