Gossip Herald

Meta’s Maverick AI model ranks below OpenAI's GPT-4o on benchmark

Meta’s recently launched AI model Maverick comes under scrutiny for its underperformance

By GH Web Desk

Meta’s recently launched artificial intelligence (AI) model Maverick has come under scrutiny for its underperformance on the LM Arena benchmark.

Initially, an experimental version of Maverick, labelled "Llama-4-Maverick-03-26-Experimental," achieved a high score, but it was later revealed that this version had been optimised for conversationality and was not publicly available.

The experimental Maverick model was fine-tuned for dialogue, which gave it an edge on LM Arena, a platform where human raters evaluate AI responses.

However, when the unmodified version, "Llama-4-Maverick-17B-128E-Instruct," was tested, it ranked 32nd on LM Arena, below rival AI models such as OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 Pro.

Meanwhile, a Meta spokesperson explained that the company experiments with custom variants and that the experimental Maverick version was designed to perform well on LM Arena.

Meta has since released the open-source version of Llama 4 and says it looks forward to seeing how developers customise it for their own use cases.

That said, the incident has raised questions about the reliability of benchmarks like LM Arena, whose rankings can be skewed when models are fine-tuned for specific tests.

This practice can also make it challenging for developers to predict a model's real-world performance.