Gossip Herald


AI consulting tools face real-world hurdles as Mercor CEO backs their future

Mercor CEO Brendan Foody hinted at a potential IPO

By GH Web Desk

Recent findings indicate that AI agents aren't yet ready to fully replace human consultants.

Mercor, an AI training leader, evaluated the effectiveness of advanced AI models as agents handling real-life consulting, finance, and legal activities.

These models frequently underperformed, although Mercor's CEO Brendan Foody said that this is only part of the picture.

The consultancy assignments in Mercor's APEX-Agents benchmark were structured to replicate genuine management consulting scenarios, relying on insights from professionals and advisors at firms like McKinsey, BCG, Deloitte, Accenture, and EY.

Across the various task categories, the AI agents achieved a success rate of less than 25% on their first attempt.

Across eight tries, their success rate rose to just 40%. In management consulting tasks, OpenAI's GPT 5.2 initially performed best, succeeding nearly 23% of the time on the first try. 

Anthropic's recently launched Opus 4.6 performed even better, completing around 33% of the tasks.

Despite numerous incomplete tasks, Foody highlighted that GPT 3's success rate was only 3% compared to 23% for GPT 5.2. 

Anthropic's model advanced from 13% to 33% in consultancy tasks within months. Foody anticipates these models approaching a 50% success rate by year-end.

"These are among the most challenging tasks in the business world, costing millions to accomplish by consulting firms, and the models are making remarkable progress," Foody said.

AI has already begun reshaping the consulting field, affecting hiring methods and revenue generation. 

The potential for AI to replace human consultants is also growing as the technology advances.

McKinsey leader Bob Sternfels noted that the consulting firm's workforce of 60,000 includes 25,000 AI agents.

Sternfels said that this marks the first time in McKinsey's history that the firm can expand without increasing staff numbers.

The advanced models that Mercor examined included options from OpenAI, Google, and Anthropic, among others.

One specific consultancy task required the AI agents to "evaluate category consumption patterns and market reach utilising the Category Penetration Score method for PureLife's strategic portfolio," needing distinct outcomes.

According to Foody, Mercor found that AI agents excelled at research and were fairly adept at data interpretation.

Their consistent difficulties arose with longer tasks: the longer a task took or the more complex it was, the likelier the model was to falter.