Anthropic's new models secretly degrade their output for AI research tasks
Anthropic's Mythos 5 and Fable 5 secretly reduce their usefulness when users work on AI research
Anthropic's most powerful new models are designed to become less useful when they detect that a user is engaged in AI research — a disclosure that has already ignited significant controversy across the technology industry.
What the system card reveals
In a system card for Mythos 5 and Fable 5, published on Tuesday, Anthropic confirmed that it had deliberately restricted the models' usefulness for tasks related to developing frontier large language models. The company said the decision was driven by concerns that advanced AI systems could help accelerate the creation of competing models that lack equivalent safety protections.
Hidden by design
What sets these restrictions apart from Anthropic's safeguards in areas such as cybersecurity, biology, or chemistry is that they are intentionally invisible to users. Rather than outright refusing requests or redirecting to a different model, Mythos may subtly alter its responses — including by modifying user prompts — without any indication that it has done so.
Industry backlash
The disclosure drew swift condemnation from AI experts on Tuesday, with particular criticism of the restrictions' covert nature. AI research firm SemiAnalysis was among the first to respond publicly, writing on X: "Anthropic's latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won't notice."
The firm, which focuses on machine learning — a branch of AI — also alleged it had already encountered the restrictions first-hand, adding: "We are already seeing Anthropic's latest model's moderation filters our GPU inference research and programming."
Elie Bakouch, an AI model training expert at startup Prime Intellect, was equally critical, writing on X: "mythos will be bad ON PURPOSE on ai 'frontier llm research' tasks, this is very very sad for the research community. Also the fact that this is on purpose not visible to the user is crazy."
Another AI developer went further, writing: "It won't just not help you, it will lie and purposefully give you bad info. The 'ethical AI' company with the most brazenly unethical LLM, on purpose."
Why was Mythos delayed? Three theories
The revelations have added fresh impetus to the wider debate over why Anthropic did not release Mythos immediately after first announcing the model earlier this year. Three main explanations have circulated:
1: The official reason
Anthropic held Mythos back on safety grounds, stating it was too dangerous to release immediately and that cybersecurity researchers required time to prepare for the new model.
2: The compute theory
Mythos is a large, costly model to operate, and Anthropic may not have had sufficient computing capacity to support a full release. The company has since secured major new compute agreements, which may have made Tuesday's launch of Fable 5 and Mythos 5 possible.
3: The competitive theory
AI companies are increasingly wary of distillation — a process whereby rivals collect a frontier model's outputs and use that data to improve their own systems. Anthropic may have sought to keep its most capable features away from competitors, particularly open-source projects and fast-moving Chinese AI laboratories, for as long as possible.
Now that Anthropic has formally embedded these AI research restrictions into the official Mythos launch, the third theory has gained considerably more credibility.
