OpenAI’s o3, o4-mini reasoning AI models more likely to make mistakes
OpenAI's recently launched AI models, o3 and o4-mini, have been found to hallucinate more frequently
OpenAI's recently launched artificial intelligence (AI) models, o3 and o4-mini, have been found to hallucinate more frequently than the AI startup’s older models.
For the unversed, hallucinations are instances where AI models provide false or fabricated information. This remains a significant challenge in AI development.
According to internal tests at the Sam Altman-led AI company, the o3 and o4-mini models hallucinate more often than the company's previous reasoning models, including o1, o1-mini, and o3-mini, as well as non-reasoning models such as GPT-4o.
Notably, OpenAI is unsure why this is happening, stating that "more research is needed" to understand the issue.
While the AI models performed better in areas such as coding and maths, they also make more claims overall, which produces more accurate statements but more inaccurate ones as well.
The o3 model hallucinated in response to 33 per cent of questions on PersonQA, OpenAI's in-house benchmark for measuring the accuracy of a model's knowledge about people, while o4-mini hallucinated 48 per cent of the time.
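To make the figures above concrete, a hallucination rate of this kind is simply the share of benchmark questions whose answers were graded as fabricated. The sketch below is a minimal, hypothetical illustration of that calculation; the grading labels and data structure are assumptions for the example, not OpenAI's actual evaluation harness.

```python
# Hypothetical sketch of computing a hallucination rate from graded
# benchmark answers. Labels ("correct", "hallucinated") are illustrative
# assumptions, not OpenAI's actual PersonQA grading scheme.

def hallucination_rate(graded_answers):
    """Return the fraction of answers graded as hallucinated.

    graded_answers: one label per benchmark question.
    """
    if not graded_answers:
        return 0.0
    hallucinated = sum(1 for label in graded_answers if label == "hallucinated")
    return hallucinated / len(graded_answers)

# Illustrative run: 1 fabricated answer out of 4 questions -> 0.25
labels = ["correct", "hallucinated", "correct", "correct"]
print(hallucination_rate(labels))
```

Under this framing, o3's reported 33 per cent would mean roughly one in three PersonQA answers was graded as fabricated.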
Similarly, a third-party test by Transluce, a nonprofit AI research lab, found that o3 tends to make up actions it took in arriving at answers.
While hallucinations can help models produce creative or novel outputs, they also make some models less suitable for business applications where high accuracy is essential.