Scientists warn of 'sombre reality' as AI fails critical accuracy tests

AI’s true scientific reliability is closer to a 'low D grade' than expert level

By GH Web Desk | March 18, 2026

A recent study led by Washington State University has issued a stark warning to the global scientific community regarding the reliability of artificial intelligence in academic research.

Professor Mesut Cicek and his team tested ChatGPT versions 3.5 and 5 mini against 700 scientific hypotheses to evaluate their accuracy and consistency.

Although the AI initially appeared competent, a deeper analysis published in the Rutgers Business Review revealed that its true reliability was only marginally better than random chance.

The researchers identified a significant "inconsistency problem," noting that the software often provided different answers to the same question when asked repeatedly.

"We are not just talking about accuracy, we're talking about inconsistency, because if you ask the same question again and again, you come up with different answers," Cicek observed.

Furthermore, the AI struggled to identify false statements, correctly labelling them as false only 16.4 per cent of the time.

The report highlights a phenomenon known as the "fluency trap," where the smooth and convincing nature of AI-generated language creates an "illusion of understanding."

This leads users to mistake polished prose for verified fact. Professor Cicek maintains that AI lacks a conceptual "brain," relying on memorised patterns rather than genuine reasoning.

Advocating for a culture of rigorous verification, Cicek urged researchers to remain cautious. "Always be skeptical. I'm not against AI. I'm using it. But you need to be very careful," he stated.

The findings suggest that despite its rapid integration into academia, AI remains a tool that requires human oversight and healthy scepticism.