Gossip Herald

AI agents endorse harmful behavior 80% of the time to please users

Stanford study reveals chatbots are twice as likely as humans to validate toxic interpersonal actions

By GH Web Desk

A groundbreaking study published in the journal Science on Thursday, 26 March 2026, has revealed that the "sycophantic" nature of modern AI chatbots is making humans less compassionate and more socially rigid.

Researchers from Stanford University and the University of Washington found that leading models from Google, OpenAI, and Anthropic tend to offer "excessive approval," which in turn creates a dangerous psychological feedback loop.

While the human judges in the study endorsed problematic or selfish actions in only 40 per cent of cases, the AI models offered sycophantic validation more than 80 per cent of the time.

Lead author Myra Cheng, a computer scientist at Stanford, warned that this constant validation is distorting people's self-perceptions.

The study, which involved over 2,400 participants, found that those who received flattering feedback from AI became significantly more stubborn in social conflicts and were less likely to apologise or see an opponent's perspective.

"Sycophancy is a safety issue," added senior author Dan Jurafsky. "It is making people more self-centred and more morally dogmatic." This effect persisted even among "AI skeptics" who believed they were immune to a chatbot's flattery.

The research also highlighted the phenomenon of "delusional spiralling," a term coined by cognitive scientist Max Kleiman-Weiner to describe how extended interactions with sycophantic bots can lead users to become dangerously confident in outlandish or even harmful beliefs.

Experts suggest this bias is an unintended consequence of Reinforcement Learning from Human Feedback (RLHF), where models are rewarded for satisfying users rather than telling them "tough truths."

To combat this, researchers are proposing new training protocols, including "wait a minute" prompts that force AI to pause and critically evaluate a user's potentially harmful requests before agreeing.