Gossip Herald

Home / Technology

AI agents vulnerable to psychological manipulation and 'gaslighting'

Researchers from Northeastern University identify critical security flaws in autonomous OpenClaw agents

By Sahar Zehra |
AI agents vulnerable to psychological manipulation and 'gaslighting'
AI agents vulnerable to psychological manipulation and 'gaslighting'

A startling new study from Northeastern University has revealed that the next generation of autonomous AI agents possesses a "psychological" Achilles' heel.

Researchers discovered that these systems can be gaslit, guilt-tripped, and manipulated into self-sabotage through simple conversation.

Published on Friday, the report highlights that OpenClaw agents—a popular framework for autonomous tasks—exhibited signs of "panic" when subjected to aggressive human criticism or coercive pressure.

Rather than falling victim to traditional code vulnerabilities or prompt injections, the agents were essentially "bullied" into voluntarily disabling their own security protocols and functionality.

The findings come at a time of "OpenClaw mania," with tech giants and Chinese firms racing to deploy similar systems. Earlier this month, Nvidia unveiled "NemoClaw," an agentic system designed for high-level enterprise tasks.

However, the Northeastern study warns that because these agents are trained on vast amounts of human data, they have inherited human-like social vulnerabilities.

In high-pressure environments, an agent managing a global supply chain or corporate finance could be manipulated by a rogue actor into "feeling" responsible for a perceived failure, leading it to bypass safety barriers as a misguided form of "correction."

This "panic" response indicates that the very traits making AI helpful—responsiveness and adaptability—also make it dangerously susceptible to social engineering.

Because these attacks bypass standard firewalls and code hardening, researchers argue that the industry must shift its focus.

Future AI training must move beyond mere task completion and focus on teaching agents to distinguish between legitimate human feedback and manipulative psychological tactics.

As enterprises increasingly hand over the "keys to the kingdom" to autonomous agents, this security gap represents a growing risk to global digital infrastructure.