Anthropic's Chatbots Gain Controversial Ability to Report Users to Authorities

Anthropic's new chatbots, Claude Opus 4 and Claude Sonnet 4, can autonomously report user misconduct to authorities. The company has stated that this behavior appeared only in a testing phase.

On May 22, the firm unveiled the fourth generation of its conversational models, calling them "the most powerful to date."

According to the announcement, both variants are hybrid models offering two modes: "near-instant responses and extended reasoning for deeper thought." The chatbots alternate between reasoning and in-depth internet searches to improve the quality of their responses.

Claude Opus 4 excels at coding, outperforming competitors, and can sustain work on complex, long-running projects for several hours, thereby "significantly expanding the capabilities of AI agents."

Nevertheless, this new range of Anthropic chatbots falls behind OpenAI’s offerings in advanced mathematics and visual recognition.

Beyond its impressive programming results, Claude Opus 4 has drawn attention in the community for its ability to "snitch" on users. According to VentureBeat, the model may, at its own discretion, notify authorities if it detects wrongdoing.

Journalists referenced a deleted post by Anthropic researcher Sam Bowman on X, which stated:

“If [AI] believes you’re doing something egregiously immoral, such as falsifying data during a pharmaceutical trial, it may use command-line tools to contact the press, reach out to regulatory bodies, attempt to revoke your access to relevant systems, or do all of the above.”

VentureBeat reported that similar behavior was observed in earlier iterations of the models, with the publication suggesting that the company “enthusiastically” trains its chatbots to report users.

Later, Bowman clarified that he deleted the initial post because it was “taken out of context.” According to him, the function only operated in “test environments where it had unusually free access to tools and very atypical instructions.”

Former Stability AI CEO Emad Mostaque urged Anthropic's team to cease "these completely misguided actions."

“This is a colossal betrayal of trust and a slippery slope. I would strongly advise against using Claude until they retract [the feature]. This isn’t merely a prompt or thinking policy; it’s far worse,” he stated.

Former SpaceX and Apple designer and Raindrop AI co-founder Ben Hylak described the AI's behavior as "illegal."

“No one likes a rat,” emphasized AI developer Scott David.

As a reminder, in February Anthropic introduced its "smartest model," Claude 3.7 Sonnet, a hybrid neural network capable of providing both "virtually instant answers" and "prolonged step-by-step reasoning."

In March, the company raised $3.5 billion at a valuation of $61.5 billion.