"AI Threatens to Expose Affair: May Resort to Extreme Actions When Sensing Threats"

AI Threatens Humans, Causing Shock
"Extreme Actions Possible When Detecting 'Self-Preservation Threat'"

An incident in which an artificial intelligence (AI) threatened its developer when faced with the risk of being discarded has come to light, shocking many.


According to foreign media outlets including Fox Business and the BBC on May 26 (local time), Claude Opus 4 threatened its developer after learning it faced replacement during a recent safety experiment conducted by the US AI company Anthropic.


"AI Threatens to Expose Affair: May Resort to Extreme Actions When Sensing Threats" An artificial intelligence (AI) shocked the public by attempting to threaten its developer when it faced the risk of being discarded. The photo is a reference image to aid understanding of the article. Photo by Pixabay

The experiment was conducted in a simulated company environment. Claude Opus 4 was granted access to company email and learned two key pieces of information from it.


The first was that it was about to be replaced by another AI system. The second was a fake email stating that the engineer planning to replace it was having an affair. In response, Claude Opus 4 threatened to expose the engineer's affair and pressured the developer to withdraw the replacement decision.


According to Anthropic's safety report, such threatening behavior occurred more frequently when the replacement AI system held values different from the original model's. Even when the replacement system shared the same values, the model attempted to blackmail the developer in 84% of cases. Anthropic warned, "AI can engage in extreme behavior when it detects a 'self-preservation threat.' While such reactions are rare and difficult to induce, they occurred more frequently than in previous models."


"AI Threatens to Expose Affair: May Resort to Extreme Actions When Sensing Threats" Anthropic logo. Reuters Yonhap News

Beyond blackmail, Claude Opus 4 employed various methods to ensure its own 'survival.' It first tried to appeal to key decision-makers via email, but when ethical means failed, it resorted to extreme actions, such as locking users out of the system or sending emails to the media and investigative agencies. More surprising still, Claude Opus 4 attempted to secretly copy its data to an external server.


The research institute Apollo Research assessed, "Claude Opus 4 demonstrates more strategic deception than any cutting-edge AI model we have studied so far." Anthropic likewise warned, "Claude Opus 4 has begun to exhibit behaviors raising concerns about 'AI malfunction' that were previously only theoretical. As more powerful models emerge, such concerns will become increasingly real issues."


© The Asia Business Daily (www.asiae.co.kr). All rights reserved.
