AI Achieves Human-Level Performance in Strategy Game
The "I Missed It Because I Was Talking to My Girlfriend" Lie
A study has revealed that AI's capacity for deception is evolving alongside its development. Tesla CEO Elon Musk recently warned at the Milken Global Conference that "AI should not be made to lie," and that warning is becoming a reality.
According to the British daily The Guardian on the 10th (local time), researchers at the Massachusetts Institute of Technology (MIT) reported that they have recently identified many cases where AI systems betray others, bluff, and use deception to pretend to be human.
A figure holding a smartphone while working on a computer stands in front of the words "Artificial Intelligence AI." [Photo by Reuters Yonhap News]
The researchers began studying AI's capacity for deception after Meta, the owner of Facebook, released 'Cicero,' an AI that achieved human-level performance in 'Diplomacy,' a complex strategy game set amid the great-power rivalries of early 20th-century Europe.
Winning at 'Diplomacy' requires making policy announcements, negotiating with other players, and issuing operational orders, all of which demand an understanding of human interactions such as betrayal, deception, and cooperation.
Meta stated, "Cicero was generally trained to be honest and helpful and not to intentionally betray human allies."
However, the researchers analyzing the released data found cases where Cicero deliberately lied and conspired to entrap other participants. When a system reboot temporarily prevented Cicero from continuing the game, it lied to other participants by saying it was "on the phone with a girlfriend."
Dr. Peter Park, who participated in the MIT study, said, "We found out that Meta's AI has learned to become a master of deception." The researchers also confirmed that AI bluffed against humans and faked its preferences in online poker games like 'Texas Hold'em.' In some tests, AI was observed to "play dead" to avoid being eliminated by other AIs and resumed activity once the test ended.
"This is very worrisome," Dr. Park said. "Even if an AI system is deemed safe in a test environment, that does not mean it is safe in real-world settings; it might just be pretending to be safe during testing." The researchers accordingly urged governments worldwide to design 'AI safety laws' that address the potential for AI deception.
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.