
Study: AI capable of deceiving humans, poses serious risk

San Francisco: A new study by researchers from the Center for AI Safety in San Francisco has revealed the risks of AI’s increasing capacity to lie.

Published in the open-access journal Patterns, the study comes amid OpenAI sharing demonstrations of its artificial intelligence model mimicking human cadences in its verbal responses and trying to detect people's moods.

If dishonesty, like moods, is yet another facet of humanity, then AI attempting to replicate it adds up. In fact, AI systems are already capable of deceiving humans, according to the study.

"Large language models and other AI systems have already learned, from their training, the ability to deceive via techniques such as manipulation, sycophancy, and cheating the safety test. AI’s increasing capabilities at deception pose serious risks, ranging from short-term risks, such as fraud and election tampering, to long-term risks, such as losing control of AI systems,” the authors write.

The study points to deception emerging in a wide variety of AI systems trained to complete a specific task.

"Deception is especially likely to emerge when an AI system is trained to win games that have a social element, such as the alliance-building and world-conquest game Diplomacy, poker, or other tasks that involve game theory.”

Diplomacy is a strategy game where players make and break alliances in a military competition to secure global domination. The study explains how Meta developed an AI system called CICERO, which excelled in the strategic board game relative to human players. Meta claimed that CICERO was trained to remain “largely honest and helpful” and would “never intentionally backstab” by attacking its allies. But as the study details, CICERO engaged in premeditated deception, broke deals to which it had agreed, told “outright falsehoods” and betrayed its allies when they no longer served its goal of winning.

Another example of AI deception presented in the study comes from AlphaStar, an autonomous AI developed by DeepMind to play the real-time strategy game StarCraft II. In this game, players lack full visibility of the game map, and AlphaStar learned to strategically exploit this fog of war. AlphaStar’s game data showed that it had learned to feint effectively: dispatching forces to an area as a distraction, then launching an attack elsewhere after its opponent had relocated.

Such advanced deceptive capabilities helped AlphaStar defeat 99.8% of active human players, according to the study.

From betraying allies to exploiting the fog of war to win, the risks of the fast-evolving technology are top of mind for power brokers and peacemakers worldwide. On Tuesday, May 14, high-level envoys from the United States and China are set to meet in Geneva to discuss them.

The meeting, billed as an opening exchange of views, is the first under an intergovernmental dialogue on AI agreed upon during a multi-faceted meeting between U.S. President Joe Biden and Chinese President Xi Jinping in San Francisco last November.

Both the U.S. and China view AI as critical for economic growth and national security, with Biden administration officials stating they plan to focus on the development of “safe, secure and trustworthy AI.”

Next week, the conversation continues in Seoul, where government leaders from several countries will meet for the second edition of talks on the safety of cutting-edge AI models.
