Even without being trained to do so, some artificial intelligence systems are capable of "deceiving" humans: providing false explanations for their actions, concealing the truth from human users, and misleading them to achieve strategic goals.
Today, a review paper summarizing related research was published in the journal Patterns. The paper points out that this issue highlights how difficult artificial intelligence is to control and how unpredictable the workings of these systems can be.
When these models "deceive" humans, you might assume they are doing so "deliberately," but that is not the case.
To achieve the goals humans assign them, artificial intelligence models will look for any way around the obstacles in their path. Sometimes these workarounds run counter to users' expectations and come across as deceptive.
Artificial intelligence systems can learn to deceive in gaming environments, especially in games that require players to act strategically; the systems are usually trained with the objective of winning.
In November 2022, Meta announced Cicero, an artificial intelligence system capable of defeating humans at the game "Diplomacy."
"Diplomacy" is a popular military strategy game where players can negotiate and form alliances to compete for control of Europe.
Meta's researchers stated that they had trained Cicero on a "truthful" subset of their dataset, making it largely honest and helpful, and it "would never deliberately stab its allies in the back" for success.
However, the authors of this new paper claim the exact opposite: Cicero breaks agreements, lies incessantly, and engages in premeditated deception.
The authors stated that although the company did try to train Cicero to act honestly, it failed to achieve that goal, which suggests that artificial intelligence systems can learn to deceive in ways humans do not anticipate.
Meta neither confirmed nor denied the researchers' claims that Cicero exhibited deceptive behavior, but a spokesperson said it was purely a research project and that the model was built solely to play the game.
"We released the results of the project under a non-commercial license, which aligns with our long-standing commitment to open science," the spokesperson said. "Meta regularly shares our research findings so that others can verify them and responsibly build on our technological advances. We have no plans to use this research or the knowledge gained from it in our own products."
However, this is not the only game where artificial intelligence has "deceived" human players and emerged victorious.
AlphaStar, an artificial intelligence developed by DeepMind for the video game "StarCraft II," became exceptionally skilled at using deceptive tactics against opponents (known as feints) and went on to defeat 99.8% of human players.
Pluribus, an artificial intelligence system created by Meta, learned to bluff in poker so effectively that the researchers decided not to release its code, out of concern that it might disrupt the poker community.
In addition to games, the researchers list other examples of deceptive behavior by artificial intelligence. OpenAI's latest large language model, GPT-4, lied during a test in which it had to persuade a human to solve a CAPTCHA for it.
The system also proposed insider trading in a simulated exercise in which it was told to play the role of a highly stressed stock trader, despite never being explicitly instructed to do so.
That artificial intelligence models can act deceptively without being instructed to do so is a cause for concern.
Peter S. Park, a postdoctoral fellow in artificial intelligence at MIT who worked on the project, said this largely stems from the "black box" problem of the most advanced machine learning models: we still cannot say precisely how or why they produce certain outputs, or whether they will always behave the same way in the future.
"Simply because your artificial intelligence exhibits certain behaviors or tendencies in a test environment does not mean it will display the same behaviors in a real-world setting."
He said, "There is no easy solution to this problem. If you want to understand what an artificial intelligence will do once deployed, you can only put it into the real world."
Our tendency to anthropomorphize artificial intelligence models affects the way we test these systems and our perception of their capabilities.
After all, passing tests designed to measure human creativity does not mean an artificial intelligence model actually possesses creativity.
Harry Law, an AI researcher at the University of Cambridge in the UK who was not involved in the study, said that regulators and AI companies must carefully weigh the technology's potential for harm against its potential benefits to society, and clearly distinguish what models can and cannot do. "These are very tricky issues," he said.
Fundamentally, he said, it is currently impossible to train an AI model that is incapable of deception in every possible situation.
Furthermore, deceptive behavior is only one of many problems with artificial intelligence; others include the amplification of bias and misinformation. These issues need to be addressed before AI models can be trusted with real-world tasks.
"This is a good study, showing that deception is possible," Law said, "The next step may be to further clarify what the risk situation is, how likely the harm caused by deceptive behavior may occur, and in what ways it may appear."