Here’s a striking example of what AI can achieve right now: Cicero, the most recent AI from Meta, can defeat human players at Diplomacy, the age-old game of bargaining and treachery. Playing online at webDiplomacy.net, it scored “more than double the average of human players,” placing it “in the top 10% of participants who played more than one game.” It can work out which players need to be convinced to do what, then talk them into it with impressively fluent natural language.
I’m not going to make the “taking over the globe” joke. I won’t.
In a simplified version of World War One, players of the board game Diplomacy compete for control of Europe. You move a few of your units across the board each turn, but more crucially, you form alliances. You convince Geoff that you need to work together to defeat Margaret’s Germany, agree to send his forces into Berlin, then covertly switch your allegiance to Margaret because she has agreed to assist you in advancing into Paris. As Meta writes in its research blog post, Diplomacy is “a game about people rather than pieces.”
Smart manoeuvring obviously helps, and tactics is an area where a sophisticated AI’s abilities unquestionably surpass a human’s. Meta will, of course, downplay this. Nevertheless, in order to win the game, you must persuade other players to support you, and Cicero’s persuasive abilities are unmatched.
More details may be found in Meta’s blog post and the team’s research paper, but research scientist Mike Lewis’s Twitter thread contains the most impressive information.
Each game, it sends and receives hundreds of messages, which must be precisely grounded in the game state, dialogue history, and its plans. We developed methods for filtering erroneous messages, letting the agent pass for human in 40 games. Guess which player is AI here… 4/5 pic.twitter.com/8IMuepL7yf
— Mike Lewis (@ml_perception) November 22, 2022
It’s pretty interesting how deeply Meta’s blog post delves into what makes Cicero tick. Rather than improving solely through supervised learning, where an AI trains on “labeled data such as a database of human players’ actions in previous games,” Cicero makes predictions about the other players and plans around them:
“Cicero runs an iterative planning algorithm that balances dialogue consistency with rationality. The agent first predicts everyone’s policy for the current turn based on the dialogue it has shared with other players, and also predicts what other players think the agent’s policy will be. It then runs a planning algorithm we developed called piKL, which iteratively improves these predictions by trying to choose new policies that have higher expected value given the other players’ predicted policies, while also trying to keep the new predictions close to the original policy predictions.”
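To get a feel for what “higher expected value while staying close to the original predictions” means, here is a minimal toy sketch in Python of that style of update: each player’s policy is repeatedly replaced by a best response that is regularized to stay close (in KL divergence) to an anchor prediction. To be clear, this is not Meta’s piKL code; the closed-form update, the tiny two-player payoff table, and all names (`kl_regularized_response`, `lam`, “move”/“hold”) are illustrative assumptions.

```python
import math

def kl_regularized_response(anchor, q, lam):
    """Maximize E[q] - lam * KL(policy || anchor) over policies.

    Closed form: policy(a) is proportional to anchor(a) * exp(q(a) / lam).
    """
    weights = {a: anchor[a] * math.exp(q[a] / lam) for a in anchor}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

def iterate_policies(anchors, payoff, lam=1.0, steps=10):
    """Two-player toy loop: repeatedly replace each player's policy with a
    KL-regularized best response to the other's current policy, so the
    result stays anchored to the original predictions."""
    policies = [dict(a) for a in anchors]
    for _ in range(steps):
        updated = []
        for i, anchor in enumerate(anchors):
            other = policies[1 - i]
            # Expected value of each of our actions against the other policy.
            q = {a: sum(p * payoff[i][a][b] for b, p in other.items())
                 for a in anchor}
            updated.append(kl_regularized_response(anchor, q, lam))
        policies = updated  # simultaneous update for both players
    return policies

# Toy game: player 0's "move" always pays off; player 1 wants to match.
payoff = [
    {"move": {"move": 1.0, "hold": 1.0}, "hold": {"move": 0.0, "hold": 0.0}},
    {"move": {"move": 1.0, "hold": 0.0}, "hold": {"move": 0.0, "hold": 1.0}},
]
anchors = [{"move": 0.5, "hold": 0.5}, {"move": 0.5, "hold": 0.5}]
final = iterate_policies(anchors, payoff)
```

The `lam` parameter controls the trade-off: a large value keeps the policies pinned to the anchor predictions, while a small one lets them chase expected value more aggressively, which is the same tension the quoted passage describes.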
Lewis elaborates on that in another tweet, claiming that while Cicero is “designed to never intentionally backstab,” “sometimes it changes its mind…”
According to Meta, one potential use for an AI like Cicero is to develop video game NPCs that converse realistically and comprehend your motivations. Maybe we’ll actually get to communicate with the monsters.