Whether it’s a diversion in Red Dead Redemption 2 or a full game unto itself, video game poker routinely vexes fans with an AI heedless of Kenny Rogers’ timeless advice about holding, folding and the like. Some bots at the table can be bluffed off any hand; others will never be bluffed. Some will fold at the slightest provocation, while others call raises with even worse cards than you have. Players have about as much visibility into their CPU opponents’ behavior as they do into their cards, which is to say, none.
For that reason, research published by high-level problem solvers at Facebook and Carnegie Mellon University caught my attention earlier this week. Just don’t expect it to show up in a video game anytime soon. Their Pluribus poker AI is significant because, through a game, computer engineers have once again emulated a behavior previously thought to be uniquely human: bluffing.
“This is true for a lot of AI breakthroughs,” Noam Brown, a research scientist with Facebook and the bot’s co-creator, told me on Thursday. “A lot of the things that we assume are limited to human capability are actually possible to do with an AI.”
“People thought in the 1950s that playing chess was a very human thing that computers are not able to do,” Brown elaborated. “Then people thought that playing Go at a grandmaster level, that’s a very human thing that an AI would not be able to do. And then people thought that bluffing is this very human thing that an AI would not be able to do. And we see that, in fact, an AI can bluff better than any human alive.”
The scientific first that Brown’s research represents comes with a few qualifiers. Scientists have used poker to study AI behavior and learning before. In 2015, researchers at the University of Alberta built a pokerbot that was essentially unbeatable in two-player limit Texas hold’em. And, of course, applications as common as video games have put multiple AI participants at a poker table, particularly at the height of the poker craze at the turn of the century. What’s new with Pluribus is that it beat elite professionals at six-player no-limit hold’em.
The poker AIs that people like me are more familiar with aren’t so much analytical as they are frequencies: how often to take a certain action in a certain situation, whether that situation is overall hand strength or being the first to raise on the flop. For years, poker simulators have featured AI sliders for aggressive and conservative play, whose real utility is in training a human to play disciplined hands regardless of what anyone else does.
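To make the contrast concrete, here is a minimal Python sketch of how a slider-driven bot of this kind might work. It is hypothetical and not taken from any particular game: the function names, the 0-to-1 scales for hand strength and aggression, and the multiplication rule that turns them into action frequencies are all assumptions made for illustration.

```python
import random


def slider_bot_action(hand_strength: float, aggression: float,
                      rng: random.Random) -> str:
    """Pick 'raise', 'call' or 'fold' for a hypothetical slider-driven bot.

    hand_strength and aggression are assumed to lie in [0, 1].
    The bot never reasons about its opponents; it just rolls against
    fixed frequencies derived from its own hand and its slider setting.
    """
    if not (0.0 <= hand_strength <= 1.0 and 0.0 <= aggression <= 1.0):
        raise ValueError("hand_strength and aggression must be in [0, 1]")
    # Strong hands and an aggressive slider both push toward raising.
    raise_prob = hand_strength * aggression
    # Weak hands and a conservative slider both push toward folding.
    fold_prob = (1.0 - hand_strength) * (1.0 - aggression)
    # For inputs in [0, 1] these two never sum past 1; calling takes the rest.
    roll = rng.random()
    if roll < raise_prob:
        return "raise"
    if roll < raise_prob + fold_prob:
        return "fold"
    return "call"


def action_frequencies(hand_strength: float, aggression: float,
                       trials: int = 10_000, seed: int = 0) -> dict:
    """Tally how often the bot takes each action in one fixed situation."""
    rng = random.Random(seed)
    counts = {"raise": 0, "call": 0, "fold": 0}
    for _ in range(trials):
        counts[slider_bot_action(hand_strength, aggression, rng)] += 1
    return counts
```

Because this bot consults only its own hand and its slider, maxing out aggression with the weakest possible hand produces a bot that never folds, the same call-everything behavior that frustrates players.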
That’s before we get to bluffing, which is considered a human art form because it relies on the tells or tendencies that give away other players’ confidence, or lack thereof, in their hands. Coresoft’s World Championship Poker series for PlayStation 2 even had a bluffing minigame, an attempt to make bluffing a more viable tactic. But more often, you’d get runs where opponents called everything, raised inexplicably, or held on to garbage hands like they were a pair of jacks. These games weren’t entertaining for long, because most players would end up beating themselves out of boredom or impatience.
Pluribus is different because, more or less, it analyzes the effect of bluffing (that is, betting with a weak hand) rather than trying to sell competitors on the strength of what it’s holding. “The bot doesn’t view it as deceptive or lying in any way, it just views it as ‘This is the action that’s going to make me the most money in this situation,’” Brown said.
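Brown’s framing reduces a bluff to expected value. The Python sketch below works through that arithmetic for a single bet with a hand that always loses if called. The pot size, bet size and fold probability are made-up inputs, and this is a textbook calculation, not how Pluribus itself computes its strategy.

```python
def bluff_ev(pot: float, bet: float, fold_prob: float) -> float:
    """Expected chips won by betting a hand that always loses if called.

    If the opponent folds (probability fold_prob), we win the current pot;
    if they call, we lose the bet.
    """
    return fold_prob * pot - (1.0 - fold_prob) * bet


def break_even_fold_prob(pot: float, bet: float) -> float:
    """Fold frequency at which the bluff exactly breaks even.

    Setting fold_prob * pot - (1 - fold_prob) * bet = 0 gives bet / (pot + bet).
    """
    return bet / (pot + bet)


def choose_action(pot: float, bet: float, fold_prob: float) -> str:
    """Pick whichever action has the higher expected value.

    Assumes checking a hopeless hand wins nothing and loses nothing more,
    so its expected value is 0.
    """
    return "bet" if bluff_ev(pot, bet, fold_prob) > 0.0 else "check"
```

With a 100-chip pot and a 50-chip bet, the bluff breaks even if the opponent folds one time in three; at a 40 percent fold rate it earns about 10 chips per attempt on average. Nothing in the calculation involves deception, only the payoff.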
Pluribus, which Brown created with his CMU colleague Tuomas Sandholm, somewhat resembles a chess AI that computes outcomes and hypotheticals many moves ahead. The difference is that Brown and Sandholm’s bot looks only two or three moves in advance. This short-term focus helped make its bluffing tendencies completely opaque to the five human professionals Pluribus roundly defeated over 10,000 hands.
It raises something of an existential question: What defines bluffing more, the behavior or the result?
Brown wasn’t setting out to answer that, though. His interest in poker as a research environment goes back to his undergraduate days at Rutgers University about 15 years ago. He was fascinated by “this whole idea that there is this, you know, mathematical strategy to the game, this perfect strategy that, if you can play it, nobody will be able to beat you.”
Professional gamblers have touted systems for different games, with varying levels of intellectual rigor and honesty, for years. Poker seems system-proof because it depends on incomplete or imperfect information, as opposed to chess or Go, where all the information is visible to both players, or blackjack, where the dealer follows fixed rules and cannot act independently.
But in a way, Brown has shown that a strategy can be developed that wins consistently ($1,000 an hour) at poker; it’s just that no human is capable of the instant math necessary to play it.
“This is one of the interesting things about this AI, it’s not adapting to its opponent,” Brown said. “It has its strategy. It’s fixed, it doesn’t change what it’s playing based on how the humans are playing. This whole idea that there could be such a strategy in the game, I found really fascinating and that’s what really drew me to studying it more. It was kind of mystical, in a sense, there’s this strategy that we know exists, but we can’t find it.”
A news release for Pluribus touted the almost garage-lab nature of the hardware behind it: the AI was developed over eight days on a single 64-core server using less than 512GB of RAM. Researchers estimated that training the program on cloud servers would cost less than $150.
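Taking the release’s figures at face value, the arithmetic behind the garage-lab claim is short. This sketch assumes the entire cloud bill of roughly $150 pays for eight continuous days on 64 cores and ignores memory, storage and other costs; the helper name is hypothetical.

```python
def implied_cost_per_core_hour(cores: int, days: float, total_usd: float) -> float:
    """Spread a total compute bill evenly over every core-hour used."""
    core_hours = cores * days * 24
    return total_usd / core_hours


# Figures as reported for Pluribus: 64 cores, eight days, about $150.
core_hours = 64 * 8 * 24
rate = implied_cost_per_core_hour(64, 8, 150.0)
print(f"{core_hours:,} core-hours, about ${rate:.4f} per core-hour")
```

That works out to 12,288 core-hours at roughly a penny each.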
But don’t expect Pluribus to come into virtual poker rooms and start trashing everyone, or to train up a generation of formidable human players pocketing a grand an hour. Brown said there are no plans to turn Pluribus into any kind of commercial product. The AI is simply a proof of concept, whose lessons will aid Brown and other researchers as they tackle computer behavior in even more complex situations.
For example, self-driving cars. “One of the things we mentioned to reporters is the possibility of applying this to something like navigating traffic with a self-driving car,” Brown said.
That points to another obvious video game application, and another AI familiar to many video game fans: race car drivers, whose CPU versions rarely amount to more than a speed, an optimal racing line and the amount of room they’ll give other drivers.
“Motorsports games are a great example of how this work can be applied in the future, because that is a multi-agent interaction, there’s multiple players, and there’s some level of hidden information as well,” Brown mused. “A lot of game AIs, from what I understand, they’re not using very principled techniques these days, they’re more hardcoded, more specific to the kind of game that it is. It makes it easier to debug and understand what’s going on, of course.
“But as we develop these fundamental AI techniques, I think we’re going to start seeing it penetrating the computer gaming industry and starting to become more prominent,” he added. “I wouldn’t be surprised. That’s one of the first places that it really penetrates into industrial applications.”
Roster File is Polygon’s column on the intersection of sports and video games.