45 minutes
SDS 569: A.I. For Crushing Humans at Poker and Board Games
Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn
In this episode, Research Scientist at Meta AI, Dr. Noam Brown, joins us on the absolute cutting-edge of A.I. capabilities today. Press play to hear Noam shed light on his award-winning no-limit poker-playing algorithms, and the real-world implications of his game-playing AI breakthroughs.
About Noam Brown
Noam Brown is a Research Scientist at Meta AI working on multi-agent artificial intelligence and computational game theory. Noam co-created Libratus and Pluribus, the first AIs to defeat top humans in two-player no-limit poker and multiplayer no-limit poker, respectively. He has received the Marvin Minsky Medal for Outstanding Achievements in AI, was named one of MIT Tech Review's 35 Innovators Under 35, and his work on Pluribus was named by Science as one of the top 10 scientific breakthroughs of 2019. Noam received his PhD from Carnegie Mellon University in 2020. Before CMU, Noam worked at the Federal Reserve Board researching algorithmic trading in financial markets.
Overview
Dr. Noam Brown joined Jon Krohn live for a special episode of the SuperDataScience podcast taped at MLConf. Noam, who is currently a research scientist at Meta AI, opened the conversation by elaborating on why a company like Meta invests heavily in AI research. The big picture idea is that "the company is thinking long term" to capitalize on breakthroughs quickly, he says. Formerly known as Facebook AI, this group is packed with many of the greatest minds in machine learning today and produces some of the world's most cutting-edge AI research.
Prior to being at Meta, Noam completed his Ph.D. in computer science at Carnegie Mellon, where he developed AI algorithms (Libratus and Pluribus) that defeated the top human players of no-limit poker — a remarkable achievement that made the cover of Science. Surprisingly, neither of these algorithms used deep learning to achieve its breakthrough.
Currently, at Meta AI, Noam focuses on the seven-player game Diplomacy, which is very similar to Risk. The focus of the game is on negotiation with the other players, but at the end of that phase all the moves are executed at the same time.
While Noam's long-term goal is to build an AI that could beat top humans at the full game, in the short term his team is focused on a simpler version that removes explicit communication between the players, known as no-press Diplomacy. To emulate having humans involved and obtain training data, they use games from webDiplomacy (a site where humans play Diplomacy) to learn how humans play. From there, they were able to model the humans and build a bot that could play well with that human model.
But how is AI research into Diplomacy applicable to real-world problems, you might be wondering? According to Noam, a clear application is self-driving cars and modeling the behavior of humans on the road.
Next, Jon wondered why Noam, who was a successful Ph.D. student, decided to exit academia and work in big tech. While Noam was conducting faculty interviews around the United States, Meta offered him the opportunity to start immediately as he entertained the academic route. Ultimately, he found that working in big tech research was a far better opportunity in every way, and he particularly enjoyed the freedom and the access to resources that he wouldn't have had as a professor.
Finally, it was time to field some questions from the audience. Some of these included:
- What are the main barriers to getting AI game theory techniques beyond games to self-driving cars?
- How does AI know when to bluff?
- Is the bot adapting to these players as the game goes on (Libratus & Pluribus)?
- What are Noam’s recommendations for people breaking into poker A.I.?
To hear more from Noam, including his book recommendations and his inspiring expertise in game theory, tune in to the episode.
In this episode you will learn:
- What Meta A.I. is and how it fits into Meta, the company [3:01]
- Noam's award-winning no-limit poker-playing algorithms, Libratus and Pluribus [4:33]
- What game theory is and how Noam integrates it into his models [8:45]
- The real-world implications of Noam’s game-playing A.I. breakthroughs [25:24]
- Why Noam elected to become a researcher at a big tech firm instead of in academia [27:06]
- The main barriers to getting AI game theory techniques beyond games to self-driving cars [30:16]
- Recommendations for people who want to break into poker AI [37:45]
Items mentioned in this podcast:
- SuperDataScience
- MLconf
- Google DeepMind
- Meta AI
- Game Theory
- Superhuman AI for heads-up no-limit poker: Libratus beats top professionals
- Libratus: The Superhuman AI for No-Limit Poker
- Pluribus Paper 1: Superhuman AI for multiplayer poker
- Pluribus Paper 2: Superhuman AI for multiplayer poker
- Diplomacy game
- No-Press Diplomacy from Scratch
- webDiplomacy
- Never Split the Difference by Chris Voss and Tahl Raz
Podcast Transcript
Jon Krohn: 00:00
This is episode number 569 with Dr. Noam Brown, research scientist at Meta AI.
Jon Krohn: 00:11
Welcome to the SuperDataScience podcast. The most listened-to podcast in the data science industry. Each week we bring you inspiring people and ideas to help you build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today. And now let's make the complex simple.
Jon Krohn: 00:42
Welcome back to the SuperDataScience podcast. For today's episode, we've got a big first for you: the first-ever episode of the SuperDataScience podcast filmed in front of a live audience. We shot this episode at MLconf, The Machine Learning Conference, in New York. This means that you'll hear audience reactions in real time and, near the end of the episode, many great questions from audience members once I open the floor up to them. For this exceptional episode, we of course lined up an exceptional guest. Noam is a research scientist at Meta AI, the group formerly known as Facebook AI Research. This group is packed with many of the greatest minds in machine learning today and produces some of the world's most cutting-edge AI research. Noam in particular is focused on developing AI systems that can defeat the best humans at complex games that computers have hitherto been unable to succeed at.
Jon Krohn: 01:33
Most notably, during his PhD in computer science at the prestigious Carnegie Mellon University, Noam developed AI systems that defeated the top human players of no-limit poker, a remarkable achievement that made the cover of Science magazine. Prior to Meta AI, Noam worked for Google DeepMind and the US Federal Reserve Board. In addition to his PhD, he holds a master's in robotics from Carnegie Mellon and a bachelor's degree in math and computer science from Rutgers University in New Jersey. Today's episode has some moments here and there that get deep into the weeds of machine learning theory. But for the most part, today's episode will appeal to anyone who's interested in understanding the absolute cutting edge of AI capabilities today.
Jon Krohn: 02:11
In this episode, Noam details what Meta AI is and how it fits into Meta, one of the largest tech companies on the planet. He talks about his award-winning no-limit poker-playing algorithms, what game theory is and how he integrates it into his models, the algorithm he recently developed that can beat the world's best players at no-press Diplomacy, a complex strategy board game, the real-world implications of his game-playing AI breakthroughs, and why he elected to become a researcher at a big tech firm instead of in academia. All right, you ready for this amazing episode? Let's go.
Jon Krohn: 02:52
All right, Noam, welcome to the SuperDataScience podcast, the first ever live-filmed episode of SuperDataScience. It's wonderful to have you here. You work at Facebook AI Research, which was recently renamed to Meta AI. So what is Meta AI? And how does it fit into the broader Meta organization?
Noam Brown: 03:13
Yeah, so Meta AI is the branch of Meta that's focused on just AI, more broadly. AI applied to products, AI applied to things like the metaverse and also fundamental AI research. And so the sub-organization of Meta AI that I'm in, Facebook AI Research is focused on fundamental breakthroughs in AI, not necessarily directed at specific products, but just trying to advance the state of the art more broadly.
Jon Krohn: 03:37
Awesome. And so I know that it isn't uncommon for big tech organizations to fund AI research groups like this that are working on fundamental AI research. But what is the big picture idea there? Why are big tech companies doing that kind of thing?
Noam Brown: 03:51
Yeah. The big picture is that these companies are thinking about the long term. They're not necessarily thinking about just five years ahead or 10 years ahead, but also 25, 50 years ahead. And by funding this fundamental AI research, there could be breakthroughs that can't be foreseen currently. And if they're the ones that are creating those breakthroughs and funding those breakthroughs, then they are able to capitalize on them quickly. And so by getting a lot of smart people in a room together and letting them do AI research, it's not just blue-sky research: there are potential products, 25 years down the line, that could emerge that they couldn't have foreseen today.
Jon Krohn: 04:29
It sounds like an incredible opportunity to work in a place like that. Prior to being at Meta, you were doing a computer science PhD at Carnegie Mellon University. And while you were there, you developed the first AI to defeat top humans at no-limit poker. You had Libratus as one of your algorithms, which received the Marvin Minsky Medal for Outstanding Achievements in AI. And then you also had the Pluribus algorithm, which was the cover story of Science magazine. We really do have a star among us here. And so could you tell us a bit about these algorithms, Libratus and Pluribus? How did you develop them, and what are the differences between the two models?
Noam Brown: 05:11
Yeah. Taking a step back, poker AI has been a topic in the field for decades. In fact, if you look at the original papers on game theory by John Nash, on Nash equilibria, the only application that he actually discusses in the papers is poker, because it really is a challenging game theory problem. Now, research really had been going on since the '70s, since the '80s. I think it really took off after Deep Blue beat Garry Kasparov. People were looking at, well, what's the next game? We have these AIs that can play perfect-information games, where both players know the exact state of the world, but poker is very different because you have access to hidden information that the other side doesn't have access to. Research, I'd say, started really intensely in the early 2000s. I started my PhD in 2012 and yeah, at that point the research had progressed, but there was still a long way to go to actually beating top humans in no-limit poker.
Jon Krohn: 06:14
It's a really interesting time, 2012 to be starting a PhD because that was around the time that deep learning started making a comeback. Was that something that you focused on a lot and was it incorporating deep learning perhaps into these algorithms that made a big difference?
Noam Brown: 06:30
Well, actually, in Libratus and Pluribus, we didn't use any deep learning. And I think a lot of people in machine learning in 2017 found that quite surprising. It seemed like every breakthrough was happening with deep learning in those days. But actually, I think it really showed that deep learning by itself is not enough. There's AI beyond just deep learning. And it's not necessarily a choice between the two. I think the techniques are actually quite complementary, and since Libratus and Pluribus, a lot of my research has been on how do you combine the game-theoretic reinforcement learning techniques that we used in Libratus and Pluribus with deep learning techniques.
Jon Krohn: 07:11
Nice. Tell us about the algorithms then. How were you able to achieve these results? What were people doing before you started working on poker playing AI? And then what did you add into it that all of a sudden allowed your approach to beat the best players in the world?
Noam Brown: 07:28
Yeah, so a lot of the early research was focused on linear programming approaches, which worked pretty well, but only at small scales. And when you go from a small-scale poker game to an extremely large game, like No Limit Texas Hold'em, where there are more states than atoms in the universe, then the linear programming techniques simply don't scale. And so gradually the research shifted towards more of a reinforcement learning approach, which is actually the same kind of technique that was used, for example, in AlphaGo and AlphaZero. But the reinforcement learning techniques used in AlphaGo and these perfect-information game AIs don't work in imperfect-information games, because they don't understand that there's hidden information. And I think the best way to explain it is: in a game like chess or Go, if there's a good move, it doesn't become less good the more you play it.
Noam Brown: 08:17
If you're going to open with the Sicilian Defense, it doesn't become worse if you open with it 100% of the time compared to 10% of the time. But in poker, that's not true. If you're going to bluff 100% of the time, that's a lot worse than bluffing 10% of the time. And so you have to understand not just which actions are good, but how to balance the actions to get the probabilities right. And so you need to expand on the reinforcement learning techniques that were used in perfect-information games to accomplish that.
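To make the balancing point concrete, here is a small editorial sketch of a toy bluffing game, not from the episode: the pot size, bet size, and hand distribution are illustrative assumptions. We always bet our winning hands, bluff our losing hands at some frequency, and the opponent best-responds by always calling or always folding. Bluffing 10% of the time then really does earn more than bluffing 100% of the time.

```python
POT, BET = 2, 1  # toy game: a pot of 2 units, a bet size of 1 unit

def ev_vs_best_response(p_bluff):
    """
    Our expected winnings in a toy bluffing game, against an opponent
    who best-responds to our bluffing frequency.

    We hold a winning hand half the time (and always bet it) and a
    losing hand half the time (and bluff it with probability p_bluff).
    The opponent, facing a bet, either always calls or always folds,
    whichever is better against our frequency.
    """
    # Opponent's EV of calling a bet: loses BET to a value hand, wins
    # POT + BET from a bluff. Bluffs make up p/(1+p) of our bets.
    call_ev = (-BET + p_bluff * (POT + BET)) / (1 + p_bluff)
    if call_ev > 0:  # opponent calls every bet
        return 0.5 * (POT + BET) + 0.5 * (p_bluff * -BET)
    else:            # opponent folds to every bet
        return 0.5 * POT + 0.5 * (p_bluff * POT)

# Bluffing 10% beats bluffing 100%; one-third is optimal in this toy game.
for p in (0.0, 0.1, 1/3, 1.0):
    print(f"bluff {p:>5.0%} of the time -> EV {ev_vs_best_response(p):.3f}")
```

In this toy game, never bluffing and always bluffing both earn 1.000, bluffing 10% earns 1.100, and the equilibrium frequency of one-third earns about 1.333: the value of an action depends on how often you take it.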
Jon Krohn: 08:45
Nice. Do you want to tell us a little bit more about that? Does it involve game theory concepts?
Noam Brown: 08:49
Yeah. Yeah. Game theory is the key ingredient, I think, that really let us go from perfect-information games to imperfect-information games. We use game theory concepts, in particular Nash equilibria, minimax equilibria. And the idea there is that in every game, every finite game, there is an optimal strategy where, if everybody is playing that strategy, then no player can do better by deviating to a different strategy. What this means in a two-player zero-sum game in particular is that there is an unbeatable strategy: if you play it, you are guaranteed in expectation to not lose, no matter what your opponent does. And I think a lot of people find this surprising, but think about Rock Paper Scissors, for example. The Nash equilibrium in Rock Paper Scissors is to randomize equally between throwing rock, paper, and scissors with one-third probability each. Because if you do that, then no matter what your opponent does, you are not going to lose in expectation.
Noam Brown: 09:48
Now, that said, in Rock Paper Scissors you might not win in expectation either. But in a more complicated game like poker, what we find is that if you approximate the Nash equilibrium, your opponent is going to make mistakes. And in the long run, you're going to end up winning anyway. And I should say, if you talk to professional poker players, modern professional poker players, they approach the game the same way. They try to approximate this Nash equilibrium, what they call a game theory optimal strategy, and wait for their opponent to make mistakes.
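Noam's Rock Paper Scissors claim is easy to check numerically. The sketch below (an editorial illustration, not from the episode) computes the expected payoff of the uniform mixed strategy against several opponent strategies; it comes out to zero every time, up to floating-point noise.

```python
# Payoff matrix for the row player in Rock Paper Scissors:
# entry [i][j] = our payoff when we play i and the opponent plays j
# (0 = rock, 1 = paper, 2 = scissors); +1 win, -1 loss, 0 tie.
PAYOFFS = [
    [ 0, -1,  1],  # rock
    [ 1,  0, -1],  # paper
    [-1,  1,  0],  # scissors
]

NASH = [1/3, 1/3, 1/3]  # the equilibrium: randomize uniformly

def expected_payoff(our_strategy, opponent_strategy):
    """Expected payoff of one mixed strategy against another."""
    return sum(our_strategy[i] * PAYOFFS[i][j] * opponent_strategy[j]
               for i in range(3) for j in range(3))

# Against ANY opponent strategy, the uniform strategy earns 0 in
# expectation: it cannot lose, no matter what the opponent does.
for opp in ([1, 0, 0],         # always rock
            [0.5, 0.5, 0.0],   # mixes rock and paper
            [0.2, 0.3, 0.5]):  # arbitrary mix
    print(f"vs {opp}: expected payoff = {expected_payoff(NASH, opp):.3f}")
```

Of course, a pure strategy enjoys no such guarantee: always playing rock loses outright to always playing paper, which is exactly why the equilibrium must mix.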
Jon Krohn: 10:17
Do you play poker, Noam?
Noam Brown: 10:19
I got really into poker when I was a kid in high school, a little bit in college, not so much for the gambling aspect. I actually never really played for high stakes, but I was really interested in the strategy of the game. I think this idea that there's an unbeatable strategy in poker, that if you could find it, then you could just make infinite money, basically, I thought that was a pretty interesting concept.
Jon Krohn: 10:39
Yeah. And would you say that you achieved that? That you found a machine that could perform perfectly?
Noam Brown: 10:44
Yeah. I myself was not a very good poker player, but I guess my hope was that I could make a machine that could do it. And I think, yeah, we came pretty close to that. I think it's fair to say that it's unbeatable by humans now.
Jon Krohn: 10:55
Yeah. It's an incredible achievement, and clearly it has already won a lot of plaudits. Now you're taking some of the concepts that you had from no-limit poker, this idea of hidden information that isn't available like it is in Go or chess. Just like the cards are hidden away from an algorithm or from a human playing a game, you're now moving on to another game at Meta AI called Diplomacy, which similarly has hidden intentions, hidden information. Can you explain the game of Diplomacy to us?
Noam Brown: 11:30
Yeah. What's interesting is that poker, two-player poker in particular, is a purely adversarial game. Whatever you win, the other person loses. When we went to six-player poker, we thought that going from two players to six players was going to be a very difficult challenge in poker because of the fact that it's not two-player zero-sum anymore. And a lot of the game theory concepts that we relied on don't carry over to more than two players. For example, this idea that a Nash equilibrium means you're not going to lose in expectation no matter what your opponent does, that really only applies in two-player zero-sum games. And when you go to more than two players, you could lose and there's nothing you can do to stop that. If all of your opponents team up against you, there's no perfect strategy that's going to guarantee that you win.
Noam Brown: 12:16
Now, it turns out that in six-player poker, this wasn't a problem. We used similar techniques. We added a lot of innovations to make it scalable. Six-player poker is obviously a much more complicated game than two-player poker, and so we had to come up with new techniques to allow us to scale to that size. But this game theory problem actually ended up not being an issue. And that's because six-player poker is a very adversarial game. There's no real room for collaboration among your opponents to allow them to team up against you. But when you go to Diplomacy, so Diplomacy is this seven-player game where there is a big emphasis on cooperation in addition to competition. The way the game works, it's kind of like Risk, if you've ever played Risk before. You control one of seven powers and you move pieces on a board and you try to take over the board.
Jon Krohn: 13:07
Like a map of Europe, right?
Noam Brown: 13:09
Yes.
Jon Krohn: 13:09
Just like Risk.
Noam Brown: 13:10
Yeah. But the focus of the game is on negotiations with the other players. At the start of each turn, you spend about 15 minutes talking to other players in private, negotiating with them, saying, "I'll support you. I'll help you this turn if you help me next turn." You make all sorts of deals and alliances. But then at the end of that negotiation phase, everybody writes down their moves at the same time. And all the moves are executed simultaneously.
Jon Krohn: 13:37
When you play this in person with people, you would go off into separate rooms and that kind of thing? And so you'd have some sense of who's strategizing with whom based on who went off together, that kind of thing?
Noam Brown: 13:51
That's right. Yeah. And you don't know what the conversations that they're having are, but you know this person was just talking to that person, and they tell you what their conversation was about. Maybe they're lying, maybe they're telling the truth. There's a lot of intrigue. And then because all the moves are written down simultaneously, and because you're not held to any agreements that you made, there's a lot of backstabbing and betrayal that happens in the game. You tell somebody, or they tell you, that they're going to support you, and then the moves happen and you see, oh, they actually decided to attack you. And so it's a very different game from these purely adversarial games like poker, chess, Go. You really have to understand cooperation and trust, be able to build trust with these other players, and understand the human psychology of the game as well.
Jon Krohn: 14:34
That sounds incredibly complex to work into an algorithm today.
Jon Krohn: 14:40
This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. Yes, the platform is called SuperDataScience, it's the namesake of this very podcast. In the platform, you'll discover all of our 50 plus courses, which together provide over 300 hours of content with new courses being added on average once per month. All of that and more you get as part of your membership at SuperDataScience. Don't hold off, sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level.
Noam Brown: 15:18
Yeah, and to be honest, the reason why we decided to work on Diplomacy next is because after poker, we saw the breakthroughs that were happening in games like Go, poker, even games like StarCraft. And it became clear that AI was advancing very rapidly. In 2016, you have AIs beating top humans in the game of Go. 2017, poker. After that, people were saying, well, StarCraft and real-time strategy games are the next grand challenge for AI, but that fell within two years. And we were saying, "Look, Go took decades for AIs to beat humans. Chess took decades. Poker took decades. What is a game that would be so beyond the capabilities of AI techniques today that it would take a similar amount of time, that would be similarly impressive if we were to succeed?" And we felt like Diplomacy was that game.
Jon Krohn: 16:10
Yeah. That's so cool. And so there are two different variations on this Diplomacy gameplay: there's the press version and the no-press version. And so it sounds like the no-press version could be easier because there's less communication, or-
Noam Brown: 16:24
Yeah. Yeah. We took on this really ambitious goal of making an AI that could beat top humans in this game. Obviously that's a very long-term goal. And so as a short-term objective, we decided to focus on a simpler version of the game that's still popular among humans, where there's no explicit communication between the players during each phase.
Jon Krohn: 16:44
Got it.
Noam Brown: 16:44
Now, this is still actually a very difficult challenge, because you have to... So everybody just writes down their moves without talking to each other, and then they're executed simultaneously. But what's different here is, because it's not a two-player zero-sum game, you have to model the other players. And that's actually a key part of this game. You can't just approximate an equilibrium in the same way that you do with Go and poker and expect to do well, because you have to be able to model the other players and best respond to that. From a game theory standpoint, what's going on here is that there are multiple Nash equilibria. Nash equilibrium is this really nice solution concept. In two-player zero-sum games, you can compute any sort of equilibrium and they're all kind of interchangeable. You don't have to play the same one as the other person in order to do well.
Noam Brown: 17:36
But in a game like Diplomacy, there are multiple different equilibria. And so you could, through self-play, just by learning from scratch, playing against yourself, learn an equilibrium, but it doesn't mean that you're going to do well with actual humans, because they might be playing a different equilibrium. It's kind of like with self-driving cars: if you had a car that learned to drive purely on its own, without any human data, it might learn to drive on the left side of the road. And that's a totally reasonable solution. It's an equilibrium that's totally valid. But if you were to put it on the roads in Manhattan, it would not do very well. You have to understand how humans drive and how humans play these games in order for the AI to do well in these games.
Jon Krohn: 18:15
So that presents, to me, perhaps one of the big challenges here: where do you get your training data for something like this? The contemporary Go game-playing algorithms, while they originally... So AlphaGo, the one that's popularized in the AlphaGo documentary, was trained on some human gameplay. But then the more recent versions of the algorithm, there's no human gameplay that they're trained on at all. You just have the algorithm playing against itself, and in so doing, it comes up with ways of learning and ways of playing Go that had world champion Go players describing it as like having aliens on Earth. With Diplomacy, how can you emulate having a human involved in your training data and still have tons and tons of training data?
Noam Brown: 19:07
Yeah, so that's right. AlphaGo did use human data, but the more recent versions, like AlphaZero, don't use human data. You don't need human data to do well in a game like Go. And actually with poker as well, we didn't use any human data and we were able to beat top humans. And that is a feature of purely adversarial games. But in a game like Diplomacy, we've actually shown that when we trained a bot without human data, just by playing it against itself, and we played it with real humans, it didn't do well. Fortunately, there is a website, webdiplomacy.net, where humans play this game, and we were able to get training data from that site. And so we were able to use that to get some indication of how humans play. It's not a huge amount of data, but it is enough that we were able to model the humans and then build a bot that could play well with that human model.
Jon Krohn: 19:59
Oh, wow. And then that bot, is it using the same kind of deep reinforcement learning that you're using for the general Diplomacy algorithm that you're building? Or is it some separate algorithm?
Noam Brown: 20:14
What we did is, this is actually very cutting-edge research. We actually just put out a paper recently on this, and we're going to put out another paper in the near future going into more detail about this. But actually, I should start by saying how the Go and poker AIs work. They start from scratch, playing totally randomly, and then play against themselves. And in that process, what's called self-play, they gradually improve. They understand, okay, this action's making me more money or winning more often, I should take this more often in the future. And in the long term, they eventually converge to this equilibrium. And in Diplomacy, we end up doing something similar, except we regularize the algorithm. We regularize the policy towards this human imitation learning policy that we have. We regularize it towards the human data. And so in that way, it's finding an equilibrium that is in some sense compatible with how humans are playing. And so we actually ran a competition just recently where we pitted this AI against more than 50 real humans in the tournament, and the AI came in first place in no-press Diplomacy.
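As a minimal sketch of the regularization idea Noam describes, assume it reduces to maximizing expected value minus a KL penalty toward a human imitation policy; the function name, numbers, and the closed-form update below are illustrative, not Meta AI's actual implementation.

```python
import math

def regularized_policy(q_values, human_policy, lam):
    """
    One step of value maximization regularized toward a human policy.

    Maximizes  E_pi[Q] - lam * KL(pi || human_policy), whose closed-form
    solution is  pi(a) proportional to human_policy(a) * exp(Q(a) / lam).
    Large lam -> play like the human model; small lam -> play greedily on Q.
    """
    weights = [p * math.exp(q / lam) for q, p in zip(q_values, human_policy)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example: three candidate moves with learned values, and a human
# model that strongly prefers the second move.
q_values = [1.0, 0.2, 0.5]
human = [0.1, 0.8, 0.1]

print(regularized_policy(q_values, human, lam=10.0))  # stays close to human
print(regularized_policy(q_values, human, lam=0.1))   # nearly greedy on Q
```

The single knob `lam` captures the trade-off Noam mentions: the agent still seeks an equilibrium, but one that stays compatible with how humans actually play.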
Jon Krohn: 21:28
Wow. And so that's what you just published on?
Noam Brown: 21:30
That is actually still, it's going to be published soon, but the technique that we described in an earlier paper and this result, we're going to publish in the near future.
Jon Krohn: 21:40
Incredible. Are you able to go into any more particular detail on the deep reinforcement learning approaches that are involved in having the algorithm work so effectively, in order to be able to beat humans at the no-press version? For that no-press version of Diplomacy, you now have, hot off the press, this result that you can beat them. So how does that work? Tell us in detail about the deep reinforcement learning technique involved.
Noam Brown: 22:10
Yeah, it's actually in some ways similar to the AIs that are used in poker and Go and chess other than the fact that we're regularizing towards the human data.
Jon Krohn: 22:20
Got it.
Noam Brown: 22:21
Now, one thing I should emphasize, and I think is underappreciated in the wider AI community, is that this wasn't just model-free reinforcement learning that we were using. Actually, a big focus of the research was on search. Instead of the bot just acting instantaneously when it's its turn to act, it will actually compute an improved policy for all the players. It will try to figure out what is the optimal policy for all the players on this turn, given my value function about what's going to happen after this turn, the values to all the players after this turn. And this is actually similar to what's done in Go, and also in poker. AlphaZero, AlphaGo, they use Monte Carlo tree search, and that is a critical component to reaching superhuman performance in Go. In poker also, the big breakthrough that allowed us to beat top humans compared to prior bots is that we added search. On each turn, it was figuring out its optimal policy for the entire round of poker that it was on. And this is actually done in a tabular way. It's not done using neural net function approximation. But it is using neural nets to predict the value at the end of the turn. It's using an estimate from the deep neural nets to predict the value beyond this turn, and then, based on that information, it's computing the optimal policy within the turn.
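As a rough illustration of tabular equilibrium search within a single turn, here is a generic regret-matching sketch for a two-player zero-sum matrix game. This is a textbook technique in the same family as what Noam describes, not his actual code; the hard-coded payoff matrix stands in for the values that a neural value function would supply for the positions reached after the turn.

```python
def regret_matching(payoff, iters=20000):
    """
    Approximate a Nash equilibrium of a two-player zero-sum matrix game
    via regret matching, the tabular core of CFR-style search.
    payoff[i][j] is the row player's value when row plays i and column
    plays j; in a game AI, these entries would come from a neural value
    function evaluating the state at the end of the turn.
    Returns the time-averaged strategies, which converge to equilibrium.
    """
    n, m = len(payoff), len(payoff[0])
    regret_row, regret_col = [0.0] * n, [0.0] * m
    sum_row, sum_col = [0.0] * n, [0.0] * m

    def normalize(regrets):
        pos = [max(r, 0.0) for r in regrets]
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [1.0 / len(pos)] * len(pos)

    for _ in range(iters):
        p_row, p_col = normalize(regret_row), normalize(regret_col)
        sum_row = [s + p for s, p in zip(sum_row, p_row)]
        sum_col = [s + p for s, p in zip(sum_col, p_col)]
        # Expected value of each pure action against the opponent's mix.
        u_row = [sum(payoff[i][j] * p_col[j] for j in range(m)) for i in range(n)]
        u_col = [-sum(payoff[i][j] * p_row[i] for i in range(n)) for j in range(m)]
        ev_row = sum(p * u for p, u in zip(p_row, u_row))
        ev_col = sum(p * u for p, u in zip(p_col, u_col))
        # Accumulate regret for not having played each action.
        regret_row = [r + u - ev_row for r, u in zip(regret_row, u_row)]
        regret_col = [r + u - ev_col for r, u in zip(regret_col, u_col)]

    return [s / iters for s in sum_row], [s / iters for s in sum_col]

# A small asymmetric zero-sum game; its unique equilibrium has each
# player mixing 40/60 over their two actions.
game = [[2, -1], [-1, 1]]
row, col = regret_matching(game)
print([round(p, 2) for p in row])  # approximately [0.4, 0.6]
```

Note the output policy is a probability distribution over actions, not a single best move: exactly the "balanced" play that imperfect-information games demand.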
Jon Krohn: 23:42
Nice. Super cool. How are projects like this potentially relevant to real-world applications? Actually, even before we go to that: now that you've succeeded at beating human players at no-press Diplomacy, the big research question is going to turn to press Diplomacy, I imagine, which is going to be presumably an order of magnitude more challenging. Because once you have natural language communication, the kind of behind-the-scenes strategizing and lying, in order to have a machine that could perform well at that, it would have to have a deep understanding of natural language and of human behavior in the context of lying and strategizing. It sounds like a big jump to have to do that.
Noam Brown: 24:32
Yeah. Like I said, I see this as a very long term research agenda and I think by making an expert level, no press Diplomacy bot, we've just done the first step of maybe a dozen steps. We don't have to jump straight to natural language. There are ways that you could, for example, allow bots to communicate with a small communication channel where they're just sending a few bits back and forth.
Jon Krohn: 24:58
Got it.
Noam Brown: 24:59
There's a lot that we can do. And I think there are a lot of interesting research questions that come up even in that domain. But yes, the long-term goal is: can we get this bot to play in natural language with humans and do well? And if we can get there, I think the applications are pretty clear.
Jon Krohn: 25:18
Yeah. Do you have a few examples?
Noam Brown: 25:21
Right. Well, first of all, just with no-press Diplomacy, we've had really fascinating breakthroughs with being able to model humans. And I should say also, we are doing experiments now to see if this technique is more broadly applicable. And we found that similar techniques work well in Go for modeling humans. It works well in Hanabi, another cooperative card game, for modeling humans. And I think there's potential for this to be used more broadly to model human behavior in the real world. And that's ultimately the goal. We're not here to just make AI to play games. We're here to use games as a benchmark to measure progress against real humans, but to eventually apply to the real world.
Noam Brown: 26:00
And so I think by being able to model humans better in these games, we can use these techniques to better model humans in the real world, and develop AI that can cooperate, collaborate with humans in the real world as well. For example, I think one clear application might be self-driving cars. The big challenge with self-driving cars is modeling the behavior of other humans on the road. You can't just assume that they're going to behave the same way that the robot's going to behave. And like I said, there's all these different conventions, quirks that humans have that you have to be able to understand, even if they're irrational and that's what we're able to do in Diplomacy. And hopefully those techniques extend to these kinds of domains as well.
Jon Krohn: 26:40
Amazing, exciting to see what continues to come out of your research group. And so, speaking of doing research at Meta AI: you were a very accomplished PhD student, with award-winning models, featured on the cover of magazines. You probably had your pick of companies or academic institutions that you'd like to work at. Why did you choose to do research at a big tech company, specifically Meta AI, instead of staying in academia?
Noam Brown: 27:14
Yeah, that's a good question. Actually, my original plan was to go into academia, to become a professor. And I planned to do just one year in industry research between grad school and starting as a faculty member. So I did interviews with the big tech companies, and I got the offer from Meta AI, from Meta, previously Facebook. They said I could start immediately instead of waiting until after I graduated. And so I told them, "Look, I'm going to be going on the faculty market. I'm going to be interviewing with universities for three months, just flying around the country, not doing any work for you all." And they said, "Yeah, that's fine. Just come here. We'll pay you to do all that, no big deal." And so I thought, okay, well, that's strictly better than being a grad student making $35,000 a year.
Noam Brown: 28:00
Why not just join Meta? And so I did. I was there for about six months, and I did all the faculty interviews. I had some really great offers that I was really excited about, but by that point I had gotten to see what it's like doing research at Meta. And I thought it was amazing. I thought it was simply a better opportunity than being in academia. My collaborators were top-notch, I had complete freedom to pursue any research that I wanted to, and we had access to more resources. In pretty much every way, it was an advantage over being a professor. And I think what really did it for me is, I was talking to a professor on a second visit about my choice between going into industry research or going into academia.
Noam Brown: 28:53
And this professor had actually been at AT&T Labs in the early two thousands. What he told me is, one of the big risks of being at an industry research lab is that it might go away, whereas as faculty you have tenure. But he said, look at what happened with AT&T's Bell Labs: it kind of shifted away from long-term research to more short-term research, and a lot of the prominent researchers left. They all got great jobs at universities. And that made me realize that it's not like I have to choose right now between those two paths. I can always go from industry research into academia later. So for me, I decided: look, this is the better opportunity right now, I'm able to do my best research here, I'm going to go with Meta.
Jon Krohn: 29:38
Amazing, makes a lot of sense, and I can definitely see why you chose to do that. I do have a couple of final questions for you that I'll get to later, my standard questions that I end episodes with. But in the meantime, I thought, for the first time ever, we could have people in the audience ask questions of the guest on SuperDataScience. So I wanted to open up the floor to that. We do have a mic that we can pass around. And as we do that, it would be awesome if you could let us know your first name and what you do, and then ask your question.
Noam Brown: 30:15
Yeah. So the question was, what are the main barriers to getting these techniques beyond games to real-world applications like self-driving cars? I think the main challenge is that in a game you have well-defined environments with well-defined payoffs. You either win or you lose, and you have a choice between a finite number of actions. In the real world, things are not so clear. It's not really clear what happens after you turn the steering wheel right. You have an indication, but it's unpredictable. And then there's the objective function that you're trying to maximize; that's also a bit more fuzzy. So things are just less well defined in the real world compared to a game, where you know the dynamics model exactly. Now, there has been a lot of research on overcoming this.
Noam Brown: 31:02
And in fact, I think one of the research breakthroughs that I was really excited about is MuZero. This is a technique out of DeepMind from a couple of years ago, and it's like AlphaZero: it's able to play Go at the same performance as AlphaZero, except it doesn't know the rules of the game. It's learning the rules as it goes. That is a big step towards applying these techniques to the real world. A lot of researchers are pushing in this direction, and I think the research that we're doing on cooperative games and imperfect-information games can be extended to the real world once there are more breakthroughs in that area.
Jon Krohn: 31:40
All right. Noam, so the question is, how do you take into account the cultural differences in the case of self-driving?
Noam Brown: 31:47
Yeah, that's a great question. It's definitely true that when you're driving on the road, you have to understand that other participants are not going to behave the same way that you do. And obviously we're not doing this on roads yet; we're focusing on games. But with humans in these games, too, there is a style that humans have, and you have to be able to adapt to that style. And I think the key is that we need data on how humans are behaving. If we have that data, then we can model how they behave, and we can respond to that appropriately.
Jon Krohn: 32:27
Do you think that an algorithm would be able to identify when it's in a particular context? Say, that you're in Connecticut or in Peru, so that could be a parameter in the model or... Yeah.
Noam Brown: 32:39
Yeah, I think you could. If you have sufficient data, you should be able to pick up where you are and who you're driving with. You can see the behavior other drivers are exhibiting, and based on their previous behavior, you can kind of predict what they're going to do in the future. If you notice that they're driving like Peruvians, you can predict that they're going to drive like Peruvians going forward, and that other people on the road are going to drive like Peruvians too.
Noam Brown: 33:04
The question was, how does the AI know when to bluff? And first of all, I should say, I think it's really smart that you did not become a professional poker player. With the breakthroughs in AI that are happening, it's not a very good profession to be in these days. In fact, when we were doing this competition, we made this AI and played it against top professional poker players, and at the end of the competition they were coming to me for career advice, because they were realizing, oh, our days are numbered as professional poker players. And I thought this was really bizarre, because I'm this poor grad student, and these are high-roller poker players, and they're coming to me for career advice. But yeah, fortunately a lot of them were able to transition away from poker very successfully.
Jon Krohn: 33:49
At least in online poker there could be bots playing against you. But surely, just as you could have a Go computer that's the world's best Go player, there's still some interest in watching humans play Go against each other. So you could presumably still have high-stakes poker games where no bots are allowed.
Noam Brown: 34:10
Yeah. Actually, a lot of the really profitable poker players were playing online, because they were simply able to play more tables at the same time. And online poker is at a point where it's really difficult to play high-stakes poker, because there's this risk of bots being on these websites. There is still an active professional poker community, but a lot of it has shifted towards live poker, where there's less money to be made. Still, if you're really good, you can make a lot of money, but [inaudible 00:34:38] talk about the same order of magnitude that it was 10 years ago.
Jon Krohn: 34:41
Nice. And then did we answer your question?
Noam Brown: 34:44
No, we did not. The question was, how does the AI know when to bluff? The bot actually learns this through self-play. I think a lot of people find this surprising: the bot doesn't view bluffing as lying. It just views it as the action that makes it the most money. If it has a bad hand, it has a choice between folding or raising, and if it raises, it understands there's a chance the other player might fold. It learns this through the experience of playing against itself in previous games: it saw that in those situations, if it raised, the other player folded. And so it understands that if it does that, there's a good chance it'll actually make money. Now, it has to get the probabilities right, and that's, I think, the big challenge. It learns this through experience as well. If it starts to raise too much, too often, the other player starts to call instead of folding, because that player understands: well, this person doesn't actually have a good hand when they're raising, so I'm going to start calling more often. And then the bot that's bluffing will bluff less often. They kind of go back and forth, but they eventually arrive at an equilibrium where one bot is bluffing with the right frequency and the other is calling with the right probability, so that it all balances out.
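The back-and-forth Noam describes can be sketched with regret matching, the basic building block behind the counterfactual-regret-minimization methods used in poker AI. The tiny 2x2 bluff/call game and its payoffs below are invented purely for illustration (they are not from Libratus or Pluribus); running the self-play loop shows the two sides settling into stable bluffing and calling frequencies.

```python
# Toy zero-sum game, payoffs made up for illustration.
# Row player: [bluff, give_up]; column player: [call, fold].
# Row player's payoffs; the column player receives the negation.
A = [[-2.0, 1.0],   # bluff:   -2 if called, +1 if the opponent folds
     [1.0, -1.0]]   # give_up: illustrative payoffs that force mixing

def regret_matching(regrets):
    """Turn cumulative regrets into a strategy: play in proportion
    to positive regret, or uniformly if there is none."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total > 0.0:
        return [p / total for p in positives]
    return [1.0 / len(regrets)] * len(regrets)

row_regret, col_regret = [0.0, 0.0], [0.0, 0.0]
row_sum, col_sum = [0.0, 0.0], [0.0, 0.0]
T = 20000

for _ in range(T):
    row_s = regret_matching(row_regret)
    col_s = regret_matching(col_regret)
    for i in range(2):
        row_sum[i] += row_s[i]
        col_sum[i] += col_s[i]
    # Expected payoff of each pure action against the opponent's mix.
    row_u = [A[i][0] * col_s[0] + A[i][1] * col_s[1] for i in range(2)]
    col_u = [-(A[0][j] * row_s[0] + A[1][j] * row_s[1]) for j in range(2)]
    row_v = row_s[0] * row_u[0] + row_s[1] * row_u[1]
    col_v = col_s[0] * col_u[0] + col_s[1] * col_u[1]
    # Accumulate regret for not having played each pure action.
    for i in range(2):
        row_regret[i] += row_u[i] - row_v
        col_regret[i] += col_u[i] - col_v

# In zero-sum games the *average* strategies of regret matching converge
# to a Nash equilibrium; for these payoffs that is bluff 40%, call 40%.
avg_bluff = row_sum[0] / T
avg_call = col_sum[0] / T
print(f"bluff frequency ~ {avg_bluff:.3f}, call frequency ~ {avg_call:.3f}")
```

The current-round strategies oscillate, just as Noam describes the bots over- and under-bluffing against each other; it is the long-run average that settles at the equilibrium frequencies.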
Jon Krohn: 36:01
Cool. There you go. Great question.
Noam Brown: 36:03
That's a great question. The question was, is the bot adapting to these players as the game goes on, like between hands? For Libratus and Pluribus, we actually did not do this. The bot was simply trying to approximate the equilibrium; it wasn't trying to learn the humans' styles and exploit them. And this is actually sufficient for beating top humans in poker. I think a lot of people found this surprising as well, but it's true that a lot of professional poker players take this approach too. They just try to approximate the equilibrium, and they understand: look, I'm playing against somebody who is really strong, they're not going to have many weaknesses anyway, so let me just approximate the equilibrium and do well. Now, where this is a problem is if you're playing against weak players. The bot is still going to do extremely well.
Noam Brown: 36:50
It's still going to make a ton of money, but it might not make as much money as a top professional who could exploit that weak player's weaknesses would. Now, that said, with the recent research that we have on no-press Diplomacy, we've developed techniques that allow you to balance playing an equilibrium with modeling the opponents and best-responding to that. In cooperative games that is essential, but in competitive games it also allows you to exploit the other players' weaknesses. And so one of the things we've considered doing is revisiting poker and seeing if we could develop bots that can, in addition to playing an equilibrium, also exploit the weaknesses of their opponents. We're probably not going to go that route, because I feel like poker is done and we have bigger fish to fry, and it's also quite difficult to get data for this. But it is something that I think is now possible, and that has never been possible before with poker AI.
Jon Krohn: 37:44
Super cool.
Noam Brown: 37:45
Yeah. The question was, for people who are interested in breaking into this field, are there any recommendations? For me, I went about it in a fairly roundabout way. I actually started off more on the game theory side. I worked in computational game theory, and my plan was to pursue a PhD in computational game theory. The challenge of making a poker AI was considered a game theory challenge, and that's why I worked on it. Gradually, my research shifted more towards the deep reinforcement learning side of things. I should say, breaking into these fields is very competitive these days, but there are opportunities. A lot of it is that you have to be self-motivated to pursue these kinds of directions: becoming familiar with machine learning concepts, becoming familiar with PyTorch, being able to demonstrate research aptitude. It doesn't have to be in the career you're in right now; it could be on your own. In fact, in some ways that's even better, because it shows that you're really motivated. Then, if you want to become a research engineer, you just have to show strength in this area from an engineering standpoint. And if you want to become a research scientist, which is honestly not that different, you have to go for a PhD at one of these universities and work with somebody who does research in this field.
Jon Krohn: 39:11
Nice. All right. Excellent questions. Thank you. We've got one more? Okay, we've got one last one.
Noam Brown: 39:18
The question was on the interpretability of our techniques: are we able to interpret what the bot is doing and why? I'd say the answer is no for right now. It's not a priority for our research to better understand the reasons why the bot is making these kinds of choices. Now, that said, I think one of the interesting things about the human-modeling aspect is that it actually makes this easier. We were discussing earlier that state-of-the-art Go AIs play in a very alien style; it seems like Martians coming down and playing without ever having been exposed to the way humans play. One of the things we're able to do with these new techniques is make really strong AIs that behave in fairly humanlike ways. And so you can develop a Go AI that is really strong, in fact stronger than any human alive, but still plays in a fairly humanlike style. In that way, it can actually be used to help humans get better at their game, because it's not totally foreign to the human; it's just a slight improvement on where they're at currently.
Jon Krohn: 40:27
All right. Yeah, excellent audience questions. Thank you very much, everyone. We've just got a couple to wrap up the episode; they're the usual questions that I always ask. Noam, do you have a book recommendation for us?
Noam Brown: 40:37
Book recommendation. I recently read, in preparation for working on Diplomacy, Never Split the Difference, which I actually thought was a generally useful book. But I think it was also useful in particular for my research, because it made me appreciate that a lot of negotiating and communicating is not just about behaving perfectly rationally. You really have to understand the human aspect of it as well. And I think for an AI to really succeed in these kinds of domains, it also has to understand the human element, not just assume everybody's a robot.
Jon Krohn: 41:13
Nice. So a negotiating tactics book was useful for you as a person, as well as an AI researcher?
Noam Brown: 41:19
I'd say so, yeah. I realized that pretty much everything in everyday life is a negotiation, maybe just not typically framed that way. And so I thought it was a generally useful book.
Jon Krohn: 41:32
And then one final question, Noam, is: how do we stay up to date on your latest?
Noam Brown: 41:37
Yeah. Whenever we have a new paper out, it goes on Google Scholar, so you're able to follow me there. Also, I use Twitter. My handle is Polynoamial, P-O-L-Y-N-O-A-M-I-A-L. It's a pun on my name. I think those are the best ways.
Jon Krohn: 41:57
Nice. We'll be sure to include that in the show notes for the episode. Thank you so much, everyone here live at MLconf in New York. It's been awesome to have this first-ever live experience. Thank you to the people who were willing to whisper off on the sides. And yeah, thank you so much, Noam, for being here. A really incredible guest; we learned a ton from you, I'm sure.
Noam Brown: 42:18
Thank you for having me. It's been great.
Jon Krohn: 42:25
Awesome. Thank you for joining us for that first-ever episode of SuperDataScience filmed in front of a live audience. Noam's eloquence and brilliance made what could have been a stressful experience a smashing success. In today's episode, Noam filled us in on how Meta invests heavily in its prestigious Meta AI group in order to develop technology that could be dominant decades from now. He talked about his Libratus algorithm, which defeated leading professionals at two-player no-limit poker; his Pluribus algorithm, which defeated leading pros at multiplayer no-limit poker; how computational game theory is critical to allowing his algorithms to excel against humans; and how Monte Carlo tree search combined with deep reinforcement learning enabled Noam and his Meta AI colleagues to devise a model that could excel at no-press Diplomacy, a complex strategy game wherein anticipating human intent is critical to success.
Jon Krohn: 43:16
And he explained how breakthroughs in models that can anticipate human intent, like these, could be a boon to practical applications like self-driving cars. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Noam's social media profiles, as well as my own, at superdatascience.com/569. That's superdatascience.com/569. If you enjoyed this episode, I'd greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this unique live episode format directly by adding me on LinkedIn or Twitter and letting me know what you thought about it.
Jon Krohn: 43:57
Your feedback is invaluable for helping us shape future episodes of the show and to determine whether we try things like filming in front of a live audience again. Thanks to my colleagues at Nebula for supporting me while I create content for you. And thanks, of course, to Ivana Zibert, Mario Pombo, Serg Masis, Sylvia Ogweng and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing, and producing another trailblazing episode for us today. Keep on rocking it out there, folks, and I'm looking forward to enjoying another round of the SuperDataScience podcast with you very soon.