
77 minutes

Data Science, Artificial Intelligence

SDS 663: Astonishing CICERO negotiates and builds trust with humans using natural language

Podcast Guest: Alexander Holden Miller

Tuesday Mar 21, 2023

Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn


The algorithm CICERO may not have had the press of ChatGPT, but its powers are certainly comparable, if not superior, to OpenAI’s disruptive tool. In this episode, Jon Krohn speaks with Alexander Holden Miller, Research Engineering Manager at Meta AI.


Thanks to our Sponsors:




Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

About Alexander Holden Miller
Alexander H. Miller is a Research Engineering Manager in Meta AI’s Fundamental AI Research group. He's supported researchers working in most sub-domains in machine learning but has been especially involved in conversational AI research and more recently reinforcement learning and planning. He started as a Research Engineer in New York, spent two years in London helping to grow Meta AI’s presence in Europe, and came back to New York to support an ambitious project to get AI to play the complex multiplayer board game Diplomacy, which was just published late last year. He’s enrolled in NYU's Master of Science program in Computer Science and holds a Bachelor of Arts in Computer Science from Cornell.

Overview
For decades, AI seems to have been in the business of beating human competitors at board games. From Go to Chess, artificial intelligence has managed to conquer even the most skilled human players. And yet such games rely on a degree of mathematical logic—what if a machine could best humans at a game dependent on negotiation and persuasion? This was exactly the question that Alexander Holden Miller and the team at Meta AI sought to answer. They found their perfect test environment in Diplomacy, a tabletop game that is also available to play online, where gamers must conquer territories that belong to other players, while they keep their own lands safe from potential invaders. It may sound like Risk, but it’s much more complex: Players must also use critical powers of negotiation and deduction to plan ahead and win the game.

It took the team at Meta AI three years to develop a machine that could convince other online players it was a human participant. The most complex component of their work was to develop natural language models that could interact with humans and convince them to team up against other players for domination of the map. To be successful, Alexander and the team had to condition the machine so that it could process and ultimately manipulate human-like responses to reciprocity, betrayal and trust.

Listen in to hear Alexander detail how he and his team at Meta AI created CICERO, the tools they used to help it hold convincing negotiations with up to six players at a time, future applications of CICERO—and whether or not you stand a chance of working out if you’re playing against a bot!

In this episode you will learn:  
  • Training a natural language model to interact with Diplomacy players [05:07]
  • Processing speeds for a Diplomacy bot [29:32]
  • Using transformer architectures [37:25]
  • How Diplomacy AI actually works [43:25]
  • CICERO’s potential real-world applications [55:28]
  • How to R&D an AI project [59:27]
  • How to become an AI Research Manager [1:06:12]
 
Jon:
This is episode number 663 with Alexander Holden Miller, Senior Research Engineering Manager at Meta AI. Today’s episode is brought to you by epic LinkedIn Learning instructor Keith McCormick.

Welcome to the SuperDataScience Podcast, the most listened-to podcast in the data science industry. Each week we bring you inspiring people and ideas to help you build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today. And now let's make the complex simple.

Welcome back to the SuperDataScience podcast. I am over the moon to be able to share today's unforgettable episode with Alex Holden Miller with you. Because of its extraordinary popularity, most people assume that the biggest achievement in AI in the past year is ChatGPT. I would argue, however, that the biggest recent achievement in AI is an algorithm called CICERO that was developed by researchers working at the tech giant Meta. As published in the prestigious academic journal Science in November, CICERO is capable of using natural language conversation to coordinate with humans, make strategic alliances, and ultimately win in an extremely complex board game called Diplomacy. Excelling in a game with incomplete information and vastly more possible states of play than games previously conquered by AI like chess and Go would be a wild feat in and of itself. But CICERO's capability to converse and negotiate in real time with six other human players in order to strategize victoriously is the really mind-boggling capability. 

To detail for you how the game of Diplomacy works, why Meta chose to tackle this game with AI and how they developed a model that competes in the top decile of human Diplomacy players without any other players catching a whiff that CICERO could possibly be a machine, I'm joined in today's episode by Alexander Holden Miller, a co-author of the CICERO paper. Alex has been working in Meta AI's Fundamental AI Research group, FAIR, for nearly eight years. He currently serves as a senior research engineering manager in the prestigious AI lab. He has supported researchers working in most sub-domains in machine learning, but has been especially involved in conversational AI research and more recently reinforcement learning and planning. He holds a degree in computer science from Cornell University and is currently pursuing a master's in computer science from New York University. Today's episode will appeal most to technical listeners, but much of the episode will be broadly fascinating to anyone who'd like to appreciate the absolute state of the art in AI today. All right, you ready for this extraordinary episode? Let's go.

Alex, welcome to the SuperDataScience Podcast. Awesome to have you here live with me in New York. Thank you for making the trek downtown, although I guess not far from Meta's office. 

Alex:
Well, and I'm just 15 minutes up in the West Village, so it's easy. 

Jon:
And we've got a really exciting episode today. I've wanted to film an episode with you for a long time. We were scheduled to do an episode in spring of 2022, a year ago, and you got Covid right before. 

Alex:
I did. 

Jon:
So there's the episode that came out with Noam Brown, episode number 569, where we're live on stage at MLConf in New York. And because we were really down to the wire on whether your Covid would go away in time, cause it was like days before and we were waiting to get a negative test back, very last minute Noam stepped up and was able to do it. It's an amazing episode, and luckily he has expertise that is quite different from yours. So we were able to dig deep into different topics than we will today. But it was really funny filming it live because we'd paid to have this video made that was in a loop alongside the stage for all the audience members there. And so I had your face and name and title huge beside me on the stage, and it was Noam Brown. But Alex Holden Miller, I didn't get your name and face for my apartment here on some big screens, but we'll still, we'll make it work. I guess I could have had that rolling. That'd be funny. 

So we knew each other socially. We met at a brunch actually in New York I guess a little more than a year ago. And it seemed like a great idea right off the bat to have a podcast episode with you because we got talking about what you're doing at Meta and it's so fascinating and we're actually, the timing now is even better because a year ago when we were chatting, a lot of what you could have disclosed about what you're working at Meta would've been under wraps because it was, it hadn't been published yet, but now it's been published to great success. Super exciting and congratulations. So your team recently published a model called CICERO, which could play the board game Diplomacy. Describe the game of Diplomacy. I hadn't candidly heard of it before we met at that brunch a year ago, but it's clearly a very popular game around the world. To me it sounded a lot like Risk when I first heard of it.

Alex: Yeah. So I actually hadn't heard of it before this team started either. So Diplomacy, we kind of like to say it's a combination between Risk, where you have a map, in this case of Europe, there's different territories that have either fleets or armies in them and your goal is to take over more territory. But then also maybe some combination of Poker as well, because there is this aspect where people are all kind of doing things at the same time. There's hidden information, you don't really know what's gonna happen in the future. Unlike Risk, where it's just turn-based, like you each do your move and you can see exactly what the board state is at all times apart from, you know, your like bonus cards. And Survivor, where there's this aspect of like diplomacy and negotiation and even at times backstabbing that you kind of have to deal with at the same time as the rest of the board. 

And so the project started about three and a half years ago, when Noam and some others who you had on the podcast before, Noam had just been working on Poker, which you can hear a lot about. And it's definitely very interesting. And he was kind of looking to say, like, from Poker, where do we go now? And kind of said, what is the hardest board game that we could possibly do? And landed on Diplomacy for a variety of reasons that we can talk about. But basically each of these different aspects of like Risk, of Poker, of Survivor are different elements of challenge to AI. And so decided like, okay, you know, this is something that could take 10 years to solve. Like, let's get working on it, like get cracking and solve this problem. And fortunately it didn't take that long. 

Jon:
Yeah. That's wild. Just three years, and even a year ago you weren't sure, at least you couldn't tell me, like, how well it seemed to be going. And yeah, it is an immensely hard game because of how you have all of this natural language dialogue happening. So, just in the last couple of months there has been a lot of popular press obviously about algorithms like ChatGPT that are capable of carrying on a compelling conversation, but that is brand new. And that's around the same time that your paper came out. So you had been developing these natural language models that were capable of interacting with human players so convincingly that the algorithm is able to convince them to be involved in teaming up against other countries. And it has to be able to think very far ahead. So unlike something like ChatGPT, which is just using past bits of the conversation and information within its parameter weights from all the information that it's tuned on, in this case with your model, with CICERO, there's this, in my view, much more complex aspect where it needs to be able to be planning ahead and thinking about how it can strategize with natural language with all these other human players and win the game. Also, I remember when we were talking a year ago, it seemed like at that time you were mostly focused on a version of the game that did not have the natural language aspects. 

Alex:
Yeah. So I think everything you described is definitely the hardest part. Like the language, interacting with other people, like this is the hardest part. That being said, maybe I can even like walk back to some of the parts without that, right? Exactly, to the version of the game without language. Cuz I think it helps to also introduce like all these other aspects that are very difficult about the game. Because first of all, while there aren't that many pieces on the board compared to say something like Go, every single unit on the board has a bunch of different choices for what it can do. Right? So you have different types of, like, you can move a unit, you can keep it where it is, you can have it support another unit. You can have fleets like convoy armies across bodies of water, and these moves can be done to any of the territories that are around them. And so because of this, the action space hugely explodes, right? So instead of something like 100, 200 possible actions in a board game like, say, Go, it's something like 10 to the 20. 

Jon:
Whoa. 

Alex:
And like the number of possible board states is more beyond Go than Go is beyond chess. 

Jon:
Oh, wow. 

Alex:
And so even just starting with that super [inaudible].
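To make the scale Alex describes concrete, here is a rough back-of-envelope calculation. The per-unit numbers are illustrative assumptions, not figures from the CICERO paper, and real counts vary by board position:

```python
# Rough back-of-envelope sketch of Diplomacy's joint action space.
# All numbers here are illustrative assumptions, not values from the paper.
num_units = 21          # ~3 starting units for each of the 7 great powers
orders_per_unit = 30    # rough count of holds, moves, supports, convoys

# Every unit submits an order simultaneously, so the joint action space
# is the product of the per-unit choices.
joint_actions = orders_per_unit ** num_units
print(f"~{joint_actions:.2e} possible joint actions per turn")  # ~1.05e+31

# Even much more conservative per-unit counts land around the 10^20 range,
# versus a few hundred legal moves in a typical Go position.
```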

Jon:
This is, I don't want to completely derail everything that you're saying, but you obviously know this game really well now. And when you started working on this project three years ago, you didn't know it. Are you yourself now a decent Diplomacy player? 

Alex:
I, I think I'm a decent Diplomacy player. I actually got into playing quite a bit online on webDiplomacy.net, one of the more popular sites for playing the game online. And I had the great pleasure to go to a couple of in-person Diplomacy tournaments. Which were actually, 

Jon:
Is it not as a competitor or as a competitor? 

Alex:
As a competitor. 

Jon:
No way. 

Alex:
And played in them. It was actually the North American Championships this year and last year. 

Jon:
What? 

Alex:
And the world championships this last year. 

Jon:
Whoa. 

Alex:
I'm sorry. In 2021 and 2022. 

Jon:
Right, right, right, right, right. 

Alex:
I did not place highly but I would absolutely recommend anybody who's interested to try out these tournaments. There's quite a few all over. And the players are amazing and it's like a very intense and emotional experience. And I think I even feel like I like learned things about myself and like my personality that like I really had to grapple with when you're like sitting across this board with someone and you're like trying to convince them to work with you, and you're like, how do I convince you that like you should work with me and that I'm not lying to you, and how do I know that you're not lying to me? And like all of this was actually, it was a big challenge and it was quite a lot of fun to get into it. 

Jon:
So how does that, I said I didn't wanna derail you, now I am derailing you a bit, but when you're in one of these big tournaments, these human tournaments, how is it set up? Like there's individual rooms with like...? Yeah. 

Alex:
So you have, you'll have a board and seven of you stand around the table. Each of you playing with one of the great powers in Europe in the year 1900. So it's kind of like leading up to World War I, this idea that, like, diplomacy could actually be what prevents the bloodshed, and the failure of diplomacy led to all of the conflict. And so here you are with the chance to come to a diplomatic solution. The founder of the game actually said, like, the ideal outcome is that everybody just ties. And in fact many, if not most, games actually end in a tie and not with an actual winner.

Jon:
Oh really? Like a 7, a 7-way tie?

Alex:
So a 7-way tie is very uncommon. And in that sense, that's kind of the ideal diplomatic setting because nobody is able to actually overpower anybody else, cuz everybody just maintains, like, diplomatic stalemate. It's kind of a boring game if it goes that way. I played in one that did, and usually everybody is angry and then satisfied. But yeah, so you stand around the board and then for each turn there's 15 minutes on the clock and you can, like, pair off and just grab someone, go talk to 'em away from the board and then come back, make your deals and create your alliances. And then you all write down your moves, you put them in the center, and then the timer stops and all the moves just happen all at the same time. So everybody just has to sit and watch while all of the pieces are moved around the board. Then you find out whether your potential ally has actually gone through with what they promised they would do, or if instead they've moved all their units up to your border and are about to just walk into the centers that you left open because you trusted them too much.

Jon:
So that's the incomplete information part that you're talking about. Whereas in Go, in chess, in Risk, you can see at any given time, you know, what the state of play is. But because people make their moves in advance secretly, and then everybody kind of exposes their moves after they've decided on what they will be, there's this hidden element that's more like Poker. 

Alex:
Exactly.

Jon:
Nice. Cool. Well, it's great to understand that kind of broader context around how the game is played, especially these in-person human tournaments. In order for this Diplomacy AI, CICERO to be able to play, I guess that was only, that's only ever been used in online tournaments, right? Where there's no way for people to know that it's a machine. 

Alex:
Yeah, that's exactly right. Yeah. We did not have the ambition to try and hook it up to like a speech detect system with like proper like [inaudible] and emotions and all of that in order to, 

Jon:
Humanoid bot. 

Alex:
Try and negotiate with people, yeah, yeah, live. So we kept it all to text. 

Jon:
Just put it in sunglasses and a trench coat and no one will notice. 

Alex:
Or put, like, you know, a monument next to the table, although you need to then also give it wheels so it can drive away to have private conversations. 

Jon:
Yeah. So online and I guess, so is it, did you say net Diplomacy, webDiplomacy? 

Alex:
webDiplomacy.net, yep. 

Jon:
Recently, in Episode #655, Keith McCormick and I discussed the importance of managing the machine learning lifecycle effectively. To allow you to learn about Keith’s approach to all phases of the lifecycle, he’s kindly making his “Predictive Analytics Essentials” course available for free. All you have to do is follow Keith McCormick on LinkedIn and follow the special hashtag #SDSKeith. The link gives you temporary course access but with plenty of time to finish it. Mastering machine learning project management is just as important as learning algorithms. Check out the hashtag #SDSKeith on LinkedIn to get started right away.

Alex:
I recommend it, go there, try out the game. 

Jon:
And if people do go there, they won't now theoretically know whether they're playing against a human or CICERO. 

Alex:
So we do not actually have our AI running on this site, although it's possible that somebody has spun up a copy of it. Our model weights are available by request on our GitHub. But you can play against a copy of our no-press one versus one bot. So, in order to simplify the game, we actually, 

Jon:
I think we need to define no-press. 

Alex:
Yes. So no-press just means there's no conversation between people. Whereas full-press is what we use to say that you can talk to the other players. It's kind of meant to speak to an older version of the game where people would play by mail and you would call the conversations, like, press releases from the countries. 

Jon:
Oh. That's wild. So people would play over the course of months. 

Alex:
Yes. Yeah. Yeah. The game's been around since, like, the 1950s apparently. 

Jon:
Very early online version, done by post. 

Alex:
Yeah. Yeah. Yeah, apparently Kissinger really loved the game and I think maybe JFK too. 

Jon:
Yeah. It sounds like there's like real life skills you can be learning from playing this game. 

Alex:
Yeah. 

Jon:
Become more diplomatic. All right. So the last time that I interrupted you, you were about to tell us about how anybody can go and play the no-press version of this algorithm right now, right? 

Alex:
Yeah. So this specifically is a version of the game called France versus Austria, where there's only two players. And we kind of went back to this because this simplifies the game a little bit. It still has the much higher scale than Go that I already mentioned. But in this setting you can still use a modified version of old techniques of self-play reinforcement learning to teach the bot the game. There we had to do tricks to deal with the scale. There's a model that we developed called DORA, because of the particular ways that this model introduced exploration techniques. You know, how do you get a model to sample from 10 to the 20 possible actions so it actually, like, has any learning signal to choose which moves it should get better from? And there's ways that you can kind of narrow this down to the moves that are most likely, since most of those moves are terrible to play and, like, a human would never even play them. And so that first kind of solved the one versus one scaling problem. And indeed you can train the model from self-play from scratch to play one versus one. And it can do very well. And you can play against this and... 

Jon:
No press.

Alex:
You'll lose every time, with no language. Then even just adding the multiplayer component now makes it extremely difficult. If you train the model from scratch using self-play, it cannot compete at all if you put it against six humans, whereas the one versus one bot can destroy you. Like, very good players can compete with it, but it's very, very, very good. And this is because even in the setting where you don't talk to each other, there is both competition and cooperation, and you have to actually play in line with the human norms in order to be competitive. So for example, two powers who are next to each other, even without talking to each other, can still ally just by moving their pieces away from one another. Right? And then after doing so, and let's say each one takes territory, now they have another choice. They build a new unit, do they send it to attack the other person? Or do they say, no, this is working great, and keep pushing units to the other front line? You don't have enough units to cover all of your sides. And so a lot of times it's very appealing to, like, keep them moving in the same direction and just, like, bank on that trust.

Well, maybe leaving just enough of a guard force to, like, deter them from being tempted to turn around and attack you. And so there's this interplay between this trust that is barely established, right? Cuz there's not even a promise, there's no conversation on it. And, like, how you're moving your units around the board. This comes from a lot of human norms around, like, reciprocity and trust. Like, when you see somebody move a unit in this way, what does it mean? What are they signaling to me? And so actually no-press Diplomacy, because of this, is often called gunboat Diplomacy. Because it's just about where you put your guns and not, like, what you're actually saying directly to the other person. 
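Alex mentioned above that the trick to getting any learning signal out of 10 to the 20 joint actions is to narrow the search to the most plausible moves. Here is a minimal sketch of that pruning idea, using scores from an assumed per-unit policy network; the shapes and interface are hypothetical, not CICERO's real API:

```python
import torch

def prune_candidate_orders(order_logits: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Keep only each unit's k most plausible orders.

    order_logits: (num_units, num_possible_orders) scores from a policy
    network trained on human games (a hypothetical interface, for
    illustration only). Returns the surviving order indices per unit.
    """
    topk = torch.topk(order_logits, k=k, dim=-1)
    return topk.indices  # shape (num_units, k)

# With 20 units and 30 candidate orders each, the joint space is ~30**20.
# Keeping the top 5 orders per unit shrinks it to 5**20 (~1e14), and in
# practice you would sample a handful of whole-board actions from the
# policy rather than enumerating even that.
demo_logits = torch.randn(20, 30)
print(prune_candidate_orders(demo_logits).shape)  # torch.Size([20, 5])
```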

Jon:
Yeah. It's kind of obvious from the way that the board is set up at any given time. Like, oh, you have all of these units right up against this one border and this other border. You haven't needed them so far because this other power has been leaving you alone. So you just kind of assume that momentum will continue until maybe you start to notice, oh, that other power has now taken over all of the other territories it was butting up against. Now I'm the only one left. 

Alex:
I have to attack them to stop them from attacking me. And this definitely is all at play. And in general, Diplomacy actually does have a strong, like stop the leader aspect because if there's a tie, everybody shares the score. But if one person gets more than half of the board, that person wins outright and everyone else gets zero. 

Jon:
Yeah, I don't play Diplomacy, but I do play Settlers of Catan, which is on my shelf here. Viewers can see it on the YouTube version. So, like, my board games are limited to Chess, Backgammon, Settlers of Catan and Exploding Kittens, which we got into as a family one Christmas and is a very short, fun game that you can play with people of all ages. But in Settlers of Catan, I tend to be strong early, which ends up being to my detriment, cause then everyone gangs up on me. 

Alex:
Oh, definitely. But yeah, so if you wanna be competitive in this game, even without targeting each other, you have to follow these human norms, right? Like, a self-play agent a lot of times will learn things like just attack everybody on all sides. Or it'll have an alliance, like it seems like a human would think that there's an alliance, and just betray it right away, cuz who cares? Because that's how the bot plays. Like, the bot doesn't take it personally if it gets betrayed. It just keeps playing from where it stands, right? It hasn't learned this kind of human behavior of, like, vengeance, right? Like, a human who'd been betrayed often will then spend the rest of the game just trying to make the player who betrayed them lose instead of actually playing to maximize their scoring. 

Jon:
Didn't even care to win. 

Alex:
Exactly. And vengeance across multiple games is actually a winning strategy, because it teaches people to be very careful about betraying other humans, right? But the bot doesn't learn that in the self-play. 

Jon:
So when the bots are playing online, they have, like, a name that is a human-sounding thing, or I guess Bot 3000. And so then people can end up, even in these online tournaments, do you end up playing, would you keep noticing that Bot 3000 was, like, in the same games as you? 

Alex:
Yeah. So all of the online games were played anonymously. And so the other powers are just named for their powers, right? Like, you're playing against France, England, Germany, and not against, like, Bot 1250, right. Yeah. And so the names are revealed afterwards. And so afterwards people were able to look back and see, like, these were bots, so then you could maybe know this, right? And in fact, to deal with this problem, basically the trick is to regularize towards human behavior. So take a data set of human behavior and regularize a couple aspects of the model. So first is the planning procedure. So, you know, in chess or Go you would have Monte Carlo tree search, and you can apply this algorithm there too. In Diplomacy it's more complicated, cuz you can't just, like, roll out forward when it's non-deterministic, right? When all the moves are hidden, you don't know what moves the other players are going to make. You can't roll out into the future in the same way. 

Jon:
Oh yeah.

Alex:
You have to do this more complex planning process. And in that planning process, we do regularize towards what humans have done in all of the training games. And that helps you to predict, like, what is likely gonna happen on the human side, and also moves that might be better for you to do because they're more like what a human would do. 

Jon:
Right. So self-play alone doesn't work with these multiplayer games. In those cases you need to rely on training on human data to have some idea, from all of the almost infinite number of possible moves at any given point, of what kinds of moves are relatively plausible, relatively likely. And part of why you have to do that is because the Monte Carlo tree search won't work like it does in other games where you have full information. 

Alex:
Yeah. And so even in chess or Go, where you have full information, you can actually still benefit from this in that it will make the planning process more accurately predict what humans will do. Right? You are incorporating this: even if this wasn't what the model thinks was the optimal move to do, a human may be more likely to still play that move for some reason. It may be that that is actually the more optimal move and the self-play model didn't actually learn that. Or it just might be that humans are likely to play moves like that even if it's maybe not what the model thinks is optimal. And so we actually had a first paper on this that came out about a year ago.

Jon:
The no-press version. 

Alex:
Yes. And it was specifically making this change to the planning process. But what you actually also need to do to get it to work really well is also make the change to the self-play process. So during self-play you can also sample from human actions so that the model is trained to play against what humans would actually play. And more or less the combination of these, there's more nuance there to make it all work well, but more or less the combination of these, where you are both training against human behavior and planning conditioned on human behavior, enables you to play much more effectively in this setting where you're going to end up playing with humans. Like, these bots can play well against other copies of bots, right? But when you're gonna have to play against humans, you need to incorporate human behavior. Specifically in these multiplayer settings, and particularly when there's cooperation that's involved as well. Cuz you can't follow human norms of cooperation when you haven't seen any human behavior.
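A minimal sketch of the regularization idea Alex is describing, in the spirit of the team's piKL-style approach: pick a policy that trades off expected value against staying close (in KL divergence) to a human imitation policy. The numbers and the single-step framing are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def human_regularized_policy(action_values: np.ndarray,
                             human_policy: np.ndarray,
                             lam: float = 1.0) -> np.ndarray:
    """Maximize  E_pi[value] - lam * KL(pi || human_policy)  for one turn.

    The closed-form maximizer over the probability simplex is the human
    prior reweighted by exponentiated values (a softmax):
        pi(a) ∝ human_policy(a) * exp(value(a) / lam)
    """
    logits = np.log(human_policy) + action_values / lam
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

values = np.array([1.0, 0.9, 0.2])   # self-play value estimates per action
human = np.array([0.2, 0.7, 0.1])    # imitation policy from human games
# lam -> 0 recovers pure value argmax ("alien" play);
# large lam collapses onto the human imitation policy.
print(human_regularized_policy(values, human, lam=0.5))
```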

Jon:
Mathematics forms the core of data science and machine learning. And now, with my Mathematical Foundations of Machine Learning course, you can get a firm grasp of that math, particularly the essential linear algebra and calculus. You can get all the lectures for free on my YouTube channel. But if you don't mind paying a typically small amount for the Udemy version, you get everything from YouTube plus fully worked solutions to exercises and an official course completion certificate. As countless guests on the show have emphasized, to be the best data scientist you can be, you've got to know the underlying math. So check out the links to my Mathematical Foundations of Machine Learning course in the show notes or at jonkrohn.com/udemy. That's jonkrohn.com/U-D-E-M-Y.

Have other people tried, like, are there other bots that people have had for playing Diplomacy historically, which obviously couldn't come near the level that you guys are playing at, but is that something that you tested against as well? In the same way that, you know, with chess algorithms or Go algorithms, you have these competitions against, like, simulations against other bots?

Alex:
Yeah, so there's been a number of more, like, handwritten Diplomacy bots in the past. All of this no-press, all of this without language. The first, like, deep learning-based approach for Diplomacy AI came from Mila in Montreal. And they published that a couple years ago. And then basically about a year later, both us and DeepMind also published different approaches to applying deep learning to no-press Diplomacy. So we do compare against their AI, yeah, but it's not, like, a super rich field of bots in the way that Go is, or chess, where there's more available. 

Jon:
And chess certainly is. And part of why I thought of this question is that when I was a kid I had this chess board with magnetic pieces, and so I could play against a chess computer on this physical board. Like, there wasn't a screen, except it had this little LED display which would say, you know, move something from this square to this other square. And one of the frustrating things for me playing, I never really got super into playing against this chess computer, despite me wanting to be better at chess. Cuz I was at this super nerdy school where there were lots of kids that were great at chess and you'd play at lunch and we'd have tournaments and stuff. I was like, oh, I'll get this chess computer. But it would take so long between moves to compute. So when you guys are designing this and it has to be able to compete in real time, that must be a consideration that comes up a lot, like the compute. Is it effectively real time, or does it need time to process? 

Alex:
Yeah, so first a lot of this has to be done offline, right? So in order to actually fully understand these models, where the point of them is playing against humans, you know, it's expensive to have a human game. You have to get six people together. It's gonna take two hours. And so a lot of the work is first done offline. We'll, like, play the bots against each other and measure things like this. And then finally, when we're ready, we would set up a series of games. And the culmination of this for the no-press models was a tournament at the beginning of last year. So I think this may have just happened, or just finished, when we met, where the bot actually won the tournament. And we started with a version of the bot that was, like, lightly conditioned on human play. And it turned out that people could actually recognize it. It was doing things that were slightly awkward and things like this. And then we just turned up the regularization parameter in the planning. And then basically from there it was very, very good. Close to perfect. The best human in the tournament said, like, the only way that you could tell which player was the bot by just looking at the moves was: whoever's doing the best is probably the bot. 

Jon:
Oh, wow. 

Alex:
And 

Jon:
That's, sorry I'm interrupting you again, but that's, it's interesting, that's so different from a game like Go where the AlphaGo algorithm that DeepMind created was described as having like alien moves. And in that kind of situation where, you know, you don't need to be negotiating and forming alliances, it doesn't matter that the algorithm comes up with completely foreign kinds of moves. 

Alex:
Yeah.

Jon:
But in a game like Diplomacy where even in the no-press game, you're still picking up on subtleties of how people are playing to try to figure out who you should be forming alliances with or whatever, there's, if one of the players you're playing with is doing these kinds of alien moves, you probably are inclined to kind of as a group pick on it. 

Alex:
Yeah. And I do think at that point in the tournament, players may have been intentionally looking for it and intentionally banding together to destroy it, cuz they knew they were playing in the tournament against a bot. But then, so to actually answer your question, this tournament was five-minute turns. And so it's pretty quick. And it does have to make its moves in a reasonable amount of time. On the other hand, five minutes for a neural network to return a move, like, not super terrible, especially when it's not doing anything else. Like, there was no language at this point, right? And so, you know, it's not, like, super cheap, but it's also, it's okay. Yeah. Five minutes is, like, a reasonable amount of time to think. Where that got more difficult is with the language model. Right? Cuz then you have to respond fast enough for people to respond back to you, and back and forth. 

Jon:
So yeah. So once you jump to the press version, when you say, let's say, I don't know how standard this is, but I guess let's just assume that, like, five minutes between moves is standard. And just to recap for listeners why that's significant: you have seven players, and I guess it's every five minutes that you submit on a piece of paper, I guess in real life or something, what your move is going to be. So that whole time, throughout that five minutes, I guess any amount of conversation can be happening. 

Alex:
Yeah, exactly. Exactly. So this, this is really the hard part of Diplomacy, right? Everything I just described, already hard, right? But like, the real hard part is now do all of that while having strategic, convincing conversations with people that can go basically as fast as you can send messages back and forth, right? And with six people at once. And of course, in-person the norm is usually 15 minutes because then you can have time to pull people aside, talk to them, then go pull somebody else aside. But when you're playing online, you can just type away, talk to all six players at once, and it can be a little overwhelming. You only have typically two or three players who are actually like neighboring you. And so those are usually the people you focus on. But there's a lot of conversations to be had. There's a lot to do in the turn to like have those conversations and then actually come up with the move that you're gonna play. 

Jon:
So yeah. So, like, I guess in broad strokes, how did you guys make that jump from a year ago having a success in the tournament, in the no-press tournament? It might have even seemed at that time like the jump from being able to play at expert level in no-press to press would be enormous. And yet here we are sitting a year later and you've published on this enormous success of being able to compete in the top decile, right?

Alex:
Yes, exactly. Yeah. And so fortunately we were actually working on full press Diplomacy in parallel. And so had already been making progress on how do you set up these models to actually say intelligent things to the other players and things that are really grounded and accurate and precise, right? Because I think that's a big weakness of these other really large language models is they're very, they're like unbounded in a lot of ways. They have, they get a very small amount of context which is like the prompts that you provide to it, and then they're expected to output something that is like high quality, ideally accurate and things like this. And this is like obviously a huge challenge to do. But how, how do you actually get those to work better in a setting where there is more grounding, right? There's more context. There's like in the case of Diplomacy, you have, you know, the game board, the conversations that you've already had, the rules of the game. 

And then, like, what does the agent actually want to accomplish in that, and how does it do things that will actually help you with that? There's a lot more nuance there, but then hopefully you can use that to get the models to actually say something, like, better than, you know, spitting out a message that just looks like it could be from a game of Diplomacy, which is what you get if you just fine-tune a large language model on Diplomacy data, right? It'll say things that sound like Diplomacy, but it'll do things like, it'll suggest convoying a fleet, which is definitely not possible. It will suggest moving to territories that are unreachable and things like this. So completely contradicting the rules of the game, the moves that have happened so far. 

Jon:
Right, right, right. Because that LLM would've initially been trained on, like, a sense of global geography, not just the geography of this game. And yeah, its parameters would be tuned on what's possible with a real-life fleet as opposed to what's possible in just the game. 

Alex:
Yeah. And I don't think there exists enough Diplomacy data to fine-tune these models for long enough to actually get them to, like, a hundred percent accurately say things about the game. Not that I think we got to a hundred percent, but the way that we were able to train these models, I think, got them a very large portion of the way there, and kind of showed how you can, in a more grounded setting, get the models to behave much better. And I think more, like, intentionally, rather than just likely tokens after a prompt.

Jon:
Right. So you didn't start with, like, a regular LLM that's just trained on internet language. But we've heard a lot in recent years about transformer architectures, a particular kind of deep learning architecture that is capable of contextualizing over very long passages of text and can carry on long conversations compellingly, like ChatGPT can. So does your language architecture involve transformer architectures? 

Alex:
Yeah. Yeah. So there's kind of a bunch of different pieces to our model. On the language side, the foundation is a BART model. So this is a transformer-based model, which is an encoder-decoder, unlike GPT. So it takes in a context, it encodes a state, and then it can attend to that context in order to fully generate a new output, like, in this case, a message. And there's a few different pieces to how we actually use this. So we have that language model, but that model needs to know what to talk about, right? So the naive thing that I kinda mentioned before is just feed in the conversation history and the game board state, and try to fine-tune the model to produce the messages that a human would've produced in that setting. This, like, starts to look like it's working, but it says things that are inaccurate. 

The other thing is, if you're asking it to produce the moves that you're going to make, it also is hugely exploitable. And this is kind of somewhat known already in the language model negotiation literature, say papers from Mike Lewis, where if you put a language model on a negotiation task and you train it on human data and then say, "Model, thank you for agreeing to this deal, that is amazing for me, I'm so glad we had this conversation and thank you for agreeing to this". Humans actually only say that to each other after having agreed to something. And so the model thinks that it has agreed to something, and then just gives you whatever you just said. Right? It's, like, kind of amazing to actually see this happen in the bots. 

Jon:
It's like a backdoor. 

Alex:
Yeah, it is a backdoor. I mean, you know, it's like the safety-skirting prompts, but for negotiation bots. Just tell them that they've already agreed and they'll just give you whatever you ask for. But so that was kind of an obvious problem. And so we kind of had to go back and say, okay, what are we gonna feed into the language model? And so this is where actually the no-press models come in. So we can use models like those to generate plans, and then if the language model can condition on those plans, it can now talk about that in order to have a more guided conversation. 

Jon:
So the no-press model weights were useful in your press model. 

Alex:
Sort of. It required a few more improvements. Like, in no-press, if you're trying to do planning, you don't have to factor in the fact that other people are talking to each other. Other people's moves are relatively, well, they are, like, uncorrelated, because they can't plan them together. And so you have to adjust your planning procedure to handle, like, correlated behavior. And a few other nuances like this; we had to adapt it quite a bit for the language setting. But then you can output plans and say, like, look, this is what I wanna do, now let's talk about this. And in fact, we actually would create per-player plans. So if you're France and you wanna talk to England, the model will actually output a plan that is: here's what I wanna do, here's a move that England is somewhat likely to do, and this is what you should talk to them about. Like, it actually plans both for you and for your conversation partner. And so it makes the conversations much richer, cuz not only does it have these actually, like, strategically sound plans to work with that don't have to be stuck in the language model, but also all this information about, like, how things are working, what valid moves are and stuff like this, doesn't have to be as carefully encoded into a language model. 

Cuz if it is told these are the moves you should talk about, then it doesn't talk about illegal moves, cuz it is looking at the moves it should be talking about. And this grounding helps a lot, and it even helps the model to better model the language. Like, if you look at the perplexity, right, a measure of how well a language model learned a piece of data, providing the model with plans to condition on actually lowers the perplexity by quite a bit. It's easier for the language model to learn the language when it's not also trying to learn the rules. So I kind of skipped to the output of the model, which is, you give it plans and it can condition on them to produce language, but you actually have to teach the language model to do that too. Right? So that means we have to stick plans into the training data. And this is actually also hard, right? Like, the data you start with is: here are the conversations that people had and here is the move that they made. But the move that they made is not necessarily what their plan was when they sent that message. And so we have to kind of do this, like, inference of what was the person likely to do, like, what were they trying to do in this moment? 

Jon:
Wow. 

Alex:
And kind of say, like, you know, what moves increased in likelihood after this message is often a way that we kind of framed it. And by inserting that into the training set, then you can condition the language model on those plans. And then at test time, you know, when you're playing a game, you can actually feed the plans in. So it's kind of a complicated modular architecture here with, like, all these different steps. But by doing this, then you actually get this very powerful language model. And, yeah. 
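To make the plan-conditioning idea concrete, here is a minimal sketch of the interface: serialize the board state, the dialogue history, and the planned orders for both parties into the encoder context of an encoder-decoder model like BART, and generate the next message. This uses the stock Hugging Face facebook/bart-large checkpoint and a made-up context format; it is an illustration of the idea, not CICERO's actual training or inference code:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Hypothetical serialization: board state + dialogue so far + plans for
# BOTH the sender (France) and the recipient (England).
context = (
    "state: France A PAR, A MAR, F BRE; England F LON, F EDI, A LVP. "
    "history: England: Want to demilitarize the Channel this year? "
    "plan: France F BRE H; England F LON - NTH"
)
inputs = tokenizer(context, return_tensors="pt", truncation=True)
out = model.generate(**inputs, max_new_tokens=60, num_beams=4)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# A base BART will emit generic text here; the point is the interface.
# Because the intended orders sit in the encoder context, a model
# fine-tuned on (context, message) pairs talks about moves it actually
# intends instead of hallucinating illegal convoys, and, as Alex notes,
# conditioning on plans also lowers perplexity on held-out dialogue.
```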

Jon:
Yeah, the Science paper, which we'll include a link to in the show notes, the Science paper that reveals CICERO and describes this top-decile performance in press Diplomacy, it talks about different submodules coming together in order to form the overall CICERO architecture. You've obviously alluded to some of them here. Are you able to kind of give us an overall sense? You know, we can't have the podcast episode go on for hours and hours and hours. People can refer to the Science paper to get all the details, but just at a high level, what are kind of the key submodules and how do they interact?

Alex:
Yeah. So I kind of already hit those at the high level, right? Which is, like, you have a language model that needs to be trained to condition on plans, which you had to get a separate model to infer. Then you have the planning, like, strategic reasoning, if you will, model, which consists of a self-play reinforcement-learned model, which was regularized towards human behavior. Then there's a planning apparatus on top of that, also regularized towards human behavior. And then that produces plans that go into the language model. The language model produces a bunch of outputs, and then we have a bunch of filters on top that help to clean up those outputs before we send them. So maybe that is doing, like, one final check for saying things that don't make sense, or are illegal, or are offensive. We tried to filter out things like that. That's the gist of the architecture.
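Here is a rough sketch of that modular control flow in code. Every helper below is an illustrative stub standing in for a whole subsystem (the human-regularized planner, the plan-conditioned BART model, the message filters); none of these names come from CICERO's real codebase:

```python
def plan_moves(state, history, recipient):
    # Stand-in for the self-play RL model plus human-regularized planning:
    # returns a plan for us AND a likely plan for the conversation partner.
    return "France: F BRE H", "England: F LON - NTH"

def generate_drafts(state, history, plans):
    # Stand-in for the plan-conditioned BART language model.
    my_plan, their_plan = plans
    return [f"I'll hold Brest this turn; the North Sea looks open for you ({their_plan})."]

def passes_filters(message, state, plan):
    # Stand-in for the final checks: nonsense, rule violations,
    # offensive content, and value-based strategic filtering.
    return "convoy a fleet" not in message.lower()

def play_turn(state, history, recipient):
    my_plan, their_plan = plan_moves(state, history, recipient)        # 1. strategy
    drafts = generate_drafts(state, history, (my_plan, their_plan))    # 2. language
    safe = [m for m in drafts if passes_filters(m, state, my_plan)]    # 3. filtering
    if safe:
        print(f"to {recipient}: {safe[0]}")
    return my_plan  # orders are submitted when the turn timer expires

play_turn(state={}, history=[], recipient="England")
```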

Jon:
That's nice that at some point you were like, you know what? We need to make sure that we're filtering out offensive things. In your Diplomacy games, does that happen? Is that a way that we could tell that we're playing against your agent? Because humans might just be like, all right, f* off, but yours wouldn't.

Alex:
I mean, we did the best we could. It's definitely not always perfect. And some of it is even just, like, irrelevance. Like, there is this one funny example where either somebody sent the model messages too quickly for it to respond, or the model had filtered out all of the messages it was going to reply with the first time. And then somebody sent it this message like, where are you? Like, I need to hear from you. And the model replied like, "Oh, sorry, I was on a call with my girlfriend". And it's like, okay, that's clearly not true. Right?

Jon:
That's wild. 

Alex:
And doesn't make any sense. And we did try to filter out things that were not actually, like, in-context correct. And even, like, real players would talk about things like Discord channels and stuff; we don't want to refer to this kind of meta stuff either. 

Jon:
So the bot needs to talk about Discord channels in order for it to seem realistic. If it talks about a girlfriend, you know it's lying.

Alex:
Yeah. But so that's kind of a bunch of the things the filters were targeting as well. Right? Just, like, irrelevant messages that didn't really make sense, cuz it's so easy for it to hallucinate and make a mistake. 

Jon:
Super cool, Alex, congratulations on this tremendous accomplishment. I don't know if you heard this particular episode, but in my recap of, I guess, my final episode of 2022, I recapped the enormous achievements, and some of those were obvious AI models in our space that everyone's heard of, like ChatGPT or like DALL·E 2, but I included CICERO on the list because this is tremendous and everybody should know about this achievement, cuz it's incredible. So congratulations, Alex. What's next for you, what kind of research is lined up? You know, having achieved this, you set out on this 10-year project that took three years to be able to compete at a high level at the most challenging game that you could think of. So yeah, I mean, what was the emotional state once you published the paper and stuff? Were you kinda like, I don't know, when I spent two years training for a marathon and then finished it, you're like, all right, so now what?

Alex:
Yeah, totally. Totally. Yeah, maybe I will slightly go back and give just a tiny bit more detail on, like, how did we know this was actually good? Right? I don't think I actually mentioned that, you mentioned it was in the top decile of the players that it played against. And I think that's a good start, right? Like, it was able to play well, it was able to win games, it was able to perform consistently at a high level. Definitely nowhere near superhuman. Like, we would not claim that; there are players that beat it in the tournaments that it played in. But I think the other thing that we really appreciated was it was actually able to play in 40 games. So that's about 72 hours of gameplay total. It sent thousands of messages. We mentioned in the paper, I think it's something like 6,000 or 7,000 messages, and it was not actually recognized as an AI. So this is something in particular we're very proud of, so I just wanted to throw that in there. But anyway, so we got to the end of these 40 games, we were able to publish the results, and absolutely, I felt definitely a huge [inaudible] of, like, oh my gosh, we made it. And honestly, I wasn't even in the project for the full three and a half years. I actually have just been a part of it for the last couple. And so other members of the team have worked harder and longer than me. So I hope they feel just as relieved and proud. 

Jon:
And now might actually be a really good time to mention, in that context, so you are a senior research engineering manager at FAIR. So what does that mean in the context of being involved in a project like this? 

Alex:
Yeah, for sure. So my role in a project like this is more to make sure the project is running smoothly. That people know what each other are doing, they know what they're doing and why they're doing it. They are getting all the context that they need, they're getting the help that they need to do it. All of the apparatus around it is moving forward, like, as we're preparing for the launch, working with our product manager, our marketing team, our comms team, our video team, and all of these different roles that come together to actually make the launch successful. Kind of helping to make sure that our team is coming to them with all the content that they need, all of that. So, you know, helping to grow the careers of everybody in the project as they're going through it, and all of that.

Jon:
Kind of normal management stuff. 

Alex:
Exactly. Exactly. 

Jon:
And so then, so the CICERO project was already going on to some extent, and you began working on this perhaps alongside other research projects that you're also managing simultaneously. 

Alex:
So yeah, I had two projects that I was supporting during this time. So Diplomacy was one, the CICERO project. And then the other was a project based in France that was working on automatically proving mathematical theorems. So yeah, they had results at NeurIPS last year and [inaudible] this year. 

Jon:
Wow. 

Alex:
Yeah, they were able to get to a point where they were able to solve 12 International Math Olympiad problems using AI, and yeah. So that's exciting as well. Obviously a different topic, but

Jon:
So, alright, so I kind of want to give you the, I wanted the audience to kind of have that context around you know, how you fit into this FAIR puzzle, but I interrupted you as you were just explaining kind of what was next. 

Alex:
Yeah, for sure. So I think obviously we're really excited about and proud of this result. I think there were a few things where we were able to start seeing some signal that they would be interesting, but didn't get to fully explore them. And I think one big area of that is, like, how do you bring reinforcement learning techniques and, like, planning techniques more to bear in getting good results out of language models? Like, one aspect of this is kind of just, how do you pour more compute in at inference time and actually get something more out of it? In some ways, this is what we were able to do with our model, although that's way oversimplifying it: by putting all this compute into the planning, we can then give the language model something much more grounded and intentional to work with, so that the language model outputs look more grounded and intentional. And I think there's still more room to dig into that. Like, one even just interesting technique that we tried and were able to get a little bit of initial success with is this, like, value-based filtering.

So we trained a value model on the messages that the model was sending. And we would say, like, if the message that was sent was rated as drastically lowering the model's value, don't send that message. And this was actually able to filter out messages that were, like, strategically catastrophic. So for example, let's say you have a unit that is neighboring two of an opponent's centers, and they also have a unit that's neighboring two of their centers. Because our model is always conditioned on the actual plans that it is going to do, it doesn't really lie, not intentionally. And so what can happen is the model can actually, like, accidentally tell the other player what move it's actually going to do. And if they know that, then instead of a 50:50 shot at blocking, they can actually just go ahead and block the move that the model was gonna do. And it's a huge mistake for the model to send these messages, and the value-based filtering can actually detect this: that by sending this message, I will more likely have a lower score at the end of the next turn than if I don't send this message. So don't send it. And, you know, we weren't able to get that to a point where you could actually use it to, like, choose which message you generate, or things like this. 

But I think there's more there. And I think in general there's, like, a lot of room for continuing to look at how reinforcement learning-based techniques and how these, like, planning techniques can improve the outputs of language models. I think we're already seeing this with, like, the RLHF work that's happening to tune the outputs of language models like ChatGPT to say things that are more what humans are expecting of them. Right? An example is maybe, like, if you ask a model a math problem, it maybe is just as likely to finish the answer with, like, a quote, right? I'm actually borrowing this example from Sergei, who does annotations in this space, where, you know, if you say "two plus two equals blank", a language model might continue with "two plus two equals blank, four plus four equals blank; Sam couldn't remember the answer to either one", rather than actually just outputting four, which is what you want. And so, you know, there's all these techniques to try and push language models more towards what the human is actually expecting. You know, there's supervised learning, like instruction fine-tuning, there's RLHF to, like, rank outputs that are, in their entirety, good outputs. But I think there's still a lot more work to do in this space. 
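A minimal sketch of the value-based filtering Alex described a moment ago: estimate the agent's expected value with and without the candidate message appended to the dialogue, and drop messages that tank it. The value_model interface and the threshold are illustrative assumptions, not the team's actual implementation:

```python
def value_filter(value_model, state, dialogue, candidate_msg, max_drop=0.05):
    """Return True if it is safe to send candidate_msg.

    value_model(state, dialogue) -> estimated end-of-turn score in [0, 1]
    (a hypothetical interface, standing in for a trained value network).
    """
    v_before = value_model(state, dialogue)
    v_after = value_model(state, dialogue + [candidate_msg])
    # A message that leaks your actual order lets the opponent block it
    # outright instead of guessing 50:50 -- a big predicted value drop.
    return (v_before - v_after) <= max_drop

# Toy value model for demonstration: leaking an order costs 0.3 value.
toy_value = lambda state, dialogue: 0.5 - 0.3 * any(
    "I will move to" in m for m in dialogue)

print(value_filter(toy_value, {}, [], "Let's keep working together."))  # True: send
print(value_filter(toy_value, {}, [], "I will move to Burgundy."))      # False: drop
```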

Jon:
Cool. Well, that's exciting. At the beginning of this conversation, one of the things I learned is that a lot of what seemed to me on the surface to be relatively basic AI research, fundamental research, does in fact tend to have a lot of applications in the real world, including within Meta itself. So how could this technology, CICERO, the work you've already done and published, be applied to the real world, or the metaverse, say?

Alex:
Yeah. Obviously we only applied it to Diplomacy; the actual CICERO agent can only play the game of Diplomacy. But what we see here is a really convincing example of a grounded language model that can actually have goals and execute on them. Other settings where we might see that: there could be digital agents in the metaverse, or, as a maybe more approachable example, NPCs in video games. Say you're playing Skyrim or something. For the guard at the gate, it's perfectly fine for him to just be a large GPT-style model saying things that sound like what a guard would say, because you don't really need to interact with him. But if you talk to the king of that city, you want him to actually have a back and forth with you about the things he wants: he wants someone to go kill this monster, he wants more information on his lost son, all of this. And you want, I don't know, treasure, new equipment, the next piece of information to move you to the next stage of the quest. You can actually train models knowing that these factors are at play and that conversations should be grounded on them, and generate much more nuanced, grounded conversations.
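
As a rough sketch of what grounding an NPC's dialogue on structured game state could look like, in the spirit of how CICERO conditions its messages on its plans: every field, the prompt format, and the generation call below are illustrative assumptions, not any real game's or model's API.

```python
# Hypothetical structured state for the "king" NPC described above.
npc_state = {
    "role": "king",
    "goals": ["slay the marsh troll", "find my lost son"],
    "offers": ["500 gold", "an enchanted shield"],
    "quest_stage": 2,
}

def build_grounded_prompt(state, player_utterance):
    """Serialize the NPC's goals and offers into the prompt so that
    generated replies stay grounded on them rather than drifting."""
    goals = "; ".join(state["goals"])
    offers = "; ".join(state["offers"])
    return (
        f"You are the {state['role']}. Your goals: {goals}. "
        f"You may offer: {offers}. Quest stage: {state['quest_stage']}.\n"
        f"Player says: {player_utterance}\n"
        f"Reply in character:"
    )

prompt = build_grounded_prompt(npc_state, "What do you need from me?")
# reply = language_model.generate(prompt)  # hypothetical generation call
```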

Jon:
Wow. 

Alex:
That's maybe a gaming example, but you could also think of other settings, like a virtual assistant of some kind that needs to understand all the actual context at play and what the human actually wants, and not just produce things that sound like they make sense, but actually take actions and talk about those actions, rather than just the kind of scripted commands we have today, which are more precise only because we can't do things like this yet.

Jon:
That is super fascinating. Yeah, that's a big limitation: anybody who has spent a lot of time in ChatGPT knows that sometimes you're getting bits of conversation that seem totally relevant, but other times it ends up going off-piste. So what you're saying is that this CICERO research is helpful because it shows that, given the right constraints, you can have a grounded natural language model that is specific to some particular kind of task, whether it's virtual, like the conversation in the metaverse or in a video game that you're describing, or a virtual assistant. And this conversation will be grounded and factual and helpful and action-oriented in a way where the incumbent LLMs often go astray.

Alex:
Yeah, I hope so. Of course, you need to find data that can actually teach the model to do this, right? Like we talked about, having to infer the plans that humans had before they sent a message: there's still work to do to actually get it to apply to a particular setting. But I think the techniques we showed give you tools in the toolbox for actually doing that.

Jon:
Super cool. All right, so we've learned a little bit now about your role in the whole project as an engineering manager. Some specific questions that occurred to me as you were talking are related to how to tackle big, long-term R&D projects like this. When you set out on what's potentially a ten-year R&D project, how do you decide what to do first? Who is needed? How is the project structured? How do you decide on your R&D roadmap?

Alex:
Yeah, that's a great question. In this case, I was not there for the beginning of the project, so I can infer some of that and then talk about my experience on other projects too. I think here the starting point looked like a harder version of problems we already had. Diplomacy, no-press Diplomacy in particular, was introducing a bunch of difficult problems, but they didn't look so different from problems in Poker, in Go, in some of these other games we already worked on. So we could already see the kinds of techniques we might need to develop to get started. And to our advantage, other researchers were also working in the space: Mila had published their results, and that gives you some starting points for the kinds of things you can try. With a research project like this, you don't necessarily need to know the answer to how long it's going to take and everything you're going to need right now, but you can start to look at the problem: okay, what are the challenges we're going to face, and what kind of talent are we going to need? Okay, this problem is a lot higher scale; we're going to need strong research engineers who can handle that scale, who can help us scale up our reinforcement learning algorithms and our planning algorithms, make those faster, make them work better.

We were going to need access to GPUs, because we were going to need to train a model for longer to deal with this. But just training for longer or bigger isn't going to enable that full leap; you have to come up with clever ideas to actually do that. In our case, the team also had clever game theorists who could come up with equilibrium-finding techniques, which you need when you can't just use MCTS the way you can in chess. So there's research talent and there's engineering talent that you need. And then, of course, looking at what was there: that still only covered no-press, right? There wasn't really substantial language modeling expertise on the team. So the next step was bringing in a research scientist and a research engineer who had expertise in that space and giving them the runway to work on that part of the problem. And then, okay, now we need more talent on that side, so bring in another research engineer, another research scientist, and give them more capacity to keep working on that problem. So I don't know if that was a very nuanced answer to the...
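
As a toy illustration of the equilibrium-finding family of techniques Alex mentions, here is a sketch of regret matching applied to rock-paper-scissors, where the average strategy converges toward the game's Nash equilibrium. This is a pedagogical sketch, not the algorithm used for Diplomacy.

```python
import numpy as np

def regret_matching(payoffs, iterations=10_000):
    """Self-play regret matching on a symmetric zero-sum matrix game."""
    n = payoffs.shape[0]
    regret_sum = np.zeros(n)
    strategy_sum = np.zeros(n)
    for _ in range(iterations):
        # Current strategy: positive regrets, normalized (uniform if none).
        positive = np.maximum(regret_sum, 0.0)
        total = positive.sum()
        strategy = positive / total if total > 0 else np.full(n, 1.0 / n)
        strategy_sum += strategy
        # Expected payoff of each pure action against the current strategy.
        action_values = payoffs @ strategy
        expected = strategy @ action_values
        # Accumulate regret for not having played each action.
        regret_sum += action_values - expected
    # The time-averaged strategy approaches an equilibrium.
    return strategy_sum / strategy_sum.sum()

# Rock-paper-scissors payoff matrix for the row player.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
print(regret_matching(rps))  # approaches [1/3, 1/3, 1/3]
```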

Jon:
No, that's perfect. Yeah, that was a huge amount of detail.

Alex:
Yeah, it's kind of about finding the mix of talent you need, both on the scientific side and on the engineering side, getting hold of the compute you need, getting hold of the data you need. But maybe that's always the answer, right?

Jon:
Yeah. But it's kind of nice to hear you talk through it and think through it. I learned a lot from that answer, and so did the audience, I'm sure. So, speaking a bit more about your background and how you ended up being a research manager: why did you choose to do research at a big tech company as opposed to, say, in academia?

Alex:
Yeah, so I actually joined the team straight out of my undergrad. At that point I had kind of a software-engineer-generalist profile, but with a great interest in machine learning, and the team was basically looking for more engineering muscle. There were a lot more research scientists than engineers then, and a lot of the engineers were more focused on infrastructure, which is obviously critical to AI work actually happening. But I think they were seeking to build a bit more of a model where there's daily collaboration between engineers and scientists. So I joined in that role, particularly working in NLP and conversational AI, question answering, this kind of work. And for me, I never followed the path to the academic research lab; the next step would have been a master's or a PhD.

But instead I was really excited about machine learning, found this team that was somehow willing to take me on, and got into it. And honestly, I've loved it. It's been an amazing lab. Their ability to still create a ton of autonomy for the researchers, so that they can really carefully choose the problems they're working on, the right problems to push the state of the art in AI forward, while still being held accountable for doing that; having the freedom to really make those choices themselves rather than having everything pushed top-down, and then having strong engineering talent working alongside that, has been exciting.

Jon:
Yeah, you get access to enormous human resources as well as compute resources. 

Alex:
Of course. 

Jon:
Resources that academics probably wouldn't have access to. Super cool. And then you're directly tied, in a way that I wasn't even aware of before we began this conversation today, to real-world applications, even in the relatively short term, unlike an academic might be.

Alex:
Yeah, exactly. There is always the pull to try and find a way to actually apply your research to a real-world application within the company, if you're working on something that feels appropriate for that. There's definitely some research that is super long-term. And to be honest, I actually haven't had any impact on a specific product the entire time I've been working there myself; maybe my next project, but not yet. But plenty of my colleagues have. I think it's a rewarding opportunity, and obviously it further validates your research if it's actually helpful to a product.

Jon:
Yeah, it sounds like it's just a matter of time, especially with something like the CICERO project. So, how does somebody become an AI research manager like you have at a big tech company? And a related question: why have you decided to pursue a master's in computer science now, despite getting all of this valuable real-world experience? In the years since your undergrad, now coming up on eight years at Meta doing research, you've surely learned a ton of what you would've covered in a master's or a PhD, and yet you're now nearing the end of a master's in computer science at NYU. So there are two different parts to this question; I don't know which one you want to tackle first.

Alex:
Yeah, I can start with the main question. For me, I really enjoyed the opportunity to shift my focus a little from just how to get things to succeed technically, toward how to help people succeed. Especially in an environment that gives so much autonomy to people but still needs to hold them accountable for making research progress, there can almost be a bit of a trick to doing that well: how do you show that the work you're doing is impactful? How do you structure your work in a way that lets you describe well how what you're doing is making advancements? And rightly rewarding people for their good work takes a little bit of narrative sometimes, I think. I enjoyed that part of the process: helping people to be successful, helping people to grow their careers, helping people make connections with other colleagues working in a similar space.

Spending a bit more of my effort tracking what my colleagues were doing rather than focusing on what I was implementing, and working with more different types of roles within the company than I necessarily would if I were just doing engineering work. In a lot of ways, I got exposed to a much broader set of research projects, supporting engineers in different domains and different parts of the world, so I was able to learn a lot more in a broad sense than I would have if I'd stayed focused directly on research engineering work. I love doing research engineering work when I find the chance to do a little, but these days that's not that much. So yeah, it's a different career path, with a lot more focus on the people than on the technical side.

Jon:
And that makes the second part of my question even more interesting: if in this research manager role you're shifting a decent proportion of your attention toward helping other people succeed, as opposed to the technical challenges, why do a master's in computer science instead of, I don't know, management?

Alex:
Yeah. I think I felt there was still room for me to grow technically, and that was hard to do alongside the people work. Even the first time I trained a convolutional network was actually in the computer vision class at NYU, not at work. Having that gave me a bit more depth in other areas of computer science, and specifically machine learning, that I wasn't really getting to dig into. It's also giving me more engineering depth outside of machine learning: this semester I'm doing classes in distributed computation and multi-core computation, core engineering topics I haven't had as much of a chance to really dig into, but which are still critical for working with a lot of data and working efficiently with the compute machines that you have. And I think technical managers can make better managers; making sure that I'm still technically honed and can really understand what my reports are doing and why they're doing it, and help counsel them in the right direction, makes me a better manager. So I wanted to make sure I'm fully there, and I think those are a lot of the reasons.

Jon:
Great. Yeah. And those are great reasons. So then for our listeners, a question that I'd love to ask our guests, and since you are a technical manager, you'll be able to answer this, is what kinds of software tools do you use regularly day-to-day? 

Alex:
Yeah. So, I'm mindful of the episode that you just had with Keith, I think it was.

Jon:
Keith McCormick.

Alex:
Yeah, Keith McCormick. You two spent some time talking about no-code and low-code, and I don't disagree with anything the two of you said. That being said, we definitely fall into the full-code camp: there's very little automated tooling, and we're very bare-bones in a lot of what we use. We're using PyTorch, and beyond that it's up to individual researchers. Sometimes they use a little bit of flavor on top of PyTorch, like PyTorch Lightning or something like that, and of course there are other machine learning libraries we'll pull from, but there's not much on top of that. We have some things like Slurm that we use for scheduling jobs, but for other things, like data analysis tools, there's not that much there; it's a lot of custom PyTorch. In a research setting, we really want to have control over every single aspect of what's happening. The parts that are abstracted away in a more automated tool, which can actually be quite good to abstract away, because, as you pointed out, it's easy to introduce mistakes by accident in those parts, and if you don't do this carefully your results may literally mean nothing, and those tools can let you explore complex data much more quickly. But for us, we may want to be changing those very pieces, so it has to be very much in our control.
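
For a sense of what that bare-bones, full-code style means in practice, here is a minimal plain-PyTorch training loop in which every step of the update is explicit rather than hidden behind an automated framework; the model, data, and hyperparameters are placeholders.

```python
import torch
from torch import nn

# Placeholder model, optimizer, and loss; a research codebase would
# define these itself rather than taking them from a framework.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

inputs = torch.randn(256, 32)   # stand-in dataset
targets = torch.randn(256, 1)

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()             # every step of the update is visible...
    optimizer.step()            # ...and therefore easy to modify for research
```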

Jon:
So you're using the PyTorch library, which I assume means that the lingua franca, as in much of AI and data science, is Python?

Alex:
Absolutely, yeah. It's close to a hundred percent Python. Every once in a while, somebody will dig under the hood with something in C++ to get better performance, but yeah.

Jon:
Well, awesome. Alex, this has been an amazing conversation. I've thoroughly enjoyed it, and we've covered so much ground. I am a much wiser AI practitioner having gone through this conversation with you, and no doubt lots of audience members out there enjoyed it a lot as well. But all good things must come to an end, and so we get to my final questions now. The penultimate one: do you have a book recommendation for us?

Alex:
Yeah, a lot of the books I've been reading are textbooks, and I don't necessarily recommend those. But the one I enjoyed most recently was Starship Troopers. I spent a full weekend just diving into Starship Troopers: read the book, watched the movies, and played through the whole campaign of the video game, which was on sale on Steam. I just got into it. That was quite fun; that's my recommendation.

Jon:
Nice. And then how should people follow you? You're a deep researcher, so maybe you're not posting on LinkedIn every day like someone like me is; maybe it's Google Scholar?

Alex:
Yeah, I'm happy to take pings on LinkedIn or Twitter, but I don't spend that much time posting on them. Google Scholar is definitely the most up to date. Not a huge online presence.

Jon:
Nice. Very cool. Alex, thank you so much for coming downtown and recording with me in person, and for making this amazing episode on AI research, particularly the CICERO algorithm. Congrats again on a huge accomplishment, and hopefully we can have you on again in a couple of years when you have another landmark AI paper to share with us.

Alex:
Thanks a lot, Jon. Appreciate it. 

Jon: What an episode, and what an honor to hear about state-of-the-art AI research right from the horse's mouth. In today's episode, Alex filled us in on why Meta invests in fundamental AI research; how Diplomacy blends Risk and Poker to create a game that Meta suspected might take ten years to master, though they built CICERO to perform in the top decile at Diplomacy in just three; how the CICERO algorithm works, including its most important submodules and the encoder-decoder transformer architecture involved in its natural language understanding and generation; how relatively low-level Python is the lingua franca of AI research, with Meta's own ubiquitous PyTorch library playing a key role; and how CICERO could prove useful in the metaverse, as well as in real-world applications where actionable, strategic, and highly targeted natural language conversation is desired. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Alex's social media profiles, as well as my own, at superdatascience.com/663. That's superdatascience.com/663.

If you enjoyed this episode, I'd greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel, and of course subscribe if you haven't already. I also encourage you to let me know your thoughts on this episode directly by following me on LinkedIn or Twitter and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of the show. Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience podcast for you. And thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the SuperDataScience team for producing another extraordinary episode for us today. 

 For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors whom I've hand selected as partners because I expect their products to be genuinely of interest to you. Please consider supporting this free show by checking out our sponsors' links, which you can find in the show notes. And if you yourself are interested in sponsoring an episode, you can get the details on how by making your way to jonkrohn.com/podcast. And thanks of course to you for listening. It's because you listen that I'm here. Until next time my friend, keep on rocking it out there and I'm looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 
