Jon Krohn: 00:00:00
This is episode number 559 with Melanie Subbiah, PhD student at Columbia University and a lead author of the first GPT-3 paper. This episode is brought to you by Neptune Labs, the metadata store for MLOps and by MLconf NY, New York’s machine learning conference.
Jon Krohn: 00:00:22
Welcome to the SuperDataScience Podcast, the most listened to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple.
Jon Krohn: 00:00:53
Welcome back to the SuperDataScience Podcast. Holy moly are you ever in for a treat today with the rock star, Melanie Subbiah. Melanie was a lead author on the first GPT-3 paper. In case you haven’t already heard of it, GPT-3 is a natural language processing model with 175 billion parameters that has demonstrated remarkable few-shot learning on tasks as diverse as translation between languages, question answering and performing three-digit arithmetic. Melanie’s paper sent shock waves through the mainstream media and was recognized with an outstanding paper award from NeurIPS, the most prestigious machine learning conference, in 2020. Melanie developed GPT-3 while she was working as an AI engineer at OpenAI, one of the world’s leading AI research outfits. She also previously worked as an AI engineer at Apple. She’s now pursuing a PhD at Columbia University specializing in NLP. She holds a bachelor’s in computer science from Williams College.
Jon Krohn: 00:01:53
Today’s episode does have technical elements here and there that will appeal primarily to practicing data scientists, but Melanie and I put an effort into explaining concepts and providing context wherever we could. So hopefully much of this fun, laugh-filled episode will be engaging and informative to anyone who’s keen to learn about the state of the art in natural language AI. In this episode, Melanie details what GPT-3 is, why applications of GPT-3 have transformed not only the field of data science but also the broader world, the strengths and weaknesses of GPT-3 and how these weaknesses might be addressed with future research, whether transformer-based deep learning models spell doom for creative writers, how to address the climate change and bias issues that cloud discussions of large natural language models, and the machine learning tools that she’s most excited about. All right, you ready for this epic episode? Let’s go.
Jon Krohn: 00:02:52
Melanie, thank you for coming to Manhattan and filming this episode with me in person. I’m so excited to film it. I’ve been excited about it for weeks. So your journey here was pretty easy, right? You live in Brooklyn.
Melanie Subbiah: 00:03:05
Yeah. Yeah. It was nice to just come over here for the day.
Jon Krohn: 00:03:07
Nice. Again, it’s a beautiful day here in New York. If you’re watching the YouTube version of the episode, for the first time ever, I’m filming with my windows open. Hopefully there isn’t too much street noise, but it’s just a beautiful sunny day here in New York, spring seems to be on its way. All right. So I know you through Claudia Perlich, she was in episode number 437 of the podcast, and she was alongside you in a Wired video on explaining machine learning at five difficulty levels. So that was hosted by Hilary Mason and it’s a great video that I highly recommend listeners check out, because whether you are a practicing data scientist looking for a way to explain what you do better to people at cocktail parties or family events or whatever, or whether you’re just getting into data science and you want to learn about what machine learning is. Hilary speaks to an elementary school student, a high school student, a grad student, and an expert, that’s only four levels, so I missed-
Melanie Subbiah: 00:04:12
The undergraduate also.
Jon Krohn: 00:04:13
An undergrad as well. So elementary school, high school, undergrad, grad school and then so expert was Claudia Perlich and you were the grad student.
Melanie Subbiah: 00:04:23
Yes.
Jon Krohn: 00:04:24
And we’ll get to the grad student thing later. But from the very beginning, I was like, “This is no ordinary grad student.” It doesn’t seem like in a lot of ways, as will become clear to the listener immediately, you are also already a deep expert who happens to be in grad school. So we’ll talk about that more later in the program. First let’s dig into how you’re definitely an expert, which is that prior to going back and doing your PhD, which you’re doing now, you worked at OpenAI. So OpenAI is renowned for being one of the top few AI research centers on the planet, big names like Ilya Sutskever, who was one of the three authors on the first AlexNet paper. Pieter Abbeel, who’s a famed roboticist and entrepreneur, and he’s in episode number 503 of the show. Ian Goodfellow, who came up with generative adversarial networks. These are some of the amazing people who’ve worked at OpenAI. And it’s also pretty well known because it was founded by Elon Musk and Sam Altman, who is the former president of Y Combinator, and other well-known people. And it’s produced lots of front page news innovation.
Jon Krohn: 00:05:33
And I don’t just mean machine learning news. I mean, in The Times and The Posts and The Economist, I feel like I’m constantly reading about new innovation from OpenAI. So for listeners, for practicing data scientists, something that’s really cool is OpenAI Gym, which is great for reinforcement learning research. But then in terms of stuff that’s made a big splash in the popular press, there’s DALL·E, which is really cool for generating art based on natural language input. I’ve had so much fun with that tool, but perhaps the most famous innovation of all to come out of OpenAI is GPT-3 and you, Melanie, worked on GPT-3 and you were a joint lead author on the first paper on it. So unbelievably lucky to have you here with us.
Melanie Subbiah: 00:06:21
Thank you. I’m really excited to be here and talk to you as well.
Jon Krohn: 00:06:24
Awesome. So what does GPT stand for?
Melanie Subbiah: 00:06:29
So GPT stands for generative pre-trained transformer model.
Jon Krohn: 00:06:34
Nice. And then this is the third, I guess, the third major release of GPT?
Melanie Subbiah: 00:06:39
Yeah. So there was an original GPT paper, then there’s GPT-2 and then GPT-3, which are all based off of the same framework, but increasing in power and complexity of what they’re able to do.
Jon Krohn: 00:06:52
Nice. And so a keyword in there is transformer. And so the other words are actually relatively easy to understand, so generative, it can generate text.
Melanie Subbiah: 00:07:02
Yes.
Jon Krohn: 00:07:03
As part of its outputs. And pre-trained meaning that it’s already trained, you don’t necessarily need to train it. So transformer is really the only word that could be complicated. So what is a transformer? And what’s a transformer model?
Melanie Subbiah: 00:07:19
Yeah. So transformers are a really popular architecture in NLP right now. And they basically came about to address some of the challenges that people had with recurrent neural network architectures. And so two of the big advantages with a transformer model are that it allows the model to learn from previous context in the text very well, going back pretty far, so you can have a long input context, and it also allows you to train the system in parallel, across many, many GPUs, which lets you scale it up to huge data sets and a huge number of parameters, which is a critical part of the success that we’ve seen with these systems. And a key part of the transformer is that it relies on attention. And so basically the model is learning weights to apply to different parts of the input context, to figure out what word to generate next or what output to generate next based off of which inputs are most critical.
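Melanie’s description of attention, learning weights over the input context to decide what to generate next, can be sketched in a few lines. This is a generic scaled dot-product attention toy in NumPy, not code from GPT-3, and all names and sizes here are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays of queries, keys, values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # score every position against every other
    scores -= scores.max(axis=-1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax: learned emphasis over input positions
    return weights @ V, weights                    # weighted mix of the values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                        # 5 tokens, 8-dimensional embeddings
out, w = scaled_dot_product_attention(x, x, x)
print(out.shape)                                   # (5, 8): one mixed vector per token
```

Each row of `w` sums to 1, so every output position is an average of the whole input context weighted by relevance, which is what lets the model look back much further than an RNN can.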
Jon Krohn: 00:08:24
Nice. That was very clearly explained.
Melanie Subbiah: 00:08:26
Okay.
Jon Krohn: 00:08:28
And so to recap on some of that, a problem with the predominant natural language processing architectures prior to attention and transformers, things like recurrent neural networks and long short-term memory units, is that the signal of information from preceding words, the words before a given word of interest, decayed very quickly. And so roughly with recurrent neural networks, after 10 words, the signal from the word 10 words ago was lost, it can’t really have an impact on the current word. And so attention and transformers overcome that issue and allow the model to consider a broader context like you’re describing, and this is critical because when we speak or when we read as humans, just 10 words back isn’t enough to understand the context of what’s going on.
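The decay Jon describes can be illustrated with a toy calculation: in a simple recurrent network, the contribution of an input k steps back gets multiplied by the recurrent weight roughly k times, so with a weight magnitude below 1 it shrinks geometrically. The weight value here is made up purely for illustration.

```python
# Toy illustration: how quickly a signal from k steps back fades in a
# simple RNN when the recurrent weight magnitude is below 1.
w = 0.5  # illustrative recurrent weight magnitude
for k in [1, 5, 10, 20]:
    print(f"signal from {k} steps back is scaled by {w ** k}")
```

By 20 steps back the scaling factor is under one millionth, which is the sense in which earlier words stop influencing the current word.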
Melanie Subbiah: 00:09:24
Yeah.
Jon Krohn: 00:09:26
So these kinds of previous approaches like RNNs, recurrent neural networks, and long short-term memory units, LSTMs, were not very capable at this few-shot learning that GPT-3 has proved to be very good at. So few-shot learning is a situation where the input to GPT-3 could say, “Translate English to French,” and then give a few examples. So you say, “Translate English to French,” and then, “Dog to chien, cat to chat,” and so on. And in that kind of situation, it looked like GPT-3 is more than 50% accurate in that few-shot learning situation. And then even if you go to one-shot or zero-shot. So a one-shot example would be where you say, “Translate English to French,” and then you just give it one example, dog to chien. So that’s one-shot learning, and zero-shot learning would be where you don’t give it any examples. You just say, “Translate English to French, dog,” and you expect it to output chien. And so I think as we go, and you can correct me if I’m wrong, but as we go from few-shot learning, where we’re more than 50% accurate with GPT-3, to one-shot and then zero-shot, the accuracy does go down.
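The few-shot, one-shot, and zero-shot setups Jon describes can be written out as literal prompt strings. The arrow formatting is one common convention, not something the GPT-3 paper prescribes:

```python
# The same translation task posed in the three prompting regimes.
task = "Translate English to French."

few_shot = f"{task}\ndog => chien\ncat => chat\nhouse => maison\nbird =>"
one_shot = f"{task}\ndog => chien\nbird =>"
zero_shot = f"{task}\nbird =>"

def n_examples(prompt):
    # A worked example is a line with a completed "=>" pair;
    # the final line ends in "=>" and is left for the model to finish.
    return sum("=>" in line and not line.endswith("=>")
               for line in prompt.splitlines())

print(n_examples(few_shot), n_examples(one_shot), n_examples(zero_shot))  # 3 1 0
```

No gradients are updated in any of these regimes; the examples live entirely in the input context, which is what makes this “learning” at inference time.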
Melanie Subbiah: 00:10:52
Yeah.
Jon Krohn: 00:10:53
But the huge innovation is that on this few shot learning GPT-3 absolutely crushes any preexisting architecture.
Melanie Subbiah: 00:11:01
Yes. Yeah. Yeah, it was really the first time that we were able to see any sort of successful few shot learning from this type of paradigm. And yeah, it’s really exciting because it’s much more similar to how humans actually perform tasks where we can just give each other instructions, maybe a couple examples all in natural language and then we’re able often to do a new task just through that simple instruction.
Jon Krohn: 00:11:26
Yeah. So that’s a big part of the pre-trained part, is that the idea, I guess, with an architecture that is so large and is trained on so much data and takes advantage of transformers, is that you don’t need to train it to a specific task. It can perform translation like we just went through, it can answer questions for you, it can do simple arithmetic.
Melanie Subbiah: 00:11:49
Yeah.
Jon Krohn: 00:11:51
And so that diversity is a really amazing new thing that is emerging in these kinds of large transformer architectures. So that ability to perform so many different tasks, is that almost like an emergent property, where you’re surprised when you discover, “Wow, it can also do this.” Or does some of the design thinking revolve around, “Okay, what are the big natural language tasks out there?” Question answering is a big one, translation is a big one. Let’s deliberately try to design an architecture that can do it. So I guess the question is, is it mostly just emergent that it does all of these things, or do you have to design it specifically to be so broadly applicable?
Melanie Subbiah: 00:12:37
Yeah, that’s a great question. I think there were kind of two stages. Actually building the model, most of these properties were very much emergent. In the GPT-2 paper, we started to see hints that models like this might be able to do this type of thing, but the performance really wasn’t that good at that stage. And so with GPT-3, we really wanted to just build this very powerful model and see what it could do. And I remember being there at the time, it was really exciting because every week it would be like, “Oh my gosh, someone saw this new, amazing thing from the model.”
Jon Krohn: 00:13:09
Wow.
Melanie Subbiah: 00:13:09
So it was very much a surprise to many of us. I think also just seeing it perform across a bunch of these tasks. And if you look at the results in the paper, there are certain cases where, since we trained models at different sizes, you’ll have a smaller model that really couldn’t do anything on a certain task and then there’s a sudden jump when you scale up to a certain size, where suddenly you have traction on that task. So there are also things that we haven’t even seen that could emerge with larger models as well, which is very exciting. So that’s the part that was not really designed into the model at all. I think the part, though, that does take some engineering, which is a lot of what I worked on as well, is we wanted to just throw as broad a suite of tasks as we could at the model and get a general sense of what it can do.
Melanie Subbiah: 00:13:58
And for that we were looking to these standard NLP benchmarks and data sets. But within this few-shot paradigm, you do have to formulate that prompt to the model. So it matters a little bit: how do you phrase the instruction? How is it getting tokenized to the system? What examples are you using? And that’s something we didn’t over-engineer at the time, but a lot of people have continued to look at since then, which is really exciting. But that was one thing too, that we did notice variations sometimes depending on how that prompt was formulated. And so that was a part where we were looking at a specific task and how the performance looked on that task.
Jon Krohn: 00:14:36
That ties perfectly to my next question, which was going to be, when does GPT-3 struggle? So are there particular kinds of circumstances where there’s patterns that you’re like. “These kinds of inputs it’s going to do really well, these kinds of inputs, even on a similar task it’s going to fall down.”
Melanie Subbiah: 00:14:54
Yeah. So one data set that we struggled with when the paper came out was ANLI, there was almost random performance on that, which is a natural language inference task.
Jon Krohn: 00:15:05
What does that mean?
Melanie Subbiah: 00:15:06
So that means basically inferring the relationship between two phrases and whether there’s entailment between those two events. So it’s getting into causality and reasoning. And that’s another thing: multi-step reasoning can be a little bit hit or miss. For example, if you set up a scenario, and I put three glasses on the table and then Don comes in and takes two of them away, how many glasses are still there? I think with things like that, as you get into more complicated levels of logic, it can be hit or miss.
Jon Krohn: 00:15:47
Right. Right. 99% of machine learning teams are doing awesome things at a reasonable scale with say about four people and two production machine learning models. But most of the industry best practices that we hear about are from a small handful of companies operating models at hyper scale. The folks over at Neptune.ai care about the 99% and so they are changing the status quo by sharing insights, tool stacks, and real-life stories from practitioners doing ML and MLOps at a reasonable scale. Neptune have even built a flexible tool for experiment tracking and model registry that will fit your workflow at any scale, reasonable scale and beyond. To learn more, check them out at Neptune.ai, that’s Neptune.ai.
Jon Krohn: 00:16:36
And I’m really relieved that’s true at this stage. I think people, there were a lot of splashes made when GPT-3 came out that, this whole news article was written by a robot, but then when you look at the fine print, it was like, “Well, we gave it 10 tries and then we edited it.” And then the editor was like, “But we do that with human writing as well.”
Melanie Subbiah: 00:17:01
Yeah.
Jon Krohn: 00:17:02
But I’m like, “Yeah, but are you editing out stuff that just seems like completely random or that a human would ever write?” Which I think is inevitably true.
Melanie Subbiah: 00:17:10
Yeah.
Jon Krohn: 00:17:12
So obviously these systems have an incredible capacity to sometimes, or even the majority of the time perhaps, generate really compelling text, but as things get more complex, and I think even with these transformer architectures, the further apart language is in a given document, the less likely it is to be meaningfully related.
Melanie Subbiah: 00:17:39
Yeah. The further you get out in the generation.
Jon Krohn: 00:17:41
Yeah, exactly. So in adjacent words, it’s almost guaranteed to make sense. If you look at a string of five or six words, they almost always make sense. But if you look at two or three sentences in a row, maybe much of the time it does make perfect sense, but some of the time it doesn’t. And then if you look over multiple paragraphs, it starts to stretch things. So a term that I came across a lot as I was researching this episode, and that I didn’t know a ton about and I’d love to understand better, is autoregression. So what does that mean?
Melanie Subbiah: 00:18:17
Yeah. So usually we think about autoregression in contrast to bidirectional systems. So autoregressive systems are learning from a previous sequence to predict what comes next, whereas bidirectional systems can use both future and past information. So a great comparison here is BERT, another hugely famous NLP model, which is bidirectional. So it can learn from everything within the model’s context. So both text that comes before and after, or maybe just like the whole paragraph or the whole document. Whereas with GPT or the GPT series models, you’re always feeding in some sequence and then predicting what comes next. So-
Jon Krohn: 00:19:00
Got it.
Melanie Subbiah: 00:19:01
Those two frameworks lend themselves to different problems. So using a bidirectional system can be very good for document classification or maybe reading comprehension questions where you’re processing something as a whole and then answering something about it, whereas using the autoregressive framework is what really enabled this type of few-shot paradigm, because you can feed in that input and then generate forward some sort of output.
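The structural difference Melanie describes is often implemented as an attention mask: an autoregressive model like GPT only lets position i attend to positions up to i, while a bidirectional model like BERT lets every position see the whole context. A minimal sketch in NumPy, with the sequence length chosen arbitrarily:

```python
import numpy as np

seq_len = 4
# Autoregressive (GPT-style): lower-triangular mask, so each token
# attends only to itself and earlier tokens.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
# Bidirectional (BERT-style): every token attends to the whole context.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

print(causal_mask.astype(int))  # row i marks which positions token i may see
```

The causal mask is what makes left-to-right generation possible: since a token never saw its future during training, the model can be rolled forward one token at a time at inference.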
Jon Krohn: 00:19:30
Nice, really good explanation. You do have a knack for explaining things very clearly. Cool. So changing the topic just a little bit, I mean, still staying on GPT-3, but an original strength of GPT-3 was the breadth of natural language capability that we already talked about. That it is pre-trained, the P in GPT-3.
Melanie Subbiah: 00:19:47
Yeah.
Jon Krohn: 00:19:48
So that means that without doing any model training, without performing any gradient updates in the model, we can have it apply, as we already described, to question answering, to simple arithmetic, to translation.
Melanie Subbiah: 00:20:00
Yeah.
Jon Krohn: 00:20:02
Now interestingly, in December, OpenAI nevertheless made it possible to fine-tune GPT-3. So you can fine-tune it now to your own data with a single line of code using their API. So how does this transfer learning work and what is the practical impact? If GPT-3 was already capable of being so broadly applicable, what additional advantages are afforded by this new fine-tuning ability?
Melanie Subbiah: 00:20:32
Yeah, that’s a great question. So we still live in a world where few-shot is this ideal scenario, but in most cases we haven’t closed the gap between few-shot performance and fine-tuning. So if you look at the results in the paper, the few-shot performance is very good and in some cases it is up there at human level. But in many cases it’s good enough and exciting in an academic way, but if you actually applied it to a real-world business use case, it would still have too many failure cases.
Jon Krohn: 00:21:05
With Pieter Abbeel in his episode, we focused a lot on deep reinforcement learning applications to industry.
Melanie Subbiah: 00:21:12
Okay.
Jon Krohn: 00:21:12
And he was describing that one of the biggest differences, relative to academia, is that in academia, you’re trying, even just once, to get a robot to do something really crazy.
Melanie Subbiah: 00:21:23
Yeah.
Jon Krohn: 00:21:24
But in production that doesn’t matter at all.
Melanie Subbiah: 00:21:26
Right.
Jon Krohn: 00:21:28
So in production, you’re working at having robots become good at a task that maybe was interesting in academia 5, 10 years ago.
Melanie Subbiah: 00:21:37
Yeah.
Jon Krohn: 00:21:38
But doing it at extremely high accuracy.
Melanie Subbiah: 00:21:40
Yeah. Yeah.
Jon Krohn: 00:21:41
So that sounds similar to what you’re describing.
Melanie Subbiah: 00:21:44
Yeah. I think that’s a great way of putting it. Yeah. So I think with that, if you can achieve greater performance with fine tuning on your business use case, you definitely want to do that. And beyond that, also this starts to get into thinking about bias and safety around language models, because part of why you may want to fine tune also is to work with the specific data for your use case and to guide the model towards the types of responses and the range of responses that you’re actually comfortable with in your application.
Jon Krohn: 00:22:16
Nice. That makes sense. And I was reading that it seems like in some cases, even just having a hundred examples in your additional data set for this transfer learning could be effective. And in case I didn’t define this term, transfer learning is where you take a big model that’s already pre-trained, like GPT-3, and you fine-tune it.
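As a hedged sketch of what that fine-tuning workflow looked like: OpenAI’s fine-tuning API at the time consumed a JSONL file of prompt/completion pairs. The file name and the example pairs below are made up for illustration.

```python
import json

# Hypothetical prompt/completion pairs for a small fine-tuning set;
# even around a hundred such examples could move the needle.
examples = [
    {"prompt": "Translate English to French: dog ->", "completion": " chien"},
    {"prompt": "Translate English to French: cat ->", "completion": " chat"},
]

# One JSON object per line: the JSONL format the fine-tuning API expected.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# The fine-tune itself was then a single CLI call, roughly:
#   openai api fine_tunes.create -t train.jsonl -m <base_model>
# (historical CLI syntax; the exact command has since changed)
```

The leading space in each completion was a quirk of the tokenizer-friendly format the documentation recommended at the time.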
Melanie Subbiah: 00:22:40
Yeah.
Jon Krohn: 00:22:40
All right. So I was going to ask you this question later, but since you mentioned bias, I’m just going to jump to it right now. So OpenAI’s former VP of research, Dario Amodei, acknowledged GPT-3’s performance on bias issues in some of his slides that we found online. And so we’ll provide a link to those in the show notes. And so part of this can be related to the fact that men from developed countries are overrepresented as authors of online content.
Melanie Subbiah: 00:23:07
Yeah.
Jon Krohn: 00:23:08
Especially in the content used to train most language models, including presumably GPT-3. Yep. So this can lead to models picking up on biases that disadvantage other groups, like women and people who just aren’t from developed countries, for example. So what can we do to mitigate data set bias like this?
Melanie Subbiah: 00:23:24
Yeah, it’s a really important issue and unfortunately a reality just given that the models consume so much data and the only available source of that is this very un-curated online world. And so I think there’s a first really important component, which is just trying to study and understand how the model is performing and document that as well as possible and share that information and be transparent so that people are aware of the risks and understand what’s going on with this system. So that was a big thing as we were releasing GPT-3. OpenAI has a whole policy group, and I worked with some of our researchers there as well on building out these bias tests for our system, so that we could try to be transparent about what we were seeing from the model. And as you’re saying, in most cases without further intervention, these systems do replicate just the average of what’s online, which is not representative of the average of the world.
Melanie Subbiah: 00:24:24
So I think that’s a really important thing at this stage is giving that information to people who are using these systems. And then the second stage is also enabling them to do something about that in their specific use case. And so there was a paper from OpenAI that worked with fine tuning, as you mentioned, and just small data sets of 100 examples to guide the model towards the type of output that they were looking for on specific, sensitive topics. And they found that they could definitely move the needle in a good direction using that type of approach. So I think that’s one way to go. Ultimately, I mean, ideally we would have great data that people could sort of collect as much data as they needed, that fit their standards for their particular use case, but I think we’re not in a place right now where that’s realistic. So for me, at least, I tend towards be really transparent, have very honest conversations about what is needed in a specific situation, and then really test the system and see whether it’s going to be up to your standards for that.
Jon Krohn: 00:25:32
Right. And it sounds like there’s something that every listener can do at home to make this situation better, is don’t be a dick online and that’ll help in the long run.
Melanie Subbiah: 00:25:42
Yeah.
Jon Krohn: 00:25:43
All right. So thank you for answering that question on bias. Really great, clear answer. So something that you mentioned earlier on, there are so many jumping off points for the conversation.
Melanie Subbiah: 00:25:56
Yeah.
Jon Krohn: 00:25:56
And so something you mentioned earlier that I want to come back to is this idea of how as you increase the number of parameters, on some tasks you would witness this big jump in performance.
Melanie Subbiah: 00:26:07
Yeah.
Jon Krohn: 00:26:07
So few-shot learning, maybe on translation, at the actual number of parameters that GPT-3 has, which is 175 billion parameters, and that’s two orders of magnitude more than GPT-2 had, at that level we’re able to get this great few-shot learning performance. So other people are taking this idea and expanding it further. So in 2021, there was a Chinese group that released Wu Dao 2.0, which is a model with exactly an order of magnitude more parameters than GPT-3. So GPT-3 has 175 billion parameters; Wu Dao 2.0 has 1.75 trillion parameters. And it was able to generate compelling Chinese poetry, for example. So over the coming years or decades with GPT-4, GPT-5, Wu Dao 3.0, whatever, are we likely to need several more orders of magnitude to reach human-level language capability? And as a reference point, one human brain has about 1,000 trillion synapses.
Melanie Subbiah: 00:27:25
Yeah.
Jon Krohn: 00:27:26
Which in a loose, very, very, very loose way, because there’s all kinds of nuanced differences between biological neurons and artificial neurons like we have in these architectures.
Melanie Subbiah: 00:27:36
Yeah.
Jon Krohn: 00:27:37
But loosely, we have 1,000 times more connections between biological brain cells in a human brain relative to the number of connections between artificial neurons in Wu Dao 2.0.
Melanie Subbiah: 00:27:50
Yeah.
Jon Krohn: 00:27:52
Yeah. So I guess I’ve already asked the question, but it’s basically, do we need to keep going down this route of increasing parameters to get closer and closer to human level ability on language or are there other potential avenues to explore?
Melanie Subbiah: 00:28:05
So I think the increasing model size is the avenue that we’ve seen actually work in this so far. And as you’re saying, there is still this big gap between how big these systems are and how big our brains are. And so I think it definitely feels logical to continue scaling up these models as long as we’re seeing these returns.
Jon Krohn: 00:28:26
Yeah. Especially because it works, as you said, in examples in the GPT-3 paper that you co-authored, you look at these charts of performance and as we increase parameters performance clearly goes up.
Melanie Subbiah: 00:28:36
Yeah. Yeah.
Jon Krohn: 00:28:37
So I mean naturally I think to keep going.
Melanie Subbiah: 00:28:39
Yeah. I think the important caveat to that is that there’s definitely something we’re doing wrong or not in the most efficient way because when we think about the sample efficiency. And so by that, I mean how many words, how much exposure to language someone or the model needs to be able to use language proficiently and for humans and babies and young children learning language, the amount of language that they’re actually exposed to is very, very small compared to what even what we would consider small transformer models are trained on today. So that’s, I think, something I definitely think about is we’re going to… I expect that in the future, we’ll figure out a more intelligent way to do this, where we can actually learn more efficiently from a smaller amount of language. But I think given where we are now and what we’re seeing work, it seems to be very effective to continue scaling systems. As we have data available and compute available to do it.
Jon Krohn: 00:29:38
Being able to interact with the amazing guests we have on Super Data Science is the best part of my week. The only thing that would make it even better is if I could share the experience of filming episodes with you live in person. Well fantasize no longer because on March 31st at MLconf New York we’ll be filming a Super Data Science episode live and in person for the very first time. That’s at MLconf, The Machine Learning Conference, I’ll be interviewing a global deep learning leader and there will be a dozen other exciting talks from prominent machine learning experts. Held at a gorgeous rooftop venue in central Manhattan, MLconf has long been a special annual event for me. I can’t wait for MLconf on March 31st and hopefully I’ll get to meet you there too. Head to www.superdatascience.com/mlconf for all the details and a 30% discount. Again, that’s www.superdatascience.com/mlconf.
Jon Krohn: 00:30:29
Yeah. So something that I didn’t have written down as a question, but just came into my mind, is something that I’d love to pick your brain on. Currently almost all of our leading approaches, whether we’re talking about natural language, machine vision, robotics, all of the leading AI applications today that I’m aware of, they tend to involve deep learning, and they almost always involve gradients. So some kind of function that we can differentiate and do partial derivative calculus on. But that obviously isn’t… Well, some people actually do think that is how biological brains work in a way, or that at least it is a model for how we learn. But I don’t know, that’s certainly contentious. However, a big difference between the way that we think about how we store information in a human brain versus these differentiable machines that require gradient learning is that we can have semantic linkages between information. You can have an idea of the relationship between the meaning of words, like a hierarchy, all of these things are examples of a person. And so this kind of explicit representation of knowledge isn’t something that you can easily differentiate over, or at all. So I guess, just broadly speaking, do you think there’s a place for non-differentiable information storage in future natural language models?
Melanie Subbiah: 00:32:09
Yeah. I definitely think there is a place and I think that’s a huge debate right now, whether people who are very much in a camp of we need more structured approaches and things that are very much built off of our knowledge of human language systems and how we learn versus approaches that are more let’s use more data, more compute and engineer our way to a solution that works, from that standpoint. Yeah, I think there’s definitely… I think for me, I’m very open intellectually to what future models might look like and I think there could be a place for that. I think I’m hesitant to dramatically move away from something that’s working very well and trains very well and scales very well.
Jon Krohn: 00:32:56
Why’s that?
Melanie Subbiah: 00:32:58
Because I think that component of scale, being able to easily take advantage of more data, is something we’ve just seen work over and over again in terms of improving performance. But I do feel that we’re missing things; I don’t think the transformer architecture is the be-all and end-all of NLP. I think we’re going to continue to figure out tricks, and I think something like incorporating memory is really important, because as I was saying before, you are still limited to the context that you can feed into the model and how long that is. So at some point we do have to incorporate some sense of memory or world context beyond just the page or document that you can feed into the system.
Jon Krohn: 00:33:42
Right.
Melanie Subbiah: 00:33:42
Yeah.
Jon Krohn: 00:33:42
And then that could potentially be a step on the way to having models actually be able to make logical conclusions. So that natural language inference that you were describing, which models currently struggle with.
Melanie Subbiah: 00:33:54
Yeah.
Jon Krohn: 00:33:56
Cool. All right. So as I was researching this episode, I discovered that you’re into human creative writing.
Melanie Subbiah: 00:34:05
Yeah.
Jon Krohn: 00:34:06
Not just generative models.
Melanie Subbiah: 00:34:07
Yeah.
Jon Krohn: 00:34:08
And so you’ve done workshops in Ireland.
Melanie Subbiah: 00:34:10
Yeah. I studied abroad.
Jon Krohn: 00:34:12
Studied abroad.
Melanie Subbiah: 00:34:13
And did writing program there. Yeah.
Jon Krohn: 00:34:16
And so you’re probably aware that GPT-3 is used for creative writing. It’s a common application, actually, and it’s even been commercially deployed for copywriting, and I’m going to provide some links in the show notes to articles that show examples of these commercial deployments of GPT-3 for creative writing purposes. So I can probably guess where your answer is going to go with this, but maybe you’ll surprise me. As each successive GPT model, GPT-2, 3 and then some future 4, substantially outperforms the previous, does this eventually spell doom for your human creative writing interests?
Melanie Subbiah: 00:34:56
I think to me it’s more exciting than scary, I guess. Yeah, that was actually how I first got into NLP and AI. I was in my senior year of undergrad and I was trying to think, “How can I combine writing and computer science for some sort of thesis?” So I was like, “Okay, I’ll build an AI that can try and do creative writing.” And at that time I was working with LSTMs, and it was a struggle to get even two sentences out that were coherent and creative. So it really just blew me away, originally, seeing the GPT-2 paper come out and seeing a paragraph of pretty much coherent creative text. I was like, “Wow, this is so cool.” So I think I’ve always had more of that this-is-very-exciting perspective. And I envision much more of a world where humans and these creative AI systems are collaborating. It’s almost like being in a writing workshop, where you’re reading and writing with other people and bouncing ideas off of each other, and I think there are really cool opportunities for these creative systems to be almost like idea generators, or they could be standalone works. But I think there’s always going to be something different. Part of a story is the intention behind it and the feeling that’s being communicated, and I think there’s always going to be something unique about that from the writer, whether that’s coming from an AI or from a specific human. That’s my perspective. Yeah.
Jon Krohn: 00:36:27
I love this idea of some future, when we’re beyond transformers, we have some non-differentiable component and we have this model that’s like, “You don’t know what it’s been like for me.”
Melanie Subbiah: 00:36:39
Yeah. I guess if the system could just totally replicate you, then that would make me feel not so great.
Jon Krohn: 00:36:46
No, I mean that the system on its own has its own, in the same way that some writer today is tainted by some high school breakup or their parents’ divorce.
Melanie Subbiah: 00:36:57
Yeah.
Jon Krohn: 00:36:58
You have machines that are scarred by some-
Melanie Subbiah: 00:37:01
They have some backstory that they’ve created.
Jon Krohn: 00:37:04
And every book that they write has that same thing to it.
Melanie Subbiah: 00:37:07
Yep.
Jon Krohn: 00:37:09
Fun. All right. I’m glad I asked you that one, and it’s super cool to hear the connection there between your creative writing and computer science. And it is cool, because probably around the time that you were trying to make LSTMs string one or two coherent sentences together was around the time that I started teaching deep learning. Similarly, a common project that I would have my students do was taking an LSTM architecture or a gated recurrent unit architecture, experimenting with different kinds of these recurrent architectures, and seeing whether you could get some text to make sense. And the thing that we settled for back then, I’m sure you remember this, something that was very common is that people would have it generate Shakespeare.
Jon Krohn: 00:37:55
So you train it on a Shakespeare corpus and have it generate Shakespeare, because no one understands what Shakespeare’s saying anyway. So if you have Shakespeare-style content coming out, you’re like, “I guess that sounds like it could be the bard. It’s close enough. I never understand what he’s saying anyway, and I don’t understand this, but it seems to be in his style.” So we really have come a long way now with GPT-3. So cool that you saw GPT-2 at that stage and that you were then later able to make your way to OpenAI. I was going to ask these kinds of questions later, but it seems like we’re at a good point in the conversation. So how did that end up happening? You’re doing an undergrad in computer science, you have this creative writing interest, and then after that you went to Apple, right?
Melanie Subbiah: 00:38:38
Yep.
Jon Krohn: 00:38:40
So that makes sense. I mean, you can tell us, but I imagine, Apple is one of the most exciting competitive companies to work for. So you’re probably applying to amazing places you could work and being able to be an AI research engineer there sounds like an amazing experience. So maybe tell us about that and then how you ended up after that at OpenAI.
Melanie Subbiah: 00:39:02
Yeah. So I went to a liberal arts school for my bachelor’s and it was a wonderful school, but there weren’t that many-
Jon Krohn: 00:39:08
Williams College.
Melanie Subbiah: 00:39:09
Yeah. Specific computer science classes. So I actually graduated with the fundamentals of computer science, but I hadn’t taken an AI class or an NLP class. I did this thesis, which was in machine learning, but I mostly just read papers myself; I didn’t have any education in that at the time. So when I was looking for those first jobs, I actually wasn’t hired as an AI research engineer. I was hired as a computer graphics software engineer, but I picked a team in special projects there where I knew I was going to be close to all these other research teams. And I basically just started going to paper reading groups there and making connections with the researchers.
Melanie Subbiah: 00:39:48
And then as things shuffled around, I volunteered myself for projects and was able to switch onto one of their AI research teams pretty quickly. That was a really cool opportunity because it gave me exposure to a bunch of different things in the space, and I could try things out and narrow in on, “Okay, I really do like doing machine learning and AI.” So I was able to get onto one of these research teams without having the formal education at that point. That was a really busy couple of years, because I was mostly reading textbooks and papers in my free time after work to teach myself this whole curriculum on deep learning and ML and try to get up to speed with what I was doing with my team. So yeah, I was actually doing some more robotics and computer vision and RL, and then circled back to NLP while I was there. And around that time was when GPT-2 came out, and I’d been there long enough that I had the confidence to say, “Okay, actually I really want to apply to OpenAI.” They’d also been on my radar; I really liked the mission of the company and what they were trying to do in the AI space. So I wanted to-
Jon Krohn: 00:40:58
Right. Something I didn’t mention when I first started talking about OpenAI is that they were originally created as a charitable organization and their mission was to bring about artificial intelligence applications in an ethical way. That was the modus operandi of the whole organization. Now they have some commercial revenue streams.
Melanie Subbiah: 00:41:20
Yeah.
Jon Krohn: 00:41:21
And actually one of the things that’s most annoying about that is that it used to be cool, because as a charity they had to publish everyone’s salaries.
Melanie Subbiah: 00:41:27
Yeah.
Jon Krohn: 00:41:27
And so you could see like, “Oh my God, that’s what Ilya Sutskever is making.” And that kind of thing.
Melanie Subbiah: 00:41:32
Yeah.
Jon Krohn: 00:41:32
And so that was a fun thing that’s now been taken away from us. But yeah, so you were attracted to the mission of OpenAI. Yeah, super cool organization. And then I wonder, was it also part of what drew you, those experiences during your undergrad, seeing the GPT-2 advances?
Melanie Subbiah: 00:41:50
Yeah.
Jon Krohn: 00:41:51
Wow. Yeah. Cool.
Melanie Subbiah: 00:41:51
Yeah, for sure. I think just seeing that paper come out was, “Wow, this is exactly what got me so excited about this field, and I’d love to work on systems like this.” So I transitioned over to work with them at that point. I didn’t know that they were working on GPT-3 at the time, or that that’s what I would end up working on, but it ended up being this great team. I really enjoyed the team I was working with there on that project, so that was a really fun year and a half. Yeah.
Jon Krohn: 00:42:19
Nice. So a general question for you that might be helpful to listeners: it sounds like part of how you were able to make these transitions, whether from undergrad to Apple or Apple to OpenAI, was a lot of self-study. So how do you go about that process? How do you identify what papers you should be reading, and then how do you stick with it? Especially if there isn’t the feedback of a formal study group or people to talk about these papers with.
Melanie Subbiah: 00:42:52
Yeah. I think it definitely helped having some peer motivation, having a research group that I was going in and maybe presenting a paper to each week or I needed to have something to say-
Jon Krohn: 00:43:03
Deadlines, classic.
Melanie Subbiah: 00:43:05
Right. That definitely gave me motivation to actually read and prepare. And I think early on it helps having people that you look up to who are recommending papers, and that can be people on Twitter or people in your immediate sphere. Once you get a couple, then I think it’s very easy to just go forward and backward in the citations and try and fill in the gaps. Just circle things you don’t understand, and then go back and research what actually is going on. Because I feel like a lot of times when people read papers, you’re skimming over a lot of the math or a lot of the information. Ideally, as you become an expert in something, you can follow that very quickly and fill it in, but what was important for me early on was actually stopping at every point I didn’t understand and trying to put together the math, or go back and fill in a citation that I needed to understand. Also just finding textbooks and then giving myself, “Okay, I’m going to read a chapter this week.” I think that’s maybe unpopular, I feel like a lot of people don’t like textbooks as much these days, but I actually like that.
Jon Krohn: 00:44:12
Yeah. I do write books, so I’m definitely [crosstalk 00:44:16]. But part of why I enjoy writing books is because I have had so many incredible experiences from working through books, because unlike a paper, a well written book has somebody who’s thought about a huge body of stuff and can tie it all together coherently. And so a good book can be a big game changer.
Melanie Subbiah: 00:44:37
Yeah, for sure.
Jon Krohn: 00:44:38
And it can expose you to things that you might not otherwise have ended up studying. So when you do that process, maybe chapters one through three, you were like, “That’s exactly what I want to do.” And then five was also what you want to do. Four you’re like, “Eh, I don’t know, but I’ve really enjoyed this author for the first three chapters. So I’ll check it out.” And then you’re like, “Wow, I didn’t even know this was relevant to my interest or I have this whole new interest now.”
Melanie Subbiah: 00:45:02
Yeah.
Jon Krohn: 00:45:03
So yeah. So I love books.
Melanie Subbiah: 00:45:05
Yeah.
Jon Krohn: 00:45:06
On your point of people to follow on Twitter. So if listeners don’t have somebody that they can talk to about what papers they should be reading. I did an episode, number 530, on 10 AI thought leaders to follow. So you could get some ideas of who to follow potentially from that episode. And something else that I took away from what you just said, is that it sounds like the trick, as with most things to being really good at something, is just a lot of work.
Melanie Subbiah: 00:45:40
Yeah.
Jon Krohn: 00:45:42
You put the time in.
Melanie Subbiah: 00:45:43
Yeah. That’s definitely true.
Jon Krohn: 00:45:44
All right. So once you’re at OpenAI and you discover you have this opportunity to be working on GPT-3, that must have been amazing. But also while you were there, you worked specifically on the OpenAI API that allows people outside of the organization to access GPT-3. So how is having an API interface like this useful to people and how can they access it?
Melanie Subbiah: 00:46:14
Yeah. So the idea of the API is that people who maybe don’t have the expertise to build these systems, or just don’t have the resources to build models like this, are able to easily benefit from the technology. It was designed to be a very simple interface where you can interact with it through natural language, which is the whole idea of few-shot learning as we’ve talked about, and just be able to give it very simple instructions and prompts and get your generated output back. All of the compute, all of the engineering is handled on the OpenAI side, so you’re basically just paying for a subscription service. And now, as you mentioned, there are added bonuses like fine-tuning, so you can be closer to the engineering if you want to, but it’s also okay to be someone who is less comfortable with that and more comfortable working through the natural language interface.
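The few-shot interaction Melanie describes amounts to nothing more than prompt construction: an instruction, a few worked examples, then the new input. The sketch below illustrates that pattern; the task, the examples, and the helper function are hypothetical, and the actual API request (handled entirely on the OpenAI side) appears only as a comment.

```python
# A minimal sketch of few-shot prompting: the "programming" is just
# a natural-language prompt assembled from an instruction, a few
# worked examples, and the new query.

def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query
    into a single prompt string."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)

# Sending the prompt would then be a single request, e.g. (not run here):
#   openai.Completion.create(engine="davinci", prompt=prompt, max_tokens=5)
```

The model is expected to continue the pattern after the final `Output:`, which is why no task-specific training or engineering is needed on the user's side.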
Jon Krohn: 00:47:10
That actually makes it probably an unprecedented API in the sense that you can ask it what you’d like it to do in natural language.
Melanie Subbiah: 00:47:19
Right. And that was really the goal, or what we realized with this few-shot learning actually working pretty well: it does enable this type of technology in a new way that we hadn’t seen before.
Jon Krohn: 00:47:34
Cool. Yeah. So it’s conceivable that in the future, probably not even in the distant future, if you’re okay with some inaccuracies, you could have a system like this working for somebody who has no programming experience at all, via an audio interface.
Melanie Subbiah: 00:47:51
Yeah. Yeah, exactly.
Jon Krohn: 00:47:52
All of the technology for that exists. There already exist algorithms, like Siri on an iPhone or any Apple device, that will convert the audio waveforms into text at a very high accuracy. Once you have that text, it can go into the OpenAI API, the API can bring you back results, and then why not have it speak those results, because we have that technology too.
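The audio interface Jon sketches is three stages chained together: speech-to-text, language model, text-to-speech. Every function below is a hand-written stand-in stub, the names are illustrative rather than any real API, meant only to show the shape of such a system.

```python
# Hypothetical sketch of a voice pipeline:
# speech-to-text -> language model -> text-to-speech.
# Each stage is stubbed out; in practice each would be a real service
# call (an on-device recognizer, the OpenAI API, a TTS engine).

def speech_to_text(audio):
    # Stand-in for a speech recognizer such as the one behind Siri.
    return audio["transcript"]

def query_language_model(prompt):
    # Stand-in for a completion request to the OpenAI API.
    return f"[model response to: {prompt}]"

def text_to_speech(text):
    # Stand-in for a text-to-speech engine.
    return {"spoken": text}

def voice_assistant(audio):
    """Chain the three stages: spoken request in, spoken answer out."""
    transcript = speech_to_text(audio)
    reply = query_language_model(transcript)
    return text_to_speech(reply)

result = voice_assistant({"transcript": "Summarize my meeting notes"})
```

Because each stage only passes text to the next, any of the stubs could be swapped for a production service without changing the overall structure.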
Melanie Subbiah: 00:48:18
Yeah.
Jon Krohn: 00:48:20
Wow. Cool. Speaking of inputs and outputs, in addition to GPT-3, the API also provides you with access to Codex, which converts natural language to code.
Melanie Subbiah: 00:48:33
Yep.
Jon Krohn: 00:48:34
So how practical is that? How often does that actually work properly?
Melanie Subbiah: 00:48:38
So that portion of the API came out after I had left OpenAI and gone to grad school, so I don’t have personal experience with it. But I’m really excited to see where that technology goes, because there are a lot of daily programming tasks where, at least for me as a programmer, I will Google something, find someone else who wrote it, and fill that into my code with maybe some modifications. And I think that’s the idea of Codex: skipping that Google step, so you can fill in this stuff in your code that’s pretty standard and used across a bunch of different use cases. And again, like what we were talking about with creative writing, having that tool, that pair programmer that you can work with, that’s very good at specific things.
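The workflow Melanie describes, writing the intent and letting the model fill in a standard snippet, looks something like the following. Both the prompt and the completion below are written by hand as an illustration of the kind of code such a model aims to produce; this is not actual Codex output.

```python
from collections import Counter

# A prompt you might give a code model:
#   "Return the n most frequent words in a text, most frequent first."

# A completion in the spirit of what Codex aims to produce:
def top_words(text, n):
    words = text.lower().split()
    return [word for word, _ in Counter(words).most_common(n)]

print(top_words("the cat and the hat and the bat", 2))  # → ['the', 'and']
```

The point is that the snippet itself is boilerplate you would otherwise search for, so generating it from the comment skips the Google step entirely.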
Jon Krohn: 00:49:24
Cool. I love that idea of a robotic pair programmer friend.
Melanie Subbiah: 00:49:28
Yeah.
Jon Krohn: 00:49:29
Yeah. That sounds really helpful. Okay. So now getting into another one of those tricky questions around these big models. We talked about the bias issues, a big one that comes up. Another one, also related to one of your interests: I know that you’re interested in using AI to tackle climate change.
Melanie Subbiah: 00:49:53
Yeah.
Jon Krohn: 00:49:54
So we have actually done an episode on AI being used to tackle climate change. So listeners can listen to episode 459 for an hour on that topic. But something specific to that, for you, is that these huge models like GPT-3, for every additional model parameter, there’s a little bit more energy that’s required.
Melanie Subbiah: 00:50:19
Yeah.
Jon Krohn: 00:50:19
When we’re talking about hundreds of billions or trillions of model parameters to train these models on large data sets and then even at inference time, the impact in terms of climate change is non-negligible.
Melanie Subbiah: 00:50:36
Yeah.
Jon Krohn: 00:50:36
Especially as these potentially become more and more popular. So AI is increasingly a contributor to carbon emissions. On the flip side of that, insights from AI could be providing solutions to the climate crisis.
Melanie Subbiah: 00:50:52
Yeah.
Jon Krohn: 00:50:52
So what do you think about this in general? How can we mitigate issues around these large models generating emissions and then what are the more promising AI driven solutions to climate change?
Melanie Subbiah: 00:51:05
Yeah, that’s a great question and a big issue. I think there’s thinking about it at training time, thinking about it at inference time, as you mentioned, and then also how this whole ecosystem connects to general access to models. With training, the good news is that typically, as we go forward, compute gets more efficient and cheaper. So what something looks like right now, 10 years from now it’s not going to consume the same energy to train that system, but we’re also probably going to have bigger systems. So that’s a tricky situation. And I think the biggest thing there that’s really helpful is having renewable energy drive a lot of these-
Jon Krohn: 00:51:54
Right, right, right.
Melanie Subbiah: 00:51:55
The cloud services that are actually powering these systems. So Google is doing a great job of moving a lot of their compute in a really good direction in terms of renewables, and that’s something I hope we can be really successful with in the next 10 years. In terms of inference time, a lot of these models are actually much more efficient to run at inference time. Once you’ve trained a model and you’re querying it for different outputs, that can be pretty efficient, and you can also engineer the system differently so that it runs very quickly and very efficiently at that point, versus how you trained it. So that’s another thing we can do: separating out those two use cases so that you’re making it as efficient as possible when you’re running it into the future.
Melanie Subbiah: 00:52:48
And the last thing is thinking about how this affects democratization of AI and access. Because there is this issue where only a few groups have the resources to train models like this, and then they have the dominant access to those systems. In some ways you might say, “Oh, ideally everyone could train systems like this,” but then suddenly we’re replicating these hugely compute-intensive models across many different groups, and that becomes very wasteful of energy. So I think it’s going to be really important to move towards some coordinated system where we train a model only once and then let everyone use it, but we’re not hoarding access so that some people are totally shut out. And I think that’s a really hard line to walk.
Melanie Subbiah: 00:53:38
We want people to be able to research these systems, use these systems, but we also don’t want to overly replicate things that are very resource intensive. So that’s something I think about a lot, I think then going into ways that AI can help with climate change. I think two things I’m really excited about are energy efficiency and thinking about intelligent grid systems and just intelligent use of electricity. And I think that will also help with incorporating renewables better, if we have a better understanding of forecasting and how to integrate all these systems, it could save a lot of waste within our energy grid. So that’s one thing I’m excited about. And then also thinking about the materials discovery space, so AI can speed up discovery in a bunch of different scientific areas by speeding up simulations that people run to figure out what might be promising directions to explore. And I think we’re starting to see applications like this used in carbon capture or battery technology or things like that, that are going to help us have, again, more sustainable systems across the board.
Jon Krohn: 00:54:53
Very cool examples. So we’ll ask GPT-6 to draw us a schematic of how we should design a tokamak reactor for nuclear fusion energy production that exceeds the amount of energy we put in.
Melanie Subbiah: 00:55:08
Perfect.
Jon Krohn: 00:55:10
Nice. It’ll be easy. So it’s really just a natural language problem.
Melanie Subbiah: 00:55:14
Yeah, when you put it that way.
Jon Krohn: 00:55:17
Okay. Cool. All right. So you’ve answered all my tricky questions flawlessly, nicely done. So when you’re working at a place like OpenAI, working on things like the API, developing transformer models like GPT-3, we also talked about things like DALL·E, we talked about OpenAI Gym, so many completely different kinds of innovation. So what’s the culture like at a place like OpenAI? You were largely there, if not entirely there, before the pandemic hit, so you were probably in an office with people. So yeah, what’s that like? How do they bring about so much diverse innovation?
Melanie Subbiah: 00:55:59
Yeah, I think it was a really cool environment to be in, and it was a pretty small office when I was there, so it was really nice. I was just sitting with my five-person team. Two things really stood out to me about the culture there. One was that there was a very clear research vision that people were very collaborative around, and there was a big engineering focus to support that vision. I think that was different; a lot of the other research environments I’ve been in have been more separate projects that one to three people are working on, and you connect the things under a bigger vision, but it’s not as much top-down, with this is really the goal we’re driving towards and let’s build out these big research efforts under that.
Melanie Subbiah: 00:56:46
And then also having that engineering support was huge. A lot of people at OpenAI started more as engineers, similar to me. And especially when you’re building systems like GPT-3 or DALL·E, a huge amount of that is solving engineering problems at scale and getting these systems running efficiently. So that’s another big component. And the last thing, which is more of a day-to-day thing, is that it’s really the only work environment I’ve been in where pair programming was just very common. Especially when I was starting, if I had a question, one of my coworkers would just be like, “Let’s pair on this for a couple hours this afternoon.” And that’s something I really appreciated, because you just learn so much from watching someone else do something, and you also get through questions much more quickly when you can solve them together, as opposed to asking something, going back to your desk, finding out it’s slightly different than what the person said, and then trying to figure it out. So it was very much, let’s work together quickly around this united vision and do the best that we can. Yeah.
Jon Krohn: 00:57:54
Yeah, that sounds great. So on the note of engineering support and how many people at OpenAI come from engineering backgrounds, this is a recurring theme on the show and something that I have tried to impress upon listeners before, and I’m going to again now: the engineering aspect of data science gets more important with every year that goes by.
Melanie Subbiah: 00:58:18
Yeah.
Jon Krohn: 00:58:19
If you want to make a big impact in your career in data science, especially in AI, the more that you can learn computer science and programming principles and find somebody to pair program with, the better off you’re going to be because as data sets get larger and larger, as the models get larger and larger, it becomes more and more of an engineering problem than a science problem.
Melanie Subbiah: 00:58:42
Yeah.
Jon Krohn: 00:58:45
And then something that I’ve also whinged about on the program recurringly… So I had been proved wrong in general about working from home. I thought that it wouldn’t work for research. My team does AI research and a lot of it is open-ended, and I thought that we needed to be together with a whiteboard, being able to hear all day what other people are working on, and being able to jump in and solve problems for everyone. And I’ve been proved wrong: you can absolutely still have innovation and progress. However, it is way more fun when you can do pair programming literally next to somebody.
Melanie Subbiah: 00:59:30
Yes.
Jon Krohn: 00:59:33
Yeah. I still do miss that. And so hopefully two or three days a week in the office is a nice balance that I’ll be able to strike in the near future. And if you’re into that, listener, I hope you can too. So when you came to New York originally, you were living up by Columbia.
Melanie Subbiah: 00:59:53
Yeah.
Jon Krohn: 00:59:53
But then I guess the pandemic just kept dragging on and on and you were like, “Well, I might as well just move to a nice part of the city that I really like.”
Melanie Subbiah: 00:59:59
Yeah. Pretty much.
Jon Krohn: 01:00:00
So what is that expected to be like going forward? Do you think that for the most part you’ll be able to work from home, and you’ll go into Columbia a couple days a week, maybe, in the future?
Melanie Subbiah: 01:00:09
Yeah. I’ve been going in one or two days a week, and I think something like that will probably continue. I know before the pandemic, for my lab, my advisor would have people come in at least three days a week or something, which was actually the same at OpenAI; it was pretty standard if you wanted to only be in three days a week or so. So yeah, I think that’s my plan going forward. I’ll just put all my meetings on one day, so I’ll see everyone in person and go in for that day, which is nice, and then do research from home. Yeah.
Jon Krohn: 01:00:40
That sounds great. All right. So let’s talk about that PhD at Columbia a little bit more. So you decided after making this amazing journey from Williams College to Apple to OpenAI, which is this place that you’d been hoping to work at for years. Amazing you got that, but then you decided to leave industry and go back to academia, do a PhD at Columbia. I suspect given what I know about you and then even more so now that we’re doing this interview, you could have had your pick of opportunities in industry or academia. So why did you choose to do a PhD and why Columbia?
Melanie Subbiah: 01:01:21
Yeah, it definitely was a hard decision at the time. And I think there were a couple of different factors, but the main ones were, first of all, that I just wanted to move out of the Bay Area. I was ready to try something different, and New York seemed like a place I wanted to live. And there were very few places doing work like what OpenAI was doing, and they were in the Bay Area, and also very few places where you could do that level of research without a PhD, so it was…
Jon Krohn: 01:01:52
Right. Right. Yeah. It’s actually amazing. I hadn’t thought of that, but it is actually incredible that… And it goes to show how, in some ways, and we’re going to get into this now, when you talk about the PhD more and why you did it, but in some ways it’s ridiculous.
Melanie Subbiah: 01:02:06
Yeah.
Jon Krohn: 01:02:06
That a PhD should be a requirement for some jobs.
Melanie Subbiah: 01:02:09
Yeah.
Jon Krohn: 01:02:12
So it’s amazing that OpenAI was able to see, “Okay, look, Melanie clearly has been studying lots of papers on her own.” I know from my personal experience with people, and as I said at the very beginning with the five levels of machine learning expertise, you already are an expert. You happen to now be doing a PhD, being a grad student, but with the way that you have sought out and studied things on your own, and the experiences that you’ve already accumulated, you already have more of the hard skills and soft skills that a PhD student is after than most people who graduate with those degrees have. So yeah, I guess my one point is that it is ridiculous that you need a PhD for so many research jobs, and my follow-up point is that it’s great that OpenAI is open to having AI research engineers who don’t have PhDs, in situations like yours.
Melanie Subbiah: 01:03:17
Yeah.
Jon Krohn: 01:03:18
But then, I guess, maybe part of where you’re going with your answer is that it does still nevertheless open doors. But you also… Sorry, I’m probably taking words out of your mouth, but I can imagine… I’d love to be doing a PhD now, again.
Melanie Subbiah: 01:03:33
Yeah.
Jon Krohn: 01:03:34
Because just to be able to take that step back and get support and be able to dig so deeply into topics and [inaudible 01:03:42]. Please, you go.
Melanie Subbiah: 01:03:43
Yeah. I think that was the second part. When I initially graduated from my bachelor’s, I always thought of grad school as a question of, do I need a PhD to do research? Just what you were saying. And what I found through being on the research teams at Apple and then OpenAI was no, I don’t need the PhD to continue on this route; I can continue doing impactful research. But then I started asking, “Do I want it?” And that became a question of, do I want to take five years to just learn what I want to learn, read what I want to read, explore areas I’m interested in, figure out what problems I really care about working on, and build my confidence as a leader in the research space, leading a research vision? That became really exciting and appealing to me, because what I was missing in my research experiences was that I felt like I was developing really amazing technical skills and building really cool things, but I was very much coming up in the vision of the senior technical leaders around me. I did feel like I needed to take a step back and consider, I mean, some of the really hard questions we’ve talked about even on this podcast, and just figure out my own mind around things, what I really wanted to be working on, and what problems felt most important to me to devote years of my career to. Yeah.
Jon Krohn: 01:05:06
Awesome. Yeah. Great answer. So in your PhD research, one of the concepts that you are doing work on is a cool concept of identity signaling. So what is identity signaling and how can it be applied to identify online misinformation campaigns? Which is a really cool application.
Melanie Subbiah: 01:05:32
Yeah. So I’ve been working on thinking about misinformation and disinformation in news and something-
Jon Krohn: 01:05:40
Such a frustrating problem to watch happen in the world. I mean, we don’t get political on this program, but believe it or not, I don’t like disinformation as a data scientist. And it drives me crazy when every day there’s something that happens, I would go into it, but I don’t want to make the show political.
Melanie Subbiah: 01:05:58
Yeah.
Jon Krohn: 01:05:59
But yeah, it’s a problem. And so…
Melanie Subbiah: 01:06:01
Yeah. And my advisor also does a lot of collaborations between social science and NLP, which I think is an interesting place as well. So something I’ve been looking at is thinking about how personal identity comes into play in persuasive contexts online and how we pick up on things about each other in terms of what our core beliefs are and where we’re positioned in the world, how we signal those things to each other, and then how we use that information in a persuasive context. So the long-term goal of that is to think about identifying the intended audience behind persuasive campaigns online and, ideally, being able to automatically pick up on some of the markers of who things are targeted at.
Jon Krohn: 01:06:59
Nice, super cool. Yeah. Very fascinating research. So I’m imagining, I don’t know exactly how Columbia PhDs are structured, but if this was my PhD, maybe this project identity signaling to identify misinformation online, that might be one of my dissertation chapters. So do you have other leads on what your other chapters might be?
Melanie Subbiah: 01:07:25
Yeah. Something I’ve been talking to my advisor about, beginning to work on is thinking about summarization of novel chapters. So again, going back to this piece of creative writing and also working with very long form text, which has always been interesting to me. So I think there’s a lot of interesting problems there, both in terms of understanding extremely stylized human language and also going back to what we were talking about before with long context and trying to solve the problem of how do you give a model enough context that it could meaningfully process something like a book, when you’re not going to be able to actually just feed in all of that text at this stage, at least.
Jon Krohn: 01:08:05
Nice.
Melanie Subbiah: 01:08:05
So that’s something I’m interested in thinking about.
Jon Krohn: 01:08:09
Cool. So as a PhD student, what tools do you use day to day and are they different from the kinds of tools that you were using as an AI engineer in the industry?
Melanie Subbiah: 01:08:21
Yeah, I think at a basic level, the tools are pretty similar, which is mostly Python and PyTorch and associated libraries. I think when I was at OpenAI, we were using a lot more in-house libraries that we’d built up, versus now I would probably use models from Hugging Face if I want to access the same type of system.
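As a hedged illustration of the Hugging Face route Melanie mentions (a minimal sketch, not her actual workflow; the model name here is a tiny test checkpoint chosen only to keep the download small, so its output is gibberish):

```python
# A minimal sketch of pulling a pretrained GPT-style model from Hugging Face,
# the open alternative Melanie contrasts with OpenAI's in-house libraries.
# "sshleifer/tiny-gpt2" is a tiny test checkpoint; in practice you might load
# "gpt2" or a larger model instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt: str, model_name: str = "sshleifer/tiny-gpt2") -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Autoregressive decoding: the model extends the prompt token by token.
    output_ids = model.generate(
        **inputs, max_new_tokens=8, do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate("The future of NLP"))
```

The same `AutoModel`/`AutoTokenizer` pattern works across most model families on the Hub, which is part of why it substitutes so readily for in-house tooling.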
Jon Krohn: 01:08:46
Yeah, the strict non-Hugging Face OpenAI policy. “No, those guys aren’t doing anything interesting. Ignore them.”
Melanie Subbiah: 01:08:55
So just in terms of like the types of models, that’s a little bit different. And then I think the biggest difference is the compute and that’s been the most interesting thing for me to see both sides of. So when I was in industry, it was very easy to access as many GPUs as you wanted, whereas-
Jon Krohn: 01:09:13
At the particular places you were working.
Melanie Subbiah: 01:09:14
Yeah.
Jon Krohn: 01:09:14
A very unusual situation.
Melanie Subbiah: 01:09:14
Yeah, that’s true. So that was what I was coming from, and now we have one machine with four GPUs that I’m able to sign on to remotely, and I’ll check if anyone else is using a GPU and then I’ll jump on one. So I think the compute, where it is, how much there is, how you’re accessing it, is probably the biggest difference in terms of tools.
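Melanie’s check-before-you-jump-on routine can be sketched with a little parsing of `nvidia-smi` output. This is a hypothetical helper, not her actual script, and the 100 MiB idle threshold is an arbitrary assumption:

```python
# A sketch of checking a shared multi-GPU box for a free GPU, as on the
# four-GPU lab machine Melanie describes. Assumes the CSV output of
# `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`.
import subprocess
from typing import Optional

def pick_free_gpu(smi_csv: str, max_used_mib: int = 100) -> Optional[int]:
    """Return the index of the first GPU using less than max_used_mib MiB."""
    for line in smi_csv.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        if int(used) < max_used_mib:
            return int(index)
    return None  # every GPU is busy

def query_nvidia_smi() -> str:
    """Query the real machine (requires the nvidia-smi CLI)."""
    return subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )

# Canned example output: GPUs 0, 1 and 3 busy, GPU 2 idle.
sample = "0, 11230\n1, 9876\n2, 3\n3, 10240\n"
print(pick_free_gpu(sample))  # → 2
```

On a real machine you would pass `query_nvidia_smi()` instead of the canned string, then set `CUDA_VISIBLE_DEVICES` to the returned index.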
Jon Krohn: 01:09:41
Right. Welcome to the real world.
Melanie Subbiah: 01:09:42
Yeah.
Jon Krohn: 01:09:44
With us commoners. So cool, and then are there any particular libraries or approaches that you’re excited about? Hugging Face is something listeners should definitely be checking out. They make lots of natural language models very accessible and easy to use.
Melanie Subbiah: 01:10:03
Yeah. They have a great library and I think that’s the main one that I would mention probably in terms of NLP tools. Yeah.
Jon Krohn: 01:10:12
Cool. All right. And then, so we’ve talked about this already a couple of times, you did a computer science degree at Williams College, which is a prestigious liberal arts college in Massachusetts. And so how has your computer science background been helpful to you as an NLP-focused AI engineer? So we’ve talked about this a little bit in terms of the idea that as data sets get bigger, as models get bigger, having more computer science skills is helpful, but is there anything more specific than that?
Melanie Subbiah: 01:10:46
Yeah, I think that definitely generally covers it. I think the main thing is maybe having an understanding of distributed systems and parallel computing at some level, because that was something, especially when I was interviewing at OpenAI, that I saw was a skillset we would look for. And I think it’s uncommon that you’re tested on that, it’s not as much part of the standard engineering suite, but it’s something that’s hard to build if you’re just learning to program. That often doesn’t come with an understanding of these distributed systems and how to work with them, and how to engineer systems that are going to run in parallel across many GPUs, potentially, and pass information between them. So I think that’s a big way that having that more full computer science education helped.
Jon Krohn: 01:11:39
Nice. Are there any particular distributed processing libraries that you tend to go to?
Melanie Subbiah: 01:11:46
Again, I mean the main place I was using this was OpenAI and we would pretty much write stuff in house. So I don’t have as much experience with general libraries. Yes.
Jon Krohn: 01:11:56
Yeah. Well, I mean, it’s one of the great things about the modern automatic differentiation libraries, especially TensorFlow. I mean, that still is its strength today. So PyTorch is definitely way more fun for building your computational graph, but then I do quite like TensorFlow for distributing my operations. And for listeners who aren’t aware of it, there’s the ONNX library, O-N-N-X.
Melanie Subbiah: 01:12:21
Cool.
Jon Krohn: 01:12:21
The Open Neural Network Exchange. So you could design a model in PyTorch and then, if there was some distributed computing that you couldn’t do in PyTorch, you can use ONNX to port your graph over to TensorFlow.
Melanie Subbiah: 01:12:32
That sounds really useful.
Jon Krohn: 01:12:34
Yeah. Cool. All right. So here is a question that is my favorite question to ask on the program, but I save it only for special occasions, so as to not wear it out with the listeners. So Melanie, you are one of my special guests to get this one, because I think you’ll have an interesting perspective on it. Thanks to ever cheaper data storage, ever cheaper compute, which we’ve talked about on the show, ever more abundant sensors everywhere, interconnectivity allowing us to share papers on arXiv in real time, and data modeling innovations that are constantly coming out of more and more labs around the world and being shared, technology is advancing at an exponentially faster pace every year, and AI is playing a huge role in that. So is there anything that excites you about the future, of things that could happen in our lifetime?
Melanie Subbiah: 01:13:37
Yeah, that’s a great question. I think a big one that we already talked about is the AI and climate space and efficiency space. That’s very exciting to me, though the flip side of this is way too much surveillance and tech involvement, which is a concern for me. But I think, handled in an ideal way, there are just huge opportunities to have these very interconnected, smart systems that make pretty much every aspect of life way more efficient in terms of transportation, energy usage, really everything in that space. So that’s very exciting to me in terms of saving resources, and I think the other way is just having AI become a very commonplace tool that everyone is able to use in different ways across tons of different research fields, and outside of research too.
Melanie Subbiah: 01:14:35
And I think that’s just very exciting to me, that it’s something that enables everyone to do work better in whatever thing that they’re doing. And I was talking to a friend who works in public health the other day and she’s a PhD student there and I was thinking about how she does qualitative studies with huge amounts of interview data and just thinking about the parallels there with some of the NLP techniques that we use to process natural language texts. And so I think there’s just so many cool collaborative possibilities that are going to become more and more accessible to everyone.
Jon Krohn: 01:15:11
Totally. And it’s interesting for me to see more and more kids growing up today learning computing skills for use in so many different fields. You could do a creative writing PhD, but you might learn some machine learning and data science and computer skills because you’re like, “Well, something really cool I could be doing is working with huge amounts of data and trying to do some automated inference.” And so all kinds of traditionally liberal arts fields are being infused with these quantitative things, thanks to so much data now being stored and collected.
Melanie Subbiah: 01:15:53
Yeah.
Jon Krohn: 01:15:54
Very cool. All right. So I asked on social media before recording this episode if anyone had questions for you. And so I’m going to get to some of those questions, but before we do that, I noticed as we’ve been recording here that you’re wearing a WHOOP.
Melanie Subbiah: 01:16:11
Oh yeah. Do you have a WHOOP? Okay, nice.
Jon Krohn: 01:16:13
As well. So there are lots of different kinds of fitness trackers out there.
Melanie Subbiah: 01:16:20
Yeah.
Jon Krohn: 01:16:20
Or activity trackers. And typically Melanie, if you pick the WHOOP one, it isn’t just to track steps or to get you to stand up from your desk every once in a while.
Melanie Subbiah: 01:16:33
Yeah.
Jon Krohn: 01:16:34
It’s that you are doing some pretty rigorous exercise. So what’s going on there?
Melanie Subbiah: 01:16:40
I think that I would probably not fall into that camp. So I guess maybe I’m an atypical WHOOP user, but partly my friends had WHOOPs, and they were actual college athletes who do exercise more often. For me, though, it’s actually mostly about sleep and general wellness. So what I liked about the WHOOP was that it’s not calorie- or workout-focused specifically. It’s also giving you your general rest and recovery, and trying to sleep more regularly is something I’m working on in my PhD.
Jon Krohn: 01:17:14
Yeah. I’ve completely changed my life as a result of having this. So the WHOOP tracker, it’s always on, that used to be their catchphrase, I don’t think it is anymore, but even to charge it, you put the battery pack on it, so it doesn’t even come off your wrist while you’re charging it. And so you get this data on yourself, your heart rate all day long and in your sleep. And yeah, one of the things that changed for me dramatically is seeing how, with even just one beer or especially two beers, my resting heart rate overnight can jump from 55 to 65.
Melanie Subbiah: 01:17:46
Yeah.
Jon Krohn: 01:17:47
And there’s like no amount of exercise I can do in a day that will cause my resting heart rate to go to 65.
Melanie Subbiah: 01:17:53
Yeah.
Jon Krohn: 01:17:56
So suddenly seeing the data changes the calculations internally for me dramatically, because before it’s like, “Yeah, I enjoy beer, but do I enjoy it that much? That I’m willing to do that to my body?” And even things like, I used to pretty frequently pull an all-nighter on work things, and that was a bad habit since undergrad. Through my undergrad, through my PhD, the night before anything major was due, I was guaranteed to be up all night working on it.
Melanie Subbiah: 01:18:32
Yeah.
Jon Krohn: 01:18:34
And I don’t do that anymore. So anyway, there you go, well, that’s interesting. And actually part of why I asked, because I thought that maybe you would still be quite into athletics, is that when I was researching for the show, one of the photos of you that came up was you doing very high-level track hurdling, maybe?
Melanie Subbiah: 01:18:54
Oh yeah. I did run track in college briefly, and I like soccer and track. I do enjoy athletic pursuits, but I am more about the sleep with my fitness tracker. But yes, I was supposed to run a 10k and then I had to move unexpectedly.
Jon Krohn: 01:19:12
Right.
Melanie Subbiah: 01:19:12
That was my recent thing. Yeah.
Jon Krohn: 01:19:13
Maybe next time you’re on the show, you’ll have a big athletics update for us.
Melanie Subbiah: 01:19:17
Yeah. Maybe.
Jon Krohn: 01:19:19
All right. So let’s move on to some of the good questions that we got on LinkedIn. We had a question from Nicolai Thomsen, who’s a data scientist at NTT Data Business Solutions, and it was a question about bias, which we already addressed in the program. So thank you, Nicolai, for that great question. Serg Masís, who is now our researcher for the program asking brilliant questions, responded to your question online, Nicolai, and he had already slotted a bias question into the episode, so hopefully that addressed your thoughts. Another question here is from Bernard, who’s a software engineer and a machine learning engineer, and Bernard is wondering if NLP could be involved in the future with non-verbal communication, so body language, incorporating, I guess, a machine vision system in some way. So is there some way that NLP can go beyond just verbal communication and be able to pick out patterns from non-verbal communication as well?
Melanie Subbiah: 01:20:28
Yeah, I think that’s a really interesting question, and it’s not something I have worked on, but I’ve worked with some other researchers who specialize more in that, which I think is really cool. And I think it gets more generally at just cool ways that we’re starting to combine information streams from multiple modalities. So I think DALL-E, like we talked about, is a great example of that, with how you can leverage both vision and language information to augment the learning in both areas. So I think that’s going to be really cool going forward, and that also connects to what I was saying about sample efficiency too. As humans, we don’t learn language just from written words on a page. We learn it from seeing people interact, we’re reading facial expressions, we’re using all of that to figure out the meaning connected to that language. So I think what you’re saying, incorporating more of these non-verbal cues, could be really cool for teaching models more about how to read and interpret humans.
Jon Krohn: 01:21:30
Nice. All right. And then we also had a question from Ted Hallum, who’s also a machine learning engineer and host of The Data Canteen program. His question was about how we can continue to improve NLP abilities and models more sustainably, and we’ve actually already talked about that one as well. So the last question here is from David, who’s into data visualization and he’s into space, so he calls himself the space data guy. And his question follows along from his interest in space and science fiction, I guess. He asks, “With few-shot learning, how close are we to a Star Trek universal translator level of technology, being able to functionally decode new languages in minutes?” So I guess the key thing here is, initially when I read this question, I was thinking of just having a system that can translate between languages in real time.
Melanie Subbiah: 01:22:26
Yeah.
Jon Krohn: 01:22:27
We have that. But what he’s asking about is a system that could somehow decode new languages in minutes. So that’s a tricky one for you, Melanie.
Melanie Subbiah: 01:22:34
Yeah. I think that would require some more fundamental language understanding than we’re seeing at this point, which maybe gets into the structured learning that we were talking about earlier, of incorporating more understanding of, is there a universal grammar or something that we’re encoding into these systems? So yeah, I don’t know about that. I think the other issue is just, I do think we have a problem with what languages are available online, which we were also talking about before. There are a lot of languages out there that we’re just not training systems on, because they maybe are only spoken, or the people who are using those languages are not also using the internet a lot. So I think that’s probably the biggest barrier right now in terms of figuring out some of these more universal language components: not just hugely skewing towards the vastly overrepresented languages that we have.
Jon Krohn: 01:23:33
Right. Right. Cool. Well, thank you for those great answers and thank you audience for the great questions. So we are near the end, Melanie. Do you have a book recommendation for us?
Melanie Subbiah: 01:23:49
Yeah. My current book that I’m reading is The Song of Achilles, which has been good so far. And then I am part of a book club with some friends and we’re going to read Say Nothing next, which I haven’t read yet, but it came highly recommended to me and is about the Troubles in Northern Ireland.
Jon Krohn: 01:24:07
Ah, interesting.
Melanie Subbiah: 01:24:08
I’m excited for that one. Yeah.
Jon Krohn: 01:24:09
Yeah. Those are really interesting because they are in the very recent past.
Melanie Subbiah: 01:24:14
Yeah.
Jon Krohn: 01:24:14
And up until recent events in Eastern Europe, I thought that violence was completely behind us. And yeah, hopefully by the time this episode is published, that situation has gotten a lot better, not worse. All right. Well, Melanie, I know you’re not big into social media, but if our listeners are interested in being able to follow the amazing things you’re doing, the next paper you publish, is there any way that they can be following that?
Melanie Subbiah: 01:24:44
I think probably just my LinkedIn or Google Scholar are the best.
Jon Krohn: 01:24:50
Google Scholar. Great choice.
Melanie Subbiah: 01:24:52
Yeah.
Jon Krohn: 01:24:53
Yeah. That makes a lot of sense. Yeah. So set up a notification for Melanie’s Google Scholar or you can follow her on LinkedIn. Wonderful. Thank you so much for taking the time today. I’ve absolutely loved filming this. It’s been so enjoyable doing it with you in person and yeah, I’m looking forward to having the opportunity to work with you on something again soon.
Melanie Subbiah: 01:25:12
Yeah. Thank you for having me.
Jon Krohn: 01:25:19
Well, I really hope we can get Melanie on the show again soon. I learned a ton from her and had such a fun time filming with her. In today’s episode, Melanie filled us in on how autoregressive models like GPT-3 always generate forward from prior natural language inputs, while bidirectional models like BERT can get language context in either direction from a given word of interest. She talked about how GPT-3 doesn’t need to be fine-tuned to perform a wide range of few-shot learning natural language tasks with greater than 50% accuracy, but that fine-tuning via the OpenAI API can improve performance on specific tasks while also potentially reducing unwanted biases. She talked about how GPT-3’s struggles with natural language inference tasks may be overcome in the future by new representations of information. She covered how renewable energy sources and sharing access to a smaller number of large models can mitigate the climate risks associated with large natural language models.
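The autoregressive-versus-bidirectional distinction in that summary can be visualized with attention masks. This is a simplified sketch, not either model’s full implementation:

```python
# Simplified illustration of the difference Jon summarizes: a GPT-style
# (autoregressive) model masks attention so each token sees only itself and
# earlier tokens, while a BERT-style (bidirectional) model lets every token
# attend to the whole sequence.
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    """Lower-triangular mask: position i may attend to positions 0..i only."""
    return np.tril(np.ones((n, n), dtype=bool))

def bidirectional_mask(n: int) -> np.ndarray:
    """All-true mask: every position may attend to every position."""
    return np.ones((n, n), dtype=bool)

print(causal_mask(4).astype(int))
# Each row i has ones only through column i, so generation runs left to right.
print(bidirectional_mask(4).astype(int))
```

In a transformer, positions where the mask is False are set to a large negative value before the softmax, zeroing out their attention weight.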
Jon Krohn: 01:26:20
She talked about how pair programming can be useful for sparking innovation. And she talked about how returning to academia from industry could be the right option for you, if you’re keen to explore different areas of data science freely and deeply over several years. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Melanie’s Google Scholar and LinkedIn profiles, as well as my own social media profiles, at www.superdatascience.com/559. That’s www.superdatascience.com/559. If you’d like to ask questions of future guests of the show, like several audience members did of Melanie during today’s episode, then consider following me on LinkedIn or Twitter, as that’s where I post who upcoming guests are and ask you for your thoughtful inquiries for them.
Jon Krohn: 01:27:09
On that note, if you live in the New York area and would like to experience a SuperDataScience episode filmed live and ask the guest questions in real time, then come to MLconf NYC, which will be held on March 31st. That’s MLconf, the Machine Learning Conference, on Thursday, March 31st. In addition to filming a SuperDataScience episode live, I’ll also be doing a book signing session for my book, Deep Learning Illustrated. The first 10 folks in line will get a free copy, generously donated by my publisher, Pearson. And after that, I’ll be signing them and giving them away at cost. This will be my first conference experience in over two years and boy, am I ever excited about it. Hopefully I’ll get to meet you in person then. All right. Thank you to Ivana Zibert, Mario Pombo, Serg Masís, Sylvia Ogweng and Kirill Eremenko for managing, editing, researching, summarizing, and producing another freaking incredible episode for us today. Keep on rocking it out there folks and I’m looking forward to enjoying another round of the SuperDataScience Podcast with you very soon.