(00:06):
This is Five-Minute Friday on how Contrastive Search is What You Need for Decoding Generative AI Outputs.
(00:27):
Welcome back to the Super Data Science Podcast. I’m your host, Jon Krohn. Historically, when we deployed a machine learning model into production, the parameters that the model learned during its training on data were the sole driver of the model’s outputs. With the generative LLMs that have taken the world by storm in the past few years, however, the model parameters alone are not enough to get reliably high-quality, human-like outputs. For that, the so-called decoding method that we choose when we deploy our LLM into production is also absolutely critical. Let’s start with the simplest decoding method to understand, greedy search, even though we would rarely use it in practice.
(01:17):
With greedy search, all we do is select the highest-probability next word in the generated sequence. This might sound like a good idea on the surface, but the problem is that greedy search is not optimal at generating high-probability sentences because it often misses very high-probability words that are hidden just behind low-probability words in a sequence. A decoding method called beam search was devised to overcome this inability of greedy search to select high-probability words that are hidden behind low-probability words. You can check out the Medium article that I linked to in the show notes to see visuals of this, which make it much easier to understand. But essentially, beam search looks ahead several words, and at each of those time steps, it tries out several different high-probability words. How many words beam search looks ahead is a hyperparameter that you select, as is how many words it tries at each of those time steps.
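To make that concrete in code, here’s a minimal sketch of greedy decoding versus beam search using the Hugging Face transformers generate API. The GPT-2 model, prompt, and beam width are just illustrative placeholders, not something taken from the show notes.

```python
# Minimal sketch: greedy search vs. beam search with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The future of data science is", return_tensors="pt")

# Greedy search: at every step, take the single highest-probability next token.
greedy_ids = model.generate(**inputs, max_new_tokens=40, do_sample=False)

# Beam search: keep the 5 most promising partial sequences ("beams") at each
# step and return the highest-probability completed sequence overall.
beam_ids = model.generate(**inputs, max_new_tokens=40, num_beams=5, do_sample=False)

print(tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))
```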
(02:14):
As you might’ve guessed, computing these look-aheads in beam search can drastically increase the computational complexity of generating a sequence, but the upshot is that your model ends up outputting better, higher-probability sentences than greedy search typically does. So why doesn’t today’s Five-Minute Friday episode end right here? Well, of course, that’s because beam search has its own drawbacks. Most notably, beam search has a tendency to generate highly repetitive sequences. To overcome beam search’s ugly repetitiveness, an alternative is to sample what the next word should be in a sequence. In this paradigm, the highest-probability word will be selected most frequently, but not always.
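To illustrate that idea of sampling the next word rather than always taking the top one, here’s a rough single-step sketch over a causal language model’s next-token distribution; the model and prompt are again just placeholders.

```python
# Rough sketch of one sampling step: draw the next token in proportion to the
# model's predicted probabilities instead of always taking the argmax (greedy).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The future of data science is", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token
probs = torch.softmax(logits, dim=-1)       # turn scores into a probability distribution

greedy_id = torch.argmax(probs).item()                       # always the single top word
sampled_id = torch.multinomial(probs, num_samples=1).item()  # usually a top word, but not always

print("greedy:", tokenizer.decode([greedy_id]), "| sampled:", tokenizer.decode([sampled_id]))
```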
(02:59):
More specifically, using the technical terminology of probability theory, we sample words in this sampling paradigm according to the probability distribution that we extract from the large language model that we’re using. Output is generally more human-like and more coherent when we do this, although it should be noted that, as with any random sampling process, using sampling as our LLM decoding method means that text generation is not deterministic. That is unlike the greedy search or the beam search that I already described in this episode. When we use sampling, we’ll get a different output from our LLM every time we run it. This is why when you use, say, ChatGPT, you get a different, unique response each time, even if you ask it the same question twice.
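As a quick illustration of that non-determinism, assuming the same Hugging Face setup sketched above, running generate twice with sampling switched on will typically produce two different continuations of the same prompt.

```python
# Sketch: with sampling enabled, the same prompt yields a different continuation
# on each run (greedy and beam search, by contrast, are deterministic).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The future of data science is", return_tensors="pt")

for run in range(2):
    out = model.generate(**inputs, max_new_tokens=30, do_sample=True)
    print(f"Run {run + 1}:", tokenizer.decode(out[0], skip_special_tokens=True))
```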
(03:49):
Two specific and popular decoding approaches that leverage sampling are top-k sampling and nucleus sampling. I’ll leave it to you to dig into these if you fancy doing so, but the key takeaway is that both of these sampling-based approaches to decoding support the generation of fluent, human-like outputs from a well-trained LLM. So that sounds pretty good, right? And sampling is pretty good! Sampling would be a fine choice for decoding your LLM in production – that is, if the far superior contrastive search hadn’t been invented. Revealed for the first time at the venerable NeurIPS conference last December, contrastive search was developed by an international public-private consortium of AI researchers from the University of Cambridge, Google DeepMind, the University of Hong Kong, and Tencent in China.
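If you do want to try those two approaches, here’s roughly how they’re exposed in the Hugging Face transformers generate API; the cutoff values of 50 and 0.95 are common illustrative defaults, not recommendations from the episode.

```python
# Sketch: top-k sampling and nucleus (top-p) sampling with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The future of data science is", return_tensors="pt")

# Top-k: sample only from the 50 most probable next tokens at each step.
top_k_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50)

# Nucleus (top-p): sample from the smallest set of tokens whose cumulative
# probability exceeds 0.95 at each step (top_k=0 disables the top-k filter).
top_p_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95, top_k=0)

print(tokenizer.decode(top_k_ids[0], skip_special_tokens=True))
print(tokenizer.decode(top_p_ids[0], skip_special_tokens=True))
```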
(04:38):
I’ve provided the full paper in the show notes as well as the link to the GitHub repo for implementing contrastive search. I haven’t yet been able to think up an intuitive way of describing the mathematics of contrastive search that are detailed in the paper, but the essential takeaway from this episode is that, in our experience training and deploying generative AI models at my machine learning company, Nebula, using contrastive search results in by far the best, most human-like outputs from large language models. This means that wherever you have the opportunity to use contrastive search in production, you should go ahead and use it – and the services that make it easy to deploy LLMs increasingly do include an implementation of contrastive search. So try it out yourself, and I’m confident you’ll be impressed.
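One concrete example: contrastive search is implemented in the Hugging Face transformers library, where it’s triggered by passing the penalty_alpha and top_k arguments to generate. The values of 0.6 and 4 below are the ones commonly shown in the Hugging Face documentation; treat them as a starting point rather than as tuned recommendations.

```python
# Sketch: contrastive search via Hugging Face transformers. Supplying
# penalty_alpha together with top_k switches generate() into contrastive search.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The future of data science is", return_tensors="pt")

# penalty_alpha trades model confidence off against a degeneration penalty that
# discourages tokens too similar to the context generated so far; top_k sets the
# size of the candidate pool considered at each step.
output_ids = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=60)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```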
(05:24):
All right, that’s it for today. I hope you found today’s episode to be both interesting and informative. Until next time, keep on rocking it out there and I’m looking forward to enjoying another round of the Super Data Science Podcast with you very soon.