SDS 728: Use Contrastive Search to get Human-Quality LLM Outputs

Podcast Host: Jon Krohn

November 3, 2023

Learn how to achieve human-like outputs from LLMs in this week’s Five-Minute Friday with Jon Krohn. 

Early machine learning models relied solely on the training data we fed into them. This may have been acceptable for the simpler models of the previous decade. Today, however, we know that relying on training data alone isn’t the way to get the best results from large language models.

The new wave of generative AI can take on far more complex tasks via a number of decoding methods. The question is: which of these methods best serves our needs? In this Five-Minute Friday, host Jon Krohn walks through the options currently available to us: greedy search, beam search, sampling, and contrastive search. Each has its benefits: greedy search simply picks the highest-probability next word at every step of a sequence. Beam search improves on greedy search by playing the long game: it looks several words ahead, so it can find high-probability continuations that were hiding behind low-probability words and would have been missed by greedy search. As a result, beam search is better at identifying the words that work best in the context of a whole sentence.
And yet beam search has drawbacks of its own, which can be overcome with the help of two other decoding options: sampling and contrastive search. Listen to the episode to hear how these last two approaches sidestep beam search’s repetitiveness, and the benefits that Jon’s team at his machine learning company Nebula has found in using contrastive search.

Podcast Transcript

(00:06):
This is Five-Minute Friday on how Contrastive Search is What You Need for Decoding Generative AI Outputs. 

(00:27):
Welcome back to the Super Data Science Podcast. I’m your host, Jon Krohn. Historically, when we deployed a machine learning model into production, the parameters that the model learned during its training on data were the sole driver of the model’s outputs. With the generative LLMs that have taken the world by storm in the past few years, however, the model parameters alone are not enough to get reliably high-quality, human-like outputs. For that, the so-called decoding method that we choose when we deploy our LLM into production is also absolutely critical. Let’s start with the simplest decoding method to understand, greedy search, even though we would rarely use it in practice.
(01:17):
With greedy search, all we do is select the highest-probability next word in the generated sequence. This might sound like a good idea on the surface, but the problem is that greedy search is not optimal at generating high-probability sentences, because it often misses very high-probability words that are hidden just behind low-probability words in a sequence. A decoding method called beam search was devised to overcome this inability of greedy search to select high-probability words hidden behind low-probability ones. You can check out the Medium article that I linked to in the show notes to see visuals of this, which make it much easier to understand. But essentially, beam search looks ahead several words, and at each of those time steps, it tries out several different high-probability words. How many words beam search looks ahead is a hyperparameter that you select; so too is how many words it tries at each of those time steps.
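To make the two methods concrete, here is a minimal sketch of greedy search versus beam search decoding. It uses the Hugging Face transformers library and GPT-2 purely as an illustrative setup; neither the library nor the model is specified in the episode, and the prompt is made up.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The best way to learn data science is", return_tensors="pt")

# Greedy search: at every step, keep only the single highest-probability next word.
greedy_ids = model.generate(**inputs, max_new_tokens=40, do_sample=False)

# Beam search: track the 5 highest-probability partial sequences ("beams") at each
# step, so high-probability words hiding behind low-probability ones can be found.
beam_ids = model.generate(**inputs, max_new_tokens=40, num_beams=5, do_sample=False)

print("Greedy:", tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print("Beam:  ", tokenizer.decode(beam_ids[0], skip_special_tokens=True))
```

The number of beams (here, 5) and how far the search looks ahead are the hyperparameters mentioned above; larger values tend to give higher-probability sentences at greater computational cost.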
(02:14):
As you might’ve guessed, computing these look-aheads in beam search can drastically increase the computational complexity of generating a sequence, but the upshot is that your model ends up outputting better, higher-probability sentences than greedy search typically does. So why doesn’t today’s Five-Minute Friday episode end right here? Well, of course, that’s because beam search has its own drawbacks. Most notably, beam search has a tendency to generate highly repetitive sequences. To overcome beam search’s ugly repetitiveness, an alternative is to sample what the next word should be in a sequence. In this paradigm, the highest-probability word will be selected most frequently, but not always.
(02:59):
More specifically, using the technical terminology of probability theory, we sample words in this sampling paradigm according to the probability distribution that we extract from the large language model that we’re using. Output is generally more human-like and more coherent when we do this, although it should be noted that, as with any random sampling process, using sampling as our LLM decoding method means that text generation is not deterministic, unlike the greedy search or beam search that I already described in this episode. When we use sampling, we’ll get a different output from our LLM every time we run it. This is why when you use, say, ChatGPT, you get a different, unique response each time, even if you ask it the same question twice.
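Here is a minimal sketch of plain sampling-based decoding, again assuming the same illustrative Hugging Face transformers and GPT-2 setup rather than anything named in the episode. Setting do_sample=True switches from deterministic decoding to drawing each next word from the model’s probability distribution, which is why repeated runs give different outputs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The best way to learn data science is", return_tensors="pt")

# Optional: fixing the seed makes the script as a whole reproducible, but the two
# runs below still differ from each other because the sampler keeps drawing new
# random numbers as it goes.
torch.manual_seed(42)

# do_sample=True draws each next word from the model's probability distribution
# instead of always taking the single most likely word.
for run in range(2):
    sampled_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True)
    print(f"Run {run}:", tokenizer.decode(sampled_ids[0], skip_special_tokens=True))
```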
(03:49):
Two specific and popular decoding approaches that leverage sampling are top-k sampling and nucleus sampling. I’ll leave it to you to dig into these if you fancy doing so, but the key takeaway is that both of these sampling-based approaches to decoding support the generation of fluent, human-like outputs from a well-trained LLM. So that sounds pretty good, right? And sampling is! Sampling would be a fine choice for decoding your LLM in production – that is, if the far superior contrastive search hadn’t been invented. Revealed for the first time at the venerable NeurIPS conference last December, contrastive search was developed by an international public-private consortium of AI researchers from the University of Cambridge, Google DeepMind, the University of Hong Kong, and Tencent in China.
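For reference, here is a sketch of top-k and nucleus (top-p) sampling in the same assumed Hugging Face transformers setup; the model, prompt, and parameter values are illustrative only, not recommendations from the episode.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The best way to learn data science is", return_tensors="pt")

# Top-k sampling: sample only from the 50 highest-probability next words.
top_k_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50)

# Nucleus (top-p) sampling: sample only from the smallest set of words whose
# cumulative probability exceeds 0.92 (top_k=0 disables the top-k filter).
top_p_ids = model.generate(
    **inputs, max_new_tokens=40, do_sample=True, top_p=0.92, top_k=0
)

print("Top-k:  ", tokenizer.decode(top_k_ids[0], skip_special_tokens=True))
print("Nucleus:", tokenizer.decode(top_p_ids[0], skip_special_tokens=True))
```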
(04:38):
I’ve provided the full paper in the show notes, as well as the link to the GitHub repo for implementing contrastive search. I haven’t yet been able to think up an intuitive way of describing the mathematics of contrastive search that are detailed in the paper, but the essential takeaway from this episode is that, from our experience training and deploying generative AI models at my machine learning company, Nebula, using contrastive search results in by far the best, most human-like outputs from large language models. This means that wherever you have the opportunity to use contrastive search in production (and the services that make it easy to deploy LLMs increasingly do include an implementation of it), you should go ahead and use it. So try it out yourself, and I’m confident you’ll be impressed.
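As a concrete starting point, here is a minimal sketch using the contrastive search implementation in the Hugging Face transformers library, which is activated by setting penalty_alpha alongside a small top_k. The model and hyperparameter values below are commonly used illustrative defaults, not settings from the episode or from Nebula’s production systems.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The best way to learn data science is", return_tensors="pt")

# Contrastive search: penalty_alpha balances the model's confidence in each
# candidate word against a degeneration penalty that discourages repeating what
# has already been generated; top_k is the number of candidates considered at
# each step. Setting penalty_alpha > 0 with a small top_k triggers contrastive
# search in transformers.
contrastive_ids = model.generate(
    **inputs, max_new_tokens=40, penalty_alpha=0.6, top_k=4
)

print(tokenizer.decode(contrastive_ids[0], skip_special_tokens=True))
```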
(05:24):
All right, that’s it for today. I hope you found today’s episode to be both interesting and informative. Until next time, keep on rocking it out there and I’m looking forward to enjoying another round of the Super Data Science Podcast with you very soon. 