
5 minutes

Data Science, Artificial Intelligence

SDS 674: Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)

Podcast Guest: Jon Krohn

Friday Apr 28, 2023

Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn


How do you avoid catastrophic forgetting and reduce the cost of training your Large Language Model (LLM)? This week, Jon introduces listeners to Parameter-Efficient Fine-Tuning (PEFT) methods for the efficient fine-tuning of LLMs. He focuses on Low-Rank Adaptation (LoRA), explains how the method works, and shares the benefits of the approach.
 

Models like Alpaca, Vicuña, GPT4All-J and Dolly 2.0 may boast relatively small model architectures, but they're still prohibitively expensive to fine-tune on even a small amount of your own data. This week, Jon Krohn explores a number of Parameter-Efficient Fine-Tuning (PEFT) methods that facilitate the efficient fine-tuning of large language models (LLMs).

There are various PEFT methods, but Jon's focus in this episode is on LoRA (Low-Rank Adaptation), currently the most established of them all. Not only does LoRA perform on par with full fine-tuning of LLMs, it actually outperforms it in certain cases. The best part? With help from the Hugging Face PEFT library, you don't need to be a linear algebra expert to implement LoRA. It certainly is an exciting time to be a data scientist, and Jon is looking forward to exploring even more possibilities in future episodes of the podcast, so stay tuned!


DID YOU ENJOY THE PODCAST?
  • Can you think of further benefits that the LoRA approach delivers to the fine-tuning of models?
(00:05): This is Five-Minute Friday on the Parameter-Efficient Fine-Tuning of Large Language Models using LoRA — Low-Rank Adaptation.

(00:27): For last week’s Five-Minute Friday, Episode #672, I introduced four models — Alpaca, Vicuña, GPT4All-J, and Dolly 2.0 — that were fine-tuned using tens or hundreds of thousands of instruction-response pairs to create a model that performs comparably to state-of-the-art natural-language generation models like the GPT-4 model that runs behind ChatGPT. At the end of the episode, I suggested that you could then fine-tune one of these models on your own proprietary data to create a ChatGPT-like Large Language Model that is specific to your conversational A.I. needs or your clients’ needs.

(01:05): But even these relatively small model architectures I covered last week, with on the order of tens of billions of model parameters, can be prohibitively expensive to train on even a small amount of your own data. Even worse, the standard model-training protocol — in which you train most or all of the model parameters — can lead LLMs to so-called (and, frankly, overly dramatically named) “catastrophic forgetting”, wherein the model can no longer perform the broad range of generative tasks it was originally trained on.

(01:41): Thankfully, a number of state-of-the-art Parameter-Efficient Fine-Tuning (PEFT, for short) methods have recently emerged that facilitate the efficient fine-tuning of LLMs. In the show notes, I’ve provided a link to a GitHub repo from Hugging Face that includes all of the details and, at the time of recording, five different PEFT methods. All PEFT approaches bring benefits: the model weights that you’re training take up just megabytes of space per checkpoint, even when you’re fine-tuning an extremely large model like the 175-billion-parameter GPT-3; the approaches avoid catastrophic forgetting; they can perform better when you’re fine-tuning on a small amount of data; they can generalize better to out-of-training-set instructions; and they can be applied to other A.I. use cases — not just NLP — such as machine vision.

(02:33): Of the various PEFT approaches, LoRA — which is short for Low-Rank Adaptation — is the most established, so let’s focus on LoRA. The essential unit of Low-Rank Adaptation is a linear algebra concept called the low-rank decomposition matrix, which is a bit too technical to dig into in detail in a podcast format, but is essentially a matrix that represents data in a lower-dimensional space, making it easier to process computationally. These relatively easy-to-process low-rank decomposition matrices are inserted into each layer of the original transformer architecture. That might make it sound like we now have even more model parameters to train in our LLM, but the trick is that we freeze all of the model weights in our LLM except those of the new low-rank matrices we’ve inserted. This ends up reducing the number of trainable parameters in our model by about 10,000 times, and it also reduces the memory footprint when we’re training our model by about three times, which means that we don’t need nearly as large of a GPU, or as many GPUs, to train our Large Language Model.
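To make those mechanics concrete, here is a minimal sketch of a LoRA-style layer in PyTorch. This isn't code from the episode; the class name, the rank r, and the scaling factor alpha are illustrative choices:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original pretrained weights

        # Low-rank decomposition matrices: A maps d_in -> r, B maps r -> d_out
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: start exactly at the base model
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen original path plus the small trainable low-rank correction
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

For a 4096×4096 projection, the frozen base layer holds roughly 16.8 million weights, while the r=8 adapter trains only about 65 thousand, a roughly 256× reduction for that one layer; with everything else frozen too, the savings compound across the full model in the way Jon describes.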

(03:43): And what’s the result of this hack? Well, in some cases, LoRA not only performs on par with training the entire LLM but actually outperforms it. The best news is that you don’t have to be a linear algebra whiz to implement LoRA. You can simply use the Hugging Face PEFT library that I’ve included in the show notes.
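For reference, here is roughly what that workflow looks like: a minimal sketch assuming the transformers and peft libraries, where the base model and hyperparameter values are illustrative placeholders rather than anything from the episode:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

# Load a base model (a small one here, purely for illustration)
base_model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,              # rank of the low-rank decomposition matrices
    lora_alpha=32,    # scaling factor applied to the LoRA update
    lora_dropout=0.1,
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable

# Train `model` with your usual training loop or Trainer, then:
model.save_pretrained("my-lora-adapter")  # checkpoint holds just the small adapter weights
```

Because only the adapter weights are saved, each checkpoint takes up just megabytes of space, which is the storage benefit mentioned earlier in the episode.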

(04:03): Now, you may remember that I mentioned earlier that the Hugging Face PEFT library currently supports five PEFT (Parameter-Efficient Fine-Tuning) approaches. Beyond LoRA, which we’ve already detailed, one other PEFT approach that you might want to take a good look at is AdaLoRA — short for Adaptive LoRA — which was released just last month by researchers at Georgia Tech, Princeton, and Microsoft. AdaLoRA is less established, but purports to offer performance benefits over ordinary LoRA by not fine-tuning equally across the entire transformer architecture; instead, it fine-tunes adaptively, concentrating on the parts of the model architecture that appear to benefit most from fine-tuning.
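If you want to experiment with AdaLoRA, the same PEFT workflow applies. A hedged sketch, assuming the peft library's AdaLoraConfig (the values are illustrative, and exact argument availability can vary between peft versions):

```python
from peft import AdaLoraConfig, get_peft_model, TaskType

adalora_config = AdaLoraConfig(
    task_type=TaskType.CAUSAL_LM,
    init_r=12,        # starting rank for every adapted weight matrix
    target_r=8,       # average rank budget AdaLoRA prunes down to
    tinit=200,        # warm-up steps before rank pruning begins
    tfinal=1000,      # steps over which the rank budget is annealed
    total_step=2000,  # total training steps (required by recent peft versions)
)

model = get_peft_model(base_model, adalora_config)  # base_model as in the earlier sketch
```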

(04:44): All right, hope that sounds cool. Thanks to Dr. Grant Beyleveld on my data science team at Nebula for introducing me to PEFT and LoRA. Much of what I covered in this episode was taught to me by Grant, so thanks. I hope you enjoyed this practical episode. With all of these emerging tools for supercharging our capabilities as data scientists, there’s never been a more exciting time in my career to be one. And it seems to just be getting even better.

(05:10): Well, that’s it for today. Until next time, keep on rockin’ it out there, folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon.  
