SDS 791: Reinforcement Learning from Human Feedback (RLHF), with Dr. Nathan Lambert

Podcast Guest: Nathan Lambert

June 11, 2024

Reinforcement learning from human feedback (RLHF) has come a long way. In this episode, research scientist Nathan Lambert talks to Jon Krohn about the technique’s origins. He also walks through other ways to fine-tune LLMs and explains how he believes generative AI might democratize education.

Thanks to our Sponsors:
Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.
About Nathan Lambert
Nathan Lambert is a Research Scientist at the Allen Institute for AI and the author of the AI newsletter interconnects.ai focusing on fine-tuning language models from human preferences and advocating for open-source AI. Previously, he helped build an RLHF research team at HuggingFace. He received his PhD from the University of California, Berkeley, during which he worked at Meta AI and Google DeepMind on machine learning and robotics. 
Overview
As a research scientist at the nonprofit Allen Institute for AI (popularly known as AI2), Nathan and his colleagues work to widen access to AI tools for good. Nathan’s special focus is on reinforcement learning from human feedback (RLHF). For Nathan, opening access to AI matters because of its growing ubiquity, which means that AI and tech literacy will be essential for everyone. Nathan has always been interested in the usefulness of tech and AI in social applications. Early in his career, he focused on robotics and machine learning at UC Berkeley, which helped satisfy his curiosity about consumer resistance to buying robots. He feels these hesitations may change in the next few years, partly because generative AI is expanding what robots can do. Nathan uses the example of the Boston Dynamics robot to note the huge developments in this space and that, despite these developments, AI engineers still have several obstacles to contend with before offering robotics to the mass market.

Nathan and Jon also discussed how generative AI may change the landscape for a wide range of industries. For Nathan, the ability of generative AI to use mixed audiovisual media rather than only text will improve a number of capabilities, especially when it comes to teaching and working with younger generations. Nathan is also excited by its ability to teach, potentially democratizing education access across the globe. While he expresses concern about using AI instead of humans to label preferences, he highlights constitutional AI as a way to go further. Developed by Anthropic, this technique comprises two stages, supervised learning and reinforcement learning, where some of the preference data is AI-generated during the feedback stage, with the aim of reducing subjectivity and bias by applying governing principles such as freedom and equality.
Finally, Jon quizzes Nathan about his emphasis on AI being “closer” to alchemy than science. Nathan says that AI involves some feeling in the dark, where after a certain point, hypotheses cannot be held up to the usual rigors of scientific scrutiny. In Nathan’s words, “deep learning being uninterpretable fundamentally makes it kind of hard to do science.” [39:05]
Listen to the episode to hear the efficacy and scalability of direct preference optimization, how long Nathan thinks it will take for the wide-scale adoption of robots, and the challenges of relying on human preferences when training AI models.
  
In this episode you will learn:
  • Why it is important that AI is open [03:13]
  • The efficacy and scalability of direct preference optimization [07:32]
  • Robotics and LLMs [14:32]
  • The challenges to aligning reward models with human preferences [23:00]
  • How to make sure AI’s decision-making on preferences reflects desirable behavior [28:52]
  • Why Nathan believes AI is closer to alchemy than science [37:38] 

Podcast Transcript

Jon Krohn: 00:00

This is episode number 791 with Dr. Nathan Lambert, Research Scientist at the Allen Institute for AI. Today’s episode is brought to you by AWS Cloud Computing Services, and by Crawlbase, the ultimate data-crawling platform. 
00:17
Welcome to the Super Data Science podcast, the most-listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now, let’s make the complex simple.
00:48
Welcome back to the Super Data Science podcast. I’m super excited to have Dr. Nathan Lambert as our guest on the show today. Nathan is a Research Scientist at the Allen Institute for AI in Seattle, where he’s focused on fine-tuning large-language models based on human preferences and advocating for open-source AI. He’s renowned for his technical newsletter on AI called Interconnects. He previously helped build an RLHF, that’s reinforcement learning from human feedback, research team at Hugging Face, and he holds a PhD from Berkeley in which he focused on reinforcement learning and robotics, and during which he worked at both Meta AI and Google DeepMind. 
01:23
Today’s episode will probably appeal most to hands-on practitioners like data scientists and machine learning engineers, but anyone who’d like to hear from a talented communicator who works at the cutting edge of AI research may learn a lot by tuning in. 
01:35
In today’s episode, Nathan details what RLHF is and how its roots can be traced back to ancient philosophy and modern economics, why RLHF is the most popular technique for fine-tuning LLMs, popular alternatives to RLHF such as RLAIF and distilled direct preference optimization, limitations of RLHF, and why he considers AI to often be more alchemy than science. All right, you ready for this fantastic episode? Let’s go. 
02:08
Nathan, welcome to the Super Data Science podcast. I’m excited to have you here. Where in the world are you calling in from? 
Nathan Lambert: 02:14
I’m in Oakland, California for a few more weeks, and thanks for having me, Jon. 
Jon Krohn: 02:18
Nice, my pleasure. So let’s dig right into the technical stuff here. You’re currently a research scientist for the nonprofit Allen Institute for AI where you work on reinforcement learning from human feedback, RLHF, and fine-tuning LLMs in the open and for the common good.
02:37
Earlier this year, you released the LLM OLMo, and OLMo has been praised for its openness, including providing pre-training data and training code. So people are probably aware that for the most part, when there’s supposedly open-source models like the Llama series or Gemma, they’re not open like yours is. 
Nathan Lambert: 02:56
Companies are getting better at talking about it. They release internal documents telling people how to communicate about it, but some people mess it up still. 
Jon Krohn: 03:06
Yeah, yeah. Well, we really appreciate all the openness. What do you think are the most… What are the reasons driving you behind having as much openness as possible with these AI development tools that you create? 
Nathan Lambert: 03:20
It’s various. It’s wanting to have “a good outcome,” which is very biased by what I think of as good, but it’s just having a lot of people being able to work in AI, having AI be understood because it’s going to be very powerful over these next few decades, and then the worry of, probably largest, corporate capture, where if AI is as powerful as people think it could be, it could result in companies that are 10 times as big as Apple and Microsoft, and then we don’t really know how the modern economic system would work in that context. 
03:53
So just spreading the love, making sure there’s not risks through obscurity and people not knowing what’s going on, but then also just education and more people getting involved in these very long-term societal shifts. 
Jon Krohn: 04:09
Yeah, on that note of things happening at the Big Tech companies, in your Interconnects.ai newsletter, you recently published an article about OpenAI’s Model Spec. So this document details how they steer their model with RLHF toward their goal model behaviors, and yeah, can you elaborate on, I guess, RLHF a bit in general for our listeners that maybe don’t know it, and yeah, what your findings were about this Model Spec document?
Nathan Lambert: 04:36
Yeah. So RLHF, this reinforcement learning from human feedback, is the most popular fine-tuning technique right now. It’s interesting to people deep in the weeds of language models because it’s a different loss function. I think all this pre-training is done and you hear about instruction fine-tuning or supervised fine-tuning. That’s with the same autoregressive loss function that’s at the core of modern NLP, and it wasn’t always the core of NLP, which is fun to learn about, but this RLHF process brings in this human factor, so it lets the models have some humanness that is hard to capture in data. And then it also, it’s a really broad loss function. So there’s a lot of different things you could try. 
05:14
We’re just still barely starting at understanding, now that we could do policy gradient updates or different types of updates to these models, how different can we make them and in very useful ways? And that’s a tool. RLHF is a tool for doing this. 
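For readers who want to see the distinction Nathan is drawing, here is a minimal, illustrative sketch (not any lab’s actual training code) contrasting the autoregressive cross-entropy loss used in pre-training and supervised fine-tuning with a simplified policy-gradient-style RLHF objective, in which whole completions are scored by a reward model and a KL penalty keeps the policy near a reference model. Real systems typically use PPO or similar algorithms; this is only the basic shape of the idea.

```python
import torch.nn.functional as F

# Pre-training / supervised fine-tuning: autoregressive cross-entropy.
# logits: (batch, seq_len, vocab); targets: (batch, seq_len) next-token ids.
def autoregressive_loss(logits, targets):
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

# RLHF-style objective: a simplified REINFORCE update with a KL penalty.
# logprobs / ref_logprobs: (batch,) summed log-probs of each sampled completion
# under the policy being trained and a frozen reference model.
# rewards: (batch,) scalar scores from a reward model.
def rlhf_policy_loss(logprobs, ref_logprobs, rewards, kl_coef=0.1):
    kl = logprobs - ref_logprobs                       # per-sample KL estimate vs. the reference
    shaped_reward = rewards - kl_coef * kl             # penalize drifting far from the reference model
    advantage = shaped_reward - shaped_reward.mean()   # crude baseline to reduce variance
    return -(advantage.detach() * logprobs).mean()     # policy-gradient loss
```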
05:29
It seems very likely to stick around. If you had had me on a year ago, I would be like, “Oh, I don’t know. We’re going to try.” But at this point, everyone’s investing in it. Scale AI did a Series F round on tons of revenue. That’s built on RLHF. They’ve pivoted their company for a second time. And this Model Spec thing is a nascent corporate direction thing, which I think a lot more companies will do.
05:54
OpenAI, again, is at the forefront of this. It seems like it’s a combo work from John Schulman, who’s the author of the Proximal Policy Optimization paper and one of the leading authors in deep RL as a field before language models came in, and then also their product team. And it’s essentially trying to say what we want our language models to do, whether or not we get the technical details right. And then they’re going to update this document over time as they better understand their customers and company culture. 
06:22
And as a standalone document, it’s interesting to people very in the weeds like me, what are the examples they give? What is the commentary they give? They talk about NSFW content and how it’s hard to thread that needle and the order of command. So OpenAI has final say on what the model can do relative to the customer and all these things. And just having it in one place is good because eventually, we’re going to have multiple model providers do stuff like this.
06:45
And then if you’re shopping around, you could see, oh, what does Google want their models to do, or Anthropic, OpenAI? And it’s even something that we want to try to do at the Allen Institute just to be like, “What are we trying to do?” We’re pretty behind in terms of what the RLHF practices are. We don’t have as big of a budget for human data, but we want to be able to say, “These are our goals and we’re going to see if we can achieve them,” and then we can document how our goals change over time. And it’s just a nice feedback loop to be like, “RLHF is a messy process, but these are the sort of things that seem tractable to do with it, whether or not it actually works.” 
Jon Krohn: 07:20
Yeah, yeah. And so while AI2, well, at AI2, which is the abbreviation for the Allen Institute for AI, so while you may be working at those details around RLHF still, you’ve published a popular paper called “Zephyr: Direct Distillation of Language Model Alignment” about how you leverage distilled Direct Preference Optimization, so dDPO, for intent alignment in smaller models, which allows this 7 billion parameter Zephyr model to surpass the Llama 2 70 billion one. So 10 times as many parameters. How does this approach differ from traditional methods and what are the implications for scalability, transparency, accessibility? 
Nathan Lambert: 08:02
Yeah, so this Zephyr paper is mostly about the Direct Preference Optimization (DPO) paper, making it mainstream. It’s funny, just two days ago, Chris Manning had me for a lecture in his class and he’s like, “Oh my god, thank you guys for making DPO seem real.” It’s like because there was this big time lag. The DPO paper came out in about June of 2023, and then the Zephyr model was released in September of 2023 as the first real model to make a breakthrough with this DPO method, which is a long time when there’s so many labs invested in training these models and releasing them for PR and product gain. 
08:37
So there’s a huge time lag there, and it really built on just strange exploration in terms of experimental details. We needed a really low learning rate. There’s the meme in AI that 3e-4 is the only learning rate that you need and it works for everything, but this model and the model later at AI2, which was the Tulu 2 model, which was 70 billion parameters showing it could scale, both used a 5e-7 learning rate, which is just so outside the realm of normal for most people doing fine-tuning and anything with AI at the time. 
09:11
And then there’s also this idea of synthetic data, which is this direct distillation idea. So there is a dataset called UltraFeedback from a group that’s like OpenBMB, which I think is a research group based in China. And that dataset, and still to this day, if you compare to other preference datasets like Stack Exchange or Stanford Human Preferences or Anthropic’s HH RLHF dataset, it’s just using the methods that we have on this UltraFeedback data, which is a mix of completions from models like GPT-4, GPT-3.5, Llama 2, and then the chosen and rejected completions for creating “preferences” are labeled by GPT-4. 
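As a concrete reference for what DPO optimizes on these chosen/rejected pairs, here is a minimal sketch of the DPO loss, assuming the per-sequence log-probabilities have already been computed under the policy and a frozen reference model; the unusually small learning rate Nathan mentions would simply be set on the optimizer. This is an illustration of the published objective, not the exact Zephyr or Tulu training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss on a batch of preference pairs.

    Each argument is a (batch,) tensor of summed log-probs of a completion
    under the policy being trained or the frozen reference model.
    """
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

# The learning rate from the Zephyr / Tulu 2 recipes discussed above:
# optimizer = torch.optim.AdamW(model.parameters(), lr=5e-7)
```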
09:52
So this dataset, six months on from Zephyr and Tulu, is still what we’re seeing as the best. By the time this is aired, we’ll release some more models trained with PPO, or soon after this airs, and the best results still come from using this dataset.
10:06
So as this conversation goes on, I’ll probably keep beating my drum, which is we need to get people making more datasets in the open if we want to keep doing this academically and for open source, but I do that in a lot of channels, but it’s obvious the field will move so much. We have these Llama 3 models. I’m sure Mistral will come out with something soon, but we’re still using the same one dataset. It was the first one of its class too. It’s like people don’t get the first one perfect, especially when it’s a research lab, but it’s impressive, the longevity of it. 
Jon Krohn: 10:36
Yeah. In addition to Zephyr, while at AI2, you also released Tulu 2, if I’m pronouncing that correctly. There’s an umlaut over the first U. 
Nathan Lambert: 10:45
Yeah, it’s a hybrid camel. It’s what a Tulu is. It took me so long to learn this. There’s two types of camels and the Tulu is what happens if you cross-breed them. I didn’t come up with the name, but that’s what it’s named after. 
Jon Krohn: 10:59
So they’re the mule equivalent for camels? 
Nathan Lambert: 11:01
I think so. 
Jon Krohn: 11:04
Well, so regardless of the etymology of the term, Tulu 2 is a suite of models for adapting pre-trained language models to downstream tasks and user preferences. How does Tulu adapt where other instruction tuning models and methods fail? 
Nathan Lambert: 11:19
Yeah, so this is really, the bulk of this project was about understanding the most important instruction data out there. And then we saw Zephyr come out, so we’re like, “Let’s apply the Zephyr method on top of it.” And the Zephyr method, again, worked on these models, so it was a proof of concept that this UltraFeedback data works and at different scales. So it was really the 70B scale, which again was to counter the DPO haters, which is everyone’s like, “Oh, DPO works at 7B. No one’s going to really use this if it doesn’t scale up.” And it was like a month later, we’re like, “Oh, look, we did it,” and it was just funny. 
11:50
But there’s a lot of instruction datasets out there and this becomes very messy, I think, if you’re deep in the weeds of instruction tuning. You also see these people that are independent affiliations that are training all these models and uploading them to Hugging Face and they’re like, “We have a million examples in our dataset from this list of 15 different places, plus some weird filtering heuristics.” And Tulu is the academic version of that, which is focusing on very specific evaluation metrics and trying to understand particularly some things, like code and reasoning, that are harder to improve with the current models, and doing a rigorous study on how you can… It’s a two-stage thing that’s wrapped into one result, which is how you can improve these with all the data that we have, and then you have a giant dataset, and then how do you prune it down to maintain the maximum performance? 
12:41
And what this looks like in terms of Tulu is grad students running a ton of experiments and just getting really deep into the weeds and there isn’t a systematic answer. It’s like the next thing is to do automated filtering. So the grad students, hey, [inaudible 00:12:54] away from this project, and they’re like, “We can’t do this by hand anymore,” and they’re looking into ways of doing automatic filtering based on embeddings or they’re looking at influence functions, which I don’t even know what they technically are, but it’s a measure of similarity between instructions and trying to use this to automate the process of filtering through the vast amounts of instruction data online. 
Jon Krohn: 13:16
Nice. Yeah, really cool, all the things you’re doing at AI2. It must be an incredible place to work and it sounds like you moving there is to take even better advantage of that, moving out to Seattle. 
Nathan Lambert: 13:29
Yeah, it’s good. One of the last holdouts of academia and industry hybrid. It’s like every workplace, I know it’s like everyone has a job. There’s always upsides and downsides, but it’s a very unique place as industry research has become more closed, but we don’t have the resources of industry research, so we have to be a bit more clever in terms of how we do things. 
Jon Krohn: 13:48
Are you stuck between optimizing latency and lowering your inference costs as you build your generative AI applications? Find out why more ML developers are moving toward AWS Trainium and Inferentia to build and serve their large language models. You can save up to 50% on training costs with AWS Trainium chips, and up to 40% on inference costs with AWS Inferentia chips. Trainium and Inferentia will help you achieve higher performance, lower costs, and be more sustainable. Check out the links in the show notes to learn more. All right, now back to our show.
14:22
So prior to AI2, quite a bit because you were also at Hugging Face, but prior to that, you were at UC Berkeley and you focused on the intersection of robotics and machine learning, which for me, is personally super fascinating right now. So there’ve been some really exciting developments in that space recently. Things like NVIDIA’s Project GR00T for humanoid robots using generative AI and reinforcement learning. And there’s also the announcement of an MIT spinoff Liquid AI, which plans to revolutionize robotics with liquid neural networks. 
Nathan Lambert: 14:56
Oh, I didn’t even know they’re a robotics company. I saw them, but I didn’t know. 
Jon Krohn: 15:00
Yeah. So what’s exciting for you at this intersection of LLMs and robotics? What’s promising there for us?
Nathan Lambert: 15:08
Yeah, I think the place that all of this really started was Google Brain’s research team on robotics. They’re still doing great things, but they were years ahead to embrace this, which is like, “Let’s scale up our data engine, let’s train some big models.” Then it just worked. That’ll clearly continue, and on more and more complex tasks as people invest resources in this data.
15:31
There’s a product-market fit issue, which is I don’t want to buy a robot. So most of the really nuanced takes come on this product side. I’m pretty confident that the research is going to keep going places. I think this is the two biggest trends in robotics and machine learning in the last few years in my mind, one of them is this scaling up data collection, using some sort of large model. You can get real-world results that work. Google showed this, other places have replicated it, they did this open dataset project, but there’s also deep RL’s actual success area has narrowed and narrowed down to this simulation for robotics where you have procedurally generated worlds and you simulate for robotics. 
16:13
I feel like I should try to do another survey on this, but I wrote a blog post a year ago that I was just listing a whole bunch of places where that has worked. It’s like drone flight, locomotion, other things. DeepMind had the nuclear fusion paper and it’s like there’s all these really wild things that have really narrow-scoped deep RLs helping with. 
16:32
So I think that’s what most robotics companies will be leveraging is we have our robot farm internally that can collect data, and then that’s the question of how do you integrate consumer data, or if you’re trying to… Humanoids are hard because at a mechanical level, most of the humanoid robots have such high force that it’s hard to have them around humans. I think this is what was maybe used to be wherever Eric Jang works, like 10X Robotics or 1X Robotics, they’re trying to make actuators that are lower force so it’s safer to have them around humans. I think famously, the Boston Dynamics robot, you can’t have humans around it because if its arm is doing a motion and it hits you, you go flying across the room because there’s so much force. It’s like that’s not safe.
17:16
So then there’s this weird last-mile consideration, which had me really down on it, and then I was talking to a family friend and he’s like, “But teleoperation can save you.” So it’s like if you want to have humans in your house or robots in your house of any type, it’s obvious that they’re not going to work for many things, but you could outsource the labor to India. There’ll be people that’ll happily empty your dishwasher manually instead of the robot failing to do it automatically, which creates a redistribution of labor market which arbitrages costs and stuff, which I actually think would probably work if people got over the privacy concerns. So I skipped something in the middle, which is having robots in your house is probably not going to work for a really long time because of the distribution shift, but I think most people are serious about that. And then it’s like if the distribution shift is such a big problem, then that’s when you do the teleoperation.
18:10
I respect a lot of people that are joining this field right now because there’s a lot of opportunity to grab in the language model space in terms of digital applications and building services. It’s like the people that are still doing their fundamental research or the people that go to robotics, it’s like y’all are taking the long-term thing. I think Eric Jang specifically was like, “Yeah, this is a 10 to 20-year bet for humanoid robotics,” and I was like, “respect for taking the big risk,” because it does seem to be going in the right direction and robotics has been… If you take away the stable diffusion moment and the ChatGPT moment, the robotics trend line is just the same. It’s just slowly, slowly going up and we’re pulling in new things, so it doesn’t have as much of a splash factor. 
18:52
The splash is from people like Elon marketing it now. Tesla Optimus is probably going to be similar to Autopilot. I don’t really think of it as exactly what it’s marketed as, but they have a really good team there and they’re building a cool robot and then that mismatch will be managed in some downstream way. 
Jon Krohn: 19:11
Nice. Gotcha. So basically, you think that it’s going to be some time before we have humanoid robots in our homes doing a lot of regular tasks, so where we’re going to see more and more robotics applications is in industry, typically. 
Nathan Lambert: 19:25
Yeah, so there’s three really popular Bay Area robotics and AI startups which are Dexterity, Ambi, and Covariant, and all of them have contracts with companies for various logistics tasks like pick-and-place or unloading a truck or loading a pallet. And they all work really well on this. Amazon does this. Amazon is setting up their fulfillment centers to be robot-first. They build entire fulfillment centers from the ground up to be ready for robots rather than being ready for humans, rather than subbing robots in for where there were humans. So all of this really works. It’s like how do you get them to leave the manufacturing line type of thing, which is just so different.
Jon Krohn: 20:05
Yeah, yeah, yeah. And so related potentially, because of how we’re now seeing LLMs more frequently in robotics applications, like the GR00T explanation from earlier, and you just mentioned Covariant, they had a really cool one too, their Robotic Foundation Model 1, RFM-1. 
Nathan Lambert: 20:22
That stuff is the way to get to human interaction. Having language with your robot is the way. It’s just going to take a lot of reliability for somebody to want to buy it. It seems so logical. 
Jon Krohn: 20:35
Yeah, it’s cool, for sure. And so related to LLMs, in your newsletter, you recently wrote an article about GPT-4o that features significant improvements such as latency and real-time audio-generation. In your opinion, which of these technical breakthroughs will profoundly impact industries beyond the tech sector? So healthcare, education, finance, perhaps making it even into robotics. 
Nathan Lambert: 21:02
What are the two? Oh, I think audio is a thing that people will do. I think it’s clear that… I mean, I’m the classic example of people that likes to consume a lot of media but does so through audio. And we’ve seen this and it’s like people watch so much YouTube and so much TV and so much TikTok and it’s like how many people watch TikTok versus read Substack newsletters? It’s so low. But that removes a barrier to entry in terms of actually using these language models. And GPT-5o, when it exists, is probably going to be so good in every language. You could just put this in front of a kid that doesn’t speak any English and then he has a perfect tutor for anything. Even though most of the education material is in English, it’s just ChatGPT already learned all of that and it just exists and it’s like the downstream accessibility to education is just so high. 
21:53
I think there’s obviously social concerns, but it’s also starting younger. They don’t need to be able to write coherent questions. There’s protections you need to add if kids are going to be using this, but I think you could come up with an infinite list, a growing thing of just how talking to these machines with no latency, especially when kids are so clever. My parents would probably be more thrown off or take longer to adapt to an AI that they can interrupt, but a kid probably figures it out in 60 seconds and then they’re just going, and they’re not even probably going to talk to it like a normal human. They’re probably just going to extract the information from it in some unparsable, weird way.
Jon Krohn: 22:34
Yeah, yeah, yeah. I think you’re right. It’s like I can’t really figure out TikTok still. 
Nathan Lambert: 22:40
I just protect myself. I’m too deep in the tech industry to know. It’s like all the tech kids are protected from all these apps. It’s such a sham. 
Jon Krohn: 22:49
So going back to RLHF, which has been a huge focus of yours, not only at the Allen Institute but also at Hugging Face where you were previously, in your paper, “The Alignment Ceiling,” you discuss the issue of objective mismatch in RLHF. So what are the challenges in aligning reward models with human preferences in a reproducible manner?
Nathan Lambert: 23:14
Yeah, so this paper is a fun story because it hearkens back to my PhD, the PhD work on model-based RL. The objective mismatch paper was my core paper of my thesis, which is essentially in model-based RL, you’re learning a policy and a dynamics model. So it’s a bit simpler because the evaluation regime is much more closed. So in classic RL, you have these robot tasks and simulations so the evaluation is much more set. And then you can think of it as the dynamics model that is good for the policy is tuned to the policy, it’s not tuned to the real world.
23:51
In RLHF, it’s a bit different because you’re trying to do a multi-stage process where you have this language model, which is your policy, you have this reward model, which is sort of like your environment, but it’s mirroring what the humans want. So there’s this extra leg of trying to match what the humans want, and then you have this reward model, and then you have the policy that’s trying to extract information out of it. 
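To make the reward model’s role concrete, here is a minimal sketch of the pairwise (Bradley-Terry style) objective that reward models are commonly trained with on human preference data; the exact setup varies by lab, so treat this as the general shape rather than any specific system.

```python
import torch.nn.functional as F

def reward_model_loss(chosen_scores, rejected_scores):
    """Pairwise preference loss for training a reward model.

    chosen_scores / rejected_scores: (batch,) scalar rewards the model assigns
    to the human-preferred and human-rejected completions of the same prompt.
    The loss pushes the chosen completion's reward above the rejected one's.
    """
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()
```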
24:16
There’s a lot of analogies. I’ve been recently talking about information flow, which is if you have this policy that’s getting trained, the reward model is some sort of filter or sieve or gain, you could think of it as many different ways, and you need to tune that. You’re putting information through a black box and you need to tune it to make sure that it matches what humans actually want. 
 24:40
I think this is part of why the Model Spec is interesting. So doubling down on what the Model Spec partially reveals is that when these big companies are collecting preference data from humans, they have 10 to 20-page documents on, “Here’s what you should prioritize when you’re labeling the data.” And then the question that we haven’t been able to see of these models is what is the mismatch between what they tell you to do in the data and what the final model does? 
25:04
So if you tell them in the data, “Prioritize factuality, prioritize conciseness,” even if the data has that, does the training process result in a model that does this? And that’s the best representation of what we’re saying is this alignment ceiling is we don’t know if our methods ever could actually be perfectly aligned with what our expectations are because we’re doing this all in different modules. And then the really deep… This is less in vogue right now, but in deep RL days, people would be like, “Can’t we just do end-to-end learning?” So if we only have one objective, can it learn everything at once?
25:39
And that just doesn’t seem to scale as well in realistic engineering environments. I think the only people that we see doing that is Tesla self-driving, and we don’t know for sure if they are, but all of the teams at OpenAI, Gemini, Anthropic, they have modules where it’s like an RLHF team, a safety team, a pre-training team, and that is where they trade off these things. And those are, if you look at this paper, that’s what each of those boundaries are where you’re trying to design your optimization in the context of the optimization that other people are doing.
26:08
So it’s just this mathematical thing where it’s like you can never get a perfect solution if you’re doing multiple optimization problems. 
Jon Krohn: 26:18
A big alternative to RLHF is Constitutional AI or reinforcement learning from AI feedback. You’ve discussed that in other places, papers, talks. Can you elaborate on this idea? I know Anthropic is a big proponent of it, for example. Because it seems like some people might be concerned that if you’re using AI to judge AI, where’s the real ground truth? You know? 
Nathan Lambert: 26:48
Yeah. I think Constitutional AI is one of the most misunderstood techniques and this is mostly because the paper is somewhat confusing. So there’s two major things in this paper, Constitutional AI and RLAIF, and RLAIF is the idea of using AIs instead of humans to label preferences, which is pretty general. And Constitutional AI is a two-stage process which does some RLAIF and some other instruction-revision stuff, where what they do in this paper is they revise the instructions in their instruction dataset with respect… So the completions, they revise those with respect to a list of principles, that’s one thing, and then they redo instruction fine-tuning. And then the other thing is they redo this preference label with the context of a principle.
27:39
So that’s what people normally think about and it’s just one more way of adding synthetic data. And I think that they’ve likely moved well beyond that at this point. The paper’s pretty old.
27:51
Some other sorts of things that you could do in this case is you can do a revision to create preference data. So if you have a bunch of completions from a language model, you could ask it, “Is this factually correct?” And if it says “No,” you say, “Can you fix it?” And if it fixes it, then you have a pairwise preference where the chosen preference is the fixed text and the rejected is the original one that needed to be fixed. 
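Here is a rough sketch of the revision-to-preference idea Nathan describes. The `generate` function is a hypothetical stand-in for whatever model API you use; it is not a specific library call.

```python
def revision_preference_pair(prompt, completion, generate):
    """Turn a completion plus a self-revision into a (chosen, rejected) pair.

    `generate(text) -> str` is a hypothetical wrapper around an LLM call.
    """
    verdict = generate(
        "Is the following answer factually correct? Answer yes or no.\n\n"
        f"Question: {prompt}\nAnswer: {completion}"
    )
    if verdict.strip().lower().startswith("no"):
        fixed = generate(
            "Rewrite the answer so it is factually correct.\n\n"
            f"Question: {prompt}\nAnswer: {completion}"
        )
        # The revised text is preferred ("chosen"), the original is "rejected".
        return {"prompt": prompt, "chosen": fixed, "rejected": completion}
    return None  # already judged correct, so no preference pair is produced
```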
28:13
And this is one of a growing list of examples of ways that you could generate synthetic data of which CAI is the one that Anthropic got a lot of buzz on. It’s a buzzy name. It was by far and away the earliest. And it’s so misunderstood. I mean, I don’t feel like I really understood it until a couple months ago. It’s just one of those papers that ends up meaning something different than it actually is, which is fine. It happens. 
Jon Krohn: 28:45
Nice. All right, so back to RLHF. Given the inherent subjectivity in human preferences, how do you ensure that the aggregated preferences accurately reflect the desired outcomes for our AI systems’ behavior? 
Nathan Lambert: 29:00
Yeah, we’ve been debating this a lot internally recently. Essentially, you could phrase it as is the disagreement among labelers a signal or a bug? And it honestly feels more to me that it is a signal because of how vague and multifaceted these preferences are. We don’t know what everyone’s doing. There’s research on this fine-grained RLHF where you label multiple pieces of an answer, but at the end of the day, a lot of it’s being reduced to a pair. And we all have different weights on how much we notice different things, like factuality, conciseness, helpfulness, honesty, these very abstract terms. So if this gets reduced, there’s going to be some noise. 
29:44
And that’s seen in the papers where they all report these agreement numbers with their annotators and it’s somewhere between 65 and 75% agreement when the people doing research compare their numbers to the annotators. I don’t think that’s going away. People see that when they use a language model, it has higher agreement. I don’t have a strong opinion on what that means. It could be one of those things where it’s like you’re amplifying biases when you keep training with language models because they have less disagreements, they have more agreement, and what that manifests as is the diversity of answers may be going down in terms of what’s acceptable. We really don’t know. 
30:27
This is where I said I was going to beat this drum. This is why we need more clearly labeled open preference datasets with good metadata, so we can see. Something that I’m encouraging a lot of people to do now is that we have three really good language models. If you’re going to do GPT-4 as a judge to label a dataset, do it with all three of them, do it with Claude, do it with Gemini as well, and then we can see what the disagreement between the language models is. And if we get that at scale, we start to learn a lot more. 
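A simple way to act on this suggestion is to collect preference labels from several judge models on the same pairs and measure how often they agree. Below is a minimal sketch with made-up labels; the judge names are just placeholders.

```python
from itertools import combinations

def pairwise_agreement(labels_by_judge):
    """Fraction of examples on which each pair of judges picks the same completion.

    labels_by_judge maps a judge name to a list of 0/1 labels
    (0 = first completion preferred, 1 = second), aligned by example.
    """
    agreements = {}
    for a, b in combinations(labels_by_judge, 2):
        pairs = list(zip(labels_by_judge[a], labels_by_judge[b]))
        agreements[(a, b)] = sum(x == y for x, y in pairs) / len(pairs)
    return agreements

# Made-up labels for illustration only:
print(pairwise_agreement({
    "gpt-4":  [0, 1, 1, 0, 1],
    "claude": [0, 1, 0, 0, 1],
    "gemini": [1, 1, 1, 0, 1],
}))
```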
Jon Krohn: 31:00
Today’s podcast episode is brought to you by Crawlbase, the ultimate data crawling and scraping platform tailored for data scientists, AI developers, and Python developers. For ML and AI, high-quality data are of course essential. With Crawlbase, you get a powerful, user-friendly solution that guarantees seamless integration, lightning-fast performance, and unparalleled reliability. Crawlbase supports your needs with a two-minute integration process, AI-powered efficiency, and 99.99% uptime. Crawlbase also excels in bypassing CAPTCHAs, avoiding IP blocks, and handling proxy failures, making them the go-to solution for all your data needs. Use the special code Super Data Science with no spaces to unlock 10,000 free requests. Visit Crawlbase today and supercharge your data collection process with the best in the business. 
31:50
Nice. Very cool. Another really cool thing that you’ve done related to RLHF is you have traced it back to ancient philosophy and modern economics. So mentioning Aristotle and the Von Neumann-Morgenstern utility theorem, for example. I don’t really know what the VNM utility theorem is. But how do these historical foundations influence current methodologies and what can modern AI research learn from these early theories?
Nathan Lambert: 32:20
Yeah. So this was a fun paper with a few colleagues that I started working with at Berkeley and now we’re spread out. This is all based on the fact that RL has very deep multi-field, multidisciplinary history where it goes way back. And then the notion of preference is a very vague thing in economics. And it’s like the Von Neumann-Morgenstern theory is a foundational thing that essentially, it’s like you can express either all behaviors or all goals as probability and expected value distributions, which essentially lets you do expected value math over preferences. And then it led to a bunch of debates on whether or not preferences actually exist and are tractable in any of these things or if they’re actually measurable or not due to the preference shift over time based on context. 
33:13
So these are the kinds of things that we take and ask a lot of questions on how this impacts the modern RLHF process. It’s things like is the final model’s preferences, which we’re mapping onto very human terms, is that actually based more on the base model, which is scraped from the internet, than the human preferences that they get from somewhere like Scale AI?
33:36
So if it’s based more on the internet crawling than this million-dollar dataset they’re getting from Scale AI, it’s confusing to the marketing where we’re saying we’re learning a preference model, but it might not actually do that much. There’s other things like OpenAI now has a ton of user data and it’s like what does the economics literature say about generating data for training that comes from a user context or a professional context where someone is paid to do it and they’re paid to act in a certain way and how does all of this mix? 
34:04
So it’s really just a super long list of questions of why we should look at other social sciences if we’re making grand claims about human preferences and all of these things. 
Jon Krohn: 34:16
Nice. Well, fascinating. Tons to dig into there for our listeners. Final topic that I planned related to RLHF, I’m sure it’ll come up again organically in the conversation, but you’ve mentioned that RLHF is not even robust to fine-tuning. And so removing the safety layer from models, like GPT-4 and Llama 2, can break down the notion of safety. Can you elaborate on the implications of this fragility for the future development and deployment of AI systems?
Nathan Lambert: 34:49
Yeah, so this is a specific line of research. There was a few papers that showed that if you take a model like Zephyr or Tulu that we were mentioning, if they have safety in the dataset, if you then go and fine-tune it again on some different tasks, you’ll lose some of the behaviors that are “ingrained” in the model. 
35:07
I honestly think this is a little bit more clickbaity than actually worrisome because it’s really not surprising, given that if you just look at the amount of compute applied at fine-tuning, we pre-train these models for trillions of tokens and then we apply a couple billion tokens of compute at fine-tuning, and it’s like we’re not changing the weights of the model substantially. We’re doing a slight nudge and it makes sense that a slight nudge could be undone in the same way. 
35:34
But if you are to take this to some of the bigger labs, what you hear is that safety is not just a single artifact thing. Safety is much more about a complete system than a model. So open-weight models being safe or unsafe, I don’t consider it to be that big of a deal. It’s like if you were to apply them to a free endpoint that everyone on the internet could talk to, then I don’t want my model saying good things about Hitler and all these obvious things. But if it’s a research artifact that you need to spin up GPUs to use yourself, it’s a little bit more… I’m more open to having these diversity of models exist. 
36:11
But if you ask Anthropic or somebody, it’s like, “What happens if… How do you get safety into your model?” And it’s not just RLHF. You need to have safety at the pre-training, any preference model you train, and then all of these models have a safety filter on the output. So ChatGPT, it reads all the texts generated from the base model and then there’s go-no-go where it will rephrase the text if it gets a no-go signal, which is their content moderation API.
36:35
So it’s like it’s a double… It’s the type of thing where researchers need to market their work, but it’s not as big of a deal as it’s made out to be, I think. It’s like, okay, I think it has interesting business downstream things with liability. So it’s just like if you want to fine-tune a model, you normally do that on your own hardware, but OpenAI has a fine-tuning API and if they claim their model is safe, but any fine-tuning on their API that they then host makes it unsafe, that seems like more of a business problem, which is like, oh, it’s a nice way that the open ecosystem might be better off because it breaks the liability chain, but we’ll see this research continue to evolve. It’s so early in all of these things. We’re a year in. 
Jon Krohn: 37:21
Yeah, that is something that I had not thought of is how if you’re fine-tuning the OpenAI model via their API, you are potentially removing some of the safety stuff, which hadn’t occurred to me. Yeah, so moving on from RLHF into some other topics, in your podcast, The Retort, you’ve discussed AI being “closer to alchemy than science.” Could you elaborate on this perspective and its implications for how we understand and develop AI tech? 
Nathan Lambert: 37:50
Yeah, I think a lot of this is about the culture of AI, which it really does, when you’re on the ground, feel like things will just work. And there’s a lot of people that… It’s like you’re operating at a scale where real hypothesis testing doesn’t really work. It’s like we do 10 experiments at the 7B scale, and then they’re like, “We’re going to train a 35B parameter model based on some reading the tea leaves and intuitions of what we have seen,” because we don’t have the infrastructure to do thorough testing, which is proper randomization and really thorough, little things. 
38:25
So it’s definitionally not that scientific and there’s a lot of people in the field where it’s just not. It’s like science is a very clear process that people are taught and it’s a lot of just like, “Oh, we’re going to try this because it feels right and it probably works.” And then there’s the whole culture thing of how these companies cast narratives about their things being pseudoreligious artifacts and all the AGI talk and stuff, which makes it unscientific in many ways. I think it’s a lot more of my co-host Tom’s favorite thing to talk about, but I understand the argument and I agree that it’s apt. It’s been like this for a long time where deep learning being uninterpretable fundamentally makes it hard to do science. 
Jon Krohn: 39:12
Totally, yeah. It is wild and it’s crazy to me how when you say, “Okay, with GPT-4, we’re going to have 10 times as many parameters as GPT-3.” GPT-5 will probably be that same kind of change. Even the people themselves who are developing these systems don’t know what emergent capabilities they’ll have, so I think that relates to this idea of it being an alchemy. 
Nathan Lambert: 39:37
Yeah. There was this NeurIPS best paper last year by Rylan Schaeffer who gives fun talks. He’s really good at storytelling for his papers, but the best paper essentially said that the emergent properties, a lot of the measurement is through statistical measurement error, which is we have all these benchmarks where the random floor is 25%, so then weird statistical things emerge when you finally get signal. 
40:02
So if you’re training these things, what the test set loss would look like is probably a smooth line against log compute, where the x-axis is log compute and the y-axis is performance going up linearly or something like this, but if you have a noise floor at 25% or something weird where it’s not having any signal and then it kicks in, it looks like, instead of being a straight line, a flat line that then goes up. 
40:31
That’s the whole idea of the paper is that most of these arguments are measurement noise. I think it’s probably somewhere between that and reality, which is like we are discovering things that are unpredictable with the largest models, but the way that we’re presenting them flatters this emergent hypothesis just because of the way that benchmarks were created. So I think that was interesting. It’s one of those papers that really should be a blog post because the idea is so clear, but we have to go through the academic gating cycle, so it ended up being a paper. It’s just pretty funny.
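The measurement point is easy to reproduce with toy numbers. In the sketch below (made-up values, not the paper’s data), a perfectly smooth, linear improvement in an underlying ability reads as flat and then suddenly “emergent” once it is viewed through a four-way multiple-choice benchmark whose random-guess floor is 25% accuracy.

```python
# Toy illustration only: smooth underlying improvement vs. what a benchmark
# with a 25% random-guess floor would report.
for log_compute in range(1, 9):
    underlying_ability = 0.05 * log_compute         # smooth, linear trend
    benchmark_acc = max(0.25, underlying_ability)   # can't score below chance on average
    print(f"log compute {log_compute}: underlying {underlying_ability:.2f}, "
          f"benchmark reads {benchmark_acc:.2f}")
```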
Jon Krohn: 41:01
Another topic that you covered in your podcast recently was you discussed the idea that RLHF could be fixing something in the pre-training and that it might be correcting biases from common data sources like Reddit. So what’s the problem there with using those kinds of common data sources like Reddit and how is RLHF addressing those biases and implications?
Nathan Lambert: 41:25
There’s two things here. I’ve been parroting this theory that fine-tuning is important even if it’s not as much raw compute because of how you present information being so important. I like to use this analogy of the Sapiens book, which is obviously stuff that’s in history class, but he rewrote it in a way that was so compelling that it’s one of the best-selling books of all time. And RLHF is doing that on a small scale, which is all these base models have similar things in them, but the models that really resonate with people happen to be output in a way that is really, really compelling. 
41:56
So that’s the base case of RLHF is style transfer and still being important to just get this flow of the model right. And then there’s other stuff that OpenAI makes a lot of funny noise about. Well, they don’t make noise about it, but the leaks do, which it’s like all this Q* stuff and adding extra search at the fine-tuning phase, which is various ways of just getting very new types of data. 
42:19
And it’s what I was talking about at the beginning with this different loss function. It’s like the way to exploit the fact that we’re no longer doing autoregressive loss and see how far that lets us create different types of language models or other types of ML models. I need to make a talk on this, on why people are bullish on RLHF, which I haven’t done yet. I think I need to learn a lot about it though because it’s hard to make it more than two slides. It’s like what does it actually mean that you’re doing these policy gradient updates rather than this autoregressive loss? 
Jon Krohn: 42:48
It’s wild to hear someone who is so expert in RLHF describe it like that.
Nathan Lambert: 42:52
Someone at OpenAI gave a talk about something like this. I think it would’ve been… I don’t remember. There’s the NYU professor that’s also at… Maybe his last name is… I don’t know how to say it but I could find it later. But he had this slide in his talk, which is the language model 101, and that was how he presented RLHF, and I was like, “That’s a good way to do it.” 
Jon Krohn: 43:10
Nice, yeah. Beyond all of your professional work, which we’ve discussed so far, about a year ago, you wrote an article in your newsletter called, “Behind the curtain: what it feels like to work in AI right now. Fear, FOMO, and the scientific exodus driven by ChatGPT.” I totally feel this too. It seems so hard to be able to find terra firma, something constant that you can just be like, “Okay, investing in understanding that is going to be great for my career for years to come.” It seems like everything’s moving so quickly, but yeah, it is scary. So yeah, so I don’t know if you want to talk more about that article and basically- 
Nathan Lambert: 44:01
It’s settled down a bit. I think this was a transition period to where we’re at now. Sorry to cut you off a little bit, but it’s like the pace is just so high, but I think there are fundamentals that you still… Learning how to use language models is good. It’s almost like when I started my PhD, it’s learning anything to do with deep learning and PyTorch and all these things is good. And I think Hugging Face Transformers is a place to start with things. It’s good to play with different models. I think it’s an in-vogue thing in the industry to be like, “Oh, their code isn’t very good. It’s not very optimized,” but if you’re a first-year grad student, it’s easy to play with a ton of models and that’s what their business is about. It’s enabling people to use this, and those things pay off. 
44:46
I play with stupid AI things. I transform my newsletter into AI-generated voice stuff, and just getting used to working with all of these things is now the fundamental skill that will pay off in 10 to 20 years because there still probably will be something like an OpenAI API. It’ll just be much better. And that’s like Karpathy’s take that language models are a new computer processor type of thing. They’re just a fundamental computing unit that is worth getting used to. 
45:15
And this article was when we were all readjusting to this. I think it’s actually a bit better now. I think there’s still a lot of people complaining. I think that it was like yesterday, Yann LeCun tweeted this thing, which is, “If you want to have true impact in AI, don’t work on language models.” And I just feel like there’s so much gatekeeping against telling people, “Just go be excited and try things,” which was one of the… I quote-tweeted to say the opposite, which is, “You could have a nice life and work in language models.” So it definitely brings out the haters for some reason. They’re like, “This is a bad take.” It’s like why? You just have to go play with things. And if you’re building things, it’s much less important what the noise is because you’re actually doing things rather than sitting back and getting bombarded with these random release this, release that, which is obviously cool, but most of them don’t matter and it’s just good to get grounded in actually doing stuff. 
46:12
I don’t know if that answers your question, but I don’t expect this to change. This is fundamentally driven by the VC cycle where it’s until these companies that get really ridiculous funding rounds start to die, we’re going to be in this cycle where there are so many releases all the time because that’s what all these startups need to do to get customer inbound, to get PR. It’s like until these companies start dying, it’s going to be the same. And then after that, it’s this aggregation phase and collapse, but we don’t know if that’s going to be a year or five years from now. 
Jon Krohn: 46:41
And I like your point in there that LLMs, they make things, they provide so many different kinds of applications that we can be building now that we couldn’t before. And another great advantage of them is that it also just, it makes my workflows a lot easier, whether it’s an inline code assistant or something like Claude, which I use for questions that I just have about the world all the time. So yeah, it is exciting, but it can be scary at times. And so- 
Nathan Lambert: 47:12
It’s worth reading that article to understand what the zeitgeist was like if you were a student. That was really built on just talking to everyone at Hugging Face, who were all just freaking out collectively, and I was just like, “I’m just fired up one afternoon. I’m just going to go put all of this on a page.” I think it’s a good capture of the moment when something goes viral like that. I don’t think I could recreate it because we’re in much more of a steady state now. We’re no longer in this super high-entropy state. It’s noisy, but most people are used to it. 
Jon Krohn: 47:45
Yeah, yeah, makes sense. So in terms of being able to have a better work-life balance, I mean, to not be fearful of things, to just allow mental health to always come first, which is something you’ve written on your website, you’ve managed to integrate your passion for cooking, fitness, and health into your daily routine. How does this influence your approach to work and do you think this is helpful for achieving work-life balance?
Nathan Lambert: 48:10
Yeah, I think the realistic take on what I do is kind of contrived; it’s partially what I still self-identify as because it helps me be healthy, because I have a long history of endurance sports from college and growing up and I still do some of this stuff. And it’s like a basic rule tree of do I sleep well, yes or no? And if the answer is no, it’s like I feel [beep] during training. And it’s like I just wanted to feel good at these things. And I just lean into this enough to make it obvious that I’ll not work late. It’s like I don’t need to. But obviously things come up, but I don’t think I’ve ever done an all-nighter at grad school or anything. 
48:51
And this is how it works for me is training and trail running and getting outside and it’s mostly finding what works for you. And I do think it’s worthwhile. That was a part of why I want to be in person. I think a lot of people in CS and related fields are so good at optimizing and organizing their life, which is being remote is so convenient that I can overdo this, which is I’ll be like, “Oh, well, it’s 5:00 PM. I got to go do my workout now.” Where it’s like if you have to walk 20 minutes into the office, it’s just a bit more structure to take back control of your life, which is why that’s going to help me a lot. And you just need to have things in your life that make that the case, to just have some things that you don’t control, and things outside of work. 
49:41
It’s hard though. I think it’s still coming out of COVID in that regard. I mean, you do remote interviews a lot of the times. I don’t know how to balance that. I’ve thought about interviews for my blog and I’m considering whether or not I only do them in person, which is I go to a conference and I bring a microphone and I see if I can get somebody interesting, which obviously reduces my throughput, but it’s just trying to make these rules in my life that reinforce doing things that are actually in the real world. So I think about it a lot and I don’t think there are perfect answers. 
Jon Krohn: 50:13
Yeah, it’s crazy. For me personally, the pandemic was… I mean, it still continues to have a negative impact on me in a lot of the ways that you’re describing because I used to have this routine of going to the office and being around coworkers all day, and that laughing, I really enjoyed being at work, and now I mostly work from a home office. And you talked about over-optimizing there. I picked an apartment that has a CrossFit gym across the street, but now I regret that decision because I’m like, “I can spend my whole day in my apartment and the only thing that I go to do on a typical day is go down to the gym that’s across the street.” So I’m not like… Yeah, I’ve over-optimized and yeah, working some things out there myself. 
51:09
Anyway, so thank you so much for this great, rich conversation on robotics research, RLHF in particular. Before I let my guests go, I always ask for a book recommendation. I don’t know if you happen to have one for us. 
Nathan Lambert: 51:27
I’m reading some things that I like right now. So I’m finishing The Three Body Problem series. So the second and third books, I don’t think have the peak level of the first book. The first book has two moments in it that I won’t spoil that I think are some of the best sci-fi moments in any literature, so Three Body Problem is really worth reading for these key moments. And then it’s just great, hilarious sci-fi, which is good for getting out of these things that we talk about. 
51:54
And a timely one for people in AI is I’ve been reading Going Infinite by Michael Lewis on the Sam Bankman-Fried FTX stuff, and reading this, especially if you read some of this behind-the-curtain article, it’s like, we’re going to get this about AI. I think there’s memes on Twitter where people are saying that the title is going to be Not Consistently Candid and it’s going to be about Sam Altman, but I think Sam Altman has achieved real things. His success is not going away. He might just have personal things that make it harder, but there are going to be other books where it’s like these AI companies come and go in the most dramatic fashion, and just reading that to make sure that you have a good sniff test of total [beep] is probably good.
52:37
I could probably dig up a more evergreen and non-classic techie book recommendation, but those are the things that I’m reading and I’m enjoying them and that’s normally good enough. 
Jon Krohn: 52:49
Yeah, that’s great. Great recommendations. Three Body Problem has come up quite a few times as a favorite book on the show, but Going Infinite is new and that sounds great. I love Michael Lewis. I’ve loved him for years.
Nathan Lambert: 53:03
Yeah. It got weird reviews, so I took a second to not get into it, but it reads really well, like all of his books. So I’m like, “This is just…” It obviously has bias, it’s not perfect, but it’s solid. 
Jon Krohn: 53:15
Yeah, he’s unreal at making what could be quite dense topics, like Kahneman and Tversky, into a page-turner. 
Nathan Lambert: 53:30
Yeah. 
Jon Krohn: 53:31
Awesome. All right, so in terms of following you after this episode, we’ve already talked about your newsletter, interconnects.ai, we’ve talked about your podcast, The Retort. How else should people follow you other than those? 
Nathan Lambert: 53:42
Those are really the main things. I’m @NatoLambert on most platforms, but somewhat begrudgingly. I only really use Twitter for random thoughts. I’ll promote my work on other channels, but the only other additional point of information is going to be Twitter. But I try to make good things in the blog just for my own sake. But it’s good to try to condense them down to be less noise. It’s always easy to tweet more, but you don’t necessarily gain more from tweeting in terms of actually learning anything. 
Jon Krohn: 54:15
All right, Nathan, thank you so much for being on the show today. Such an awesome guest, and yeah, we really appreciate you taking the time and hopefully we’ll catch up with you again in a few years. 
Nathan Lambert: 54:24
Yeah, thanks for having me. This was good questions. 
Jon Krohn: 54:32
What an eye-opening episode. In it, Nathan filled us in on how dDPO allows for intent alignment in smaller models, allowing Zephyr 7B to surpass Llama 2 70B on some benchmarks. He also talked about how Dexterity, Ambi, and Covariant are the big players in robotics, but that today’s humanoid robots have too much force to be safely around people in many everyday situations. He talked about how RLAIF can scale up fine-tuning at a lower cost and help resolve whether disagreement among human labelers is a signal or a bug, and he talked about how RLHF can have a positive social impact by fixing biases that crop up from pre-training LLMs on data sources like Reddit. 
55:16
As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Nathan’s social media profiles, as well as my own, at superdatascience.com/791. 
55:28
If you’d like to engage with me in person as opposed to just through social media, next week I will be at the Collision Conference in Toronto. It’s a four-day conference. On the Thursday of the conference, I’ll be hosting an afternoon of sessions on the content creator stage. Beyond the sessions that I host, other amazing speakers you can check out include the godfather of AI himself, Professor Geoffrey Hinton; we’ll also have Aravind Srinivas, the CEO of Perplexity; Aidan Gomez, CEO of Cohere; and the tennis legend, Maria Sharapova. 
56:02
Thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science episode for you. And thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the Super Data Science team for producing another fantastic episode for us today. 
56:16 
For enabling that super team to create this free podcast for you, I’m so grateful to have the sponsors that we have. You can support the show by checking out our sponsors’ links, which are in the show notes. It’s a huge help to us if you do that. And if you yourself are interested in sponsoring an episode, you can get the details on how by making your way to jonkrohn.com/podcast. 
56:37
Otherwise, share this episode with people who would like it, review the episode on whatever platform you listen to it on, subscribe if you aren’t already a subscriber, but most importantly, just keep on tuning in. So grateful to have you listening and I hope I can continue to make episodes you love for years and years to come. Till next time, keep on rocking it out there and I’m looking forward to enjoying another round of the Super Data Science podcast with you very soon. 