Podcasts SDS 759: Full Encoder-Decoder Transformers Fully Explained, with Kirill Eremenko

103 minutes
Artificial Intelligence, Data Science, Machine Learning

SDS 759: Full Encoder-Decoder Transformers Fully Explained, with Kirill Eremenko

Subscribe on Apple Podcasts, Spotify, Stitcher Radio or TuneIn

SuperDataScience Founder Kirill Eremenko returns to the SuperDataScience podcast, where he speaks with Jon Krohn about transformer architectures and why they are a new frontier for generative AI. If you’re interested in applying LLMs to your business portfolio, you’ll want to pay close attention to this episode!

Thanks to our Sponsors:

Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.

About Kirill Eremenko

Kirill is the founder of SuperDataScience and online educator who has created dozens of best-selling courses such as Data Science A-Z, Machine Learning A-Z and Artificial Intelligence A-Z. He is also a well-known instructor on Udemy where his courses have been enrolled in by over 2.5M students worldwide. Kirill is utterly passionate about the Ed-Tech Space and his goal is to deliver high-quality accessible education to everyone!

Overview

Kirill Eremenko presents “part 2” of his masterclass on transformer architecture. He begins the episode by recapping the function of decoders in transformer architecture, which he lays out in more detail in SDS 747. In a seminal paper on transformer architecture, “Attention is All You Need”, both decoders and encoders are used to facilitate learning tasks, such as translation, that demand contextual generation and alignment across the lexical and semantic structures of more than one language. So, for this episode, Kirill discusses how decoders and encoders work together.

Decoders in a transformer architecture, as covered in SDS 747, can be visualized as a five-storey building, where each level has a separate function necessary to the process of decoding. Here, Kirill visualizes the encoder as another distinct building and positions it next to the “five-storey” decoder building. The encoder is one storey short of the decoder building. Much like the decoder building, its storeys represent (1) input embedding, (2) positional encoding, (3) self-attention, and (4) feed-forward neural networks. Crucially, it does not have the decoder’s fifth storey – output – because only one result is necessary.

Using colorful examples that focus on translating sentences from English to Spanish, Kirill explains how encoders support decoders in returning accurate results. He also returns to the visual architecture of the encoder/decoder buildings to illustrate when and how the communication flows between these modules.

Listen to the episode to hear how encoders and decoders work together, the applications for encoder-decoder architectures, and how the SuperDataScience platform has become the hub for data scientists and machine learning engineers to learn about the tech industry’s groundbreaking developments as soon as they hit the market.

Large Language Models (LLM) A-Z Course

Most of the materials mentioned on this podcast are borrowed from Kirill & Hadelin’s newest course Large Language Models (LLM) A-Z. This is, by far, one of the best places to master LLMs – both in theory and hands-on application. Below are some previews from this course. Enroll today and get a head start into the future of your career!

Available ONLY on SuperDataScience, you can find this course at: www.www.superdatascience.com/llmcourse

In this episode you will learn:

How decoder-only transformers work [15:51]
How cross-attention works in transformers [41:05]
How encoders and decoders work together (an example) [52:46]
How encoder-only architectures excel at understanding natural language [1:20:34]
The importance of masking during self-attention [1:27:08]

Items mentioned in this podcast:

This episode is brought to you by Ready Tensor
This episode is brought to you by Oracle NetSuite CFO’s Ultimate KPI Checklist
This episode is brought to you by HPE Machine Learning Development Environment powered by Intel® Xeon® Scalable processors
SDS 747: Technical Intro to Transformers and LLMs, with Kirill Eremenko
Large Language Models (LLMs), Transformers & GPT A-Z
SuperDataScience Community
Kirill Eremenko on Udemy
GPT
Gemini
Llama
BERT
T5
BART
BloombergGPT
SDS 687: Generative Deep Learning, with David Foster
SDS 695: NLP with Transformers, feat. Hugging Face’s Lewis Tunstall
SDS 674: Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)
“Attention is All You Need” by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin
The Go-Giver by Bob Burg and John David Mann
The Super Data Science Podcast Team

Follow Kirill:

Follow Jon:

Episode Transcript:

Download The Transcript

Podcast Transcript

Jon Krohn: 00:00:03

This is episode number 759, with Kirill Eremenko, the founder and CEO of SuperDataScience. Today’s episode is brought to you by Ready Tensor, where innovation meets reproducibility, by Oracle NetSuite Business Software and by Intel and HPE Ezmeral Software.