13 minutes
SDS 860: DeepSeek R1: SOTA Reasoning at 1% of the Cost
Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn
DeepSeek-curious? This Five-Minute Friday is for you! Jon Krohn investigates the overwhelming overnight success of this new LLM, the product of a Chinese hedge fund. DeepSeek is a market newcomer, and yet it runs shoulder to shoulder with behemoths from OpenAI, Anthropic and Google like it’s all in a day’s work.
Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.
News of DeepSeek’s capabilities had an enormous effect on the economy, contributing to a significant fall in Nvidia’s share prices. Listen to the episode to hear just how significantly DeepSeek is disrupting the market, what DeepSeek’s comparatively low costs mean for that recently announced $500 billion Stargate project, and where we now stand on the map towards artificial general intelligence.
- SDS 707: Vicuña, Gorilla, Chatbot Arena and Socially Beneficial LLMs, with Prof. Joey Gonzalez
- SDS 748: The Five Levels of AGI
- SDS 778: Mixtral 8x22B: SOTA Open-Source LLM Capabilities at a Fraction of the Compute
- SDS 820: OpenAI’s o1 “Strawberry” Models
- deepseek-r1
- “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”
- “How Did DeepSeek Train Its AI Model On A Lot Less – And Crippled – Hardware?” by Timothy Prickett Morgan
- ChatBot Arena LLM Leaderboard
- DeepSeek GitHub
DID YOU ENJOY THE PODCAST?
- How do you think DeepSeek will change the landscape of AI development worldwide?
- Download The Transcript
Podcast Transcript
Jon Krohn: 00:05
This is episode number 860, on DeepSeek R1.
00:27
Welcome back to the SuperDataScience Podcast. I'm your host, Jon Krohn. Let's start off with a couple of recent reviews of the show, like we sometimes do on Fridays. The first one is from Remnasa, who provided an Apple Podcasts review saying that they listen often and that they've been listening to Super Data Science podcasts for a couple of years now. They always find the content fascinating, and they say that sometimes the content is a bit over their head, but you can bet that they look up the information and learn from every episode. Very cool. Nice to hear that, Remnasa.
00:59
And we had a second Apple Podcasts review as well. This one is from sailATX. It says the Super Data Science podcast is a fantastic way to keep up with the world of AI and the people who work in the industry. They say that the guests we have on the show are spot on and always interesting. They also have nice things to say about my YouTube calculus course and how it's helping them brush up on their math for a data science course that they're taking. Cool. Good luck with that course, sailATX, and I hope both of you, Remnasa and sailATX, continue to enjoy this show.
01:33
Thanks for all the recent ratings and feedback on Apple Podcasts, Spotify, and all the other podcasting platforms out there, as well as for likes and comments on YouTube videos. As a bit of friendly competition, which I mentioned for the first time a couple of weeks ago: regular listeners may know that I've guest co-hosted the excellent Last Week in AI podcast half a dozen times, and both regular hosts of that show, Andrey and Jeremie, have been my guests on the Super Data Science podcast. Well, despite their show being many years younger than the Super Data Science podcast, they are closing in on us in terms of number of Apple Podcasts ratings. At the time of recording, we have 286 and they're at 255, so we're staying ahead. And since I last mentioned this a couple of weeks ago, both podcasts have gained about five ratings each, so we're staying neck and neck, but it does seem like I need you to keep going at it and press toward 300 ratings there on Apple Podcasts. So help me stay ahead of Andrey and Jeremie by heading to your podcasting app and rating the Super Data Science podcast there. Bonus points if you leave written feedback; I'll be sure to read it on air like I did today.
02:40
All right, into the meat of today's episode now. In recent weeks, I'm sure you've noticed there's been a ton of excitement over DeepSeek, a Chinese AI company that was spun out of a Chinese hedge fund just two years ago. DeepSeek's V3 stream-of-consciousness, chatbot-style model caught the world's attention because it was able to perform near state-of-the-art models like OpenAI's GPT-4 and Google's Gemini 2.0 Flash, but it was DeepSeek's reasoning model that really made waves. It's kind of like OpenAI's o1 reasoning model: instead of having a stream of consciousness, these reasoning models review what they've been "thinking" before necessarily pumping something out, and this kind of reasoning has turned out to be great for the same kinds of tasks that you might ponder and reason over with pencil and paper: math problems, computer science problems, those kinds of things. You can hear more about these kinds of reasoning models in episode 820 of this show, but suffice it to say that DeepSeek's reasoning model, called R1, caused huge economic disruption, such as Nvidia's share price falling by 17% and the Nasdaq falling several percent last Monday. At the time of writing, DeepSeek's R1 reasoning model is statistically (that is, within a 95% confidence interval) tied for first place on the overall LM Arena leaderboard with the top models.
It's literally in first place, statistically speaking, alongside GPT-4o and Gemini 2.0 Flash from Google. The LM Arena leaderboard is one of many kinds of leaderboards that you could use to compare LLM performance, but it's particularly interesting because it involves humans blindly rating the performance of one model's output versus another's.
04:38
It's an interesting leaderboard, and you can hear a ton about it in episode number 707 of this podcast if you'd like to. Anyway, this great performance, being on top of the LM Arena leaderboard and other kinds of leaderboards out there, caught global attention first because DeepSeek is an obscure Chinese company while all the previous top models were devised by Americans, specifically by Bay Area tech giants. More consequentially than even great-power politics, however, DeepSeek's R1 caused a global economic tsunami because it is comparable in performance to the best OpenAI, Google, and Anthropic models while costing a fraction as much to train.
05:19
There are all kinds of complexities, externalities, and estimates to take into account when trying to compare costs between two different LLMs at two different companies. For example, what about the cost of training runs that didn't pan out? But speaking in rough approximations, training a single DeepSeek V3 or DeepSeek R1 model appears to cost on the order of millions of dollars, while training a state-of-the-art Bay Area model like o1, Gemini, or Claude 3.5 Sonnet reportedly costs on the order of hundreds of millions of dollars. So, about 100x more.
05:52
As I've stated on this show several times, even without conceptual scientific breakthroughs, simply scaling up the transformer architecture that underlies o1, Gemini, or Claude, such as by increasing training dataset size, increasing the number of model parameters, increasing training-time compute, or, in the case of reasoning models like o1, increasing inference-time compute, will lead to impressive LLM improvements that overtake more and more humans on more and more cognitive tasks and bring machines in the direction of artificial general intelligence. If you don't know what AGI is, you can check out episodes number 748 and 824 for more on all of what I just said in that last sentence. Implicit in this scaling statement, however, is that if researchers can devise major conceptual scientific breakthroughs with respect to how machines learn (so, you know, actually making scientific breakthroughs instead of just scaling things up), we could accelerate toward AGI even more rapidly.
06:54
If conceptual breakthroughs in AI model development can allow machines to improve their cognitive capabilities while also learning more efficiently, this would reduce server-farm energy consumption, reduce the loss of fresh water through server cooling, and, of course, save plain old financial costs associated with running AI models. DeepSeek has achieved such a conceptual breakthrough by combining a number of existing ideas, like mixture-of-experts models (you can learn more about those in episode 778), with brand-new major efficiencies, such as a GPU-communications accelerator called DualPipe that schedules the way data pass between the couple thousand GPUs DeepSeek appears to have trained R1 with, to get the breathtaking results that they did.
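To make the mixture-of-experts idea a bit more concrete, here is a minimal, hypothetical sketch of top-k expert routing in Python. It is not DeepSeek's implementation (their actual system layers on many refinements, including the DualPipe scheduling just mentioned); all sizes and names here are made up purely for illustration.

```python
# Toy mixture-of-experts (MoE) routing: for each token, a small "router"
# scores every expert and only the top-k experts actually run, so most of
# the model's parameters sit idle on any given token. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2                        # arbitrary toy sizes
router_w = rng.normal(size=(d_model, n_experts))            # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ router_w                                # one score per expert
    top = np.argsort(scores)[-top_k:]                        # indices of the k highest-scoring experts
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()   # softmax over the chosen experts only
    # Only the chosen experts do any work; the rest are skipped entirely.
    return sum(w * (token @ experts[i]) for w, i in zip(gate, top))

out = moe_layer(rng.normal(size=d_model))
print(out.shape)  # (16,) -- same dimensionality as the input token representation
```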
07:40
Now, 2,000 GPUs might sound like a lot, but it's again about 1% of the number of chips that Meta's Mark Zuckerberg and xAI's Elon Musk bragged about procuring in a given year for potentially training a single, ever-larger next large language model. I'm not going to go further into the technical details of the DeepSeek models in this episode, but if you'd like to dig into the technical aspects more deeply, I have provided a link to DeepSeek's full R1 paper as well as an exceptionally detailed, well-written blog post on the online tech news site The Next Platform that breaks down that paper.
08:19
Moving beyond technical aspects to geopolitics, DeepSeek's success demonstrates that the American sanctions that prevent Chinese firms from accessing the latest, most powerful Nvidia chips have been ineffective. These sanctions were explicitly designed to prevent China from being able to overtake the US on the road to AGI, particularly given the military implications of having access to a machine that could far exceed human cognitive capabilities. But now a Chinese firm has figured out how to approach US firms' AI capabilities with about 1% of the quantity of chips, at about 1% of the cost, and using less capable Nvidia chips than American firms have access to. On a side note, in a separate quandary for the Chinese Communist Party, for geopolitical reasons they'd probably prefer that DeepSeek's intellectual property be kept proprietary, and yet DeepSeek graciously open-sourced their work for the world to leverage and advance AI research, as well as AI application development.
09:13
All of the DeepSeek V3 and R1 source code and model weights are available on GitHub (I've got a link to that in the show notes), and all of that source code and those model weights are available for use under a highly permissive MIT license. Proprietary models like those from OpenAI, Google, Anthropic, and xAI are, on the other hand, closed in every respect, so that's another big positive for the AI community from the folks at DeepSeek. This level of openness from DeepSeek is far beyond even what so-called open LLMs like Meta's Llama family offer, because Meta provides model weights but not source code, and Meta's unusual license includes constraints such as limiting Llama model usage to companies with fewer than 700 million active users.
10:00
Beyond providing their models open source, DeepSeek also created an iOS app. It was number one in the Apple App Store at the time of recording this episode, but I would caution you against using the DeepSeek app because, per the app's privacy policy, anything you input into DeepSeek's app is collected by the company and stored on servers in China. If you'd like to privately use a DeepSeek model but don't want to spend much time or money getting the model weights downloaded and running on your own hardware, you can use a platform like Ollama. I've got a link in the show notes to the R1 model from DeepSeek provided by Ollama, so you can do just that.
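As an aside, here is a minimal, hypothetical sketch of what chatting with a locally hosted R1 model through Ollama's Python client could look like; the model tag and the prompt are placeholders, and this assumes you've already pulled the model and have the Ollama server running on your machine.

```python
# Hypothetical sketch: querying a locally hosted DeepSeek-R1 model via the
# Ollama Python client (pip install ollama), assuming the model has already
# been pulled (e.g., with `ollama pull deepseek-r1`) and Ollama is running.
import ollama

response = ollama.chat(
    model="deepseek-r1",  # placeholder tag; pick the size/variant you pulled
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
print(response["message"]["content"])  # the reply, including the model's reasoning trace
```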
10:35
Okay, so hopefully you're excited that you now have untethered access to state-of-the-art AI capabilities, but that should only be the beginning of your excitement. Markedly more efficient LLM training does call into question the recent $6 billion raises by OpenAI, xAI, and Anthropic, much of which would have been earmarked for training ever-larger transformer architectures for ever-longer inference time; it looks like those raises may no longer be very well-allocated capital. And DeepSeek's release ended up being coincidentally, but nevertheless comically, timed with the announcement of the $500 billion Stargate AI infrastructure project, which included the CEOs of OpenAI, Oracle, and SoftBank alongside Donald Trump. That enormous $500 billion Stargate figure probably only made sense when bean counters assumed LLMs would keep growing and growing by orders of magnitude in the coming years. And yeah, correspondingly, Nvidia's share price took a 17% hit in one day, although at the time of writing and recording this podcast episode, some of this share price hit had recovered.
11:44
But that share price took that 17% hit because shareholders realized the LLM size increases they'd baked into future GPU orders may no longer come to fruition. For most of us, though, certainly for me and probably for most listeners, markedly more efficient LLM training and a return to the open-source model that dominated AI research until just a few years ago is fabulous news. Increased LLM efficiency in particular means fewer environmental issues associated with AI, and it means that developing, training, and running AI models is more economical. Developing practical AI applications therefore becomes cheaper and more widely available for all to use and benefit from around the world. These are exciting times indeed. Dream up something big and make it happen. There's never been an opportunity to make an impact like there is today.
12:37
All right, that's it for today's episode. If you enjoyed it or know someone who might, consider sharing this episode with them, leave a review of the show on your favorite podcasting platform, tag me in a LinkedIn or Twitter post with your thoughts, and, if you aren't already, obviously subscribe to the show. Most importantly, however, I just hope you'll keep on listening. Until next time, keep on rockin' it out there and I'm looking forward to enjoying another round of the Super Data Science podcast with you very soon.