Jon Krohn: 00:00:00 This is episode number 793 with Alex Andorra, co-founder and principal data scientist at PyMC Labs. Today’s episode is brought to you by Crawlbase, the ultimate data crawling platform.
00:00:16
Welcome to the Super Data Science Podcast, the most listened to podcast in the data science industry. Each week we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple.
00:00:47
Welcome back to the Super Data Science Podcast. I'm so happy to have the tremendous Alex Andorra as our guest on our show today, all about Bayesian statistics. Alex is co-founder and principal data scientist at PyMC Labs, a firm that develops PyMC, the leading Python library for Bayesian stats, and he consults with their clients to implement profit-increasing Bayesian models for them. He's also co-founder and instructor at an online learning platform called Intuitive Bayes, and that platform provides free Bayesian stats education. And he's creator and host of an excellent podcast, of course also on Bayesian stats, called Learning Bayesian Statistics.
00:01:26
Today’s episode will probably appeal most to hands-on practitioners like statisticians, data scientists, and machine learning engineers, but the episode also serves as an introduction to Bayesian statistics for anyone who’d like to learn about this important, unique, and powerful field. In today’s episode, Alex details what Bayesian stats is, the situations where Bayesian stats can solve problems that no other approach can, resources for learning Bayesian stats, the key Python libraries for implementing Bayesian models yourself, and how Gaussian processes can be incorporated into a Bayesian framework in order to allow for especially advanced and flexible models. All right, you ready for this tremendous episode? Let’s go.
00:02:10
Alex, welcome to the Super Data Science Podcast. I’m delighted to have you here. Such an experienced podcaster. It’s going to be probably fun for you to get to be the guest on the show today.
Alex Andorra: 00:02:22
Yeah, thank you, Jon. First, thanks a lot for having me on. I knew about your podcast, so I was both honored and delighted when I got your email to come on the show. I know you have had very honorable guests before like Thomas Wiecki, so I will try to be on par, but I know that’s going to be hard.
Jon Krohn: 00:02:46
Yeah, Thomas, your co-founder at PyMC Labs, was indeed a guest. He was on episode number 585, but that is not what brought you here. Interestingly, the connection, so you asked me before we started recording how I knew about you. A listener actually suggested you as a guest. So Doug McLean, thank you for the suggestion. Doug is lead data scientist at Tesco Bank in the UK, and he reached out to me and said, "Can I make a suggestion for a guest? Alex Andorra, like the country." I guess that's how you say it, because he put it in quotes. He's like, "Alex Andorra, like the country, hosts the Learning Bayesian Statistics Podcast. It's my other all-time favorite podcast." So there you go.
Alex Andorra: 00:03:35
Oh my god. Doug, I’m blushing.
Jon Krohn: 00:03:38
He says, “He’d be a fab guest for your show and not least because he moans from time to time about not getting invited onto other podcasts.”
Alex Andorra: 00:03:49
Did I? Oh my God. I don’t remember. But maybe that was part of a secret plan, Doug. Maybe a secret marketing LBS plan. Well, that worked perfectly.
Jon Krohn: 00:04:02
When I read that, I immediately reached out to you to see if you'd want to be on the show. I thought that was so funny. And he does say, he says, "Seriously though, he'd make a fab guest for his wealth of knowledge on data science and on Bayesian statistics." And so yes, we will be digging deep into Bayesian statistics with you today. You're the co-founder and principal data scientist of the popular Bayesian statistical modeling platform PyMC Labs, as we already talked about, with your co-founder Thomas Wiecki.
00:04:27
It’s an excellent episode if people want to go back to that and get a different perspective, obviously different questions we’ve made sure. But so if you’re really interested in Bayesian statistics, that is a great one to go back to. In addition to that, you obviously also have the Learning Bayesian Stats Podcast, which we just talked about, and you’re an instructor on the educational site, Intuitive Bayes. So tons of Bayesian experience. Alex, through this work, tell us about what Bayesian methods are and what makes them so powerful and versatile.
Alex Andorra: 00:05:00
Yeah. So first, thanks a lot, Doug, for the recommendation and for listening to the show. I am absolutely honored. Yeah, go and listen again to Thomas' episode. Thomas is always a great guest, so I definitely recommend anybody to go and listen to him. Now, what about Bayes? Yeah, it's been a long time since someone has asked me that. Because I have a Bayesian podcast, usually it's quite clear what I'm doing, so people are afraid to ask at some point. So instead of giving you kind of like… because there are two avenues here usually, I could give you the philosophical answer and why, epistemologically, Bayes stats makes more sense, but I'm not going to do that because…
Jon Krohn: 00:05:56
Oh, that sounds so interesting.
Alex Andorra: 00:05:59
Yeah, it is, it is, but we can go into that, but I think a better introduction is just a practical one, and that’s the one that most people get to know at some point, which is you’re working on something and you’re interested in uncertainty estimation and not only in the point estimates. And your data are crap and you don’t have a lot of them, and they are not reliable. What do you do? And that happens to a lot of PhD students. That happened to me when I started trying to do electoral forecasting.
00:06:39
I was at the time working at the French Central Bank, doing something completely different from what I'm doing today. But I was writing a book about the US at the time, 2016 it was, and it was a pretty consequential election for the US, so I was following it really closely. And I remember it was July 2016 when I discovered FiveThirtyEight's models. And then the nerd in me was awoken. It was like, oh my God, this is what I need to do. That's my way of putting more science into political science, which was my background at the time.
00:07:20
And when you do electoral forecasting, polls are extremely noisy. They’re not a good representation of what people think, but they are the best ones we have. There are not a lot of them, at least in France, in the US much more. It’s limited. It’s not a reliable source of data basically. And you also have a lot of domain knowledge, which in the Bayesian realm we call prior information. And so that’s a perfect setup for Bayesian stats. So that’s basically I would say what Bayesian stats is and that’s the power of it, is that you don’t have to rely only on the data.
00:08:02
Because sure, you can let the data speak for themselves, but what if the data are unreliable? Then you need something to guard against that and Bayesian stats are a great way of doing that. And the cool thing is that it’s a method, so it’s like you can apply that to any topic you want, any field you want. And that’s what I’ve done at PyMC Labs for a few years now with all the brilliant guys who are over there. You can do that for marketing, for electoral forecasting, of course, agriculture.
00:08:41
That was quite ironic when we got some agricultural clients, because historically, agriculture is like the field of frequentist statistics. That's how Ronald Fisher developed the P-value, the famous one. So when we had that, we were like, yes, we got our revenge. And of course, it's also used a lot in sports modeling, things like that. So yeah, that's the practical introduction.
Jon Krohn: 00:09:07
Nice. Yeah, a little bit of interesting history there is that Bayesian statistics is an older approach than the frequentist statistics that is so common and is the standard taught in college, so much so that it's just called statistics. You can do an entire undergrad in statistics and not even hear the word Bayesian, because Fisher so decidedly created a monopoly for this one kind of approach. For me, I first learned frequentist statistics in, I guess, my first year of undergrad in science.
00:09:48
And in that first year course, that idea of a P-value always seemed odd to me. This is such an arbitrary threshold of significance to have it be that this is a one in 20 chance or less that this would be observed by chance alone. And this means that therefore we should rely on it. Especially as we are in this era of large data sets and larger and larger and larger data sets, you can have no meaningful…
00:10:21
With very large data sets like we typically deal with today, you're always going to get a significant P-value, because of the slightest tiny change. If you take web-scale data, everything's going to be statistically significant. Nothing won't be. So it's such a weird paradigm. So discovering Bayesian statistics, and machine learning as well, and seeing how those areas didn't have P-values interested me in both of those things. Anyway, Fisher, it's interesting. I mean, I guess with small data sets, eight, 16, that kind of scale, I guess it kind of made some sense.
00:11:04
You pointed out there, I think, that it's this prior that makes Bayesian statistics so powerful, being able to incorporate prior knowledge. But simultaneously, that's also what makes frequentists uncomfortable. They're like, oh, we want only the data, as though the particular data that you collect and the experimental design… there are so many ways that you as the human are influencing things. There's no purity of data anyway. And so priors are a really elegant way to be able to adjust the model in order to point it in the right direction.
00:11:39
And so a really good example that I like to come to with Bayesian statistics is that you can allow some of your variables in the model to tend towards wider or narrower variance. So if there are some attributes of your model where you're very confident, where you know this is a physical fact of the universe, let's just have a really narrow variance on this and the model won't be able to diverge much there. But that then gives a strong focal point within the model around which the other data can make more sense, the other features can make more sense.
00:12:20
And you can allow those other features to have wider variance. I don't know, this is just one example that I try to give people when they're not sure about being able to incorporate prior knowledge into a model.
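Jon's narrow-versus-wide-prior idea can be sketched numerically with the closed-form normal-normal conjugate update. This is just an illustrative pure-Python sketch with made-up numbers, not code from PyMC or from the episode:

```python
def posterior_normal(prior_mu, prior_sd, data_mean, data_sd, n):
    """Conjugate update: normal prior on a mean, normal likelihood
    with known observation noise data_sd, over n observations."""
    prior_prec = 1 / prior_sd**2           # precision = 1 / variance
    data_prec = n / data_sd**2             # precision contributed by the data
    post_prec = prior_prec + data_prec
    post_mu = (prior_mu * prior_prec + data_mean * data_prec) / post_prec
    return post_mu, post_prec**-0.5

# Same data, two priors centered at 0: one very confident, one vague.
data_mean, data_sd, n = 5.0, 2.0, 4
narrow_mu, narrow_sd = posterior_normal(0.0, 0.1, data_mean, data_sd, n)
wide_mu, wide_sd = posterior_normal(0.0, 10.0, data_mean, data_sd, n)

# Narrow prior barely moves; wide prior lets the data dominate.
print(f"narrow prior -> mean {narrow_mu:.3f}, sd {narrow_sd:.3f}")
print(f"wide prior   -> mean {wide_mu:.3f}, sd {wide_sd:.3f}")
```

With these numbers, the narrow prior pins the posterior mean near 0.05 while the wide prior lands near 4.95: the confident prior acts as the "strong focal point" Jon describes, and the vague one lets the data speak.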
Alex Andorra: 00:12:32
Yeah, no, these are fantastic points, Jon. To build on that, of course, I’m a nerd, so I love the history of science. I love the epistemological side. A very good book on that is Bernoulli’s Fallacy by Aubrey Clayton. Definitely recommend his book. He was on my podcast episode 51, so if people want to give that a listen.
Jon Krohn: 00:13:01
Did you just pull that 51 out from memory?
Alex Andorra: 00:13:04
Yeah, yeah, I kind of know. But I have less episodes than you, so it’s like each episode is kind of my baby, so I’m like, oh yeah, 51 is Aubrey Clayton.
Jon Krohn: 00:13:13
Oh my goodness, that’s crazy.
Alex Andorra: 00:13:17
That’s also how my brain works, numbers. And actually episode 50 was with Sir David Spiegelhalter, I think the only knight we got on the podcast. David Spiegelhalter, exceptional guest, very, very good pedagogically, definitely recommend listening to that episode too, which is very epistemologically heavy. So for people who like that, the history of science, how we got there. Because as you were saying, Bayes is actually older than stats, but people discovered it later.
00:13:55
So it’s not because it’s older, that’s better, but it is way older actually by a few centuries. So yeah, fun stories here. I could talk about that still. But to get back to what you were saying, also as you were very eloquently saying, data can definitely be biased, because that idea of like, oh no, we only want the data to speak for themselves. But as I was saying, yeah, what if the data are unreliable? But as you were saying, what if the data are biased? And that happens all the time.
00:14:28
And worse, I would say these biases are most of the time implicit, in the sense that either they are hidden, or you don't even know you are biased in some direction, because it's a result of your education and your environment. So the good thing about priors is that they force your assumptions, your hidden assumptions, to be explicit.
00:14:56
And that I think is very interesting also, especially when you work on models which are supposed to have a causal explanation and which are not physical models, but more social models or political science models. Well then, it's really interesting to see how two people can have different conclusions based on the same data. It's because they have different priors. And if you force them to make these priors explicit in their models, they would definitely have different priors. And then you can have a more interesting discussion actually, I think.
00:15:28
There's that. And then I think the last point that's interesting in why you would be interested in this framework is that causes are not in the data. Causes are outside of the data. The causal relation between X and Y, you're not going to see it in the data. Because if you do a regression of income on education, you're going to see an effect of education on income. But you as a human, you know that the effect has to be that education has an impact on income.
00:16:09
But the computer might as well just run the regression the other way, regress education on income, and tell you, oh, income causes education. But no, it's not going that way. So the statistical relationship goes both ways, but the causal one only goes one direction, and that's a hidden reference to my favorite music band. But yeah, it only goes one direction and it's not in the data. And you have to have a model for that, and a model is just a simplification of reality.
00:16:44
We try to get a simple enough model, which usually isn't simple, but it's a simplification. And if you say it's a construction and a simplification, that's already a prior in a way. So you might as well just go all the way and make all your priors explicit.
Jon Krohn: 00:17:01
Hello, Super Data Science Podcast listeners. Today's episode is brought to you by Crawlbase, the premier data crawling and scraping platform designed for anyone needing reliable data. Forget about hardware, infrastructure, proxies, setup, blocks, and CAPTCHAs. With Crawlbase, they handle all of that for you. Simply call the Crawlbase API and gather website data. They literally support millions of different websites. It's super easy. Try out Crawlbase today. As an offer to Super Data Science Podcast listeners, use our exclusive code "SUPERDATASCIENCE", with no spaces, to unlock 10,000 free requests, a value of $42. You'll also find the code in the podcast description. Head over to Crawlbase.com and start crawling in minutes.
00:17:48
Well said. Very interesting discussion there. You used a term a number of times already in today’s podcast, which maybe is not known to all of our listeners. What is epistemology? What does that mean?
Alex Andorra: 00:18:00
Oh, right. Yeah, very good question. So epistemology is, in a sense, the science of science. It's understanding how we know what we say we know. So for instance, how do we know the earth is round? How do we know about relativity? Things like that. So it's a scientific discipline that's very close to philosophy; it's actually, I think, a branch of philosophy, and it's trying to come up with methods to understand how we can come up with new scientific knowledge.
00:18:42
And by scientific here we usually mean reliable and reproducible, but also falsifiable. Because for a hypothesis to be scientific, it has to be falsifiable. Basically that’s that. Lots of extremely interesting things here. But yeah, that’s basically how do we know what we know, and that’s the whole trying to define the scientific method and things like that.
Jon Krohn: 00:19:16
Going off on a little bit of a tangent here, but it’s interesting to me how I think among non-scientists, laypeople in the public, science is often seen to be infallible as though science is real. Science is the truth. Since that 2016 election, people have lawn signs in the US that basically have a list of liberal values, most of which I’m a huge fan of. And of course, I like the sentiment, this idea that they’re supporting science on the sign as well. But it says, the way that they phrase it is, science is real.
00:20:05
And the implication there for me, every time I see the sign, is that… I think that could be, for example, related to vaccines. There was a lot of conflict around vaccines and what their real purpose is. And so the lay liberal person is like, this is science, trust science, it's real. Whereas from the inside, you pointed it out already there, but it's this interesting irony that the whole point of science is that we're saying, I'm never confident of anything. I'm always open to this being wrong.
Alex Andorra: 00:20:44
Yeah, no, exactly. And I think that’s the distinction that’s often made in epistemology actually between science on one hand and research on the other end, where research is science in the making. So science is the collective knowledge that we’ve accumulated since basically the beginning of modern science, at least in the western hemisphere. So more or less during the Renaissance. And then research is, well, people making that science because people have to do that. And how do we come up with that?
00:21:22
So yeah, definitely I’m one who always emphasizes the fact that, yeah, now we know the earth is round. We know how to fly planes. But there was a moment we didn’t. And so how did we come up with that? And actually maybe one day we’ll discover that we were doing it kind of the wrong way, flying planes, but it’s just like for now it works. We have the best model that we can have right now with our knowledge, but maybe one day we’ll discover that there is a way better way to fly.
00:22:01
And it was just there staring at us, and it took years for us to understand how to do that. But yeah, as you were saying, that's a really hard line to walk, because you have to say, yeah, this knowledge, these facts are really trustworthy, but you can never trust something 100%. Because otherwise, mathematically, if you go back to Bayes' formula, you actually cannot update your knowledge.
00:22:35
If you have a 0% or 100% prior, mathematically, you cannot apply Bayes' formula, which tells you, well, based on the new data that you just observed, the most rational way of updating your belief is to believe that with that certainty. But if you have zero or 100%, it's never going to be updated. So you can say it's 99.9999% that what we're doing right now by flying is really good, but maybe, you never know, there is something that will appear. And physics is a really…
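Alex's point about 0% and 100% priors falls straight out of Bayes' formula. A tiny pure-Python sketch, with hypothetical likelihood numbers chosen just for illustration:

```python
def bayes_update(prior, like_h, like_not_h):
    """Posterior P(H | data) from prior P(H), likelihood P(data | H),
    and likelihood under the alternative P(data | not H)."""
    return prior * like_h / (prior * like_h + (1 - prior) * like_not_h)

# A moderate prior moves a lot under strong evidence for H:
# prior 0.5 becomes posterior 0.9.
print(bayes_update(0.5, 0.9, 0.1))

# But a 0% or 100% prior is mathematically frozen: no evidence,
# however strong, can ever update it.
print(bayes_update(0.0, 0.99, 0.01))  # stays 0.0
print(bayes_update(1.0, 0.01, 0.99))  # stays 1.0
```

This is exactly why the "99.9999% but never 100%" stance matters: any prior strictly between 0 and 1 can be moved by data, while the extremes cannot.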
Jon Krohn: 00:23:13
We’ve all seen UFOs, Alex. We know that there’s better ways to fly.
Alex Andorra: 00:23:16
Yeah, exactly. But yeah, I think physics is actually a really good field for that, because it's always evolving and it's always coming up with really completely crazy, paradigm-shifting explanations. Like relativity, special relativity, then general relativity, just a century ago, that didn't exist. And now we start to understand a bit better. But even now we don't really understand how to blend relativity and quantum mechanics. That's extremely interesting to me. But yeah, I understand that politically, from a marketing standpoint, it's hard to sell, but I think it's shooting yourself in the foot if you're saying, "Oh, yeah, science is always like…" Science works, I agree.
00:24:12
Science works, but it doesn't have to be 100% true and sure for it to work. That's why placebos work, right? It's just something that works, even though there isn't any actual concrete evidence that it's adding something, but it works. So yeah, I think it's really shooting yourself in the foot by saying, "No, that's 100%, and if you question science, then you are anti-science." No, actually, the whole scientific method is to be able to ask questions all the time. The question is, how do you do that? Do you apply the scientific method to your questions, or do you just question anything without any method, just because you fancy questioning it because it goes against your beliefs to begin with?
00:25:02
So yeah, that's one thing. And then another thing that you said I think is very interesting is that, unfortunately, the way of teaching science and communicating around it is not very embodied. It's quite dry. You just learn equations and you just learn that stuff. Whereas science was made by people, and is made by people, who have their biases, who have extremely violent conflicts. Like you were saying, Fisher was just a huge jerk to everybody around him.
00:25:36
And I think it would be interesting to get back to a bit of that human side to make science less dry and also less intimidating, because most of the time when I tell people what I do for a living, they get super intimidated and they're like, "Oh, my God. Yeah, I hate math, I hate stats." But it's just numbers, it's just a language. So it's a bit dry. For instance, if there is someone in your audience who is into movies, who makes movies, I want to know why there is no movie about Albert Einstein. There has to be a movie about Albert Einstein. Not only a huge genius, but an extremely interesting life. Honestly, it makes for a great movie. He was working-
Jon Krohn: 00:26:27
Like a dramatized biopic, you mean?
Alex Andorra: 00:26:29
Yeah, yeah, yeah. I mean, his life is super interesting.
Jon Krohn: 00:26:29
That is crazy.
Alex Andorra: 00:26:30
He revolutionized two fields of physics, and actually chemistry, in one year, 1905. It's like his big year. And he came up with the ideas for relativity while working at the patent office in Bern, in Switzerland, which was an extremely boring job. In his words, it was an extremely boring job. And basically, having that boring job allowed him to do that, being completely outside of the academic circles and so on. So it makes for a perfect movie. I don't understand why it's not there. And then, icing on the cake, he had a lot of women in his life. So it's perfect. You have the sex, you have the drama, you have revolutionizing the field, you have the Nobel Prize. And then he became a pop icon. I don't know where the movies are.
Jon Krohn: 00:27:32
Yeah, it is wild actually, now that you point it out. It's kind of surprising that there aren't movies about him all the time, like Spider-Man.
Alex Andorra: 00:27:41
Yeah, I agree. Well, there was one about Oppenheimer last year maybe that started a trend. We’ll see.
Jon Krohn: 00:27:48
Yeah, yeah, yeah. So in addition to the podcast, you also, I mentioned this at the outset, I said that you're co-founder and principal data scientist of the popular Bayesian stats modeling platform, PyMC. Like so many things in data science, it's uppercase P, lowercase y for Python. What's the MC? PyMC, one word, with the M and the C capitalized.
Alex Andorra: 00:28:13
So it's very confusing because it stands for Python, and then MC is Monte Carlo. So I understand, but why Monte Carlo? It's because it comes from Markov chain Monte Carlo. So actually it should be PyMCMC, or PyMC squared, which is what I've been saying since the beginning, but anyway. Yeah, it's actually PyMC squared, for Markov chain Monte Carlo. And Markov chain Monte Carlo, there are other algorithms now, newer ones, but the blockbuster algorithm to run Bayesian models is MCMC.
Jon Krohn: 00:28:57
So in the same way that stochastic gradient descent is like the de facto standard for finding your model weights in machine learning, Markov chain Monte Carlo is kind of the standard way of doing it with a Bayesian model.
Alex Andorra: 00:29:12
Yeah, yeah, yeah. And so now there are newer versions, more efficient versions. That's basically the name of the game, making the algorithm more and more efficient. But the first algorithm dates back, I think it was actually invented during the Manhattan Project, so during World War II.
Jon Krohn: 00:29:32
Theme of the day.
Alex Andorra: 00:29:35
Yeah. And lots of physicists actually… Statistical physics is a field that's contributed a lot to MCMC. Physicists came to the field of statistics trying to make the algorithms more efficient for their models, and they have contributed a lot. The field of physics has contributed a lot of big names and great leaps in the realm of more efficient algorithms. And so, I don't know who your audience is, but that may sound boring. The algorithm is like the workhorse, but it's extremely powerful. And that's also one of the main reasons why Bayesian statistics is increasing in popularity lately. Because I'm going to argue that it's always been the best framework to do statistics, to do science, but it was hard to do with pen and paper. The problem is that you have a huge, nasty integral in the denominator.
00:30:45
And this integral is not computable by pen and paper. So for a long, long time, Bayesian statistics was relegated to the margins, because it was just super hard to do. For problems other than very trivial ones, it was not very applicable. But now, with the advent of personal computing, you have these incredible algorithms. So now most of the time it's HMC, Hamiltonian Monte Carlo. That's what we use under the hood with PyMC. But if you use Stan, if you use NumPyro, it's the same. And thanks to these algorithms, now we can make extremely powerful models, because we can approximate the posterior distributions thanks to, well, computing power. A computer is very good at computing. I think that's why it's called that.
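To make the MCMC idea concrete, here is a deliberately minimal random-walk Metropolis sampler in pure Python for a toy coin-bias posterior (7 heads in 10 flips, uniform prior). This is only a sketch of the classic algorithm; PyMC, Stan, and NumPyro use the far more efficient HMC/NUTS samplers Alex mentions:

```python
import math
import random

def log_post(theta):
    """Unnormalized log posterior: binomial likelihood for 7 heads
    in 10 flips, with a uniform prior on theta."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return 7 * math.log(theta) + 3 * math.log(1.0 - theta)

random.seed(42)
theta, samples = 0.5, []
for step in range(20_000):
    proposal = theta + random.gauss(0.0, 0.1)  # random-walk proposal
    accept_prob = math.exp(min(0.0, log_post(proposal) - log_post(theta)))
    if random.random() < accept_prob:          # Metropolis accept/reject
        theta = proposal
    if step >= 2_000:                          # discard burn-in
        samples.append(theta)

mean = sum(samples) / len(samples)
print(f"posterior mean estimate: {mean:.3f}")  # analytic answer is 8/12
```

The chain's sample mean should land close to the exact Beta(8, 4) posterior mean of 8/12. That ability to approximate an intractable posterior by sampling, instead of solving the nasty integral in the denominator, is exactly the trick being described.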
Jon Krohn: 00:31:52
Since April, I've been offering my Machine Learning Foundations curriculum live online via a series of 14 training sessions within the O'Reilly platform. My curriculum provides all the foundational knowledge you need to understand modern ML applications, including deep learning, LLMs, and AI in general. The linear algebra and calculus classes are in the rearview mirror, but the probability, statistics, and computer science classes are all still to come. Registration for both of the probability classes is open now. We've got links in the show notes to those, and these will cover all of the essential probability theory you need for statistics applications as well as machine learning. Intro to Probability Theory will be on July 23rd. Probability Level Two will be on August 7th. If you don't already have access to O'Reilly, you can get a free 30-day trial via our special code, also in the show notes.
00:32:39
Yes. And so that reminds me of deep learning. It’s a similar kind of thing where the applications we have today, like your ChatGPT or whatever your favorite large language model is, these amazing video generation like Sora, all of this is happening thanks to deep learning, which is an approach we’ve had since the ’50s. Certainly not as old as Bayesian statistics, but similarly, it has been able to take off with much larger data sets and much more compute.
Alex Andorra: 00:33:15
Yeah, yeah, yeah, yeah, very good point. And I think that's even more the point in deep learning for sure, because Bayesian stats doesn't need the scale, but the way we're doing deep learning for now definitely needs the scale, yeah.
Jon Krohn: 00:33:25
Yeah, yeah. Scale of data.
Alex Andorra: 00:33:27
Yeah, yeah, exactly. Yeah, sorry. Yeah, the scale, because there are two scales, data and computing. Yeah, yeah, you're right.
Jon Krohn: 00:33:33
And for model parameters. And so that has actually, I mean, tying back to something you said near the beginning of this episode is that actually one of the advantages of Bayesian statistics is that you can do it with very few data, maybe fewer data than with a frequentist approach or machine learning approach, because you can bake in your prior assumptions and those prior assumptions give some kind of structure, some kind of framework for your data to make an impact through.
Alex Andorra: 00:33:58
Yeah, yeah, completely.
Jon Krohn: 00:34:01
So for our listeners who are listening right now, if they are keen to try out Bayesian statistics for the first time, why should they reach for PyMC, which as far as I know, is the most used Bayesian framework period, and certainly in Python. And then the second I’m sure is Stan. And so why should somebody use PyMC, and maybe even more generally, how can they get started if they haven’t done any Bayesian statistics before at all?
Alex Andorra: 00:34:35
Yeah, yeah, yeah, fantastic question. I think it's a very good one, because that can also be very intimidating. And actually there can be a paradox of choice, where now we're lucky to live in a world where we actually have a lot of probabilistic programming languages. So you'll see that sometimes that's called a PPL. And what's a PPL? PyMC is basically one. It's software that enables you to write down Bayesian models and sample from them. So it's just a fancy word for that.
00:35:10
And yeah, my main advice is don't overthink it. If you are proficient in R, then I would definitely recommend trying brms first, because it's built on top of Stan, and Stan is extremely good. It's built by extremely good modelers and institutions. Lots of them have been on my podcast, so if you're curious, just go on the website, you look for Stan, and you'll get a lot of them. The best one is, most of the time, Andrew Gelman, absolutely amazing to have him on the show. He always explains stuff extremely clearly. But I also had Bob Carpenter, for instance, Matt Hoffman. So anyway, if you know R, I would… Yeah.
Jon Krohn: 00:36:04
Have you ever had Rob Trangucci on the show, or do you know who he is?
Alex Andorra: 00:36:08
I know, but I have never had him on the show, but I’d be happy to. If you know him [inaudible 00:36:13].
Jon Krohn: 00:36:13
Yeah, I'll make an introduction for you. He was on our show in episode number 507, and that was our first ever Bayesian episode, and it was the most popular episode of that year, 2021. And it was interesting, because up until that time, at least with me hosting, and 2021 was my first year hosting the show, it was by far our longest episode. And that was kind of concerning for me. I was like, "This was a super technical episode, super long. How is this going to resonate?" Turns out that's what our audience loves, and that's something we've been leaning into a bit in 2024: more technical, longer.
Alex Andorra: 00:36:58
Well, that’s good to know. Yeah.
Jon Krohn: 00:37:01
Yeah, yeah. I’ll make an intro for Rob. Anyway, you were saying? I completely interrupted you.
Alex Andorra: 00:37:04
Yeah, no, but great interruption for sure. I’m happy to have that introduction, mate. Mate, thanks a lot. Yeah, so I was saying if you’re proficient in R, definitely give a try to brms. It’s built on top of Stan. Then when you outgrow brms, go to Stan. If you love Stan, but you’re using Python, there is PyStan. I’ve never used that personally, but I’m pretty sure it’s good.
00:37:37
But I would say if you’re proficient in Python and don’t really want to go to R, then yeah, you probably want to give a try to PyMC or to NumPyro. Give that a try. See what resonates most with you, the API most of the time, because if you’re going to make models like that, you’re going to spend a lot of time on your code and on your models. And as most of your audience probably knows, the models always fail unless it’s the last one. So honestly, yeah, you really have to love the framework you’re using and find it intuitive. Otherwise, it’s going to be hard to keep it going.
00:38:21
If you’re really, really a beginner, I would also recommend, in the Python realm, giving a try to Bambi, which is the equivalent of brms, but in Python. So Bambi is built on top of PyMC, and what it does is make a lot of the choices for you under the hood. So priors, stuff like that, which can be a bit overwhelming for beginners at the beginning. But then when you outgrow Bambi and you want to make more complicated models, then go to PyMC.
Jon Krohn: 00:38:56
Bambi, that’s a really cute name for a model that’s just like, it just drops out of its mother and can barely stand up straight.
Alex Andorra: 00:39:05
Yeah, yeah. And the guys working on Bambi, so Tommy Capretto, Osvaldo Martin, they’re really great guys, both Argentinians actually. And yeah, they’re fun guys. I think the website for Bambi is bambinos.github.io, so yeah, these guys are fun. But yeah, it’s definitely a great framework. And actually this week, with Tommy Capretto and Ravin Kumar, we released an online course, our second online course, that we’ve been working on for two years. So we are very happy to have released it, but we’re also very happy with the course. That’s why it took so long. It’s a very big course, and that’s exactly what we do. We take you from beginner, we teach you Bambi, we teach you PyMC, and you go up until advanced. It’s called Advanced Regression. So we teach you all things regression, and at the time-
Jon Krohn: 00:40:11
What’s the course called?
Alex Andorra: 00:40:12
Advanced Regression.
Jon Krohn: 00:40:13
Advanced Regression.
Alex Andorra: 00:40:16
Yeah, Advanced Regression, on the Intuitive Bayes platform that you were kind enough to mention at the beginning.
Jon Krohn: 00:40:22
Nice. Yeah, I’ll be sure to include that in the show notes. And so even though it’s called Advanced Regression, you start us off with an introduction to Bayesian statistics, and we start getting our feet wet with Bambi before moving on to PyMC. Yeah?
Alex Andorra: 00:40:37
Yeah, yeah, yeah. So you have a regression refresher at the beginning. If you’re a complete, complete beginner, then I would recommend taking our intro course first, which really starts from the ground up. The Advanced Regression course, well, ideally you would do that after the intro course, but if you’re already there in your learning curve, then you can start with the Advanced Regression course. It makes a few more assumptions on the student’s part. Yeah, they have heard about Bayesian stats, they’re aware of the ideas of priors, likelihoods, posteriors, but we give you a refresher about classic regression, so it’s like when you have a normal likelihood.
00:41:19
And then we teach you how to generalize that framework to data that’s not normally distributed. And we start with Bambi. We show you how to do the equivalent models in PyMC, and then at the end, when the models become much more complicated, then we just show it in PyMC.
Jon Krohn: 00:41:37
Nice. That is super, super cool. I hope to be able to find time to dig into that myself soon. It’s one of those things…
Alex Andorra: 00:41:46
Oh, yeah.
Jon Krohn: 00:41:46
You and I were lamenting this before the show, podcasting in of itself can take up so much time on top of, in both of our cases, we have full-time jobs. This is something that we’re doing as a hobby, and it means that I’m constantly talking to amazingly interesting people like you who have developed fascinating courses that I want to be able to study. And it’s like, when am I going to do that? Book recommendations alone, I barely get to read books anymore. That was something like since basically the pandemic hit… And it’s so embarrassing for me because I identify in my mind as a book reader, and sometimes I even splurge. I’m like, “Wow, I’ve got to get these books that I absolutely must read,” and they just collect in stacks around my apartment.
Alex Andorra: 00:42:33
Yeah, yeah, yeah. I mean, that’s hard for sure. Yeah, it’s something I’ve also been trying to get under control a bit. So a guy who does good work on that, I find, is Cal Newport. You probably know him.
Jon Krohn: 00:42:54
Yes, Cal Newport, of course. Yeah. I’ve been collecting his books too.
Alex Andorra: 00:43:00
Yeah, that’s the irony. Yeah, yeah. So he’s got a podcast. I don’t know about you, but me, I listen to tons of podcasts, so the audio format is really something I love, so podcasts and audiobooks. So yeah, that can be your entrance here. Maybe you can listen to more books if you don’t have time to [inaudible 00:43:18].
Jon Krohn: 00:43:18
Yeah, it’s interesting… I don’t really have a commute, and when I’m traveling to the airport or something, I use that as an opportunity to do catch-up calls and that kind of thing. So it’s interesting, I listen to almost no other podcasts.
Alex Andorra: 00:43:38
Okay.
Jon Krohn: 00:43:39
The only show I listen to is Last Week in AI. I don’t know if you know that show.
Alex Andorra: 00:43:43
Yeah, yeah, yeah. Great show.
Jon Krohn: 00:43:45
I like them a lot. They put a lot of work into… Jeremie and Andrey do a lot of work to get all of the last week’s news concentrated in there. And so…
Alex Andorra: 00:43:55
It’s impressive.
Jon Krohn: 00:43:56
… it allowed me to flip from being this person where prior to finding that show… And I found it because Jeremie was a guest on my show. He was an amazing guest, by the way. I don’t know if he’d have much to say about Bayesian statistics, but he’s an incredibly brilliant person. He is so enjoyable to listen to. And someone else that I’d love to make an intro for you. He’s become a friend over the years.
Alex Andorra: 00:44:20
Yeah, for sure.
Jon Krohn: 00:44:21
Yeah. Last Week in AI they… I don’t know why I’m talking about it so much, but I went from being somebody who would kind of have this attitude when somebody would say, “Oh, have you heard about this release or that?” And I’d say, “Just because I work in AI, I can’t stay on top of every little thing that comes out.” And now, since I started listening to Last Week in AI about a year ago, I don’t think anybody’s caught me off guard with some new release. I’m like, “Yeah, I know.”
Alex Andorra: 00:44:52
Yeah, well done. Yeah, yeah, no, that’s good. Yeah, yeah. But that makes your life hard, yeah, for sure. If you don’t have a commute, come on.
Jon Krohn: 00:45:02
But I’d love to be able to completely submerge myself in Bayesian statistics. That’s a life goal of mine is to be able to completely… Because while I have done some Bayesian stuff, and in my PhD, I did some Markov chain Monte Carlo work, and there’s just obviously so much flexibility and nuance to this space. It can do such beautiful things. I’m a huge fan of Bayesian stats, and so yeah, it’s really great to have you on the show talking about it. So PyMC, which we’ve been talking about now, kind of going back to our thread. PyMC uses something called PyTensor to leverage GPU acceleration and complex graph optimizations. Tell us about PyTensor and how this impacts the performance and scalability of Bayesian models.
Alex Andorra: 00:45:57
Yeah, yeah, great question. So basically, the way PyMC is built, we need a backend, and historically this has been a complicated topic, because the backend is where the computation happens. Otherwise, you have to do the computations in Python, and that’s slower than doing it in C, for instance. And so we still have that C backend that’s kind of a historical remnant, but more and more we’re using… and when I say we, I don’t do a lot of PyTensor code, to be honest. I mean, contributions to PyTensor; I mainly contribute to PyMC. PyTensor is spearheaded a lot by Ricardo Vieira, a great, great guy, extremely, extremely good modeler.
00:46:54
And basically, the idea of PyTensor is to kind of outsource the computation that PyMC is doing. And then, especially when you’re doing the sampling, PyTensor is going to delegate that to some other backends. And so now, instead of having just the C backend, you can actually sample your PyMC models with the Numba backend. How do you do that? You use another package called nutpie that’s been built by Adrian Seyboldt, an extremely brilliant guy. Again, I’m surrounded by guys who are much more brilliant than me. And that’s how I learn, basically. I just ask them questions.
Jon Krohn: 00:47:46
That’s what I feel like in my day job at Nebula, my software company. I’m just like, “Wow.” Yeah. Anyway, sorry, I’m just completely interrupting you.
Alex Andorra: 00:47:58
Yeah, no. And so, yeah, so Adrian basically re-implemented HMC in nutpie, but using Numba and Rust. And so that goes way faster than just using Python or even just using C. And then you can also sample your models with two other backends that we have, enabled by PyTensor, which basically compiles the graph of the model and then delegates the computational operations to the sampler. And then the sampler, as I was saying, can be the one from nutpie, which is in Rust and Numba, and otherwise, it can be the one from NumPyro.
00:48:45
Actually, you can call the NumPyro sampler with a PyMC model and it’s just super simple. In pm.sample, there’s a keyword argument that’s nuts_sampler, and you just say nutpie or NumPyro. And I tend to use NumPyro a lot when I’m doing Gaussian processes because, I don’t know why… so most of the time I’m using nutpie, but when I’m doing Gaussian processes somewhere in the model, I tend to use NumPyro, because for some reason in their routine, in their algorithm, there is some efficiency in the way they compute the matrices, and GPs are basically huge matrices and dot products.
00:49:25
And so yeah, usually NumPyro is very efficient for that. And you can also use JAX now to sample your models. So we have these different back-ends and it’s enabled because PyTensor is that back-end that nobody sees. Most of the time you’re not implementing a PyTensor operation in your models. Sometimes we do that at PyMC Labs when we’re working on a very custom operation, but usually it’s done under the hood for you. And then PyTensor compiles the graph, the symbolic graph, and can dispatch that afterwards to whatever the best way of computing the posterior distribution afterwards is.
Jon Krohn: 00:50:08
Nice. You alluded there to something that I’ve been meaning to ask you about, which is the PyMC Labs team. So you have PyMC, the open source library that anybody listening can download, and of course I have it in the show notes for people to download so they can get rolling on doing their Bayesian stats right now, whether it’s already something they have expertise in or not. PyMC Labs, it sounds like, and just fill us in, but I’m gathering that the team there is responsible both for developing PyMC, but also for consulting, because you mentioned there sometimes we might do some kind of custom implementation. So first of all, yeah, tell us a little bit about PyMC Labs, and then it’d be really interesting to hear one or more interesting examples of how Bayesian statistics allows some client or some use case to do something that they wouldn’t be able to do with another approach.
Alex Andorra: 00:51:09
Yeah. So yeah, first, go and star PyMC on GitHub and open PRs and stuff like that. We always love that. And second, yeah, exactly, PyMC Labs is kind of an offspring of PyMC in the sense that everybody on the team is a PyMC developer, so we contribute to PyMC. This is open source, this is free, and always will be. But then on top of that, we do consulting, and what’s that about? Well, most of the time these are clients who want to do something with PyMC, or more generally with Bayesian statistics, and they know we do that and they don’t know how to do it themselves. Either because they don’t have the time to train themselves, or they don’t want to, or they don’t have the money to hire a Bayesian modeler full-time. Various reasons. But basically, yeah, at some point in the modeling workflow, they are stuck.
00:52:22
It can be at the very beginning, or it can be, “Well, I’ve tried a bunch of stuff, I can’t make the model converge and I don’t know why.” So it can be a very wide array of situations. Most of the time people know us, like me from the podcast or from PyMC, most of the other guys from PyMC or from the technical writing that they do around it. So basically that’s, like, not really a real company, but just a bunch of nerds, if you will. But no, it’s a real company, but we like to define ourselves as a bunch of nerds, because that’s how it really started.
Jon Krohn: 00:53:03
And in a sense of you guys actually consulting with companies and making an impact, in that sense, it is certainly a company. So yeah, so tell us a bit about projects. I mean, you don’t need to go into detail with client names or whatever if that’s inappropriate, but it would be interesting to hear some examples of use cases of Bayesian statistics in the wild enabling capabilities that other kinds of modeling approaches wouldn’t.
Alex Andorra: 00:53:31
Yeah, yeah, yeah. No, definitely. Yeah, so of course I cannot enter into the details, but I can definitely give you some ideas. One I can actually enter into the details of is a project we did for an NGO in Estonia, where they were getting polling data. So every month they do a poll of Estonian citizens on various questions. These can be horse-race polls, but also news questions, like: do you think Estonia should ramp up the number of soldiers at the border with Russia? Do you think same-sex marriage should be legal? Things like that.
Jon Krohn: 00:54:24
Oh, I hear some Overton window coming on.
Alex Andorra: 00:54:31
Do you?
Jon Krohn: 00:54:31
That’s what I thought. I thought we might go there. Now I’m completely taking you off on a sidetrack, but Serg Masis, our researcher, came up with a great question for you, because you had Allen Downey on your show, who is an incredible guest. I absolutely loved having him on our program. He was on here in episode number 715, and in that episode we talked about the Overton window, which is related to what you were just talking about. How does society think about, say, same-sex marriage? If you looked a hundred years ago, or a thousand years ago, or 10,000 years ago, or a thousand years into the future, or 10 years into the future, at each of those different time points there’s a, well, maybe not completely different, but a varying range of what people think is acceptable or not acceptable. And we were talking earlier in the episode about bias, so it kind of ties into this. You might have your idea, as a listener to this show, you might be a scientist or an engineer and you think, “I am unbiased, I know the real thing,” but you don’t, because you are a product of your times. And the Overton window is a way of describing how, on any given issue, there is some range. It would fit a probability distribution, where some people are on a far extreme one way and some people are on a far extreme the other way, but in general, all of society is moving in one direction, typically a liberal direction on a given social issue, and this varies by region, it varies by age. Anyway, I think Overton windows are really fascinating. So, I completely derailed your conversation, but I have a feeling you’re going to have something interesting to say.
Alex Andorra: 00:56:33
Yeah, yeah, no, I mean that’s related to that for sure. Yeah, basically, and that’s funny because yeah, I had also Allen Downey on this show for his latest book, and that was also definitely about that.
Jon Krohn: 00:56:50
Yeah. “Probably Overthinking It” was the book.
Alex Andorra: 00:56:53
Yeah. Yeah, yeah, yeah.
Jon Krohn: 00:56:54
That’s great. Great, great, great.
Alex Andorra: 00:56:55
And yeah, great book. So basically, this NGO had this survey data, right, and their clients have questions, and their clients are usually media or politicians. And it’s like, yeah, but I’d like to know, on a geographical basis, in this electoral district, what do people think about that? Or in this electoral district, what do educated women of that age think about same-sex marriage? That’s hard to do, because polling at that scale is almost impossible. It costs a ton of money, and also polling is getting harder and harder because people answer polls less and less. And so at the same time as polling data becomes less available and less reliable, you have people who get more interested in what the polls have to say. It’s hard.
00:58:02
And there is a great method to do that. So what we did for them is come up with a hierarchical model of the population because hierarchical models allow you to share information between groups. So here the groups could be the age groups, for instance, and basically knowing something … what a hierarchical model says is, “Well, age groups are different, but they’re not infinitely different.” So learning about what someone aged 16 to 24 thinks about same-sex marriage actually already tells you something about what someone aged 25 to 34 thinks about that.
00:58:47
And the degree of similarity between these responses is estimated by the model. So these models are extremely powerful. I love them. I teach them a lot, and actually in the Advanced Regression Course, the last lesson is all about hierarchical models and I actually walk you through a simplified version of the model we did at PyMC Labs for that NGO called SOC in Estonia. It’s like a model that’s used in industry for real, so you learn that that’s a hard model, but that’s a real model.
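To give a flavor of that information sharing in numbers, here is a minimal sketch of normal-normal partial pooling, with entirely made-up poll numbers and an assumed between-group variance. This illustrates the general shrinkage idea only, not the actual model built for the NGO:

```python
import numpy as np

# Hypothetical poll: share of "yes" answers on an issue, by age group.
# Group sizes and raw (no-pooling) estimates are invented for illustration.
groups = ["16-24", "25-34", "35-49", "50+"]
n = np.array([40, 120, 200, 300])          # respondents per group
raw = np.array([0.62, 0.58, 0.51, 0.44])   # raw "yes" share per group

sigma2 = raw * (1 - raw) / n               # sampling variance of each raw estimate
tau2 = 0.004                               # assumed between-group variance
mu = np.average(raw, weights=n)            # overall mean across groups

# Normal-normal partial pooling: small, noisy groups get pulled harder
# toward the overall mean, because the model trusts their own data less.
w = tau2 / (tau2 + sigma2)                 # weight on the group's own data
pooled = w * raw + (1 - w) * mu

for g, r, p in zip(groups, raw, pooled):
    print(f"{g}: raw={r:.3f} -> pooled={p:.3f}")
```

In a real hierarchical model the between-group variance (here the fixed `tau2`) is itself estimated from the data, which is exactly the “degree of similarity” Alex describes.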
00:59:24
Then once you’ve done that, you do something called post-stratification. And post-stratification is basically a way of de-biasing your estimates, your predictions from the model, and you use census data to do that. So you need good data and you need census data. But if you have good census data, then you’re going to be able to basically re-weight the predictions from your model. That way, if you combine post-stratification and a hierarchical model, you are going to be able to give actually good estimates of what educated women aged 25 to 34 in this electoral district think about that issue.
01:00:12
And when I say good, I mean that the confidence intervals are not going to be ridiculous, right? It’s not going to tell you, well, this population is opposed to gay marriage with a probability of somewhere between 20 and 80%. That basically covers everything, so it’s not very actionable. No, the model is more uncertain there, of course, but it has a really good way of giving you something actually actionable. So that was a big project. I can dive into some others if you want, but that takes time.
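The re-weighting step of post-stratification is simple once the model has produced per-cell estimates. A toy sketch, with all cell names, predictions, and census shares invented for illustration:

```python
import numpy as np

# Post-stratification sketch: the model predicts support for an issue in
# each demographic cell, and census data gives each cell's true share of
# the population. The population estimate is the census-weighted mean.
cells = ["young/urban", "young/rural", "old/urban", "old/rural"]
model_pred = np.array([0.70, 0.55, 0.48, 0.35])    # model's per-cell estimate
census_share = np.array([0.20, 0.10, 0.40, 0.30])  # cell's share of population

# A raw poll may over-sample some cells; the census weights correct for that.
estimate = np.sum(model_pred * census_share)
print(f"post-stratified estimate: {estimate:.3f}")
```

In the full Bayesian version you would re-weight every posterior draw, not just a point estimate, so the uncertainty carries through to the final number.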
Jon Krohn: 01:00:56
No, that’s-
Alex Andorra: 01:00:57
I don’t want to derail the interview.
Jon Krohn: 01:00:57
That’s great and highly illustrative. It gives that sense of how with a Bayesian model, you can be so specific about how different parts of the data interrelate. So in this case, for example, you’re describing having different demographic groups that have some commonality. Like all women, but different age groups of women as a sub-node, as sub-nodes of women in general. So that way you’re able to use the data from each of the subgroups to influence your higher-level group.
01:01:31
And actually, something that might be interesting to you, Alex, is that my introduction to both R programming and, I guess, well to hierarchical modeling, is Gelman and Hill’s book, which yeah, obviously Andrew Gelman, you’ve already talked about on the show. Jennifer Hill, also brilliant, causal modeler and has also been on the Super Data Science podcast, and that was episode number 607. Anyway, we’re getting into, there’s lots of listening for people to do out there between your show and mine based on guests that we’ve talked about on the program. Hopefully lots of people with long commutes. So yeah, fantastic.
01:02:20
That’s a great example. Alex, another open source library, in addition to PyMC, that you’ve developed is ArviZ, which has nothing to do with the programming language R. So it’s A-R-V-I-Z, or Zed. ArviZ. And this is for post-modeling workflows in Bayesian stats. So tell us: what are post-modeling workflows, why do they matter, and how does ArviZ solve problems for us there?
Alex Andorra: 01:02:52
Yeah, great questions. And I’ll make sure, related to your previous question, to also send you some links to other projects that could be interesting to people, like media mix models. I’ve interviewed Luciano Paz on the show. We’ve worked with HelloFresh, for instance, to come up with a media mix marketing model for them, and Luciano talks about that in that episode. I’ll also send you a blog post about spatial data with Gaussian processes. That’s something we’ve done for an agricultural client. I already sent you a link to a video webinar we did with that NGO, that client in Estonia, where we got a bit deeper into the project. Oh, yeah, and I’ll of course send you the Learning Bayesian Statistics episode, because Tarmo, the president of that NGO, was on the show.
Jon Krohn: 01:03:59
Nice, yeah. I’ll be sure of course to include all of those links in the show notes.
Alex Andorra: 01:04:03
Yeah, yeah, because I guess people come from different backgrounds and so someone is going to be more interested in marketing, another one more in social science, another one more in spatial data, so that way people can pick and choose what they are most curious about.
01:04:19
So ArviZ, yeah, what is it? That’s basically your friend for any post-model, post-sampling graph. And why is that important? Because models tend to steal the show; they’re the star of the show. But a model is just one part of what we call the Bayesian workflow. The modeling is just one step of the Bayesian workflow, and all the other steps don’t have anything to do with the model per se. There are a lot of steps before sampling the model, and then there are a lot of steps afterwards, and I would argue that these steps afterwards are almost as important as the model.
01:05:10
Why? Because it’s what’s going to face the customer of the model. Okay? Your model is going to be consumed by people who, most of the time, don’t know about models and also often don’t care about models. So that’s a shame because I love models, but you know, lots of the time they don’t really care about the model, they care about the results. So a big part of your job as the modeler is to be able to convey that information in a way that someone who is not a stat person, a math person, can understand and use in their work. Whether that is a football coach or a data scientist or someone working in HelloFresh marketing department. You have to adapt the way you talk to those people and the way you present the results of the model. And the way you do that is with amazing graphs.
01:06:17
So a lot of your time as a modeler is spent figuring out how to decipher what the model can tell you, what the model cannot tell you, also very important, and with what confidence. Since we’re humans, we use our eyes a lot, and the way to convey that is with plots. And so you spend a lot of time plotting stuff as a Bayesian modeler, especially because Bayesian models don’t give you one-point estimates. They give you full distributions for all the parameters, so you get distributions all the way down. That’s a bit more complex to wrap your head around at the beginning, but once your brain is used to those gymnastics, it’s really cool, because it gives you opportunities for amazing plots.
01:07:07
So yeah, ArviZ is here for you for that. It has a lot of the plots that we use all the time in the Bayesian workflow. One, to diagnose your model, so to understand if there is any red flag in the convergence of the model. And then, once you’re sure about the quality of your results, how do you present that to the customer of the model? ArviZ also has a lot of plots for you there. And the cool thing about ArviZ is that it’s platform-agnostic. What I mean by that is that you can run your model in PyMC, in NumPyro, in Stan, and then use ArviZ, because ArviZ expects a special format of data that all these PPLs can give you, which is called the InferenceData object. Once you have that, ArviZ doesn’t care where the model was run, and that’s super cool. And also, it’s a Python package, but there is a Julia equivalent for people who use Julia. So yeah, it’s a very good way of starting that part of the workflow, which is extremely important.
Jon Krohn: 01:08:21
Nice. That was a great tour. And of course I will again have a link to ArviZ in the show notes for people who want to be using that for your post-modeling needs with your Bayesian models, including diagnostics, like looking for red flags and being able to visualize results and pass those off to whoever the end client is. I think it might be in the same panel discussion with the head of that NGO, Tarmo Juristo.
Alex Andorra: 01:08:46
Sí. Yes. That’s my Spanish. I’m in Argentina right now, so the Spanish is automatic.
Jon Krohn: 01:08:54
Actually, I’m relieved to know that you’re in Argentina because I was worried about that I was keeping you up way too late.
Alex Andorra: 01:08:58
No, no, no, no, no.
Jon Krohn: 01:08:59
Nice. So yeah, in that interview Tarmo talks about adding components like Gaussian processes to make models, Bayesian models, time-aware. What does that mean and what are the advantages and potential pitfalls of incorporating advanced features like time-awareness into Bayesian models?
Alex Andorra: 01:09:21
Yeah, yeah, great research, Jon. I can see that.
Jon Krohn: 01:09:26
Yeah, great research Serg Masis, really.
Alex Andorra: 01:09:27
Yeah, yeah, yeah, yeah, that’s impressive.
Jon Krohn: 01:09:31
I had a call with the people from Google Gemini today also. They’re very much near the cutting edge of developing Google Gemini alongside Claude 3 from Anthropic, and of course GPT-4, GPT-4o, whatever, from OpenAI. These are the frontier of LLMs. So I’m on a call with half a dozen people from the Google Gemini team, and they were insinuating, kind of near the end, with some of the new capabilities they have. There are some cool things in there which I need to spend more time playing around with like Gems. I don’t know if you’ve seen this, but the Gems in Google Gemini, they allow you to have context for different kinds of tasks. So for example, there are some parts of my podcast production workflow where I have different context, different needs at each of those steps, and so it’s very helpful with these Google Gemini Gems to be able to just click on that and be like, okay, now I’m in this kind of context.
01:10:35
I’m expecting the LLM to output in this particular way. And the Google Gemini people said, “Well, and maybe you’ll be able to use these Gems to kind of be replacing within the workflow of people working on your podcast. You’ll be able to use them to replace.” And I was like, for example, they gave the example of research and I was like, “I hope that our researcher, for example, is using generative AI tools to assist his work,” but I think we’re quite a ways away with all of the amazing things that LLMs can do. I think we’re still quite a ways from the kind of quality of research that Serg Masis can do for this show. We are still a ways away.
Alex Andorra: 01:11:17
Yeah, yeah. No, no, for sure. But that sounds like fun, yeah.
Jon Krohn: 01:11:22
Anyway, sorry I derailed you again, time-wise.
Alex Andorra: 01:11:26
No, no.
Jon Krohn: 01:11:27
Complexities and features.
Alex Andorra: 01:11:27
Yeah, yeah, so indeed. And then I love that question because I love GPs, so thanks a lot. And that was not at all a setup for the audience, honestly.
Jon Krohn: 01:11:38
Gaussian processes, GPs. Yeah.
Alex Andorra: 01:11:41
Yeah. I love Gaussian processes. And actually, I just sent you a blog post we have on the PyMC Labs website, by Luciano Paz, about how to use Gaussian processes with spatial data. So why am I telling you that? Because Gaussian processes are awesome because they are extremely versatile. It’s what’s called a non-parametric method; it allows you to do non-parametric models. What does that mean? It means that instead of having, for instance, a linear regression, where you have a functional form that you’re telling the model, like I expect the relationship between X and Y to be linear, Y = A + B * X, the Gaussian process is saying: I don’t know the functional form between X and Y, I want you to discover it for me. So that’s one level up, if you want, in the abstraction. And so that’s saying Y = F of X, find what F is.
01:12:57
You don’t want to do that all the time, because that’s very hard, and actually, you need to use quite a lot of domain knowledge on some of the parameters of the GPs. I won’t get into the details here, but I’ll give you some links for the show notes. But something that’s very interesting to apply GPs to is, well, spatial data, as I just mentioned. Take a field plot, for instance, so not a plot in the graph sense, but an agricultural plot of land. There are some interactions between where you are in the plot and the crops that you’re going to plant there, but you don’t really know what those interactions are. It interacts with the weather, too, with a lot of things, and you don’t really know what the functional form of that is. And that’s where a GP is going to be extremely interesting, because it’s going to allow you, in 2-D, to try and find out what these correlations between X and Y are, and take that into account in your model.
01:14:10
That’s very abstract, but I’m going to link afterwards to a tutorial we actually just released today in PyMC, a tutorial that I’ve been working on with Bill Engels, who is also a GP expert. I’ve been working with him on this tutorial for a new approximation of GPs, and I’ll get back to that in a few minutes. But first, why GPs in time? So you can apply GPs on spatial data, on space, but you can also apply GPs on time. Time is one-dimensional most of the time. Space is usually 2-D, and you can actually do GPs in 3-D; you can do spatio-temporal GPs. That’s even more complicated. But 1-D GPs, that’s really awesome, because most of the time when you have a time dependency, it’s non-linear. For instance, that could be the way the performance of a baseball player evolves within the season. You can definitely see the performance of a baseball player fluctuate with time during the season, and that would be non-linear, very probably.
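The “discover the function for me” idea can be made concrete in a few lines. Here is a self-contained NumPy sketch that draws random candidate functions from a GP prior with a squared-exponential kernel; the lengthscale and amplitude are arbitrary choices for illustration, not values from any model discussed here:

```python
import numpy as np

rng = np.random.default_rng(0)

# A GP says y = f(x) with f unknown; the kernel encodes how smooth f is.
x = np.linspace(0, 10, 100)
lengthscale, amplitude = 1.5, 1.0   # assumed kernel hyperparameters

# Covariance matrix K[i, j] = amp^2 * exp(-(x_i - x_j)^2 / (2 * ls^2))
diff = x[:, None] - x[None, :]
K = amplitude**2 * np.exp(-0.5 * (diff / lengthscale) ** 2)

# Draws from N(0, K) are entire functions evaluated at the grid points;
# the jitter on the diagonal keeps the Cholesky factorization stable.
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))
samples = L @ rng.standard_normal((len(x), 3))   # three random functions
```

Each column of `samples` is one smooth, non-linear function, which is exactly the kind of flexible time trend a GP contributes to a Bayesian model.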
01:15:34
And the thing is, you don’t know what the form of that function is, and that’s what the GP is here for. It’s going to come and try to discover the functional form for you. And that’s why I find GPs extremely… They’re really magical mathematical beasts. First, they’re really beautiful mathematically, and a lot of things are actually special cases of GPs, like neural networks. Neural networks are actually Gaussian processes, well, a special case of Gaussian processes. Gaussian random walks are a special case of Gaussian processes. So they’re a very beautiful mathematical object, but also very practical. Now, as Uncle Ben said, with great power comes great responsibility. And GPs are hard to wield. It’s a powerful weapon, but it’s hard to wield. It’s like Excalibur: you have to be worthy to wield it. And so it takes training and time to use them, but it’s worth it.
01:16:43
And so we used that with Tarmo Jüristo from that Estonian NGO, but I use that almost all the time. Right now, I’m working more and more on sports data, and yeah, I am actually working on some football data right now, and well, you want to take into account these within-season effects for players. I don’t know what the functional form is. And right now, the first model I did, taking the time into account, was just a linear trend. So I was just saying, as time passes, you expect a linear change. So the change from one to two is going to be the same as the one from nine to ten.
01:17:27
But usually that’s not the case with time. It’s very nonlinear. And so here you definitely want to apply a GP to that. You could apply other stuff, like random walks, autoregressive stuff, and so on. But I personally don’t really like those models. I find you have to impose that structure on the model, but at the same time, they’re not that much easier to use than GPs, so my default is to just use a GP. And I’ll end this very long answer with a third point, which is that now, it’s actually easier to use GPs, because there is this new decomposition of GPs that’s called the Hilbert space decomposition, so HSGP, and that’s basically a decomposition of GPs that’s like a dot product, so kind of a linear regression, but that gives you a GP. And that’s amazing, because GPs are known to be extremely slow to sample, because it’s a lot of matrix multiplication, as I was saying at some point. But with HSGP, it becomes way faster and way more efficient.
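Here is a minimal NumPy sketch of that decomposition, following the Solin and Särkkä Hilbert-space construction that HSGP is based on; all parameter values are made up for illustration. The GP kernel is approximated by a weighted dot product of fixed basis functions, which is exactly the "linear regression that gives you a GP" idea:

```python
import numpy as np

def phi(x, j, L):
    # Laplacian eigenfunctions on [-L, L] (Dirichlet boundary conditions).
    return np.sqrt(1.0 / L) * np.sin(np.pi * j * (x + L) / (2.0 * L))

def spectral_density(w, eta, ell):
    # Spectral density of the 1-D squared-exponential kernel.
    return eta**2 * ell * np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (ell * w) ** 2)

eta, ell = 1.0, 0.3   # kernel amplitude and lengthscale (made-up values)
L, m = 2.0, 50        # boundary of the expanded domain, number of basis functions
x = np.linspace(-1.0, 1.0, 40)
j = np.arange(1, m + 1)

Phi = phi(x[:, None], j[None, :], L)                  # (40, m) fixed basis matrix
S = spectral_density(np.pi * j / (2.0 * L), eta, ell)  # weight per basis function

# k(x, x') ~= sum_j S_j * phi_j(x) * phi_j(x'): a weighted dot product. So
# f = Phi @ (sqrt(S) * beta) with beta ~ N(0, I) is just a linear regression
# on fixed basis functions, which is why HSGP samples so much faster.
K_approx = (Phi * S) @ Phi.T
K_exact = eta**2 * np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)
err = np.abs(K_approx - K_exact).max()
```

With the data kept away from the boundary L and enough basis functions, the approximation error is tiny; the caveats Alex mentions are about exactly those two choices.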
01:18:41
Now, you cannot always use HSGP. There are caveats and so on. But Bill and I have been working on this tutorial. It’s going to be in two parts. The first part came out today, and I’m going to send you the links for the show notes here in the chat we have. It’s up on the PyMC website, and it’s called HSGP First Steps and Reference. We go through why you would use HSGP, how you would use it in PyMC, and the basic use cases. And the second part is going to be the more advanced use cases. Bill and I have started working on that, but it always takes time to develop good content on that front. But yeah, we’re getting there, and it’s open source, so we’re doing that in our free time, unpaid. So that always takes a bit more time, but we’ll get there.
01:19:44
And finally, another resource that I think your listeners are going to appreciate is a webinar series I’m doing on HSGP, where we have a modeler who comes on the show, shares their screen, and does live coding. And so the first part is out already. I’m going to send you that for the show notes. I had Juan Orduz on the show, and he went into the first part of how to do HSGPs, and what HSGPs even are from a mathematical point of view, because Juan is a mathematician. I’ll end my very long, passionate rant about GPs here. But long story short, GPs are amazing, and it’s a good investment of your time to become skillful with GPs.
Jon Krohn: 01:20:46
Fantastic. Another area that I would love to be able to dig deep into, and so our lucky listeners out there who have the time will now be able to dig into that resource, and many of the others that you have suggested in this episode, which we’ve all got for you in the show notes. Thank you so much. Alex, this has been an amazing episode. Before I let my guests go, I always ask for a book recommendation, and you’ve had some already for us in this episode, but I wonder if there’s anything else. The recommendation you already had was Bernoulli, something about Bernoulli?
Alex Andorra: 01:21:20
Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science, I think. I’ll send you that actually, the episodes with Aubrey Clayton and David Spiegelhalter, because those are really good, especially for less technical people who are curious about science and how it works. I think it’s a very good entry point. This book is amazing. Oh, my God, this is an extremely hard question. I have so many books and I read so many books that I’m taken aback. I would say books which I find extremely good and which have influenced me, because a book is also… It’s not only the book, it’s also the moment when you read the book.
01:22:13
If you like a book and you come back to it later, you’ll have a different experience, because you’re a different person and you have different skills. So I’m going to cheat and give you several recommendations, because I have too many of them. For technical books, I would say Probability Theory: The Logic of Science by E.T. Jaynes. E.T. Jaynes was a physicist, but in the Bayesian world, E.T. Jaynes is like a rock star. I definitely recommend it. It’s his masterpiece. That’s a technical book, but it’s actually a very readable book, and it’s also a very epistemological one. So that one is awesome.
01:23:05
Much more applied, if you want to learn Bayesian stats, a great book to do that is Statistical Rethinking by Richard McElreath, a really great book. I’ve read it several times. Any book by Andrew Gelman, as you were saying, I definitely recommend them. They tend to be a bit more advanced. If you want a really beginner-friendly one, his latest, actually, Active Statistics, is a really good one. I just had him on the show, episode 106, Jon, for people who like numbers like that. And I remember that when I was studying political science, actually, Barack Obama’s book from before he was president. I don’t remember the name. I think it’s The Audacity of Hope, but I’m not sure. But his first book, before he became president, that was actually a very interesting one.
Jon Krohn: 01:24:04
Dreams From My Father.
Alex Andorra: 01:24:05
Yes. Yeah, yeah, this one, Dreams From My Father. Very interesting one. The other ones were a bit more political, which I found a bit less interesting, but this one was really interesting to me. And another one for people who are very nerdy. I’m a very nerdy person. I love going to the gym, for instance, and I do my own training plan, my own nutrition plan. I’ve dug into that research. I love that, because I love sports, also. So another very good book I definitely recommend for developing good habits is Katy Milkman’s How to Change: The Science of Getting From Where You Are to Where You Want to Go. Extremely good book, full of very practical tips. That’s an extremely good one.
01:24:56
And then a last one that I read recently… Oh, no, actually, two last ones. One of the last two. Penultimate, I think you say. How Minds Change by David McRaney, for people who are interested in how beliefs are formed. Extremely interesting. He’s a journalist, and he’s got a fantastic podcast that’s called You Are Not So Smart, and I definitely recommend that one. And yeah, that’s basically about how people change their minds, because I’m very interested in that. In the end, this book is a trove of wisdom.
01:25:38
Last one, promise. I’m also extremely passionate about Stoicism, Stoic philosophy, and that’s a philosophy I find extremely helpful for living my life and navigating the difficulties that we all have in life. An iconic book there is Meditations by Marcus Aurelius: reading the thoughts of a Roman Emperor, one of the best Roman Emperors there was. And it’s really fascinating, because he didn’t write it to be published; it was his journal, basically. And it’s absolutely fascinating to read it, and to see that they kind of had the same issues we still have. So that’s a fantastic book. I reread it very often.
Jon Krohn: 01:26:34
It’s mind-blowing. I haven’t actually read Meditations, but I read Ryan Holiday’s The Daily Stoic.
Alex Andorra: 01:26:39
Oh, yeah. Yeah, that’s really good.
Jon Krohn: 01:26:43
It’s 366 daily meditations on wisdom, perseverance, and the art of living, based on Stoic philosophy. And so there is a lot from Marcus Aurelius in there; he probably accounts for the plurality of the content. And wow, it is mind-blowing to me how somebody two millennia ago is the same as me. I mean, to keep myself in check: I’m not a Roman Emperor, and the things I write will not be studied 2,000 years from now. But nevertheless, the connection you feel with this individual from 2,000 years ago, and the problems that he’s facing, and how similar they are to the problems that I face every day, it’s staggering.
Alex Andorra: 01:27:32
Yeah, yeah, yeah. No, that’s incredible. Something that really spoke to me, that I remember, is at some point he’s saying to himself that it’s no use going to the countryside to escape everything, because the real retreat is in yourself. It’s like, if you’re not able to be calm and find equanimity in your daily life, and Rome was like the megalopolis at the time, it’s not by getting away from the city that you’re going to find tranquility over there.
01:28:14
You have to find tranquility inside, and then, yeah, you’ll go to the countryside and it’s going to be even more awesome, but it’s not because you go there that you find tranquility. And that was super interesting to me, because I definitely feel that when I’m in a big, big metropolis. At some point, I want to get away. But I was like, wait, they were living that already, at a time when they didn’t have internet, they didn’t have cars and so on, but for them, it was already too much: too many people, too much noise. I found that super interesting.
Jon Krohn: 01:28:47
For sure. Wild. Well, this has been an amazing episode, Alex. I really am glad that Doug suggested you for the show, because this has been fantastic. I’ve really enjoyed every minute of this.
Alex Andorra: 01:28:59
Thanks.
Jon Krohn: 01:28:59
I wish it could go on forever, but sadly, all good things must come to an end. And so before I let you go, the very last thing is: do you have other places where we should be following you? We’re going to have a library of links in the show notes for this episode. And of course, we know about your podcast, Learning Bayesian Statistics. We’ve got the Intuitive Bayes educational platform, and open source libraries like PyMC and ArviZ. In addition to those, is there any other social media platform, or other way that people should be following you, or getting in touch with you after the program?
Alex Andorra: 01:29:38
Well, yeah, thanks for mentioning that. So yeah, Intuitive Bayes, Learning Bayesian Statistics, PyMC Labs, you mentioned them, and I’m always available on Twitter: alex_andorra, like the country, and that’s where it comes from. I mention that because it has two Rs, not just one, and when I say it in a language other than Spanish, people write it with just one R. And otherwise, LinkedIn, also. I’m over there, so you can always reach out to me there, on LinkedIn or Twitter. And also, yeah, send me podcast suggestions, stuff like that. I’m always on the lookout for something cool. And again, thanks a lot, Jon, for having me on. Thanks a lot, Doug, for the recommendation. Yeah, that was a blast. I enjoyed it a lot, so thank you so much.
Jon Krohn: 01:30:43
Absolutely loved today’s conversation with Alex. I hope you did, too. In it, Alex filled us in on how Bayesian stats allows us to incorporate prior knowledge into our models of the world, allowing for more advanced models than other approaches, and potentially allowing us to do more with less data. He also talked about how PyMC, PyStan, and NumPyro are the leading examples of probabilistic programming languages, or PPLs, for Python that Bayesian models can be sampled from. He talked about how Bambi allows a novice to get up and running with Bayesian models in Python immediately, how PyMC is implemented to compute efficiently thanks to PyTensor, how ArviZ allows for diagnostics and visualizations after Bayesian modeling, and how Gaussian processes, or GPs, allow nonlinearities over one or more dimensions to be modeled, enabling exceptionally advanced modeling of the real world. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Alex’s social media profiles as well as my own, at www.superdatascience.com/793.
01:31:46
And if you’d like to engage with me in person, as opposed to just through social media, I’d love to meet you at Collision in Toronto this week. On Thursday, I’ll be hosting an afternoon of sessions on the content creators stage. Beyond the sessions I host, other amazing speakers you can check out include the godfather of AI himself, Geoffrey Hinton. Yes. Aravind Srinivas, who’s CEO of Perplexity; Aidan Gomez, who’s CEO of Cohere; and tennis legend Maria Sharapova. Thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science episode for you, and thanks, of course, to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the Super Data Science team for producing another tremendous episode for us today. For enabling that super team to create this free podcast for you, we’re so very grateful to our sponsors. Please consider supporting this show by checking out our sponsors’ links, which are in the show notes.
01:32:39
And if you yourself are interested in sponsoring an episode, you can get the details on how to do that by making your way to jonkrohn.com/podcast. Otherwise, share this episode with people who you think will love it. Review this episode on your favorite podcasting platform or YouTube. Subscribe if you aren’t already a subscriber. But most importantly, just keep on listening. I’m so grateful to have you listening, and hope I can continue to make episodes you’d love for years and years to come. Until next time, keep on rocking it out there, and I’m looking forward to enjoying another round of the Super Data Science Podcast with you very soon.