Jon Krohn: 00:00:00 This is episode number 793 with Alex Andorra, co-founder and principal data scientist at PyMC Labs. Today’s episode is brought to you by Crawlbase, the ultimate data crawling platform.
00:00:16
Welcome to the Super Data Science Podcast, the most listened to podcast in the data science industry. Each week we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple.
00:00:47
Welcome back to the Super Data Science Podcast. I'm so happy to have the tremendous Alex Andorra as our guest on our show today, all about Bayesian statistics. Alex is co-founder and principal data scientist at PyMC Labs, a firm that develops PyMC, the leading Python library for Bayesian stats, and he consults with their clients to implement profit-increasing Bayesian models for them. He's also co-founder and instructor at an online learning platform called Intuitive Bayes, and that platform provides free Bayesian stats education. And he's creator and host of an excellent podcast, of course also on Bayesian stats, called Learning Bayesian Statistics.
00:01:26
Today’s episode will probably appeal most to hands-on practitioners like statisticians, data scientists, and machine learning engineers, but the episode also serves as an introduction to Bayesian statistics for anyone who’d like to learn about this important, unique, and powerful field. In today’s episode, Alex details what Bayesian stats is, the situations where Bayesian stats can solve problems that no other approach can, resources for learning Bayesian stats, the key Python libraries for implementing Bayesian models yourself, and how Gaussian processes can be incorporated into a Bayesian framework in order to allow for especially advanced and flexible models. All right, you ready for this tremendous episode? Let’s go.
00:02:10
Alex, welcome to the Super Data Science Podcast. I’m delighted to have you here. Such an experienced podcaster. It’s going to be probably fun for you to get to be the guest on the show today.
Alex Andorra: 00:02:22
Yeah, thank you, Jon. First, thanks a lot for having me on. I knew about your podcast, so I was both honored and delighted when I got your email to come on the show. I know you have had very honorable guests before like Thomas Wiecki, so I will try to be on par, but I know that’s going to be hard.
Jon Krohn: 00:02:46
Yeah, Thomas, your co-founder at PyMC Labs, was indeed a guest. He was on episode number 585, but that is not what brought you here. Interestingly, the connection, so you asked me before we started recording how I knew about you. A listener actually suggested you as a guest. So Doug McLean, thank you for the suggestion. Doug is lead data scientist at Tesco Bank in the UK, and he reached out to me and said, "Can I make a suggestion for a guest? Alex Andorra, like the country." I guess that's how you say it, because he put it in quotes. He's like, "Alex Andorra, like the country, hosts the Learning Bayesian Statistics Podcast. It's my other all-time favorite podcast." So there you go.
Alex Andorra: 00:03:35
Oh my god. Doug, I’m blushing.
Jon Krohn: 00:03:38
He says, “He’d be a fab guest for your show and not least because he moans from time to time about not getting invited onto other podcasts.”
Alex Andorra: 00:03:49
Did I? Oh my God. I don’t remember. But maybe that was part of a secret plan, Doug. Maybe a secret marketing LBS plan. Well, that worked perfectly.
Jon Krohn: 00:04:02
When I read that, I immediately reached out to you to see if you'd want to be on the show. I thought that was so funny. And he does say, he says, "Seriously though, he'd make a fab guest for his wealth of knowledge on data science and on Bayesian statistics." And so yes, we will be digging deep into Bayesian statistics with you today. You're the co-founder and principal data scientist of the popular Bayesian statistical modeling platform PyMC Labs, as we already talked about, with your co-founder Thomas Wiecki.
00:04:27
It’s an excellent episode if people want to go back to that and get a different perspective, obviously different questions we’ve made sure. But so if you’re really interested in Bayesian statistics, that is a great one to go back to. In addition to that, you obviously also have the Learning Bayesian Stats Podcast, which we just talked about, and you’re an instructor on the educational site, Intuitive Bayes. So tons of Bayesian experience. Alex, through this work, tell us about what Bayesian methods are and what makes them so powerful and versatile.
Alex Andorra: 00:05:00
Yeah. So first, thanks a lot, Doug, for the recommendation and for listening to the show. I am absolutely honored. Yeah, go and listen again to Thomas' episode. Thomas is always a great guest, so I definitely recommend anybody to go and listen to him. Now, what about Bayes? Yeah, it's been a long time since someone has asked me that. Because I have a Bayesian podcast, usually it's quite clear what I'm doing, so people are afraid to ask at some point. So instead of giving you kind of like… because there are two avenues here usually, I could give you the philosophical answer and why, epistemologically, Bayes stats makes more sense, but I'm not going to do that because…
Jon Krohn: 00:05:56
Oh, that sounds so interesting.
Alex Andorra: 00:05:59
Yeah, it is, it is, but we can go into that, but I think a better introduction is just a practical one, and that’s the one that most people get to know at some point, which is you’re working on something and you’re interested in uncertainty estimation and not only in the point estimates. And your data are crap and you don’t have a lot of them, and they are not reliable. What do you do? And that happens to a lot of PhD students. That happened to me when I started trying to do electoral forecasting.
00:06:39
I was at the time working at the French Central Bank, doing something completely different from what I'm doing today. But I was writing a book about the US at the time, 2016 it was, and it was a pretty consequential election for the US, so I was following it really closely. And I remember it was July 2016 when I discovered FiveThirtyEight's models. And then the nerd in me was awoken. It was like, oh my God, this is what I need to do. That's my way of putting more science into political science, which was my background at the time.
00:07:20
And when you do electoral forecasting, polls are extremely noisy. They’re not a good representation of what people think, but they are the best ones we have. There are not a lot of them, at least in France, in the US much more. It’s limited. It’s not a reliable source of data basically. And you also have a lot of domain knowledge, which in the Bayesian realm we call prior information. And so that’s a perfect setup for Bayesian stats. So that’s basically I would say what Bayesian stats is and that’s the power of it, is that you don’t have to rely only on the data.
00:08:02
Because sure, you can let the data speak for themselves, but what if the data are unreliable? Then you need something to guard against that and Bayesian stats are a great way of doing that. And the cool thing is that it’s a method, so it’s like you can apply that to any topic you want, any field you want. And that’s what I’ve done at PyMC Labs for a few years now with all the brilliant guys who are over there. You can do that for marketing, for electoral forecasting, of course, agriculture.
00:08:41
That was quite ironic when we got some agricultural clients, because historically, agriculture is like the field of frequentist statistics. That's how Ronald Fisher developed the P-value, the famous one. So when we had that, we were like, yes, we got our revenge. And of course, it's also used a lot in sports modeling, things like that. So yeah, that's the practical introduction.
Jon Krohn: 00:09:07
Nice. Yeah, a little bit of interesting history there is that Bayesian statistics is an older approach than the frequentist statistics that is so common and is the standard taught in college, so much so that it's just called statistics. You can do an entire undergrad in statistics and not even hear the word Bayesian, because Fisher so decidedly created a monopoly for this one kind of approach. For me, I first learned frequentist statistics in, I guess, my first year of undergrad in science.
00:09:48
And in that first year course, that idea of a P-value always seemed odd to me. This is such an arbitrary threshold of significance to have it be that this is a one in 20 chance or less that this would be observed by chance alone. And this means that therefore we should rely on it. Especially as we are in this era of large data sets and larger and larger and larger data sets, you can have no meaningful…
00:10:21
With very large data sets like we typically deal with today, you're always going to get a significant P-value, because of the slightest tiny change. If you take web-scale data, everything's going to be statistically significant. Nothing won't be. So it's such a weird paradigm. So discovering Bayesian statistics, and machine learning as well, and seeing how those areas didn't have P-values interested me in both of those things. Anyway, Fisher, it's interesting. I mean, I guess with small data sets, eight, 16, that kind of scale, I guess it kind of made some sense.
00:11:04
You pointed out there, I think, that it's this prior that makes Bayesian statistics so powerful, being able to incorporate prior knowledge. But simultaneously, that's also what makes frequentists uncomfortable. They're like, oh, we want only the data, as though the particular data that you collect and the experimental design… there are so many ways that you as the human are influencing things. There's no purity of data anyway. And so priors are a really elegant way to be able to adjust the model in order to point it in the right direction.
00:11:39
And so a really good example that I like to come to with Bayesian statistics is that you can allow some of your variables in the model to tend towards wider or narrower variance. So if there are some attributes of your model where you're very confident, where you know this is a physical fact of the universe, let's just have a really narrow variance on this and the model won't be able to diverge much there. But that then gives a strong focal point within the model around which the other data can make more sense, the other features can make more sense.
00:12:20
And you can allow those other features to have wider variance. I don't know, this is just one example that I try to give people when they're not sure about being able to incorporate prior knowledge into a model.
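Jon's narrow-versus-wide-prior idea can be sketched numerically with the closed-form normal-normal conjugate update. This is just an illustrative pure-Python sketch with made-up numbers, not code from PyMC or from the episode:

```python
def posterior_normal(prior_mu, prior_sd, data_mean, data_sd, n):
    """Conjugate update: normal prior on a mean, normal likelihood
    with known observation noise data_sd, over n observations."""
    prior_prec = 1 / prior_sd**2           # precision = 1 / variance
    data_prec = n / data_sd**2             # precision contributed by the data
    post_prec = prior_prec + data_prec
    post_mu = (prior_mu * prior_prec + data_mean * data_prec) / post_prec
    return post_mu, post_prec**-0.5

# Same data, two priors centered at 0: one very confident, one vague.
data_mean, data_sd, n = 5.0, 2.0, 4
narrow_mu, narrow_sd = posterior_normal(0.0, 0.1, data_mean, data_sd, n)
wide_mu, wide_sd = posterior_normal(0.0, 10.0, data_mean, data_sd, n)

# Narrow prior barely moves; wide prior lets the data dominate.
print(f"narrow prior -> mean {narrow_mu:.3f}, sd {narrow_sd:.3f}")
print(f"wide prior   -> mean {wide_mu:.3f}, sd {wide_sd:.3f}")
```

With these numbers, the narrow prior pins the posterior mean near 0.05 while the wide prior lands near 4.95: the confident prior acts as the "strong focal point" Jon describes, and the vague one lets the data speak.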
Alex Andorra: 00:12:32
Yeah, no, these are fantastic points, Jon. To build on that, of course, I’m a nerd, so I love the history of science. I love the epistemological side. A very good book on that is Bernoulli’s Fallacy by Aubrey Clayton. Definitely recommend his book. He was on my podcast episode 51, so if people want to give that a listen.
Jon Krohn: 00:13:01
Did you just pull that 51 out from memory?
Alex Andorra: 00:13:04
Yeah, yeah, I kind of know. But I have less episodes than you, so it’s like each episode is kind of my baby, so I’m like, oh yeah, 51 is Aubrey Clayton.
Jon Krohn: 00:13:13
Oh my goodness, that’s crazy.
Alex Andorra: 00:13:17
That’s also how my brain works, numbers. And actually episode 50 was with Sir David Spiegelhalter, I think the only knight we got on the podcast. David Spiegelhalter, exceptional guest, very, very good pedagogically, definitely recommend listening to that episode too, which is very epistemologically heavy. So for people who like that, the history of science, how we got there. Because as you were saying, Bayes is actually older than stats, but people discovered it later.
00:13:55
So it’s not because it’s older, that’s better, but it is way older actually by a few centuries. So yeah, fun stories here. I could talk about that still. But to get back to what you were saying, also as you were very eloquently saying, data can definitely be biased, because that idea of like, oh no, we only want the data to speak for themselves. But as I was saying, yeah, what if the data are unreliable? But as you were saying, what if the data are biased? And that happens all the time.
00:14:28
And worse, I would say these biases are most of the time implicit, in the sense that either they are hidden, or you don't even know you are biased in some direction, because it's a result of your education and your environment. So the good thing about priors is that they force your assumptions, your hidden assumptions, to be explicit.
00:14:56
And that I think is very interesting also, especially when you work on models which are supposed to have a causal explanation and which are not physical models, but more social models or political science models. Well then, it's really interesting to see how two people can have different conclusions based on the same data. It's because they have different priors. And if you force them to make these priors explicit in their models, they would definitely have different priors. And then you can have a more interesting discussion actually, I think.
00:15:28
There's that. And then I think the last point that's interesting in why you would be interested in this framework is that causes are not in the data. Causes are outside of the data. The causal relation between X and Y, you're not going to see it in the data. Because if you do a regression of income on education, you're going to see an effect of education on income. But you as a human, you know that the effect has to be that education has an impact on income.
00:16:09
But the computer might as well just run the regression the other way, regress education on income, and tell you, oh, income causes education. But no, it's not going that way. So the statistical relationship goes both ways, but the causal one only goes one direction, and that's a hidden reference to my favorite music band. But yeah, it only goes one direction and it's not in the data. And you have to have a model for that, and a model is just a simplification of reality.
00:16:44
We try to get a simple enough model, which usually isn't simple, but it's a simplification. And if you say it's a construction and a simplification, that's already a prior in a way. So you might as well just go all the way and make all your priors explicit.
Jon Krohn: 00:17:01
Hello, Super Data Science Podcast listeners. Today's episode is brought to you by Crawlbase, the premier data crawling and scraping platform designed for anyone needing reliable data. Forget about hardware, infrastructure, proxies, setup, blocks, and CAPTCHAs. With Crawlbase, they handle all of that for you. Simply call the Crawlbase API and gather website data. They literally support millions of different websites. It's super easy. Try out Crawlbase today. As an offer to Super Data Science Podcast listeners, use our exclusive code "SUPERDATASCIENCE", with no spaces, to unlock 10,000 free requests, a value of $42. You'll also find the code in the podcast description. Head over to Crawlbase.com and start crawling in minutes.
00:17:48
Well said. Very interesting discussion there. You used a term a number of times already in today’s podcast, which maybe is not known to all of our listeners. What is epistemology? What does that mean?
Alex Andorra: 00:18:00
Oh, right. Yeah, very good question. So epistemology is, in a sense, the science of science. It's understanding how we know what we say we know. So for instance, how do we know the earth is round? How do we know about relativity? Things like that. So it's a scientific discipline that's very close to philosophy; it's actually, I think, a branch of philosophy, and it's trying to come up with methods to understand how we can come up with new scientific knowledge.
00:18:42
And by scientific here we usually mean reliable and reproducible, but also falsifiable. Because for a hypothesis to be scientific, it has to be falsifiable. Basically that’s that. Lots of extremely interesting things here. But yeah, that’s basically how do we know what we know, and that’s the whole trying to define the scientific method and things like that.
Jon Krohn: 00:19:16
Going off on a little bit of a tangent here, but it’s interesting to me how I think among non-scientists, laypeople in the public, science is often seen to be infallible as though science is real. Science is the truth. Since that 2016 election, people have lawn signs in the US that basically have a list of liberal values, most of which I’m a huge fan of. And of course, I like the sentiment, this idea that they’re supporting science on the sign as well. But it says, the way that they phrase it is, science is real.
00:20:05
And the implication there for me, every time I see the sign, is that… I think that could be, for example, related to vaccines. There was a lot of conflict around vaccines and what their real purpose is. And so the lay liberal person is like, this is science, trust science, it's real. Whereas from the inside, you pointed it out already there, but it's this interesting irony that the whole point of science is that we're saying, I'm never confident of anything. I'm always open to this being wrong.
Alex Andorra: 00:20:44
Yeah, no, exactly. And I think that’s the distinction that’s often made in epistemology actually between science on one hand and research on the other end, where research is science in the making. So science is the collective knowledge that we’ve accumulated since basically the beginning of modern science, at least in the western hemisphere. So more or less during the Renaissance. And then research is, well, people making that science because people have to do that. And how do we come up with that?
00:21:22
So yeah, definitely I’m one who always emphasizes the fact that, yeah, now we know the earth is round. We know how to fly planes. But there was a moment we didn’t. And so how did we come up with that? And actually maybe one day we’ll discover that we were doing it kind of the wrong way, flying planes, but it’s just like for now it works. We have the best model that we can have right now with our knowledge, but maybe one day we’ll discover that there is a way better way to fly.
00:22:01
And it was just there staring at us, and it took years for us to understand how to do that. But yeah, as you were saying, that's a really hard line to walk, because you have to say, yeah, this knowledge, these facts are really trustworthy, but you can never trust something 100%. Because otherwise, mathematically, if you go back to Bayes' formula, you actually cannot update your knowledge.
00:22:35
If you have a 0% or 100% prior, mathematically, you cannot apply Bayes' formula, which tells you, well, based on the new data that you just observed, the most rational way of updating your belief is to believe that with that certainty. But if you have zero or 100%, it's never going to be updated. So you can say it's 99.9999% that what we're doing right now by flying is really good, but maybe, you never know, there is something that will appear. And physics is a really…
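Alex's point about 0% and 100% priors falls straight out of Bayes' formula. A tiny pure-Python sketch, with hypothetical likelihood numbers chosen just for illustration:

```python
def bayes_update(prior, like_h, like_not_h):
    """Posterior P(H | data) from prior P(H), likelihood P(data | H),
    and likelihood under the alternative P(data | not H)."""
    return prior * like_h / (prior * like_h + (1 - prior) * like_not_h)

# A moderate prior moves a lot under strong evidence for H:
# prior 0.5 becomes posterior 0.9.
print(bayes_update(0.5, 0.9, 0.1))

# But a 0% or 100% prior is mathematically frozen: no evidence,
# however strong, can ever update it.
print(bayes_update(0.0, 0.99, 0.01))  # stays 0.0
print(bayes_update(1.0, 0.01, 0.99))  # stays 1.0
```

This is exactly why the "99.9999% but never 100%" stance matters: any prior strictly between 0 and 1 can be moved by data, while the extremes cannot.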
Jon Krohn: 00:23:13
We’ve all seen UFOs, Alex. We know that there’s better ways to fly.
Alex Andorra: 00:23:16
Yeah, exactly. But yeah, I think physics is actually a really good field for that, because it's always evolving and it's always coming up with really completely crazy, paradigm-shifting explanations. Like relativity, special relativity, then general relativity, just a century ago, that didn't exist. And now we start to understand a bit better. But even now we don't really understand how to blend relativity and quantum mechanics. That's extremely interesting to me. But yeah, I understand that politically, from a marketing standpoint, it's hard to sell, but I think it's shooting yourself in the foot if you're saying, "Oh, yeah, science is always like…" Science works, I agree.
00:24:12
Science works, but it doesn't have to be 100% true and sure for it to work. That's why placebos work, right? It's just something that works, even though there isn't any actual concrete evidence that it's adding something, but it works. So yeah, I think it's really shooting yourself in the foot by saying, "No, that's 100%, and if you question science, then you are anti-science." No, actually, the whole scientific method is to be able to ask questions all the time. The question is, how do you do that? Do you apply the scientific method to your questions, or do you just question anything without any method, just because you fancy questioning it because it goes against your beliefs to begin with?
00:25:02
So yeah, that's one thing. And then another thing that you said I think is very interesting is that, unfortunately, the way of teaching science and communicating around it is not very embodied. It's quite dry. You just learn equations and you just learn that stuff. Whereas science was made by people, and is made by people, who have their biases, who have extremely violent conflicts. Like you were saying, Fisher was just a huge jerk to everybody around him.
00:25:36
And I think it would be interesting to get back to a bit of that human side to make science less dry and also less intimidating, because most of the time when I tell people what I do for a living, they get super intimidated and they're like, "Oh, my God. Yeah, I hate math, I hate stats." But it's just numbers, it's just a language. So it's a bit dry. For instance, if there is someone in your audience who is into movies, who makes movies, I want to know why there is no movie about Albert Einstein. There has to be a movie about Albert Einstein. Not only a huge genius, but an extremely interesting life. Honestly, it makes for a great movie. He was working-
Jon Krohn: 00:26:27
Like a dramatized biopic, you mean?
Alex Andorra: 00:26:29
Yeah, yeah, yeah. I mean, his life is super interesting.
Jon Krohn: 00:26:29
That is crazy.
Alex Andorra: 00:26:30
He revolutionized two fields of physics, and actually chemistry, in one year, 1905. It's like his big year. And he came up with the ideas for relativity while working at the patent office in Bern, in Switzerland, which was an extremely boring job. In his words, it was an extremely boring job. And basically, having that boring job allowed him to do that, being completely outside of the academic circles and so on. So it makes for a perfect movie. I don't understand why it's not there. And then, icing on the cake, he had a lot of women in his life. So it's perfect. You have the sex, you have the drama, you have revolutionizing the field, you have the Nobel Prize. And then he became a pop icon. I don't know where the movies are.
Jon Krohn: 00:27:32
Yeah, it is wild actually, now that you point it out. It's kind of surprising that there aren't movies about him all the time, like Spider-Man.
Alex Andorra: 00:27:41
Yeah, I agree. Well, there was one about Oppenheimer last year maybe that started a trend. We’ll see.
Jon Krohn: 00:27:48
Yeah, yeah, yeah. So in addition to the podcast, you also, I mentioned this at the outset, I said that you're co-founder and principal data scientist of the popular Bayesian stats modeling platform, PyMC. Like so many things in data science, it's uppercase P, lowercase y for Python. What's the MC? PyMC, one word, with the M and the C capitalized.
Alex Andorra: 00:28:13
So it's very confusing because it stands for Python, and then MC is Monte Carlo. So I understand, but why Monte Carlo? It's because it comes from Markov chain Monte Carlo. So actually it should be PyMCMC, or PyMC squared, which is what I've been saying since the beginning, but anyway. Yeah, it's actually PyMC squared, for Markov chain Monte Carlo. And Markov chain Monte Carlo, there are other algorithms now, newer ones, but the blockbuster algorithm to run Bayesian models is MCMC.
Jon Krohn: 00:28:57
So in the same way that stochastic gradient descent is like the de facto standard for finding your model weights in machine learning, Markov chain Monte Carlo is kind of the standard way of doing it with a Bayesian model.
Alex Andorra: 00:29:12
Yeah, yeah, yeah. And so now there are newer versions, more efficient versions. That's basically the name of the game, making the algorithm more and more efficient. But the first algorithm dates back, I think it was actually invented during the Manhattan Project, so during World War II.
Jon Krohn: 00:29:32
Theme of the day.
Alex Andorra: 00:29:35
Yeah. And lots of physicists actually… Statistical physics is a field that's contributed a lot to MCMC. Physicists came to the field of statistics trying to make the algorithms more efficient for their models, and they have contributed a lot. The field of physics has contributed a lot of big names and great leaps in the realm of more efficient algorithms. And so, I don't know who your audience is, but that may sound boring. The algorithm is like the workhorse, but it's extremely powerful. And that's also one of the main reasons why Bayesian statistics is increasing in popularity lately. Because I'm going to argue that it's always been the best framework to do statistics, to do science, but it was hard to do with pen and paper. The problem is that you have a huge, nasty integral in the denominator.
00:30:45
And this integral is not computable by pen and paper. So for a long, long time, Bayesian statistics was relegated to the margins, because it was just super hard to do. For problems other than very trivial ones, it was not very applicable. But now, with the advent of personal computing, you have these incredible algorithms. So now most of the time it's HMC, Hamiltonian Monte Carlo. That's what we use under the hood with PyMC. But if you use Stan, if you use NumPyro, it's the same. And thanks to these algorithms, now we can make extremely powerful models, because we can approximate the posterior distributions thanks to, well, computing power. A computer is very good at computing. I think that's why it's called that.
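To make the MCMC idea concrete, here is a deliberately minimal random-walk Metropolis sampler in pure Python for a toy coin-bias posterior (7 heads in 10 flips, uniform prior). This is only a sketch of the classic algorithm; PyMC, Stan, and NumPyro use the far more efficient HMC/NUTS samplers Alex mentions:

```python
import math
import random

def log_post(theta):
    """Unnormalized log posterior: binomial likelihood for 7 heads
    in 10 flips, with a uniform prior on theta."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return 7 * math.log(theta) + 3 * math.log(1.0 - theta)

random.seed(42)
theta, samples = 0.5, []
for step in range(20_000):
    proposal = theta + random.gauss(0.0, 0.1)  # random-walk proposal
    accept_prob = math.exp(min(0.0, log_post(proposal) - log_post(theta)))
    if random.random() < accept_prob:          # Metropolis accept/reject
        theta = proposal
    if step >= 2_000:                          # discard burn-in
        samples.append(theta)

mean = sum(samples) / len(samples)
print(f"posterior mean estimate: {mean:.3f}")  # analytic answer is 8/12
```

The chain's sample mean should land close to the exact Beta(8, 4) posterior mean of 8/12. That ability to approximate an intractable posterior by sampling, instead of solving the nasty integral in the denominator, is exactly the trick being described.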
Jon Krohn: 00:31:52
Since April, I've been offering my Machine Learning Foundations curriculum live online via a series of 14 training sessions within the O'Reilly platform. My curriculum provides all the foundational knowledge you need to understand modern ML applications, including deep learning, LLMs, and AI in general. The linear algebra and calculus classes are in the rearview mirror, but the probability, statistics, and computer science classes are all still to come. Registration for both of the probability classes is open now. We've got links in the show notes to those, and these will cover all of the essential probability theory you need for statistics applications as well as machine learning. Intro to Probability Theory will be on July 23rd. Probability Level Two will be on August 7th. If you don't already have access to O'Reilly, you can get a free 30-day trial via our special code, also in the show notes.
00:32:39
Yes. And so that reminds me of deep learning. It’s a similar kind of thing where the applications we have today, like your ChatGPT or whatever your favorite large language model is, these amazing video generation like Sora, all of this is happening thanks to deep learning, which is an approach we’ve had since the ’50s. Certainly not as old as Bayesian statistics, but similarly, it has been able to take off with much larger data sets and much more compute.
Alex Andorra: 00:33:15
Yeah, yeah, yeah, yeah, very good point. And I think that's even more the point in deep learning for sure, because Bayesian stats doesn't need the scale, but the way we're doing deep learning for now definitely needs the scale, yeah.
Jon Krohn: 00:33:25
Yeah, yeah. Scale of data.
Alex Andorra: 00:33:27
Yeah, yeah, exactly. Yeah, sorry. Yeah, the scale, because there are two scales, data and computing. Yeah, yeah, you're right.
Jon Krohn: 00:33:33
And for model parameters. And so that has actually, I mean, tying back to something you said near the beginning of this episode is that actually one of the advantages of Bayesian statistics is that you can do it with very few data, maybe fewer data than with a frequentist approach or machine learning approach, because you can bake in your prior assumptions and those prior assumptions give some kind of structure, some kind of framework for your data to make an impact through.
Alex Andorra: 00:33:58
Yeah, yeah, completely.
Jon Krohn: 00:34:01
So for our listeners who are listening right now, if they are keen to try out Bayesian statistics for the first time, why should they reach for PyMC, which as far as I know, is the most used Bayesian framework period, and certainly in Python. And then the second I’m sure is Stan. And so why should somebody use PyMC, and maybe even more generally, how can they get started if they haven’t done any Bayesian statistics before at all?
Alex Andorra: 00:34:35
Yeah, yeah, yeah, fantastic question. I think it's a very good one, because that can also be very intimidating. And actually there can be a paradox of choice, where now we're lucky to live in a world where we actually have a lot of probabilistic programming languages. So you'll see that sometimes that's called a PPL. And what's a PPL? PyMC is basically one. It's software that enables you to write down Bayesian models and sample from them. So it's just a fancy word for that.
00:35:10
And yeah, my main advice is don't overthink it. If you are proficient in R, then I would definitely recommend trying brms first, because it's built on top of Stan, and Stan is extremely good. It's built by extremely good modelers and institutions. Lots of them have been on my podcast, so if you're curious, just go on the website, you look for Stan, and you'll get a lot of them. The best one is, most of the time, Andrew Gelman, absolutely amazing to have him on the show. He always explains stuff extremely clearly. But I also had Bob Carpenter, for instance, Matt Hoffman. So anyway, if you know R, I would… Yeah.
Jon Krohn: 00:36:04
Have you ever had Rob Trangucci on the show, or do you know who he is?
Alex Andorra: 00:36:08
I know, but I have never had him on the show, but I’d be happy to. If you know him [inaudible 00:36:13].
Jon Krohn: 00:36:13
Yeah, I'll make an introduction for you. He was on our show in episode number 507, and that was our first ever Bayesian episode, and it was the most popular episode of that year, 2021. And it was interesting, because up until that time, at least with me hosting, and 2021 was my first year hosting the show, it was by far our longest episode. And that was kind of concerning for me. I was like, "This was a super technical episode, super long. How is this going to resonate?" Turns out that's what our audience loves, and that's something we've been leaning into a bit in 2024: more technical, longer.
Alex Andorra: 00:36:58
Well, that’s good to know. Yeah.
Jon Krohn: 00:37:01
Yeah, yeah. I’ll make an intro for Rob. Anyway, you were saying? I completely interrupted you.
Alex Andorra: 00:37:04
Yeah, no, but great interruption for sure. I’m happy to have that introduction, mate. Mate, thanks a lot. Yeah, so I was saying if you’re proficient in R, definitely give a try to brms. It’s built on top of Stan. Then when you outgrow brms, go to Stan. If you love Stan, but you’re using Python, there is PyStan. I’ve never used that personally, but I’m pretty sure it’s good.
00:37:37
But I would say if you’re proficient in Python and don’t really want to go to R, then yeah, you probably want to give a try to PyMC or to NumPyro. Give that a try. See what resonates most with you, the API most of the time, because if you’re going to make models like that, you’re going to spend a lot of time on your code and on your models. And as most of your audience probably knows, the models always fail unless it’s the last one. So honestly, yeah, you really have to love the framework you’re using and find it intuitive. Otherwise, it’s going to be hard to keep it going.
00:38:21
If you’re really, really a beginner, I would also recommend, in the Python realm, giving a try to Bambi, which is the equivalent of brms, but in Python. So Bambi is built on top of PyMC, and what it does is make a lot of the choices for you under the hood. So priors, stuff like that, which can be a bit overwhelming for beginners at the beginning. But then when you outgrow Bambi and you want to make more complicated models, then go to PyMC.
Jon Krohn: 00:38:56
Bambi, that’s a really cute name for a model that’s just like, it just drops out of its mother and can barely stand up straight.
Alex Andorra: 00:39:05
Yeah, yeah. And the guys working on Bambi, so Tommy Capretto, Osvaldo Martin, they’re really great guys, both Argentinians actually. And yeah, they’re fun guys. I think the website for Bambi is bambinos.github.io, so yeah, these guys are fun. But yeah, it’s definitely a great framework. And actually this week, with Tommy Capretto and Ravin Kumar, we released an online course, our second online course, that we’ve been working on for two years. So we are very happy to have released it, but we’re also very happy with the course. That’s why it took so long. It’s a very big course, and that’s exactly what we do. We take you from beginner, we teach you Bambi, we teach you PyMC, and you go up until advanced. It’s called Advanced Regression. So we teach you all things regression, and at the time-
Jon Krohn: 00:40:11
What’s the course called?
Alex Andorra: 00:40:12
Advanced Regression.
Jon Krohn: 00:40:13
Advanced Regression.
Alex Andorra: 00:40:16
Yeah, Advanced Regression, on the Intuitive Bayes platform that you were kind enough to mention at the beginning.
Jon Krohn: 00:40:22
Nice. Yeah, I’ll be sure to include that in the show notes. And so even though it’s called Advanced Regression, you start us off with an introduction to Bayesian statistics, and we start getting our feet wet with Bambi before moving on to PyMC. Yeah?
Alex Andorra: 00:40:37
Yeah, yeah, yeah. So you have a regression refresher at the beginning. If you’re a complete, complete beginner, then I would recommend taking our intro course first, which really starts from the ground up. The Advanced Regression course, well, ideally you would do that after the intro course, but if you’re already there in your learning curve, then you can start with the Advanced Regression course. It makes a few more assumptions on the student’s part. Yeah, they have heard about Bayesian stats, they’re aware of the ideas of priors, likelihoods, posteriors, but we give you a refresher about classic regression, so it’s like when you have a normal likelihood.
00:41:19
And then we teach you how to generalize that framework to data that’s not normally distributed. And we start with Bambi. We show you how to do the equivalent models in PyMC, and then at the end, when the models become much more complicated, then we just show it in PyMC.
Jon Krohn: 00:41:37
Nice. That is super, super cool. I hope to be able to find time to dig into that myself soon. It’s one of those things…
Alex Andorra: 00:41:46
Oh, yeah.
Jon Krohn: 00:41:46
You and I were lamenting this before the show, podcasting in of itself can take up so much time on top of, in both of our cases, we have full-time jobs. This is something that we’re doing as a hobby, and it means that I’m constantly talking to amazingly interesting people like you who have developed fascinating courses that I want to be able to study. And it’s like, when am I going to do that? Book recommendations alone, I barely get to read books anymore. That was something like since basically the pandemic hit… And it’s so embarrassing for me because I identify in my mind as a book reader, and sometimes I even splurge. I’m like, “Wow, I’ve got to get these books that I absolutely must read,” and they just collect in stacks around my apartment.
Alex Andorra: 00:42:33
Yeah, yeah, yeah. I mean, that’s hard for sure. Yeah, it’s something I’ve also been trying to get under control a bit. So a guy who does good work on that, I find, is Cal Newport. You probably know him.
Jon Krohn: 00:42:54
Yes, Cal Newport, of course. Yeah. I’ve been collecting his books too.
Alex Andorra: 00:43:00
Yeah, that’s the irony. Yeah, yeah. So he’s got a podcast. I don’t know about you, but me, I listen to tons of podcasts, so the audio format is really something I love, so podcasts and audiobooks. So yeah, that can be your entrance here. Maybe you can listen to more books if you don’t have time to [inaudible 00:43:18].
Jon Krohn: 00:43:18
Yeah, it’s interesting… I don’t really have a commute, and when I’m traveling to the airport or something, I use that as an opportunity to do catch-up calls and that kind of thing. So it’s interesting, I listen to almost no other podcasts.
Alex Andorra: 00:43:38
Okay.
Jon Krohn: 00:43:39
The only show I listen to is Last Week in AI. I don’t know if you know that show.
Alex Andorra: 00:43:43
Yeah, yeah, yeah. Great show.
Jon Krohn: 00:43:45
I like them a lot. They put a lot of work into… Jeremie and Andrey do a lot of work to get all of the last week’s news concentrated in there. And so…
Alex Andorra: 00:43:55
It’s impressive.
Jon Krohn: 00:43:56
… it allowed me to flip from being this person where prior to finding that show… And I found it because Jeremie was a guest on my show. He was an amazing guest, by the way. I don’t know if he’d have much to say about Bayesian statistics, but he’s an incredibly brilliant person. He is so enjoyable to listen to. And someone else that I’d love to make an intro for you. He’s become a friend over the years.
Alex Andorra: 00:44:20
Yeah, for sure.
Jon Krohn: 00:44:21
Yeah. Last Week in AI they… I don’t know why I’m talking about it so much, but I went from being somebody who would kind of have this attitude when somebody would say, “Oh, have you heard about this release or that?” And I’d say, “Just because I work in AI, I can’t stay on top of every little thing that comes out.” And now, since I started listening to Last Week in AI about a year ago, I don’t think anybody’s caught me off guard with some new release. I’m like, “Yeah, I know.”
Alex Andorra: 00:44:52
Yeah, well done. Yeah, yeah, no, that’s good. Yeah, yeah. But that makes your life hard, yeah, for sure. If you don’t have a commute, come on.
Jon Krohn: 00:45:02
But I’d love to be able to completely submerge myself in Bayesian statistics. That’s a life goal of mine is to be able to completely… Because while I have done some Bayesian stuff, and in my PhD, I did some Markov chain Monte Carlo work, and there’s just obviously so much flexibility and nuance to this space. It can do such beautiful things. I’m a huge fan of Bayesian stats, and so yeah, it’s really great to have you on the show talking about it. So PyMC, which we’ve been talking about now, kind of going back to our thread. PyMC uses something called PyTensor to leverage GPU acceleration and complex graph optimizations. Tell us about PyTensor and how this impacts the performance and scalability of Bayesian models.
Alex Andorra: 00:45:57
Yeah, yeah, great question. So basically, the way PyMC is built, we need a backend, and historically this has been a complicated topic, because the backend is where the computation happens. Otherwise, you have to do the computations in Python, and that’s slower than doing it in C, for instance. And so we still have that C backend that’s kind of a historical remnant, but more and more we’re using… and when I say we, I don’t do a lot of PyTensor code, to be honest. I mean, contributions to PyTensor; I mainly contribute to PyMC. PyTensor is spearheaded a lot by Ricardo Vieira, a great, great guy, extremely, extremely good modeler.
00:46:54
And basically, the idea of PyTensor is to kind of outsource the computation that PyMC is doing. And then, especially when you’re doing the sampling, PyTensor is going to delegate that to some other backends. And so now, instead of having just the C backend, you can actually sample your PyMC models with the Numba backend. How do you do that? You use another package called nutpie that’s been built by Adrian Seyboldt, an extremely brilliant guy. Again, I’m surrounded by guys who are much more brilliant than me. And that’s how I learn, basically. I just ask them questions.
Jon Krohn: 00:47:46
That’s what I feel like in my day job at Nebula, my software company. I’m just like, “Wow.” Yeah. Anyway, sorry, I’m just completely interrupting you.
Alex Andorra: 00:47:58
Yeah, no. And so, yeah, so Adrian basically re-implemented HMC in nutpie, but using Numba and Rust. And so that goes way faster than just using Python or even just using C. And then you can also sample your models with two other backends that we have, enabled by PyTensor, which basically compiles the graph of the model and then delegates the computational operations to the sampler. And then the sampler, as I was saying, can be the one from nutpie, which is in Rust and Numba, and otherwise, it can be the one from NumPyro.
00:48:45
Actually, you can call the NumPyro sampler with a PyMC model and it’s just super simple. In pm.sample, there’s a keyword argument that’s nuts_sampler, and you just say nutpie or NumPyro. And I tend to use NumPyro a lot when I’m doing Gaussian processes because, I don’t know why… so most of the time I’m using nutpie, but when I’m doing Gaussian processes somewhere in the model, I tend to use NumPyro, because for some reason in their routine, in their algorithm, there is some efficiency in the way they compute the matrices, and GPs are basically huge matrices and dot products.
00:49:25
And so yeah, usually NumPyro is very efficient for that. And you can also use JAX now to sample your models. So we have these different back-ends and it’s enabled because PyTensor is that back-end that nobody sees. Most of the time you’re not implementing a PyTensor operation in your models. Sometimes we do that at PyMC Labs when we’re working on a very custom operation, but usually it’s done under the hood for you. And then PyTensor compiles the graph, the symbolic graph, and can dispatch that afterwards to whatever the best way of computing the posterior distribution afterwards is.
Jon Krohn: 00:50:08
Nice. You alluded there to something that I’ve been meaning to ask you about, which is the PyMC Labs team. So you have PyMC, the open source library that anybody listening can download, and of course I have it in the show notes for people to download so they can get rolling on doing their Bayesian stats right now, whether it’s already something they have expertise in or not. PyMC Labs, it sounds like, and just fill us in, but I’m gathering that the team there is responsible both for developing PyMC, but also for consulting, because you mentioned there sometimes we might do some kind of custom implementation. So first of all, yeah, tell us a little bit about PyMC Labs, and then it’d be really interesting to hear one or more interesting examples of how Bayesian statistics allows some client or some use case to do something that they wouldn’t be able to do with another approach.
Alex Andorra: 00:51:09
Yeah. So yeah, first, go and star PyMC on GitHub and open PRs and stuff like that. We always love that. And second, yeah, exactly, PyMC Labs is kind of an offspring of PyMC in the sense that everybody on the team is a PyMC developer, so we contribute to PyMC. This is open source, this is free, and always will be. But then on top of that, we do consulting, and what’s that about? Well, most of the time these are clients who want to do something with PyMC, or more generally with Bayesian statistics, and they know we do that and they don’t know how to do it themselves. Either because they don’t have the time to train themselves, or they don’t want to, or they don’t have the money to hire a Bayesian modeler full-time. Various reasons. But basically, yeah, at some point in the modeling workflow, they are stuck.
00:52:22
It can be at the very beginning, or it can be, “Well, I’ve tried a bunch of stuff, I can’t make the model converge and I don’t know why.” So it can be a very wide array of situations. Most of the time people know us, like me from the podcast or from PyMC, most of the other guys from PyMC or from the technical writing that they do around it. So basically that’s, like, not really a real company, but just a bunch of nerds, if you will. But no, it’s a real company, but we like to define ourselves as a bunch of nerds, because that’s how it really started.
Jon Krohn: 00:53:03
And in a sense of you guys actually consulting with companies and making an impact, in that sense, it is certainly a company. So yeah, so tell us a bit about projects. I mean, you don’t need to go into detail with client names or whatever if that’s inappropriate, but it would be interesting to hear some examples of use cases of Bayesian statistics in the wild enabling capabilities that other kinds of modeling approaches wouldn’t.
Alex Andorra: 00:53:31
Yeah, yeah, yeah. No, definitely. Yeah, so of course I cannot enter into the details, but I can definitely give you some ideas. One I can actually enter into the details of is a project we did for an NGO in Estonia, where they were getting polling data. So every month they do a poll of Estonian citizens on various questions. These can be horse-race polls, but also news questions, like: do you think Estonia should ramp up the number of soldiers at the border with Russia? Do you think same-sex marriage should be legal? Things like that.
Jon Krohn: 00:54:24
Oh, I hear some Overton window coming on.
Alex Andorra: 00:54:31
Do you?
Jon Krohn: 00:54:31
That’s what I thought. I thought we might go there. Now I’m completely taking you off on a sidetrack, but Serg Masis, our researcher, came up with a great question for you, because you had Allen Downey on your show, who is an incredible guest. I absolutely loved having him on our program. He was on here in episode number 715, and in that episode we talked about the Overton window, which is related to what you were just talking about. How does society think about, say, same-sex marriage? If you looked a hundred years ago, or a thousand years ago, or 10,000 years ago, or a thousand years into the future, or 10 years into the future, at each of those different time points there’s a, well, maybe not completely different, but a varying range of what people think is acceptable or not acceptable. And we were talking earlier in the episode about bias, so it kind of ties into this. You might have your idea, as a listener to this show, you might be a scientist or an engineer and you think, “I am unbiased, I know the real thing,” but you don’t, because you are a product of your times. And the Overton window is a way of describing how, on any given issue, there is some range. It would fit a probability distribution, where some people are on a far extreme one way and some people are on a far extreme the other way, but in general, all of society is moving in one direction, typically a liberal direction on a given social issue, and this varies by region, it varies by age. Anyway, I think Overton windows are really fascinating. So, I completely derailed your conversation, but I have a feeling you’re going to have something interesting to say.
Alex Andorra: 00:56:33
Yeah, yeah, no, I mean that’s related to that for sure. Yeah, basically, and that’s funny because yeah, I had also Allen Downey on this show for his latest book, and that was also definitely about that.
Jon Krohn: 00:56:50
Yeah. “Probably Overthinking It” was the book.
Alex Andorra: 00:56:53
Yeah. Yeah, yeah, yeah.
Jon Krohn: 00:56:54
That’s great. Great, great, great.
Alex Andorra: 00:56:55
And yeah, great book. So basically, this NGO had this survey data, right, and their clients have questions, and their clients are usually media or politicians. And it’s like, yeah, but I’d like to know, on a geographical basis, in this electoral district, what do people think about that? Or in this electoral district, what do educated women of that age think about same-sex marriage? That’s hard to do, because polling at that scale is almost impossible. It costs a ton of money, and also polling is getting harder and harder because people answer polls less and less. And so at the same time as polling data becomes less available and less reliable, you have people who get more interested in what the polls have to say. It’s hard.
00:58:02
And there is a great method to do that. So what we did for them is come up with a hierarchical model of the population because hierarchical models allow you to share information between groups. So here the groups could be the age groups, for instance, and basically knowing something … what a hierarchical model says is, “Well, age groups are different, but they’re not infinitely different.” So learning about what someone aged 16 to 24 thinks about same-sex marriage actually already tells you something about what someone aged 25 to 34 thinks about that.
00:58:47
And the degree of similarity between these responses is estimated by the model. So these models are extremely powerful. I love them. I teach them a lot, and actually in the Advanced Regression Course, the last lesson is all about hierarchical models and I actually walk you through a simplified version of the model we did at PyMC Labs for that NGO called SOC in Estonia. It’s like a model that’s used in industry for real, so you learn that that’s a hard model, but that’s a real model.
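To give a flavor of that information sharing in numbers, here is a minimal sketch of normal-normal partial pooling, with entirely made-up poll numbers and an assumed between-group variance. This illustrates the general shrinkage idea only, not the actual model built for the NGO:

```python
import numpy as np

# Hypothetical poll: share of "yes" answers on an issue, by age group.
# Group sizes and raw (no-pooling) estimates are invented for illustration.
groups = ["16-24", "25-34", "35-49", "50+"]
n = np.array([40, 120, 200, 300])          # respondents per group
raw = np.array([0.62, 0.58, 0.51, 0.44])   # raw "yes" share per group

sigma2 = raw * (1 - raw) / n               # sampling variance of each raw estimate
tau2 = 0.004                               # assumed between-group variance
mu = np.average(raw, weights=n)            # overall mean across groups

# Normal-normal partial pooling: small, noisy groups get pulled harder
# toward the overall mean, because the model trusts their own data less.
w = tau2 / (tau2 + sigma2)                 # weight on the group's own data
pooled = w * raw + (1 - w) * mu

for g, r, p in zip(groups, raw, pooled):
    print(f"{g}: raw={r:.3f} -> pooled={p:.3f}")
```

In a real hierarchical model the between-group variance (here the fixed `tau2`) is itself estimated from the data, which is exactly the “degree of similarity” Alex describes.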
00:59:24
Then once you’ve done that, you do something called post-stratification. And post-stratification is basically a way of de-biasing your estimates, your predictions from the model, and you use census data to do that. So you need good data and you need census data. But if you have good census data, then you’re going to be able to basically re-weight the predictions from your model. That way, if you combine post-stratification and a hierarchical model, you are going to be able to give actually good estimates of what educated women aged 25 to 34 in this electoral district think about that issue.
01:00:12
And when I say good, I mean that the confidence intervals are not going to be ridiculous, right? It’s not going to tell you, well, this population is opposed to gay marriage with a probability of somewhere between 20 and 80%. That basically covers everything, so it’s not very actionable. No, the model is more uncertain there, of course, but it has a really good way of giving you something actually actionable. So that was a big project. I can dive into some others if you want, but that takes time.
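The re-weighting step of post-stratification is simple once the model has produced per-cell estimates. A toy sketch, with all cell names, predictions, and census shares invented for illustration:

```python
import numpy as np

# Post-stratification sketch: the model predicts support for an issue in
# each demographic cell, and census data gives each cell's true share of
# the population. The population estimate is the census-weighted mean.
cells = ["young/urban", "young/rural", "old/urban", "old/rural"]
model_pred = np.array([0.70, 0.55, 0.48, 0.35])    # model's per-cell estimate
census_share = np.array([0.20, 0.10, 0.40, 0.30])  # cell's share of population

# A raw poll may over-sample some cells; the census weights correct for that.
estimate = np.sum(model_pred * census_share)
print(f"post-stratified estimate: {estimate:.3f}")
```

In the full Bayesian version you would re-weight every posterior draw, not just a point estimate, so the uncertainty carries through to the final number.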
Jon Krohn: 01:00:56
No, that’s-
Alex Andorra: 01:00:57
I don’t want to derail the interview.
Jon Krohn: 01:00:57
That’s great and highly illustrative. It gives that sense of how with a Bayesian model, you can be so specific about how different parts of the data interrelate. So in this case, for example, you’re describing having different demographic groups that have some commonality. Like all women, but different age groups of women as a sub-node, as sub-nodes of women in general. So that way you’re able to use the data from each of the subgroups to influence your higher-level group.
01:01:31
And actually, something that might be interesting to you, Alex, is that my introduction to both R programming and, I guess, well to hierarchical modeling, is Gelman and Hill’s book, which yeah, obviously Andrew Gelman, you’ve already talked about on the show. Jennifer Hill, also brilliant, causal modeler and has also been on the Super Data Science podcast, and that was episode number 607. Anyway, we’re getting into, there’s lots of listening for people to do out there between your show and mine based on guests that we’ve talked about on the program. Hopefully lots of people with long commutes. So yeah, fantastic.
01:02:20
That’s a great example. Alex, another open source library, in addition to PyMC, that you’ve developed is ArviZ, which has nothing to do with the programming language R. So it’s A-R-V-I-Z, or Zed. ArviZ. And this is for post-modeling workflows in Bayesian stats. So tell us: what are post-modeling workflows, why do they matter, and how does ArviZ solve problems for us there?
Alex Andorra: 01:02:52
Yeah, great questions. And I’ll make sure, related to your previous question, to also send you some links to other projects that could be interesting to people, like media mix models. I’ve interviewed Luciano Paz on the show. We’ve worked with HelloFresh, for instance, to come up with a media mix marketing model for them, and Luciano talks about that in that episode. I’ll also send you a blog post about spatial data with Gaussian processes. That’s something we’ve done for an agricultural client. I already sent you a link to a video webinar we did with that NGO, that client in Estonia, where we got a bit deeper into the project. Oh, yeah, and I’ll of course send you the Learning Bayesian Statistics episode, because Tarmo, the president of that NGO, was on the show.
Jon Krohn: 01:03:59
Nice, yeah. I’ll be sure of course to include all of those links in the show notes.
Alex Andorra: 01:04:03
Yeah, yeah, because I guess people come from different backgrounds and so someone is going to be more interested in marketing, another one more in social science, another one more in spatial data, so that way people can pick and choose what they are most curious about.
01:04:19
So ArviZ, yeah, what is it? That’s basically your friend for any post-model, post-sampling graph. And why is that important? Because models tend to steal the show; they’re the star of the show. But a model is just one part of what we call the Bayesian workflow. The modeling is just one step of the Bayesian workflow, and all the other steps don’t have anything to do with the model per se. There are a lot of steps before sampling the model, and then there are a lot of steps afterwards, and I would argue that these steps afterwards are almost as important as the model.
01:05:10
Why? Because it’s what’s going to face the customer of the model. Okay? Your model is going to be consumed by people who, most of the time, don’t know about models and also often don’t care about models. So that’s a shame because I love models, but you know, lots of the time they don’t really care about the model, they care about the results. So a big part of your job as the modeler is to be able to convey that information in a way that someone who is not a stat person, a math person, can understand and use in their work. Whether that is a football coach or a data scientist or someone working in HelloFresh marketing department. You have to adapt the way you talk to those people and the way you present the results of the model. And the way you do that is with amazing graphs.
01:06:17
So a lot of your time as a modeler is spent figuring out how to decipher what the model can tell you, what the model cannot tell you, also very important, and with what confidence. Since we’re humans, we use our eyes a lot, and the way to convey that is with plots. And so you spend a lot of time plotting stuff as a Bayesian modeler, especially because Bayesian models don’t give you one-point estimates. They give you full distributions for all the parameters, so you get distributions all the way down. That’s a bit more complex to wrap your head around at the beginning, but once your brain is used to those gymnastics, it’s really cool, because it gives you opportunities for amazing plots.
01:07:07
So yeah, ArviZ is here for you for that. It has a lot of the plots that we use all the time in the Bayesian workflow. One, to diagnose your model, so to understand if there is any red flag in the convergence of the model. And then, once you’re sure about the quality of your results, how do you present that to the customer of the model? ArviZ also has a lot of plots for you there. And the cool thing about ArviZ is that it’s platform-agnostic. What I mean by that is that you can run your model in PyMC, in NumPyro, in Stan, and then use ArviZ, because ArviZ expects a special format of data that all these PPLs can give you, which is called the InferenceData object. Once you have that, ArviZ doesn’t care where the model was run, and that’s super cool. And also, it’s a Python package, but there is a Julia equivalent for people who use Julia. So yeah, it’s a very good way of starting that part of the workflow, which is extremely important.
Jon Krohn: 01:08:21
Nice. That was a great tour. And of course I will again have a link to ArviZ in the show notes for people who want to be using that for your post-modeling needs with your Bayesian models, including diagnostics, like looking for red flags and being able to visualize results and pass those off to whoever the end client is. I think it might be in the same panel discussion with the head of that NGO, Tarmo Juristo.
Alex Andorra: 01:08:46
Sí. Yes. That’s my Spanish. I’m in Argentina right now, so the Spanish is automatic.
Jon Krohn: 01:08:54
Actually, I’m relieved to know that you’re in Argentina because I was worried about that I was keeping you up way too late.
Alex Andorra: 01:08:58
No, no, no, no, no.
Jon Krohn: 01:08:59
Nice. So yeah, in that interview Tarmo talks about adding components like Gaussian processes to make models, Bayesian models, time-aware. What does that mean and what are the advantages and potential pitfalls of incorporating advanced features like time-awareness into Bayesian models?
Alex Andorra: 01:09:21
Yeah, yeah, great research, Jon. I can see that.
Jon Krohn: 01:09:26
Yeah, great research Serg Masis, really.
Alex Andorra: 01:09:27
Yeah, yeah, yeah, yeah, that’s impressive.
Jon Krohn: 01:09:31
I had a call with the people from Google Gemini today also. They’re very much near the cutting edge of developing Google Gemini alongside Claude 3 from Anthropic, and of course GPT-4, GPT-4o, whatever, from OpenAI. These are the frontier of LLMs. So I’m on a call with half a dozen people from the Google Gemini team, and they were insinuating, kind of near the end, with some of the new capabilities they have. There are some cool things in there which I need to spend more time playing around with like Gems. I don’t know if you’ve seen this, but the Gems in Google Gemini, they allow you to have context for different kinds of tasks. So for example, there are some parts of my podcast production workflow where I have different context, different needs at each of those steps, and so it’s very helpful with these Google Gemini Gems to be able to just click on that and be like, okay, now I’m in this kind of context.
01:10:35
I’m expecting the LLM to output in this particular way. And the Google Gemini people said, “Well, and maybe you’ll be able to use these Gems to kind of be replacing within the workflow of people working on your podcast. You’ll be able to use them to replace.” And I was like, for example, they gave the example of research and I was like, “I hope that our researcher, for example, is using generative AI tools to assist his work,” but I think we’re quite a ways away with all of the amazing things that LLMs can do. I think we’re still quite a ways from the kind of quality of research that Serg Masis can do for this show. We are still a ways away.
Alex Andorra: 01:11:17
Yeah, yeah. No, no, for sure. But that sounds like fun, yeah.
Jon Krohn: 01:11:22
Anyway, sorry I derailed you again, time-wise.
Alex Andorra: 01:11:26
No, no.
Jon Krohn: 01:11:27
Complexities and features.
Alex Andorra: 01:11:27
Yeah, yeah, so indeed. And then I love that question because I love GPs, so thanks a lot. And that was not at all a setup for the audience, honestly.
Jon Krohn: 01:11:38
Gaussian processes, GPs. Yeah.
Alex Andorra: 01:11:41
Yeah. I love Gaussian processes. And actually, I just sent you a blog post we have on the PyMC Labs website, by Luciano Paz, about how to use Gaussian processes with spatial data. So why am I telling you that? Because Gaussian processes are awesome because they are extremely versatile. It’s what’s called a non-parametric method; it allows you to do non-parametric models. What does that mean? It means that instead of having, for instance, a linear regression, where you have a functional form that you’re telling the model, like I expect the relationship between X and Y to be linear, Y = A + B * X, the Gaussian process is saying: I don’t know the functional form between X and Y, I want you to discover it for me. So that’s one level up, if you want, in the abstraction. And so that’s saying Y = F of X, find what F is.
01:12:57
You don’t want to do that all the time, because that’s very hard, and actually, you need to use quite a lot of domain knowledge on some of the parameters of the GPs. I won’t get into the details here, but I’ll give you some links for the show notes. But something that’s very interesting to apply GPs to is, well, spatial data, as I just mentioned. Take a field plot, for instance, so not a plot in the graph sense, but an agricultural plot of land. There are some interactions between where you are in the plot and the crops that you’re going to plant there, but you don’t really know what those interactions are. It interacts with the weather, too, with a lot of things, and you don’t really know what the functional form of that is. And that’s where a GP is going to be extremely interesting, because it’s going to allow you, in 2-D, to try and find out what these correlations between X and Y are, and take that into account in your model.
01:14:10
That’s very abstract, but I’m going to link afterwards to a tutorial we actually just released today in PyMC, a tutorial that I’ve been working on with Bill Engels, who is also a GP expert. I’ve been working with him on this tutorial for a new approximation of GPs, and I’ll get back to that in a few minutes. But first, why GPs in time? So you can apply GPs on spatial data, on space, but you can also apply GPs on time. Time is one-dimensional most of the time. Space is usually 2-D, and you can actually do GPs in 3-D; you can do spatio-temporal GPs. That’s even more complicated. But 1-D GPs, that’s really awesome, because most of the time when you have a time dependency, it’s non-linear. For instance, that could be the way the performance of a baseball player evolves within the season. You can definitely see the performance of a baseball player fluctuate with time during the season, and that would be non-linear, very probably.
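The “discover the function for me” idea can be made concrete in a few lines. Here is a self-contained NumPy sketch that draws random candidate functions from a GP prior with a squared-exponential kernel; the lengthscale and amplitude are arbitrary choices for illustration, not values from any model discussed here:

```python
import numpy as np

rng = np.random.default_rng(0)

# A GP says y = f(x) with f unknown; the kernel encodes how smooth f is.
x = np.linspace(0, 10, 100)
lengthscale, amplitude = 1.5, 1.0   # assumed kernel hyperparameters

# Covariance matrix K[i, j] = amp^2 * exp(-(x_i - x_j)^2 / (2 * ls^2))
diff = x[:, None] - x[None, :]
K = amplitude**2 * np.exp(-0.5 * (diff / lengthscale) ** 2)

# Draws from N(0, K) are entire functions evaluated at the grid points;
# the jitter on the diagonal keeps the Cholesky factorization stable.
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))
samples = L @ rng.standard_normal((len(x), 3))   # three random functions
```

Each column of `samples` is one smooth, non-linear function, which is exactly the kind of flexible time trend a GP contributes to a Bayesian model.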
01:15:34
And the thing is, you don’t know what the form of that function is, and that’s what the GP is here for. It’s going to come and try to discover the functional form for you. And that’s why I find GPs extremely… They’re really magical mathematical beasts. First, they’re really beautiful mathematically, and a lot of things are actually special cases of GPs, like neural networks. Neural networks are actually Gaussian processes, well, a special case of Gaussian processes. Gaussian random walks are a special case of Gaussian processes. So they’re a very beautiful mathematical object, but also very practical. Now, as Uncle Ben said, with great power comes great responsibility. And GPs are hard to wield. It’s a powerful weapon, but it’s hard to wield. It’s like Excalibur: you have to be worthy to wield it. And so it takes training and time to use them, but it’s worth it.
01:16:43
And so we used that with Tarmo Jüristo from that Estonian NGO, but I use that almost all the time. Right now, I’m working more and more on sports data, and yeah, I am actually working on some football data right now, and well, you want to take into account these within-season effects for players. I don’t know what the functional form is. And right now, the first model I did, taking the time into account, was just a linear trend. So I was just saying, as time passes, you expect a linear change. So the change from one to two is going to be the same as the one from nine to ten.
01:17:27
But usually that’s not the case with time. It’s very nonlinear. And so here you definitely want to apply a GP to that. You could apply other stuff, like random walks, autoregressive stuff, and so on. But I personally don’t really like those models. I find you have to impose that structure on the model, but at the same time, they’re not that much easier to use than GPs, so my default is to just use a GP. And I’ll end this very long answer with a third point, which is that now, it’s actually easier to use GPs, because there is this new decomposition of GPs that’s called the Hilbert space decomposition, so HSGP, and that’s basically a decomposition of GPs that’s like a dot product, so kind of a linear regression, but that gives you a GP. And that’s amazing, because GPs are known to be extremely slow to sample, because it’s a lot of matrix multiplication, as I was saying at some point. But with HSGP, it becomes way faster and way more efficient.
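Here is a minimal NumPy sketch of that decomposition, following the Solin and Särkkä Hilbert-space construction that HSGP is based on; all parameter values are made up for illustration. The GP kernel is approximated by a weighted dot product of fixed basis functions, which is exactly the "linear regression that gives you a GP" idea:

```python
import numpy as np

def phi(x, j, L):
    # Laplacian eigenfunctions on [-L, L] (Dirichlet boundary conditions).
    return np.sqrt(1.0 / L) * np.sin(np.pi * j * (x + L) / (2.0 * L))

def spectral_density(w, eta, ell):
    # Spectral density of the 1-D squared-exponential kernel.
    return eta**2 * ell * np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (ell * w) ** 2)

eta, ell = 1.0, 0.3   # kernel amplitude and lengthscale (made-up values)
L, m = 2.0, 50        # boundary of the expanded domain, number of basis functions
x = np.linspace(-1.0, 1.0, 40)
j = np.arange(1, m + 1)

Phi = phi(x[:, None], j[None, :], L)                  # (40, m) fixed basis matrix
S = spectral_density(np.pi * j / (2.0 * L), eta, ell)  # weight per basis function

# k(x, x') ~= sum_j S_j * phi_j(x) * phi_j(x'): a weighted dot product. So
# f = Phi @ (sqrt(S) * beta) with beta ~ N(0, I) is just a linear regression
# on fixed basis functions, which is why HSGP samples so much faster.
K_approx = (Phi * S) @ Phi.T
K_exact = eta**2 * np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)
err = np.abs(K_approx - K_exact).max()
```

With the data kept away from the boundary L and enough basis functions, the approximation error is tiny; the caveats Alex mentions are about exactly those two choices.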
01:18:41
Now, you cannot always use HSGP. There are caveats and so on. But Bill and I have been working on this tutorial. It’s going to be in two parts. The first part came out today, and I’m going to send you the links for the show notes here in the chat we have. It’s up on the PyMC website, and it’s called HSGP First Steps and Reference. We go through why you would use HSGP, how you would use it in PyMC, and the basic use cases. And the second part is going to be the more advanced use cases. Bill and I have started working on that, but it always takes time to develop good content on that front. But yeah, we’re getting there, and it’s open source, so we’re doing that in our free time, unpaid. So that always takes a bit more time, but we’ll get there.
01:19:44
And finally, another resource that I think your listeners are going to appreciate is a webinar series I’m doing on HSGP, where we have a modeler who comes on the show, shares their screen, and does live coding. And so the first part is out already. I’m going to send you that for the show notes. I had Juan Orduz on the show, and he went into the first part of how to do HSGPs, and what HSGPs even are from a mathematical point of view, because Juan is a mathematician. I’ll end my very long, passionate rant about GPs here. But long story short, GPs are amazing, and it’s a good investment of your time to become skillful with GPs.
Jon Krohn: 01:20:46
Fantastic. Another area that I would love to be able to dig deep into, and so our lucky listeners out there who have the time will now be able to dig into that resource, and many of the others that you have suggested in this episode, which we’ve all got for you in the show notes. Thank you so much. Alex, this has been an amazing episode. Before I let my guests go, I always ask for a book recommendation, and you’ve had some already for us in this episode, but I wonder if there’s anything else. The recommendation you already had was Bernoulli, something about Bernoulli?
Alex Andorra: 01:21:20
Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science, I think. I’ll send you that actually, the episodes with Aubrey Clayton and David Spiegelhalter, because those are really good, especially for less technical people who are curious about science and how it works. I think it’s a very good entry point. This book is amazing. Oh, my God, this is an extremely hard question. I have so many books and I read so many books that I’m taken aback. I would say books which I find extremely good and which have influenced me, because a book is also… It’s not only the book, it’s also the moment when you read the book.
01:22:13
If you like a book and you come back to it later, you’ll have a different experience, because you’re a different person and you have different skills. So I’m going to cheat and give you several recommendations, because I have too many of them. For technical books, I would say Probability Theory: The Logic of Science by E.T. Jaynes. E.T. Jaynes was a physicist, but in the Bayesian world, E.T. Jaynes is like a rock star. I definitely recommend it. It’s his masterpiece. That’s a technical book, but it’s actually a very readable book, and it’s also a very epistemological one. So that one is awesome.
01:23:05
Much more applied, if you want to learn Bayesian stats, a great book to do that is Statistical Rethinking by Richard McElreath, a really great book. I’ve read it several times. Any book by Andrew Gelman, as you were saying, I definitely recommend them. They tend to be a bit more advanced. If you want a really beginner-friendly one, his latest, actually, Active Statistics, is a really good one. I just had him on the show, episode 106, Jon, for people who like numbers like that. And I remember that when I was studying political science, actually, Barack Obama’s book from before he was president. I don’t remember the name. I think it’s The Audacity of Hope, but I’m not sure. But his first book, before he became president, that was actually a very interesting one.
Jon Krohn: 01:24:04
Dreams From My Father.
Alex Andorra: 01:24:05
Yes. Yeah, yeah, this one, Dreams From My Father. Very interesting one. The other ones were a bit more political, which I found a bit less interesting, but this one was really interesting to me. And another one for people who are very nerdy. I’m a very nerdy person. I love going to the gym, for instance, and I do my own training plan, my own nutrition plan. I’ve dug into that research. I love that, because I love sports, also. So another very good book I definitely recommend for developing good habits is Katy Milkman’s How to Change: The Science of Getting From Where You Are to Where You Want to Go. Extremely good book, full of very practical tips. That’s an extremely good one.
01:24:56
And then a last one that I read recently… Oh, no, actually, two last ones. One of the last two. Penultimate, I think you say. How Minds Change by David McRaney, for people who are interested in how beliefs are formed. Extremely interesting. He’s a journalist, and he’s got a fantastic podcast that’s called You Are Not So Smart, and I definitely recommend that one. And yeah, that’s basically about how people change their minds, because I’m very interested in that. In the end, this book is a trove of wisdom.
01:25:38
Last one, promise. I’m also extremely passionate about Stoicism, Stoic philosophy, and that’s a philosophy I find extremely helpful for living my life and navigating the difficulties that we all have in life. An iconic book there is Meditations by Marcus Aurelius: reading the thoughts of a Roman Emperor, one of the best Roman Emperors there was. And it’s really fascinating, because he didn’t write it to be published; it was his journal, basically. And it’s absolutely fascinating to read it, and to see that they kind of had the same issues we still have. So that’s a fantastic book. I reread it very often.
Jon Krohn: 01:26:34
It’s mind-blowing. I haven’t actually read Meditations, but I read Ryan Holiday’s The Daily Stoic.
Alex Andorra: 01:26:39
Oh, yeah. Yeah, that’s really good.
Jon Krohn: 01:26:43
It’s 366 daily meditations on wisdom, perseverance, and the art of living, based on Stoic philosophy. And so there is a lot from Marcus Aurelius in there; he probably accounts for the plurality of the content. And wow, it is mind-blowing to me how somebody two millennia ago is the same as me. I mean, to keep myself in check: I’m not a Roman Emperor, and the things I write will not be studied 2,000 years from now. But nevertheless, the connection you feel with this individual from 2,000 years ago, and the problems that he’s facing, and how similar they are to the problems that I face every day, it’s staggering.
Alex Andorra: 01:27:32
Yeah, yeah, yeah. No, that’s incredible. Something that really spoke to me, that I remember, is at some point he’s saying to himself that it’s no use going to the countryside to escape everything, because the real retreat is in yourself. It’s like, if you’re not able to be calm and find equanimity in your daily life, and Rome was like the megalopolis at the time, it’s not by getting away from the city that you’re going to find tranquility over there.
01:28:14
You have to find tranquility inside, and then, yeah, you’ll go to the countryside and it’s going to be even more awesome, but it’s not because you go there that you find tranquility. And that was super interesting to me, because I definitely feel that when I’m in a big, big metropolis. At some point, I want to get away. But I was like, wait, they were living that already, at a time when they didn’t have internet, they didn’t have cars and so on, but for them, it was already too much: too many people, too much noise. I found that super interesting.
Jon Krohn: 01:28:47
For sure. Wild. Well, this has been an amazing episode, Alex. I really am glad that Doug suggested you for the show, because this has been fantastic. I’ve really enjoyed every minute of this.
Alex Andorra: 01:28:59
Thanks.
Jon Krohn: 01:28:59
I wish it could go on forever, but sadly, all good things must come to an end. And so before I let you go, the very last thing is: do you have other places where we should be following you? We’re going to have a library of links in the show notes for this episode. And of course, we know about your podcast, Learning Bayesian Statistics. We’ve got the Intuitive Bayes educational platform, and open source libraries like PyMC and ArviZ. In addition to those, is there any other social media platform, or other way that people should be following you, or getting in touch with you after the program?
Alex Andorra: 01:29:38
Well, yeah, thanks for mentioning that. So yeah, Intuitive Bayes, Learning Bayesian Statistics, PyMC Labs, you mentioned them, and I’m always available on Twitter: alex_andorra, like the country, and that’s where it comes from. I mention that because it has two Rs, not just one, and when I say it in a language other than Spanish, people write it with just one R. And otherwise, LinkedIn, also. I’m over there, so you can always reach out to me there, on LinkedIn or Twitter. And also, yeah, send me podcast suggestions, stuff like that. I’m always on the lookout for something cool. And again, thanks a lot, Jon, for having me on. Thanks a lot, Doug, for the recommendation. Yeah, that was a blast. I enjoyed it a lot, so thank you so much.
Jon Krohn: 01:30:43
Absolutely loved today’s conversation with Alex. I hope you did, too. In it, Alex filled us in on how Bayesian stats allows us to incorporate prior knowledge into our models of the world, allowing for more advanced models than other approaches, and potentially allowing us to do more with less data. He also talked about how PyMC, PyStan, and NumPyro are the leading examples of probabilistic programming languages, or PPLs, for Python that Bayesian models can be sampled from. He talked about how Bambi allows a novice to get up and running with Bayesian models in Python immediately, how PyMC is implemented to compute efficiently thanks to PyTensor, how ArviZ allows for diagnostics and visualizations after Bayesian modeling, and how Gaussian processes, or GPs, allow nonlinearities over one or more dimensions to be modeled, enabling exceptionally advanced modeling of the real world. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Alex’s social media profiles as well as my own, at www.superdatascience.com/793.
01:31:46
And if you’d like to engage with me in person, as opposed to just through social media, I’d love to meet you at Collision in Toronto this week. On Thursday, I’ll be hosting an afternoon of sessions on the content creators stage. Beyond the sessions I host, other amazing speakers you can check out include the godfather of AI himself, Geoffrey Hinton. Yes. Aravind Srinivas, who’s CEO of Perplexity; Aidan Gomez, who’s CEO of Cohere; and tennis legend Maria Sharapova. Thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science episode for you, and thanks, of course, to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the Super Data Science team for producing another tremendous episode for us today. For enabling that super team to create this free podcast for you, we’re so very grateful to our sponsors. Please consider supporting this show by checking out our sponsors’ links, which are in the show notes.
01:32:39
And if you yourself are interested in sponsoring an episode, you can get the details on how to do that by making your way to jonkrohn.com/podcast. Otherwise, share this episode with people who you think will love it. Review this episode on your favorite podcasting platform or YouTube. Subscribe if you aren’t already a subscriber. But most importantly, just keep on listening. I’m so grateful to have you listening, and hope I can continue to make episodes you’d love for years and years to come. Until next time, keep on rocking it out there, and I’m looking forward to enjoying another round of the Super Data Science Podcast with you very soon.