Jon Krohn: 00:05
This is Five-Minute Friday on model error analysis.
00:19 As in the preceding three weeks of Five-Minute Fridays, today I’m having a short, five-minute-ish conversation with a preeminent data science speaker whom I met in person at ODSC West in San Francisco. For this final episode filmed live at the conference, our guest is Serg Masís, who discusses the importance and value of model error analysis.
00:40
Serg Masís, welcome to this special Five-Minute Friday episode that we’re filming live at ODSC West, the Open Data Science Conference, in San Francisco in 2022. It’s been an amazing conference, and it’s been great to catch up with you here after a few years of not seeing each other in person. For those of you listeners who aren’t aware, Serg Masís is a brilliant researcher behind the Super Data Science Podcast. When we have guests come on, Serg digs deep into their academic work, into talks that they’ve given, into podcast episodes that they’ve done, into books that they’ve written. And so all of the best questions that I ask on the Super Data Science Podcast are either written verbatim by Serg or paraphrased by me. I’m hugely grateful to have Serg on the show, and now you get to have this Five-Minute Friday with him. We did do a full guest episode with Serg not too long ago, episode number 539. It’s on explainable AI and it is absolutely fascinating. One of the most popular episodes of 2022, in fact.
01:47
Serg, beyond the little bit of work that he does for us on the podcast as a researcher, has a day job as an agronomic data scientist at Syngenta, which is a giant agricultural firm, and he works on climate models there. In addition to that, he’s already published a book, Interpretable Machine Learning with Python, and he’s working on a new book that will be published soon by Pearson. It’s called DIY AI: Do-It-Yourself Artificial Intelligence. It looks like it’s going to be a really fun book for hands-on applications of AI across a broad range of application areas. I can’t wait for that book to come out. Serg, at ODSC West, you gave a talk on model error analysis, a topic that does not get enough attention. Tell us what model error analysis is and why it’s important.
Serg Masís: 02:40
Yeah, well, model error analysis is about going deeper than the metrics. Typically, when you’re building a model and you’re evaluating it, you’re looking at accuracy, recall, MSE, MAE, one of those, right? But you’re just optimizing on those metrics and then you’re like, “Okay, well, what do I do to get it better?” Okay, hyperparameter tuning, feature engineering, whatever you do. But I think it is really important to understand why those errors exist. Model error analysis is about finding those errors, seeing where the residuals are. I explained during the talk an actual tool that exists, the Responsible AI dashboard by Microsoft. It includes a-
Jon Krohn: 03:32
Responsible AI dashboard by Microsoft.
Serg Masís: 03:34
And it includes an error analysis tool. With this error analysis tool, it does something you could do on your own. You could always fit a decision tree on the residuals, on the error, and find the pockets where there’s the most error, but it does this all for you. You can visually see the decision tree and basically navigate the nodes where these errors exist the most. It has the error rate, the error coverage, you can see the precision, the recall, what the samples are where these errors occur, and then you can ask the question: why are there more errors there?
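A minimal sketch of that do-it-yourself approach, assuming Python with scikit-learn; the dataset, model, and hyperparameters below are illustrative choices rather than anything specific from the talk:

    import numpy as np
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor, export_text

    # Any model and dataset will do; these are stand-ins.
    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

    # Absolute residuals on held-out data: how wrong the model is per sample.
    abs_residuals = np.abs(y_test - model.predict(X_test))

    # Fit a shallow "error tree" on the residuals; its leaves are cohorts
    # (feature-defined pockets) where the model's error concentrates.
    error_tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=50, random_state=0)
    error_tree.fit(X_test, abs_residuals)

    # Leaves with the highest predicted error are the pockets worth investigating
    # (more data, new features, anomaly checks, or some other mitigation).
    print(export_text(error_tree, feature_names=list(X_test.columns)))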
04:12
Because errors are not distributed equally. Understanding where they are means you can maybe take some steps, whether it’s adding more data, whether it’s maybe more features, or perhaps there’s some kind of anomaly, or maybe you have to do some mitigation. Maybe you change the way the predictions are made to account for an imbalance of some sort. There are a lot of things that can be done to correct that. As a sub-topic, I also went into uncertainty quantification. This is also not done enough. When you make a prediction, you just make what’s called a point prediction. You’re just, “Okay, this is what I’m predicting.”
Jon Krohn: 04:59
Right. Right.
Serg Masís: 05:00
You’re not giving any kind of uncertainty-
Jon Krohn: 05:03
Right.
Serg Masís: 05:04
… no kind of confidence interval or anything of that nature.
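To make the contrast with point predictions concrete, here is a minimal sketch of one common way to produce prediction intervals, using quantile gradient boosting in scikit-learn; the synthetic data and the chosen quantiles are illustrative assumptions:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, n_features=5, noise=20.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # One model per quantile: lower bound, median (the "point" prediction), upper bound.
    models = {
        q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X_train, y_train)
        for q in (0.05, 0.5, 0.95)
    }
    lower, point, upper = (models[q].predict(X_test) for q in (0.05, 0.5, 0.95))

    # Report the interval alongside the point prediction for a few samples.
    for i in range(3):
        print(f"pred={point[i]:8.1f}   90% interval=[{lower[i]:8.1f}, {upper[i]:8.1f}]")

    # Empirical coverage: how often the true value actually falls inside the interval.
    coverage = ((y_test >= lower) & (y_test <= upper)).mean()
    print(f"empirical coverage: {coverage:.2f}")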
Jon Krohn: 05:08
Yeah, so one place we see that often is in polling. When we have predictions of how an election result is likely to go, you often see those confidence intervals, but elsewhere you don’t see them very frequently at all.
Serg Masís: 05:20
Yeah.
Jon Krohn: 05:21
Yeah.
Jon Krohn: 05:22
Cool. All right. Having some uncertainty in outputs and also doing model error analysis… Thanks again for suggesting that library. What was the name of the library from Microsoft again?
Serg Masís: 05:34
It’s called Responsible AI Toolkit.
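Microsoft ships this toolkit as the open-source responsibleai and raiwidgets Python packages. Below is a hedged sketch of the documented quickstart pattern for launching the dashboard with the error analysis component enabled; the exact constructor arguments can vary between versions, so treat them as assumptions to verify against the current docs, and the model and data here are toy stand-ins:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from responsibleai import RAIInsights
    from raiwidgets import ResponsibleAIDashboard

    # Toy setup (illustrative): the train/test DataFrames must include the target column.
    df = load_breast_cancer(as_frame=True).frame  # features plus a "target" column
    train_df, test_df = train_test_split(df, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(
        train_df.drop(columns="target"), train_df["target"]
    )

    # Arguments follow the quickstart pattern (model, train, test, target column, task type).
    rai_insights = RAIInsights(model, train_df, test_df, "target", "classification")
    rai_insights.error_analysis.add()  # adds the error-analysis tree / heat map component
    rai_insights.compute()

    ResponsibleAIDashboard(rai_insights)  # launches the interactive dashboard (in a notebook)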
Jon Krohn: 05:37
Responsible AI Toolkit from Microsoft. Brilliant. That gives our listeners a tool that they can pick up today-
Serg Masís: 05:45
Yeah.
Jon Krohn: 05:45
… and be doing some error analysis, identifying the cause, the causal effect behind some of the residuals-
Serg Masís: 05:51
Yeah.
Jon Krohn: 05:52
… the deltas between what the model predicts and the actual reference data points. And that can lead to fixing those up. You could end up with much better models or much better data collection.
Serg Masís: 06:05
Absolutely.
Jon Krohn: 06:07
Awesome, Serg. Thank you so much for starring in this special Five-Minute Friday episode of the Super Data Science Podcast. I can’t wait to be reeling off more of your brilliant questions to guests.
Serg Masís: 06:22
Thank you so much.
Jon Krohn: 06:25
Okay, that’s it for this special guest episode of Five-Minute Friday, filmed onsite at ODSC West. This was the final of our four episodes in that format. I hope you enjoyed the series. It’s been fun for me, and I hope it was fun for you too. Until next time, keep on rocking it out there, folks, and I’m looking forward to enjoying another round of the Super Data Science Podcast with you very soon.