92 minutes
SDS 641: Data Science Trends for 2023
Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn
It's that time of the year again... Sadie St. Lawrence is back to predict the future of data science and shed light on the top data science trends of 2023.
Thanks to our Sponsors:
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
About Sadie St. Lawrence
Sadie St. Lawrence is the Founder and CEO of Women in Data, an international non-profit organization with representation in 55 countries and a community of 30,000+ data leaders, practitioners, and citizens. Women in Data has been named a Top 50 Leading Company of The Year and has been rated the #1 community for Women in AI and Tech. Prior to her work at Women in Data, Sadie worked in Data Science and AI Strategy. She has trained over 400,000 people in data science and developed multiple programs in machine learning and career development. Sadie has been recognized among the Top 30 Most Inspiring Women in AI, the Top 10 Most Admired Businesswomen to Watch in 2021, and the Top 21 Influencers in Data, and is the recipient of the Outstanding Service Award from UC Davis. In addition, she serves on multiple start-up and nonprofit boards, and is the host of the Data Bytes podcast for Women in Data.
Overview
As always, Sadie has some fantastic trends to share, but first, it was time to review the predictions we highlighted only 12 months ago.
So how did we do? Here are our top three trends and predictions that we saw grow significantly throughout the year:
- AutoML and its ability to facilitate the optimization and deployment of machine learning models. However, this same progress also accelerated the negative side effects of automation within ML.
- Sadie's four principles for making AI scalable from prototype to production (cloud-first, standardized and automated workflows, monitoring performance in production, and ensuring traceability). In 2022, we saw the growth of many ML startups that leveraged her principles and accelerated the process of making AI scalable.
- The intersection of blockchain and data science. Awareness of blockchain data and the potential that exists as these two worlds collide (e.g., traceability, identity fraud detection) grew stronger throughout the year despite the downturn in cryptocurrency markets.
The Top Data Science Trends for 2023
Now looking ahead to 2023, Sadie arrived prepared with some data science trends that we can't wait to see come to fruition.
Major Data Science Trend of 2023: Data As A Product
The main theme we'll be seeing far more of in 2023 is the concept of data as a product. According to Sadie, you can expect to see more products like large language models and ChatGPT that consumers will be able to interact with. Organizations will also begin thinking of their data as an actual product that holds inherent value that can be transformed into operational value. Sadie also predicts the following three trends as off-shoots of data as a product:
Multimodal Models
With the runaway popularity of products like ChatGPT, the AI renaissance has officially arrived, says Sadie, who predicts that engaging video content will drive the next iterations of DALL·E and GPT-3.
The Data Mesh
After first introducing you to the data mesh in Episode 609, we're expecting to see this trend grow as we head into 2023. The data mesh enables businesses to self-serve, giving stakeholders across the company the ability to make independent use of their data and freeing data scientists from these responsibilities. This shift provides data scientists with more time to focus on developing core models that deliver significant business value.
Privacy & AI Trust
As more products like ChatGPT are quickly adopted by consumers, more people become exposed and so do more problems, says Sadie. 2022 saw the US government's release of the Blueprint for an AI Bill of Rights, giving way to frameworks on how to govern many of these models. Expect further development in the regulation of the AI space.
Finally, as climate change remains a global emergency, environmental sustainability in AI and accelerated compute are topics we expect to gain further traction in 2023 and beyond. Tune in to this episode to hear Sadie's complete insights.
In this episode you will learn:
- A recap of 2022 predictions [5:22]
- Our data science trend predictions for 2023:
- Data as a product [23:36]
- Multimodal A.I. models [32:26]
- The data mesh [42:49]
- Privacy & AI Trust [50:54]
- Environmental Sustainability [54:37]
- Sadie's goals for 2023 [1:16:04]
Items mentioned in this podcast:
- Kolena
- Women in Data
- Data Bytes Podcast
- SDS 537: Data Science Trends for 2022
- ChatGPT
- Lensa
- SDS 565: AGI: The Apocalypse Machine
- SDS 588: Artificial General Intelligence is Not Nigh
- SDS 590: Artificial General Intelligence is Not Nigh (Part 2 of 2)
- SDS 609: Data Mesh
- SDS 624: Imagen Video: Incredible Text-to-Video Generation
- SDS 621: Blockchains and Cryptocurrencies: Analytics and Data Applications
- SDS 625: Analyzing Blockchain Data and Cryptocurrencies
- Tesla A.I. Day
- Tesla Dojo
- US AI Bill of Rights
- The Great CEO Within by Matt Mochary
- Kirill and Hadelin’s new course: Machine Learning in Python Level 1: Beginner
- Jon’s virtual conference on natural language processing with large language models
- Jon’s Podcast Page
Podcast Transcript
Jon:
This is episode number 641 on Data Science Trends for 2023 with Sadie St. Lawrence. Today's episode is brought to you by Kolena, the testing platform for machine learning.
Welcome to the SuperDataScience podcast, the most listened-to podcast in the data science industry. Each week we bring you inspiring people and ideas to help you build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today. And now let's make the complex simple.
Happy New Year. Welcome to the year 2023, and welcome back to this SuperDataScience podcast. To kick off the new year, we've got an annual data science trend prediction special for you. Today we're going to start the episode off by looking back at how our predictions for 2022 panned out from a year ago, and then we'll dive into our predictions for the year ahead. Specific trends we'll be discussing include data as a product, multimodal models, decentralization of enterprise data, AI policy, and environmental sustainability. Our very special guest to guide us through these predictions is the magnificent Sadie St. Lawrence, a Data Science and Machine Learning instructor whose content has been enjoyed by over 350,000 students.
She's the founder and CEO of WomenInData.org, a community of over 20,000 women across 17 countries. She also serves on multiple startup boards and is host of the Data Bytes podcast. Sadie was previously our guide to the 2022 predictions episode from a year ago. And if you listen to that episode, one of our most popular episodes ever, you know that you're in for a treat again today.
Today's episode will appeal to technical and non-technical folks alike. Anyone who'd like to understand the trends that will shape the field of data science and the broader world, not only in 2023, but also in the years beyond. All right, you ready for this visionary episode? Let's do it.
Another year. Another Sadie St. Lawrence episode. Welcome to the SuperDataScience podcast, Sadie. Yes, it's time for the data science trends of 2023. You were here to do the same in 2022. It's episode number 537. Listeners can go and listen back to it and hear how dumb we sounded a year ago. So as I probably said last year in that episode, Sadie is one of the most forward-looking data scientists that I'm aware of. So there are many, many dozens of guests on the show every year, and I often ask them questions like... Although I've started to do this less because I was getting such poor answers.
I would ask guests, what excites you about this whole field and how it's going to change over your career? And I'd ask these incredibly deep experts and I was expecting something so profound. And then basically it would be an iteration on whatever the latest trend that had happened in their space was. And I was like, damn.
But Sadie, St. Lawrence, you are thinking ahead. I am going to call you a futurist. You probably wouldn't because it's kind of an over the top thing to call yourself. But I think you are, and that's why you're back on the show again this year doing another Data Science Trends episode for the year. Sadie, start us off by telling us why the time of year where the year changes over is your favorite time of year.
Sadie:
Well, first of all, it's such an honor and pleasure to be back. And again, it's great to be back when it's my favorite time of year because as you mentioned, if you didn't hit it right with the trends, you have a new year and new blank piece of paper and you can do it all over again so we get a second chance. So if 2022 wasn't your best year, that's totally okay because it's a new year, new goals, new opportunity is an exciting time. If 2022 was a great year, then it's a time to celebrate. So either way, whether you're celebrating the last year or you're looking forward to the new year, it's a great time of year because it's in my book, it's a win-win either way. And there's nothing I love more than a blank sheet of paper, a new opportunity to define the road ahead, what we want to do and have conversations like this. So thank you for having me back.
Jon:
And my great pleasure. And at the end of the episode, we will hear what Sadie is doing with all of her blank sheets of paper because not only can Sadie see into the future with her crystal ball, she can also take action on things that matter. And so we'll hear about that more near the end of the episode. But let's spend the heart of this episode focused on trends. Let's start by recapping how we did last year. So the very first topic that we had, our very first prediction for 2022, was that AutoML will enable data scientists to more easily optimize and deploy machine learning models.
But that that may also accelerate the negative social side effects of automation if we're not careful. And this has certainly been the case that AutoML tools are proliferating in terms of venture capital fundraising, probably even in terms of sponsors that you've heard on the SuperDataScience podcast over the last year. There are a lot of tools out there that are automating data scientists' workflows and it makes a lot of sense. Sadie, I don't know if you have anything in particular to say about that. The thing that I'd most want to ask you about is if you saw whether this negative social side effects of automation have been accelerating as well as a result over the past year.
Sadie:
Yeah, I think that this was a hit in terms of putting this into the hands of consumers. So I think from data scientists using AutoML, usually they're equipped with the knowledge of the hazards with it and still do their own internal tests. I love data scientists because they want to see the numbers. They're curious. They're like, "How did you get to this result?" So I don't think from a data scientist perspective, we saw societal impacts. I do think we're seeing a little bit more of this when we put it into tools like Lensa where you're adding photos of yourself and it's recreating it and it may be a little biased in terms of the racial profiles it creates of you. So I think we're slowly starting to touch on that, but that may be more of a theme coming into 2023 as well.
Jon:
I also think something that has been happening a lot, a trend that has been accelerating year over year, is this awareness of the social issues of the data that are used to train models as well as the models themselves. And this is reassuring. It's something that, we see conference tracks focused on it more and more. It's something that for some conferences you have to address it specifically when you present any results. So hopefully we're going in the right direction on that. And hopefully the negative social side effects of automation are at least in the spotlight, are at least being acknowledged if not actually being removed from our models or minimized in our models more and more.
Sadie:
Yes.
Jon:
Okay. So our second trend for 2022 was how GANs, Generative Adversarial Networks will be used and misused to generate remarkably compelling deep fakes, and doing that with fewer examples of target person than ever before. I haven't been talking about GANs a lot lately. Sadie, have you?
Sadie:
I have not either. And this is again why I love the new year because we're going to do this all over again and talk about what's new and hot this year. Because wow, did those come and go. That was faster than some fast fashion trends that I have seen. So I hope that one year we all remake what happened and how we were using data science 10 years ago and we'll pull up GANs such and such. So yeah, I think this one was a miss and I'm excited to talk about the change of this in 2023.
Jon:
Totally. Say no more. We will dig into that. There was a big yawning gap in what we talked about in our data science predictions of 2022 and we will certainly address that in this episode. A third trend that you brought to the table, Sadie, for 2022, I thought this was brilliant. You had four principles for making AI scalable from prototype to production. And namely those four principles were be cloud first, have standardized and automated data workflows, monitor performance in production and ensure traceability. Again, those four principles, kind of like the AutoML prediction, we're seeing so many startups get funded with tools that solve these problems. We're seeing tons of open source tools that allow data scientists to make AI scalable from prototype to production using very much the principles that you outlined last year. So I think we got that one right.
Sadie:
Yeah, I think the only thing I'd like to add to that is big portion of it is just the iteration. And we've seen this year, is the iteration now of models. And so when you create your models in a scalable, documented way that you can monitor, it's much easier to iterate and improve on that. So I think it was right aligned and I'm excited to see how people put it into practice this year by really iterating on their models and improving quickly.
Jon:
Totally. That is a huge thing. We have a sponsor of this very episode, Kolena does exactly that kind of thing. We see emerging startups that allow AI models to be iterated on and have workflows that test those iterations automatically. And yeah, I think it's brilliant that companies are getting into that. And as we in general automate more and more, not just of the AutoML, not just of the model fitting and model selection, but all of these processes around getting AI to production and in production, as more and more of that is automated it allows us as data scientists to have more models in production, kept up to date, kept reliable, and allowing us to be keeping an eye on the negative social side effects as well.
And then trend number four from last year. So you got me all excited about the blockchain as it pertains to data scientists and data science. So you got me excited about how the blockchain is this enormous store of publicly available data that everyone who's interested in data should be interested in because we've never had this kind of economic data that you can peer into in real time. And I learned so much from you in that episode. You gave me recommendations of speakers to have on the show in this year and I did that.
So for episodes number 621 and 625, we had Philip Gradwell and Kim Grauer, truly two of the leading minds on applying data analytics and data science to blockchain data. Really interesting episodes. If you're a listener who hasn't yet learned about how blockchain data is valuable to data scientists, regardless of what you think about cryptocurrencies, there's an enormous amount of potential there for data analysis.
And in particular, one of the points that you made last year, Sadie, was that NFTs, non-fungible tokens, could provide more accurate data on any real world object and potentially even prevent identity theft.
So this year, crypto indices, crypto prices have been hollowed out. I don't know how many people saw that coming. It's something that doesn't surprise me. I was surprised at how quickly it happened over the course of 2022. And I'm not surprised to see that there were so many scams being run by enormous crypto companies, crypto exchanges, that when the tide went out, it proved that, I don't know what the expression is exactly, but something about, I don't know, people being exposed. I don't know, people were swimming without their swimming trunks on and then the tide goes out-
Sadie:
The rug was pulled out. Or the emperor, it was a little bit of the emperor with no clothes is what happens. I'm glad I'm not doing crypto predictions because that would've been a brutal year. Very happy this is a SuperDataScience podcast, so that's a great thing. And I'm actually really happy with what's happening in this space because I think a lot of the noise is being removed that hopefully we can get to more of the technology and how the technology may actually be useful. And big fan, of both of the people that you interviewed on the podcast. Also would encourage everyone to go check those out. I listened to both episodes, I learned new things from them.
They've really done some cool things in terms of traceability and actually catching fraud. So if you want to be on the good side of crypto, you want to be on the side that catches fraud and you're going to be using your analysis skills to do that. So highly encourage individuals go check out those episodes.
Jon:
Are you unit testing your machine learning models? You certainly should be. If you're not, you should check out Kolena. Kolena is an ML testing platform for your computer vision models. It's the only tool that allows you to run unit and regression tests at the subclass level on your model after every single model update, allowing you to understand the failure modes of your model much faster. And that's not all. Kolena also automates and standardizes your model's testing workflows, saving over 40% of your team's valuable time. Head over to Kolena's website now to learn more. It's triple W dot Kolena dot I-O. That's K-O-L-E-N-A dot I-O.
Nice. And then any thoughts on the specific point that you made last year about how NFTs could provide more accurate data and potentially even prevent identity theft? So I haven't heard as much talk about NFTs in recent months as I had near the end of 2021. So are NFTs still a thing? Are people still making NFT art or is this a perfect example of how with some of these spurious, frothily valued financial related applications, as people start to become more critical and thoughtful about those applications, we'll see more and more of blockchain being used for non-financial applications where they can actually provide some real world utility? And so maybe this NFTs and preventing identity theft is one of those kinds of things.
Sadie:
Yeah, so I'll say one thing. It's going through a brand makeover, maybe similar to how Facebook did last year, to change to Meta. And they may want to do another one. I won't comment on that. But-
Jon:
That's... Really, really quickly. That is a really staggering one. There's no way we would've predicted that the company that was called Facebook right up until we recorded the episode last year, that company from when they changed their name to Meta has lost 80% of its value. That is wild. It does actually seem like potentially a buying opportunity right now for those of you who trade in [inaudible] stocks.
Sadie:
And disclaimer, not that we are giving any financial advice.
Jon:
No, not giving financial advice.
Sadie:
No, I'll put that out there right now. So what I've seen happen is there's actually a change in the name of NFTs. People are now calling them digital collectibles. So there's a remake of branding on it. The biggest launch that ever happened, happened on Reddit. They didn't even call them NFTs. People were able to buy them with their credit card and they didn't sell it that it was a blockchain solution, but everything was written on the blockchain. So I think that's what we're going to see more and more of, which is what I alluded to with now, the tech being utilized but not at the forefront. So you may encode the transaction on the blockchain, but that's not the selling point of what the transaction is.
That's really just an audit trail and an audit record, which is what the technology was always meant to be in the first place. And I think we've seen similar things in data science happen before, where originally we talked about big data and then it had a rebrand to data lakes and we continue to rebrand it. But what we're doing is we're just distilling to what the technology really is and useful for and cutting out the hype. So I'm really happy that this progression has happened, but just be careful that a lot of it is just rebranding and some of these things are still the same.
Jon:
Yeah, I think the biggest analog in data science would be the term artificial intelligence, which is so overused, and in particular with company valuations. So there was a period of time a few years ago where you could say that you're an AI startup and that was good for your valuation probably in the same way that saying you were a blockchain or crypto or NFT startup was a few years ago. And now it's more about wait, what do they actually do? What is the business value? Let's not focus on this being an AI startup or an NFT startup or a blockchain startup.
What does this do? What's the application? And let's focus on that. What's the size of the market? How does... We don't even really care that much what's going on in the backend, that it's AI or blockchain or NFT, whatever. What I care about is that you have a technological solution in the background that is somehow allowing businesses or consumers to do something much more efficiently than before or at a lower price. And that's where we should be focusing on our investments anyway.
Sadie:
Yeah, my favorite question to use for people who like to use a lot of buzz words or you think there may be a lot of hype-filled in, is just ask them, "What problem are you solving?" And that just cuts to all the... Cuts all the fat immediately. Because if they can't answer that, they're just selling the hype. If they can answer it, then you get to the real solution a lot faster.
Jon:
I can't wait till we're doing our 2033 episode and you're telling me about all of the companies that your venture capital fund has invested in, that you're running. I see that happening. That's what I see in my crystal ball, Sadie St. Lawrence.
Sadie:
Well, we'll mark it down. This is recorded so we'll play back the episode. There you have it guys. Jon Krohn's predictions for 2030.
Jon:
Jon Krohn. My name doesn't rhyme. I have Aunt Joan, her name does rhyme. And our final prediction for 2022 was how increasing the data literacy of the global workforce from its relatively low levels. About 21% of people on the planet have data literacy and increasing that is critical to allowing people across industries to be making better data-driven decisions. Not just data scientists. Not just data people. Everybody could be using data and models to be improving efficiencies in their field. And so we've got to get that up. Now that's something that's quite a personal thing to you. Have you seen anything on the front lines in terms of data literacy and education changing over the last year?
Sadie:
I don't see that, what I would call the application of data is changing. I think the awareness is getting there though, especially now as we talk about ML models being more familiar with people in their everyday life. I mean the recent Lensa app was huge all over the internet and that's direct to consumers. So I think that's the start of it, where consumers are engaging with these models. And I hope that is the start then of the curiosity of, okay, how does this actually work? How is this created and what do I need to know about this so that I can participate in this economy?
So I think this is, the data literacy is one that will be, I think a struggle for a lot of us for many years. And I also think that if you're listening to this podcast, you have an interest in data science if you're not already a chief data officer in the field. And because of that, you have somewhat of a responsibility to share the knowledge that you have with others. So I always use it as a call to action for individuals like, hey, wherever you're at in your career, take what you know, help educate, inspire, and bring others along because this world is here to stay. This digital data-driven world is here to stay.
Jon:
And I think that the kinds of instructors out there who are helping people of all stripes to become more data literate, I think that they are doing very well. Our recent episode number 637, we had Ann K. Emery on and she makes a living doing that. She... You're nodding your head. You know Ann Emery?
Sadie:
Yes, she's awesome.
Jon:
Yeah, she is awesome. That's such a good episode. She's so smooth. She delivered everything so smooth. I'm glad that she did because communication, data communication is one of her big things. But she's so smooth. The entire episode from beginning to end, it was flawless. And I've been using a lot of the advice that she gave in the episode, practically since. So that's a great episode for people who want to be improving their own data literacy, even if you are a chief data officer.
Sadie:
Yes.
Jon:
Okay, so by my count, Sadie, so I put check marks and crosses next to our predictions from last year and we did pretty well. We got four out of five. Going in the right direction. So those are the true positives, but we also have a glaring miss where we missed an enormous trend and listeners have probably already spotted what it is, especially the way we alluded to it when we were talking about GANs. I'm not going to spill the beans yet, so keep thinking about it in your subconscious, listener, if you haven't figured it out yet. Sadie, let's dig into our predictions for 2023. Last year you structured it so that we had these macro topics and micro topics. This year, we've got a big main theme and then subthemes that emanate from that. So Sadie, what's the big main theme in data science for 2023?
Sadie:
So the big main theme for me is data as a product. What I mean by that is really twofold: data as a product, that's consumers can now interact with large language models and ChatGPT, a big miss that we had in 2022, but also, in terms of how organizations like enterprise organizations are looking at their data and switching to more of a self-service model where they're enabling their business counterparts to think of their data, not only to make data-driven decisions, but to think of it as an actual product. To treat it as this has value and respect and we can turn this into something that's operational and create revenue from the company. So we're going to see this pop up, I think, in lots of different ways. ChatGPT has what I call broke the internet. I was trying to use it the other day and I couldn't because too many users were on it. I think it's been one of the fastest platforms to get to one million users and beat out things like Facebook, Netflix, Twitter-
Jon:
Oh, wow.
Sadie:
... just in terms of the growth.
Jon:
I didn't know that. Wow.
Sadie:
Yeah, so I have some stats. It took it five days to reach one million users. Other ones that are close, were about 2.5 months. So a lot of that comes in with social sharing and the internet, the fact that we have social networks. But I think that's something one we missed from last year, because we had no idea this was coming this year.
Jon:
Wow.
Sadie:
But more importantly, just think about this isn't just people who work in technology who are using it, this is your everyday consumer, is using an AI model, and to me, that's huge. The AI renaissance is here. This is the most excited I've been about AI since I got into data science in 2014. I feel like it's an AI rebirth. I don't know about you, but I'm just like-
Jon:
No, totally.
Sadie:
[inaudible] a lot.
Jon:
These foundation models, things like ChatGPT most recently at the time of recording, pretty soon after this episode is out we might have ChatGPT-4. So all of these capabilities, these ChatGPT capabilities that everyone's seen, probably that's listening, especially if you've listened to all the episodes, you couldn't have missed it because for Episode 638, I did my holiday greeting this year scripted by ChatGPT, which reminds me, should we read some of this script? So Sadie-
Sadie:
Yes, we should.
Jon:
... before all of you out there started using ChatGPT too and made it impossible for us to use it, luckily Sadie got a screenshot where she wrote, "Create a script of Jon Krohn interviewing", although I guess since you wrote it, it says "Create a script of Jon Krohn interviewing Sadie St. Lawrence on the SuperDataScience podcast discussing 2023 data science trends". I love it. It added in a third person, it added in a narrator.
Sadie:
Maybe it's a sign that you need a new person on the pod to narrate. We could get some suggestions.
Jon:
That would be great. I would welcome a helping hand. So the helping hand says, "Welcome to the SuperDataScience podcast where we explore the latest trends in developments in data science. Today we have a very special guest with us, Sadie St. Lawrence, one of the leading experts in the field of data science. She's here to discuss the trends that we can expect to see in 2023." And then Jon Krohn comes on and he sounds a lot like my voice and he says, "Hi Sadie, it's great to have you on the show."
Sadie:
And Sadie says, "Hi Jon. Thanks for having me."
Jon:
"So Sadie, what are some of the key trends that you think we will see in the world of data science in 2023?"
Sadie:
"Well, Jon, there are a few trends that I think will be particularly important in the coming year. One of the biggest trends we will see is increasing adoption of machine learning and artificial intelligence."
Jon:
Wow, that's really interesting.
Sadie:
Data is a product, right? It's because ChatGPT wrote it for me.
Jon:
Right, right, right. Want the best possible start in Machine Learning? SuperDataScience’s top instructors Kirill and Hadelin are back creating courses and have released a brand-new ML course that will give you that perfect start. It’s called “Machine Learning in Python, Level 1.” From their experience teaching Machine Learning for over 6 years and collecting feedback from their 2 Million Plus students, they know exactly what you need to be quickly on your way toward ML expertise. You will get crystal clear explanations of introductory machine learning theory backed by practical, hands-on case studies with working code. Enroll today at superdatascience.com/start and get ahead of the game! Again, that's superdatascience.com/start
So yeah, to be clear now, we have now stopped reading from the script. We stopped reading when the script started to be a little bit high level. So when Sadie posted this discussion, this script on Twitter, Bret Tully, who was our guest on episode number 533 of the show, he commented that he doesn't think that Sadie and I will be out of work anytime soon, that our AI overlords won't be taking over podcast episode scripting. Yeah. So there's an enormous achievement. There's no question whatsoever. It's blown my mind, the jump from the previous GPT-3 interfaces that we'd seen where you could get at most about a paragraph of sensible content. But sometimes with the prompts that you provide ChatGPT, you're getting pages and sometimes it's really coherent and good and you can use it and then you can have discussions with it and it's able to recall, it's able to use previous parts of the discussion, whether it was the algorithm or you, and work that into the conversation.
It definitely has a lot of limits though. For my holiday greeting, I thought it would be really fun if I could make it rhyme. So after it gave me the holiday greeting episode, which was great, I was like, this is perfect. This sounds like exactly the way I would've scripted it. Then I was like, now make it rhyme. And it didn't do a great job, though I have seen other people pull that off. Anyway, I'm taking up too much of the mic time here, but tying back to one of your points: it isn't just data scientists that are using this. I have a friend Zack Weinberg, who runs Canada's largest home brew supply store, and over the course of the week leading up to us recording this episode, he's been sending me screenshots of conversations that he's having with ChatGPT. He loves it.
So yeah, it's great to see these foundation models, not just the GPT-related models, these natural language generation models, but also foundation models that leverage models like GPT-3 to generate art, like DALL·E 2. Same kind of thing.
Zack, this home brew supply store owner, and tons of other people around the world are sending me messages being like, holy crap, this is amazing. So it is kind of like an AI renaissance that's bleeding into generally interested consumers, and creators of these applications are figuring out ways to package it up so that any kind of user can make use of it in a really intuitive and powerful way. Cool times.
Sadie:
It's really cool times. And obviously ChatGPT has taken over the internet right now, but even the Lensa app took over the internet, where you put in your photos and it remakes them with DALL·E 2. There are just more and more of these coming out. And it really leads me to, if I may, what I'm excited about, which is kind of a spinoff trend of data as a product: multimodal models, and what happens when we combine bits of these. And my prediction in this space, as we talk about GPT-4 coming out, is that I think it's going to be something with video.
I've seen more and more need for video content. Everything is very much short form; people want it to be more engaging. And I think when you have something like DALL·E 2 with ChatGPT, and then they also have their audio transcription tool, my prediction is it's going to be something where you can provide prompts but also examples of looks and feels, kind of like what people are doing with the Lensa app, and start to create some really engaging video content. But we will see. I mean, that's the final prediction, so.
Jon:
100%. Yeah, we've started to have a little bit of that kind of thing, a few seconds of video. So in episode 624 I did a Five-Minute Friday on Imagen Video, and that isn't yet compelling like DALL·E 2 or ChatGPT. And so I think you're right on the money, Sadie. With how quickly things progressed in 2022, it is not inconceivable that in 2023 we will have really compelling video, for more than a few seconds, that is believable. Yeah, I agree with you. That is a really good call. I'm feeling confident about that one. And I asked Serg Masís, our researcher on the SuperDataScience podcast, who's been on the show a number of times, most recently in episode 634. Because he does research for the show, I said, don't worry about doing research for this 2023 trends episode, but do you have anything that we should make sure that we talk about? And he said that 2022 had many unexpected breakthroughs in exactly what you're saying, multimodal models. So text to image, text to video, text to 3D. Then he asked me to be sure to ask you what you expect to happen in the generative AI space in 2023. And you've already answered that question. Boom, compelling video. I agree.
Sadie:
Which I'm very much looking forward to, because video is not my strong suit. So if I can get a little help from AI, I will be very happy in that regard. But I do want to go back just a little bit on data as a product, because I'd caution startups here. I could see a lot of startups emerging now to do the things that OpenAI is doing; I see a lot of copycats coming out. And I think that there's a lot of opportunity in looking at data as a product from an enterprise level too. One of the companies doing this really well is Tesla. They're taking data from their cars and creating whole new lines of business. One of the simpler ones is insurance, the car insurance business, because they said, hey, we have so much great data on our drivers that we can build way better risk models.
And so at an enterprise level, you're probably going to plug in and buy enterprise tools from OpenAI, probably not create those on your own, though maybe you will. But there's still a ton of opportunity to use the specialized data that you have at your company to create products, to create new lines of business, to come up with innovative ways of doing things. So I would definitely make sure you're not feeling like the space is overcrowded or there's too much hype. Take this as an opportunity to look at the data that you have, what's unique about it, and how you can use that to create a product, because the differentiator is going to be the unique data you have available.
Jon:
Really great point. I love that. So yeah, data as a product: people realizing that they have been collecting data that could be useful in other industries, and figuring out ways to monetize that. I didn't actually know about that Tesla application. That is interesting. It makes a lot of sense.
Sadie:
We could go further too and talk about their humanoid robots. I think that's still a little ways out. I don't know if anyone watched the Tesla AI Day; I highly encourage everyone to watch it. I think that's a good inspiration for trend predictions. But yeah, they're definitely taking data as a product to a whole new level. So I don't know if I'm quite ready to say that 2023 will have true AGI, but I think we're getting closer every day.
Jon:
I don't know. I mean, this is something of some contention between me and some other people, prominent people who have been on the show, some people whom you know very well. Specifically, I'm thinking about Jeremie Harris. So Jeremie, I introduced you to him and you have been on each other's shows; you had great episodes. And Jeremie, when he was on this show, he was talking about how AGI is coming very soon, because these models are getting bigger and bigger, and we have these emergent properties, like these really compelling ChatGPT conversations, emerging from just adding more and more model weights to neural network structures that we've had around since the 1950s. And so by that reasoning, as the number of model weights in these foundational models, or large language models, starts to approach or eclipse the number of neurons in a human brain, some people say we could have this emergence of AGI.
And I did two episodes where I tried to give counter-arguments to that. So Jeremie Harris's episode is number 565; he's an amazing speaker, as you well know, Sadie. And it's a really compelling episode if you want lots of evidence that AGI is nigh; that episode is packed full of it. Then a couple of episodes later, I did what I call AGI is not nigh, part one and part two; those are episodes 588 and 590 of the program. So I don't know, it seems to me that there's more to AI being successful than just the number of model weights. I think there's more complexity in the way that animal brains are structured that we haven't figured out how to recapture.
And that includes things like literally the physical structure of the brain, but it also includes things like causal reasoning, which puppies and infants can do and machines today are helpless at. So I think there are some barriers. I don't think we're going to have AGI this year. I feel very comfortable making that prediction, but I keep being surprised. I wouldn't have thought that ChatGPT was possible, and boom, we have it here.
Sadie:
Here's my prediction on it: we aren't going to know when it happens. And the reason we're not going to know when it happens is because we still, today, do not have an agreed-upon definition of human consciousness. So when we start to talk about things like intelligence and consciousness... the big thing that came out this year, or in 2022, was the Google researcher who thought the AI was alive, who thought it was thinking. And a lot of people said, no it isn't. But if you look into consciousness research, we still don't even fully know what consciousness is yet either. So I think, no, we're not going to have AGI in 2023, but when we do get it, will we even know that we have it? That's my question. So I love Jeremie's explanation. He's amazing. I learn something new every time I talk to him. It's a very physicist explanation of things. So I'm like, how do we define it from a philosophy and a psychology standpoint, and can we agree upon it as that? I think we're a ways from that.
Jon:
Yeah. Another trend that we missed in 2022 was Google engineers coming out and declaring that their algorithms are conscious. That is not something I saw coming.
Sadie:
Yeah, be careful how long you're chatting with some of these. Who would have predicted that, huh? Right. Maybe that's a word of caution to all of us.
Jon:
Right. Well, we're really ticking off boxes here. So I was asking around, Serg Masís is one person, and also asking our audience on social media, what kinds of topics you and I should be covering in this episode. And we have already been nailing those too. So Mike Nash said that we should be sure to be covering generative AI and GPT-4, wondering when those are going to be making an impact in 2023. And we've already been digging right into that. So, all right, I've taken us off track a bit here. You had your main theme of data as a product and you've had some great examples of that. You've also dug a little bit into your first subtheme that emanates from that data as a product main theme, and that was multimodal models, specifically the example of compelling video happening in 2023. It's a bold prediction and I think it's possible. Anything else you want to say about multimodal models before we move on to another subtheme, Sadie?
Sadie:
No, let's go on to the next one.
Jon:
Nice. Cool. What is it?
Sadie:
The next one, it's really from a technology architecture standpoint, and that is the data mesh. So I think I alluded to say-
Jon:
Say it in plural.
Sadie:
That I cannot do, the data meshes. We're coining a new term here today: the data meshes.
Jon:
We must have listeners out there that know the data mesh, singular, much better than Sadie and I do. I mean, I guess I should just be asking Zhamak, who coined the term and who was on the show earlier in 2022; that was episode number 609. So Zhamak coined the data mesh, and something that's really confusing for me, even as I was writing the intro and the outro to that episode, I was like, how do I talk about this in the plural? Why don't people say data meshes? Anyway, somebody can point out to both of us why that doesn't happen. But anyway, your second subtheme is the data mesh.
Sadie:
Yes. I think it'll stay singular, because if it's data meshes, then you're on a slippery slope to data mess. And that's not where we want to go with that at all.
Jon:
Oh, I like that. My subtheme for 2023 is that we're going to have a lot of data messes.
Sadie:
We'll get into that when we get into the privacy and regulation. I think that would-
Jon:
Yeah. Exactly. Perfect
Sadie:
But yeah, so I mentioned this earlier when we were talking about NFTs, how much terms get changed and it's all an evolution. We saw everyone talking about the data lake, and it ended up becoming what people call the data swamp. And now we're moving to the data mesh. Really, this is a decoupling of our applications, but more importantly, it's that enablement for the business to self-serve. And I think this will continue to be a really big trend, especially in existing enterprises, mainly for the fact that we just have so much data and so much opportunity. We have to offload it from our data science and data engineering teams and enable the business, so that we can get to work on those really core models that are going to deliver that business value.
Jon:
Yeah, maybe we should quickly recap; you've alluded to the capabilities there with the data mesh. For people who haven't listened to episode 609, it's this idea of allowing stakeholders across your company to be able to make use of data. So instead of having a single centralized data store, where a data analyst needs to run SQL or NoSQL queries to pull out information and provide that to the end user, with the vision of a data mesh, you could have HR and finance hold their own data in a decentralized way, but with standards across the company so that data can be shared between these different nodes in the mesh. And it also has built-in privacy rules, so that somebody on the data science team can't see from the HR team's data what everybody's making at the company, for example. So I don't know where you stand on this or what the terminology says to you; you probably know more about data mesh, data meshes.
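[To make the "built-in privacy rules" idea Jon describes concrete, here's a toy sketch in plain Python. This is not Zhamak Dehghani's actual architecture or any real product; the domains, roles, and data are all made up for illustration. It shows the shape of the idea: each domain owns its data, but publishes it behind company-wide access policies.]

```python
# Toy "data mesh" access check: each domain (HR, finance) owns its
# datasets, but every field is published with a role-based read policy,
# so a data scientist can query headcount without ever seeing salaries.
# All names and numbers here are hypothetical illustrations.

DOMAIN_POLICIES = {
    "hr": {
        "salary": {"hr_analyst"},                      # salaries stay inside HR
        "headcount": {"hr_analyst", "data_scientist"}, # headcount is shareable
    },
    "finance": {
        "revenue": {"finance_analyst", "data_scientist"},
    },
}

DOMAIN_DATA = {
    "hr": {"salary": [95_000, 120_000], "headcount": [42]},
    "finance": {"revenue": [1_250_000]},
}

def read_field(role, domain, field):
    """Return the field's data if the role's policy allows it, else raise."""
    allowed_roles = DOMAIN_POLICIES[domain][field]
    if role not in allowed_roles:
        raise PermissionError(f"{role} may not read {domain}.{field}")
    return DOMAIN_DATA[domain][field]

# A data scientist can read headcount across the mesh...
print(read_field("data_scientist", "hr", "headcount"))  # [42]

# ...but the salary policy blocks them.
try:
    read_field("data_scientist", "hr", "salary")
except PermissionError as e:
    print(e)
```

[Of course, as Zhamak and Sadie both stress, the real thing is as much an organizational model as a technical one; this only sketches the access-control slice.]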
Sadie:
I know meshes, no mesh.
Jon:
And so Zhamak was making the point that anybody who says that they have a data mesh today, any company that says they have a data mesh that's available for people to buy, it isn't true. A lot of companies will brand a product that is in the direction of her vision of a data mesh and they'll call that a data mesh, but it isn't really her full vision. She said that we don't have the technology today to build it yet, but maybe we'll be making inroads on that in 2023.
Sadie:
I'm going to agree with that, and kind of twofold, right? One is that there is the architecture of a data mesh, but a big portion of it is the business operation. So even if there's a company out there that says, hey, we can build this all for you and do it all for you, so much of this actually working is in your people and process model. The whole technology is there to enable all those business functions to share the data. So again, it's more just a word of caution if someone says they can do it all for you: there is no magic wand that I have found yet to date to make that happen. But if someone does have that magic wand, please do let me know.
Jon:
So what's the specific prediction for 2023 with respect to the data mesh? What are we predicting will happen over the year that hasn't happened yet in 2022? Is it this idea that organizations will be taking a step in the direction of having that kind of decentralized data store? How are we going to monitor that, or be satisfied that it's happened, when we're looking back a year from now?
Sadie:
So I think I'm [inaudible] banking for next year's prediction, because I'm not giving you a strong metric for how we're going to, I'm playing some safe [inaudible] in there.
Jon:
I want some [inaudible]
Sadie:
I'll make sure I get those too. And so this is really more for our CDO audience, in terms of, at an enterprise level, what's going to be top of mind for you? What are you driving forward? And I think the pitfall we're going to see this next year is back to what I mentioned last year: data literacy. Okay, so you may have implemented this technology, but what's going to happen when you put it in the hands of the users? Are your customers in the business really ready to be enabled with this technology? And so I see us making inroads on the technology side of it and implementation this next year, but probably running into some barriers in terms of the full realization of the value that it provides and the cost savings, because all of our stakeholders are not quite prepared for this change.
Jon:
That makes perfect sense. So the prediction that we could look back on is something like: the administrators of data across enterprise organizations, which is the CDO in many organizations, need to think about data literacy and accessibility in order to empower people across the company to make use of data without needing to talk to a data analyst. And maybe that'll be similar to, I mean, I gave us a big green check mark for our AutoML prediction, which was also pretty vague, but-
Sadie:
Now you're learning my secrets, the secrets of being a... you just add a little bit of cloudiness and people are like, yeah, I think I can see it. And then you always hit the mark.
Jon:
You're predicting that there will be more cloud data?
Sadie:
I do see that in our future. We'll definitely stay with the cloud for some time.
Jon:
A year from now, we will have stored more data on the planet than there is now. Safe predictions. Okay, cool. All right. So I like number two. So we had your main theme of data as a product. Subtheme number one was multimodal models, subtheme number two was the data mesh, and subtheme number three is the data mess.
Sadie:
Yeah. Subtheme number three is privacy and AI trust. And the reason why this is so big this year is that as data as a product gets more into the hands of consumers, we have more people exposed to it, and more problems exposed. The cool things are actually happening in regulation. This year the UN came out with its AI Bill of Rights, and the US government also came out with its AI Bill of Rights. So we're starting to see the foundations of frameworks for how we govern some of these models.
And so my prediction on this side is that we're going to start to see some big legal cases, in terms of case law being created, but also regulation. So I would highly recommend everybody go and read at least the US AI Bill of Rights. I think it does a really good job of outlining what is algorithmic discrimination, what is data privacy, what's safe and what isn't, and it's a thought-provoking read. I'm looking at it as the framework that's going to actually shape up some regulation in this space, which I think is needed. And once there's regulation, then there are lawsuits and case law and really interesting engagements, where now it's not just technologists who are working with this; we also need to make sure that there are lawyers involved in evaluating risk from a model standpoint, not just the AI model risk, but the human model risk.
Jon:
Nice. Well said. Yeah, and I agree with you. It is nice to see the US here; I guess it's different from GDPR. So Europe led the way, and then California to some extent, and the Canadian government to some extent, with privacy regulations. This AI Bill of Rights that came out in 2022 from the US federal government is different in that it's also talking about the ethics of model use and bias and that kind of thing, which GDPR wasn't as focused on.
Sadie:
Exactly. And this is the trend that I think really relates to the overall theme of moving into data as a product, because it's not just the underlying question of what are you doing with my data. A lot of companies can get around that and say, oh, I'm not using your data as identifiable data, you're just a sample of our overall user base. And so I like where things are headed with the AI Bill of Rights, because it's saying, hey, whatever the form, even if it's anonymized data, how are you using this tool? And I think it's the next progression we need in some of the privacy and ethics work that's happening today.
Jon:
Nice. All right. So those are our subthemes. That's it, right? Multimodal models, data mesh, policy and ethics improvements. I do feel pretty good about these predictions. I think they're good. I don't know where the big glaring hole is going to be. I feel like multimodal models, and them being accessible to consumers, has got to be one of the big trends that's going to be impacting our listeners the most over the coming year, and I feel pretty confident about that. Okay. So we've already talked about some of the topics that listeners brought up when I mentioned that you would be coming on for another 2023 data science trends episode. We talked about Mike Nash's comment already about generative AI and GPT-4. We've discussed those. We've got a great one here from Mark Moyou. He's a senior data scientist at Nvidia, and he says a great topic to discuss is the emergence and normalization of accelerated compute.
So getting more compute per watt of energy to make data centers more sustainable. That is a really important one that we missed. It's a really good point.
So we talked about these foundation models like GPT-3 and GPT-4, which is coming; they are orders of magnitude larger than their predecessors, and that trend is going to continue for years to come. I mean, to Jeremie Harris's point, if we're going to be able to realize something like AGI, it's going to require massive amounts of compute, many, many orders of magnitude more than what GPT-3 uses. So how do we do that in a way that is sustainable? I don't know if you have any thoughts on that, Sadie?
Sadie:
I do. I think this is a really good callout in terms of what should be added and mentioned. Because I mentioned Tesla's AI Day, and probably the thing that got me most excited there was Dojo. This is the supercomputer platform that they're building, and they're looking at it in terms of, okay, if we want self-driving cars, if we want robots working in our factories, what do we really need to build this? It uses Nvidia, but I think this is key to enabling a lot of these models in the hands of consumers. The other side of things, though, is I also think that we're going to have a trend towards building models with less data, but better data. And some of the talk around GPT-4 was that it's not going to have as many inputs as people were originally predicting, and that's one of the reasons: they're looking to use better data, but less data. And so I think that this is something-
Jon:
Oh, you mean the quantity of data is expected to be smaller potentially for GPT-4 than it was in GPT-3?
Sadie:
Yes. And so I think this is something that we're going to see on two sides. I think as practitioners in this space, you'll be pushed to use less data and make better models, so that we're not just always ramping up our GPUs. I think that more data is always going to add to a better model; however, I do think that there's going to be some great work done to make sure that we have the processing power. And what I'd encourage people to go look at is Tesla's Dojo; it's pretty exciting in terms of what they're building over there. They claim that they'll replace six GPU boxes with a single Dojo tile.
Jon:
Cool. Another thing in this space that comes to mind for me is that we need to be lobbying big tech companies to listen, and I know that some companies, like Google, are great at this. They've been carbon neutral for a long time now, certainly more than a decade. So we need to make sure that the companies that are creating these data centers are powering them in a sustainable way. It wasn't long ago, I remember having a dinner with a friend of mine, it might have been five or six years ago, where this person was working in energy markets, and I got talking about sustainable energy. I was like, oh, are you doing any sustainable energy stuff? And he had a British accent. He was like, Jon, what are you talking about? There's nothing, it's still not happening. He was like, look it up. Less than one percent of energy generation is sustainable.
And that blew my mind, because there's so much talk about it. There was so much talk about it then. There still is now, but today it is several percent. So it's growing from a low base and I think big tech companies are more mindful of it than corporations in general.
But yeah, it's something that all of us can be doing more of: lobbying our politicians locally to create legislation that builds more and more sustainable energy infrastructure.
Because, yeah, we're not going to meet our targets; the world is already on track, it's all but inevitable, for two degrees of warming. There will continue to be more and more catastrophic consequences. Pakistan this year had really horrific flooding where, in all certainty, human activity, carbon in the atmosphere, caused or increased the severity of those floods. And we're going to see more and more of that as years go on. That's kind of inevitable. But we can be making some progress. We shouldn't be opening coal-fired plants, yet some still do. I read this week, at the time of us recording, Sadie, that the British government just announced its first coal mine in over 30 years, which is wild. So we need to be making sure that those kinds of out-of-date policies aren't being recapitulated.
Sadie:
Yeah, I think that's such a good call to action too as an individual. I don't think that's something that a lot of awareness is brought to in terms of building your models and the technology being used. So just asking what's the sustainability of the data center that we're choosing? Or if you're choosing a new one, making sure that that is a question you're asking. There's so many small questions and considerations we can bring up on a day-to-day basis. I think that's a great call to action for all of us.
Jon:
Nice. Cool. There are a couple of other topics that Mark Moyou suggested here. He had one that we basically already addressed: resolving the gap in data science education and opportunities with the rise and normalization of large language models. I think we've covered that enough. But his final one is an interesting point, which is also related to this LLM idea and which we haven't discussed exactly: what will data science innovation look like when most models are accessed through APIs, versus coming up with new models? When I wrote my book, Deep Learning Illustrated, which was published in 2019, for the vast majority of the AI capabilities at that time, I could have my readers create the model on their local computer, and for some of the models in the book, I was like, you're going to need cloud compute resources to do this.
You'll want a GPU in the cloud, one GPU, to be training this model architecture. And now, just a couple of years later, these state-of-the-art foundational models, these large language models that we've been talking about so much in this episode, GPT-3, soon to be GPT-4, DALL·E 2, ChatGPT, Imagen Video, these require hundreds of the highest-spec GPUs training for very long periods of time over very large data sets. And so most people use these out of the box without doing any model training. That's part of what makes them so transformative and why we call them foundational models. So, "generative pre-trained transformer" is what GPT stands for.
So "generative" means outputting stuff, outputting text. "Pre-trained" is the key thing here.
And "transformer" is the architecture aspect of it. We're not going to dig into that too much in this episode, but it's a way of architecting your model. It's that letter P, pre-trained, that is key here: these large language models are capable of doing an enormously wide variety of tasks that they haven't been explicitly trained to do. And so we can just use them by calling APIs.
So you can use the OpenAI GPT API for a very wide range of applications. My company, Nebula, has been able to prototype a number of really mind-blowing features for our users that would've taken us months of data collection and model prototyping. And we were just like, why don't we see if we can do that with the GPT-3 API? And we were blown away by the results. So that kind of thing is happening more and more. You can also fine-tune some of these models. So there are API endpoints for fine-tuning models like GPT-3 to your own particular data, but you would be doing that for a very, very small fraction of the time that the whole model had trained beforehand. So anyway, I've talked a whole bunch. Sadie, what do you think about Mark's point here?
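[For a rough sense of what calling the API Jon mentions looks like, here's a minimal sketch that hits OpenAI's completions REST endpoint using nothing but the Python standard library. The model name and prompt are just illustrative, and the request is only sent if an `OPENAI_API_KEY` environment variable is present; treat this as a sketch of the request shape, not a definitive client.]

```python
import json
import os
import urllib.request

# OpenAI's text-completions REST endpoint (as used by GPT-3-era models).
API_URL = "https://api.openai.com/v1/completions"

def build_request(prompt, model="text-davinci-003", max_tokens=128):
    """Assemble the JSON payload for the completions endpoint."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def complete(prompt):
    """Send the prompt to the API; requires OPENAI_API_KEY to be set."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The generated text lives in the first choice of the response.
    return body["choices"][0]["text"]

if __name__ == "__main__" and "OPENAI_API_KEY" in os.environ:
    print(complete("Suggest three data science trends for 2023."))
```

[The point of the episode's discussion is exactly this: a few dozen lines of glue code give you access to a model that took hundreds of GPUs to train, and fine-tuning endpoints follow the same pattern with your own data attached.]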
Sadie:
I think this is a very good point. And beyond just plugging into some of these models, I think it's going to really revolutionize how we learn and how it assists us in our jobs. So as you may know, I teach a class, SQL for Data Science, and I pretty much take, I don't know if I should say this, but I take any question from that class and put it into ChatGPT, and it gives me a really great answer. So I know students have probably already figured this out. But if you start to think about that, other things I've started to do with it as well: I asked it, I said, create a test script for me in Python where I perform an EDA analysis. It provided the data. Then I said, I also want you to build a neural network.
And it built me the whole script, and it gives me such a great framework to start to play around with and try, and it took less than 30 seconds. So when you look at that, you go, what do you really need to learn? And then, how is this either going to optimize your job, or how is it going to change your job? You can even build a virtual machine within it. So I think there's a lot there, not just from using it from an application standpoint, but using it in your day-to-day work.
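[To give a flavor of the kind of script Sadie describes ChatGPT producing, here's a minimal sketch in plain Python: it makes up a tiny dataset, prints a few EDA-style summary statistics, then trains a single-neuron "network" (logistic regression via gradient descent) on the AND function. Everything here, data included, is fabricated for illustration; a real ChatGPT-generated script would likely use pandas and a deep learning framework.]

```python
import math

# --- "EDA" step: summarize a tiny made-up dataset ---
heights = [150.0, 162.5, 171.0, 168.0, 180.5]

def summarize(values):
    """Return basic summary statistics, the kind an EDA script prints."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return {"n": n, "mean": mean, "std": math.sqrt(var),
            "min": min(values), "max": max(values)}

print(summarize(heights))

# --- "Neural network" step: one sigmoid neuron learning AND ---
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(epochs=5000, lr=0.5):
    """Stochastic gradient descent on log-loss for a single neuron."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(X, y):
            pred = sigmoid(w1 * x1 + w2 * x2 + b)
            err = pred - target  # gradient of log-loss w.r.t. the logit
            w1 -= lr * err * x1
            w2 -= lr * err * x2
            b -= lr * err
    return w1, w2, b

w1, w2, b = train()
preds = [int(sigmoid(w1 * x1 + w2 * x2 + b) > 0.5) for x1, x2 in X]
print(preds)  # learns the AND truth table: [0, 0, 0, 1]
```

[Which is exactly her point: a working scaffold like this in 30 seconds changes what a student needs to memorize versus what they need to understand.]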
And then, what do you need to learn to get into the space? I don't know yet. It's been a week since I've played around with it, but it's making me start to rethink: is there a new way I could teach data science with an assistant?
And so that's maybe more of a personal prediction of a trend I'll be exploring more: is there a new way to learn? But a caution to that too is that it's learning based off of past human work. So if humans aren't adding new work, is it going to come up with the new work, the new innovative ways? I don't know. It's going to be an interesting time to see how it continues to evolve.
Jon:
It is, and that touches on how it will also encode biases that humans have had in the past and written down, which are abundant on the internet. And then there's the other thing about these large language models, and I don't know how this influences what it's doing with the questions that you asked it for your SQL course, for example, but these models hallucinate, is the term that we use. So they will very confidently say inaccurate things. Now, lots of people do that. My sister says that I do that.
Yeah. So my sister will say, why are you saying that? A year ago you said the opposite. And I'm trying to be better, especially on air. I try to really hedge things, and I hope I've been working with that feedback over the years and am less recklessly confident with the things I say, especially on air. But I'm sure it happens from time to time. And we all know people who do it more than others.
So some people are really cautious about the things they say and some people are really reckless about the things that they say confidently. And that doesn't mean that they're lying, because they really believe it. So anyway, so we see that happening with these machines too. But it can end up being extra bizarre in the case of these large language models, because they can have a really expert vocabulary.
So you can ask them a question, like an advanced SQL question or an advanced quantum physics question, and it might be able to use all of the right language, so that you, as a novice question-asker, don't have the capacity to discern whether this is right or wrong. So students in your course could be using ChatGPT to answer all the questions, but some of them could be getting answers that are completely off and they won't even have any idea. And so that has implications for us. And I'm not aware of immediately forthcoming solutions to this, where models are able to be accompanied by some kind of truthfulness score.
Sadie:
I mean, we could get into a little bit of philosophy, that some things of truth are subjective, and that may take us down a different rabbit hole. But I think where I would like to go in my research is just looking at what came up for people when Google first really started to go mainstream and people realized that they had access to all this information. I wonder if some of the same questions and problems that people were proposing then hold true today. I think with history, even though we have innovative new technology, there's patterns that are similar, because humans are interacting with it in a similar way. And so I'm curious, in high school, I didn't have the internet, and so writing papers was a different experience than in college when you had Wikipedia and you had Google. And I think we will adapt, and I'm a little worried about what this next generation of kids will do, because I feel like I'm going to be left in the dust and soon be a dinosaur with the technology that they're growing up with today.
Jon:
Yeah. Did you know that every generation is more intelligent than the preceding generation? So if you were to administer a standard IQ test to a large population of young people today versus young people 30 years ago, 60 years ago, 90 years ago, the scores keep going up. So for all the whinging that older people do about, oh, these kids don't know anything, they actually are smarter. Now, defining the term intelligence is very hard to do, but one definition that I really like is that intelligence is whatever IQ tests measure. And if we're using that as the barometer of intelligence, every generation is smarter than us. And it is partly because they are more and more integrated with these kinds of informational tools.
Sadie:
And then I'm sure there's some gene selection too. Well, that's one area we didn't get into is AlphaFold. I think there were some really cool things and applications of how AlphaFold is being used today. I'm excited to see more in that space. That'd be something I'd like to see more entrepreneurs in, is AI healthcare. I think there's a lot of opportunity in that space. If you're wanting to be an entrepreneur in AI, I would say dive into health because that is a wide open field.
Jon:
And Sadie, you couldn't have more perfectly foreshadowed the very next guest episode of this program. So the next episode, the next Tuesday episode, episode 643, is going to have Professor Charlotte Deane, who works on the same kinds of problems that AlphaFold solves. So at a high level, AlphaFold 2, if you haven't heard of it, is this algorithm that can very accurately predict protein structure from the sequence of amino acids. These amino acid building blocks make up proteins. So all the proteins in your body that are doing all the work, they're made up of these strings of amino acids that fold together. And similar to the way that ChatGPT is just so far beyond what anybody anticipated before it came out, AlphaFold 2 was like that too, where it completely crushed all of the existing models that predicted protein structure based on protein sequence.
And the interesting thing that I learned, so I've already actually recorded that episode, so I know what's going to happen in it. But there are lots of things that AlphaFold 2 can't do, and that's related to data limitations as well as the complexity of some particular types of protein problems that are really relevant to human health. It's one of my favorite episodes that I've ever recorded. Professor Charlotte Deane is absolutely brilliant. Listeners, Sadie, you're going to love this next guest episode coming out next week. And I learned so much. And it's exactly this kind of thing you're talking about.
There is so much ground to cover in medical applications of AI and she is a foremost world expert on healthcare applications of AI. So yeah, something to think about. And yeah, a nod to Jeremie Harris: another really great career to think about in 2023 if you're just getting started in this space is AI safety.
So regardless of whether we are going to have AGI in the next few years or decades or not, there are still huge issues around AI safety that become particularly important if a single algorithm is more intelligent than a human, but are important either way. And organizations like 80,000 Hours spend a lot of time thinking about how you can maximally use the skillset that you're born with or that you could learn. And actually on that note, if you want a whole episode from a prominent person at 80,000 Hours who did a great episode for us specialized on data science careers, that's episode number 497 with Benjamin Todd, one of the most popular episodes of 2021. So kind of all over the place here with this career thing. But yeah, I agree with you a hundred percent on healthcare AI. What about healthcare blockchain and healthcare NFT stuff, Sadie? Are there other opportunities there?
Sadie:
I think there could be. So I really liked the definition from, and his name is slipping from me, the person you interviewed who was the chief economist for Chainalysis.
Jon:
For Chainalysis? Philip Gradwell.
Sadie:
Philip. I really loved his definition of blockchain because I think it was just super simple and clean. It was just a shareable database that is verifiable and has a consensus for how it operates. And so in terms of healthcare, where we get a lot of power is being able to have one complete record of ourselves, where we can share that information with who we want to share it with and withhold it from who we don't. So I think there could be some potential there in terms of, hey, I'm willing to add my data to this AI model for the good of humanity. It could be just the self-service self-listing that you do there, or it could be that you have a particular disease and you need case trials and studies: here's my complete health record, I'm happy to share that. And there's this trust and privacy in what I know you could do with it. So I could see coming up with some use cases, but we'll see if we get there. Not in 2023.
Jon:
Maybe not, but certainly fertile ground for healthcare AI. And so I promised our listeners at the beginning of the show that we would get to have an update from you. So you haven't been on SuperDataScience for a year. What's going on? What do you predict is happening in Sadie St. Lawrence's life in 2023?
Sadie:
Lots of fun things are happening in 2023. So this past year, what I've been up to is building out the Women in Data community a lot more. That's been my number one focus. I went full-time into the role two years ago. So it has been a fun adventure and I am so grateful for that community, where I get to learn from others and meet amazing people across the globe. And so in 2023, what I'm really excited about from the Women in Data side is that we're planning a Women in Data world tour. So we're going to visit each of our regions, interview people there, show where different countries are at on their data maturity, show a little bit of what it's like to live there. So showing both the technical and the human element of it. So excited for that. And then I also launched another company this year where I wanted to develop some innovative solutions in terms of blockchain and the intersection of data science.
And so one is a whole framework we're working on right now for Work 3.0.
And one of the things we're starting with is certification: blockchain certifications for education, diplomas, et cetera. So MIT has done this, Stanford has done this, and a few emerging markets have done this. But the reason is, if you want to move from the US to another country, or from South Africa, you cannot verify a lot of the education that you have today. And so it provides a standard platform to be able to do that. So doing some fun things there. And then one of the things I enjoyed most this year was being a mentor for a VC firm, and so working with startups. I knew you would know that.
Jon:
My 2023 prediction, oh, it's on track. Wow.
Sadie:
I didn't want to give you any data, and now you're going to feel really confident about that prediction. So yeah, this year I really fell in love with just business development and taking what I've learned and sharing that knowledge with others, and continuing to do that. So we'll continue to do that in 2023, and probably be writing a book thanks to a little motivation from you. So hopefully that will be a great way to continue to share my knowledge and just get more people in the conversation.
Jon:
Nice. And I won't spill the beans on the topic, but we were talking about it before the show and it's going to be awesome. I can't wait for that book to come out. I'm so glad that that's happening. All right, but we haven't discussed the most important topic in life and on the show, we need to make some CrossFit predictions. Sadie. So who is going to be the CrossFit games champion for the female and the male individual categories in 2023?
Sadie:
So I'm going to have to defer to you on this because my journey into this space was not at all... I'm not at that level of serious fitness.
Jon:
I forgot that you're deliberately holding back. You get so sucked into things and you get so deep into them that you're really concerned about what happens if you start drinking the Kool-Aid like I have, in terms of all of the professional events in this space. Yeah. So actually, the way that we were talking about this is that you were saying you might not even do the Open, or you haven't done the Open before, right?
Sadie:
No. So I got into the space because I really just wanted to get a strong body, and I've never lifted weights in my life before. And my theme for this year was, a strong body leads to a strong mind. And so I found a gym that's really close, tried all their classes, ended up falling in love with their CrossFit classes, and I'm getting sucked in more and more, because now I get to go to the elite group, which I didn't even know was a thing, and the A team. But I'm trying to not get so sucked into doing the games. So I'm going to have to ask you: what's your prediction for the games, and who do you think is going to win?
Jon:
I'll just give a tiny little bit of context for our listeners out there. So anyone in the world, any single one of you listening out there, you can pay, I don't know exactly how it's denominated around the world, but in the US you pay $20, and maybe it's just a simple forex conversion if you're somewhere else in the world. You can pay $20 to be in the CrossFit Open. And hundreds of thousands of people do that every year. And it's three workouts over three weeks. It starts in late February: you get assigned a workout on the Thursday night, and you have until Sunday night or Monday morning, something like that, to log the workout. And you either have a certified judge evaluate you, or you record a video of it and you upload the video.
And it's a great way to benchmark your fitness year over year regardless where you are in your fitness journey. So these workouts are specifically designed so that even if you're listening right now and you're like, I haven't done any fitness stuff in years, or ever, this is the best time to be doing the CrossFit Open because you can give yourself a really low benchmark and it'll help you have motivation for over the course of the year knowing that you're going to retest a year later.
You see how you do in the Open and hope that you move up a little bit in the global table, in the percentiles.
And then, as a result of that Open, it's a super meritocratic sport, in that, based on your performance in a single year, you move from the Open to a series of increasingly competitive events: the quarterfinals, the semifinals, and then the global CrossFit Games, which the last few years have been happening in Madison, Wisconsin. And so it's super meritocratic, unlike the typical cartel sports of North America, where even if you're the worst team, you don't get booted out; you just have profit sharing and salary caps. That's part of why I love European football, soccer: because winning and losing really does matter.
But yeah, so CrossFit, uniquely at least among North American sports, is not a cartel; it's totally meritocratic. And that's part of what I love about it. And we see really interesting people emerge every year. Somebody that I think will do really well on the male side this year is Roman Khrennikov. I think he could win.
So we've had Justin Medeiros win the last couple of years, but Roman Khrennikov is an unbelievably powerful athlete, and he had qualified for the Games several years in a row, but due to visa issues, he's Russian, due to visa issues, he was unable to travel to the US and compete in person at the Games. So we didn't see him competing against these guys. And then he podiumed, I can't remember if he came second or third. But I think now that he has a visa and he can be in the US, he doesn't have these kinds of question marks around his career.
He can focus on doing it professionally full time. I think that he could be our champion in 2023. And then on the women's side, I mean, you can't bet against Tia Toomey. She's won six years in a row now, if I'm remembering correctly. And if she's still in the game and she's injury free, she'll win a seventh time in a row. But people like Mal O'Brien, maybe Emma Lawson, really young teenagers, are nipping at her heels. And so eventually she will have to retire, and it'll be really interesting to see who emerges as champion after that. Or if somebody can overtake her this year, that'll be a huge story in itself. Well, yeah, there you go, listeners, you get my... I don't talk about CrossFit that much on air.
Sadie:
Here's one thing I will say though: I think if there's any workout that fits a data scientist's personality, it is a hundred percent CrossFit, and mainly because of what you mentioned, which is you can measure your success and you can track it, and there's always further to go. And it's probably one of, to me, the most data-driven, may I say that about CrossFit, the most data-driven-
Jon:
Totally.
Sadie:
... type of full body workout. So it's the beginning of a new year, you probably have some health goals; maybe it's something to go check out. I think it'll be right up your alley as a data scientist.
Jon:
Nice. All right, so now that we've covered CrossFit and our 2023 CrossFit predictions, we can start to wrap up the episode. Sadie, do you have a book recommendation for us for 2023?
Sadie:
I'm just going to leave you with what I'm reading right now, which I just finished and absolutely love. It's definitely geared to CEOs; it's called The CEO Within. But I wish that earlier in my career I would've read more leadership and management books, because once you get into those positions... I feel like I should have been a lot nicer to my previous leaders and managers, because I have a lot more empathy for them now that I'm in the position. So I think it's helpful for individuals to read it from that perspective. But I love it because it's just a great business book and has amazing references for anything that you want to do. He's coached some of the top Silicon Valley CEOs and founders, and I think if you're looking for some personal growth, it's worth diving into.
Jon:
Nice. I love it. And that is also an area where I certainly feel like I can improve. I've written it down as something that I should be checking out personally, because it's so easy to think... I think a lot of people listening to the show, technical people like us, we have gone through so many parts of our life, many of us, where we got to be one of the more clever people in class or whatever, and that's kind of what led us into these relatively technical, relatively quantitative fields that we're in here. And so it can lead us to think, oh, management isn't that hard, I'm smart, I'm a good manager. But there's a lot to learn, and better to be reading it from a book than making the mistakes yourself first. So great recommendation there, and certainly something that I should be working on in 2023. All right, Sadie, so if our listeners haven't already listened to a SuperDataScience episode featuring yourself, they may not know where to follow you. Where should they be following you to get more insights over the course of the year?
Sadie:
Yeah, I'm fairly active on LinkedIn. Twitter also works. You can just find me @sadiestlawrence, and I'd love to stay connected with everyone. I would love to know what I missed, what some of your predictions are, or even just personal goals. I'm a big fan of being inspired by others' personal goals, so definitely hit me up and share what you're working on or what your goals are for 2023.
Jon:
Nice. Let her know what your front squat target is. Total weight by the end of the year. All right-
Sadie:
We're doing new maxes. So that's where we're at: we're seeing what our new max is. I'm really excited.
Jon:
Nice. For front squat?
Sadie:
Yep. So I'm hoping to go big.
Jon:
Nice. I look forward to hearing about that. All right, Sadie, thank you so much for being on the program again this year for a Data Science Trends episode. You're certainly our front runner to be doing the 2024 edition. I hope that you'll accept my invitation when it comes. Always a treat to spend time with you on air or off air. Sadie, thank you so much and catch you again soon.
Sadie:
Thanks Jon.
Jon:
Well, I hope you're as jazzed as I am about how wildly transformative the field of AI will continue to be in 2023 after a watershed year in 2022. In today's episode, Sadie led our journey through predictions for the year ahead, including how Data as a Product will become increasingly popular as API endpoints like DALL·E 2 and ChatGPT provide a remarkably wide breadth of useful data outputs from large language models.
She also talked about how multimodal models that leverage large language models under the hood will continue to stun us in the year ahead, perhaps by producing compelling video. She talked about how, while senior data administrators might not have access to a true data mesh per Zhamak Dehghani's definition, they will nevertheless need to increase data literacy and data accessibility across their enterprise in order to realize the commercial potential of data and data models.
And she talked about how more countries will adopt frameworks like the AI Bill of Rights and enshrine data privacy and AI algorithm ethics into law. As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Sadie's social media profiles, as well as my own social media profiles at superdatascience.com/641. That's superdatascience.com/641. If you too would like to ask questions of future guests of the show, like several audience members did during today's episode, then consider following me on LinkedIn or Twitter as that's where I post who upcoming guests are and ask you to provide your inquiries for them.
We talked a lot in this episode about large language models. If you'd like to learn more about them, coming up on March 1st, I'll be hosting a virtual conference on natural language processing with large language models like BERT and the GPT series architectures. It'll be interactive, practical, and it'll feature some of the most influential scientists and instructors in the LLM NLP space as speakers. It'll be live in the O'Reilly platform, which many employers and universities provide access to. Otherwise you can grab a free trial. Hopefully catch you then.
All right, thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you. And thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the SuperDataScience team for producing another exceptional episode for us today.
For enabling that super team to create this free podcast for you. We are deeply grateful to our sponsors whom I've hand selected as partners because I expect their products to be genuinely of interest to you. Please consider supporting this free show by checking out our sponsors links, which you can find in the show notes. And if you yourself are interested in sponsoring an episode, you can get all the details on how by making your way to jonkrohn.com/podcast.
Last but not least, thanks to you for listening all the way to the end of the show. Until next time, my friend, keep on rocking it out there and I'm looking forward to enjoying another round of the SuperDataScience podcast with you very soon.