76 minutes

Business, Data Science, Artificial Intelligence

SDS 537: Data Science Trends for 2022

Podcast Guest: Sadie St. Lawrence

Tuesday Jan 04, 2022


In this episode, Sadie St. Lawrence joins us to explore the months ahead with an in-depth look at the trends set to take over the data science industry as we know it.


About Sadie St. Lawrence
Sadie St. Lawrence is the Founder and CEO of Women in Data, an international nonprofit organization working to close the gender gap in technology and get more women in the C-Suite. She was the first female data science teacher to teach on the Coursera platform and has trained over 300,000 people in data science.

Overview
Happy new year! We're kicking off 2022 with a look at the most significant trends that will dominate the data science industry in the months ahead. After making her SuperDataScience podcast debut in October, Sadie St. Lawrence returns to guide us on data science predictions for the year ahead and beyond.

We kicked off the episode by looking at this past year's most buzzed-about trends and taking note of the accuracy of our predictions. These included growing research on gender and ethnic biases, explainable AI, federated learning, and the increasing popularity of the machine learning library, PyTorch – all of which proved to be dominant trends throughout the year.

Now, on to 2022! The first trend that Sadie and Jon tackle is AutoML and how it will enable data scientists to optimize and deploy machine learning models more efficiently. And while AutoML will undoubtedly make our lives as data scientists easier, there are also drawbacks to this trend. Sadie highlights that it may also accelerate the adverse social side effects of automation if we're not careful.

Next, we spotlight Generative Adversarial Networks (GANs) and how they will be used–and misused–to generate remarkably compelling deepfakes. Using positive examples, like the improvements made to old Beatles video footage and more severe cases like revenge porn, Jon and Sadie discuss the ramifications that can develop with further advancements of this technology and the increasing need for regulation in this field. Ultimately, within this trend, we will see three significant advances: the quality of deepfakes will increase as they become easier to create; legislation will grow surrounding this technology; and lastly, we can expect to see the development of real-time deepfakes for use in live video.

Thirdly, Sadie predicts that scalable AI architectures will continue evolving and describes four sub-trends that will help expedite the prototype-to-production process. These include being cloud-first, establishing standardized and automated workflows, monitoring performance in production, and ensuring traceability.

Now, it wouldn't be a discussion about trends if we didn't discuss the ever-evolving remote work experiment. As organizations adapt to the demands of employees and the shifting landscape of the modern workplace, we explore how this trend will impact data science organizations. While Jon notes that data science work can definitely be conducted in a remote environment, Sadie shares a top-level perspective and predicts that organizations that embrace remote work will rise to the top and attract better talent. Leaders, in particular, should focus on updating hiring frameworks and building connection through one-on-ones, as well as getting creative about where they recruit.

Next, Sadie discusses the application of blockchain within data science and how NFTs could be used to provide more accurate data on any real-world object and potentially even prevent identity theft.

Finally, data literacy steps into the spotlight when Jon and Sadie predict that more workers will become data literate in 2022. With only 21% of the global workforce being data literate, this growing trend will prove critical to making data-driven decision-making accessible beyond technical data teams.

Jon and Sadie closed out the podcast by discussing some crowdsourced trends from our LinkedIn audience. Among them was the combination of Artificial Intelligence with the Internet of Things (AIoT).
 
In this episode you will learn: 
  • A look back at data science trends for 2021 [4:03]
  • Micro and macro data science trends for 2022 [12:30]
  • AutoML tools [15:20]
  • The social implications of deepfakes [21:21]
  • Scalable AI [38:40]
  • Macro data science trends for 2022 [42:45]
  • The impact of the remote-working economy in data science [43:21]
  • Blockchain in data science [50:28]
  • Data literacy of the global workforce [1:01:07]
 
Jon: 00:00:00
This is episode number 537 on data science trends for 2022 with Sadie St. Lawrence. 

Jon: 00:00:13
Welcome to the SuperDataScience podcast, the most listened to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today. And now let's make the complex simple. 

Jon: 00:00:44
Happy New Year. Welcome to the year 2022 and welcome back to the SuperDataScience podcast. To kick off the new year, we've got an annual prediction special for you today. We're going to start the episode off by looking back at how our predictions for 2021 panned out from a year ago, and then we'll dive into our predictions for the year ahead. Specific trends we'll be discussing include the AutoML tools that are automating parts of data scientists' jobs, the social implications of deep fakes, which are becoming so lifelike and easy to create, principles for making AI models infinitely scalable in production, the impact of the consolidating remote working economy on data science employment in particular, productive uses of blockchain and non-fungible token technology in data science, and improving the data literacy of the global workforce across all industries from desks to factories, to farms.

Jon: 00:01:43
Our very special guest to guide us through these predictions is the marvelous Sadie St. Lawrence, a data science and machine learning instructor whose content has been enjoyed by over 300,000 students. She's also the founder and CEO of womenindata.org, a community of over 20,000 women across 17 countries. Sadie is my first-ever repeat guest on the SuperDataScience podcast. Her episode in October on data science and machine learning courses, specifically episode number 517 was one of the most popular in 2021. Combine that with her engaging speaking style and how remarkably well-read she is on the future of technology across industries and it's clear that Sadie's ideally suited to guiding us on predictions for the multidisciplinary field of data science in 2022. Today's episode is relatively high level. It will be of interest to anyone who'd like to understand the trends that will shape the field of data science and the broader world, not only in 2022, but also in the years beyond. All right, you ready for this special annual episode? Let's do it. 

Jon: 00:02:57
Sadie, welcome back to the SuperDataScience podcast. Weren't you just here? 

Sadie: 00:03:01
I was, and it's so nice to be back talking about one of my favorite topics today. So it's going to be a super fun episode. 

Jon: 00:03:09
Yeah, you were just here in episode 517, and you are my first repeat guest on the SuperDataScience podcast, but I absolutely loved filming that episode with you. Usually, I think people can understand this, recording these episodes is exhausting. You are on air with somebody, having to listen very carefully to every single thing they say for maybe 90 minutes or longer. And so usually I'm pretty exhausted, but after filming your episode, I just felt really energized. 

Sadie: 00:03:39
Awesome. Yeah, I love that. Same. I think those are the best types of conversations is when you leave feeling more energized than when you came into the conversation. So hopefully, not to set expectations too high, but hopefully we'll have that again today. 

Jon: 00:03:51
I have no doubt and we have lots of exciting things planned for the listeners today. We are talking about 2022 trends. And to get started on that, I thought we could review the 2021 trends that Ben Taylor and I made a year ago back in episode number 433, which was released on the 1st of January 2021. Ben Taylor and I made predictions about what would happen in 2021, and we were pretty spot on. So we talked about research on gender and ethnic biases in AI, and that has definitely taken off as a bigger research area. For example, big conferences like NeurIPS, which is the most prestigious AI conference out there, they have a conference section dedicated to specifically that research stream. And they even ask, I think it's a mandatory part of their questionnaire, when you submit any paper to say, "Have you thought about the ethical ramifications of what your technology does?" And you had another example of something that's happened in this, in 2021, in this area, Sadie. 

Sadie: 00:05:07
Yeah. So this year I watched the movie Coded Bias. It came out on Netflix. It's a documentary. It talks about how all the research being done around computer vision and how it's biased. It actually went up to Congress. It's a fantastic documentary. I highly recommend anybody check it out. Again, it's called Coded Bias. I think it just shows really the breadth of how big of a deal this is, to think that there's a movie on Netflix talking about this specific subject. You guys were spot on with this as being a top trend for 2021. 

Jon: 00:05:40
Yeah, that's really cool. I didn't know about this Netflix special until just before you and I started recording and I can't wait. I am going to try to watch it tonight. 

Sadie: 00:05:49
Nice. 

Jon: 00:05:50
Yeah. So then our second big prediction for 2021 was that tools for understanding black box algorithms would take off more, and boy, did they ever. So Sadie, you had the brilliant idea just before we started recording that I could quantify this by looking at Google Scholar. So Google Scholar is a Google search over academic papers and I was able to subset based on years. So since 2020, there have been 17,000 papers on explainable AI and 12,000 of those have been just in the last year. So about twice as many, in fact more than twice as many papers have come out in 2021 on explainable AI relative to the preceding year. And there are some great resources if you'd like to learn more about explainable AI yourself. So you can check out episode number 513 of this podcast with Denis Rothman, who is an expert on this topic. And there's a book by Serg Masis who will come up again later in this episode because he had some trends that he predicted for 2022 that he provided to us, to Sadie and me before the podcast. And he wrote a great book called Interpretable ML with Python. 

Jon: 00:07:08
The third big prediction that Ben and I made was about training models without compromising user data. So specifically we talked about this idea of federated learning, where you can do learning on the person's own device as opposed to needing to bring it into a central server to learn on that. And I thought that an interesting thing that happened in 2021 that's directly related to this is the extent of the Apple privacy features that have been rolled out in their iOS system on iPhones and iPads, for example, blocking all third party cookies, if you'd like to, which there's a big fuss. Companies like Facebook were taking out full page ads that I was seeing in The Economist saying this is bad for small business. So it's so interesting to see that big tech has become so big that mainstream media publications have these propaganda ads on their different perspectives. So I thought that was interesting. 

Sadie: 00:08:03
I think that federated learning for 2021 had the potential for one of the biggest impacts, because if we look at developing countries that are so mobile first and taking a mobile centric approach to these models, it really just opens up this space for use. And that's really what I look at is in terms of what are going to be the big trends is where are we going to get the biggest use out of it. And so love that you guys included this in there, and I think you'll continue to see it as a trend in 2022. 
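The federated learning idea described above, learning on each person's own device and sharing only the model rather than the data, can be sketched in a few lines of plain Python. This is a minimal illustration of weighted federated averaging under stated assumptions, not any vendor's implementation: the function names (`local_update`, `federated_average`, `make_device_data`), the tiny linear model, and the synthetic data following y = 3x + 1 are all hypothetical.

```python
import random

def local_update(weights, data, lr=0.05, epochs=20):
    """Run a few gradient-descent steps on one device's private data.

    The model is y = w * x + b; only the updated (w, b) leaves the device.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def federated_average(updates, sizes):
    """Server step: average the device weights, weighted by local dataset size."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b

def make_device_data(rng, n):
    """Hypothetical private data following y = 3x + 1 plus a little noise."""
    return [(x, 3 * x + 1 + rng.gauss(0, 0.1))
            for x in (rng.uniform(-1, 1) for _ in range(n))]

rng = random.Random(0)
devices = [make_device_data(rng, n) for n in (20, 50, 30)]

global_weights = (0.0, 0.0)
for _ in range(30):
    # Each device trains locally, starting from the current global model.
    updates = [local_update(global_weights, data) for data in devices]
    # The server only ever sees model weights, never the raw data points.
    global_weights = federated_average(updates, [len(d) for d in devices])

print(global_weights)  # should land close to the true (3, 1)
```

The point of the design is that `devices` is never passed to the server-side function; only the small `(w, b)` tuples cross that boundary.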

Jon: 00:08:34
No doubt. This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. Yes, the platform is called SuperDataScience. It's the namesake of this very podcast. In the platform, you'll discover all of our 50 plus courses, which together provide over 300 hours of content with new courses being added on average once per month. All of that and more you get as part of your membership at SuperDataScience. So don't hold off, sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level. 

Jon: 00:09:16
And then our final big prediction was about libraries that we thought would take off in 2021. We specifically focused on PyTorch, which is an automatic differentiation library for Python. And a lot of people use it for deep learning. It has a lot of functions in there that make building deep learning models easy, but it can be used for any kind of machine learning or any kind of computation really. Any time that you'd like to do calculus quickly, a library like PyTorch is useful for you. And in machine learning, we often do want to use calculus to calculate something called the gradient that allows us to optimize our algorithms. And so we thought that PyTorch could become especially popular in 2021 because of its ease of use relative to the big incumbent in the automatic differentiation library space, which is TensorFlow. And indeed, that is what happened. So prior to 2020, TensorFlow was the clear leader ahead of PyTorch. In 2020, they were about neck and neck. This is in terms of Google search trends that I looked up just before recording this episode. And in 2021, PyTorch, not by a huge margin, but definitely overtook in terms of popularity in Google searches. So yeah, PyTorch continues to get adoption. 
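To make "automatic differentiation" concrete, here is a minimal sketch in plain Python of the idea behind libraries like PyTorch: every arithmetic operation records how it was computed, so the gradient can be obtained by walking the computation backwards with the chain rule. This is not PyTorch's API or implementation; the `Value` class and its methods are hypothetical and support only addition and multiplication on scalars.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._grad_fns = grad_fns  # one chain-rule function per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Order the graph so each node comes after its inputs,
        # then apply the chain rule from the output back to the inputs.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, fn in zip(v._parents, v._grad_fns):
                parent.grad += fn(v.grad)

# Squared error of a one-parameter-pair model y_hat = w * x + b on one example.
w, b = Value(0.0), Value(0.0)
x, y = 3.0, 7.0
err = w * x + b + (-y)
loss = err * err
loss.backward()
print(w.grad, b.grad)  # -42.0 -14.0
```

By hand, the gradient of (w*x + b - y)^2 with respect to w is 2 * (w*x + b - y) * x = 2 * (-7) * 3 = -42, and with respect to b it is 2 * (-7) = -14, which matches what `backward()` accumulates. An optimizer would then nudge `w` and `b` in the opposite direction of these gradients.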

Jon: 00:10:36
Originally started in academia getting a lot of traction, but now we see more and more job descriptions, more and more people learning about it. And in fact, something that I did in 2021 was release a big linear algebra for machine learning course that's available free on YouTube, as well as I've gotten through most or like 80% of the way through releasing the videos for a calculus for machine learning course on YouTube. And both of those, I teach the fundamental ways of building a computational graph across those courses in PyTorch and TensorFlow and NumPy. But I focus primarily on PyTorch, because it's my favorite to use and I think it's going to continue to grow in popularity in years to come. 

Sadie: 00:11:21
Yeah, I don't see this train slowing down anytime soon. 

Jon: 00:11:27
Nice. Yeah. Yeah, it's a lot of fun to use. It's so easy to debug. 

Sadie: 00:11:35
I kind of look at it as what's the children's story with the three bears and there was the porridge? 

Jon: 00:11:41
Goldilocks. 

Sadie: 00:11:42
Yeah, Goldilocks. So there was the porridge, it was too hot, too cold and just right. TensorFlow in my opinion is too hot. There's so much you can do with it, but a little too complex. Then Keras came along, which is like a little too cool. And then PyTorch comes along and it's like that one in the middle that's just right. And I think that's why we're going to see it continue to grow in popularity, because it really combines the best of both worlds between Keras and TensorFlow. 

Jon: 00:12:10
Yeah, I love that analogy and I think you're exactly right, Sadie. All right, so across our predictions for 2021, pretty spot on. I'm kind of surprised that we were that right. Sadie, we've got big shoes to fill, to try to be as correct with our predictions for 2022. I think we've got a good shot though. And you've had so many brilliant ideas for topics to talk about in this episode, I can't wait to get going into them. And you broke them up into two main areas. So we have three big topics that are micro trends and then three big topics after that, that are macro trends. So why did you think to break it up that way? 

Sadie: 00:12:52
Yeah, so during the pandemic, I really got into weather forecasting and started studying weather forecasting. And one of the things I found was that there are micro factors that affect the weather, which we commonly hear about on the news when we watch The Weather Channel, the air pressure, the humidity, et cetera. And those are small factors that affect our day to day and change the weather rapidly. However, there's macro trends that affect it on a higher level and that is actually from solar flares. So solar flares from the sun- 

Jon: 00:13:26
What? 

Sadie: 00:13:26
Yes. I know, right? So check this out. The biggest factor to affect weather on our planet is from the sun and when the sun has these flares that burst out these big heat waves. And most people don't know about this, but that is the biggest predictor of weather. And it also is at a macro level, so when we look at it, we have to zoom out in terms of what's going to be factoring our environment here in the next one to two years, et cetera. So as I started to think about the trends for data science, I'm like, there's small factors like air pressure, humidity, if we use the weather comparison, that are going to affect us immediately within this next year. And then there's solar flares in data science that are big factors in our ecosystem of data science that are happening maybe at a slower rate of when they'll reach our planet, but they are going to affect our environment within data science. So that was my thinking in terms of breaking it down, it's like, what are those micro trends here within the next one year, but then how do we take a step back and look at it almost as a planetary system, an ecosystem of data science that we live in to think more five years out and long term picture of what will be affecting our industry. 

Jon: 00:14:43
Nice. That is a really cool inspiration for this. Did you come across anything? I feel like I've learned at some point in the past that solar flares could be so intense that it could knock out all of our telecommunications or all of our electrical systems. Is that true? 

Sadie: 00:14:59
Yeah, it really could. That's why I think they're the coolest thing to study. So I know not for this topic, but maybe another podcast, we can go into solar flares. 

Jon: 00:15:11
Wow. Yeah. All right. So for this podcast, for our 2022 trends, we're going to do the micro trends first and then the macro trends. We have three of each. For the micro trends, we're doing a software tool trend to start, and this is AutoML. So this is an idea that you can have algorithms that figure out what model is perfect for your particular application. So it tries out neural networks, it tries out decision trees, it tries out regression, and then it tries lots of different kinds of hyperparameters, how many layers do I have in my neural network, what's my learning rate. All these kinds of things are handled automatically by the algorithm. I think even things like automated feature creation could be part of these AutoML algorithms. Sadie is nodding her head in agreement. So this is your first trend. Tell us about it, Sadie. 
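The core loop Jon describes, trying several model families and several hyperparameter settings and keeping whichever scores best on held-out data, can be sketched in plain Python. This is a toy illustration under stated assumptions, not how any commercial AutoML product works: the candidate set (a mean baseline, a straight-line fit, and nearest-neighbour averaging with different `k`), the function names, and the synthetic data following y = x^2 are all hypothetical. Real tools search far larger spaces, including neural networks and engineered features.

```python
import random

def fit_mean(train):
    """Baseline: always predict the average training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_linear(train):
    """Ordinary least-squares line through the training points."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(train, k):
    """Average the targets of the k training points nearest to x."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def auto_select(train, valid):
    """Fit every candidate on train and return the one with lowest validation error."""
    candidates = {"mean": fit_mean, "linear": fit_linear}
    for k in (1, 3, 5, 9):  # k is the hyperparameter being searched
        candidates[f"knn(k={k})"] = lambda d, k=k: fit_knn(d, k)
    scores = {name: mse(fit(train), valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

rng = random.Random(42)
points = [(x, x * x + rng.gauss(0, 0.3))
          for x in (rng.uniform(-3, 3) for _ in range(120))]
train, valid = points[:80], points[80:]

best, scores = auto_select(train, valid)
print(best)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:10s} validation MSE = {score:.3f}")
```

Because the data is curved, the straight-line and mean models score poorly and one of the nearest-neighbour settings wins. Note that the search only optimizes the validation metric it is given, which is exactly why the concerns about bias and unrepresentative data raised in this discussion still apply.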

Sadie: 00:16:08
Yeah. So I am excited about this trend because I think there's a lot of opportunity to get more, as we like to use the term, citizen data scientists in this space. So people who may not have as much experience within Python or haven't taken linear algebra, but can understand the frameworks of how an ML model works, they can throw data in a tool and get an output. Now with that being said, I'm sure there's a lot of people who are cringing right now to think of someone just throwing data at a tool, getting outputs, knowing all the problems we're faced with today, with professionals creating these algorithms and using them and the bias that's created. It's a little- 

Jon: 00:16:49
What could go wrong? 

Sadie: 00:16:50
I know. Nothing. I mean, data in, data out, I'm sure it's all correct, right? But there are a lot of advancements happening in this space and the other side of things is there's a lot of money being thrown at it. So I just did a simple Google search on AutoML and I was so surprised with how many vendors came up, but more importantly, how many of them were paid ads. So this means that they have a ton of venture capital backed behind them, they're spending a lot on marketing, they're really pushing these products and tools. And if you do a search, the first six of these tools are just ads for AutoML. So we see everything from Alteryx, Databricks, GCP has their own stuff, Azure, AWS, DataRobot. I mean, the list really goes on forever. 

Sadie: 00:17:41
So I think this is something we need to be watching closely because yes, it may not be perfect from the standpoint of it's going to solve all your ML problems and you can just throw data at it and get the perfect solution, but things are advancing quickly in this space of what they're able to do. And I think it also should change our mindset as business leaders in terms of who are we allowing access to these tools, what are those guide rails for how they use them, we need to be thinking about, and then how do we take things from AutoML tools to productionalize them. And so I think that's where we need to be thinking when looking at this AutoML space. 

Jon: 00:18:21
Cool. Yeah, those are all really great questions to consider. This is massive. If we're allowing a larger group of people who don't need to necessarily be able to understand validation metrics, who don't need to necessarily understand what model they're choosing at all, they're just, just pick the best model for me, there's a huge amount of opportunity in that by opening it up to what you described as citizen data scientists. I love that term. But yeah, big risks because as we're going to talk about later in this episode, we have enough issues with ethics as it is with people who should understand what's happening behind the covers of the models that they're using and be aware of the kinds of issues, the kinds of biases that could creep in. So yeah. 

Jon: 00:19:14
And then on that production point that you made, I suspect that some of these tools also probably make it pretty easy to deploy these models into an API so that maybe you don't even have to do any backend engineering yourself, you can put your data into an AutoML tool and then have the same kind of provider like the big cloud providers, AWS, Google Cloud Platform, Azure, they probably have functionality that can then take the model weights, put it into some kind of system that can automatically scale up based on the needs that that model has, depending on how often it's called and surface that. So yeah, huge scalability opportunities, both in terms of having more people be able to create these models as well as to be able to deploy them. 

Sadie: 00:20:05
Yeah, and I think also if you're a practicing data scientist right now and you pride yourself on the cleanliness of the code you write and the complexity of the models you develop, you may want to take a little bit different look and approach to that. I'm not saying your skills are going out of date, but at the end of the day, we're automating our job at the same time as we're doing it. So I think from that perspective too, it's important to think about what are then the core key skill sets as a data scientist that are valuable. Maybe it's not so much picking the perfect model, and that's going to be core as we're starting to automate part of that process. Maybe it's finding those business applications and how to implement that, or making sure the data we're putting into it is clean and reliable and representative of the population and not biased. So I think it's important from that standpoint too, to just think about how are we even automating our own job and where should our skills be focused moving forward. 

Jon: 00:21:02
Very nicely said. Cool. All right, so that's trend number one for this episode, AutoML. And I agree with you wholeheartedly that, that will continue to be a big trend, not least because of all the VC dollars that you're seeing behind all of the Google ads when you look up anything about AutoML. 

Jon: 00:21:21
So our second topic is kind of a fun one, but also a huge danger. So this is deep fakes. So this is a topic that can have big cultural and social impact beyond just the data science field, but in society at large. So you sent me this super cool video about The Beatles: Get Back film, which came out this year. And it's directed by Peter Jackson who is famous for doing movies like Lord of the Rings. And so Peter Jackson was working with this footage, this studio footage from when The Beatles were recording one of their later albums and it includes, there's this famous footage of their last concert together, which is on the rooftop of the studio that they were recording in. So they just put some amps up on the roof, they played from the rooftop and people gathered in the streets below to hear what you didn't know at the time, but that was going to be the last Beatles album. 

Jon: 00:22:23
So The Beatles got so tired of touring. They were famous for having the first big mega tours across the US selling out baseball stadiums and that kind of thing. Nobody had done tours like that before, which are common today. But they got so tired of that, that they stopped touring entirely, they were just recording albums. But they felt that there's this iconic moment that's captured in this film, as well as all the related indoor studio footage recording this album. And because the footage was from the sixties, it was grainy, the colors weren't particularly crisp. So the director, Peter Jackson put it through a generative adversarial network, a GAN, to improve the color and the quality. It got rid of graininess, it got rid of imperfections and it really made the imagery pop, gave it a sharpness and a vibrancy. So we'll be sure to put a link to the video in the show notes. And that isn't all. So not only did they use GANs to dramatically improve the color, but they also used machine learning models to split the mono tracks that they had from the recordings into separate tracks so that you could have ... So they trained a machine learning algorithm to be able to recognize the guitar, to be able to recognize the bass, to be able to recognize the drums, to be able to recognize the individual voices of The Beatles, so John Lennon, Paul McCartney, George Harrison, their individual voices. So you use this machine learning algorithm to split it into separate tracks and this allowed them to do some crazy things like ... 

Jon: 00:24:07
So The Beatles, when they didn't want their conversations to be heard, they would crank up their guitar amps and speak quietly to themselves, but Peter Jackson managed to reverse engineer their conversations with these machine learning algorithms so that now you can hear those private conversations in the movie. So actually there right away, we're already seeing the dark side of using machine learning and these kinds of things. So I don't know if you have anything else to add on that. I mean, you were the one who provided me with that resource, but I think it's a really cool example of how these GAN algorithms, which are behind deep fakes that can largely be used for mischief, can be used to do some good, to have some fun. 

Sadie: 00:24:56
Yeah, I just love that example because I'm a tech optimist and deep fakes, I usually don't find anything about them besides just negative information. And so I like it as an example of, no, here's a way we can use this technology and use it for good in a really fun manner to recreate something that will be a positive experience for people. I mean, I've seen other examples where they've taken historical photos and updated them to what that person would look like in modern times, so what Shakespeare would look like, what George Washington would look like, what Abraham Lincoln. I've seen where they've had now Einstein as a deep fake to teach classes to people. So these are kind of fun examples to provide interesting ways that provide some benefit and positive experiences for deep fake. And then there's new ones coming out in terms of sharing videos of your deceased loved ones and they recreate it. Now for some that could be traumatizing and a trigger, for others that could be a chance to feel reconnected with the person that they love. So lots of interesting things in this space. I know we're going to get into some of the negatives of it, but I really love that we started off with the positive of how to use some of these. 

Jon: 00:26:19
Yeah. Yeah, so thanks for sharing those other examples. And yeah, I've seen those kinds of things where they take a photo of somebody who could be deceased, a family member. And I think there are online tools for doing this cheaply, probably freely, restoring color to it, getting rid of graininess, just like Peter Jackson is in this Beatles film. But yes, this technology, these deep fakes, they came out of this idea. So generative adversarial networks, they rely on deep learning. So we have neural network algorithms that involve layering these neural networks deeply, and each layer of neural network can have more visual complexity, more abstraction that it can provide into say a visual image. And the idea behind GANs specifically is having two of these deep neural networks that act as adversaries against each other, where one is trying to forge images or videos, and then the other is trying to detect which ones are forgeries and which ones are real. And then, so we have this adversarial play back and forth between the two where in order for the forger to evade the detective, it needs to create better and better forgeries. 
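The forger-versus-detective loop can be illustrated with a deliberately tiny, one-dimensional sketch in plain Python. It is an assumption-heavy toy, not a deep network and not how production deep fake systems are built: the "images" are single numbers, the forger is the hypothetical function G(z) = a*z + b, the detective is D(x) = sigmoid(w*x + c), and the gradients are written out by hand. Only a few alternating rounds are run, enough to show the forger's output shifting toward the real data; toy GANs like this can oscillate if trained much longer.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def generate(a, b, noise):
    """Forger (generator): turn noise into forged samples."""
    return [a * z + b for z in noise]

real = [3.0, 4.0, 5.0]     # the "real" data the forger should imitate
noise = [-1.0, 0.0, 1.0]   # fixed stand-in for random inputs to the forger

a, b = 1.0, 0.0            # forger parameters
w, c = 0.0, 0.0            # detective (discriminator) parameters
lr = 0.05

start_mean = sum(generate(a, b, noise)) / len(noise)

for _ in range(20):
    # Detective step: push D toward 1 on real samples and toward 0 on forgeries
    # by descending the loss -log D(real) - log(1 - D(fake)).
    fake = generate(a, b, noise)
    d_real = [sigmoid(w * x + c) for x in real]
    d_fake = [sigmoid(w * x + c) for x in fake]
    grad_w = (-sum((1 - d) * x for d, x in zip(d_real, real)) / len(real)
              + sum(d * x for d, x in zip(d_fake, fake)) / len(fake))
    grad_c = (-sum(1 - d for d in d_real) / len(real)
              + sum(d_fake) / len(fake))
    w -= lr * grad_w
    c -= lr * grad_c

    # Forger step: adjust a and b so the detective rates forgeries as real,
    # by descending the loss -log D(G(z)).
    fake = generate(a, b, noise)
    d_fake = [sigmoid(w * x + c) for x in fake]
    grad_x = [-(1 - d) * w for d in d_fake]  # derivative of -log D(x) w.r.t. x
    a -= lr * sum(g * z for g, z in zip(grad_x, noise)) / len(noise)
    b -= lr * sum(grad_x) / len(noise)

end_mean = sum(generate(a, b, noise)) / len(noise)
real_mean = sum(real) / len(real)
print(f"forged mean before: {start_mean:.3f}, after: {end_mean:.3f}, real: {real_mean:.3f}")
```

The forger never sees the real data directly. It improves only through the detective's gradient, which is the adversarial back and forth described above.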

Jon: 00:27:46
So basically at the beginning of training, when you train one of these, you start with just static, the images or the video that it outputs just looks like static. But over the course of training, the major features in the imagery start to come together. And then through quite a bit of training, you can have stunning, photo realistic images and videos. So it was created in a completely innocent way, but this technology has been used a lot for deep fakes, like we say. And so for example, in the UK, the former culture secretary, Maria Miller recently stated that people who create deep fakes that are pornographic, so where you take a real person, it could be a celebrity or maybe somebody that you want to have revenge against, you take real photos of them, and then you can use a generative adversarial network to make it look like they're in pornography. And Maria Miller says that people who do that should be put on a sex offender's registry. So yeah, some seriously bad stuff happening with deep fakes. 

Jon: 00:29:03
And they can be highly persuasive. Every year that goes by, these get more and more compelling and more and more difficult to discern from the real thing. Another example that you sent to me, Sadie, in the run up to recording was an MIT professor who had to publicly state that he didn't endorse a stock trading algorithm because there was a compelling video that had been published that purported that he did. It looked like he was saying that he supported it. So an example of somebody who isn't Donald Trump or some mega well-known person who's being imitated, this is a pretty ordinary person. So yeah, I'm sure you have lots more to say on this topic, Sadie. 

Sadie: 00:29:45
Yeah. Well, I think what we'll find is how few images you need to actually train these models. So at a minimum it's like 300 images. If you pull up your Instagram, your Twitter, look and see just how many images you've posted so far. I would guess if you've had an account for a while, there's probably enough content out there on just a regular individual, through LinkedIn, whatever social media sites you use, that enough images could be used for someone to create a deep fake of you. So that is really scary in that sense, and then there's how people use it, whether they're endorsing something you would never endorse, whether they're using it for pornography. There are some really terrible things when people take your image and can make it come to life without your permission. 

Sadie: 00:30:39
And so I love the one example where it's saying that people who create these deep fakes and use them for pornography should be on a sex offender's list, because I think this is an example where we're finally starting to get some ideas in terms of regulation in AI. So we talk a lot about it, how, hey, we're behind the times in regulation, so what can we do. Is this a perfect solution? No, but I think it's good that we're getting the conversation started and saying, "Hey, this is harm. This is wrong and we need support and help from this, from all areas, not just as practitioners in this space, but also from regulators as well." 

Jon: 00:31:17
So I think there are two big trends that you've identified here already. So one of them is that we can make deep fake videos from fewer still images than ever before. So you talked about that 300 number. A couple of years ago, that might have been a much larger number, that you'd need a lot more stills in order for a generative adversarial network to be able to render a compelling 3D image that then we can make do whatever we like. And so probably one trend for 2022 is that that number will get smaller. So not only will the quality go up, but the number of still images you need to create a deep fake will go down. And then the other trend is the social side of things where we will probably, like with Maria Miller's statement, have more legislation start to catch up with where deep fakes are. 

Sadie: 00:32:10
Yeah. And then the last part on deep fakes is that we're soon going to be able to use them for live video. I've seen some tests with using it as a plugin over Zoom. Okay, so not to creep you all out even more, but just as you can get the Snapchat filters in Zoom as an app, you could have a deep fake in Zoom as an app. So in real time it will recode my face and my voice to be Jon Krohn if I want or whoever it is that I want to impersonate, but in real time. And so I think it's just really important to stay aware of these subjects because the technology and advancements in it are moving really fast, one for our safety, but also just in terms of making sure that we're being careful with how we're sharing our faces, our voice and our information online. 

Jon: 00:33:06
Cool. Yeah, really great points. And my understanding is that you have personally been impacted by this trend in- 

Sadie: 00:33:16
Yeah. Yeah, so I think maybe one of the reasons why it came up as the things I wanted to talk about is about a month ago, my Facebook account got hacked. And as most people know, Instagram is tied into Facebook and they're all connected. And so yeah, someone hacked into my account, they changed my emails, they changed my password, so I couldn't get in. 

Jon: 00:33:40
Oh my gosh. 

Sadie: 00:33:41
I notified Facebook. Facebook locked everything out and locked me down, which was great, because I didn't want anything else to happen. But now I've started to see over the internet people using my images as me, but it's not me. And that's a very scary thing to see. Some of the captions they have on the images are pretty funny, I'm not going to lie. But again, there's this whole sense that we are somewhat in the metaverse because our identity is already digitalized in a certain way and that one, there's causes for concerns from a security aspect, but a real psychological aspect with this as well. And I think that was one of the things that when I went through this experience of getting hacked and people using my images, I was surprised that there is almost a portion of neurons, I feel like in my brain that kept my identity in a digital space. And when that happened, I felt violated on a personal level, right? 

Jon: 00:34:44
Right. 

Sadie: 00:34:44
And I think that is where the big risk is, is how much of our identity is really in a digital space and do we truly conceptualize that as human beings? 

Jon: 00:35:00
Right. Yeah, that is really trippy. You mentioned a term there, metaverse, which is also something that has been big in 2021. Probably the biggest piece of news there is the company formerly known as Facebook now being called Meta because they're investing so much time and resources into creating the metaverse. And so you made the analogy there that our existing digital identity is already part of the metaverse, but can you define that term for us a bit more, Sadie? 

Sadie: 00:35:31
Yeah. So I think the metaverse in my opinion is still being defined. Facebook came out with a big push at their conference, Connect, and a lot of people pushed back. They tried to show us what the metaverse would look like. That's just Facebook's one definition of what it would look like, which is like a VR, AR reality where you're able to work and connect, play with your connections from around the world. There's a lot of people who think of the metaverse differently from tools like Decentraland and think that we're already a portion of it in the metaverse by having just our social profiles and how connected we are on our phones. But the whole idea of the metaverse is really that everything we do today in our daily lives will be encoded and interacted with in a digital realm. So if you look at that definition, yeah, we are in the metaverse. I mean, Jon and I are talking right now through an internet connection, through a webinar linked in a virtual screen. Most of us have had that experience this year. So we're on the forefront of it taking those steps. It's just, this is going to encompass more of our life than ever before. 

Jon: 00:36:49
Yeah. As you've been talking about that, I've been thinking about how everything related to this podcast. It's these recordings that we put on the internet that people listen to of our conversations. We market them through social media profiles that have my name and my images. And if somebody took that from me and was using them to be publishing things that the real me, the one with biological brain cells and flesh and stuff wasn't behind, people wouldn't be able to tell. My digital presence could continue on doing things and without even me knowing in a lot of cases. 

Sadie: 00:37:31
Digital Jon could live forever, so. 

Jon: 00:37:33
Yeah. 

Sadie: 00:37:35
And that's actually funny you mentioned that. So people are creating their digital avatars to live forever. There's a new company called Life Legacy that will allow you to create a digital avatar and a digital legacy so that your legacy can live on forever in the metaverse. So lots of stuff happening. 

Jon: 00:37:57
Yeah, and I guess those will get even more compelling over the coming decades, to an extent that we probably can't even predict. So we're only trying to make predictions for 2022 really, because things change so rapidly. 

Sadie: 00:38:09
Yes. 

Jon: 00:38:12
All right, so that explains some of the issues around deep fakes. Coming up later, we have a potential solution, which is the blockchain is a potential way to keep, I guess, our digital personas tied to our real ones. So more on that coming up. Before we get to that, which is a macro trend, we have our final micro trend. So we did a micro trend on software, which is AutoML, we did a micro trend on the deep fakes, the cultural social impact of that, and then now we have a micro trend on hardware, so another category. And I thought this was a great topic that you picked for this, Sadie, talking about scalable AI. So we build more and more sophisticated machine learning algorithms. We want them to be able to be used potentially by a very large number of people working through web apps or iOS apps, or what have you, whatever kind of user interface that is surfacing our machine learning model predictions or interactivity. And so being able to have scalable AI architectures is critical. And so you have five specific principles that we can follow to guide scalable AI architecture decisions. 

Sadie: 00:39:40
Yeah. So this one I see big and it kind of relates back into that first one of the AutoML, is if we're able to operationalize how we're creating these models and creating them a lot quicker, we also need to make sure we're able to scale AI a lot faster as well. And so the first portion of that really is that it has to be cloud first. I think the debate is over on is the cloud safe or should you move to the cloud. If you truly want to scale, cloud is going to be your best option to be able to do that. 

Sadie: 00:40:16
The second portion of that though, is really on how do you standardize and automate your workflow. So again, if you have lots of data scientists and machine learning engineers off in silos building their own individual models, they in and of themselves are a black box of how they created the model and how they're going to continue to standardize and scale it. So I think it's essential for the leaders in the organization to be making sure that once that model is ready for production, how do we make sure, across the multiple models we have within our ecosystem, they're all standardized and automated regardless of the person who created it so that they aren't these one off opportunities that we're having to continually update. 

Sadie: 00:41:02
And then the third being just the performance monitoring. So I think this is where I see most data scientists not even taking it into consideration is, once your model's out in the world, how are we continuing to tune it and improve it and make sure that it's performing at the level of accuracy we want it to and where are those alerts and baselines for it. So making sure that's a key component in it. And then the fourth, it really goes back in a way to your trend from 2021, but what's the traceability of our models. So do we really truly know the impact that they're having over time as the data inputs change? Are we, again, checking for bias and risk that comes in this? And are we able to trace back how they're coming up with their scores so that we can interpret the meaning behind it? So those are really the four trends within the scalable AI. And I'll share an article, hopefully we can include it in the show notes, that outlines some really good architecture and just principles that companies use around scaling AI. 
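
The "alerts and baselines" idea in the performance monitoring principle can be made concrete with a small sketch. It assumes a classification model whose true labels eventually arrive, and the class name AccuracyMonitor, the tolerance, and the window size are hypothetical choices for illustration rather than anything from the article Sadie mentions.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks rolling accuracy of a deployed model against a fixed baseline."""

    def __init__(self, baseline, tolerance=0.05, window=100):
        self.baseline = baseline      # accuracy measured when the model shipped
        self.tolerance = tolerance    # how far below baseline we accept
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, prediction, actual):
        self.outcomes.append(1 if prediction == actual else 0)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        # Wait for a full window so a few early misses do not trigger an alert.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance
```

In practice the same pattern would usually also watch the input data for drift, which connects to the traceability principle, but accuracy against a baseline is the simplest place to start.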

Jon: 00:42:10
Yeah, I said five principles, but you had four. 

Sadie: 00:42:15
One was an article. Four principles and a resource. 

Jon: 00:42:18
It was a classic off by one programming error on my part. I thought we were using Python and we were using R. All right, so that covers our micro trends for software. We've got AutoML. For the cultural social impact, we've got deep fakes. And then most recently, scalable AI for our hardware section. Now let's talk about the bigger macro trends that could unfold over much longer time spans and have even bigger impacts potentially than the micro trends that we covered that could themselves be pretty impactful. So the first macro trend is talking about the remote economy, which has been a big thing since the pandemic hit. We'll dig into that a little bit. Then we'll talk about the blockchain in data science specifically. I know you have a lot to say about that. And then data literacy, a really important topic. So let's start with the remote economy. 

Jon: 00:43:25
So as every listener knows since March 2020, exactly the month varies depending on where you are in the world, but around March 2020, we had lockdowns worldwide that kept a lot of non-essential workers out of the workplace that they were used to working in. And so this assumption that a lot of companies had, including me and my company, that everybody had to be in office all the time together to be productive was turned on its head when we had to experiment with remote working. And there are still some things that I don't think work as well as being in person, but people are productive. We're getting a lot done. Companies are still being successful, even with totally remote workforces. 

Jon: 00:44:15
Now that we are returning to offices, to some extent, again depending on where you are in the world, and there are some reversals happening at time of recording, but some companies are using remote work as a benefit to say, "Oh, you don't have to come in at all," or, "You just have to come in a couple of days a week," or that kind of thing. So that has been a chip and these kinds of bargaining chips, like offering remote work, are critical because there's far more job openings than ever before. So I have the data for the US where there are 11 million job openings currently. Prior to the pandemic, the peak was 7 million. So 50% increase on the number of openings in the post pandemic world versus a pre pandemic world and part of this is fueled by people quitting jobs. 

Jon: 00:45:10
So in September in the US, 4.4 million people quit their jobs, which is a monthly record. And so all of these job openings, all this turnover, it does create a lot of opportunity for workers though, they need to be often re-skilling, which is a big part of this that it's like, there's still people who are unemployed, how do we have 11 million job openings but there's about 8 million unemployed in the US? So there's this three to two ratio of job openings to unemployed people and that's because the kinds of skills that we're looking for in this post pandemic world are different from the ones we were looking for pre pandemic. So companies are pivoting. 
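
Taking the rounded figures quoted above at face value (11 million openings now, a 7 million pre-pandemic peak, about 8 million unemployed), the arithmetic can be checked in a few lines. The variable names are just labels for this example.

```python
openings_now = 11_000_000
openings_pre_pandemic_peak = 7_000_000
unemployed = 8_000_000

# Relative increase in openings over the pre-pandemic peak.
increase = (openings_now - openings_pre_pandemic_peak) / openings_pre_pandemic_peak

# Job openings available per unemployed person.
openings_per_unemployed = openings_now / unemployed

print(f"Increase in openings: {increase:.0%}")
print(f"Openings per unemployed person: {openings_per_unemployed:.2f}")
```

On these numbers the increase works out to roughly 57%, a bit above the 50% quoted in conversation, and there are about 1.4 openings per unemployed person, which is close to the three-to-two ratio.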

Jon: 00:45:48
And one of the ways that we are shoring up this gap between talent availability and demand is that we're looking everywhere in the world. So prior to the pandemic, I worked only in person with my teammates out of our New York office and now we have contractors all over Europe and Africa. So there's a big thing here. I've been talking for way too long. I should be letting Sadie speak more. But this macro remote economy trend with greater competition for skills, now this opportunity to work globally with people, a big macro trend. And in the data science field specifically, no question that it is a big deal for us because unlike being a restaurant worker or a mail delivery person, you can do a lot of data science work totally remotely. So Sadie, over to you. 

Sadie: 00:46:55
Yeah, no, I actually really appreciate the story you shared because I agree with everything you're saying. And just to backtrack, some context for everyone, I really think this is where we're going to see the winners and the losers in data science in the next 10 to 15 years is who can really tackle the people space. And in terms of hiring and retaining and do that well, that's who's going to be really successful in this space. And so this is such a big trend because it affects so many of us. From an employer's side, the best people you have, you really got to be thinking about retention now, because you may be in a tier two city and now that person has access to jobs from all over the world. So if they're your top performer, they're not staying with you because you're the best company in their city anymore, they can go anywhere. And so I think that's important to know for companies from a retention side. 

Sadie: 00:47:48
And then from a job seeker or an employee side, it's really about looking at the options that you have as well. From a positive, you have more options, but from a negative, your competition is greater too. So now you're not just competing with the people in your local area, you're competing with people again across the world. So this I think is a huge factor from multiple levels because there's so many old frameworks that we have in terms of hiring and retention that now need updating. And at the end of the day, data science and AI is just a tool. So it's all going to be based on the people that you have who are using this, who are implementing it, who are coming up with the great ideas of recreating The Beatles by using GAN algorithms. It was a person who came up with that idea. So I think the more you can focus on how this work environment is changing and what strategies you're taking to retain and gain the best people, the more successful you'll be in this space for the next 5 to 10 years.

Jon: 00:48:55
Do you have any particular tools or techniques that you've been using with your organization, Sadie, to adapt to this remote learning world or remote working world? 

Sadie: 00:49:07
Yeah. So when talking to organizations, they usually come to Women in Data in terms of looking at how do we gain more diverse talent. So we specialize in a way in consulting in this space. And one of the things we always ask them is, how do we take care of the people you currently have? So keeping good people on your team is the best way to get more people like them into your organization. So I think if you're concerned or a leader in your organization, the first thing I would do is take some time to really listen to everyone in your org. Set some extra time, maybe over the next couple weeks to have more one-on-ones and just check in and see where people are at because people are feeling really disconnected right now in this virtual space, and that disconnection can then make them go and take another decision. So make sure you're focusing highly on retention and start by just listening in one-on-ones to people. And then from the hiring aspect, I would say get creative. I love the fact that you have contractors in Africa and Europe now. Yes, that's fantastic. There's access to talent that we never could have had before, so get creative in terms of where you're outreaching to. 

Jon: 00:50:22
Nice, Sadie. Well, thanks for those practical tips for the remote working environment in data science. I know that the next topic is one that you know a ton about, applying the blockchain in data science. And I'm so excited to get into this topic because I really don't know anything about blockchain. It's a big gap in my understanding of the world. So yeah, what's going on? How are the blockchain or cryptocurrencies going to impact data science in the long term?
 
Sadie: 00:50:51
Yeah, well I have to first start off and say two, three years ago, I was not all on the blockchain bandwagon and honestly I still feel remorse that I am because it's taken a lot of my time. But I think the exciting thing is there was finally a light switch that clicked off in my brain on the impact of blockchain on data science. And I think just the maturity of where it's at, we're going to start seeing an impact of the two fields merging together. So for those who aren't familiar with blockchain, one of the simplest descriptions that I've heard and love is that it's like a giant spreadsheet on the internet that records everything and can only be modified or changed at all when every person in the network agrees that something can be changed. Okay? 

Jon: 00:51:43
What do you mean by records everything? Records everything on some specific event, like some transactions of some cryptocurrency or something? 

Sadie: 00:51:51
Yeah, whatever- 

Jon: 00:51:52
So it records every buyer and seller of Bitcoin or some other cryptocurrency and no individual can change what it says in that spreadsheet. That's often called a ledger. Is that right?

Sadie: 00:52:10
Yeah. So I think the ledger's a better way to think of it because cryptocurrency is just one aspect of blockchain technology, but for finances, right? 
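
The "ledger that no individual can change" idea rests on chaining records together with hashes, which a short sketch can show. This is a minimal single-machine illustration, assuming SHA-256 and JSON-serializable records. It deliberately leaves out the part where everyone in the network has to agree (consensus), and the names Ledger and block_hash are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, record):
    # Hash the block's position, its link to the previous block, and its contents.
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "record": record}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    def __init__(self):
        self.blocks = []

    def append(self, record):
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        index = len(self.blocks)
        self.blocks.append(
            {
                "index": index,
                "prev": prev,
                "record": record,
                "hash": block_hash(index, prev, record),
            }
        )

    def is_valid(self):
        # Recompute every hash; any edited record breaks the chain from that point on.
        prev = GENESIS
        for i, block in enumerate(self.blocks):
            expected = block_hash(i, prev, block["record"])
            if block["index"] != i or block["prev"] != prev or block["hash"] != expected:
                return False
            prev = block["hash"]
        return True
```

For example, after appending a couple of transactions, quietly editing an earlier record makes `is_valid()` return False, because that block's stored hash no longer matches its contents. In a real blockchain many independent copies of the ledger run this kind of check, which is what makes tampering impractical rather than merely detectable.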

Jon: 00:52:21
Right. 

Sadie: 00:52:21
So the other aspect of it is what became really popular this year with NFTs, non-fungible tokens. Essentially the NFT is recorded on the blockchain and, for this one asset, which for a lot of people is a JPEG, so digital art, it shows on that blockchain, that ledger, who owns it, who purchased it, when and where they purchased it, and it can't be changed. And I think this for me is so exciting for data science because it means immutable audit trails, which to me, what that says is more data that's more accurate that we will soon have at our fingertips. And now not only are we using it for things within finance like cryptocurrencies, but now we're using it to sell art, to sell assets. We're almost at this tipping point where blockchain technology isn't just used in cryptocurrencies, but we'll start to have a digital record of everything. So again, going back to talking about the metaverse, it's a digitalization of your life. There's already NFT fashion, there's already NFT food. Everything that you have in your digital life- 

Jon: 00:53:45
NFT food? 

Sadie: 00:53:46
Yeah. 

Jon: 00:53:46
What does that mean? 

Sadie: 00:53:47
So essentially you can think of this as anything. Look around wherever you're at right now listening to this. I'm sitting in my office. So I have a planner, I have pens, I have speakers, I have all these things. All of those things will soon become digitalized that'll have them in this digital world, which we're calling the metaverse, and all of that will be recorded on the blockchain. 

Jon: 00:54:12
Oh. So it's kind of like a digital twin that some organizations have in their factories where some of the most high tech factories today, they will have a digital copy of where everything is, so that if you want to rearrange things to optimize for some new car that you're producing in the factory, you know where everything is, the size of everything, and you can move it around in this digital world before you do it physically to optimize. So I hadn't thought of that before. I guess I'd thought about it in the context of that we could have a digital twin like that of, I guess our home and all the items in it, but it hadn't occurred to me that you could be doing that with the same kind of NFT technology that we're using today primarily to buy and sell artwork. But it makes perfect sense that anything around you could have a digital identity. And then I guess in a way, this idea of a digital twin, that's almost a synonym of metaverse as far as I can tell. They're kind of the same idea that you just have this nonphysical, this digital representation of physical things that allow for abstraction, that is impossible in the physical world.

Sadie: 00:55:32
Yeah. And so I think why this is so important for data science, again is just, everything is going to be timestamped and the accuracy of this information is going to be better than what we have today with paper ledgers that get converted into digital records. But then just think of the proliferation of data that we're going to see. I mean, we're already dealing with a ton of data to create our models, which is fantastic, but we're going to get to the point, and I think this is part of where AutoML comes in and will start to merge, where there's going to be so much that we're going to need additional support from AutoML to run all the algorithms we'll need to run with these. 

Sadie: 00:56:17
So there's aspects for data science, just in terms of what data's being created from people using the blockchain. And again, if I was a data scientist at a large company, I would be encouraging them to put everything in the business on the blockchain, because I'd want to analyze it from that aspect. But then there's the aspect of just what people are using to create with this technology. And GANs are a really big thing in NFT art. So if you're familiar with NFT collection drops, usually they're around like 10,000 images. Now, some may say it is easier to create digital art, and others say it's a little bit harder. But if you're going to release 10,000 images, that would take a long time to create individually by hand. So the majority of how these collections are being created is that artists create individual layers and then throw them into a model, and then the GAN model is what actually creates the final outputs for these. So again, just another way that these worlds are merging together and converging and just the use cases that we can see from them here in the near future is really incredible. 
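
To see how a handful of hand-drawn layers can turn into 10,000 distinct images, here is a deliberately simple sketch. It is plain combinatorial layering, not a GAN: it only picks which layer variants go together and does no image generation at all. The layer names and counts are invented for the example.

```python
import itertools
import random

# Hypothetical artist-made layers: 4 layers with 10 variants each = 10,000 combinations.
LAYERS = {
    "background": [f"background_{i}" for i in range(10)],
    "body": [f"body_{i}" for i in range(10)],
    "eyes": [f"eyes_{i}" for i in range(10)],
    "hat": [f"hat_{i}" for i in range(10)],
}


def generate_collection(layers, size, seed=0):
    """Pick `size` distinct layer combinations, one per image in the drop."""
    combos = list(itertools.product(*layers.values()))
    if size > len(combos):
        raise ValueError("not enough layer variants for that many unique images")
    rng = random.Random(seed)
    chosen = rng.sample(combos, size)  # sampling without replacement keeps them unique
    names = list(layers)
    return [dict(zip(names, combo)) for combo in chosen]


collection = generate_collection(LAYERS, size=10_000)
```

Each entry in `collection` is a recipe such as which background, body, eyes, and hat to stack. A generative model would come in at the step this sketch skips, producing or blending the artwork itself in the artist's style.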

Jon: 00:57:34
Yeah, I wasn't aware of that, but it makes perfect sense. If you want to create 10,000 unique images that are non-fungible, meaning ... So the word fungible, it comes from finance with this idea that a barrel of oil, if it's a specific standard ... So there's a very common global standard for oil called Brent, which originates from this specific oil well in the North Sea. The name of that rig was Brent, and so Brent oil is this oil standard. And if you buy Brent oil from somebody, it's identical to buying it from somebody else, they're fungible. So that's what that word means. And so this idea of non-fungibility is that there's a uniqueness. So with digital art, for example, I guess each item would have to be unique in order to be not identical and replaceable. And so yes, that makes sense that if you're creating 10,000 unique pieces of art, you might not want to paint each one because it would take a very long time, but you can rely on generative adversarial networks again, to create something in your style with slight variations on a theme. Very cool. I hadn't thought of that, but it makes perfect sense. 

Sadie: 00:59:00
And then just to tie it back into what we were talking about earlier with deep fakes and myself getting hacked over these past couple months, one of the additional use cases I could see is every time before you upload a picture of yourself or something that you've done to the internet, that automatically becomes an NFT and there's a token that shows who is the primary owner for it. So if that were the case, for all of those photos that I shared on Instagram, even if someone tries to hack my account and steal those, I would be notified that that had happened and then it would show who was the previous owner of them. So I think there's some exciting things in how all these technologies are merging together, how we can find benefit in data science from blockchain and find benefit from it in terms of how we combat deep fakes and people stealing our identity. So I think the more we diversify ourselves in terms of new and emerging technology, just the greater impact we're going to have in our day to day jobs as data scientists. 
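
The photo-ownership idea can be sketched as a registry keyed by a fingerprint of the image. This is an illustration under stated assumptions: an in-memory dictionary stands in for an on-chain token, the class name PhotoRegistry is hypothetical, and a SHA-256 fingerprint only matches exact byte-for-byte copies, so a cropped or recompressed image would not be caught.

```python
import hashlib


def fingerprint(image_bytes):
    # Content hash: identical files always produce the same fingerprint.
    return hashlib.sha256(image_bytes).hexdigest()


class PhotoRegistry:
    """Stand-in for an ownership ledger: first registration of a photo wins."""

    def __init__(self):
        self._first_owner = {}  # fingerprint -> owner who registered it first

    def register(self, image_bytes, owner):
        """Record the photo if it is new, and return its original owner."""
        key = fingerprint(image_bytes)
        return self._first_owner.setdefault(key, owner)

    def is_original_owner(self, image_bytes, claimed_owner):
        return self._first_owner.get(fingerprint(image_bytes)) == claimed_owner
```

If the account holder registers a photo first, a later upload of the same file under another name still resolves to the original owner, which is the kind of trail described above.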

Jon: 01:00:12
Very cool. Sounds like I need to invest some time in learning about blockchain, Sadie. Do you have any particular resources that you recommend for people learning about it? 

Sadie: 01:00:22
Yeah. So the first person I would follow is Balaji S on Twitter. And he has this website called 1729. He was a former CTO of Coinbase, and then he also created earn.com. But his website 1729, is essentially talking about how we build a network state and what a network state looks like using blockchain technology. So that's a really great resource to go to. And then there's a couple courses on Coursera, just intro courses to blockchain that I think are really helpful. So I would start there. 

Jon: 01:01:00
All right, that brings us to our final planned topic. So we've come a long way. We had the three micro trends and now we've gotten through two of the three macro trends. The final one is data literacy. So this is a fascinating topic. I guess, data literacy, this is something that you can probably define better than me, but it's this idea of just being able to understand data, to be able to read charts, to be able to make some kinds of little projections based on data. And according to a stat that you provided me, only 21% of the global workforce is data literate. So we anticipate that in 2022 and beyond, data literacy will improve. And that will allow for a transition from just technology teams like data scientists and software developers working to extract meaningful data from an organization's data and that power being more accessible across any business person in the organization so that they can look up their particular product, okay, what are the factors that drive sales of my product and what's likely to happen next month, or how should I be adjusting my marketing? All of these kinds of things can be done more and more by people who don't have the same kind of technical expertise as a data scientist. And there's an interesting episode for listeners, if you want to dig into this idea in a lot of detail, episode 499 with Barr Moses of this podcast is a good one for learning about that. So anyway, this is your trend, Sadie, that you brought up. I think it's a great one. Tell me more about it. 

Sadie: 01:02:50
Yeah. So I say this is a really big factor going back to the remote economy and what you shared with the stat of how many job openings there are and how many people are still unemployed. And you've hinted at this, a little bit of a skills mismatch, and this is only going to get exacerbated as we continue to move on: the need for everyone to be data literate, which as you mentioned, is the ability to read, write and contextualize data. But I think even more so than just in terms of thinking of it as a business framework of data science teams in marketing and finance, I think we even need to think of it a little bit bigger from traditional aspects of blue collar and white collar jobs. So an example, we usually think of a farmer as a really blue collar worker, but I don't know if you've ridden in a tractor lately, it's more like a spaceship now. I mean, the level of AI- 

Jon: 01:03:49
I have not been in a tractor lately. Have you been in a tractor lately? 

Sadie: 01:03:54
Yeah. 

Jon: 01:03:54
Awesome. 

Sadie: 01:03:54
I mean, I grew up in Iowa, so I visit back home and I get to see these beautiful machines that cost a million dollars, which is crazy. But yeah, the whole machine is filled with sensors that monitor the field and then that connects to the internet and goes back to, if it's a John Deere tractor, to John Deere, where they store that data and run algorithms on it. And then that information comes back to the farmer within the tractor to better understand how they should be operating the machine. So I use this example a lot for people, it's like, this is not just your marketing friend needs to have data literacy, no, farmers need to have data literacy. People who work in manufacturing are going to be working alongside robots with tons of sensors and they're going to need to interpret how those sensors are doing. This touches everyone. And I think that is why I see this as such a big trend is it's not just for us in the business space, but it's really all jobs are going to need some level of literacy to be able to function in this new environment. 

Jon: 01:05:05
Nicely said. All right. Well, we have already had a very long and highly educational and entertaining journey through your micro and macro trends. To wrap up the episode, we've got some predictions from listeners. So I posted on LinkedIn a couple of weeks prior to us recording this episode, asking if anybody else out there had predictions for us for 2022. And we got a couple good ones. So Irina Maria Mocan, she is excited about something that I hadn't heard of before, but actually doesn't seem like a big step beyond the things that I had heard of, this is the artificial intelligence of things, AIoT. So obviously we often hear about artificial intelligence and we also separately often hear about the internet of things, IoT, and so this blends those two together, AIoT. There's a word for you in 2022. A word for me, certainly in 2022. So do you know anything else about this, Sadie? 

Sadie: 01:06:12
No. This- 

Jon: 01:06:12
Have you heard this one before? 

Sadie: 01:06:13
I'm so glad you asked this question, because I always believe that the collective is most knowledgeable and it's true by the responses you were getting, because I had not heard of this term and I can see a lot of applications for it, so I think it's definitely something to watch in 2022. I am a bit dyslexic though, so I think with all these acronyms and letters mixed up and jumbled around, I may have to practice this acronym a little while, but definitely think it's something that we should keep our eye on and excited to see how it advances. 

Jon: 01:06:42
Yeah. So without having done any research, I'm going to take a guess at what AIoT is. So the internet of things is this idea that similar to what we were talking about earlier with being able to use the blockchain to track any digital thing, the internet of things is that more and more and more devices as we already see with all the kinds of tools that we can get from Amazon and Google that we put around our home, more and more items are being embedded with sensors and those sensors report information. And some of them could be running relatively simple algorithms on their own or sending information to a server to be able to be processed in a more complex way. And so presumably that's what the AIoT is. And yeah, no doubt that will get bigger and bigger. So embedding neural networks or other kinds of machine learning algorithms into small devices so that they can make their own little decisions incrementally on their own. And I guess the aggregation of all of these simple automations happening on more and more devices will have a big, big impact in 2022 and the years to come. 

Jon: 01:07:59
Nice. So one more point came from Serg Masís. So Serg Masís is one of our biggest fans for providing questions to guests on the show and I'm so glad that he chipped in with his two cents on this topic as well. So Serg is a data scientist and the author of a top selling book called Interpretable Machine Learning, which I already mentioned at the top of this episode, when we were talking about explainable AI. So Serg says that Moore's law, which is this tongue in cheek law, this idea from a prominent person at Intel decades ago, named Moore, that every year the cost of compute will about halve, or every 18 months it will about halve, just this constant reduction in the cost of compute over time. And we are seeing the end of that as microchips have become so detailed that we are running into a situation where the electrons themselves can skip over circuits to adjacent ones. And so Moore's law in terms of having single microchips, having their costs come down is starting to come to an end. We're seeing that happening right now. 
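
Editor's sketch: the halving Jon describes can be written as simple compounding. The snippet below assumes a steady halving every 18 months, which is the loose figure quoted in the conversation rather than a measured value; the function name `relative_cost` and its parameters are hypothetical and chosen only for illustration.

```python
def relative_cost(years: float, halving_period_years: float = 1.5) -> float:
    """Cost of a fixed amount of compute relative to today.

    Assumes the cost halves once every `halving_period_years`
    (1.5 years = the 18-month figure mentioned in the episode).
    """
    return 0.5 ** (years / halving_period_years)


# Relative cost after one, two, and ten halving periods.
for years in (1.5, 3.0, 15.0):
    print(f"after {years:>4} years: {relative_cost(years):.6f} of today's cost")
```

Under that assumption, ten halving periods (15 years) cut the cost by a factor of 1,024. That compounding is what brute-force approaches have leaned on, and it is what goes away if the trend stalls.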

Jon: 01:09:30
So Serg is mentioning that this brings up the opportunity for more creative uses of resources in AI. So less brute force and more what he describes as Bayesian or causal driven approaches. So the idea here being that as we want to accomplish a lot with AI, we can't necessarily rely on compute just becoming half the price every year and that we can therefore have twice as much AI brute force complexity for the same price, something that we've been able to rely on over the most recent decades. And so we can have more clever approaches like Bayesian approaches that take into account prior probability, so we don't need to train from scratch, for example. So this is a really good point. I don't have too much else to say about it other than that it is a great idea. So no doubt in 2022, there will be a move in this direction as we see Moore's law slow down. Sadie, do you have any other thoughts? 
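
Editor's sketch: one minimal way to picture "taking prior probability into account" is a Beta-Binomial update, where earlier knowledge enters as pseudo-counts and new data is simply added on top. This is an illustrative stand-in, not the specific method Serg or Jon had in mind; the class name `BetaPrior` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) belief about an unknown success rate."""

    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaPrior":
        # Conjugate update: observed counts are added to the prior's pseudo-counts.
        return BetaPrior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # Expected success rate under the current belief.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical numbers for illustration only.
flat = BetaPrior(1.0, 1.0)        # no prior knowledge: "training from scratch"
informed = BetaPrior(30.0, 70.0)  # prior worth about 100 earlier observations, rate near 0.30

# The same small batch of new data for both: 4 successes, 6 failures.
flat_post = flat.update(4, 6)
informed_post = informed.update(4, 6)

print(round(flat_post.mean, 3))      # 0.417
print(round(informed_post.mean, 3))  # 0.309
```

With only ten new observations, the flat prior swings to whatever the small sample says, while the informed prior stays close to what was already known. In this toy setting, that is the sense in which a good prior can stand in for additional data or compute.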

Sadie: 01:10:40
No, I think he's right in that it will slow down. I don't know if us optimizing AI will be the solution. I don't know why I don't have the good prospects for us being able to do that effectively. I think what the solution in my opinion will be, will be quantum computing. Now the question is, will we figure out quantum computing in time to be able to compensate? But that is what I look at is what we're going to jump to next is quantum and that will do away with all the issues we have and we'll be on a new track of Moore's laws with quantum computing. 

Jon: 01:11:18
Yeah, that opens up a whole bag of worms, but a potential solution. Today, not all kinds of problems can be solved with a quantum algorithm, but I'm sure more and more will be. And yeah, there's a really, really big opportunity there where quantum computing can be a huge step change: problems like the traveling salesman problem, which could take a supercomputer an effectively infinite amount of time to solve, can be solved by quantum computers. There are also really interesting applications of genetic computing, where you have biological DNA floating around in a liquid, coming up with answers to problems. There are some pretty interesting computational approaches that will come in the future, though maybe they won't be super widespread in 2022.

Jon: 01:12:13
Nice. All right, Sadie, this has been an epic episode. I have loved filming it with you. You came up with such great topics, such great trends to cover in 2022. Do you have any parting thoughts for us? 

Sadie: 01:12:27
No, I just honestly wish everyone a really happy New Year. And I'm so excited to see what people are going to create and how they're going to innovate. And yeah, I have really high hopes for the future and what's ahead. So thank you so much for having me.
 
Jon: 01:12:43
My pleasure. And maybe in 2022, we'll be able to meet more people in person at conferences than we have in 2021 or 2020. So that'll be very nice. In the meantime, how can listeners stay up to date on your latest? 

Sadie: 01:13:01
Yeah. So best way to connect with me is not on Instagram, but actually connect with me on Twitter, @sadiestlawrence. Also, you can connect with me on LinkedIn. And for any of those out there who are playing in VR in the metaverse, my username is CCT2. So would also be happy to connect with you in VR anytime, if you want to. 

Jon: 01:13:24
Whoa, that is the first time that we've had that option as a way to connect with a guest on the SuperDataScience podcast. So welcome to 2022, everyone. 

Sadie: 01:13:35
Yeah. 

Jon: 01:13:35
You're getting virtual reality handles to meet up with guests on. Welcome and happy New Year. 

Jon: 01:13:48
What an epic episode that was. I hope you're as excited for 2022 now as I am. In the episode, Sadie led our journey through predictions for the year ahead, including how AutoML will enable data scientists to more easily optimize and deploy machine learning models, but how this also may accelerate negative social side effects of automation if we're not careful. She talked about how generative adversarial networks will be used and misused to generate remarkably compelling deep fakes with fewer examples of a target person required to make compelling deep fakes than ever before. 

Jon: 01:14:24
She provided us with four principles for making AI scalable from prototype to production, namely being cloud first, having standardized and automated workflows, monitoring performance in production and ensuring traceability. She talked about how NFTs could provide more accurate data on any real world object and potentially even prevent identity theft. And she talked about how increasing the data literacy of the global workforce from its current 21% level is critical to making data driven decision making accessible beyond technical data science teams. 

Jon: 01:14:57
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Sadie's LinkedIn and Twitter profiles, as well as my own social media profiles at www.superdatascience.com/537. That's www.superdatascience.com/537. If you enjoyed this episode, I'd greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. And if you'd like to ask our upcoming guests questions much like we had audience participation in today's episode, then be sure to follow my LinkedIn and Twitter accounts to be on the lookout for posts and tweets from me in which I ask for your input on forthcoming episodes. 

Jon: 01:15:41
All right, this year is off to a great start. Thanks to Ivana, Mario, Jaime, JP and Kirill on the SuperDataScience team for managing and producing another super episode for us today. Keep on rocking it out there folks. And I'm looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 
