63 minutes
SDS 671: Cloud Machine Learning
Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn
This week’s guests are mainstays of the data science online community, but where have they been this last year? In an exclusive for SuperDataScience, we can confirm that Kirill Eremenko and Hadelin de Ponteves took their sabbatical to launch CloudWolf, a cloud computing education platform that prepares students for certification in AWS (Amazon Web Services). Jon Krohn speaks with his guests all about CloudWolf and why accreditation in cloud computing could be the safest investment for your data science career.
Thanks to our Sponsors:
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
About Kirill Eremenko
Kirill is the founder and former host of the SuperDataScience Podcast. He is the founder and CEO of SuperDataScience, the platform hosting this podcast, which also features many data science and analytics courses, ranging from tool-based courses such as R Programming, Python, and Tableau to overarching courses like Machine Learning A-Z and Intro to Data Science. He is also a well-known instructor on Udemy, where his courses have been taken by over 1.7M students worldwide. Kirill is also the founder of the DataScienceGO Conference, created for data-powered minds, where experts, mentors, and friends come to enlighten, click and inspire each other and skyrocket their careers. Kirill is absolutely and utterly passionate about data science and delivering high-quality, accessible education to every human on this planet!
About Hadelin de Ponteves
Hadelin is the co-founder and CEO at BlueLife AI, which leverages the power of cutting-edge artificial intelligence to empower businesses to make massive profits by innovating, automating processes, and maximizing efficiency. He is passionate about helping businesses harness the power of AI. He is also an online entrepreneur who has created over 70 top-rated educational e-courses on topics such as machine learning, deep learning, artificial intelligence, and blockchain, which have already made 2M+ sales across 210 countries. Hadelin is an ex-Google artificial intelligence expert and holds a Master's degree in engineering from École Centrale Paris with a specialization in machine learning.
Overview
When a company has an IT infrastructure on its premises, it must invest a great deal of time and money in purchasing the equipment, providing adequate space, tightening cybersecurity, and updating servers every few years. Such capital expenditure can put a real dent in a company's budget. When companies use the cloud, however, they become much more flexible to operational needs, and they also have access to considerably more models for training and analyzing data. Among the companies that have, in Kirill's words, "outsourced the headache" are Airbnb, Netflix, Coca-Cola and McDonald's.
With that in mind, Kirill and Hadelin’s estimations that well over a third of data science and machine learning jobs require cloud skills shouldn’t come as any surprise. Kirill and Hadelin aim to give CloudWolf students the confidence to get AWS accreditation in just 21 days.
Listen to the episode to find out how CloudWolf came to have such a cool name, why it makes sense to learn AWS as opposed to other cloud providers, and AWS essentials that every data scientist needs to know, from databases to storage.
In this episode you will learn:
- About CloudWolf [07:04]
- Why learning the cloud is important for data scientists [09:12]
- Is learning cloud computing complex? [22:30]
- Essential AWS services [28:31]
- Database options on AWS [33:47]
- How to run analytics on AWS [40:58]
- Why an AWS certification is so helpful [56:35]
Podcast Transcript
Jon: 00:00:00
This is episode number 671 with the renowned data science educators, Kirill Eremenko and Hadelin de Ponteves. Today's episode is brought to you by Posit, the open-source data science company, and by AWS Cloud Computing Services.
00:00:17
Welcome to the SuperDataScience Podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today. And now let's make the complex simple.
00:00:48
Welcome back to the SuperDataScience podcast. Today we've got not one, but two data science rock stars back on the show. Kirill Eremenko is one of our two guests. He's the founder and CEO of SuperDataScience, an e-learning platform, and he founded the SuperDataScience podcast in 2016 and hosted this show until he passed me the reins two and a bit years ago. Our second guest is Hadelin de Ponteves. He was a data engineer at Google before becoming a content creator. In 2020, he took a break from data science content to produce and star in a Bollywood film featuring Miss Universe Harnaaz Sandhu. Together, Kirill and Hadelin have created dozens of data science courses, and they're the most popular data science instructors on the Udemy platform, with over 2 million students between them. They recently returned from a multi-year course-creation hiatus to publish their Machine Learning in Python Level 1 course, as well as their brand-new course on cloud computing.
00:01:42
Today's episode is all about the latter, so we'll appeal primarily to hands-on practitioners like data scientists who are keen to be introduced to, or brush up on, analytics and machine learning in the cloud. In this episode, Kirill and Hadelin detail what cloud computing is and why data scientists increasingly need to know how to use the key cloud computing platforms such as AWS, Azure, and the Google Cloud Platform. And they dig into the key services that AWS, the most popular cloud platform, offers, particularly with respect to databases and machine learning. All right, you ready for this super useful episode? Let's go.
00:02:21
Kirill, Hadelin. You guys were just here. What brings you back? Where are you guys calling in from?
Kirill: 00:02:26
Same places I think. Oh, no, I'm still in Australia. Hadelin is in France now.
Hadelin: 00:02:32
Yes, I'm in France. I'm in Paris.
Jon: 00:02:35
Nice. Paris. Hopefully everyone's heard of it.
Kirill: 00:02:40
It's impossible to predict where Hadelin is. Like, he's been between Paris, Mumbai, and Dubai probably like 50 times in the past year. Like, every time we get on a call, I'm like, where are you today? And it's a surprise.
Jon: 00:02:53
And it's critical to ask. If listeners don't know where you're calling in from, they can't, you know, they can't enjoy the podcast episode properly.
Hadelin: 00:03:01
For sure.
Jon: 00:03:02
So awesome. Welcome back to the show. So, last time you were on the show, we did an episode, it was episode number 649, and it was focused on your Machine Learning Level 1 course. And so we did like a Machine Learning 101 episode that introduced the key concepts in machine learning. And we talked about how in the future you would have a Machine Learning Level 2 course, but that's not why you're here today. So you have other things going on. For our listeners that aren't already familiar with you, you're two of the most productive data science education content creators out there. So of course you have more than one iron in the fire. So tell us about what you've been working on besides your machine learning course.
Hadelin: 00:03:49
Okay, so basically last year we made a big, important decision, which is to extend our teaching to another industry, which is the cloud, cloud computing. Basically, that doesn't mean we're going to, you know, move away from data science and machine learning. We are going to continue teaching machine learning and data science. But you will see in this podcast episode, we will talk about this, that there is this serious convergence between data science and cloud computing. And so it's not only a plus to teach cloud computing, it's going to become a necessity. And that's why it makes total sense for us to make that decision and teach cloud computing along with data science. So yes, we made that decision last year. We've spent a whole year learning, working super hard on, you know, becoming experts in the cloud. And now we're very happy and ready to, you know, extend our offerings and teach about the cloud.
Jon: 00:04:54
Nice. For sure. I, I'm not, I don't want to step on your toes too much cause I know we're going to have lots in this episode on why this is so relevant to data scientists, but just really quickly upfront that with data science data sets getting larger and larger and larger and the models getting exponentially larger, cloud is a no-brainer for people to be learning about as well, because you need to be able to scale up your infrastructure to be handling all these data and these enormous models. So I, it makes a lot of sense to me.
Kirill: 00:05:21
Absolutely, Jon. And I also wanted to say, that's why we got curious about it. Like, you know, it's a technology, we knew it kind of happens, but we wanted to learn more, and as you said, we'll talk more about why data scientists should learn the cloud. But I also wanted to add, on what Hadelin's saying, that I want to be very clear and upfront with the audience: we're not here as data scientists who are dabbling in the cloud or who want to, you know, be advocates for data science in the cloud. For the past year, we've actually shifted, we've pivoted, while still, you know, being here for our data science audience, and we are releasing courses in data science from time to time, like the Machine Learning Level 1, but we've actually pivoted and we're completely immersing ourselves in the cloud.
00:06:04
So what we are doing now, this new project is relevant not just to people in data science who want to learn cloud, but actually anybody who wants to master cloud, who wants to get AWS certified and go through that. So in this episode, our goal, we are not here to, you know, make sales pitches or anything like that. Our goal is to educate and show, like, give a preview of what we've learned in the past year to the SuperDataScience audience. And, you know, hopefully people can walk away and be able to have some level of beginning level of conversations about the cloud with their peers and colleagues. And if anybody at the end of the episode is curious about what we're doing now with the project we're working on and wants to join us in this journey of learning the cloud, then we would be very happy for that. And there'll be some great exciting things we'll be sharing towards the end of the episode about this as well.
Jon: 00:06:55
Yeah. So I know you're not here primarily to be pushing anything commercially, that this is an educational episode on cloud technologies and why they're relevant to data scientists, but you guys have also just launched a new platform, right?
Kirill: 00:07:08
That's right. Yeah. So exciting. It's launching this week as the podcast is going live. It's called cloudwolf.com; you can find it at www.cloudwolf.com. And yeah, super, super pumped about it. Hadelin can tell us a bit more about why we chose the name Wolf; it was his idea.
Hadelin: 00:07:28
Yeah, yeah, absolutely. So first, yes, we think that the wolf is a fascinating animal, but it also has some, you know, symbolism around it that can be described with a few words: intelligence, strong family ties, loyalty, what else? Education, communication, community, you know, with the wolf packs. So we thought it's a perfect description of the values and principles we want to have for CloudWolf, you know, the wolves of the cloud. And yes, that's exactly how we see CloudWolf and how we see the community that we're going to build. We see, you know, a lot of education, a lot of intelligence, because indeed the cloud is very technical, so you need to have the right understanding of the cloud, which we will teach, of course. And also, you know, the strong family ties, loyalty, the community, because we see in cloud wolves, you know, people helping each other, people, you know, yes, giving advice, supporting each other, so, you know, so that they all get to a great level in the cloud.
Jon: 00:08:42
Nice. That's really cool imagery. Nice. So really exciting platform. Why should data scientists be learning cloud? I gave a couple of examples upfront that, you know, we have lots more data than ever before, and those data sets are getting exponentially larger over time. Model sizes are getting exponentially larger over time, but why can't we just entrust that to other people? I guess like other kinds of practitioners, why should data scientists themselves be capable of handling their own cloud infrastructure?
Kirill: 00:09:18
That's a good question. To paint a picture, let's start with what the cloud is, you know, as a first stepping stone, just to make sure everybody's on the same page about the benefits of using the cloud. So basically, a company can have IT infrastructure on-premises, but then it has to worry about buying the physical servers, maintaining a physical space for them, maintaining security, managing those servers, servicing those servers, and investing a lot of money upfront. And that's called capital expenditure. Whereas with the cloud, you don't need to do any of that. You basically rent servers, or rent storage or databases, whatever you need, from a cloud provider such as AWS (Amazon Web Services), Microsoft Azure, Google Cloud Platform, or several others, but those are the three main big ones.
00:10:07
You rent those things on an as-needed basis. So if you need some servers today, you rent the servers today, and if you don't need them tomorrow, you release them tomorrow; you decommission them or stop using them. And you're paying, basically, an operational expense. You're no longer investing capital expenditure upfront, you're paying an operational expense, and it's very flexible. It's very agile. You have access to way more different options, and you don't have to guess your capacity or your needs in advance. You need to train a big machine learning model this week? You have big servers this week, and then when you don't need to train, you don't pay for them. So it's a much better cost model. The infrastructure is shared, but at the same time, it is very secure, so your data is not seen by the other companies using the same infrastructure.
00:10:54
And basically, you outsource a lot of your headache. And the other big part is that there are economies of scale, because other companies such as Airbnb or Netflix or Johnson & Johnson, huge companies, Coca-Cola, McDonald's, are all using the cloud. And because there are so many companies using the same infrastructure, the cost goes down. So the prices for cloud are very low, and that's the attraction for business. So that's the cloud in a nutshell, you know, the benefits that you get there. And in terms of data scientists, it's mostly about where data scientists are going, and maybe Hadelin can talk about this a bit.
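Kirill's capex-versus-opex point can be sketched with some back-of-the-envelope arithmetic. All prices and lifetimes below are hypothetical placeholders for illustration, not real AWS or hardware rates:

```python
# Hypothetical numbers for illustration only -- real prices vary by
# region, instance type, hardware vendor, and contract.
ONPREM_SERVER_COST = 40_000      # upfront capital expenditure per server ($)
ONPREM_LIFETIME_YEARS = 3        # servers are typically refreshed every few years
CLOUD_HOURLY_RATE = 1.50         # on-demand rate for a comparable instance ($/hour)

def onprem_annual_cost(n_servers: int) -> float:
    """Amortized capital expenditure: paid whether or not the servers are busy."""
    return n_servers * ONPREM_SERVER_COST / ONPREM_LIFETIME_YEARS

def cloud_annual_cost(hours_used_per_year: float) -> float:
    """Operational expenditure: you pay only for the hours you actually use."""
    return hours_used_per_year * CLOUD_HOURLY_RATE

# A team that trains models ~10 hours a week (about 520 hours/year):
print(f"on-prem: ${onprem_annual_cost(2):,.0f}/year")   # fixed cost, idle or not
print(f"cloud:   ${cloud_annual_cost(520):,.0f}/year")  # scales with usage
```

The crossover point depends entirely on utilization, which is Jon's point later in the episode: near-constant use can favor owned hardware, while bursty training workloads favor renting.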
Hadelin: 00:11:34
Yeah. So basically, today, to build machine learning models and data science models, you need more and more compute-intensive resources to train them, simply because the models are, you know, on the one hand, more advanced, but also sometimes you have a much bigger amount of data. And so, you know, cloud resources are not only a plus to train your machine learning models; they have become a necessity. And that is why there is this very strong convergence happening between data science and the cloud. Because indeed, now, data scientists, in order to train their machine learning models, will need cloud resources, which include, you know, two main types of things. First, the compute resources, which are the virtual servers with, you know, strong and powerful GPUs. And also storage, because, as we will talk about later in this episode, you will see that in order to build machine learning models with the cloud, you can connect the storage services of, for example, AWS, with the compute services to build your machine learning models. And we will talk about that in a few moments.
Kirill: 00:12:49
And in addition to that, if we even think, for example, of ChatGPT, which is completely, entirely cloud-based, it's using Microsoft Azure. It's got 175 billion parameters and gets a hundred million users per month. That is a massive scale, and it is just very hard to maintain something like that on on-premises infrastructure. You'd have to buy lots of servers, and you'd have to be scaling all the time, buying new servers. And then what if the demand drops? They're paying a lot for this, I think it was like a hundred thousand dollars per day to maintain ChatGPT, and that's thankfully because they're using the cloud; if it was fixed infrastructure, it would be much more expensive. So we're just seeing this, you know, one of the biggest examples, but that's where the world is going. We're going towards much heavier, compute-intensive models and many more users that you will need to be servicing with these models and products that are being created. And that's why data scientists have to learn it. Like, I think Hadelin already mentioned this: learning the cloud for data science is not just an advantage. It may look like an advantage today, but it is actually becoming a necessity. And the first people that jump on this train will be prepared for the future of tomorrow, which is coming really fast.
Jon: 00:14:03
Nice. Every company wants to become more data-driven, especially with languages like R and Python. Unfortunately, traditional data science training is broken. The material is generic. You’re learning in isolation. You never end up applying anything you’ve learned. Posit Academy fixes this with collaborative, expert-led training that’s actually relevant to your job. Do you work in finance? Learn R and Python within the context of investment analysis. Are you a biostatistician? Learn while working through clinical analysis projects. Posit Academy is the ultimate learning experience for professional teams in any industry that want to learn R and Python for data science. 94% of learners are still coding 6 months later. Learn more at Posit.co/Academy.
00:14:46
That makes perfect sense to me. And this explanation of the size of the ChatGPT model, and that's now orders of magnitude larger with the GPT-4 release that happened recently as well. And so this is something, these, these models, these large language models are getting bigger and bigger and bigger. And while there are also research efforts to try to make ways of pruning away aspects of the model that aren't contributing overall, or aren't contributing to specific tasks we nevertheless have this ongoing trend of bigger and bigger models. So how prevalent is it out there that data scientists need to have cloud skills?
Hadelin: 00:15:34
Yeah. Yeah, Jon, it's very good that you asked this question, because there must be a realization from people that cloud skills have become a necessity. It's not just, you know, a plus, an advantage. It has become a necessity, for the very reason that we just mentioned, which is that machine learning models, data science models, are becoming more and more complex, and therefore, now, to train them well, the only solution is cloud resources. So yes, it's a necessity, and I think that, you know, this convergence between data science and the cloud is going to become narrower and narrower. It's going to converge more. And the more it converges, the more cloud skills will become absolutely necessary to train the models. So yes, I think every data scientist now should at least know how to, for example, train a machine learning model with some AWS services, which we'll talk about in a few moments.
Jon: 00:16:37
Yeah. And so AWS recently became a sponsor of the show. They actually had no influence on the content in this episode; it's purely a coincidence. But I understand it, because AWS is the leader in cloud. So the three main providers are AWS, Microsoft Azure, and Google's GCP, and you guys dug up the stats on this from statista.com: AWS has 34% of the cloud market, Azure 21%, and GCP 11%. So I guess it makes sense, maybe this is your recommendation, that given AWS has more market share than the other two combined, if you're going to start with one of these platforms, you should start with AWS.
Kirill: 00:17:27
Yeah, that's about the right recommendation. And I also wanted to add, in terms of the number of data science jobs that mention cloud skills: we did our own research, and we made sure to check it for statistical significance, to do proper statistical research. We wanted to come into this podcast saying that if you learn cloud skills, your salary is going to grow by, you know, X percent. And we did find a slight increase in the average, or the median, salary of a data scientist who's learning cloud versus one who's not, but unfortunately it wasn't statistically significant. So we wouldn't be comfortable sharing that. But what definitely is a fact is the number of jobs. So right now, we looked at a sample size of about 190 jobs on Glassdoor, literally yesterday.
00:18:21
And we found that 37% of them, these are data science and machine learning jobs, 37% of them mention cloud skills, whether as a requirement or as, you know, a preference. And based on the sample size and doing a statistical test, we can say that overall in the world, not just in our sample of 190, this number is somewhere between 34% and 40%. So between 34% and 40% of data science and machine learning jobs already now, already today, in the first quarter of 2023, mention cloud skills as either a requirement or a nice-to-have, a preference. And that's, you know, a big number already, it's over a third, and it's growing. And another thing to mention about AWS, [inaudible 00:19:12] Jon, of those jobs, of all data science and machine learning jobs, about 20% of them mention AWS skills specifically.
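The kind of interval Kirill describes can be reproduced with a standard proportion confidence interval. The hosts don't state their exact counts, test, or confidence level, so the numbers below are assumed round figures (70 of 190 postings, a 95% normal-approximation interval), and the result will not match their 34-40% band exactly:

```python
import math

# Assumed counts: roughly 37% of a sample of 190 job postings
# mentioned cloud skills (70 / 190 ≈ 0.368).
count, n = 70, 190
p_hat = count / n

# Normal-approximation (Wald) 95% interval for the true proportion.
z = 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"point estimate {p_hat:.1%}, 95% CI ({low:.1%}, {high:.1%})")
```

A narrower band like the hosts' 34-40% would follow from a lower confidence level or a different interval method (e.g. Wilson); the mechanics are the same.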
00:19:19
So not only does AWS have the highest share of the market, but it's also a safe bet. Like, unless you know that a company that you want to work for, or that you are working for, is using Microsoft Azure or Google Cloud, then sure, learn those. Those are very good tools as well, and we are planning on learning them too, you know, once we're confident, like, we've done our first few certifications with AWS, we want to move on to those as well. But if you don't know which cloud platform to learn, start with AWS; it's a safe bet. There's lots of companies using it, from large businesses, like mentioned before, Nike, Coca-Cola, Disney, all of Netflix. People might not know this, but all of Netflix is on AWS. They don't even have their own servers at all. All of their videos, everything, their website is running on AWS. Airbnb is running on AWS as well. So big companies are, and also startups, because it's so agile to spin things up. So whether you are looking to work for a big company or a startup, AWS is a safe bet to add to your skill set.
Jon: 00:20:18
Yeah, from my perspective, getting startups going for the last eight, nine years, it's a no-brainer. I have built a couple of servers for doing a lot of computation prototyping for my data science team, where I knew I was going to need really big GPUs and that those cloud instances were particularly expensive. Like, I did the math and was like, okay, to have a couple of these on-premises, we can save a little bit of money, cuz we assume that we're going to be using one or two of these all the time. But for the most part, like any production infrastructure for your machine learning models, you don't know how many users you're going to have at any point. So you need to have your production infrastructure scale up. And then even with this example of us having these data science servers on-prem, well, because models are getting larger and larger all the time, when we're doing intensive computation, we need to be turning on cloud instances anyway.
00:21:13
So we might do a little prototyping on like small amounts of data on our local instances, on our on-premises instances. And then use the cloud when we're like, okay, we're going to use this full Google T5 model now on a huge amount of data. And this T5 model, I couldn't possibly fit it onto the one or two GPUs that I have on this one server on-premises. We're going to need lots of compute in the cloud. So yeah, it's a no-brainer to me that data scientists need to be making use of cloud infrastructure for training models today, and especially for production inference, because you don't want to buy all of the infrastructure yourself just to be able to handle the maximum number of users that you want to have using your application at any given time point.
Kirill: 00:21:59
Exactly. Exactly. And also, what you mentioned about these servers: they get outdated, you know, new technology is coming out all the time, and you buy these servers, you put them on-premises, and then what, two years later, you need to decommission and sell them. Whereas on the cloud, the cloud provider just automatically updates, releases new versions; you can switch to those with the click of a button, and you're not having these kinds of sunk costs to deal with all the time.
Jon: 00:22:28
Yeah, for sure. So our listeners might be out there now convinced, if they weren't already, that they need to have cloud skills. Should they be intimidated? Is this really complex? It sounds like there's lots of language that we might not be familiar with. You know, as data scientists or as machine learning experts, this cloud realm has all kinds of new vocabulary, new concepts. Is it complex to learn, or is it relatively straightforward?
Hadelin: 00:22:59
I think Kirill and I can agree that, yes it is a bit complex at first, but Jon, you just said that it's a no-brainer. I agree with you. You know, at some point it becomes a no-brainer because we get the intuition of how things work. But even if at the beginning you know, it can seem quite complex, well Kirill and I agree that we make the complex simple. We've done that with artificial intelligence. So there is no reason why we shouldn't do it, shouldn't be able to do it for the cloud.
Jon: 00:23:28
What a great answer. Makes perfect sense.
Kirill: 00:23:30
Yeah. We've spent like a whole year learning cloud, like, in and out, like we normally do. And that's one of the reasons why you haven't heard from us, why our listeners, our students haven't heard from us in quite a while. We wanted to do this incognito. So, you know, we would only talk about it once we were confident, and after spending a whole year doing it, we believe we can teach it very effectively. We've already created our first course, and our estimate is that with this educational content, we can help people get their first AWS certification, which is the Certified Cloud Practitioner, which means you've learned the basics of cloud, you understand all the vocabulary, you understand how to use it, and it's actually a badge, a certification badge, you can add to your CV and LinkedIn. We estimate that we can help people learn it in 21 days at two hours of study per day. Within basically three weeks, you can be AWS certified with the very first certification and add that to your skill set.
Jon: 00:24:41
Nice. Yeah, that doesn't sound too tough at all. And yeah, so I guess another big advantage: you've talked a lot in this episode about how these cloud skills are really in demand, that a large portion of data science jobs specifically mention cloud technologies. But I guess another reason why data scientists should be interested in this is that adding it to their toolkit allows them to broaden the impact that they can make.
Hadelin: 00:25:12
Absolutely. You know, I'll make a comment on that from my, you know, personal experience with AWS for data science. To be honest, now I practically only use AWS when I want to do data science or build machine learning models. And it's not because the data that I, you know, train the models with is much bigger, which, as we said before, is one of the reasons why we should have cloud skills to train a machine learning model. It's not because of that. It's because even with small data, well, the machine learning model scores, you know, performance scores, that I get are better and higher with AWS, and SageMaker in particular, we'll talk about that. And that's because it has the ability to, you know, combine an ensemble of models, including gradient boosting models and neural net models, and do a lot of hyperparameter tuning at the same time, very efficiently, because, you know, it uses good resources to do that. And I get the best score at the end.
00:26:21
There is a good example. There is this dataset that I use as a benchmark in many of our machine learning courses. And for this dataset, when we hand-code the models, or, you know, when we use the classic libraries with our own resources, we obtain an accuracy of 94-95%. And with SageMaker on AWS, I obtained 97%, simply because it was able to, you know, combine this ensemble of models while doing a bit of hyperparameter tuning, very efficiently. So yes, that's my personal experience, and that's why now I practically only use AWS for machine learning.
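The two ingredients Hadelin credits, hyperparameter tuning plus ensembling of candidate models, can be illustrated with a deliberately tiny, dependency-free sketch. Everything here is a toy stand-in: the "models" are one-parameter threshold classifiers on synthetic 1-D data, whereas SageMaker trains real gradient-boosting and neural-net candidates; only the tune-then-ensemble pattern is the same:

```python
import random
random.seed(0)

# Toy 1-D binary classification data: class 1 tends to have larger x.
data = [(random.gauss(0, 1), 0) for _ in range(200)] + \
       [(random.gauss(2, 1), 1) for _ in range(200)]
random.shuffle(data)
train, holdout = data[:300], data[300:]

def threshold_model(t):
    """A 'model' that predicts class 1 when x > t; t is its hyperparameter."""
    return lambda x: 1 if x > t else 0

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

# Hyperparameter tuning: grid-search the threshold on the training split.
best_t = max((t / 10 for t in range(-10, 31)),
             key=lambda t: accuracy(threshold_model(t), train))

# Ensembling: majority vote over several perturbed candidates, standing in
# for combining gradient-boosting and neural-net models.
members = [threshold_model(best_t + d) for d in (-0.2, 0.0, 0.2)]
ensemble = lambda x: 1 if sum(m(x) for m in members) >= 2 else 0

print(f"single model holdout accuracy: {accuracy(threshold_model(best_t), holdout):.2f}")
print(f"ensemble holdout accuracy:     {accuracy(ensemble, holdout):.2f}")
```

With well-separated classes like these, both scores land around the Bayes-optimal mid-80s; on harder real datasets, the ensemble's averaging-out of individual models' mistakes is where the extra points Hadelin describes come from.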
Jon: 00:26:57
Nice. Well, as you get deep into the technical bits, Hadelin, you start to sound like somebody that I should trust on cloud technology. You start to sound like you know what you're talking about, but you guys have only been diving into this for like a year. Why should we trust your opinions?
Kirill: 00:27:12
Well, there's a couple of things. First of all, we love teaching, and we've been teaching different topics for seven, eight years now. We bring our teaching skills to the table, and that, we find, is very portable. We can port that to the cloud and use the same teaching methodologies, which is really cool. And our research skills, our understanding. In addition to that, following this learning, we've also become certified ourselves. And the final thing is that sometimes when you're a beginner, when you're learning something for the first time, you're better positioned to explain it to other people, because as an expert, you kind of get lost. You forget what it is like to be a beginner. And especially coming from the data science field, we know what pain points data scientists have, what kind of pitfalls are to be expected along the way. And so, for us, it's a no-brainer that the content we've created is by far superior to everything else that we've seen out there.
Jon: 00:28:18
All right. All right. So maybe you've convinced me, but let's do a quiz. So AWS is the biggest cloud platform out there. So tell us about the basics of AWS. What are the essential things that data scientists need to know about AWS?
Kirill: 00:28:35
Okay, sounds good. So we're going to talk about four different types of services. AWS has a total of over 200 services — we're not going to talk about all of them. We'll talk about four main types that are relevant to data science, and we'll mention a couple in each. First one will be compute. Then we'll talk about storage, we'll talk about databases, and we'll talk about machine learning. So the first one is EC2, which stands for Elastic Compute Cloud — and because there are two Cs there, Compute Cloud, that's where the "2" comes from. Elastic Compute Cloud is basically a way for you to rent a server. Imagine on-premises it would be a server rack or a computer that's doing the processing. It can have a certain number of CPUs — 2, 8, 16 CPUs, however many you want — and whatever amount of memory you want, 8GB, 16GB and so on.
00:29:28
Same thing in the cloud. So there are these big server racks that they have, and they're split into virtual machines — virtual instances, they're called. You don't need to worry about the fact that they're virtual; you just say what you need, and it's completely isolated from other clients. And then you'll get your 16 CPUs and, I don't know, 80GB of RAM, whatever you need to be running a model. And the benefits of doing this through the cloud — we've mentioned a few of them — are that you can select the right size for the right job. You can use it today and not use it tomorrow. You only pay for the compute resources that you use; you don't pay for it if it's sitting there idle and not doing anything. It's very agile. You don't have to buy anything and install it on your own servers, and you get access to the latest technologies out there, because they're constantly getting updated. So that's a service to know. Whenever you hear EC2, that is the Elastic Compute Cloud service of Amazon Web Services, and that's where in most cases you will be doing your compute, even though we'll talk a bit about other ones when we talk about machine learning.
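The pay-for-what-you-use point is easy to see with a back-of-envelope calculation. The hourly rate below is a hypothetical placeholder, not a real AWS price — the point is only the shape of the math.

```python
# Back-of-envelope sketch of EC2's pay-for-what-you-use model.
# HOURLY_RATE is a made-up placeholder, not an actual AWS price.

HOURLY_RATE = 0.50          # $/hour for a hypothetical 16-CPU instance

def monthly_cost(hours_used_per_day, days=30, rate=HOURLY_RATE):
    """Cost when you only pay while the instance is running."""
    return hours_used_per_day * days * rate

always_on = monthly_cost(24)      # instance left running 24/7
training_only = monthly_cost(2)   # spun up 2 hours/day just for training

print(always_on, training_only)   # 360.0 30.0 -- 12x cheaper
```

That 12x gap is exactly why "spin it up, use it, shut it down" is the cloud habit Kirill describes.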
Hadelin: 00:30:33
Yes. And now I would like to talk about S3, which is another very popular AWS service. S3 stands for Simple Storage Service, and as the name suggests, it's a storage service offered by AWS. And it's completely insane, because it offers virtually unlimited storage. You can store a virtually unlimited amount of data within S3. It can be any of your files — CSV files for machine learning, even images, videos, anything you want — in what we call buckets. And the very powerful thing about S3 is that it has eleven-nines data durability — 99.999999999% — which means that even over a billion years you would essentially not lose any item within your S3 storage. Plus it is super cheap: only about 2 cents per GB per month of storage. And you even have a free tier option, which gives you 5GB of storage for free. So yes, it's mind-blowing, all the things you can do with such a powerful storage service. By the way I'm talking, it sounds like I'm selling AWS — but not at all. I'm just in awe, just very impressed by how powerful this storage service is and how you can use it in an unlimited fashion.
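It's worth unpacking what eleven nines actually implies. AWS's own illustration is that if you store ten million objects, you can expect to lose a single object about once every 10,000 years — a few lines of arithmetic reproduce that figure:

```python
# What "eleven nines" (99.999999999%) annual durability implies,
# following AWS's own illustration for S3.

durability = 0.99999999999             # eleven nines, per object per year
annual_loss_prob = 1 - durability      # ~1e-11

objects_stored = 10_000_000            # AWS's example: 10 million objects
expected_losses_per_year = objects_stored * annual_loss_prob   # ~1e-4

years_per_lost_object = 1 / expected_losses_per_year
print(round(years_per_lost_object))    # ~10000 years per lost object
```

So "you won't lose anything in your lifetime" is, statistically, a fair reading of the durability guarantee.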
Jon: 00:32:08
This episode of SuperDataScience is brought to you by AWS Trainium and Inferentia, the ideal accelerators for generative AI. AWS Trainium and Inferentia chips are purpose-built by AWS to train and deploy large-scale models. Whether you are building with large language models or latent diffusion models, you no longer have to choose between optimizing performance or lowering costs. Learn more about how you can save up to 50% on training costs and up to 40% on inference costs with these high performance accelerators. We have all the links for getting started right away in the show notes. Awesome, now back to our show.
00:32:49
Yeah, it's a good case in point of what you can do with cloud storage — on-premises you can't possibly have that kind of reliability. I wasn't aware of that eleven nines of data durability. So 99.999999999 percent, where you wouldn't lose a bit of information in a billion years — that's wild to think about.
Kirill: 00:33:11
Yes, indeed. And like Hadelin said, awe is the feeling we've been getting throughout this learning. Every time we learn about a new service, we're like, wow, that's possible, that can be done on AWS. You can even control satellites if you really wanted to. Of course, most of us will never need that, but [crosstalk 00:33:29] that's crazy, right? Satellites, blockchain, whatever — it's very interesting how they're updating these technologies all the time. And speaking of a variety of services, let's talk a bit about databases. So we've talked about compute, we've talked about storage. Well, another important kind of storage for data scientists is databases. And we've got quite a few interesting points here. First of all, whatever flavor of database you like — whether it's Microsoft SQL Server, Oracle, PostgreSQL, MySQL, MariaDB — all those databases are available in Amazon Web Services through a service called RDS. Whenever you hear RDS, that stands for Relational Database Service. And because all of these databases I mentioned are relational databases, you can spin up any one of those. And of course there are more — I'm just saying that the ones you're used to working with are available in AWS. You don't have to learn something new if you don't want to.
Jon: 00:34:23
Actually, just really quickly — there's a verb you used there that I'm not sure we've clarified. You said it's very easy to "spin up" an instance. What does that phrase mean for our listeners?
Kirill: 00:34:38
What is an instance, what is spin up — all of those things. Yeah. So instances — before, we spoke about EC2 instances; here we're talking about databases. They also need underlying resources. When I say spin up an instance — I don't know if that's a term generally used in the industry, but that's how I use it — you're basically launching an instance of a database, and you're able to put things in there, you're able to store it. And whenever you don't need it, you just spin it down, spin it off, I don't know, turn it off, so you [crosstalk 00:35:10].
Jon: 00:35:12
I've never heard anyone say spin down, but yeah, definitely spin up. And I think it's something to do with this idea of like, I don't, I'm guessing-
Kirill: 00:35:21
The disc spinning, yeah?
Jon: 00:35:22
A disc actually spinning that you're like, when you press an on button, it's like, "woo".
Kirill: 00:35:25
Yeah, yeah.
Jon: 00:35:26
It's spinning. But yeah, I don't think anyone says spin down. You just a-
Kirill: 00:35:29
Okay. And that's a really cool transition — close down, yeah — a really cool transition to what I wanted to talk about next. There's also a really cool database on Amazon Web Services, which is Redshift, and this is a data warehouse. In order to understand the beauty of Redshift, we have to talk about something called OLTP versus OLAP data storage. OLTP stands for Online Transaction Processing; OLAP stands for Online Analytics Processing. All of those databases mentioned before — Microsoft SQL Server, Oracle, PostgreSQL, MySQL — are designed for OLTP. You can use them for analytics, sure. You can go in and calculate averages of your columns, you can find out the medians, the standard deviations, whatever, build your visualizations and so on, run machine learning models.
00:36:19
But that's not what they're designed for. And this comes down to the disk, right? So on disk — this is a very simplified explanation, and this is how I understand it — these databases store the data from a row together. You might have 15 columns in a row, and all of the values of that row are stored together, then the next row, then the next row, because that's how they're written into the database: you write them row by row, and that's how they're stored on disk. Whereas Redshift — or online analytics processing data storage, data warehouses such as Redshift — what they do is they change, they shift, hence the name, how data is stored on disk. Now, all of a sudden, it's not the elements of each row that are stored close together, but the elements of each column.
00:37:08
So all of the values in one column are going to be stored close together, then all the values of the next column are stored close together. And why is this important? Well, because when we're doing analytics, when we're doing any machine learning model or any visualization, we're interested in things like features, we're interested in averages of those features — which are columns — we're interested in performing operations on columns, not on rows. And because they're stored close together on disk, that means your analytics and your machine learning are much faster. That's what Redshift is all about. Basically, it's a data warehouse which you can easily move your data into and use for exactly that. So if you have terabytes and terabytes of data, that will really speed up your machine learning and other analytics.
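Kirill's row-versus-column contrast can be sketched in a few lines of Python. This is a toy illustration of the two layouts, not Redshift itself: the same records stored row-wise and column-wise, with an analytic query ("average price") that only needs one contiguous column in the columnar layout.

```python
# Toy contrast of row-oriented (OLTP) vs column-oriented (OLAP) storage.
# Same data, two layouts; the analytic query touches very different amounts
# of "disk" in each.

rows = [                                   # row store: each record kept together
    {"id": 1, "product": "book", "price": 12.0},
    {"id": 2, "product": "pen",  "price": 2.0},
    {"id": 3, "product": "lamp", "price": 40.0},
]

columns = {                                # column store: each column kept together
    "id": [1, 2, 3],
    "product": ["book", "pen", "lamp"],
    "price": [12.0, 2.0, 40.0],
}

# Row store: the query must walk every full record and pick out one field.
avg_row = sum(r["price"] for r in rows) / len(rows)

# Column store: the query reads one already-contiguous list and nothing else.
prices = columns["price"]
avg_col = sum(prices) / len(prices)

print(avg_row, avg_col)                    # 18.0 18.0 -- same answer either way
```

Both layouts give the same answer, but on real hardware the columnar read is one contiguous scan of just the bytes you need, which is the whole point of a warehouse like Redshift.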
00:37:51
So that's Redshift. Another really cool database is ElastiCache, which is an in-memory type of database. Really fast, non-durable — so you can't really store things there forever. But it's not designed for that; it's more designed for real-time analytics, or it might be some gaming analytics, some really fast things that you need to do and don't really need to store. Another cool database is DynamoDB — a completely different type of database. It's a non-relational database. It's schemaless, it's serverless — we're going to go into detail about all those things. But basically, if you're looking for a NoSQL type of database where you can have your own schema for every row, then DynamoDB is your go-to. And it basically scales automatically as you put more data into it.
00:38:41
A very powerful type of database, and that's for our listeners who might be interested in NoSQL. And of course, you can store JSON documents and other types of data. If you're specifically interested in something for storing JSON documents in the NoSQL space — normally we would use MongoDB outside of AWS. In AWS you have DocumentDB, with MongoDB compatibility, which is their version of MongoDB. Really powerful. So if you're used to using MongoDB and JSON document storage, then DocumentDB is your go-to in AWS. And those are just some of the databases — there are many more database services and related services we could be talking about.
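The "schemaless" idea behind DynamoDB is easy to show with a toy in-memory stand-in. This is not the real boto3 API — the function names `put_item` and `get_item` mirror DynamoDB's operation names, but the table here is just a Python dict — the point is that items share only a primary key, and everything else can differ item by item.

```python
# Toy in-memory sketch of a schemaless, key-value table in the style of
# DynamoDB. Not the real AWS API -- just the core idea.

table = {}                                 # primary key -> item

def put_item(item):
    """Store an item under its partition key ('user_id' here)."""
    table[item["user_id"]] = item

def get_item(user_id):
    """Fetch one item by key."""
    return table[user_id]

# Two items, two completely different sets of attributes -- no shared schema.
put_item({"user_id": "u1", "name": "Ada", "scores": [97, 88]})
put_item({"user_id": "u2", "email": "x@example.com", "premium": True})

print(get_item("u1")["scores"])            # [97, 88]
print("name" in get_item("u2"))            # False -- that attribute only exists on u1
```

In a relational table, both rows would have to share one column set; here, each item carries whatever attributes it likes, which is what "create your own schema per row" means.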
00:39:21
But one last one I want to mention is AWS Glue, and that's an ETL service — extract, transform, load. You have all these sources, whether it's the databases we mentioned, or your S3 data — like you have CSV files in S3 or something like that — and you want to combine all of that, put it together, do an ETL process. Then AWS Glue is your service to do that. So as you can see, even for the usual tools we use in data science and machine learning, they have everything covered — everything and much, much more; the possibilities are endless. So yeah, once you transition to AWS, you can still use the skills you already know. They support that, and you can learn additional ones, but basically your skills are very portable into AWS services.
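The extract-transform-load pattern that a service like AWS Glue runs at scale looks like this in miniature. Everything here is a stand-in: the "S3 file" is an in-memory CSV string, the "RDS rows" are a list of dicts, and the "warehouse" is a plain list — only the E/T/L shape is the real idea.

```python
# Minimal illustration of the extract-transform-load (ETL) pattern.
# The sources and target are in-memory stand-ins, not real AWS services.

import csv
import io

s3_csv = "id,amount\n1,10.5\n2,20.0"       # pretend: a CSV file sitting in S3
db_rows = [{"id": "3", "amount": "5.25"}]  # pretend: rows pulled from an RDS table

def extract(csv_text):
    """Parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(records):
    """Normalize string fields into proper numeric types."""
    return [{"id": int(r["id"]), "amount": float(r["amount"])} for r in records]

def load(records, target):
    """Write the cleaned records into the 'warehouse'."""
    target.extend(records)

warehouse = []
load(transform(extract(s3_csv) + db_rows), warehouse)
print(warehouse)
# [{'id': 1, 'amount': 10.5}, {'id': 2, 'amount': 20.0}, {'id': 3, 'amount': 5.25}]
```

Glue's job is exactly this combining of heterogeneous sources into one consistent target, just with crawlers, catalogs, and distributed compute doing the heavy lifting.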
Jon: 00:40:10
Nice. Really cool. That was a really detailed introduction to the different kinds of database options there are in AWS. I learned a couple of new things in there — like I didn't know, because for years now we've been using GCP and Mongo, that there was a separate DocumentDB that's compatible with Mongo. That's cool. And I didn't know about AWS Glue. That sounds super useful for people who are engineering data pipelines. For those ETL — extract, transform, load — pipelines, AWS Glue sounds like a great tool for stringing all those different operations together.
Kirill: 00:40:49
Absolutely, absolutely. There are lots more services we could talk about, but we don't want to overload our listeners.
Jon: 00:40:56
Yeah, no, that's a great start. So once we've got our data into whatever format we want, like a structured database or a NoSQL database when, and then we want to actually do some analytics, or we want to do some machine learning, how can we do that in AWS?
Hadelin: 00:41:11
Okay. So you have different services for machine learning in AWS. I'll talk about maybe the most popular one, which is SageMaker — that's typically the AWS service to build machine learning models. And you have two ways of doing so inside SageMaker. The first way is with the SDK — I'll explain what that is in a second — and the second way is with AutoML. So let's start with the SDK. SDK stands for Software Development Kit, and it basically allows you to build a machine learning model classically in a Jupyter Notebook, while at the same time being able to call on AWS resources such as EC2, if you want to get a compute instance, or storage like S3, if you want to store your datasets in the storage service and then load the dataset for your machine learning model.
00:42:03
So you have your Jupyter Notebook, you can call the SDK to get the AWS resources, while at the same time using the classic deep learning or machine learning libraries like TensorFlow, and you combine everything and build and train your machine learning model. So that's the first way. Now the second way, which is also amazing, is with AutoML. AutoML allows you to build not only a machine learning model but a whole machine learning pipeline without typing a single line of code. It's basically plug-and-play: with just a few clicks you build this whole pipeline. It goes through several steps. First, you input the data — you store your data in S3, and then you load the data in the first step of this AutoML pipeline.
00:42:53
Then AutoML will automatically recognize what type of problem it is, like regression or classification. You just have to specify the target, and then AutoML will automatically recognize the features, the input variables. Then in the next step — so you just have to answer a few questions about your dataset — generally AutoML will detect if there are any input variables to pre-process, like categorical variables. Then in the next step, that's where you're going to choose your training process, and you have three options. The first one is auto. That's the option that will test an ensemble of models that AutoML detects as the best potential candidates for your dataset. It identifies the fit between your dataset and a potential good selection of models, and will test those models together as an ensemble.
00:43:54
So that's the first option. The second option is basically to train and test everything. Here you will have everything in the pipeline — all the TensorFlow models, all the gradient boosting models, deep learning models, neural-network-based models and others. It'll test all of them, and at the end it'll return the combination of models with the combination of hyper-parameters that leads to the best score. So that's probably the option that will give you the best score, but at the same time it's the most compute-intensive, and it's the one that's going to take a lot of time. I actually waited a few tens of minutes with this option.
00:44:36
And the third option is focused on hyper-parameter tuning. It won't test all the models that are in SageMaker, but it'll test, again, a good fit with your dataset while doing a lot of hyper-parameter tuning to make sure to find the best hyper-parameters. And among these three options, the one that I recommend is auto, because it'll generally be the one that leads to the best compromise between having the best score and not taking too much time. And at the end, yes, without having written a single line of code, you get your model — the whole code of the model — and you get the training score: accuracy for classification, mean squared error for regression. And then, yes, you get amazing results.
00:45:21
I tested it several times, and as I said before, you get the best scores with SageMaker. And then finally you can deploy this model, because you get it as an endpoint, so you can use it to make new predictions on new observations. And that's really amazing, because it's also a way to democratize machine learning and data science: now people without coding skills can train machine learning models super easily. They just need to understand how data should be pre-processed and how the machine learning pipeline works, and then they're ready to do machine learning. So that's pretty amazing.
Jon: 00:46:04
Yeah. So I've, I have two questions for you. One is probably a short one, and then one is maybe a longer one. So the first one, the short one is you mentioned the idea of being able to transform your model into an endpoint. What's an endpoint?
Hadelin: 00:46:18
So basically in SageMaker — well, in AWS in general — you have endpoints, which are basically links to your model, or to your instance, or to anything. Any resource in AWS has an endpoint which you can access, and which allows you to connect it to other resources. And here, this is for making predictions. So you have your model, which you can access through the endpoint, and then you can deploy it to make predictions.
Jon: 00:46:51
Nice. Yeah, I think like the, maybe the kind of general idea there is, it's like an API endpoint.
Hadelin: 00:46:55
Yes.
Jon: 00:46:57
So yeah, it's like this idea of microservices architecture, where you have all these discrete services, each programmed in its own way, and then you have an endpoint for accessing whatever that service is. So it could be a model, it could be a data resource.
Hadelin: 00:47:14
Exactly.
Jon: 00:47:15
It could be any part of some production application. And so yeah, very easy to work with, and it's like the standard way of building applications today.
Hadelin: 00:47:24
That's right.
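Jon's "it's like an API endpoint" framing can be made concrete with a toy, purely local stand-in: a tiny HTTP service that accepts a feature payload and returns a prediction, queried the way you would call any API endpoint. This is not the SageMaker API — a real deployed endpoint is invoked through AWS's runtime (for example, boto3's `invoke_endpoint`) rather than a local server — but the request/response pattern is the same idea.

```python
# Toy local stand-in for a model endpoint: a tiny HTTP service that takes
# a feature value and returns a prediction. Not the real SageMaker API --
# just the request/response pattern an endpoint exposes.

import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # "Model": predict positive when the feature exceeds a threshold.
        prediction = {"label": body["x"] >= 0.5}
        payload = json.dumps(prediction).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):          # keep the demo quiet
        pass

# Port 0 asks the OS for any free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Invoke the endpoint": POST a feature payload, read back a prediction.
req = Request(f"http://127.0.0.1:{server.server_port}",
              data=json.dumps({"x": 0.9}).encode(),
              headers={"Content-Type": "application/json"})
response = json.loads(urlopen(req).read())
server.shutdown()
print(response)                            # {'label': True}
```

Swap the local URL for a deployed endpoint's address and the calling code barely changes — which is what makes endpoints the glue of a microservices architecture.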
Jon: 00:47:24
And then the other question that I have for you it's funny, I thought that was going to be the short one. And so, but maybe this one will end up being the short one. I don't know. I think that this is kind of a more open-ended question. So you described how with SageMaker there, we can do all these incredible things. It can select our features for us automatically from all the data. It can pre-process those features. It can select from myriad different models and tune the hyper-parameters on all of those different models, and then ensemble them together into a super amazing model, and then provide us with all of the summary statistics and allow us to quickly create an endpoint on this super amazing model. So why isn't everyone doing that all the time? Are there situations where we're better off having more control? Or why, like, why does anybody learn how to make ensembles themselves or tune hyper-parameters themselves if they can be doing it in SageMaker?
Hadelin: 00:48:20
Yes, that's a very good question. I would say it's because people probably think it's quite complex to use SageMaker and AWS. There hasn't been that much training to democratize this. But indeed, as I said before, now I practically only use SageMaker, because it's with SageMaker that I get these best models — even by going with the easy option of selecting AutoML and just putting the whole pack of models in an ensemble with some hyper-parameter tuning. But seriously, I would say the reason not to use SageMaker is if you do want more control. That's maybe in a research context, where you have to explore more avenues on how you want to combine your models with the data to reach your objectives. But if you want to do a classic data science project with some classic machine learning predictions, yes, I would definitely recommend going for SageMaker and AutoML. I don't think more control is necessary to get the amazing score that you need.
Jon: 00:49:42
Cool. Great answer. Other than SageMaker Hadelin, any other ML tools that we should know about in AWS?
Hadelin: 00:49:50
Absolutely. So there are a lot of ML tools. I'll tell you about one of my favorites, which is DeepRacer, because I have a particular preference for reinforcement learning. So yeah, you have AWS DeepRacer to test your reinforcement learning models for car racing. So that's pretty cool. You have also-
Jon: 00:50:10
DeepRacer?
Hadelin: 00:50:10
Yeah, DeepRacer, yeah.
Jon: 00:50:14
Cool.
Hadelin: 00:50:15
Yeah. You have Augmented AI, which allows you to implement a human review of machine learning predictions. You have Amazon Forecast to make forecasts, for example on time series. You have Amazon Translate to do machine translation. You have Amazon Comprehend to do NLP. You have Amazon Rekognition to do object detection and object recognition. So you see, you have many of them — you have basically an AWS service for every branch of machine learning, whether it's computer vision, NLP or machine translation. And you have Amazon Transcribe for speech recognition, and you have Amazon Polly, which turns text into speech, yes.
Jon: 00:51:04
Wow. That is a lot of services.
Hadelin: 00:51:07
Yes. Yes.
Jon: 00:51:09
And most of those I didn't know about. All right. So with all that, with everything that you've covered around EC2 instances and S3 buckets and databases and machine learning in the cloud you've passed my test. You've passed my quiz.
Hadelin: 00:51:22
Oh, I'm really [inaudible 00:51:23].
Jon: 00:51:23
Congratulations.
Kirill: 00:51:24
Awesome.
Jon: 00:51:25
So I'm now happy to officially recommend your course to our listeners. So in lieu of a book recommendation, how about you guys fill us in on your upcoming course, the launch, and any other details that our listeners need to know so that they can learn about the Cloud for Data Science from you two.
Kirill: 00:51:45
Thanks, Jon. It's been an honor to come on the show. Again, thank you for tolerating us and inviting us. As I mentioned at the beginning, we're not here to make any kind of sales pitch — our goal was to share some information with the SuperDataScience podcast audience so that they get some general knowledge about the cloud, and hopefully we were able to accomplish that objective. And speaking of the project, as mentioned before, it's CloudWolf.com — because clouds are cool and wolves are cool. Anyway, both are cool. It'll be a membership, and there'll be lots of really cool stuff. We're building it up.
00:52:31
We're going to be adding more and more things. This is this, like when the podcast is live, this is literally the first week that it is available. So we have some very special things for the early wolves, not early birds, but early wolves. Most importantly, that you can lock in a very attractive price that is likely never going to be available ever again, because you will be part of the very first group of people. And of course there'll be some things that, you know, we'll need to fix in the platform, some things you'll need to be patient with us with, and we'll be adding more content. But that means, you know, you can lock in this great price for your membership.
00:53:08
Basically, we're starting off — and eventually, with time, we're going to be adding lots of labs on the cloud. We'll have a community around the cloud, of people who want to get AWS certified and want to grow together and learn these things together. We'll have study guides, exam samples and so on. To start with, we're launching with some exam samples — we already have three practice exams for the AWS certification. Don't forget, this is focused on learning AWS and getting certified in AWS. And we'll have our first course, on the AWS Certified Cloud Practitioner exam, which is about 14 hours long. It's quite a big course — there are lots of services — but as mentioned before, we expect that at about two hours per day, it can be done in 21 days. It covers lots of things. And what we're proud of is that we really focus on our educational style, because with so many services, we couldn't afford to just talk and talk and talk in our videos.
00:54:09
So we made our videos very concise. Our average video length is about three and a half minutes per service, or per part of a service. So it's very sharp, to the point — kind of like what you experienced in the podcast today: straight to the point about what it is, what you need to know, how to use it. As usual, I do the theory, Hadelin does the practical. So that's our main offering. But in addition to that, because we come from data science and have this data science legacy — we love data science and machine learning — we couldn't help ourselves but include a special bonus mini-course, which Hadelin will tell us about just now.
Hadelin: 00:54:46
Yes, that special bonus mini-course is Data Science in the Cloud, where we teach you how to use, and mostly leverage, AWS resources to build and train your machine learning models and eventually get to that best score we were talking about during this podcast episode. So we use SageMaker, of course — particularly AutoML. I'll teach you how to very easily build that whole machine learning pipeline: first with your dataset that you load, then clicking a few options to pre-process your dataset, then choosing among the three options for how you want to train your model — with the ensemble option or the hyper-parameter tuning option — and then I show you how to get the final trained model, the best model, with the scores and the different features.
00:55:38
And then I show you how to deploy your model. And also, by doing this course, you will already understand a lot about AWS resources, because you'll see that we actually use a couple of AWS services to work with SageMaker — this includes S3, but also IAM to manage your permissions within SageMaker, and EC2. So you'll see that even though it's a course for data science, it's already a good introduction to AWS services, because we use the main ones. And you'll also see that I show you a couple more services, like the ones we spoke about — for example, Amazon Rekognition. I'll show you where they are and how to use them. Not all of them, but the coolest ones.
Jon: 00:56:32
Nice. Sounds really cool. So, you guys talked a fair bit there about the AWS certification. Why does that matter? Why should somebody care to get a special certificate as opposed to just learning the relevant skills?
Kirill: 00:56:46
Well, the good thing about cloud, we found, as different from data science, is that — because it comes from this on-premises background, where servers and things have existed for decades and decades — it's very established in terms of the different career pathways, in terms of the different skills that people need to know. In data science, for now, we don't have a generally worldwide-recognized certification source, but I'm sure listeners would agree that if there was one, it'd be a good idea. If you already know data science, you already know machine learning, it would be a good idea to get certified and put that on your resume, because it's just like a stamp of approval. Same thing here. You're going to be learning the cloud — if you make this choice, if you choose to join us on this journey and learn the cloud and get upskilled in this area and add that to your toolkit — so why not get certified as well, and put that on your resume?
00:57:38
Imagine an employer says AWS skills are required, because they're planning on using the cloud or are already using it. Who are they going to go for: somebody who says they know the cloud, or somebody who has a certified badge from AWS saying that they do? Of course, the latter option is much more preferable. And so, for just a little bit of extra effort, why not get the certification? And they're valid for three years. So you get it now, you keep it for three years, and that's going to be very useful for your career as well.
Jon: 00:58:10
Nice. That's crystal clear. That makes a lot of sense to me. And so just a bit of clarification here. So between the two of you, you've had many millions of students on Udemy. Is this course going to be on Udemy too?
Hadelin: 00:58:23
Yeah. Thank you Jon. So first we don't have many millions of students. We have two, but that's the start, I guess.
Jon: 00:58:29
Two students?
Hadelin: 00:58:31
2 million, 2 million students. But no, we're not going to be on Udemy, and that's for a simple reason: we really want to build this new community, this big community, and inside this community we want to have more flexibility. So while Udemy offers a lot of great features to communicate with students, we'll have a lot more flexibility if we host these new courses and build this community on our own platform. It'll be much more interactive; there will be a better sense of community building with the wolf pack, on CloudWolf. So there's a lot more we can do by building this cloud educational platform with our own community and our own website.
Jon: 00:59:22
Nice, crystal clear. Well, I wish you guys the best of luck. In fact, I feel like I don't even need to do that — I know this is going to be tremendously successful, like everything else that you two have ever touched. So I'm really excited to see this course launch and have people using it. I know the feedback's going to be great. Thank you so much for coming on the show, both of you, Kirill and Hadelin, and giving us a kind of 101 episode for data science in the cloud. Thank you so much, and I'm sure it won't be long before we have you on again, giving us an invaluable introduction to another critical data science concept.
Hadelin: 01:00:01
We hope so.
Kirill: 01:00:01
Thanks a lot Jon.
Hadelin: 01:00:03
Thanks a lot.
Jon: 01:00:10
Always great to have Kirill and Hadelin on the show, as they're lots of fun and certainly do all their homework, coming to their episodes terrifically well prepared. In today's episode, Kirill and Hadelin filled us in on how cloud platforms enable us to rapidly scale compute infrastructure up and down as needed, minimizing our costs in many common circumstances. They talked about how, as data sets and machine learning models get bigger and bigger, data scientists will benefit from being proficient at using cloud platforms themselves, and how AWS, Azure, and GCP are all solid options for most data science use cases. And given that AWS is the most popular of these services, they highlighted particular aspects of it, such as EC2 instances for compute, S3 buckets for storage, and Redshift as an OLAP (online analytical processing) database designed for efficient data analysis operations. And Hadelin talked a fair bit about SageMaker, which is a really powerful tool for automated machine learning.
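To make the cost point in that recap concrete, here is a back-of-the-envelope sketch comparing an always-on server to elastic, pay-per-hour cloud compute for a bursty workload. The hourly rates and hours here are hypothetical placeholders for illustration, not actual AWS prices.

```python
# Back-of-the-envelope comparison of a fixed, always-on server versus
# elastic cloud compute that you spin up only when needed.
# All prices below are hypothetical, not real AWS rates.

HOURS_PER_MONTH = 730  # average hours in a month

def fixed_server_cost(hourly_rate):
    """Cost of hardware you pay for 24/7, whether or not it is busy."""
    return hourly_rate * HOURS_PER_MONTH

def elastic_cost(hourly_rate, busy_hours):
    """Cost when instances run only for the hours actually used."""
    return hourly_rate * busy_hours

# Example: heavy compute is only needed for 80 hours a month.
# Even at a higher per-hour rate, the elastic option is far cheaper.
fixed = fixed_server_cost(hourly_rate=1.00)
elastic = elastic_cost(hourly_rate=1.20, busy_hours=80)

print(f"fixed: ${fixed:.2f}/month, elastic: ${elastic:.2f}/month")
```

The crossover works the other way too: if the workload ran near 24/7, the always-on option would win, which is why the episode's framing is "minimizing costs in many common circumstances" rather than always.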
01:01:07
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Kirill and Hadelin's social media profiles, as well as my own, at superdatascience.com/671. That's superdatascience.com/671. I encourage you to let me know your thoughts on this episode directly by tagging me in public posts or comments on LinkedIn, Twitter, or YouTube. Your feedback is invaluable for helping us shape future episodes of the show. And if you'd like to engage with me in person, as opposed to just through social media, I'd love to meet you in real life at the Open Data Science Conference East, ODSC East, which will be coming up soon in Boston, from May 9th to May 11th. I'll be doing two half-day tutorials. The first tutorial will introduce deep learning with hands-on demos in PyTorch and TensorFlow, while the other half-day tutorial, which is brand new, will be on fine-tuning, deploying, and commercializing large language models, including GPT-4. In addition to these formal events, I'll also just be hanging around, grabbing beers, and chatting with folks. It'd be so fun to see you there.
01:02:13
All right, thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you. And thanks, of course, to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the SuperDataScience team for producing another super useful episode for us today. For enabling this super team to create this free podcast for you, we are deeply grateful to our sponsors, whom I've hand-selected as partners because I expect their products to be genuinely of interest to you. Please consider supporting this free show by checking out our sponsors' links, which you can find in the show notes. And if you yourself are interested in sponsoring an episode of the show, you can get the details on how by making your way to jonkrohn.com/podcast. And of course, thanks to you for listening. It's because you listen that I'm here. Until next time, my friend, keep on rocking it out there, and I'm looking forward to enjoying another round of the SuperDataScience Podcast with you very soon.