In this episode of “In Case You Missed It”, Jon Krohn recaps his interviews from April. The conversations range from Chief Scientist at Posit PBC Hadley Wickham (episode 779) on the subtle differences between Python and R, to Professor of Business Analytics Barrett Thomas (episode 773) on the variables that companies should consider when using drones or any other tech to improve their business operations and bottom line.
In this “In Case You Missed It”, you’ll also hear Aleksa Gordić, Founder of Runa AI (episode 775), on why an overhaul of the current educational system is long overdue, from primary school all the way to university level. Aleksa feels there is plenty of scope for education to become more tailored and dynamic for the student, and he notes that the industry’s shift in interest away from degrees and toward a concrete portfolio of projects should encourage emerging data scientists to focus on building and creating. Aleksa also gives his advice on how to stay motivated when pursuing self-directed learning alongside your work.
In a clip from episode 777, Bernard Marr discusses the future of GenAI and its impact on the world of work. He outlines 20 skills that workers of the future will need, three of which are technical skills. Beyond data literacy, Bernard believes the other 17 skills should focus on uniquely human capabilities like interpersonal communication and creative problem-solving.
Our fourth episode takes Jon back to SuperDataScience founder Kirill Eremenko’s lively workshop on gradient boosting (episode 771). Kirill’s workshops are loved for their attention to how data science and tech can solve a huge range of vital, real-world problems across tech, medicine, retail, and more. In this workshop, Kirill shows how gradient boosting helps those who use it zero in on the best possible opportunity for improvement.
ITEMS MENTIONED IN THIS PODCAST:
- SDS 771: Gradient Boosting: XGBoost, LightGBM and CatBoost, with Kirill Eremenko
- SDS 773: Deep Reinforcement Learning for Maximizing Profits, with Prof. Barrett Thomas
- SDS 775: What will humans do when machines are vastly more intelligent? With Aleksa Gordić
- SDS 777: Generative AI in Practice, with Bernard Marr
- SDS 779: The Tidyverse of Essential R Libraries and their Python Analogues, with Dr. Hadley Wickham
DID YOU ENJOY THE PODCAST?
- Python or R: Which do you prefer? Bonus points if you can avoid an internet argument in the process!
- Download The Transcript
Podcast Transcript
Jon Krohn: 00:00:02
This is Episode #782, our “In Case You Missed It in April” episode.
00:00:19
Welcome to the Super Data Science Podcast, I’m your host, Jon Krohn. This is an “In Case You Missed It” episode that highlights the best parts of conversations we had on the show over the last month. In episode #779, I speak to Dr. Hadley Wickham about putting the ‘R vs Python’ argument to bed. Here, the many-time bestselling author and world-renowned open-source developer highlights the relative similarities of the two most popular open-source programming languages for data science, as well as the key differences between them.
00:00:50
For our listeners out there who don’t already use R, why should they be using it? I can actually give one example myself: for data visualizations, I still find I can do things way more quickly, have much more fun making visualizations in R, and get exactly what I want. There have been attempts in the past to create a ggplot-style Python library, but the one I had been using became deprecated and harder and harder to use, and it never had all the functionality of your ggplot2 anyway. So that’s my big example. I don’t know if you have big examples of why people might want to use R still today.
Hadley Wickham: 00:01:35
Yeah, on the topic of ggplot2 specifically, I think the best Python equivalent is plotnine, which is actually by a developer, Hassan Kibirige, that we’ve been sponsoring at RStudio, at Posit, I think. And I think that’s the best possible realization of ggplot you can get in Python. But I think there are things about the design of the R language that just make certain tasks much easier and more natural to express in R code than you’ll ever be able to do in Python. And I think that comes down to, at the heart of it, R being more of a special-purpose programming language. It’s designed from the ground up to support statistics and data science, and I think that has a lot of benefits, particularly if you’ve never programmed before. You can get up and running in R, using R to do data science, without learning a ton of programming, and get going pretty quickly.
00:02:39
And then there are just things about the language where people from other languages look at R and they’re like, “Oh my god, that’s a terrible idea, that makes me want to throw up in my mouth.” But there are just so many things that are so well-placed to support interactive data science, where you really want that fast and fluid cycle where you’re trying things out. That obviously lends itself to maybe a little bit of weakness on the side of, now I’ve got this thing, I just want to do the same thing again and again and again. R tends to be a little bit magical. It tries to guess a little bit more of what you want, and that’s great when you’re working interactively and it guesses correctly. It’s not so great when you’re working on a server somewhere else and it guesses the wrong thing. But everything about R, I think, makes it such a fluid environment for really exploring your data, digging into it, figuring out what’s going on.
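Since plotnine comes up here as the closest Python counterpart to ggplot2, here is a minimal sketch of what that grammar-of-graphics style looks like in Python. The dataset and column names below are invented purely for illustration.

```python
import pandas as pd
from plotnine import ggplot, aes, geom_point, labs

# Hypothetical data, purely for illustration
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "exam_score": [52, 55, 61, 64, 70, 74, 79, 83],
})

# plotnine mirrors ggplot2's grammar of graphics: map columns to aesthetics,
# then add layers with `+`
plot = (
    ggplot(df, aes(x="hours_studied", y="exam_score"))
    + geom_point()
    + labs(x="Hours studied", y="Exam score")
)
plot.save("scores.png")  # or simply evaluate `plot` in a notebook to render it
```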
Jon Krohn: 00:03:38
Speaking of differences between R and Python, I seem to remember, and you can correct me if I’m wrong about this, but I feel like you have a famous tweet from years ago. It must’ve been a famous poster themselves that you responded to, I can’t remember who, it might’ve been Wes McKinney or somebody like that, saying that one of the advantages of Python is that it’s faster than R, and then you have this super famous reply of, “What is that? And I will make it faster.” Do you know what I’m talking about?
Hadley Wickham: 00:04:15
I don’t, but I know I’ve seen things like that in the past.
Jon Krohn: 00:04:19
Yeah, it’s a misperception because Python isn’t actually that fast itself. I mean whole languages like Julia have come up to be faster than Python.
Hadley Wickham: 00:04:32
Yeah, I think one of the reasons is that often you have the worst arguments with your family and not with strangers. With people who are so similar to you, you tend to have more friction than with people who are really different. I think because R and Python are actually really close together in the spectrum of programming languages, it’s so easy to see all of the little things that look weird to you, as opposed to looking at some programming language that’s miles away and just looks totally different. I don’t know, I think there’s something to that: because we’re close, you can see these little differences. And certainly when I see things in Python that people are like, “Wow, that’s really cool,” I’m like, “Challenge accepted. I’ll make that better in R.”
Jon Krohn: 00:05:20
Nice. One of my favorite things that you can do really well, thanks to the dplyr library that you led development of, is piping. You can extremely easily pass the output of one function along as the input to the next function, just like Unix pipes, if people are familiar with those. Prior to me discovering dplyr, which was probably around 2010, I would have so many variables in my workspace. It was just such a pain to keep them all straight, and you end up in these weird situations: should I be investing time thinking about the name of this intermediate variable? Am I going to use this later, or should I just name it something like intermediate variable 15 and have really ugly code?
00:06:18
And so piping gets rid of all that, and you can read the flow like a sentence. You’re like, okay, this pre-processing step happens, then this next one, and you can just see it so easily. It makes it so elegant to read. Do you think we’ll get to a point where Python is like that? I have used some kinds of piping attempts in Python, but my experience, and I guess it’s been a few years since I’ve tried, is that it’s never been as smooth or as easy as with R. And maybe that’s related to what you were talking about earlier with data visualization.
Hadley Wickham: 00:06:50
Yeah, the native equivalent of piping in Python is method chaining. If you’re using Pandas, you do something dot something.
Jon Krohn: 00:07:03
Yeah, Pandas-
Hadley Wickham: 00:07:05
Dot something. But the big difference between method chaining and the pipe is that in method chaining, all of those methods have to come from the same class. They have to live in the same library, the same package, whereas with piping, they can come from any package. And I think the thing that’s really interesting about that is it has meant Python has tended to have these fewer, bigger packages like Pandas and Scikit-learn and Matplotlib; in order to work with method chaining, everything kind of has to be glommed into this one giant package.
00:07:44
Whereas with R, because you can combine things from different packages, the equivalent of Pandas is kind of like dplyr and tidyr and readr and a bunch of other things. It’s way easier to add extensions to ggplot2 than to Matplotlib that work exactly the same way, because you can just combine the different pieces. So I think that’s just one of these interesting, subtle differences in language design that lead to fairly big impacts on the user experience, and almost even on how the community has to work together and form.
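As a concrete illustration of the method-chaining pattern Hadley describes, here is a minimal pandas sketch; the data and column names are invented. Every step in the chain (.query, .assign, .groupby, .agg) is a method defined on pandas’ own DataFrame class, which is exactly why that functionality has to live inside one large package, whereas an R pipe can string together functions from dplyr, tidyr, and any other package.

```python
import pandas as pd

# Hypothetical customer data, purely for illustration
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "spend": [10.0, 25.0, 40.0, 5.0],
})

# Method chaining: every step below is a method on the DataFrame class itself
summary = (
    df
    .query("spend > 5")                               # filter rows
    .assign(spend_usd=lambda d: d["spend"].round(2))  # add a derived column
    .groupby("region", as_index=False)
    .agg(total_spend=("spend_usd", "sum"))            # aggregate per group
)
print(summary)
```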
Jon Krohn: 00:08:19
From the impact of small differences, we turn to modernizing the education system with Aleksa Gordić. In episode 775, the creator of a digital AI learning community of 160,000 people talks about the recent movement from formal education toward self-directed learning online.
00:08:37
So how do you think the tech industry’s perception of formal education is changing? We’ve talked a lot in this episode about self-directed learning, including just now, but with the tools that we have today and all of the content that there is online for careers like AI, machine learning, data science, and software engineering, do you think formal education should be changing because self-directed learning can play such a big role?
Aleksa Gordić: 00:09:09
A hundred percent, man. Don’t even get me started on education, we could have a whole podcast episode only on this topic. I mean, I definitely think that most of the education systems across the world are still stuck in the late 19th century, that industrial revolution type of context where you’re just sitting down with a lot of people who have completely different interests from you, and you’re connected only by geography because you happen to live in the same place. So starting from there, there are so many things that need to be changed about education, from elementary school all the way to the best PhD programs in the world at MIT or Stanford; even those are not really optimal.
00:09:49
And so there is definitely a shift of sentiment happening across the industry, especially now given the latest AI boom, and I see far fewer people encouraged and motivated to go and pursue the PhD path as opposed to just building, because they realize, hey, you can do so much without any PhD or master’s or even a bachelor’s. You can learn so many of these things on your own. And that being encouraged by the likes of Elon Musk gives it a lot of gravity, right? Because some of those highly successful entrepreneurs are saying, “Hey, when we hire, I don’t actually care that much whether you’re from Stanford, man. Show me what you’ve built. What makes you stand out? There are 2,000 people coming out of Stanford every year doing CS.” I don’t know the number, I’m just throwing out a number.
00:10:43
And I don’t know, I’ve noticed on my own that I’ve definitely achieved probably much more than a median Stanford person would, and I know for sure, because they’ve told me, that at MIT and Stanford they used to have classes where they would watch some of the videos I mentioned to you, the ones that are really in-depth. So from that standpoint, I became a teacher for some of them. I’m not saying this to sound arrogant, but I feel proud about that, and it also tells me, “Hey, if they’re learning from me, that means I’ve done it myself. I didn’t need to go to Stanford or MIT to achieve the same or greater levels.”
00:11:18
So it’s possible, but again, self-awareness matters a lot here. You have to know whether you are the type of person who can be that self-directed and prosper in that multitude of choices, as you said before. And that’s not an easy thing for everyone. Because when you have a strict curriculum and you’re going to Stanford, you know that every day you’re getting closer to the credential of being a Stanford alum. That’s easier than saying, yeah, I’m doing this, and at the end there is no credential unless you build some public artifacts. You have to be much more self-confident and self-directed, and build your own curriculum and execute on it. So it requires a different type of mentality.
Jon Krohn: 00:12:00
Nice. Yeah, we’ll include some of those Medium blogs in the show notes so that people can check those out. Blending some things that we’ve been talking about already in this episode: you’ve been talking about learning most recently, but are there ways that you envision AI transforming education, perhaps making learning more personalized and accessible to people with different backgrounds and interests? We talked about this in the language context already, specifically where somebody who only speaks Serbian could be learning entirely from English documents in the very near future. That is not science fiction, that is science today. But yeah, are there other ways that you could envision AI transforming education?
Aleksa Gordić: 00:12:42
I mean, 100%. AI tutors are the future, and that’s the only way we can scale this up, because I complained previously in the episode that we still have this industrial revolution legacy of putting 30 people in the same classroom, which doesn’t scale because you cannot personalize, or have that one professor or teacher give the same level of attention, customized to the learning style of every individual pupil in that class. It’s impossible to scale that up.
00:13:11
And also, due to incentives, you basically don’t have the best teachers in the world, right? Because if somebody is that good, they’ll probably not go and teach at an elementary school, they’ll go to MIT. And so there is also that incentive issue there that prevents us from having the best possible education. So the only thing that can scale is algorithms; software can scale. And so AI tutors are definitely the future, and I already see myself using ChatGPT and coding assistants on a daily basis. I think those were the two most important AI products for me personally: code assistants, so Copilot, which I can use for free by the way, as an open source contributor, that’s a very nice gesture from OpenAI. And then secondly, I just use the chat assistant, mostly ChatGPT. I mean, 100% of the time actually I use ChatGPT. It just works.
Jon Krohn: 00:14:05
I love them. They’re so powerful. They’ve transformed how I do everything. And it’d be crazy if you’re listening to the show and you aren’t paying the $20 subscription for access to things like GPT-4 or Claude 3 or Gemini Ultra. It’s amazing how much more quickly I can learn topics with these algorithms, and especially how much more quickly I can write code. I think that’s where it’s most useful, because I used to spend so much time getting stuck on small issues, and it’s not a learning experience when you’re getting stuck on these trivialities of code semantics and having to spend time digging through Stack Overflow. I mean, I guess before the internet and Google searches, it would’ve been even more laborious, having to go through textbooks to figure out how to solve some problem in your code. And now you can focus at a much higher level on the problems that you’re solving as opposed to getting stuck on the syntax, which is so nice.
Aleksa Gordić: 00:15:07
100%. Everything that’s repetitive, you as a human should just say, “Okay, go execute this for me N times.” You don’t want to be the for loop. The history of civilization is us going from being calculators and dumb machines to being more and more free to do high-level cognitive work, right? Because you previously literally had people who were computers. In Ancient Greece, you had people who were acting as memory sticks because they were learning and memorizing every single transaction. And that’s why you had so many memory techniques being developed back then, like the Roman memory room, the memory palace, or whatnot. All of that happened back in Rome and Ancient Greece, and probably before that, because people had to memorize and had to compute. So all of these methods were developed, and all of a sudden we need less and less of that.
00:15:58
And now, finally, you’re getting freed up to do just creative stuff, hopefully. We’ll see. I mean, when you get to superintelligence, all bets are off on what future society looks like and where you find purpose and meaning. And one could make an argument that, hey, take a look at chess and what happened with chess or Go. It’s not like humans stopped playing the game just because they’re not the best at that game. It turned out that it’s more of a symbiosis, and humans became much better and are using AIs to devise new techniques and moves that they’ve never made previously, but with the caveat that these are not AGIs. That’s why I say it might be that all bets are off when you get to AGI, not just a very constrained type of specialized AI such as AlphaGo or Stockfish or some engine of that sort for chess or those games.
Jon Krohn: 00:16:49
Exactly. It’s really mind-bending. It is difficult. I mean, I can’t wrap my head around what this future will be like when we are no longer… Humans have enjoyed for some time now being by far the dominant intelligence on the planet. And when there’s something else around, it’s like asking a chimpanzee to do calculus. The chimpanzee is very smart. It’s one of the most intelligent animals on the planet, but you’ll never be able to get it to graduate with a Stanford degree. And so when there’s something else around like that, we can’t even… in the same way that when the chimp sees us writing equations on the board, it’s hopeless, we could soon be encountering an intelligence that it’s hopeless for us to try to understand.
Aleksa Gordić: 00:17:48
Yann had a take on this, and he said, “Take a look at current society and you’ll see many examples of greater intelligences being controlled by much smaller intelligences.” And you see this across companies. You have dumb CEOs who just had grit and luck or whatnot, or, I don’t want to diminish them, but oftentimes they’re not as smart as many of their employees. And that happens. But the thing is, that remark of Yann’s doesn’t do it justice, because we are not talking about a small difference of a couple of IQ points or even tens of IQ points. We are talking about something that can exponentially improve itself, and you can scale it up, and it can be much smarter than humans. So we are talking about a cat compared to a human. Cats never controlled humans. I mean, well, that’s maybe a [inaudible 01:24:33]-
Jon Krohn: 00:18:42
You picked the wrong animal.
Aleksa Gordić: 00:18:43
I picked the wrong animal. Maybe pigeons. Let’s take pigeons. They’re like less high agency. Cats are like the apex predator of this world.
Jon Krohn: 00:18:51
Exactly.
Aleksa Gordić: 00:18:53
I mean, but you get the point. It’s not going to be the same, qualitatively speaking, when you have something that’s an alien intelligence, an intelligence that makes Einstein look dumb. And then, as I said, all bets are off. We don’t know what happens, how that dynamic plays out.
Jon Krohn: 00:19:10
We may have had a couple of fact-checks from cat owners about that last comment! We can probably agree that our pet cats are more interested in control than collaboration. And, when it comes to working with AI, many of us worry about these tools ‘controlling’ how we work and think. How can AI work with us and not over us? Or, to put it bluntly, how can we ensure AI doesn’t go the way of the domesticated cat? That’s what I ask the world-renowned futurist Bernard Marr in episode number 777.
00:19:40
In chapter five, you talk about powerful ways that organizations can harness GenAI that highlight the potential for human collaboration with AI as opposed to replacement. Do you want to talk about that at all?
Bernard Marr: 00:19:54
Yeah, I get asked this question all the time: what will this mean for jobs in the future? I have three children, between the ages of 12 and 18, and I worry about what that might mean for them in the future. My hope is that AI will not replace us, but augment us. What I’m also hoping is that AI will make us more human instead of less human. Sometimes we position AI as man versus machine, and I completely understand why, because it sells newspapers and magazines, and people click on articles that say, “Okay, AI is coming for your job.” But my hope is that, in practice, it will augment our jobs. I have actually written an entire book on future skills because I get asked this question all the time: what skills will we need? How do we compete with machines in the future?
00:21:02
And out of the 20 skills I talk about in the book, three are technical skills. I need to have some technical understanding and understand the capabilities of all of these technologies we’ve touched on. I need to have some data literacy. I need to understand some of the cyber threats that are coming along. Beyond that, the rest of the skills are the ones that make us truly human: our creative problem-solving, our interpersonal communication, our complex decision-making, our ability to understand whether data is true or not, contextualizing things. All of this stuff that really makes us human. My hope is that we will outsource some of the things that, in my opinion, we waste our immense human potential on doing. Just a tiny little example: whenever I write a Forbes article, Forbes asks me to basically capitalize every word in every subheadline.
00:22:12
And this is very often not how I write. I just write, and then I need to go back into the article and make sure that every word is capitalized. I can now give this to ChatGPT and say, “Hey, please just capitalize this.” It takes a second and it’s done. This, for me, was such a waste of my capabilities, because now I can spend more time being creative, thinking about how I want to tell my stories. My hope is that we will just get more of this. A perfect example is radiologists. If you are a radiologist in a hospital who analyzes X-ray images or CT scans, AI can now do this very well. I remember I took my son to the hospital recently, and we suspected that he had broken a bone in his hand, and the doctor came back and said, “Okay, we can see there’s a hairline fracture in the middle of your hand, but the AI’s also suggesting that you might’ve broken your finger as well.”
00:23:16
She was saying, “We can’t actually see this with our eyes, but the AI usually is right.” The AI is now able to do this. For me, this comes back to biases, and we talked about biases. Humans also have huge biases. If you are a radiologist and someone has come in with a potentially broken back and they’ve been through a CT scan, our bias is that we will only look at whether they have potentially broken their back. But because the CT scan covers the entire body, there might be secondary or tertiary diagnoses that are relevant as well, ones that we as humans would never look for. The AI can do this. We are now at a stage where the AI can analyze images and CT scans much more consistently, and probably at the same level as, if not better than, humans can.
00:24:20
What does this mean for radiologists? It will change the way they work, because if you think about it, is it really the best use of this amazing human potential that we have for someone to sit in a dark room for eight hours a day trying to work out whether a bone is broken or not? Or would it be better if we spent more time talking with the patient about what this all means, or doing research to further the field of radiology and make it better? All this exciting stuff.
00:24:51
My hope is that this is what AI will do. In the short term, I have some concerns about how ready we are as a world to move to that, because there are so many people, happily or very often not happily, working in jobs doing stuff that is actually a waste of their potential, but it’s a necessity to earn a living. This transition will be difficult, and I think it’s really important for governments and for businesses to understand that we are seeing this huge transformation, and it means we need to retrain people. We need to augment their jobs, and we need to help them in making this happen.
Jon Krohn: 00:25:37
“We need to retrain people”: this is a sentiment that my next guest 100% shares. SuperDataScience founder and educator of literally millions of people online, Kirill Eremenko is dedicated to helping us work confidently with data and AI. In episode number 771, he helps us explore gradient boosting and why the machine learning technique is so powerful for making informed business decisions. Here’s a short clip from the full gradient-boosting workshop that makes up episode 771.
Kirill Eremenko: 00:26:06
Now, we can move on to Gradient Boosting. Gradient boosting was originally proposed by Jerome H. Friedman in 1999, and there are two papers you can find online. One is called Greedy Function Approximation: A Gradient Boosting Machine, and I think that was more of a lecture that he gave, because it’s got 40 pages or something like that. The second paper you can find is Stochastic Gradient Boosting. This is the person who created it. What is Gradient Boosting, and how is it different from bagging, that is, bootstrap aggregating, and from AdaBoost?
00:26:43
The main thing with Gradient Boosting is that this time, we’re not just going to adapt; we’re actually going to be changing what each successive model predicts. We’re not going to be doing any bootstrapping. We are working with the original sample all the time. That’s very important to understand. There’s no bootstrapping in Gradient Boosting. To see what you do in Gradient Boosting, we’re going to look at Gradient Boosting for regression first. You have your 1,000 customers. I can’t believe this example stuck. That was just a random thing I wanted to do for the trees.
00:27:16
You have these customers, 1,000 customers that bought candles from your store. You want to predict the future spend of customers. Your target variable is dollars spent. What you do as your first step… again, Gradient Boosting is going to be an ensemble of models. Your very first model is just an average. It’s a simple average. You take the average of all of the dollars that all of your customers spend, and let’s say you get something like $57, for simplicity’s sake. That’s the average of all your customers’ spend. Next, what you do is you calculate the errors. You look at, “Okay, $57 is my average.” That is, of course, a terrible prediction, a terrible model. You just took an average. For every observation you’ll have an error: some observations will be lower, some observations will be higher.
00:28:11
You basically calculate the error for each one of your 1,000 samples, and then you take those errors, whether the error is $2 or $20 or minus $100, you take all of those errors and you build a decision tree to predict those errors. The first model, it takes the average, works with a sample. The second model, which is our first decision tree, it’ll work with all of the errors that you got as a result of the first model. Now, this decision tree will be structured in some sort of way. It’ll make its own predictions, and now you will have, again, errors.
00:28:52
You will have errors of this decision tree’s prediction, and some might have a $5 error, some might have minus $50, minus $100. Again, you look at all the errors of the predictions of this second model, which is a decision tree, and you use those errors. Again, you’ll have 1,000 errors, in some cases they might be zero, but you’ll have 1,000 values, and you use those errors to make another decision tree. Your third model will also be a decision tree, and it’ll predict the errors of the second model. Then you build a fourth decision tree, which will be predicting the errors of the third model. Then you build a fifth model, which is also a decision tree, and you predict the errors of the previous model, and so on. You chain them together. The key word here is, you’re chaining models after each other.
00:29:41
The first one is an average, and then it’s decision tree, decision tree, up to 100 times, however many decision trees you want. Each one is just focusing on predicting the errors. What’s the point of that? Well, guess what? Now, as our final model, we’re not going to take the average, we’re not going to take the weighted average. As the final model, we’ll take the sum. You’ll take your first model, which is the average. You’ll add the result of the second model, which is whatever the decision tree predicts for this kind of variable. Let’s say you have a new customer come into your store, and you want to predict how much they will spend. The answer will be the average, which was $57, plus whatever the next model, model number two, which is a decision tree, whatever it predicts, plus whatever the next decision tree predicts, plus whatever the next decision tree predicts and so on.
00:30:30
You add all of that up, and because each time you are predicting the errors, your prediction is the average plus, what would the error be for this person that just came in? Okay, the starting prediction for this person is $57. Then, based on their age, based on their income, whatever the decision tree is looking at, the error of this initial model would’ve been minus $101. You need to add that. You go from 57 minus 101, I’m not that great at mental arithmetic, what is it, minus 44? Is that minus 44? Yeah, minus 44. Then the next model will be, what would the error have been based on that prediction, the previous one? The error would’ve been $50. Now you go up to $3, and then the next model says the error at this point is about $27.
00:31:17
Now you go up from $3 to $30, and so on and so on. Then the final result is that this customer, based on their features and based on what the model predicts for them, will likely spend $39 in our store. That’s how Gradient Boosting works. You’re basically chaining models to constantly just predict the errors of the previous one, and that means in the resulting model, you just need to add them up. Each prediction will be a prediction of the errors, and in the end, you will get your final result.
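Here is a minimal Python sketch of the residual-chaining idea Kirill just walked through, using scikit-learn’s DecisionTreeRegressor. The customer data, the number of trees, and the tree depth are all invented for illustration; real gradient boosting libraries such as XGBoost, LightGBM, and CatBoost add a learning rate, regularization, and many other refinements on top of this basic loop.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data: one feature (customer age) and dollars spent
rng = np.random.default_rng(42)
age = rng.uniform(18, 70, size=1000).reshape(-1, 1)
spend = 20 + 0.8 * age.ravel() + rng.normal(0, 10, size=1000)

# Model zero: just the average spend
prediction = np.full_like(spend, spend.mean())

trees = []
for _ in range(100):                   # chain 100 decision trees
    residuals = spend - prediction     # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(age, residuals)           # each tree predicts the errors
    prediction += tree.predict(age)    # the ensemble is the running sum
    trees.append(tree)

# Predicting for a new customer: the average plus every tree's correction
new_customer = np.array([[35.0]])
estimate = spend.mean() + sum(t.predict(new_customer)[0] for t in trees)
print(f"Predicted spend: ${estimate:.2f}")
```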
Jon Krohn: 00:31:58
Nice, very well explained.
Kirill Eremenko: 00:32:01
I just had this idea, it’s probably good to call the first model, the average, call it model zero, because otherwise, it’s confusing. The first decision tree is the second model. The second decision tree is the third model. Model zero is your average, and then model one is a decision tree. Model two is your second decision tree, and so on and so on and so on. The final result is the sum of this chain.
00:32:25
As you can see, it’s very different from what we had previously in the bootstrap aggregating methods, bagging and basically random forest, where we took the average. It’s also different from the boosting method of AdaBoost, where we took a weighted average of the models. AdaBoost is in between: it uses bootstrapping and it uses aggregating, so it’s a bootstrap aggregating method in the sense of how the samples are built, but it’s a boosting method in concept. AdaBoost is transitional, whereas Gradient Boosting is pure boosting. There’s no more bootstrapping. You go straight into using the same data set all the time, but you focus on improving, improving, improving.
Jon Krohn: 00:33:12
To summarize back, the random forest is random. This went into your democracy versus meritocracy example. With a random forest, you are randomly creating a whole bunch of decision trees, and the more that you create, you get this slight marginal improvement. When you go from 1,000 random decision trees to 1,001, there’s this very marginal improvement. The core idea with adaptive boosting was to not randomly create these decision trees, but to use some information, like which data points were misclassified previously, and overweight those in the subsequent model so that we’re consciously iterating in the right direction adaptively, AdaBoosting. Gradient boosting takes us another level further: instead of just saying, “Let’s focus on the data points that were misclassified,” it says, “Let’s look at the residuals, the specific delta between what the correct answer would’ve been and what the model predicted, and let’s fix those residuals.” You’re focusing on where the most possible opportunity for improvement is, and that’s why Gradient Boosting is so powerful.
00:34:34
Kirill’s courses are always full of these analyses of practical business questions. And this is where we’re headed in our final clip, from episode number 773. Here, Dr. Barrett Thomas emphasizes the need for businesses to think about all of the variables that affect when, why, and how an AI tool is used. I ask Dr. Thomas, a University of Iowa Professor of Business Analytics, about the questions that a delivery company might need to ask about using drones.
00:35:00
So we’ve been talking about same-day delivery, and drones come up as something that could be super helpful in enabling same-day delivery. I mean, I guess it’s conceivable, though I don’t know of it yet, or at least I don’t see it in New York, to have either a sidewalk drone that just drives itself or a flying drone bringing me a meal or some small Amazon order or something. Where are we in terms of drone deliveries happening? Are there places where that is happening regularly, and what’s the benefit once we get it to work? How does that complement existing systems?
Barrett Thomas: 00:35:44
Companies have been piloting drones: Google had a pilot on this, Amazon has been trying to do this, and you do see that JD.com in China has used drone deliveries. Where I think it’s most successful at this point is not so much in cities, but when we’re delivering into rural areas: we might send that drone out to a more rural area, and it drops off a set of packages in the backyard of somebody who then delivers them to the individuals. But when we’re doing research, we also want to look into the future, and we want to try to understand what that future could look like. Should I even be trying to invest in these technologies? Could there be any advantage to doing so?
Jon Krohn: 00:36:42
Yeah, you might discover that your costs actually ballooned. It seemed like a cool idea, but for some reason…
Barrett Thomas: 00:36:47
Yeah, that’s exactly right. And so, in one of the papers that I have with a co-author from Germany, Marlin Ulmer, and a former PhD student, Xinwei Chen, we were looking at this question because we wanted to know, wow, if you’re Amazon and you’re doing your same-day delivery in an urban environment, would I ever want to use a drone? Would I just want to stick with trucks, or maybe use some combination of the two? And in fact, what we found is that, at least at that time, the drone technology could carry one package. So it would take a package, go out from the delivery depot, deliver it, and then it had to come back and pick up another. But if you’re using a vehicle, well, a vehicle can have multiple packages on it, so I can put multiple packages onto it. It can go out and do a delivery route and then return and pick up the next set of packages.
00:37:57
So each of those might have an advantage. The drone moves faster and it’s not affected by traffic, but it’s only doing one delivery at a time, versus the truck, which is affected by all those things but can carry multiple packages. And so, maybe the result wasn’t that surprising, but it turns out you do want to use both: things that were further from the depot, you would deliver using the drone, and you would use trucks closer to the depot, because you could put more packages on them and take advantage of the fact that they weren’t doing those back-and-forth trips to the depot.
00:38:36
Now, the question is, could we ever use those in a city? There are tremendous challenges in a place like New York in the fact that you have really tall buildings. Once you have tall buildings, you have the effect that might have on winds moving through the buildings. Do we really then want drones, maybe these are small drones, but they still weigh something, that could run into trouble and be knocked out of the sky? So subsequent research is looking at, well, okay, maybe I don’t want drones delivering those packages, particularly in urban areas like that. But maybe what we want to do, instead of having that truck go back and forth, is use a little bit larger drone and resupply the truck closer to where it’s going to do its deliveries. So maybe it comes out to a loading zone of some kind, an area that is a little safer, and maybe we do that.
00:39:43
So we didn’t do that work, but there have been folks starting to look at that, and I think that’s a really promising idea to think about. Particularly when you think about many of the dense European cities where bringing in delivery vans can be really difficult, they’re already doing things like cargo bikes in many of those cities. Well, those cargo bikes have a pretty limited capacity anyway, and you’re going to have to ride the bike back, and yes, they’re electric bikes, but you’re riding them back and forth to these depots. Maybe we want to do something in between so that it’s just a lot more efficient.
Jon Krohn: 00:40:23
That’s cool.
Barrett Thomas: 00:40:24
And we’re just getting more productivity.
Jon Krohn: 00:40:26
That’s a totally new idea to me. It never occurred to me and makes so much sense.
00:40:29
All right, that’s it for today’s In Case You Missed It episode. To make sure you don’t miss any of our exciting upcoming episodes, be sure to subscribe to this podcast if you haven’t already, but most importantly, I hope you’ll just keep on listening! Until next time, keep on rockin’ it out there, and I’m looking forward to enjoying another round of the Super Data Science podcast with you very soon.