SDS 477: How to Thrive as an Early-Career Data Scientist

We discuss options for starting from scratch in programming, the benefits of CS degrees, the software and tools you need to get started, powerful graph convolutional networks, RNA-based therapies, and more!

About Sidney Arcidiacono

Sid is a Data Scientist currently pursuing an Applied Bachelor’s of Computer Science at Make School in San Francisco. Approaching problems from every angle before diving into a solution allows Sid to uniquely solve complex problems. Sid’s ability to analyze, interpret, and predict from data is only outmatched by her passion to transform the world of healthcare.

Overview

Sidney and I met at the Open Data Science Conference East, where she attended my lecture on linear algebra, calculus, and probability skills for data scientists. Currently, Sidney is studying computer science at Make School, a project-based two-year bachelor’s program that was founded as an alternative to sitting through lectures. Within the CS degree, Sidney is pursuing the school’s data science track. That track is still being built out, so Sidney has worked as a curriculum assistant helping to develop it, which requires a lot of research and self-study on her side. From the beginning, though, Sidney knew she was interested in AI.

Originally, Sidney was in school on a pre-med path while working as a phlebotomist. After moving to California, she had to retake credits that didn’t transfer, go through licensing again, and take side jobs. While working at a clinic, she realized she was disillusioned with school and the process. She dropped out and started focusing on her interest in AI for healthcare.

From there we switched over to a discussion of graph theory and, specifically, graph convolutional networks. We started by defining graph theory before moving into what really fascinated me: convolutional neural networks expect fixed, grid-like data such as the pixels of an image, and Sidney’s graphs don’t conform to that structure. So how can a convolutional neural network handle something non-Euclidean? A graph convolutional network takes a representation of the graph that a computer can understand, an adjacency matrix, and uses it to embed the graph into a vector space. This transforms an otherwise incompatible structure into something a network can classify. Sidney uses this in her work with biomedical data. Before starting her program and this project, she had almost no experience in programming, which is a testament to both Sidney’s self-study and Make School’s program.

As far as tools and software go, Sidney works in Jupyter Notebooks and Google Colab, and uses Spektral and PyTorch Geometric for her graph convolutional network work. I personally love teaching in Colab, which only requires a Google login to pull up a Jupyter Notebook for your work. It’s an easy way to learn how to program and share your work. Almost all her coding is done in Python, which is the language she learned to code in.

Outside of school, Sidney works at GreenLight Biosciences as a data science intern. The company is working on applications of RNA molecules (famously used in COVID-19 vaccines). From there we discussed how others can follow in Sidney’s footsteps and jump into computer science and data work. Sidney suggests doing the work and being willing to network as much as possible. Networking and community are incredibly important because most people in your life won’t share your interest in, or level of understanding of, things like graph convolutional networks. Ultimately, her goal is for her work to make a tangible impact: something she can point to and see that quality of life has improved for people.

In this episode you will learn:

What is Make School? [5:00]
Sidney’s interest in AI and computer science [10:56]
Graph theory and graph convolutional neural networks [19:53]
What tools does Sidney use for her work? [31:16]
Sidney’s internship [36:52]
How other beginners can get involved in data science [38:12]
Sidney’s goals [41:57]

Items mentioned in this podcast:

Make School
GreenLight Biosciences
SuperDataScience 99-day Career Challenge
PyTorch Geometric
Spektral
SDS 447: Commercial ML Opportunities Lie Everywhere
Thus Spoke Zarathustra by Friedrich Nietzsche

Episode Transcript

Podcast Transcript

Jon: 00:00

This is episode number 477 with Sidney Arcidiacono, computer science student at Make School and data science intern at GreenLight Biosciences.

Jon: 00:12

Welcome to the SuperDataScience Podcast. My name is Jon Krohn, chief data scientist and best-selling author on deep learning. Each week we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let’s make the complex simple.

Jon: 00:42

Welcome back to the SuperDataScience Podcast. I am delighted that we’re joined on the program today by the sharp and engaging Sidney Arcidiacono. Scarcely a year ago Sidney was inspired to get started in a data science career after reading news describing medical breakthroughs made possible by artificial intelligence. She had no experience with programming or data modeling so she decided to join an intensive program offered by Make School at their San Francisco campus, where she began undertaking an accelerated bachelor’s degree in computer science with a specialization in data science. Wrapping up her first year at Make School, Sidney is thriving. She’s become a curriculum assistant for the data science specialization at the school and landed her first full-time job in the field as a data science intern at GreenLight Biosciences, a fascinating biotechnology company that tackles medical issues with RNA, the same class of molecule that is currently making a splash around the globe in the Pfizer, BioNTech, and Moderna Coronavirus vaccines.

Jon: 01:45

In today’s episode, we cover a number of topics that will be primarily of interest to folks who are thinking of moving into a data science career or are early on in their career. We’ll discuss options for starting from scratch in programming and data science, including why it may be wise to start off by pursuing a computer science degree like Sidney is. We’ll talk about the software tools and approaches that are most essential when getting started in software engineering or data science and straightforward implementable strategies for being extremely successful early on in your data science career like Sidney has been during the extraordinary first year of her data science journey. On top of that, we’ll also cover a few topics that will be of interest to more experienced data scientists and well, anyone, including discovering the cool powerful graph convolutional networks that can be used to find patterns in graph data, and how data science is critical to the success of RNA-based therapies like the BioNTech and Moderna vaccines.

Jon: 02:51

Sidney, welcome to the show. I’m delighted to have you here. Where in the world are you calling in from?

Sidney: 02:58

I am currently calling in from the mountains in Southern California.

Jon: 03:03

Oh, is it beautiful in the mountains of Southern California?

Sidney: 03:05

It is beautiful in the mountains of Southern California.

Jon: 03:09

Damn. All right. How has the pandemic been lately over there? Are things starting to open up?

Sidney: 03:16

Yeah, they are. It’s a little bit less crowded up here. I actually just moved down here from Oakland. So, getting a little bit more space to breathe, so to speak.

Jon: 03:25

Nice. That sounds really nice. So, we met at ODSC East, the Open Data Science Conference, and it’s funny that they still call them things like east and west and Europe because, well, it’s all been virtual for the last 18 months. But I guess the time zones are at least lined up for somebody on the US East Coast or the US West Coast, the European one. So you were presumably still in California when you came into, I guess, a lecture that I was giving at ODSC East. I would think I was doing one on why linear algebra, calculus, and probability are key skills for data scientists to have. Is that right?

Sidney: 04:09

Yeah, that’s correct.

Jon: 04:12

I wasn’t sure. And then you came to a… So they do these… If it was an in-person conference, then I would have done a book signing. And there would have been an opportunity to meet people in person. And so, they did a virtual one. So it was a Zoom room, and maybe 20 people showed up, and you really stood out to me as somebody that was really switched on, asking great questions, really thoughtful responses. And yeah, so I just wanted to connect. We had a brief call and instantly I was like, “Wow, I feel like I am recording a podcast episode right now because everything that Sidney’s saying is so interesting, and an audience would love to hear this.” So then we set it up, and here we are. That’s the back story.

Sidney: 04:59

Here we are.

Jon: 05:00

So, you are currently studying computer science at Make School. So tell us about that. What is Make School? And how did you choose that particular option for getting into a software-related career? And then maybe I’m asking way too many questions here. But if you can remember all of them the final one is, why study a computer science program if you want to be doing data science type stuff?

Sidney: 05:29

Definitely. Those are all really good questions. So I’ll start with the first one, what is Make School? Make School is a really new program. Actually, they’ve only been a bachelor’s degree program for I think, three years now. Basically, it’s an applied computer science college, meaning that it’s really project-based. So typically, if you were to go to Stanford and study computer science, you’re learning a lot of computer science theory, you might learn a little bit about Java or C. But for the most part, you’re not actively coding every day, you’re sitting in lectures.

Sidney: 06:02

Make School was founded by two MIT, I think, dropouts actually, who decided ‘we want a different way to teach because this doesn’t work for us’. So they founded Make School. And the premise of Make School is first to get people into tech who otherwise might be intimidated by, for example, the programs at Stanford, and also to get people into tech who really want to learn what they need to know to work. So that leads to why I picked Make School. I had gone to college before, and I was in the midst of a career shift. I picked Make School because I didn’t want to take three more years to finish a bachelor’s degree when I had already gone for three years. So the accelerated part was beneficial, as well as the fact that I’m actually going to be hands on, I’m going to be doing the things that I need to know how to do.

Jon: 06:56

Nice. So, we haven’t mentioned that explicitly, but that’s one of the benefits of this program is that you get a bachelor’s degree, but it’s in two years, which is really accelerated.

Sidney: 07:07

Definitely, definitely. And actually, a lot of my peers went to college before. So for most of us this isn’t our first shot at getting our bachelor’s degree. So for us it’s so we can finish faster.

Jon: 07:21

Have you met any peers in person or has it been entirely virtual?

Sidney: 07:24

The school experience has been entirely virtual. Although when I was living in the Bay, I did make a point to set up… My partner and I set up this group meetup thing so that those of us who did live in the Bay could go and have lunch or have a park day or go to the beach, so that we’re not totally sequestered.

Jon: 07:44

That sounds really smart. Does that mean that your partner is on the program too?

Sidney: 07:48

Yeah, that’s correct.

Jon: 07:50

Wow, that’s cool. But you didn’t meet on it? You decided to do it together?

Sidney: 07:56

Yeah. So, he’s actually the reason that I chose Make School over other options because he had applied and got accepted. And we went to visit the campus in October of 2019. And that’s when I started to be like, “Oh, this is cool.”

Jon: 08:16

Wow. So they do have a campus, and presumably, you will actually get to experience it. So you’re right now wrapping up the first year of the two-year program, right?

Sidney: 08:26

Yeah, that’s correct.

Jon: 08:29

And you’re starting an internship, which we’ll talk about shortly, a data science internship, which sounds really cool. But I have some other segues planned before we get there. So just one more quick question about this is that I guess you will actually return to being on campus probably in the fall. It seems like in the US, at least, we’re lucky that vaccines have been taken up really quickly. And yeah, it seems like we’re not even going to be wearing masks. I mean, so at the time of recording in mid-May, the Centers for Disease Control, this federal body, said that vaccinated people don’t need to wear masks anymore as of yesterday at the time of recording. So super interesting. Anyway, so I assume that you will actually be able to go back to a campus. That’ll be great. Anyway, I think I interrupted you with questions when you were trying to tell me more about the program. Okay, where did we get to? It’s two years, halfway through. It’s all been virtual, but there is a campus. You were saying the campus is beautiful, or?

Sidney: 09:38

It’s pretty cool. It’s a building in San Francisco. I don’t know that much about San Francisco. It’s somewhere near the civic center area. But basically, it’s a cool multi-storey older building, and they have all of these really unique classroom setups and things like that. So it’s not like… I remember going and visiting regular universities, you walk into a lecture hall and it looks like a lecture hall. Here, it’s like you walk into this hacker room of classmates sitting around or coding. And sometimes there’s a teacher with a whiteboard. But it’s really unique. They have this whole Grand Hall. There’s this ballroom that they just put a lot of work into this building. And it is set to be a hybrid in-person online experience in the fall. I haven’t actually decided if I want to move back to the Bay to go in person, but there will be that option.

Jon: 10:40

Very cool. And so, I think… Well, so one of my earlier questions about the Make School experience is, notwithstanding the unique situation of having a partner that had already been accepted to the program, what kinds of reasoning go into choosing to do a curriculum like a two-year computer science degree when I think you ultimately want to be in the data science field?

Sidney: 11:10

Definitely. Part of the answer to that question is honestly just when I first started, I just knew they had a data science track. And I didn’t know anything about data science.

Jon: 11:20

Right. So they have that. That’s cool.

Sidney: 11:23

Yeah, exactly. So they have a specific track for data science. The caveat with this track is that it’s still being built out. So the pro of that is, it’s been really cool. I’ve been able to be a big part of helping build that track out. I was working as a curriculum assistant for the majority of the year, helping one of our new instructors who actually came from NASA build out this track, figure out what students need to learn. The con is that a lot of my experience, so far, a lot of my knowledge is actually coming from self study to supplement on the side the fact that this data science program hasn’t been around for 10 years, so it’s not as robust as it could be.

Jon: 12:03

I guess the nice thing about that is that that’s what it’s like in the real world.

Sidney: 12:07

Right.

Jon: 12:08

The self study anyway, most of the time. Okay, cool. So you didn’t actually know that… So you didn’t necessarily have the outcome of being a data scientist in mind when you started?

Sidney: 12:23

Yeah, I did and I didn’t. I knew that I wanted to do AI. But did I know what that meant? No, not really.

Jon: 12:29

Right. I see. And then so… All right, so let’s talk about that. So what was the inspiration to be interested in AI?

Sidney: 12:40

Yeah, so this is the-

Jon: 12:42

What were you doing before? You already studied, you already went to college. And so, you’re making this career change into AI. How did that happen? How did that come about?

Sidney: 12:53

Definitely. So just a quick prerequisite to set up the story or the why. I was going to school originally to transfer to a pre-medical program. That was the goal. I wanted to go be a doctor. I was working at the time. I was actually living in West Michigan prior to moving back to California, and I was working as a phlebotomist. So I had worked at a blood center and was working at a Metro hospital. And in Michigan, it’s a lot less regulated you could say. You can do on the job training to be a phlebotomist. You don’t have to be licensed. I had gotten my NHA license, but when I moved back to California, I had to get my California licensing as well.

Jon: 13:38

I see. The NHA is some national qualification for phlebotomy?

Sidney: 13:40

Yeah, exactly. Yeah.

Jon: 13:43

What is phlebotomy?

Sidney: 13:45

Yeah, that’s a good question. Phlebotomy is effectively the people who draw blood and run your labs at the hospital. Or if you go to a blood center, the people who stick you, basically.

Jon: 13:56

Nice. All right. I see. It’s a very… Yeah, very essential career path.

Sidney: 14:04

Definitely.

Jon: 14:04

And yeah, plays a big important role in the hospital. So, the NHA was this national qualification. But moving to California, you had to get another qualification, though, like local, this state licensing.

Sidney: 14:18

Exactly. So I moved out here and I’m re-enrolling in another college to retake credits that didn’t transfer and to try to continue back on this path. In the midst of this, I’m spending all this money trying to get this new license. And part of that was effectively you have to do this thing called an externship, which is you go work for free for 40 hours somewhere to prove that you know what you’re doing and you’re not going to hurt anyone. So, I was doing this and it’s a very long drawn out process. It had taken me a year just to get to this point, just because I had to work at the same time. I had to go back to working at minimum wage like food service jobs and things like that, which was frustrating for me. And then I go work at this clinic in North Hollywood. And in the midst of this work experience, I fell in love with the staff at this clinic, and I really enjoyed being there. It was very similar to being in the neighborhoods that I grew up in, in San Diego. And it was people that I was used to being around.

Sidney: 15:24

But one of the things that dawned on me as I’m working at this clinic was I am feeling disillusioned with school, I’m feeling disillusioned with having to work and trying to pay for all this. And at the same time, I’m realizing I’m never going to see these patients, if I just go be a doctor and work at Kaiser, or Cedars-Sinai, or any of the places that I dreamed of working, I’m never going to see these people. I finished up this semester, and effectively just dropped out of school and just said, “I don’t know what I’m doing. I’m going to move to LA and go do art or whatever. But I don’t know what to do next.” So in the midst of all of this, I start to become interested in this idea of AI for healthcare because of reading these papers that are talking about potentially trying to lower the cost of drugs by using AI to help with drug development or reduce the cost of radiology services through AI radiology services. And at that point I was like, “Okay, well, this seems like a great way to still make an impact, but possibly make a greater impact and to possibly make an impact on these accessibility gaps to try to actually help the people that I grew up with.”

Jon: 16:43

Totally. This episode is brought to you by SuperDataScience. Yes, our online membership platform for transitioning into data science, and the namesake of the podcast itself. In the SuperDataScience platform, we recently launched our new 99-day data scientist study plan, a cheat sheet with week-by-week instructions to get you started as a data scientist in as few as 15 weeks. Each week, you complete tasks in four categories. The first is SuperDataScience courses to become familiar with the technical foundations of data science. The second is hands-on projects to fill up your portfolio and showcase your knowledge in your job applications. The third is a career toolkit with actions to help you stand out in your job hunting. And the fourth is additional curated resources such as articles, books, and podcasts to expand your learning and stay up to date.

Jon: 17:39

To devise this curriculum, we sat down with some of the best data scientists as well as many of our most successful students, and came up with the ideal 99 day data scientist study plan to teach you everything you need to succeed. So you can skip the planning, and simply focus on learning. We believe the program can be completed in 99 days, and we challenge you to do it. Are you ready? Go to www.superdatascience.com/challenge, download the 99 day study plan, and use it with your SuperDataScience subscription to get started as a data scientist in under 100 days. And now let’s get back to this amazing episode.

Jon: 18:17

AI scales a lot better than in-person appointments with a physician, that’s for sure. So, that’s super cool. So, you get to understand, probably through like popular press articles that AI, generally speaking, is making a transformative effect in healthcare.

Sidney: 18:36

Yeah.

Jon: 18:37

That’s cool. And then you figure out, okay, computer science is a great foundation for artificial intelligence, and I totally agree because, yeah, I mean, in terms of getting a model into production then having it do something in the real world, you need computer science skills to do that. And so, yeah, they’re equally important competencies, building a model, being able to deploy it. And I dare say in the job market these days, I think that having both of those skills is hugely valuable. And that somebody like you who’s obtaining a computer science degree, getting a data science specialization, doing an internship in data science. I mean, I think you’re ticking all the right boxes, and you’re going to have no trouble finding interesting work when you graduate.

Jon: 19:29

All right. We’re going to get into your internship in a moment. But you just finished up your term, you had a big project on graph convolutional networks. And I have a quote from you on a post that you wrote that says, graph theory is way cooler than you thought. And so, tell us what graph theory is and why it’s cool. And then tell me what graph convolutional networks are. I genuinely don’t know. I know what convolutional networks are. I know what graphs are. I think I know what graph theory is, but I can’t in my mind imagine how those two worlds blend together, convolutions and graph theory. So I’m looking forward to my mind being blown.

Sidney: 20:17

Definitely. So just the precursor, what is graph theory? Graph theory is a mathematical theory, and basically it describes a graph, which is a series of nodes and edges. And to give like a real world example of this, one example of a graph, for example, is a molecular diagram, right? A molecule can be represented in a graph. It has bonds, edges, and it has nodes or atoms. So another good example is a map, right? You have nodes, which are places, addresses, and you have edges, which are streets.
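
To make the nodes-and-edges idea concrete, here is a minimal Python sketch of a molecule stored as a graph. It uses water as the molecule; the integer node ids and the helper name build_adjacency_list are illustrative choices, not anything from Sidney's project.

```python
# Water (H2O) as a graph: atoms are nodes, bonds are edges.
nodes = {0: "O", 1: "H", 2: "H"}   # node id -> atom type
edges = [(0, 1), (0, 2)]           # one entry per O-H bond

def build_adjacency_list(nodes, edges):
    """Map each node to the set of nodes it shares an edge with (undirected)."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors

water = build_adjacency_list(nodes, edges)
for atom, bonded in water.items():
    print(nodes[atom], "is bonded to", sorted(nodes[b] for b in bonded))
```

The same two containers (a set of nodes and a list of edges) would describe a street map just as well, with addresses as nodes and streets as edges.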

Sidney: 20:58

Graph theory is the theory that describes how to traverse this graph, how to get from one point to another, how to add nodes, how to perform calculations on this thing. And why this is important is because a lot of data, a lot of things are better understood when you understand network. So, you can look at an ecosystem and you can say, “Okay, I understand,” or to use a consistent example, if you look at a molecule, you can say, “Okay, I have two hydrogens and an oxygen.” And that’s great, to know that is great. And if you’re a chemist, you can then infer what the bonds are going to look like. You know that you’re looking at water. But if you’re not a chemist, if you’re a data scientist, and you only see this list, you might not have the whole picture of what this actually represents, why it matters, how it behaves. But when you can start to represent this thing as a graph, you can start to then also infer how these relationships affect each other. And you can actually make classifications on that on the whole graph thing.
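
One of the things Sidney lists here, getting from one point to another, can be sketched with a breadth-first search. The place names and the street_map layout below are made up for illustration, and breadth-first search is just one standard way to find a route with the fewest edges.

```python
from collections import deque

# A tiny map: places are nodes, streets are edges (hypothetical layout).
street_map = {
    "Home": ["Cafe", "Park"],
    "Cafe": ["Home", "Library"],
    "Park": ["Home", "Library"],
    "Library": ["Cafe", "Park", "Clinic"],
    "Clinic": ["Library"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal is not reachable from start

route = shortest_path(street_map, "Home", "Clinic")
print(" -> ".join(route))
```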

Sidney: 22:11

So, that’s why graph theory is cool because as a data scientist we very often have this Euclidean data, this list structure. It’s very set two dimensional data. And most neural networks really understand that well. They are good at performing calculations on this data, and they get it, so to speak. When you get this graph that’s this arbitrary thing, it doesn’t take up a set amount of space, it doesn’t have a set amount of nodes, and you can’t measure distance. It’s non Euclidean. So your typical convolutional neural network can’t look at it the same way that it looks at pixels, which interestingly enough, an image is a graph as well. It’s just, it’s fixed.
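
The remark that an image is a graph too, just a fixed one, can be illustrated with a short sketch. It assumes each pixel is connected to its up, down, left, and right neighbors (one common convention); the function name grid_graph and the 4x4 size are illustrative.

```python
def grid_graph(height, width):
    """Treat every pixel as a node joined to its up/down/left/right neighbors."""
    neighbors = {}
    for r in range(height):
        for c in range(width):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[(r, c)] = [
                (i, j) for i, j in candidates
                if 0 <= i < height and 0 <= j < width
            ]
    return neighbors

image = grid_graph(4, 4)
# Every interior pixel has exactly the same neighborhood shape.
interior_degrees = {len(image[(r, c)]) for r in range(1, 3) for c in range(1, 3)}

# A water molecule has no such regularity: oxygen has two neighbors,
# each hydrogen has one.
molecule = {"O": ["H1", "H2"], "H1": ["O"], "H2": ["O"]}
molecule_degrees = sorted(len(v) for v in molecule.values())

print(interior_degrees, molecule_degrees)
```

A standard convolution filter relies on that identical neighborhood at every position, which is why it cannot be slid over an arbitrary graph without some extra work.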

Jon: 22:59

I’m just going to interrupt you for one sec to break down just a couple terms or repeat some things back to you to make sure that me and the audience are getting you completely. So, you gave some great examples of graphs like a molecule, a map. And so, the really big key terms here are nodes and edges. And so, nodes are kind of… I mean, I guess both a node and an edge are data, but you might often think about so a node, like you said, it could be an atom connected to other atoms. But it could even just be like your friends. So, it’s like me, and all of my friends, we could be a graph, and any of my friends… I know all my friends. So there’s an edge. I’m a node, you’re a node, there’s an edge connecting us. And then you have friends. And so, there’s edges connecting all those. If I knew some of those friends, then I’d have an edge connecting me to them. They’re all nodes. And you could actually have information on the edges, too.

Jon: 24:01

So, it could be a number that’s like the number of times we’ve met. And so there’s a five between us, but there’s a 10 between me and someone else in my graph of friends. I guess, but it could be any kind of data on the nodes or the edges. And so, anyway, so a really rich way of storing data. But like you’re saying, it’s not Euclidean, which just means a Euclidean space is something that we can like measure with a ruler. Just like if you throw a football, you can measure that with a ruler. It’s like everything that we do moving around in space is in a Euclidean world.
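
The friends example, with a number stored on each edge, might look like this in Python. "Alex" and "Blake" are placeholder names, and the helper friends_of is hypothetical; the 5 and 10 are the meeting counts from the example.

```python
# Friendship graph with data on the edges: how many times two people have met.
friendships = {
    ("Jon", "Sidney"): 5,
    ("Jon", "Alex"): 10,
    ("Sidney", "Blake"): 3,
}

def friends_of(person, friendships):
    """Return {friend: times_met} for every edge that touches `person`."""
    result = {}
    for (a, b), meetings in friendships.items():
        if a == person:
            result[b] = meetings
        elif b == person:
            result[a] = meetings
    return result

jon = friends_of("Jon", friendships)
print(jon)
```

Data on the nodes (an age, an atom type) could be kept the same way in a second dictionary keyed by name.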

Jon: 24:44

Anyway, so convolutional neural networks. They’re typically a machine vision approach. They’re a way of recognizing spatial patterns in the Euclidean world, like you say it’s like pixels on a screen. Or even moving video to give that extra dimension. It’s still happening in this Euclidean world. Everything’s on this really fixed graph of a set number of pixels. But your graphs in graph theory, your nodes and your edges, they don’t conform, necessarily to that strict structure. And they can be, yeah, like you said, there could be any number of nodes, you could have any number of friends in the network, or whatever. And so, traditional convolutional neural networks break down. All of that I’m with you, the one piece I still genuinely know nothing about, and that’s the piece I need you to fill in on next is how a convolutional neural network can handle something that isn’t in a Euclidean space. How can it handle the graph? What does it do?


Sidney: 25:48

Definitely. So, to really get the detailed version of how this works, I did write an article about the mathematics behind this. But just to break it down for humans who might be listening and who don’t want to go read a bunch of math. What a graph convolutional network will do first is effectively take a representation of the graph that a computer can understand and do some things with that, so that we can embed the graph. So, let’s imagine that we take our molecule diagram, and we write it on a piece of paper. We kind of flatten it. They call it like embedding the graph. So we take an adjacency matrix, which is a way of representing a graph in a sparse matrix of ones and zeros that denote edges. And then you can perform some steps to then embed this into two dimensional vector space, that then can be classified just like an image of a cat or a dog.
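
As a rough illustration of the adjacency-matrix-then-embed idea, here is a NumPy sketch of one graph convolution step followed by averaging the node vectors into a single graph vector. It assumes the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W); the conversation does not spell out the exact steps in Sidney's project, so treat the rule, the random untrained weights, and the mean-pooling readout as assumptions.

```python
import numpy as np

# Water, the molecule from the conversation: node 0 is oxygen, 1 and 2 are hydrogens.
# A[i, j] = 1 means there is an edge (bond) between node i and node j.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

# One-hot node features: column 0 = "is oxygen", column 1 = "is hydrogen".
X = np.array([[1, 0],
              [0, 1],
              [0, 1]], dtype=float)

def gcn_layer(A, H, W):
    """One graph convolution: each node mixes its own and its neighbors' features."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)                   # node degrees, self-loop included
    D_inv_sqrt = np.diag(deg ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # linear map, then ReLU

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))        # untrained weights, for illustration only

H = gcn_layer(A, X, W)             # one 4-dimensional vector per node
graph_embedding = H.mean(axis=0)   # one fixed-length vector for the whole graph
print(H.shape, graph_embedding.shape)
```

Because the final vector has the same length no matter how many nodes the graph has, it can be handed to an ordinary classifier, which is the sense in which the graph gets classified "just like an image of a cat or a dog."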

Jon: 26:58

Cool. All right. Yeah. So, it’s that embedding step that’s key. And so, it allows you to transform what could be a structure that is incompatible with going into a convolutional network? You probably have to choose. You might even have to make some thoughtful choices about how you embed that graph, right?

Sidney: 27:19

Mm-hmm (affirmative). Yeah, exactly, exactly. And a good example of this is I just, for my project, embedded the proteins benchmark dataset. So it takes, I think, 1,300 graphs, which are all different molecular diagrams, effectively, and it’s able to then classify what protein is it out of two classes. So, it is basically able to take, I think there’s 40 some nodes in each graph. And it’s able to, you could just say, it stretches them out. It flattens it. It puts it into the vectors that our neural networks then are like our convolutional layer can then understand. And then it can do something meaningful with that data.

Jon: 28:07

Nice. So we take a 3D structure, this 3D molecular structure of this protein, and then you embed it into a two dimensional vector, which actually, yeah, that makes perfect sense because a protein is a single amino acid sequence. So, I guess that’s what it is. You flatten it out into that amino acid sequence, and then the convolutional layer. So we often think about two dimensional convolutions which are looking for patterns in pixels in an image. But equally, you can have a one dimensional convolution, for example, applied to natural language, which is just a string of characters or a string of words. And you can look for, say, a pattern of words that’s associated with a positive movie review as opposed to a negative movie review.
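
The one-dimensional convolution described here can be sketched in a few lines of NumPy. The toy sequence and the two-element filter are invented for illustration, and the function computes the sliding dot product that deep learning libraries call convolution (strictly speaking, cross-correlation).

```python
import numpy as np

# A toy sequence, e.g. 1 where a word (or residue) has some property, else 0.
sequence = np.array([0, 0, 1, 1, 0, 1, 1, 0], dtype=float)

# A filter that responds most strongly where two adjacent positions are both 1.
kernel = np.array([1.0, 1.0])

def conv1d_valid(x, k):
    """Slide the filter along the sequence and take a dot product at each position."""
    n_positions = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n_positions)])

response = conv1d_valid(sequence, kernel)
print(response)  # peaks mark where the pattern occurs
```

In a trained network the filter values are learned rather than hand-picked, so the network discovers which local patterns predict the label.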

Jon: 28:56

And so, I guess that’s similar to what’s happening here. It’s looking for spatial one dimensional patterns in this protein sequence. And so, that’s like the input into your neural network. And then you said that there’s an output that you’re predicting, it was a class problem. So you have two… I think you mentioned 1,400 proteins, and those were split into two different classes of proteins. Was that it?

Sidney: 29:19

Yeah. And to be honest, I don’t remember which ones you’re classifying. If you download the data set it’s all just represented as numbers. I actually reached out to the author to try to get the English labels, but didn’t get a response.

Jon: 29:34

Oh, no. So, it’s a class zero and class one.

Sidney: 29:40

Yeah, pretty much.

Jon: 29:41

That you’re predicting.

Sidney: 29:42

I’m like, “Okay, it’s great, but how do I tell people what I did?”

Jon: 29:47

Right. Damn. All right. Still sounds like a successful project. That’s really cool. And it’s great to hear that one year into a program like Make School you’re doing really significant modeling like neural networks, embedding three dimensional structures into vectors. That’s amazing. I mean, what was your experience with programming, and data science related things prior to starting your program? Was it much at all?

Sidney: 30:19

No, not at all. I pretty much didn’t have a computer three months before I started Make School. I knew nothing about programming and a lot of it is self-study, but, I mean, they do a pretty good job of pushing you forward as long as you’re willing to meet them halfway and put in the work.

Jon: 30:46

Yeah, that’s a given for sure. And it sounds like you’re definitely meeting them at least halfway. That sounds amazing. I mean, given that you’re helping create their data science curriculum, they probably don’t just go handing those out to everyone. You’re probably meeting them more than halfway and demonstrating some real capability and determination. That is super cool. So, all right. Well, that adds a really interesting twist on my next question, which is, so in the Make Academy, in Make School, what kinds of tools do you use? Is there a development environment that you’re typically using? What programming languages are you typically working in when you’re doing data science work? Like, you were just talking about your graph convolutional networks. What software packages were you using?

Sidney: 31:37

Yeah, so specifically for my graph convolutional networks term project, which was actually an independent study course, so I got to pick this topic and all that, I used a library called Spektral and PyTorch Geometric for two different implementations of graph convolutional networks. And typically in coursework, in terms of IDEs, we’re using either Colab, or Google Colab, sorry, or Jupyter Notebook or JupyterLab. The benefit of Colab is the fact that you can very easily copy over a notebook and you have free GPU runtimes. So, we’ve actually been switching over to using more Colab, although there are limitations with that as well, of course, like RAM, for example, is something I ran into.
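
For readers who want a feel for what a graph convolutional layer computes, here is a from-scratch sketch in plain Python. It does not use or reproduce the API of either library mentioned above. It follows one commonly used propagation rule: add self-loops to the adjacency matrix, normalise it symmetrically by node degree, aggregate neighbour features, then apply a weight matrix and a ReLU. The three-node graph, the one-hot node features, and the weight values are all hypothetical.

```python
import math

def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so every node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation by the degrees of both endpoints.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate features from each node's neighbourhood.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(n_in)]
           for i in range(n)]
    # Linear map followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]

# Hypothetical path graph: node 0 - node 1 - node 2.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
# One-hot node features, a common fallback when nodes carry no attributes.
node_features = [[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]]
# Hypothetical weights; in practice these are learned by gradient descent.
layer_weights = [[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, -1.0]]

hidden = gcn_layer(adjacency, node_features, layer_weights)
for row in hidden:
    print([round(value, 3) for value in row])
```

Stacking a few such layers and pooling the node vectors into a single graph vector is one way to arrive at a graph-level classifier, which is the kind of task discussed in this episode.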

Jon: 32:34

Yeah, so I teach entirely with Colab, and have for several years now. I absolutely love being able to teach a class with Colab. So, to access Google Colab, all you need is a Google login, which is free. So like a Gmail login. You just create a login, you can go into Colab and a Jupyter Notebook comes up. So probably a lot of listeners are familiar with Jupyter Notebooks, but they allow you to execute code and view the output of that code in this nice, neat notebook. And that includes any images that you output, any tables, all of it, you just see them presented in this one big notebook. And so, it’s an easy way to learn how to program. And it’s an easy way to share notebooks of code with other people, particularly if you’re thoughtful about how you create them.

Jon: 33:28

And so, my book that I wrote, any teaching that I do, all of the code examples happen in Jupyter Notebooks. And when I’m teaching live, I do it in a Google Colab session because I know that anybody anywhere in the world can just log in for free, get access to this relatively powerful cloud compute instance that already has all of the software dependencies needed and that those software dependencies are going to be the same as mine. Now, yeah, so you mentioned RAM. So potentially, if you’re doing a really big data science project, and you’re going to need a lot of RAM, there could be constraints on these boxes. So it’s going to be… It’s almost certain to be more RAM and compute in Google Colab than you have in your laptop, so that’s great. But still it’s not like the Google Cloud Platform or Amazon Web Services where you could just tick a box and get access to an even bigger instance. So that’s one limitation, if the problem that you’re tackling is a bigger problem than the machine can handle.

Jon: 34:29

Another problem is that I mentioned how great it is that the software dependencies are there already for you. But the downside is, you can’t out of the box control what specific version is installed. So, some day, it hasn’t happened to me yet, but someday I’m going to be teaching a class and a line of my code is not going to work because a software version has changed. And then I’m going to be scrambling in class awkwardly to try to figure out how to fix it in Stack Overflow or something. And then there’s one other problem. Oh, yeah, this is the problem that probably trips you up the most.
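
One hedge against the version drift described above is to check, at the top of a notebook, that the installed packages match the versions the code was written against, and to reinstall a specific version with pip (for example `!pip install numpy==1.26.4` in a notebook cell) when they do not. The sketch below uses the standard library’s `importlib.metadata`; the helper name `find_version_mismatches` and the pinned versions are hypothetical.

```python
from importlib import metadata

def find_version_mismatches(expected, lookup=metadata.version):
    """Return {package: (wanted, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    `lookup` is injectable so the check can be exercised without real packages.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches

# Hypothetical pins; replace with the versions your notebook was tested against.
pins = {"numpy": "1.26.4", "torch": "2.3.0"}

for name, (wanted, installed) in find_version_mismatches(pins).items():
    print(f"{name}: wanted {wanted}, found {installed}")
```

Running a check like this before class turns a surprise failure mid-lecture into a warning at the top of the notebook.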

Jon: 35:09

In order to prevent abuse of this free cloud instance, you can’t just have people running them all the time, mining Bitcoin. So, if you’re inactive in the session for half an hour to an hour or something like that, the session ends. And so, anything that you had in memory is lost. Now, that’s fine if it’s relatively not computationally intensive to just rerun all the code cells again that you had running, and you did everything in a thoughtful order, so that it makes sense if you just re-execute all the code. Anyway, Colab is awesome. I use it all the time. There are some limitations, listener. I’m sorry, you’re the guest. I shouldn’t be talking so much, Sidney. But I just wanted to let people know. So, that’s cool that you guys are using Colab. And then for your specific project, you mentioned for working with graph convolutional networks, there was Spektral and PyTorch something, PyTorch what?
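
For the session-timeout limitation described above, one common workaround is to write expensive intermediate results to disk so that re-running the notebook reloads them instead of recomputing. The sketch below is a minimal version of that idea using only the standard library; the helper name `cached` and the file name are hypothetical. It assumes the path points at storage that outlives the session (in Colab, that would typically mean a mounted Google Drive folder rather than the runtime’s local disk).

```python
import pickle
import tempfile
from pathlib import Path

def cached(path, compute):
    """Load a pickled result from `path` if present; otherwise compute and save it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result

def slow_step():
    # Stand-in for an expensive cell, such as preprocessing a large data set.
    return [n * n for n in range(10)]

# A temporary directory keeps this demo tidy; use a persistent folder in practice.
with tempfile.TemporaryDirectory() as folder:
    checkpoint = Path(folder) / "squares.pkl"
    first = cached(checkpoint, slow_step)   # computes and writes the file
    second = cached(checkpoint, slow_step)  # reads the file, skips the computation
    print(first == second)
```

Only unpickle files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.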

Sidney: 36:05

Yeah, PyTorch Geometric, which I knew nothing about until this term, but it [crosstalk 00:36:11]-

Jon: 36:13

So, I guess, your… So, it sounds like you learned how to code in Python, primarily. That makes sense.

Sidney: 36:19

Yeah, yep. At Make School, there’s different web development tracks as well. So, depending on what track you’re in, you might also learn, like, JavaScript or Go. But for data science, we’re pretty much doing everything in Python.

Jon: 36:34

That makes a lot of sense to me. It’s the most widely useful language in data science today, particularly if you’re interested in production deployments. So, that makes a huge amount of sense to me. All right, so that’s great. So you finished your first year at Make School, and now you’re starting an internship. So, obviously, at the time of recording, you haven’t started it yet. I know that’s coming up soon. But it’s great that that’s happened. It’s a company called GreenLight Biosciences. What do they do?

Sidney: 37:11

Definitely. So, GreenLight Biosciences is a bio sciences company. Basically, they are working on RNA technologies. So, they’re doing a lot of research into applications similar to, for example, like the COVID vaccine that just rolled out is, of course, the most public thing that probably everyone will have a reference to. So, they are working on food production and food sustainability problems, as well as, life sciences and healthcare problems as well.

Jon: 37:47

Yeah, that sounds like a great internship. Congrats. And, yeah, I noticed a while ago, you’ve written a blog post about people from nontraditional backgrounds coming into data science and computing, who, yeah, I guess, might not have had the same kind of mentorship or obvious examples to go into that kind of space. So, you pretty recently started learning programming from scratch, and were like, “Wow, AI could make a big difference in healthcare,” and you just jumped into it. So how can listeners who are in a similar situation to you also get so deeply involved in computer science and data science like you have?

Sidney: 38:37

Definitely. One of the key things for me was, first of all, just discipline. I definitely needed to get started to be taken seriously, and also just to feel like I should be here, I needed to put in a lot of work and it needed to be very, very consistent work. Aside from that, I think one of the biggest pieces of advice that I could possibly just give is grow your network. Reach out to people, attend conferences. Just really integrate yourself into the community. Whether that’s open source community sharing data on Kaggle or whether that’s communities at conferences, communities within your school, people on Stack Overflow. I’ve found PyLadies meetups on Meetup.com or whatever.

Sidney: 39:34

Just growing my network has been so integral to having certain opportunities as well as just most of the time your family, most people don’t… The friends that I had in high school don’t want to hear me talk about computer science. It might feel very uppity or nerdy or just something that they can’t relate to, and therefore that might put them off a little bit or make them feel stupid or they just don’t get it. They don’t want to hear about math, it’s not interesting to them. So growing a network has just helped me stay motivated and given me opportunities because I have people that I can share, talk about graph convolutional networks with because most people just don’t care.

Jon: 40:25

That makes perfect sense. And one of the nice things about this virtual world we live in is that no matter where you are you can be involved in these communities. So, you and I have never met in real life. But through shared interests, you meet at a conference, and yeah, you grow your network. And so, if you imagine a graph, and your main piece of advice is that if you’re getting started in your data science career, and right now your graph is just a node with no edges, you’re going to need to grow some edges, and start adding a bunch of nodes to your professional data science network. And yeah, and I totally agree.

Jon: 41:11

I think it makes it way easier to stay motivated, to stay on track when there’s other people to share interesting things that you’ve discovered with. That’s actually, that’s been something big for me through the pandemic that I used to be sitting in a room of people who thought very similar to me, and were interested in the same kinds of things. And I’m super lucky that I get to do a podcast once a week and have that, and I still have conversations with my colleagues, obviously, but it isn’t all the time during the day, and I can’t wait actually, to get back into an office and have that very intimate, professional network. Yeah, awesome advice. I really appreciate that. All right, so you’re basically a year into your AI career? What do you want to look back on when you retire from your AI career decades from now?

Sidney: 42:08

Definitely. My main goal is I want to make some kind of a tangible impact. I know maybe that’s a little cliché. Everybody thinks they want to change the world and all of that. But I at least want to have like one thing where I can just be like, “Oh, hey, look, this makes a meaningful difference in people’s lives. There’s a measurable difference in their quality of life or access to this thing, and I did that, or I was an integral part of the team that did that.” Definitely, I don’t want to go through my career just passively working on ads for Google. It doesn’t interest me. I want to look back and actually say, “I helped actually make some kind of a positive difference in someone’s life.”

Jon: 42:58

I totally hear that. And I think that is a great motivation for your career. Luckily, data science, computer science, AI, all the above is an absolutely amazing place to be able to make impacts like that. In fact, I can refer the listener back to probably countless episodes of the SuperDataScience podcast recently. But one really good example is episode 447 with Michael Segala, who is involved in translating a lot of ideas and a lot of industries and we go over tons of them. But we talk about the medical industry, in particular, quite a bit in that episode. And there is an enormous amount of opportunity in the world where some data are being collected, but we’re not doing anything with it, or data aren’t being collected yet at all.

Jon: 43:52

And so, there’s this even greener pasture opportunity. So, over the coming decades over your career, Sid, there’s going to be so much change, so much more automation than ever before, so much opportunity for people like you to make a massive difference. And so, yeah, I think you’ve picked a great career, and to all of our listeners who are, wherever you are in your data science career, I hope that you’re really excited about the impact that you’re getting to make in the world, too. All right, so starting to wrap up here, Sid. In every episode, I ask our guests if they have a book recommendation for us. Do you have one?

Sidney: 44:34

I think always anytime someone asks me about a good book it typically comes from something that I’ve read recently. Recently one of the favorite books, one of… Wow, English. One of the best books that I have read recently was actually Thus Spoke Zarathustra by Friedrich Nietzsche. I’m reading some of his other writing but it’s older and not as well written. Whether or not you like his philosophy, he’s just a great writer. So, it’s a really interesting book.

Jon: 45:07

You’re into the new Nietzsche, not the old Nietzsche.

Sidney: 45:10

Yeah, yeah.

Jon: 45:12

Only the freshest Nietzsche.

Sidney: 45:14

Of course.

Jon: 45:15

So, I don’t really know much about that book. I also know that it’s like an opera or it’s a famous piece of music, which I am familiar with. I can’t remember who it’s by, Strauss or something, but what’s the book about?

Sidney: 45:33

So, Nietzsche had this misrepresented philosophy. He actually never got to really finish and his sister was let’s just say possibly on the wrong side of history during World War II, and she controlled a lot of his writings. So, she proposed this philosophy sense as a leg up for their view of the world. But Nietzsche actually had this kind of philosophy of something called the Übermensch or basically the next evolution of man. And Thus Spoke Zarathustra, it’s kind of a piece of prose in which this character, Zarathustra, comes down from the mountain and decries, “God is dead,” this famous sentence in literature now. And then through a series of poetic stories begins to shape this idea of how humanity can become better or evolve in this post-religious world that Nietzsche saw himself in, in the late 1800s.

Jon: 46:45

Cool, sounds fascinating. So it is prose. It’s a story that conveys this philosophy. And yeah, sounds super interesting, as I’m sure a lot of his writing is. All right, so Sidney, this was a fascinating episode. We got to hear a lot about your early career, which I’m sure is inspiring to tons of listeners, options for getting started, ways to stay motivated, ways to stay engaged with the data science community. And so, I’m sure lots of people will be interested in following you. I guess the best way is probably to find you on LinkedIn.

Sidney: 47:22

Yeah, I am Sidney Arcidiacono on LinkedIn. I’m sure we can share a link or something so no one has to remember how to spell my last name.

Jon: 47:30

Absolutely.

Sidney: 47:31

Perfect.

Jon: 47:32

It will be in the show notes. That’ll be easier for me than trying to even read out the spelling of your last name.

Sidney: 47:35

Perfect.

Jon: 47:38

All right, thank you so much for being on the show, Sidney, and maybe we can have you on in the future to fill us in on how your career is evolving. I’m sure that the listeners would be into it because you’re an outstanding and engaging explainer of concepts. Thank you.

Sidney: 47:52

Thank you so much, Jon.

Jon: 47:59

What a cool and inspiring guest Sidney was. In this episode, we learned about why a full-time computer science degree may be the ideal starting point for a data science career, particularly if you don’t have any prior programming experience. We talked about traits and strategies to be a wildly successful early-career data scientist, like Sidney is, including discipline, consistency, and methodically growing your network in the industry. And we talked about why graph theory is cooler than you might have thought, and fascinating applications of graph convolutional networks for training models with graph data, including the Spektral and PyTorch Geometric Python libraries.

Jon: 48:41

As always, you can get all of the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URL for Sidney’s LinkedIn profile, as well as my own social media profiles at www.superdatascience.com/477. That’s www.superdatascience.com/477. If you enjoyed this episode, I’d of course greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel, where we have a friendly video version of this episode. To let me know your thoughts on the episode, please do feel welcome to add me on LinkedIn or Twitter and then tag me in a post. Your feedback is invaluable for figuring out what topics we should cover next.

Jon: 49:23

Since this is a free podcast, if you’re looking for a free way to help me out, I’d be very grateful if you left a rating of my book, Deep Learning Illustrated on Amazon or on Goodreads. You could give some videos on my YouTube channel a thumbs up, or you could subscribe to my free content rich newsletter on jonkrohn.com. To support the SuperDataScience company that kindly funds the management, editing, and production of this podcast without any annoying third party ads, you could create a free login to their learning platform at www.superdatascience.com or consider buying a usually pretty darn cheap Udemy course published by SuperDataScience, such as my mathematical foundations of machine learning course. All right, thanks to Ivana, Jaime, Mario, and JP on the SuperDataScience team for managing and producing another amazing episode today. Keep on rocking it out there, folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon.
