SDS 829: Neuroscience Fueled by ML, with Prof. Bradley Voytek

Podcast Guest: Bradley Voytek

October 22, 2024

Neuroscientist Bradley Voytek outlines to Jon Krohn the incredible use of data science and machine learning in his research and how recent discoveries about action potentials and neurons have propelled the field toward a new understanding of the brain and its functions. You’ll also hear what Bradley thinks is most important when hiring data scientists and about his contributions to Uber’s algorithms when the company was still a startup.

Thanks to our Sponsors:
Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.
About Bradley Voytek
Bradley Voytek is an associate professor in the Department of Cognitive Science, the Neurosciences Graduate Program, and the Halicioglu Data Science Institute at UC San Diego. He was the first Data Scientist at Uber. 
Overview
Neuroscientist Bradley Voytek outlines to Jon Krohn the incredible use of data science and machine learning in his research and how recent discoveries about action potentials and neurons have propelled the field toward a new understanding of the brain and its functions.
Bradley and Jon first discussed the breakthroughs at the Halicioglu Data Science Institute at UC San Diego. At his lab, which focuses on how brain regions communicate and how those patterns of information sharing might change with neurodegenerative disease, Bradley and his team discovered that the electrical signaling units in the brain may not be “binary” as previously thought. Bradley challenges the perception of “all-or-nothing binary signals” [06:23], pointing to new recording technologies that sample far faster and thus capture each action potential with 200 or more data points, rather than the roughly 20 that were once possible. He says the lab’s next step is to learn what the brain’s set of activation functions is and how artificial neural network architectures might apply these new discoveries.
These discoveries, Bradley notes, are largely thanks to the free sharing of data across organizations. He mentions the data they used came from the Allen Brain Institute in Seattle, which releases data for public use. Bradley emphasizes that open datasets are essential to the continued push for increased accuracy in testing and analysis.
It is always exciting to hear from someone who worked at a major tech player when it was still a startup, as is the case for today’s guest. After his PhD, Bradley worked for some time at Uber, where he focused on optimizing the company’s early ride-prediction algorithms. He outlines how important it is for data scientists to look beyond the data already gathered and consider additional variables that could impact ride demand. Everything from upcoming events in the city to weather patterns and city-specific demand was included in Uber’s algorithm, which helped make it the multinational transportation powerhouse it is today.
As a data science educator, Bradley also had a lot to share about curriculum structures. He believes that a good undergraduate and graduate education in data science means giving students hands-on experience and projects that they can include in their work portfolios. He also says that technical skills are the minimum requirement for jobs and that stand-out data scientists always bring creative problem-solving skills to the table and simply “think differently” when handed a dataset [1:25:09].
Listen to the episode to hear more about the libraries and software used at the Halicioglu Data Science Institute, how to navigate the multidisciplinary nature of data science, why Bradley is happy when others prove him wrong, and more about Jon Krohn’s PhD thesis!
In this episode you will learn:
  • Breakthroughs in brain region communication [04:08]
  • The future of brain research and MedTech [35:24]
  • The libraries and software used at the Halicioglu Data Science Institute [45:11]
  • Brain rhythm as a diagnostic tool [1:02:58]
  • Bradley’s curriculum structure at UC San Diego [1:12:21]
  • How Uber applies data science [1:20:07] 

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 829 with Dr. Bradley Voytek, neuroscience professor at UC San Diego. Today’s episode is brought to you by epic LinkedIn Learning instructor Keith McCormick, and by Gurobi, the Decision Intelligence Leader.
00:00:19
Welcome to the Super Data Science Podcast, the most listened to podcast in the data science industry. Each week we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple. 
00:00:50
Welcome back to the Super Data Science Podcast. Today we’ve got the absolutely extraordinary Professor Bradley Voytek on the show. Brad is a professor in UC San Diego’s Department of Cognitive Science, as well as the Data Science Institute, where he was a founding member. He’s also a professor in their Neurosciences graduate program, which is a big deal because UC San Diego may have the top neuroscience program in the world.
00:01:14
In addition to that academic background, another interesting fact is that Brad joined Uber as their first data scientist when it was just a ten-person startup, and he helped build their data science strategy and team. His public outreach work has appeared in outlets like Scientific American and NPR and at events like Comic-Con. It seems he is into amusing things, because he also co-authored the book Do Zombies Dream of Undead Sheep? That’s a fun one. Today’s episode has some brief exchanges that will appeal most to hands-on practitioners, but overall this episode should be fascinating to anyone.
00:01:54
In today’s episode, Brad details how large-scale data science and machine learning are accelerating neuroscience research. He talks about specific discoveries his lab has recently made that overturn nearly a century of neuroscience doctrine. He provides insights on structuring data science education to balance technical skills with creative, practical problem solving, and he shares lessons from using data science to optimize Uber’s early ride-prediction algorithms. All right, you ready for this gripping episode? Let’s go.
00:02:30
Brad, welcome to the Super Data Science Podcast. It’s awesome to have you here. How are you doing today?
Bradley Voytek: 00:02:34
Good, thanks Jon. It’s good to be back. 
Jon Krohn: 00:02:37
Nice. Yeah, you were on years ago with Kirill Eremenko, the founder of the show, who still runs the show around here behind the scenes, in case people are wondering, and we’re super grateful for that. He’s actually been on a lot of episodes this year, Brad, that were pretty good. I think the most popular episode of all time was one earlier this year where Kirill was the guest, talking about Transformers. He went into a crazy amount of detail for a podcast, an audio-only podcast.
Bradley Voytek: 00:03:05
Like technical detail for Transformers? 
Jon Krohn: 00:03:06
Technical detail for transformer architectures. Yeah. 
Bradley Voytek: 00:03:09
Isn’t it funny how much engagement deep technical communication can really get? There’s quite a number of YouTubers who are very good at describing this kind of stuff too, and they’re very popular. I don’t know, it’s very exciting to see that kind of stuff. Growing up as a kid, all we had was, like, Mr. Wizard and Bill Nye the Science Guy, which was pretty fun, but they didn’t go into deep technical details.
Jon Krohn: 00:03:31
Yeah, and I think part of what might’ve driven so many views on that is that because it was so technical, we did have people write and say, “I listened to it three times.” Or one person said, “I listened to it six times.” So maybe that’s the trick to getting downloads. 
Bradley Voytek: 00:03:45
Yeah. You just have to make people listen over and over again. That’s a good idea. 
Jon Krohn: 00:03:47
Exactly. You’ve got to make it really cryptic. 
Bradley Voytek: 00:03:49
We’ll go deep into the details of the Fourier transform in this episode, really get people into time series analysis. Nice.
Jon Krohn: 00:03:57
So we’re going to talk about first the overlap between the neuroscience research that you specialize in and all the data science approaches that you’ve brought to that. So you have a PhD in neuroscience and you’ve been a professor of neuroscience now for over 10 years. Your lab focuses on how brain regions communicate and how these communications change due to development, aging or disease. Do you have any particularly surprising or maybe even counterintuitive findings from your research that you could relay to us in this audio only format? 
Bradley Voytek: 00:04:31
Yeah, we have some pretty fun new stuff that we’re actually presenting in five days at a conference in Chicago, the Society for Neuroscience conference. Some 33,000 people attend this conference every year; it’s massive. Everybody comes in from all over the world, all of these neuroscientists trying to solve the brain. And one of the PhD students working in my lab, Blanca Martin-Burgos, is presenting some of our new research, which we’re hoping to send out for peer review before the end of the year, on action potentials. So, for those of you not familiar, you’re probably at least passingly familiar with the idea that we’ve got neurons in the brain, the brain cells, and that they communicate using electrical signaling. There are about 86 billion or so of these neurons in this just crazy, messy, noisy biochemical-electrical soup, and somehow that all gives rise to everything that we think and feel and do.
00:05:32
And the architecture, or presumed architecture, of the brain has given rise to artificial neural networks; they’re inspired by the brain’s neural networks, which is why they’re called artificial neural networks. And that’s given rise to the modern deep learning AI revolution. And the presumption in the field is that these electrical signaling units in the brain are binary, all-or-nothing spikes, codified into what’s referred to as the all-or-nothing law. These neurons communicate using these binary signals, and the computational neuroscience community picks that up and says, “Well, we can convert these into binary codes, and this must be how the brain operates.” And what Blanca is showing in our research is that that’s probably not true. They’re probably not all-or-nothing binary signals.
Jon Krohn: 00:06:27
No kidding. That is wild. 
Bradley Voytek: 00:06:30
If you zoom in enough, they look binary, because we sample them at 20,000 samples per second in order to record them, and these signals last about one millisecond. If you just do the quick arithmetic, that means that there are approximately 20 data points, more or less, that give rise to the sort of spike shape that we see. And people then just say, “Well, because they’re all or nothing, we’ll just keep track of the time stamp of when each one occurred and save a lot of hard drive space, so we don’t have to record 20 to 40,000 samples per second.” Newer technologies, though, allow us to oversample in the temporal domain, so we can sample at 200,000 samples per second.
00:07:09
And when you do that, these one-millisecond-long action potentials now have 200 or more data points. And when you do that, you actually see that there’s variability that is not random but seems to be, at least according to our work, systematic. So action potentials exponentially rise very quickly in time, I mean we’re talking about fractions of a millisecond, and then exponentially decay, and that’s dominated by known biophysical properties: ion channels open, ion channels close, and those cause the flow of currents that gives rise to the shape of these action potentials. But if an action potential happens to decay faster, then it resets more quickly, and what’s called the refractory period allows it to signal again faster.
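For anyone who wants to check the arithmetic Brad describes here, this is a minimal back-of-envelope sketch in plain Python (not code from his lab):

```python
# How many samples land inside a ~1 ms action potential at each sampling rate?
spike_duration_s = 1e-3                  # action potentials last roughly one millisecond

for fs_hz in (20_000, 200_000):          # conventional vs. oversampled recording rates
    points_per_spike = fs_hz * spike_duration_s
    print(f"{fs_hz:>7,} samples/s -> ~{points_per_spike:.0f} points per action potential")

# 20,000 samples/s -> ~20 points; 200,000 samples/s -> ~200 points
```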
00:07:53
And so, each individual action potential’s waveform seems to be not binary but governed by the type of inputs that it’s receiving. And that resets the time at which the next action potential can occur, so we can predict when the next action potential is going to occur based upon the shape of the current spike. And if that is true, then we have to rethink what some of the computational and neural codes the brain uses are. These aren’t binary codes; they’re actually more analog codes. And to us, the most exciting thing is that spiking neural networks, which are meant to mimic the brain and be more biologically plausible, don’t actually perform anywhere near as well as more modern deep learning architectures that rely on neurons with non-binary activation functions, like ReLU and GELU and all that stuff.
Jon Krohn: 00:08:42
Exactly. 
Bradley Voytek: 00:08:43
But it appears the brain doesn’t have binary activation functions either. The neurons in the brain actually appear to also be non-binary. And so then, the next thing we’re trying to figure out is whether we can learn what the actual set of activation functions the brain is using is and build those into artificial neural network architectures to make them more biologically inspired. There’s a zoo of activation functions that artificial neural networks use, and they’re mostly ad hoc; people throw things together and see what works. I think there’s a review paper that came out recently on arXiv cataloging something like 200 to 400 activation functions that are being used. And it’s just, we use what works. But if we can instead figure out what the set of biologically inspired activation functions that neurons use looks like, can we then build those into artificial neural network architectures, and does that improve how they learn? Does it improve energy efficiency, and so on? So anyway, that’s the big new exciting area of work that’s going on in the lab, and yeah, hopefully look out for it in peer review over the next couple of months.
Jon Krohn: 00:09:52
That is exciting. Let me try to recap back to you some of what you just said. I have now been teaching for many years and I also did a neuroscience PhD, and so to me, I took that law as law: that this was an all-or-nothing action potential and that they’re all the same. And I guess it isn’t surprising that when you zoom in more closely, with this high temporal resolution, you can in fact see that there is some variance in these action potentials. And the whole time that you were describing that, it made me think of how, in modern neural networks, unlike the original designs from the 1950s, like the perceptron that only had a binary output.
00:10:34
How we figured out over time, and when I say we I mean standing on the shoulders of giants, people before I was born figured out, that you could have a sigmoid activation function, which is continuous, or tanh, or now, as you said, ReLU, which is very popular these days, and derivatives of that. And yeah, all of those have a range of values. You can move continuously across them; some are unbounded, even. And so yeah, I guess it would make sense that, if neural networks learn so much better when they’re unbounded like that, it shouldn’t be too surprising that the brain’s neurons actually do have that kind of variance, that kind of subtlety. Yeah, it makes a lot of sense.
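To make the contrast concrete, here is a small, generic sketch (standard textbook definitions, nothing specific to Brad’s work) of a binary step output versus the continuous activation functions mentioned above:

```python
import numpy as np

def step(x):
    """Perceptron-style all-or-nothing output: 0 or 1 only."""
    return (x > 0).astype(float)

def sigmoid(x):
    """Continuous and bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Continuous and unbounded above."""
    return np.maximum(0.0, x)

def gelu(x):
    """Smooth ReLU variant (tanh approximation used in many transformer models)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-3, 3, 7)
for name, fn in (("step", step), ("sigmoid", sigmoid), ("relu", relu), ("gelu", gelu)):
    print(f"{name:>8}: {np.round(fn(x), 3)}")
```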
Bradley Voytek: 00:11:21
Yeah, there’s a lot of opportunity here. And so I don’t come from a single-neuron recording background; those are the physiologists that do that. And actually, if you go into the research of the people who do these single-neuron recordings, not the computational neuroscientists but the physiologists who are really trying to understand the physiology of action potentials, it’s apparently very well known in that community that action potentials aren’t actually binary. But for whatever reason, the community and circles that I tend to live in, which are the computational, theoretical neuroscience side, it’s like the spherical cow: we don’t want to worry about all the details. We just say, “It’s close enough to binary, so let’s use that.” And from that, we derive these different kinds of neural codes, rate codes and temporal codes and population codes, but all of those rely on just the timing of when spikes occurred and how many per unit time.
00:12:19
But the reality, according to the physiologists, is not that simple. Unsurprisingly, biology is never that simple, I guess. And it’s mostly been presumed that the variability that we see in the shape of these action potentials is more or less just random noise, just biological noise. But what we’re finding is that it is not: it is systematic, and it varies systematically as a function of the style, the nature, of the inputs that the neuron is receiving. And so yeah, there’s a lot of opportunity to start playing around with new ideas of, “Well, what kinds of other neural codes are possible if we don’t assume this binary, all-or-nothing paradigm?” And I don’t know, I don’t have an answer yet. I mean, this is all very new for the lab, but it’s been very exciting for us because it really has been a collaboration between many different fields.
00:13:12
And so we wouldn’t have been able to do this if different organizations didn’t share their data freely. So the data that we have that’s oversampled at 200,000 samples per second, we can collect that now through our collaborators here at UC San Diego, but the data originally came from the Allen Brain Institute in Seattle, which is a nonprofit scientific organization that releases some of its data for free to the public. And so we had this idea that maybe they’re not binary; how could we test it? We used some datasets from our collaborators and the sampling rate was too low, so we just went hunting for different data. The Allen Brain Institute had made their data available. We took a look at that, and now we have this whole new path in our lab. And so it’s really just this amazing opportunity of leveraging open datasets that other people have freely shared, which allows us to try and play with new ideas.
00:14:05
And so my lab, on the neuroscience side, is very heavily reliant on data-sharing practices. We think of ourselves more like a theoretical neuroscience lab, akin to theoretical physics. We come up with ideas on the whiteboard and in code and play around with them, and then we hunt for data that could verify or disprove our theories and hypotheses, but we don’t do a ton of data collection ourselves. We do a little bit, but we’re not really set up for it. We’re not a wet lab. People come and ask to visit my lab to see what’s going on and think it’s going to be really cool, and it’s like 10 of us nerds sitting at our computers and laptops in one room. It’s not an amazing-looking, sciencey-seeming space of test tubes and beakers; it’s just nerds with laptops.
Jon Krohn: 00:14:52
Keith McCormick, the data scientist and prolific LinkedIn learning author is giving away a course called “Executive Guide to AutoML” and he’s giving it away exclusively to Super Data Science Podcast listeners. Nearly every ML platform has some support for AutoML, but there is both confusion and debate about which aspects of the ML pipeline can be automated. With this course, you’ll learn how to automate as much as possible and how to explain to management what can’t be automated! You may know Keith from episodes 628 or 655. Be looking for his return on an upcoming Friday episode. You can access his “Executive Guide to AutoML” course by following the hashtag #SDSKeith on LinkedIn. Keith will share a link today, on this episode’s release, to allow you to watch the full new course for free.
00:15:41
That makes a lot of sense to me. I mean, that’s how I ended up in data science: I was doing a neuroscience PhD and my colleagues, other people in the same cohort as me in the same program, were learning skills like how to put a recording electrode in the brain of a ferret or how to grow a cell tissue culture. And I was like, “Hmm, if I don’t want to stay in academia after this PhD, those are not super transferable skills.”
Bradley Voytek: 00:16:06
Yeah, exactly. 
Jon Krohn: 00:16:07
And so I thought, “Well, if I focus on computational statistics and machine learning, I guess I could call it programming, but it’s really just scripting.”
Bradley Voytek: 00:16:17
Yeah, yeah. I guess that’s true. Well, I should defend ourselves a little bit. We do actually do programming, we write open source Python packages. We actually do that, I guess, but a lot of it really is scripting to be technically accurate. 
Jon Krohn: 00:16:30
I was only speaking for myself. I know that we have listeners out there that really do write computer programs. I’m not one of them. 
Bradley Voytek: 00:16:37
What was your PhD actually research in? 
Jon Krohn: 00:16:40
Oh, thank you for asking. It was on the genetic correlates of fear-related behaviors. So we had a dataset from mice, what are called heterogeneous stock mice. We started with eight purebred strains of mice. Listeners probably wouldn’t know, though you obviously do, that lab rats or lab mice, probably most lab animals, are inbred so much that they’re very similar genetically, so that you don’t have genetic variance causing variance in whatever you’re measuring.
Bradley Voytek: 00:17:17
Very well controlled genetically. Yeah. 
Jon Krohn: 00:17:19
Exactly. But for my PhD, well, the heterogeneous stock mice weren’t created for my PhD; they were created in my lab. I was working with Jonathan Flint, who’s now at UCLA, and Jonathan had led the development of these heterogeneous stock mice. So you took eight of these purebred strains, and actually something like six of them were lab mice, and then they found two wild mice, literally from a field, that I think they added into the mix for extra variability.
Bradley Voytek: 00:17:57
So just to try and get some more variability just to check? 
Jon Krohn: 00:18:01
Yeah, exactly. 
Bradley Voytek: 00:18:01
Okay. All right. Interesting. 
Jon Krohn: 00:18:02
And then they interbred them for several generations and ended up with this cohort of 2,500 mice with genetic variation, which could then be subjected to any number of tests. So for each one, we collected what are called phenotypes, which obviously you know, Brad, but for our audience, phenotypes are traits that can be measured outwardly about an animal. So for a human: how tall you are, how heavy you are, what your calcium levels are. But you can also do things with mice that you can’t do with humans, like measure the gene expression in particular tissues. So we had gene expression readings from the brain, from the lung, and from the liver. And so a lot of what my thesis was about was taking those data and developing and applying computational statistical or machine learning techniques to analyze them in new ways, for example, to find causal patterns. So you could say-
Bradley Voytek: 00:19:12
Causality is hard. Causality is really hard.
Jon Krohn: 00:19:15
But genetics makes it easier because, as far as we know, there’s no mechanism by which genes could be systematically changed by some trait of the animal. And so you can use the genetic information as an instrumental variable, to use the technical language. You can think of it in the same way that an experimenter can be instrumental in assigning conditions. For our listeners, a common example would be A/B testing, where you have two different designs of your product, A and B, and you as the experimenter can decide, either randomly or by some other process, to put some of your users in group A and some in group B. And so with these instrumental variables like genetics, you consider them to be like that: you assume that they cannot be causally impacted by any other factors in your experiment or in your data.
00:20:15
Yeah. So then you could say, “Okay, I know that cholesterol levels in these mice are correlated with the expression of some gene in the liver.” It is easy to measure that. Okay, so we have the correlation between these two things. They’re highly correlated, but you can’t infer causal direction from that correlation alone, except possibly in the presence of genes, because you could say, “Well, what if I then condition that correlation between cholesterol and the expression of that gene on whether a particular gene is one variant or the other?” Then you can get a sense of which way the correlation goes: whether the high gene expression causes the cholesterol, or the other way around.
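As an illustration of the instrumental-variable logic Jon sketches here, this is a toy simulation with made-up numbers (not his actual thesis analysis): a two-stage least-squares estimate recovers the causal effect even when a hidden confounder biases the naive correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

genotype = rng.integers(0, 3, n).astype(float)   # 0/1/2 copies of a variant allele (the instrument)
confounder = rng.normal(size=n)                  # unobserved factor affecting both variables

expression = 0.5 * genotype + confounder + rng.normal(size=n)            # liver gene expression
cholesterol = 2.0 * expression + 3.0 * confounder + rng.normal(size=n)   # true causal effect = 2.0

def fit_line(x, y):
    """Least-squares fit of y ~ intercept + slope*x; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs, X @ coefs

naive_coefs, _ = fit_line(expression, cholesterol)   # biased upward by the confounder
_, expr_hat = fit_line(genotype, expression)         # stage 1: instrument -> exposure
iv_coefs, _ = fit_line(expr_hat, cholesterol)        # stage 2: fitted exposure -> outcome

print(f"naive slope: {naive_coefs[1]:.2f}   IV (2SLS) slope: {iv_coefs[1]:.2f}   truth: 2.00")
```

Because the genotype cannot be caused by cholesterol or the confounder, the second-stage slope isolates the direction of effect from expression to cholesterol.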
Bradley Voytek: 00:21:03
Well, and plausibly, if you have a good enough target, you could then do gene knock-in and knock-out experiments too. If you specifically target knocking that gene into the animal, you should be able to come up with a strain of mice that have higher cholesterol levels, or vice versa, knock it out and hopefully improve the overall cholesterol. Yeah, that’s great. That’s pretty fascinating. I mean, that’s really difficult molecular-biology-style work, genetics work, that I didn’t have any training in. I guess it’s one of the things that makes biology, and neuroscience in particular, so fun but also so difficult: nobody can be trained in everything. And not only can nobody be trained in everything, but we often don’t even know how to speak each other’s languages. If I go to a neurogenetics talk next week at the Society for Neuroscience Conference, I guarantee I will not understand 75% of the talk. And that’s a very amazing but frustrating part of the job: you can’t know it all.
00:22:09
There’s so much to know that you go into these conferences and you realize how far we still have to go before we figure out the brain, which is good for job security and good for scientific exploration. It makes it very exciting. There’s still a lot of opportunity, but it can be demoralizing if you are more on the pessimistic side. I find it to be very exciting. I think there’s a lot of opportunity, like I said, but it is so overwhelming how much… And that’s where I think the data science really comes into play on the neuroscience side: okay, I can’t be a neurogeneticist and a computational neuroscientist and a cognitive neuroscientist and a single-cell physiologist. You can’t be all of those things. And so, is there a way to start pooling and collecting and synthesizing all of the different facets that we know about the brain into something that is comprehensible to any one specialist in the subdomains? And I think that’s where the future of AI, the generative AI complement to research, will end up going.
Jon Krohn: 00:23:25
Exactly. No one human can do it, but some o1 model or something like that from OpenAI, maybe o3 in a few years, could just do it.
Bradley Voytek: 00:23:38
A really smart collaborator, that’s all I really want from GenAI: just to have a really smart collaborator to bounce ideas off of, but one whose answers I don’t always immediately trust. I think, “Are you sure about that? Is that right? I’m going to go double-check that.”
Jon Krohn: 00:23:51
Yeah, a very creative collaborator who knows a lot from a lot of different fields. And it is interesting how, when I was an undergraduate student studying neuroscience, the textbooks, even third-year textbooks on neuroscience or neurogenetics, seemed so crisp, and it seemed like these were such well-known things that could be well studied and well understood. You get these great illustrations from publishers like Pearson; everything’s so beautiful and so easy to understand. You get quizzes that reinforce concrete concepts, and everything seems so solid. And so I thought, when I was starting a PhD in neuroscience, that what I was going to be doing was what you said is impossible, which I later learned is impossible. But it’s amazing how, I guess when I was 21 or 22, I genuinely thought that, starting my PhD, I would come out, whatever, five years later and know all of neuroscience like-
Bradley Voytek: 00:24:57
Yeah. 
Jon Krohn: 00:24:58
And so first week of neuroscience PhD, sit down in the library with Principles of Neural Science from Kandel and Schwartz. 
Bradley Voytek: 00:25:05
I was just going to name-drop that book. I did the same thing in my PhD. I’m like, “I’m going to sit down and read every chapter of Principles of Neural Science, and then I’ll know everything there is to know about the brain.” 
Jon Krohn: 00:25:16
Yeah, and- 
Bradley Voytek: 00:25:17
And it didn’t happen. 
Jon Krohn: 00:25:19
No, and it turns out that a PhD is almost the opposite of knowing everything about a field. You just become really narrow… I mean, there’s different approaches, but for the most part, with a PhD you’re becoming really narrowly specialized. 
Bradley Voytek: 00:25:33
And you become aware of what you don’t know. I mean, it’s sort of a cliche almost at this point, but the more you learn, the more you know that we don’t know, right? If you go onto Reddit or a blog or anything where somebody who’s not a specialist in neuroscience is talking about the future of neuroscience and AI, and people are like, “Oh, we’re going to solve the brain in 10 years.” I’m like, “I don’t even know what solving the brain means. What do you mean by that? What does the solution to the brain even look like?” Right? 
00:25:57
Then you start getting very academic about it, but at the end of the day, there isn’t really an answer. What does that mean? Does it mean knowing every single neuron, every single atom, every single… Can you predict every single facet of a person’s life going forward? What does knowing mean? I often appeal to something called the OpenWorm Project. Now, I’m not very familiar with all the details, but there’s a very simple animal model that’s used in neuroscience, C. elegans, and it has a known number of neurons, I think it’s like 172, and the animal has like a thousand. I’m going to have to look this up; I’m going to fact-check myself in real time and know that I’m getting it wrong. But anyway, it’s a known set of neurons, each of which develops at a very specific developmental time point.
00:26:44
Every neuron has a name. We know the whole connectome of the entire animal. How all the neurons connect to each other at what points in what period of time. And the OpenWorm project is basically saying, “Okay, we know all of this. We know the entire connectome of the animal, how all the neurons connect to each other. We know what the identity of the neurons are, everything.” Now, we should be able to simulate this very simplistic animal, like the behaviors of movement and things like that. 
00:27:08
And this has been going on for more than a decade, and they still can’t do it. So we’ve got a couple of hundred neurons in this animal, and we can’t even figure out how this animal works. And very famously, there’s another neuroscientist, Eve Marder, who’s been studying a group of neurons, the stomatogastric ganglion, in what is basically a crayfish. It’s a group of a dozen neurons that just controls the movements of the gut in this crayfish. And she’s been studying this for decades and can’t figure out just how these dozen neurons manage to give rise to this behavior. And so, when I hear people, at least on social media or public talk shows or whatever, saying, “We’re going to solve the brain in 10 years,” I’m like, “We haven’t solved these 12 neurons in the gut of a crayfish in 30 years. We’re not going to solve the 86-billion-neuron mess that is the human brain anytime soon.”
00:28:07
And I don’t want to be pessimistic; like I said, I think there’s a lot of opportunity. But the 1990s were declared the Decade of the Brain by President Bush because solutions to mental health were supposedly just around the corner with all of the new brain imaging technologies. That was 30 years ago, and brain imaging still doesn’t really have any clear potential for being used to diagnose different mental health disorders. We still rely upon clinical observation in order to diagnose ADHD and depression and anxiety, and all these kinds of things. There is no brain scan that does it. We’ve got brain cancers we can detect with brain imaging, and epilepsy. That’s what we’ve got, and those have been doable for decades. Everything else is still a shrug and some confusion. So we’re a long way off. But like I said, that means that we have a lot to figure out still, and that’s kind of the fun part of science.
Jon Krohn: 00:29:08
In a recent episode of this podcast, the mathematical optimization guru Jerry Yurchisin joined us to detail how you can leverage mathematical optimization to drive commercial decision-making, giving you the confidence to deliver provably optimal decisions. This is where Gurobi Optimization comes into play. Trusted by most of the world’s leading enterprises, Gurobi’s cutting-edge optimization solver, lightweight APIs, and flexible deployment simplify the data-to-decision journey. And, thankfully, if you’re new to mathematical optimization approaches, Gurobi offers a wealth of resources for data scientists, including hands-on training, comprehensive Jupyter-notebook examples, and extensive, free online courses. Check out Episode #813 of this podcast to learn more about mathematical optimization and all of these great resources from Gurobi. That’s Episode #813. 
00:29:57
Before I had my illusions shattered about what I could do in a neuroscience PhD, I thought I wanted to be a psychiatrist, probably for reasons similar to yours: the way that you started off this episode, talking about how fascinating it is that this structure, this mess of liquids and chemicals, just somehow creates everything that we think and do. Everything that we feel somehow happens in that blob, that organ, and it’s really mysterious. And I found that really fascinating. And so I thought that psychiatry would be a place where I could learn about these things and also make an impact. And I volunteered in a psychiatric ward at a hospital where I grew up in Canada and quickly learned what maybe not everyone in the audience knows, which is that in psychiatry, I don’t think there’s a single drug, not a single psychiatric solution, where it’s known how it works.
Bradley Voytek: 00:30:57
Correct. Yes. Saying there’s not a single one feels like it should be too strong, and I try not to broker in absolutes, but I think that’s true. I don’t think we actually know how or why any of these psychiatric drugs work. And I would love somebody to come into the comments later on and say, “Well, actually, we do know X, Y, or Z,” and I’d be like, “Okay, that’s wonderful.” But take lithium, which is just an element and really does a great job for most people with bipolar disorder, and nobody has any clue why in the world it works. And forget about all the other more complicated pharmaceuticals; I mean, we’re talking about a single element, and it kind of works and it saves lives. So yeah, I had the same kind of experience as you. I thought I wanted to be a clinician, or at least something clinically adjacent, like a psychologist.
00:31:54
Actually, initially I wanted to be an astrophysicist, but I failed out of that program. When I switched to psychology, ultimately, I decided I maybe wanted to go into the clinical aspects, and I worked at a board-and-care home for people with severe schizophrenia and bipolar disorder for a summer. And that was an interesting job, because these are people who were in a full-time facility, and most every single day was just me hanging out with them, playing basketball and smoking cigarettes, because they all chain-smoke like crazy. 99% of the time, everything was okay. And then every now and then, something really surprising and shocking would happen, and I’d have to try and take care of it and then just go back to playing basketball and sitting and chatting with them. And that was a really eye-opening experience, because it’s intense. It’s an intense environment.
00:32:48
And when I thought about going to medical school and going into something like neurology, I had the long conversation with my wife after my PhD when I was considering this, and she basically was like, “You don’t have the constitution for this. You cannot be a physician. Just imagine the first time you’re working with a patient and something bad happens, they die because of a decision that you made, which was still the right call. What do you see your reaction to that being?” And I was like, “Yeah, I don’t think I could handle it. I really don’t think I could.” And so, I never let go of the idea of wanting to try and help people, improve quality of life using the scientific method, but the clinical approach wasn’t for me. 
00:33:37
And so, the sort of gambit that I ended up relying on in order to keep motivation for myself and my career was this: if I become a scientist, I could spend an entire scientific career and discover nothing of any consequence. But in science, one small discovery, one small breakthrough, can have ripple effects that improve millions of people’s lives. And that’s kind of how the entire scientific process works, just building on top of previous findings. And so, I like that idea of maybe never directly contributing anything myself, like, “I’ve created a new device that then cures X, Y, and Z,” right? I don’t think that’s actually where I’m going to go, but maybe I can contribute to the body of knowledge that then, a generation or two down the road, helps people figure out what that device could be. And so, to me, that was what I held onto in terms of, “Why am I in this field, and where do I see my part being played in medicine and science?”
Jon Krohn: 00:34:40
Nice. That’s a great story, an interesting story. And yeah, it does parallel mine in many ways, but I’m going to move on to kind of a different topic. I’m going to segue here. It’s related to what we were just talking about. So we were talking earlier about how we can definitely solve the brain in the next decade. I mean, you said it, I heard it, and no, so I know we’re not going to be able to solve the brain, but data science, particularly, I think things like LLMs, we touched on this a little bit, will, I think, be able to accelerate discoveries in a lot of different fields, including neuroscience. I mean, do you agree? Do you think that there are emerging data science technologies or methodologies that could accelerate our understanding of the brain in the coming 10 years? 
Bradley Voytek: 00:35:33
Oh, for sure. I mean, it’s almost a given that it has to and will, right? It’s like saying, “Do you think calculators will accelerate science?” Yes. Do you think search engines are going to… I can’t even imagine running a research lab without search engines. Just the rate at which I can quickly and easily discover information has a… Just a huge impact on the way that everybody does science. So, Google is probably one of those transformative aspects of science in the last 100 years. It’s significantly shaped the way that we are able to find and retrieve information that allows us to then continue to build science and do research better, more accurately, and faster. And so, I think LLMs are going to be something similar, right? Yes, there are so many problems with the current iteration of LLMs hallucinating and things like this, right? 
00:36:31
But they do… you can see the glimmer of where the future will be. And so, just to give a concrete example: when I was doing my PhD, I was looking specifically at the effect of very focal brain lesions in the prefrontal cortex or the basal ganglia, two interconnected structures in the brain that are known to be involved in higher-level cognition. If somebody has a stroke that damages one of these brain regions, what impact does that have on their memory functions? That’s what I spent my PhD doing. At the start of my PhD 20 years ago, in 2004, in my naivete, I believed that there must be some kind of website that I could go to where I could click on the prefrontal cortex on an image of a brain and get a listing of all the inputs and outputs of that brain region.
00:37:19
That didn’t exist, and it still doesn’t exist. Just very frustrating. And that ultimately led to a project years later, at the end of my PhD, that my wife and I published together. This issue frustrated me and stuck with me for so long because, instead of having an easily discoverable mapping of the inputs and outputs of these different brain regions, I had to go into the archives at UC Berkeley, where I did my PhD, and dig through peer-reviewed papers published in the 1970s, where they did all these anatomical tracing studies, to try to figure out what the inputs and outputs to these brain regions were. I was at a conference on a panel at Stanford in 2010 or so, with quite a number of names that your listeners will probably be familiar with, senior, eminent people in AI and neuroscience.
00:38:16
And somebody asked a question on the panel, and I answered by saying that the peer-reviewed neuroscience literature probably already knows the brain. There’s something like 3 million peer-reviewed neuroscience papers that have been published and are indexed in PubMed, which is the National Library of Medicine’s, the NIH’s, database of peer-reviewed biomedical research. If we could tap into all of that knowledge, we would probably be 50% farther along in neuroscience, but we as humans are limited in how much we can read and synthesize. And one of the faculty members, a sort of giant of the field, basically said, “That’s really dumb.” And I was like, “I’m pretty sure I’m right about this.” And so, back in 2010, my wife and I did a proto-NLP project, where I just did… well, I should say she wrote the Python code to scrape all of the text out of the abstracts of all of these papers and just look at co-occurrences of words and phrases, with the hypothesis being that the more frequently two ideas are discussed in the peer-reviewed literature, the more likely they are to be related.
00:39:24
So, very simplistically, if a paper is written about Alzheimer’s disease, it tends to also talk about memory, because Alzheimer’s disease has a significant impact on memory. It might also mention tauopathies, which relate to one of the mechanisms by which we think Alzheimer’s disease manifests. But papers that talk about Alzheimer’s disease are less likely to talk about bradykinesia, which is slowed movement, something much more commonly observed in Parkinson’s disease. And so, by looking at the word frequencies and co-occurrences, very simplistic NLP, proto-NLP, we built a knowledge graph of neuroscience.
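Here is a toy sketch of that proto-NLP idea (the abstracts and term list below are invented for illustration; the published project ran over millions of PubMed abstracts):

```python
from collections import Counter
from itertools import combinations

terms = ["alzheimer", "memory", "tauopathy", "parkinson", "dopamine", "bradykinesia"]

abstracts = [
    "alzheimer disease impairs memory and is linked to tauopathy",
    "tauopathy burden predicts memory decline in alzheimer patients",
    "parkinson disease involves dopamine loss and bradykinesia",
    "dopamine replacement reduces bradykinesia in parkinson disease",
]

# Count how often each pair of terms appears in the same abstract; these counts
# become edge weights in a term-by-term knowledge graph.
cooccurrence = Counter()
for text in abstracts:
    present = sorted(t for t in terms if t in text.lower())
    for pair in combinations(present, 2):
        cooccurrence[pair] += 1

# The strongest edges cluster the Alzheimer's terms separately from the Parkinson's terms.
for pair, weight in cooccurrence.most_common():
    print(pair, weight)
```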
00:40:05
And so, this was a paper that we published in 2012. We did the project in 2009, 2010, and it was a pain in the ass to publish. We built this knowledge graph, and then we could find clusters in the graph, and we went to publish this paper, and we’re like, “Hey, look, we can naturally, from natural-language, free-form, peer-reviewed text, discover clusters of topics that are interrelated. For example, Parkinson’s disease is highly clustered with dopamine, the neurotransmitter, and with neurons in the substantia nigra, which are the dopamine neurons that die off in Parkinson’s disease and give rise to motor tremors and bradykinesia.”
00:40:43
And this is naturally discovered just through text co-occurrences. And the peer reviewers said something like, “Yeah, we know these things.” And it was like, “Yeah, I know that you, as an expert who has read Principles of Neural Science and has been studying neuroscience for 20 years, know, but now math knows. Isn’t that amazing?” And back in 2010, people weren’t really buying it. And now, I think we’re in an era where we can do that same thing. In my lab, we’re trying to build that same thing right now, actually, but two orders of magnitude more sophisticated. So, we actually are building that site right now where you can go click on the prefrontal cortex or whatever brain region, and it is built on everything we know about the brain from publicly available datasets of human brain imaging. So the Allen Brain Institute has a database of gene expression in the human brain.
00:41:37
There are about 20,000 or so different genes that are differentially expressed across the human brain, so we pull that dataset in. And then there’s another dataset of neurotransmitter densities based on positron emission tomography, so we pull that dataset in, and this other dataset in, and so on. And this has already been done by collaborators up at McGill University; Bratislav Misic is the lab head there. They created an open source Python package called Neuromaps, and I think Ross Markello and Justine Hansen are the first authors on the Neuromaps paper published a couple of years ago, where they did all the legwork of actually going out and pulling in all these publicly available datasets. And so, what we’re doing right now in the lab is building a brain viewer that collates all these different datasets in the browser, so you can click on an arbitrary brain region and get a listing of everything we know about that part of the brain.
00:42:25
And the next step that we’re trying to build on top of that, with an industry collaborator who cannot be named yet because it’s not formalized, is an LLM chat window alongside that browser, where you can then say, “Show me the hippocampus,” and the LLM will pop up and illustrate on the screen, in this sort of dynamic brain viewer, where the hippocampus is. And then you can say, “Give me a listing of the top 10 genes that are most strongly expressed in the hippocampus uniquely compared to other brain regions.” Then you can ask, “What are the primary inputs and outputs?” And it’ll show the primary inputs and outputs.
00:42:57
So we’re trying to build a brain discovery engine that is LLM powered, that is trained on these peer-reviewed papers and these open data sets so that you can do better neuroscience discovery so that we’re sort of dissolving the boundaries between, like I said at the beginning of the podcast, the neurogeneticists who don’t know anything about theoretical neuroscience, who don’t know anything about neuroanatomy, and trying to dissolve all those boundaries to bring all these different data sets together in one easy-to-digest platform. So that’s honestly still probably a couple of years away, but we’re prototyping it right now. 
Jon Krohn: 00:43:34
I was going to ask, is there a name or a URL? Sounds like there’s not one yet.
Bradley Voytek: 00:43:38
Don’t even have a good name yet. No, no. It’s very much in the infancy, but it’s looking promising. 
Jon Krohn: 00:43:45
Did you know that the number one thing hiring managers look at are the projects you’ve completed? That’s why building a strong portfolio in machine learning and AI is crucial to your success. At Super Data Science, you’ll learn how to start your portfolio on platforms like Hugging Face and GitHub, filling it with diverse projects. In expert-led live labs, you’ll complete an exciting new project every week. Plus, through community-driven projects, you’ll tackle real-world, multi-week assignments while working in a team. Get hands-on experience with projects like retail demand forecasting, building an AI model from scratch, deploying your own LLM in the cloud and many more. Start your 14 day free trial today and build your portfolio with www.superdatascience.com. 
00:44:26
That sounds really exciting. I mean, I can imagine that that will work. We are at the point now where those kinds of magical things are exactly the kinds of things we can do with LLMs. It is staggeringly mind-blowing to me on a daily basis what I can get these to generate, and in a way that is helpful to me. So that is really exciting. To go into the nuts and bolts a bit, of not just that project but your lab in general, and maybe there isn’t a good answer to this because, in a lab like yours, you’re probably less monolithic in terms of the software tools, research methodologies, or data science approaches that you use relative to, say, a company. But do you think there are some generalizations that you could give to our listeners around the kinds of tools that you use? What programming languages do you use in your lab? Are there particular approaches that get used more often? Just to give us a coloring of, when somebody is doing the kind of exciting research that you’re doing in a university lab, what kinds of libraries, what kind of software are they using?
Bradley Voytek: 00:45:37
We are a very Python-heavy lab. I actually became a Python developer, but I didn’t write a piece of Python code until after my PhD. That project that I was referring to with my wife, Jessica, that we published: she’s a Python developer who most recently worked at Amazon, and I was a MATLAB programmer during my PhD, because that’s kind of what you did in the sciences back in the 2000s. And she challenged me to a code-off on that project when I was sort of complaining. I’ve told the story a couple of times, but we were watching reruns of Star Trek: Deep Space Nine one night, as super nerdy programming couples do, apparently. And I was like, “Hey, I think I figured out an algorithm for this brain NLP thing.” And she asked me what it was, and we whiteboarded it out a little bit, and she’s like, “I could totally code that faster than you in Python.”
00:46:29
And so, I started writing it in MATLAB. She started doing it in Python. She finished that night, had it working, and had the prototype built. And I was like, “Okay, that was pretty impressive.” And so, then I became a Python developer, and we’re primarily operating in Python. My lab’s GitHub repo is just github.com/voytekresearch, all one word, no underscores or anything. And you can see we’ve got, I don’t know at this point, dozens if not hundreds of repositories, because for every research paper that we publish, we try to have an open repository associated with it, where we’ll write Python packages. So my lab has developed three open source Python packages. FOOOF, which stands for Fitting Oscillations and One-Over-F and is being renamed to specparam, is a Python package for parameterizing neural power spectra. It has since been adopted by fusion research labs and astrophysics groups to parameterize the spectral data that they’re getting off of those different sources.
00:47:33
So we’re renaming the package to make it more generalizable, because this is like an open source dream: we wrote code to analyze neural data, and now it’s being adopted by scientists all over the world for other things where the math also just works. So we’re very proud, because the code just works, which is great. So there’s that specparam package, and then bycycle, which does cycle-by-cycle analysis of neural oscillations and is also apparently being used in industry for other rhythmic time series analysis. And then NeuroDSP, which is neuro digital signal processing. So the vast majority of what my lab does is Python-based work around time series analysis: how do we discover patterns in data? But it’s laughably simple, and I would say even simplistic, what the lab does.
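To give a flavor of what spectral parameterization looks like in practice, here is a hedged sketch using the fooof package as documented around its 1.x releases (the import path and attribute names may differ after the rename to specparam). The synthetic spectrum is built with plain NumPy, so nothing here depends on the lab’s own data:

```python
import numpy as np
from fooof import FOOOF   # the package Brad mentions; being renamed to specparam

# Simulate a power spectrum: a 1/f-like aperiodic component plus a 10 Hz (alpha) peak.
freqs = np.linspace(1, 40, 200)
offset, exponent = 1.0, 1.5
log_power = offset - exponent * np.log10(freqs)                      # aperiodic background
log_power += 0.6 * np.exp(-((freqs - 10.0) ** 2) / (2 * 1.5 ** 2))   # Gaussian alpha peak
powers = 10 ** log_power                                             # FOOOF expects linear power

fm = FOOOF(peak_width_limits=(1, 8), max_n_peaks=4)
fm.fit(freqs, powers, freq_range=(1, 40))

print("aperiodic (offset, exponent):", fm.aperiodic_params_)
print("peaks (center freq, power, bandwidth):", fm.peak_params_)
```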
00:48:23
So instead of relying on transformers and AI and deep learning to discover patterns in time series, we say, “Okay, we know what some of the physiological generators of these signals that we’re recording are.” So instead of trying to discover patterns using deep learning, let’s try to bootstrap up from our knowledge of physiology and stick as close to the raw data as possible.
00:48:51
And so, for the action potentials, we parameterize each action potential using a very small number of parameters. We find each action potential, and we then populate a pandas DataFrame in Python where each row is an action potential and each column is one of the features that we’ve parameterized. So we say, “Okay, at what time point did the action potential initiate? In the five milliseconds, or two milliseconds, or whatever it is, leading up to the initiation of that action potential, what was the voltage ramp slope? How long did it take to go from the initiation point to the peak of the action potential? What was the voltage change at the peak of the action potential? How spiky was that peak, or how smooth was it? And then, after that peak, what was the decay rate of an exponential decay function back to the lowest point of the hyperpolarization of that voltage?” Right?
00:49:48
And so, it’s very simplistic. The joke in the lab is that the entire lab is practically built on scipy.optimize.curve_fit, which is just one function in the SciPy Python package that allows you to fit and parameterize curves. And so, most of what we do is basically built off of that, where we say, “Let’s take our rich physiological data and give respect to the variability that we’re seeing, rather than trying to average the variability away.” We’re at a point in science where we can leverage that variability and use it to discover new patterns. And so, rather than assuming that signals are binary, like the action potentials, let’s look at the subtle variability from spike to spike and see if that is systematically related to any features of the inputs, right?
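Here is a simplified sketch of that workflow (not the lab’s actual pipeline): locate the peak of one action-potential-like waveform, fit an exponential decay to its falling phase with scipy.optimize.curve_fit, and store the resulting features as a row of a pandas DataFrame. The waveform itself is synthetic, with made-up time constants.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

fs = 200_000                                   # 200 kHz sampling rate
t = np.arange(0, 0.002, 1 / fs)                # a 2 ms window, in seconds

peak_time, rise_tau, decay_tau = 0.0005, 0.00005, 0.0003
waveform = np.where(
    t < peak_time,
    np.exp((t - peak_time) / rise_tau),        # fast exponential rise toward the peak
    np.exp(-(t - peak_time) / decay_tau),      # slower exponential decay afterwards
)

peak_idx = int(np.argmax(waveform))

def exp_decay(time, amplitude, tau):
    return amplitude * np.exp(-time / tau)

t_fall = t[peak_idx:] - t[peak_idx]            # time measured from the peak
popt, _ = curve_fit(exp_decay, t_fall, waveform[peak_idx:], p0=(1.0, 1e-4))

spike_features = pd.DataFrame([{
    "peak_time_s": t[peak_idx],
    "peak_amplitude": waveform[peak_idx],
    "time_to_peak_s": t[peak_idx] - t[0],
    "decay_tau_s": popt[1],                    # fitted decay constant of the falling phase
}])
print(spike_features)
```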
00:50:32
What’s amazing is that this very simple approach is actually incredibly useful across different domains. We have a project going on right now here at UC San Diego with Dr. Ulrika Birgersdotter-Green, who is the head of implantable cardiac devices. We’ve taken the same code for breaking each action potential waveform into these components, but applied it to the heart’s electrocardiogram, the ECG signal, which has five known peaks, or features, the PQRST complex. We know physiologically what each one of these bumps in the heart signal means; it has to do exactly with the different chambers of the heart opening and closing, contracting and relaxing.
00:51:15
We’ve adapted the code for action potentials, and we are now using it on heart data to see if we can use it to discover any kind of cardiac events in the time series. So we use Python to discover features in physiological time series using very simple, maybe even simplistic, but interpretable features that stick as close to the biology and to the data as possible.
Jon Krohn: 00:51:43
That’s really fascinating, and it’s cool that you guys are developing so many different Python packages specifically in this area of temporal analysis in neuroscience. Something that came up in our research is that you’ve spoken in the past about how neuroscience terms related to these temporal patterns, like “neural oscillations,” sound technical. It sounds like such a term is describing something concrete, but in fact, when you try to define it concretely, there’s little there.
Bradley Voytek: 00:52:23
Yeah, I would agree with that. So the reason why we take the approach in my lab of trying to stick as close to the data as possible is that we strongly believe the misapplication of analytical tools to our data has led us astray, led us down wrong paths. A concrete example is the neural oscillations. People have probably heard of brain waves and things like this, and if you’ve taken an intro-level psychology class or something like that, you’ll have heard about delta waves and alpha waves in sleep.
00:53:06
The EEG, electroencephalography, the noninvasive recording of brain activity that we can do by placing electrodes on the scalp, is just about 100 years old now. You can see these really clear patterns in the voltage changes in the EEG as people are waking up and falling asleep, and this is partly used to try to classify different sleep stages: REM, non-REM, all that kind of stuff. When the clinicians who were first doing this back in the 1930s, ’40s, and ’50s looked at these signals, they would use language like, “These look like waves.”
00:53:48
The very first rhythm, the very first signal ever recorded with EEG, back in the 1920s, published in 1929 by Hans Berger, who invented EEG, incidentally, while trying to prove that psychic phenomena were real.
Jon Krohn: 00:54:01
I didn’t know that. 
Bradley Voytek: 00:54:01
But when he invented EEG, he discovered what was referred to for a long time as the Berger rhythm, named after him, which we now refer to as alpha waves, which are these 10 hertz-ish oscillations in the visual cortex. Really clear, prominent, true oscillations in the data. Before digital signal processing came about, EEG used to be just scribbles on paper. The voltage signals from the EEG electrodes on the skull in the pre-digital era were transduced, using little piezoelectric motors, into movements of a motor. That motor was connected to pens, and those pens would draw on a piece of paper that would unspool at a constant rate, which allowed you to trace the actual EEG signals.
00:54:50
In the post-digital era, people realized that one way of analyzing these brain rhythms, these alpha waves, for example, in the visual cortex, instead of having to count the number of times the wave crossed the zero-voltage mark on a piece of paper, is to use something called the Fourier transform, which is, in my opinion, one of the most brilliant insights in the history of mathematics: any time-varying signal can be perfectly mathematically represented as a sum of sinusoids described by their frequency, their amplitude, and also their starting phase. It's widely used for data compression and all this kind of stuff.
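As a small numerical illustration of that sum-of-sinusoids idea (a sketch of the math only, not anything specific to EEG analysis), a sampled signal can be decomposed with the FFT and recovered from the amplitude and phase of its frequency components:

# Any sampled signal can be represented as a sum of sinusoids (its Fourier
# components) and reconstructed from their amplitudes and phases.
import numpy as np

fs = 1000                          # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)
signal = np.random.randn(t.size)   # even pure noise works

# Forward transform: complex coefficients encode amplitude and phase
# at each frequency.
coeffs = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(t.size, 1 / fs)
print(f"{len(freqs)} sinusoidal components")

# Inverse transform: summing those sinusoids recovers the original signal.
reconstructed = np.fft.irfft(coeffs, n=t.size)
print("max reconstruction error:", np.max(np.abs(signal - reconstructed)))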
Jon Krohn: 00:55:29
You joked earlier in the episode, but this is what we were going to get into.
Bradley Voytek: 00:55:33
Yeah, I told you I was going to get into the Fourier transform. We can dive into it. No, I'm not going to go any deeper than that. I promise I will leave it at the surface level. People realized that you can use this to look at your signals. Specifically, you could filter your data at 10 hertz around the alpha rhythm and isolate the alpha rhythm so you can more easily analyze it relative to all the noisy background signals of the EEG. That then became the dominant form of analyzing EEG data: filtering the data in the frequency bands that you want to look at, alpha, theta, delta. Well, let's go in order, delta, theta, alpha, beta, gamma, in order of frequency. These are the dominant rhythms of the EEG.
00:56:18
The math is such that if you take any signal, even a purely noisy signal like Gaussian-distributed white noise, where by definition every data point is independent from the next, and you filter it in a frequency band, it sure as hell looks like an oscillation. We've become convinced that most people in the field started out trying to come up with simpler ways of analyzing data than counting the number of zero crossings, started using very reasonable mathematical tools like the Fourier transform, and then forgot that the Fourier transform is a mathematical operation that will always return what you ask of it, and just started looking at their data only through the lens of these oscillations. What's funny is we've known that this is wrong since the 1940s.
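Here is a hedged sketch of exactly that pitfall; the filter design choices are arbitrary assumptions, but band-pass filtering pure white noise in an 8-12 hertz "alpha" band produces a trace that looks convincingly rhythmic even though no oscillator generated it:

# Band-pass filtering Gaussian white noise in the alpha band.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000
white_noise = np.random.randn(10 * fs)  # 10 s of pure white noise

# 4th-order Butterworth band-pass around the alpha band, applied forwards
# and backwards with filtfilt to avoid phase distortion.
b, a = butter(4, [8, 12], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, white_noise)

# Plot 'filtered' and it looks like a waxing-and-waning 10 Hz oscillation,
# which is exactly the Fourier fallacy described in the discussion.
print("std of filtered 'alpha':", filtered.std())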
00:57:03
There's a very famous paper, Charting the Sea of Brain Waves, published in Science by a neurosurgeon, Jasper, in 1948, where he coined what's called the Fourier fallacy, which is that it is a fallacy to presume that, because your signal can be decomposed into sinusoids using the Fourier transform, your signal is composed of sinusoidal oscillators. It doesn't have to be; the math is just doing what you ask of it. When we look at the raw data coming off of most neural signals, there aren't oscillations. There aren't these rhythmic signals. More often than not, it just looks like noise, but there are many different forms of noise. We've actually spearheaded this sort of new direction in the field of looking at what we've referred to as aperiodic activity.
00:57:47
So, you have white noise, which is purely static, kind of, every data point is independent from the next, but then you can have brown noise and pink noise, which are more like random walks, where it's still random, but there are correlations in the structure so that one data point isn't totally independent from the next. It's still moving randomly, but there is correlation structure in it. Neural data actually has more correlation structure in it. Some brain regions look more like white noise and others have more correlation structure.
00:58:18
Our spectral parameterization Python package was developed to quantify this aperiodic signal, which has traditionally just been averaged out as noise, and to say, "We know where this signal comes from physiologically; it's not just noise. In fact, it's time-varying in interesting ways." And the paper that we published for this Python package, introducing this concept and applying it to neural data, has just exploded. We published it in 2020 in Nature Neuroscience; Parameterizing Neural Power Spectra is the name of the paper. That's been cited 1,500 times or something at this point. Like I said, my field's very wide-ranging.
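The package described here was released as fooof and has since been renamed specparam. As a hedged sketch, assuming the fooof 1.x API (argument and attribute names may differ across versions), a fit on a toy spectrum might look like this:

# Hedged sketch of spectral parameterization with the fooof package
# (since renamed specparam); the toy data and settings are illustrative.
import numpy as np
from scipy.signal import welch
from fooof import FOOOF

fs = 1000
sig = np.random.randn(60 * fs)  # stand-in for 60 s of neural data

# Estimate the power spectrum, then model it as an aperiodic component
# plus any oscillatory peaks sitting on top of it.
freqs, spectrum = welch(sig, fs=fs, nperseg=2 * fs)
fm = FOOOF(peak_width_limits=[1, 8], max_n_peaks=6)
fm.fit(freqs, spectrum, freq_range=[3, 40])

# The aperiodic parameters (offset and exponent) are the part that
# traditional analyses averaged away as "noise."
print("aperiodic (offset, exponent):", fm.aperiodic_params_)
print("oscillatory peaks (CF, power, BW):", fm.peak_params_)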
Jon Krohn: 00:59:00
I just opened up Google Scholar to your page and that is by far your most popular paper, and it’s only a couple of years old. 
Bradley Voytek: 00:59:07
Yeah, and it's really, it's funny that I gave talks at the Society for Neuroscience conference 10 years ago talking about this issue, this problem of: we can't just presume that this is noise, and we can't presume that everything is an oscillation either. We have to really be thinking more carefully about our data. People would just be like either, "Yeah, kid, you're wrong. Shut up," or, "Okay, so what?" This is why my lab has become a very big proponent of writing software that is easy and intuitive to use, and well-tested on many different kinds of data, because if you offer people an easy way to analyze their data in new ways, that then opens up new opportunities for them to find new discoveries in existing data and publish new papers, which is what scientists really care about, because that helps them get new funding and it opens up new research opportunities and so on.
01:00:04
We sort of took this approach of, all right, nobody's actually listening to our theoretical arguments, so let's write good code to force people to analyze the data in ways that we think are better. We're not saying that our tools are the correct way, by any means; in fact, we sometimes say there are better tools that we should be using. We try and fold those into the greater neural data analysis ecosystem. We're not trying to say that we've built the end point; we're not there yet. We're not trying to say that our approaches are the correct end point. They're just the next step in how we should be thinking about analyzing data.
01:00:44
I truly, genuinely, hope that people take the open source code that we’ve written and make it better, do it better, show that we’re wrong in some ways. There’s sort of a joke in my lab that anytime somebody publishes a paper that seems to overturn or pushes against something that we say, I try and hire that person ’cause I’m like, “Oh, all right, I like this. Somebody’s really pushing our ideas and trying to make it better. Let’s bring them into the fold. I want to hire them.” 
Jon Krohn: 01:01:10
This is really fascinating research in general. I'm glad I asked about oscillations. I had no idea that it was going to lead to all this interesting discussion. Also, looking at your Google Scholar page, it looks like you collaborated a number of times with someone whose last name is phonetically exactly the same as mine, Crone.
Bradley Voytek: 01:01:26
Yeah. Yes. Nathan. Yeah, he’s great. He’s in [inaudible 01:01:31]. 
Jon Krohn: 01:01:31
Completely different spelling. I wish I had Nathan's spelling, because it's C-R-O-N-E, and I feel like there's no ambiguity about how to pronounce that. But seemingly half the people that I meet for the first time want my name to rhyme. Jon Krohn. Drives me crazy.
Bradley Voytek: 01:01:47
Oh, I could see the Krohn. Yeah. I mean, I assume the way Nathan Crone's name is spelled, that's an Anglicized version of what's probably an Eastern European or Germanic spelling, right?
Jon Krohn: 01:01:59
You are correct. Yeah. 
Bradley Voytek: 01:02:00
Yeah, and so I’m pretty sure it’s Anglicized, but I could see Krohn as being very popular. 
Jon Krohn: 01:02:05
Even mine, because I have an H in it, is kind of Anglicized already, so if it were the German, it'd just be K-R-O-N, and then I could forgive everyone for calling me Jon Krohn, because, I mean, why would you think otherwise? But anyway, you talked about how your research is theoretical, and what you just described there, everything about the neural oscillations, did sound more like theoretical than applied research to me.
01:02:33
However, you do have some papers that touch on applications. To give our listeners an example, and you might have a different paper that you want to pick to go into applications, but you have a paper called Resting State is Not Enough: Alpha and Mu Rhythms Change Shape Across Development, but Lack Diagnostic Sensitivity. Actually, a lot of the point of the paper is in that long title. You studied how certain brain rhythms, neural oscillations, change as children grow, but you found that they don't help much in diagnosing disorders like ADHD, attention deficit hyperactivity disorder, or autism. Yeah, so I guess that is an application of the generally theoretical stuff that you do. Yeah, so I don't know if you want to talk about that.
Bradley Voytek: 01:03:25
Yeah, so we do a little bit of that kind of work here and there. That paper is on bioRxiv; it's not peer reviewed. Well, technically it's undergone peer review and it's in the next round of peer review at a peer-reviewed journal. That is using a very large open data set of EEG collected from children. I think it's over 1,000 participants of EEG, I think 1,700 actually, which is massive. The PhD student working on that in my lab is Andrew Bender, along with a former postdoc in my group who is the lead author, Natalie [inaudible 01:04:04], who really shepherded Andrew's early PhD era in my lab during the pandemic. There are a lot of theories out there, especially in autism, relating these sorts of alpha oscillations and mu rhythms in the sensorimotor cortex, suggesting that they might have some diagnostic value.
01:04:26
We're like, "All right, well, let's use our methods of actually analyzing only oscillations. Let's find any time point where we're quite confident that an oscillation is in fact present." Not only that, we then find each one of those oscillations and over-parameterize it using the same kinds of tools that I was mentioning earlier, that we've used for action potentials and for the heart, where each oscillation cycle is given a row in a pandas DataFrame. Each column is a parameterized feature of that single rhythmic cycle: the amount of time it takes to go from one trough of the cycle to the next, which is the period, which of course is the reciprocal of the frequency of that one cycle; the voltage change; how sharp is the trough; how sharp or smooth is the peak; how much time does it take to go from the trough to the peak, or the peak to the trough?
01:05:24
That's what the bycycle package that we developed in the lab does, which allows you to parameterize the non-sinusoidalities of these oscillation waveforms. Because neural rhythms aren't sinusoidal, but when you use traditional wavelet-based, Hilbert-based, or Fourier-based approaches for analyzing them, they actually destroy these rich non-sinusoidal features and smooth them out to make them look more sinusoidal. The visual cortex alpha rhythm tends to manifest as sharp, deep Vs with kind of smooth peaks. Whereas the sensorimotor cortex also has an alpha-frequency rhythm at 10 hertz, but it's called a mu rhythm because it tends to look like the letter M. The hippocampus has a very clear, prominent theta rhythm in rodents. This theta rhythm is probably coordinating cell assemblies in the hippocampus, and work on how the hippocampus maps space gave rise to the Nobel Prize 10 years ago. That rhythm looks more like a shark fin. It's kind of got a sawtooth sort of waveform, but filtering it in these frequency bands destroys that.
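The lab's actual tool for this is the bycycle package; the following is only a rough, hand-rolled sketch of the underlying idea, with made-up thresholds and a toy signal, showing how each cycle can become a row of shape features in a pandas DataFrame:

# Illustrative sketch only (the real tool is bycycle): give each cycle of a
# rhythm its own row of waveform-shape features, measured trough to trough.
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

fs = 1000
t = np.arange(0, 5, 1 / fs)
# A non-sinusoidal ~10 Hz rhythm: fundamental plus a harmonic, plus noise.
sig = np.sin(2 * np.pi * 10 * t) + 0.4 * np.sin(2 * np.pi * 20 * t)
sig += 0.1 * np.random.randn(t.size)

# Troughs are peaks of the inverted signal; enforce roughly one per cycle.
troughs, _ = find_peaks(-sig, distance=int(0.07 * fs))

rows = []
for start, end in zip(troughs[:-1], troughs[1:]):
    cycle = sig[start:end]
    peak_idx = start + np.argmax(cycle)
    rows.append({
        "period_s": (end - start) / fs,                   # trough-to-trough time
        "freq_hz": fs / (end - start),                     # reciprocal of period
        "volt_amp": sig[peak_idx] - sig[start],            # trough-to-peak amplitude
        "rise_frac": (peak_idx - start) / (end - start),   # rise/decay asymmetry
    })

cycles = pd.DataFrame(rows)
print(cycles.head())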
01:06:35
We were looking at whether alpha is going to be diagnostic, as some people suggest, for these different disorders like ADHD and autism. Let's use this massively large open data set, collected by other researchers for other purposes, and apply our tools to it to see whether that holds true using more precise methods for identifying oscillations and quantifying them. None of the features that we have seem to actually relate to any of the diagnostic criteria within this large data set. This suggested to us, at least, that we're missing something more. On the flip side, regarding this aperiodic activity that we mentioned earlier, one of the PhD students in my lab, Sydney Smith, published two papers earlier this year in Translational Psychiatry looking at the potential for aperiodic activity as it relates to recovery from major depressive disorder after electrical stimulation.
01:07:38
Before working with our clinician collaborator here, Dr. Mariam Sultani at UC San Diego, who's a psychiatrist, I had no idea that ECT, electroconvulsive therapy, electroshock therapy, is still considered to be the gold standard treatment for otherwise treatment-resistant depression, major depressive disorder. It works really, really well. People are suicidal, and then they undergo a couple of courses of this ECT treatment, which has a horrible public perception, but it is absolutely life-saving. When we talked to the psychiatrist collaborator, not Mariam Sultani, but the other psychiatrist who's actually performing this, I was at a meeting with him and I asked him, "Well, how does this work?" He kind of just shrugged and was like, "Yeah, we don't really know. We just kind of say it's like rebooting the brain," and as a neuroscientist, that's woefully inadequate.
Jon Krohn: 01:08:32
Have you tried turning it off and on? 
Bradley Voytek: 01:08:35
Yeah, that's basically it, right? You're like, "Well, okay," and so we began a years-long collaboration with them, collecting EEG data from these participants before, during and after ECT treatments, over the course of successive treatments over several weeks. Sydney Smith from the lab found that this aperiodic activity seems to really nicely track the recovery from major depressive disorder with these ECT treatments. There are certain signals that people historically have said, "Well, this seems to have some diagnostic validity for something like autism and ADHD," which we're finding really maybe don't if we use more sophisticated analysis methods. Whereas, on the flip side, we can try and discover that there are other signals that have been previously overlooked that do have diagnostic value in other domains.
01:09:31
We do play around a little bit in that space, so it's not just theory. If we have a strong enough set of theories, we then will seek out collaborators who are collecting data under different scenarios, or open data sets, that allow us to test, very specifically, the possibility that some of the signals we're measuring have clinical utility. It's not pure theory.
Jon Krohn: 01:09:57
Yeah, nice. It's really interesting. I've loved every part of this episode. It's been awesome digging into your research, and it kind of makes me think it would be really selfish of me to have neuroscience guests on the show every week. Do you know Blake Richards at McGill, by chance?
Bradley Voytek: 01:10:14
Know him personally? No, but years ago before I left Twitter, he and I used to have some conversations on there, really intelligent, fascinating person. 
Jon Krohn: 01:10:24
Yeah, so I did an episode with him last year, episode number 729. If our listeners want more neuroscience or … Yeah, I guess you as well.
Bradley Voytek: 01:10:32
Yeah. 
Jon Krohn: 01:10:35
Yeah, I just had so much fun with that. At the end of it, I actually said to him, "We're going to have to do a part two," and I haven't scheduled that, but someday, maybe even before the end of this year, we can get that part two in there. I can't remember now which way we did it, but he goes both ways: using neuroscience to develop machine learning methods, but then also using machine learning to help understand the brain better. We did one of those ways for the entire episode, and so I was like, "We're going to have to do a part two where-"
Bradley Voytek: 01:11:04
Good opportunity to go the other way. 
Jon Krohn: 01:11:08
Yeah, so it's been really fun in this episode, hearing about the intersections of data science and neuroscience in terms of your research. I'd like to ask you a couple of questions now about data science education, which is probably something that interests our listeners a lot and that you also have a lot of involvement with. You're a data science and neuroscience professor at UC San Diego, which is perhaps the leading neuroscience institution in the world. Listeners who aren't in neuroscience probably wouldn't know that.
01:11:39
You're a neuroscience research fellow there. You're a founding faculty member of the Data Science Institute at UCSD. Data science is so inherently interdisciplinary, and, like the kinds of things that we were talking about earlier, just like neuroscience, you can't know everything in data science; it's impossible. There's far more than anybody could ever specialize in, and it's expanding all the time. There are more data scientists creating more tools and more approaches all the time. Data science is inherently interdisciplinary, requiring knowledge from fields as diverse as software engineering and ethics. Yeah, so this requires so many different specializations. As a curriculum designer and as a professor, how do you structure teaching to try to give people the right balance of breadth and depth?
Bradley Voytek: 01:12:33
Yeah, so for a little bit of context, when I first started here as a new professor in 2014, I started teaching, right out of the gate, an intro to data science class. I had worked as a data scientist in industry for a little while, and I came in as a professor of cognitive science. One of the specializations in our degree program at the time for CogSci here at UC San Diego was computation. I felt like we were missing a critical aspect of understanding human cognition using computational approaches, in that there wasn't a course or courses on big data analytics and using these open data sources to try and understand human behavior and cognition. I created this, it's like a ta-da, isn't science cool kind of intro-level class, to try and get students excited about wanting to learn the technical aspects of data science.
01:13:23
That first course I offered, intro to data science, had 24 students. Then the next time I offered it, it was 180. Then the third time I offered it, it was 500. Most of those students by that point were computer science undergrads. Unbeknownst to me at the time, the UC San Diego computer science department was sort of arguing back and forth as to whether or not they should have a data science specialization.
01:13:46
Then I came in as a new, naive professor in a totally different department and created this wildly popular class that all of their undergrads were taking. The director of the computer science program at the time came over to me and was like, "We need to have coffee and chat." In my first year of being a professor, I got pulled onto the four-person committee to design an entirely new major for data science. That committee was two computer science professors, one math professor, and then me, the first-year CogSci professor. Everybody else was a fully tenured professor who had been there for decades. It was a very strange environment for me to be in, but I really was trying to stand firm in the development of the curriculum to say, "I strongly believe that data science is not just, pejoratively, programming plus statistics. I think it is deeper and richer than that, and we shouldn't be constraining ourselves to thinking about it that way."
01:14:46
Those technical skills are an absolute requirement to become a data scientist. But what makes an excellent data scientist, in my opinion, is somebody who can move beyond that. How do you think about solving poorly defined questions, using many different publicly available data sources, in order to bolster your confidence in your decision making? And so, to give a concrete example, I have a lot of guest lecturers come into my class. Last year I had somebody who works at Sony PlayStation come and give a guest lecture, and during the lecture she retold a story about her boss coming to her and saying, about the baseball game she was working on, MLB The Show, "I'm getting reports that the swinging is feeling weird. Figure that out."
01:15:39
Those are the kinds of things that, as a data scientist, your boss, who may not be a technical person at all, comes to you and says, and now you've got to figure out what they mean by "feels weird" and then how you use data to solve that. And so this is the kind of training that I really wanted to instill in our undergraduates. And so we set forth this curriculum that was intended to be as rigorous as any master's program in data science. I have a lot of friends in the data science community, and I would ask them at the time, when we were developing this, "Do you have any limitations where you would require somebody to have a graduate degree to be a data scientist on your team?" And they'd say, "No, of course not. We would take any good people we could find."
01:16:23
And then I would follow up and say, "Do you have anybody on your data science team that doesn't have a graduate degree?" And they would say, "No." Okay, so you have this disconnect. And the follow-up question then is why? And they would say, "Well, a lot of people have the technical skills, but they don't really understand how to think about really hard, nebulous problems within a greater context, and they don't have hands-on experience."
01:16:45
And so we developed the undergraduate degree program to be very project-based. If you graduate with an undergraduate degree in visual arts, you have a literal portfolio of your artistic works to demonstrate your facility with different ways of approaching your art. And the way that I was thinking about developing the data science degree was as an art: we want students to come out with a portfolio, which is really just a GitHub with all of their artistic works, the computational notebooks, the Jupyter notebooks, that illustrate the artistic thinking that they have, using these publicly available data sources to answer really hard-to-answer questions.
01:17:31
And the great thing about the computational notebooks like Jupyter and stuff is you can have a narrative format at the beginning: "Here are the choices I made and the decisions and why I made them. Here's the math underlying all the decisions, in the LaTeX notation that we used. Here are the openly, publicly available data sets that we scraped together, and here's how we collated them and cleaned them."
01:17:52
And so you're mixing the code and the math and the narrative with all the visualizations in an artistic piece. And so that's really how we conceived of the program: demonstrating your artistic facility, but using deeply technical thinking and tools. And that was just a blast. And the undergraduate degree program here, we launched it in 2017, and now there are 1,200 data science majors at UC San Diego. Each one of them graduates with a capstone project that is individually mentored, or mentored in small groups, I should say, since we can't individually mentor 1,200 people. But a lot of our industry partners and others run these capstone projects, where they're seeing all these undergraduates through to the end, to put this final capstone artistic piece on the end of their project-rich degrees.
01:18:48
And so that was the conception of how to educate students: consistently sprinkling these real projects throughout their degree program, intermixed with the technical stuff, while grounding them in my domain of CogSci, in the idea that we are largely narcissistic beings. So most of the data that we care about are collected either by, about, or for humans. And you can't understand humans without understanding social, economic and political context. And so you have to have that grounding. Humans are not just random number generators, statistical generating processes. We have motivations and cares and feelings, and you have to know that context to be a better data scientist. And I can give many concrete examples from my days as a data scientist in industry of why I came to that conclusion and how we leveraged that. But I don't know if we have time to continue. I'll leave that to you.
Jon Krohn: 01:19:44
We can do it if you want.
Bradley Voytek: 01:19:45
Okay. So I’ll tell one quick story from the earliest days when I was at Uber. 
Jon Krohn: 01:19:51
That’s perfect. I was literally, my very next question was going to be… so this is absolutely… this isn’t even actually 
Bradley Voytek: 01:19:57
It will segue then perfectly. Great, we’ll do it. 
Jon Krohn: 01:20:02
I’ll give a little bit of context on this.
Bradley Voytek: 01:20:03
Sure. 
Jon Krohn: 01:20:05
We mentioned this in the intro, but after your PhD in neuroscience, you joined Uber as their first data scientist when it was a really small company. I think about 10 people, right? 
Bradley Voytek: 01:20:12
Tenish people. Yeah. 
Jon Krohn: 01:20:14
So there’s a bit of context now. Go into the story. 
Bradley Voytek: 01:20:18
I got brought on short term to try and hire their data science team and build their data strategy with them. And that's a whole story, a whole other podcast, of why I ended up not staying. But one of the problems that became very clear early on, and this, by the way, again for more context, it was tenish people, we were a startup. At the time when I got hired, it was called UberCab, before it rebranded to just Uber. And we were working in a startup space, a coworking space in San Francisco, with 10 to 15 other hungry startups. And there were a dozen drivers total at Uber. But it became, even back then, very clear early on that in order to make the drivers more money, which was what we were trying to optimize for at the time, we wanted the drivers to be making a ton of money because that was great for them and also great for the company, we needed to make sure that the drivers were positioned well. The ideal goal is you, as a person wanting to use Uber, walk outside and open the app because you want to take a ride home, call an Uber, and it's right there. And so this is a prediction problem.
01:21:31
You need to predict when and where somebody is going to open the app and request an Uber ride. So, very naively, how do you do that? You use historical data. You look at the time-varying trends of demand by geographic location. And so, unsurprisingly and intuitively, if you look at four a.m., there are not a lot of people requesting an Uber ride. Whereas if you're in San Francisco in the Mission District, which, for people who know the city, has lots of bars and restaurants, you get a big rush of people on Saturday and Sunday mornings at two a.m., because that's when the bars close.
01:22:04
And so you want to be able to predict that. And so you can use the historical trends to bolster a prediction model, but weather also plays an important role. If it's raining, people are more likely to want to use a ride-sharing service than not. And so in order to bolster that prediction model, you're not only relying upon your own internal historical data, you're also relying upon external data that you're scraping about the weather. Great.
01:22:26
But this is in 2011, and at the time the San Francisco Giants major league baseball team won the World Series in 2010, ’12 and ’14. So San Francisco was baseball crazy. I’m a huge baseball fan, I’m in San Diego, go Padres, currently in the playoffs. But San Francisco was going crazy over the Giants at the time. But baseball, I love it. It’s a boring game. It’s a long season. 162 games, 81 games played at home. Even the most diehard baseball fans, if your team is losing by five, six runs after the sixth or seventh inning of nine innings, people are like, “Okay, I’m done. I’m going to go home.” 
01:23:02
And so we realized that also, in order to do better demand prediction, we had to be scraping real-time sports scores so that we could know, "Uh-oh, people are going to be leaving AT&T Park now because the Giants are getting whooped, and people are going to want to go home early to beat traffic, so we need drivers there." And we need to know concert venues, because when there's a big concert in town, people are going to want to go to and from that concert. How do you know how popular a concert is going to be? Well, what if we scrape the number of Instagram followers that Beyonce, or whoever's having the concert, has, to get a sense of the scale of that concert?
01:23:37
And so this is where data science meets the cognitive science, the social, cultural, political. Data science isn’t just about churning data through algorithms. It’s about understanding the context in which your data are embedded. If you want to make a better service, if you want to make Uber more efficient, you need to know that San Francisco currently cares about baseball and they may not care next year, but they currently care about baseball. And you have to know when the baseball games are, and you have to know how many innings are in a baseball game and how many runs is too many that are going to make people want to leave. 
01:24:08
You have to know what concerts are happening. And those features that work in the demand prediction model for Uber in San Francisco don't work in Austin. They don't work in Washington, D.C., which might as well be a different country in terms of what works for prediction, and they certainly don't work in Tokyo or other parts of the world that Uber is in. So every city needs its own data scientist who knows the culture of the city and can help build individual features to improve the demand prediction models, because I, as somebody from California, have no context for knowing when, and why, and where people in Tokyo will want to use Ubers. And so that's the kind of thinking I want to train people in. And so the kinds of people that we would hire at Uber as data scientists were the people that we would give some coding challenge to, of build a demand prediction model, and then you would just see, on held-out data, how well their demand prediction model worked, using root-mean-square error or something like that.
01:25:09
But the people that ended up getting hired sometimes weren't the people that had the lowest error and the best demand prediction model. They were the people that thought differently. They're like, "Well, you gave me this data set, but I don't have to constrain myself to analyzing just this data set to build a demand prediction model. I can bolster that model using other data from the real world." Those are the kinds of thinkers where we said, "We want that person because they're thinking bigger." Data science isn't being constrained to just the data you yourself as an organization have collected. You should be thinking more broadly and bigger, because you can do better. And so that's the kind of thinking we wanted to instill in the undergraduate education here, too.
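As a hedged sketch of the general idea, and emphatically not Uber's actual model, here is how internal historical demand can sit alongside external-world features (weather, a ballgame letting out, concert size) in a simple regression model; every feature name and number below is illustrative:

# Illustrative sketch of a demand prediction model that combines internal
# historical demand with external-world features. All data here is synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 2000  # pretend each row is one hour in one neighborhood

X = pd.DataFrame({
    "hour_of_day": rng.integers(0, 24, n),
    "is_weekend": rng.integers(0, 2, n),
    "rides_same_hour_last_week": rng.poisson(20, n),      # internal history
    "is_raining": rng.integers(0, 2, n),                  # external: weather
    "ballgame_ending_soon": rng.integers(0, 2, n),        # external: sports feed
    "concert_artist_followers_m": rng.exponential(2, n),  # external: event size
})
# Synthetic ride counts loosely driven by the same features.
y = (X["rides_same_hour_last_week"]
     + 8 * X["is_raining"]
     + 15 * X["ballgame_ending_soon"]
     + 3 * X["concert_artist_followers_m"]
     + rng.normal(0, 3, n))

# Train on the first 1,500 rows, evaluate on the held-out remainder.
X_train, X_test = X.iloc[:1500], X.iloc[1500:]
y_train, y_test = y.iloc[:1500], y.iloc[1500:]
model = GradientBoostingRegressor().fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"held-out RMSE: {rmse:.2f} rides")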
Jon Krohn: 01:25:48
Nice. And a great lesson for all of our listeners out there, probably not just in data science, but in other kinds of roles too: whether it's an interview question or your day-to-day role, you can think outside the box and use data beyond just what's immediately available in the SQL database of your company.
Bradley Voytek: 01:26:08
Yes, exactly. You really want to be demonstrating your creativity. That's why I keep using the word art. There's an art to it. There's a creative-thinking component that is almost, in my estimation, more important than the technical skills. I run the Industry Liaison Program for data science here at UC San Diego. And so we hold these board meetings with a lot of industry partners: nonprofits, government, startups, and major companies.
01:26:36
And I want to try and keep my finger on the pulse of what's happening in the data science world outside of academia, to make sure our undergrads are getting the best education they can get. And inevitably, every single time when I say, "What are the barriers that you're finding to hiring good data scientists?" it's never, "Oh, they don't know enough of the latest Python package on X, Y, or Z." The technical skills are a given.
01:26:58
It's always, "They don't know how to think about the problems that we're asking them to think about in a greater global context, and/or they don't know how to communicate their technical results to all of the non-technical decision-making people on our team. They just don't know how to communicate it, because their bosses don't know what RMSE means or why LoRA is better, or how to even put that in context."
01:27:21
And so we really put a strong emphasis on trying to make sure that we're educating not just a generation of technically skilled people, but socially and culturally creative thinkers, and socially and culturally aware students who know how to communicate complex ideas more simply. And so it's been amazing... I've learned a ton just in trying to help build this program here. And it's really opened my eyes to new ways of teaching. I've learned new ways of applying those skills even in my own research, in my own lab, on the neuroscience side. So I feel incredibly lucky to have been a part of all of this.
Jon Krohn: 01:28:06
I feel incredibly lucky to have been able to interview you and get your insights on your research, which has been fascinating, of course, about these teaching points more recently and even a bit of insight into what it was like at Uber in the early days. Really cool to hear about those models and the features that went into them. 
01:28:24
A few days before we recorded, I knew that you were going to be on the show, and I posted saying that you were coming up and asking if people had any questions. The post got a huge amount of engagement but no questions; however, we did get one comment that I wanted to read out for you. Someone named Durgesh Ametha, who is a research scholar, it looks like a TCS, so I'm guessing Tata Consultancy Services, research scholar at an IIT institute in India, and Durgesh says, "I'm eagerly awaiting the episode. I went through multiple interesting works of Professor Bradley Voytek."
Bradley Voytek: 01:29:06
That’s nice to hear. I don’t know, it’s really hard to… Okay, I should accept the praise, right? Thank you very much. I appreciate it. I have been told I am very bad at accepting compliments, so I’ll just say thank you. I hope the works really truly were interesting and I hope this was interesting for all the people listening. So thank you, Jon, for having me on. This is great. 
Jon Krohn: 01:29:33
It sure was for me. Before I let you go, do you have a book recommendation for us, Brad? 
Bradley Voytek: 01:29:36
Oh, geez. Okay. What if I give you one technical one that I like, which is Joel Grus has Data Science from Scratch, that’s one I tend to recommend to a lot of people in just how to think about data science problems. Just work through some exercises and build your portfolio. But if I want to go a non-technical one, I’ll give you a second one maybe and you could decide which one you like more. I’m a big fan of- 
Jon Krohn: 01:30:05
The tape is running. We’re not going to run out of time. 
Bradley Voytek: 01:30:09
And you're not going to run out of hard drive space either?
Jon Krohn: 01:30:11
I don’t think so. 
Bradley Voytek: 01:30:13
All right, good. My bloviating can take up a lot of room. So, I'm a big fan of fantasy and sci-fi. There's an author, China Miéville, and he has a really great book that, as a neuroscientist, I love, called The City & The City. I'm a big fan of that one. Without giving too much away, it's about two cultures who don't like each other and who occupy and live in the same physical space, but through generations of trying to prevent warfare and anger, they've learned to just not see each other. They've unseen each other, and so they live their daily lives but don't consciously even acknowledge the other's existence. They've untrained themselves out of it, and so they just unconsciously move around each other even though they're occupying the same physical space. Fascinating, just from a perceptual, theoretical, alternative-world standpoint, and he's a fantastic author. And so that's a fun book, The City & The City.
Jon Krohn: 01:31:11
Nice. Very cool recommendation and well relayed. Some splendid bloviating for me there, Brad. 
Bradley Voytek: 01:31:17
Thanks. 
Jon Krohn: 01:31:19
Nice. All right. Before the very last thing that might be helpful for people who want some more bloviating in the future, what’s the best way to follow you? 
Bradley Voytek: 01:31:27
I'm not really on social media anymore. Okay, LinkedIn: I do post professionally related stuff on LinkedIn for work that's come out of my lab. I mostly use it to try and highlight and elevate the scientists whose work I really like and the folks in my lab. So it's mostly promotion of cool science work by people in my group and other labs that are just doing fantastic stuff. But if you want to try and see what's going on in the neural data science world, that would be it.
Jon Krohn: 01:31:58
Nice. Thanks, Brad, so much for taking the time with me on the show today. Maybe in six years we'll have you on again.
Bradley Voytek: 01:32:04
Was it six years ago? Holy cow. 
Jon Krohn: 01:32:07
Maybe it wasn’t. 
Bradley Voytek: 01:32:08
That might be right though.
Jon Krohn: 01:32:09
Let’s look it up here. The last time you were on the show was five years ago. 
Bradley Voytek: 01:32:14
Okay. Still. Wow, that’s-
Jon Krohn: 01:32:16
Five and a half. 
Bradley Voytek: 01:32:19
That’s amazing actually. That does not feel like that long ago, but hey, maybe in five and a half more years we’ll have new work. Maybe we’ll have made this LLM brain browser. 
Jon Krohn: 01:32:33
I would not be surprised. I would not be surprised. 
Bradley Voytek: 01:32:35
Yeah. Thanks. 
Jon Krohn: 01:32:42
Well, what an incredible guest in today's episode. Professor Bradley Voytek filled us in on how action potentials and neurons are likely not binary, but actually have subtle variations that could encode important information, and how traditional analysis of neural oscillations may be flawed: looking at aperiodic activity and non-sinusoidal features of brain rhythms may be more insightful.
01:33:03
He talked about how open data sharing and collaborative tools are enabling new discoveries in neuroscience, how effective data science education should emphasize creative problem solving and communication skills in addition to technical abilities and how understanding social and cultural context is crucial for applying data science in real world settings as exemplified by Uber’s early demand prediction models.
01:33:25
As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Brad’s social media profiles, as well as my own at www.superdatascience.com/829. And if you’d like to connect in real life as opposed to online on November 12th, I’ll be conducting interviews in New York at the ScaleUp:AI conference that’s run by the iconic VC firm Insight Partners. This is a slickly run conference for anyone keen to learn and network on the topic of scaling up AI startups. 
01:33:58
One of the people I’ll be interviewing will be none other than Andrew Ng, one of the most widely known data science leaders, and I am very much looking forward to that. Thanks to everyone on the Super Data Science Podcast team, our podcast manager Ivana Zibert, media editor Mario Pombo, operations manager Natalie Ziajski, researcher Serg Masis, writers Dr. Zara Karschay and Silvia Ogweng, and founder Kirill Eremenko. Thanks to all of them for producing another gripping episode for us today for enabling that super team to create this free podcast for you. 
01:34:31
We are deeply grateful to our sponsors. You, yes, you, listener, can support this show by checking out our sponsors' links, which are in the show notes; that literally does help us to produce this show and keep it going. So that's one way to support us, or you could literally actually sponsor the show. If you're interested in doing that, you can get details on how by heading to jonkrohn.com/podcast.
01:34:59
Otherwise, please share, review, subscribe, all that good stuff that also helps us out. But most importantly, I just hope you’ll keep on tuning in. I’m so grateful to have you listening and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rocking it out there and I’m looking forward to enjoying another round of the Super Data Science Podcast with you very soon. 