Podcasts SDS 789: ML for Wind-Powered Energy Generation, with Dr. Jason Yosinski

75 minutes
Business, Data Science, Machine Learning

SDS 789: ML for Wind-Powered Energy Generation, with Dr. Jason Yosinski

Subscribe on Apple Podcasts, Spotify, Stitcher Radio or TuneIn

Dr. Jason Yosinski and Jon Krohn explore the impact of machine learning on wind energy, focusing on how it enhances wind turbine efficiency and grid reliability. As CEO of Windscape AI, Dr. Yosinski unveils the latest ML advancements revolutionizing renewable energy practices. Their discussion delivers essential insights for both industry specialists and environmentally conscious audiences.

Thanks to our Sponsors:

Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.

About Jason Yosinski

Dr. Jason Yosinski is a machine learning researcher, cofounder & CTO of Windscape AI, and cofounder & president of ML Collective. He was a founding member of Uber AI Labs and serves as scientific adviser to Recursion Pharmaceuticals and several other companies. Dr. Yosinski’s work focuses on building more capable and more understandable AI using neural networks. As scientists and engineers build increasingly powerful AI systems, the abilities of these systems increase faster than does our understanding of them, motivating much of his work on AI Neuroscience, an emerging field of study that investigates fundamental properties and behaviors of AI systems. Dr. Yosinski completed his PhD as a NASA Space Technology Research Fellow working at the Cornell Creative Machines Lab, the University of Montreal, Caltech/NASA Jet Propulsion Laboratory, and Google DeepMind. His work on AI has been featured on NPR, Fast Company, the Economist, TEDx, XKCD, and on the BBC.

Overview

Jason Yosinski takes us on a thrilling ride into the world of wind energy, powered by the magic of machine learning. With precision, he’s cracking the code on wind direction, transforming breezes into reliable, efficient power sources for the grid. It’s not just smart; it’s revolutionary. As Jason and Jon dive deeper, Yosinski reveals how data from wind turbines is like gold for energy providers. Yosinski also takes us through his own journey from AI basics to the niche world of wind energy, where technology powerfully intersects with environmental concerns. His story offers valuable lessons for anyone looking to make a mark in the startup world, emphasizing the importance of targeting opportunities that tackle genuine problems.

Enter the Deep Vis Toolbox, Yosinski’s brainchild, designed to peel back the layers of neural networks. This tool is a boon for both educators and tech enthusiasts keen to get a closer look at what happens under the hood of AI systems. And while reflecting on his days at Uber, where he tackled extreme event forecasting, Yosinski draws a line to his current adventures at Windscape AI. The past and present collide in his “nowcasting” techniques, turning predictive models into powerful tools for anticipating the ebbs and flows of wind energy.

Yosinski goes on to discuss his innovative research on Loss Change Allocation, where he discovered that locking down the final dense layer of a neural network before starting the training can lead to better overall performance. He also highlights the ML Collective, an initiative he co-founded to nurture a global community of machine learning enthusiasts and researchers. This platform encourages vibrant collaboration and sharing of ideas, pushing the boundaries of what’s possible in AI research.

Wrapping up the conversation, Yosinski talks about the key traits that make AI entrepreneurs successful: deep technical knowledge, a keen understanding of the market, and the ability to lead a team towards creative and effective solutions. His insights paint a picture of a future where advanced technologies like machine learning not only transform industries but also pave the way for sustainable energy solutions and dynamic, AI-driven businesses.

In this episode you will learn:

Enhancing predictability in wind energy with ML [04:52]
Data utilization from wind turbines by energy providers [11:41]
Jason’s journey into wind energy [17:55]
Landing the right startup idea [22:47]
Visualizing neural networks with the Deep Vis Toolbox [31:29]
Extreme event forecasting at Uber vs. nowcasting at Windscape AI [45:13]
Discoveries from Loss Change Allocation research [47:48]
Engaging with Jason’s ML Collective [59:46]
Traits of successful AI entrepreneurs [1:10:26]

Items mentioned in this podcast:

This episode is brought to you by Crawlbase – Use the special code SUPERDATASCIENCE (no spaces) to unlock 10,000 free requests
Winscape AI
ML Collective
NASA Jet Propulsion Lab
Google DeepMind
EDP selects nine startups with renewable and green hydrogen projects for Energy Starter in Singapore
Deep Visualization Toolbox
“LCA: Loss Change Allocation for Neural Network Training” by Janice Lan, Rosanne Liu, Hattie Zhou, Jason Yosinski
Cross Entropy Loss
EPD’s Energy Starter Program
AlexNet
Rewiring America by Saul Griffith, Sam Calisch and Laura Fraser
Electrify by Saul Griffith
One Hundred Years of Solitude by Gabriel Garcia Marquez
Jon Krohn’s Mathematical Foundations of Machine Learning Course
The Super Data Science Podcast Team

Follow Jason:

Follow Jon:

Episode Transcript:

Download The Transcript

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 789 with Dr. Jason Yosinski, co-founder and CEO of Windscape AI. Today’s episode is brought to you by Crawlbase, the ultimate data crawling platform.

00:00:15

Welcome to the Super Data Science podcast, the most listened to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now, let’s make the complex simple.

00:00:46

Welcome back to the Super Data Science podcast. I’m delighted to have one of my all-time favorite AI researchers, Dr. Jason Yosinski as my guest on the show today. Jason is co-founder and CEO of Windscape AI, a startup using Machine Learning to increase the efficiency of energy generation via wind turbines. He’s also co-founder and president of the ML Collective, a research group that’s open to ML researchers anywhere. He was a co-founder of the AI Lab at the rideshare company Uber. He holds a PhD in computer science from Cornell during which he worked at crazy places like the NASA Jet Propulsion Lab, Google DeepMind, and with the eminent Yoshua Bengio in Montreal. His work has been featured in the Economist on the BBC and, coolest of all, in an XKCD comic.

00:01:30

Today’s episode gets fairly technical in parts, so maybe of greatest interest to hands-on practitioners like data scientists and ML engineers. Although there are also parts that will appeal to anyone keen to hear how ML is being used to produce more clean energy. In today’s episode, Jason details how ML can make wind direction more predictable, thereby making wind turbines and power grids in general more efficient. How to infer what individual neurons in a deep learning model are doing by using visualizations. Why freezing a particular layer of a neural net prior to doing any training at all can lead to better results. How you can get involved in a cutting-edge research community no matter where you are in the world and what traits make for successful AI entrepreneurs. Are you ready for this mind-blowing episode? Let’s go.

00:02:17

Jason Yosinski, welcome to the Super Data Science podcast. I’m blown away to have you here because I’ve been tracking you for almost a decade when your deep visualization toolbox came out. We’re going to be talking about this later on in the episode, because Deep Vis toolbox allowed for amazing intuitive understanding of the way that deep learning networks, particularly convolutional neural networks, which at the time of me discovering you were near the state of the art of what we could do with AI at all. And so I have been using your Deep Vis YouTube video since about 2016 to teach students an intuitive appreciation of what’s going on inside of neural networks. And we will for sure be linking that in the show notes and so people can check that out. Anyway, thank you for being on the show and glad [inaudible 00:03:11]. Yeah. Where in the world are you calling in from?

Jason Yosinski: 00:03:13

I’m calling in from San Francisco, Lower Haight.

Jon Krohn: 00:03:17

Nice classic choice for our AI entrepreneurs.

Jason Yosinski: 00:03:21

Yes, I’m sure. Yeah.

Jon Krohn: 00:03:22

Nice. Jason, before we get into the technical stuff in this episode, can you tell me why was six afraid of seven?

Jason Yosinski: 00:03:31

Because 7, 8, 9?

Jon Krohn: 00:03:32

Yeah, that’s right. You’re the guest in episode… You’re the guest in episode number 789 and what a treat. We only get to do that once. You’d have to have a whole other podcast and get to this episode number in order to be able to use that incredible joke. So beyond that, being able to guess the answer to my 7, 8, 9 joke, you’re also co-founder and CEO at Windscape AI where you’re using machine learning to help wind farms produce more energy. Tell us about that.

Jason Yosinski: 00:04:04

Yeah, so we are trying to make wind turbines more efficient. We’re trying to make them generate more power and generate the power at lower cost. This will do two things. It will help wind energy be rolled out more quickly. It’ll accelerate our transition to net-zero, so to a world in which we power our planet without using carbon. It’ll make our customers, people that own the wind turbines more money and because the turbines will be more efficient and generating more energy for lower costs, it’ll also make the energy cheaper for you and me, for people that use the energy. We do this with machine learning. We do this by looking at data from turbines as well as weather, and I can get into that as deeply or as shallowly as you’d like.

Jon Krohn: 00:04:51

Yeah, let’s go. Let’s dig into it. How does it work?

Jason Yosinski: 00:04:54

Okay, sure. So first we take data from turbines. So all turbines have on board a number of sensors. In particular, you can imagine a sensor on the back of the turbine that measures the wind speed and the wind direction. One thing not everyone knows is that the way turbines operate these days, all turbines track the wind. Some people think turbines are installed facing north, and if the wind is from the north, that’s great, and if it shifts to the west, you’re just out of luck. I’m happy to report that for many years now, turbines have all tracked the wind. So as turbines are tracking the wind, basically let’s imagine the wind is coming from the north, the turbines facing north, all is going well, and then the wind slowly starts to shift to the west or to the east or something. The sensor on the back of the turbine will pick that up and you’ll see that the wind is shifting west and there’s some control parameters that will involve some delays or some dead zones maybe, but eventually the turbine will start turning to the west and follow that changing wind.

00:05:57

Those sensors in the back are great in some ways. They provide obviously pretty good visibility into the wind at the correct height, so they’re right up in the middle of the big circle inscribed by the blades, but they’re also noisy in certain ways. The noise they receive is biased in certain ways. They’re in the back of the nacelle, which is behind the blades. So every time the blades pass by, they generate vortices that come off the trailing edge and hit that sensor, and Jon is looking like this is one level two deep.

Jon Krohn: 00:06:29

No, it’s not-

Jason Yosinski: 00:06:29

They generate noise, but the noise is shifted. It’s not zero centered.

Jon Krohn: 00:06:34

So the wind turbine itself, as it adapts to where the wind is coming from, it makes the data worse that it’s trying to use to detect wind direction?

Jason Yosinski: 00:06:46

Yeah, I would say it’s actually not even a consequence of the turbine turning, although as it turns, the distribution of noise will probably shift. Even when it’s not turning though there is still a lot of noise and that noise is not zero centered. So we’re trying to help turbines deal with this problem, also deal with problems of these sensors not being installed perfectly correctly and, or as they wear over the years, as they in some cases become miscalibrated over the years, we’re trying to learn that calibration, learn that distribution and kind of reset conditions to be straight, so that the turbines are pointed the correct direction. There’s some interesting studies showing that the median turbine right now, the median turbine around the world is mispointed by six degrees. There’s other studies showing that the average turbine loses out on a couple percent of production because it’s not facing the wind, it’s not following the wind correctly.

00:07:41

We’re hoping to fix those problems using our neural network models. So that’s kind of one part of the company. We also are taking this data from the farms and feeding it into modern weather models. So a fun fact about wind energy is both wind energy and solar energy are variable, right? So you have an electrical grid, you have a bunch of producers, producing energy feeding into the grid, flowing all around transmission networks, distribution networks, and then being sucked out by consumers using that energy. So if your light is on right now, those electrons were generated somewhere probably within 50 or 100 kilometers of you, and it might’ve come from a wind turbine or a solar panel. Let’s imagine you have solar panels and your light is on, and then a cloud just passes right over the solar panels, right? What happens? That production just drops in some cases nearly to zero. Or if you’re a wind turbine, let’s say it’s windy, you’re generating and then the wind dies.

00:08:44

Both of these factors, they’re not factors that are controlled by humans, it’s just something that happens to the grid and the grid needs to be robust to that and needs to react. So a second part of what our company is doing is trying to make wind itself predictable. The way we model weather as a… The way we model weather in general is sort of changing right now. So since the 1950s, we’ve been modeling weather by running physical simulations on supercomputers. You can imagine a bunch of little voxels.

00:09:17

Each voxel has what’s the air doing on the left side, the right side, the top, bottom, back and front, what temperature is it, maybe what relative humidity and some other factors. And then you just click play on that physical simulation and you simulate all the voxels one time step at a time. This has worked fairly well for seven decades. About a year ago, a few groups around the world showed that in fact you can train neural networks to do the same thing. If you just have a bunch of recorded historical data, you just throw it all in, train in let’s say an autoregressive way, and it works. Cool, right?

Jon Krohn: 00:09:53

It’s nice.

Jason Yosinski: 00:09:54

Also, similar story to chatbots, train autoregressive models on text, make the big enough, make the dataset big enough, train for long enough, big enough computers and you get kind of magic out. We’re just seeing that now with models for weather.

Jon Krohn: 00:10:09

Nice, nice. Yeah, and autoregressive there meaning predicting the next token in a series of tokens in the case of a large language model. So when you’re speaking to a generative AI model that is putting natural language or code, it’s outputting the next tokens. It doesn’t have a future information, future tokens that it’s outputting. And so what you’re describing is similar in the sense that you have all of the data on say, wind up to a point in time and you’re trying to extrapolate forward auto regression.

Jason Yosinski: 00:10:39

Yeah. Yeah. Some people would call this an unsupervised approach or a self supervised approach. Just requires a big dataset of what’s happened in the past or in the case of chatbots, what was written on the internet in the past, and you can train those models then.

Jon Krohn: 00:10:52

Nice. Yeah, no explicit labels. Very cool. And that’s great because then it means you have a lot of data to work with typically.

Jason Yosinski: 00:10:59

Exactly, exactly, and it’s turned out, it’s one of the lessons of the last few years is right, training models on these really huge datasets is what enables them to perform really well.

Jon Krohn: 00:11:10

Nice. Certainly exciting things that you’re doing there on both fronts. So both with helping turbines attract wind direction more effectively, and so getting more bang for each buck out of each turbine, and then secondarily using ML to make wind direction more predictable, writ large across regions. And so I guess that allows… And you might’ve already said this and I’m just kind of being foggy, but how does that information get used by say, an energy provider. If they know wind direction… And you also gave the example of solar. Does this relate to the way that different renewable energy sources interplay within a grid?

Jason Yosinski: 00:11:57

Yeah, very much so. Again, we could jump in here for 35 minutes or we could talk for one minute. The problem with these renewables that I mentioned is that they’re intermittent. So that they’re coming online and going offline, not in a way that’s under human control, just driven by nature. The grid needs to be able to react to this. We can do this in a couple ways. So one simple approach we could imagine is we take a bunch of batteries, we put them on the grid, what do the batteries do? Well, let’s say they charge up when it’s sunny and windy and their charge goes up to eventually 100%, and they hang out there at 100% for a while, and then as soon as the sun dies, as soon as the cloud comes over or the wind dies, they start discharging into the grid.

00:12:41

So this is one approach we could take. In order to use those batteries most optimally, you really want to know when is it going to be windy, when is it going to be sunny? You don’t want to just be reactive. You want to anticipate periods of sun or wind so that you’ll be ready to charge. The way the grid actually works is there’s a real-time pricing signal that tracks how much energy is available and at what cost. If it’s very sunny, if it’s very windy, the energy is very, very cheap, sometimes literally free. So what you want to do if you own batteries is you want to charge. Then later when the wind dies, the price goes up and you want to discharge then and then you’re paid that difference in price. This is already the way the grid works, but to make it more efficient, to make it more cheap for people and to make it more resilient, we need to make all these natural factors more predictable. And that’s where the kind of weather modeling comes in.

Jon Krohn: 00:13:39

I imagine that in a lot of grids around the world today, we might not have the battery or renewable capacity always available. I suspect that that’s the case actually on the majority of grids. Obviously we hope to fix that as soon as possible, but today there’s probably also an advantage with this kind of planning that you’re describing to allow us to know, Hey, we’re going to have to turn a traditional kind of coal or oil-fired plant on in anticipation of a prolonged stretch of both darkness and no wind, say.

Jason Yosinski: 00:14:11

Yeah, absolutely, absolutely. So you mentioned a problem of if there are long stretches of no sun or wind, then we need to turn on dirtier fuel sources like gas and coal, and that’s absolutely true. What’s less immediately obvious is that by making things more predictable, just by knowing that period is coming, we can do certain things to prepare for it. But if you get into how energy actually moves around on the grid and is bought and sold and traded, having greater predictability does enable you to do some of those things.

00:14:41

And this is really important to work on right now. Certain grids, if you start out at 0% solar and wind and you slowly add a little bit of solar and wind, 1 or 2%, the whole grid kind of still works. But as you push that percentage higher and higher and a higher fraction ends up being intermittent, it can start to cause problems so much so that without all these batteries which are not yet deployed everywhere, like you said, you hit a limit and you need to keep using coal and gas and these dirty sources because the grid can’t take any more wind and solar. So we’re all around the world in the process of switching one step at a time, 1% at a time to wind solar. We are trying to kind of accelerate that process however we can.

Jon Krohn: 00:15:25

Today’s podcast episode is brought to you by Crawlbase, the ultimate data crawling and scraping platform tailored for data scientists, AI developers, and Python developers. For ML and AI, high-quality data are of course essential. With Crawlbase, you get a powerful, user-friendly solution that guarantees seamless integration, lightning-fast performance, and unparalleled reliability.Crawlbase supports your needs with a 2-minute integration process, AI-powered efficiency, and 99.99% uptime. Crawlbase also excels in bypassing CAPTCHAs, avoiding IP blocks, and handling proxy failures, making them the go-to solution for all your data needs.Use the special code “SUPERDATASCIENCE”, with no spaces, to unlock 10,000 free requests. Visit Crawlbase today and supercharge your data collection process with the best in the business!

00:16:15

It’s interesting, this is digressing from possibly your expertise and it’s certainly from the line of questioning that I had planned. But when you described the mix of energy sources that you were interested in using, you didn’t mention fusion. Which I suspect was deliberately left out.

Jason Yosinski: 00:16:36

Sorry, I meant to mention nuclear fission and that is an important source in the US and in many countries. I didn’t mention fusion. It doesn’t work yet. If it does work someday, that would be amazing. That’d be great.

Jon Krohn: 00:16:50

That’s true.

Jason Yosinski: 00:16:52

If we never get to fusion, we will need to power the world on those other four wind, solar, hydro, which we are using quite a lot in the US now and fission. If fusion does someday work, that will be amazing. We will build those plants as quickly as we can. It might be many years, many decades before they’re all rolled out. So in the meantime, we still need quite a lot of wind and solar.

Jon Krohn: 00:17:13

Yeah, yeah, yeah, that makes a lot of sense. And as soon as I asked the question and you started answered, I started to feel silly.

Jason Yosinski: 00:17:18

No, no. Yeah. Even fission though. Even fission for whatever historical reasons, and this is the edge of my knowledge now, it’s really hard to build these plants. They’re very slow and very expensive. They’re more expensive per megawatt hour than wind and solar, so they could be part of the right answer for the next decades, but they’re not a silver bullet.

Jon Krohn: 00:17:43

Yeah. Conveniently they do fill that gap when it is dark and there’s not enough wind, [inaudible 00:17:48] fusion. Yeah, yeah, yeah, the fission reactor.

Jason Yosinski: 00:17:51

Base load. Exactly.

Jon Krohn: 00:17:52

Yeah. Nice. How did you get into this space in general? You had a storied history, including things like being a co-founder of Uber AI Labs. What led you to tackling this particular problem? I can imagine that there might be things like you want to be making a big social impact, but then how did wind in particular end up being the problem that you’re tackling?

Jason Yosinski: 00:18:19

Yeah, yeah, great question. I spent about 10 years of my life, I would say 2010 to 2020 working as a scientist. I did my PhD, worked on a startup, worked at Uber AI just as a scientist. So publishing papers, patents, giving talks, going to conferences. Honestly, super fun. Around 2020, COVID happened. I looked and saw the state of AI research, ML research, and I would say things were really… This may sound silly to say, but things were really slowing down. So the number of papers per year being published that I thought were really deeply interesting was kind of shrinking. They were being disproportionately published by a few large companies with great resources. And for example, grad students with one or 10 GPUs under their desk or set their friends’ desks, couldn’t really compete as much. So I really saw the process of research and ML research is changing and I had a life moment.

00:19:22

What am I going to work on for the next 10 years, right? What am I going to sink my teeth into that’s got a longer runway? For various reasons, I decided to try to find something a little bit more applied. I got really interested in climate change. I started reading a lot of books, podcasts, talking to people, a friend and I… I interviewed probably 150 people by the end of it asking about their different industries, what do they work on, in what ways could they see data maybe mattering? I talked to farmers. I ran a pilot with this farmer to assess his soil health from space using satellite data and tell him which grasses his cows were eating. Worked on carbon credits, kind of spent a lot of time learning about a broad space of climate related topics. Became convinced that working on energy, is kind of one of the most meaningful things we could do. It’s like a lever that we can pull today, that’ll matter today and tomorrow. It’s not a technology that’s 20 years off into the future.

00:20:30

And then I went looking for the right entry point. So if you want to work on climate change, one maybe problem of working on climate change as a data scientist or a machine learning person is that climate change is fundamentally about atoms, about physical infrastructure and electrons power flowing back and forth, atoms and electrons. What is our world about? What is data science about machine learning? It’s about software and bits, data, right? GitHub repos. These are very different worlds. They certainly overlap sometimes, but if you imagine this Venn diagram, right? It’s like a pretty small overlap in the middle. You have to find the right problem. Oftentimes climate change, what we really need is you really need people to vote. We need the governments to shift money and policy and infrastructure. We need to build big concrete and steel things. We need to build transmission lines across our whole country, AC lines, DC lines, right?

00:21:26

We need to get the energy from where it’s sunny here to where it’s cloudy somewhere else. But these are huge projects involving billions or trillions of dollars of investment and you aren’t, and I’m not going to be able to affect all that change just by some clever code and models. Okay, but all is not lost. There are still entry points, right? So there is an intersection in this Venn diagram. If you want to work in energy using data, you should find something that’s maybe not trivial to predict, but hard to predict but possible, right?

00:22:02

And so I started looking more at wind and solar and realized wind was a good place to be because it’s fundamentally a pretty big, beautiful, chaotic, messy, turbulent system, but there are patterns. If you use ML cleverly, you can learn these patterns and you can use that knowledge, those model predictions to really make a difference. To make your customers, people that own wind turbines more money and to make the grid more predictable and so on as we’ve been talking about. So it was a long path to find a foothold where data matters for climate. And there are many other footholds that… Don’t let me discourage you, but this is the one that I found that I found was meaningful and a beautiful problem.

Jon Krohn: 00:22:46

That was a great explanation. For our listeners out there who may themselves be, say, searching for that startup idea and maybe it’s their first startup idea, what do you recommend? It sounds like you had a bit of a process there, different consulting projects, soil health, different social impact projects, and so it seems like you use that as a period to land on what the right problem to start a company with was. And you mentioned podcasts. I don’t know if you have more insight into the kind of structure that you had over that period, or did you formally say, “I’m going to give myself this much time before I make a decision. I have this much runway, or I’ll just keep working on consulting projects until something starts to click and I’m like, wow, there’s a whole startup idea here.” How did that period go for you?

Jason Yosinski: 00:23:39

Yeah, I would say at the outset because for me it was a big shift into a completely new domain. I am mentally prepared for it to take quite a while, so I didn’t give myself an immediate deadline. I said, “I’m just going to start reading and exploring and doing whatever I want.” And I imagined that eventually I would start to feel nervous and start to realize, oh, I should hurry up otherwise I’m going to be jobless forever or whatever. But I was actually… I was pleasantly surprised that after a while I was still enjoying learning and I did not feel too much pressure too soon. I forget the other part of the question.

Jon Krohn: 00:24:14

I think that was basically it, it was just kind of if you had structure around-

Jason Yosinski: 00:24:17

Oh, yeah, structure. Yeah, not much. I just read whatever was interesting. A book that I really liked was called Rewiring America by Saul Griffith. There’s now a new book which I imagine has a lot of the same content called Electrify, which is very informative. It was one of the first bits of content that I read that was written really by an engineer, a scientist, an engineer, not by either a politician or a hippie, both of which have their own kind of ways of presenting the world. And I found that it resonated much more with an engineering mindset. If you just want to solve the problem of climate change, imagine you have coordination and everything, what would it take? How would we solve it? How much would it cost? And he just goes through it all very directly and it made it feel much more simple for me.

Jon Krohn: 00:25:08

Very cool. Yeah, I’m sure that kind of engineering mindset is applicable to a lot of our listeners and it seems like your approach is working. So EDP, a large Portuguese utility company recently selected Windscape as one of nine startups for its renewable innovation program in Singapore to accelerate the global energy transition. What opportunities do you see emerging from Windscape AI’s participation in this program?

Jason Yosinski: 00:25:34

Yeah, well thanks for mentioning that. We did apply for this program. We were selected. EDP is a huge utility. I believe they’re the fourth-largest wind owner in the world, so they own tons and tons of turbines. They generate a lot of wind energy. When I met with folks from EDP, I found them to be a very, very forward-looking organization. Sometimes you get a big company and they’re impossibly slow or something. But these folks are really pushing the boundaries, all the boundaries they can, which I thought was super cool. What we hope to get out of it and where that collaboration might go is to pilot our technology, start working with them, see how it works on their wind farms around the world, and then if it does work really well, hopefully we roll out more broadly and we can also maybe use that as a demo for new potential customers.

Jon Krohn: 00:26:26

Very cool. So it sounds like EDP is forward-looking, but in general, do you counter resistance or hurdles as you try to come to energy utilities and say, “Hey, you could be using AI like Windscape’s to be improving the efficiency of your systems.” Do you encounter resistance or hurdles or is it relatively straightforward to convince people that you’re doing something valuable?

Jason Yosinski: 00:26:49

I wouldn’t say it’s straightforward, no. Convincing people that what you’re doing is valuable is maybe always hard. I would say saying the words AI or machine learning doesn’t immediately open all the doors. It can open some doors. Some of these companies realize that AI might be revolutionizing things that happen internally and they’re not quite sure how yet, but maybe we should talk to these randos from Windscape and see what they think. It does open some doors, but not all, just as probably within any industry. There are some organizations that are very forward-looking and others early adopters of any technology and others that are slower, that are later adopters. They literally… Some have told us, “We don’t care what you’re [inaudible 00:27:35], just show us when four other companies are using it and then we’ll consider it because how we work,” right? Which is potentially an efficient choice from their perspective.

00:27:45

There’s also small energy companies and large energy companies and there’s a spectrum there of how you sell to these companies and how you get adoption and so on. Yeah, and convincing everyone, it can be hard. You have to convince people that your technology will work, that it won’t be a huge headache to adopt. The people in the field need to buy into it. It can’t ruin their workflow or something. It has to be possible to actually integrate. So some of these systems run software that’s hard to work with and simply integrating can be difficult at times. So I don’t know, there’s a lot of factors probably as in any industry.

Jon Krohn: 00:28:28

Yeah. It makes so much sense and hopefully I’m not going too deep here, and if I am asking a question that would give away some kind of IP or just feel free to not answer this. But it seems to me like in a situation like yours, where you are providing software to hardware companies, say the turbine manufacturers, you are not at least in the immediate term planning on building say your own turbines, your own wind farms, your software company. You need to be partnering with turbine and manufacturers, with wind farm operators. How does that work? I guess maybe your response is going to be similar where there’s a range of responses where some turbine manufacturers are relatively early adopters. They see the potential. They say, “Wow, Jason’s done a lot of amazing research in the past. He seems like the kind of person we should be working with to accelerate our roadmap.” And then other folks are just like, “Yeah, we’ve got our own team,” or I don’t know. How does it look for you?

Jason Yosinski: 00:29:26

When we started this whole endeavor, what we imagined would happen is we would first build products that we would sell to people that own the turbines. Why do they want them? Because our product would help them make more money starting next month, right? We help them make more money. They like our product, we roll out, they tell their friends, we deploy to more and more farms, more and more companies. As we start to increase our market penetration in the industry, then much later turbine manufacturers would notice. And they would say, “Hey, everyone’s using these windscape people. Maybe we should talk to them and consider integrating their thing off the factory floor rather than as aftermarket add on.” That’s kind of still the process we’re following, although we’ve been surprised that some OEMs are kind of interested in chatting early, I think they just want to have on their radar what’s going on in the world and if there’s any promising technology, they want to be there first. So I guess we’re already having some of those conversations too.

Jon Krohn: 00:30:25

Mathematics forms the core of data science and machine learning. Now with my Mathematical Foundations of Machine Learning course, you can get a firm grasp of that math, particularly the essential linear algebra and calculus. You can get all the lectures for free on my YouTube channel, but if you don’t mind paying a typically small amount for the Udemy version, you get everything from YouTube plus fully worked solutions exercises and an official course completion certificate. As countless guests on the show have emphasized to be the best data scientist you can be, you’ve got to know the underlying math. So check out the links to my Mathematical Foundations and Machine Learning course in the show notes or at jonkrohn.com/udemy. That’s jonkrohn.com/udemy.

00:31:09

Nice. That’s cool. All right, so I’m going to switch gears a bit now from Windscape to more broadly the research you’ve been doing. So you were describing from roughly 2010 to 2020, if I’m remembering correctly, that was kind of like your research phase and in that phase you were prolific. So one main focus of your research has been interpreting and understanding deep learning models. We already talked about the Deep Vis toolbox. So being able to visualize intermediate layers in the many layers of a deep neural network, which is what makes it deep for people who aren’t already familiar with the way that deep learning works, that you have layers of these things called artificial neurons, which are very loosely based on the way that biological neurons, biological brain cells work in your own brain in a human brain or an animal brain. And by layering these together, there’s a lot of capabilities.

00:32:01

So when you are having a conversation with your ChatGPT or your quad or your Gemini, that incredible amount of nuance and understanding comes from just layers of these artificial neurons being able to do increasing complexity, increasing abstraction as you go deeper, but simultaneously that all of those layers, all of those neurons make it difficult to interpret what’s going on inside of a model. So tools like your Deep Vis toolbox allow you to see… And people should really check out this YouTube video. It’s amazing. It allows you to see not just layer by layer what’s going on, but neuron by neuron. And so for example, my favorite part in the video is when you are on camera and you are highlighting a specific neuron in… It’s in convolutional layer five of the network. There’s 256 artificial neurons in that layer and one of those neurons based on the particular training data that this machine vision model was trained on.

00:33:06

So the machine vision model had to become capable in a broad range of different kinds of images. So some of the neurons became specialized in detecting text, some became specialized in detecting dog faces, and one of them in particular became specialized in seeing human faces. And so in real time in this video, you’re on camera and as you move your head to the left to the right, you can see this white-hot activations happening for that specific neuron on the pixels that your face is in. And to make it even more compelling, there’s a point where you bring a colleague in to the video and he joins you in the frame, and then we have these two white-hot areas of pixels representing where the faces are in the video. So really cool tools for being able to allow us to understand what’s going on. And I think convolutional neural networks-

Jason Yosinski: 00:34:00

It’s funny that we just had a one-minute explanation of that, but if we could actually just show it would be more obvious in 10 seconds or something, which is maybe-

Jon Krohn: 00:34:08

Yeah, yeah, for sure.

Jason Yosinski: 00:34:09

In the first place.

Jon Krohn: 00:34:11

Absolutely. And there’s all kinds of things we would love to show people, but because almost 90% or more, 95% or more of our listeners in a typical are audio only. So yeah, although you and I-

Jason Yosinski: 00:34:23

Describing a white-hot activation is as good as we’re going to get. Yeah.

Jon Krohn: 00:34:26

Yeah, exactly. So yeah, my point is getting all this with convolutional neural networks with machine vision problems, those are cool because visualizations are… There’s something that kind of comes quite naturally from that. Whereas today, some of the most impressive generative models, certainly the most widely used generative models are outputting text. And so that becomes harder to visualize in a cool way like you did. Anyway, as models continue to get bigger and bigger and bigger… I mean, that’s a whole other dimension here. So as we’re talking about models that aren’t visual and as they get bigger and bigger and bigger, how can we continue to understand what’s going on in them or does that matter at all?

Jason Yosinski: 00:35:19

Yeah. No, great questions and questions that I don’t really have the answers to. And a lot of people maybe don’t necessarily have the answers to. A lot of topics, so as models get bigger, they certainly get more complicated. They certainly get harder to understand. Even back in 20… I think it was ’15 when we published the Deep Vis toolbox, that model had 60 million parameters and like you mentioned on one of the layers, conv5 had 256 neurons. Even that model, even I who wrote the paper played with the toolbox maybe as much as anyone scrolling around to all the different neurons, even I can’t claim I understand what was going on inside, right? We found a face detector, that was great, but it just happened to fire a lot for faces. We didn’t actually know whether it would fire .6 and not .5 for another part of an image that was not really face-like, but a little face-like.

Jon Krohn: 00:36:15

Yeah.

Jason Yosinski: 00:36:16

Or we know that that-

Jon Krohn: 00:36:18

.5 was very important, like maybe that was-
00:36:20 [inaudible 00:36:21]. Or a dog face or a clock face.

Jason Yosinski: 00:36:24

Yeah, for sure. For sure, it would fire for monkey faces and dog faces, but we just by seeing it fire doesn’t mean you understand everything it’s doing in all the downstream layers. And actually the layer right after conv5 was in this network FC6. So the sixth layer fully connected layer, it had 4,000 neurons, [inaudible 00:36:45], right? So you see these 4,000 little things spiking. You could try to scroll through every single one, but what would it take for you to claim you understand what’s happening? We’re probably not going to get there.

00:36:56

Nevertheless, showing the Deep Vis toolbox, I think teaches people who don’t know how networks work a lot about how they work very quickly, which is I think something I was proud of in that paper. Also, if you work with convolutional networks, it teaches you subtleties of how these networks work, that might not have been obvious before as a practitioner, as someone trying to debug a network that might be broken or not training well or not generalizing well, just seeing high bandwidth visualizations, I think can often help. So I think that fact, let’s now fast-forward a couple of years to get to where we are today with much, much larger models, I think that fact still holds. So seeing visualizations for how networks are working is probably still useful to people trying to debug problems with those networks. But it has not led to and probably will not lead to full human understanding of what’s going on inside. So I think it’s useful as a tool in a practitioner’s tool belt, but will not make models on their own explainable.

Jon Krohn: 00:38:01

Nice. Makes sense. And so do you think that that experience… And I realize that now as we get to models, large language models behind things like GPT-4 might have, and we don’t know because they haven’t officially published it, but there might be a couple billion artificial neurons in there. There’s no chance of us in the way that you were able to go over the 4,000 neurons of the 256 neurons in an architecture. It was [inaudible 00:38:31], I think that-

Jason Yosinski: 00:38:32

It was AlexNet or… AlexNet, yeah, yeah, yeah.

Jon Krohn: 00:38:34

AlexNet. Yeah.

Jason Yosinski: 00:38:34

AlexNet. Yeah.

Jon Krohn: 00:38:40

And so of course… What was I thinking? It’s way too many convolutional layers for [inaudible 00:38:45]. Anyway, yeah, so as you’re saying, it becomes even there where you’re looking at hundreds of thousands of neurons in a layer, millions of neurons total, it is difficult to interpret too much, but you can still get some sense of what’s going on, maybe learn a bit, understand where things are working well, things aren’t working well.

Jason Yosinski: 00:39:08

Yeah, just having the basic starter visibility into something about the network was at the time really important and is still important. So when I started working on this, I was at… It doesn’t matter where I was. Somewhere working for the summer and I was working on kind of my real project, which had some possibility of publishing like a normal paper. And I realized nobody had made plots of what’s happening inside of the network, just like a live video plot of what’s happening as you feed some video stream. And so this became just like a weekend project. I just wanted to see it for myself. And as I worked on it, I was realized, I was really frustrated with just how little we know about what’s actually happening in the network. As you train a network, what do you do? You watch the loss, it starts out really high and it shrinks over time.

00:40:00

You hope, and that’s the training loss, and maybe you watch the validation loss too, and you see that shrinking a little bit too, but maybe not quite as much. But that’s like you’re watching a scalar and there’s so much magic or maybe broken parts happening inside of a network training, and you can only see this one little number decreasing. So for example, let’s say I went into your network, you initialize it, I went into your network and I just randomly set half of one of the layers to zeros and I left the other half, okay, the same. If you start training, it’ll probably still train, the loss will go down. It might be really subtly broken in some tiny way, but you as a network trainer have no basic visibility into the fact that I just broke half of one of your layers. Because there’s no visualization, you even watch as a matter of regular course to see this.

00:40:49

Or let’s say I go in and multiply one layer by 10 and divide the next layer by 10. The net Jacobian or whatever beginning to end is the same, but the training process will proceed very differently. Would you be able to detect that by looking at your whatever plots you normally look at? Probably not. So it was funny, this was true in 2015, and so we started working on a couple of these papers to try to produce greater visibility, but I think it’s still true today. I think most people that train networks, they click go, either it works or it doesn’t, and if it doesn’t, they try something else. Why don’t we have the oscilloscope for training? Why don’t we have the oscilloscope for network operation representation and so on?

Jon Krohn: 00:41:30

Yeah, you’re right. It seems like there is still a lot of potential in there. I also realized I misspoke earlier when I was talking about model size. So as these models get bigger, these kinds of things like something that could act as in oscilloscope over the whole network as opposed to having to individually probe yourself neuron by neuron or layer by layer. I said that some of the biggest networks, I think I said one to 1 to 2 trillion is where we’re at now.

Jason Yosinski: 00:42:00

We can also clarify the number of parameters versus number of-

Jon Krohn: 00:42:03

Oh, neurons, that’s right. Of course. Of course. Course.

Jason Yosinski: 00:42:08

Very, very big networks though is the point.

Jon Krohn: 00:42:09

Yeah, and it’s funny, I do that embarrassingly often where I interchange between parameters and neurons. When you can have orders of magnitude more parameters trivially relative to the number of neurons, and often I am quite aware of that fact. I mean obviously when I’m working through the math or when I’m building a network, but it’s funny how we can use it interchangeably and you often catch people in arguments say, talking about AGI, they’ll say, we now have a network with 2 trillion artificial neurons, the human brain… Or sorry. So really there’s one to 2 trillion parameters, connections.

Jason Yosinski: 00:42:50

I think neurons to a lay audience is easier to map to the concept of a brain, which I guess you could say is connections like dendrites or something.

Jon Krohn: 00:43:00

Exactly. But then you’re starting to really get into some… You’d have to at least have five minutes of neuroanatomy before explaining, so you end up in the situation where you have one… There might be about 2 trillion parameters in GPT-4, and not all of those are active on a given call. So it seems like there’s eight based on the rumors, you have these eight expert networks in this mixture of experts. But anyway, but you have that many say 2 trillion parameters, but the number of neurons could be many orders of magnitude, fewer than that. Anyway, I am kind of off on an unnecessary-

Jason Yosinski: 00:43:42

It’s hard to talk about in a convolutional network. It’s almost like the neurons are replicated in space, so are the unique neurons are not in a transformer, the neurons might be applied at every single token or those the same neuron or not. So even using these words is under-specified.

Jon Krohn: 00:43:59

Yeah, yeah, yeah, that’s right. So when you were at Uber Labs, Uber AI Labs, your research, it seems like there’s a little bit of a connection in terms of forecasting. So in particular, some of your papers from then were on extreme event forecasting, and so that highlighted the challenges of making accurate predictions during high variance periods. So say when you are hailing a Uber and a Taylor Swift concert just ended nearby, it’s going to be harder or New Year’s Eve. And so some of these New Year’s Eve, that’s probably relatively predictable. You could hand code something in to be expecting that kind of situation. But all kinds of things happen like there are manmade or natural disasters that happen that are completely unexpected or other kinds of events protests that could completely change the forecasting that a car hailing app needs to be able to do in order to get a car to you and set a market appropriately. So is there any kind of relationship between that kind of extreme event forecasting or forecasting that you’re doing at Uber AI Labs and the kind of now casting that you’re doing today with Windscape?

Jason Yosinski: 00:45:26

That’s interesting. I had never thought of making that connection, but yeah, actually the problem formulation is not so different. In both cases, you might formulate a network to model a problem. You might use a loss, which is mean squared error or something which optimizes for the general case, but doesn’t directly optimize for extreme events for Uber. Uber might care about… Yeah, like that Taylor Swift concert. That happens very rarely, but there’s a huge price surge and you might care about forecasting the probability of these four sigma, five sigma events. For example, I think at ride-sharing companies, they would directly message drivers to try to get them to get on the road because they have a chance of making a lot of money at these specific events.

00:46:17

So it can be worth a lot of extra effort just for that one time thing. In wind, we might also think about optimizing for the general, just like is it windy or not? Use a regression loss or something to track expected amount of wind. But it turns out that a lot of the instability of what happens with the grid and a lot of the money that changes hands changes hands in very rare cases where something is mispredicted by a lot. So you might’ve heard about the blackouts in Texas when there was the freeze, I want to say 2021 winter.

00:46:52

So I don’t know two or four sigma events happened then that led to, I want to say… I might [beep] the figure. I want to say it was $5 billion worth of energy changed hands. A lot of people made money, a lot of people lost money. Also, the grid, parts of the grid went down, they blacked out, chunks of the grid to save other chunks around near hospitals, for example. Okay, back to the modeling side. Yes, to model rare events explicitly might be useful at Uber, might be useful for the grid as well for these reasons.

Jon Krohn: 00:47:30

Nice. Yeah, and I did quickly look this up. February 2021 was when that happened.

Jason Yosinski: 00:47:34

Okay, yeah.

Jon Krohn: 00:47:38

Yeah, so interesting that there ended up being a little bit of a parallel there. Very cool. So another presentation or another paper that you had at Uber, it was probably both, that we thought was particularly interesting and wanted to highlight here was you talked about the learning process in neural networks. So you talked about top down versus bottom up or even synchronized. And so that isn’t something that I’ve thought about before. So when I think about training a neural network, I think about having what’s called the forward pass where you go from some input to some output. So for example, if it’s a machine vision algorithm, the input could be the pixels of an image and then the outcome is the prediction of the class of that image. It says, Hey, this is a cat or this is a truck or this is Jason Yosinski or whatever.

00:48:31

So you have this forward pass from the input to the output and then the gradient descent that allows us to update all of the parameters throughout all the layers of the neural network that goes backwards. We call it back propagation from the output layer back towards the input layer. Yeah, so Jason, fill us in and let me know if my high level summary or any of my ideas there made any sense related to your paper from 2019, which was in the most prestigious AI conference called NeurIPS and the paper was called LCA, Loss Change Allocation for Neural Network Training.

Jason Yosinski: 00:49:06

Yeah, prestigious conference also basically the last conference for a couple of years. The last conference we all met in person in Vancouver before COVID killed in-person conferences for a couple of years. Yeah, so Loss Change Allocation was a paper. The first author is Janice Lam who was at Uber at the time. Our goal here was to really start to build something like that oscilloscope or maybe a microscope that helps us examine training. So let’s imagine you have a network and it has 10 layers and you start training that network and you watch your loss go down. Now let’s imagine as you watching that loss go down, I’m sneaky and I grab one of the layers and it just freeze it, and the rest of the nine layers keep training, but layer four in the middle or something is frozen and stops learning. Do you think you would notice that in the actual loss signal?