Kirill Eremenko: 00:00:00
This is episode number 405, with Lead Data Scientist at Axpo Group, Thomas Obrist.
Kirill Eremenko: 00:00:12
Welcome to the SuperDataScience podcast. My name is Kirill Eremenko, Data Science Coach, and Lifestyle Entrepreneur. Each week we bring you inspiring people, and ideas to help you build your successful career in data science. Thanks for being here today, and now let’s make the complex simple.
Kirill Eremenko: 00:00:44
Welcome to the SuperDataScience podcast everybody. Super excited to have you back here on the show. Today’s episode is going to be more of the advanced type. We’ve got Thomas Obrist joining us, who is a lead data scientist at Axpo Group. Now, while Thomas’s title is lead data scientist, the work that he does more closely resembles the work of a quant, a quantitative analyst in a financial firm. But in this case the difference is that this is not stock trading, not financial trading; this is energy trading. But the principles are the same.
Kirill Eremenko: 00:01:18
Why is this episode quite advanced? This episode is more advanced because we’re going to be talking about how you can analyze data as a data scientist, versus how you can analyze the same data as a quant, as a quantitative analyst. What are the differences? What are the approaches? How do they differ? We’ll be mentioning things like Monte Carlo simulations for example, stochastic principles and things like that.
Kirill Eremenko: 00:01:47
This episode will be useful to you if you’re specifically interested in analyzing data in the space of trading, of stochastic processes, of financial markets, or if you’re specifically interested in the energy sector. If you’re interested in the energy markets and what’s going on there, this episode will also be useful to you. If you’re in one of those two groups you might find some very valuable insights in this episode. Just keep in mind that it’s quite specific to those areas.
Kirill Eremenko: 00:02:21
Things that we’ll talk about: long versus short, long trading versus short trading; psychology in trading; quantitative analysis versus data science. We’ll touch on the Monte Carlo simulation. We’ll learn about the energy industry. Thomas is going to share a use case on grid losses for one of the European countries, an analysis that he was doing, very interesting.
Kirill Eremenko: 00:02:47
You’ll hear about how he has to deal with uncertainty that comes from other uncertainty, where a lot of inputs like wind data, solar data, weather data are fed into his model, and he has to model them to find out what the prices are going to be. But in the first place, that data coming in is actually the output of a model itself.
Kirill Eremenko: 00:03:11
He doesn’t know the wind data or the solar data for the next day. Dealing with uncertainty driven by more uncertainty, and how he goes about that. We’ll talk about out-of-sample testing and shadow trading. We’ll talk about the trade-off between testing and trading, and we’ll talk a bit about organizing hackathons, something that Thomas has experience in.
Kirill Eremenko: 00:03:32
We’ve got this advanced episode coming up. Hope you enjoy and without further ado I bring to you Thomas Obrist, lead data scientist at Axpo Group, Switzerland.
Kirill Eremenko: 00:03:48
Welcome back to SuperDataScience podcast everybody. Super excited to have you back on the show. Today we’ve got a special guest calling in from Switzerland, Thomas Obrist. Thomas, how are you doing?
Thomas Obrist: 00:03:57
Hi Kirill, thanks a lot for having me, very good. How about you?
Kirill Eremenko: 00:04:02
Very good as well. Super pumped to finally have this podcast. We’ve known each other for quite some time, right? Like what? It’s been a year and a half or two years?
Thomas Obrist: 00:04:13
I think around two years. Two years ago we met.
Kirill Eremenko: 00:04:18
Yeah. You’ve had quite an interesting career trajectory since then. You’ve moved from … Were you still finishing university back then when we met?
Thomas Obrist: 00:04:33
I think we just met after my master’s degree. I started trading at Axpo then. Since then I’ve become the quant for short-term trading for Axpo Origination.
Kirill Eremenko: 00:04:46
Got you.
Kirill Eremenko: 00:04:47
How are you feeling about this podcast?
Thomas Obrist: 00:04:50
I mean, it’s great. I’m a bit nervous, but it’s going to be fine.
Kirill Eremenko: 00:04:55
I’m sure it’s going to be fine. Lots of cool topics to cover. Before we get started, before we dive into your profession, your role, tell us a bit about your background. What did you study at uni? Have you always been in Switzerland? I forget, are you originally from Switzerland?
Thomas Obrist: 00:05:16
Yes, born in Switzerland, I grew up in Switzerland, and I studied in Switzerland. I studied mathematics at ETH, in my bachelor’s. Mostly focused on probability theory and statistics. Then I worked for one year as a consultant, the year between bachelor’s and master’s. Then I did my master’s in quant finance.
Thomas Obrist: 00:05:40
With my math background I mostly focused again on the math part, on probability, and deepened my knowledge of probability theory. During my master’s I actually got really interested in data science. Back then, it was not long ago, but still, in my years any IT courses or lectures on data science or machine learning approaches were not part of my curriculum. But I took them anyway, because at ETH you can basically take every class; you just don’t get the points.
Thomas Obrist: 00:06:16
I mean, they write it on your diploma, but it doesn’t count toward your degree. I took a lot of IT lectures during my master’s because they were really fun to take, and I was using it for my master’s thesis. My focus was probability theory, a bit of IT, and then some finance lectures on top of it.
Kirill Eremenko: 00:06:39
What was the thesis?
Thomas Obrist: 00:06:41
My thesis was, I actually don’t remember the full name. The topic was, I used deep reinforcement learning to predict bitcoin prices.
Kirill Eremenko: 00:06:56
That’s so exciting. Were you able to predict it?
Thomas Obrist: 00:06:59
I would say not really. I should actually look at it again. The issue was that it was during the hype. During the hype, everything went up. It went up until, I think, January-
Kirill Eremenko: 00:07:18
2018, end of 2017, start of 2018.
Thomas Obrist: 00:07:22
It went up to 21,000?
Kirill Eremenko: 00:07:22
Mm-hmm (affirmative).
Thomas Obrist: 00:07:23
Yeah, end of 17. During this time, I mean, the algo was, I thought, nice, because if you only go long and everything goes up, nothing can fail. Then at the end of-
Kirill Eremenko: 00:07:39
Kind of like Tesla stock prices right now.
Thomas Obrist: 00:07:42
Exactly, you could not fail with Tesla for the last half year, because-
Kirill Eremenko: 00:07:46
This is not trading advice for everybody listening to this podcast, right? We’re not advising you to buy or sell any kind of stocks; it’s just speculation, I guess.
Thomas Obrist: 00:07:57
It’s a huge move. During this time period you can run an algo, and the algo basically cannot fail if it can only go long. Because in 2017, 2018, a lot of the exchanges didn’t allow you to go short. You could not design an algorithm that would short bitcoin. Now there are way more exchanges where you can do that. Therefore, I designed this whole algo that always goes long and closes back to dollars. During the samples, like the backtest or the out-of-sample testing, it was-
Kirill Eremenko: 00:08:34
Thomas, can you explain long and short? I just realized those are terms that maybe some people are not familiar with.
Thomas Obrist: 00:08:45
Of course. I mean, simply speaking, without all the financial transactions: if you go long on an asset like Tesla, you’re basically betting that the stock price will go up, and you profit from that movement. If you go short, you’re betting that the price goes down.
Thomas Obrist: 00:09:03
Assuming you had shorted bitcoin at 21,000, and you had closed your position at 10,000 for one bitcoin, you would have made 11K per bitcoin by bitcoin going down. You can bet on both directions. That’s basically long and short; it’s just a directional view. My backtest and my out-of-sample testing for my master’s thesis were really good.
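The long/short arithmetic Thomas walks through can be sketched as a toy P&L helper (a hypothetical illustration, not code from the episode; the function name and signature are made up):

```python
# Toy P&L calculation for the long/short example above. A long position
# profits when the price rises; a short position profits when it falls.

def pnl(direction: str, entry: float, close: float, size: float = 1.0) -> float:
    """Profit and loss of a position; direction is 'long' or 'short'."""
    sign = 1.0 if direction == "long" else -1.0
    return sign * (close - entry) * size

# Shorting one bitcoin at 21,000 and closing at 10,000 makes 11,000:
print(pnl("short", entry=21_000, close=10_000))  # 11000.0
print(pnl("long", entry=21_000, close=10_000))   # -11000.0
```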
Thomas Obrist: 00:09:35
Then, whatever algo you have, in the period afterwards it falls from 21,000; I think the lowest was $4,800 per bitcoin. During that time period, if you can only go long, you automatically lose all the time. As soon as you do something, you basically lose. The algo was not that nice anymore.
Thomas Obrist: 00:10:01
I think the issue as well, for deep reinforcement learning, was that the time period where bitcoin actually did something wasn’t huge. There was not that much data. I mean, now that I’m more advanced, I would say, after some years of actually using data science in a real work environment, I would say my algorithm was kind of overfit quite heavily. Because there was not so much data.
Thomas Obrist: 00:10:30
Another thing is that there is not so much fundamental data where you can actually pin down what it depends on. What should you use as input? Yeah, you could use all the indicators and build a lot of stuff based on price data, but that’s nothing fundamental, like oil prices being correlated to bitcoin. At least, I never tested that.
Thomas Obrist: 00:10:56
That makes it really difficult to actually fit such a heavy structure like a deep reinforcement learning framework to bitcoin prices. Now, with more experience, it looks to me less like heavy overfitting and more like a generalization problem. The issue is that for bitcoin you have limited data, and this data is kind of thin as well, because actually it’s just one realization of reality. It’s a stochastic process, but you only observe one timeline. I mean, this is how life is.
Thomas Obrist: 00:11:30
It’s difficult for trading, because the situation where you want to predict bitcoin at 2K, at a $2,000 value, is a completely different story than at 20,000. But if you use a deep learning neural network, you assume independence, so that you have more samples to train on. This is actually not true, because they are heavily correlated.
Thomas Obrist: 00:11:58
I mean, to some extent people behave differently if bitcoin is at $10,000 than they would if bitcoin was at $10. Even if you have the theory, the points are not independent of each other over the whole run. That makes it difficult to generalize.
Kirill Eremenko: 00:12:20
I totally understand. The way I understood it is that your deep reinforcement learning algorithm is looking at prices as price points that get compared to each other, and at movements in the price. It doesn’t really care whether it’s 20,000 or 20 euros, but for people that’s a big difference in terms of psychology.
Thomas Obrist: 00:12:46
Exactly. The issue was [inaudible 00:12:48] I had the price itself as an input, but standardized, normalized and so on. So it was no longer 20,000. The algorithm never knew exactly that it was at 20,000; it was more looking at the differences, at how the price moved. The psychological factor, I mean, there was a lot of sentiment that bitcoin would not stop before 10,000, because just the change from four digits to five digits had a big impact on how people behaved. Bitcoin was traded heavily with a psychological approach.
Thomas Obrist: 00:13:25
There were a lot of emotions in the market, basically, mostly dependent on the level where it was. An algo that got standardized inputs wasn’t aware of this, and how could it be? Because how do you convey emotions to a trading bot? You can build features; 10,000 could be a one or a zero, or something like this, and you can set borders, or you could build features around this.
Thomas Obrist: 00:13:55
But then you need to know where they are, and if you already know, why should you build an algo for it? Then you don’t need such a heavy structure. I mean, if you know where the borders are, there is no point in using an algo; you just trade it yourself.
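The “border” features Thomas mentions, encoding psychological round-number levels as binary inputs, might look something like this sketch (the level values and the function are purely illustrative, not from his thesis):

```python
# Sketch of "border" features: encode psychological price levels
# (e.g. the 10,000 mark) as binary inputs that an algorithm could still
# see even after prices are standardized. Levels are hypothetical.

PSYCH_LEVELS = [1_000, 5_000, 10_000, 20_000]

def level_features(price: float) -> dict:
    """One binary feature per level: is the price above that border?"""
    return {f"above_{lvl}": int(price > lvl) for lvl in PSYCH_LEVELS}

print(level_features(10_500))
# {'above_1000': 1, 'above_5000': 1, 'above_10000': 1, 'above_20000': 0}
```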
Kirill Eremenko: 00:14:11
Yeah. Absolutely.
Kirill Eremenko: 00:14:14
This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. We’ve got over two and a half thousand video tutorials, over 200 hours of content, and 30 plus courses with new courses being added on average once per month.
Kirill Eremenko: 00:14:34
All that and more you get as part of your membership at SuperDataScience. Don’t hold off, sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level. Very interesting. Let’s move on to what you do now. Tell us a bit about your role. So you’re the lead data scientist at Axpo Origination for west and east Europe. What is Axpo, and what does the company do?
Thomas Obrist: 00:15:10
Axpo Group is a Swiss utility. In Switzerland, we have a lot of assets, like run-of-river plants, pumped water storage, as well as some nuclear plants, which are partially or mostly owned by Axpo and operated by it. Myself, I work for Axpo Trading, or Axpo Solutions as it’s called. This is a part of the group that doesn’t have any assets itself; we do the trading. We essentially bring the assets to the market. Because Axpo Group just produces energy, but we manage that energy.
Thomas Obrist: 00:15:57
Actually, myself, I’m in Axpo Origination, so I have nothing to do with Switzerland. Axpo has some trading activities in other parts of Europe and in the U.S. as well. Myself, for example, I am the quant for origination, for the short-term part. I do everything around data science and quant work for several European countries: Belgium, France, the Netherlands, Austria, Czechia, Slovakia, up to Turkey.
Thomas Obrist: 00:16:31
What origination is: if you are a steel client, my department could offer you a contract to supply energy for your production for the next two years, or one year, or three years. That’s actually your hedge. If you’re a steel producer this is enough, because you don’t need to worry about power prices. You can produce as much as you want, because you don’t have any power price risk. We take the risk for these companies, and we manage this risk.
Kirill Eremenko: 00:17:04
Got you.
Thomas Obrist: 00:17:07
We have these PPAs, as they’re called, Power Purchase Agreements, where we buy the power from wind parks and solar parks. If you’ve got huge wind parks, then you don’t want to worry about production and price risk either; I mean, production and power price risk. You just want to have a good power price for your plants, and then you want to produce as much as possible.
Thomas Obrist: 00:17:34
We take care of this risk as well. We manage these wind parks on the market for the people who have built them. I mean, there is a part of the company that builds wind parks and solar parks for Axpo as well, but that’s not the trading part. We just manage their production on the market after they are built. We go and sell it.
Kirill Eremenko: 00:17:58
[crosstalk 00:17:58]. It’s like Axpo is a massive company that on one hand produces energy itself, from different kinds of sources. But then on the other hand you also purchase energy from other companies out there, wind parks, solar parks, and other energy producers. And you also supply and sell that energy to clients; not mom-and-pop clients, but big companies, like you said, a steel production plant, which requires lots and lots of energy per year.
Kirill Eremenko: 00:18:35
You create agreements with them, so that they know what they will be paying for energy in the next year, three years, or five years. Is that about right?
Thomas Obrist: 00:18:45
Exactly. We manage their risk for all these things. Yes.
Kirill Eremenko: 00:18:55
Good. You said you’re a quant. What is the difference between a quant, a data scientist, and a data analyst?
Thomas Obrist: 00:19:00
For example, a data analyst; we have a lot of data analysts. They study the market really deeply. They browse newspapers and try to see where gas prices might be going, or they read all the news. They follow new developments, like a nuclear plant breakdown in Germany, or in France: what’s going to happen in France, what politicians decide, politics and regulations.
Thomas Obrist: 00:19:34
I mean, it can be quantitative, but it’s a lot of seeing where markets are going based on news, events, and all that stuff. It doesn’t need to be quantitative because they have a lot of experience, and they read and see well.
Thomas Obrist: 00:19:53
Differentiating now between data scientists and quants … I mean, I think they’re kind of a mixture, and a good quant can use data science, while a good data scientist has quant skills. I see data scientists as people who take a training and test set and build a machine learning model on top of it. While quants can run simulations, like Monte Carlo simulations, and then calculate a probability, and based on the probability they can make a price. If this price is better than what you get at the market, you go and buy or sell.
Thomas Obrist: 00:20:27
It’s kind of a different approach. Both are very data heavy; both do descriptive analytics of the data. I would say the mathematical methods they use are a bit different. Traditionally, you see a lot of quants in risk management doing pricing analysis. But you can do quant-related models for trading as well, like predictive models.
Thomas Obrist: 00:20:54
An easy quant model, just as an example to differentiate, would be: you look at statistical outliers. You calculate the probability of something like this happening; if it’s an outlier, you assume a certain move will follow afterwards, because if you look at the outliers, you might have, you don’t need to, but you might have higher correlations between different prices.
Thomas Obrist: 00:21:17
A data scientist can think of these things as well, he’s not excluded, but I would say the approach is a bit different. You go and try to build models, you fit them, you try to build features. The thinking is a bit different.
Thomas Obrist: 00:21:33
In the end, the models don’t need to be hugely different, but I would say the language is a bit different. Right now, since data science is quite new to a lot of companies, I think it’s a little bit split what a quant does and what a data scientist does. It depends on the field; of course they will mix a bit, because I believe, if you want to be a good quant or data scientist, you don’t want to use just a hammer. If you need to saw something, you need to have a saw.
Thomas Obrist: 00:22:08
I see one tool or the other as good; a good worker can use both tools. That’s how I differentiate between these three groups. I mean, the analyst can do a lot of things as well. He can build data science models too; it’s just that for his daily work, most of the time, he doesn’t need to.
Kirill Eremenko: 00:22:37
Interesting. I was specifically interested in the quant versus data scientist part. Let’s dive into that a bit more, the difference between the two approaches. For a data scientist, let’s say you want to do a simple prediction of price based on a linear regression: you have your training data, you have your test data, very straightforward, you pass it through your model, and you have a model.
Kirill Eremenko: 00:23:08
Simplifying things, how would you say the quant approach, you mentioned Monte Carlo, how is that different? What is the thinking behind it that is different?
Thomas Obrist: 00:23:24
I would say, in general, a quant first looks at, for example, different quantiles. For the linear regression you would just take your price data. Let’s say power price data for … I mean, not that this will work, so don’t try it, but you take price data for Germany, you put it into a linear regression over, say, 10 years, and then you make a prediction based on that.
Thomas Obrist: 00:23:49
A quant might go, perhaps: okay, I normalize my data, I look at all the quantiles. Say power prices moved yesterday by 10 euros. A 10-euro move in German power prices for the Cal, the calendar-year product, is a lot. That would be a really high move, a historic move. Then you compare historically: you look at two years of data, and you see what happened afterwards whenever you were in this high-move environment, this high-volatility environment. What happened the day afterwards?
Thomas Obrist: 00:24:21
Then if you see, normally, you could say, “Okay, with a probability of 60%, after a [inaudible 00:24:28] move, the price reverts down.” You could then build a model and say, “Okay, if I observe this high move, I go short.” So I will sell the Cal, and I bet that the prices go down. I mean, this would be one idea that could come to the same conclusion as a data science model, but it’s a different approach to the data, and a different kind of training. Because you’re tuning parameters that you choose: it could be a 10% quantile move, or a 5% quantile move, and so on.
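The quantile rule Thomas outlines, flag an unusually large move and bet on reversion, can be sketched roughly like this (synthetic prices and an arbitrary 95% threshold; a minimal illustration, not his actual model):

```python
# Minimal sketch of a quantile-based mean-reversion signal: mark days
# whose price move exceeds a historical quantile, then go short,
# betting the price reverts. All data here is synthetic.

import numpy as np

rng = np.random.default_rng(0)
prices = 40 + np.cumsum(rng.normal(0, 1.5, 500))  # synthetic daily power prices
moves = np.diff(prices)                           # day-over-day moves

threshold = np.quantile(moves, 0.95)  # the "really high move" border

# Signal: after a move above the 95% quantile, take a short position.
signals = np.where(moves > threshold, -1, 0)  # -1 = short, 0 = flat
print(f"short signals triggered: {(signals == -1).sum()} of {len(signals)} days")
```

In practice the quantile level itself (5%, 10%, …) is a parameter you tune, which is the “training and fitting phase” Thomas refers to next.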
Thomas Obrist: 00:24:58
So you have a training and a fitting phase, but with a different approach, in my view. It’s a bit different, I would say. And this is less about Monte Carlo; Monte Carlo you would use more for pricing as a quant. You run a simulation, and you get a price. For example, in my work, as I said, my department does this power purchasing from wind parks; not the wind parks themselves, the power of the wind parks.
Thomas Obrist: 00:25:34
The question there is: what will the short-term risk be? Because I’m doing just the short term. What will the short-term risk be in two years? For this, I need to know where the whole wind build-out in Europe is going, and what the price of this could be in two years. Because I need to make a quote for, let’s say, our originator, our salesperson, who goes to the client. He needs to have a price. I give him the price; therefore, I need to run a simulation of what could happen. Because it’s kind of a probability. It’s less a prediction, it’s more an expected value, for example. Because of what we can get from it, it’s not a forecast; I know a forecast would be wrong. It’s more like a risk view.
Kirill Eremenko: 00:26:26
Could you explain Monte Carlo in a few sentences? How does the simulation work? I find it quite interesting.
Thomas Obrist: 00:26:36
I would say it’s rather easy. You simulate different stochastic processes; you can assume distributions, you can take historical distributions, or other things. Then you run the simulation, and you look at how they interact. Basically, it’s a numerical approach to mixing distributions, simply said. Then you can see how it converges.
Thomas Obrist: 00:27:02
For example, the issue with wind itself is that you only observe one year of data, or two, which is relevant. I mean, in Europe, the short term has changed a lot during the last few years. There have been way more wind parks and solar parks, because Germany, every European country, is building more wind and solar.
Thomas Obrist: 00:27:27
Germany was a front runner; they built a lot of wind and solar power. Since it has changed so much, you cannot go back 10 years. I mean, I have data for 10 years perhaps, but the data from 10 years ago is useless, because it’s such a different environment now. With Monte Carlo simulations, I don’t just take the historical cost of one year; I can simulate what a fair price would have been, based on different stochastic processes. From that I can infer and see, “Okay, this is an expected value.” Because one year alone has a huge variance.
Thomas Obrist: 00:28:07
You need to filter out this variance. For example, say the short-term cost for a wind park in 2019 was huge. Perhaps you were just unlucky. I mean, wind cost is still big, but this year was extremely high; you don’t want to quote, for 2021 for example, the 2019 price that you observed. Because that might be way too high, and then you don’t get the contract.
Thomas Obrist: 00:28:43
I mean, the goal is to sign contracts and to manage more assets. We want to give a fair price. 2021 could be low again, perhaps, but we want to know what the expected price is, and then we can manage the risk.
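The Monte Carlo pricing idea, quoting the expected value over many simulated years rather than the single observed one, might be sketched like this (the lognormal cost distribution and every number here are assumptions for illustration only):

```python
# Stripped-down Monte Carlo pricing sketch: instead of quoting the one
# observed year (which has huge variance), simulate many possible years
# of short-term cost and quote the expected value, plus a risk quantile.

import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000

# Assumed yearly short-term cost per MWh: lognormal, i.e. many normal
# years with a fat tail of expensive ones (like 2019 in the example).
simulated_costs = rng.lognormal(mean=1.0, sigma=0.5, size=n_sims)

expected_cost = simulated_costs.mean()          # fair price to quote
p95_cost = np.quantile(simulated_costs, 0.95)   # risk figure for bad years

print(f"expected cost: {expected_cost:.2f}, 95% quantile: {p95_cost:.2f}")
```

A single simulated year can land anywhere in that distribution, which is exactly why one observed year is a poor quote; the average over many paths filters the variance out.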
Kirill Eremenko: 00:29:01
It sounds like you’ve got a lot going on; it sounds very complex. You don’t only have to think about the data, but all the contracts and the managing too. I wanted to ask you about this: you are responsible for trading at Axpo. That’s trading energy, as I understand, on a daily basis. To put it into perspective, what is the amount of funds that you’re responsible for trading every year?
Thomas Obrist: 00:29:35
I would say it’s … I mean, the models which are live on the market; I basically only trade through models. Currently it’s, let’s say, 20,000,000 a year that gets traded through these models.
Kirill Eremenko: 00:29:48
20,000,000 euros?
Thomas Obrist: 00:29:51
Yes.
Kirill Eremenko: 00:29:52
That is a huge amount. How do you approach this? For instance, what kind of things do you look at? Tell us about your day to day. What is involved in your day to day as a quant in the trading space?
Thomas Obrist: 00:30:16
Since my department is basically origination, or let’s say contact with clients, we don’t have a huge department with a lot of quants. I do a lot of different stuff, which I really like, so there’s a lot of variety. I started as a data scientist, but as a mathematician I use more and more quant models; it’s really varied.
Thomas Obrist: 00:30:41
Sometimes I do pricings, which is more quant related, where I run simulations and see what will happen in three years, or what my view is on two-year short-term pricings. Sometimes there’s a new contract, for example for a steel client, or let’s say a grid-loss client. Then we need to forecast these grid losses.
Thomas Obrist: 00:31:05
I get a lot of data, and then I need to build a model which goes to the market every day and buys the energy to supply this client. I spend days working on this model, fine-tuning it, checking that it works well. Then I put it live on the market, and it starts trading. It varies hugely, and I myself sometimes trade manually as well. Sometimes I get calls to execute some trades on the market. There’s really huge variety, which I really love about my job.
Thomas Obrist: 00:31:41
My day-to-day job looks a little bit different each day. It’s always about short-term trading, but in some sense live trading: going to the market and trading imbalances. We get live updates from a lot of wind parks, and sometimes we need to go and manage them manually. I build models, let’s say real data-driven machine learning models; we try to predict different client profiles as well as possible. Then I do the pricing on the short term, so it’s really quantitatively oriented.
Thomas Obrist: 00:32:17
I would say these are the main three things. Most of my time I spend doing descriptive analytics: I try to understand what’s happening. Often, if you really understand what actually happened, then you can give a model better features, or make really simple adjustments for the future. There are so many things going on every day; you get a lot of feedback. There is so much data coming back every day from each European country, so much price data, things that happened.
Thomas Obrist: 00:32:51
Things might go wrong, and then you need to understand what happened. For example, COVID was an interesting time period in several respects; I mean, just on the market. At the beginning of the lockdown, everything shut down. For short-term energy trading, power trading, you trade the next day: today at 12 o’clock I trade tomorrow. I need to make a forecast today at 12:00 for all 24 hours of tomorrow.
Thomas Obrist: 00:33:26
During COVID, what happened was that all your demand forecasts broke, because you don’t know which factory shut down when, and which machines, or which homes used more energy during this time. Nobody knew exactly what was going to happen. Everybody expected that there would be less demand, but when and how? You need to forecast on an hourly basis. It gets traded on an hourly basis in most European countries.
Thomas Obrist: 00:33:56
For example, in Germany you can trade 15-minute products; what I mean is you can trade 15-minute delivery periods. You really need to be precise. During this time period, all your data, the demand data, was wrong, but you never knew how wrong. What happened in the market was, not everywhere, but there was too much energy produced, because people thought, “Okay, they will need it.” But at that time they didn’t need it. The market was loaded with energy. So the balancing mechanism needed to take energy out. There was too much energy around on the short term.
Thomas Obrist: 00:34:41
Another factor you need to understand and think about: is this just for now, or will it continue into the future? How long will this trend last? For the period when everything shut down, it was really short, because it happened over a weekend; then you knew, “Now we’re on the new levels,” and the markets got normal again. For other trends, you’re thinking: why is this happening? What could be the cause of it? How could I adapt to it? How could I position myself to not get harmed by it, and manage the risk?
Kirill Eremenko: 00:35:19
You mentioned you do several different things. Are you able to share a use case with us to give an example?
Thomas Obrist: 00:35:30
Yes, for example. I think one really interesting use case was grid loss.
Kirill Eremenko: 00:35:34
What is grid loss?
Thomas Obrist: 00:35:39
Exactly. If you pump energy through a cable, the energy gets lost on the way. If you have a starting point, and an end point, you pump energy in at the starting point, and you take out energy on the endpoint, there will be a difference, because during transport you lose energy.
Kirill Eremenko: 00:35:57
How much energy do you lose?
Thomas Obrist: 00:36:02
Actually, in percentage terms, I’m not so sure. I mean, it’s not that much. It depends on the cable, whether it’s high voltage or low voltage-
Kirill Eremenko: 00:36:11
The distance.
Thomas Obrist: 00:36:13
A lot of stuff: how many transformers you have, and so on. The actual level, I don’t know, to be honest. What I did as a use case was: we needed to supply this energy day-ahead. So I got two years of data, and the idea was to forecast this for the next day, over a different time period.
Thomas Obrist: 00:36:44
There are a lot of physical factors this depends on: high voltage, low voltage, and other things. First there are the static factors. These are really easy to spot in your data; you can really mark down these levels. Then there are the variable technical losses in grid losses. These variable losses basically depend on how much energy runs through the grid.
Thomas Obrist: 00:37:15
The longer the cable and the more energy runs through it, the higher the grid loss. This is where the problem starts. It’s really difficult; we’re not talking about grid loss in a small home, it’s at the level of one of the European countries. It’s a big grid loss with a big cable.
Thomas Obrist: 00:37:39
How much energy runs through it? It’s temperature dependent, but which temperature do you take in a whole country? I mean, there are many factors. Then, with more renewable energy: normally renewable energy is not produced where people live. A good example is Germany. There are a lot of wind parks in the north of Germany, but a lot of people live in the south of Germany.
Thomas Obrist: 00:38:05
If wind power is produced, it basically needs to be transported from north to south, and there is more grid loss over that distance. But on the other side, for example, if you have more solar panels on your rooftops, those people have fewer grid losses, because they don’t need energy from the grid, and so on.
Thomas Obrist: 00:38:25
Wind parks, for example: you have huge offshore wind parks off parts of the Netherlands, Belgium, Germany, and so on. When they produce, the energy has a long way to travel to get to the people, because they’re offshore. They’re out in the ocean basically, out in the sea. That’s a much longer way.
Thomas Obrist: 00:38:47
I would say this makes it really interesting. It’s not just price data; there are fundamental reasons why this grid loss is happening. It’s temperature dependent, then it’s dependent on solar production, wind production, wind speed, and other things.
Thomas Obrist: 00:39:04
Then another difficulty is the demand itself, like how much energy is actually needed? It’s difficult to pin down. For example, for a country, if you look at grid losses near a city, this is really dependent on how much energy the city consumes. For example, if temperature goes up, the city perhaps heats more, so it needs more power to heat. There might be more grid losses. All those factors come together-
Kirill Eremenko: 00:39:34
You mean if the temperature goes down they need more heat?
Thomas Obrist: 00:39:38
Exactly. Sorry, if temperature goes down they need more heat, so grid losses might go up. And for example in summer, if it’s too hot, they turn on the AC and [inaudible 00:39:51]. There are other factors to take into account which are not linear. The issue comes with … I mean, if you knew all these inputs exactly, it would not be that big of an issue. It would still be difficult, because which temperature?
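The temperature effect Thomas describes is U-shaped: demand, and with it grid loss, rises when it is cold (heating) and when it is hot (air conditioning). One standard way to encode that non-linearity as model inputs is heating and cooling degree features; the 18 °C base temperature below is a conventional choice, not anything from Axpo:

```python
# Heating/cooling degree features: a simple encoding of the non-linear,
# U-shaped relationship between temperature and electricity demand.
def degree_features(temp_c: float, base_c: float = 18.0):
    heating = max(0.0, base_c - temp_c)  # degrees below the comfort point
    cooling = max(0.0, temp_c - base_c)  # degrees above it
    return heating, cooling
```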
Thomas Obrist: 00:40:04
There are a lot of questions around wind production: the one in the north, or the one in the south, or which solar, and so on. There is a lot of uncertainty, but the worst thing is that all this data we’re using, we need to decide on it today for tomorrow. For example, wind and solar-
Kirill Eremenko: 00:40:22
So, you don’t know the wind speed tomorrow, you don’t know the temperature in different areas tomorrow? All of your inputs are unknown as well?
Thomas Obrist: 00:40:31
Exactly, all of these inputs are themselves forecasts. Wind production has a MAPE of around 15 to 20%.
Kirill Eremenko: 00:40:44
What’s a MAPE?
Thomas Obrist: 00:40:46
Mean absolute percentage error. If you look at a forecast model, wind production can be wrong by up to 20%. I mean, it can be wrong even more, but on average, the absolute error is about 20% of your wind production on the day ahead. Today for tomorrow: up to 15 to 20% error in my forecast. And this is just for wind; solar is extremely wrong as well.
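For reference, MAPE is simple to compute. A minimal sketch, with made-up production numbers:

```python
def mape(actual, forecast):
    """Mean absolute percentage error: average of |actual - forecast| / |actual|, in %."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

actual_mwh   = [120, 95, 140, 110]   # realized wind production (invented)
forecast_mwh = [100, 110, 150, 95]   # day-ahead forecast (invented)
print(round(mape(actual_mwh, forecast_mwh), 1))   # 13.3 for this toy data
```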
Thomas Obrist: 00:41:13
It’s difficult to forecast solar. I mean, just imagine there are no clouds at all tomorrow. You don’t see anything on your weather models, but sometimes there is a small cloud. You don’t spot them, but they might come exactly when you want to produce, at noon, at your peak. You want to produce as much as possible, and perhaps exactly then there is a small cloud over your solar panel. This is impossible to forecast. There is no way to forecast this on the day ahead.
Thomas Obrist: 00:41:52
Therefore, all these inputs I take are hugely wrong. I know they’re wrong. I need to deal with: how wrong will they be? And how could I find that in the data itself? What I did, and this was really about describing what I see: if I look at wind, can I spot how big the error is in wind generation?
Thomas Obrist: 00:42:16
For example, I don’t receive only data for the next day. I receive wind data, yesterday for tomorrow, three days ago for tomorrow, four days ago for tomorrow. One idea could be like-
Kirill Eremenko: 00:42:29
So you receive the wind forecast?
Thomas Obrist: 00:42:33
Yes.
Kirill Eremenko: 00:42:34
Which were in place a day ago, for tomorrow, two days ago for tomorrow, and so on?
Thomas Obrist: 00:42:38
Exactly.
Kirill Eremenko: 00:42:41
So you can observe how the forecast changed over time?
Thomas Obrist: 00:42:45
Exactly. This could be a feature to study. Meaning, if it changes a lot, does that make the wind forecast worse or better? The same thing for solar. Can you spot that, perhaps, wind might be wrong tomorrow? Should I position myself differently? Should I really just look at day plus one, or should I look at day plus two, day plus three, day plus four, and compare the different things? And there is not just the time perspective that I need to focus on 24 hours; I need to focus on 12:00, 1:00, 2:00, 3:00 and so on.
Thomas Obrist: 00:43:21
I need to look at the data itself: for the forecast for one o’clock tomorrow, I have one from today, I had one yesterday, I had one from two days ago, three days ago, and so on. I can study this as well. I have a lot of data. The same goes for solar, and for the temperature forecast.
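One way to picture the feature Thomas hints at: for a single delivery hour, line up the forecasts issued at different lead times and measure how much they were revised. Everything here, the numbers and the feature names, is invented for illustration:

```python
# Forecasts for the SAME delivery hour, issued 4, 3, 2 and 1 days ahead.
issues = [(4, 310.0), (3, 295.0), (2, 340.0), (1, 330.0)]  # (lead_days, MW)

values = [mw for _, mw in issues]
revisions = [b - a for a, b in zip(values, values[1:])]  # change per re-issue

features = {
    "latest_forecast_mw": values[-1],             # the day-ahead number itself
    "total_revision_mw": values[-1] - values[0],  # net drift since four days out
    "revision_volatility_mw":                     # how unstable the forecast was
        sum(abs(r) for r in revisions) / len(revisions),
}
```

A model could then learn whether hours with large or volatile revisions tend to come with larger day-ahead errors.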
Thomas Obrist: 00:43:46
Then there is the demand forecast, like how much energy each city needs, and so on. All these things change, and they have variance, and they have uncertainties. You need to think about whether there is a way to analyze the different wind inputs. All of these things have an impact on the grid loss. In the end, this is what’s interesting about my job: I start a model and it gets feedback immediately.
Thomas Obrist: 00:44:19
I start trading today, and basically tomorrow during the day I can see if I was right. Metering takes a bit of time, but let’s say I start trading today, and in five days I’ve got my feedback: was I right or wrong, was my action good or bad?
Thomas Obrist: 00:44:37
This is really rewarding, I would say, and challenging, since it’s shorter term. To think about what is right or wrong, and you get immediate feedback. There is a lot of data to think about, all the different wind forecasts and solar production. Then you can think of more things. For example, grid losses could increase as well if, just a random example, Switzerland buys energy from Germany, or imports energy from France. The grid loss is different if they take it from France than if they produce it inside of Switzerland.
Thomas Obrist: 00:45:16
You need to think not just about one country, you need to think about several countries. One big country we always think about is Germany. There is so much wind production. What happens if Germany doesn’t produce that much wind? They import a lot of energy. And if they have too much energy, because they produce a lot with their wind, the energy flows to the other European markets. So cross-border trading activity is really high. You can’t just focus on one country. It’s basically all of Europe to worry about, and to think about how this could have an impact.
Thomas Obrist: 00:45:56
Of course, if you look at Spain, you don’t need to worry too much let’s say about [inaudible 00:46:06]. Actually, I never looked at this data, but I suppose there is not much correlation going on. Countries like, let’s say, Belgium and France have a huge impact on each other, or mostly the prices in Belgium are impacted, because Belgium is a small country in relation to France, of course. But there are so many things going on, so much data to consider, and all of this data I had is wrong, because there is high uncertainty in each data point.
Thomas Obrist: 00:46:38
I think this is really interesting to study, because you can spend an eternity just going through: “Okay, what happens if the wind forecast from a few days ago was on a hugely different quantile level than the one from one day ago, or two, or three?” And so on. There is so much data around, it makes it really interesting.
Kirill Eremenko: 00:47:02
How did your grid loss case study end?
Thomas Obrist: 00:47:07
I mean, I produced a model. Actually, it’s really difficult to produce a model here. I think I got a good model, which generalizes; I did all the testing. But then there was a third party who claimed they could do better. So there was a challenge. My management said, “Of course, we should do the same thing. We should be even better than they are. Why are we worse?”
Thomas Obrist: 00:47:35
We had what we call shadow trading, where we get the third party’s inputs on a day-to-day basis, but we don’t actually trade them. The issue is, nobody sends a bad backtest. If you’re a third party, you send the backtest to us and say, “Okay, this is what we would have done.” As I said, nobody sends a bad backtest, so you never know if it’s overfit or not.
Thomas Obrist: 00:48:00
So we have the shadow trading, where we go and see what the real out-of-sample performance is. Their out-of-sample performance should be more or less like their in-sample performance, and if you receive the data day to day, you can replicate their backtest.
Kirill Eremenko: 00:48:23
What is schedule trading?
Thomas Obrist: 00:48:24
Shadow trading is like-
Kirill Eremenko: 00:48:28
Shadow trading? [crosstalk 00:48:29].
Thomas Obrist: 00:48:29
Shadow trading.
Kirill Eremenko: 00:48:30
I’m sorry, I heard schedule. Shadow trading. You’re trading a demo version, you’re not trading real money?
Thomas Obrist: 00:48:39
Exactly, we are not trading it, but we receive the data from this other company which says, “You would have done that.” Because if you receive it on a day-to-day basis, they cannot cheat. We are the ones who record it, so they don’t actually know what’s going to happen.
Kirill Eremenko: 00:48:55
You can evaluate?
Thomas Obrist: 00:48:59
Exactly. We can do real out-of-sample testing. The issue with this is … I mean, you cannot do this forever. Trading is really short term, markets are changing a lot. If you have something that works, you don’t want to shadow trade it for two years.
Thomas Obrist: 00:49:19
You want to go fast to market. So there is the issue: you produce [inaudible 00:49:23] for one year, but your out-of-sample test only comes after that. Let’s say two months, or one month, just as an example, one month, where you can evaluate out of sample. Perhaps it was a bad month. Why was it bad? Is this still a good idea, or was it actually bad? There are so many things to consider.
Thomas Obrist: 00:49:46
It could be just a bad month, and then you say, “It would have been bad anyway,” because we can explain why it was a bad month. But perhaps it was a really good month, and then you start actually trading, and it goes south. It’s really hard to evaluate, because you don’t want to do it for too long, then you lose value, and your ideas change.
Thomas Obrist: 00:50:11
If you do it for too short a time, you don’t have the statistical sample to actually extrapolate. For how long are you out-of-sample testing? As a data scientist, you want to have as much data as possible. As a trader, you want to generate value as soon as possible. You have this trade-off between testing and actually generating money.
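This trade-off has a simple statistical core: the standard error of your average daily result shrinks like one over the square root of the number of shadow-trading days, so halving your uncertainty costs four times the time. A back-of-the-envelope sketch, with invented PnL numbers:

```python
import math

daily_edge = 0.5   # assumed true average daily PnL of the strategy (invented)
daily_vol = 4.0    # assumed daily PnL standard deviation (invented)

def days_needed(z: float = 1.96) -> float:
    """Shadow-trading days until the edge sits ~z standard errors above zero."""
    # mean / (vol / sqrt(n)) >= z   =>   n >= (z * vol / mean) ** 2
    return (z * daily_vol / daily_edge) ** 2

print(math.ceil(days_needed()))   # about 246 days for this toy strategy
```

At a 95% confidence bar, this toy strategy would need roughly a year of shadow trading, which is exactly the tension Thomas describes: statistically you want more days, commercially you want to trade now.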
Thomas Obrist: 00:50:34
With this grid loss case it was difficult. I tried a lot of things there. The reason it is so difficult is that there is a lot of uncertainty in the market, and a lot of things that could go wrong, because you have so many wrong inputs to your model. You really don’t want to overfit.
Thomas Obrist: 00:50:55
You need to think really deeply about it. Of course, I had four months where my model didn’t behave really well. The question was why. Because if I just build a model which takes this into account, it could be an overfit, because perhaps this situation will not generalize to the future again. I need to know why it performed badly, I need to know why it happened, because just building more features doesn’t necessarily fix it.
Thomas Obrist: 00:51:25
There is always high complexity, and you can introduce more features and build a more complex model. This will fix your issue on your training set and your test set. But if you look at your test set several times, because your manager came back and said, “Do it again,” then you might overfit. You need to balance between overfitting and generalization. This is always the case.
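The “looked at your test set several times” problem can be shown with a toy simulation: among many skill-free models scored on the same test set, the best score looks impressive purely through selection, and collapses on fresh data. Everything here is synthetic:

```python
import random

def score(model_id: int, data_id: int) -> float:
    """Fake evaluation: these models have no skill, so scores are pure noise."""
    return random.Random(model_id * 100_003 + data_id).gauss(0.0, 1.0)

# "Do it again" 200 times against the same test set and keep the best score.
test_scores = {m: score(m, data_id=1) for m in range(200)}
best_model = max(test_scores, key=test_scores.get)

best_on_test = test_scores[best_model]        # inflated by selection bias
best_on_fresh = score(best_model, data_id=2)  # on unseen data: noise again
```

This is also why a third party’s self-reported backtest, the best of everything they tried, tends to look “too good to be true.”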
Thomas Obrist: 00:51:56
The difficulty was that the third party said, “Our model generalizes better.” In the end I improved the model a little bit with new features and more analysis on the data, to capture more of the uncertainty in the inputs. And then, as we said, nobody sends a bad backtest. We told the third party, “Your backtest was too good to be true.” I mean, I don’t say they did a bad job or wanted to trick us, but it’s really difficult to actually generalize well all the time.
Thomas Obrist: 00:52:35
I mean, that is the problem, the issue: even if you think your training and test errors balance, it might not be so, because there might be some factor which you don’t consider.
Kirill Eremenko: 00:52:51
A very interesting case study. I like the trade-off you described between testing and trading, and how the markets change really fast. It’s a different thing, not something you often see in data science, this trade-off. I guess it’s specific to applying data techniques in market conditions. Tell us a bit about your hackathon. On LinkedIn I read that you won an international hackathon on predictive modeling of spot prices. Can you tell us a bit about that?
Thomas Obrist: 00:53:36
Exactly. I mean, I did a hackathon with Axpo; Axpo organized everything. It was for students. Any student could have come, but mostly it was ETH students who study mathematics, mostly machine learning and data science actually. It was mainly a hackathon for students. It was really nice. We went for three days to one of Axpo’s power plants, somewhere in the mountains.
Kirill Eremenko: 00:54:10
This was before you worked at Axpo?
Thomas Obrist: 00:54:12
Actually, I joined once as a student, and once I was the organizer. I mean, I was not the one renting a room or something like that, but I came up with a use case, and then I prepared the use case. I gathered all the data. I wrote the environment for how students could submit their models. The use case was: we have a lot of wind parks which we manage energy for.
Thomas Obrist: 00:54:43
Some of these wind parks, especially, were in the Nordics region, in the north of Europe. Not all of them, but some of those wind parks send quite real live data. You have a feed-in of measurement data, I’d say every 50 minutes, of how much the park is producing. You can recalibrate your model during the rest of the day, since there is an intra-day market as well.
Thomas Obrist: 00:55:12
You could see if you were really wrong on the day ahead; perhaps you should then adjust with intra-day updates, so you can trade better. Since we have a lot of parks, the use case was: perhaps there is a correlation between different parks that we don’t see.
Thomas Obrist: 00:55:31
For example, in the east of this Nordic country there was a huge error, but in the west not yet. Perhaps in an hour the error will be there as well.
Kirill Eremenko: 00:55:50
Interesting.
Thomas Obrist: 00:55:50
That is one interesting example, but there could be different correlations which we don’t see yet. Those are things we don’t consider yet in our data, because we have so much of it. Normally, our wind forecasting works like this: you have the park, and you give it to a third party. They do a mapping between a wind model (they look at weather data and everything) and how much your wind park produces, based on the location it’s at, which wind turbine it is, and so on.
Thomas Obrist: 00:56:22
So each park is handled on a kind of standalone basis, let’s say. Perhaps they miss the correlation between mappings. If there was an error in the east, this error could happen an hour later in the west, or in the south, or north, or elsewhere. Perhaps you can infer from one park what will impact another one. So what we did was, I got all the data from each country for all the wind parks we have.
Thomas Obrist: 00:56:48
Then we gave all this data to the students. We gave wind speed data, temperature measurements, forecasts, measurements. The task was to try to improve the wind forecast itself: how much is this turbine going to produce? This was the use case. It was a really interesting hackathon. I mean, it’s really fundamental: you need to start to think about how a wind turbine produces energy, how it depends on wind speed, and other things.
Thomas Obrist: 00:57:23
The funny thing in the Nordic countries is that turbines can freeze. If the temperature is low enough, even if you have wind, a frozen turbine is not going to produce anything. I think this makes it really interesting: if they freeze, there is a really huge decrease in production.
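A hedged sketch of what such a power curve might look like with the icing cutoff bolted on. All thresholds and the rated power below are illustrative, not real turbine specs:

```python
# Toy wind-turbine power curve with an icing cutoff, as Thomas describes
# for the Nordics. Cut-in/rated/cut-out speeds, the rated power, and the
# icing temperature are invented values for illustration.

def turbine_output_mw(wind_ms: float, temp_c: float,
                      cut_in: float = 3.0, rated_ws: float = 12.0,
                      cut_out: float = 25.0, rated_mw: float = 3.0,
                      icing_temp_c: float = -15.0) -> float:
    if temp_c <= icing_temp_c:              # frozen blades: no production
        return 0.0
    if wind_ms < cut_in or wind_ms >= cut_out:
        return 0.0                          # too little wind, or storm shutdown
    if wind_ms >= rated_ws:
        return rated_mw                     # flat at rated power
    # Between cut-in and rated speed, power grows roughly with wind speed cubed
    frac = (wind_ms ** 3 - cut_in ** 3) / (rated_ws ** 3 - cut_in ** 3)
    return rated_mw * frac
```

The icing branch is what makes the Nordics case hard for a purely wind-speed-driven model: good wind plus very low temperature can still mean zero output.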
Kirill Eremenko: 00:57:45
Interesting. How did that go? Did the students solve the hackathon?
Thomas Obrist: 00:57:51
I mean, actually there was again one model which was a really simple one with a good idea, and it was a bit better than the baseline. Honestly, I would say the students tried a lot in this university hackathon. I think it was interesting for everyone, though to some extent it was perhaps a bit difficult as well. First, you need to understand how it all works: we introduced them to the day-ahead market price and the intra-day market price, when you can trade something, and to how a wind turbine is built, how it produces based on wind inputs and dependencies. Everything was in Python. They knew Python, but we built libraries for them so they could access the data and other things.
Thomas Obrist: 00:58:45
I would say three days was just not enough to solve this problem. They tried a lot, and I think it was really interesting to see how they progressed. Every half day or so, we did a stop where I evaluated all the current models. They submitted something, I rated them and gave them feedback, and we did a round of discussions.
Thomas Obrist: 00:59:11
It was really fun to see how they worked over the three days. At the beginning, the first models were like, “Okay, let’s just try to load the data, do something, and submit something.” They used models like linear regression or something very simple, with some inputs. The second models then, I mean, they used all their techniques. They took a lot of data, built features, put it into huge models with a high degree of complexity.
Thomas Obrist: 00:59:54
Then, and this was the second round, everyone was really disappointed. I had a hidden test set, on which I evaluated all of the models, and they didn’t have access to it. They could only test basically four times. I mean, they had one dataset that they split themselves into training and test, but there was one set which only I had.
Thomas Obrist: 01:00:19
In the second round, they had tried so much. With one student, I think I stayed in the room until three o’clock in the morning, just so that he was able to finish his training. Then it was disappointing, because the second round was worse than the first. What happened was that most models had too high complexity; they didn’t generalize well outside of their own set.
Thomas Obrist: 01:00:46
In the third round they all cut back. They filtered on features, they filtered on data, they cut down the complexity and tried to reduce it. It was really interesting to see how well that worked. It was really fun. We had one model which was better than the baseline, but overall, perhaps we should have chosen an easier use case rather than trying to solve the issue of European wind production.
Kirill Eremenko: 01:01:16
Very interesting. Interesting to see how people adjust their thinking and change their models with your feedback: this didn’t work, make it more complex, less complex, and so on. That was fun.
Kirill Eremenko: 01:01:30
Thomas, we’re actually running out of time. It’s been an hour; it’s flown by real quick. Before we finish up, just one final question for you. What’s your recommendation for somebody who wants to get into this space that you’re in, into energy trading? Somebody who is already a data scientist, or starting out in data science. What would you say is an important thing for them to look into as a first step?
Thomas Obrist: 01:01:58
I mean, if you want to go into energy trading itself, I would say as a data scientist you really need to want to do this. It’s trading. First, be interested in trading, and start a little bit of trading yourself. This always looks really nice: even if you’re a data scientist, it helps if you have the feel of being a trader, if you know what it is to press the button and actually do a trade. I think this is always welcome. So be interested in finance. Also, it’s really good if you have some knowledge about quantitative approaches, as I discussed in the beginning: what’s a Monte Carlo simulation?
Thomas Obrist: 01:02:39
I mean, some people know it, but it’s not a given. Let’s say you’re really IT-heavy and you came from the IT side to data science; then it’s not necessarily something you’ve done. This could be something which is a plus on your CV [inaudible 01:02:56].
Kirill Eremenko: 01:02:55
Awesome. That’s a cool idea. Look into what trading is all about. All right, thanks a lot Thomas for coming. It’s been a pleasure. Before we wrap up, where can our listeners get in touch with you and find you? What’s the best places to connect?
Thomas Obrist: 01:03:18
I mean, just drop me a message on LinkedIn and I try to respond.
Kirill Eremenko: 01:03:23
Awesome. One final question for you. What’s a book or books that you can recommend for our listeners?
Thomas Obrist: 01:03:31
I would say I recommend two books. One is Systematic Trading by Robert Carver. This is not about data science, it’s more about trading in general. It gets you thinking about how you could use a data approach, or a quantitative approach, for trading. It’s a really nice read-and-apply book about how to build a framework for quantitative trading. It starts you thinking about how to generalize ideas.
Thomas Obrist: 01:03:58
The other book, I mean, most of the time I just read papers, but as a student I went through Deep Learning by Ian Goodfellow. It’s long, but I thought it was perfect. It’s really detailed, and I really liked reading through it. It takes some time, I think it’s 800 pages, but once you get through it, I think it’s really nice.
Kirill Eremenko: 01:04:23
Is that the one that’s for free?
Thomas Obrist: 01:04:25
Yeah. I think it’s from MIT Press. You can buy it on Amazon, but there is also an HTML version where you can access it for free.
Kirill Eremenko: 01:04:36
Yeah, I think it’s deeplearningbook.org. That’s the website. It’s been recommended a few times. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. You can access it there for free if you’re interested. Is it a good book?
Thomas Obrist: 01:04:56
I think it is a really nice book. If you read through it, you’ll know everything about neural networks and deep learning. It’s a nice book to read through.
Kirill Eremenko: 01:05:07
It’s about four years old though. Do you think it’s still up to date? Is it still relevant?
Thomas Obrist: 01:05:14
Depending on which level you are at, I would say. If you are a student … I think it’s still on the reading list at ETH, for the ETH deep learning lecture. I’ve not looked this year, but this book was on the list for the lectures; they go through part of this book in the lecture.
Thomas Obrist: 01:05:37
I would say if you want to get into deep learning, this book covers it very well. I mean, if you are a front-runner in research, perhaps not; then I would recommend something different. It depends on which level you are at. I thought, as a book, it gives you a really good overview of a lot of the concepts.
Kirill Eremenko: 01:05:56
Awesome.
Kirill Eremenko: 01:06:02
Well, thank you for the recommendations, and on that note we’re going to wrap up. Thanks a lot Thomas for coming on the show. It was real fun.
Thomas Obrist: 01:06:11
Thank you very much.
Kirill Eremenko: 01:06:17
There you go everybody. I hope you enjoyed this episode. As mentioned at the beginning, it was quite advanced, with a lot of topics. I’m sure we could have dived deeper into many of them, but we touched on quite a lot, very briefly. My favorite part was the trade-off between testing and trading. It resembles the trade-off between exploration and exploitation. In this case, once you have a model that you’ve backtested, and you’ve verified that it works, you want to forward test it. Basically, you want to put it onto the market and shadow trade it for a bit, to make sure that your model wasn’t overfitting. Not just an out-of-sample test, but an out-of-sample test on live data that comes in with all the glitches, delays, and lags, and everything else that resembles the real-world markets. Those things are sometimes quite hard to recreate in a backtest, even with an out-of-sample backtest.
Kirill Eremenko: 01:07:19
You want to put it on and shadow trade it for a bit, but the question is: for how long? If you shadow trade it for four months, you might get your validation, but by then the markets might have changed, and as soon as you switch to real trading, it’s no longer working. On the other hand, if you shadow trade for too short a time, say a week, you might not get enough data to validate that it’s working, and when you switch to live trading, again it’s not working. An interesting balance. I love these situations when it’s time to decide a balance, and it shows you there isn’t one right answer. It’s on a case-by-case basis. Maybe there are some guiding principles, but ultimately it’s an art that data scientists have to practice.
Kirill Eremenko: 01:08:03
I’m sure you had your own favorite parts from this episode. As always, the show notes are available at www.superdatascience.com/405, where you can find the transcript for this episode, any materials we mentioned, and URLs to connect with Thomas. Hit him up on LinkedIn, especially if you’re interested in the space of energy or quantitative analysis of markets and trading. I’m sure he’ll be happy to help out. If you know somebody in this space, it’s very easy to share the episode with them: just send them the link superdatascience.com/405.
Kirill Eremenko: 01:08:40
On that note, thank you so much for being here today, I look forward to seeing you back here next time. Until then, happy analyzing.