64 minutes
SDS 625: Analyzing Blockchain Data and Cryptocurrencies
We're back with another blockchain-related episode! After welcoming Philip Gradwell from Chainalysis in episode 621, we're now sitting down with the company's Director of Research, Kim Grauer, as promised. We're diving even deeper into blockchain and machine learning applications for a second, illuminating look at the intersection of these two exciting fields.
Thanks to our Sponsors:
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
About Kimberly Grauer
Kim is the Director of Research at Chainalysis, where she examines trends in cryptocurrency economics and crime. She was trained in economics at the London School of Economics and in politics at Oxford University. Previously, she explored technological advancements in developing countries as an academic research associate at the London School of Economics, and was an economics researcher at the New York City Economic Development Corporation.
Overview
Did you know that machine learning can predict crime patterns on the blockchain? With one percent of all blockchain transactions linked to illegal activities, ML now plays a critical role in investigating criminal activity. This week, Chainalysis's Director of Research, Kim Grauer, joins Jon Krohn to dive into this ever-growing field of research. After welcoming Philip Gradwell from Chainalysis two weeks ago, we've teamed up with Kim to explore the state of economic-data analysis on the blockchain, as promised.
At Chainalysis, Kim primarily concerns herself with economic questions: What drives people's behaviour? Why is blockchain activity increasing or decreasing? And because many of the transactions on the blockchain may not be economically meaningful, Kim's goal is to reduce the incredible amount of noise in blockchain data and distinguish among the many signals to serve clients like banks or governmental agencies.
In 2021, 14 billion dollars in illicit activity took place on the blockchain. This includes ransomware, NFT wash trading, terrorist financing, stolen funds, malware, and child abuse material. But when crypto wallets are anonymous, how can their activities be traced? As Kim explains, wallets are likely to transact with legal entities that are required by law to keep records. From there, Chainalysis's tools help investigators and these organizations follow the money.
When it comes to the future of crypto, blockchain and machine learning, Kim predicts that blockchain technology will become more intuitive over time, creating a digital native world. This shift, however, is likely to happen gradually until we're all comfortable with these new technologies.
Tune in for more from Kim, including examples of her own research on blockchain data.
In this episode you will learn:
- Kim's role as Director of Research [5:02]
- The unique real-time economic-data analytics of the blockchain [13:07]
- How ML can predict patterns of criminal activity on the blockchain [18:56]
- Interesting use cases of ML for crime investigation [29:37]
- The tools and approaches Kim uses daily [47:44]
- The future of crypto, blockchains, and data science [50:54]
- Why a data science bootcamp helps people break into data science [53:42]
Items mentioned in this podcast:
- Datalore - Use the code SUPERDS for a free month of Datalore Pro, and the code SUPERDS5 for a 5% discount on the Datalore Enterprise plan.
- Chainalysis
- SDS 621: Blockchains and Cryptocurrencies: Analytics and Data Applications
- SDS 537: Data Science Trends for 2022
- Mempool
- NFT wash trading
- Colonial Pipeline ransomware attack
- The 2022 Global Crypto Adoption Index
- Gephi
- Data Science Bootcamp
- SuperDataScience Podcast Survey
- Jon Krohn's Podcast Page
- The Overstory by Richard Powers
Podcast Transcript
Jon Krohn: 00:00:00
This is episode number 625 with Kim Grauer, Director of Research at Chainalysis. Today's episode is brought to you by Datalore, the collaborative data science platform.
00:00:14
Welcome to the Super Data Science Podcast, the most listened to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today, and now, let's make the complex simple.
00:00:45
Welcome back to the Super Data Science Podcast. We've got the profoundly insightful Kim Grauer on the show today. Kim is Director of Research at Chainalysis, the world's leading crypto analytics firm, whose analysis is regularly featured in major mainstream news outlets. Previously, Kim worked in an economic research and analysis group for New York City. She holds a master's in political theory from Oxford and a master of Public Administration from the London School of Economics, and she completed the General Assembly Data Science Bootcamp.
00:01:12
Today's episode will appeal primarily to folks who are interested in blockchains and cryptocurrencies, particularly those keen to perform data analysis on blockchain data. In this episode, Kim details the unique real-time economic data analytics opportunities that blockchains provide. She also talks about examples of her own research on blockchain data, such as analyses of illegal activity and global crypto adoption. She fills us in on the tools and approaches she uses daily to analyze and report on blockchain data. She talks about where the evolutions of crypto, blockchains, and data science are going together, and why a data science bootcamp could be exactly the right thing for you if you're looking to break into the field of data science. All right. You ready for this illuminating episode? Let's go.
00:02:04
Kim, it's awesome to have you here on the Super Data Science Podcast. I've been waiting for this opportunity for weeks, and now you're here. Where in the world are you calling in from?
Kim Grauer: 00:02:13
Thanks for having me. I'm calling in from New York, New York.
Jon Krohn: 00:02:18
Yeah. We missed an opportunity to film in-person. Sometimes when I have New York guests, they come right to my apartment and we film there, and we'll just have to do that some other time. The reason why we got into the scenario administratively, it was an administrative oversight because our original intention was to have you and your colleague Philip Gradwell together in episode number 621, but you couldn't make it that day, and so we split up the episode and, actually, that ended up working really well.
00:02:50
So we couldn't have done it with you and Philip in-person because Philip is in France. So yeah, we just set this up as a remote recording, and it ended up being fortuitous that we split up you and Philip because it turns out you both have an absolutely enormous amount to share about how data science is applied to blockchains and to cryptocurrency. So everyone benefits from this new situation, except that we didn't get to meet in-person.
Kim Grauer: 00:03:22
Yeah. I secretly created that situation because I wanted my own episode.
Jon Krohn: 00:03:28
Oh, very good, diabolical Machiavellian, I love it. So I originally heard about you through Sadie St. Lawrence, who was in episode 517 of the Super Data Science Podcast, and then she is my only ever, so far at least, repeat guest. So she came back in episode number 537 to do data science trends predictions for 2022. So it was the first episode of this year and she was predicting trends, and she had a whole section on cryptocurrency and blockchains. I can't remember if I said this to her on air or after the episode, but I was like, "Is that stuff really important to data science? Do I really need to know this?" and she was effusive about how important it is, and the great opportunities for data analysis that there are for data scientists to be working with blockchain data, cryptocurrency data because there are these big public records.
00:04:31
Then she highly recommended you, Kim Grauer, to be on the Super Data Science Podcast to be our luminary to describe that. Then I noticed that you were a colleague of Philip, whom I've already known for 15 years. So yeah, again, wanted to get you in the same episode, but then ended up in this better situation with you in separate episodes.
00:04:52
All right. So for a couple of years, you were a senior economist at Chainalysis, and now you're a Director of Research, and you've been doing that for a couple of years as well. So what does this Director of Research role entail at Chainalysis and how is it different or related to what you were doing as a senior economist?
Kim Grauer: 00:05:13
So the Director of Research role right now oversees all of our public-facing research that we put out into the world. At Chainalysis, I've been at this company for five years, over five years. So I've been probably deposited on 10 different teams, but always doing the same thing, which is research of what's happening on the blockchain. I've never really been tied to products or things that we sell. It's more about what insights can we glean about cryptocurrency because, oh my gosh, there's so many misconceptions around cryptocurrency. The only way to combat that is just with boring, neutral data.
00:05:58
Turns out a lot of the data is not boring. Crime data is not boring, but early on, our founders, I remember sitting down on my first interview, and it was pre-series funding. We were at a co-working space, and they didn't have a marketing team. They didn't really have product teams yet, but they knew that it was important for them to prioritize research. So they hired me on as an economist. Since then, I've just been doing various forms of research.
00:06:33
It proved to be more important than I think. Maybe they knew it, and I didn't quite appreciate how important it was going to be for our company and also for the industry because we now do things with that data like testify in front of the Senate Banking committees and really important stuff for the industry.
00:06:53
So as a senior economist, I was doing small blogs, small research questions around interesting, nerdy, quirky things that the blockchains have to tell us, but now, it's become an arm that I lead within Chainalysis, a team of really smart people focusing on different things, tackling the world, tackling the most interesting questions in the industry that we think that people need to know about, coordinating with different companies and partnerships and data feeds, and just trying to get the best possible answer to some of the most interesting questions that the industry needs to be able to answer if it's going to grow and become more widely adopted.
Jon Krohn: 00:07:44
Yeah. So it sounds like you specifically played an enormous role in bringing Chainalysis to the world. So I see Chainalysis frequently in mainstream media publications. I read The Economist all the time. You guys are mentioned in there regularly. In terms of providing data to the public based on research, Chainalysis is head and shoulders above any organization with respect to analyzing blockchain or cryptocurrency data. There's no question.
00:08:16
So it sounds like the work you've been doing at Chainalysis from that initial conversation prior to any series funding has, it sounds like you're making an enormous impact not only in the company. Not it sounds like. Let me rephrase that. You are making an enormous impact not only in the company, but also to the world, and some of that is financial information. Like you're saying, might not be the spiciest out there, but some of this is also related to crime. So you're also making a big positive difference with the research that you're bringing out. Yeah. It's interesting to hear that places like the US Senate Banking Committee are taking notice and valuing what you're doing. It's super cool.
Kim Grauer: 00:08:57
Yeah. Thanks for saying that. That's really cool to hear. I do feel like I have, and my team and we at Chainalysis have impact, and I think that's one thing that's really interesting about this space is that it's new. There's a lot of opportunities to be a part of the conversation more so in cryptocurrency than in other domains because there's not a rule book, really.
00:09:29
So we've had an extreme ... It's something that is one of those things where a lot of things came together in the right way to allow for this to happen. We've really been nimble and been adaptive and not confined ourselves into research boxes and tried to respond as best we can to what we think the industry needs. There's some things I think about now is what next, but, obviously, that's a bigger question.
Jon Krohn: 00:09:59
Nice, and we will get to some what next later, what nexts. Before we get there, there's a topic that we discussed prior to recording that I would love to rehash for our audience because it really interested me, and it seemed to even surprise you as a question. So let me frame this for the audience.
00:10:20
When people talk about blockchain technologies, it is often referred to as the blockchain. Actually, you've used both in this interview now. So you talked about the blockchain and then later you mentioned blockchains. So I asked you as an expert in this field, what is correct? What is the right way to do it? I hear people talking about the blockchain, but it seems confusing to me because there's lots of blockchains out there. There's the Bitcoin blockchain, the Ethereum blockchain. So it seems like there are blockchains, and yet they are referred to as the blockchain. So I'd love for you to elucidate that for our listeners.
Kim Grauer: 00:11:00
Well, I had never thought about it. Probably a bad thing to admit on this podcast, but I never thought about it.
Jon Krohn: 00:11:09
Well, it's like a fish not noticing the water that they're swimming in. It's like-
Kim Grauer: 00:11:11
Yeah, yeah, and we were saying now I'm going to notice it everywhere. I think that my answer, and there might very well be a correct answer out there, but my take was that it's a relic of the times before there were all these multi-chains. So we live in a time now where we talk about Solana, Ethereum. All of these other blockchains are equally important. In fact, Bitcoin is, I think, less than 10% of total transaction volume that we see. Whereas the obvious next thing I'm going to say is it used to be 100%. There used to be one blockchain, the blockchain. So that could be it, but there could also be other reasons as to, yeah.
Jon Krohn: 00:12:00
I think it's an elegant explanation, and it makes perfect sense to me. So some listener out there, if you want to correct us, you can find me on LinkedIn and message me about that, but I think that this is a great explanation for why the blockchains are often referred to as the blockchain, the historical context. Makes perfect sense.
00:12:21
All right. So you've been, for more than five years now, analyzing the blockchains. So Sadie and Philip were also talking about this huge opportunity for people who are interested in doing data analysis like data scientists are because it's this huge public record of transactions. So there's an enormous amount to be mined. So clearly, that's the thing that must have drawn you to this space initially when you started getting involved with Chainalysis, but prior to filming another conversation that we had was about how there are some caveats to this great opportunity. So yeah, maybe you can fill us in on that a bit more, both the big opportunity of these big public blockchains that can be analyzed, as well as something that makes it tricky, some bits that make it tricky.
Kim Grauer: 00:13:18
Well, there's always a caveat. It's the expression, "The large print giveth and the small print taketh," but with cryptocurrency and with blockchains, in particular, the obvious use cases and opportunities that this dataset represents are anyone can have access to the real-time data, not just the real-time data, but also what's in what's called the Mempool, so transactions not yet to be settled. Anyone can get access to a node and publicly analyze this data.
00:13:54
It's extremely powerful for researchers. There's a lot of economic activity. Can you think of another real-time economic dataset that anyone can have access to? I used to work for the government, and I would use year-old survey data to explain economic trends. We would all get so excited, "Oh, a new survey is coming out," but it's already a year old. So the opportunities are huge.
00:14:25
Now, the reason why I said that there are caveats is because this is a messy dataset, and it is full of noise. Sometimes you have trouble separating the signal from the noise. I think that's probably true in any big dataset, but I'll give you, I think, maybe one or two examples of what that means in blockchain data.
00:14:50
One example is that sometimes there might be a thousand transactions between different distinct wallets, but they're all actually a part of just one payment chain from one service to another, and just the technical quirk that has worked out is that they will be carrying out one transfer across a thousand different wallets. So if you just parse the blockchain, that's going to look like a thousand different wallets, but that's not a thousand units of economic value.
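The payment-chain example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Chainalysis does it: the function name `collapse_pass_through` is hypothetical, the wallet names are made up, and a wallet is treated as "pass-through" simply because it receives exactly once and sends exactly once.

```python
from collections import defaultdict


def collapse_pass_through(transfers):
    """Collapse chains of pass-through wallets into end-to-end payments.

    transfers: list of (sender, receiver) pairs, one per on-chain transaction.
    Assumption for this sketch: a wallet that receives exactly once and
    sends exactly once is just relaying funds along a payment chain.
    Returns a list of (origin, destination, hops) tuples.
    """
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for sender, receiver in transfers:
        outgoing[sender].append(receiver)
        incoming[receiver].append(sender)

    def is_pass_through(wallet):
        return len(incoming[wallet]) == 1 and len(outgoing[wallet]) == 1

    payments = []
    for sender, receiver in transfers:
        if is_pass_through(sender):
            continue  # this hop is the middle of a chain, not its start
        hops = 1
        seen = {sender}
        # Follow the chain until we reach a wallet that is not a relay.
        while is_pass_through(receiver) and receiver not in seen:
            seen.add(receiver)
            receiver = outgoing[receiver][0]
            hops += 1
        payments.append((sender, receiver, hops))
    return payments


# Made-up data: one service pays another through 999 intermediate wallets,
# which shows up on-chain as 1,000 separate transactions.
chain = (
    [("service_a", "w1")]
    + [(f"w{i}", f"w{i + 1}") for i in range(1, 999)]
    + [("w999", "service_b")]
)
payments = collapse_pass_through(chain)
```

Here `len(chain)` is 1,000 transactions, while `payments` holds a single entry from `service_a` to `service_b`, which is closer to the one unit of economic activity Kim describes.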
00:15:25
For me, increasingly, one thing that I'm struggling with in the blockchain data is that ... I'm interested in economic questions. So what is driving people's behaviors? Why is blockchain activity going up? Why is it going down? What is the significance of this data? How do we interpret it? But a lot of transfers are just administrative transfers between exchanges, things that are happening on their books, and we're not privy to that information. So there might be a 700 million dollar transfer into an exchange. It might just be them topping up their reserves or moving their money around, and it's not actually economically meaningful.
00:16:10
So that being said, this is still early stages in this data. So again, the challenges present opportunities, and there's this concept where in a bear market, you put your head down and you build, and I'm seeing a lot of people, a lot of different startups and companies excited by these problems, wanting to be the one to solve for these issues. So where is the signal? Where is the noise? That's certainly one thing that Chainalysis does as a company is we identify services and we really try and reduce the amount of noise in the blockchain. When you live in any single dataset, some days are better than others when it comes to appreciating the value of your data.
Jon Krohn: 00:16:55
Right, right, right, right. Today's show is brought to you by Datalore, the collaborative data science platform by JetBrains. Datalore brings together three big pieces of functionality. First, it offers data science with a first-class Jupyter notebook coding experience in all the key data science languages, Python, SQL, R, and Scala. Second, Datalore provides modern business intelligence with interactive data apps and easy ways to share your BI insights with stakeholders. Third, Datalore facilitates team productivity with live collaboration on notebooks and powerful no-code automations. To boot, with Datalore you can do all this online, in your private cloud, or even on-prem. Register at datalore.online/sds and use the code SUPERDS for a free month of Datalore Pro and the code SUPERDS5 for a 5% discount on the Datalore Enterprise Plan.
00:17:48
Yeah, those all make sense. So huge opportunity here. There's no real-time economic dataset like it in the world, and that's a huge difference from economic datasets that people have had to work with historically. Like you're mentioning, these year-old surveys being common. So despite those big opportunities, of course, like any big opportunity, there are the slightly less attractive bits. In this case, sometimes there are very big transactions, which could appear to be a signal, and it's difficult to distill whether that is a signal or whether it's just some administrative move, maybe even between two wallets that one organization has. Got it. That makes perfect sense.
00:18:38
So as Director of Research, you are responsible for lots of reports, as we've already talked about. We're going to get into a bunch of those. One of those reports is about illegal activities. So I want to dig into that a little bit. Two years ago, Chainalysis estimated that 1% of blockchain transactions were linked to illegal activities, and 1% is a big number because of the absolute number of transactions that happen. So 1% might not sound like a lot, but when the number of transactions is so enormous, it really is a lot.
00:19:17
So two years ago, estimated that 1% of blockchain transactions were linked to illegal activities, so things like ransomware and NFT, non-fungible token wash trading. So what are those things? What's NFT wash trading? What's ransomware? How are these things facilitated by blockchain transactions? Why are they a problem?
Kim Grauer: 00:19:40
So it's funny that you say 1% is small, I mean, is large because a lot of people say, "That's so small," but I'm like, "What is the right number that you would say, 'Hey, that feels right'?" So yeah, that's about 14 billion dollars last year in illicit activity across 10 different or, sorry, 10 or 11, I can't quite remember, different types of illicit activity. Let's see if I can name them off. You did NFT wash trading, ransomware, but we also have scams, stolen funds, sanctioned activity, terrorist financing, child abuse materials, hacking, and malware. So there's many different types of illicit activity that happen on the blockchain.
Jon Krohn: 00:20:30
It is super impressive that you just reeled those off. For people who aren't watching the video version of this, which is most of you, listeners, Kim did that on her fingers. She wasn't looking in notes. She was just counting on her fingers up the 10 or 11.
Kim Grauer: 00:20:44
Dark net marketplaces, fraud shops are two more.
Jon Krohn: 00:20:49
There you go.
Kim Grauer: 00:20:50
I think those are all of them. Yeah, no, so there's a lot of different variety in crime. So when people say, "How much crime is happening?" you want to give one number, but each of those components of crime are really different. So obviously, terrorist financing is really different from an NFT wash trade. So just some definitions, NFT wash trading or wash trading in general is when you create fake volumes around an asset being traded. So if I launch an NFT, no one's going to buy my dumb NFT. They're not going to want to buy it unless they love the art that I make or unless there's a promise of return. But if I buy and sell my own NFT on a platform hundreds of times, a bot might pick up on that and then buy one of my NFTs. There's a lot of bots lurking in the crypto space because anyone can build one.
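As a rough illustration of the wash-trading definition above, the following Python sketch flags NFTs that keep changing hands between the same two wallets. The function name `flag_wash_traded`, the threshold, and the sample sales are all hypothetical. Real wash traders typically spread activity across many wallets, so this is a toy heuristic rather than the method behind the Chainalysis figures.

```python
from collections import Counter


def flag_wash_traded(sales, min_repeat_sales=5):
    """Flag tokens that are repeatedly sold between the same pair of wallets.

    sales: list of (token_id, seller, buyer) tuples.
    Assumption for this sketch: many sales of one token between the same
    unordered pair of wallets (or a wallet selling to itself) suggest
    fake volume rather than genuine demand.
    """
    pair_counts = Counter()
    for token_id, seller, buyer in sales:
        # frozenset ignores direction, so A->B and B->A count together.
        pair_counts[(token_id, frozenset((seller, buyer)))] += 1
    return {
        token_id
        for (token_id, _pair), count in pair_counts.items()
        if count >= min_repeat_sales
    }


# Made-up sales: nft_1 bounces between two wallets, nft_2 is sold once.
sales = [
    ("nft_1", "wallet_a", "wallet_b"),
    ("nft_1", "wallet_b", "wallet_a"),
    ("nft_1", "wallet_a", "wallet_b"),
    ("nft_1", "wallet_b", "wallet_a"),
    ("nft_1", "wallet_a", "wallet_b"),
    ("nft_1", "wallet_b", "wallet_a"),
    ("nft_2", "artist", "collector"),
]
suspicious = flag_wash_traded(sales)
```

With this data, only `nft_1` ends up in `suspicious`.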
00:21:57
So we're able to quantify that, which is actually a pretty new thing that we're working on. I think that it's a domain that I'd like to linger on a little bit from a data science perspective, but the other types of crime are ransomware. That's when someone downloads, when you accidentally download a malware and your computer gets locked unless you pay a ransom, which is almost always, definitely always in cryptocurrency.
00:22:27
Hacking is, obviously, when a big cryptocurrency exchange gets hacked. Dark net marketplaces are for buying drugs online. Fraud shops are for buying and selling credit cards online. Scams are people saying, "Hey, you want to make 100% returns every day for life?" and then that's not the case. So there's a lot of variety and all of those are driven and impacted by different factors.
Jon Krohn: 00:22:51
Interesting, and Chainalysis works with law enforcement organizations to help them trace some of these transactions into the real world. So even though the blockchain has some anonymity, so this came up in Philip's episode a bit that even though a wallet on its own is anonymous, that wallet might do transactions with a known entity, so like an exchange. Some entities have legal requirements to keep information on the counterparties that they do transactions with.
00:23:30
So you can correct me if I'm wrong on this or if I'm saying anything incorrectly, but it sounds like Chainalysis can work with law enforcement agencies to help them trace the root of some transaction and then the FBI or whoever in the real world can then follow up with exchanges, counterparties, and get to the bottom of who actually is ultimately responsible for this crime that's happened.
Kim Grauer: 00:23:57
Yeah. I'd say that's right. So it manifests in two distinct ways in our customers. One is if you come across a crime scene and there's a cryptocurrency wallet, and you want to get to the bottom of it. You'll use our software to find, to basically follow the money, where did those funds come from, and then maybe the exchange that sourced the funds will have some personally identifiable information on that individual, which can lead to ... So you got the investigation side of things, and then you have the compliance side of things.
00:24:32
So if you're an exchange, you're regulated by the Bank Secrecy Act, the Foreign Corrupt Practices Act. You're not allowed to receive terrorist financing or sanctioned funds. It's just a non-starter. You can't do that, but how do you know if the cryptocurrency coming from some alphanumeric string of letters and numbers is from a sanctioned address or not? Well, you put it in our software, we'll tell you where those funds came from, and then you can build an automatic transaction monitoring system. So anything that has traces of any of those illicit activities, you can then get an alert, freeze those accounts, offboard them, not allow them to get processed, and allow your company to remain compliant.
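The "where did those funds come from" check can be sketched as a bounded backward search over funding relationships. Everything here is an assumption for illustration: the function name `traces_to_flagged`, the `funded_by` mapping, the hop limit, and the addresses are invented, and a production monitoring system would also weigh amounts and the type of exposure.

```python
from collections import deque


def traces_to_flagged(address, funded_by, flagged, max_hops=3):
    """Walk backwards through funding sources looking for a flagged address.

    funded_by: dict mapping an address to the addresses that sent it funds.
    flagged: set of addresses known to be sanctioned or otherwise illicit.
    Returns the first flagged address found within max_hops, else None.
    """
    queue = deque([(address, 0)])
    seen = {address}
    while queue:
        current, hops = queue.popleft()
        if current in flagged:
            return current
        if hops == max_hops:
            continue  # do not look further upstream than max_hops
        for source in funded_by.get(current, ()):
            if source not in seen:
                seen.add(source)
                queue.append((source, hops + 1))
    return None


# Made-up funding graph for two incoming deposits at an exchange.
funded_by = {
    "deposit_1": ["hop_a"],
    "hop_a": ["sanctioned_x"],
    "deposit_2": ["salary_payer"],
}
flagged = {"sanctioned_x"}

alert = traces_to_flagged("deposit_1", funded_by, flagged)
clean = traces_to_flagged("deposit_2", funded_by, flagged)
```

In this toy graph `alert` is `"sanctioned_x"`, which is the point where a compliance team might freeze or review the account, and `clean` is `None`.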
Jon Krohn: 00:25:22
Cool. That was a really eloquent explanation of how that works, much better than mine. I'm glad to have you run through that.
Kim Grauer: 00:25:29
Well, you're talking about, and we talk about wallets as entities too. So yeah.
Jon Krohn: 00:25:34
I talk about everything as entities. What isn't an entity?
Kim Grauer: 00:25:39
That's a really good point. Everything is an entity.
Jon Krohn: 00:25:40
It's a word I use to sound smarter than if I just say thing.
Kim Grauer: 00:25:45
Well, we use entity, actually, in our internal dialogue. So we have internal and external dialogue, so things like clusters or entities or TXIDs. So we have all of these concepts that you have to really learn like, "That's an internal concept versus an external concept." So I'm just impressed I haven't used the word cluster. That's a good thing.
Jon Krohn: 00:26:12
What is it? Can you spill the beans on that?
Kim Grauer: 00:26:14
Sure. A cluster is what we call, basically, a cluster of addresses. So what we do at Chainalysis is we associate addresses together, and we do that through five, six different heuristics. Some of them are manual, some of them are automatic. That's when you're getting into the more data science world. I'll give you an example. There's a concept called co-spending, where if two addresses spend the same transaction output together in the same transaction, you can know that they're connected. Two addresses must be controlled by the same individual. So they will associate those together and put them in one cluster.
00:26:56
A cluster is a series of wallets. Clusters can have millions, 10 million addresses in every single one. So that's at the heart of what Chainalysis is doing. So like we talked about at the top of the call, anyone can plug into this dataset, but you won't know which addresses are associated with each other without running a whole bunch of heuristics over the entire history of the dataset, which is something that prevents a lot of people from being able to draw the conclusions that we're able to draw.
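The co-spending idea can be sketched with a union-find (disjoint-set) structure. The sketch reads the heuristic as "addresses that appear as inputs to the same transaction are assumed to share an owner", which is often called the common-input-ownership heuristic. The function name `cluster_by_co_spending` and the sample transactions are hypothetical, and this covers only one of the several heuristics Kim mentions.

```python
def cluster_by_co_spending(transactions):
    """Group addresses that were ever spent together in one transaction.

    transactions: list of lists, each holding the input addresses of one
    transaction. Returns a sorted list of clusters (each a sorted list).
    """
    parent = {}

    def find(address):
        # Return the representative of the address's cluster.
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]  # path halving
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        find(inputs[0])  # register single-input transactions too
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for address in list(parent):
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(members) for members in clusters.values())


# Made-up inputs: "a" and "b" are spent together, then "b" and "c",
# so all three land in one cluster even though "a" and "c" never met.
transactions = [["a", "b"], ["b", "c"], ["d"], ["e", "f"]]
clusters = cluster_by_co_spending(transactions)
```

The transitive step is what lets clusters grow to millions of addresses, and it is also why the heuristic has to be run over the entire history of the chain rather than one transaction at a time.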
Jon Krohn: 00:27:32
Yeah, because they have to come up with the heuristics, which requires some expertise. Then even once they've come up with the heuristics, running over on a big blockchain like the Bitcoin blockchain, the blockchain, it would be an enormous computational expense to go over all the history and identify where that co-spending is occurring, for example. So much cheaper to work with a tool like Chainalysis provides to be able to quickly get the results of all of your intellectual property and computational work.
Kim Grauer: 00:28:09
It would be computationally expensive, but also, there's other heuristics that we employ to get attribution, what we call attribution. I think that's another internal word, where we identify what the name of the service is, so who controls those 10 million addresses. So it's a hard, hard-won dataset that is really seven or eight years in the making now, and every day it grows tremendously.
Jon Krohn: 00:28:40
Nice. What do you think about the Super Data Science Podcast? Every episode, I strive to create the best possible experience for you, and I'd love to hear how I'm doing at that. For a limited time, we have a survey up at superdatascience.com/survey, where you can provide me with your invaluable feedback on what you enjoy most about the show and critically about what we could be doing differently, what I could be improving for you. The survey will only take a few minutes of your time, but it could have a big impact on how I shape the show for you for years to come. So now's your chance. The survey is available at superdatascience.com/survey. That's superdatascience.com/survey.
00:29:20
So those are great examples. The co-spending, for example, and you were just talking about attribution in general, those are great examples of how we can be using data modeling techniques to be drawing inferences about a blockchain dataset. Do you happen to have any other interesting use cases, maybe even related specifically to the illegal transactions that we were just talking about? Yeah. How can we be using data science or machine learning to be identifying illegal activities?
Kim Grauer: 00:29:53
It's a really good question, and I think this is where the domain of machine learning and data science in blockchain is particularly exciting. One example might be that there might be tells on what a certain type of illicit address or type of illicit category looks like. I guess a better example is ... So we talked about ransomware. There are many, many different ransomware bad actors, and what they do is they create a strain of ransomware. I'll give you an example of one. One is called Conti. That is the biggest, one of the biggest ransomware strains right now.
00:30:35
Another one that you might have heard of is called DarkSide, and that is the ransomware strain that was behind the Colonial Pipeline attack in 2021, and that got the Biden administration all paying attention to ransomware and saying, "Hey, ransomware is on par with terrorism," but these different ransomware gangs, they hold their cryptocurrency and they manage it in specific ways. So maybe they will manage their transactions after they receive a ransom in a very specific way.
00:31:10
So this does present an opportunity to scan the blockchain based on behaviors, based on properties, based on all of this data that has more information than you might realize. So the time of day of the transaction is something that we use sometimes, the frequency of the transaction, and all of these things that you can create a list of properties and then say, "Hey, what else fits this criteria?"
00:31:37
Now, that is great for research. It wouldn't meet the bar for attribution at Chainalysis, because we need to be 100% on our attribution, since it's used in real-life investigations that sometimes might put someone in jail. But it still is a really interesting domain: if you have a training and test dataset, you can see how close you can get to identifying ransomware.
00:32:10
Then once you perfect that, the cool thing about that is you can apply it across everything, and then all of a sudden you've ... The last thing I'll say on this is one of the cooler things that we did in the crime report that was more data sciencey was we looked at all of the deposit addresses on exchanges that were receiving illicit funds in a given year. I think this speaks to the power of the blockchain that Philip and Sadie were talking to in the past, which is that I could write a query and immediately find all of the off-ramps that are responsible for moving those funds. I think that's extremely powerful if you're interested in crime prevention, proactive, being more proactive with data.
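The kind of query Kim describes can be sketched as follows. This is a minimal illustration only: the `transfers` table, its columns, and the sample rows are hypothetical and do not reflect Chainalysis's actual schema or data.

```python
import sqlite3

# Hypothetical schema: one row per on-chain transfer, with the sender's
# category label and the exchange (if any) that owns the receiving address.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transfers (
    sender_category TEXT,     -- e.g. 'ransomware', 'scam', 'none'
    receiver_address TEXT,
    receiver_exchange TEXT,   -- NULL when the receiver is not an exchange deposit address
    usd_value REAL,
    year INTEGER
);
INSERT INTO transfers VALUES
    ('ransomware', 'dep_A',    'ExchangeOne', 50000, 2021),
    ('scam',       'dep_A',    'ExchangeOne', 12000, 2021),
    ('ransomware', 'dep_B',    'ExchangeTwo',  8000, 2021),
    ('none',       'dep_C',    'ExchangeOne', 99000, 2021),
    ('scam',       'dep_B',    'ExchangeTwo',  3000, 2020),
    ('ransomware', 'wallet_X', NULL,           7000, 2021);
""")

def illicit_off_ramps(conn, year):
    """Return (deposit address, exchange, total illicit USD) for one year, largest first."""
    query = """
        SELECT receiver_address, receiver_exchange, SUM(usd_value) AS illicit_usd
        FROM transfers
        WHERE year = ?
          AND sender_category != 'none'
          AND receiver_exchange IS NOT NULL
        GROUP BY receiver_address, receiver_exchange
        ORDER BY illicit_usd DESC
    """
    return conn.execute(query, (year,)).fetchall()

for row in illicit_off_ramps(conn, 2021):
    print(row)
```

The point of the sketch is that, once transfers are labeled and stored in a table, finding the off-ramps for a given year is a single filter-and-aggregate query.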
00:33:06
It's really interesting to see law enforcement, who you might not have in your mind as being the ones at the crime scene as being very into the mechanics of data, but they're really leaning into these more proactive techniques.
Jon Krohn: 00:33:24
Cool. That was a great explanation. So we can use machine learning, say, on a training dataset where we have labels of some particular illicit activity that we're aware of, say, ransomware. So we have this label dataset of positive cases, negative cases, and then machine learning is adept at pattern recognition, and so it can recognize the patterns associated with particular behaviors, particular activities on the blockchain that are related to, say, ransomware. Then we can deploy that model once it's trained across all of the blockchain or blockchains and identify criminal activity or flag potential criminal activity across it. Super cool.
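The train-then-deploy idea Jon summarizes can be sketched with a deliberately simple model. The example below uses a nearest-centroid classifier in plain Python, swapped in purely for illustration; the episode does not say which model is used. The features, values, and labels are all hypothetical.

```python
from math import dist

# Hypothetical per-address features: (transactions per day, median hours
# between receiving and forwarding funds, share of outputs sent to exchanges).
labeled = [
    ((40.0, 1.0, 0.9), "ransomware"),
    ((35.0, 2.0, 0.8), "ransomware"),
    ((2.0, 72.0, 0.1), "other"),
    ((3.0, 48.0, 0.2), "other"),
]

def fit_centroids(examples):
    """Average the feature vectors of each label (the training step)."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def predict(centroids, features):
    """Flag an address with the label of the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = fit_centroids(labeled)
print(predict(centroids, (38.0, 1.5, 0.85)))
print(predict(centroids, (1.0, 60.0, 0.05)))
```

In practice the features would need scaling and the model would be evaluated on a held-out test set before being run across a whole chain. As Kim notes, the output is a flag for research, not attribution.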
00:34:16
So in addition to the crime report that Chainalysis produces, you also produce a global crypto adoption report. In that report, you've identified that emerging markets such as Vietnam, India, and Ukraine are topping this list. So those specific countries, Vietnam, India, and Ukraine have topped the Chainalysis global crypto adoption list for two years in a row. So you've called this grassroot adoption. What makes one country more willing to embrace blockchain and cryptocurrencies than others? Yeah, I guess that's the end of my question.
Kim Grauer: 00:35:01
Yeah. It's what I think is probably one of the more underreported stories in crypto. A lot of media talks about Celsius bankruptcies, which is obviously an important thing, but you don't hear a lot about crypto adoption around the world. So that's what motivates this report because we know that you can't ... Different crime categories are all different. Every country is different and has different reasons for using cryptocurrency. You really see that when you start to tease out our data. Honestly, I do a lot of interviewing for this report because one of the limitations to blockchain data is, "Hey, I see a Coinbase to Gemini transfer." I can't go, "Excuse me. What was the intention of this transfer?" I can't. Ideally, I could, and then they would say, "Oh, this is a remittance payment," or, "This is me sending money overseas," or sending money, buying a Tesla, I don't know.
00:36:14
So we can't ask the intention behind these transactions, but the geography index report was really interested in that question. So we built a way to measure adoption by country. We weighted everything for purchasing power and population, and we saw that these emerging markets were at the top of the index and it's really for different reasons. So the Philippines, for example, really stood out as having a huge play to earn population, and people are interested in DeFi and gaming, and there's a lot of web traffic activity in the Philippines from these sites, but then in India-
Jon Krohn: 00:37:01
So people in the Philippines, they'll play games for small amounts of cryptocurrency. Well, what would be in the USA not worth your time, but in the Philippines, spending an hour playing these video games where you get cryptocurrency reward is worth it because of purchasing power differences. That's what happens there, right?
Kim Grauer: 00:37:24
Yeah, yeah, that's a really good point. That's probably why it's so popular in the Philippines. Yeah. So gaming really pops out in the Philippines for those reasons. In other regions in Central and Southern Asia as well, gaming, especially play to earn, is popular in those countries, but in India, we saw a ton of NFT activity, which I thought was interesting as well. Apparently, there was a new cricket game that was launched in India a few months ago or six months ago or something that just really became popular.
Jon Krohn: 00:38:01
Yeah, and that's related to NFT somehow, the game, this cricket game.
Kim Grauer: 00:38:06
Yeah, exactly, trading cards, trading NFTs in India. So there's different reasons for adoption in emerging markets. One thing that I do think is driving a lot of the activity is people want to responsibly invest their money no matter where you are. If you have disposable income, you want to put it in a place where you think it's going to grow, but the problem is you and I, we can buy stocks and we can put our money in equities, but a lot of people around the world don't have that same level of access. There's reasons why they are blocked out of the financial ecosystem for regulations that prevent them from gaining proper licenses for doing cross-border payments or anything.
00:39:07
So cryptocurrency is available to anyone. I think that that drive to invest, people call it and characterize it as gambling, which it is to some degree, but I think that what it is even more than that is just a willing, a need, a want, and a desire to grow your wealth, and we see that a lot around the world.
Jon Krohn: 00:39:35
Yeah. That makes perfect sense. So a big driver behind grassroot adoption, so notwithstanding the specific examples you made around play-to-earn gaming or NFTs related to cricket, there's a general trend in emerging markets toward adoption because it provides people with a way to earn passively on their income that otherwise they might not have any access to. Is it also the case that it might also provide them with an opportunity to just exchange that otherwise they wouldn't be able to? There might not be other mechanisms for transferring funds between people in some countries like we have in the West, and so crypto facilitates that as well.
Kim Grauer: 00:40:25
We definitely have seen some of that. We talk to people, for example, in Nigeria who just do business. They have a local shop and they do business. They use cryptocurrency. It just makes more sense to them. It allows them to have international customers, and it's just easier in some ways once you get over the hump of figuring out how to download it. I think there's some growing places where it just makes more sense, but then you'll also have places like in Argentina or Latin America I found this more broadly, where people don't care about cryptocurrency. They literally couldn't care less about who Satoshi Nakamoto is. They're just in the presence of hyperinflation, and they are just, "How do I get exposure to the dollar, but I can't get the dollar because in Argentina, the most dollars you can hold in a single month is $200. That's regulations. So let's use stablecoins." So you see people saying, "Hey, I actually could get exposure to the dollar in this other way."
Jon Krohn: 00:41:36
Right. So stablecoins are pegged to a major currency like the US dollar. That's what distinguishes them from something like Bitcoin, where the pricing is free to move, but some stablecoins, they're supposed to be pegged and they often are.
Kim Grauer: 00:41:51
Yeah. Often, a lot of the times they're pegged. So yeah, that's right. They are pegged to a stable, secure. They're a stable source of value. So the major stablecoin is Tether, and that's pegged to the US dollar, but we are joking around because there was a recent example, something called UST, which is different from USDT, which was supposed to be pegged to the dollar as well, but it was through this process called ... It was an algorithmic stablecoin, and it lost the peg and crashed to basically nothing.
Jon Krohn: 00:42:29
Yeah, I remember reading about that recently, which was actually what prompted me to say that. Yeah, often pegged, but yeah. So it sounds like some stablecoins like Tether, they're more trusted because they tend to actually have the US dollar assets to back that peg. Whereas, I guess, as is common in any market that expands really rapidly, you end up, especially in the bull times when the market is growing, you end up with people devising these kinds of algorithmic pegs, these kinds of scams. Maybe some of them actually thought the algorithm would work at all times. Maybe some others had a hunch that it would only work in a bull market, more like a Ponzi scheme. So the bear market that we've experienced in crypto in 2022 in a lot of markets has exposed some of those scams.
00:43:34
So related to the grassroots adoption question, so we talked about some of the cases like Philippines, India, where adoption is really high, and it's thanks to grassroots adoption, but there are other instances where we've seen the opposite happen. So in El Salvador, for example, the president there, he pushed Bitcoin as legal tender. So my understanding is it was the first country in the world to have Bitcoin be legal tender, and that hasn't gone over extremely well. So people have taken to the streets to destroy Bitcoin ATMs, for example. So is there any one thing or any short list of things that went wrong there? What should countries be doing to approach making crypto legal and be well-adopted?
Kim Grauer: 00:44:35
I think it shows your very typical question around top down versus bottom up. I think probably bottom up tends to be slightly more sustainable growth, and top down, you have to win the hearts and minds of people and it's more effective to do that slowly. I don't think that the experiment with El Salvador has really played out yet, although I think the reaction to it is definitely resulting from this top down approach. If you're someone who doesn't like cryptocurrency or thinks it's unstable, then knowing that so much of your country's assets is suddenly overnight sitting in cryptocurrency, maybe you think that's too risky, but we'll see. I've heard other countries are following this or might be following this example. So we'll see how it plays out, but I think that a lot of the backlash really is because of the different approach used there.
Jon Krohn: 00:45:50
Nice. That makes perfect sense. So we've just talked about the Chainalysis' global crypto adoption report. Prior to that, we were talking about reporting on illegal activities. What other reports are you responsible for, Kim, that our audience might be interested in hearing about?
Kim Grauer: 00:46:10
Right now, I'm working on something in DeFi. I really want to figure out questions around market integrity and how we can use all of this public data to better audit the industry. I think a lot of people in the cryptocurrency industry think that the cryptocurrency volumes are faked or there's a few bad actors that are doing a lot of the transaction activity or there are whales that are pumping and dumping the market. So can we use our amazing, the data at our disposal to build tools that can help with market integrity? Can we build an industry-wide market manipulation metric, a front-running metric? Can we identify better where the bots are, when people are front-running?
00:47:07
I think there's a lot, a lot of building to happen there. So that's where my head is at right now and I'm trying to get some of that for our upcoming crime report, but we also do a ton of other research, so competitive landscape, who are the winners and losers in the cryptocurrency space, what's happening in the NFT markets, and then some just random research. My first viral piece of research that I did at Chainalysis was around the lost Bitcoins, so things like that as well pop up.
Jon Krohn: 00:47:41
Cool. Sounds really fun. What kinds of analytical or data science tools do you use day to day to analyze the data that you have at your disposal and create these reports?
Kim Grauer: 00:47:54
So I do everything, and we use Python and Jupyter notebooks, and we plug into our data directly, which is part of the blockchain getting parsed. Then our engineering team will clean it up again, and then we access the databases using SQL and do all of the transformations and chart building and report generation in Python. I do a little bit of exporting to Gephi for some network analysis when necessary. Yeah, I think that's pretty much the main, probably 90% of my time is in there.
Jon Krohn: 00:48:32
Yeah. So a lot of that is probably unsurprising to our listeners. So SQL for extracting information from databases once it's been processed by your data engineering team and put into those structured databases, Python for your data analysis done in Jupyter notebooks, and even some of your report creation, maybe the charts and things like that happens in Python, but there was one tool that you mentioned there at the end that I hadn't heard of before. Gephi?
Kim Grauer: 00:49:01
Gephi, Gephi is great. Gephi, you just-
Jon Krohn: 00:49:04
G-E-P-H-Y?
Kim Grauer: 00:49:07
Yeah, G-E-P-H-I.
Jon Krohn: 00:49:10
Ah, right. Nice. We'll be sure to have that in the show notes. So that's fun for working with graph data.
Kim Grauer: 00:49:17
Yeah, exactly. You just export datasets that are in a network structure and you can upload it into Gephi and then it runs all of these network statistics on your data that you've imported. Super user friendly. Really encourage people to explore it. It's really cool and free.
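The export step Kim mentions can be sketched like this. Gephi's spreadsheet import accepts an edge table with `Source`, `Target`, and `Weight` columns; everything else here (the cluster names, the values, the `to_edge_list_csv` helper) is made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical transfers between address clusters: (sender, receiver, usd_value).
transfers = [
    ("cluster_a", "exchange_1", 500.0),
    ("cluster_a", "exchange_1", 250.0),
    ("cluster_b", "exchange_1", 100.0),
    ("cluster_a", "cluster_b", 75.0),
]

def to_edge_list_csv(rows):
    """Aggregate transfers into weighted directed edges and render them as CSV text."""
    weights = Counter()
    for source, target, value in rows:
        weights[(source, target)] += value
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()

edge_csv = to_edge_list_csv(transfers)
print(edge_csv)
```

Writing `edge_csv` to a `.csv` file gives something that can be loaded into Gephi as an edge table, after which the network statistics Kim mentions can be run from its interface.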
Jon Krohn: 00:49:38
Nice. Ideal. Other than Gephi, are there any other data science tools or techniques that you're really excited about that you think our listeners should know about?
Kim Grauer: 00:49:49
I don't know if this is not a data science technique, but since I'm not sure about those, what I would recommend there beyond the basics, I would say one domain I'm really excited about is more advanced parsing of Ethereum and smart contract data. So every single transaction on the Ethereum blockchain has this component called the logs, and the logs are an uncharted territory. I think they're really interesting and there's a lot of good data there. So I would point tools in that area and it's a parsing of that and it's something that I am using, I'm really focusing on now.
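As a small illustration of what parsing the logs involves, the sketch below decodes one ERC-20 `Transfer` event from a raw log entry. The topic hash is the standard signature hash for `Transfer(address,address,uint256)`; the sample log and its addresses are made up, and this is not a description of Chainalysis's parsing pipeline.

```python
# keccak256("Transfer(address,address,uint256)"), the standard ERC-20 Transfer event signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def topic_to_address(topic):
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]

def decode_transfer(log):
    """Return (sender, receiver, raw_amount) for an ERC-20 Transfer log, else None."""
    topics = log["topics"]
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    amount = int(log["data"], 16)  # un-indexed uint256 value, in the token's smallest unit
    return topic_to_address(topics[1]), topic_to_address(topics[2]), amount

# Made-up log entry in the shape returned by the eth_getLogs JSON-RPC method.
sample_log = {
    "address": "0x" + "ab" * 20,  # hypothetical token contract
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "0" * 24 + "11" * 20,
        "0x" + "0" * 24 + "22" * 20,
    ],
    "data": "0x" + format(1_500_000, "064x"),
}

print(decode_transfer(sample_log))
```

The check for exactly three topics matters because ERC-721 (NFT) transfers share the same signature hash but index the token ID as a fourth topic. Other event types need their own signature hash and field layout, which is part of why the logs remain relatively unexplored.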
Jon Krohn: 00:50:37
Nice. Really cool. So that's a good example. So this advanced parsing of Ethereum smart contract logs, that's a great example of how data science techniques and blockchain cryptocurrency are evolving together. Where do you think that might go in the coming years or, if you dare to take an even bigger risk, in the coming decades?
Kim Grauer: 00:51:02
So it's a really good question, and if I had to venture a guess as to where we're heading, I think that it's very clear that there's a lot of people building in the cryptocurrency ecosystem. So I think a lot of things that we do now might transfer onto blockchain technologies, where you might not notice that they're there, but they're making things slightly more efficient. I think that things that involve being digitally native will just slowly become more and more intuitive in our lives and a part of our lives and transferring money to people around the world using cryptocurrencies potentially, but it could also be CBDCs. So Central Bank Digital Currencies could also play a role here. It will become a little bit easier over time and it will just create a smoother transition to a more digital native world that ... I know that's not a great futuristic answer, but I see the progress as being slow and hard won and just gradually seeping into our lives until suddenly we're extremely competent and digitally native.
Jon Krohn: 00:52:30
I love that answer. That is a great one. So yeah, just this general idea of smoothing our transition into a digitally native world. Today, we are still, there's so much that is purely analog that could be digital, and things like the blockchain provide us with the opportunity to be tracking these things digitally. Data science, machine learning provides us with tools for analyzing those data for informational purposes, for avoiding crime. So great. That's a really great answer.
00:53:13
So your background is interesting. I mean, there aren't people, maybe there are very young people today who can grow up and do cryptocurrency degrees. I don't know if that's a thing, but you studied political science, including doing a political theory master's at Oxford, doing a master's in public administration at the London School of Economics. So how did that formal education background prepare you for what you're doing today, and then what was the value in doing a data science bootcamp with General Assembly? How did that tie in to the more traditional formal education that you had beforehand?
Kim Grauer: 00:54:02
Yeah. My background is certainly random. It's all over the place a little bit. I think that philosophy and political theory have a clear way in structuring your thinking and structuring the way you carry out research and how you write about that research, how you formulate hypotheses. I think being educated in more of the soft sciences made me appreciate the power of that. So I learned a lot of that from those degrees.
00:54:37
Then honestly, a lot of my data skills have come from the General Assembly Bootcamp, where I was working in government before my job and they allowed us to use a stipend for any education. I saw that there was this General Assembly Bootcamp, and I just was not qualified at all to take it. I mean, I did Stata in my graduate degree and I used data and had run regressions and was familiar with statistics and whatnot, but definitely was not familiar with basic programming. They said that. They said, "Don't sign up for this if you don't have these basic things," and I was like, "I don't need it." So I did it and it was way over my head and I wish I had listened to them and I wish I could take ... To be honest, it really lit the spark of me wanting to do that.
00:55:39
So then I joined Chainalysis and I had the basics under my belt, and because Chainalysis was growing so much, I just learned on the job and was really committed to learning and growing the data skills that I had to the point that they are today. I really encourage people who think that, "Hey, I want to do data science, but I'm not qualified," it's not as hard. You just have to put in the work and it's not as hard as you think. I really encourage people to consider these certificate courses because they can be so powerful and in a friendly way that's not a huge, crazy master's degree that's going to put you in tremendous debt for life. You can get exposure to a new field.
00:56:30
So I highly recommend these certificates, especially if you have that, "I'm curious about this thing and I'd really like to try it, but no, I'm no good at it. It's too late." It's not too late, especially for blockchain. I did the certificate several years after I was like, "I'm never going to be educated ever again." So yeah, it was a great resource and I love these certificate programs that are growing, that are popping up and just all the self-education courses that people have access to.
Jon Krohn: 00:57:05
Yup. We think they're great as well. Our sister company, so there's a thing called superdatascience.com that we don't talk about on air very often, but it is that, exactly that kind of thing, like a Coursera for learning data science, machine learning skills. So yes, we are very much behind that as something as these data science bootcamps like General Assembly, and that's a very well-regarded one at General Assembly. They allow you to, yeah, get exposure as you say in a friendly way, an unintimidating way to the breadth of data science.
00:57:42
Then once you know what's possible like you did, so even if some of it was a bit over your head as you were actually pursuing the program, it provides you with a map of understanding what all is possible and then you can fill in those details on the job like you have. So that makes a lot of sense to me. I think there's, yeah, there's a lot of opportunity there and it's very low risk in terms of your time, as well as your money relative to pursuing, say, a master's in data science like you mentioned. So yeah, definitely something for our listeners to consider. If you're thinking of getting into data science, as Kim says, you can do it, try it out, do a bootcamp, see how it goes, do some online courses, see how it goes. You can definitely do it.
00:58:28
All right. Kim, it's been a brilliant episode. I've learned a lot. I'm sure our audience has loved this episode as well. Our regular listeners will know that at the end of an episode, I ask you for a book recommendation.
Kim Grauer: 00:58:43
Oh, yes. Okay. So I just read Bewilderment by Richard Powers, and I semi-recommend that, but I really recommend his first book, The Overstory. It's a really good book weirdly about trees and just a beautiful book and I highly recommend it.
Jon Krohn: 00:59:03
Nice. That's a great recommendation. So then if people want to be able to stay in touch with you after this episode, I'd love to hear what social media you'd recommend that they follow you on, but I'd also like to give a plug here for Chainalysis. So you're doing hiring for data engineering roles and machine learning engineering roles. So another way that people could stay in touch with you if they feel like what you're doing at Chainalysis is really cool, they could literally work beside you and be creating the algorithms that mine the enormous blockchain and provide data into SQL tables that then you can query more easily. So the data engineers would be doing that, and then the machine learning engineers, I imagine, would be training the machine learning models like the ones you described in this episode for identifying patterns of blockchain activity that are, say, associated with criminal activity like ransomware, and then applying those machine learning algorithms over the multi-chain.
Kim Grauer: 01:00:11
Oh, that's what we should use.
Jon Krohn: 01:00:12
So yeah, I don't know if I plugged those accurately or sufficiently. I really don't. I'm taking a bunch of guesses here, but those sounds like what data engineers and machine learning engineers would be doing at Chainalysis. So I don't know if you have anything else to say about that, but yeah, definitely tell us either way how listeners can be following you after this episode.
Kim Grauer: 01:00:31
Definitely reach out to me on LinkedIn or Twitter. I've been told to start doing more Twitter.
Jon Krohn: 01:00:39
More twitting.
Kim Grauer: 01:00:40
Yeah. I'm definitely trying to be more available on that. Also, I love meeting new people, so just genuinely reach out to me on LinkedIn, say you listened on the Super Science Podcast. If you want to chat about careers, feel free to message me or browse our website and, yeah, let's work together.
Jon Krohn: 01:00:58
Nice. Sounds great. I like the idea of being the Super Science Podcast, not just the Super Data Science Podcast. Maybe we should move in that direction. You heard it here first from Kim Grauer. That's the direction we're going in. I love science in general. That'd be a great way to go. All right, Kim. Thank you so much. It's been so much fun having you on air, and as I already said, I learned a ton. Thank you so much for sharing your knowledge with us.
Kim Grauer: 01:01:28
Thank you so much for having me. It was really, really great.
Jon Krohn: 01:01:36
I was deeply impressed by Kim's depth of knowledge on every topic we touched on today, and I had a blast working with her on today's episode. In it, Kim filled us in on how blockchains provide unprecedented access to real-time economic data, but how massive economically meaningless transactions that occur regularly can add a lot of noise to the data. She talked about how data science through tracing co-spending facilitates the identification of clusters, while attribution enables the identification of crypto wallet owners. She talked about how machine learning can predict patterns of blockchain activity related to criminal activity. She talked about how SQL, Python, Jupyter notebooks, and the open graph visualization platform, Gephi, are tools that she uses daily, and she talked about how the blockchain, crypto, and data science are evolving together to facilitate a digital native world.
01:02:26
As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Kim's social media profiles, as well as my own social media profiles at superdatascience.com/625. That's superdatascience.com/625.
01:02:43
Well, so every episode, I strive to create the best possible experience for you and I'd love to hear how I'm doing at that. For a limited time, we have a survey up at superdatascience.com/survey, where you can provide me with your invaluable feedback on what you enjoyed most about the show and critically about what we could be doing differently, what I could be improving for you. Again, the quick survey is available at superdatascience.com/survey.
01:03:07
Thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science Podcast episode for you, and thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the Super Data Science team for producing another fascinating episode for us today. For enabling this super team to create this free podcast for you, we are deeply grateful to our sponsors. Please consider supporting the show by checking out our sponsors' links, which you can find in the show notes. If you yourself are interested in sponsoring an episode, you can find our contact details in the show notes as well or make your way to jonkrohn.com/podcast.
01:03:41
Last but not least, thanks to you for listening all the way to the end of the show. Until next time, my friend, keep on rocking it out there and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.