
56 minutes

Business, Data Science

SDS 601: Venture Capital for Data Science

Podcast Guest: Sarah Catanzaro

Tuesday Aug 16, 2022



This week, we're exploring the venture capital side of data science with Sarah Catanzaro, General Partner at Amplify Partners. Learn how to fund your data science business idea, take note of what start-ups can do to survive or raise capital in the current economic climate, and discover how to break into the field of venture capital yourself.




About Sarah Catanzaro
Sarah Catanzaro is a General Partner at Amplify Partners, where she focuses on investing in and advising high potential startups in machine intelligence, data management, and distributed systems. Her investments at Amplify include startups like OctoML, RunwayML, Hex, and Gantry among others. Sarah also has several years of experience defining data strategy and leading data science teams at startups and in the defense/intelligence sector including through roles at Mattermark, Palantir, Cyveillance, and the Center for Advanced Defense Studies.

Overview
This financial-focused episode kicks off with the basics. If you're a keen follower of tech and start-ups, you've surely heard of the various rounds of investments. Still, we left it up to Sarah to help us define the differences between the seed stage of investment, series A stage, and private equity.

After ten years of experience in the field, here's how she breaks down what investors are looking for according to each stage:
  • Seed: here, investors are looking for a long-term vision, a minimum viable product (MVP), and a few hypotheses about future products.
  • Series A: here, early validation of the MVP must be present via user sign-ups and deals. Hypothesis testing regarding potential future products should also be present.

While seed and series A are primarily investment vehicles that drive growth, private equity, on the other hand, is an investment strategy that focuses on realizing operational efficiencies.

Sarah focuses mainly on early-stage venture capital, and she explains that most of her investments are made before product-market fit is established. At this stage, companies are still iterating on the responses to questions such as: "What are we building, and who are we building for?"
 
A few of her investments include previous SuperDataScience podcast guests, including Pieter Abbeel's Covariant and Tim Kraska's Einblick.

When looking back on the lessons she's learned over the years, she says that teams need to explore not only the data that they have but also the data that they could collect in the future, and how they could collect it. She also stresses that the tools and best practices that have been adopted focus on collaboration among data professionals rather than thinking about cross-disciplinary partnerships with data scientists and, for example, designers. 

With the data tooling space growing quickly, Jon wondered how Sarah picks winners in an increasingly crowded space. According to Sarah, her winning formula includes choosing "people, people, people, product, market."

"We are often investing in deeply technical founders who really understand the domain in which they are building solutions. But we are also looking for people who can articulate a crisp vision of the future," she adds. "We also want to make sure...that the founder deeply understands and empathizes with the user pain point."

Tune in for more insights on start-up investing and to hear about Sarah's trick for accelerating from a data science idea to obtaining funding.

In this episode you will learn:      
  • Angel vs. venture capital vs. private equity investment [7:27]
  • How early-stage investment is made prior to a firm having product-market fit [14:33]
  • How to pick winners in early-stage investments [28:08]
  • Tricks to accelerating from a data science idea to obtaining funding [36:21]
  • Observational causal inference [44:01]
  • How to get involved in venture capital [47:37] 

Jon Krohn: 00:00
This is episode number 601 with Sarah Catanzaro, general partner at Amplify Partners. Today's episode is brought to you by Pachyderm, the leader in data versioning and MLOps pipelines. 

Welcome to the SuperDataScience podcast, the most listened to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today. And now let's make the complex simple.

Welcome back to the SuperDataScience podcast. Today we're very lucky to have Sarah Catanzaro joining us on the program for an episode all about investing, particularly venture capital investments, in data science companies. Sarah is a general partner at Amplify Partners, a Bay Area venture capital firm that specializes in investing in early stage startups that are pioneering new applications of data science, analytics, and machine learning. Previously, she worked as an investor at Canvas Ventures, as head of data at Mattermark, and as an embedded analyst at Palantir. She holds a bachelor of science degree from Stanford University.

Today's episode will appeal to anyone who's keen to understand investing in early stage startups. In this episode, Sarah details what venture capital is and how it differs from other types of investment like private equity, how to go from a data science idea to obtaining funding, how to pick winning investments, what startups can do to survive or raise capital in the current economic climate, the lessons she's learned from 10 years of experience in the field of data science herself, and how to break into the field of venture capital yourself. All right, are you ready for this exciting episode? Let's go. Sarah, welcome to the SuperDataScience podcast. I'm so excited to have you here. Where in the world are you calling in from?

Sarah Catanzaro: 02:10
I am calling in from sunny albeit chill San Francisco where I am envious of the heat waves happening everywhere else. 

Jon Krohn: 02:21
Oh yeah. It is hot in New York at the time of filming. Yesterday with a thunderstorm and the humidity from that, it was over 100. Yeah. My puppy does not love that. 

Sarah Catanzaro: 02:34
I mean, it was a balmy 52 degrees here yesterday.

Jon Krohn: 02:42
For our listeners in Europe who are experiencing that in Celsius, Sarah means in Fahrenheit. 

Sarah Catanzaro: 02:48
Yes. 

Jon Krohn: 02:48
In this crazy heat wave of a summer. All right, so we know each other from the New York R Conference. So the R Conference has been going on for years. It often has some of the biggest names in data science speaking at it. Wes McKinney is there a lot. And we recorded a live episode of SuperDataScience there with Hilary Mason, one of the biggest names that we've ever had on the podcast. That's episode number 589. And Sarah, you were a speaker at the R Conference as well. You're a recurring speaker at the New York R Conference, aren't you?

Sarah Catanzaro: 03:20
I'm a recently recurring speaker. So this was my second time presenting there. 

Jon Krohn: 03:26
Second consecutive year. We actually have some questions based on both the talk that you gave last year, as well as the talk that you gave this year. So some valuable content that you provided for us. So you are an investor. You have invested actually in a number of our recent speakers' companies, recent guests on the show's companies. So Professor Pieter Abbeel, another really big name guest that we've had on, his robotics company, Covariant, you've invested in them. Pieter's in episode number 503. He does a great episode on industrializing robotics research. And it's a really nice technical deep dive from a renowned Berkeley professor. And then we also more recently had Professor Tim Kraska. So he's at MIT, and he has a company called Einblick that you've invested in, Sarah. So a few different ways that we're connected and some really cool companies that you've invested in. So at this most recent R Conference, you did a talk on lessons learned from your 10 years of experience in the data science space. Do you want to give us a taste of the lessons that you've learned?

Sarah Catanzaro: 04:41
Yeah, absolutely. So in the past three to five years, I feel like data teams have come a long way in terms of adopting new tools, new workflows, et cetera, and they have made significant progress, but there were some lessons learned in the decade prior that haven't really been encoded in the [inaudible 00:05:08] data teams today. So for example, I see a lot of effort being put into preparing data that exists within the data lake or the data warehouse, so starting with the data that you have. Whereas I think 10 years ago, we started to recognize that sometimes the data that we need is not available. 

And so when starting a new project, we ought to think not just about the data that we have, but also about the data that we could collect and how we can collect it. Similarly, I think a lot of the tools and best practices that have been adopted really focused more on collaboration among data professionals rather than thinking about the interfaces between engineers and data scientists or even designers and data scientists. In my role in VC, I think often what I'm doing is thinking about what have we learned in the past, but not really implemented, but also what are we learning today that could breed problems in the future and how might we rethink those standards and workflows. 

Jon Krohn: 06:35
Cool. It sounds like an amazing position to be in to be changing the world. So data science, as we know, as probably a lot of our listeners are aware and probably why they're excited about this space, is we have the opportunity with data and automation to make a large number of enormous changes that iteratively transform the world over a many year span, especially a multi-decade span. And in a position like yours, you are providing the capital that allows great ideas to be turned into broadly applicable real world applications. So super cool job. So to go into some specifics about your firm and what the related terms mean, you work at what's called an early-stage venture capital firm. So let's break down for listeners that aren't already aware what venture capital is and how that's different from other kinds of investment that a company could obtain. And then what does it mean to be early-stage? 

Sarah Catanzaro: 07:40
Yeah, absolutely. So the way that I typically explain it is that, as venture capitalists, we exchange capital and expertise for ownership in companies. So what we typically do is invest capital, usually at early stages, at least $100,000 up to, let's say, $20 million. And what we buy is equity in startups who have plans to scale beyond an initial idea or even an initial product into a large-scale business, a large-scale enterprise that could be generating hundreds of millions of dollars in revenue and possibly IPO. By early stage what I mean is that we're not investing in those companies pre-IPO. We're investing in them really pre-product market fit. Now product market fit is such a nebulous concept, but really we're looking at companies where they're still answering questions about what are we building? For whom are we building? What problem are we solving? They're still iterating on the responses to those key questions. So they're not yet at the phase where you can make certain sales and marketing optimizations such that if you pour $1 in, you know that you're going to get $2 out. So that's typically what we mean by early stage. These are companies that may be pre-product, may be pre-revenue sometimes, but are definitely pre-product market fit.

Jon Krohn: 09:32
Got it. So the earliest stage of investment would be, I guess, angel investment or seed investment when companies definitely don't have product market fit and they might not even have a product at all. They just have an idea. They've maybe put a team together. So then they can get that angel investment. And that might be less than $100,000 or less than a million dollars in most cases. And then in a lot of circumstances, I guess, once they've proved some product elements, they have some working components, maybe, maybe not, then they come to a venture capitalist like you, and they raise 100,000 to probably a few million in early stage venture capital at that early stage. And then maybe once they definitely have their product worked out, but they haven't figured out the product market fit. They haven't figured out exactly to whom they're selling or all the possible different markets they could be selling into. That's your sweet spot. So once a firm has been working with you, they've maybe done a couple of rounds of raises. Can you distinguish for us this idea of series A versus series B and so on? 

Sarah Catanzaro: 10:52
Yeah, absolutely. So angel, pre-seed, seed, I would say that a lot of that can be fairly nebulous, but during those phases, I think the thing that is critical to have is a long-term vision, that vision of how you build this billion-dollar business and what the future with your product or platform looks like over the next 10 years, and then an MVP, a strong sense of what is the thing that you build first, as well as potentially some hypotheses about what is the next thing that you build after the MVP? What is the next set of ideas that you test? So you need to have an idea about what Z is. You need to have a pretty clear idea about what A is. And then a couple of hypotheses about what B might be. At the series A phase, you've typically seen some early validation of the MVP.

So perhaps you've had some users adopt the platform, maybe you've signed a couple of deals from those interactions. You're collecting additional information about what people want. You're also collecting additional information about perhaps the competitive landscape or the technical issues that you're going to need to address in order to get to the next step. And you have tested some of these hypotheses about what you should build next. So at series A, you're not quite at product market fit. You might still have questions about various personas. Maybe you know who your user is, but you don't have the clearest sense of who are the other stakeholders in a buying process, or how do you ensure that your tool is sticky and you can retain users over a longer time horizon? But you're off to the races, really. Again, a lot of this is pretty arbitrary. And I think at times the ways in which we define seed stage versus series A stage hinges on the macroeconomic climate, but generally speaking, I think about the seed as being really about testing the MVP, getting validation of the idea and collecting information so that you know how to act or proceed from there.

Jon Krohn: 13:52
Nice. This episode of SuperDataScience is brought to you by Pachyderm. Pachyderm enables data engineering teams to automate complex pipelines with sophisticated data transformations across any type of data. Their unique approach provides parallelized processing of multi-stage, language-agnostic pipelines with data versioning and data lineage tracking. Pachyderm delivers the ultimate CI/CD engine for data. Learn more at pachyderm.com. That's P-A-C-H-Y-D-E-R-M.com like the elephant. All right, now back to our show. So then, with Amplify investing at the early stage, what's then the desired outcome with the investments that you make? Where are you hoping that they end up? Do you get involved in later investment rounds? Or do you exit your investment or maybe a mix?

Sarah Catanzaro: 14:55
Yeah. Great question. So very tactically, we invest primarily at pre-seed, seed, and series A, where we will lead the round, which to simplify things generally means that we are writing the biggest check and potentially taking a board seat. However, we will participate in subsequent rounds.

Jon Krohn: 15:19
Got you. 

Sarah Catanzaro: 15:21
But typically, at series B and beyond, the company will bring in a new investment lead so that they're not just accruing dollars, they're also accruing expertise. 

Jon Krohn: 15:31
Got you. 

Sarah Catanzaro: 15:32
So at Amplify, we recently closed our fifth fund, as well as an opportunity fund that does allow us to invest in those subsequent stages. And that set of funds is $700 million. Before we, the people on the investment team as well as Amplify's operating team, see any return, we need to return $700 million to our LPs. And those are the people who invest in Amplify.

Jon Krohn: 16:03
The limited partners. 

Sarah Catanzaro: 16:04
Exactly. The limited partners, which are often pension funds, perhaps hedge funds, university endowments. Given that dynamic, we really need to swing for the fences. So we're not looking to sell portfolio companies to potential acquirers in the tens of millions, even hundreds of millions. We are really looking for those billion-dollar outcomes. I think the other thing that is easy to forget, particularly in a bull market, is that most startups don't survive. Most startups don't become billion-dollar companies, particularly when you're investing at the pre-seed stage when the company may not even have a product, let alone that product market validation. So given that dynamic, it makes it even more critical that we really focus on things that can get big. 
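
To make the "swing for the fences" arithmetic concrete, here is a minimal sketch of how large an exit would need to be for a single holding to return the fund on its own. The $700 million figure comes from the episode; the ownership stakes and the function names are hypothetical, chosen only for illustration, and the calculation ignores fees, carry, and dilution.

```python
def required_exit_value(fund_size: float, ownership_at_exit: float) -> float:
    """Exit valuation at which one holding alone returns the whole fund."""
    if not 0 < ownership_at_exit <= 1:
        raise ValueError("ownership_at_exit must be in (0, 1]")
    return fund_size / ownership_at_exit


def proceeds(exit_value: float, ownership_at_exit: float) -> float:
    """Cash returned to the fund from one exit (ignores fees and preferences)."""
    return exit_value * ownership_at_exit


FUND_SIZE = 700_000_000  # from the episode: the pair of funds totals $700M

# Hypothetical ownership stakes at exit (not figures from the episode).
for stake in (0.05, 0.10, 0.20):
    needed = required_exit_value(FUND_SIZE, stake)
    print(f"{stake:.0%} stake: an exit of ${needed / 1e9:.1f}B returns the fund")

# A hypothetical $100M acquisition with a 10% stake covers only a sliver of it.
share_of_fund = proceeds(100_000_000, 0.10) / FUND_SIZE
print(f"$100M exit at 10% ownership returns {share_of_fund:.1%} of the fund")
```

Under these assumed stakes, an exit in the tens or hundreds of millions barely moves the needle, which is consistent with the focus on billion-dollar outcomes described above.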

Jon Krohn: 17:05
Nice. That makes a huge amount of sense. You talked about having multiple funds there. So my understanding is that this is standard practice at a lot of investment firms, venture capital firms as well as private equity firms. The firm itself is often called a venture capital fund, but actually within the venture capital company there are typically multiple funds. So you talked about this $700 million fund being your fifth fund. So what does it mean to have these separate funds? Do you have to fill a certain number of investments for each fund? I guess there must be a target size. You might say, "Okay, we have $700 million. So we'd like to carve this up into 20 different investments of a range of sizes." Some of them will be $100,000, which aren't going to take up much. It would take 7,000 of those investments to use up the fund, so there's not going to be very many that small. So what's the process like? Is there a time span that you're trying to deploy that capital over, and then when are the LPs expecting a return on that 700 million?

Sarah Catanzaro: 18:23
Yeah, absolutely. So to be clear, that 700 million is a pair of funds. One is our early-stage fund. The other is the opportunity fund whereby we invest in our existing portfolio. 

Jon Krohn: 18:38
Got you. 

Sarah Catanzaro: 18:38
So generally, we have a 10-year life cycle for funds, which means that we expect to see returns over 10 years. However, on average, I would say we probably deploy a fund... Well, we make our initial investments over the course of two to three years. So in the first two to three years of that 10-year cycle, we make our bets. However, in the remaining years, we will make additional decisions about where we want to double down. And so we call that concentrating capital. Typically, early-stage investors and subsequent investors will have what we call a pro-rata right, which is the right to invest in subsequent rounds so that your ownership remains relatively constant. I'm simplifying things slightly. But, for example, if I owned 20% of the company at seed, then I would have the right to own approximately 20% of the company at the next round.

Jon Krohn: 19:57
Got you. 

Sarah Catanzaro: 19:57
Now, as the round sizes increase pretty significantly, usually the A is going to be bigger than the seed, the B will be bigger than the A, those pro rata checks end up getting pretty big. And so we have these select funds or opportunity funds as really a strategy to diversify our investments. Because if we were to write our pro rata checks from the flagship fund, then we could end up with a really high concentration of capital in a few companies, which is a very different approach to mitigating risk than you typically see with an early stage fund where you have more diversity.
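
The pro rata mechanics described here can be sketched in a few lines. This is a simplified model that follows Sarah's own simplification: to hold a stake steady, an investor buys the same fraction of each new round. The 20% stake is her example; the round sizes, valuations, and function names are hypothetical.

```python
def pro_rata_check(ownership: float, new_round_size: float) -> float:
    """Amount an investor must put into a round to keep `ownership` constant.

    Simplified: the investor buys `ownership` share of the newly issued stock.
    """
    return ownership * new_round_size


def diluted_ownership(ownership: float, new_round_size: float,
                      post_money_valuation: float) -> float:
    """Stake after the round if the investor does not participate at all."""
    new_share_fraction = new_round_size / post_money_valuation
    return ownership * (1 - new_share_fraction)


STAKE = 0.20  # Sarah's example: 20% owned at seed

# Hypothetical round sizes and post-money valuations, for illustration only.
rounds = [
    ("Series A", 15_000_000, 75_000_000),
    ("Series B", 40_000_000, 250_000_000),
    ("Series C", 100_000_000, 800_000_000),
]

for name, size, post_money in rounds:
    check = pro_rata_check(STAKE, size)
    skipped = diluted_ownership(STAKE, size, post_money)
    print(f"{name}: pro rata check ${check / 1e6:.0f}M; "
          f"stake falls to {skipped:.1%} if the round is skipped")
```

Because the check scales with round size, the later checks in this made-up sequence dwarf the original seed investment, which is the concentration problem the opportunity fund is meant to absorb.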

Jon Krohn: 20:50
Cool. All right, that was a great explanation. I definitely now have a better understanding of how venture capital works. So thank you. I'm sure many of our listeners have a better understanding now as well. And one last final question for you is related to this separate investment firm that we often hear associated with later stage companies or bigger companies, which is private equity. So how is private equity different from venture capital? 

Sarah Catanzaro: 21:16
Again, great question. So I think what people often conflate, and understandably because they're subsets, is private equity and growth equity. So many private equity firms are investing in private companies that are not startups and helping them streamline their operations such that they can sell those companies and capture the upside. Now, in the past couple of years, an increasing number of companies that would otherwise go public were staying private. And so you saw a lot of private equity companies also as well as hedge funds look at these growth stage startups and start making investments in those startups. So in many ways, growth equity which is really, again, oriented towards growth and private equity which is oriented towards realizing these operational efficiencies are two significantly different investment strategies. 

Jon Krohn: 22:32
Got you. So private equity generally is not making growth investments like venture capital is, although sometimes there are scenarios where private equity firms are doing late stage growth investment maybe in companies that a company like Amplify has already done the early stage investing. 

Sarah Catanzaro: 22:51
Yeah. That's exactly it. So some private equity firms will have a growth equity practice where they're focused on investing in these companies that are growing rapidly, still experiencing over 100% year-over-year growth. Whereas others focus primarily on companies where they see opportunities to improve their economics, improve some of their fundamentals by doing operational restructuring. 

Jon Krohn: 23:26
Got you. Makes perfect sense. That was a great explanation. So to sum up, it can actually end up being the case with the early stage venture capital that you're doing, in the words of our researcher Serg Masís, that folks can show up with an idea and leave your office with a briefcase full of money, but they have to have a great idea of what product A is going to be, a bunch of ideas as to what product B could be, and a big vision as to what product Z, or zed, might be: that billion-dollar company.

Sarah Catanzaro: 24:08
Yeah. I would make a few corrections. First and foremost- 

Jon Krohn: 24:13
You must have a lot of briefcases. 

Sarah Catanzaro: 24:16
Well, I was actually going to say post-pandemic we're not typically in the office, but yes, we don't typically do capital transfers over briefcase. We typically wire our investments. Have we invested in companies where it is basically just an idea? Yes. But again, I think it is an idea about the future. It is a clear set of product and technical requirements for the present as well as demonstrated market end-user research about why that imagined future and why that immediate product both solve an urgent problem, but unlock a better way of doing things over, again, a longer time horizon. 

Jon Krohn: 25:12
It's probably also the case. I mean, you can tell me if I'm wrong, but I would imagine if I was you, I might be more likely to make a pre-product investment if it was going to be working with a team that had demonstrated success in the past, so the individuals on the team are already successful entrepreneurs. They have some track record. 

Sarah Catanzaro: 25:34
Yeah. So actually, that's an interesting one because I would probably say that the majority of Amplify founders are in fact first time founders. 

Jon Krohn: 25:44
Oh, no kidding. 

Sarah Catanzaro: 25:46
Yeah. So we actually really focus on a persona I might call the practitioner-turned-founder. So we're looking at people who deeply understand the problem that they set out to solve.

Jon Krohn: 26:04
Yeah. That's definitely the case with Pieter Abbeel and Tim Kraska who've been on SuperDataScience. Both of them are- 

Sarah Catanzaro: 26:09
Exactly. 

Jon Krohn: 26:09
... deep technical experts who are now starting maybe a company for the first time. Cool. 

Sarah Catanzaro: 26:16
Yeah. I mean, I think those are both great examples too, where both of the founders are also technical. They don't come from a more traditional business background either, but they had done research to really understand the technical requirements associated with the solution to the problem that they were trying to solve. They had actually worked with collaborators in an academic setting to better understand their needs and to gain more visibility not only into the technical requirements, but into the user pains, as well as the potential opportunities, and started their respective companies based on, again, years and years of research.

Jon Krohn: 27:06
Cool. All right, so getting into more detail about exactly the kinds of companies that you invest in, Sarah, you focus on investing in early stage companies in the data tooling space, everything from databases to analytics engines and MLOps tools. So our listeners can probably get a great sense now of why you specifically were an amazing guest to have on a data science program to talk about venture capital, because that really is your area of expertise. So to get a sense of the scale of how the space is changing, there is a pretty popular diagram, a chart out of FirstMark Capital that shows a landscape of companies in the data tooling space, and we'll include it in the show notes. The 2012 version has 150 vendors on it. And last year's chart has over 2,000 companies. So you can barely see the names and logos on the infographic. So in a space like that that has exploded, it's about 10 times larger in terms of the number of companies over a 10-year span, how do you pick winners in an increasingly crowded space like that, Sarah?

Sarah Catanzaro: 28:21
Yeah, it's a great question. So I think about this through two lenses. One is how do we pick winners? The other is how do we pick winners in a space like that? So I'll start with how do we pick winners. The thing that I often tell people who are going into early stage VC or who are asking about how we make decisions is that it's people, people, people, product, market. As I had mentioned before when we were talking about Pieter Abbeel and Tim Kraska, we are often investing in deeply technical founders who really understand the domain in which they're building solutions, but we're also looking at people who can articulate a crisp vision of the future, who can do so in such a way that they will embolden and enamor potential customers, recruits, community in some senses. We also want to make sure that in addition to deeply understanding the technology, a potential founder deeply understands and empathizes with the user pain point.

So [inaudible 00:29:40] investing is often that you find founders with a technology that's searching for a problem rather than a technology that is either solving a problem or unlocking a clear opportunity. We do think a lot about the product strategy and how does that technology get encapsulated in a product that one can sell efficiently, that one can sell repeatedly without incurring too much services, implementation and other associated costs. And lastly, we think a lot about the market and whether we expect the markets that that company is selling into to expand, to contract or to become crowded. So let's talk about that. Clearly, the data ML market has become increasingly crowded. I think in that vein, what we're typically focusing on is this company solving an urgent problem? Is it a problem for which there is no adequate solution, but also does the founder have a sense of what are the adjacent use cases or workflows that they can expand into?

So you're not just solving that point problem, but you're also thinking about the adjacencies. And I think that's really important, not just from an investing perspective, but from the perspective of data science practitioners. Nobody wants to use thousands of tools. Nobody wants to use 10 tools to get their job done. I'm not super bullish on, quote-unquote, end-to-end platforms, one tool to rule them all. But if your options are somewhere between one and 10, three feels about right, four feels about right. So really identifying those problems and the things that are orthogonal or adjacent that you can cover I think becomes more critical to delivering a great experience to users. 

Jon Krohn: 32:02
Super cool. All right, so to summarize how you pick winners, it's people, people, people, product, market, and is there a direction to that over time? Or is it mostly just your point is that you're emphasizing people the most?

Sarah Catanzaro: 32:20
Yeah. So at the earliest stages, you don't have much to go off of when you're evaluating products and markets. Nobody can predict how markets will inflect over time. The company is going to iterate on their product. They're going to gain information by interfacing with users in the market, but that hasn't always happened yet, particularly, again, at the early stages. So really, the thing that you can bet on is the people. Are these great leaders? Do they have a clear sense of thinking about the alignment between technologies, products and problems? Over time, you gain information about the product, you gain information about the market. And so the balance shifts away a little bit from people, people, people, product, market to maybe people, product, market. But I think the thing that we see time and time again is that it really does take people to make a great company, and companies with great leaders that are able to recruit and retain great talent are often able to survive some of the hiccups in product or market that almost always arise.

Jon Krohn: 33:48
Right. Cool. That's great perspective. All right, so that gives us a sense of how to pick winners in general. And then do you have specific guidance for when it's a crowded fast-moving space like data tools? 

Sarah Catanzaro: 33:59
Yeah, I mean, that's where, again, I get into the focus on solving an urgent problem, but also an urgent problem with these adjacent workflows that you can expand into. 

Jon Krohn: 34:13
Nice. Perfect. 

Sarah Catanzaro: 34:16
The other thing actually that I would add there too is that I believe pretty strongly in a reversion to simplicity. So I think often when we have a new market, take data science and ML, the first set of tools that come to market, they're going to be a little bit clunky. They're going to be a little bit more complex. I'm sure many of your listeners have felt this when interfacing with tools day-to-day, but over time, the developer ergonomics should improve. Over time, the abstractions should become more manageable and simple. And you see the Hadoops turn into Snowflakes. You see Jupyter Notebooks turn into products like Hex. So perhaps I'm just an optimist, but I think over time markets go up and tools get better. 

Jon Krohn: 35:14
Very cool. So we are currently in an environment that is maybe not as favorable to startups or investment as say a year ago. So I don't have the exact numbers in front of me, but I know from seeing charts that 2021 was by far the biggest investment year ever in early stage startups. And that grew on already 2020 being I'm sure the second best year ever. And so there was a huge amount of capital. And to some extent, in 2022, with stock markets coming down, particularly at the time of recording, tech heavy stock markets like NASDAQ are down about 30% from the start of the year. And so this means that the limited partners in funds, the pension funds, the hedge funds that are investing in venture capital funds, they are more risk averse than they were a year ago. And so what can startups be doing differently in this current capital climate either to survive, if they already are a startup, or to raise capital if they're looking? 

Sarah Catanzaro: 36:33
Yeah, absolutely. So given the current economic climate, I think many founders are tempted to say, "We are going to slow down. We're going to press pause. And we're going to extend our runway (runway meaning the amount of cash they still have available) as far as possible." But often that is not a sane approach because you need to make progress so that when the economy rebounds, you haven't ceded market share to a competitor, ignored potential opportunities for product iteration that would enable you to expand into those adjacent categories, things like that. So instead, I think what we're really advising our portfolio companies to do is to understand clearly the ROI associated with their investments, with the bets that they're taking, and think critically about the milestones that they need to achieve to unlock a subsequent round of financing. Now, this all may sound obvious because more or less what I'm saying is that they need to be disciplined. But I actually think this is an area where data and data science has a very clear role. We're not in a phase in history where you could ignore your LTV to CAC ratio. We're not- 

Jon Krohn: 38:16
What does that mean? 

Sarah Catanzaro: 38:19
The lifetime value of your customer compared to the cost of acquiring that customer.
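
As a rough illustration of the two metrics Sarah mentions here, runway and the LTV to CAC ratio, the Python sketch below computes both from made-up numbers. The function names, the inputs, and the simple LTV formula (monthly revenue per customer times gross margin, divided by monthly churn) are assumptions for illustration, not something described in the episode.

```python
def runway_months(cash_on_hand: float, monthly_net_burn: float) -> float:
    """Months of cash left at the current net burn rate."""
    if monthly_net_burn <= 0:
        # Not burning cash, so runway is effectively unbounded.
        return float("inf")
    return cash_on_hand / monthly_net_burn


def ltv_to_cac(
    monthly_revenue_per_customer: float,
    gross_margin: float,
    monthly_churn_rate: float,
    acquisition_cost: float,
) -> float:
    """LTV:CAC using a simplified, assumed LTV formula.

    LTV is approximated as margin-adjusted monthly revenue multiplied by
    the expected customer lifetime in months (1 / monthly churn).
    """
    if not 0 < monthly_churn_rate <= 1:
        raise ValueError("monthly_churn_rate must be in (0, 1]")
    if acquisition_cost <= 0:
        raise ValueError("acquisition_cost must be positive")
    expected_lifetime_months = 1 / monthly_churn_rate
    ltv = monthly_revenue_per_customer * gross_margin * expected_lifetime_months
    return ltv / acquisition_cost


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
months = runway_months(cash_on_hand=1_200_000, monthly_net_burn=100_000)
ratio = ltv_to_cac(
    monthly_revenue_per_customer=500,
    gross_margin=0.8,
    monthly_churn_rate=0.05,
    acquisition_cost=2_000,
)
print(f"Runway: {months:.1f} months")
print(f"LTV:CAC: {ratio:.1f}")
```

With these inputs the startup has twelve months of cash, and each customer is worth about four times what it cost to acquire them.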

Jon Krohn: 38:26
Got you. 

Sarah Catanzaro: 38:27
We're not at a point of time in history where you can just ship anything. You need to run experiments. You need to understand how those product bets are impacting your strategic KPIs and metrics. So we are telling our portfolio companies: think about what you need to achieve, exercise discipline, but also measure what you're doing and its impact. For those that are thinking about raising capital, well, of course, the first strategy that I alluded to was extend your runway, wait a little bit, because there are a lot of investors that are sitting on their hands right now. But for those who are thinking about raising for the first time, I think what I've seen is that the bar has shifted a bit higher. So it may no longer be enough to come in with that vision and the MVP. Now investors might want to see that you have lined up a set of potential design partners, or you have built a prototype that you put on Hacker News that attracted a lot of attention and adoption. So I'm not saying in order to raise seed funding, you need to have a million dollars in revenue or ARR (annual recurring revenue), but you probably need a bit more evidence that your idea is going to land. 

Jon Krohn: 40:01
That is super helpful advice, Sarah. And so it seems clear that part of your role as a venture capitalist is not just to provide the capital, but also to provide guidance to founders and to help them succeed in commercializing their products. So if we have listeners out there who have a startup idea, or maybe even an early stage prototype, but they don't know how to get venture capital investment, you've already given us some insight. Particularly in this climate, we need to be demonstrating the value, or maybe even having some recurring revenue already for a product. What else does it take? What kind of roadmap should a listener with a startup idea put together to go from idea to funding? 

Sarah Catanzaro: 40:52
Yeah, absolutely. So I think one of the most important things for startups is to learn fast so that they can iterate fast. So often that first phase involves speaking with hundreds, at least dozens of potential users and getting feedback on the idea. Now, the closer that you can get to an application, the more precise the feedback will be. It's hard to give high fidelity feedback in response to a verbal articulation of an idea, slightly easier if there's a deck, even better if there's a prototype, even better if there is an application to actually demo or put in the hands of potential users. The other thing that we've seen, too, is that you can sometimes test your ideas not only through user and market research, through these conversations, but also through content. Can you write a blog post where you articulate some of the key assumptions that you're making about the product or some of the pillars of its value proposition? If you do that, how do people react? So a lot of what we do in the early stages is information gathering.

The next set of things that we really think deeply about, too, is who do we need to hire in order to manifest this idea? We primarily, as I said before, invest in technical founders. And so they can typically build the prototype themselves, but in some cases, they're going to have to hire their first engineer. What does it take to convince somebody to leave their job at a FAANG company or a growth stage startup and join a company that has no product or no revenue? We can provide them with guidance on how to think about a hiring pitch, how to think about a hiring roadmap. So often we are really focused on the hiring strategy, on the product strategy, and maybe thinking about some of the experiments that we want to run to see the right way to potentially sell this. 

Jon Krohn: 43:28
Nice. So awesome. That's really actionable guidance. Thank you, Sarah. So on that idea of experimentation and getting information, this might not be exactly the same kind of experimentation, but in a recent interview and presentation, and we'll provide a link in the show notes. The presentation was actually at the R Conference last year, the 2021 R Conference. You mentioned that there's a lot of opportunity for more advanced experimentation, including observational causal inference. So what is observational causal inference, and why can the experimentation unleashed by it create an explosion in business opportunities? 

Sarah Catanzaro: 44:13
Yeah, absolutely. As I had alluded to before, I think that one of the most critical aspects of experimentation is that it allows you to clearly understand the ROI associated with your product bets, and as such, particularly in a downturn, but certainly in any market, I think experimentation helps you really differentiate the good ideas and the ideas that will move the needle for your company from those that may not be worthwhile to implement, or may even have a negative toll on the business. The gold standard in experimentation is the randomized controlled trial. However, not all companies will have sufficient traffic to run an experiment. Perhaps they don't have time to run a rigorous experiment, or there's some other issue that would inhibit them from running an experiment. And so they need to think more about leveraging the data that they do have available. Now, in this case, what you're working with is observational data. It's not data that is collected in the course of an RCT.

And so over time, companies can think more about leveraging that data set while still not just making predictions, but actually understanding the causal impact of the potential changes, so really understanding things like their mechanism of action. So that's generally what I mean by observational causal inference. I think one of the things that's super interesting to me is that observational causal inference, arguably, is not even a tool for early stage companies. You still do need a lot of data, but it's becoming increasingly adopted by later stage companies where they have a lot of ideas that they want to test. They can't test all of them. And in some cases, they actually want to figure out like, "Which of these ideas should we prioritize for experimentation?" 
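
To make the contrast between a naive comparison and a causal estimate from observational data more concrete, here is a minimal Python sketch on simulated data. Everything in it is an assumption for illustration: the hypothetical "power user" confounder, the adoption probabilities, the revenue model, and the choice of stratification (one of several adjustment techniques) are not from the episode. Power users both adopt a feature more often and spend more, so simply comparing adopters to non-adopters overstates the feature's effect, while comparing within each user segment and averaging recovers something close to the built-in lift.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # hypothetical revenue lift built into the simulated data


def simulate_users(n: int, seed: int = 0):
    """Simulate observational data where a confounder drives adoption and revenue."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        power_user = rng.random() < 0.5
        # Power users are far more likely to adopt the feature (no randomization).
        adopted = rng.random() < (0.8 if power_user else 0.2)
        revenue = 10.0 + 5.0 * power_user + TRUE_EFFECT * adopted + rng.gauss(0, 1)
        rows.append((power_user, adopted, revenue))
    return rows


def naive_difference(rows) -> float:
    """Mean revenue of adopters minus non-adopters, ignoring the confounder."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def stratified_difference(rows) -> float:
    """Compare adopters and non-adopters within each segment, then
    weight each segment's difference by its share of all users."""
    total = len(rows)
    estimate = 0.0
    for segment in (False, True):
        group = [(t, y) for c, t, y in rows if c == segment]
        treated = [y for t, y in group if t]
        control = [y for t, y in group if not t]
        estimate += (len(group) / total) * (mean(treated) - mean(control))
    return estimate


rows = simulate_users(20_000)
print(f"True effect:         {TRUE_EFFECT:.2f}")
print(f"Naive difference:    {naive_difference(rows):.2f}")
print(f"Stratified estimate: {stratified_difference(rows):.2f}")
```

By construction, the naive difference should land near 5 and the stratified estimate near 2. The adjustment only works here because the confounder is known and recorded; with real observational data, unmeasured confounders can still bias the estimate.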

Jon Krohn: 46:42
Cool. I think I got it. So in an ideal world, the gold standard is to run a randomized controlled trial, but very often even big companies don't have enough user traffic or time to be running proper randomized controlled trials. So particularly in early stage companies, doing this observational causal inference that you're describing, making use of data to try to make predictions as quantitatively as possible in the absence of proper experimentation can be enormously useful, especially for prioritizing product ideas. 

Sarah Catanzaro: 47:19
Yeah, exactly. 

Jon Krohn: 47:21
Cool. All right, so you've made working at venture capital sound pretty interesting. It sounds like a great job. I'm sure there's lots of listeners out there that are jealous about what you get to do every week in your role. So how could a listener potentially get involved in venture capital? It's a famously difficult industry to break into. 

Sarah Catanzaro: 47:45
Yeah. So one of the things that I've realized about venture over time is that it is highly network-driven. Many of the companies that we end up investing in are introduced to us by data scientists or professors or other technologists within our network. And often therefore the best way to intersect with a venture capital firm is through these warm connections. So if you are an employee at a venture-backed startup, maybe ask the founder like, "Hey, would you mind connecting me to our investors?" or reach out to those investors. You're only one hop removed. Or if you have a founder friend, you can ask to connect through them. What we often see, too, is that one of the clearer pathways into venture is through either angel investing or advising. So we end up intersecting with potential team members through our own portfolio companies. We get experience working with someone who is advising or writing small checks into companies. And from there solidify the relationship. 

Jon Krohn: 49:10
Makes perfect sense. Yes. So highly network-driven industry, but through getting introductions or making small angel investments, advising on smaller deals, you can get exposure to venture capital firms, and that can result in potentially becoming an employee at venture capital firm yourself. Cool. So are there any particularly exciting investments at Amplify that you think our listeners should be aware of? What's cool that's going on over there? 

Sarah Catanzaro: 49:42
Yeah. Great question. Well, I just talked about experimentation, and so one of our recent investments was in a company called Eppo, which is making experimentation accessible to companies of all shapes and sizes. I think in the past a lot of experimentation tools were either really focused on marketing or web design changes. They didn't enable you to test things like pricing algorithms or other backend changes. So Eppo is uniquely suited for that. Additionally, there were a lot of experimentation tools that were really focused more on feature flagging or assignment, so determining if somebody should be in the test or control arm, but data scientists had to spend a lot of time really babysitting experiments and analyzing experiments.

So they have both a statistical engine as well as an experiment analysis UI that really streamlines the process of experimentation and minimizes the need for data scientists to spend all of their time just babysitting experiments. Another recent investment that we made is in a company called Modal. I won't reveal too much of what they're doing because the company is still in stealth. But one of my biggest gripes about the data science ecosystem is that data scientists need to spend so much time on work that is not in fact data science, that is neither data nor science. I'm talking about managing cloud resources, thinking about environments and config. And so Modal is really focused on building a whole set of tools that allow data scientists to focus on the data science: to write code, and it just runs. So it's another one that I'm super excited about. 

Jon Krohn: 51:57
Very cool. Those sound like great companies, Sarah, and I'm not surprised given your depth of knowledge in the field that you're making such wonderful investments. As we begin to wrap up the program, a question that I ask all of our listeners is if they have a book recommendation for us. You got one, Sarah? 

Sarah Catanzaro: 52:18
Oh gosh. Sure. Rather than give you a more traditional answer, one of the books that I recently read that I just loved was Boys in the Boat. When people ask now what is my favorite management book, I've been recommending this one. It's actually about one of the US rowing teams who won the Olympics in Berlin, I believe, leading up to World War II, but just thinking about what it takes to get a team to work together productively. It gives you insights into that in a really unique way. So it's one I've really enjoyed. 

Jon Krohn: 53:07
Super cool. Thank you for that recommendation, Boys in the Boat. And obviously, Sarah, you have an enormous amount of information on venture capital, particularly for the data science industry to imbue upon all of us. What are the best ways that listeners can be following you after this episode ends? 

Sarah Catanzaro: 53:29
Yeah, I wish I blogged more. I do so occasionally through Amplify, but I'd say the best way to kind of follow what I'm thinking about is probably Twitter. My handle is just @sarahcat21. Or through my newsletter, Projects to Know, where I'm typically highlighting open source projects, academic papers and other industry content that I think can impact data practitioners in their day-to-day, not just content that is clickbait-worthy. 

Jon Krohn: 54:02
Nice. Those sound like great recommendations, your blog, Twitter or newsletter. We'll be sure to include those all in the show notes. Sarah, thank you so much for being on the program and enlightening me and listeners on venture capital investment in the data science sector specifically. It's been awesome having you on the show, and maybe some point in the future we can check in with you again. 

Sarah Catanzaro: 54:23
Yeah. Of course. It was a pleasure being on the show. Thanks so much for having me. 

Jon Krohn: 54:33
Sarah is so much fun to speak with and so deeply knowledgeable about both the fields of data science and venture capital. I feel energized from our conversation and do hope we'll have her on the program again sometime soon. In today's episode, Sarah filled us in on the differences between angel, venture capital, and private equity investment, how early stage investment might be made prior to a firm having product market fit, perhaps even before the firm has a minimum viable product. How the trick to picking winners in early-stage investments is people, people, people, product and market. How the trick to accelerating from a data science idea to obtaining funding is iterating quickly and speaking to dozens of prospective customers, and how observational causal inference can be a solid substitute for randomized controlled trials, especially in situations where you have limited user traffic or time.

As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Sarah's social media profiles, as well as my own social media profiles at superdatascience.com/601. That's superdatascience.com/601. If you enjoyed this episode, I'd greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of the show.

Thanks to my colleagues at Nebula for supporting me while I create content like the SuperDataScience episode for you. And thanks, of course, to Ivana Zibert, Mario Pombo, Serg Masís, Sylvia Ogweng and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing, and producing another fabulous episode for us today. Keep on rocking it out there folks, and I'm looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 
