SDS 499: Data Meshes and Data Reliability
Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn
We get into some interesting topics, including Monte Carlo's work in data reliability, what data reliability is and how it enables high-quality data pipelines, how reliable data can allow for a data mesh, how to build a data science team, and more!
About Barr Moses
Barr Moses is CEO & Co-Founder of Monte Carlo, a data reliability company backed by Accel, GGV, Redpoint, and other top Silicon Valley investors. Previously, she was VP Customer Operations at customer success company Gainsight, where she helped scale the company 10x in revenue and among other functions, built the data/analytics team. Prior to that, she was a management consultant at Bain & Company and a research assistant at the Statistics Department at Stanford. She also served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a B.Sc. in Mathematical and Computational Science.
Overview
I've been aware of Barr's work for a while through our mutual LinkedIn circles, but recently I saw Barr in a post about data voices, found many of her talks and podcast appearances, and now here we are. A topic that came up a lot in my research on her was something called a data mesh. What is that? Barr explains that only recently have companies actually become truly data-driven in their decision-making. We have thousands of available third-party data sources, more complex pipelines, and an increase in careers centered around working directly with data. All of this creates a need to define exactly what it means to work with data. Enter the data mesh, a concept borrowed from software engineering that has moved into the data field. It focuses on more decentralized management and use of data: pairing centralized standards with domain-specific ownership allows teams to move faster in their work.
We then discussed the different departments within a company that might need to utilize data outside of the centralized data team. Operations, marketing, and sales may all need to pull data, interact with it, and make predictions based on it without going through the data team. This allows different groups to own specific parts of the data, but for this to work, they need to be able to answer fundamental questions about the data they're working with; they need to be able to self-serve on their data. Data reliability is analogous to website uptime (i.e., you want a website running all the time): data pipelines need to be up all the time to ensure decision-makers always have access. This is what Monte Carlo does: it helps companies ensure data uptime.
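To make the uptime analogy concrete, here is a minimal sketch of a freshness check that flags "data downtime" the way a monitor flags a site outage. The table and its one-hour expectation are invented for illustration; Monte Carlo's own product is, of course, far more sophisticated than this:

```python
from datetime import datetime, timedelta, timezone

# Assumed expectation for this hypothetical table: new data at least hourly.
FRESHNESS_SLA = timedelta(hours=1)

def is_fresh(last_loaded_at, sla=FRESHNESS_SLA):
    """Return True if the table is 'up' (fresh); False means data downtime."""
    return datetime.now(timezone.utc) - last_loaded_at <= sla

# In practice, last_loaded_at would come from warehouse metadata
# (e.g., a MAX(loaded_at) query); here it is faked for illustration.
last_loaded_at = datetime.now(timezone.utc) - timedelta(minutes=90)
if not is_fresh(last_loaded_at):
    print("ALERT: table is stale; notify its owner before decisions go wrong")
```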
The standardization of cloud solutions for data storage and management has allowed data teams to become faster and better. So, how does Monte Carlo ensure the data running in those systems is accurate? You look at the system's outputs to understand its health; they call this observability. What are indications of system health? One failure mode they watch for is good pipelines but bad data. They spoke with hundreds of data organizations about their methods for solving this and distilled five core pillars of data observability: freshness, volume, schema, distribution, and lineage. Achieving this requires rethinking our methodology and philosophy. Barr suggests we stop the finger-pointing down the line between teams and focus instead on service level agreements and service level objectives (SLAs and SLOs), which are contracts that allow teams to set expectations rather than punt blame. Other new concepts here are data as a product and the data product manager.
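As a rough illustration of how two of those pillars, volume and distribution, might be monitored in practice, here is a hedged sketch with fabricated row counts and thresholds (Monte Carlo's actual detectors are more sophisticated than fixed rules like these):

```python
import statistics

# A fabricated history of daily row counts for a hypothetical table.
row_count_history = [19850, 20120, 19990, 20480, 20210]
todays_row_count = 412

def volume_anomaly(history, today, n_sigmas=3.0):
    """Volume pillar: flag today's load if it falls far outside recent history."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return abs(today - mu) > n_sigmas * sigma

def null_rate_anomaly(null_fraction, threshold=0.05):
    """Distribution pillar: flag a jump in the share of NULLs in a key field."""
    return null_fraction > threshold

print(volume_anomaly(row_count_history, todays_row_count))  # True: volume collapsed
print(null_rate_anomaly(0.31))  # True: the field went mostly NULL
```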
From there we dove into our audience Q&A:
- How has Barr's intelligence background influenced her career? Barr was born and raised in Israel, where she spent time in her father's lab and eventually went on to do intelligence work for the IDF. She was the commander of a data unit in the air force and feels she developed a sense of responsibility and an understanding of data's consequences very early. In her early days of work, Barr aimed for "zero defect."
- What led Barr to found her own company? Barr's early work in customer success showed her the importance of making customer success quantitative and how to use data to get a view on customer success. In this work, she saw flaws in data collection and predictions, and she set out to solve this pain point through her own company.
- Where does the name Monte Carlo come from? The name comes partially from the compressed timeline of starting the company: Barr found the name in her stats book and went with it.
- How do you get people to pay for a service like this? Barr notes there's a difference between someone calling an idea great and getting them to pay for a product. To figure out whether people cared about the problem and would use the solution, she cold-called professionals to gather information.
- What is the typical workday like at a company like Monte Carlo? Barr says no two days are alike at this stage; the company changes constantly thanks to intense growth. Barr talks a lot with her team about rewriting their job descriptions based on company needs. Core to the company are adaptability, valuing the speed of change, and customer impact.
- How does Barr tackle biases in data? Barr thinks this is one of the most important problems to solve in data, and she has been researching it from a philosophical standpoint to apply to her work.
In this episode you will learn:
- Data meshes [4:25]
- Self-serve data reliability [15:36]
- How Monte Carlo helps with data uptime [21:13]
- How to build an effective data science team [26:50]
- LinkedIn Q&A [31:50]
Items mentioned in this podcast:
- DataScienceGo Connect
- Noise by Daniel Kahneman
- Thinking, Fast and Slow by Daniel Kahneman
Follow Barr:
Follow Jon:
Episode Transcript
Jon Krohn: 00:00
This is episode number 499 with Barr Moses, co-founder and CEO of Monte Carlo.
Jon Krohn: 00:12
Welcome to the SuperDataScience Podcast. My name is Jon Krohn, chief data scientist and bestselling author on deep learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let's make the complex simple.
Jon Krohn: 00:43
Welcome back to the SuperDataScience Podcast. Barr Moses is with us on the show today, and we're very lucky to have her. Barr is co-founder and CEO of Monte Carlo, a venture capital-backed start-up that has grown in head count by a remarkable 10x over the past couple of years. Monte Carlo specializes in data reliability, making sure that the data pipelines used for decision-making or production models are available 24/7 and that the data are high quality. Over the course of the episode, Barr will detail what data reliability is, including how we can monitor for the good pipelines, bad data problem, and how reliable data enables the creation of a data mesh that empowers data-driven decision-makers across all of the departments of a company to independently create and analyze data. She also talks about how to build a data science team and how to get a data-focused start-up off the ground, generating revenue and rapidly scaling up.
Jon Krohn: 01:42
Today's episode should be of interest to technical folks like data scientists and commercially oriented folks alike. While Barr is a highly technical expert, we put in extra effort to make sure we broke down software engineering concepts so they could be understood by anyone. And this episode does nevertheless contain a number of gems for business success with data. Okay. You all set for another awesome episode? Let's go.
Jon Krohn: 02:11
Barr Moses, welcome to the SuperDataScience show. I'm delighted to have you here. Where in the world are you calling in from?
Barr Moses: 02:19
Thanks, Jon. It's awesome to be here. I am in San Francisco, California. It's sunny today.
Jon Krohn: 02:25
Yeah. Is that true? We can see in the YouTube version that it's sunny outside, so you've got to be somewhere else.
Barr Moses: 02:31
That's true. That's true. All fake news.
Jon Krohn: 02:35
It looks like a beautiful day in San Francisco, I guess, an unusual day where you might not need a sweater.
Barr Moses: 02:41
That's right. Very unusual for San Francisco, but we'll take what we can.
Jon Krohn: 02:45
Nice. I'm absolutely delighted to have you on the show. I had been kind of creeping your internet profile for a while. You showed up on my radar a number of different ways. I think we've been in each other's LinkedIn network for a while, so I'd seen you post things or comment on things. And then, a few months ago, Kate Strachnyi created this LinkedIn post of data voices, and there were a couple of dozen people on there and you were one of them. And I was like, "Oh! Barr Moses again." And so, then I went and looked at your LinkedIn profile. I was like, "Fascinating. I wonder if she's a good speaker." So then I looked up some talks that you'd given, some podcast appearances that you'd had. And you were a 10 out of 10, just an absolutely amazing communicator of technical content. It's just so interesting to hear you speak. So, yeah. So, you were on my radar.
Jon Krohn: 03:45
And then, in one of those serendipitous moments, I don't know what I was doing, I was tweeting about the SuperDataScience show or something, and Scott Hirleman of the Data Mesh Learning network commented on one of my tweets, I guess, tweeted at me and said, "You should think about having Barr Moses on your show." And I immediately sent you a message on Twitter and asked if you'd like to be on. And you said, "Yes." And now here you are.
Barr Moses: 04:11
Amazing serendipity and yeah, I'm indebted to both Kate and Scott for being amazing voices in data as well.
Jon Krohn: 04:21
Nice. So, Scott is from the Data Mesh Learning network, as I just mentioned. What the heck, Barr, is a data mesh?
Barr Moses: 04:30
Great question. The million-dollar question these days. So let me take a step back and describe a little bit of what I think is actually causing data mesh to be such a hot topic these days.
Jon Krohn: 04:42
Cool!
Barr Moses: 04:43
I think obviously the world has changed tremendously in the last year and a half or so, but in the data space, it's really changed a lot in the last five to 10 years. Five to 10 years ago, we really liked to say that we were data-driven, but it sort of stopped at that. We didn't really go beyond claiming it. I think today when people and companies claim that, it's a little bit more true. It's actually said in earnest, and there's actually more behind it to solidify it. But really, if you think about it, five or 10 years ago, very few people were actually doing data science. Very few people in companies were actually using data to make decisions or to power products. It was really a world where, honestly, a lot of data was confined, sometimes to finance, sometimes to IT. It was a very small handful of people who were actually using data to make decisions. And they weren't using that data often. It might be only once a quarter, to report to the street, and it was based on a limited set of data sources.
Barr Moses: 05:42
So, it was obviously [inaudible 00:05:44] was important, but it didn't really have the spotlight. Now fast-forward to today, and the world we live in is insanely different from where we were five to 10 years ago. It's mind-blowing to me. You cannot compare what it's like. First of all, starting with an insane amount of data sources. It's not uncommon for companies to use thousands of third-party data sources. In the past, maybe you were just relying on Google and maybe Facebook, and that's about it. Today, there are so many data sources that companies rely on. There's also a lot more complexity in terms of the pipelines that we're building and the architecture. In the past, we had one database and we pulled all the data into that, and that's it. Today, you can have several data warehouses, a data lake, ETL, ELT, reverse ETL, BI, machine learning. Name it and you have it. So data is really spread everywhere and has become distributed.
Barr Moses: 06:38
And then maybe a third big thing that I'm seeing happening is the rise of many more people working with data. In the past, you really only had maybe a financial analyst or a data entry person, maybe only one or two, a handful of roles that were dealing with data. But today you have machine learning engineers and data scientists and data engineers and analytics engineers and analysts and executives that are using data. Really anyone in the organization is actually either a producer or a consumer of data, or both, which is a tremendous change from where we've been. And so with these three main themes really being the driving force behind everything that's happened in the last decade or so, there's a new need at the center of all that: a need to define how we actually work with data in the organization. What do all these changes mean for us? Whatever worked before is not going to cut it anymore. We need to rethink. And so, I think that's really what's brought the data mesh to be front and center.
Barr Moses: 07:47
Yeah. So, to explain the data mesh: the data mesh originated from a concept in software engineering that's been adopted in the data space. In software engineering, there's been a movement, again, over the last couple of decades or so, where we've moved from monolith to microservices architecture. And with that, there are a lot of technology and process and people changes that have been made in order to support that.
Jon Krohn: 08:11
What does that even mean? What's the difference between a monolith and a microservices architecture?
Barr Moses: 08:16
Great question. So, fundamentally it's the difference between managing something in a very centralized fashion versus a decentralized fashion. And so, I'll give you a very concrete example when it comes to data. Oftentimes what we see in organizations that are struggling with a centralized model is where you have a group of people, maybe they'll be called the data platform or the infrastructure team, or whatever the name is, and all requests related to data come into that particular team. And so, if you want a new report, if you want to use data, if you have a question about which customers are more important than others, which feature has been most successful to date, which customers are about to churn? Really any question that you have about the data and about your business, you have to go through the centralized team. And so, think about this team. They have this insane backlog of requests and questions. And they might be like, "Well, see you in a year from now when we actually have time to get to your request," like, "Get in line, take a number." And that really doesn't allow for an effective use of data. There's this gatekeeper. And that's when we see companies really ask themselves, "Okay. There has to be a better way."
Barr Moses: 09:28
And so, what the data mesh principle does is create a centralized best practice, a centralized organization that can define standards and norms for what strong data management actually means, but then allow for distributed ownership of data in domain-specific organizations. So, allowing marketing, specifically, to own the data related to marketing and the questions they have about the particular campaigns they're running. Those are very different from the questions that the product team may be asking or that the finance team may be asking. And so the idea of pairing those centralized standards with the ability to self-serve and with domain-specific ownership actually unblocks the congestion that we had and allows teams to move faster.
Jon Krohn: 10:30
Nice. You may already have heard of DataScienceGO, which is the conference run in California by SuperDataScience. And you may also have heard of DataScienceGO Virtual, the online conference we run several times per year. In order to help the SuperDataScience community stay connected throughout the year from wherever you happen to be on this wacky giant rock called planet Earth, we've now started running these virtual events every single month. You can find them at datasciencego.com/connect. They're absolutely free. You can sign up at any time. And then once a month, we run an event where you will get to hear from a speaker, engage in a panel discussion or an industry expert Q&A session. And critically, there are also speed networking sessions where you can meet like-minded data scientists from around the globe. This is a great way to stay up to date with industry trends, hear the latest from amazing speakers, meet peers, exchange details, and stay in touch with the community. So, once again, these events run monthly. You can sign up at datasciencego.com/connect. I'd love to connect with you there.
Jon Krohn: 11:44
I think I now understand what data meshes are. So, historically we've had these monolithic architectures in software development, including in data-related processes, where you have this central repository of data, maybe a bunch of SQL databases or something, that only the data science team or the data analytics team really has access to for querying. But we're moving more and more to this distributed scenario where, as you've mentioned, lots of people across organizations need access to data in real time and need to be able to do at least some simple data pulls, but maybe even some complex modeling. You could have people on your marketing team or on your HR team who specialize in getting these data out and actually doing some sophisticated modeling. It doesn't need to happen on some central data science team. So, with a data mesh, it sounds like you still want to have, across the company, not a central team or a central dependency, but a centrally defined standard. Right. Okay. But then anybody in the organization can plug in and can, I guess, create data, store data, and access data under this data standard.
Barr Moses: 13:20
Yeah. I think the idea is that right now it's a little bit of a wild west. And so, to create some sort of ... I think what we're seeing is definitely the creation of these standards and norms. And that's typically federated by a centralized group. And then, there are domain-specific owners for each of those areas that you mentioned. And to your point, you're absolutely right. You could actually have pretty sophisticated models in other areas such as marketing and HR that are not in the standard centralized data organization. Now, you might ask yourself, "What are some of the challenges with implementing something like this?" Probably the biggest challenge that we actually see is a change management challenge. So, figuring out how do you structure the team, who should be on which team, who should own what, what is mandated, if you will, as a standard and what is not, what are some of the technologies that you use in a central fashion and what do you actually enable for the domain-specific owners?
Barr Moses: 14:31
That's probably the biggest question or the biggest challenge that we see folks run into. And the first friction that comes up is that in this model, you are actually allowing different groups of people to own specific parts of the data, but in doing so, you also need to make sure that they can answer some fundamental questions about the data. For example, which data should I be using, which table is the right table to use, when was this table actually updated, is it fresh, can I actually use this, what does this data mean, who's using this data, has this data been used in the past or not? And so actually giving folks access to data, but not letting them self-serve on those questions, poses a really big challenge. And so, that's how we think about self-serve data reliability to enable the implementation of the data mesh.
Jon Krohn: 15:36
Okay. Okay. So you just used a term that I really want to dig into there. You said, "Self-serve data reliability." So, it's clear that that is a critical concept to having a data mesh work. Can you tell us a bit more about that?
Barr Moses: 15:51
Yeah. I just threw a lot of buzzwords at you, so let me unpack that a little bit. Data reliability is a concept that, again, draws on a concept from software engineering. So, in software engineering and DevOps in particular, there's a concept of making sure that your applications and your infrastructure are reliable, which today seems like a no brainer, right?
Jon Krohn: 16:14
Uptime.
Barr Moses: 16:15
Of course. Exactly. Something that's super obvious to us. But 10 years ago, not as much. Today, it does seem a little bit crazy to have a software engineering team that doesn't think about reliability, that doesn't have something like Datadog or Grafana or really any solution that actually monitors the reliability and uptime of infrastructure and applications, right?
Jon Krohn: 16:42
Mm-hmm (affirmative). Totally.
Barr Moses: 16:44
And that has really emerged as a very important part of our ability to deliver software solutions or products that are scalable and reliable and secure.
Jon Krohn: 17:00
Got it. So, just to say something back to you to make sure that I understand this. Data reliability is basically analogous to software service uptime. So, in the same way that if you have a website, if you run facebook.com, you need to have Facebook up all of the time, and so you have all these tools in place, like the Datadog you mentioned, to make sure that you have redundancy, and if one thing goes down, something else can come up and your site is still up. So, it sounds like data reliability is, I imagine, this idea that if you have this distributed data organization and everybody needs access to data, they're making decisions based on data, those data pipelines need to be up all the time. Is that the idea?
Barr Moses: 17:49
Yeah. That's exactly right. The stakes are higher now. You cannot have data downtime anymore. In the past, when fewer people were using data, there was less data going around and it wasn't as mission critical. Products weren't really relying on it. In that world, it was fine if the data was down and no one noticed, and it was fine that it was wrong every once in a while. And on top of that, you also-
Jon Krohn: 18:17
Right. The data needs to be right, too. Yeah.
Barr Moses: 18:18
Exactly. So, data has to be right. And I'll add another complication to that. In the past, it was relatively easy to manually make sure that it was right. I could just stare at the numbers for a long enough time and look at where the data came from, and I could tell you whether it's accurate or not. So, if I'm looking at our revenue numbers or the number of visitors on our website, I should have a pretty strong understanding of that, because I'm only looking at a small handful of metrics and I'm looking at them regularly. And I report on them once a quarter. So, I have a lot of time to think about whether the number is right.
Jon Krohn: 18:57
Mm-hmm (affirmative). "This one cell in this spreadsheet is wrong."
Barr Moses: 19:03
Exactly. You know when you get a model from someone, you're like, "I don't understand any of your model or these formulas. I'm going to go one by one and check it." Maybe you could do that manually once. Terrible, right?
Jon Krohn: 19:16
Yeah. Like you're saying, there's too many data sources and some of them are coming from third-party vendors. Some of them are coming from other people in the organization. And so, in a modern organization, increasingly you can't possibly keep an eye on every single data source manually. So yeah, I totally see the need here.
Barr Moses: 19:37
It's sort of mind-blowing that we've come such a long way. And so, yeah, the question is, to use the analogy and call this data downtime, how do you minimize data downtime in an organization in order to increase trust in the data? We actually see organizations start measuring things like time to detection of data incidents. How quickly did you identify that your data was wrong? Oftentimes, we see companies go from months to minutes in time to detection, which is huge, when that data is actually ... I'll give an example. Let's say it's a data set that your marketing team is using to run a campaign on Facebook. Let's just use that example.
Barr Moses: 20:24
If you're targeting the wrong people and you find out about that months later, that's a big deal. Your company's lost a lot of money. If you are using data to price something, let's say you're selling in a marketplace, let's say you're selling homes or selling jeans or selling shoes. If you're underpricing or overpricing for a long period of time, you're losing a lot of money, and not knowing about that can be disastrous. And the reality is that's happening today already, because we've already said, "Open the floodgates, let everyone use data," but now we're like, "Oh, wait. But maybe sometimes data is wrong. We need to make sure it's actually accurate."
Jon Krohn: 21:06
Nice. So, somehow, I suspect based on things I've read about you and your company online, that your company, Monte Carlo, can somehow help with all of these data uptime challenges.
Barr Moses: 21:21
Yeah. I mean, we're really focused on helping the community figure out ... Okay, maybe the question is how do we actually solve data reliability, or what's the right way to approach it? With the thinking that it's the first step to actually being able to use data in your company. And so, I think the solution, again, draws on what we've seen in software engineering. And I like to mention that because I think there's something really nice about the fact that we don't have to completely reinvent the wheel, in a sense. And, by the way, the problem of data being wrong and data quality has been around for decades. It's the age-old problem in data. Nothing is new there. But all of these changes actually introduce an ability to solve this problem in a totally new way that was not possible until today.
Barr Moses: 22:15
And that is the standardization of cloud solutions, the rise of solutions like Snowflake and Databricks and GCP and Looker and Tableau and so many others that are really allowing data teams to get faster and better in how they're using data and how they're processing data. And so, the question is how do we make sure that the data that's running in all of those systems is actually accurate? And the solution is, again, as I mentioned, a concept from software engineering. How do you actually solve that in software engineering? Well, what you do is you look at certain metrics, you look at the outputs of the system, to understand the health of it. We call this observability, meaning trying to understand what are some indicators of the system that can tell us whether it's healthy or not.
Barr Moses: 23:13
So, we actually call this the good pipelines, bad data problem. What do I mean by that? I mean, you've invested so much in setting up the best data warehouse, the best data lake, the best pipeline. You have real time data, the best of machine learning models. Everything is top notch, but the data that's powering all of that is actually inaccurate. And so, one of the ideas is that in order to solve that, what if we looked at specific metrics that helped us understand whether the data is accurate or not? And so, we actually spoke to hundreds of data organizations and asked them, "How do you solve this problem? How do you solve the good pipelines, bad data problem? What do you do? What do you look at in order to understand whether data is accurate or not?"
Barr Moses: 24:04
And based on that, we actually took all that good stuff and consolidated it into five core pillars of data observability. Those are freshness, volume, schema, distribution, and lineage. And we believe that if you can automatically collect information on and monitor those five pillars, you can actually have a holistic, unified view of the health of your data. And so, to really deliver top-notch, best-in-class data reliability means having some way to really define what those five pillars are for your organization.
Jon Krohn: 24:42
Cool. That makes a lot of sense to me. Can you reel off those five pillars to me one more time?
Barr Moses: 24:48
Yeah, of course. The first pillar is freshness, which relates to everything about the timeliness of your data. Has the data arrived on time? Maybe you expect it to come in at 6:00 a.m. and it arrived at 6:15, and maybe that's a big problem because someone downstream is using it at 6:15. You need to know about that. The second pillar is volume. So, number of rows is a good basic example. Maybe you've been getting 20 rows every day and suddenly you're getting 20 million rows. That probably indicates something is wrong. The third is distribution, and distribution really speaks to the values themselves. So, it could be anything like the percentage of null values, or if you have a particular field where you typically track credit card numbers, and suddenly you have letters there. Probably something is wrong. The fourth is schema. So, schema changes are about the structure of the data: if a table is added or removed, or fields have changed, or a field type has changed.
Barr Moses: 25:47
And then the fifth is lineage. So lineage, when I say, "Lineage," I actually mean lineage at the data level, not at the job level. So, at table level or field level lineage and overlaying that along with the ability to say, "Something broke here in this table and here are all the dependencies on that. Here are all the reports that are being used." And, by the way, they're being used by your marketing team for that campaign daily. You better fix that problem that you have in this table. Maybe there's another table that has a big problem, and there's nothing downstream that relies on it. So who cares? That's not a real problem.
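To make the lineage pillar concrete, here is a toy sketch of the idea Barr describes: represent table-to-table dependencies as a graph and triage an incident by what sits downstream. All table and report names are invented, and real lineage would typically be derived from query logs rather than written by hand:

```python
# Invented table and report names; real lineage would be parsed from query logs.
lineage = {
    "raw_events": ["stg_events"],
    "stg_events": ["fct_campaigns", "fct_signups"],
    "fct_campaigns": ["marketing_daily_report"],
    "fct_signups": [],
}

def downstream(table, graph):
    """Walk the lineage graph and collect everything that depends on `table`."""
    impacted, stack = set(), [table]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted

# If stg_events breaks, a marketing report is impacted, so fix it fast;
# if fct_signups breaks and nothing reads it, the incident can wait.
print(downstream("stg_events", lineage))
```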
Jon Krohn: 26:23
Right. Cool! Well, not only did you reel them off for me, but you gave me illustrative, clear examples. So, that was beautiful. I'm sure the audience appreciates that. So, it sounds like if we're going to have great data reliability and a data mesh, philosophically, we might have to make a lot of changes to our organization to make the most of that. So how do we build a data team or an organization to handle this new world?
Barr Moses: 26:58
Such a great question. The most common problem that we see is actually this finger-pointing or blame game, where every team is blaming a team either upstream or downstream of them for the data problems. I'll give you an example of the issues we do see. Let's say there's an engineering team that's responsible for a specific product. And then there is a data engineering team that's building the pipelines to ingest and transform the data from that. And then there is a data analyst or data analytics team that's creating specific reports that, say, the marketing and HR teams are using. We oftentimes see, in this simplified view, analyst teams pinging the data engineering team and saying, "Why is the data wrong? Why is it coming late? What's going on?" And then the data engineering team will be pinging the software engineering team and saying, "What's going on? Why is it a day late? We can't figure out the root cause of this." And then there are weird dynamics and friction.
Barr Moses: 28:08
And then the other way around can happen. The software engineering team might say something like, "The jobs all ran perfectly well. Everything is fine. We don't know what the problem is." And so there's this dynamic where it's actually really hard to untangle things. And so, when you think about team dynamics, there are a few tools that I think are really helpful to facilitate that. And one is to actually think about SLAs and SLOs regarding your data, which are essentially ways-
Jon Krohn: 28:39
All right. So we got SLAs, SLOs. What are those, Barr?
Barr Moses: 28:40
Service level agreements and service level objectives.
Jon Krohn: 28:45
Yeah. Got it.
Barr Moses: 28:45
And basically the idea is that these are contracts that allow teams to set expectations around, let's take freshness as an example, when do I expect data to come in, at what time, and what percentage of the time should that be happening? It's like the common notion of having five nines of availability in software engineering. Basically the equivalent of that in data.
Jon Krohn: 29:13
Right. So, the five nines being like 99.999% uptime. And also, I just want to quickly talk about that word contract. So you talk about having these service level contracts to, say, guarantee some level of freshness of the data. This doesn't mean that someone on the software engineering team literally signs a contract for the data science team, which is how I typically think about contracts, but it's an agreement internally, not literally like a signed document, but just this idea that I agree that our team is going to do this. We'll provide you this data with this level of freshness. And then, so you can build your processes based on that.
Barr Moses: 29:53
Yes. It's a badge of honor that you're doing this, but it is also helpful to document it. So, while you might not be signing your name, there are solutions, and Monte Carlo is one of them, to actually document these and to help automate detection when they aren't met, so that you can actually enforce them, if that makes sense.
Jon Krohn: 30:20
It makes perfect sense. Yeah.
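As an aside, a freshness SLA like the one Barr describes translates directly into something measurable. Here is a minimal sketch, with fabricated arrival times and an assumed 99.5% target:

```python
from datetime import time

# Hypothetical agreement: "the table is refreshed by 6:00 a.m. on 99.5% of days."
slo_target = 0.995
deadline = time(6, 0)

# Fabricated arrival times for five recent days.
arrival_times = [time(5, 41), time(5, 55), time(6, 15), time(5, 50), time(5, 58)]

on_time = sum(t <= deadline for t in arrival_times)
attainment = on_time / len(arrival_times)
print(f"SLO attainment: {attainment:.1%} (target {slo_target:.1%})")
# 80.0% here: the 6:15 arrival burns the error budget and triggers a
# producer-consumer conversation instead of finger-pointing.
```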
Barr Moses: 30:22
And then the other thing on team organization that we see that I think is pretty cool is the rise of new and important roles. So, for example, the role of the data product manager: a product manager who's actually dedicated to thinking about how to build data products. And that's a pretty new concept as well, thinking of data as a product.
Jon Krohn: 30:44
Yeah. Actually, that's a new one to me. And it makes perfect sense. Now, I know exactly what you mean and I can see the value of it, but I've never heard somebody put those three words together before. Cool! Data project managers. Wait, project manager or product manager?
Barr Moses: 31:00
Product. Data as a product.
Jon Krohn: 31:01
Product. Data product manager. Yeah. Glad I asked. And yeah, that's the one that makes even more sense. You can imagine a data project manager, but data product manager. Yes. Yes.
Barr Moses: 31:13
We can add another one if you want.
Jon Krohn: 31:17
I can see more demand for this data product manager. Nice.
Jon Krohn: 31:22
Okay. So, we've understood now about data meshes. We've understood about data reliability. We know about the challenges in building data meshes, including these kinds of philosophical ideas around how to build a data team and having specialized roles like data product managers involved. So, it's all very interesting, and I'm loving understanding how Monte Carlo fits in with all of that. So, I'd love to ask you now some questions that have actually come from audience members about your background and about your company, and they tie into everything that we've already covered.
Jon Krohn: 32:04
So the first one is from Ciro, Ciro Gómez Parssian, who is a Microsoft analytics and cloud architect at VIEWNEXT. And Ciro is curious about how your intelligence background has influenced your career since. So, listeners probably don't know, but you have a storied career, both in the private world, in technology and in consulting as well. But prior to that, you started in the Israeli Air Force as a commander, and you were involved, even back then, in analyzing intelligence data. So, it sounds to me like, across this arc, there's been analysis and quantitative work and mathematics and computer science, I guess, throughout most of your life. But I don't know why I'm saying all this. Why don't you just fill it in for us.
Barr Moses: 33:05
Way better when you say it. Yeah. So I was born and raised in Israel, actually at the Weizmann Institute of Science. My dad is a professor of physics, so I spent a lot of time in his lab trying to blow things up. My mom is a meditation teacher, so I also spent a lot of time meditating. Maybe not as much as I should have.
Jon Krohn: 33:30
That's probably always true. I think everyone always feels like that.
Barr Moses: 33:31
Yes. Exactly. Eat your veggies and meditate is what I should be doing. But yeah. So, in Israel, I was drafted into the Israeli Air Force. I was a commander of a data unit and spent a long time doing data analysis, obviously very, very different from what we're doing today. This was a long time ago, with different technologies and, honestly, very different use cases. But there are certainly a lot of learnings from that time that have influenced my work today. Being at that young age, working on such sensitive material, you develop a sense of responsibility and a strong understanding of how important data actually is to decisions that can impact human life. And in those cases, I think developing that sense of responsibility also drives you to come up with strong solutions to things like data accuracy.
Barr Moses: 34:33
There's a term that I heard later in my life called zero defect, which is if your data or your products can be zero defect, meaning if you can have as few mistakes in them as possible. That's something that we actually thought about way back then, in the early days. And it really ties to what we're trying to do today, where it's really hard, harder than ever, to be zero defect. And yet, the decisions that we're making every day are based on data that may actually be inaccurate. And that's a scary proposition.
Jon Krohn: 35:07
Super interesting. I don't know what I was expecting as the answer, but that was a wonderful one. It makes perfect sense to me. So, then after that, you studied mathematics and computer science at Stanford, and you worked as a management consultant at Bain, and then you held a number of senior roles in organizations, no doubt involved in data and analytics and using those data and analytics for commercial purposes within the organization. So, feel free to fill us in on any major things from that giant period of your life. Distill it down to one really good one. But the place that I'm trying to get to is to understand what led you to founding your own company, to founding Monte Carlo and being the CEO of that company. How did that come about?
Barr Moses: 36:05
Yeah. Prior to Monte Carlo, I was at a company called Gainsight, which created the customer success category. And I learned a few very important things there. One of the most important things that we did in creating the category was actually to make it quantitative and solidify what customer success means based on data. In the past, customer success was something that was really often subjective. You would buy my software, and maybe you're using it, maybe you're not. In today's world, you are earning your customer's business every day. And in order to do that well, you actually need to use data about your customer, and you need to understand: are they using your product or not, are they happy with it, how are they using it?
Barr Moses: 36:58
And so, I was always fascinated with how data can improve our lives and improve business outcomes. And I just feel very fortunate to be around during this time, when there are such advances in data in general. I think one of the things that led me to start this company is that at Gainsight, where I'd been working with hundreds of companies, I noticed that the biggest thing that folks tripped over and fell on, including myself, was the ability to trust the data. I remember we were trying to become very data-driven as an organization and we couldn't rely on the numbers that we were looking at. I was looking at a report and I was like, "I don't even know if this is right or not. And if it's not right, then I might as well just go with my best guess. And in that case, why are we even collecting this data to begin with?"
Barr Moses: 37:48
And I remember my team spending so much time asking these questions, too. And I was like, "Why is this so hard? This should be easy. If we want to become data-driven, it's not going to get easier." And so, I think it came from wanting to really solve that pain. I ended up speaking with lots of other companies outside of my work, after Gainsight, actually, to understand how they were addressing this. What are the main root causes for why data goes down? All these different questions. I really wanted to make sure: am I the only one who's struggling with this? When you start a company, you don't really know if you're solving something that someone else will care about. And so, I was curious. Am I solving something that only I care about, or do other people care, too? And I was just thrilled to learn that it's something that lots of people care about and that it's unsolved. And so, I just couldn't imagine a world where there wouldn't be a solution to this. I was like, "We're not going to become data-driven if we don't solve this, so we better go solve this."
Jon Krohn: 38:49
Cool. So what's up with the name? What's up with the name Monte Carlo? When I think of Monte Carlo, I think of casinos and this is kind of the opposite. You're trying to reduce risk. You're trying to increase data accuracy. It's like you don't want to be taking any chances with your data.
Barr Moses: 39:06
Yeah, it's very true. So, when we started the company, we were thinking about the name and we actually didn't have a lot of time to decide. Quite frankly, we had customers who wanted us to sign papers quickly and start working with them. And so I opened my stats book and was searching for names.
Jon Krohn: 39:31
Markov chain Monte Carlo.
Barr Moses: 39:32
Exactly. That's right. So, Markov chain was a little bit too much for my taste. It was a candidate, but I couldn't make it work. And so, that's where we landed. And also my co-founder Lior is a big fan of Formula One, so you'll often see him with a Formula One hat. So, we just felt like it's right. And we figured, in good old start-up fashion, that we'll solve it later if it becomes a problem.
Jon Krohn: 39:58
Nice. That sounds great. That's a really good and really honest explanation. You'd think that over the years, you'd have come up with a really great story. "Really, honestly, we were just looking for one really quickly." Great. Well, it's memorable to me. I think it's a really great company name. So, all right. So, you start the company. You quickly realize that yes, there are other people out there who need this problem to be solved. How did you get someone to pay? Specifically, Ramesh Vakkalagadda, who's a senior research fellow at the Center For DNA Fingerprinting and Diagnostics, has asked this question of you. He wants to know how you landed your first few clients.
Barr Moses: 40:41
Great question. And I'll give you the honest answer here as well. So, when we started the company, one of the things that can be a little bit confusing or misleading as a founder is you want to make sure that you're working on a problem that people actually care about and they're not just being nice to you. Oftentimes if you ask people, "Hey, what do you think about my start-up idea?" If I come up to you, Jon, and I'm like, "Hey! I'm working on this new idea. What do you think?" You might be like, "Oh, my God, Barr, that's such a great idea. I love it," just because you want to be nice to me. It's human nature. But then, when the time comes and you're like, "Okay, well, are you actually going to pay for this?" they're like, "Ah! So sorry. I have 10 other things that are more important."
Jon Krohn: 41:20
Yeah. My favorite answer in those kinds of situations is, "I can see why your solution is great, but not for me."
Barr Moses: 41:28
That's right. That's right.
Jon Krohn: 41:28
"There's someone else out there who this is perfect for."
Barr Moses: 41:32
That's right. "This is awesome. I love it. Just for my neighbor there."
Jon Krohn: 41:36
Exactly.
Barr Moses: 41:38
Exactly. That's right. And so, when I started the company, I was faced with: how do I find folks? How do I actually validate that? I can't work with anyone who will just give me the "This is a great solution." I need to find out whether people actually care about this. And so, I actually cold-called people and asked, "Is this a problem that you care about? And suppose I had this solution, would you want to use it?" These were people that I had never spoken to, and that was the bar for whether this was an important enough problem. I was like, "If people care enough about this pain point that they will talk to someone like me, with no company at the time, no brand, nothing behind me, if this is important enough to them, they will engage." And indeed, that's what happened. The first 10 customers that we had, we did not know them beforehand; we had no relationship with them.
Jon Krohn: 42:30
Wow!
Barr Moses: 42:31
Yeah. And so, the way that we did that was by making sure that this is a really important pain point to them, putting a product in their hands really quickly. So we didn't develop product in the dark for years. We actually put something in the hands of customers very, very fast, within weeks. And then made sure that that thing actually solved a real problem for them. And in that case actually, they wanted to pay us. To be perfectly honest, I was shocked when they were like, "Hey! We want to pay for this." I was like, "What are you talking about?" They're like, "Yeah. This is like a great product and we should be paying for it." And that was a sign, too. It was like, "If someone wants to pay, this means that it's important enough."
Jon Krohn: 43:11
That sounds really great. And it reminds me of the kinds of philosophies that you read about in a book like The Lean Startup. Yeah. That you find people who are on the cutting edge, who are looking for something new to solve a new problem, and you get it into their hands quickly and you get feedback quickly and you iterate from there. You don't worry about making a perfect product. You just get them [inaudible 00:43:40].
Barr Moses: 43:41
That's absolutely right. And I read various books at the time, and Y Combinator has a great blog with things to do early on, unscalable things to do, which was very helpful for me. Yeah. There's a variety of resources that are incredibly helpful for the early stage. I would say the other thing that's really valuable is actually speaking to other founders who are a few years ahead of you. They typically have some great advice.
Jon Krohn: 44:11
Nice. Yeah. That is great guidance and some great resources, which we'll have in the show notes. So, another audience question. This one's from a machine learning engineer named Bernard, who's a repeat question-asker here on the SuperDataScience show; I recognize his name. Bernard is curious what the typical workday is like in a start-up like yours. I guess he's interested in what your work is like, but I think also, more generally, I'd love to know what it's like for a data scientist or a software developer in a company at your stage solving your kind of problem.
Barr Moses: 44:47
Yeah, definitely. So I will say that, at this stage, probably no two days are alike. Every day is really different, and things change. The company changes a lot, too. At the rate that we're growing, the company is growing intensely every week and every month. We've basically 10x'd the number of people that we had in the company in the last year.
Jon Krohn: 45:17
Wow!
Barr Moses: 45:17
And that means that each and every one of us also has to get better at our job every week, every month, every quarter. And so, what my job was when it was just me is very different from when we were a few people, and it will be very different at a few tens and a few hundreds of people. And so, one of the things that I talk a lot about with my team is rewriting your own job description. Sometimes, after two weeks or three months or six months or a year, you're suddenly in a place where the company needs something different from you. And so, you actually need to write a new job description for yourself. And so, I think maybe the constant theme in what the day-to-day is like is a lot of adaptability to change, recognizing that you and the company and the landscape change a lot. That's the big core thing.
Barr Moses: 46:10
There are two additional ones that I'll mention. The second is around the speed of change. There's a value that we reference a lot at Monte Carlo, which is measuring in minutes, meaning at our stage the unit of impact is not years or months, it's actually minutes. And so, we need to be thoughtful about how we make every minute count, and the way to make every minute count is by making a customer impact really quickly, which is the third part. We are extremely customer-focused. I think a lot of that has to do with the fact that I grew up in the customer success industry, if you will. And so, I really think that fundamentally the most important thing that data organizations, data scientists specifically, but also companies in general, need to focus on is being extremely customer-driven.
Barr Moses: 47:05
So, for everything that we're doing: how is this impacting your customers today? If you're a data scientist at Monte Carlo or a software engineer, or if you're me, you're all doing the same thing, which is focusing on making our customers as happy as possible. We might be coding. We might be speaking to customers. We might be writing blogs. We might be doing a bunch of other stuff, but whatever it is, it ultimately all has to point towards making our customers extremely happy. That is everyone's job description, the first bullet point, now and forever.
Jon Krohn: 47:39
Beautiful answer, Barr. And in the spirit of measuring in minutes, and knowing that you have a hard stop coming up in a few of them and need to jump off to another meeting and probably make a customer very happy, making some big impact for them, I have one last question here, which is from Svetlana Hanson. She's curious about how you address biases in data. And I assume she's asking about unwanted biases, so probably biases against particular socio-demographic groups, for example. We talked about this a little bit before the show. It sounds like, in production, there isn't something in Monte Carlo that's a solution in that space, but it's something that you're curious about and maybe have future products in the works for.
Barr Moses: 48:29
Yeah, certainly. I think this is one of the most important and most interesting problems for us to solve in data, and it has impact on what Monte Carlo does, but also everywhere else: in politics, in pandemics, you name it. Data is now driving all of that. And so, our ability to make decisions and think about bias is incredibly important. I will say that there's a book that Daniel Kahneman just released recently called Noise, which talks about noise and bias. I just started reading it, but it seems great and I highly recommend it. I think it fundamentally speaks to how surprisingly often we make wrong decisions based on wrong data due to noise and bias, how much more prevalent that is than we think, and it discusses some aspects of that with some surprising new learnings that he found.
Jon Krohn: 49:32
I haven't read Noise yet, but I read Daniel Kahneman's ... I say, "Kahneman." You say, "Kahneman." You're probably right, because he's Israeli, isn't he?
Barr Moses: 49:43
He is, I think, formerly Israeli, but I'm actually not sure what the exact pronunciation is.
Jon Krohn: 49:48
I don't know. Anyway, somewhere in the middle, but I've read, oh my goodness, how am I ...
Barr Moses: 49:56
Thinking, Fast and Slow.
Jon Krohn: 49:57
Thinking, Fast and Slow. Yeah, exactly. I was stuck there in my system one and needing my system two to really kick in there. Where are you, system two? I need you.
Barr Moses: 50:07
System two to the rescue.
Jon Krohn: 50:10
Yeah, exactly. Well, in that case, you were to the rescue. So, I absolutely love that book, Thinking, Fast and Slow. It impacts how I act and how I try to think every day, and it's such an important book in my life. So, I'm not surprised to hear that Noise is also great, and I can't wait to check it out. So Barr, I always end the show with a book recommendation and you just segued right into it. So, we saved you a couple of minutes right there. It's been so awesome having you on the show. I learned a ton in this episode and I'm sure our audience members did, too. If they want to follow you and learn more, what should they do?
Barr Moses: 50:54
You can check out our blog at montecarlodata.com. We write pretty regularly about these topics and others: data mesh, data reliability, how to build a data team, what a data product manager does. All those topics are things that we write a lot about. You're also welcome to reach out to me on LinkedIn, barrmoses, if you'd like to connect directly. I'm super happy to chat about this stuff. It's one of the things that I'm most passionate about.
Jon Krohn: 51:20
Nice. All right. Thank you so much, Barr. It's been wonderful having you on the show and maybe we'll have you on again sometime soon.
Barr Moses: 51:27
Sounds great. Thanks so much for having me.
Jon Krohn: 51:34
Barr is such a brilliant expert on data science, and in particular on successfully building a data-focused start-up, and Barr manages to effortlessly and clearly convey technical content in such a cool, smooth way. I really admire her style. In today's episode, Barr filled us in on how data meshes involve the central definition of a company-wide data standard that liberates anyone across the organization to create, store, and analyze data. She talked about monolithic versus distributed microservice-based software architectures, how data reliability is analogous to software service uptime, what a data product manager is, and how five categories of data metrics enable us to solve the good pipelines, bad data problem, specifically freshness, volume, distribution, schema, and lineage. And Barr talked about how feedback via cold calls and getting products into customers' hands within weeks can greatly accelerate a start-up's early success.
Jon Krohn: 52:38
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show and the URLs for Barr's LinkedIn profile, as well as my own social media profiles at superdatascience.com/499. That's superdatascience.com/499. If you enjoy this episode, I'd, of course, greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel, where we have a video version of this episode.
Jon Krohn: 53:07
To let me know your thoughts on the episode directly, please do feel welcome to add me on LinkedIn or Twitter, and then tag me in a post to let me know your thoughts on this episode. Your feedback is invaluable for figuring out what topics we should cover next. All right. Thanks to Ivana, Jaime, Mario, and JP on the SuperDataScience team for managing and producing another stellar episode for us today. Keep on rocking it out there, folks, and I'm looking forward to enjoying another round of the SuperDataScience Podcast with you very soon.