
18 minutes

Data Science, Artificial Intelligence

SDS 846: Making Enterprise Data Ready for AI, with Anu Jain and Mahesh Kumar

Podcast Guest: Anu Jain and Mahesh Kumar

Friday Dec 20, 2024



In this Five-Minute Friday, Jon Krohn speaks to Anu Jain, CEO of Nexus Cognitive, and Mahesh Kumar, CMO of Acceldata. They talk about the importance of keeping data up to date, especially for predictive models that make key financial decisions for a company, as well as the current state of data governance and why it’s overdue for an update of its own.


About Anu Jain
Anu Jain is the CEO of Nexus Cognitive, a company driving innovation in data and analytics solutions. A visionary leader with deep expertise in digital transformation, Jain has held key leadership roles at IBM and Teradata, where he led groundbreaking initiatives, including early development of Watson and growth in analytics for industries like communications and media. He also served as Chief People Officer and a founding member of Deloitte Digital. A serial entrepreneur, Jain co-founded dRSTi 360 and Janus, pioneering advancements in AI and analytics. 

About Mahesh Kumar
Mahesh Kumar is the Chief Marketing Officer at Acceldata, where he leads the company’s global marketing strategy, driving brand growth, demand generation, and customer engagement. With over two decades of experience in enterprise technology and SaaS, Mahesh is a seasoned marketing leader known for scaling high-growth companies and establishing them as market leaders.

Overview
In this Five-Minute Friday, Jon Krohn continues his series of discussions with top thinkers in the AI community at the ScaleUp:AI conference in New York. Following a session Jon hosted on how enterprises can adopt an AI-first mindset through data management, he extended the discussion with a few bonus questions for his guests. Anu Jain and Mahesh Kumar, CEO of Nexus Cognitive and CMO of Acceldata respectively, talk about the importance of keeping data up to date, especially for predictive models that make key financial decisions for a company. You’ll also learn about the current state of data governance and why it’s overdue for an update of its own.

At Acceldata, Mahesh helps the company provide trusted, high-quality data to its clients’ AI models. Acceldata handles both structured and unstructured data, monitoring its overall quality and troubleshooting issues with clients’ models by analysing potential root causes in the data. Anu explains that Nexus Cognitive builds an entire data mesh for companies out of modules, which can combine open- and closed-source tools, and then integrates those modules into a fully functional outcome for clients. Anu describes clients running on old digital infrastructure as carrying “technical debt” that Nexus Cognitive helps them pay off quickly.

Maintaining data accuracy is paramount to both companies, so Jon asks Mahesh about the impact of even ‘minor’ data errors on a client’s AI models and how Acceldata deals with them. Mahesh first identifies the two stages at which high-quality data is critical (building the model and running its predictions) and explains that his company monitors the integrity of the data at both points.

Finally, Jon asks Mahesh and Anu about data governance. With AI and data science moving at such a rapid pace, terminology can evolve quickly, so Jon first asks his guests to define data governance. Mahesh says that, to date, data governance has been centralized, with organizations deciding by committee how data should be used. He acknowledges that maintaining standards is essential for protecting the security and privacy of company and personal data, yet he also believes governance will soon have to adapt to the way data are managed and shared across products. Centralized governance is poorly suited to distributed data, so companies will need a more agile architecture that can “apply the right rules and policies” depending on the state of the data.

Listen to the episode to hear more about the work and mission of Nexus Cognitive and Acceldata, and why it’s important to avoid getting locked into one cloud vendor.


Jon Krohn: 00:05
This is episode number 846 with Anu Jain and Mahesh Kumar.

00:27
Welcome back to the Super Data Science Podcast. I am your host, Jon Krohn. Today's episode features the highlights of a session I hosted on managing data to embrace an AI-first mindset for enterprises. That session had not one but two guests from the C-suites of fast-growing, venture-capital-backed startups: Anu Jain, who's CEO of Nexus Cognitive, and Mahesh Kumar, who's CMO of a company called Acceldata. Mahesh is an interesting CMO because he has an engineering background and he still writes code.

01:01
Today's short episode should be interesting to folks looking to make AI implementations effective in large organizations that have lots of data. In today's episode, Anu and Mahesh detail how a tiny data error can lead to millions of dollars in losses for an enterprise. They have a specific example. They also talk about why data storage isn't a major cost driver anymore and what is, and they fill me in on what the heck data governance actually is and why it matters.

01:29
Ready? Let's jump right into our conversation, which was recorded at the ScaleUp:AI conference in New York a few weeks ago. That conference is hosted by Insight Partners, so you'll hear that gigantic venture capital firm mentioned in today's episode.

01:45
You also may hear... Well, you will hear the name Andrew and that refers to Andrew Ng, who was someone that I interviewed earlier in the day at the conference. If you want to listen to that, the recording of that interview with that superstar, Andrew Ng, that's in episode number 841. All right, that's everything. Let's go.

02:06
Welcome back to this second stage. We're here for a session on managing data to embrace an AI-first mindset for enterprises. My esteemed guests for this exciting session are Anu Jain, immediately to my right, who's the Nexus Cognitive CEO, and to his right, Mahesh Kumar, who's the Acceldata CMO. Let's start off talking about Nexus Cognitive, Anu.

Anu Jain: 02:34
Absolutely.

Jon Krohn: 02:35
It's the first services automation business in Insight Partners' portfolio. It modernizes data and AI infrastructure, delivering outcomes within days or weeks and providing speed-to-value by simplifying what you described to me when we talked last week as the ball of yarn of integrations. Tell us a bit more about Nexus Cognitive, Anu.

Anu Jain: 02:58
Well, first of all, you're a great spokesperson, so you're hired. No, so look, at the end of the day, we are a composable and agnostic data architecture and ecosystem that integrates and automates the workflows that really help us drive data-driven outcomes or really AI-powered outcomes at [inaudible] speed, value and scale.

03:22
And so, how do we do that? We do that through our NexusOne control plane and our managed service offering. So it really helps us offer flexible options that meet our clients where their needs are.

Jon Krohn: 03:34
So when you describe services automation, you are taking services that would, traditionally... some kind of process? Earlier, in Andrew Ng's talk, I don't know if you saw his-

Anu Jain: 03:44
Yeah.

Jon Krohn: 03:45
He was talking about how AI doesn't displace jobs, it displaces tasks. And so, if you're looking for opportunities to streamline your operations in some role, look at the different tasks that make up that role and try to identify which will be most easy to automate. And so, that's where you're automating individual services that a human might have historically done?

Anu Jain: 04:08
That's correct. All the hard work of integrating all the pieces of infrastructure, all the way through the data, the integration, all the way through the outcome.

Jon Krohn: 04:16
Nice. All right, so let's move on now to get an introduction to Acceldata from Mahesh. So technically, literally, Acceldata is used within the Nexus platform, so there's a bit of a bridge there, but Acceldata also stands alone as a data observability platform for enterprises. Tell us a bit more about Acceldata and the role of data observability in AI success, such as proactively preventing model drift.

Mahesh Kumar: 04:43
Sure. Pleasure to be here with both of you. Even if you look at today's discussion, a lot of it was about the applications and the importance of building good AI applications. What powers that? It's the data. And Acceldata, what it does is we allow you to provide very trusted, high quality data to all your AI models. Whether it's structured data or unstructured data, we kind of manage both of them.

05:09
Let me illustrate with an example. One of our customers is a data provider that supplies business data to others. They get data from over 130 different countries, over 100 data points. All of that has to come together and go through about 30 or 40 different transformation steps. Eventually, it's consumed by hundreds of thousands, even millions, of other businesses, and even government entities. So their ability to provide that trusted data, including the AI models that give business risk, financial risk and other kinds of information about those businesses, becomes super critical.

05:45
Before Acceldata, if they had a problem, it would take them weeks to find the root cause; with us, it takes them hours. So you can imagine how their business completely transformed with us.

05:56
So we observe the quality and various other characteristics of the data throughout the pipeline, from the landing zone to the consumption point, and allow you to manage that proactively so you can provide trust to all your AI initiatives.
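Mahesh's point that observing the data at every stage turns weeks of root-cause hunting into hours can be illustrated with a minimal Python sketch. It assumes each pipeline stage emits a simple summary (row count and the share of rows missing a key field); the `StageSnapshot` class, the stage names and the thresholds are hypothetical, and this is not Acceldata's API. Walking the stages in order, from the landing zone to the consumption point, the first stage that fails a check is where the investigation should begin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageSnapshot:
    """Hypothetical summary of one pipeline stage's output for a single batch."""
    name: str
    row_count: int
    null_fraction: float  # share of rows missing a key field


def first_failing_stage(stages, min_rows: int, max_null_fraction: float) -> Optional[str]:
    """Return the name of the earliest stage that fails a check, or None.

    Stages are ordered from the landing zone to the consumption point, so the
    first failure marks where a problem first becomes visible, which is where
    root-cause analysis should start.
    """
    for stage in stages:
        if stage.row_count < min_rows or stage.null_fraction > max_null_fraction:
            return stage.name
    return None


pipeline = [
    StageSnapshot("landing_zone", 10_000, 0.01),
    StageSnapshot("standardized", 10_000, 0.01),
    StageSnapshot("enriched", 10_000, 0.12),  # e.g. a join starts dropping a key field
    StageSnapshot("consumption", 9_990, 0.12),
]
print(first_failing_stage(pipeline, min_rows=5_000, max_null_fraction=0.05))  # enriched
```

Checking every stage, rather than only the final output, is what narrows the search: the consumption layer shows the same symptom, but the snapshot history points straight at the enrichment step.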

Jon Krohn: 06:13
Nice. Makes a huge amount of sense. And it also makes a lot of sense why that data observability piece would be such a key component in something like the Nexus Cognitive solution. Anu, you talked to me last week about how Nexus has this modularity, a building-block approach that combines a solution like Acceldata with other modules in the platform. Working with Nexus Cognitive is like buying a car as opposed to buying individual car parts and trying to integrate them all together yourself. Tell us a bit more about that modularity, that building-block approach.

Anu Jain: 06:54
Yeah, absolutely. So we use the word we're a composable data architecture, so we're-

Jon Krohn: 07:00
What does that mean, composable?

Anu Jain: 07:01
Fantastic question. So we're basically building the entire data mesh through LEGO blocks, and so, that's any of the open source tools or even some of the closed source guys that are out there, but we ride on the rails of open standards to put those pieces together and integrate it as one outcome.

07:21
So back to the question around car parts versus buying the car as a whole. So we have clients on both ends of the spectrum. We have those today who are managing massive technical debt, they have old infrastructures and they like parts of it and they want to upgrade and modernize parts of it. For those folks, we'll come in and really provide the newer car parts if you will, but then, have that integrated fully, end-to-end, with the observability plane.

07:48
Other clients, and this is where we're seeing huge advantages, we have net new workloads. We want to drive AI outcomes at scale, at speed. We don't want to wait six months to get infrastructure up. We don't want to wait nine months to hire and build a team to get to a real outcome. So here, it's, the car comes to you, all the parts are built, it's stood up in days and you're getting outcome in weeks.

Jon Krohn: 08:14
Very cool. I love that approach. Mahesh, over to you with a question about small data errors. So even with a data observability platform like Acceldata, obviously, you're monitoring for data issues. Something that might not be immediately obvious to everyone is that even very small data errors can impact AI models that those data feed into. So how can enterprises adopt strategies to prevent these errors from snowballing into big business problems?

Mahesh Kumar: 08:47
Sure. I think there are two aspects to AI models: one is building the model itself and the other is running the predictions. In both cases, you need really good, high-quality data feeding into the models. For example, if your data is skewed because you don't have data from one particular source, that obviously changes the model and how well it can predict.
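The skew Mahesh mentions, where one source stops contributing data, is one of the simpler failure modes to check for. A minimal sketch, assuming you track a baseline record count per upstream source: flag any source whose current volume falls below a chosen fraction of its baseline. The function name, the source names and the 50% threshold are all hypothetical.

```python
def missing_or_shrunk_sources(current_counts, baseline_counts, min_ratio=0.5):
    """Return sources whose record volume fell below min_ratio of their baseline.

    A source absent from current_counts counts as zero records, which is the
    "no data from one particular source" case that skews training data.
    """
    flagged = []
    for source, expected in baseline_counts.items():
        observed = current_counts.get(source, 0)
        if expected > 0 and observed / expected < min_ratio:
            flagged.append(source)
    return sorted(flagged)


# Hypothetical daily record counts per upstream source.
baseline = {"branch_a": 1_000, "branch_b": 800, "online": 1_200}
today = {"branch_a": 980, "online": 400}  # branch_b sent nothing, online dropped sharply
print(missing_or_shrunk_sources(today, baseline))  # ['branch_b', 'online']
```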

09:11
So let me give you one more example. One of the largest banks in the world uses AI to predict cash loan offers, credit card offers and such. In their instance, one of the things they found out was that the pipeline feeding credit scores was not getting updated properly. So now, you can imagine, when you're trying to predict, "Should I give this person the loan or not?" and the credit scores aren't up to date, you've got a huge impact. You're talking about many tens of millions of dollars over the year. And some of these problems can go undetected when you think about hundreds or thousands of pipelines, so many different data sources, data feeding in from so many different places.

Jon Krohn: 09:57
The data might look right in a circumstance like that.

Mahesh Kumar: 09:59
Exactly.

Jon Krohn: 10:00
You're getting the credit score in the right format and so nothing breaks or nothing noticeably breaks.

Mahesh Kumar: 10:07
Yes. So that's where I think observability plays a big role, because we are able to catch each of these issues at the source, shifting left in terms of detecting and fixing the problem, so you understand it very early and can prevent it from snowballing.
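The bank example shows why format checks alone are not enough: a stale credit score is still a perfectly valid number, so nothing noticeably breaks. A minimal sketch of a freshness check, assuming each record carries a hypothetical `updated_at` timestamp and that scores fall in a 300 to 850 range; the field names and the 30-day limit are illustrative, not taken from any real system.

```python
from datetime import datetime, timedelta, timezone


def is_well_formed(score) -> bool:
    """Format check (assumes a 300-850 score range); a stale feed still passes it."""
    return isinstance(score, int) and 300 <= score <= 850


def stale_customers(records, now: datetime, max_age: timedelta):
    """Return ids of records whose score looks valid but was not refreshed recently."""
    return [
        r["customer_id"]
        for r in records
        if is_well_formed(r["score"]) and now - r["updated_at"] > max_age
    ]


now = datetime(2024, 12, 1, tzinfo=timezone.utc)
records = [
    {"customer_id": "c1", "score": 712, "updated_at": now - timedelta(days=3)},
    {"customer_id": "c2", "score": 655, "updated_at": now - timedelta(days=95)},
]
print(stale_customers(records, now, max_age=timedelta(days=30)))  # ['c2']
```

Both records pass the format check; only the age comparison reveals that the second score has not been refreshed, which is the "data might look right" situation Jon describes.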

10:25
The other thread I want to pull on is one we touched on in the discussion around agentic workflows. Imagine a series of agents performing a larger task. Any of them can make errors, for many reasons, but primarily because the input data is not very good. You can imagine the compounding effect: bad decision on top of bad decision on top of bad decision. Pretty soon, four or five agents down the road, you've diverged far from your ideal scenario. So in the AI era, with more AI agents being built for a lot of different tasks, it becomes that much more critical to have a handle on your data and to provide very trusted data both to build your models and to make the predictions. Your customer 360 database has to be perfect, or as close to perfect as possible, because it feeds into the model, and then you get a prediction on the other end.
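The compounding effect Mahesh describes can be made concrete with a back-of-the-envelope calculation. Assuming, purely for illustration, that each agent in a chain gets its step right 95% of the time and that errors are independent, the chance that the whole chain is correct is the per-step accuracy raised to the number of steps.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step in a chain of agents is correct.

    Assumes each step fails independently with the same probability; real agent
    chains often have correlated errors, e.g. when every step reads the same bad input.
    """
    return per_step_accuracy ** steps


for steps in range(1, 6):
    p = chain_success_probability(0.95, steps)
    print(f"{steps} agent(s): {p:.1%} chance the whole chain is correct")
```

With five agents, the chain is fully correct only about 77% of the time (0.95 to the fifth power is roughly 0.774), even though each agent on its own looks quite reliable.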

11:23
So it's an ongoing process, and that's why you need something like an observability tool to actually manage all of this. We operate on both on-premise data and cloud data. We are agnostic to all the data platforms: Snowflake, Databricks, AWS, Azure, Google, you name it, hyperscalers or even smaller data platforms, we work with them. So the ability to span multiple platforms and have both on-prem and cloud observability becomes very critical, and that's where we excel.

Jon Krohn: 11:55
That leads perfectly into my next question, Anu, which is about the importance of infrastructure agnosticism. Could you give us some of your thoughts on avoiding getting locked into a particular cloud vendor or hyperscaler? Why does that matter?

Anu Jain: 12:12
Yeah, great question, and great example. As you were speaking just now, I was reminded of what Andrew was talking about maybe an hour or so ago: data gravity has been very real for a lot of our clients, but that's reducing. And what we're seeing is that the real cost is not data storage, it's really data compute. So when we talk about vendor lock-in, what we do at Nexus is remove it by decoupling compute from storage. So we're now able to say, "Hey, you're on Databricks today, you're on Snowflake tomorrow, you're on an open source compute layer." It's the ability to decouple all of those pieces of the engine and really get to outcome. Our view is that it's a world of open compute, an open world with open standards, and you should be able to take your compute wherever you want.
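Anu's idea of decoupling compute from storage can be sketched in miniature: the data is stored once in a neutral format, and interchangeable compute engines read it through a common interface. In this hypothetical Python example, a CSV string stands in for files in an open format on shared storage, and two stand-in engines (plain Python and an in-memory SQLite database) play the role of platforms like Databricks, Snowflake or an open source engine. It illustrates the principle only; it is not how the NexusOne control plane works.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Protocol, Tuple

# Stand-in for a table stored once in an open, engine-neutral format.
STORED_CSV = "region,amount\neast,120\nwest,80\neast,30\n"


class ComputeEngine(Protocol):
    def total_by_region(self, csv_text: str) -> Dict[str, int]: ...


def read_rows(csv_text: str) -> List[Tuple[str, int]]:
    """Parse the shared stored data; every engine starts from the same copy."""
    return [(row["region"], int(row["amount"])) for row in csv.DictReader(io.StringIO(csv_text))]


class PythonEngine:
    """Aggregates in plain Python."""

    def total_by_region(self, csv_text: str) -> Dict[str, int]:
        totals: Dict[str, int] = {}
        for region, amount in read_rows(csv_text):
            totals[region] = totals.get(region, 0) + amount
        return totals


class SQLiteEngine:
    """Aggregates with SQL in a throwaway in-memory SQLite database."""

    def total_by_region(self, csv_text: str) -> Dict[str, int]:
        conn = sqlite3.connect(":memory:")
        try:
            conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
            conn.executemany("INSERT INTO sales VALUES (?, ?)", read_rows(csv_text))
            rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
            return dict(rows)
        finally:
            conn.close()


def run_report(engine: ComputeEngine) -> Dict[str, int]:
    """The caller depends only on the interface, not on a specific engine."""
    return engine.total_by_region(STORED_CSV)


print(run_report(PythonEngine()) == run_report(SQLiteEngine()))  # True
```

Because `run_report` depends only on the `ComputeEngine` interface, swapping engines requires no change to the stored data or to the calling code, which is the property that makes moving between compute vendors cheap.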

Jon Krohn: 13:09
Nicely said. Anything else you'd like to add on to that, Mahesh?

Mahesh Kumar: 13:12
No, I think the world is moving, obviously, as you said, much more to an open environment. I think the cost of these models and compute is also changing quite rapidly. Enterprises, now more than ever, want this sort of optionality across multiple vendors as opposed to getting locked into one. So the portability of your infrastructure, and your ability to analyze data and choose the right place to put it, becomes very important. I totally understand what you're talking about.

Jon Krohn: 13:51
Nicely said. All right. Next topic for both of you gentlemen is data governance. So this is something we promised would be covered in this session. Yeah, I don't even really... I've been working in data science for my whole life. I did a PhD in neuroscience. I've been working commercially in data science for over a decade. I still don't really understand what data governance is. Does either of you want to explain that to me?

Mahesh Kumar: 14:15
I'll take a crack at it. I think part of the reason you don't understand it is that, up until now, it has always been a very ivory-tower type of situation, where a committee decides how data can and should be used within the enterprise. That's for a good reason, obviously, because you want good standards, good control, security and privacy, and all of those rules, regulations and, in many cases, laws have to be adhered to. Then that gets percolated down to the data organizations, and they use those rules day to day.

14:53
What's happening from an Acceldata standpoint, what we are saying is going forward, governance is not going to be a centralized part of the whole equation. Governance has to sort of, metaphorically, move with the data. You have to govern the data wherever it is rather than in a very centralized manner.

15:14
And I think Ali pointed out three things today, people, process and product, and how increasingly people, organizations or departments are taking charge of AI initiatives, now that they have the ability to produce code and things of that nature.

15:32
Now, in that scenario, building these data products becomes decentralized, and you cannot have centralized governance trying to manage something that is so distributed. So you need an architecture where the data management platform understands the state of the data wherever it is and whatever purpose it is being used for. Then it's able to apply the right rules and policies to make sure that whoever is using the data is using it appropriately, from a corporate standpoint, a legal standpoint and an ethical standpoint.
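A platform that applies rules based on the state of the data and the purpose of its use, as Mahesh describes, resembles what is often called policy as code. A minimal sketch with entirely hypothetical rules: an EU-resident asset may only be read from the EU, and assets tagged as containing personal data are masked unless the stated purpose is fraud review. The class names, tags and rules are illustrative assumptions, not any vendor's API.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DataAsset:
    name: str
    region: str                                  # where the data must stay, e.g. "eu"
    tags: Set[str] = field(default_factory=set)  # e.g. {"pii"}


@dataclass
class AccessRequest:
    purpose: str      # why the data is being used
    user_region: str  # where the consumer runs


def decide(asset: DataAsset, request: AccessRequest) -> str:
    """Return 'deny', 'mask', or 'allow' under a few hypothetical rules."""
    if asset.region == "eu" and request.user_region != "eu":
        return "deny"  # keep region-restricted data in its jurisdiction
    if "pii" in asset.tags and request.purpose != "fraud_review":
        return "mask"  # hide personal fields unless the purpose justifies them
    return "allow"


customer_360 = DataAsset("customer_360", region="eu", tags={"pii"})
print(decide(customer_360, AccessRequest("marketing", "us")))     # deny
print(decide(customer_360, AccessRequest("marketing", "eu")))     # mask
print(decide(customer_360, AccessRequest("fraud_review", "eu")))  # allow
```

The decision travels with the asset's metadata and the request's context rather than living in a committee's document, which is the shift from centralized to distributed governance discussed here.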

16:09
So I think data governance is due for a huge shakeup in the near future, where people won't be looking at it from a committee or an ivory tower. Of course, there'll be input from there, but a lot of the action is really going to be very close to the data and where it's being used.

Anu Jain: 16:26
Just to add to that, I think when we talk to our clients, data governance, I like to say everyone talks about it, but no one's really doing it today. And what we find, they've had so much technical debt, so many different tools that it's literally impossible for them to think about how do I follow my data from source to digital twin to mesh, warehouse, applications, whatever it may be.

16:55
And what we're finding is that, as we've adopted a composable architecture with open standards using observability, we're able to start automating a lot of those governance features, so the heavy, process-intensive and people-intensive part of this governance world is coming out. And making that metadata visible is creating a ton of value.
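Following data "from source to digital twin to mesh, warehouse, applications" is essentially a lineage problem, and automating governance metadata often comes down to walking a lineage graph. A minimal sketch, assuming a hypothetical lineage map in which each asset lists the assets it reads from: a breadth-first search finds everything upstream of a dashboard, and tags such as `pii` attached at the sources are propagated to it. The asset names and tags are made up for illustration.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each asset maps to the assets it reads from.
LINEAGE: Dict[str, List[str]] = {
    "churn_dashboard": ["customer_mart"],
    "customer_mart": ["warehouse.customers", "warehouse.orders"],
    "warehouse.customers": ["crm_source"],
    "warehouse.orders": ["erp_source"],
}

# Hypothetical governance tags attached where data first enters.
SOURCE_TAGS: Dict[str, Set[str]] = {"crm_source": {"pii"}, "erp_source": {"financial"}}


def upstream(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every asset that `asset` depends on, directly or indirectly (BFS)."""
    seen: Set[str] = set()
    queue = deque(lineage.get(asset, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(lineage.get(node, []))
    return seen


def inherited_tags(asset: str, lineage: Dict[str, List[str]], tags: Dict[str, Set[str]]) -> Set[str]:
    """Union of tags on the asset itself and on everything upstream of it."""
    result = set(tags.get(asset, set()))
    for node in upstream(asset, lineage):
        result |= tags.get(node, set())
    return result


print(sorted(inherited_tags("churn_dashboard", LINEAGE, SOURCE_TAGS)))  # ['financial', 'pii']
```

Once lineage is captured automatically, a question like "does this dashboard touch personal data?" becomes a graph query instead of a manual review.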

Jon Krohn: 17:17
Excellent. Thank you so much both of you. Anu Jain, Nexus Cognitive CEO, Mahesh Kumar, Acceldata CMO. Thank you so much for this great session on managing data to embrace an AI-first mindset for enterprises. And yeah, hopefully, we'll catch up with you both again soon.

Anu Jain: 17:32
Sounds good. Thank you.

Mahesh Kumar: 17:33
Thank you. It's been a lot of fun.

Jon Krohn: 17:34
All right, I hope you enjoyed today's conversation with Anu Jain and Mahesh Kumar on making enterprise data ready for AI. To be sure not to miss any of our exciting upcoming episodes, subscribe to this podcast if you haven't already. But most importantly, I just hope you'll keep on listening. Until next time, keep on rocking it out there. And I'm looking forward to enjoying another round of the Super Data Science podcast with you very soon.
