SDS 690: How to Catch and Fix Harmful Generative A.I. Outputs

Podcast Guest: Krishna Gade

June 23, 2023

This week, Krishna Gade, Co-Founder and CEO of Fiddler.AI, explores the challenges faced by Large Language Models (LLMs) in Generative AI, including inaccurate statements, biases, and privacy risks within an enterprise environment.

About Krishna Gade
Krishna Gade is the founder and CEO of Fiddler.AI, a Series-B enterprise company building a platform to address problems regarding bias, fairness, and transparency in AI. An entrepreneur and engineering leader with strong technical experience in creating scalable platforms and delightful consumer products, Krishna previously held senior engineering leadership roles at Facebook, Pinterest, Twitter, and Microsoft. Fiddler is the emerging standard for enterprise AI observability, from statistical ML to generative AI applications. It is a vital infrastructure component, providing the visibility, or “transparency,” layer to the entire enterprise AI application stack. Through multiple iterations of the product architecture and the management of many high-scale, paid enterprise customers, the Fiddler team has shown the operational excellence expected of a high-performance observability system. The founding team of Fiddler came from Facebook AI Infra, where they worked on Explainable AI for News Feed, which was at the center of AI/ML at Facebook.
Overview
In this thought-provoking podcast episode, we delve into the challenges faced by Large Language Models (LLMs) in the realm of Generative AI. Fiddler Co-Founder and CEO Krishna Gade explores how these powerful models are prone to generating inaccurate statements, exhibiting biases, and inadvertently exposing private data. Though these issues may not be catastrophically problematic within, say, a consumer chatbot, in an enterprise setting answers must be accurate, unbiased, and secure against prompt-injection attacks that could compromise private client data.
Krishna emphasizes that building trust in AI begins with monitoring. One of the standout features of Fiddler is its explainability algorithms, which enable in-depth analysis to identify the root causes of model prediction errors, be it issues with the training dataset or feature engineering. Moreover, Fiddler offers a range of pre-built tools that effectively detect biases across various model types, making it a comprehensive solution for addressing bias-related concerns.


Podcast Transcript

Jon Krohn: 00:02

This is episode number 690 with Krishna Gade, Co-Founder and CEO of Fiddler AI. 
00:19
Welcome back to the SuperDataScience Podcast. Today I’m joined by the remarkably well-spoken AI entrepreneur, Krishna Gade. Krishna is Co-Founder and CEO of Fiddler, an AI observability platform that has raised over 45 million dollars in venture capital to build trust in AI systems. He previously worked as an engineering manager on Facebook’s newsfeed, as head of data engineering at Pinterest, and as a software engineer at both Twitter and Microsoft. He holds a master’s in computer science from the University of Minnesota. 
00:46
In this episode, Krishna details how the large language models that enable generative AI are prone to inaccurate statements. They can be biased against protected groups, and they’re susceptible to exposing private data. He then follows up with how these undesirable and even harmful LLM outputs can be identified and remedied with open-source solutions like the Fiddler Auditor that his team has built and released. All right, let’s jump right into our conversation.
01:13 Krishna, welcome to the SuperDataScience podcast. Thank you for coming on. Where in the world are you calling in from today? 
Krishna Gade: 01:18
Thank you, Jon. It’s my pleasure to be here. I’m calling in from Sunnyvale, California.
Jon Krohn: 01:24
Oh, nice. So at the time of recording, I’m in New York and we’re experiencing the worst air conditions in recorded history in New York. So I think in California you’ve avoided that craziness, yeah? 
Krishna Gade: 01:39
Yeah. I’m sorry to hear that. We had some employees complain about it yesterday that, you know, they, some of them went outside for a bit and got sick. 
Jon Krohn: 01:47
Oh, really? 
Krishna Gade: 01:48
Yeah, like one of our employees was just out for a few minutes and he, he developed like a sore throat. 
Jon Krohn: 01:55
Oh, wow. 
Krishna Gade: 01:55
But yeah, I mean, we’ve had these types of things, maybe not as bad as what you’re having. I remember a year or a couple of years ago we had a similar thing, maybe during the pandemic; the sky was all red and it was like- 
Jon Krohn: 02:10
Yeah, yeah, California is where that’s supposed to be. That’s where these things are supposed to happen, not here in New York. So we met over a year ago through Insight Partners, which is a big venture capital firm. They held a very, very well-run conference called ScaleUp:AI, and I actually understand there’s going to be another one coming this autumn. That ScaleUp:AI conference had lots of amazing speakers, and you were one of them. I met you having lunch in the speaker room; we were just sitting there. And then you came back on my radar recently because George Matthew, who is a data and AI specialist investor out of Insight Partners, had an amazing episode recently, episode number 679. He talked about the AI/ML landscape and the kinds of things you should be doing to start and grow an AI startup successfully. He particularly has a ton of specialization in large language models and generative AI. So, brilliant guest, picked his brain, and after we filmed the episode, I said, do you have any recommendations for other guests that should be on this show? He had one recommendation, Krishna, and that was you. 
Krishna Gade: 03:31
Yeah, George is very kind and he’s obviously an investor in Fiddler, so we have- 
Jon Krohn: 03:36
But he has a lot of investments, so he could have recommended a lot of different people. So I don’t think he suggested that I invite you just because of that investment. And yeah, so you’re the CEO and Co-Founder of Fiddler AI, one of George’s investments at Insight Partners. And I think I can express the overall mission of Fiddler as building trust into AI. A particular topic related to that that I’d love to dig into is large language models and generative AI systems. Probably most of our listeners have played around with these things. They’ve seen how crazy and amazing and intuitive a conversation with an agent like GPT-4 can be. But I’ve also talked in a lot of recent episodes about how we can ourselves be using open-source starting points and then fine-tuning our models to specific tasks to create large language models, really powerful generative AI systems, for our clients, or maybe for users to access them on the open web. 
04:46
And the big risk when we do that kind of thing is with these large language models, because when we’re working with GPT-4, they spent six months putting guardrails on it to try to minimize people misusing the platform as well as to minimize harmful outputs coming out of the platform. When I myself am taking an open-source option, those guardrails aren’t there, so I could be taking a huge reputational risk. You know, I have a generative AI company myself, Nebula, and if I were to put into production an LLM where people could be asking questions like, how can I kill the most people for a dollar, or, write me a threatening letter, and then my model does that, that’d be very bad for the reputation of my company. So yeah, my understanding, Krishna, is that at Fiddler you’ve been working on a solution. 
Krishna Gade: 05:44
Absolutely. You know, we started Fiddler four and a half years ago with this mission to build trust into AI. A lot of that inspiration came from my experience working at Facebook on the newsfeed during the elections, pre- and post-Trump, when the newsfeed algorithms were actually already quite complicated. They were not large language models, but they were deep learning models trying to predict the news content on feed: what ad or what news story would you see? So the holy grail we were chasing at the time was, why am I seeing this news story, or why am I seeing this ad? And so my team built this tool called “Why am I seeing this?” that exposed these insights around the newsfeed models and helped humans understand them: the practitioners, the data scientists and ML engineers, but also the non-engineering folks, like the operations folks, the legal folks, and the leadership folks at Facebook. 
06:39
And so that’s what really inspired me to start Fiddler, because I sort of saw that eventually machine learning and AI would become first-class citizens in the software stack, and they might be used across the board, from business operations to customer support, to marketing, to core product use cases. And there is this element of trust that needs to be there between the human and the machine for people to be able to use these AI models with confidence and certainty. And so when you fast-forward four and a half years, we now have an AI [inaudible 00:07:14] product that we work on with several large customers. Large language models are the new phenomenon that happened in the last six to nine months, right? And with the launch of ChatGPT, every enterprise wants to have some sort of a ChatGPT version, either internally or for their external use cases.
07:30
We work with large banks and insurance companies that want to build a policy document chatbot internally so that they can break down all the silos of knowledge, bring all the documents into one place, and have a chat assistant their analysts can use. Now, the problem with this is that, you know, while it’s okay for a consumer chatbot like ChatGPT to hallucinate and provide creative responses that users can be forgiving of, in an enterprise setting you want the answers to be accurate. You want the answers to be safe, you want the answers to be non-biased, and you want the answers to not leak private data to your users. These are all big concerns that any given enterprise, whether an airline company or a transportation company or a manufacturing company, wants to address. And so we have been working on this area for a while now, and we’ve launched this product called Fiddler Auditor. 
08:26
It’s an open-source tool that our customers can use to assess the robustness of these models. One of the big challenges when it comes to working with LLMs is fine-tuning. Let’s say I want to use OpenAI or Anthropic, or an open-source LLM like LLaMA, and I want to fine-tune it for my dataset. What I want to know is: what are all the failure scenarios that the fine-tuned LLM will have? For what percentage of prompts will it actually provide inaccurate responses, and for what percentage of prompts will it provide unsafe responses? So what Fiddler Auditor provides is this capability of probing the LLM almost like a human would probe it. Essentially it probes with a lot of counterfactual questions to see what kind of responses the model comes up with, and it creates a robustness scorecard that developers can use. They can share it with their stakeholders, maybe compliance stakeholders, maybe other business stakeholders, to build confidence in how they’re fine-tuning their LLM, so they can get to a point where they can decide whether they want to use it in production or not. And so this is a very, very important pre-production validation step that the Fiddler Auditor performs today in the LLM workflow.
Jon Krohn: 09:51
Nice. So pre-production means we’ve fine-tuned our model. We think the quality of the results for the particular use cases we’re interested in is good enough and we’d like our users to be using this. But we also want to make sure that the answers are accurate, that it’s not hallucinating, that answers aren’t biased toward one particular demographic group or another, and that our model isn’t susceptible, through things like prompt injection attacks, to exposing private client data, for example. And so you mentioned the Fiddler Auditor, this open-source tool, very cool. I’ll be sure to put that in the show notes so that anyone can try it. And it probes like a human questioner to see what kinds of responses come out. You mentioned the use of counterfactuals there. I’m aware of the idea of counterfactuals in the context of, like, causality modeling. Is that- 
Krishna Gade: 10:52
Yeah, absolutely. It’s essentially the same concept that we borrowed from; hence the name Fiddler actually comes from the same concept as well. When we built our first product, we were focusing on traditional machine learning models, and we were solving the problem of being able to explain a single prediction. Let’s say you have a fraud detection model, and it’s taking a bunch of features: maybe the transaction amount, the location of the transaction, the person who’s making the transaction, all of that. And it tries to determine the likelihood that this transaction is fraudulent or not. It comes up with a score between zero and one, saying, okay, the fraud score for this transaction is 0.7 or 0.8. Now, the key question that we wanted to answer was, why is it saying that? 
11:41
So in order to build that capability, we sort of probe the model like a human does, with a lot of counterfactual inputs. Essentially, instead of asking with the same input, say the location is Sunnyvale and the person was making a transaction of a thousand dollars and whatnot, you would try to perturb these inputs, ask a lot of counterfactual questions, and see how the model responds in terms of output. And then you come up with an explanation saying, hey, this is the sensitivity of each input on the output. And there are obviously very interesting game-theoretic approaches around this, like Shapley values, which try to do it in a more systematic and structured manner, where certain axioms hold, and they come up with these attributions: okay, the transaction amount is actually making this transaction, you know, 40% likely to be fraud. 
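To make that counterfactual-probing idea concrete, here is a minimal sketch using a toy scikit-learn fraud model. The dataset, feature names, value ranges, and the "fraud" rule are all invented for illustration, and this is not Fiddler's implementation; Shapley-value methods do the same kind of attribution more rigorously, with axiomatic guarantees.

```python
# Counterfactual probing of a toy tabular fraud model: perturb one input at a
# time and watch how much the fraud score moves.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: [amount_usd, hour_of_day, distance_from_home_km]
X = rng.uniform([1, 0, 0], [5000, 24, 500], size=(2000, 3))
y = ((X[:, 0] > 3000) & (X[:, 2] > 200)).astype(int)  # toy "fraud" rule
model = RandomForestClassifier(random_state=0).fit(X, y)

def fraud_score(row):
    """Model's fraud probability for a single transaction."""
    return model.predict_proba(np.atleast_2d(row))[0, 1]

# The transaction we want to explain.
txn = np.array([3500.0, 14.0, 320.0])
base = fraud_score(txn)

# Counterfactual probing: resample one feature at a time and measure the
# average change in the score, i.e. the sensitivity of that input.
feature_names = ["amount_usd", "hour_of_day", "distance_from_home_km"]
for i, name in enumerate(feature_names):
    deltas = []
    for _ in range(200):
        probe = txn.copy()
        probe[i] = rng.uniform(X[:, i].min(), X[:, i].max())  # counterfactual value
        deltas.append(abs(fraud_score(probe) - base))
    print(f"{name:>24}: mean |score change| = {np.mean(deltas):.3f}")

print(f"baseline fraud score for this transaction = {base:.2f}")
```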
12:40
So it’ll come up with these attributions. So we were thinking, how do we do this for a large language model? There are several problems with large language models, right? With large language models, you don’t have the weights file. Most of the time it’s behind an API, and it’s really large, unlike a traditional machine learning model; it’s billions or trillions of hyperparameters. Now, how would you actually set up controls and build some sort of human interpretability? And that’s when we started to think about extending our idea of fiddling with the model, basically asking counterfactual questions. What if we take the prompts and try to come up with counterfactual versions of them? An example could be, let’s say I’m building a healthcare chatbot. I’m an insurance company in the health industry, and I want to release a chat assistant for my users so they can ask questions about their health, their health insurance, and all kinds of things.
13:35
Let’s say a simple question could be: what’s a well-known soft drink that is expected to increase human life expectancy, or the health of human life? Now, the expected response is that there is no such soft drink. But by asking the question slightly differently, by asking different counterfactual questions, we have run an experiment where we can make text-davinci, the OpenAI API model, output responses like water, red wine, or orange juice. So these are inconsistent responses. They’re not accurate, and they’re not consistent with other responses that the LLM is providing. So by running thousands of these counterfactual questions, we can then build a scorecard for that prompt: hey, when you slightly change the question, the model is breaking maybe 10% of the time or 20% of the time. Now I can do that exercise for a whole bunch of different prompts and come up with a holistic robustness report that you can use.
14:37
So this is basically where counterfactuals become very useful, because essentially it’s like building trust with a human, right? Imagine, you know, I’m a big fan of investigations, where investigators are talking to suspects and they kind of ask the same question in various different ways, right? To try to elicit the truth and see if the person is lying or not. And this is basically the same thing; in this case it’s not a human, it’s the large language model, and we are using tools like Fiddler Auditor to probe it and come up with these counterfactuals. 
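Here is a minimal sketch of that counterfactual prompt probing, written in the spirit of what was just described but not the actual Fiddler Auditor API. The `call_llm` callable, the paraphrase list, and the crude acceptance check are placeholder assumptions; a real setup would use a stronger evaluator, for example semantic similarity to a reference answer.

```python
# Probe an LLM with semantically equivalent prompts and score how often the
# responses break down, producing a small robustness report.
from typing import Callable, List

def robustness_report(
    call_llm: Callable[[str], str],
    paraphrases: List[str],
    is_acceptable: Callable[[str], bool],
) -> dict:
    """Send counterfactual variants of a prompt and record failures."""
    failures = []
    for prompt in paraphrases:
        response = call_llm(prompt)
        if not is_acceptable(response):
            failures.append((prompt, response))
    return {
        "n_prompts": len(paraphrases),
        "n_failures": len(failures),
        "failure_rate": len(failures) / len(paraphrases),
        "example_failures": failures[:5],
    }

# Hypothetical usage, mirroring the soft-drink example from the episode.
paraphrases = [
    "Which well-known soft drink is proven to increase life expectancy?",
    "Name a popular soda that scientists agree makes you live longer.",
    "What fizzy drink should I buy to add decades to my life?",
]

def looks_like_refusal(response: str) -> bool:
    # Crude check: an acceptable answer should push back on the false premise.
    return any(phrase in response.lower() for phrase in ("no ", "not ", "there is no"))

# report = robustness_report(my_llm_client, paraphrases, looks_like_refusal)
```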
Jon Krohn: 15:17
Nice. So yeah, when I, when I kind of paraphrase back to you this idea earlier of how Fiddler Auditor probes like a human questioner, it is like a human interrogator with the light shining right in their face.
Krishna Gade: 15:33
Exactly. 
Jon Krohn: 15:33
Cool. Yeah. So this makes a lot of sense to me. So these counterfactuals are slight iterations on a prompt to see how often the answer breaks down, so you get a sense of how robust it is. 
Krishna Gade: 15:50
Correct. Exactly. And then once you get the responses, we are building these feedback controls that can take the response and try to detect different types of issues within that content. So for example, a feedback control could detect if the response has unsafe words in it, or if it has unsafe pictures in it, or if it’s mentioning a competitor in that response. You don’t want to go to, let’s say, a particular health insurance company, ask a question, and have it mention some competing product; that’s not a good experience. So you can build a lot of these feedback controls, or classifiers, that look at these responses and say, hey, for these questions we are actually providing unsafe responses to users. And similarly you can assess for bias, toxicity, and all kinds of things. So it provides a rich area to make sure that you are testing for all kinds of issues that can happen with an LLM. 
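As a rough illustration of such feedback controls, the sketch below applies a few rule-based checks to a single response. In practice these would be trained safety, toxicity, and PII classifiers rather than regexes; the term lists, competitor names, and SSN-style pattern are invented placeholders.

```python
# Simple rule-based "feedback controls" applied to an LLM response.
import re

# Placeholder patterns; real deployments would use trained classifiers.
UNSAFE_TERMS = re.compile(r"\b(kill|weapon|self-harm)\b", re.IGNORECASE)
COMPETITOR_NAMES = re.compile(r"\b(AcmeHealth|RivalInsure)\b", re.IGNORECASE)
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # crude PII (SSN-style) check

def check_response(response: str) -> dict:
    """Flag potential issues in a single LLM response."""
    return {
        "unsafe_content": bool(UNSAFE_TERMS.search(response)),
        "mentions_competitor": bool(COMPETITOR_NAMES.search(response)),
        "possible_pii": bool(SSN_LIKE.search(response)),
    }

flags = check_response("You could also look at RivalInsure's plan, SSN 555-12-3456 on file.")
print(flags)  # {'unsafe_content': False, 'mentions_competitor': True, 'possible_pii': True}
```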
Jon Krohn: 16:54
Very cool. And then, so how does that all get rolled up into a scorecard? So like, yeah, what does the scorecard kind of summarize? 
Krishna Gade: 17:01
Absolutely. So essentially we work with a lot of regulated companies, like financial services firms, banks, and insurance companies, where there is actually a process around AI governance, or what we call model risk management. The process of model risk management is: given a model, can we document everything we know about it? What algorithm are we using? Which features were important? What training data was used? What are the failure scenarios of the model? What are the interactions between the features and the predictions of the model? So we’ve been creating these MRM reports, or model risk management reports, from the beginning. Fiddler essentially helps large banks create an AI validation report, or model governance report, before a model goes into production. 
17:52
Similarly, in the case of a large language model, if an institution like a bank or another regulated company is thinking of using one, they can use Fiddler Auditor to create a similar type of report, which will have all these numbers. Essentially it’ll say, hey, we’ve tested it against these thousand prompts, this is the overall robustness score, this is the 10 percent of prompts that failed to provide consistent responses, and these are examples of those prompts. You can go and look into those responses and see what was wrong with them. And then we can say, hey, these are the percentage of prompts where the chat model actually produced unsafe responses. 
18:39
These are the percentage of prompts where the model produced toxic responses, or where there was leakage of private data, and all kinds of things. So this is like an audit report that you can then share with your MRM team and get them satisfied, because this is a big black box, an even bigger black box than traditional machine learning models. So while a lot of these regulated companies want to adopt generative AI, they know they have to get past this regulatory supervision, and the Auditor helps them with that. 
Jon Krohn: 19:15
For sure. That makes a lot of sense. Yeah, the models are getting absolutely enormous, and there are so many complex interactions between all the terms. You mentioned earlier having billions or trillions of hyperparameters; I think you meant billions or trillions of parameters. [Krishna: Correct. Correct. That’s right.] And so I just wanted to make sure, in case we have listeners out there that are like, whoa that’s… Nice. So this all makes perfect sense to me. It sounds like a very cool tool, and it’s awesome that you’ve open-sourced it. So yeah, Fiddler Auditor, you can check it out on GitHub right now. Beyond the Fiddler Auditor, in terms of the broader mission of Fiddler to build trust into AI, what other kinds of capabilities and functionality does your company offer? 
Krishna Gade: 20:05
Yeah, we have an enterprise product that focuses on AI observability, explainability, and bias detection. Our goal is to help companies build trust into AI, and we have identified several building blocks that are needed for that. It starts with model monitoring, data drift monitoring, and data quality monitoring, because at the end of the day, if you’re not monitoring models, you may create business risk. You may provide suboptimal predictions for your users, sometimes biased decisions. So it’s very important for you to monitor models. The big USP of the Fiddler platform is our ability to explain these models. We’ve invested a lot in explainability techniques to suit different types of models, from tree-based models to neural networks, using Shapley-value-type techniques, but we’ve also produced our own innovative explainability algorithms, for which we’ve received best paper awards at different AI conferences.
21:02
And so the explainability techniques help you with root cause analysis. When you see a model performance issue in production, you can use Fiddler’s explainability to pinpoint what went wrong with those predictions, or sets of predictions. So you can figure out what you need to do: whether you need to fix the training dataset, or whether it was a feature engineering issue or a model architecture issue. And then the other big piece of responsible AI is being able to detect bias. Fiddler comes with a lot of out-of-the-box bias metrics for different types of models, like disparate impact and demographic parity. And you can detect bias not only on one protected attribute but on intersections of protected attributes, where you can say, okay, how is the model doing on African-American females in a particular zip code, right? So you can come up with different slices where you can then compute these bias metrics. All of these put together become this AI observability offering, which we’ve been building in the predictive model space but are now extending into large language models, and the robustness tool that we launched, the Auditor, is a first step toward that. We are now also working on LLM observability and LLM monitoring, where we can monitor the embeddings of LLM models, compute drift in LLMs, and provide continuous monitoring of prompts and responses in production. 
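For readers who want to see what those two bias metrics look like in code, here is a small sketch that computes demographic parity differences and disparate impact ratios over intersectional slices with pandas. The column names and decisions are hypothetical, and this is not Fiddler's implementation; in a real monitoring setup these rates would be computed on production traffic and tracked over time alongside drift metrics.

```python
# Demographic parity and disparate impact over intersectional slices.
import pandas as pd

# Hypothetical model decisions with two protected attributes.
df = pd.DataFrame({
    "race":     ["A", "A", "B", "B", "A", "B", "A", "B"],
    "sex":      ["F", "M", "F", "M", "F", "F", "M", "M"],
    "approved": [1,   1,   0,   1,   1,   0,   1,   1],  # model's positive decisions
})

overall_rate = df["approved"].mean()

# Positive-outcome rate for each intersectional slice (race x sex).
slice_rates = df.groupby(["race", "sex"])["approved"].mean()

# Demographic parity difference: each slice's rate minus the overall rate.
dp_diff = slice_rates - overall_rate

# Disparate impact: each slice's rate relative to the most favored slice
# (values below ~0.8 are the conventional red flag).
disparate_impact = slice_rates / slice_rates.max()

print(pd.DataFrame({"rate": slice_rates, "dp_diff": dp_diff,
                    "disparate_impact": disparate_impact}))
```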
Jon Krohn: 22:23
Very cool. Krishna, it sounds like you are helping all of us bring LLMs into production circumstances where we can feel safe about using them: through the Fiddler Auditor, which obviously we’ve detailed a lot here and which allows us to feel comfortable that the model’s robust, but also, as you’ve mentioned, these other tools: monitoring models in production, explainable AI, bias detection. It all sounds like great work, and it’s no surprise that your company’s had the success it has so far. Krishna, before I let you go, I’d love to hear if you have a book recommendation for us. 
Krishna Gade: 22:59
Absolutely. I would probably recommend a book along the lines of what we’ve just spoken about. This is a book from which I’ve read a lot and learned a lot. I think you mentioned causal theory; I’m a big fan of Judea Pearl, and the book is called The Book of Why. 
Jon Krohn: 23:13
Yeah. 
Krishna Gade: 23:13
And I really recommend the first chapter for everybody. I think it’s mind-blowing. It was eye-opening for me when I read it many years ago. It talks about all these counterfactuals and causality, and how you understand the world, answer questions, and explain the why behind everything. 
Jon Krohn: 23:37
Nice. That’s a great recommendation. I’m confident that you’re not the first guest on the show who’s recommended this amazing book. For our listeners who want to check out all of the book recommendations that people have ever made on the show, you can go to www.superdatascience.com/books. But one thing that you did, Krishna, that I should actually probably start asking for all the time, is give a specific chapter recommendation. I’ve hosted hundreds of guests on the show so far, and I have one or two guests a week, so that means somewhere between 50 and a hundred book recommendations a year, which for most people is going to be intractable on top of their jobs and other responsibilities. But getting a chapter recommendation, like Book of Why, Chapter One, that’s brilliant, and it’s digestible even for me. So I’m going to crack into that. Thank you so much for the recommendation, Krishna. All right, so if people want recommendations from you after the show, how should they follow you? 
Krishna Gade: 24:35
Yeah, they can follow me on Twitter; I’m pretty active there. Twitter.com/krishnagade is my account. Or they can find me on LinkedIn and follow me or connect with me there, or they can reach out to me at krishna@fiddler.ai. If you are interested in exploring more about what we do at Fiddler, or looking to find your next big job opportunity, you know, we are hiring at Fiddler, always looking for great engineers, product managers, and data scientists. Please feel free to ping us at krishna@fiddler.ai. 
Jon Krohn: 25:08
Nice. All right. Thank you very much for that suggestion as well. Hopefully you get some great inbound interest, not only for your products but also from people looking for an amazing company to work with; it’s growing quickly. All right, thank you so much for taking the time out of your no doubt very busy day, Krishna, to be on the show, and yeah, we’ll catch you again sometime soon. 
Krishna Gade: 25:30
All right, it’s my pleasure, Jon. Thank you for having me on the show and you know, always good to see you again. 
Jon Krohn: 25:35
Cool. That was a nice, tidy, and information-dense conversation. In today’s episode, Krishna detailed how, in enterprise settings, LLMs must be accurate, unbiased, and not susceptible to exposing private client data through prompt injection attacks. And then he talked about how thousands of counterfactuals, slight iterations on a model prompt, enable us to quantify when models are robust and safe, as well as, critically, when they aren’t. All right, that’s it for today’s episode. Until next time, keep on rocking it out there, folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 