85 minutes
SDS 843: Safe, Fast and Efficient AI, with Protopia’s Dr. Eiman Ebrahimi
Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn
What’s holding your AI projects back from success? Dr. Eiman Ebrahimi, CEO of Protopia AI and former NVIDIA scientist, takes us on a fascinating journey through the challenges of AI data security and enterprise scalability. Learn how to escape "proof of concept purgatory," unlock profitable AI solutions, and tackle the trade-offs between cost, speed, and security. Plus, discover how the philosophy of Alan Watts can inspire innovation and drive meaningful change in the world of AI.
Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.
About Eiman
Eiman Ebrahimi is the CEO and co-founder of Protopia AI, a company dedicated to advancing privacy-preserving AI solutions. With a passion for enabling AI to enhance lives while protecting sensitive data, he leads innovation in secure AI adoption across industries. Before Protopia AI, Eiman spent nine years as a Senior Research Scientist at NVIDIA, tackling challenges in accessing massive datasets and optimizing GPU systems for large-scale AI like training LLMs. He holds a Ph.D. in Electrical and Computer Engineering from UT Austin, during which his groundbreaking research on multi-core systems won the best paper award at ASPLOS.
Overview
Dr. Eiman Ebrahimi, CEO of Protopia AI, introduces the groundbreaking "stained glass transform" technology that is redefining AI data security. This innovation enables machine learning models to utilize secure, transformed data that remains functional while safeguarding privacy, addressing a critical challenge for enterprises working with sensitive information.
Eiman explores the complex balance between cost, speed, and security in enterprise AI systems. He provides a detailed analysis of the trade-offs between private infrastructure, which ensures security but comes with significant expense, and shared infrastructure, which is cost-effective but introduces vulnerabilities. Drawing from his expertise at NVIDIA and Protopia AI, he presents practical strategies for building scalable, secure, and profitable AI systems.
He examines the barriers that prevent many AI projects from moving beyond "proof of concept purgatory" and offers a structured approach to overcoming these obstacles. By identifying key challenges and offering actionable solutions, Eiman provides a roadmap for turning AI prototypes into production-ready applications.
Adding depth to the discussion, Eiman reflects on how Alan Watts’ philosophy can inspire innovation. He draws connections between entrepreneurial success and the principles of adaptability and creativity, highlighting their importance in navigating the rapidly changing landscape of AI development.
In this episode you will learn:
- (02:53) Protopia’s role in AI data security and privacy
- (11:45) The functionality behind Stained Glass Transform
- (22:20) Eiman’s journey from NVIDIA to founding Protopia
- (25:37) Challenges enterprises face with ROI on AI projects
- (36:40) Multi-tenancy in AI systems
- (55:37) Stained Glass Transform’s privacy-preserving capabilities
- (01:09:31) Emerging trends in AI
- (01:14:55) Alan Watts’ philosophies and their link to entrepreneurship
Items mentioned in this podcast:
- Protopia AI
- NVIDIA
- SDS 781: Ensuring Successful Enterprise AI Deployments, with Sol Rashidi
- Stained Glass Transform
- "The Executive's Guide to Secure Data and Impactful AI" article
- "Securely Build Open LLMs with Protopia AI Stained Glass Transform" article
- SiP-ML research article
- Planaria: Spatial Multi-Tenancy research article
- The Wisdom of Insecurity by Alan Watts
- Out of Your Mind Lectures by Alan Watts
- Dream Speech by Alan Watts
- God Bless You, Mr. Rosewater by Kurt Vonnegut
- Mother Night by Kurt Vonnegut
- SDS 700: “The Dream of Life” by Alan Watts
- SuperDataScience
- Jon Krohn's Mathematical Foundations of Machine Learning Course
- The Super Data Science Podcast Team
Follow Eiman:
Podcast Transcript
Jon Krohn: 00:00:00
This is episode number 843 with Dr. Eiman Ebrahimi, CEO of Protopia.
00:00:11
Welcome to the Super Data Science Podcast, the most listened to podcast in the data science industry. Each week we bring you fun and inspiring people and ideas, exploring the cutting edge of machine learning, AI and related technologies that are transforming our world for the better. I'm your host, Jon Krohn. Thanks for joining me today. And now let's make the complex simple.
00:00:45
Welcome back to the Super Data Science Podcast. I am fortunate indeed today to be joined by the extremely intelligent and well-spoken Dr. Eiman Ebrahimi. Eiman is CEO of Protopia AI, a venture capital-backed startup based in Austin, Texas, that converts sensitive data into a special stochastic format that improves AI model accuracy, protects privacy, and reduces compute costs. Prior to founding Protopia, Eiman spent a decade at NVIDIA as a senior research scientist and computer architect. He holds a PhD in computer engineering from the University of Texas at Austin.
00:01:19
Today's episode is relatively technical, so it might appeal most to technical listeners, but Eiman is such a terrific communicator that anyone interested in AI might end up absolutely loving it. In today's episode, Eiman details how he went from optimizing GPU performance at NVIDIA to revolutionizing AI data security, why most promising AI projects get stuck in what he calls proof of concept purgatory, and how to escape it. He provides gripping, deep detail on the real-world trade-offs between the cost, speed and security of running AI models in production. He talks about how to make your enterprise AI products profitable, why having your own private server doesn't make your AI system as secure as you think, and, my favorite, what Alan Watts' philosophy teaches us about entrepreneurship and innovation. All right, you ready for this exceptional episode? Let's go. Eiman, welcome to the Super Data Science podcast. We're filming live from New York. Thank you for joining me here.
Eiman Ebrahimi: 00:02:17
Thanks for having me.
Jon Krohn: 00:02:19
We're in a beautiful studio at NeueHouse, and they have amazing cameras. They've got outstanding audio. So delighted that you came here to join us. And our audience, I'm sure, will enjoy that as well.
Eiman Ebrahimi: 00:02:31
Happy to be here.
Jon Krohn: 00:02:32
So we know each other through Sol Rashidi, who was my guest in episode number 781. It came out in May and it was one of the most popular episodes of this spring. It was all about ensuring successful enterprise AI deployments, and as part of that conversation she mentioned Protopia by name.
Eiman Ebrahimi: 00:02:51
Very cool.
Jon Krohn: 00:02:53
And so, now we get to dig into why Protopia is so essential. The idea is that when you are using third-party LLMs, there's a huge amount of power; you want to take advantage of the absolute state of the art in a lot of circumstances, or maybe a compute-efficient option, but you might want to be able to switch between vendors, and you and your clients want to be assured that your data are secure. And with solutions like Protopia, that's possible. So we're going to spend the whole episode talking about this, basically. And it's going to be fascinating, because you can get really into the technical nitty-gritty.
Eiman Ebrahimi: 00:03:35
Looking forward to doing that. And I think one thing to quickly broaden the scope of what we're doing at Protopia is that yes, it is accurate based on what you just described, that it's applicable to large language models and helping this idea of protecting data when the models are being run on third-party systems. But it's not necessarily just about third-party systems. It's generally focused on whatever system is the most efficient for either a large language model or any other machine learning model for that matter. How can you minimize the exposure of data while you're using those models? So just zooming out just a little bit from only LLMs or third-party, the topic of data exposure in machine learning has always been a challenge and would love to get into that.
Jon Krohn: 00:04:36
Fantastic. We are going to, and so it might sound like you are one of the machine learning engineers or developers or maybe CTO of the company, but in fact you're the CEO of the company.
Eiman Ebrahimi: 00:04:45
That's right.
Jon Krohn: 00:04:46
You are just a highly technical CEO founder, which I understand a lot of investors love, and our audience certainly loves, for sure. So yeah, so Protopia, thank you for broadening the scope there, in general is a leader in data protection and privacy-preserving machine learning. Let's talk about the name first, Protopia. This is a word that I only recently learned, in the last few months, because I'm developing a TV show and one of our working titles is Cracking Utopia.
Eiman Ebrahimi: 00:05:12
Oh wow. Okay.
Jon Krohn: 00:05:12
And I quickly discovered that there's a number of interesting things here, which is that the term utopia means "not place." It's Greek for "not place." And so it's designed to describe an impossible place.
Eiman Ebrahimi: 00:05:25
That's correct.
Jon Krohn: 00:05:27
Whereas a Protopia... So this is kind of like maybe episode one of this Cracking Utopia series, where we'd get into: okay, we call this TV show Utopia because it's a word people know, a paradise on earth maybe, but it's also by definition unattainable. Whereas Protopia is something that we can aspire toward.
Eiman Ebrahimi: 00:05:47
That's right. And I love that you bring this up as the first thing. Because as we go through the different aspects of what it is that we're doing at Protopia, hopefully it'll become more and more clear how that name fits. Because this whole topic of how do we protect data on systems, if we were to consider some utopian version of that, there would be zero, absolute zero, not close to zero, absolute zero exposure at any time. But that being almost impossible, if not impossible, to achieve, Protopia would be: how do you get as close as possible to that? How do you make it such that you are minimizing attack surfaces around the data as much as possible, without bogging the whole system down with the kinds of impossible requirements that would make it inapplicable?
Jon Krohn: 00:06:48
Bogged-down data transfers are the dystopian nightmare everyone fears. Countless films have been created about it. No, it's true. I mean, it would be a nightmare. When we're using any digital tool, we expect everything from our devices immediately. And that's now interesting in, and you already broadened nicely there, the idea of Protopia supporting any kind of machine learning situation. But LLMs, the initial case that I spoke about, are often highly compute-intensive. You have to wait around, and so performance is essential, and you wouldn't want a bottleneck around your data security.
Eiman Ebrahimi: 00:07:32
Yeah, I think there's this kind of expectation that anyone working in the space of data protection, data security needs to set around what different axes of freedom they have to play around with. And one of those that is non-negotiable in the world of machine learning as we see it today, let's call it the world of LLMs and GenAI as the current focus of this space, is latency: how quickly you are able to interact, especially when you're focused on inference. That becomes something that's non-negotiable. You can't really go about building solutions that are heavily focused on protecting data, especially for inference, if you are making big impacts on the latency of the system, because then it just starts becoming prohibitive to the original point. And that's one of the tenets of what we build our product and solution around: focusing on the system-level requirements that are pretty common among every use case that anyone imagines for these sorts of workloads. So yeah, that's definitely one of the major focuses.
Jon Krohn: 00:08:55
Nice. Makes a huge amount of sense. I also want to highlight quickly that we didn't really define Protopia. We said that it's something that's attainable. And so my understanding of Protopia is this idea of iterating towards improvement. So you kind of alluded to what Protopia means by saying utopia would be a situation where there's absolutely no data security risk and no latency, but that's probably not ever going to be attainable. A Protopia, however, is this idea of continuous iteration and improvement. And so in the context of being able to access machine learning securely and efficiently, that's, I guess, the focus of Protopia now. But as a broader term, it means human quality of life on the planet: improving life spans, extending healthier quality of life into later years, a sense of fulfillment, community, all the kinds of things that we dream of for ourselves and the people that come after us.
Eiman Ebrahimi: 00:09:54
Yeah, and if we were to create some mapping of that definition, which would be Protopia on earth, into Protopia for AI and the data space especially, I think that iteration also has this aspect to it of... Again, one of the ways that we look at this space is that there already exists a lot of effort and technology that goes into securing systems. And data security partially comes from the systems being secure. So the technologies that we build at Protopia are focused on also being complementary to this broader set of ecosystem technologies that are being built.
00:10:40
And it's only with these different layers that you're able to achieve better and better protection of the data, by way of, again, narrowing the attack surface at various different levels. Now in particular, the part of the problem that we have been focused on thus far has been addressing the attack surface at the data layer, by what we think of as a proactive approach to data security as opposed to a reactive approach. Instead of just relying on the system not being compromised, we ask the question: if the system were compromised, and that's more or less always something that can happen, then how do you make it such that, proactively, whatever is on that system is of minimal use to somebody that may come across it, a bad actor, et cetera?
Jon Krohn: 00:11:37
Yeah, and that actually was the exact context that Sol brought up Protopia in episode 781, where she talked about how Protopia, I'm assuming this is the Stained Glass Transform solution.
Eiman Ebrahimi: 00:11:49
Correct.
Jon Krohn: 00:11:50
That it transforms raw data into a format that even if somehow somebody gets their hands on that, it's not going to be very meaningful to them.
Eiman Ebrahimi: 00:12:02
Correct. Correct. The idea is one of utilizing essentially the fundamentals of machine learning models, and the fact that machine learning models often live in fairly large representational spaces. And so what we are taking advantage of is the ability to move around in that representational space in a manner where the underlying model that's dealing with that data is still able to make sense of the data. But given that there are essentially manifolds of representations that carry the same meaning to the model, if you were able to know what those manifolds looked like and started picking representations at random at runtime, now you are creating extremely difficult-to-follow moving targets for anyone seeing those representations. And that's part of what Stained Glass is doing.
Jon Krohn: 00:13:03
What does that mean, manifold? And maybe you can give me a bit of an understanding of that. When I think about, okay, my raw data, let's say I am querying a large language model, so I'm saying give me some ideas for how I can increase my profit margins given the situation, and I upload some proprietary spreadsheet from my company, I send it off to OpenAI or Anthropic or Cohere to analyze the data, give me some kind of response. But on the way there I leveraged Protopia, specifically the Stained Glass Transform solution. How do my data change from just being a character string to-
Eiman Ebrahimi: 00:13:45
So if we just take the example of the question and the document that has sensitive information in it there, right? Today, when that data needs to be sent to a platform running a language model, and you mentioned three proprietary models, let's again broaden that and consider open models as well. Because open models often are thought of as, oh, if I have an open model, then because I have control over the model, I'm somehow safer.
00:14:21
But ultimately that open model still needs to run somewhere. And what we're thinking about and focused on is the platform that that model runs on. It is less about whether or not the user, the end enterprise, trusts the model provider. In fact, we assume that they do. It's not the case that the enterprise is saying the model provider is malicious, not at all. They do trust their vendor, but ultimately those models are running somewhere. That somewhere is a compute platform that, like any other compute platform, can be compromised. It can be anything as simple as human error. It can be someone not setting up two-factor authentication, very simple mistakes that have big consequences.
Jon Krohn: 00:15:06
Phishing emails.
Eiman Ebrahimi: 00:15:07
Right. All these things that pop up every now and then, and there's some compromise that's happened, right? Now the documents that you mentioned are being sent. Today, those documents end up appearing on those target platforms in plain text. And that plain text turns into embedding vectors with a one-to-one relationship. There's an embedding function that goes from the tokens to the embedding vectors, and then the rest of the model beyond the embedding function computes some adds, right?
00:15:40
That second part of the model, after the embedding layers, lives in a much broader and larger representational space than just what the range of the embedding function covers. So imagine a very large representational space of all kinds of embedding vectors, and there's some subset of it that is the range of the embedding function. What we are doing with our core product, which is called Stained Glass Engine, is that in a post-training step, once the large language model or whatever other machine learning model has been trained, we enable the identification of these, think of them as probability distributions or mathematical functions, that map any given embedding in the range of the embedding function to many other possibilities in that larger distributional space.
00:16:41
And once you have that as a function, that set of functions that defines those relationships becomes what we call Stained Glass Transform. Now, Stained Glass Transform is in effect itself a very small machine learning model. It's a series of layers, but there's something very special about it: it's not a set of weights, it's a set of probability distributions. And so at runtime, when you're running Stained Glass Transform, you're not rewriting a data record from what it would have been as a deterministic embedding to another deterministic embedding. You are taking a sample of that manifold that I was talking about, where you have one deterministic embedding becoming one sample of many different possibilities, which to the target model, and this is where it's important, to the model that it was intended for, will mean the same thing.
00:17:35
But otherwise it's a sample in that much larger representational space and there is no one-to-one mapping with the original tokens that were being transferred. Now in the case of the examples that you were giving, you're talking about proprietary models. In those scenarios, the model provider would need to be providing the Stained Glass Transform for their model. In the case of open source models, the infrastructure provider, Protopia ourselves, or the customer can use Stained Glass engine to create the transformation themselves.
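To make the sampling idea concrete, here is a minimal, purely illustrative sketch in PyTorch. This is not Protopia's Stained Glass implementation; it simply assumes a learned Gaussian over each embedding and shows the core idea of emitting a fresh sample, rather than a deterministic vector, every time data leaves the trust boundary. All names and shapes are hypothetical.

```python
import torch
import torch.nn as nn

class ToyStochasticTransform(nn.Module):
    """Toy stand-in for a stochastic embedding transform.

    Rather than passing a deterministic embedding downstream, it learns
    per-dimension Gaussian parameters (mu(e), sigma(e)) and emits a fresh
    sample on every call, so the same input yields a different vector
    each time it is transmitted.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.mu = nn.Linear(dim, dim)         # learned shift into the wider representational space
        self.log_sigma = nn.Linear(dim, dim)  # learned per-dimension noise scale

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        mu = self.mu(emb)
        sigma = torch.exp(self.log_sigma(emb))
        # One random draw from the learned distribution: no one-to-one
        # mapping back to the original embedding.
        return mu + sigma * torch.randn_like(emb)

# Same deterministic embedding in, two different representations out.
transform = ToyStochasticTransform(dim=8)
e = torch.randn(1, 8)  # stand-in for a token embedding
print(transform(e))    # sample 1 -- what would leave the trust boundary
print(transform(e))    # sample 2 -- differs on every call
```

In a real system, the distribution parameters would be learned against the target model in that post-training step, so that all samples still "mean the same thing" to that model; here they are random, which is enough to show the moving-target behavior.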
Jon Krohn: 00:18:06
Very cool. I do understand what you're describing now. And for our listeners who aren't already familiar with the idea of vector embeddings, this is absolutely standard. In fact, I'm not aware of a large language model that could work on natural language data, which is inherently what a large language model works on, without converting the language. There were a few words there. So you have a string of characters that you type in; that is your query. That gets converted into a word that you used many times there: tokens. And a token, for the most part, is kind of like a word or a part of a word. So a longer word would typically get broken up into these sub-word components.
00:18:52
And so you end up, if you pass in a million words into a large language model, a kind of rule of thumb would be that could end up about 700,000 sub-word tokens. And so those sub-word tokens then get converted into this embedding space that you were just describing, which is, it's a sequence of numbers you could think of like coordinates on a map. A map is two-dimensional, a flat map. And so you have latitude and longitude. It's the same idea, except you might have a thousand or 2000 dimensions-
Eiman Ebrahimi: 00:19:32
High dimensional spaces.
Jon Krohn: 00:19:34
... high-dimensional spaces. And it's that high dimensionality that allows a large language model to have so much nuance. So if you think about a map: you move latitude a little, longitude a little, that changes where you are on the surface of the earth, and you could go gradually from a valley that's warm and growing fruit to a mountain peak by changing the latitude. So as you move in one direction, what that describes changes. And in the same way, if you have a thousand-dimensional space, each of those thousand directions can relate to changes in the real-world meaning of the language that's being represented.
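To see the tokens-and-coordinates idea concretely, here is a short sketch using the open-source tiktoken tokenizer and a randomly initialized embedding table. The tokenizer choice and the 1,024-dimension figure are arbitrary assumptions for illustration, not the internals of any particular model.

```python
import tiktoken
import torch

# One open-source tokenizer; each model family uses its own.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Give me some ideas to increase our profit margins")
print(tokens)  # a handful of sub-word token IDs

# A toy embedding table: each token ID maps one-to-one to a vector,
# here 1,024-dimensional "coordinates" rather than a 2-D map position.
embedding = torch.nn.Embedding(num_embeddings=enc.n_vocab, embedding_dim=1024)
vectors = embedding(torch.tensor(tokens))
print(vectors.shape)  # (number_of_tokens, 1024)
```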
Eiman Ebrahimi: 00:20:14
That's right.
Jon Krohn: 00:20:15
And so that's why, when you describe the Stained Glass Transform working with those embeddings, it doesn't correspond back to some specific sequence of words, a proprietary sequence of words. It's kind of just the general meaning of those words.
Eiman Ebrahimi: 00:20:37
It's the general meaning of those words as that particular target model understands it. Right?
Jon Krohn: 00:20:42
Right.
Eiman Ebrahimi: 00:20:43
Which is where, because we're creating that disconnect to what the model understands, combining that notion with the sampling of the embeddings at runtime, happening dynamically, is what makes it extremely difficult for these embeddings to just reflect what the original plain-text information was. And we're creating, in effect, a decoupling of the ownership of the plain-text information to begin with, where some data owner in the enterprise, as a user of this sort of technology, would need to get comfortable today with: okay, it's okay to use that target platform, independent of whether the model's open or closed. It's okay to use that platform.
00:21:34
Today, the decision is made based off of plain-text exposure. But now the data owner can understand that the plain text is actually never leaving whatever trust boundary I create. And beyond that trust boundary, if I transform the data, what I'm exposing outside of there is no longer plain text. So now, the idea becomes: what platforms, what target compute infrastructures are you able to use that you would normally not have been using? And a lot of use cases, we find, end up not seeing the light of day exactly because of this problem.
Jon Krohn: 00:22:11
Extremely well said. You're a great explainer of complex information. It's a delight to have you explaining these things.
Eiman Ebrahimi: 00:22:17
Appreciate it.
Jon Krohn: 00:22:18
Yeah, thank you. So let's dig into how you ended up building these solutions. Because you and I were chatting beforehand and it sounds like data security wasn't always the most exciting topic to you.
Eiman Ebrahimi: 00:22:33
I think data security is one of those topics that, for a lot of folks that, like myself, come from high-performance backgrounds... So my work prior to Protopia was very heavily focused on improving the performance of systems.
Jon Krohn: 00:22:54
You were at NVIDIA, right?
Eiman Ebrahimi: 00:22:55
Yeah, I was at NVIDIA for close to 10 years. And at least half of that time, if not the majority of it, whether in product or in research, was focused on improving the performance of GPU systems, with a variety of different target application spaces in mind, but almost exclusively: how do we make these systems faster, stronger? And it was during the last few years of my time working on these problems... A lot of what we were doing in the world of making GPU systems faster and stronger, whether it be improving inter-GPU communication speeds, or coming up with new ways to describe locality through the programming language so that the underlying system, whether within the caches of the microarchitecture or accessing memory, would do a better job at keeping data that is used in close succession close physically in the memory system, was all focused on performance.
00:24:12
Sometimes power would become a big component as well. But there was a time there where we started... When I say we, it's not just me, many other folks in the high-performance computing space started realizing that from a systems perspective there were other problems that really needed addressing, to the point where some of these problems were large enough that if they weren't solved in some seamless manner, some of this innovation happening in making systems faster, stronger, bigger actually wouldn't get used. Because if, again, the investment that users, enterprises, need to make in those systems is so high that the return on that investment is behind some choke point, then without solving the choke point, the investment's not going to return. And so this whole innovation cycle of making things faster, bigger, stronger is not going to go anywhere. And that's what made me interested in data security: it started becoming something that would bubble up when I would look at what we were building in those various system architecture efforts.
Jon Krohn: 00:25:37
As a tie back to Sol Rashidi's episode again, 781: in that episode, a lot of what she was talking about was getting a return on investment on an AI project-
Eiman Ebrahimi: 00:25:48
Yeah, it's huge.
Jon Krohn: 00:25:49
I need to have a commercially successful project. And what you are saying is that at NVIDIA you were working on making these systems highly performant, so latency reductions, you mentioned power reductions. And so you played a part in developing the hardware that allows us to have these magical-feeling AI capabilities that we've had in the world for the last 18 months now, and that are getting wilder and wilder by the quarter. So awesome, thank you for that. But what you're saying is that you discovered, maybe unexpectedly to you, that the uptake by enterprises of some of these systems was slower than anticipated, because people were concerned, okay, this is highly performant but not necessarily secure.
Eiman Ebrahimi: 00:26:44
Yeah, I think the observation is, when you think of work that goes on in research, typically the mandate for research is to look 5 to 10 years out. So if you look at any of the things that we're benefiting from right now, whether it's the systems that have been built that enable the training of these magical models, or even the software architecture of the models themselves, if you pull on the string of when the research was done on it, the initial research will have been much longer ago, but the main parts of it start coming together somewhere in that 5-to-10-years-ahead-of-time period.
00:27:32
That also creates the possibility of seeing other problems. As you solve some of the problems, now you see other problems that are also 5 to 10 years out. But some of them start looking really serious in that it looks like if you don't solve that problem, much of everything else is not actually going to come to fruition. And so what you just said of us having seen already that enterprises were stuck behind this problem, that hadn't happened. But we could see that it was probably going to happen.
00:28:12
I think what we're seeing in 2023, 2024 is the coming to life of that problem. Something really exciting happened in the industry: LLMs started becoming something people could use, and the value of it at a POC level, 100%, everybody started seeing, right? During 2022, a lot of tinkering around happened. In 2023, people got more serious, started actually spending some money, defining some real, real use cases that, and this is the operative word, could create a lot of value for the enterprise. Could. But then, as soon as those high-value use cases were identified, if you look at what data goes into those use cases, creating the value, those data records would need to be pulled from here and there in the enterprise. Let's assume they're ready to do that, so they've done some data cleanup. When you look at what tiers of data that is, it's generally not in the non-sensitive tiers of data.
00:29:19
If you think of all the enterprise data that exists out there, and let's just call it broad brush three tiers. There's the non-sensitive tier, and then there's some restricted set of tiers. Typically, there's three to five in an organization, then you've got some top secret information, right? Let's forget about the non-sensitive and the top secret. Everything in between those three to five tiers of data, that's where most of the really interesting use cases that people talk about tend to be.
00:29:49
But now, after those use cases have been identified in '23 and through '24, at least what we see is that a lot of those use cases will just sit around. People will try to take a couple more steps of proving that the use case is actually going to create value. But a question needs to be asked of where you are going to run these models, at scale, in a performant manner, in order to deliver the value that the boards of the organizations are asking for. And that's where things start becoming a little bit more complicated than, I just need a powerful model. There are a lot of other elements that go into it.
Jon Krohn: 00:30:32
Ready to take your knowledge in machine learning and AI to the next level? Join SuperDataScience and access an ever-growing library of over 40 courses and 200 hours of content. From beginners to advanced professionals, SuperDataScience has tailored programs just for you, including content on large language models, gradient boosting and AI. With 17 unique career paths to help you navigate the courses, you will stay focused on your goal. Whether you aim to become a machine learning engineer, a generative AI expert, or simply add data skills to your career, SuperDataScience has you covered. Start your 14-day free trial today at superdatascience.com.
00:31:12
This is so compelling. I feel like I'm a VC investor hearing this pitch from you, and I'm like, "I need to throw my money at this. How can the world go on without this problem being solved?" Let's talk a little bit before we dig even more. It's so tantalizing to talk about how this is going to happen, how Protopia allows people to securely and performantly deploy LLMs and other machine learning models at scale. But first let's just do a little bit on your experience at NVIDIA, the research that you were doing there that ultimately led you to Protopia. All right, so tell us about dynamic architecture fission. That sounds fun.
Eiman Ebrahimi: 00:31:55
Oh wow, that's a blast from the past.
Jon Krohn: 00:31:58
And how does dynamic architecture fission influence the scalability and efficiency of cloud-based AI services?
Eiman Ebrahimi: 00:32:05
Yeah, so that particular piece of research work, the first author on that, Soroush, was at UCSD at the time, and there were a handful of industry collaborators on that paper. It was part of a broader kind of understanding of how cloud AI services will eventually turn out. And the idea, if we want to place the research work in time: there was a moment when a lot of work had been done over the past, say, 10 years, focused heavily on machine learning training, deep neural network training. And it was a time when looking at what it would mean to deploy models at scale, and to do so in a cost-efficient manner, was becoming important. Because once you've actually started training these models and you have the ability to train them, and that wave of research has come about, the natural next set of questions is, okay, how do we cost-efficiently do this at scale? And that particular paper was focused on systolic array architectures.
Jon Krohn: 00:33:30
What does that mean? Systolic... I was actually this morning having my annual physical and as part of that, they took my systolic and diastolic pressure.
Eiman Ebrahimi: 00:33:40
Well, this is probably different from that. It is different from that. Systolic arrays are actually a concept, I think they go back to the late seventies, of architectures where there are processing elements very tightly coupled. And the idea is to be able to organize these processing elements and the on-chip memory in a fashion that is very good for certain types of computation, like the computation that happens in neural networks, but not necessarily for all general-purpose compute. They've kind of come and gone in importance. But as deep neural networks became more and more important in the industry, they've had quite a bit of a comeback. I think, for instance, TPUs would be classified as systolic array architectures.
Jon Krohn: 00:34:34
TPUs, but not an NVIDIA GPU.
Eiman Ebrahimi: 00:34:36
No, because NVIDIA GPUs have historically been designed for graphics, but they also happen to be very good at doing the types of tasks that are involved in machine learning. And over the years, over the past 10 years, highly tuned to be extremely good at that. But independently of whether systolic arrays are better or GPUs are better, that's an orthogonal topic. What that paper was actually concerned with is this notion of inference as a service. Of you have models, they've been trained, and now you have many different users that you're trying to serve these models to.
00:35:26
The topic that paper you're asking about was getting into, fission was the word being used, was: if you have these systolic arrays, these large substrates of many, many different processing elements, but you need to serve models that maybe don't require all of what's there, how do you go about doing this in a cost-efficient manner? Cost-efficiency, when it comes to hardware substrates, often results in the question of, okay, well, what are you running? How highly utilized is that substrate? Are all of those processing elements actually being used? Because once you've powered this thing up, you're delivering power to it, it's sitting there in a rack, in a data center, it's being cooled. There's a lot of cost that goes into that. How well utilized is that piece of hardware that has been brought up?
00:36:18
And so when you look at what amount of hardware it takes to do inferencing, just the prediction part of the machine learning process post-training after deployment, often the models don't require the entirety of the hardware substrate. And so if you start having an entire substrate come up and you're just using this tiny corner of it to serve a model, it's not very cost-efficient. So multi-tenancy on these substrates would be something that industry is familiar with. It becomes a question of how you do that.
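To put rough numbers on that utilization point, here is a back-of-the-envelope sketch; the hourly cost and peak throughput are made-up figures for illustration, not vendor pricing.

```python
# Hypothetical figures, for illustration only -- not real accelerator pricing.
hourly_cost = 4.00             # $/hour to keep the accelerator racked, powered, cooled
peak_tokens_per_sec = 10_000   # throughput if the whole substrate were busy

for utilization in (1.00, 0.25, 0.05):
    tokens_per_hour = peak_tokens_per_sec * utilization * 3600
    cost_per_million_tokens = hourly_cost / tokens_per_hour * 1_000_000
    print(f"{utilization:4.0%} utilized -> ${cost_per_million_tokens:.2f} per million tokens")
```

Under these assumed numbers, serving from a 5%-utilized substrate costs twenty times more per token than a fully utilized one, which is exactly the pressure that pushes providers toward multi-tenancy.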
00:36:54
Now, that particular paper was talking about how you dynamically break down a large systolic array in order to be able to provide this inference as a service in a multi-tenant fashion, and what are all the microarchitecture details that need to go in there? What does the interconnect need to look like? How does on-chip memory need to organize itself? And there are a lot of cool ideas there that Soroush had that the paper was built around. But what's interesting, now that you bring that paper up, is that particular problem of inference as a service: being able to deploy things in a multi-tenant fashion in order for it to be cost-effective.
Jon Krohn: 00:37:33
We should probably define multi-tenant quickly.
Eiman Ebrahimi: 00:37:35
So multi-tenant just means there are multiple different users, let's call them just users for right now, abstractly, that are receiving service in the form of sending a request, an inference request. Let's call it, in the context of LLMs, a prompt with some context that the model's going to respond to. And these multiple users are sharing a hardware substrate, right? Now, sharing a hardware substrate can happen at multiple different levels. In this particular case, in that paper, we're talking about the chip level. But multi-tenancy doesn't necessarily need to happen at the chip level. It could happen at the box level or the rack level as well. So it kind of depends on what layer of abstraction you're talking about, but multi-tenant just means there are different tenancies that belong to different users.
Jon Krohn: 00:38:32
Yeah, multiple tenants. So you and I are both simultaneously in our browser on our laptop using ChatGPT, and they could both be getting sent to the same server or the same chip and my request and your request are being processed simultaneously on the same hardware.
Eiman Ebrahimi: 00:38:51
That's correct. That's correct. And that involves a lot of different things. It involves all of those users basically having active sessions on those different systems. You can imagine now, all of those different users will be using their own data. All of those different users will have their own credentials to interact with their session. And so multi-tenancy is a way that we've always looked at making the use of systems more efficient for inferencing of machine learning. And in fact, GPUs, independent of systolic arrays, have also had those sorts of important features built into them. NVIDIA MIGs, Multi-Instance GPUs, are an example of doing the same thing, where a GPU can be broken down into seven micro-GPUs, and each of those micro-GPUs can separately be handling an application for a user. And that goes back to the same concept of how you make it efficient.
00:39:55
Now, what we're talking about there is inside of a processing element. As I mentioned, this could also be across a board that has eight GPUs, with each of the GPUs handling a different entity; that's also multi-tenant. And so that whole area is actually one of the reasons why we started to see the importance of data protection. Because in multi-tenant systems, because you have multiple different users with multiple different sessions, that's where the data owners behind any of those user sessions have to think about the fact that the data that they're sending to their session is going onto some level of shared infrastructure with other entities.
00:40:47
And it's really, really important to note that this is not about the target system not being secure at all. Security, however, as soon as it becomes a multi-tenant system, is a shared responsibility among all of the entities on that system. Because one entity doing something poorly can lead to system compromise, and now you have other people's data on that same system.
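As a toy illustration of what multi-tenancy looks like at the software level, the sketch below multiplexes two hypothetical tenants onto one shared model object; every name in it is invented for illustration. The point is simply that both tenants' data are co-resident on the same substrate, which is why a compromise anywhere affects everyone on it.

```python
import asyncio

class SharedModelServer:
    """One model instance, one 'substrate', serving many tenants at once."""

    def __init__(self) -> None:
        # Both tenants' in-flight data live side by side in this one process.
        self.active_sessions: dict[str, str] = {}

    async def infer(self, tenant: str, prompt: str) -> str:
        self.active_sessions[tenant] = prompt  # sensitive data lands on shared infrastructure
        await asyncio.sleep(0.01)              # stand-in for actually running the model
        del self.active_sessions[tenant]
        return f"response for {tenant}"

async def main() -> None:
    server = SharedModelServer()  # the shared hardware substrate, abstractly
    results = await asyncio.gather(
        server.infer("tenant_a", "tenant A's sensitive prompt"),
        server.infer("tenant_b", "tenant B's sensitive prompt"),
    )
    print(results)

asyncio.run(main())
```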
You are just a highly technical CEO founder, which I understand a lot of investors love and our audience certainly loves for sure. So yeah, so Protopia, thank you for broadening the scope there. In general is a leader in data protection and privacy, preserving machine learning. Let's talk about the name first, Protopia. This is a word that I recently learned in the last few months. Because I'm developing a TV show in one of our working titles is Cracking Utopia.
Eiman Ebrahimi: 00:05:12
Oh wow. Okay.
Jon Krohn: 00:05:12
And I quickly discovered that there's a number of interesting things here, which is that the term utopia means not place. It's a Greek for not place. And so it's designed to describe an impossible.
Eiman Ebrahimi: 00:05:25
That's correct.
Jon Krohn: 00:05:27
Whereas a Protopia, so this is kind of like maybe an episode one of this Cracking Utopia series we'd get into, okay, we call this TV show Utopia because it's a word people know. A paradise on earth maybe. But it's also by definition unattainable. Whereas Protopia is something that we can aspire toward.
Eiman Ebrahimi: 00:05:47
That's right. And I love that you bring this up as the first thing. Because as we go through the different aspects of what it is that we're doing at Protopia, hopefully it'll become more and more clear how that name fits. Because this whole topic of how do we protect data on systems, if we were to consider some utopian version of that, there would be zero, absolute zero, not close to zero, absolute zero exposure at any time. But that being almost impossible, if not impossible to achieve, Protopia would be how do you get as close as possible to that? How do you make it such that you are minimizing attack surfaces around the data as much as possible without bogging the whole system down in the nature of the impossible things that would make it inapplicable.
Jon Krohn: 00:06:48
Bogged down, data transfers are the dystopian nightmare everyone fears. Countless films have been created about it. No, it's true. I mean it would be a nightmare. We expect our devices when we're using any digital tool, we want everything immediately. And that's now interesting in, and you already broadened nicely there, the idea of Protopia supporting any kind of machine learning situation. But LLM is the initial one that I spoke about, they are often highly compute-intensive. You have to wait around and so performance is essential, and you wouldn't want a bottleneck around your data security.
Eiman Ebrahimi: 00:07:32
Yeah, I think there's this kind of expectation that anyone working in the space of data protection, data security needs to set around what different axes of freedom they have to play around with. And one of those that is non-negotiable in the world of machine learning as we see it today, let's call it the world of LLMs and GenAI as the current focus of this space. Latency of how quickly you are able to interact, especially when you're focused on inference, right, that becomes something that's a non-negotiable. You can't really go about building solutions that are heavily focused on protecting data, especially for inference if you are making big impacts on the latency of the system, because then it just starts becoming prohibitive to what the original point is. And that's one of the tenets of what we build our product and solution around. Is focusing on the system level requirements that are pretty common among every use case that anyone imagines for these sort of workloads. So yeah, that's definitely one of the major focuses.
Jon Krohn: 00:08:55
Nice. Makes a huge amount of sense. I also want to highlight quickly that we didn't really define Protopia. We said that it's something that's attainable. And so my understanding of Protopia is this idea of iterating towards improvement. So you talked about, you kind of you alluded to what Protopia means by saying utopia would be a situation where there's absolutely no data security risks and no latency as that happens, but that's probably not ever going to be attainable. However, a Protopia, this idea of continuous iteration and improvement. And so in the context of being able to access machine learning securely, efficiently, that's, I guess the focus of Protopia now. But in terms of a broader term, it means human quality of life on the planet, improving life spans, extending healthier quality of life into later years, a sense of fulfillment, community, all the kinds of things that we dream of for ourselves and the people that come after us.
Eiman Ebrahimi: 00:09:54
Yeah and if we were to create some mapping of that definition, which would be Protopia on earth into Protopia for AI and the data space especially, I think that iteration also has this aspect to it of... Again, one of the ways that we look at this space is that there already exists a lot of effort and technology that goes into securing systems. And data security partially comes from the systems being secure. So the technologies that we build at Protopia, what we're focused on is also being complimentary to this broader set of ecosystem technologies that are being built.
00:10:40
And it's only with these different layers that you're able to achieve better and better protection of the data by way of again, narrowing the attack surface at various different levels. Now in particular, the part of the problem that we have been focused on thus far has been addressing the attack surface at the data layer by what we think of as a proactive approach to data security as opposed to a reactive approach. Where instead of just relying on the system not being compromised, we ask the question if the system were compromised. And that's more or less always the case that can happen, then how do you make it such that proactively whatever it is that's on that system is of minimal use to somebody that may come across it, a bad actor, et cetera.
Jon Krohn: 00:11:37
Yeah, and that actually was the exact context that Sol brought up Protopia in episode 781, where she talked about how Protopia, I'm assuming this is the Stained Glass Transform solution.
Eiman Ebrahimi: 00:11:49
Correct.
Jon Krohn: 00:11:50
That it transforms raw data into a format that even if somehow somebody gets their hands on that, it's not going to be very meaningful to them.
Eiman Ebrahimi: 00:12:02
Correct. Correct. The idea is one of utilizing essentially the fundamentals of machine learning models, and the fact that machine learning models live in fairly large often representational spaces. And so what we are taking advantage of is the ability to move around in that representational space in a manner that the underlying model that's dealing with that data still is able to make sense of the data. But given that there's essentially manifolds of representations that cover the same meaning to the model. Now if you were able to know what those manifolds looked like and started picking representations at random at runtime, now you are making it extremely difficult to follow moving targets for anyone seeing those representations. And that's part of what Stained Glass is doing.
Jon Krohn: 00:13:03
What does that mean manifold? And maybe you can give me a bit of an understanding of that when I think about, okay, my raw data, let's say I am querying a large language model, so I'm saying give me some ideas for how I can increase my profit margins given the situation and I upload some proprietary spreadsheet from my company, I send it off to OpenAI or Anthropic or Quirkir to analyze the data, give me some kind of response. But on the way there I leveraged Protopia, specifically the Stained Glass Transform solution. How do my data change from just being a character string to-
Eiman Ebrahimi: 00:13:45
So let's just take the example of the question and the document that has sensitive information in it, right? Today, that data needs to be sent to a platform running a language model. You mentioned three proprietary model providers; let's broaden that and consider open models as well. Because open models are often thought of as: oh, if I have an open model, then because I have control over the model, I'm somehow safer.
00:14:21
But ultimately that open model still needs to run somewhere, and what we're thinking about and focused on is the platform that the model runs on. It is less about whether or not the user, the end enterprise, trusts the model provider. In fact, we assume that they do. It's not the case that the enterprise is saying the model provider is malicious, not at all. They do trust their vendor. But ultimately those models are running somewhere, and that somewhere is a compute platform that, like any other compute platform, can be compromised. It can be something as simple as human error: someone not setting up two-factor authentication, very simple mistakes that have big consequences.
Jon Krohn: 00:15:06
Phishing emails.
Eiman Ebrahimi: 00:15:07
Right. All these things pop up every now and then, and some compromise happens. Now, the documents that you mentioned are being sent, and today those documents end up appearing on those target platforms in plain text. That plain text turns into embedding vectors with a one-to-one relationship: there's an embedding function that goes from the tokens to the embedding vectors, and then the rest of the model beyond the embedding function computes on those, right?
00:15:40
That second part of the model, after the embedding layers, lives in a much broader and larger representational space than just what the range of the embedding function covers. So imagine a very large representational space of all kinds of embedding vectors, and some subset of it is the range of the embedding function. What we are doing with our core product, called Stained Glass Engine, is enabling, in a post-training step, once the large language model or whatever other machine learning model has been trained, the identification of these, think of them as probability distributions or mathematical functions, that map any given embedding in the range of the embedding function to many other possibilities in that larger representational space.
00:16:41
And once you have that, the set of functions that defines those relationships becomes what we call Stained Glass Transform. Now, Stained Glass Transform is in effect itself a very small machine learning model. It's a series of layers, but there's something very special about it: it's not a set of weights, it's a set of probability distributions. So at runtime, when you're running Stained Glass Transform, you're not rewriting a data record from one deterministic embedding into another deterministic embedding. You are taking a sample of that manifold I was talking about, where one deterministic embedding becomes one sample among many different possibilities, which, and this is the important part, will mean the same thing to the target model it was intended for.
00:17:35
But otherwise it's a sample in that much larger representational space and there is no one-to-one mapping with the original tokens that were being transferred. Now in the case of the examples that you were giving, you're talking about proprietary models. In those scenarios, the model provider would need to be providing the Stained Glass Transform for their model. In the case of open source models, the infrastructure provider, Protopia ourselves, or the customer can use Stained Glass engine to create the transformation themselves.
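To make the mechanics concrete for readers who code, here is a minimal PyTorch sketch of the idea. It is not Protopia's implementation; the class name, the Gaussian parameterization, and the single-linear-layer design are all illustrative assumptions. It only shows the core mechanic described above: a small set of layers that defines a distribution around each embedding and draws a fresh sample at runtime.

```python
import torch
import torch.nn as nn

class StochasticEmbeddingTransform(nn.Module):
    """Hypothetical sketch: map each deterministic embedding to a
    learned distribution and emit a fresh random sample per call."""

    def __init__(self, embed_dim: int):
        super().__init__()
        # Small learned layers that predict per-dimension distribution
        # parameters (a mean shift and a log-variance) for each embedding.
        self.mu = nn.Linear(embed_dim, embed_dim)
        self.log_var = nn.Linear(embed_dim, embed_dim)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, embed_dim), the output of the
        # target model's embedding function.
        mu = self.mu(embeddings)
        std = torch.exp(0.5 * self.log_var(embeddings))
        # Every call draws a different sample from the learned
        # distribution, so the same prompt never yields the same
        # transformed representation twice: a moving target.
        return mu + std * torch.randn_like(std)
```

Because the output is a random draw rather than a fixed rewrite, intercepted representations have no one-to-one mapping back to the original tokens.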
Jon Krohn: 00:18:06
Very cool. I do understand what you're describing now. And for our listeners who aren't already familiar with the idea of vector embeddings, this is absolutely standard; in fact, I'm not aware of a large language model that could work on natural language data, which is what a large language model inherently does, without converting that language. There were a few terms there. You have a string of characters that you type in as your query. That gets converted into something you mentioned many times: tokens. A token, for the most part, is kind of like a word or a part of a word; a longer word would typically get broken up into these sub-word components.
00:18:52
And so, if you pass a million tokens into a large language model, a kind of rule of thumb would be that they correspond to about 700,000 words. Those sub-word tokens then get converted into the embedding space that you were just describing, which is a sequence of numbers you could think of like coordinates on a map. A map is two-dimensional, a flat map, so you have latitude and longitude. It's the same idea, except you might have a thousand or two thousand dimensions-
Eiman Ebrahimi: 00:19:32
High dimensional spaces.
Jon Krohn: 00:19:34
... high dimensional spaces. And it's that high dimensionality that allows a large language model to have so much nuance. If you think about a map: you move latitude a little, longitude a little, and that changes where you are on the surface of the earth; you could go gradually from a warm valley growing fruit to a mountain peak. As you move in one direction, what that location describes changes. In the same way, if you have a thousand-dimensional space, each of those thousand directions can relate to changes in the real-world meaning of the language being represented.
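As an aside for readers who want to see this tokens-to-coordinates step in code, here is a small sketch using the Hugging Face transformers library; the choice of GPT-2 is just an assumption for illustration, and any similar language model exposes the same pipeline:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

text = "Stained glass scatters light"
ids = tokenizer(text, return_tensors="pt")["input_ids"]
# Longer or rarer words are split into sub-word tokens.
print(tokenizer.convert_ids_to_tokens(ids[0].tolist()))

# Each token ID indexes one row of the embedding matrix: its
# "coordinates" in the model's high-dimensional space.
with torch.no_grad():
    embeddings = model.get_input_embeddings()(ids)
print(embeddings.shape)  # (1, num_tokens, 768) -- GPT-2 uses 768 dimensions
```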
Eiman Ebrahimi: 00:20:14
That's right.
Jon Krohn: 00:20:15
And so that's why, when you describe the Stained Glass Transform working with those embeddings, it doesn't correspond back to some specific, proprietary sequence of words. It's kind of just the general meaning of those words.
Eiman Ebrahimi: 00:20:37
It's the general meaning of those words as that particular target model understands it. Right?
Jon Krohn: 00:20:42
Right.
Eiman Ebrahimi: 00:20:43
Which is key, because we're creating that disconnect relative to what the model understands. Combining that notion with the sampling of the embeddings at runtime, which happens dynamically, is what makes it extremely difficult for these embeddings to just reflect what the original plain-text information was. We're creating, in effect, a decoupling from the plain-text information itself. Today, a data owner in the enterprise, as a user of this sort of technology, has to get comfortable with: okay, it's okay to use that target platform, independent of whether the model is open or closed.
00:21:34
Today, that decision is made based on plain-text exposure. But now the data owner can understand that the plain text never actually leaves whatever trust boundary they create, and beyond that trust boundary, if the data is transformed, what's exposed is no longer plain text. So the question becomes: what platforms, what target compute infrastructures are you now able to use that you would normally not have been using? A lot of use cases, we find, end up not seeing the light of day exactly because of this problem.
Jon Krohn: 00:22:11
Extremely well said. You're a great explainer of complex information. It's a delight to have you explaining these things.
Eiman Ebrahimi: 00:22:17
Appreciate it.
Jon Krohn: 00:22:18
Yeah, thank you. So let's dig into how you ended up building these solutions. Because you and I were chatting beforehand and it sounds like data security wasn't always the most exciting topic to you.
Eiman Ebrahimi: 00:22:33
I think data security is one of those topics that a lot of folks who, like myself, come from high-performance backgrounds don't initially find exciting. My work prior to Protopia was very heavily focused on improving the performance of systems.
Jon Krohn: 00:22:54
You were at NVIDIA, right?
Eiman Ebrahimi: 00:22:55
Yeah, I was at NVIDIA for close to 10 years, and at least half of that time, if not the majority of it, whether in product or in research, was focused on improving the performance of GPU systems, with a variety of different target application spaces in mind, but almost exclusively on the question: how do we make these systems faster and stronger? During the last few years of my time working on these problems, everything we were doing in the world of making GPU systems faster and stronger, whether improving inter-GPU communication speeds, or coming up with new ways to describe locality through the programming language so that the underlying system, within the caches of the microarchitecture or when accessing memory, would do a better job of keeping data that is used in close subsequence physically close in the memory system, was all focused on performance.
00:24:12
Sometimes power would become a big component as well. But there came a time when we, and by we I mean not just me but many other folks in the high-performance computing space, started realizing that from a systems perspective there were other problems that really needed addressing. Some of these problems were large enough that if they weren't solved in some seamless manner, some of this innovation happening in making systems faster, stronger, and bigger actually wouldn't get used. Because if the investment that user enterprises need to make in those systems is so high that the return on that investment sits behind some choke point, then without solving the choke point, the investment isn't going to return, and this whole innovation cycle of making things faster, bigger, stronger isn't going to go anywhere. That's what made me interested in data security: it started bubbling up whenever I looked at what we were building in those various system architecture efforts.
Jon Krohn: 00:25:37
That ties back to Sol Rashidi's episode again, 781. In that episode, a lot of what she was talking about was getting a return on investment on an AI project-
Eiman Ebrahimi: 00:25:48
Yeah, it's huge.
Jon Krohn: 00:25:49
I need to have a commercially successful project. And what you are saying is that at NVIDIA you were working on making these systems highly performant: latency reductions, the power reductions you mentioned. So you played a part in developing the hardware that allows us to have these magical-feeling AI capabilities that we've had in the world for the last 18 months, capabilities that are getting wilder and wilder by the quarter. So, awesome, thank you for that. But what you're saying is that you discovered, maybe unexpectedly, that the uptake of some of these systems by enterprises was slower than anticipated, because people were concerned: okay, this is highly performant, but not necessarily secure.
Eiman Ebrahimi: 00:26:44
Yeah, I think the observation is that when you think of work that goes on in research, typically the mandate is to look 5 to 10 years out. So if you look at any of the things that we're benefiting from right now, whether it's the systems that have been built that enable the training of these magical models, or even the software architecture of the models themselves, and you pull on the string of when the research was done, the initial research will have been much longer ago, but the main parts of it start coming together somewhere in that 5-to-10-years-out window.
00:27:32
That also creates the possibility of seeing other problems. As you solve some of the problems, now you see other problems that are also 5 to 10 years out. But some of them start looking really serious, in that it looks like if you don't solve that problem, much of everything else is not actually going to come to fruition. And so what you just said, that we had already seen enterprises stuck behind this problem: that hadn't happened yet. But we could see that it was probably going to happen.
00:28:12
I think what we're seeing in 2023 and 2024 is that problem coming to life. Something really exciting happened in the industry: LLMs became something people could use, and everybody started seeing the value of it at a POC level, 100%, right? During 2022, a lot of tinkering happened. In 2023, people got more serious and started actually spending money defining some real use cases that, and this is the operative word, could create a lot of value for the enterprise. Could. But then, as soon as those high-value use cases were identified, look at what data goes into creating that value, the data records that would need to be pulled from here and there in the enterprise. Let's assume they're ready to do that, that they've done some data cleanup. When you look at what tier of data that is, it's generally not in the non-sensitive tiers.
00:29:19
If you think of all the enterprise data that exists out there, broad brush, call it three buckets: there's the non-sensitive tier, then some restricted set of tiers, typically three to five in an organization, and then you've got some top-secret information, right? Let's forget about the non-sensitive and the top secret. Everything in between, those three to five restricted tiers of data, is where most of the really interesting use cases that people talk about tend to be.
00:29:49
But after those use cases were identified in '23 and through '24, at least what we see is that a lot of them will just sit around, or people will try to take a couple more steps toward proving that the use case is actually going to create value. A question needs to be asked, though: where are you going to run these models, at scale, in a performant manner, in order to deliver the value that the boards of these organizations are asking for? And that's where things start becoming a little more complicated than "I just need a powerful model." There are a lot of other elements that go into it.
Jon Krohn: 00:30:32
Ready to take your knowledge in machine learning and AI to the next level? Join SuperDataScience and access an ever-growing library of over 40 courses and 200 hours of content. From beginners to advanced professionals, SuperDataScience has tailored programs just for you, including content on large language models, gradient boosting and AI. With 17 unique career paths to help you navigate the courses, you will stay focused on your goal. Whether you aim to become a machine learning engineer, a generative AI expert, or simply add data skills to your career, SuperDataScience has you covered. Start your 14-day free trial today at superdatascience.com.
00:31:12
This is so compelling. I feel like I'm a VC investor hearing this pitch from you, and I'm like, "I need to throw my money at this. How can the world go on without this problem being solved?" It's so tantalizing to talk about how this is going to happen, how Protopia allows people to deploy LLMs and other machine learning models at scale, securely and performantly. But before we dig in even more, let's do a little bit on your experience at NVIDIA and the research you were doing there that ultimately led you to Protopia. All right, so tell us about dynamic architecture fission. That sounds fun.
Eiman Ebrahimi: 00:31:55
Oh wow, that's a blast from the past.
Jon Krohn: 00:31:58
And how does dynamic architecture fission influence the scalability and efficiency of cloud-based AI services?
Eiman Ebrahimi: 00:32:05
Yeah. So for that particular piece of research work, the first author, Soroush, was at UCSD at the time, and there were a handful of industry collaborators on the paper. It was part of a broader effort to understand how cloud AI services would eventually turn out. To place the research in time: a lot of work had been done over the previous, say, 10 years, focused heavily on machine learning training, deep neural network training. And it was a moment when looking at what it would mean to deploy models at scale, and to do so in a cost-efficient manner, was becoming important. Because once you actually have the ability to train these models, and that wave of research has come about, the natural next set of questions is: okay, how do we do this cost-efficiently at scale? And that particular paper was focused on systolic array architectures.
Jon Krohn: 00:33:30
What does that mean? Systolic... I was actually this morning having my annual physical and as part of that, they took my systolic and diastolic pressure.
Eiman Ebrahimi: 00:33:40
Well, this is probably different from that. It is different from that. Systolic arrays are actually a concept that I think goes back to the late seventies: architectures in which processing elements are very tightly coupled. The idea is to organize these processing elements and the on-chip memory in a fashion that is very good for certain types of computation, like the computation that happens in neural networks, but not necessarily for all general-purpose compute. They've come and gone in importance, but as deep neural networks became more and more important in the industry, they've had quite a comeback. TPUs, for instance, would be classified as systolic array architectures.
Jon Krohn: 00:34:34
TPUs, but not an NVIDIA GPU.
Eiman Ebrahimi: 00:34:36
No, because NVIDIA GPUs have historically been designed for graphics, but they also happen to be very good at the types of tasks involved in machine learning, and over the past 10 years they've been highly tuned to be extremely good at that. But whether systolic arrays are better or GPUs are better is an orthogonal topic. What that paper was actually concerned with is this notion of inference as a service: you have models, they've been trained, and now you have many different users that you're trying to serve those models to.
00:35:26
The topic that paper was getting into, fission was the word being used, was this: you have these systolic arrays, large substrates of many, many different processing elements, but you need to serve models that maybe don't require all of what's there. How do you go about doing this in a cost-efficient manner? Cost-efficiency, when it comes to hardware substrates, often comes down to the questions: what are you running, and how highly utilized is that substrate? Are all of those processing elements actually being used? Because once you've powered this thing up, you're delivering power to it, it's sitting in a rack in a data center, it's being cooled. There's a lot of cost that goes into that. How well utilized is the piece of hardware that has been brought up?
00:36:18
And so when you look at what amount of hardware it takes to do inferencing, just the prediction part of the machine learning process, post-training, after deployment, often the models don't require the entirety of the hardware substrate. If an entire substrate comes up and you're just using a tiny corner of it to serve a model, it's not very cost-efficient. So multi-tenancy on these substrates is something the industry is familiar with; it becomes a question of how you do it.
00:36:54
Now, that particular paper was talking about how you dynamically break down a large systolic array in order to provide this inference as a service in a multi-tenant fashion, and all the microarchitecture details that need to go in there. What does the interconnect need to look like? How does on-chip memory need to organize itself? There are a lot of cool ideas in there that Soroush had, which the paper was built around. But what's interesting about you bringing that paper up is that particular problem: inference as a service, being able to deploy things in a multi-tenant fashion in order for it to be cost-effective.
Jon Krohn: 00:37:33
We should probably define multi-tenant quickly.
Eiman Ebrahimi: 00:37:35
So multi-tenant just means there are multiple different users, let's abstractly call them users for right now, that are receiving service in the form of sending an inference request; in the context of LLMs, call it a prompt with some context that the model is going to respond to. And these multiple users are sharing a hardware substrate, right? Now, sharing a hardware substrate can happen at multiple different levels. In this particular case, in that paper, we're talking about the chip level. But multi-tenancy doesn't necessarily need to happen at the chip level; it could happen at the box level or the rack level as well. It depends on what layer of abstraction you're talking about, but multi-tenant just means there are different tenancies that belong to different users.
Jon Krohn: 00:38:32
Yeah, multiple tenants. So you and I are both simultaneously in our browser on our laptop using ChatGPT, and they could both be getting sent to the same server or the same chip and my request and your request are being processed simultaneously on the same hardware.
Eiman Ebrahimi: 00:38:51
That's correct. That's correct. And that involves a lot of different things. It involves all of those users having active sessions on those systems. You can imagine that all of those different users will be using their own data, and all of those different users will have their own credentials to interact with their session. Multi-tenancy is a way that we've always looked at making the use of systems more efficient for machine learning inferencing. And in fact, GPUs, independent of systolic arrays, have had that sort of important feature built into them as well. NVIDIA MIGs, Multi-Instance GPUs, are an example of the same thing, where a GPU can be broken down into up to seven micro-GPUs, and each of those micro-GPUs can separately handle an application for a user. And that goes back to the same concept of how you make it efficient.
00:39:55
Now, what we were just talking about is inside of a single processing element; as I mentioned, this could also be across a board that has eight GPUs, each of the GPUs handling a different entity, which is also multi-tenant. And that whole area is actually one of the reasons why we started to see the importance of data protection. Because in multi-tenant systems, where you have multiple different users with multiple different sessions, the data owners behind any of those user sessions have to think about the fact that the data they're sending to their session is going onto some level of shared infrastructure with other entities.
00:40:47
And it's really, really important to note that this is not about the target system not being secure at all. Security, however, as soon as it becomes a multi-tenant system, is a shared responsibility among all of the entities on that system. One entity doing something poorly can lead to a system compromise, and now you have other people's data on that same system.
Jon Krohn: 00:41:21
Wow.
Eiman Ebrahimi: 00:41:22
And so this has historically been a thing where, as an industry, we've approached it as: okay, what if we just make it such that it is one organization's tenancy? If I don't want to share a tenancy for certain tiers of data with other organizations, and I'm going through the same infrastructure provider, I'll ask for my own tenancy. Great. That's one thing that improves the situation.
Jon Krohn: 00:41:52
That'll be more expensive typically, right?
Eiman Ebrahimi: 00:41:54
Absolutely. It typically requires you to give some level of commitment as to how long you're going to use the system, for it to be private just to yourself, because the costs need to be shared. And then it's also important to recognize that multi-tenancy still exists within the organization. You might say the organization has a single tenancy, but now you have different departments inside the same organization with different data-owner silos.
00:42:21
You've got an organization that has an HR department, a finance department, and another department dealing with health records. These typically are not the same data-ownership configurations. So even within an organization, having your own tenancy still carries considerations with respect to how data gets exposed. And this is one of those places where, when we think about systems, when we think about return on investment from those systems to the enterprise, and how enterprises consider where they can create value and what investments they need to make for those use cases to bear fruit, challenges start happening.
00:43:11
Because if the enterprise gets stuck behind "I need to create private systems for different data owners to be able to run their models," you have two competing priorities: efficiency of those systems, and cost, the investment part of the ROI, and whether or not it will ever actually show you a return. Those two competing priorities are why data security is so important. To tie it back to something we were talking about earlier, of how data security comes into the picture: without a seamless solution to data security, all of these important innovations on the system side, on the microarchitecture side, don't turn into value on the enterprise side. Our understanding of this space at Protopia is that this is a problem we need to solve as an industry in order for all of these really great innovations to create value in the real world. Without that, this whole innovation cycle can very well get stuck.
Jon Krohn: 00:44:27
Mathematics forms the core of data science and machine learning. And now, with my Mathematical Foundations of Machine Learning course, you can get a firm grasp of that math, particularly the essential linear algebra and calculus. You can get all the lectures for free on my YouTube channel, but if you don't mind paying a typically small amount for the Udemy version, you get everything from YouTube plus fully worked solutions to exercises and an official course completion certificate. As countless guests on the show have emphasized, to be the best data scientist you can be, you've got to know the underlying math. So, check out the links to my Mathematical Foundations of Machine Learning course in the show notes, or at jonkrohn.com/udemy. That's jonkrohn.com/U-D-E-M-Y.
00:45:14
So yeah, to play back for you in my own words: machine learning innovations, prominently right now generative AI innovations. Since 2023, lots of organizations, enterprises, have been doing proofs of concept of these capabilities. And when these proofs of concept are shown to executives in the company, they say, "Wow, this is incredible." You show them to users, and they say, "Oh my goodness, I can't wait to get my hands on this." And then comes the hard work of figuring out how to deploy that proof of concept as a scalable, secure, real-world production system.
00:45:55
And because of these trade-offs that you're outlining, security versus cost: saying, okay, we want to be super secure, so we're going to dedicate an expensive server with large GPUs that's going to be just for this one kind of user in our organization. That means maybe the server only ends up being used for a few hours a week. So in that scenario, you've handled some of the security issues, but the cost is so prohibitive that even though the functionality is great and people love it, it's too expensive to run. You don't get a positive return; you take a loss on the investment.
Eiman Ebrahimi: 00:46:51
Absolutely correct. Now let's add one other really important wrinkle into the mix, which is time. We're running out of time to create that value, and by we, I mean the industry. There is a clock ticking on all the investment that the industry has made into building these applications and systems. Translating that into value is necessary, there's no two ways about it, and there's a timeframe on it, because people at the enterprise are paying attention, and they're spending money on these POCs trying to get to production value.
00:47:45
And so one of the things that I think the industry has realized is that we need easy-to-use, quick-to-adopt solutions as well; introducing completely new frameworks is pretty challenging. That's another one of our tenets for how we approach our product: how do we fit into what is being built, both software and hardware infrastructure, in a manner that delivers this narrowing of the attack surface at the data layer, complements what happens in system security, and doesn't require the user to do something significantly differently?
00:48:27
And that's come from looking to the market to learn how it's evolving, because that's constantly happening. For example, you'll note that a lot of the work around open source has been focused on: well, they're open models and the weights are there, so if someone, someone being the enterprise, wants control over the implementation of the model, they can grab it from Hugging Face or another source and deploy it. While that's true, look at the most capable open models. Let's take Llama 3.1 405B, which has changed the game of open versus closed because it is so, so capable. If you think about deploying Llama 405B at scale, yes, it's doable. But standing it up at scale on an infrastructure, with, call it, hundreds or thousands of users on it, involves human expertise that is not easy to find.
00:49:42
It's not the case that all enterprises of all sizes will just have access to that. And that's where we see infrastructure providers doing a lot to make it easy to use: if you need a foundation model, even an open one, they will set it up on the infrastructure and manage it for you, so that your enterprise developers can just access an API, interact with it, and build applications around it. Some efforts go farther than that. There's an entire emergent space of application providers around open source, who take open models and build applications around them in the name of making them easy for the enterprise to use. Because without that, those use cases won't come to light; it takes too long, and taking too long ends up meaning that those use cases defined in 2023, again, never actually make it to production.
00:50:43
And that whole theme of making an easy entry point, I think, exists across all layers of this stack: from the application providers to the foundation model providers, all the way down to the infrastructure providers.
Jon Krohn: 00:50:57
Gotcha. Yeah, lots to dig into there that you have covered beautifully. I want to actually move on to another research topic that you had because it also ties into the problems that we're facing today. And this is parallelization.
Eiman Ebrahimi: 00:51:15
That's right.
Jon Krohn: 00:51:16
So you did a fair bit of research and publishing at NVIDIA on parallelization, which tackles another aspect of, or associates with other aspects of, the kinds of problems that we're facing today as an industry as we try to realize, as you mentioned in your last answer, value from these extremely expensive systems and capabilities that we're developing and deploying.
Eiman Ebrahimi: 00:51:41
Yeah, parallelization strategies. I think they fall into that category of coming up the layers of abstraction, from microarchitectural improvements to system aspects to pure software: how do you best make use of the large amounts of compute you have available? That particular body of work came at a time, and I think this continues to this day, when a lot of effort was going into how to make the best use of thousands or tens of thousands of GPUs to reduce both the power cost and the time that goes into training a new model.
00:52:31
And one of the really, really interesting aspects at that time, to me, was the various modes of parallelization that people had started looking at. At that time, data parallelism was probably the main thing, and model parallelism had just started becoming something people were looking at. I think the paper you're referring to is where we were looking at the different kinds of model parallelism: traditional model parallelism and then pipeline parallelism. After that, tensor parallelism was added to the mix. And there are entire startups that focus on how to best implement these various parallelization strategies, with some really great impact, because even a tens-of-percent improvement at the scale we're talking about means a lot of money and a lot of energy, which is really important in this problem.
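For readers who want a feel for what model parallelism means in code, here is a toy PyTorch sketch, assuming a machine with two CUDA devices; real training systems split models far more aggressively, and the stage sizes here are arbitrary illustrative choices:

```python
import torch
import torch.nn as nn

class TwoStagePipeline(nn.Module):
    """Toy model-parallel split: early layers on one GPU, later layers
    on another. Tensor parallelism goes further and shards individual
    weight matrices, but the core idea of placing different parts of
    one model on different devices is the same."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage2 = nn.Linear(4096, 1024).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage1(x.to("cuda:0"))
        # Activations cross the inter-GPU interconnect here. Pipeline
        # parallelism keeps both GPUs busy by streaming micro-batches
        # through this boundary rather than one large batch at a time.
        return self.stage2(x.to("cuda:1"))

out = TwoStagePipeline()(torch.randn(8, 1024))
```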
00:53:28
But one of the interesting things is that these were all problems heavily focused on the training side. Large-scale training of models from scratch has significant impact, of course, from an energy-consumption perspective, and the fact that we're always improving these models makes it a very big problem. But as we've come into a space where a lot more uses of the models are being defined, actual prediction tasks, building applications, the inference side is where you'd expect compute to be used most over the next 10 or 20 years.
00:54:21
It's likely that most of the compute is going to be inference compute. And that's why I think of that particular piece of research you're referring to as one of the places where a shift started happening for me, away from hardcore performance improvement for the most compute-intensive tasks, which at the time meant training. In fact, there was a back and forth over whether training large language models or training large recommender systems was the most compute-intensive task we spent our time thinking about, and as a researcher in that field I was looking at both of those problems. But looking back four or five years, that's where most of my interest was, and around that 2019-2020 timeframe it started shifting to: what about inference? How do we scale inference, and how do you have efficient compute delivering all of that? Which ties into some of what we just talked about.
Jon Krohn: 00:55:30
And so now you mentioned earlier Llama 3.1 405B. You might have a better sense off the top of your head as to how many kind of state-of-the-art NVIDIA GPUs would be required to run a model that large at inference time, but it would be multiple.
Eiman Ebrahimi: 00:55:44
Yeah, I believe if it's running in its baseline capacity, it's 32 H100s.
Jon Krohn: 00:55:52
Wow.
Eiman Ebrahimi: 00:55:52
So you need four boxes of GPUs to even stand up one instance of it. One instance. If you are thinking of doing any sort of fine-tuning of it, or trying to serve a lot of different users, you're talking about multiples of that in order to get through fine-tuning, for instance, in a time-efficient manner. That's where you start understanding why the complexities around maintaining those systems make it non-trivial to assume that everybody's just going to do this for themselves. It's theoretically doable, right? There's nothing stopping you from getting 32 GPUs somewhere on the cloud and trying to stand this up. But how quickly you'll have a production-ready system, I think, sometimes gets underestimated.
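A back-of-envelope calculation shows where figures like this come from. All the numbers below are rough assumptions (weight precision, memory overhead, GPU model), not a serving recipe:

```python
# Why one instance of a 405B-parameter model spans multiple 8-GPU boxes.
params = 405e9
bytes_per_param = 2        # assuming 16-bit (BF16/FP16) weights
gpu_memory_gb = 80         # an H100 has 80 GB of HBM
overhead = 1.25            # crude allowance for KV cache and activations

weights_gb = params * bytes_per_param / 1e9   # ~810 GB of weights alone
total_gb = weights_gb * overhead              # ~1,012 GB to serve
min_gpus = total_gb / gpu_memory_gb           # ~13 GPUs at a bare minimum
print(f"{weights_gb:.0f} GB of weights -> at least {min_gpus:.0f} GPUs")
# Serving stacks round up to whole 8-GPU nodes and add headroom for
# batching many users, which is how figures like 32 GPUs arise.
```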
Jon Krohn: 00:56:44
No doubt. All right, cool. So we've now set the stage in terms of the historical research you did and a lot of the problems we face as an industry today around efficiency, driving value, and having security at the same time. So now, tell us about Protopia and the Stained Glass Transform solution. What makes this product privacy-preserving and unique? What allows Stained Glass to tackle the slew of problems we've been describing over the course of the episode so far?
Eiman Ebrahimi: 00:57:21
Yeah, very good point to kind of connect some of these dots. So we did talk somewhat about the privacy preserving aspects of it, of this notion that we take representations from what is deterministic to these randomized representations that are still consumable by a target model. Now it's good to talk about what makes it more usable than some of the existing technologies that have been worked on for quite some time. There's one particular set of deliberate trade-offs that we have made in designing Stained Glass Transform as an approach that kind of feeds into our whole protopia versus utopia narrative as well.
00:58:18
It's entertaining to think about in hindsight. We think about what it means to make data that could potentially leak from a target system difficult to understand for a bad actor who comes across it. Our standard way of protecting data has always been encryption, and it continues to be the gold standard of how we think about things. If we've encrypted a data record and somebody doesn't have access to the key, then we tend to think it's safe. As safe as can be.
Jon Krohn: 00:59:05
I mean that's basically where I go.
Eiman Ebrahimi: 00:59:07
And key management is one of the things that makes that difficult when things start getting passed around, but it's safe, right? Now, within encryption there's a whole body of work, homomorphic encryption, that asks: what if we could just never take data out of encrypted mode and do all of our computation on it? Decades of work have gone into homomorphic encryption, and it continues to make progress; new products come out today that focus on it. The biggest challenge with homomorphic encryption has generally been that implementing the operations that go into complicated deep neural networks is possible, but tends to blow up the latency of the actual task. We're talking multiple orders of magnitude, to the point where it becomes prohibitively impactful on the task you're trying to do.
01:00:22
What we've done is look at encryption as adding stochasticity to the data, and Stained Glass makes a trade-off: what if the stochasticity we add to the data was not just random, which leads to the challenges we just talked about with homomorphic encryption, but curated for a target task, so as not to incur the latency penalties? How do you make it such that the target task can still operate on that stochastic data? The opportunity we found to bring this to reality was the nature of machine learning itself, which is why Stained Glass today is primarily, in the real world exclusively, used with machine learning models as the target, not arbitrary computation.
01:01:18
So, much like how we've always approached systems problems, we've constrained the problem: if our target is just machine learning, deep learning models, how would we use stochasticity to protect the data without taking the latency penalty? When you ask what's unique about it, it's that particular trade-off being very deliberately made. And now the problem becomes: what is that stochasticity? How do you know what stochasticity won't impact a target model?
01:01:53
And this is where the really interesting math comes into play: we formulate that particular problem as a machine learning problem itself. We want to know what the stochasticity is, so let's go learn it; let's use machine learning to learn the stochasticity. What Stained Glass Engine, our core product, does is take a given pre-trained model and, in a post-training step, reformulate the problem using the same machine learning tools you're using today. It's essentially an extension to PyTorch, where you go about learning, using machine learning, what stochasticity you can add to the data that you will later send to this pre-trained model, such that the model can still operate. And that involves some math that we can talk about.
01:02:51
But essentially that's what's happening, and that's what makes this something that doesn't change the tooling in a significant manner. It's a post-training step that you can drop into your existing training loop; that's one of our tenets. Second, it won't meaningfully impact the latency of the operation at inference time, because the complexity of the set of layers we're learning can be controlled; we define it in a way that won't blow up in computational cost. And third, we're doing all of this targeted at a machine learning model, so there are no keys involved and the model can just operate on the transformed data.
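To sketch what such a post-training step might look like in PyTorch, here is a minimal, hypothetical version that reuses the StochasticEmbeddingTransform sketched earlier. The target_model (a Hugging Face-style causal LM) and dataloader are assumed to exist, and the loss terms and their weighting are illustrative assumptions rather than Protopia's actual objective:

```python
import torch
import torch.nn.functional as F

target_model.requires_grad_(False)  # the pre-trained model stays frozen
transform = StochasticEmbeddingTransform(
    embed_dim=target_model.config.hidden_size
)
optimizer = torch.optim.AdamW(transform.parameters(), lr=1e-4)

for batch in dataloader:
    clean = target_model.get_input_embeddings()(batch["input_ids"])
    noisy = transform(clean)
    with torch.no_grad():
        reference = target_model(inputs_embeds=clean).logits
    # Fidelity: the model's output on transformed embeddings should
    # match its output on the original, deterministic embeddings.
    fidelity = F.mse_loss(target_model(inputs_embeds=noisy).logits, reference)
    # Obfuscation: push transformed embeddings away from a one-to-one
    # correspondence with the originals.
    obfuscation = -F.mse_loss(noisy, clean)
    loss = fidelity + 0.1 * obfuscation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The property this mirrors from the description above is that only the transform's distribution parameters are trained; the pre-trained model is frozen, so the step drops into an existing training loop without changing the tooling.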
Jon Krohn: 01:03:39
Wow, that is cool. And that term, Stained Glass: does it relate to the idea that a stained glass window, in the real world, blends and scatters light, so it kind of introduces some stochasticity?
Eiman Ebrahimi: 01:03:55
Yeah, yeah, it does. And we've spent the majority of the time today talking about language models and text as the data modality, but everything we've said about Stained Glass and the engine and how you learn these transformations is applicable to any machine learning model. With computer vision, when you look at what these transformed data records look like in the image space, or what transformed video looks like, it's exactly as if you've got a pane of stained glass and you're sitting very close to it. What you see on the other side is, as you're describing, this highly fragmented, scattered representation that you can't really make sense of as a human. But imagine that stained glass being specific to the model, as if the model were sitting on the other side.
Jon Krohn: 01:04:50
Nice. All right. This has been an amazing conversation. Something we dug up in our research that we can tie to the conversation we've just been having is that Protopia has published an insightful series of articles called The Executive's Guide to Secure Data and Impactful AI. We'll be sure to include a link to that in the show notes. It happened to be in partnership with Sol Rashidi, whom we've been talking about since the beginning of this episode.
01:05:15
And the second article in this series is called Risks in AI Systems and Mitigation Strategies. I'd love to hear from you what the biggest AI risks are that enterprises frequently overlook. Something that comes to mind for me, in the context of the homomorphic encryption you were just talking about, is that when people are doing compute, whether on a cloud device or at the edge, there are security vulnerabilities that they have no control over and that pop up all the time. Let's dig into that risk and how Protopia can mitigate it.
Eiman Ebrahimi: 01:05:55
Yeah. I think what we talk about in that three-part series around risks, and what's often overlooked, is that there are new usage models around AI and language models that are different from how data has been used in the past with any application. The main difference is that the systems are increasingly complex to maintain, and I'm not necessarily talking about the very large systems used to train these models. No: back to the example of just taking an open model, deploying it at scale for many users, and managing that. It's a non-trivial task; doable, but non-trivial. The enterprise needs time-to-value, so they need their infrastructure and application providers to help them do that. And the data owner in the enterprise doesn't really want to, nor can they, get involved with all the implementation details of what happens in these systems.
01:07:08
Now, a common thought process is to say: if the implementation is somehow inside my VPC, or if it's on-prem, it is much, much safer than if it's somewhere else. And I think that's something that gets overlooked sometimes, because at the end of the day, the implementation details are not particularly clear. You have some entity helping, and that entity can just be the IT department of your organization or someone external, and the data owner does not want to, or doesn't have the time to, get involved with every implementation detail: where did the data go, up to what point was it encrypted, at what point did it get decrypted, did it get stored after that, and if so, for how long? All the things that go into accepting whether or not an implementation is good. These deployment models and this sort of use of data are relatively new, and the data is nude: it's raw, it's unencrypted, and it's exposed.
01:08:23
That's funny, actually, a slip of the word in there. But the model is new. And so, organizationally, what we see enterprises grapple with is, again, this notion of: okay, what are all of the system security features that we need to put into play? That doesn't always consider, many people are aware of it, but it doesn't always consider, that even on on-prem systems, it often just takes one bad container being run to expose the entire system, because of some vulnerability in the software you were using to run it. Knowing that that's a possibility, you don't want to just block every use case that touches some tier of data that is sensitive in some way. And that can be code: code completion is one of the big, big use cases; post-RAG, it's the biggest enterprise use case that comes up.
01:09:33
Code completion, code refactoring, writing snippets of code within bigger bodies of code: these are use cases that pop up all the time. Yet it's very difficult to envision calling the code in an organization non-sensitive and saying it can be sent to whatever platform; it can't. So in order for those things to actually make progress, you need to be able to proactively make sure that if and when leakage of data happens, the data is not exposed in an unencrypted, nude, raw way. That's the thing we find organizations very interested in digging into, once they realize it's an aspect that is getting in the way of creating value.
01:10:22
Because when we talk to organizations, our point is not at all to say you shouldn't be doing something. There's plenty of tooling in the market that helps them stop requests going to services they don't want, or helps the CISO understand what systems are being used and what data sources are being connected to what models. Those are all very important parts of the overall AI security posture that the organization needs to have. But when you need to create value, how can you do that safely? That's the part we've been having a lot of interesting conversations around.
Jon Krohn: 01:10:58
Nice. Fascinating. And looking to the future and the next big trends coming in AI: as we make the shift from generative AI systems like LLMs being so effective, we're getting more and more into agentic AI, where we trust those generative systems to work independently, as opposed to just being called upon by us to provide some information. So, with agentic AI or whatever other shifts you see coming, how do they relate to the security-efficiency trade-offs we've been talking about throughout the episode?
Eiman Ebrahimi: 01:11:32
Yeah, I think it's interesting to observe that the space of applications around LLMs and AI is very quickly not going to be "there's an LLM, there's an application, we just have to protect that." The application is part of a broader system, potentially of agents; more and more, that seems to be the narrative of how the market is evolving to make use of these models. And what that means from a data security perspective is, again, that we need to think differently about "everything is just going to live on a system right next to where the data lives." Because if you've got agents, those agents are dealing with different data sources in potentially different places. Some of them will be on-prem, some will be in a private cloud, and some may be in a public cloud, served to you by an application provider that needs to run things multi-tenant in order to make their business model work.
01:12:34
So suddenly, the thinking about data exposure among these systems will need to be different. And it's not just us innovating in this space; there's a lot of innovation actually happening in homomorphic encryption, and it needs to be considered: where is it applicable? In fact, I think it was just a few weeks ago that Apple announced some new homomorphically encrypted capabilities, for things like information retrieval, that they're embarking upon. There are bits and pieces of the problem that may be solved by doing something in homomorphically encrypted mode. In fact, Stained Glass itself is a great, great application to run in homomorphic mode. You can imagine that if you're taking plain-text information and turning it into a transformed representation, doing that operation under complete encryption is fantastic: you do that part in a fully homomorphic manner, and then you release the rest of the computation, which is potentially very complicated and challenging to implement homomorphically, to run on whatever hardware is accessible and most efficient for it.
01:13:54
So when you ask the question of what will data security look like? I think data security will need to involve in these Agentic systems in more complicated use of models as components of a larger system solving a problem, we'll need to focus on where do those different components run? What are the acceptable exposure parameters of those systems in terms of the data you need to send it to, and how can you manage that in a programmatic manner? And Stained Glass, we believe is a big unlock for that sort of broader system and needs to and will combine with these other technologies.
Jon Krohn: 01:14:39
And so tying it to something that you discussed earlier. In the episode, you talked about when doing research, you want to be looking ahead to problems that are 5, 10 years away. And you just highlighted there again away that the kinds of solutions you're developing in Protopia will solve the problems of the future.
Eiman Ebrahimi: 01:14:56
Yeah. And we look really deeply into partnerships in order to facilitate delivering these cutting edge solutions in the fastest manner. I think one of the things that we see across the ecosystem is that from the largest of the businesses that have made all of this possible, all the way to the startups that are very active in this space and building a lot of very important technology, partnering and being able to deliver broader solutions is really essential to actually delivering value. And so we've spent a lot of time, again, across the stack. From the provider to the builders of the foundation models themselves, to the application providers, building on top of that, finding ways that we can unlock the use of data from the topmost user all the way down to the infrastructure that needs to crunch on that data in order to create value. How do we plug in across that stack is a big portion of being able to again, deliver the larger value that the industry really does need to survive.
Jon Krohn: 01:16:10
Very cool. So we recently met in person before recording this episode here in New York. We met in Austin, your hometown.
Eiman Ebrahimi: 01:16:20
Yes.
Jon Krohn: 01:16:20
And where Protopia is headquartered.
Eiman Ebrahimi: 01:16:22
Yes.
Jon Krohn: 01:16:23
And while we were having drinks, we got talking about Alan Watts. So I know that Alan Watts is somebody I've highlighted on the show before, I think episode 800, if I'm remembering correctly.
Eiman Ebrahimi: 01:16:35
Oh wow, that's accurate.
Jon Krohn: 01:16:36
Well, it was on the hundred. And so, for every hundredth episode I've done something special. And I'm pretty sure that for episode 800, it was a... I recited part of an Alan Watts speech, his dream speech. Well, whether it was 800 or 700, it'll be in the show notes. And so I also learned in that conversation that you are widely read. And so, I'm really interested now. A question I ask all of my guests. I can't wait to hear what your book recommendation is for us.
Eiman Ebrahimi: 01:17:16
All right. Well, since we're talking about Alan Watts, I feel compelled, even though there's many, many really amazing books that come to mind, but I think Alan Watts is, Wisdom of Insecurity?
Jon Krohn: 01:17:29
Right.
Eiman Ebrahimi: 01:17:29
Is a really important one. And I think the theme of it actually goes hand in hand with a lot of what happens in this sort of entrepreneurial space. Doesn't need to be connected to that. I think there's a lot of themes in life that fall in this world of what the book talks about, but a biggest takeaway just being that there's a great deal to be learned about how one lives, one's life. If we kind of focus on what our almost obsession on wanting to tell ourselves or other people that we know something for a fact does. And I think it highlights that it's not necessary. It highlights that a lot of the secondary effects of that obsession to feel like we know the answer-
Jon Krohn: 01:18:41
Or we know what's going to happen.
Eiman Ebrahimi: 01:18:43
Or we know what's going to happen, leads to not really having a good time in life.
Jon Krohn: 01:18:49
That's right. That's right.
Eiman Ebrahimi: 01:18:50
And unnecessary also. So the book makes a case for how there's a lot of freedom in how you approach life and how you can enjoy life given the very scarce opportunity that it is and how much more grateful you can be about how it is and what the experience is at any given moment by way of detaching from that obsession of needing to be sure about things. So it's a highly recommended read.
Jon Krohn: 01:19:26
Nice. Yeah, that was the book that I think got us talking about Alan Watts.
Eiman Ebrahimi: 01:19:28
Yes.
Jon Krohn: 01:19:29
You also recommended to me, and I've ordered it's sitting on the very top of my pile of books next to my bed, and I haven't had a chance to quite start it yet, because I'm not typically a multi-book reader. I finish before-
Eiman Ebrahimi: 01:19:41
What are you reading now?
Jon Krohn: 01:19:42
I am about to finish. I'm like a dozen pages away from finishing Kurt Vonnegut's, Thank you. Mr. Rosewater.
Eiman Ebrahimi: 01:19:50
Yes. Very good book.
Jon Krohn: 01:19:53
So I'm big into Kurt Vonnegut. Have been for more than a decade. But a year ago I decided to start reading all of his fiction novels in chronological order. I read one here and there starting with his most famous one, Sirens of Titan, Cat's Cradle, Slaughterhouse-Five, and then just randomly picking ones. And I was like, I want to read through all of them and I want to do it in order. And reading it in order is actually a really rewarding experience, because he has recurring characters and locations and sometimes they are actually the same character from other books or other times they just happen to have the same name. It's like coincidental. But it's interesting to see his thinking evolve.
01:20:42
And something else that was really interesting for me about doing it this way is that his books are always put in the science fiction section. But two of his early works, including God Bless You, Mr. Rosewater, that's what it's called. God Bless You, Mr. Rosewater is the name of the book, and God bless You, Mr. Rosewater as well as the book right before it in chronology, which is... Oh my goodness, I can't remember. I'm blanking at the name of it. But it follows... Yeah, I'm completely blank on the name of the book.
Eiman Ebrahimi: 01:21:18
We'll have to pick that one up and put it in notes.
Jon Krohn: 01:21:20
But yeah, exactly. But with both of those books, they aren't science fiction, they're just fiction. So he started with science fiction and then he briefly, at least for these two books, and actually I know the next book is Slaughterhouse-Five, which it does have science fiction elements. And so it's interesting that he... I wasn't aware until I did this process that he had the... Mother Night is the name of the book.
Eiman Ebrahimi: 01:21:44
Mother night. I've not read that one.
Jon Krohn: 01:21:48
Yeah, Mother Night and God Bless You, Mr. Rosewater, fiction, non-science fiction. And another thing that's interesting about doing it chronologically is that, so for example, I said, I have about a dozen pages left in God bless You, Mr. Rosewater. He mentions the bombing of Dresden on the 12th last page, and the whole next book Slaughterhouse-Five is about the bombing of Dresden. So it's kind of interesting because you get this kind of insight into the artist's thinking as they progress.
Eiman Ebrahimi: 01:22:22
Yep.
Jon Krohn: 01:22:24
Anyway- Eiman Ebrahimi: 01:22:25
You were saying about the other... What did I suggest you-
Jon Krohn: 01:22:28
You recommended Out of Your Mind?
Eiman Ebrahimi: 01:22:30
Oh, yes, yes. The lectures. Yeah. Out of Your Mind is a series of lectures by Alan Watts that have been kind of collected into that book.
Jon Krohn: 01:22:39
Nice. Yeah, I can't wait to read it because Wisdom of Insecurity was such an important book for me to read. Highly recommend it.
Eiman Ebrahimi: 01:22:47
Love that.
Jon Krohn: 01:22:48
Thank you so much for taking the time with us today. I've been blown away by the precision with which you speak, the clarity with which you speak. If people want to get more of you after this episode, how should they do that? How should they follow you?
Eiman Ebrahimi: 01:23:04
I think the place where most of this sort of information comes out is LinkedIn, so we can definitely connect there. And Eiman@ProtopiaAI is also my email, so happy to connect.
Jon Krohn: 01:23:18
Nice. Very kind of you to provide that email address. Yeah. Thank you so much, Eiman. And yeah, maybe we'll be able to check in again on the Protopia journey, see how we're iterating towards paradise in the coming years.
Eiman Ebrahimi: 01:23:34
Love that. Thanks for having me.
Jon Krohn: 01:23:40
What a fabulous episode with the exceptionally intelligent and clear Eiman Ebrahimi. In today's episode, Eiman covered how Protopia's Stained Glass Transform allows machine learning models to work with transformed data representations that preserve meaning for the model while being unintelligible if intercepted. He talked about the critical trade-off between security and cost and enterprise AI, that is that dedicated private infrastructure is secure, but too expensive typically, while shared infrastructure is cost-effective, but poses security risks. He talked about why data security is essential for getting a return on investment on AI investments that is without seamless security solutions, many valuable use cases never make it to production.
01:24:21
He talked about how multi-tenancy, multiple users sharing computing infrastructure creates security vulnerabilities even in seemingly private systems. How the future trend toward agent-based AI systems will require new approaches to data security as agents interact with data across multiple locations and systems. And he talked about the importance of proactive rather than reactive approaches to data security focusing on making leaked data unusable rather than just trying to prevent leaks.
01:24:49
As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Eiman's social media profile, as well as my own at superdatascience.com/843. Thanks of course to everyone on the Super Data Science podcast team, our podcast manager, Sonja Brajovic, our media editor Mario Pombo, partnerships manager Natalie Ziajski, researcher Serg Masis, our writers Dr. Zara Karschay and Sylvia Ogweng, and our founder Kirill Eremenko.
01:25:18
Thanks to all of them for producing another exceptional episode for us today for enabling that super team to create this free podcast for you. We are deeply grateful to our sponsors. You. Yes, you can support this show yourself by checking out our sponsor's links, which are in the show notes. And if you yourself are ever interested in sponsoring an episode, you can get the details on how to do that by going to Jonkrohn.com/podcast.
01:25:43
Share this episode with people who you think might love it. Review it on your favorite podcasting app. That's super helpful for us. Subscribe. Obviously, if you're not a subscriber and you like this show, yeah, subscribe. But the most important thing of all is that I just hope you keep on tuning in. I'm so grateful to have you listening, and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.
What we've done is look at encryption as adding stochasticity to the data. And Stained Glass makes a trade-off to say: what if the stochasticity we were adding to the data was not just random, which results in the challenges we just talked about with homomorphic encryption, but was curated for a target task so as not to have the latency penalties? How do you make it such that the target task can still operate on that stochastic data? The opportunity we found to bring this to reality was the nature of machine learning itself, which is why Stained Glass today is used primarily, in fact exclusively in the real world, with machine learning models as the target, not some arbitrary computation.
01:01:18
So, much like how we've always approached systems problems, we've constrained the problem to say: if our target was just machine learning, deep learning models, how would we use stochasticity to protect the data without taking the latency penalty? So when you ask what's unique about it, it's that particular trade-off that is being very deliberately made. And now the problem becomes, well, what is that stochasticity? How do you know what stochasticity won't impact a target model?
01:01:53
And this is where the really interesting math comes into play, where we formulate that particular problem as a machine learning problem itself. We say: we want to know what the stochasticity is, so let's go learn it. Let's use machine learning to learn the stochasticity. So what we do, what Stained Glass Engine, our core product, does, is take a given pre-trained model and, in a post-training step, reformulate the problem using the same machine learning tools that you're using today. It's essentially an extension to PyTorch, where you now go about learning, using machine learning, what stochasticity you can add to the data that you will in the future send to this pre-trained model, such that the model can still operate. And that involves some math that we can talk about.
01:02:51
But essentially that's what's happening, and that's what makes this something that doesn't change the tooling in a significant manner. It's a post-training step that you can drop into your existing training loop. That's one of our tenets. Second, it won't impact the latency of the operation at inference time in a meaningful way, because the complexity of that set of layers that we are learning can be controlled. We define it in a way that it's not going to just blow up in computational space. And third, we're doing all of this targeted at a machine learning model, so there are no keys involved and the model can just operate on the target data when the data is transformed.
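To make the mechanics of that post-training step more concrete, here is a minimal, hypothetical PyTorch sketch. It is not Protopia's implementation (Stained Glass Engine is proprietary); the module name, the noise parameterization, and the penalty weight are all illustrative assumptions. It only shows the shape of the idea Eiman describes: freeze a pre-trained model, and use gradient descent to learn the stochasticity that can be added to its inputs while the model keeps working.

```python
# Hypothetical sketch only: illustrates "learning the stochasticity" as a
# post-training step against a frozen model, not Protopia's actual method.
import torch
import torch.nn as nn

class LearnedStochasticTransform(nn.Module):
    """Learns input-dependent noise such that a frozen target model
    still works on the noised representation."""
    def __init__(self, dim: int):
        super().__init__()
        self.mu = nn.Linear(dim, dim)         # learned shift
        self.log_sigma = nn.Linear(dim, dim)  # learned noise scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterization: transformed = mu(x) + sigma(x) * eps
        eps = torch.randn_like(x)
        return self.mu(x) + torch.exp(self.log_sigma(x)) * eps

# Frozen, pre-trained target model (stand-in: a tiny classifier).
target_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
for p in target_model.parameters():
    p.requires_grad_(False)

transform = LearnedStochasticTransform(dim=16)
opt = torch.optim.Adam(transform.parameters(), lr=1e-3)
task_loss = nn.CrossEntropyLoss()

for step in range(200):
    x = torch.randn(64, 16)         # placeholder data
    y = torch.randint(0, 4, (64,))  # placeholder labels
    z = transform(x)                # transformed representation
    # Keep the task working on z, while rewarding added noise
    # (larger log_sigma) so the raw input is harder to recover.
    loss = task_loss(target_model(z), y) - 1e-3 * transform.log_sigma(x).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note how this drops into an ordinary training loop: only the transform's parameters are updated, which is consistent with the "no keys, no tooling change" framing above.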
Jon Krohn: 01:03:39
Wow, that is cool. And that term, Stained Glass, does it relate to the idea of how a stained glass window in the real world bends and scatters light waves, so it's kind of like it introduces some kind of stochasticity?
Eiman Ebrahimi: 01:03:55
Yeah, yeah, it does. And we've spent the majority of the time today talking about language models and text as the data modality, but everything we talk about with respect to Stained Glass and the engine and how you learn these transformations is applicable to any machine learning model. So with computer vision, when you look at what these transformed data records look like in image or video space, the transformed imagery looks exactly like you've got a pane of stained glass and you're sitting very close to it. What you see on the other side is, like you're describing, this highly fragmented, scattered representation that you can't really make sense of as a human. But imagine that stained glass being specific to the model, if the model was sitting on the other side.
Jon Krohn: 01:04:50
Nice. All right. This has been an amazing conversation. Something that we dug up in our research that we can tie to the conversation we've just been having is that Protopia has published an insightful series of articles called The Executive's Guide to Secure Data and Impactful AI. We'll be sure to include a link to that in the show notes. It happens to be in partnership with Sol Rashidi, whom we've been talking about since the beginning of this episode.
01:05:15
And the second article in this series is called Risks in AI Systems and Mitigation Strategies. I'd love to hear from you about the biggest AI risks that enterprises frequently overlook. Something that comes to mind for me, in the context of the homomorphic encryption you were just talking about, is that when people are doing compute, whether on a cloud device or at the edge, there are security vulnerabilities that you have no control over and that pop up all the time. Let's dig into that risk and how Protopia can mitigate it.
Eiman Ebrahimi: 01:05:55
Yeah. When we think about what's often overlooked, what we talk about in that three-part series around risks is that there are new usage models around AI and language models that are different from how data has been used in the past with any application. The main difference is that the systems are increasingly complex to maintain, and I'm not necessarily talking about the very large systems used to train these models. No. Back to the example of just taking an open model, deploying it at scale for many users: managing that is a non-trivial task. It's doable, but non-trivial. Enterprises need time to value, so they need their infrastructure and application providers to help them do that. The data owner in the enterprise doesn't really want to, nor can they, get involved with all the implementation details of what happens in these systems.
01:07:08
Now, a common thought process is to say: if the implementation is somehow inside my VPC, or if the implementation is on-prem, it is much, much safer than if it's somewhere else. And I think that's something that gets overlooked sometimes, because at the end of the day the implementation details are not particularly clear. You have some entity helping, and that entity can just be the IT department of your organization or someone external, and the data owner does not want to, or have the time to, get involved with every implementation detail of where the data went. Until what point was it encrypted, and at what point did it get decrypted? Did it get stored after that? And if it got stored, how long was it stored? All the things that go into accepting whether or not an implementation is good. These deployment models and this sort of use of data is relatively new, and it's based on nude... it's nude, it's raw, it's unencrypted, and it exposes the data.
01:08:23
That was actually a funny slip of the word in there. But the model is new. And so I think what we see enterprises grapple with organizationally is, again, back to this notion of: okay, what are all of the system security features that we need to put into play? And it doesn't always consider, many people are aware of it, but it doesn't always consider that even on on-prem systems, it often just takes a bad container being run to expose the entire system, because of some vulnerability in the state-of-the-art software you were using to run it. Knowing that that is a possibility, and wanting to not just block every use case that touches some tier of data that is sensitive in some way, and that can be code. Code completion is one of the big, big use cases now; post-RAG, it's the biggest enterprise use case that comes up.
01:09:33
Code completion, code refactoring, writing snippets of code in bigger bodies of code. These are use cases that pop up all the time. Yet it's very difficult to envision calling code in the organization non-sensitive, something that can be sent to whatever platform; it can't be. So in order for those things to actually make progress, you need to proactively make sure that if and when leakage of data happens, it's not exposed in an unencrypted, nude, raw way. That's the thing that I think organizations are very interested in digging into once they realize it's an aspect that is getting in the way of creating value.
01:10:22
Because when we talk to organizations, our point is not at all to say you shouldn't be doing something. There's plenty of tooling in the market that helps them stop requests going to services they don't want them going to, or helps the CISO understand what systems are being used and what data sources are being connected to what models. Those are all very important parts of the overall AI security posture that the organization needs to have. But when you need to create value, how can you do that safely? That's the part that we've been having a lot of interesting conversations around.
Jon Krohn: 01:10:58
Nice. Fascinating. And looking to the future, to the next big trends coming in AI: as we shift from generative AI systems like LLMs being so effective, we are getting more and more into Agentic AI, where we're trusting those generative systems to work independently, as opposed to just being called upon by us to provide some information. So with Agentic AI, what kinds of shifts do you see in the future, and how do they relate to the security-efficiency trade-offs that we've been talking about throughout the episode?
Eiman Ebrahimi: 01:11:32
Yeah, I think it's interesting to observe that the space of applications around LLMs and AI is very quickly not going to be: oh, there's an LLM, there's an application, we just have to protect that. The LLM is a part of a broader system of potentially many agents. That seems more and more to be the narrative of how the market is evolving to make use of these models. And what that means from a data security perspective is, again, that we need to think differently about this idea that everything's just going to live on a system right next to where the data lives. Because if you've got agents, those agents are dealing with different data sources, and those are potentially in different places. Some of them will be on-prem, some of them will be in a private cloud. Some of them may be in a public cloud, served to you by an application provider that needs to run things multi-tenant in order to make their business model work.
01:12:34
So suddenly the thinking about data exposure among these systems will need to be different. And it's not just us innovating in this space. There's a lot of innovation actually happening in the homomorphic encryption space, and it needs to be considered: where is it applicable? In fact, I think it was just a few weeks ago that Apple announced some new homomorphic encryption work for things like information retrieval that they're embarking upon. There are bits and pieces of the problem that may be solved by being able to do something in homomorphic encrypted mode. In fact, Stained Glass itself is a great application to run in homomorphic mode. You can imagine that if you're taking plain-text information and turning it into a transformed representation, doing that operation under complete encryption is fantastic. You do that one step in a completely homomorphic manner, but then you release the rest of the computation, which is potentially very complicated and challenging to implement in a homomorphic fashion, to run on whatever hardware is accessible and most efficient.
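As a rough illustration of that division of labor, here is a hypothetical sketch with the homomorphic step stubbed out; real HE libraries (e.g., Microsoft SEAL, OpenFHE) have their own APIs, which are not reproduced here. The point is only the split Eiman describes: a small transform is the natural candidate for encrypted evaluation, after which the heavy model runs on the already-protected representation on ordinary fast hardware.

```python
# Hypothetical sketch of the hybrid pipeline described above; the
# homomorphic-encryption (HE) step is a plaintext stub, not a real HE call.
import torch
import torch.nn as nn

transform = nn.Linear(16, 16)  # stand-in for a learned stochastic transform
big_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))

def apply_transform_under_he(x: torch.Tensor) -> torch.Tensor:
    # The transform is small and simple, so it is the natural candidate to
    # evaluate homomorphically: encrypt x, run the transform under HE, and
    # decrypt only the transformed representation, so the raw input is
    # never exposed to the serving infrastructure.
    return transform(x)  # stub standing in for the HE evaluation

x = torch.randn(1, 16)           # sensitive input, client side
z = apply_transform_under_he(x)  # protected representation
logits = big_model(z)            # heavy compute on whatever hardware is fastest
```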
01:13:54
So when you ask what data security will look like: I think data security will need to evolve. In these Agentic systems, with more complicated use of models as components of a larger system solving a problem, we'll need to focus on: where do those different components run? What are the acceptable exposure parameters of those systems in terms of the data you need to send them? And how can you manage that in a programmatic manner? Stained Glass, we believe, is a big unlock for that sort of broader system, and it needs to, and will, combine with these other technologies.
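One way to picture managing those "acceptable exposure parameters" programmatically is a small policy layer over the components of an agentic system. The sketch below is purely illustrative; the schema, field names, and rule are invented for this example and are not any product's API.

```python
# Hypothetical sketch: declaring and checking exposure policies for the
# components of an agentic system. Schema and names are invented.
from dataclasses import dataclass

@dataclass
class ExposurePolicy:
    component: str  # which agent / model component
    location: str   # "on-prem" | "private-cloud" | "public-cloud"
    data_form: str  # "raw" | "transformed" | "encrypted"

policies = [
    ExposurePolicy("retrieval-agent", "on-prem",       "raw"),
    ExposurePolicy("summarizer-llm",  "private-cloud", "transformed"),
    ExposurePolicy("translation-api", "public-cloud",  "transformed"),
]

def may_send(p: ExposurePolicy) -> bool:
    # Example rule: raw data never leaves on-prem; anything crossing a
    # boundary must be transformed or encrypted first.
    return p.location == "on-prem" or p.data_form in {"transformed", "encrypted"}

assert all(may_send(p) for p in policies)
```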
Jon Krohn: 01:14:39
And so, tying this to something you discussed earlier in the episode: you talked about how, when doing research, you want to be looking ahead to problems that are 5 or 10 years away. And you just highlighted there, again, a way that the kinds of solutions you're developing at Protopia will solve the problems of the future.
Eiman Ebrahimi: 01:14:56
Yeah. And we look really deeply into partnerships in order to deliver these cutting-edge solutions in the fastest manner. One of the things that we see across the ecosystem, from the largest of the businesses that have made all of this possible, all the way to the startups that are very active in this space and building a lot of very important technology, is that partnering and being able to deliver broader solutions is really essential to actually delivering value. And so we've spent a lot of time, again, across the stack: from the infrastructure providers, to the builders of the foundation models themselves, to the application providers building on top of that, finding ways that we can unlock the use of data from the topmost user all the way down to the infrastructure that needs to crunch on that data in order to create value. How we plug in across that stack is a big portion of being able to, again, deliver the larger value that the industry really does need to survive.
Jon Krohn: 01:16:10
Very cool. So we recently met in person, before recording this episode here in New York: we met in Austin, your hometown.
Eiman Ebrahimi: 01:16:20
Yes.
Jon Krohn: 01:16:20
And where Protopia is headquartered.
Eiman Ebrahimi: 01:16:22
Yes.
Jon Krohn: 01:16:23
And while we were having drinks, we got talking about Alan Watts. So I know that Alan Watts is somebody I've highlighted on the show before, I think episode 800, if I'm remembering correctly.
Eiman Ebrahimi: 01:16:35
Oh wow, that's accurate.
Jon Krohn: 01:16:36
Well, it was on one of the hundredths. For every hundredth episode, I've done something special, and I'm pretty sure that for episode 800 it was... I recited part of an Alan Watts speech, his dream speech. Well, whether it was 800 or 700, it'll be in the show notes. And I also learned in that conversation that you are widely read. So I'm really interested now in a question I ask all of my guests; I can't wait to hear what your book recommendation is for us.
Eiman Ebrahimi: 01:17:16
All right. Well, since we're talking about Alan Watts, I feel compelled, even though there are many, many really amazing books that come to mind. I think Alan Watts's The Wisdom of Insecurity?
Jon Krohn: 01:17:29
Right.
Eiman Ebrahimi: 01:17:29
Is a really important one. And I think its theme actually goes hand in hand with a lot of what happens in this sort of entrepreneurial space. It doesn't need to be connected to that; there are a lot of themes in life that fall into the world the book talks about. But the biggest takeaway is just that there's a great deal to be learned about how one lives one's life if we focus on what our almost-obsession with wanting to tell ourselves or other people that we know something for a fact does to us. And I think it highlights that it's not necessary. It highlights that a lot of the secondary effects of that obsession to feel like we know the answer-
Jon Krohn: 01:18:41
Or we know what's going to happen.
Eiman Ebrahimi: 01:18:43
Or we know what's going to happen, leads to not really having a good time in life.
Jon Krohn: 01:18:49
That's right. That's right.
Eiman Ebrahimi: 01:18:50
And it's unnecessary, also. So the book makes a case for how much freedom there is in how you approach life, how you can enjoy life given the very scarce opportunity that it is, and how much more grateful you can be for how it is and what the experience is at any given moment, by way of detaching from that obsession with needing to be sure about things. So it's a highly recommended read.
Jon Krohn: 01:19:26
Nice. Yeah, that was the book that I think got us talking about Alan Watts.
Eiman Ebrahimi: 01:19:28
Yes.
Jon Krohn: 01:19:29
You also recommended another one to me, and I've ordered it; it's sitting on the very top of the pile of books next to my bed. I haven't had a chance to start it quite yet, because I'm not typically a multi-book reader. I finish one before-
Eiman Ebrahimi: 01:19:41
What are you reading now?
Jon Krohn: 01:19:42
I am about to finish. I'm like a dozen pages away from finishing Kurt Vonnegut's Thank You, Mr. Rosewater.
Eiman Ebrahimi: 01:19:50
Yes. Very good book.
Jon Krohn: 01:19:53
So I'm big into Kurt Vonnegut. Have been for more than a decade. But a year ago I decided to start reading all of his novels in chronological order. I had read one here and there, starting with his most famous ones, The Sirens of Titan, Cat's Cradle, Slaughterhouse-Five, and then just randomly picking others. And I was like: I want to read through all of them, and I want to do it in order. And reading them in order is actually a really rewarding experience, because he has recurring characters and locations. Sometimes they are actually the same character from other books; other times they just happen to have the same name, it's coincidental. But it's interesting to see his thinking evolve.
01:20:42
And something else that was really interesting for me about doing it this way is that his books are always put in the science fiction section. But two of his early works, including God Bless You, Mr. Rosewater (that's what it's called, God Bless You, Mr. Rosewater is the name of the book), as well as the book right before it in the chronology, which is... Oh my goodness, I'm blanking on the name of it. But it follows... Yeah, I'm completely blank on the name of the book.
Eiman Ebrahimi: 01:21:18
We'll have to pick that one up and put it in the notes.
Jon Krohn: 01:21:20
But yeah, exactly. Both of those books aren't science fiction, they're just fiction. So he started with science fiction and then stepped away from it briefly, at least for these two books. And actually, I know the next book is Slaughterhouse-Five, which does have science fiction elements. So it's interesting that he... I wasn't aware until I did this process that he had that phase... Mother Night is the name of the book.
Eiman Ebrahimi: 01:21:44
Mother Night. I've not read that one.
Jon Krohn: 01:21:48
Yeah, Mother Night and God Bless You, Mr. Rosewater: fiction, non-science fiction. And another thing that's interesting about doing it chronologically is that, for example, as I said, I have about a dozen pages left in God Bless You, Mr. Rosewater. He mentions the bombing of Dresden on the twelfth-last page, and the whole next book, Slaughterhouse-Five, is about the bombing of Dresden. So it's kind of interesting, because you get this insight into the artist's thinking as they progress.
Eiman Ebrahimi: 01:22:22
Yep.
Jon Krohn: 01:22:24
Anyway-
Eiman Ebrahimi: 01:22:25
You were saying about the other... What did I suggest you-
Jon Krohn: 01:22:28
You recommended Out of Your Mind?
Eiman Ebrahimi: 01:22:30
Oh, yes, yes. The lectures. Yeah. Out of Your Mind is a series of lectures by Alan Watts that have been kind of collected into that book.
Jon Krohn: 01:22:39
Nice. Yeah, I can't wait to read it because Wisdom of Insecurity was such an important book for me to read. Highly recommend it.
Eiman Ebrahimi: 01:22:47
Love that.
Jon Krohn: 01:22:48
Thank you so much for taking the time with us today. I've been blown away by the precision and clarity with which you speak. If people want to get more of you after this episode, how should they do that? How should they follow you?
Eiman Ebrahimi: 01:23:04
I think the place where most of this sort of information comes out is LinkedIn, so we can definitely connect there. And Eiman@ProtopiaAI is also my email, so happy to connect.
Jon Krohn: 01:23:18
Nice. Very kind of you to provide that email address. Yeah. Thank you so much, Eiman. And yeah, maybe we'll be able to check in again on the Protopia journey, see how we're iterating towards paradise in the coming years.
Eiman Ebrahimi: 01:23:34
Love that. Thanks for having me.
Jon Krohn: 01:23:40
What a fabulous episode with the exceptionally intelligent and clear Eiman Ebrahimi. In today's episode, Eiman covered how Protopia's Stained Glass Transform allows machine learning models to work with transformed data representations that preserve meaning for the model while being unintelligible if intercepted. He talked about the critical trade-off between security and cost in enterprise AI: dedicated private infrastructure is secure but typically too expensive, while shared infrastructure is cost-effective but poses security risks. He talked about why data security is essential for getting a return on AI investments; that is, without seamless security solutions, many valuable use cases never make it to production.
01:24:21
He talked about how multi-tenancy, multiple users sharing computing infrastructure, creates security vulnerabilities even in seemingly private systems; how the future trend toward agent-based AI systems will require new approaches to data security as agents interact with data across multiple locations and systems; and the importance of proactive rather than reactive approaches to data security, focusing on making leaked data unusable rather than just trying to prevent leaks.
01:24:49
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Eiman's social media profiles, as well as my own, at superdatascience.com/843. Thanks of course to everyone on the Super Data Science podcast team: our podcast manager, Sonja Brajovic; our media editor, Mario Pombo; partnerships manager, Natalie Ziajski; researcher, Serg Masis; our writers, Dr. Zara Karschay and Sylvia Ogweng; and our founder, Kirill Eremenko.
01:25:18
Thanks to all of them for producing another exceptional episode for us today. For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors. You? Yes, you can support this show yourself by checking out our sponsors' links, which are in the show notes. And if you yourself are ever interested in sponsoring an episode, you can get the details on how to do that by going to Jonkrohn.com/podcast.
01:25:43
Share this episode with people who you think might love it. Review it on your favorite podcasting app. That's super helpful for us. Subscribe. Obviously, if you're not a subscriber and you like this show, yeah, subscribe. But the most important thing of all is that I just hope you keep on tuning in. I'm so grateful to have you listening, and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.