56 minutes

Machine Learning · Data Science · Artificial Intelligence

SDS 859: BAML: The Programming Language for AI, with Vaibhav Gupta

Podcast Guest: Vaibhav Gupta

Tuesday Feb 04, 2025

Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn


In this week’s guest interview, Vaibhav Gupta talks to Jon Krohn about creating a programming language, BAML, that helps companies save up to 30% on their AI costs. He explains how he started tailoring BAML to facilitate natural language generation interactions with AI models, how BAML helps companies optimize their outputs, and he also lets listeners into Boundary’s hiring process.


Thanks to our Sponsors:



Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.

About Vaibhav  
Vaibhav Gupta is the Founder and CEO of Boundary, a Y Combinator startup developing a new programming language (BAML) that makes LLMs both easier and more efficient for developers. Across nearly a decade in software engineering, Vaibhav has built predictive pipelines and real-time computer vision solutions at D. E. Shaw, Google's Augmented Reality groups, and Microsoft HoloLens. In his free time, Vaibhav dabbles in competitive table tennis and board games, and various aspects of compilers. 

Overview

Among hundreds of applicants, Boundary was one of just 10 startups selected to present at The GenAI Collective, and the company also won the Collective's Technology Award, so Jon had a lot to ask Vaibhav about Boundary's impressive underlying programming language, BAML. The language, whose name stands for "Basic-Ass Machine Learning," sprang from the observation that early web code was notoriously difficult to maintain and easy to break. If a well-meaning intern, forgetting a closing <div>, could break a corporation's website, Vaibhav felt that something had to change. He points to React, which reshaped how developers thought about web development, as the model for what he set out to build when he started BAML through Y Combinator.

Vaibhav also drew on the origins of web development to understand prompt engineering and its future. While he expresses skepticism over the use of the word "engineering" for a task that doesn't require complex mathematical processes, he does think prompt engineering is more complex than many AI practitioners give it credit for, and that the field is still in its nascency.

Jon and Vaibhav also discussed another AI trend, retrieval-augmented generation (RAG). RAG gives generative AI models the ability to retrieve information from a specific group of documents, and it is enormously helpful for companies that want to surface domain-specific knowledge. For Vaibhav, the vital component of RAG is ensuring the model has access to high-quality, cleaned-up data. Domain-specific knowledge also matters for generative UI, where understanding what an end user wants from the tool is essential to its development. Vaibhav says developers should ask themselves whether their tool has a purpose that would encourage users to choose it over querying ChatGPT, and to build the best UX for the problem: "If you're trying to analyze stocks, show me the ticker in real-time, show me the graphs."

So, where are Boundary and BAML heading next? Vaibhav says that the key to success with AI startups is knowing where not to set their sights because the possibilities are so vast. He is interested in designing ways to send specific users (e.g. paid and free) to a designated LLM.

Listen to the episode to hear how Vaibhav launched his AI startup, how to get a job at Boundary, and how to start using BAML today!

In this episode you will learn:
  • (04:53) What BAML stands for 
  • (14:33) Making prompt engineering a serious practice 
  • (18:00) How BAML helps companies 
  • (23:30) Using retrieval-augmented generation (RAG) 
  • (43:09) How to get a job at Boundary 

Jon Krohn: 00:00
This is episode number 859 with Vaibhav Gupta, founder and CEO of Boundary. Today’s episode is brought to you by ODSC, the Open Data Science Conference.

00:16
Welcome to the SuperDataScience Podcast, the most listened to podcast in the data science industry. Each week, we bring you fun and inspiring people and ideas, exploring the cutting edge of machine learning, AI, and related technologies that are transforming our world for the better. I'm your host, Jon Krohn. Thanks for joining me today. Now let's make the complex simple.

00:50
Welcome back to the Super Data Science Podcast. Today, we've got a highly technical episode, I know a lot of you love that, with the brilliant, well-spoken engineer and entrepreneur, Vaibhav Gupta.

00:59
Vaibhav is founder and CEO of Boundary, a Y Combinator-backed Seattle-based startup that has developed a new programming language called BAML, B-A-M-L, that makes working with LLMs easier and more efficient for developers. Across his decade of experience as a software engineer, Vaibhav has built predictive pipelines and real-time computer vision solutions at the likes of Google, Microsoft, and the renowned hedge fund D.E. Shaw. He holds a degree in computer science and electrical engineering from the University of Texas at Austin. As mentioned at the episode's outset, this is a relatively technical episode. The majority of it will appeal to folks who interact with LLMs or other model APIs, hands-on with code.

01:37
In today's information-dense episode, Vaibhav details how his company pivoted 13 times before settling upon developing a programming language for AI, and why creating a programming language was, in his words, "really dumb," but why it's turning out to be brilliant, including by BAML already saving companies 20 to 30% on their AI costs. He talks about fascinating parallels between today's AI tools and the early days of web development, and about his unconventional hiring process, I've never heard of anything remotely close to it, and the psychology behind why that unconventional hiring process works. All right. Ready for this awesome episode? Let's go.

02:20
Vaibhav, welcome to the Super Data Science Podcast. I'm so excited to have you here. Where are you calling in from today? 

Vaibhav Gupta: 02:27
I'm actually in SF, but normally, I would be in Seattle.

Jon Krohn: 02:30
You say that, but in fact, I have only ever seen you in San Francisco.

Vaibhav Gupta: 02:36
That is true, and in fact, many people have only ever seen me in San Francisco, mostly on Tuesdays while I'm here.

Jon Krohn: 02:43
My sample size is small though. It's an N equals one sample size of where your location is. I've only met you in person one time. That was in December at an event run by The GenAI Collective. It was a really cool event and you'd already surpassed quite a big hurdle to be invited to be one of the 10 startups presenting at The GenAI Collective. Hundreds of startups applied, and then from that, you won an award as well, didn't you? You were one of the companies that won.

Vaibhav Gupta: 03:18
Yeah, we won the Best Technology Award, which was surprising, but it also felt awesome to see the community react well to what we had been building.

Jon Krohn: 03:27
Yeah, and then I would say from my own personal award to you, I don't know if I mentioned this in person and I was about to embarrass you with this right before we started recording, but I was like, "Let's do it on air," which is that your presentation was easily the funniest. Some of the things that I remember, so you had things like... I don't know if these are zingers that you've learned to bake in over time. You get audience reactions and you're like, "That one I've got to remember for future pitch standup opportunities like this one." But things like you started off by saying we did something really dumb or really stupid, which was creating a programming language. Definitely something you shouldn't do.

Vaibhav Gupta: 04:09
Yeah, maybe it's funny, but it's also true. You probably should never make a programming language, and you're setting me up for failure here because now I have to meet the expectation of trying to be funny on this podcast. Now we'll see what the people expect while they're listening in.

Jon Krohn: 04:25
You've got it naturally. I spent some time with you afterward and I know you'll nail it. Although sometimes people probably just have bad days, you could be having the worst day. So, you're the CEO and co-founder of Boundary, which is the creator of BAML, B-A-M-L, a programming language, and it's an expressive language for text generation specifically. So, our listeners, we probably have a ton of listeners out there who are calling LLMs, finetuning them for various purposes, and BAML is designed for them. So, tell us about BAML, what the acronym means, why you decided to do this acronym first.

Vaibhav Gupta: 05:06
So the acronym first... BAML stands for Basic-Ass Machine Learning, but if you tell your boss, you can say basically a made-up language. But the premise of BAML really came from this theory around how web development started out. So, when we all started coding, at least for me when I started coding websites, it was all a bunch of PHP and HTML hacked together to make websites work. Then I remember interning at Meta and they were the ones that made React. I think part of the reason why they made React was because their code base was starting to get atrocious to maintain. Imagine having a bunch of strings concatenating your HTML syntax, and now an intern comes in, like myself, forgets a closing div and now your newsfeed is busted.

05:56
It's not really the way we want to write code where multi-billion dollar corporations rely on an intern closing strings correctly. It's not really even the intern's fault, because how could they really read a giant blob? I barely read essays. How could the intern do that? But a compiler like React could actually correct for those mistakes if you add HTML and JavaScript into the same syntax. By creating a new syntax, those ideas become much more easily expressed. Now in two milliseconds, you get a red squiggly line saying you didn't close this div tag. In that web development loop, it just reframed the way we all started thinking about web development.

06:38
Instead of being like things are going to be broken, we could do state management because React handled it for us. We could do things like hot reloading a single component and having the state around it persist because React did that for us. It was tastefully done even though it required learning something new. We asked, in this AI world that we're all headed towards, what's going to be true, and we think a few things are. One, every code base will have more prompts in each subsequent year than it did the previous year. If that is true, we probably don't want all these unclosed-div-tag types of mistakes sticking around forever.

Jon Krohn: 07:17
When you say prompt, you mean like an LLM prompt?

Vaibhav Gupta: 07:21
Yeah, calling an LLM of some kind. LLMs, I think, are one start, but I think all models in general are going to be used long-term. Models are only going to become easier to use for people that know nothing about machine learning in the future.

Jon Krohn: 07:39
So we've done episodes recently. For example, people can listen to episode 853, where we talked about this generalization of LLMs to foundation models more broadly. Take a vision model, for example, where you don't necessarily need to have a language input or output, but even with that model, even in a vision use case, it could be helpful. It could make things easier for people calling on that vision model if, instead of having to write code, they can use a natural language prompt. So, I 100% agree with you: more and more often, with the models that we're calling, whether they're big foundation models, including LLMs specifically, or smaller models, having natural language prompts in there makes it very easy to get what you're looking for, maybe even just out of a plot.

Vaibhav Gupta: 08:26
Yeah, exactly. I think the thing that we have to think about as the stuff becomes more and more prevalent is actually developer tooling that has to come with it. Just like how React had to exist for Next.js, TypeScript, and all these other things to come out and make our lives a lot better in the web development world, we asked what has to exist in the world of LLMs and generally AI models as a developer, not as the people perhaps producing the models because that's a different world, but just the people consuming the models.

09:01
No matter how good the models get, at some point, you have to write bits on a machine that flip, and that's code. It has to plug into your code base in a way that makes sense. Just like JavaScript sucks and TypeScript is way better because of the type safety and the static analysis errors that we get, we wanted to do a bunch of algorithmic work that reframes the problem for users when we made BAML.

Jon Krohn: 09:26
Nice. I learned from you just before we started recording, and this was probably something that I intended to have later in the episode, but I feel like it's kind of interesting now as you're talking about why you created BAML, that you were a Y Combinator company. When you were accepted to Y Combinator, you were expecting to be a competitor to Slack. That's what was on your application deck. In reality, you had 13 pivots before landing upon BAML. Do you want to tell us about that adventure? I mean, you could even tell us about why you did Y Combinator at all and then the initial idea and all those pivots.

Vaibhav Gupta: 10:07
My co-founder and I met over eight or nine years ago while we were just graduating from college in 2015; we both graduated from UT Austin. We just became friends because we're both gluttons for misery. Eight or nine years later, he had worked for a while at Amazon. I worked for a while at some other companies, Microsoft, Google, and a few hedge funds. I was like, "That was really fun, but I want to do something dumb." The dumb thing was I wanted to go and innovate and create something from scratch that no one had ever seen before, because we're both builders and building things just gives us that level of satisfaction. So, when we started out, I remember playing around in the ed tech space for some time.

10:50
We tried out the creator economy, and we thought we had this edge because I did stuff in the ed tech space before that. My co-founder used to run a YouTube channel that, oddly enough, got hundreds of thousands, I think millions, of views for a while. So, we played to our edge and then eventually we're like, "Oh, we both hate remote work. Let's go solve this Slack thing because Slack sucks." Slack is way better than email for sure, but it's so distant and everyone on Slack feels like a colleague. What I love about work is having friends. Same reason me and Aaron, my co-founder, are friends. So, we tried to solve that problem and we applied to Y Combinator with it. I've actually applied four times to Y Combinator over the last 10, 12 years.

11:39
And this was the first time... So, we've gone to the interview a few times; I've gone to the interview in three of those four tries, but this last time was the first time we got in. I remember very distinctly the partners in the interview being like, "This Slack thing is not going to work. Do you guys have other ideas?" It's fortunate that we had both worked on hard technical problems beforehand so we could talk about them. But I also remember distinctly saying, "We think it's going to work. We probably shouldn't change our idea." But the key insight to us was just that Aaron spent nine years in distributed systems. I spent 9 or 10 years building algorithms in assembly. We probably should be building UIs, just a small hunch.

12:23
A while later during the Y Combinator batch, we actually decided to move away from Slack and try to go do something in the machine learning space. The batch we joined was Winter '23. That was the batch when ChatGPT came out. It was wild in SF that month, well, those three months when we were there. Everything was moving so fast, there was so much information, but one thing was constant. Just due to our backgrounds and us being slightly more seasoned engineers than some folks, not saying that they're bad, we had just worked in industry a little bit longer than a lot of other folks.

13:04
We were naturally just answering a lot of questions from a lot of our batchmates, and we figured out most of the questions were about machine learning, so we just helped them. It was fun. We did it on top of our startup, which was not related to AI at all. So, then it just made sense for us to move in that direction. When we realized the real direction we wanted to play in, it wasn't finetuning. It wasn't even any of the SDK layer, because no matter whether you build a Python SDK or a TypeScript SDK, and there's the Vercel AI SDK, there's Pydantic AI and a whole bunch of other systems, the fundamental problem is expression. The expression that you can make in those SDKs is restricted by the language.

13:50
Lastly, I think every person in the world, every piece of software in the world, is going to use AI in some form or another. Whether you're a C++ shop, whether you're a Java shop, whether you're a Go shop, all of them will use something. Anytime we need something that has cross-language compatibility, it's always been a new syntax. JSON, YAML, TOML, they're all syntaxes that are globally supported. I think that's a really overlooked point in a lot of systems that I've seen.

Jon Krohn: 14:21
Nice. That was really great background on the problem that BAML solves. One of the key things in there is obviously this solving some of the interfacing problems around prompts, and we will get to a number of details related to that. But first, I wanted to address this idea of prompt engineering more generally and just get some of your thoughts. Our researcher here wrote that it's a joke discipline. I mean, it is in the sense that it's definitely not engineering. There's no future where there's people doing mechanical engineering and electronics engineering, and then you've got a four-year major in prompt engineering.

15:01
So, engineering is being applied to this idea of prompt engineering, and when you use that strong a word, it is a joke. So, yeah, prompt engineering is part craft, part alchemy, and it's getting better all the time, with LLMs anticipating what humans are providing. But you've been studying prompt inputs and outputs at a granular level for a long time. How can creating high-quality prompts become a rigorous practice with the boring reliability we've come to expect from other software tasks?

Vaibhav Gupta: 15:41
Yeah, I think the key part that you mentioned is reliability because that's what we all want. The prompt is just a way to transform some data that we provide to some other data that we really want. In the case of the classification problem, you take a transcript and you categorize it into, "Is this a conversation with a doctor and a patient? Is this a conversation about healthcare, general well-being, or some other category?" When I think about the rigor and the exercise there, it's funny that you say that you feel that this isn't rigorous enough or not complex enough because I used to say the same. I remember when this stuff first happened and Aaron was like, "AI engineer is becoming a term."

16:21
I was so vehemently against it. I was offended that that term was being used to refer to someone prompt engineering, because the term AI engineer, to me, meant people that studied neural networks and actually have a PhD and can describe and have a detailed conversation around all this. I felt like the old grandfather on the lawn saying, "Get off my lawn." It was like attacking me at my core. But one thing I've actually come to appreciate over time is that there's a little bit more nuance that goes into using a model than people give credit for. For example, we think of prompt engineering as just this idea of putting strings into a model, but we used to think about web development as just this idea of putting plain text out there.

17:14
It evolved into reactive components, which have all sorts of complexity over time. I think it's the same way. I used to look at people building websites before and I was like, "Ah, that's not hard. That's so easy. It's not real software engineering. Front-end engineering isn't real." But now when we look at front-end engineering, it's complex. It's an art. It takes a tremendous amount of skill to build a beautiful website. When I think about prompt engineering, I think we're still in those early days of that plain-text, HTML, no-CSS, no-interactivity world. But I think we're headed towards the point where there eventually will be a real discipline called prompt engineering, right? I do think it'll be a well-studied discipline over time.

Jon Krohn: 17:58
Interesting. Yeah. Okay.

Vaibhav Gupta: 18:00
Hot take, I guess.

Jon Krohn: 18:01
Yeah, I mean that is interesting. We've talked about the kinds of problems that BAML is solving. Can you give us maybe some key use cases or provide us with some color? Actually, we were talking about how you've been on other podcasts where you've done screen sharing to make it visual. I think we're going to try to, if we can here, stick mostly with audio descriptions. What are the kinds of things that BAML is doing that I wouldn't be getting if I just go and I use an OpenAI API and I provide it with a natural language prompt? What am I missing out on that I would be getting if I was using BAML instead?

Vaibhav Gupta: 18:40
I think there's two big things that people usually, at least people have told us that they love about BAML. The first one is the syntax is so clear that even their PMs can use it, which is high value for-

Jon Krohn: 18:55
Zing.

Vaibhav Gupta: 18:57
Which is what I find to be high value for folks. The second part is really the hot reload loop. Again, I'll loop back to web development because I think it's a really good analogy for how machine learning works. When I do web development, I change my code, I look at the browser. If it doesn't match what I want, I go back, change my code, hit Command-S, and the thing is refreshed. In about 10 minutes, I can try 15 different styles and make it work with Tailwind and React and whatever else I want to go do. Today, when I see people trying out AI pipelines without using BAML, I see them try maybe five prompts in 20 minutes because their testing loop is totally busted. They have no hot reload loop.

19:40
With BAML, in 20 minutes they can do 240 prompts because it takes five seconds to test each one. With that hot reload loop, you're just not going to get bored because you're not sitting around twiddling your thumbs. That XKCD of "my code is compiling" is so true in the world of prompt engineering. You're literally just waiting for the model to run. You're like, "Oh, I have to run these five commands to run my test case," or "I have to run my whole pipeline end to end to experience this one prompt that's in the deep end of it," when really you could just write a quick little unit test. But unit tests aren't really fun to write in Python or TypeScript or any of these languages because they're not designed for it.

20:20
Rust has this really cool feature where you can write a test in any file. When you write a test in any Rust file in VS Code, there's this little plugin that says "run test." You just click it and it runs the test for you. It's so fast, you just write more code. Rust code doesn't just magically work, it's just more testable. So, it more magically works because people write better code. We do the same thing for prompting.
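(To make that testing loop concrete, here is a minimal sketch of an inline BAML test, assuming a BAML function ClassifyMessage(convo: string) has been defined elsewhere; the function, parameter, and test names are invented for illustration:)

```baml
// Hypothetical inline test. With the BAML VS Code extension, saving
// the file lets you run this immediately, giving the hot-reload loop
// described above.
test ClassifyDoctorChat {
  functions [ClassifyMessage]
  args {
    convo #"
      Patient: I've had headaches since starting the new medication.
      Doctor: Let's review your dosage.
    "#
  }
}
```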

Jon Krohn: 20:49
Nice. I like that soundbite. That's great. Maybe we can turn that into a YouTube Short. Nice. So, thanks for the general overview of why BAML would be helpful. Something that is super trendy right now other than agentic AI, another big trend right now is RAG, retrieval augmented generation, where people have some large number of documents that they want to be able to search over and get natural language responses over. So, let's say you've got a million documents in your company. They're all like insurance claims or something, and you can use an encoding large language model to take each of those million documents and convert them into this vector representation, which allows you then to search very rapidly for related documents.

21:42
So, now you are an insurance claims person. I don't know what titles are in insurance companies. I don't know why I went into insurance. I don't know anything about it, but you're an insurance company employee. You have this new claim come in and you want to look at similar claims to that one. So, you can use a retrieval augmented generation to say, "Here's a question I have, or here's a claim that I have. How does this relate to previous claims that we've dealt with in the past?" That new claim that you have or that new natural language inquiry that the insurance company employee has is in real time, very rapidly in hundreds of milliseconds converted into a vector representation as well.

22:32
Then you can do really simple, fast math, and there are tricks to have this work over very large spaces, even if you have billions or trillions of documents, and you bring back the most relevant documents. So, let's say in that insurance company example, we had a million. There are maybe six documents that are determined to be closely related in semantic meaning, whose natural language is similar to the user's query. Those six documents come back and then we can use all six of those documents as context to a large language model that can come up with a great answer based on that.

23:08
So, retrieval augmented generation links together a bunch of different technologies, encoding LLMs and generative large language models, in order to give potentially great insights over vast amounts of documents that a human could never look over manually and that keyword searches would miss a lot of information in. So, RAG, very cool, very powerful. What are some kinds of insights or pain points in RAG, in retrieval augmented generation? It sounds like, from some research we did on past interviews you've given, that RAG itself was part of what led you to pivot towards developing BAML.

Vaibhav Gupta: 23:49
Yeah, so we actually started off heavily indexed on RAG pipelines; that was the original journey we started on as one of our 13 pivots. The reason that we moved away into the more general world of just using LLMs is that the thing that makes RAG really good is really high-quality data. One element of really high-quality data that you touched on was this ability to pull the relevant documents. That's just a thing that someone has to do. You can't really help them with it. It's very data dependent, which is why we moved away from it, because we don't believe that there's a general RAG solution for everything. Just like there's no general web component for the perfect accordion everywhere, you have to use something like shadcn and build your own accordion that matches your theme and your styles.

24:36
One element of RAG that not a lot of people think about is really around this idea of how you actually put the context into the prompt itself. So, imagine if I was saying like and uhm between every other sentence, you guys would immediately tune out. This would not be a fun podcast and conversation to listen to. But whenever you add a bunch of JSON blobs for example, as context into your prompt, you're doing the same thing. You're putting a lot of things that the model doesn't care about. You're putting a bunch of quotation marks, you're putting a bunch of escape characters, a bunch of colons, and that doesn't actually make the prompt easier for the model to read. It just makes it possible for you to run JSON.parse on it.

25:17
Those are likes and uhms that the model has to go remove. If you put an image into the prompt, the way that you orient the image, the sizing that you use, even the text you put around the image to help it understand what the context of the image is, whether you put it above or below the image, matters. These are things we found not a lot of people pay attention to, but with BAML, we made it obvious what you're doing and where you're doing it. We made it possible to detect these likes and uhms because right in VS Code, just like you can see what a Markdown file renders as, we actually show you what the prompt renders as, with the images, with the audio, as you're typing.

25:59
So, you can find these likes and uhms and actually be like, "Oh, that looks ugly to me." That probably means it's ugly to the model, which probably means it's hard for the model to understand, because these models are trained on human data.
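(To make Vaibhav's prompt-rendering point concrete, here is a hedged sketch of how typed context can be rendered into a BAML prompt as plain lines rather than a JSON blob; the class, function, and field names are invented for this example:)

```baml
class ClaimSummary {
  id string
  summary string
}

// Hypothetical RAG-style function: retrieved claims come in as typed
// data, and the Jinja loop renders them as clean bullet lines instead
// of a JSON blob full of quotation marks, colons, and escape
// characters. The VS Code preview shows exactly what the model sees.
function AnswerClaimQuestion(question: string, claims: ClaimSummary[]) -> string {
  client "openai/gpt-4o"
  prompt #"
    Prior claims for context:
    {% for c in claims %}
    - {{ c.id }}: {{ c.summary }}
    {% endfor %}

    {{ question }}
  "#
}
```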

Jon Krohn: 26:09
All right. So, in addition to RAG, another common thing that happens: beyond likes and uhms, those kinds of verbally meaningless things that can end up in a prompt and are relatively innocuous, like a benign tumor, there are also malignant tumors: ambiguity and errors. So, how does BAML handle instances where input data might lead to ambiguous or incorrect outputs? How does this capability improve reliability in AI-driven applications?

Vaibhav Gupta: 26:45
Yeah, I think that's where the syntax shines a lot, because the problem with English is that it's a really, really poor language for actually describing what you want. It's amazing for rapid, fluid conversation like we're having right now. It's horrible for written instructions. If you've ever given someone a 20-bullet-point list of instructions of things you want them to follow, they will mess up on one of them. They do it for a couple of reasons. One, because you probably have some contradictions in there just naturally by having written them out, and you're relying on the reader's inference to know which ones are relevant to your context. The more informed they are about you, the better they'll do at it. But what we do is we say every prompt is really just a function.

27:31
A function takes in parameters and every parameter is type-safe. So, it's not a message. It's a message that is a class that has a role and a content. It's not an invoice. It's actually an invoice and all the parameters that exist in it. So, it's strongly typed and every function has to describe what it's going to return. So, if you're going to return a bunch of categories, you're returning an enum, which has a specific set of categories that are described. Instead of injecting your prompt as a giant string, you actually break down your prompt into semantically meaningful chunks. So, in the case of, let's say, that insurance example you were talking about earlier, your insurance company might be processing millions of different types of documents.

28:17
One of those documents might be a new claim, one of those might be an update to an existing claim, and one of those might be just a regular new customer inbound. Instead of describing all those rules in the core prompt, which is in English, you would actually attach a description next to each one of those categories in the enum. So, now your code becomes more readable. To understand what it means to be a new customer inbound, you only have to read that one section of the prompt and that natively makes you write shorter prompts and almost makes you write your prompts like code, but with the flexibility of English. I think that gives a balance of both worlds, but we want these prompts to be able to do anything.

28:59
But we also don't want a two-page essay that no one actually ends up reading, that we just keep adding to until we eventually realize we have a list of contradictions, and no wonder the model's been behaving poorly for the last six months.
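(A rough sketch of what Vaibhav describes, in BAML syntax, using the insurance example; the names and category descriptions here are invented, not taken from the episode:)

```baml
// Each category carries its own description, so the rules live next to
// the enum value instead of inside one long English prompt.
enum DocCategory {
  NewClaim @description("A brand-new claim from an existing policyholder")
  ClaimUpdate @description("New information about a claim already on file")
  NewCustomerInbound @description("A first message from someone who is not yet a customer")
}

// The prompt is a typed function: strongly typed input, enum output.
// {{ ctx.output_format }} renders instructions for the return type.
function ClassifyDocument(doc: string) -> DocCategory {
  client "openai/gpt-4o-mini"
  prompt #"
    Classify this insurance document.

    {{ doc }}

    {{ ctx.output_format }}
  "#
}
```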

Jon Krohn: 29:12
Do you ever feel isolated, surrounded by people who don't share your enthusiasm for data science and technology? Do you wish to connect with more like-minded individuals? Well, look no further, Super Data Science community is the perfect place to connect, interact, and exchange ideas with over 600 professionals in data science, machine learning, and AI. In addition to networking, you can get direct support for your career through the mentoring program, where experienced members help beginners navigate. Whether you're looking to learn, collaborate, or advance your career, our community is here to help you succeed. Join Kirill, Hadelin and myself, and hundreds of other members who connect daily. Start your free 14-day trial today at superdatascience.com and become a part of the community.

29:57
Nice. Yeah, so you've provided lots of examples here of ways that BAML makes calling an LLM or some other model with a prompt better, getting better results back. So, we've talked about things like being able to do prompt testing rapidly, being able to iterate quickly over those, being able to handle RAG use cases, being able to handle ambiguity and errors.

30:23
One last thing that I want to get into in terms of an advantage, and there might be others that come up organically, but something that you talked about at The GenAI Collective pitch day that I was at in San Francisco in December was token efficiency, which was something that was really surprising to me. It was something that I hadn't thought about at all, that you could actually create a language like BAML to allow your inputs to, and your outputs from, say, an LLM to be much more token-efficient. So, tell us a bit about that and how that can lead to major operational cost savings.

Vaibhav Gupta: 30:59
Yeah, and this was really inspired by just having spent 10 years in performance optimizations and hand-rolling assembly for a while. What I really learned in that journey was that I was a pretty damn good performance engineer, but the compiler beat me every time. Not because it wrote better code than me, but because, just on a time-per-dollar value, the compiler could in the same amount of time optimize way more code than I could. So, it made sense for me to optimize some parts of the code, but not all of the code. With prompting and token efficiency, we have a similar take.

31:35
You should probably hyperoptimize one or two of the prompts that are super, super critical to you, but for 90% of the prompts, you just want something to do a really damn good job at it. The first thing we thought about with performance optimization is that everyone is using structured outputs, or function calling, which is this idea where an LLM is given a bunch of tools. Let's say we give it access to a weather API, and the weather API takes either a zip code or a city and state. Then the LLM also has access to a restaurant booker, where it has to take in the name of the restaurant and its address of some kind.

32:16
Then lastly, it gets a restaurant finder or something, where I give it, again, a city and state. If I ask the model, "What's the weather today?", it should pick out the weather tool and fill out the parameters based on whatever context was provided. The thing is, the standard way to send that data between the model and your software today is JSON, and JSON, as we talked about earlier, has a bunch of likes and uhms. It doesn't make sense to force the model to follow a standard that we built for web development, with all these quotation marks, this strictness in its definition: you have to have that quotation mark there, you can only have single-line strings, you can't even put comments inside of a JSON file.

33:05
To us, what we said was, "What if there was a different format?" So what we did is we spent about eight months writing a new algorithm called Schema-Aligned Parsing, which is actually able to take the model's response and automatically infer it against the data model that you provided. If it made some mistakes, like it forgot the quotation marks around your data, it forgot a comma at the end of a line, it gave you a string when you expected a number, all sorts of mistakes that models will make because they're probabilistic in nature, we algorithmically correct for that in under five milliseconds. Again, not saying that it's perfect, but it does the same thing that a compiler does, which is it just does it more often, more correctly, than I do.
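(Sketching the weather/restaurant example above in BAML, with invented class names: each tool becomes a class, and the function's union return type is what Schema-Aligned Parsing coerces the model's possibly messy output into:)

```baml
class WeatherQuery {
  city string
  state string
}

class BookRestaurant {
  name string
  address string
}

// The model "picks a tool" by returning one of these types. Even if its
// raw output drops quotation marks or adds trailing commas,
// Schema-Aligned Parsing coerces it into the declared type, which is
// why this approach works with models that lack native function calling.
function ChooseTool(request: string) -> WeatherQuery | BookRestaurant {
  client "openai/gpt-4o"
  prompt #"
    {{ request }}

    {{ ctx.output_format }}
  "#
}
```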

Jon Krohn: 33:49
Nice. Then so by having token efficiency, even if it is, I don't know... so on average, what do you think? It saves you maybe like 10% in terms of tokens?

Vaibhav Gupta: 33:59
We've seen customers save like 20 to 30% tokens easily on outputs.

Jon Krohn: 34:03
Oh really?

Vaibhav Gupta: 34:03
Yeah. So, it's a lot faster.

Jon Krohn: 34:04
Oh wow.

Vaibhav Gupta: 34:05
But I think one thing that's overlooked is not just that it's faster and cheaper, but the fact that it works with every existing model. So, DeepSeek-R1 came out recently and they released that model without function calling. We have user... And same with OpenAI's o1 models, they released that without function calling. We have users of BAML using function calling with those models today because Schema-Aligned Parsing requires no modification to the model to be able to make that work.

Jon Krohn: 34:32
Wow, that's super cool. That's amazing. So, in addition to that use case there that you just gave, that was a really interesting one with DeepSeek and being able to do tool calling or o1 tool calling. What are some other unexpected or innovative uses of BAML that you've seen from your users?

Vaibhav Gupta: 34:51
I think, one cool use case that I've seen is this company that's making Cursor, but for Xcode. Because obviously, if you're in the Mac world, you have to use their own proprietary tools. Knowing Apple, they're probably going to take forever to build anything like it. One interesting thing that we learned is that, oh, you need tool calling to go do these things to build something like Xcode or Cursor. But when you generate code, I don't know if you've ever seen this in Cursor, it always messes up markdown stuff for me because it tries to put triple quotes and their parser for some reason doesn't handle that correctly. If you're generating code diffs inside of JSON blobs, they mess up all the time because you need so many escape characters.

35:34
Every new line needs to instead be a backslash n. Every quotation mark needs to be a backslash quote. But because Schema-Aligned Parsing is so flexible, we actually found it funny that they just told the LLM, "Hey, output my code in triple quotes," and Schema-Aligned Parsing took the triple-quoted text, which doesn't need to be escaped, and just converted it to a regular string that is properly escaped. I think that was one of the coolest things I've seen in a while. A lot of dynamic UIs and a lot of generative UIs that I've been seeing with BAML, those I think have been the coolest visual things to experience.

Jon Krohn: 36:13
Can you give an example of a generative UI?

Vaibhav Gupta: 36:16
Yeah, it's hard to describe generative UIs, but I'll do my best because it's just a new concept that doesn't really exist in many places. Let's take the idea of a recipe generator. We can all go to ChatGPT and ask it to dump out a recipe. That'll be fine, it will do the thing. But what I really want is something which can almost show me cards of, here are all the ingredients and here's what amount I want. Once the cards are done, then I want a separate section to show up with the steps that you have to follow and the preparation steps at the end.

37:00
Wouldn't it be nice if there was a spinner moving along each one of those sections, showing exactly what it's currently working on? That stuff, in the old days of web dev, would take you a lot of work to do; state management alone took a lot of code. We revealed this new thing called Semantic Streaming that allows you to have that data available in a type-safe way. You can just build a UI like that now.

37:28
Now your chat app all of a sudden doesn't just respond with text; you can have dynamic graphs, you can have dynamic models, and your chat app suddenly stops feeling like ChatGPT. Because if you're out there trying to build a company around chat, I think you have to ask yourself, "Why would my users not just go to ChatGPT?" One huge value prop you can offer is building the best UX for the thing that you are trying to provide. If you're trying to analyze stocks, show me ticker symbols in real time right there, show me graphs, fun things like that that ChatGPT is just not going to do.

Jon Krohn: 38:07
Nice. Yeah, those are great examples. I love that, recipe cards, stocks. That does make it easy to see what a generative UI means. Nice, good thinking on the fly or maybe some examples that you're aware from the real world, but those are cool. So, where do you think you're going to evolve next? You must have... You seem like a really creative thinker, really sharp. You probably have a million potential directions and it's probably going to be any number of pivots relative to what you think might happen, but maybe in the near future, what kinds of new features or capabilities might we expect from BAML?

Vaibhav Gupta: 38:44
In some ways, with designing a programming language, the hardest thing about it is knowing what not to do. Because if you're truly inventing a new syntax, you can technically make anything happen. If your compiler can read it, you can do it. So, we often try and practice saying what we're not doing instead of what we are doing, because it's so easy to try and commit to everything. But the most important thing is we're trying to bring more powerful capabilities into BAML. One paradigm that we think is going to become more common in the future is you'll want to say something like, "Hey, I want to send all my free users to GPT models like GPT-4o mini and I want to send all my paid users to models like o1."

39:23
How do you represent that in your code in a way that's elegant? How do you reproduce exactly what a paid user sees versus what a free user sees in that hot reload loop that we referred to earlier? We're going to try and introduce concepts like that, where you can do if statements and for loops and conditionals in general in BAML, because this has mostly been the direction users have been asking for: make it more powerful, which is scary to us. But really, the other element of it is the tooling that you can expect to see around BAML. One of the most important things that I've learned about machine learning in the last 10 years, and I'm sure we've all heard this a lot of times, is that data is key. Data is the most important thing that you can have.
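(The if/else routing Vaibhav describes is future work, but as a hedged sketch, one way to express model tiers in BAML today is with named clients; the model choices here are illustrative, and a function can then reference FreeTier or PaidTier by name:)

```baml
// Two named clients, one per user tier. Swapping a user's tier means
// pointing their BAML function at the other client.
client<llm> FreeTier {
  provider openai
  options {
    model "gpt-4o-mini"
  }
}

client<llm> PaidTier {
  provider openai
  options {
    model "o1"
  }
}
```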

40:08
So many times people get their data pipelines wrong and it's not like they do it wrong because they're intentionally doing so. It's just that there's a thousand footguns that you can just step on and get wrong.

Jon Krohn: 40:20
Another big problem at a lot of companies is that they develop some AI functionality assuming that the data they need exists, and they just don't have the data.

Vaibhav Gupta: 40:32
Exactly, right. The worst thing people can do is make their data pipeline so rigid, and then they change their code. Now the code is sending different data, which means your data pipeline needs a lot of work. So, every change is a massive change and now you're shipping slower because you have to go update your data pipelines. BAML is backed by versioned data schemas that are similar to Protobuf but without all the kerfuffle that comes with Protobuf. For those of you that don't know, Protobuf is a way to represent your data models in a language-agnostic way that is able to version-control them.

41:12
So, if you change a schema, or if you change an enum and add a new category to it over time, you can know that the enum has changed and still serialize and deserialize old values. BAML lets you do that in a super ergonomic way without actually maintaining that in your code. Our data pipelines automatically evolve to handle and address that. So, if you have an enum with 5 categories and three months later you have an enum with 50 categories, we're actually able to render that difference, and we're going to share more about the data platform over the next quarter or two.

Jon Krohn: 41:46
Sweet. So, we know lots of the advantages now of using BAML versus just sending a prompt without that structure and reliability that BAML provides. If we have a listener out there who wants to start with BAML right now, how do they do that? What's it like your first time using BAML? How do you install it or get experience with it?

Vaibhav Gupta: 42:05
Yeah, so with BAML, we work really hard to make it super easy for you to install. We appreciate anyone that is willing to learn the new syntax and go with it. So, we have two things to help you out. One, we have an online playground for you to experience BAML without installing it in your repo at all. You can just go to promptfiddle.com and experience what it will be like to use BAML in VS Code or Cursor or anything else you want, right there. The second thing you can do, depending on whether you're using TypeScript, Ruby, Python, Java, or anything else, and we support every language, is install BAML using the package manager of your choice; we have instructions for how to do that on our repo.

42:44
You just do pip install baml-py, then you add a couple of BAML files to your repo, and that's it. That's the work. For anyone that really, really doesn't want to learn the BAML syntax at all, we have a chat that you can use: just describe your problem to it and it'll actually generate the BAML code, plus a couple of test cases, plus Python or TypeScript snippets to show you how to use that BAML code in the Python or TypeScript of your choice.

Jon Krohn: 43:17
Very cool. All right. So, listeners, you've now got another tool for your tool belt to go out there and check BAML out by Boundary. You won't have a hard time finding it. We've of course got a link in the show notes as well. I haven't used it personally myself yet, but next LLM project, it seems like a no-brainer to be using BAML, to be taking advantage of all the efficiencies and capabilities that BAML offers relative to just providing my plain text prompt and getting back whatever I get back from the model API that I send it to. If we have listeners that want to be working with you, they could be checking out the Boundary website and seeing if you're doing any hiring, but I thought you might want to fill them in on the interesting hiring process that you have.

Vaibhav Gupta: 44:04
Yeah, so we're a little weird, like with all things we do. We like to do things a little atypically, and we've actually never posted a job posting online. Part of the reason for that is because, in hiring people that are willing to make a compiler, we just want a bunch of people that want to do that, and there just aren't that many on the internet actively seeking that out. But the approach we've taken to hiring is we want to build a team of engineers that have good taste and know how to build tools for developers, complicated tools, so our users don't have to.

44:42
So, our approach so far has been simple: you just send me an email, and I guess we'll put my email down below if we want, titled "Why I'm awesome," and just brag about yourself, right? Three amazing things you've done. What we're indexing for is complexity, but also how well you communicate, because our syntax is a way of communicating with our users and we have to do that exceptionally well. If that goes well, then we just get on a quick little call and chat and make sure we don't hate each other. If that goes well, we just call up three of your references. And actually, instead of interviewing you, we interview your references, and we go deep into the tech that they worked on with you.

45:31
That actually gives us a better signal for what you have done and what you have been able to do, how you work with other people, and gives us really good insight into how we think you'll fit into the team. That's been the strategy so far. After that, you get a job offer.

Jon Krohn: 45:50
Very cool. So, they don't get asked any technical questions directly.

Vaibhav Gupta: 45:54
Usually not directly. Then one last thing we do after we give you the job offer is we give you the opportunity to come and spend a week with us. So, you don't have to officially commit to our company yet, and you can get a feel for what we're actually like in person, because we are five days a week in person. The last thing that we do, once everything is good and dandy and we've got a feel for each other and you're really, really excited and hopefully we're really, really excited too, is you just tell me what company you want a job at and I'll help you go interview there. I'll help you land that job personally, and we hope that if you get your dream job and you've got us, you still choose us.

Jon Krohn: 46:28
Excited to announce, my friends, that the 10th annual ODSC East (Open Data Science Conference East), the one conference you don't want to miss in 2025, is returning to Boston from May 13th to 15th! And I'll be there leading a hands-on workshop on Agentic AI! Plus, you can kickstart your learning tomorrow! Your ODSC East pass includes the AI Builders Summit, running from January 15th to February 6th, where you can dive into LLMs, RAG, and AI Agents, no need to wait until May! No matter your skill level, ODSC East will help you gain the AI expertise to take your career to the next level. Don’t miss - the Early bird discount ends soon! Learn more at odsc.com/boston.

47:13
Wow, very cool. I want to dig into the in-person work thing there for a sec, because I badly miss... Up until the pandemic, I had always been working in person and I miss it so much. Something that you and I were talking about before we started recording was how... I think it was before we started recording. Sometimes it's hard to remember, but we were talking about how when you're using Slack for example, you're using Zoom, you're working completely remotely, you have colleagues, but when you're going and meeting with people in person, you really know what's going on in people's lives.

47:48
You figure out who in the group you're like, "Oh, we should be grabbing a beer after work" with. That just organically happens and you end up having... I mean, a huge proportion of my friends through my life have come through work, either the people I'm working with or their friends. Yeah, it's a huge social experience. It makes work fun. Since the pandemic, I probably laugh like 10% of the amount that I used to, because in an office you're working around people who might have similar backgrounds and similar interests and are often really smart and funny. So, work can be hilarious.

Vaibhav Gupta: 48:34
Yeah. I remember when we first started this remote work thing. I mean, we were a Slack competitor, so we were full-on remote work when we were doing that in the very beginning. We had to be, by the mission of what we were doing, but really it was about human connection. When me and Aaron started this journey, we told ourselves, "I don't care how much money we make out of this. If we hate each other, that's an L. That is the worst-case outcome." When we hired our first person, it was very similar. We just don't want them to hate us. Ideally, they'd like us, but I think in-person is the way to go.

49:10
I think for our tech specifically, we need to be in person just because of the amount of bandwidth you need in any conversation about syntax; you cannot do that digitally. But it's also just fun. You get to do weird things. You have your own office. It feels like home. As horrible as your office is as a tiny company, it just feels like home. It's just fun. I remember one of our colleagues has kids and they brought them into the office. The kids were just excited and you get to know them. You get to know their partners and you become friends. I think I feel the same way as you, where almost all of my closest friends, except the ones from college, are all through work, 100%. I've known them 10 years plus now, and I love that. I would not trade that.

Jon Krohn: 49:58
The college thing isn't even really different. It was like you and these other people showing up during daytime hours and sometimes grinding it out late and grabbing a beer after. It's really the same.

Vaibhav Gupta: 50:10
Yeah, I don't know. I think there's this thing about work that a lot of people have, which is they do a job because that's a job they have. But I think every now and then, if you're able to find a group of people that you truly like working with, even if it's the most boring thing in the world, or maybe it's the most exciting thing in the world, but if you have a group of people that you really like, I think going in is amazing.

Jon Krohn: 50:36
For sure. Going in is amazing. It can be so much fun, and yeah, hopefully, somehow I figure that out again someday.

Vaibhav Gupta: 50:43
Well, we should have you up in Seattle sometime. Come hang out with us.

Jon Krohn: 50:46
Yeah, for sure. I'd love to. Yeah, I love recording in person when I have the chance to do it with guests. Something recently for me in 2024, I recorded... I was the host of six television ads for Nvidia, Dell, and AT&T. With any of those shoots, it was so awesome because that felt like, again... There's 20, 25 people on a shoot and you get to know each of them to some extent. I did all six of those ads with Bloomberg TV. There was a lot of overlap in who was showing up from Bloomberg, and so we'd be shooting in San Francisco. That's why I was in San Francisco to meet you. I actually was doing a shoot for an Nvidia ad. So, yeah. So, there were people in San Francisco with me there that I had now seen in five other cities in the past year. You go out for dinner-

Vaibhav Gupta: 51:50
[inaudible 00:51:50].

Jon Krohn: 51:51
... before and after. Yeah, totally. It's really cool. So, hopefully, there's more of that in my future. The podcast is probably going to stay a remote workforce. Nice, man. Well, it's been so great chatting. Before I let my guests go, I always ask for a book recommendation and you told me that that would be easy.

Vaibhav Gupta: 52:09
Yeah, I have only two types of things that I really, really like reading. More recently, it's been the Rust manual. Please go read it if you're a developer of any kind. I think it just changes the way you think about code, like the way it does exception handling and everything else, and how everything is a result type. Highly recommend it. This isn't so much reading, but if anyone really enjoys it, CppCon puts out great lectures and great talks. Some of the best talks I've seen are all from CppCon. I highly, highly recommend watching those and listening to those as well.

Jon Krohn: 52:44
Very cool. The Rust Manual and CppCon lectures. Very nice.

Vaibhav Gupta: 52:49
Or talks, I guess. They're long enough that they feel like lectures, but in the best way possible.

Jon Krohn: 52:55
Yeah, I don't even really kind of distinguish between those two terms, but I do know what you mean.

Vaibhav Gupta: 52:59
Yeah, some people don't like lectures.

Jon Krohn: 53:01
Thank you for disambiguating like your BAML language does. I had an ambiguous prompt. Nice. All right. Then very last thing, you've already offered your email address, which we'll have for our listeners in the show notes. What are other ways that people can reach out to you or follow you after the episode?

Vaibhav Gupta: 53:19
So my LinkedIn is probably the best way to get in touch with me. My co-founder is a lot more active on Twitter, though, for most people. Sometimes I'll type on his Twitter when he lets me, but my Twitter's dead. And then there's Discord. Honestly, if you need any prompt and string help, I love seeing new problems. I've seen problems like this one company that was trying to parse a 100-page PDF of bank statements. We worked with them and now they're able to do all 100 pages reliably with essentially zero errors. That was fun. I learned a lot, just weird things throughout that problem.

Jon Krohn: 53:56
Very cool.

Vaibhav Gupta: 53:57
So our Discord is a great place to ask those questions and I will do my best to learn, to help.

Jon Krohn: 54:02
Awesome. Yeah, we'll be sure to follow up to even get that Discord so that that's in our show notes as well. Maybe you already even provided that to us and I just didn't notice. Awesome. Vaibhav, I've really enjoyed this conversation. You didn't disappoint. You were entertaining as I anticipated you would be. Thank you so much for taking the time with us. As a busy founder of an early stage startup, it means a lot to us that you give us that time. Yeah, I learned a lot.

Vaibhav Gupta: 54:30
No, thanks for having me, Jon. This was really fun. It was a great way to spend a Wednesday.

Jon Krohn: 54:40
Vaibhav Gupta is terrifyingly clever, yet remarkably approachable and fun to speak to. I loved having him on the show today. In it, he detailed how BAML, "Basic-Ass Machine Learning," is a programming language specifically designed for natural language generation interactions with AI models, offering things like a hot reload loop that enables testing 240 prompts in 20 minutes versus, say, only five prompts without BAML; token-efficiency improvements of 20 to 30% through Schema-Aligned Parsing, which intelligently handles model outputs without requiring explicit JSON formatting; compatibility with models that don't natively support function calling, like DeepSeek-R1 and OpenAI o1; and built-in type safety and error handling for more reliable AI applications.

55:26
Separately from BAML, another highlight of the episode was learning about Boundary's unique hiring approach, which bypasses technical interviews in favor of candidates sharing three things that make them awesome, in-depth reference interviews, and a weeklong trial period. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Vaibhav's social media profiles, as well as my own, at superdatascience.com/859. If you'd like to connect in real life as opposed to online, I'll be giving the opening keynote at the rvatech Data and AI Summit in Richmond, Virginia on March 19th. Tickets are quite reasonable and there's a ton of great speakers.

56:05
So, this could be a great conference to check out, especially if you live anywhere in the Richmond area. It'd be awesome to meet you there. Thanks, of course, to everyone on the Super Data Science Podcast team: our podcast manager, Sonja Brajovic; media editor, Mario Pombo; partnerships manager, Natalie Ziajski; researcher, Serg Masís; our writers, Dr. Zara Karschay and Sylvia Ogweng; and of course the man himself, our founder, Kirill Eremenko. Thanks to all of them for producing another great episode for us today. For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors. You can support the show by checking out our sponsors' links, which are in the show notes.

56:43
If you'd ever like to sponsor the podcast yourself, sponsor an episode, and get your message out through us, you can find out how to do that at jonkrohn.com/podcast. Otherwise, share this episode with people who'd love to learn about BAML. Review the episode on your favorite podcasting app or on YouTube. Subscribe, obviously, if you're not already a subscriber. Feel free to edit our video content into shorts or whatever to your heart's content; just refer to us. But mostly, it doesn't really matter to me if you don't do any of those things. I just hope you'll keep on tuning in.

57:16
I'm so grateful to have you listening, and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rocking it out there and I'm looking forward to enjoying another round of the SuperDataScience Podcast with you very soon. 
