26 minutes
SDS 841: Andrew Ng on AI Vision, Agents and Business Value
Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn
In this special episode, recorded live at ScaleUp:AI in New York, Jon Krohn speaks to Andrew Ng following Andrew's conference talk on agentic AI workflows. Jon asks Andrew when agentic workflows are worth using, how businesses should direct their AI investments, and about the new ways that AI tools can process visual and unstructured data.
Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.
About Andrew Ng
Andrew Ng is the founder of DeepLearning.AI, Managing General Partner at AI Fund, Executive Chairman of LandingAI, and Chairman & Co-founder of Coursera. A pioneer in machine learning and online education, he has taught AI to over 8 million people and led transformative projects, including serving as Baidu’s Chief Scientist. Formerly Director of the Stanford AI Lab, Andrew now focuses on advancing responsible AI adoption through education and entrepreneurship. A member of Amazon’s board of directors, he was named to the 2023 Time100 AI list of influential leaders in the field.
Overview
[Andrew shares his screen in this episode, so check out the video of this podcast for the full experience. You can find the episode in its entirety on our YouTube channel.]
Andrew Ng wants all companies that are planning on building AI applications to focus on using agentic workflows. He has seen many businesses get a ton of value from creating applications on top of popular GenAI APIs like OpenAI's or Anthropic's, with relatively little upfront cost. The decision to adopt these structured, agentic processes, he believes, should be an easy one, and getting set up is cheaper than listeners may realize. “The hardest thing,” Andrew says, “is just building something that works.” In effect: spend your time building your idea, not worrying about the bill.
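[For a concrete picture of what building on top of these APIs with an agentic workflow can look like, here is a minimal sketch of a draft-then-critique loop using the OpenAI Python client. The model name, prompts, and loop structure are illustrative assumptions, not a recipe Andrew gives in the episode.]

```python
# A minimal draft-then-critique loop: one hypothetical way to layer an
# "agentic" workflow on top of a hosted LLM API. Model name and prompts
# are illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Make a single chat-completion call and return the text reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # a cheaper model, in keeping with the episode's theme
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def agentic_answer(task: str, rounds: int = 2) -> str:
    """Draft an answer, then critique and revise it a few times."""
    draft = ask(f"Complete this task:\n{task}")
    for _ in range(rounds):
        critique = ask(f"Task:\n{task}\n\nDraft:\n{draft}\n\nList concrete flaws in the draft.")
        draft = ask(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\nFlaws:\n{critique}\n\n"
            "Rewrite the draft, fixing the flaws."
        )
    return draft

if __name__ == "__main__":
    print(agentic_answer("Summarize why agentic workflows can beat zero-shot prompting."))
```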
Jon also wanted to ask Andrew about large vision models, especially the impending revolution in image processing led by LandingAI. Andrew says that large multimodal models are capable of interpreting images, but that the dedicated vision capabilities LandingAI is building open up many more business applications. He says that visual AI is at the frontier of exciting new developments, as it has the capacity to solve problems in manufacturing, healthcare, security, and many other fields.
Listen to the episode to hear more about how multi-agent systems could work together, how far we have come since early ideas about AI, and Andrew's thoughts on mitigating the risk of users relying on agent-generated answers without verifying them.
And, if all this discussion of agentic AI inspired you, Jon is hosting an interactive half-day conference on agentic AI on the O'Reilly platform the day after this episode's publication.
Thanks, as always, for listening. We want to give an extra special thank you to our wonderful podcast manager, Ivana Zibert, who has been such a huge asset to SuperDataScience. The team is very, very sad to see her leave, but we are also so grateful for the time we spent together!
In this episode you will learn:
- (06:13) How to weigh up cost and effectiveness in new AI workflows
- (12:08) The crucial elements for building effective vision AI applications
- (15:34) How large vision models might transform global industries
- (18:40) How to mitigate the risk of users not verifying the accuracy of agent-generated answers
Items mentioned in this podcast:
- “The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI” by Daniela Hernandez
- The Society of Mind by Marvin Minsky
- A Thousand Brains: A New Theory of Intelligence by Jeff Hawkins
- LandingAI
- AI Fund
- DeepLearning.AI
- Coursera
- ScaleUp:AI
- Andrew Ng’s ScaleUp:AI 2024 Slides
- Virtual half-day conference on Agentic AI
- SDS special code for a free 30-day trial on O’Reilly: SDSPOD23
- SuperDataScience
- The Super Data Science Podcast Team
Follow Andrew:
Podcast Transcript
Jon Krohn: 00:00:00
This is episode number 841 with Dr. Andrew Ng, Executive Chairman of LandingAI.
00:00:11
Welcome to the Super Data Science podcast, the most listened-to podcast in the data science industry. Each week, we bring you fun and inspiring people and ideas exploring the cutting-edge of machine learning, AI, and related technologies that are transforming our world for the better. I'm your host, Jon Krohn. Thanks for joining me today. And now, let's make the complex simple.
00:00:45
Welcome back to the Super Data Science Podcast. Our guest today is Andrew Ng, who I suspect pretty much anyone working in data science knows. Nevertheless, I'll introduce him! Some of his inimitable accomplishments include: being director of Stanford University's AI Lab, where his research group played a key role in the development of deep learning, which led to him founding the influential Google Brain team, as well as educating millions on machine learning and co-founding Coursera. He's also managing director of AI Fund, a world-leading AI venture studio. He was CEO and is now executive chairman of LandingAI, a computer vision platform that specializes in domain-specific large vision models, analogous to LLMs for language. He also founded DeepLearning.AI, which provides excellent technical training on machine learning, deep learning, and generative AI, as well as many other associated subjects. And there's so much I could say about him, but we'll end it off here by saying that Andrew was also co-CEO, co-founder, and chairman of Coursera, which brought online learning from 300 universities to over 100 million students.
00:01:56
Today's episode was recorded live at the ScaleUp:AI conference in New York a few weeks ago. I conducted this Q&A session with Andrew immediately after he gave a talk. So some of my questions refer back to that talk. That said, the interview should be clear to understand without being aware of Andrew's talk 'cause I think I provide enough context at each point. But just in case you're curious, we've included the slides from his talk in the show notes so you can check those out. One quirk about this interview is that at the end, Andrew shares his screen to demonstrate cutting-edge vision model capabilities. Screen sharing obviously isn't ideal in an audio only podcast, but if that section doesn't completely resonate with you, you can check out the YouTube version of this episode to get the full picture.
00:02:45
In today's episode, Andrew details why a cheaper AI model with smart, agentic AI workflows might outperform more expensive, more advanced models. He provides the surprising truth about AI API costs that most businesses don't realize. He talks about how the Society of Mind Theory from the 1980s is making an unexpected comeback in modern AI. He talks about a groundbreaking new way to process visual data that goes beyond traditional computer vision. And he wraps up by talking about why unstructured data could be the key to AI's next big revolution. All right, you ready for this special episode? Let's go.
00:03:32
Welcome to the second stage for your interactive session. That was an amazing talk, as we always expect from you. This is my first session that I'm hosting today, so I'll introduce myself to the audience as well as to you, Andrew. I'm Jon Krohn. I'm chief data scientist and co-founder of an AI startup called Nebula. But I'm perhaps best known as the host of SuperDataScience, which is the world's most listened to data science podcast. And I'm delighted that for three years in a row now, I've been hosting sessions here at ScaleUp:AI. So, thanks Insight Partners for inviting me back again.
00:04:03
Andrew, in your talk, you discussed how your team found that with GPT-3.5 and an agentive workflow, it can outperform a more advanced foundational model, such as GPT-4, with a zero-shot approach. How should companies balance their investments between pursuing more powerful models versus leveraging more effective agent architectures?
Andrew Ng 00:04:27
I think almost all companies, with the exception of a few, you know, giants, should be focusing on building applications using agentive workflows. If you have an extra few billion dollars to spare, by all means, go compete with OpenAI and Anthropic and Gemini or whatever. I think that, in all seriousness, you want to do it, go for it. You know, just plan to spend a few billion dollars, maybe. But I think that for most businesses, there's so many opportunities to build applications and it turns out that if you look at the use of generative AI, the cost of using these models is falling rapidly. So, over the last year and a half is falling by maybe about 80% year-on-year. So, I find that, you know, kind of two years ago, there were teams worried about, "Oh, GPT-4 is kind of expensive." But the prices are falling so quickly that I would advise to worry much more about building something valuable and then I think there's a good chance that the use of these APIs to use GenAI, will just become cheaper over time. And there are companies spending many millions of dollars on GenAI APIs. So it can get expensive. But the vast majority of businesses I see have generative AI bills, you build an application on top of, OpenAI or Anthropic or these other APIs, I see so many businesses that are getting so much value out of it. And frankly, the bill they're sending, you know, is so small that you'd be surprised at how small they are. You won't be surprised Jon.
Jon Krohn: 00:06:05
Well, it makes perfect sense. And yeah, so unless, people do have billions of dollars to spend, let me move on to a related question, where, assuming that people aren't going to be trying to train their own LLMs themselves, if you're an enterprise, should you be thinking more about always trying to use the latest and greatest LLM, or be thinking about grabbing the best agentive workflows? It seems like there's kind of a trade-off there between cost and efficiency. Because yes, while costs have gone down dramatically, say by 80%, you could save a lot of money by working with GPT-4o mini, instead of GPT-4o. And so, if I can be using that cheaper GPT-4o mini and getting better results by leveraging a more effective agentive workflow, it seems like, do you think that's the way to go for the most part?
Andrew Ng 00:06:21
You know, I would say, don't worry about, I feel like as a general suggestion, I would say don't worry about the price of the LLM, to get started. And I think, for development purposes, is actually, you know, it's not impossible, but honestly, so I still do a fair amount of coding myself, right? And, sometimes I would be spending all day on a weekend coding, for many hours, experimenting and then I find that at the end of the day, I just ran up like a $5 OpenAI bill. Right?
Jon Krohn: 00:07:26
Yeah, yeah, yeah.
Andrew Ng 00:07:29
And now, it is possible. There are some agentive workflows that can get more expensive. It is possible to run up, you know, tens of dollars, maybe low hundreds of dollars. But it's actually cheaper than you would think. And so, my advice would be: the hardest thing is just building something that works. That's still pretty hard. So, use the best model, build something that works. And after you have something, if we're so lucky as to build something so valuable that its usage is too expensive, that's a wonderful problem to have. A lot fewer people have that problem than I wish. But when we have that problem, we then often have tools to lower the costs. But I think that a lot more people are worried about the price of using these generative AI APIs than is necessarily the case. And the most important thing is, I would say use the best model, use the latest best model, just build something that is valuable. And only after you succeed like that, and only if it turns out to be expensive, then work on the cost optimization after that.
Jon Krohn: 00:08:31
Okay, nice. And then, so if you are lucky enough to get to that stage, maybe there's a balance of experimenting both with lower cost options, like moving to GPT-4o mini say or experimenting with different agentive workflows, and just trying to see which gets you the best results for your use case?
Andrew Ng: 00:08:48
Yeah, yes. And, just to be clear, there are teams that have, you know, found that they were spending too much money on these, and they spent time optimizing it. So you can use cheaper models, you can take a smaller model and do something called supervised fine-tuning to optimize it for your own workflow. So, there are multiple tools. But I think using these other tools to optimize costs before you've, you know, first built something valuable, I think that will most likely be premature optimization. And I would shy away from that.
Jon Krohn: 00:09:17
Nice. That's a great answer. Digging into agentive AI a bit more, which was a big theme in your talk and something that you said is maybe the technology we should be most excited about at this time, going back to a WIRED article in 2013, you mentioned how in the early days of AI, the prevailing opinion was that human intelligence derived from thousands of simple agents working in concert. This is what MIT's Marvin Minsky called "The Society of Mind". But then, you mentioned later in this WIRED article, that you stumbled upon the single algorithm theory popularized by Jeff Hawkins, which led you to deep learning. Now, 11 years after that WIRED article, are agents and multi-agent systems, in particular, marrying both of these concepts together, where, you know, we kind of have both ideas now blending together and providing powerful tooling?
Andrew Ng 00:10:12
You know, that's an interesting question. Boy, I'd totally forgotten about that. Well done digging it up. So, I think that what's been remarkable about the large language model revolution is how much of it is because of one or a relatively small number of algorithms, namely the Transformer neural network. And it turns out that the reason large language models, which are based on a neural network called the Transformer network, the reason they can demonstrate such amazing capabilities is, I want to say, a lot of this is because of the richness of the data we feed it. So, there's this hypothesis, not proven, but there's a hypothesis that even human intelligence, the human brain, a lot of human intelligence is due to one or a very small number of algorithms that, when fed all the richness of data from the world, allow us to learn to do all of these amazing things that humans can do. And then, as children grow up into adults, they also develop, really, to a large part, I think, because of the data they were fed, maybe a little bit of genetics, maybe a little bit the algorithm, but really a lot the data. The same infant brain could grow up, right, to become a doctor or an architect or software engineer, or whatever. And that's the data and the interactions. And I think, agentive workflows, it's always dangerous to make analogies between AI and humans, but I think there is a little bit of that: getting the AI models to specialize a little bit for different tasks, based on how we prompt it or how we feed it additional data to do specific job roles or tasks.
Jon Krohn: 00:11:57
Nice. Yeah. And I appreciate that you loved us digging that out. I wish I could take credit. I have an amazing researcher, Serg Masis, who I have to give credit to, for pulling out that question and that idea. I want to move on to large vision models now as a topic, which follows from that five key AI trend slide that you kept coming back to in your talk. So, this is particularly related to the image processing revolution that's coming that you mentioned on that five key AI trends slide. So, LandingAI has a product called Vision Agent that led the way in this image processing revolution trend. Can you elaborate on why planning, using multiple tools, and code generation are so crucial for building effective vision AI applications? And how Vision Agent addresses these challenges?
Andrew Ng 00:12:49
I think the vision revolution is coming a little bit after the text processing revolution. And it turns out that large multimodal models they're, at least today, they're kind of okay at interpreting images. But when I mentioned the way we write text prompts in a non-agentive workflow, that's a bit like, you know, ask people to type the essay from the first word to the last word in one go. The way that large multimodal models, or vision language models, I use is, if I imagined that I want to solve a task, I could say, you know maybe "Here's a picture, take a glance, give me the answer." Right? And there's some things we could do that where, I don't know, for example, if I asked you yesterday, I was doing a demo of counting the number of people on a football field, on a soccer field. And, you know, "Can I show you a picture of a bunch of people, and take a glance, how many people are there?" It's actually hard to do. If you ask me to count the number of people, then see that go, "1, 2, 3, 4, 5," and that's more of an iterative, agentive workflow, rather than, "Here's a picture, what's the answer?" Which would be more of a zero shot, just type out the answer.
00:13:57
And so, what we found was that if we generate a plan that is expressed in code to say, "These are the tools, these are the function calls," I want to, you know, detect the people one at a time, then just count how many people are detected. Very simple plan. But a simple plan like that, expressed in code, can process images much more accurately for many kinds of mission-critical image tasks. And we found also that for a lot of the image vision workflows, you know, writing the code required going to find the right library, the right open source model, integrating that. There's a lot of this kind of crafty, annoying coding work that we could do, but it takes us like half a day. But we could write an agent to write a lot of that code for us. To come up with the plan, write the code to express the plan, and then test the code. And so we found that that really lowers the bar for developers to get a lot of the high-stakes, very important visual AI type of questions answered, and hence, the Vision Agents. Still a lot of work to do there. But I'm actually quite excited at the number of users, you know, that are using it successfully to build software for, for visual AI tasks.
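[Editor's note: as a rough illustration of the kind of "plan expressed in code" Andrew describes (detect people one at a time, then count the detections), here is a minimal hand-written sketch. It is not Vision Agent's generated code; it uses the open-source Ultralytics YOLO detector as a stand-in, and the image path is hypothetical.]

```python
# Plan: (1) run a person detector on the image, (2) count the detected boxes.
# Uses Ultralytics YOLO as an example detector; any object detector would do.
from ultralytics import YOLO

def count_people(image_path: str, min_confidence: float = 0.5) -> int:
    """Count people in an image by detecting them and tallying the detections."""
    model = YOLO("yolov8n.pt")        # small pretrained COCO detector
    result = model(image_path)[0]     # one image in, one result out
    count = 0
    for box in result.boxes:
        label = model.names[int(box.cls)]          # class index -> class name
        if label == "person" and float(box.conf) >= min_confidence:
            count += 1
    return count

if __name__ == "__main__":
    # Hypothetical image of a crowded soccer pitch, as in Andrew's example.
    print(count_people("soccer_field.jpg"))
```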
Jon Krohn: 00:15:04
Yeah, it is exciting to see how this is going to accelerate. I 100% agree with you that the text processing revolution is something we're in the midst of now. And people are only beginning to realize those applications. And things like your, the work that you're doing at LandingAI is going to be the next big thing, for sure. Having those extra modalities provides so much more optionality in real-world applications. I have one last question for you myself before I get to some audience questions. And this is related to your fifth and final key AI trend on that same five key AI trends slide that I was just mentioning. And this is related to unstructured data. So, you've previously talked about how by volume, most of the world's data are unstructured. And so, with the rise of generative AI, we're now able to tackle this vast amount of unstructured data. With the kinds of technology that you were just talking about with Vision Agent and other large vision models, how do you see visual AI transforming industries outside of traditional use cases, like manufacturing and healthcare? And what untapped areas might benefit most from these vision capabilities?
Andrew Ng 00:16:17
I think there'll be a lot and it's, I find it, I'm going to give that unsatisfying answer. It's a little bit like "Where will electricity be used?" It's like, "Boy, that's a head-scratcher, because it's so general." But maybe, I think that definitely manufacturing, I think robotic automation, including self-driving cars, that'll be revolutionary for I think, healthcare, I think security and then maybe, hey, but am I able to screen share? Can I share something? And can I screen share? Can people see this?
Jon Krohn: 00:16:20
I don't know the answer to that question. I'm looking around the room.
Andrew Ng 00:16:57
Or is it here? Can people see it?
Jon Krohn: 00:16:59
Yeah, I can see it. And yeah, I'm getting a thumbs up from backstage.
Andrew Ng 00:17:00
Oh, cool, awesome. All right. So, this is actually a demo that, we just put together over the weekend at LandingAI. But this is a video retrieval task, where we use our Vision Agent to write code to index and retrieve these videos. So, actually, let me just, I don't know, let's see. Gray wolf at, actually, skier. So, this is a little demo that has, so it turns out, see, a lot of businesses have tons of videos, and they just sit in blob storage in a cloud. But, you know, using Vision Agent, you can write code, write a demo to index these videos to help you find these. And I see, I actually think there's a lot of media companies with a lot of, well, this is raising my hand, right?
00:17:43
Right? And, so UI in green, you know, where the parts of the skier airborne. And these parts are not the skier airborne or, let's see, gray wolf. I just tried this over the weekend. See if this works. But so I find that, you know, we've actually found a bunch of, so there, there's a gray wolf at night. The UI shows in green where it's found it. If I click somewhere else, you know, there's no gray wolf, right, elsewhere. But there's a bear. Or, black luggage. Right? So, I actually have been traveling a lot these days. But, you know...
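[Editor's note: for readers curious how a text-to-video search like the one demoed here might be wired up, below is a rough sketch that samples frames, embeds them with an open-source CLIP model, and ranks them against a text query. This is an assumed approach for illustration only, not LandingAI's actual implementation; the video filename and query are made up.]

```python
# Sketch of text-to-video retrieval: sample frames, embed them with CLIP,
# then score frames against a text query and return the best timestamps.
import cv2
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # joint image/text embedding model

def index_video(path: str, every_n_seconds: float = 2.0):
    """Return (timestamps, frame embeddings) for frames sampled from the video."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * every_n_seconds))
    frames, stamps, i = [], [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
            stamps.append(i / fps)
        i += 1
    cap.release()
    return stamps, model.encode(frames, convert_to_tensor=True)

def search(query: str, stamps, embeddings, top_k: int = 3):
    """Rank sampled frames by cosine similarity to the text query."""
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, embeddings)[0]
    best = scores.argsort(descending=True)[:top_k]
    return [(stamps[int(i)], float(scores[i])) for i in best]

stamps, embs = index_video("wildlife_footage.mp4")   # hypothetical video file
print(search("a gray wolf at night", stamps, embs))  # timestamps of likely matches
```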
Jon Krohn: 00:18:23
Gotcha. Yeah, that's, that's very cool, Andrew to see.
Andrew Ng 00:18:29
But with rainbow strap.
Jon Krohn: 00:18:31
I'm going to, I'm going to give you one audience question, Andrew, since we had a number come in. And I think my personal favorite that's come in is, "How would you mitigate the risk of users indiscriminately relying on probabilistic answers generated by agents?" So, you know, if you have an agent powered by an LLM, you're going to be getting probabilistic answers. How do you mitigate that risk relative to the more deterministic answers that you would have retrieved, you know, in a more kind of classical keyword search or like a Google kind of search?
Andrew Ng 00:19:04
You know, I do wonder how deterministic some of the deterministic things are. So, I think…
Jon Krohn: 00:19:10
Right.
Andrew Ng 00:19:11
…maybe, I think machine learning in various forms has been working, you know, in many industries the last 10, 15 years. And I think a lot of machine learning answers are not fully deterministic and even web search is based on machine learning and is actually hard to predict exactly what a given web search will output. So, I feel like part of it will be user training, which maybe isn't a popular answer because that's, you know, hard. But I think some of it will be user training. And I think some of it will be putting in place the guardrails and mechanisms that make these safer even for less trained users. So, for example, common design pattern in GenAI workflows is a confirmation flow, where before we place an order for a user, right, to charge the credit card and ship a product, we'll often not have the AI just say, "Done." We'll have it, you know, generate API call with a pop-up modal that says, "Do you really want to buy this? I'm about to charge your credit card $20. Please hit yes or no." And with that type of confirmation flow, it makes it safer that I won't just charge your credit card without you explicitly saying yes to this endeavor. So, I find that there are these design patterns that can kind of guardrail and make the AI safer. But I think it'll be a mixture of software improvements and UI improvements with guardrails, as well as some amount of user training. And I feel like when after ChatGPT was released, there was some bad episodes, right? Like, I guess widely quoted was a lawyer that quoted made-up cases in a court filing, got in trouble for that.
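[Editor's note: the confirmation-flow guardrail Andrew describes can be boiled down to a few lines. The sketch below uses hypothetical placeholder functions rather than a real payment API; the point is simply that the sensitive action only runs after an explicit "yes" from the user.]

```python
# Confirmation-flow design pattern: propose a sensitive action, execute it
# only after explicit user confirmation. The charge function is a stand-in.
from dataclasses import dataclass

@dataclass
class ProposedCharge:
    item: str
    amount_usd: float

def charge_credit_card(charge: ProposedCharge) -> None:
    # Placeholder for a real payment call.
    print(f"Charged ${charge.amount_usd:.2f} for {charge.item}.")

def confirm_and_execute(charge: ProposedCharge) -> bool:
    """Show the proposed action and only execute it on an explicit 'yes'."""
    answer = input(
        f"I'm about to charge your credit card ${charge.amount_usd:.2f} "
        f"for '{charge.item}'. Proceed? [yes/no] "
    ).strip().lower()
    if answer == "yes":
        charge_credit_card(charge)
        return True
    print("Cancelled; no charge was made.")
    return False

if __name__ == "__main__":
    confirm_and_execute(ProposedCharge(item="USB-C cable", amount_usd=20.00))
```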
Jon Krohn: 00:20:44
Yeah, yeah, yeah.
Andrew Ng 00:20:45
And then, that was really unfortunate. And one side effect of that was, you know, a lot of lawyers learned to not do that. So, that actually had a massive training effect across the industry. So yeah, I think it might be like that. But, just one thing. I don't want to pretend AI can never make mistakes. Definitely make mistakes. Can't use it everywhere. It's definitely unreliable in some places. I feel like the hype about how bad it is, it’s overblown. So, it's far from perfect, and it does have problems. I think the number of times I see AI not get shipped, that we can't use it because of these problems, is actually much less than what we'd expect.
Jon Krohn: 00:21:25
I think that there's a lot of memory of those kinds of early examples prominently in people's minds, like exactly the lawyer who brought fake cases, that had been generated by an AI system, to a real trial. And so, we're kind of used to those stories. But today, the systems are so much better than they were a year ago or two years ago. Hallucinations are so much less of a big deal. And I think that's going to happen more and more. Andrew, thank you so much for taking the time with us today. We are, unfortunately, out of time. I had more questions. The audience had more questions, but we really appreciate you taking all the time that you did for us today. Thank you, Andrew.
Andrew Ng 00:22:03
Thank you so much. Thanks everyone.
Jon Krohn: 00:22:09
What an experience to be able to interview Andrew Ng! In today’s episode, the well-spoken icon covered how agent-based workflows using GPT-3.5 can outperform more expensive models like GPT-4 on certain tasks, suggesting companies should focus on building effective applications rather than pursuing more powerful models. He also talked about how the cost of using generative AI APIs has fallen by about 80% year-over-year, making it much more accessible than many businesses realize. And he also talked about how most companies’ GenAI bills are surprisingly small, despite widespread perceptions that these APIs are expensive. He talked about how modern AI is combining two historic approaches: Marvin Minsky’s “Society of Mind” theory of multiple simple agents working together, and the single-algorithm theory that drove deep learning advances. He talked about how Vision Agent technology is revolutionizing image processing by breaking down complex visual tasks into smaller steps and generating code to execute them, making it easier for developers to build sophisticated visual AI applications. And he talked about how video and image processing capabilities are expanding beyond traditional use cases like manufacturing and healthcare, with new applications in media indexing, security, and robotics demonstrating the transformative potential of visual AI.
00:23:31
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Andrew’s social media profiles, as well as my own, at www.superdatascience.com/841. Beyond social media, another way we can interact is coming up tomorrow, December 4th: if you got interested in agentic AI from Andrew today, then I've got great news for you, 'cause I'm hosting a virtual half-day conference on agentic AI tomorrow. It'll be interactive, practical, and it'll feature some of the most influential people in the AI agent space as speakers. It'll be live on the O'Reilly platform, which many employers and universities provide access to. Otherwise, you can grab a free 30-day trial of O'Reilly using our special code SDSPOD23. We've got a link to that code ready for you in the show notes. Yeah, I'm really looking forward to that. We're going to have speakers covering introductions to agentic AI, we will have hands-on Python implementations of agentic AI, and we'll have a product manager come on and tell us how we can build effective products that leverage agentic AI systems. So, one not to miss.
00:24:50
Thanks to everyone on the SuperDataScience Podcast team — our podcast manager Sonja Brajovic, media editor Mario Pombo, partnerships manager Natalie Ziajski, researcher Serg Masis, writers Dr. Zara Karschay and Silvia Ogweng, and founder Kirill Eremenko — for producing another special episode for us today.
00:25:09
I'll also provide a hat tip to Ivana Zibert, who has been our podcast manager for longer than I've been hosting the show for, so for more than four years and she's been unreal, an absolute linchpin in making sure that we are releasing our 104 episodes per year every, you know, two times a week, on time, every time to such a high level of quality. So we will miss you very much, Ivana. But we're in great hands with our new podcast manager, Sonja. Welcome aboard, Sonja.
00:25:47
All right. Yeah. If you enjoyed this episode, share it with someone who you think might like it, review it on your favorite podcasting app, or on YouTube, subscribe of course, if you're not already a subscriber, but most importantly, we just hope you'll keep on tuning in. I'm so grateful to have you listening, and I hope I can continue to make episodes you love for years and years to come. Until next time, keep on rockin' it out there, and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.