
11 minutes

Data Science, Artificial Intelligence

SDS 740: Q*: OpenAI’s Rumored AGI Breakthrough

Podcast Guest: Jon Krohn

Thursday Dec 14, 2023

Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn


How long will it be until we reach artificial general intelligence, and will regulations over AI be effective in slowing its approach? In this week's Five-Minute Friday, Jon Krohn peeks behind the curtains of OpenAI, where development of Q*, a rumored model that can solve complex, nonlinear logical problems, might be well underway.


There has been no shortage of sensational news coming out of OpenAI these last few months, and the emergence of a potential new model, Q* (Q star), has only added fuel to the rumors of why CEO Sam Altman was ousted and then rehired just as swiftly. Despite reports from Reuters and The Information that Q*’s development led to widespread upheavals at the AI research organization, we have no concrete proof that the new AI system is responsible for the fallout.

And yet the scant information we have on Q* does seem to mark its significance as a sticking point between AGI's "doomers" and "boomers" at OpenAI. If the rumors are true, Q* will be able to churn out solutions to problems that require logical reasoning. We already have models that can solve the kind of math problems you were presented with in elementary school ("If Pixie the cat has 13 mouse toys and she loses 6 but her owner buys 4 new ones for her, how many mouse toys does Pixie have?"). Such problems can only be solved in steps, requiring comprehension not only of simple arithmetic but also of context, logic and time. Large language models (LLMs) such as ChatGPT can break such problems down into their component steps, a technique known as "chain-of-thought prompting".

What concerns some AI practitioners is the exploratory capabilities of Q*, which enable the agent to reach conclusions to nonlinear logical problems by finding new pathways that are otherwise unexplored, unrestricted by its training data. This ability, as Jon explains in the podcast, has ramifications for feedback in model training, potentially removing the need for humans entirely and bringing us closer to artificial general intelligence (AGI). We don't have to look too far back to see similar developments, such as AlphaGo, which made game moves that initially confounded the world's best players. Q* could stand to do the same, only with mathematical proofs rather than board games, a more concerning prospect that could interest those who wish to harm as much as those who wish to help.

Listen to the episode to hear the techniques applied to establish Q* as a formidable new player in the AI landscape, why OpenAI named the agent Q*, and what Q* cannot solve (just yet).

(00:05): This is Five-Minute Friday on Q*. 

(00:27): Welcome back to The Super Data Science Podcast. I'm your host, Jon Krohn. Today's episode is all about a rumored new model out of OpenAI called Q*, and it's been causing quite a stir both in the A.I. community and beyond.

(00:42): Q* is reported to have the ability to solve relatively complex math problems that are expressed in natural language. So these are problems like: "The cafeteria had 23 apples. They used 20 apples for lunch and bought 6 more. How many apples do they have?" If Q* can indeed solve math word problems like this, that indicates a significant leap forward in A.I. capabilities.

(01:08): The Q* rumors began amid the corporate drama a few weeks ago in which OpenAI fired and then quickly re-hired its CEO, Sam Altman. Around the same time, reports from Reuters and an online outlet called The Information linked the development of Q* to this upheaval. More specifically, OpenAI staff reportedly sent a letter to the OpenAI board warning of the potential dangers this new A.I. could pose to humanity.

(01:33): Since those reports, however, no concrete evidence convincingly linking Q* to Altmangate has emerged. So maybe it was one of many factors that led the OpenAI board to state that Altman "was not consistently candid in his communications with the board, hindering its ability to exercise its responsibilities", or maybe it wasn't. Altman did, weeks before his firing, state in public that he'd recently "been in the room" for a startling demonstration of a new A.I. capability, which could have been a Q* demo. There is also speculation that a power struggle between AGI "doomers" and "boomers", with Altman presumably in the pro-Q* "boomer" camp, precipitated Altman's firing, but alternatively it could simply have been petty internal politics that led to the brief coup.

(02:23): With that speculative context explaining why Q* has generated so much interest, let's now step back from the speculation and look at what we actually know about Q*'s development. OpenAI hasn't released specific details on Q*, but they have recently published work on solving grade-school math problems using A.I. Critically, this kind of step-by-step reasoning in solving math problems has broader implications beyond just mathematics.

(02:49): For instance, consider a complex math problem where you need to keep a running tally to solve it. This step-by-step process is similar to what large language models (LLMs) use when they carry out so-called "chain-of-thought prompting". Google researchers demonstrated that this kind of chain-of-thought prompting leads LLMs to perform better not just on math, but on other complex logical reasoning problems as well. In chain-of-thought prompting, the LLM uses its own output as a sort of scratch space, a kind of working memory, like when you are thinking through a problem. This allows the model to break down complex problems into simpler, manageable steps.
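To make that concrete, here is a minimal sketch of chain-of-thought prompting. The `complete` function is a stand-in for any LLM text-completion call, an assumption rather than a specific library's API, and its canned return value simply illustrates the kind of step-by-step output such a prompt elicits:

```python
# A minimal chain-of-thought prompting sketch. `complete` is a hypothetical
# stand-in for an LLM call, not a real API; its canned response illustrates
# the step-by-step completion that this style of prompt encourages.

def complete(prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned step-by-step completion."""
    return ("Pixie starts with 13 mouse toys. After losing 6, she has "
            "13 - 6 = 7 toys. After her owner buys 4 more, she has "
            "7 + 4 = 11 toys. The answer is 11.")

# A one-shot exemplar that demonstrates step-by-step reasoning, followed by
# the new question. The worked steps in the exemplar nudge the model to
# "think out loud", using its own output as scratch space.
COT_PROMPT = """\
Q: The cafeteria had 23 apples. They used 20 apples for lunch and bought
6 more. How many apples do they have?
A: The cafeteria started with 23 apples. After using 20 for lunch, they had
23 - 20 = 3 apples. After buying 6 more, they had 3 + 6 = 9 apples. The
answer is 9.

Q: Pixie the cat has 13 mouse toys. She loses 6, but her owner buys 4 new
ones for her. How many mouse toys does Pixie have?
A:"""

print(complete(COT_PROMPT))
```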

(03:26): OpenAI's own published work in this area involved paying humans to provide feedback on 800,000 individual intermediate steps across 75,000 grade-school math-word problems, creating a formidable training set. They then used a technique where an LLM generates multiple solutions at each step in the solution while a second model acts as a verifier to pick the best one. Their research shows that this method, when set loose on their big human-generated data set, can tackle more challenging problems more efficiently.

(03:56): However, not all logic problems can be solved linearly. Some, like seating arrangements at a wedding with specific guest preferences, present what computer scientists call NP-hard problems. These NP-hard problems often require not only exploring many possible solutions but, critically, also backtracking when necessary. 
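To see why backtracking matters, consider a toy version of the wedding-seating problem: assign guests to tables so that no feuding pair shares a table. This is essentially graph coloring, a classic NP-hard problem; the guest list and conflicts below are invented purely for illustration:

```python
# Backtracking search for a toy wedding-seating problem: no feuding pair may
# share a table. The guests and conflicts are made up for illustration.

GUESTS = ["Ana", "Ben", "Cho", "Dev", "Eve", "Fay"]
CONFLICTS = {("Ana", "Ben"), ("Ben", "Cho"), ("Cho", "Dev"),
             ("Dev", "Eve"), ("Eve", "Ana"), ("Fay", "Ana")}
TABLES = [0, 1, 2]

def feud(a: str, b: str) -> bool:
    return (a, b) in CONFLICTS or (b, a) in CONFLICTS

def seat(i: int, assignment: dict) -> dict | None:
    if i == len(GUESTS):            # every guest is seated: success
        return assignment
    guest = GUESTS[i]
    for table in TABLES:
        seated_here = [g for g, t in assignment.items() if t == table]
        if any(feud(guest, other) for other in seated_here):
            continue                # this table violates a constraint
        assignment[guest] = table   # tentatively seat the guest...
        if seat(i + 1, assignment) is not None:
            return assignment
        del assignment[guest]       # ...and backtrack on a dead end
    return None                     # no table works; backtrack further up

print(seat(0, {}))
```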

(04:18): Such NP-hard problems bring us to the concept of Tree of Thoughts, which was proposed by researchers from Princeton and Google's DeepMind group in May. This Tree of Thoughts approach allows an LLM to explore different reasoning chains, branching off in various directions, not just one, not just linear. It seems Tree of Thoughts is a solid step towards enabling A.I. to engage in more creative and more complex problem-solving. 
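Here is a minimal breadth-first sketch of the Tree of Thoughts idea: branch each partial reasoning chain into several candidate "thoughts", score them, and keep only the most promising few for the next round. The `propose_thoughts` and `score` functions are hypothetical stand-ins for LLM calls, not the paper's actual implementation:

```python
import random

# Tree of Thoughts, sketched as beam search over reasoning chains. Both
# helper functions are hypothetical stand-ins for LLM calls.

def propose_thoughts(chain: list, k: int = 3) -> list:
    """Stand-in: an LLM proposing k candidate next thoughts."""
    return [f"thought {random.randint(0, 999)}" for _ in range(k)]

def score(chain: list) -> float:
    """Stand-in: an LLM judging how promising a partial chain looks."""
    return random.random()

def tree_of_thoughts(problem: str, depth: int = 3, beam: int = 2) -> list:
    frontier = [[problem]]          # every chain starts at the problem itself
    for _ in range(depth):
        # Branch: each surviving chain spawns several candidate extensions.
        expanded = [chain + [t] for chain in frontier
                    for t in propose_thoughts(chain)]
        # Prune: keep only the `beam` highest-scoring chains; chains dropped
        # here are, in effect, backtracked away from.
        frontier = sorted(expanded, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

print(tree_of_thoughts("Seat 30 wedding guests subject to their preferences"))
```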

(04:46): From public statements and interviews, it appears that an even bigger goal for A.I. researchers like those at OpenAI and DeepMind is to combine the reasoning abilities of LLMs with a sophisticated so-called search-tree method akin to what was used in the famous AlphaGo algorithm. AlphaGo's success in the supposedly intuition-driven game of Go came from its ability to simulate thousands of games and use neural networks to evaluate the best prospective moves from those thousands of simulations and thousands of possible moves at any given point in gameplay.
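AlphaGo proper pairs a sophisticated Monte Carlo tree search with policy and value networks; the sketch below conveys only the core simulate-and-evaluate idea, using pure random playouts on tic-tac-toe rather than Go and no neural networks at all:

```python
import random

# Pure Monte Carlo move evaluation on tic-tac-toe: score each legal move by
# the outcome of many random playouts after it. A much simpler cousin of
# AlphaGo's search, shown here only to illustrate simulate-and-evaluate.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),     # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),     # columns
             (0, 4, 8), (2, 4, 6)]                # diagonals

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_playout(board, player):
    """Play uniformly random moves to the end; return 'X', 'O' or None (draw)."""
    board = board[:]
    while winner(board) is None and "." in board:
        move = random.choice([i for i, s in enumerate(board) if s == "."])
        board[move] = player
        player = "O" if player == "X" else "X"
    return winner(board)

def best_move(board, player, n_sims=500):
    """Evaluate each legal move by simulating many random games after it."""
    opponent = "O" if player == "X" else "X"
    def value(move):
        b = board[:]
        b[move] = player
        results = [random_playout(b, opponent) for _ in range(n_sims)]
        return sum(r == player for r in results) - sum(r == opponent for r in results)
    return max((i for i, s in enumerate(board) if s == "."), key=value)

board = list("X.O......")     # X holds one corner, O holds another; X to move
print(best_move(board, "X"))  # the simulations typically favor the center (4)
```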

(05:22): This, at long last, brings us to why this new OpenAI model is called Q*: because Q* can help us train a model that can do this kind of simulation and prospective-move evaluation. While Q* is a mathematical term from reinforcement learning literature stretching back several decades, it was popularized exactly one decade ago, in 2013, when DeepMind incorporated deep neural networks, also known as deep learning, into reinforcement learning to create something called Deep Q-learning. They employed this Deep Q-learning to allow machines to excel at playing a broad range of Atari video games. You can check out the hour-long YouTube tutorial on Deep Q-learning, with accompanying Python code in a Jupyter notebook, that I put together five years ago for all of the mathematical detail, but broadly speaking, the concept of Q* within Deep Q-learning is that Q* guides identification of the optimal action a reinforcement learning algorithm should take given the particular situation that it's in.
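For the curious, here is that idea in miniature. Q*(s, a) is the optimal action-value function: the best expected return available from taking action a in state s. Tabular Q-learning estimates it with the update Q(s, a) ← Q(s, a) + α(r + γ max Q(s', a') − Q(s, a)); the toy corridor environment below is purely illustrative, not DeepMind's Atari setup:

```python
import random
from collections import defaultdict

# Tabular Q-learning on a toy corridor. Deep Q-learning replaces this lookup
# table with a neural network; the environment here is purely illustrative.

N_STATES = 6                  # states 0..5; reaching state 5 earns reward +1
ACTIONS = (-1, +1)            # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = defaultdict(float)        # Q[(state, action)], implicitly 0 at the start

def greedy(s: int) -> int:
    """Best-known action in state s, breaking ties randomly."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for _ in range(500):          # 500 training episodes
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Nudge Q(s, a) toward the bootstrapped target r + gamma * max Q(s', .).
        target = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s_next

# After training, the learned policy should step right (+1) in every state.
print([greedy(s) for s in range(N_STATES - 1)])
```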

(06:24): The key takeaway from this is that, whether we're talking about applications to Atari video games, Go, math word problems or general logical reasoning, the Q* model that OpenAI is now rumoured to be working on alludes to the creation of an A.I. that can improve its reasoning through a sort of automated self-play, playing games against itself. This generates a lot of data, and it also allows the algorithm to explore and learn from a vast array of possible solutions. Critically, this automated self-play would mean expensive, slow humans would no longer be needed to generate vast amounts of training data and, as demonstrated by the startlingly creative moves that AlphaGo makes while playing Go, Q* could solve new problems beyond anything included within its training data. That's huge: it's a really big deal toward potentially realizing AGI, artificial general intelligence, a machine that could meet or exceed human ability on any cognitive task, at least someday. More on that in a bit.

(07:32): If OpenAI's Q* approach, which blends Deep Q-learning with LLMs, does scale up to allow generative A.I. systems to solve complex math and logical problems, this could portend, in the relatively near term, machines that could perhaps work out complex mathematical proofs, define new physics concepts and assist us with technological breakthroughs like nuclear fusion. There are then also security concerns, such as encryption being cracked, hence the rumors of Q* being linked to OpenAI "doomers" attempting to push the commercially oriented "boomer" Sam Altman out.

(08:11): Gaps between Q* and AGI, however, likely still exist. For example, while humans can update their mental models as they work through a problem, today's neural networks have a rigid separation between the training and inference phases. Bridging this gap may require another fundamental architectural innovation in A.I. beyond Q*.

(08:33): Now, whether it's Q* or these additional fundamental problems that lie beyond Q*, one interesting tidbit from all this is that Dr. Noam Brown, who was our guest in Episode #569 and whose research on the natural-language negotiation game Diplomacy we detailed in Episode #663, appears to be playing a key role in this transformative, frontier research, since he recently joined OpenAI. Indeed, Noam Brown tweeted in June that for years he's been "researching A.I. self-play in games like Poker and in games like Diplomacy" and that now he'll "investigate how to make these methods truly general."

(09:24): Noam predicted that "if we can discover a general version, the benefits could be huge." He said: "Yes, inference may be 1,000x slower and more costly, but what inference cost would we pay for a new cancer drug? Or for a proof of the Riemann Hypothesis?" And indeed, Prof. Yann LeCun, the celebrated deep learning pioneer as well as the Chief A.I. Scientist at Meta, where Noam worked until earlier this year, tweeted just last month that "Q* is OpenAI's attempt at planning… [t]hey pretty much hired Noam Brown to work on that."

(10:01): In conclusion, while Q* might not end up being the monumental leap towards AGI, or the huge threat to humanity, that some rumor it to be, Q* could nevertheless represent an important step towards more sophisticated and general reasoning abilities in A.I., far beyond what we have today. The journey to understanding and replicating human-like reasoning in A.I. is complex and full of unknowns, but developments like Q* are crucial milestones along this path. Stay tuned to this podcast as we continue to explore these fascinating advancements in the world of artificial intelligence.

(10:36): In the meantime, beyond the Q* YouTube and GitHub links of mine I already mentioned in this episode, you can get even more detail on Deep Q-learning by checking out my hands-on deep reinforcement learning video course on the O'Reilly platform, or you can check out Chapter 13 of my book, Deep Learning Illustrated.

(10:53): All right, that’s it for today. I hope you found today’s episode to be both interesting and informative. Until next time, keep on rockin’ it out there and I’m looking forward to enjoying another round of the Super Data Science podcast with you very soon. 
