New top score on ARC-AGI-2-pub (29.4%) - Jeremy Berman
🎯 Summary
Podcast Summary: Jeremy Berman’s ARC-AGI Breakthrough
Focus Area
This episode centers on artificial general intelligence (AGI) research, specifically discussing Jeremy Berman’s record-breaking 29.4% score on the ARC-AGI-2 benchmark. The conversation explores evolutionary programming approaches, reasoning capabilities in AI systems, and the fundamental challenges in achieving true machine intelligence.
Key Technical Insights
• Evolutionary Algorithm Innovation: Berman's breakthrough involved evolving natural language descriptions of algorithms rather than explicit Python code, leveraging the superior expressiveness of English over programming languages for complex reasoning tasks
• Reasoning vs. Domain Skills: The discussion distinguishes between teaching AI specific skills versus the "meta-skill" of reasoning - the ability to learn how to learn, which represents the core challenge in achieving AGI
• RL-Trained Models vs. Traditional LLMs: Models trained with reinforcement learning (like o1) demonstrate built-in revision loops and deeper thinking capabilities, reducing the need for artificial prompting strategies
Business/Investment Angle
• Compute Infrastructure Scaling: Massive investments (Nvidia's $100B commitment to OpenAI, Sam Altman's gigawatt compute plans) suggest the industry believes computational scale will solve current limitations
• Human Data Dependency: Current AI systems rely heavily on human evaluation and fine-tuning, representing both a bottleneck and a business opportunity for companies like Prolific
• Continual Learning Market Gap: The inability of current models to learn new skills without forgetting old ones represents a significant commercial opportunity for breakthrough solutions
Notable Companies/People
• Jeremy Berman: Research scientist at Reflection AI, former CTO of a Y Combinator startup who pivoted to AGI research
• François Chollet: ARC-AGI creator and neurosymbolic AI advocate, now building solutions at his company, Ndea
• Jeff Hawkins: Author of "A Thousand Brains," influential in hierarchical temporal memory approaches
• Ryan Greenblatt: Previous ARC-AGI researcher whose evolutionary programming approach inspired Berman's work
Future Implications
The conversation suggests the field is moving toward hybrid systems combining neural networks with symbolic reasoning capabilities. Key developments expected include: composable AI architectures with freezable expert layers, dynamic model merging capabilities, and eventually real-time adaptive fine-tuning. The industry appears to be approaching an “RL S-curve” followed by a focus on making language models truly composable and capable of continual learning without catastrophic forgetting.
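None of these architectures were described in implementation detail on the show. As a rough illustration only, here is a toy PyTorch sketch of what "freezable expert layers" could look like - per-skill modules whose weights are locked after training so that learning a new skill cannot overwrite an old one (the class, layer sizes, and method names are invented for the example):

```python
import torch
import torch.nn as nn

class ComposableModel(nn.Module):
    """Toy illustration of 'freezable expert layers': a shared trunk plus
    per-skill expert heads whose weights can be frozen once trained, so
    learning a new skill cannot overwrite an old one."""

    def __init__(self, d_model: int = 256, n_skills: int = 3):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_skills)
        )

    def freeze_expert(self, i: int) -> None:
        """Lock expert i's weights; the optimizer will no longer update them.
        Note: the shared trunk can still drift unless it is frozen too."""
        for p in self.experts[i].parameters():
            p.requires_grad = False

    def forward(self, x: torch.Tensor, skill: int) -> torch.Tensor:
        return self.experts[skill](self.trunk(x))

model = ComposableModel()
model.freeze_expert(0)  # skill 0 is now protected from catastrophic forgetting
```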
Target Audience
This episode is most valuable for AI researchers, ML engineers, and technical leaders working on reasoning systems, program synthesis, or AGI development. The deep technical discussion of evolutionary algorithms, symbolic vs. neural approaches, and architectural considerations makes it particularly relevant for practitioners building next-generation AI systems.
Comprehensive Analysis
This podcast episode represents a fascinating deep-dive into one of the most significant recent breakthroughs in artificial general intelligence research. Jeremy Berman’s achievement of a 29.4% score on the ARC-AGI-2 benchmark marks a substantial leap forward in machine reasoning capabilities, and his methodology reveals important insights about the future direction of AGI development.
The Technical Breakthrough
Berman’s approach represents an elegant evolution of previous work by Ryan Greenblatt, but with a crucial innovation: instead of evolving Python programs directly, his system evolves natural language descriptions of algorithms. This shift acknowledges a fundamental truth about human cognition - we typically understand problems in natural language before translating them into code. As Berman notes, “You can describe every single ARC-AGI task in 10 bullet points of plain English, most of them in 5 bullet points.” This insight led him to leverage the superior expressiveness of natural language over programming languages for complex reasoning tasks.
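To make the loop concrete, here is a minimal sketch of this style of search - not Berman's actual pipeline - where `llm` and `run_against_training_pairs` are hypothetical stand-ins for a model API call and a sandboxed code runner:

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model API."""
    raise NotImplementedError("wire up your model client here")

def run_against_training_pairs(code: str, task) -> float:
    """Hypothetical sandbox: execute `code` on the task's training
    pairs and return the fraction solved."""
    raise NotImplementedError

def fitness(description: str, task) -> float:
    """Score a description by asking the model to implement it in
    Python, then checking the result against the training pairs."""
    code = llm(f"Write a Python function implementing:\n{description}")
    return run_against_training_pairs(code, task)

def evolve_descriptions(task, population_size: int = 8, generations: int = 4) -> str:
    """Evolve plain-English descriptions of the transformation rule;
    the LLM serves as both the initializer and the mutation operator."""
    population = [
        llm(f"In a few bullet points of plain English, describe the "
            f"grid transformation in this task:\n{task}")
        for _ in range(population_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=lambda d: fitness(d, task), reverse=True)
        parents = ranked[: population_size // 2]   # keep the fittest descriptions
        children = [
            llm(f"This description failed on some examples. Revise it:\n{p}")
            for p in parents                        # mutation via LLM revision
        ]
        population = parents + children
    return max(population, key=lambda d: fitness(d, task))
```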
The evolutionary component of his approach addresses a critical challenge in AI problem-solving: the trade-off between breadth and depth of search. Rather than trying thousands of shallow attempts or pursuing a single deep revision path, Berman's system finds a "Goldilocks zone" that balances exploration with refinement. Interestingly, he discovered that ARC-AGI-2 required more breadth than depth, partly because newer RL-trained models like o1 already incorporate sophisticated internal revision loops.
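The breadth-versus-depth trade-off amounts to how a fixed budget of model calls is allocated. A toy illustration, assuming a budget of 64 calls per task (the number is invented):

```python
BUDGET = 64  # total model calls we are willing to spend on one task

def split_budget(breadth: int) -> tuple[int, int]:
    """Split a fixed call budget into (independent attempts, revision
    steps per attempt). More breadth means shallower revision chains."""
    assert BUDGET % breadth == 0
    return breadth, BUDGET // breadth

print(split_budget(64))  # (64, 1): pure breadth, 64 one-shot attempts
print(split_budget(1))   # (1, 64): pure depth, one attempt revised 64 times
print(split_budget(16))  # (16, 4): a middle allocation, the "Goldilocks zone"
```

On ARC-AGI-2, per the conversation, breadth-heavy allocations won out, since RL-trained reasoning models already perform much of the revision internally within a single call.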
The Reasoning Revolution
A central theme of the conversation is the distinction between domain-specific skills and the meta-skill of reasoning. Berman argues that “reasoning is that meta-skill” - the ability to learn how to learn new capabilities. This perspective frames AGI not as the accumulation of many specific abilities, but as the development of a general learning mechanism that can acquire any skill.
The discussion reveals how reinforcement learning has fundamentally changed the landscape. Pre-o1 models required elaborate prompting strategies to simulate thinking, but RL-trained models have internalized these revision loops. This represents a shift from "stochastic guessing" to genuine internal reasoning processes, though questions remain about whether this constitutes true understanding or sophisticated pattern matching.
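The contrast is easiest to see side by side. A schematic sketch, where `llm` stands in for any text-in, text-out model call (both functions are illustrative, not an actual API):

```python
def solve_with_scaffold(llm, task: str, revisions: int = 3) -> str:
    """Pre-o1 pattern: simulate deliberation with an explicit,
    hand-built revision loop around a plain language model."""
    answer = llm(f"Think step by step, then answer:\n{task}")
    for _ in range(revisions):
        answer = llm(
            f"Task:\n{task}\n\nDraft answer:\n{answer}\n\n"
            "Find any mistakes in the draft and write an improved answer."
        )
    return answer

def solve_with_reasoning_model(llm, task: str) -> str:
    """RL-trained reasoning models internalize that revision loop,
    so a single call often replaces the whole scaffold."""
    return llm(task)
```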
The Symbolic vs. Neural Debate
The conversation touches on one of the most fundamental debates in AI: whether neural networks alone can achieve human-level reasoning or whether symbolic components are necessary. Berman takes an optimistic view, arguing that neural networks have the theoretical capacity to represent symbolic systems, while acknowledging current limitations. The discussion references François Chollet's neurosymbolic approach and Jerry Fodor's classic arguments for the symbolic structure of thought.
đź’¬ Key Insights
"You can always teach a language model a skill, right? But it's the meta-skill—the skill to create the skills—that is AGI. To me, that's reasoning. Reasoning is that meta-skill."
"Forcing the language models to develop these deep trees from the ground up can only be developed from the ground up. We need to come up with new techniques and environments to grow the trees instead of pre-training, which is pre-filling random—it's not random, but it's a web; it's not a tree."
"He says you can always teach a language model a skill, but it's the meta-skill—the skill to create the skills—that is AGI. To me, that's reasoning. Reasoning is that meta-skill."
"Shole hits on a core problem with language models: their reasoning is domain-specific. When you train a language model to reason about math, most of the reasoning circuits it gains live in the math weights. You try to train it on science, and it's some generalization, but not as much as you would want."
"When we do SGD, because there are all these shortcuts, it will always just find the wrong thing."
"Stochastic gradient descent does not find the algorithms that allow the systems to behave as if they are Turing machines. God knows how it happened in our brains; there was some hint of evolution where it suddenly got the merge operator or something, and we've got this incredible Turing complete algorithm in our finite brain."