Columbia CS Professor: Why LLMs Can’t Discover New Science

Unknown Source · October 13, 2025 · 51 min
artificial-intelligence generative-ai ai-infrastructure startup openai anthropic google
46 Companies
69 Key Quotes
4 Topics
1 Insight

🎯 Summary

Podcast Summary: Columbia CS Professor: Why LLMs Can’t Discover New Science

This 51-minute episode features a discussion between Columbia CS Professor Vishal Misra and a16z’s Martin Casado about the fundamental limitations of Large Language Models (LLMs) in achieving true scientific discovery, contrasting their current capabilities with the requirements for Artificial General Intelligence (AGI). The conversation centers on Misra’s formal mathematical models of how LLMs reason and why they are inherently bound by their training data.


1. Focus Area

The primary focus is on Formal Modeling of LLMs and Reasoning Limitations. Specific topics include:

  • The mathematical underpinnings of LLM inference (Transformer architecture, next-token prediction).
  • The concept of Bayesian Manifolds (or reduced state spaces) that constrain LLM outputs.
  • The mechanism behind Retrieval-Augmented Generation (RAG), which Misra accidentally pioneered.
  • The distinction between interpolation/pattern matching (what LLMs do) and true paradigm shifts (what AGI requires).
  • The impossibility of Recursive Self-Improvement without external information (a toy illustration of this inductive closure follows this list).
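
To make the inductive-closure point concrete, here is a minimal Python sketch (an illustration added to this summary, not code from the episode): a toy bigram next-token model can only recombine transitions it observed in training, so no amount of sampling takes it outside its training distribution.

```python
import random
from collections import defaultdict

# Toy bigram "LLM": next-token prediction learned purely from a training corpus.
# An illustration of inductive closure, not a real Transformer.
corpus = "the apple falls because gravity pulls the apple toward the earth".split()

# Record every next-token transition observed in training.
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(prompt_token, length=8):
    """Sample a continuation; every step interpolates over training transitions."""
    out = [prompt_token]
    for _ in range(length):
        candidates = transitions.get(out[-1])
        if not candidates:  # off the learned manifold: no distribution exists here
            break
        out.append(random.choice(candidates))
    return " ".join(out)

print(generate("the"))
# Every possible output recombines observed transitions; a token like
# "spacetime", never seen in training, can never be generated.
```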

2. Key Technical Insights

  • LLMs operate on Bayesian Manifolds: LLMs compress the complexity of the world into a geometric manifold with reduced degrees of freedom. They are confident and coherent only while traversing this learned manifold; veering off it produces confident hallucination.
  • Entropy and Context: Reasoning quality is tied to the entropy of the next-token distribution. Highly specific, information-rich prompts (low prediction entropy) force the model onto a narrower, more confident path. This also explains why Chain-of-Thought (CoT) prompting works: it forces the model to follow known algorithmic steps, i.e., low-entropy paths (a small worked example follows this list).
  • The Matrix Abstraction: LLMs implicitly represent a massive, sparse matrix whose rows are prompts and whose columns are vocabulary tokens (next-token distributions). They do not store this matrix explicitly; instead, they interpolate across the subset of rows seen during training to generate a posterior distribution for any new prompt.
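
The entropy point lends itself to a small worked example (hypothetical numbers chosen for illustration, not figures from the episode): a vague prompt leaves the next-token distribution spread out, while a specific, information-rich prompt concentrates the probability mass onto a few tokens.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a next-token distribution {token: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical next-token distributions for two prompts (illustrative numbers).
vague = {"the": 0.2, "a": 0.2, "it": 0.2, "some": 0.2, "this": 0.2}  # "Write about..."
specific = {"4": 0.9, "four": 0.08, "the": 0.02}                     # "2 + 2 = "

print(f"vague prompt:    {entropy(vague):.2f} bits")     # 2.32 bits
print(f"specific prompt: {entropy(specific):.2f} bits")  # 0.54 bits
# Chain-of-Thought works the same way: each intermediate step acts as a highly
# constraining prompt for the next one, keeping the model on a low-entropy path.
```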

3. Business/Investment Angle

  • RAG as an Accidental Breakthrough: Misra’s work on fixing Cricinfo’s StatsGuru page led directly to the core mechanism now known as RAG, demonstrating that augmenting models with external, structured data retrieval is crucial for complex tasks (a minimal sketch of the mechanism follows this list).
  • Plateauing Capabilities: The rapid, surprising pace of LLM development (GPT-3 to GPT-4) is beginning to plateau. Current improvements are incremental, akin to a better camera on each new iPhone, suggesting that scaling alone will not unlock fundamentally new capabilities like scientific discovery.
  • Defining AGI: True AGI requires the ability to generate new science (e.g., inventing Relativity by rejecting Newtonian physics), which necessitates breaking out of the learned manifold, something current LLMs, bound by their inductive closure, cannot do.
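
For readers unfamiliar with the mechanism, the sketch below shows the basic RAG loop: retrieve the most relevant records, then condition the model on them. The toy corpus, keyword scoring, and llm() stub are placeholders for illustration, not Misra’s actual system.

```python
# Minimal RAG sketch: retrieve relevant records, then condition the model on them.
documents = [
    "Tendulkar scored 15921 runs in 200 Test matches.",
    "Bradman's Test batting average was 99.94.",
    "The 2019 World Cup final was decided on boundary count.",
]

def retrieve(query, docs, k=1):
    """Rank documents by keyword overlap with the query (a stand-in for embeddings)."""
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)[:k]

def llm(prompt):
    """Placeholder for a real completion call to any LLM API."""
    return "<model output, now conditioned on the retrieved context>"

def answer(query):
    context = "\n".join(retrieve(query, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)

print(answer("What was Bradman's Test average?"))
# Retrieved facts pin the model to a low-entropy region of its learned manifold,
# so the answer is grounded in external data rather than confidently hallucinated.
```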

4. Notable Companies/People

  • Vishal Misra (Columbia CS Professor): The central expert, known for developing formal, predictive models of LLM behavior, stemming from his background in networking and his accidental invention of RAG while working on the Cricinfo platform.
  • Martin Casado (a16z): Host and interviewer, highlighting the predictive power of Misra’s formal models compared to other industry rhetoric.
  • Cricinfo/ESPN: The context for Misra’s initial problem-solving, leading to the development of the RAG precursor in 2020/2021.

5. Future Implications

The industry is hitting a wall where scaling current Transformer architectures will not yield true scientific breakthroughs. The path forward requires moving beyond the inductive closure of training data. The next fundamental advance will likely involve models capable of rejecting established paradigms and generating entirely new mathematical or scientific frameworks, which is the definition of AGI in this context.

6. Target Audience

This episode is highly valuable for AI Researchers, Machine Learning Engineers, and Technology Strategists/VCs who need a rigorous, non-rhetorical understanding of the current theoretical limits of LLMs and what architectural shifts might be necessary for the next major leap in AI capability.

🏢 Companies Mentioned

Yann LeCun ai_research
ARC Prize ai_research
X (Twitter) ai_infrastructure
StatsGuru ai_application
Columbia research_institution
International Math Olympiad unknown

💬 Key Insights

"Language is great, but language is not the answer. You know, when I'm looking at catching a ball that is coming to me, I'm mentally doing that simulation in my head. I'm not translating it to language to figure out where it'll land."
Impact Score: 10
"So, the way I would say that LLMs currently navigate through this known Bayesian manifold, AGI will create new manifolds. So, right now these models navigate, they do not create."
Impact Score: 10
"But creating new dots, I think we need an architectural advance."
Impact Score: 10
"So, any LLM that was trained on pre-1915 physics would never have come up with the theory of relativity. Einstein had to reject the Newtonian physics and come up with a space-time continuum. He completely rewrote the rules."
Impact Score: 10
"So, you can sort of self-improve up to a point, but beyond the point, these models can only sort of generate what they have been trained on."
Impact Score: 10
"So, any model, any LLM that was trained on pre-1915 physics would never have come up with the theory of relativity. Einstein had to reject the Newtonian physics and come up with a space-time continuum."
Impact Score: 10

📊 Topics

#artificialintelligence 114 #generativeai 18 #aiinfrastructure 7 #startup 1

🧠 Key Takeaways

💡 The single best talk I’ve ever seen on trying to understand how LLMs work is one that Vishal Misra gave at MIT, which Hari Balakrishnan pointed me to.


Generated: October 13, 2025 at 10:30 AM