772 | Unpacking LLMs.txt with Carolyn Shelby

Edge of the Web Radio | October 16, 2025 | 43 min
artificial-intelligence generative-ai ai-infrastructure startup google microsoft meta
73 Companies
77 Key Quotes
4 Topics
1 Insight

🎯 Summary

Summary of Edge of the Web Radio Episode 772: Communicating Value to LLMs with Yoast’s Carolyn Shelby

This episode of The Edge of the Web Radio features Carolyn Shelby, Principal SEO at Yoast, focusing on the emerging need for websites to communicate content value directly to Large Language Models (LLMs) and the introduction of the llms.txt file as a potential solution.


1. Main Narrative Arc and Key Discussion Points

The conversation begins with a lighthearted introduction to Carolyn Shelby, including her unique role as Queen of the micronation, Ladonia. The discussion quickly pivots to the core technical topic: how search engine optimization (SEO) must evolve to interact with generative AI systems like ChatGPT and Google’s AI Overviews. The central theme is the shift from traditional crawling/indexing (via sitemap.xml and robots.txt) to explicitly guiding LLMs toward a website’s most authoritative content using the new llms.txt file.

2. Major Topics, Themes, and Subject Areas Covered

  • LLM Interaction: The difference between LLM training data and real-time inference queries.
  • Content Prioritization: Strategies for identifying and showcasing a website’s “best content” to AI agents.
  • Technical SEO Evolution: How traditional files (robots.txt, sitemap.xml) compare to the new directive file (llms.txt); a robots.txt sketch follows this list for contrast.
  • AI Adoption Challenges: The risk of LLMs relying on less reliable sources (like Reddit) if authoritative content isn’t easily accessible.
  • Micronations and Digital Sovereignty: A brief, engaging tangent on Ladonia, illustrating concepts of self-declared standards and community building.
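
For contrast with the comparison noted above, here is a minimal robots.txt sketch of the traditional access-control side. The AI user agents shown (GPTBot for OpenAI's crawler, Google-Extended for Google's AI-training controls) are illustrative and not details from the episode; the point is that robots.txt and sitemap.xml govern what may be fetched and what exists, while llms.txt is meant to say what matters most.

```
# robots.txt — governs crawler access, not content priority
User-agent: GPTBot            # OpenAI's crawler
Allow: /

User-agent: Google-Extended   # Google's token for AI-training use
Disallow: /

User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
```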

3. Technical Concepts, Methodologies, or Frameworks Discussed

  • llms.txt File: A newly proposed, simple markdown file placed in the root directory, intended to act as a “treasure map” for LLMs, directing them only to the most valuable content on the site (a sample file follows this list).
  • LLM Crawling vs. Inference: The distinction between the data used to train models and the real-time data retrieval during a user query (inference time). It was noted that LLM “crawlers” do not execute JavaScript or handle human interaction tasks (like toggling interfaces).
  • Structured Data/Schema: Mentioned as a historical example of a standard that gained uniform acceptance among search engines.
  • Content Strategy Analogy: The llms.txt file is likened to a roadmap for hub-and-spoke or pillar content strategies, but specifically for AI consumption.
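
To make the “treasure map” idea concrete (as referenced in the first bullet above), here is a minimal sketch of what an llms.txt file might look like, following the markdown structure in Howard’s proposal: an H1 title, a blockquote summary, and sections of annotated links. The site, URLs, and descriptions are placeholders, not examples from the episode.

```
# Example Store

> Example Store sells modular standing desks. The documents below are the
> most authoritative sources for product specs, assembly, and support.

## Product documentation
- [Desk specifications](https://www.example.com/docs/specs.md): dimensions, weight limits, materials
- [Assembly guide](https://www.example.com/docs/assembly.md): step-by-step setup instructions

## Support
- [Troubleshooting FAQ](https://www.example.com/support/faq.md): answers to the most common customer questions

## Optional
- [Company history](https://www.example.com/about.md): background and press mentions
```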

4. Business Implications and Strategic Insights

  • Risk of Omission: If a brand fails to guide LLMs, the AI may pull answers from less reliable sources (e.g., Reddit), potentially leading to inaccurate technical support or brand representation.
  • Accessibility: Yoast developed the llms.txt generator because, while easy to create manually, many users are uncomfortable curating this specialized file.
  • Competitive Neutrality: Shelby suggests it may actually be beneficial that no single major player (like Google) is aggressively pushing llms.txt yet, since that prevents competitors from rejecting it purely because of who originated it, as happened with AMP and IndexNow.

5. Key Personalities, Experts, or Thought Leaders Mentioned

  • Carolyn Shelby (Yoast): Principal SEO, expert in technical and enterprise SEO, and the featured guest.
  • Jeremy Howard: Credited with proposing the llms.txt file format in September 2024.
  • Sponsors/Other Guests: Bruce Clay (PreWriter.ai), Ross Simmonds, Britney Muller, and Barry Adams (mentioned in housekeeping).

6. Future Outlook and Predictions

  • Yoast’s strategy is to deploy the generator across its 13 million users to rapidly influence the adoption curve and help establish llms.txt as an industry standard for AI communication.
  • It is predicted that true uniformity in how LLMs adhere to these files will be rare, similar to how different search engines utilize Schema markup differently.

7. Practical Applications and Actionable Advice Provided

  • How to Curate Content: Website owners should identify content that answers the most common questions asked about their brand, products, or services.
  • Data Sources for Curation: Use customer support logs, “People Also Ask” data, and Google Trends to determine which questions are most critical to answer authoritatively.
  • Accessibility Check: Ensure critical knowledge base content is not locked behind interfaces requiring human interaction or JavaScript execution, as LLM crawlers cannot bypass these barriers (a quick audit sketch follows this list).
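
Since real-time LLM fetchers generally do not execute JavaScript, one practical check is to fetch the raw HTML and confirm the key answers are already present in it. Below is a rough Python sketch under that assumption; it uses the third-party requests library, and the URLs, phrases, and user-agent string are placeholders rather than anything from the episode.

```python
# Rough sketch: verify that key answers are visible in raw HTML, i.e. without
# JavaScript execution, which real-time LLM fetchers generally do not perform.
import requests

# Pages you plan to list in llms.txt, each paired with a phrase that should
# appear in the server-rendered HTML if the content is crawler-visible.
CHECKS = {
    "https://www.example.com/support/faq": "reset the device",
    "https://www.example.com/docs/specs": "maximum load",
}

def visible_without_js(url: str, phrase: str) -> bool:
    """Fetch the page without executing JavaScript and look for the phrase."""
    resp = requests.get(url, headers={"User-Agent": "llms-txt-audit/0.1"}, timeout=10)
    resp.raise_for_status()
    return phrase.lower() in resp.text.lower()

if __name__ == "__main__":
    for url, phrase in CHECKS.items():
        status = "OK" if visible_without_js(url, phrase) else "NOT visible without JS"
        print(f"{url}: {status}")
```

If a phrase only appears after client-side rendering, that content is effectively invisible to these fetchers, and a static or server-rendered version is worth considering before pointing an llms.txt file at it.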

8. Controversies, Challenges, or Problems Highlighted

  • Adoption Curve: The primary challenge is getting LLM providers (like OpenAI, Google) to agree to respect and utilize this new file format consistently.
  • Internal Conflict: Website owners face the challenge of objectively deciding which of their own content is “worthy” of being highlighted in the llms.txt file.
  • Google’s Mixed Signals: The observation that Google appears conflicted about the health of the web, promoting AI Overviews even as reports suggest those same overviews are cutting into its PPC revenue.

9. Context About Why This Conversation Matters to the Industry

This conversation is crucial because it addresses the next frontier of technical SEO: communicating directly with the AI systems that increasingly mediate user access to information. As AI Overviews and generative search become dominant, the ability to point LLMs at a site’s most authoritative content becomes a core part of protecting how a brand is represented in AI-generated answers.

🏢 Companies & People Mentioned

Pro Tools
AI Mode
Google AI Mode
Cindy Krum
SEO News
Barry Adams
Dana DiTomaso
Wil Reynolds
SE Ranking
Google Trends
People Also Ask
Accelerated Mobile Pages

đź’¬ Key Insights

"We got to stop thinking about them like spiders. We have to start thinking about them like goldfish. They swim over, they get what they need, they go back, they barf it up onto the floor, and then they forget instantly."
Impact Score: 10
"The LLMs, at least at inference time right now, have no memory. So the reason you see these huge spikes in server load from the LLMs coming back and hitting a page to answer a question is because every time someone asks that question, they have to go collect the data because they have no memory."
Impact Score: 10
"Look at the questions that people ask about your brand, and then look at the content that you have that answers those questions. You have to have some means or someone in your company that answers phone calls, like talk to your front-line customer support people."
Impact Score: 10
"I think it's almost a blessing if we don't get a single one large player behind it because we wouldn't want—I think you run the risk of alienating other people if there's one company that's like, 'Yeah, we're going to make this happen,' and then their competitors like, 'Just because of that, we are absolutely not going to do that.'"
Impact Score: 10
"We also understand that the LLMs are on a particular honor system of abiding by certain blockages... we don't have a regulatory environment where either the robots.txt file or this file is really abided by."
Impact Score: 10
"I see this as a kind of a watershed moment of how we can communicate to LLMs."
Impact Score: 10

📊 Topics

#artificialintelligence 102 #generativeai 12 #aiinfrastructure 9 #startup 2

đź§  Key Takeaways

🤖 Processed with true analysis

Generated: October 16, 2025 at 04:05 PM