AI Apps Are Broken — Here's How To Fix Them

Unknown Source May 23, 2025 30 min

artificial-intelligence startup generative-ai ai-infrastructure google openai

🎧 Listen to Original

33 Companies

65 Key Quotes

4 Topics

1 Insights

🎯 Summary

Podcast Episode Summary: AI Apps Are Broken — Here’s How To Fix Them

This 30-minute episode of “The Breakdown” features Y Combinator Partner Pete Cumin, who argues that current AI applications, particularly those integrated into existing software like Gmail, are fundamentally flawed because they rely on outdated software development paradigms rather than leveraging the full potential of modern LLMs.

1. Focus Area

The discussion centers on the design and implementation of AI-powered applications and agents, contrasting the “superhuman” feeling of using raw LLM tools (like coding assistants) with the frustrating, chore-like experience of many existing, integrated AI features (e.g., Gmail’s draft writer). The core theme is moving beyond the “AI horse-drawn carriage” mentality to build truly transformative AI software.

2. Key Technical Insights

The System Prompt as the Core Program: The hidden, generic system prompt used by developers (like in Gmail) acts as the “one-size-fits-all” code, enforcing a safe, lowest-common-denominator output that fails to capture individual user context or tone.
User-Editable System Prompts Enable Personalization: Allowing users to see and edit the system prompt transforms the AI from a generic tool into a personalized agent. By defining their own persona and rules (e.g., “You are Pete, keep emails short”), users can achieve outputs that sound authentic and match their mental model.
Coding Agents Lead the Way: Tools like Cursor and Windsurf are far ahead because they treat the LLM as a powerful text processor capable of translating detailed English descriptions (prompts) directly into functional code, a domain where current models excel.

3. Business/Investment Angle

The “AI Horse-Drawn Carriage” Trap: Companies integrating AI by simply “slotting it in” to existing UIs (like wrapping a website in a mobile app) are missing the opportunity for true disruption. Investment should target applications built from the ground up around AI’s capabilities.
Shift in Liability and Control: When users control the system prompt, the liability for the output shifts from the application developer (who must enforce safety) to the user, potentially unlocking more powerful, less constrained applications across various professional domains (accounting, legal).
The Next “Cursor Moment”: The industry is waiting for the moment when non-technical professionals in every field (accountants, lawyers) have a tool that allows them to program their workflows using natural language instructions, mirroring the productivity gains seen by developers.

4. Notable Companies/People

Pete Cumin (YC Partner, Founder of Optimizely): The central voice, author of the essay criticizing current AI app design.
Google/Gmail (Gemini Integration): Used as the primary negative example of a powerful model hidden behind a frustrating, overly cautious, and generic UI/system prompt.
Cursor and Windsurf: Cited as positive examples of tools that give users direct, powerful access to the underlying models, leading to a “superhuman” development experience.

5. Future Implications

The industry is moving toward user-driven programming where the system prompt becomes the accessible “code” for non-technical users. Future successful AI applications will treat the LLM as a malleable agent that users train iteratively, moving away from the current model where developers must anticipate every user need upfront. This shift requires new UI conventions for training, feedback, and prompt editing.

6. Target Audience

This episode is highly valuable for AI Product Managers, Software Engineers, Founders, and Venture Capitalists focused on the application layer of generative AI. It provides a critical framework for evaluating the next generation of AI software beyond simple feature additions.

🏢 Companies Mentioned

Sundar or Pichai ✅ big_tech

Facebook ✅ big_tech

Optimizely ✅ ai_application

And Tom ✅ unknown

Or I ✅ unknown

Whereas I ✅ unknown

If I ✅ unknown

Claude Codes ✅ unknown

So I ✅ unknown

Which I ✅ unknown

Hi Gary ✅ unknown

What I ✅ unknown

And I ✅ unknown

These AI ✅ unknown

But I ✅ unknown

💬 Key Insights

"I hope that developers stop treating these prompts like black boxes. Being able to look at the ground truth of what this agent is being instructed to do on my behalf is incredibly valuable."

Impact Score: 10

"there's a higher level of abstraction on top of system prompt writing that you shouldn't actually have to go and edit the system prompt that you're able to like nudge or like say, no, no, that, okay, here's a new term sheet."

Impact Score: 10

"My big contention here is that because this is all just English, we no longer have to treat these things like black boxes. The system prompt is almost like this, this document that you don't have to edit if you don't want to, but in the limit, if there's something really goes wrong, you can go in and tinker with it if you want."

Impact Score: 10

"Well, I think the interesting point is if you do change the model where the user is in charge or at least has access to the system prompt, then the repercussions of that are on the user and not on the company that built the tool."

Impact Score: 10

"And to me, like we will have caught up when everybody has the same experience that we have when we're using these coding agents in their particular domain, right? And so when accountants can build accounting agents... when lawyers can be lawyering agents... it's basically every profession I think will have its Cursor moment or its Windsurf moment."

Impact Score: 10

"This is really the code. This is the programming you are doing for this agent. But if you read it, it's pretty accessible, right? It says, if it's a tech-related email, label it tech. If it's somebody trying to sell me something, archive it. And this is like a great example of how the LLM technology is actually good enough to let non-programmers program these apps."

Impact Score: 10

📊 Topics

#artificialintelligence 127 #startup 4 #aiinfrastructure 1 #generativeai 1