AI Agents Can Code 10,000 Lines of Hacking Tools In Seconds - Dr. Ilia Shumailov (ex-GDM)
🎯 Summary
Podcast Summary: AI Agents Can Code 10,000 Lines of Hacking Tools In Seconds - Dr. Ilia Shumailov (ex-GDM)
This 61-minute episode features Dr. Ilia Shumailov, a former DeepMind researcher now focused on AI security, discussing the radical shift in adversarial thinking required by the advent of highly capable AI agents. The core narrative moves from the current state of LLM vulnerabilities (like prompt injection) to a future where agentic systems necessitate entirely new security paradigms, moving beyond traditional human-centric controls.
1. Focus Area
The primary focus is AI Security and Safety in the context of Agentic Systems. Specific topics include: the failure modes of large models compared to older systems, the threat posed by agents capable of rapid, complex code generation (e.g., hacking tools), the inadequacy of traditional security models (like those based on human rationality), and a novel approach to trusted computation that leverages ML models instead of traditional cryptography.
2. Key Technical Insights
- Evolving Failure Modes: Modern, highly capable LLMs fail differently than models from five years ago. While they are more robust against traditional adversarial examples derived from gradient information, they become significantly more vulnerable to simple rephrasing and instruction manipulation (like indirect prompt injection), especially as their instruction-following capabilities increase.
- Agents as Worst-Case Adversaries: AI agents surpass traditional worst-case adversaries (like an irrational child) because they can operate 24/7, possess near-total system knowledge, and can generate massive amounts of complex malicious code (e.g., 10,000 lines of hacking tools) almost instantly, invalidating assumptions based on human limitations.
- ML for Trusted Computation: Shumailov proposes using ML models (like Jemma) as “resettable trusted third parties” to solve problems like the Millionaire Problem (private comparison of values) without relying on complex, expensive cryptographic protocols. This approach relies on verifiable inference rather than mathematical proof.
3. Business/Investment Angle
- New Security Market: The shift from human-centric security to agent-centric security creates a massive, urgent need for new tooling that provides fine-grained control, transparency, and verifiable constraints around agent execution and data access.
- Personalized Agents Require New Controls: The coming wave of personalized AI models, which will handle sensitive user data, cannot be secured by current methods (like embedding rules in prompts). Investment is needed in system-level orchestration and access control layers built around the models.
- Decoupling Logic from Data: The proposed security architecture (like the Camel system) allows for the creation of generic, off-the-shelf agent logic modules that can be safely attached to disparate, user-specific private data sources via strict, formally verifiable data flow policies.
4. Notable Companies/People
- Dr. Ilia Shumailov: Former DeepMind researcher, now building security tooling for agentic fleets. His background spans both machine learning and security (PhD under Ross Anderson).
- DeepMind: Mentioned as the location of his previous work on ML security, including defending Gemini against indirect prompt injections.
- Camel System: The proposed framework (detailed in the “Diffusion Prompt Injections by Design” paper) that uses formal semantics (like Python code) to define execution graphs and enforce static/dynamic policies on data flow between tools and data sources.
5. Future Implications
The industry is moving toward a state where security tooling must be built for non-human, highly capable adversaries. Traditional security assumptions based on human rationality, time constraints, and physical penalties are obsolete. The future of secure AI integration depends on building robust, external orchestration and policy enforcement layers (like Camel) around foundation models, rather than trying to fix the models themselves.
6. Target Audience
AI/ML Engineers, Cybersecurity Professionals (especially those dealing with emerging threats), CTOs, and Security Architects. This content is highly technical and strategic, focusing on the fundamental breakdown of existing security models due to agentic capabilities.
🏢 Companies Mentioned
💬 Key Insights
"when we talked about model collapse, we kind of referred to two phenomena happening at the same time. One of them was the tails are shrinking and basically improbable events become more improbable. And then the second phenomena was over time and this accumulates with fails."
"the actual issue is the fact that before you develop security tooling, you really need to have something to secure because every single, small detail changes how you build security systems. So unless you know everything about the system and it's kind of frozen in time, you can't really build security. And by the time you have something to secure, it's already too late."
"Is there one thing that all of the frontier labs could implement that would improve the security of their models? No, I don't, I don't think this exists today. Like we don't know how to solve problems. I think this is the honest answer. Is that for most of the issues we have today, we just don't have a solution."
"we can change the architecture of the model such that they become sensitive to certain tokens when you supply them to a transformer, that when you supply them, they start using the memory in the wrong way. So like they start routing, for example, data from one user to another user."
"We have written a whole new branch of literature on what we call like architectural backdoors, where you don't actually hide malicious functionality in parameters of the models. Instead, you hide it in the structure of the model itself, like a structure so that even if you find you in the model, it still has the same baseline behavior."
"What this thing does is they say, oh, for some models, when you load them, you actually want to load the latest, the latest representation from an external machine. What this thing does is literally remote code loaded on your machine, executed on your machine, loaded on top of stuff."