EP 536: Agentic AI - The risks and how to tackle them responsibly
🎯 Summary
Podcast Episode Summary: EP 536: Agentic AI - The risks and how to tackle them responsibly
This episode of the Everyday AI Show features Sarah Bird, Chief Product Officer of Responsible AI at Microsoft, discussing the significant shift in responsible AI practices necessitated by the rise of Agentic AI and Multi-Agentic Systems. The core narrative revolves around how moving from simple human-chatbot interaction to autonomous, task-executing agents drastically increases the surface area for risk, demanding new governance, testing, and monitoring paradigms.
1. Focus Area: The discussion centers on Responsible AI (RAI) governance, security, and testing specifically tailored for Agentic AI and Multi-Agentic Orchestration systems, contrasting these new challenges with the more straightforward RAI concerns of earlier generative AI models.
2. Key Technical Insights:
- Shift in Human Oversight: The paradigm is moving from humans being “in the inner loop” (checking every small step) to being “in the outer loop” (monitoring long-running tasks and intervening when alerts trigger).
- Component-Based Governance: Multi-agent systems, often used for task decomposition (assembly line style), must be governed at the individual agent level rather than just the system boundary, requiring granular visibility.
- Agent Identity and Security: Agents must be treated as new entities requiring robust security protocols, similar to users or devices. Microsoft is addressing this with tools like Entra Agent ID for tracking and access control.
3. Business/Investment Angle:
- High Adoption Rate: Agent deployment is imminent, with 81% of employers looking to deploy agents alongside their workforce in the next 18 months, signaling a major operational shift.
- Testing as a Prerequisite for Trust: Organizations are often delaying deployment because they underestimate the necessary testing rigor, highlighting an immediate need for mature testing frameworks to build organizational trust.
- Focus on Traditional Security: A significant portion of the new RAI effort involves applying established security patterns (like access control and threat monitoring) to agents, indicating that securing these entities is as crucial as addressing novel AI risks.
4. Notable Companies/People:
- Sarah Bird (Chief Product Officer of Responsible AI, Microsoft): The expert guest detailing Microsoft’s approach to building tools and frameworks for governing agentic systems.
- Microsoft: Mentioned for recent announcements at Build, including Foundry Observability (for monitoring agent performance and task adherence) and Entra Agent ID.
- Microsoft Research: Mentioned for developing the Agentic UI, a research system exploring optimal human-agent interaction patterns.
5. Future Implications: The industry is moving toward complex, autonomous workflows where agents operate for extended durations without direct human input. This necessitates innovation in human-AI interfaces (Agentic UI) to effectively communicate aggregate performance data and intervention points to humans whose skill sets must adapt to monitoring performance metrics rather than individual outputs.
6. Target Audience: AI/ML Professionals, Product Managers, Security/Governance Officers, and Business Leaders involved in deploying or scaling AI solutions, particularly those moving beyond simple chatbots into autonomous workflow automation.
Comprehensive Summary
The podcast episode addresses the critical evolution of Responsible AI (RAI) as the industry transitions from basic conversational AI to sophisticated Agentic AI and Multi-Agentic Systems. Host Jordan Wilson opens by noting that the complexity of agents—which can take actions on behalf of users for extended periods—fundamentally changes governance requirements compared to single-turn interactions.
Sarah Bird, Microsoft’s CPO of Responsible AI, confirms this shift, noting that while awareness of RAI has dramatically increased, the technical challenge has grown because agents introduce a larger “surface area” for potential failure or misuse. When agents execute tasks autonomously, the traditional mitigation of having the human “in the loop” for every step is lost, pushing humans to the “outer loop” for monitoring and intervention.
Bird clarifies that multi-agent orchestration, in practice, often involves task decomposition, where one large goal is broken into smaller, manageable subtasks assigned to specialized agents. This componentized approach is beneficial for RAI because each agent can be individually tested and governed. However, this complexity requires robust observability. Microsoft’s Foundry Observability tool is highlighted as essential for monitoring individual agent behavior, ensuring they stay on task, use the correct tools, and adhere to user intent, which is crucial for debugging and maintaining security practices like least privilege access across the system.
A major theme is the need to secure agents as first-class entities within organizational IT infrastructure. Bird notes that the focus has broadened from novel AI risks (like harmful content) to ensuring agents are governed using traditional security frameworks. The introduction of Entra Agent ID exemplifies this, ensuring agents are tracked and controlled just like human users or devices.
The discussion also touches on the evolving role of the human supervisor. As agents work longer, humans must shift from micro-management to interpreting aggregate data (e.g., “Is the system performing correctly 99.8% of the time?”). This requires new interfaces, like the research-based Agentic UI, designed to surface meaningful intervention points based on performance metrics rather than raw output review.
Finally, Bird categorizes risks into three main areas, referencing the International Safety Report framework: Malfunctions (e.g., going off-task, data leakage, prompt injection), and Misuse (which includes both user misunderstanding and malicious intent). The conversation underscores that building trust in these powerful, autonomous systems hinges on rigorous, co-developed testing that starts early in the development cycle, not as a last
🏢 Companies Mentioned
đź’¬ Key Insights
"What is maybe the one most important takeaway that you have for business leaders when it comes to understanding the risk of agentic AI and doing it responsibly? Yeah, I think you have to go into it eyes open. You need to know that there are risks and understand the risks and pick use cases"
"This is my agency, this critical thinking. So how do we need to get the workforce ready for working hand-in-hand with agentic AI?"
"But that is a different way of working, and so preparing the workforce to actually be ready for this new skill set and collaborate with these tools, that's some systemic type of risk that we need to go and address."
"The last risk, and then something that is very top of mind for me, is the systemic risk with AI."
"It could be that it's leaking sensitive data accidentally, right? And those are some of the big ones we see with agents. It's vulnerable to prompt injection attacks, right? Those are all types of malfunctions."
"a new Entra Agent ID so that agents can be tracked in your system just like anything else and making sure that we're connecting this governance and security playing with what the developer is doing so that when I build an agent in Foundry or in Copilot, it just has an identity already attached so I've done the right thing for my organization..."