Teaching Models to Forget Information

Unknown Source · October 06, 2025 · 33 min
artificial-intelligence ai-infrastructure google apple meta openai

🎯 Summary

This 32-minute episode of “Does Compute” features a discussion with Virginia Smith, Leonardo Associate Professor of Machine Learning at Carnegie Mellon University, focusing on the critical challenges of safety, privacy, and reliability in large-scale machine learning systems, particularly Large Language Models (LLMs). The central narrative revolves around the gap between the perceived capabilities of current AI safety techniques (like watermarking and unlearning) and their actual robustness in practice.


1. Focus Area

The primary focus is AI Safety and Machine Learning Robustness, specifically addressing immediate societal harms caused by current models, including the generation of misinformation/disinformation and privacy violations (data memorization). The discussion heavily covers technical solutions being researched to mitigate these risks, such as Federated Learning, Differential Privacy, AI Watermarking, and Machine Unlearning.

2. Key Technical Insights

  • Federated Learning & Personalization for Privacy: Federated Learning enables privacy-preserving model training across decentralized data silos (such as mobile phones). Effective deployments often require model personalization, i.e., models tailored to each user’s behavior, which improves fairness, robustness, and accuracy relative to a single monolithic model. A minimal training sketch appears after this list.
  • Joint Differential Privacy (JDP): JDP is a necessary relaxation of traditional Differential Privacy (DP) when implementing personalized federated learning. It provides meaningful privacy guarantees against the rest of the network while acknowledging that a user’s personalized model should retain information about that user’s own data; the formal condition is written out after this list.
  • Fragility of AI Safety Techniques: Current state-of-the-art techniques like watermarking are easily defeated, either by removal or by malicious actors using robust watermarks to frame model developers. Similarly, Machine Unlearning (teaching models to forget specific data, such as copyrighted or unsafe material) appears effective only on narrow benchmarks; slight perturbations, such as fine-tuning on benign related data, can cause the model to reproduce the supposedly “unlearned” information, suggesting the knowledge remains latent in the network. An illustrative probe of this effect follows the list.
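
The following is a minimal, illustrative sketch (not code from the episode) of federated averaging with a personalized component: a shared layer is averaged across clients on the server, while each client keeps a personal head that is trained only on its own data and never leaves the device. The module names, layer sizes, and toy data below are assumptions for illustration.

```python
# Minimal sketch of federated averaging with per-user personalization.
# Assumptions (not from the episode): a two-part model where `shared` is
# averaged across clients and `personal` stays on-device; toy data shapes.
import copy
import torch
import torch.nn as nn

class ClientModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Linear(10, 16)    # globally averaged (FedAvg) part
        self.personal = nn.Linear(16, 2)   # personalized part, never leaves the device

    def forward(self, x):
        return self.personal(torch.relu(self.shared(x)))

def local_update(model, batches, lr=0.01):
    """One pass of local training on a client's own (x, y) batches."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for x, y in batches:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def federated_round(global_shared_state, client_models, client_batches):
    """Each client syncs the shared layer, trains locally, and only the
    shared parameters are sent back and averaged on the server."""
    shared_states = []
    for model, batches in zip(client_models, client_batches):
        model.shared.load_state_dict(global_shared_state)
        local_update(model, batches)
        shared_states.append(copy.deepcopy(model.shared.state_dict()))
    # Server-side FedAvg: equal-weight average of the shared parameters only.
    return {k: torch.stack([s[k] for s in shared_states]).mean(dim=0)
            for k in global_shared_state}

# Toy usage: three clients, each with one random batch of its own data.
clients = [ClientModel() for _ in range(3)]
data = [[(torch.randn(8, 10), torch.randint(0, 2, (8,)))] for _ in range(3)]
global_shared = copy.deepcopy(clients[0].shared.state_dict())
for _ in range(5):  # five communication rounds
    global_shared = federated_round(global_shared, clients, data)
```

In this sketch only the `shared` parameters ever cross the network; protecting those updates would still require an additional mechanism such as differential privacy, which motivates the joint DP condition sketched next.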
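
For reference, the standard (ε, δ)-joint differential privacy condition can be written as below. This is the textbook definition, not a formula quoted in the episode; M denotes a mechanism that releases one personalized model per user.

```latex
% Standard (epsilon, delta)-joint differential privacy: the outputs released to
% everyone OTHER than user i must be nearly insensitive to user i's data, while
% user i's own personalized model is allowed to depend on it.
% For every user $i$, every pair of datasets $D, D'$ differing only in user $i$'s
% data, and every event $S$ over the outputs of all users $j \neq i$:
\[
  \Pr\bigl[\mathcal{M}(D)_{-i} \in S\bigr]
    \;\le\; e^{\varepsilon}\,\Pr\bigl[\mathcal{M}(D')_{-i} \in S\bigr] + \delta ,
\]
% where $\mathcal{M}(D)_{-i}$ denotes the personalized models released to all
% users except $i$. Excluding user $i$'s own output from the guarantee is
% exactly the relaxation relative to standard DP.
```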
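
The “relearning” fragility described in the last bullet can be probed with a simple experiment: query the unlearned checkpoint, briefly fine-tune it on benign related text, and query it again. The sketch below assumes a Hugging Face causal language model; the checkpoint path, prompts, and fine-tuning texts are placeholders, and this is not the guest’s actual evaluation code.

```python
# Sketch of the "relearning" probe: does briefly fine-tuning an unlearned model
# on BENIGN related text bring the supposedly forgotten content back?
# Placeholders (assumptions, not from the episode): checkpoint path, prompts, texts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/unlearned-model"     # placeholder: checkpoint after unlearning
tok = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def probe(prompts):
    """Greedy-decode continuations so 'before' and 'after' outputs are comparable."""
    outs = []
    for p in prompts:
        inputs = tok(p, return_tensors="pt")
        gen = model.generate(**inputs, max_new_tokens=50, do_sample=False)
        outs.append(tok.decode(gen[0], skip_special_tokens=True))
    return outs

target_prompts = ["..."]   # prompts that target the "unlearned" content
benign_texts = ["..."]     # benign, loosely related fine-tuning data

print("Before benign fine-tuning:", probe(target_prompts))

# Brief fine-tuning on benign data only (the forgotten content is never shown).
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in benign_texts:
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()

model.eval()
print("After benign fine-tuning:", probe(target_prompts))
```

If the “after” outputs reproduce the forgotten content while the “before” outputs do not, the information was suppressed rather than removed, which matches the behavior described in the quotes below.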

3. Business/Investment Angle

  • Regulatory Preemption: Global companies (Google, Apple, Meta) are actively deploying privacy-preserving ML techniques like Federated Learning ahead of anticipated regulation, indicating a strategic move to establish robust internal frameworks.
  • Risk of Misattribution and Reputational Damage: The fragility of watermarking presents a business risk where malicious actors could easily generate degraded or unsafe content, falsely attribute it to a major model developer via the watermark, and damage their reputation.
  • Safety by Design Imperative: There is a critical need for better understanding and auditing of training data before model release, as current data scraping practices (especially from the open internet) introduce significant safety liabilities (e.g., exposure to CSAM).

4. Notable Companies/People

  • Virginia Smith (CMU): The expert guest, whose research focuses on safety, optimization, and distributed systems, particularly in privacy-preserving ML.
  • Thorn: A child-safety organization that collaborates with researchers, cited to highlight the immediate, real-world harm of generative models being used to create child sexual abuse material (CSAM).
  • Google, Apple, Meta: Mentioned as major players deploying Federated Learning at scale on mobile devices for applications like next-word prediction.

5. Future Implications

The industry is heading toward a realization that current “patch” solutions for safety (watermarking, unlearning) are insufficient because they do not address the fundamental lack of understanding regarding how information is stored and retrieved within large models. Future efforts must focus on “safe by design” principles, starting with rigorous auditing of training data and developing fundamentally more transparent and reliable methods for controlling model knowledge. The immediate future suggests an escalation of misuse, particularly in areas like child safety and disinformation, outpacing the effectiveness of current countermeasures.

6. Target Audience

This episode is highly valuable for AI/ML Researchers, Data Scientists, AI Ethics and Policy Professionals, and Technology Executives concerned with the governance, risk management, and long-term viability of deploying large-scale generative models.

🏢 Companies Mentioned

Carnegie Mellon University (School of Computer Science)
Google
Apple
Meta
Thorn

đź’¬ Key Insights

"And I think this has real concerning ramifications in terms of actually using unlearning for something maybe less benign than learning about Harry Potter. But if we think we could use this to unlearn, you know, sort of unsafe behaviors, I think it actually could be quite easy to just interact with the model and get back that unsafe information."
Impact Score: 10
"And so what this tells me is that we don't yet have a great understanding of how these models are actually working, and that in particular for unlearning, that a lot of the information is actually still retained in the network in various ways in the model. So it might sort of suppress that particular output, but it doesn't mean that we've actually unlearned that information."
Impact Score: 10
"But if you do any one of the following: if you perturb in various ways the questions that are being asked of the models, the prompts that are being given, or if you take the model and perturb that, so you maybe fine-tune it even on entirely benign knowledge, you can see that it can now reproduce that supposedly unlearned information."
Impact Score: 10
"Now, the flip side to this, the downside to these robust watermarks is that it can be very easy then to take an image or take text and to manipulate it in various ways and have it still appear watermarked. And you may wonder why this is a problem. But I think this is a real issue for incentivizing the model developers to actually produce these solutions."
Impact Score: 10

📊 Topics

#artificialintelligence (43) · #aiinfrastructure (2)

đź§  Key Takeaways

đź’ˇ be thinking of before we just throw up our hands and go, "It's too much, it's too stressful, too big to get ahead of

Generated: October 06, 2025 at 07:23 PM