Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747
🎯 Summary
**Overview**

This episode features Aditi Raghunathan discussing the critical limitations of current LLM pre-training paradigms, particularly the over-reliance on benchmark performance, which masks failures in real-world deployment and fine-tuning. Her research highlights that models optimized purely on massive pre-training data can become brittle, leading to "catastrophic overtraining," where additional compute actively degrades adaptability for downstream tasks such as safety alignment or personalization. The discussion also explores novel approaches, such as "memorization sinks," designed to enforce disentanglement during training and enable reliable unlearning and better model customization.

**Key Takeaways**

- Benchmark performance often decouples from real-world utility: models optimized for benchmarks can fail spectacularly under slight distribution shifts.
- Catastrophic overtraining: models trained on excessive data (a high token-to-parameter ratio) can become worse starting points for fine-tuning than earlier checkpoints, even when the data is high quality.
- This overtraining effect appears both in fine-tuning for specific tasks and in efficiency-driven quantization, where more pre-training data makes the resulting model more brittle.
- The brittleness caused by overtraining leads to catastrophic forgetting or poor adaptation when the model is pushed in new directions (e.g., via fine-tuning or quantization noise).
- Current unlearning and safety-alignment methods often fail because the assumption that harmful or private knowledge is neatly localized in specific neurons is generally false in standard LLMs.
- Memorization sinks are a proposed architectural modification to pre-training that explicitly encourages disentangling document-specific knowledge into dedicated "sink" neurons, enabling targeted removal (unlearning) without destroying shared knowledge.

**Themes**

- Limitations of current LLM pre-training and scaling laws
- Benchmark inadequacy vs. deployment robustness
- Catastrophic overtraining and model brittleness
- Adaptability and fine-tuning challenges
- Unlearning, safety alignment, and knowledge isolation
- Architectural interventions for better model control (memorization sinks)
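As a rough illustration of the quantization takeaway above, the sketch below compares how much post-training int8 quantization degrades perplexity for checkpoints taken at different stages of the same pre-training run. The checkpoint names, the evaluation texts, and the use of PyTorch dynamic quantization are illustrative assumptions, not the protocol from the research discussed in the episode.

```python
# Sketch: probe whether later (more heavily pre-trained) checkpoints degrade more
# under post-training quantization than earlier ones. Checkpoint names and the
# evaluation data are placeholders, not the models or data from the actual study.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = [               # hypothetical checkpoints from one pre-training run,
    "my-org/lm-30b-tokens",   # ordered by amount of pre-training data seen
    "my-org/lm-100b-tokens",
    "my-org/lm-300b-tokens",
]
EVAL_TEXTS = ["..."]          # held-out evaluation text (placeholder)

def perplexity(model, tokenizer, texts):
    """Average perplexity over a list of texts."""
    model.eval()
    losses = []
    with torch.no_grad():
        for t in texts:
            enc = tokenizer(t, return_tensors="pt", truncation=True, max_length=1024)
            out = model(**enc, labels=enc["input_ids"])
            losses.append(out.loss.item())
    return float(torch.exp(torch.tensor(losses).mean()))

for ckpt in CHECKPOINTS:
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt)
    ppl_fp32 = perplexity(model, tokenizer, EVAL_TEXTS)

    # Post-training dynamic quantization of the linear layers to int8.
    q_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    ppl_int8 = perplexity(q_model, tokenizer, EVAL_TEXTS)

    # Brittleness proxy: how much quantization hurts this checkpoint.
    print(f"{ckpt}: fp32 ppl={ppl_fp32:.2f}, int8 ppl={ppl_int8:.2f}, "
          f"degradation={ppl_int8 - ppl_fp32:.2f}")
```

If the later checkpoints show a larger quantization-induced degradation than the earlier ones, that matches the brittleness pattern described in the episode.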
🏢 Companies Mentioned
💬 Key Insights
"So instead of waiting for it to happen by magic, we are like, let's try to actually enforce that by design. And we find in our paper, both in experiments and analysis, that normal training does not actually lead to this assumption to be true..."
"What we show is that that actually is not true—that there's no reason in how we have framed these models that encourages or that should allow this information to be disentangled in this nice way."
"So if you take a small model and keep training on a lot of data, we see that eventually the model that has been trained on more data, that you've thrown more compute at, is worse as a starting point for fine-tuning than an earlier checkpoint that you had."
"And I think the part that's also concerning is when this lack of understanding becomes a real issue, when people think about getting models to be safe in some way... And a lot of our guardrails around these things are very brittle, and we currently don't have a way to do better because we don't really understand these systems."
"We measure performance on a benchmark and if that's all we care about, it seems like we can do really well because if you collect data that looks like the data you want to do well on, you can just throw a lot of compute at it, but does that actually solve the task if we just test it in a slightly different way that is also meaningful from a deployment perspective?"
"And the stuff that's very specific to a particular document is in these memorization neurons, and since those are not updated on any other documents, that information is sort of preserved, disentangled, kept aside. And at test time, you can just drop out these memorization neurons, and then you're good to go."