Deep Learning is Not So Mysterious or Different - Prof. Andrew Gordon Wilson (NYU)
🎯 Summary
Overview: Professor Andrew Gordon Wilson argues that the perceived mystery of deep learning often stems from fundamental misconceptions about generalization, model construction, and the bias-variance trade-off. He posits that deep learning's power lies in its relative universality and effective representation learning, often exhibiting a surprising simplicity bias even in highly over-parameterized models. This perspective suggests that embracing expressiveness alongside soft simplicity biases, rather than hard constraints, is a more principled approach to building robust AI systems.

Key takeaways:
- Deep learning is mysterious, but often not in the ways commonly believed; its phenomena can be understood through concepts like soft inductive biases and rigorous generalization frameworks.
- The classical bias-variance trade-off is a misnomer; it is possible to achieve both low bias and low variance through methods like ensembling or, surprisingly, by building very large neural networks.
- Scale in deep learning appears to induce a simplicity bias, meaning larger models are often more inclined towards simpler, better-generalizing solutions, as evidenced by phenomena like double descent.
- Parameter counting is a poor proxy for model complexity; the properties of the induced distribution over functions are more critical.
- The best approach to model construction is to honestly represent beliefs, embracing expressiveness while incorporating simplicity biases (like Occam's Razor) as soft preferences rather than hard constraints.
- Challenging conventional wisdom, such as the idea that model size should adapt to data availability, is crucial for scientific progress in AI research.
- Understanding the "why" behind model performance through scientific inquiry (theory and empiricism) leads to knowledge that outlives specific, rapidly obsolete engineering solutions.

Themes:
- Demystifying Deep Learning
- Inductive Biases and Model Construction
- The Bias-Variance Trade-off Reconsidered
- The Role of Scale and Over-parameterization (Double Descent)
- Scientific Approach vs. Engineering Iteration
- Honest Representation of Beliefs in Modeling
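The point about ensembling can be made concrete with a small numerical sketch. Everything in it is an assumption chosen for illustration and not taken from the talk: the noisy sine data, the degree-9 polynomial members, the bootstrap resampling, and the helper name `fit_member`. It shows one narrow, provable fact: by Jensen's inequality, the squared error of the averaged prediction can never exceed the average squared error of the individual members, so flexible models can be kept flexible while averaging removes some of their variance.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

# Hypothetical toy problem: noisy samples of a sine curve.
x_train = np.sort(rng.uniform(-1.0, 1.0, size=40))
y_train = np.sin(3.0 * x_train) + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(-0.9, 0.9, 200)
y_test = np.sin(3.0 * x_test)  # noise-free targets for evaluation


def fit_member(x, y, degree=9):
    """Fit one flexible polynomial to a bootstrap resample of (x, y)."""
    idx = rng.integers(0, x.size, size=x.size)  # sample with replacement
    return P.polyfit(x[idx], y[idx], degree)


# Each row of `preds` is one member's predictions on the test grid.
members = [fit_member(x_train, y_train) for _ in range(25)]
preds = np.stack([P.polyval(x_test, coeffs) for coeffs in members])

member_mse = ((preds - y_test) ** 2).mean(axis=1)           # one value per member
ensemble_mse = ((preds.mean(axis=0) - y_test) ** 2).mean()  # averaged prediction

print(f"average member MSE: {member_mse.mean():.4f}")
print(f"ensemble MSE:       {ensemble_mse:.4f}")
```

The inequality holds for any set of members; how large the gap is depends on how much the members disagree with each other.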
💬 Key Insights
"I think the bias-variance trade-off is an incredible misnomer. There doesn't actually have to be a trade-off."
"Quite often, you can increase model expressiveness while simultaneously increasing its biases. So larger models are often more inclined towards simple solutions."
"This involves work on understanding inductive biases, so what assumptions we should be making. And so this relates to symmetries, like equivariances. Maybe we're modeling molecules, they're rotation invariant; images could be translation invariant."
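As a rough illustration of the symmetries mentioned in that quote, the sketch below checks two properties numerically on a 1-D signal: a circular convolution is translation equivariant (shifting the input shifts the output by the same amount), and summing the convolution output is translation invariant (shifting the input leaves the result unchanged). The function name `circular_conv` and the random signal and kernel are hypothetical choices for this example, not anything described in the episode.

```python
import numpy as np


def circular_conv(signal, kernel):
    """Circular 1-D cross-correlation: out[i] = sum_k kernel[k] * signal[(i + k) % n]."""
    n = len(signal)
    return np.array([
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ])


rng = np.random.default_rng(0)
signal = rng.normal(size=16)
kernel = rng.normal(size=3)
shift = 5

# Equivariance: convolving a shifted input equals shifting the convolved output.
shifted_then_conv = circular_conv(np.roll(signal, shift), kernel)
conv_then_shifted = np.roll(circular_conv(signal, kernel), shift)
equivariant = bool(np.allclose(shifted_then_conv, conv_then_shifted))

# Invariance: global sum pooling discards position, so the shift has no effect.
invariant = bool(np.isclose(circular_conv(signal, kernel).sum(),
                            shifted_then_conv.sum()))

print("translation equivariant:", equivariant)
print("translation invariant after pooling:", invariant)
```

Building the symmetry into the layer this way is a hard constraint; the soft alternative discussed in the episode would instead prefer, but not require, symmetric solutions.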
"It's completely fine to build a huge model that will also have a stronger bias for simple solutions, have more of an Occam's razor-like behavior than even smaller models."
"Whereas instead, I think if we just embrace the honest belief that there are many possible solutions, even if they're not probable for any given problem, combined with this sort of simplicity bias, we won't tend to overfit. And interestingly, the prescription is almost the opposite of what people think it perhaps should be in principle. Like, 'build a smaller model' is usually the prescription for avoiding overfitting."
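A toy linear analogue may help make the idea of a soft simplicity bias concrete. The sketch below is not a deep learning experiment and not the speaker's method; the data, the degree-15 polynomial basis, and the penalty schedule `1e-3 * order**2` are all assumptions picked for illustration. It keeps an expressive model (all 16 coefficients stay available) but expresses a preference for low-order terms through a penalty that grows with polynomial order, in contrast to the hard constraint of fitting a smaller model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: noisy samples of a sine curve.
x = np.sort(rng.uniform(-1.0, 1.0, size=30))
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.size)
x_test = np.linspace(-0.95, 0.95, 200)
y_test = np.sin(3.0 * x_test)

degree = 15
X = np.vander(x, degree + 1, increasing=True)       # columns: 1, x, x^2, ...
X_test = np.vander(x_test, degree + 1, increasing=True)

# Expressive model with no simplicity preference: plain least squares.
w_free, *_ = np.linalg.lstsq(X, y, rcond=None)

# Same expressive model with a soft preference for low-order terms:
# the penalty on coefficient k grows like k^2 (the constant term is free).
penalty = 1e-3 * np.arange(degree + 1) ** 2.0
w_soft = np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y)


def weighted_norm(w):
    """Order-weighted size of the coefficients (the quantity being penalized)."""
    return float(np.sum(penalty * w ** 2))


def mse(design, targets, w):
    return float(np.mean((design @ w - targets) ** 2))


for name, w in [("no bias", w_free), ("soft bias", w_soft)]:
    print(f"{name:>9}: train MSE {mse(X, y, w):.4f}, "
          f"held-out MSE {mse(X_test, y_test, w):.4f}, "
          f"weighted norm {weighted_norm(w):.3g}")
```

By construction, the penalized fit gives up a little training error in exchange for a smaller order-weighted norm. On toy data like this it typically also has lower held-out error, though that is an empirical tendency and not a guarantee.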