LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736
🎯 Summary
This episode of the TWIML AI Podcast features an in-depth discussion with Ben Wellington, Deputy Head of Feature Forecasting at Two Sigma, focusing on how Large Language Models (LLMs) and Generative AI are revolutionizing the creation and utilization of predictive features for quantitative equity trading.
1. Focus Area
The primary focus is the application of advanced NLP/GenAI techniques (specifically LLMs) to financial data analysis for the purpose of generating novel, high-signal features used in quantitative investment models. The discussion bridges Ben Wellington’s background in traditional NLP with the current paradigm shift driven by data-centric AI in finance.
2. Key Technical Insights
- The ROI Revolution in Feature Engineering: LLMs have drastically reduced the cost of testing complex hypotheses derived from unstructured data, from potentially months of specialized engineering to minutes (e.g., analyzing video content for non-verbal cues like “nose touching” during CEO interviews). This lowers the barrier to exploring previously intractable features (see the first sketch after this list).
- Shift from Syntactic/One-Hot Encoding to Embeddings: The evolution from older NLP methods (counting specific words, one-hot encoding) to dense vector embeddings allows models to capture semantic relationships between concepts (e.g., understanding that “innovate” and “creative” are related), leading to better generalization in prediction models (second sketch below).
- The Importance of Raw, Historical Data Capture: A core philosophy at Two Sigma is the imperative to record the rawest form of data possible (e.g., raw video feeds, unedited news wires). This “time capsule” approach ensures that future, yet-to-be-invented analytical techniques (like advanced vision models) can be applied retrospectively to historical data, a capability that is impossible if only derived features are saved (third sketch below).
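To make the first bullet concrete, here is a minimal sketch of prompting an off-the-shelf LLM to turn an unstructured transcript snippet into a numeric feature. The prompt, model choice, and `nervousness_score` helper are illustrative assumptions; the episode does not describe Two Sigma's actual tooling.

```python
# Minimal sketch: turning an unstructured-data hypothesis into a numeric
# feature with an off-the-shelf LLM. Hypothetical prompt and model choice;
# the episode does not describe Two Sigma's actual stack.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def nervousness_score(transcript_snippet: str) -> float:
    """Ask an LLM to rate apparent executive nervousness on a 0-10 scale."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rate the speaker's apparent nervousness from 0 (calm) "
                        "to 10 (very nervous). Reply with a single number."},
            {"role": "user", "content": transcript_snippet},
        ],
    )
    return float(resp.choices[0].message.content.strip())

# A hypothesis that once needed a bespoke model is now a prompt:
# score = nervousness_score("Um, well, uh, regarding guidance, we, uh...")
```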
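A small, self-contained sketch of the second bullet's point: one-hot vectors for distinct words are always orthogonal, so no relatedness is visible, while dense embeddings (toy values here, standing in for a real embedding model) can place related concepts close together.

```python
# Why dense embeddings generalize where one-hot encodings cannot:
# one-hot vectors for distinct words are always orthogonal, so "innovate"
# and "creative" look exactly as unrelated as "innovate" and "turnip".
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab = ["innovate", "creative", "turnip"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
print(cosine(one_hot["innovate"], one_hot["creative"]))  # 0.0 -- no signal

# Toy dense embeddings (made-up values; a real model, e.g. word2vec or an
# LLM embedding endpoint, would produce vectors with this qualitative shape).
dense = {
    "innovate": np.array([0.9, 0.8, 0.1]),
    "creative": np.array([0.8, 0.9, 0.2]),
    "turnip":   np.array([0.1, 0.0, 0.9]),
}
print(cosine(dense["innovate"], dense["creative"]))  # high (~0.99)
print(cosine(dense["innovate"], dense["turnip"]))    # low  (~0.16)
```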
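A sketch of the “time capsule” principle from the third bullet: write the rawest available bytes, with a capture timestamp, into an append-only store so future models can be replayed over history. The directory layout and `capture` helper are hypothetical, not Two Sigma's pipeline.

```python
# "Time capsule" sketch: persist the raw payload plus its capture time so
# future, not-yet-invented models can be run retrospectively. The filenames
# and layout are illustrative only.
import hashlib
import json
import pathlib
from datetime import datetime, timezone

ARCHIVE = pathlib.Path("raw_archive")

def capture(source: str, payload: bytes) -> pathlib.Path:
    """Append-only write of the rawest available bytes, never a derived feature."""
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    digest = hashlib.sha256(payload).hexdigest()[:16]
    path = ARCHIVE / source / f"{ts}_{digest}.bin"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    # Sidecar metadata so retrospective studies can trust the timeline.
    path.with_suffix(".json").write_text(
        json.dumps({"source": source, "captured_at": ts})
    )
    return path

# capture("news_wire", b"<raw, unedited wire payload>")
```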
3. Business/Investment Angle
- Feature Forecasting as the Core Business: Two Sigma’s objective is to predict future asset prices by quantifying the world into millions of observable “features.” Feature forecasting is the dedicated process of discovering, quantifying, and validating these signals.
- The Value of Holistic Data Capture: The firm prioritizes capturing data across all traded entities simultaneously (e.g., tracking job postings for every company), recognizing that a holistic, cross-sectional view often yields more predictive power than siloed data sets.
- Competitive Edge in Data Provenance: Having proprietary, time-stamped historical records (like unedited news feeds) that competitors lack provides a significant edge, as this data allows for testing hypotheses that others cannot validate historically.
4. Notable Companies/People
- Ben Wellington (Two Sigma): Deputy Head of Feature Forecasting, expert in NLP, driving the integration of GenAI into feature discovery.
- Two Sigma: The quantitative investment manager where this work is applied, focused on using data science to predict asset prices.
- NYU: Mentioned as the institution where Ben Wellington pursued his PhD in machine translation, situating his background in pre-LLM NLP research.
5. Future Implications
The conversation suggests a renaissance in feature creation. As the technical overhead for extracting complex signals from unstructured data plummets due to LLMs, researchers will shift from prioritizing technically feasible ideas to pursuing any hypothesis that seems potentially valuable, regardless of initial complexity. This democratization of feature engineering will likely lead to a rapid expansion of the feature space used in quantitative finance.
6. Target Audience
This episode is highly valuable for AI/ML Engineers, Quantitative Researchers (Quants), Data Scientists working in finance, and technology leaders interested in the practical, high-stakes application of Generative AI beyond consumer-facing products.
💬 Key Insights
"...which when you combine orthogonal signals, you get a much smoother response than when they're correlated signals."
"I'm not always looking at the best at things; I'm looking for a group of things that each have their own take that when I average out among them, I'm better off and more robust in the future than had I just picked one."
"there's not going to be a horse to bet on. You're going to be well-suited to have a diversified set of inputs to build interesting things, and you need to be comfortable using a wide array of technologies, not just betting on a single one."
"So, it is kind of scary for us to use an off-the-shelf model that's been trained on data in 2020 to ask questions from a document of 2019, right? So, if I could say, 'Hey, here's this Enron conference call. Do you think it's good or bad?' Is the word 'Enron' going to trigger a negative reaction because somewhere deep in the psyche of the LLM, there was a big bankruptcy?"
"That's an exact example where somebody had forced that hop [intermediate text step], you would actually have a less good system than we have today when they said, 'Oh, look, let me remove these abstractions that humans have added and just let the system go with enough data.'"
"The things that you build need to be plug-and-playable with the changing world."