Building AI-Ready Cultures in Life Sciences R&D - with Xiong Liu of Novartis
🎯 Summary
This 30-minute episode features Xiong Liu, Director of Data Science in AI at Novartis, discussing the paradigm shift in Life Sciences R&D driven by Generative AI (GenAI) and Foundation Models (FMs), and the cultural and technical challenges of scaling these technologies beyond pilot projects.
1. Focus Area
The primary focus is the transition in Life Sciences R&D from traditional, task-specific Machine Learning (ML) to leveraging large-scale Foundation Models (like LLMs) for accelerating drug discovery, molecular design, and clinical protocol optimization. Key themes include the architectural shift in AI, the necessity of domain-specific fine-tuning, and the organizational alignment required for successful AI adoption.
2. Key Technical Insights
- Paradigm Shift to Foundation Models: The discussion contrasts older ML (requiring labeled, proprietary data for specific tasks) with FMs, which are pre-trained on massive, domain-relevant datasets (like molecular data or public text corpora). This allows smaller, specific datasets to benefit from generalized knowledge captured by the FM, enabling downstream tasks even with limited proprietary data.
- BERT vs. GPT Architectures in Life Sciences: The evolution from models like BERT (excellent for representation/embeddings) to GPT-like models (leveraging the decoder for data generation) is highlighted. This enables the generation of novel sequences, such as new molecular structures represented by SMILES strings or graphs, constrained by desired properties (e.g., toxicity, solubility).
- Domain-Specific Fine-Tuning: While FMs provide a powerful baseline, effective application in life sciences requires fine-tuning: adjusting the model's weights using proprietary, high-quality domain data to personalize predictions for specific indications (e.g., lung cancer pathways).
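The small-data benefit described above can be sketched as a toy example: a frozen stand-in for a pretrained encoder (the "foundation model"), plus a small trainable head fit on a handful of labeled points. Everything here is illustrative, not from the episode; a real setup would use a large pretrained encoder and update a task head (or adapter layers) on proprietary data.

```python
import math
import random

random.seed(0)

# Stand-in for a pretrained foundation model (frozen). In practice this
# would be a large BERT-style encoder pre-trained on massive molecular or
# text corpora; here it is a fixed toy function mapping a raw input to a
# 3-dimensional embedding.
def foundation_embed(x: float) -> list:
    return [math.sin(x), math.cos(x), x * 0.1]

# Small "proprietary" dataset: the limited-labeled-data case.
# Hypothetical binary task derived from the input.
data = [(x, 1 if math.sin(x) > 0 else 0)
        for x in [0.5, 1.0, 2.0, 4.0, 5.0, 6.0]]

# Trainable task head: logistic regression on the frozen embeddings.
w = [0.0, 0.0, 0.0]
b = 0.0
lr = 0.5

def predict(x: float) -> float:
    z = sum(wi * ei for wi, ei in zip(w, foundation_embed(x))) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

# Fine-tune only the head; the "foundation model" stays frozen.
for epoch in range(200):
    for x, y in data:
        p = predict(x)
        grad = p - y  # gradient of log-loss w.r.t. the logit
        e = foundation_embed(x)
        for i in range(3):
            w[i] -= lr * grad * e[i]
        b -= lr * grad

accuracy = sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)
print(f"train accuracy: {accuracy:.2f}")
```

The head can fit the small dataset only because the frozen embedding already exposes the relevant feature, which is the intuition behind "foundation models capture relevant information even if not specific to your data."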
3. Business/Investment Angle
- Scaling Challenge: Despite high investment interest (73% of leaders investing), few life sciences organizations (under 20%) have successfully scaled GenAI beyond pilots, indicating significant hurdles in implementation and ROI validation.
- Data Dependency vs. Model Availability: While foundational architectures are often open-source or publicly available, the bottleneck shifts to acquiring and integrating high-quality, domain-specific experimental data (like single-cell RNA-seq) needed for effective fine-tuning and validation.
- ROI and Evaluation: Establishing clear, domain-driven evaluation metrics is crucial for measuring the ROI of AI-driven research, especially given the risk of model hallucination in complex biological outputs that cannot be instantly verified like text or code.
4. Notable Companies/People
- Xiong Liu (Novartis): The guest, providing insights from his role as Director of Data Science in AI, emphasizing practical implementation and cultural readiness within a major pharmaceutical company.
- Google/OpenAI (Implied): Mentioned in the context of the foundational Transformer architecture breakthrough that underpins modern LLMs.
- Deloitte: Referenced for a recent survey highlighting the gap between GenAI investment and successful scaling in the life sciences sector.
5. Future Implications
The industry is moving toward a hybrid model where public FMs are leveraged, but significant in-house expertise is needed to build, fine-tune, and validate domain-specific models. The future success of AI adoption hinges not just on technology but on building AI-ready cultures characterized by tight alignment and effective communication between AI professionals, basic scientists, and leadership. The concept of using FMs to model biological systems is closely linked to accelerating concepts like digital twins in drug development.
6. Target Audience
This episode is highly valuable for AI/Tech Leaders in Pharma/Biotech, R&D Directors, Data Science Managers, and Strategy Executives navigating the practical adoption, cultural integration, and investment justification for advanced AI/ML technologies in the highly regulated life sciences domain.
🏢 Companies Mentioned
💬 Key Insights
"a way of phenomenon is about hallucination, right? So, when those models, AI models, they generate new outputs, they probably seemingly like answers. But if you delve into that... But when it goes to the bio-medical, biochemistry, etc. For example, it generates a new set of genes, a new molecule, is this something that you cannot be answered or validated instantly?"
"I mean, that is potentially possible. But unfortunately, to get those, you know, single-cell RNA-seq data, all those kind of more, you know, experiment-driven data, probably we're not exactly there yet, because, you know, we still rely on those traditional biotechnology methods to collect those data through experiments on cell lines, on human tissues, etc. So, there's, you know, where AI, the architecture, the concept there, but the, sometimes we are limited by the, you know, experimental side."
"So now, certainly, it can allow you to generate new sequence, new graphs. And really, yeah, additionally, also based on your constraints, your requirements, your domain-specific knowledge. So, for example, you can generate many different types of new molecules, but you can add constraints like have, you know, better toxicity, solubility, etc. It can allow you to, you know, do that simultaneously."
"People have been already applying those models. I just give some examples about using foundation models, GenAI to generate new cell types and gene expressions so that we can do all kinds of in silico, you know, treatment methods to see, to observe the predicted phenotypes, etc."
"When we delve into those domain-specific use cases, we probably can better understand the benefit of foundation models. So, for example, when we study the disease pathways and identify new targets for specific indications... But now with foundation models, it has a lot of pre-trained models based on publicly available molecular data. So, for example, cell atlases, etc."
"The benefit is that even if you have very small data for your own machine learning tasks, you can leverage those foundation models because they already capture some relevant information, although it may not be specifically to your data. So that's why they are called foundation models. They could be useful for many downstream tasks."