📆 ThursdAI - Oct 16 - VEO3.1, Haiku 4.5, ChatGPT adult mode, Claude Skills, NVIDIA DGX Spark, World Labs RTFM & more AI news
🎯 Summary
ThursdAI Podcast Summary: VEO3.1, Haiku 4.5, ChatGPT Adult Mode, and Scientific Breakthroughs (Oct 16)
This episode of ThursdAI covered a dense week of major announcements across open-source models, large commercial LLM updates, significant hardware developments, and a groundbreaking application of AI in scientific discovery. The overarching theme was the rapid maturation of AI capabilities, particularly in multimodal generation and specialized scientific reasoning.
1. Focus Area
The discussion centered primarily on Artificial Intelligence and Machine Learning, with deep dives into:
- Large Language Models (LLMs) & APIs: Updates from Anthropic (Haiku 4.5), OpenAI (Adult Mode, Memory Management), and Microsoft (Windows 11 Copilot).
- Video Generation: The release of Google DeepMind’s VEO 3.1 and updates to competitors like Sora Pro.
- Open Source Models: New releases from Qwen (smaller VL models) and a major scientific breakthrough using Google’s C2S Scale 27B model.
- Hardware: Announcements regarding NVIDIA DGX Spark and Apple’s M5 chip.
- AI Agents & Coding: The launch of a free, ad-supported tier by AMP.
2. Key Technical Insights
- C2S Scale’s Scientific Emergence: Google’s 27B Gemma-based model surfaced a novel finding about cancer cell behavior by treating single-cell RNA (scRNA-seq) expression profiles as a “language.” The result was attributed to emergent capability at scale, drawing on over a billion tokens of biological data processed through SFT and RL stages (see the sketch after this list).
- Qwen 3 VL Performance Leap: The new smaller Qwen 3 Vision-Language models (4B and 8B) demonstrated performance matching or exceeding previous large models (such as Qwen 2.5 VL 72B) on several benchmarks. Notably, the 8B model scored 33.9 on OSWorld, far ahead of the 72B predecessor’s roughly 8, making it a credible option for on-device agent tasks.
- VEO 3.1 Enhancements: The new video model focuses on cinematic updates and improved control, signaling a move toward professional-grade video creation tools, while competitors push clip lengths past 20 seconds (Baidu’s News Streamer).
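
The “cells as language” idea above is easier to see in code. Below is a minimal, illustrative sketch of Cell2Sentence-style preprocessing, assuming a toy expression vector and gene list; it is not the actual C2S-Scale pipeline, which differs in normalization, vocabulary, and prompting.

```python
# Illustrative sketch: turn a single-cell expression profile into a "cell sentence"
# by rank-ordering genes by expression and emitting them as a token string that a
# causal LM can be trained on. Not the real C2S-Scale preprocessing.
import numpy as np

def cell_to_sentence(expression: np.ndarray, gene_names: list[str], top_k: int = 100) -> str:
    """Return the top_k expressed genes, highest expression first, as one string."""
    order = np.argsort(expression)[::-1][:top_k]           # indices of most-expressed genes
    ranked = [gene_names[i] for i in order if expression[i] > 0]
    return " ".join(ranked)

# Toy example: 6 genes, one cell
genes = ["CD19", "MS4A1", "CD3E", "GNLY", "NKG7", "LYZ"]
counts = np.array([12.0, 9.5, 0.0, 3.2, 7.1, 0.4])
print(cell_to_sentence(counts, genes, top_k=5))
# -> "CD19 MS4A1 NKG7 GNLY LYZ"
```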
3. Business/Investment Angle
- OpenAI’s Content Policy Shift: The planned introduction of “adult mode” signals OpenAI’s strategy to capture a broader user base by relaxing overly restrictive guardrails, potentially impacting market share against competitors who already offer more flexibility.
- AI in Drug Discovery: The C2S Scale breakthrough validates the massive commercial potential of applying LLMs to complex biological data, suggesting a future where AI accelerates drug discovery through high-throughput virtual screening and hypothesis generation.
- Hardware Competition Intensifies: NVIDIA’s DGX Spark targets compact, low-power local inference and prototyping, while Apple’s M5 chip emphasizes significant on-device AI acceleration, pointing to a bifurcation in hardware strategy between cloud-scale training/inference and edge/personal-device efficiency.
4. Notable Companies/People
- Google DeepMind: Highlighted for the VEO 3.1 video model (interview with Jessica Gallegos) and the revolutionary C2S Scale 27B model.
- OpenAI/Sam Altman: Mentioned for the planned release of “adult mode” and updates to ChatGPT’s memory management.
- Anthropic: Released Haiku 4.5, noted as being twice as fast as its predecessor.
- Qwen Team: Praised for the extensive testing and release of smaller, high-performing VL models.
- Cognition: Mentioned for the breaking-news release of SWE-grep (discussed with Swyx).
5. Future Implications
The conversation strongly suggests the industry is moving toward:
- Specialized, High-Impact AI: Models trained specifically on domain-specific “languages” (like biological sequences) can yield profound, emergent scientific insights, moving beyond general-purpose reasoning.
- Ubiquitous OS Integration: Microsoft embedding Copilot directly into Windows 11 signals a future where operating systems are primarily controlled via natural language commands.
- Increased Model Accessibility: The release of powerful models like C2S Scale (27B) and smaller Qwen VL models means cutting-edge performance is becoming accessible for local deployment and specialized fine-tuning.
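
On the accessibility point, running an open checkpoint of this class locally is largely standard tooling. A minimal Hugging Face transformers sketch follows; the model id is a placeholder, and the real repository name, prompt format, and memory requirements should be taken from the official model card.

```python
# Minimal sketch of loading an open ~27B causal LM locally with Hugging Face transformers.
# The model id below is a placeholder; substitute the actual repository from the release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/c2s-scale-27b"  # hypothetical id, check the official model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision to fit on a single large GPU
    device_map="auto",            # spread layers across available GPUs / CPU
)

prompt = "CD19 MS4A1 NKG7 GNLY LYZ"  # a "cell sentence"-style input, illustrative only
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```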
6. Target Audience
This episode is highly valuable for AI/ML Engineers, AI Researchers, Product Managers in Tech, and Technology Investors who need a rapid, comprehensive overview of the latest model releases, technical breakthroughs, and strategic shifts across the AI ecosystem.
🏢 Companies Mentioned
Google DeepMind, OpenAI, Anthropic, Qwen (Alibaba), Microsoft, NVIDIA, Apple, Cognition, Baidu, AMP, World Labs
💬 Key Insights
"Ruler is basically like an automated LLM judge. We do a bunch of work to make LLM judge work really, really well for RL. And in practice, this works phenomenally well, surprisingly well, where... a very hard part in doing RL was getting that reward function..."
"The Thinking Machines post about LoRAs... basically showed when you're doing RL at least, there is zero discernible difference in training all the parameters versus training a LoRA."
"If you have a bunch of different LoRAs, and LoRAs are just, it stands for Low-Rank Adapter... you can actually at inference time still batch together inference requests from different requests that are using different LoRAs, and it can all be run efficiently in the same batch as if they were all running against the same model."
"Fundamentally the way RL works is... you're going to have your deep research agent go off and research 100 different questions... Comes back with the final answer, and then you're going to take all those final answers and you're going to have to grade them some way... Then you're going to use that to train and update your model..."
"Traditionally, if you're managing your own GPUs and starting everything up that way, you might take like a couple of minutes every time you start a run... With serverless RL... it just takes a couple of seconds to start up, and you can be off to the races."
"If you've done as much as you can and you're sort of hitting a limit, it's like, hey, my agent is doing what I want 80% of the time and then there's like these 20% of the cases where it's not, that's where I think it makes sense to bring in RL and you can get that last 20% of the performance."