Data Volume, Quality, and Model Degradation for AI at Scale - with Sunitha Rao of Hitachi Vantara
🎯 Summary
This episode focuses on the critical, often overlooked, infrastructure and data management challenges enterprises face when scaling AI deployments, moving beyond the initial hype cycles to achieve measurable ROI and sustainability.
1. Focus Area: The discussion centers on AI Infrastructure Modernization, specifically addressing the ripple effects of explosive data growth on AI model performance, data quality governance, compute resource management (especially GPUs), and integrating sustainability (ESG) into IT investment decisions. Key themes include data pipeline quality control, mitigating model degradation, and defining infrastructure in the context of hybrid/multi-cloud MLOps.
2. Key Technical Insights:
- Expensive Garbage Out: Poor data quality (noise, duplication, bias) leads to significantly expensive failures at scale, necessitating proactive quality gates (schema checks, anomaly detection) early in the data pipeline, not just during training.
- Infrastructure Redefined: Modern AI infrastructure is fluid, requiring unified frameworks that seamlessly integrate data, elastic compute, tiered storage, and MLOps orchestration across hybrid environments.
- SLOs for Degradation Control: Service Level Objectives (SLOs) must move beyond simple latency metrics to encompass data freshness, training-to-serving skew percentages, and outcome-level KPIs, so that avoidable model degradation is caught early rather than discovered after the fact.
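The "quality gates" insight above can be made concrete with a minimal sketch. This is an illustrative example only, not any vendor API: a schema check plus a simple z-score anomaly screen, applied to a batch before records ever reach training. Names like `EXPECTED_SCHEMA` and `quality_gate` are assumptions for the example.

```python
# Hypothetical pipeline quality gate: schema validation plus a simple
# z-score anomaly screen, run early in the pipeline, not at training time.
from statistics import mean, stdev

EXPECTED_SCHEMA = {"user_id": int, "amount": float, "country": str}

def schema_ok(record: dict) -> bool:
    """Reject records with missing fields or wrong types."""
    return all(
        field in record and isinstance(record[field], ftype)
        for field, ftype in EXPECTED_SCHEMA.items()
    )

def quality_gate(batch: list[dict], z_threshold: float = 3.0) -> list[dict]:
    """Return only records that pass both schema and anomaly checks."""
    valid = [r for r in batch if schema_ok(r)]
    if len(valid) < 2:
        return valid
    amounts = [r["amount"] for r in valid]
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return valid
    return [r for r in valid if abs(r["amount"] - mu) / sigma <= z_threshold]

batch = [
    {"user_id": 1, "amount": 10.0, "country": "US"},
    {"user_id": 2, "amount": 12.0, "country": "DE"},
    {"user_id": 3, "amount": "oops", "country": "FR"},  # schema failure
    {"user_id": 4, "amount": 9.5, "country": "JP"},
]
clean = quality_gate(batch)
print(len(clean))  # the malformed record is dropped
```

In a real streaming ETL system these checks would sit behind a framework (e.g., a validation step in the ingestion pipeline) rather than inline functions, but the principle is the same: fail fast and cheaply at the gate, before "garbage in" becomes expensive.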
3. Business/Investment Angle:
- Workload Mapping is Key to ROI: The primary driver for successful investment, ROI, and sustainability is accurately mapping specific workloads to the correct execution venue (on-prem, cloud, edge).
- Sustainability as Cost Control: Aligning IT investments with ESG goals is crucial; leaders should “use carbon like cash” by implementing policy engines that weigh cost, carbon savings, and performance when deciding where data resides.
- Focus on Better Needles: Investment should shift from simply building bigger compute “haystacks” to deploying “better needles”—smarter, more efficient data management and quality tools within the existing system.
4. Notable Companies/People:
- Sunitha Rao (Guest): Senior Vice President and General Manager for Hybrid Cloud Business at Hitachi Vantara. She provides the expert perspective on data infrastructure foundations, hybrid cloud management, and aligning AI scaling with sustainability.
- Hitachi Vantara: Mentioned as a company focused on providing energy-efficient storage frameworks and unified platforms to address these infrastructure gaps.
5. Future Implications: The industry is moving toward a highly integrated, policy-driven infrastructure where data placement, resource utilization, and sustainability metrics are intrinsically linked to model performance. The definition of infrastructure will become increasingly abstract and fluid, driven by workload requirements rather than just physical hardware. Continuous, self-learning monitoring systems will replace static threshold alerts to manage complex data flows.
6. Target Audience: AI/ML Leaders, Enterprise IT Executives (CIOs, CTOs), Data Strategy Officers, and Infrastructure Architects who are responsible for scaling AI deployments, managing hybrid cloud environments, and justifying AI investment ROI while meeting corporate sustainability mandates.
Comprehensive Summary
The podcast episode with Sunitha Rao of Hitachi Vantara provides a pragmatic deep dive into the infrastructure realities underpinning enterprise AI success, arguing that the focus must shift from sheer data volume and compute capacity to data quality, infrastructure unification, and sustainable placement strategies.
Narrative Arc and Key Discussion Points: The conversation begins by acknowledging the widespread challenge: enterprises are investing heavily in AI, yet ROI is often elusive, partly due to the explosive growth of data. Rao systematically breaks down the core infrastructure challenges: data silos requiring “silo busters,” compute bottlenecks (GPU shortages), networking demands (ultra-low latency), inadequate storage frameworks (legacy systems not suited for AI read/write patterns), and the complexity of orchestration in hybrid/multi-cloud MLOps environments. She concludes this section by emphasizing that cost and sustainability (ESG) must be primary considerations, not afterthoughts.
The discussion pivots to the crucial distinction between data volume and data quality. Rao stresses that “garbage in means expensive garbage out,” highlighting that poor-quality data introduces security risks (leakage, bias amplification) and significantly inflates operational costs. She advocates for establishing rigorous quality frameworks—focusing on clean, diverse, and deduplicated data—before training begins.
Rao then redefines modern infrastructure as less about “hammers and nails” and more about deploying “better needles.” To combat model degradation, she outlines three critical control frameworks:
- Data Freshness and Quality Gates: Implementing checks like schema validation and PII detection in streaming ETL pipelines (e.g., Hitachi Vantara’s PII data service).
- Early Alerting and Detection: Moving toward self-learning models for continuous monitoring and automated playbook responses.
- Reproducibility and Versioning: Tightly linking data sets, features, and code bases.
The concept of Service Level Objectives (SLOs) is introduced as the mechanism to enforce these controls, moving beyond simple latency to track KPIs like training-to-serving skew and pass rates.
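A minimal sketch of what such SLO enforcement might look like, assuming illustrative thresholds and metric names (none of this reflects a specific product): each dimension the episode names—freshness, training-to-serving skew, quality-gate pass rate—gets its own budget, and the evaluation returns per-SLO pass/fail.

```python
# Illustrative model-health SLOs beyond latency: freshness, skew, pass rate.
# All thresholds and field names are assumptions for the example.
from dataclasses import dataclass

@dataclass
class ModelSLO:
    max_data_age_hours: float  # freshness budget
    max_skew_pct: float        # training-to-serving skew budget
    min_pass_rate: float       # fraction of records passing quality gates

def skew_pct(train_mean: float, serve_mean: float) -> float:
    """Percent drift of the serving distribution's mean vs. training."""
    return abs(serve_mean - train_mean) / abs(train_mean) * 100

def evaluate(slo: ModelSLO, data_age_hours: float, train_mean: float,
             serve_mean: float, pass_rate: float) -> dict[str, bool]:
    """Per-SLO pass/fail, so degradation is surfaced early, not post hoc."""
    return {
        "freshness": data_age_hours <= slo.max_data_age_hours,
        "skew": skew_pct(train_mean, serve_mean) <= slo.max_skew_pct,
        "pass_rate": pass_rate >= slo.min_pass_rate,
    }

slo = ModelSLO(max_data_age_hours=24, max_skew_pct=5.0, min_pass_rate=0.98)
status = evaluate(slo, data_age_hours=30, train_mean=100.0,
                  serve_mean=103.0, pass_rate=0.99)
print(status)  # freshness breached; skew (3%) and pass rate within budget
```

A breached dimension would then trigger the automated playbook responses Rao describes, rather than waiting for a human to notice degraded outcomes.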
Finally, the conversation addresses cost, scale, and sustainability. Rao’s core recommendation is mapping workloads to the right execution venue. This decision dictates performance, compliance, cost models, and sustainability outcomes. She advocates for a “carbon-aware” approach, using policy engines to weigh infrastructure cost against carbon savings when determining where data should reside, aligning infrastructure investment directly with measurable ROI and ESG goals.
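The “use carbon like cash” idea can be sketched as a simple placement policy: convert cost, carbon, and performance into one weighted score per execution venue and pick the cheapest. The venue figures and weights below are invented for illustration and do not describe any actual policy engine.

```python
# Hypothetical carbon-aware placement policy: lower weighted score wins.
from dataclasses import dataclass

@dataclass
class Venue:
    name: str
    cost_per_hour: float   # USD
    gco2_per_hour: float   # grams CO2-equivalent
    latency_ms: float

def score(v: Venue, w_cost: float, w_carbon: float, w_perf: float) -> float:
    """Each weight converts a dimension into a common 'currency'."""
    return (w_cost * v.cost_per_hour
            + w_carbon * v.gco2_per_hour
            + w_perf * v.latency_ms)

def place(venues: list[Venue], w_cost=1.0, w_carbon=0.01, w_perf=0.1) -> Venue:
    """Choose the venue with the best cost/carbon/performance trade-off."""
    return min(venues, key=lambda v: score(v, w_cost, w_carbon, w_perf))

venues = [
    Venue("on-prem", cost_per_hour=3.0, gco2_per_hour=400, latency_ms=2),
    Venue("cloud",   cost_per_hour=5.0, gco2_per_hour=150, latency_ms=15),
    Venue("edge",    cost_per_hour=6.0, gco2_per_hour=300, latency_ms=1),
]
print(place(venues).name)
```

Raising `w_carbon` shifts the decision toward greener venues, which is exactly the lever the “carbon like cash” framing implies: sustainability becomes a priced input to placement rather than an afterthought.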
đź’¬ Key Insights
"garbage in doesn't just mean garbage out; it means expensive garbage out."
"I still believe somebody used to, when we really started talking about the carbon aware, people now call it "use carbon like cash.""
"The most important aspect is where you want the data to land, and that defines the investments, that defines the ROI, that defines the sustainability."
"I would really recommend looking at mapping workloads to the right execution venue. That is the key."
"The most important thing is you need to have KPIs on a pretty regular basis so that the KPIs are the key performance indicators that basically help us to track the data freshness, it will help you to track the training-to-serving skew percentages."
"The most important concept in all of these three different frameworks that I spoke about just now in the context of degradation, especially, is SLOs."