Ep156: LLM Migrations to One Cloud: Coveo's Strategic Move to Amazon Bedrock

Unknown Source October 09, 2025 25 min
artificial-intelligence generative-ai investment ai-infrastructure google microsoft
49 Companies
46 Key Quotes
4 Topics

🎯 Summary

Comprehensive Summary: Ep156: LLM Migrations to One Cloud: Coveo’s Strategic Move to Amazon Bedrock

This podcast episode details Coveo’s strategic migration of its core generative AI functionality, specifically its Coveo Relevance Generative Answering (CRGA) RAG solution, from a multi-cloud setup (using GPT models on Azure) entirely to Amazon Bedrock, where it now runs on the Amazon Titan Lite model. The discussion focuses heavily on the technical complexity of swapping LLMs and the robust evaluation framework Coveo built to manage the process.

1. Focus Area

The primary focus is the successful migration of a production-grade, enterprise RAG application (Coveo Relevance Generative Answering) to a single cloud provider (AWS) using Amazon Bedrock. Key sub-topics include multi-cloud governance simplification, data residency compliance, LLM prompt engineering challenges across different models, and the development of automated evaluation frameworks for maintaining quality during model swaps.

2. Key Technical Insights

  • LLM Migration as a “Mind Transplant”: Migrating between foundation models (even minor version upgrades) is not a simple software upgrade but requires extensive re-engineering of prompts because different models interpret system messages and context differently due to varying training data.
  • Automated Evaluation Framework Necessity: Coveo developed a comprehensive framework to define and automatically test over 20 expected behaviors (e.g., answering only when context is present, respecting language, maintaining consistent meaning) to ensure quality parity or improvement post-migration.
  • Automated Prompt Optimization: The evaluation framework feeds into an automated process where prompt candidates are generated, evaluated against metrics, and refined iteratively by an LLM until performance plateaus, ensuring optimal results for the new model (Titan Lite).
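The iterative optimization loop described above can be sketched roughly as follows. This is a hypothetical illustration, not Coveo's actual implementation: `evaluate` and `propose_revision` are stand-ins for the real behavior-suite evaluation and the LLM that rewrites prompt candidates, and the stopping rule (no meaningful gain for several rounds) is one plausible way to detect the performance plateau the episode mentions.

```python
# Hypothetical sketch of an automated prompt-optimization loop:
# generate a candidate, score it against the evaluation metrics,
# and keep iterating until performance plateaus.

def evaluate(prompt: str) -> float:
    """Score a prompt candidate against the behavior suite (0.0 to 1.0).
    Stubbed here; a real version would run the RAG pipeline over a test
    set and check each of the ~20 expected behaviors."""
    return min(1.0, 0.5 + 0.01 * len(prompt.split()))  # placeholder metric

def propose_revision(prompt: str) -> str:
    """Ask an LLM to rewrite the prompt; stubbed as appending guidance."""
    return prompt + " Answer only from the provided context."

def optimize_prompt(seed: str, patience: int = 3, min_gain: float = 1e-3):
    """Refine the prompt until `patience` consecutive rounds show no
    improvement greater than `min_gain` (i.e., performance has plateaued)."""
    best, best_score = seed, evaluate(seed)
    stalls = 0
    while stalls < patience:
        candidate = propose_revision(best)
        score = evaluate(candidate)
        if score > best_score + min_gain:
            best, best_score = candidate, score
            stalls = 0  # still improving, keep going
        else:
            stalls += 1  # no meaningful gain this round
    return best, best_score
```

The key design point is that the loop is fully automated: a human defines the expected behaviors once, and the framework then handles candidate generation and scoring, which is what makes six-month model-deprecation cycles tractable.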

3. Business/Investment Angle

  • Simplifying Governance and Compliance: Moving to a single cloud (AWS) resolves Coveo’s multi-cloud governance complexity, security rule fragmentation, and simplifies meeting critical customer requirements for data residency and regional compliance.
  • Cost and Performance Optimization: The migration to Titan Lite on Bedrock resulted in reduced infrastructure complexity and lower costs while meeting or exceeding the performance benchmarks previously achieved with proprietary models like GPT-4o mini.
  • Rapid Deployment for Customers: Coveo’s managed RAG solution allows enterprise customers (like SAP Concur, saving €8M annually) to deploy generative answering solutions in days, not months, providing immediate, high-confidence results that reduce case creation by 20-30%.

4. Notable Companies/People

  • Coveo: The AI relevance company, providing enterprise search, personalization, and generative answering solutions built on a unified indexing platform that respects access rights.
  • Sebastian Kulio (VP of AI Strategy, Coveo): The key speaker detailing the technical challenges, the “mind transplant” analogy, and the success of the migration.
  • Janik Kongvijiam (AWS Solution Architect): Host guiding the discussion, emphasizing the role of AWS services.
  • Amazon Bedrock: AWS’s fully managed service for building generative AI applications, used here to host the Amazon Titan Lite model.
  • Amazon Titan Lite: The specific foundation model Coveo selected for the migration on Bedrock.
  • ELB with Talent: Mentioned as a key partner assisting with the Bedrock infrastructure setup and load testing (handling over 70 billion tokens monthly).

5. Future Implications

The industry trend is moving toward cloud consolidation for Gen AI workloads to simplify governance and compliance. Furthermore, sophisticated, automated evaluation and prompt engineering frameworks will become standard practice for any ISV relying on external foundation models, especially given the rapid deprecation cycles imposed by model providers. Coveo is positioning its platform to power both grounded generative answers and complex AI agents via unified retrieval APIs integrated across various platforms (including Amazon Q and Microsoft Copilot).

6. Target Audience

This episode is highly valuable for AI/ML Engineers, CTOs, Cloud Architects, and Product Leaders at ISVs who are currently managing multi-cloud AI deployments, struggling with prompt stability across model updates, or seeking to leverage managed services like Amazon Bedrock for enterprise-grade RAG solutions.

🏢 Companies Mentioned

Agent Force ai_application
GPT ai_model_provider
Microsoft Copilot unknown
Amazon Q unknown
Amazon Titan Lite unknown
AWS unknown
Amazon Bedrock unknown
Azure Cloud unknown
SAP Concur unknown

💬 Key Insights

"If the context does not contain the answer, the LLM should not answer. And this is actually harder to do because the LLMs, they want to talk. They always want to say something. So it's really harder to make them not answer when they should not."
Impact Score: 10
"If I give you an example of a behavior for us, it's, for example, if the context contains the information, we want the LLM to answer. This is an easy one. This is the obvious one. The opposite needs to be true. So if the [context does not contain the information, the model should not answer]."
Impact Score: 10
"We worked really hard on our evaluation framework to automate a big part of the evaluation and the migration. So for us, it's really important that we can keep the same performance over time and improve it."
Impact Score: 10
"Currently, the model suppliers are asking us to migrate every six months. So they are deprecating really fast the model's version."
Impact Score: 10
"Why do we see this change of behavior? It's because these models are trained on different datasets. So the inner knowledge of the model is always different. The model will interpret the system message differently. So you will need to talk to it differently."
Impact Score: 10
"The hard part is making the system work as you want with a new LLM. Because migrating from one LLM to another is very complex. If here we have people that have worked with LLMs, you know that if you're migrating from one version to another, you have to redo your prompt. It's not true that you can use the same prompt from one LLM and use it for the second one."
Impact Score: 10
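The abstention behavior quoted above ("if the context does not contain the answer, the LLM should not answer") can be expressed as an automated check, roughly as sketched below. This is a hypothetical illustration: `call_llm` stands in for the real model invocation (e.g., via Bedrock), and the keyword-overlap stub exists only so the example runs; a real check would call the deployed RAG pipeline.

```python
# Hypothetical sketch of one automated behavior check: with irrelevant
# context, the model must abstain rather than invent an answer.

NO_ANSWER = "I don't have enough information to answer."

def call_llm(question: str, context: str) -> str:
    """Stub for the real model call. Here: 'answer' only if the question's
    key term appears in the context; otherwise abstain."""
    key_term = question.split()[-1].rstrip("?").lower()
    if key_term in context.lower():
        return "Based on the context: ..."
    return NO_ANSWER

def check_abstains_without_context(question: str, context: str) -> bool:
    """Behavior test: given context that does not contain the answer,
    the model's reply must be the abstention message."""
    return call_llm(question, context) == NO_ANSWER
```

Running a suite of such checks on every candidate prompt is what lets quality parity be verified automatically during a model swap, rather than by manual review.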

📊 Topics

#artificialintelligence 62 #generativeai 10 #investment 7 #aiinfrastructure 2

🤖 Processed with true analysis

Generated: October 09, 2025 at 02:10 PM