#333.src - Bodyguard: Coder contre la haine avec Charles Cohen
🎯 Summary
This episode of If This Then Dev features Bruno Soulès in conversation with Charles Cohen, the 29-year-old President and Founder of Bodyguard, a company dedicated to combating online hate speech and cyberbullying. The discussion centers on the genesis of Bodyguard, the technical philosophy behind its moderation engine, and the critical distinction between different types of Artificial Intelligence in solving complex real-world problems like online toxicity.
1. Focus Area
The primary focus is Online Content Moderation and AI Strategy. Specific topics include:
- The personal motivation behind fighting online injustice (triggered by a tragic case of cyberbullying).
- The limitations of keyword-based moderation and early Machine Learning approaches.
- The technical architecture of Bodyguard, which relies heavily on Symbolic AI (rule engines) rather than on Deep Learning/LLMs alone.
- The necessity of contextual understanding, severity classification, and behavioral analysis in effective moderation.
2. Key Technical Insights
- Symbolic AI Superiority for Specific Tasks: Cohen argues that his custom-built Symbolic AI (a rule-based engine coded in Java) remains more performant than many current LLMs for the specific task of hate speech detection, emphasizing its transparency and update speed (see the rule-engine sketch after this list).
- Context is King (Beyond Keywords): Effective moderation requires analyzing context (what precedes/follows a word), the target (who the comment is directed at), the subject matter of the content being commented on, and the severity/frequency of the language used.
- Natural Implementation of ML Concepts: Cohen arrived at concepts like One-Hot Encoding organically while optimizing performance: he realized that searching for integers (encoded words) was faster than searching through strings, an intuitive approach to data-processing optimization (a small encoding sketch follows the rule-engine example below).
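
To make the rule-and-context idea concrete, here is a minimal, hypothetical sketch in Java (the language the episode says Bodyguard's engine is written in). The class names, rule format, and examples are assumptions, not Bodyguard's actual code; the point is that a keyword alone never decides, the rule also has to agree with the context (here, the target of the comment).

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a contextual moderation rule engine.
// A rule fires only when its pattern matches AND its contextual condition holds.
public class ContextualRuleEngine {

    enum Severity { LOW, MEDIUM, HIGH }

    record Comment(String text, String target, String articleTopic) {}

    record Rule(Pattern pattern, String requiredTargetType, Severity severity) {
        boolean matches(Comment c) {
            // Keyword alone is not enough: the rule also checks who the
            // comment is aimed at, mirroring the "context is king" point.
            return pattern.matcher(c.text()).find()
                && c.target().equals(requiredTargetType);
        }
    }

    private final List<Rule> rules = List.of(
        // "kill" aimed at a person is severe; the same verb aimed at a
        // game entity would need a different, lower-severity rule.
        new Rule(Pattern.compile("\\bkill\\b", Pattern.CASE_INSENSITIVE),
                 "PERSON", Severity.HIGH)
    );

    public Severity classify(Comment c) {
        return rules.stream()
                    .filter(r -> r.matches(c))
                    .map(Rule::severity)
                    .findFirst()
                    .orElse(Severity.LOW);
    }

    public static void main(String[] args) {
        var engine = new ContextualRuleEngine();
        System.out.println(engine.classify(
            new Comment("I will kill you", "PERSON", "sports")));   // HIGH
        System.out.println(engine.classify(
            new Comment("kill the final boss", "GAME", "gaming"))); // LOW
    }
}
```

Because every verdict traces back to a named rule, this style of engine stays transparent and can be updated as quickly as a rule can be edited, which is the agility argument made in the episode.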
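Similarly, the integer-encoding optimization can be sketched as follows. This is an illustrative word-to-ID dictionary (the idea Yann Guérin later connected to One-Hot Encoding), not Bodyguard's implementation; the performance point is that comparing primitive ints is cheaper than repeated string comparisons.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: encode each known word as an integer once, then
// match on ints instead of repeatedly comparing strings.
public class WordEncoder {
    private final Map<String, Integer> vocab = new HashMap<>();

    // Assigns a stable integer ID to each distinct word.
    public int encode(String word) {
        return vocab.computeIfAbsent(word.toLowerCase(), w -> vocab.size());
    }

    public int[] encodeSentence(String sentence) {
        String[] words = sentence.split("\\s+");
        int[] ids = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            ids[i] = encode(words[i]);
        }
        return ids;
    }

    public static void main(String[] args) {
        WordEncoder enc = new WordEncoder();
        int toxicId = enc.encode("idiot"); // rule lists store ints, not strings

        // Comparing primitive ints is cheaper than String.equals on every
        // rule, which is the performance effect described in the episode.
        for (int id : enc.encodeSentence("you are an idiot")) {
            if (id == toxicId) {
                System.out.println("matched toxic word id " + id);
            }
        }
    }
}
```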
3. Business/Investment Angle
- Agility vs. Scale in Moderation: Large platforms (like Meta) are too slow, “too big”, to keep up with the rapidly evolving tactics of toxic users (the “mouse” in the cat-and-mouse game). Bodyguard’s strength lies in its technological agility, which allows rapid updates to its linguistic rules.
- Trust Management as an Essential Growth Factor: The episode opens with a mention of Vanta, highlighting that as businesses digitalize, proving trust (via compliance and security) becomes essential for growth.
- The Value of Non-Regression Testing: Cohen stressed the importance of maintaining robust, human-validated test sets (non-regression tests) to ensure that algorithm or rule updates do not inadvertently break existing, correctly classified moderation scenarios, a critical need when dealing with high-stakes content (a test sketch follows this list).
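
As a minimal sketch of such a non-regression test, assuming a JUnit 5 setup and reusing the hypothetical ContextualRuleEngine from the earlier sketch (in practice the human-validated cases would live in a versioned dataset, not inline):

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical non-regression suite: every human-validated (comment -> verdict)
// pair must keep its verdict after any rule or model update.
class ModerationRegressionTest {

    private final ContextualRuleEngine engine = new ContextualRuleEngine();

    @Test
    void previouslyValidatedVerdictsStillHold() {
        // These pairs stand in for a versioned, human-reviewed dataset.
        assertEquals(ContextualRuleEngine.Severity.HIGH,
            engine.classify(new ContextualRuleEngine.Comment(
                "I will kill you", "PERSON", "sports")));

        assertEquals(ContextualRuleEngine.Severity.LOW,
            engine.classify(new ContextualRuleEngine.Comment(
                "kill the final boss", "GAME", "gaming")));
    }
}
```

Running such a suite on every rule change is what lets a moderation team update linguistic rules quickly without silently regressing on cases humans have already judged.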
4. Notable Companies/People
- Charles Cohen (Bodyguard Founder): Self-taught developer who founded Bodyguard at 21 after being motivated by a cyberbullying tragedy. He champions a critical, pragmatic view of AI technologies.
- Yann Guérin (Founder of Gladia.io): Mentioned as the person who later identified Cohen’s early optimization technique as formal One-Hot Encoding.
- Vanta: Mentioned as the episode sponsor, focusing on automated compliance and trust management for businesses.
5. Future Implications
The conversation strongly suggests a future where AI solutions are specialized and complementary rather than monolithic. Cohen argues that developers must keep a critical mindset and avoid the trap of assuming LLMs are the universal solution. For moderation, the future requires hybrid systems that combine the speed and transparency of rule-based Symbolic AI with the pattern-recognition capabilities of Machine Learning/Deep Learning, depending on the specific moderation challenge.
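A hybrid pipeline in the spirit of that prediction might look like the following sketch. The interfaces and fallback logic are assumptions for illustration, not a description of Bodyguard's actual architecture: transparent rules answer first, and an ML/LLM classifier handles only what the rules cannot decide.

```java
// Hypothetical hybrid moderation pipeline.
interface Classifier {
    // Returns a verdict, or null when this classifier cannot decide.
    String classify(String comment, String articleTopic);
}

class HybridModerator {
    private final Classifier symbolicRules; // fast, auditable, quick to update
    private final Classifier mlFallback;    // broader pattern recognition

    HybridModerator(Classifier symbolicRules, Classifier mlFallback) {
        this.symbolicRules = symbolicRules;
        this.mlFallback = mlFallback;
    }

    String moderate(String comment, String articleTopic) {
        String verdict = symbolicRules.classify(comment, articleTopic);
        // Only pay the latency/cost of the ML model when the rules are silent.
        return verdict != null ? verdict : mlFallback.classify(comment, articleTopic);
    }
}

class HybridDemo {
    public static void main(String[] args) {
        Classifier rules = (c, t) -> c.contains("kill") ? "HIGH" : null;
        Classifier ml = (c, t) -> "LOW"; // stand-in for a trained model
        HybridModerator mod = new HybridModerator(rules, ml);
        System.out.println(mod.moderate("I will kill you", "sports")); // HIGH
        System.out.println(mod.moderate("nice game", "sports"));       // LOW
    }
}
```

The ordering is the design choice: the common path stays fast and explainable, while model calls are reserved for ambiguous cases.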
6. Target Audience
This episode is highly valuable for Software Developers, CTOs, Product Managers, and AI/ML Engineers who are building or integrating content moderation systems, or those interested in the practical trade-offs between different AI paradigms (Symbolic vs. Deep Learning) in real-world applications.
🏢 Companies Mentioned
- Bodyguard
- Vanta
- Gladia.io
- Meta
💬 Key Insights
"Le *machine learning* répondait : "Ça, c'est un article politique, ça, c'est un article géopolitique, ça, c'est un article sur la technologie." On avait des grandes classifications, mais on n'avait pas des réponses précises à des questions."
"On utilise du LLM pour comprendre l'article et surtout avoir des réponses avec des questions très précises : "Est-ce que l'article parle de gens qui ont été tués ? Est-ce que l'article parle d'un objet ? [...] Est-ce que l'humain, quelle est la couleur de peau de la personne qui est sur l'article ? Sur la photo ?""
"c'était la possibilité qu'on ne pouvait pas faire avec du *machine learning*, c'était de poser des questions clés."
"À la différence du *machine learning*, le LLM, tu peux lui donner le commentaire, mais tu peux lui donner aussi le sujet de l'article. Tu peux lui demander en sortie d'avoir une liste de classifications mais également qu'il te dise vers qui le commentaire est destiné. Tu peux aussi lui demander une notion de sévérité, si c'est *low*, *medium*, *high*, d'imposer une sévérité sur les classifications qu'il a trouvées."
"L'intégration de l'IA chez Bodyguard, c'est une intégration à trois niveaux : une intégration technologique, dans la technologie Bodyguard de modération et de classification. [...] l'intégration dans le produit Bodyguard [...] Et l'intégration de l'IA chez Bodyguard, et c'est un des sujets les plus importants, c'est l'intégration d'outils d'IA en interne pour améliorer la productivité de toute la boîte."
"Donc je comprends : "Bon, alors, ben déjà , c'est une complémentarité." Et donc c'est là où je me dis : "Ça va nécessiter une complémentarité de technologies qu'on doit mettre en place.""