
AI vs. Human Moderators: A Comparative Evaluation of Multimodal LLMs in Content Moderation for Brand Safety

Authors: Adi Levi, Or Levi, Sardhendu Mishra, Jonathan Morra

Published: 2025-08-07

arXiv ID: 2508.05527v1

Added to Library: 2025-08-14 23:06 UTC

Safety

📄 Abstract

As the volume of video content online grows exponentially, the demand for moderation of unsafe videos has surpassed human capabilities, posing both operational and mental health challenges. While recent studies have demonstrated the merits of Multimodal Large Language Models (MLLMs) in various video understanding tasks, their application to multimodal content moderation, a domain that requires nuanced understanding of both visual and textual cues, remains relatively underexplored. In this work, we benchmark the capabilities of MLLMs in brand safety classification, a critical subset of content moderation for safeguarding advertising integrity. To this end, we introduce a novel, multimodal and multilingual dataset, meticulously labeled by professional reviewers across a multitude of risk categories. Through a detailed comparative analysis, we demonstrate the effectiveness of MLLMs such as Gemini, GPT, and Llama in multimodal brand safety, and evaluate their accuracy and cost efficiency compared to professional human reviewers. Furthermore, we present an in-depth discussion shedding light on limitations of MLLMs and failure cases. We are releasing our dataset alongside this paper to facilitate future research on effective and responsible brand safety and content moderation.
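
The benchmark centers on multimodal brand safety classification: a model receives sampled video frames together with transcript text and must assign a risk label. The paper does not reproduce its prompts or pipeline code here, so the following is only a minimal sketch of what such a classification call might look like, assuming the OpenAI Python SDK, a gpt-4o-class multimodal model, and a hypothetical risk taxonomy (the paper's actual categories and prompts differ).

```python
# Minimal sketch (not the paper's pipeline): label a video from sampled frames
# plus its transcript using a multimodal chat model via the OpenAI Python SDK.
import base64
from openai import OpenAI

# Hypothetical risk taxonomy for illustration only.
RISK_CATEGORIES = ["safe", "adult", "violence", "hate_speech", "drugs", "other_unsafe"]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def classify_video(frame_paths: list[str], transcript: str) -> str:
    """Return one risk label for a video, given sampled frames and its transcript."""
    content = [{
        "type": "text",
        "text": (
            "You are a brand-safety reviewer. Given the video frames and "
            f"transcript below, answer with exactly one label from: {RISK_CATEGORIES}.\n\n"
            f"Transcript:\n{transcript}"
        ),
    }]
    for path in frame_paths:
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })

    resp = client.chat.completions.create(
        model="gpt-4o",  # any multimodal chat model could be swapped in
        messages=[{"role": "user", "content": content}],
        temperature=0,   # deterministic labels are easier to benchmark
    )
    return resp.choices[0].message.content.strip()
```

Comparing such model outputs against professional reviewer labels, per category and per language, is the kind of accuracy and cost-efficiency evaluation the abstract describes.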

🔍 Key Points

  • Introduction of a novel, multimodal and multilingual dataset for evaluating content moderation in brand safety, filling a critical gap in existing research.
  • Benchmarking the capabilities of Multimodal Large Language Models (MLLMs) like GPT, Gemini, and Llama for brand safety tasks, demonstrating their effectiveness compared to professional human reviewers.
  • Detailed evaluation of model performance across various risk categories, highlighting the strengths and weaknesses of different MLLMs, particularly in terms of accuracy and cost-efficiency.
  • Identification of limitations in MLLMs, including incorrect associations and language bias, which underscores the need for further improvements in AI moderation systems.
  • Proposed hybrid human-AI moderation systems that leverage MLLMs to expedite and enhance the content moderation process while retaining human oversight for precision (a minimal routing sketch follows this list).
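
To make the hybrid human-AI proposal concrete, the sketch below shows one way confidence-based routing between an MLLM and human reviewers could work. The threshold, the high-risk category names, and the example confidence scores are assumptions for illustration, not design choices or results taken from the paper.

```python
# Minimal sketch of confidence-based routing between an MLLM and human reviewers.
# Threshold and category names are illustrative assumptions, not paper values.
from dataclasses import dataclass


@dataclass
class ModerationDecision:
    label: str          # predicted risk category
    confidence: float   # model's calibrated confidence in the label
    route: str          # "auto" or "human_review"


def route_prediction(
    label: str,
    confidence: float,
    high_risk: frozenset[str] = frozenset({"hate_speech", "violence"}),
    threshold: float = 0.85,
) -> ModerationDecision:
    """Auto-apply confident, lower-risk labels; escalate the rest to humans."""
    needs_human = confidence < threshold or label in high_risk
    return ModerationDecision(label, confidence,
                              "human_review" if needs_human else "auto")


# Example: a confident "safe" call is automated, borderline or sensitive calls are escalated.
print(route_prediction("safe", 0.97))         # route="auto"
print(route_prediction("adult", 0.62))        # route="human_review" (low confidence)
print(route_prediction("hate_speech", 0.99))  # route="human_review" (always escalated)
```

In practice the escalation policy would be tuned per risk category against the human-labeled dataset, trading off reviewer workload against the precision the paper argues human oversight still provides.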

💡 Why This Paper Matters

This paper presents significant advances in content moderation with MLLMs, highlighting both the urgent need and the opportunity to incorporate AI-driven approaches into brand safety protocols. By releasing a new dataset and evaluating multiple leading models, the study not only fills an existing research gap but also lays the groundwork for scalable and effective moderation strategies that keep pace with the evolving online content landscape. Furthermore, the findings advocate for a hybrid approach to moderation that can mitigate the limitations of both AI and human review in complex content scenarios, making this work highly relevant to contemporary discussions about online safety and ethics.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper provides valuable insights into the application of advanced AI models in moderating online content, particularly video, which is a high-stakes domain where harmful content can rapidly proliferate. Understanding the strengths and limitations of MLLMs, especially concerning their performance compared to human reviewers, is crucial for developing secure systems that can accurately detect and mitigate misuse or harmful content while avoiding biases. Additionally, the identification of model failures and their potential causes sheds light on areas needing improvement, making this research invaluable for the ongoing development of robust AI moderation technologies.

📚 Read the Full Paper