← Back to Library

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Authors: Xin Wang, Yunhao Chen, Juncheng Li, Yixu Wang, Yang Yao, Tianle Gu, Jie Li, Yan Teng, Xingjun Ma, Yingchun Wang, Xia Hu

Published: 2026-01-04

arXiv ID: 2601.01592v1

Added to Library: 2026-01-07 10:03 UTC

Red Teaming

📄 Abstract

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for systematic evaluation. To address this, we introduce OpenRT, a unified, modular, and high-throughput red-teaming framework designed for comprehensive MLLM safety evaluation. At its core, OpenRT architects a paradigm shift in automated red-teaming by introducing an adversarial kernel that enables modular separation across five critical dimensions: model integration, dataset management, attack strategies, judging methods, and evaluation metrics. By standardizing attack interfaces, it decouples adversarial logic from a high-throughput asynchronous runtime, enabling systematic scaling across diverse models. Our framework integrates 37 diverse attack methodologies, spanning white-box gradients, multi-modal perturbations, and sophisticated multi-agent evolutionary strategies. Through an extensive empirical study on 20 advanced models (including GPT-5.2, Claude 4.5, and Gemini 3 Pro), we expose critical safety gaps: even frontier models fail to generalize across attack paradigms, with leading models exhibiting average Attack Success Rates as high as 49.14%. Notably, our findings reveal that reasoning models do not inherently possess superior robustness against complex, multi-turn jailbreaks. By open-sourcing OpenRT, we provide a sustainable, extensible, and continuously maintained infrastructure that accelerates the development and standardization of AI safety.

🔍 Key Points

  • Introduction of OpenRT, a modular and high-throughput red-teaming framework for Multimodal Large Language Models (MLLMs), addressing fragmentation and scalability issues in existing benchmarks.
  • Integration of 37 diverse attack methodologies that allow comprehensive evaluation of MLLM safety, including advanced strategies like multi-modal perturbations and multi-agent evolutionary approaches.
  • Empirical analysis reveals significant safety vulnerabilities across 20 evaluated models, with average Attack Success Rates (ASR) reaching up to 49.14%, exposing inadequacies in current safety mechanisms.
  • The framework showcases that even advanced models do not inherently possess improved robustness against adaptive, multi-modal attacks, and highlights the trend of evolving attack strategies that exploit novel model capabilities.
  • Provision of an open-source platform that encourages continuous development and community engagement towards enhancing AI safety through systematic evaluation and defense strategies.

💡 Why This Paper Matters

The OpenRT framework represents a significant advancement in the systematic evaluation of Multimodal Large Language Models' safety. Its comprehensive design and implementation of diverse attack methodologies expose critical vulnerabilities in state-of-the-art models, highlighting the urgent need for improved safety mechanisms. OpenRT not only provides a valuable resource for researchers looking to understand and mitigate risks associated with MLLMs but also sets a foundation for future work in AI safety.

🎯 Why It's Interesting for AI Security Researchers

This paper is highly relevant to AI security researchers as it addresses current challenges in evaluating the safety of advanced AI models, particularly in the context of multimodal interactions. The introduction of an open-source framework that consolidates various attack methodologies offers researchers a powerful tool for benchmarking and developing resilient models. Moreover, findings regarding vulnerabilities in leading models underline the importance of continuous testing and evolution of AI systems in response to emerging threats.

📚 Read the Full Paper