
JPRO: Automated Multimodal Jailbreaking via Multi-Agent Collaboration Framework

Authors: Yuxuan Zhou, Yang Bai, Kuofeng Gao, Tao Dai, Shu-Tao Xia

Published: 2025-11-10

arXiv ID: 2511.07315v1

Added to Library: 2025-11-11 05:00 UTC

Red Teaming

📄 Abstract

The widespread application of large vision-language models (VLMs) makes ensuring their secure deployment critical. While recent studies have demonstrated jailbreak attacks on VLMs, existing approaches are limited: they either require white-box access, restricting practicality, or rely on manually crafted patterns, leading to poor sample diversity and scalability. To address these gaps, we propose JPRO, a novel multi-agent collaborative framework designed for automated VLM jailbreaking. It effectively overcomes the shortcomings of prior methods in attack diversity and scalability. Through the coordinated action of four specialized agents and two core modules, Tactic-Driven Seed Generation and the Adaptive Optimization Loop, JPRO generates effective and diverse attack samples. Experimental results show that JPRO achieves over a 60% attack success rate on multiple advanced VLMs, including GPT-4o, significantly outperforming existing methods. As a black-box attack approach, JPRO not only uncovers critical security vulnerabilities in multimodal models but also offers valuable insights for evaluating and enhancing VLM robustness.

🔍 Key Points

  • JPRO introduces a novel multi-agent collaborative framework for automated jailbreaking of vision-language models (VLMs), effectively addressing previous methods' limitations regarding diversity and scalability.
  • The framework utilizes four specialized agents (Planner, Attacker, Modifier, Verifier) to dynamically generate semantically coherent adversarial image-text pairs and maintain malicious intent across multi-turn dialogues (a minimal control-flow sketch follows this list).
  • JPRO achieves a high attack success rate (over 60%) across advanced VLMs, significantly outperforming existing black-box attack methods, which highlights its effectiveness in exposing security vulnerabilities.
  • The paper provides empirical evidence through extensive experimentation, demonstrating JPRO's superior performance and transferability across different model architectures.
  • The findings emphasize the need for improved security measures in the deployment of large multimodal models, as JPRO reveals critical vulnerabilities that could be exploited.
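
To make the described control flow concrete, below is a minimal Python sketch of how the four agents and the two core modules might fit together. The class names, call signatures, and loop structure are illustrative assumptions inferred from the summary above, not JPRO's published implementation; each agent is abstracted to a plain callable and no actual attack content is included.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# NOTE: All names below (Plan, AttackSample, agent signatures) are
# hypothetical, chosen to mirror the roles described in the paper summary.

@dataclass
class Plan:
    tactic: str       # high-level jailbreak tactic selected by the Planner
    goal: str         # the probed objective, kept abstract here

@dataclass
class AttackSample:
    image_prompt: str  # description used to synthesize the adversarial image
    text_prompt: str   # paired text query sent alongside the image

def jpro_loop(
    planner: Callable[[str], Plan],                       # Planner agent
    attacker: Callable[[Plan], AttackSample],             # Attacker agent
    verifier: Callable[[AttackSample], tuple[bool, str]], # Verifier agent
    modifier: Callable[[AttackSample, str], AttackSample],# Modifier agent
    goal: str,
    max_rounds: int = 5,
) -> Optional[AttackSample]:
    """Tactic-Driven Seed Generation followed by the Adaptive Optimization Loop."""
    plan = planner(goal)          # Tactic-Driven Seed Generation: pick a tactic
    sample = attacker(plan)       # produce the initial image-text attack seed
    for _ in range(max_rounds):   # Adaptive Optimization Loop
        success, feedback = verifier(sample)  # query target VLM, judge response
        if success:
            return sample
        sample = modifier(sample, feedback)   # revise the pair using feedback
    return None                   # budget exhausted without a verified success
```

Modeling each agent as an independent callable keeps the roles decoupled: only the Verifier touches the target model, which matches the black-box setting, while the Planner/Attacker/Modifier division is what plausibly yields the diversity and scalability the paper claims over manually crafted patterns.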

💡 Why This Paper Matters

This paper is a significant contribution to the field of AI security, as it presents innovative methodologies to circumvent the safety mechanisms of VLMs. By exposing vulnerabilities through automated black-box attacks, the research not only enhances the understanding of potential risks associated with AI deployment but also informs the development of more robust defense strategies against malicious use.

🎯 Why It's Interesting for AI Security Researchers

The paper is essential for AI security researchers as it highlights the challenges and vulnerabilities posed by multimodal AI systems. By introducing new, automated techniques for jailbreaking, it provides insights into the adversarial capabilities that could be employed against AI systems, emphasizing the necessity for ongoing research in defensive strategies and the strengthening of safety mechanisms.

📚 Read the Full Paper: https://arxiv.org/abs/2511.07315v1