
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models

Authors: Lei Jiang, Zixun Zhang, Zizhou Wang, Xiaobing Sun, Zhen Li, Liangli Zhen, Xiaohua Xu

Published: 2025-06-20

arXiv ID: 2506.16760v1

Added to Library: 2025-06-23 04:01 UTC

Red Teaming

📄 Abstract

Large Vision-Language Models (LVLMs) demonstrate exceptional performance across multimodal tasks, yet remain vulnerable to jailbreak attacks that bypass built-in safety mechanisms to elicit restricted content generation. Existing black-box jailbreak methods primarily rely on adversarial textual prompts or image perturbations, but these approaches are highly detectable by standard content filtering systems and exhibit low query and computational efficiency. In this work, we present Cross-modal Adversarial Multimodal Obfuscation (CAMO), a novel black-box jailbreak attack framework that decomposes malicious prompts into semantically benign visual and textual fragments. By leveraging LVLMs' cross-modal reasoning abilities, CAMO covertly reconstructs harmful instructions through multi-step reasoning, evading conventional detection mechanisms. Our approach supports adjustable reasoning complexity and requires significantly fewer queries than prior attacks, enabling both stealth and efficiency. Comprehensive evaluations conducted on leading LVLMs validate CAMO's effectiveness, showcasing robust performance and strong cross-model transferability. These results underscore significant vulnerabilities in current built-in safety mechanisms, emphasizing an urgent need for advanced, alignment-aware security and safety solutions in vision-language systems.
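
The abstract describes the mechanism only at a high level. The toy sketch below illustrates the general idea of cross-modal decomposition using an entirely benign payload; it is a minimal illustration under our own assumptions, not the CAMO implementation. The `render_text_image` helper, the message structure, and the example instruction are all hypothetical and not taken from the paper.

```python
# Toy illustration of cross-modal prompt decomposition (NOT the CAMO code):
# a benign instruction is split into an image fragment and a text fragment
# that only recover the full instruction when the model combines both.
from PIL import Image, ImageDraw


def render_text_image(fragment: str, path: str = "fragment.png") -> str:
    """Render a text fragment onto a blank image so it travels via the visual channel."""
    img = Image.new("RGB", (480, 80), color="white")
    ImageDraw.Draw(img).text((10, 30), fragment, fill="black")
    img.save(path)
    return path


# Entirely benign payload, used only to show the structure of the split.
visual_fragment = "famous landmarks in Paris"                           # carried by the image
textual_fragment = "List three of the things described in the image."   # carried by the text

message = {
    "role": "user",
    "content": [
        {"type": "image", "path": render_text_image(visual_fragment)},
        {"type": "text", "text": textual_fragment},
    ],
}
# Neither fragment alone states the full request; reconstructing it requires the
# model to read the image and combine it with the text, a simple stand-in for the
# multi-step cross-modal reasoning the abstract describes.
```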

🔍 Key Points

  • Introduction of Cross-modal Adversarial Multimodal Obfuscation (CAMO), a novel black-box jailbreak framework that decomposes harmful prompts into benign-looking visual and textual fragments to bypass safety mechanisms in Large Vision-Language Models (LVLMs).
  • Demonstration of CAMO's efficiency in requiring significantly fewer queries compared to existing methods while maintaining high attack success rates against various defense mechanisms.
  • Extensive evaluations across multiple models, including prominent proprietary and open-source architectures, show that CAMO consistently outperforms state-of-the-art methods in evading detection and eliciting harmful outputs (see the evaluation sketch after this list).
  • Robust experimental results indicate that CAMO's use of multi-step reasoning supports strong cross-model transferability, highlighting inherent vulnerabilities in current LVLM safety protocols.
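
The query-efficiency and attack-success-rate claims above are typically measured with a harness along the lines of the hedged sketch below. This is a generic red-team evaluation pattern, not the paper's protocol: `query_model` and `is_refusal` are assumed placeholder callables, and the per-prompt query budget is an illustrative parameter.

```python
# Hedged sketch of a red-team evaluation loop measuring attack success rate (ASR)
# under a fixed query budget; query_model and is_refusal are assumed placeholder
# callables, not part of any released CAMO code.
from typing import Callable, Sequence


def evaluate_asr(
    prompts: Sequence[dict],
    query_model: Callable[[dict], str],
    is_refusal: Callable[[str], bool],
    max_queries_per_prompt: int = 3,
) -> float:
    """Return the fraction of prompts answered without refusal within the query budget."""
    successes = 0
    total_queries = 0
    for prompt in prompts:
        for _ in range(max_queries_per_prompt):
            total_queries += 1
            if not is_refusal(query_model(prompt)):
                successes += 1
                break
    asr = successes / max(len(prompts), 1)
    print(f"ASR: {asr:.2%} over {total_queries} queries")
    return asr
```

Tracking total queries alongside ASR is what makes the "fewer queries than prior attacks" comparison meaningful.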

💡 Why This Paper Matters

This paper provides critical insights into the vulnerabilities of large vision-language models by showcasing CAMO, a novel attack framework that obfuscates harmful instructions by splitting them across modalities. It underscores the urgent need for more resilient safety measures that account for cross-modal interactions and adversarially decomposed prompts, which is crucial for the safety of deployed AI systems.

🎯 Why It's Interesting for AI Security Researchers

This research is highly relevant to AI security researchers: it uncovers significant weaknesses in the current safety mechanisms of LVLMs, emphasizes the role of multi-modal reasoning in adversarial contexts, and calls for more advanced protective strategies. Understanding these attack vectors is vital for improving model robustness against malicious inputs and evolving threats.
