
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection

Authors: Ziqi Miao, Yi Ding, Lijun Li, Jing Shao

Published: 2025-07-03

arXiv ID: 2507.02844v1

Added to Library: 2025-07-04 04:00 UTC

Red Teaming

📄 Abstract

With the emergence of strong visual-language capabilities, multimodal large language models (MLLMs) have demonstrated tremendous potential for real-world applications. However, the security vulnerabilities exhibited by the visual modality pose significant challenges to deploying such models in open-world environments. Recent studies have successfully induced harmful responses from target MLLMs by encoding harmful textual semantics directly into visual inputs. However, in these approaches, the visual modality primarily serves as a trigger for unsafe behavior, often exhibiting semantic ambiguity and lacking grounding in realistic scenarios. In this work, we define a novel setting: visual-centric jailbreak, where visual information serves as a necessary component in constructing a complete and realistic jailbreak context. Building on this setting, we propose the VisCo (Visual Contextual) Attack. VisCo fabricates contextual dialogue using four distinct visual-focused strategies, dynamically generating auxiliary images when necessary to construct a visual-centric jailbreak scenario. To maximize attack effectiveness, it incorporates automatic toxicity obfuscation and semantic refinement to produce a final attack prompt that reliably triggers harmful responses from the target black-box MLLMs. Specifically, VisCo achieves a toxicity score of 4.78 and an Attack Success Rate (ASR) of 85% on MM-SafetyBench against GPT-4o, significantly outperforming the baseline, which attains a toxicity score of 2.48 and an ASR of 22.2%. The code is available at https://github.com/Dtc7w3PQ/Visco-Attack.
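To make the attack flow described in the abstract concrete, the sketch below outlines a VisCo-style pipeline at a high level. This is a minimal illustration under assumptions, not the authors' implementation: the strategy labels, the `fabricate_context` and `obfuscate_and_refine` helpers, and the `attacker_llm`/`target_mllm` callables are hypothetical stand-ins; the actual prompts, strategy definitions, and models live in the linked repository.

```python
# Hypothetical sketch of a VisCo-style pipeline (context fabrication ->
# toxicity obfuscation + semantic refinement -> query to a black-box target MLLM).
# All names below are placeholders, not the paper's code.
from dataclasses import dataclass, field


@dataclass
class AttackState:
    harmful_query: str                                    # original harmful request to disguise
    image_path: str                                       # visual input the jailbreak is grounded in
    dialogue: list[str] = field(default_factory=list)     # fabricated visual-centric dialogue turns
    aux_images: list[str] = field(default_factory=list)   # dynamically generated auxiliary images


# Placeholder labels standing in for the paper's four visual-focused strategies.
VISUAL_STRATEGIES = ["strategy_1", "strategy_2", "strategy_3", "strategy_4"]


def fabricate_context(state: AttackState, strategy: str, attacker_llm) -> AttackState:
    """Stage 1: fabricate a dialogue in which the image is a necessary part of the scenario."""
    turn = attacker_llm(
        f"Using strategy '{strategy}', write a dialogue grounded in {state.image_path} "
        f"that leads naturally toward: {state.harmful_query}"
    )
    state.dialogue.append(turn)
    return state


def obfuscate_and_refine(state: AttackState, attacker_llm) -> str:
    """Stage 2: automatic toxicity obfuscation plus semantic refinement of the final prompt."""
    draft = "\n".join(state.dialogue)
    obfuscated = attacker_llm(f"Rewrite to soften overtly toxic wording: {draft}")
    refined = attacker_llm(f"Refine so the request stays semantically faithful: {obfuscated}")
    return refined


def visco_attack(harmful_query: str, image_path: str, attacker_llm, target_mllm) -> str:
    """End-to-end loop: try each strategy until the target returns a response."""
    for strategy in VISUAL_STRATEGIES:
        state = AttackState(harmful_query, image_path)
        state = fabricate_context(state, strategy, attacker_llm)
        prompt = obfuscate_and_refine(state, attacker_llm)
        response = target_mllm(prompt, images=[image_path, *state.aux_images])
        if response:  # a full pipeline would score toxicity / judge success here
            return response
    return ""
```

In this sketch, `attacker_llm` and `target_mllm` are simply callables the user supplies (e.g. wrappers around API clients); the real system additionally scores responses for toxicity to decide whether the attack succeeded.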

🔍 Key Points

  • Introduction of the Visual-Centric Jailbreak setting, emphasizing the need for realistic visual contexts in inducing harmful outputs from MLLMs.
  • Development of the VisCo (Visual Contextual) Attack, which fabricates contextual dialogues through four visual-focused strategies to create coherent attack prompts.
  • Demonstration of significant improvements in attack effectiveness, with a reported Attack Success Rate (ASR) of 85% and toxicity score of 4.78 on MM-SafetyBench, surpassing existing methods.
  • Integration of automatic toxicity obfuscation and semantic refinement processes to enhance the attack's efficacy and evade safety filters.
  • Extensive evaluation on various multimodal large language models (MLLMs) to validate the robustness of the proposed attack methodology.

💡 Why This Paper Matters

This paper matters because it exposes vulnerabilities in multimodal large language models (MLLMs) through a novel visual-centric adversarial attack. The findings underscore the importance of security when deploying AI systems that integrate visual and textual modalities, showing how seemingly innocuous visual contexts can be weaponized. By demonstrating the effectiveness of the VisCo Attack, the research calls into question the robustness of current MLLM safety measures and points to the need for improved defenses.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper compelling: it uncovers new attack vectors that exploit visual-language integration and offers insight into how adversarial behavior arises in MLLMs. The methodology and strong empirical results challenge existing safety protocols and argue for stronger defenses, making the work a pivotal reference for research on safeguarding AI applications against adversarial threats.

📚 Read the Full Paper