VII: Visual Instruction Injection for Jailbreaking Image-to-Video Generation Models

Authors: Bowen Zheng, Yongli Xiang, Ziming Hong, Zerong Lin, Chaojian Yu, Tongliang Liu, Xinge You

Published: 2026-02-24

arXiv ID: 2602.20999v1

Added to Library: 2026-02-25 03:00 UTC

Red Teaming

📄 Abstract

Image-to-Video (I2V) generation models, which condition video generation on reference images, have shown emerging visual instruction-following capability, allowing certain visual cues in reference images to act as implicit control signals for video generation. However, this capability also introduces a previously overlooked risk: adversaries may exploit visual instructions to inject malicious intent through the image modality. In this work, we uncover this risk by proposing Visual Instruction Injection (VII), a training-free and transferable jailbreaking framework that intentionally disguises the malicious intent of unsafe text prompts as benign visual instructions in the safe reference image. Specifically, VII coordinates a Malicious Intent Reprogramming module to distill malicious intent from unsafe text prompts while minimizing their static harmfulness, and a Visual Instruction Grounding module to ground the distilled intent onto a safe input image by rendering visual instructions that preserve semantic consistency with the original unsafe text prompt, thereby inducing harmful content during I2V generation. Empirically, our extensive experiments on four state-of-the-art commercial I2V models (Kling-v2.5-turbo, Gemini Veo-3.1, Seedance-1.5-pro, and PixVerse-V5) demonstrate that VII achieves Attack Success Rates of up to 83.5% while reducing Refusal Rates to near zero, significantly outperforming existing baselines.

🔍 Key Points

  • Introduction of Visual Instruction Injection (VII) as a novel jailbreaking framework for Image-to-Video (I2V) models, leveraging visual instruction-following behaviors.
  • Development of two key modules: Malicious Intent Reprogramming (MIR), which distills and disguises malicious intent, and Visual Instruction Grounding (VIG), which embeds the resulting instructions onto safe visual inputs.
  • Extensive empirical evaluation across four state-of-the-art commercial I2V models, achieving Attack Success Rates of up to 83.5% while keeping Refusal Rates near zero.
  • Demonstration that current safety mechanisms fail to detect dynamic visual instructions, underscoring the need for stronger defense strategies in I2V generation.
  • Exploration of configuration choices (e.g., typography, language, and instruction placement) that enhance VII's efficacy, showcasing the robustness of the attack across varied settings.

💡 Why This Paper Matters

This paper is significant as it exposes a critical vulnerability in advanced Image-to-Video generation models, demonstrating how adversaries can exploit visual instruction-following capabilities to induce unsafe content. The proposed VII framework not only advances understanding of multi-modal threats but also highlights the pressing need for improved safety mechanisms in generative models, making it a pertinent contribution to the field of AI safety and security.

🎯 Why It's Interesting for AI Security Researchers

The findings of this paper are highly relevant to AI security researchers as they reveal a novel attack vector that could be employed by malicious actors to bypass existing safety features in generative models. Understanding the dynamics of visual instruction-following behavior in AI systems will inform the development of more robust defense mechanisms, ensuring the safety and ethical use of AI technologies in sensitive applications.

📚 Read the Full Paper