
RunawayEvil: Jailbreaking the Image-to-Video Generative Models

Authors: Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, Caifeng Shan

Published: 2025-12-07

arXiv ID: 2512.06674v1

Added to Library: 2025-12-09 03:01 UTC

Red Teaming

📄 Abstract

Image-to-Video (I2V) generation synthesizes dynamic visual content from image and text inputs, providing significant creative control. However, the security of such multimodal systems, particularly their vulnerability to jailbreak attacks, remains critically underexplored. To bridge this gap, we propose RunawayEvil, the first multimodal jailbreak framework for I2V models with dynamic evolutionary capability. Built on a "Strategy-Tactic-Action" paradigm, our framework exhibits self-amplifying attack behavior through three core components: (1) a Strategy-Aware Command Unit that enables the attack to self-evolve its strategies through reinforcement learning-driven strategy customization and LLM-based strategy exploration; (2) a Multimodal Tactical Planning Unit that generates coordinated text jailbreak instructions and image tampering guidelines based on the selected strategies; and (3) a Tactical Action Unit that executes and evaluates the multimodal coordinated attacks. This self-evolving architecture allows the framework to continuously adapt and intensify its attack strategies without human intervention. Extensive experiments demonstrate that RunawayEvil achieves state-of-the-art attack success rates on commercial I2V models such as Open-Sora 2.0 and CogVideoX. Specifically, RunawayEvil outperforms existing methods by 58.5 to 79 percent on COCO2017. This work provides a critical tool for vulnerability analysis of I2V models, thereby laying a foundation for more robust video generation systems.
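
For intuition, here is a minimal sketch of how the "Strategy-Tactic-Action" loop described in the abstract could be organized: a strategy is selected (with occasional exploration), tactics are planned for both the text and image modalities, and the attack is executed and scored, with the score fed back to the strategy selector. All names (select_strategy, plan_tactics, execute_and_evaluate, the strategy labels) and the scoring logic are illustrative assumptions, not the authors' implementation; the I2V model and the harmfulness judge are stubbed out with a random score.

```python
import random

# Illustrative sketch only -- the I2V model, the planning LLM, and the
# harmfulness judge are all stubbed out; names are hypothetical.

STRATEGIES = ["scene_escalation", "semantic_substitution", "context_injection"]  # hypothetical labels

def select_strategy(scores):
    """Strategy-Aware Command Unit (sketch): pick the strategy with the highest
    estimated attack success, with occasional random exploration."""
    if random.random() < 0.2:                      # stand-in for LLM-based strategy exploration
        return random.choice(STRATEGIES)
    return max(STRATEGIES, key=lambda s: scores[s])

def plan_tactics(strategy, image, prompt):
    """Multimodal Tactical Planning Unit (sketch): derive a rewritten text
    instruction and an image-tampering guideline from the chosen strategy."""
    jailbreak_prompt = f"[{strategy}] {prompt}"    # placeholder prompt rewrite
    tamper_guideline = {"op": "inpaint_region", "target": strategy}
    return jailbreak_prompt, tamper_guideline

def execute_and_evaluate(jailbreak_prompt, tamper_guideline, image):
    """Tactical Action Unit (sketch): query the I2V model and score the output.
    Here the model call and the judge are simulated with a random score."""
    return random.random()

def attack_loop(image, prompt, rounds=10, threshold=0.8):
    """Self-evolving attack loop: feed each round's score back into strategy selection."""
    scores = {s: 0.0 for s in STRATEGIES}
    counts = {s: 0 for s in STRATEGIES}
    for _ in range(rounds):
        strategy = select_strategy(scores)
        jb_prompt, guideline = plan_tactics(strategy, image, prompt)
        reward = execute_and_evaluate(jb_prompt, guideline, image)
        counts[strategy] += 1                      # running-average reward update
        scores[strategy] += (reward - scores[strategy]) / counts[strategy]
        if reward >= threshold:
            return strategy, jb_prompt, guideline  # successful jailbreak candidate
    return None

if __name__ == "__main__":
    print(attack_loop(image="input.png", prompt="a harmless caption"))
```

The running-average update here merely stands in for the reinforcement learning-driven strategy customization the abstract refers to; the actual framework presumably uses a learned policy plus an LLM that proposes new strategies rather than a fixed list.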

🔍 Key Points

  • Introduction of RunawayEvil, the first multimodal jailbreak framework for Image-to-Video (I2V) models, employing a self-evolving architecture to enhance attack strategies.
  • Deployment of the 'Strategy-Tactic-Action' paradigm to coordinate attacks across both text and image inputs, enabling adaptive and dynamic jailbreaks.
  • Demonstrated state-of-the-art attack success rates on existing I2V models (Open-Sora 2.0 and CogVideoX), outperforming prior methods by 58.5% to 79% on the COCO2017 dataset.
  • Utilization of reinforcement learning for strategy customization and a memory bank of successful attacks to continually refine and adapt attack methods without human intervention (see the memory-bank sketch after this list).
  • Empirical results showing the attack remains effective against existing I2V model defenses, laying groundwork for stronger security measures in multimodal systems.
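
The memory bank mentioned above can be pictured as a small store of successful attack records that biases which strategies are tried next. The sketch below uses invented field names (strategy, jailbreak_prompt, success_score) and a simple average-score ranking; it is an assumption about the general idea, not the paper's actual data structure.

```python
from dataclasses import dataclass, field

@dataclass
class AttackRecord:
    # Hypothetical record format -- the paper's actual fields are not specified here.
    strategy: str
    jailbreak_prompt: str
    success_score: float

@dataclass
class MemoryBank:
    """Sketch of a memory bank of successful attacks used to bias future strategy choices."""
    records: list = field(default_factory=list)
    min_score: float = 0.8

    def add(self, record: AttackRecord):
        # Keep only attacks judged successful enough to be worth reusing.
        if record.success_score >= self.min_score:
            self.records.append(record)

    def top_strategies(self, k: int = 3):
        """Return the k strategies with the highest average success score so far."""
        by_strategy = {}
        for r in self.records:
            by_strategy.setdefault(r.strategy, []).append(r.success_score)
        ranked = sorted(by_strategy,
                        key=lambda s: sum(by_strategy[s]) / len(by_strategy[s]),
                        reverse=True)
        return ranked[:k]

# Usage sketch: store a successful attack and ask which strategies to prioritize next.
bank = MemoryBank()
bank.add(AttackRecord("context_injection", "rewritten prompt ...", 0.9))
print(bank.top_strategies())
```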

💡 Why This Paper Matters

This paper establishes a pivotal foundation for understanding and addressing vulnerabilities in Image-to-Video generative models through the innovative RunawayEvil framework. It not only highlights the security risks posed by such advanced multimodal systems but also demonstrates a scalable method to exploit these risks strategically. As I2V models become increasingly integrated into creative applications and content generation, ensuring their security through rigorous testing methodologies like RunawayEvil is crucial for the responsible advancement of AI technologies.

🎯 Why It's Interesting for AI Security Researchers

The paper is of significant interest to AI security researchers because it reveals the largely uncharted landscape of security vulnerabilities in multimodal systems, particularly in I2V generation. The introduction of a self-amplifying, adaptable jailbreak approach challenges current security paradigms and provides a new tool for analyzing the resilience of generative models. Given the increasing use of AI in sensitive and high-stakes environments, research like this is essential for developing defenses against sophisticated jailbreak attacks and ensuring robust AI applications.

📚 Read the Full Paper