← Back to Library

STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models

Authors: Shaoxiong Guo, Tianyi Du, Lijun Li, Yuyao Wu, Jie Li, Jing Shao

Published: 2025-09-30

arXiv ID: 2509.26473v1

Added to Library: 2025-10-01 04:00 UTC

Red Teaming

📄 Abstract

Unified Multimodal understanding and generation Models (UMMs) have demonstrated remarkable capabilities in both understanding and generation tasks. However, we identify a vulnerability arising from the generation-understanding coupling in UMMs. The attackers can use the generative function to craft an information-rich adversarial image and then leverage the understanding function to absorb it in a single pass, which we call Cross-Modal Generative Injection (CMGI). Current attack methods on malicious instructions are often limited to a single modality while also relying on prompt rewriting with semantic drift, leaving the unique vulnerabilities of UMMs unexplored. We propose STaR-Attack, the first multi-turn jailbreak attack framework that exploits unique safety weaknesses of UMMs without semantic drift. Specifically, our method defines a malicious event that is strongly correlated with the target query within a spatio-temporal context. Using the three-act narrative theory, STaR-Attack generates the pre-event and the post-event scenes while concealing the malicious event as the hidden climax. When executing the attack strategy, the opening two rounds exploit the UMM's generative ability to produce images for these scenes. Subsequently, an image-based question guessing and answering game is introduced by exploiting the understanding capability. STaR-Attack embeds the original malicious question among benign candidates, forcing the model to select and answer the most relevant one given the narrative context. Extensive experiments show that STaR-Attack consistently surpasses prior approaches, achieving up to 93.06% ASR on Gemini-2.0-Flash and surpasses the strongest prior baseline, FlipAttack. Our work uncovers a critical yet underdeveloped vulnerability and highlights the need for safety alignments in UMMs.

🔍 Key Points

  • Identification of a novel vulnerability in Unified Multimodal Models (UMMs) termed Cross-Modal Generative Injection (CMGI), allowing adversaries to inject harmful information through the coupling of generation and understanding functions.
  • Development of the STaR-Attack framework, which leverages a three-act narrative structure to obscure malicious intents and achieve high attack success rates without semantic drift.
  • Introduction of a dynamic difficulty mechanism that adjusts the complexity of queries presented to UMMs based on model responses, enhancing attack effectiveness and adaptability.
  • Experimental results demonstrate STaR-Attack's superiority over existing methods, achieving up to 93.06% Attack Success Rate (ASR) and highlighting its effectiveness across a range of UMMs and datasets.
  • Highlighting the urgent need for improved safety alignments and robust defenses in UMMs to address the vulnerabilities exposed by this research.

💡 Why This Paper Matters

The STaR-Attack framework presents a significant breakthrough in understanding how vulnerabilities within UMMs can be exploited, indicating that despite their advanced capabilities, these models are not impervious to carefully crafted adversarial attacks. The findings underscore the necessity for better security protocols and alignment strategies in future model development, ensuring that as AI systems become more powerful, they also become more secure.

🎯 Why It's Interesting for AI Security Researchers

This paper is highly relevant for AI security researchers as it not only identifies a critical vulnerability in state-of-the-art multimodal AI models but also introduces a novel attack strategy that effectively circumvents existing safety mechanisms. The insights gained from the STaR-Attack framework can drive further research into both defensive strategies and understanding the broader implications of integrating generative and understanding capabilities in AI systems.

📚 Read the Full Paper