← Back to Library

Adversarial Prompt Injection Attack on Multimodal Large Language Models

Authors: Meiwen Ding, Song Xia, Chenqi Kong, Xudong Jiang

Published: 2026-03-31

arXiv ID: 2603.29418v1

Added to Library: 2026-04-01 02:01 UTC

Red Teaming

📄 Abstract

Although multimodal large language models (MLLMs) are increasingly deployed in real-world applications, their instruction-following behavior leaves them vulnerable to prompt injection attacks. Existing prompt injection methods predominantly rely on textual prompts or perceptible visual prompts that are observable by human users. In this work, we study imperceptible visual prompt injection against powerful closed-source MLLMs, where adversarial instructions are embedded in the visual modality. Our method adaptively embeds the malicious prompt into the input image via a bounded text overlay to provide semantic guidance. Meanwhile, the imperceptible visual perturbation is iteratively optimized to align the feature representation of the attacked image with those of the malicious visual and textual targets at both coarse- and fine-grained levels. Specifically, the visual target is instantiated as a text-rendered image and progressively refined during optimization to more faithfully represent the desired semantics and improve transferability. Extensive experiments on two multimodal understanding tasks across multiple closed-source MLLMs demonstrate the superior performance of our approach compared to existing methods.

🔍 Key Points

  • Introduction of CoTTA, a novel adversarial prompt injection framework that enables targeted manipulation of multimodal large language models (MLLMs) using imperceptible visual prompts.
  • Combines adversarial perturbations with a covert textual trigger to enhance attack stealth and effectiveness, allowing for precise control over the model's responses.
  • Implements a dual-target alignment mechanism that aligns source image features with both adversarial text and a dynamically updated target image, improving transferability and robustness of attacks.
  • Achieves superior performance across multiple multimodal understanding tasks, showing significant improvement over existing prompt injection and adversarial attack methods.
  • Ablation studies highlight the importance of each component of the framework in achieving high attack success rates. Through extensive experiments, CoTTA demonstrates over 80% attack success rates against powerful closed-source MLLMs.

💡 Why This Paper Matters

This paper is significant as it uncovers substantial vulnerabilities in multimodal large language models, a crucial area of research given the increasing deployment of these systems in real-world applications. By developing CoTTA, the authors not only demonstrate the feasibility of imperceptible prompt injection attacks but also provide a framework that can inspire future research in AI security, prompting necessary advancements in defensive measures.

🎯 Why It's Interesting for AI Security Researchers

The study is of particular interest to AI security researchers as it reveals new attack vectors against multimodal large language models, which are becoming increasingly prevalent in various applications. Understanding these vulnerabilities enables researchers to develop more robust defenses against such adversarial tactics, ultimately enhancing the security and reliability of AI systems.

📚 Read the Full Paper