Physical Prompt Injection Attacks on Large Vision-Language Models

📄 Abstract

Large Vision-Language Models (LVLMs) are increasingly deployed in real-world intelligent systems for perception and reasoning in open physical environments. While LVLMs are known to be vulnerable to prompt injection attacks, existing methods either require access to input channels or depend on knowledge of user queries, assumptions that rarely hold in practical deployments. We propose the first Physical Prompt Injection Attack (PPIA), a black-box, query-agnostic attack that embeds malicious typographic instructions into physical objects perceivable by the LVLM. PPIA requires no access to the model, its inputs, or internal pipeline, and operates solely through visual observation. It combines offline selection of highly recognizable and semantically effective visual prompts with strategic environment-aware placement guided by spatiotemporal attention, ensuring that the injected prompts are both perceivable and influential on model behavior. We evaluate PPIA across 10 state-of-the-art LVLMs in both simulated and real-world settings on tasks including visual question answering, planning, and navigation, PPIA achieves attack success rates up to 98%, with strong robustness under varying physical conditions such as distance, viewpoint, and illumination. Our code is publicly available at https://github.com/2023cghacker/Physical-Prompt-Injection-Attack.

🔍 Key Points

Introduction of Physical Prompt Injection Attacks (PPIA), a novel black-box attack that targets Large Vision-Language Models (LVLMs) through embedded visual prompts.
PPIA operates without access to model internals or user queries, making it viable for real-world applications where adversaries lack direct control over inputs.
The attack method combines the offline selection of semantically effective prompts with strategic placement in the environment, ensuring high recognition and interpretability by the LVLMs.
Extensive evaluation reveals a high attack success rate (up to 98%) across multiple state-of-the-art LVLMs in both simulated and real-world conditions, showcasing robustness against variations like distance and lighting.
The paper highlights practical implications for the deployment of LVLMs, exposing a significant vulnerability in their handling of visual prompts.

💡 Why This Paper Matters

This paper presents a critical advancement in understanding the security vulnerabilities of Large Vision-Language Models by demonstrating the feasibility of Physical Prompt Injection Attacks. By circumventing conventional mechanisms of user input manipulation and leveraging physical environmental cues, PPIA sheds light on potential risks that may arise in real-world applications of LVLMs. The high success rates and robustness across varying conditions underline the urgency for security measures in multimodal AI systems, emphasizing the need for further research and development in this domain.

🎯 Why It's Interesting for AI Security Researchers

The research is of paramount interest to AI security researchers as it uncovers a previously unexplored attack vector against multimodal models. The introduction of a practical and effective methodology for manipulating model behavior through physical prompts broadens the scope of adversarial research. Furthermore, the findings pose significant implications for the deployment of AI systems in uncontrolled environments, prompting the need for enhanced defense mechanisms against such attacks. This work enriches the dialogue surrounding AI safety and robustness, making it crucial for ongoing research in secure AI applications.

Physical Prompt Injection Attacks on Large Vision-Language Models

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper