Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection

Authors: Subaru Kimura, Ryota Tanaka, Shumpei Miyawaki, Jun Suzuki, Keisuke Sakaguchi

Published: 2024-08-07

arXiv ID: 2408.03554v1

Added to Library: 2025-11-11 14:32 UTC

Red Teaming

📄 Abstract

We explore visual prompt injection (VPI) that maliciously exploits the ability of large vision-language models (LVLMs) to follow instructions drawn onto the input image. We propose a new VPI method, "goal hijacking via visual prompt injection" (GHVPI), that swaps the execution task of LVLMs from an original task to an alternative task designated by an attacker. The quantitative analysis indicates that GPT-4V is vulnerable to the GHVPI and demonstrates a notable attack success rate of 15.8%, which is an unignorable security risk. Our analysis also shows that successful GHVPI requires high character recognition capability and instruction-following ability in LVLMs.

🔍 Key Points

Introduction of a novel attack method called Goal Hijacking via Visual Prompt Injection (GHVPI) specifically targeting large vision-language models (LVLMs).
Empirical analysis shows that GPT-4V has a notable attack success rate of 15.8% and Gemini has 6.6%, highlighting significant vulnerabilities in these state-of-the-art models to GHVPI attacks.
Identifies critical factors influencing GHVPI attack success, including character recognition capabilities and instruction-following abilities of LVLMs.
The research empirically establishes that GHVPI attacks can effectively redirect the execution tasks of LVLMs from original tasks to attacker-defined tasks.
A preliminary defense mechanism shows potential to reduce GHVPI attack success rates, suggesting paths for improving model resilience.

💡 Why This Paper Matters

This paper highlights significant security vulnerabilities in state-of-the-art vision-language models, specifically their susceptibility to GHVPI attacks that can lead to manipulation of their intended tasks. The findings underscore the need for heightened awareness and research into defense mechanisms against such visual prompt injection attacks, given the increasing integration of AI systems in critical applications where security is paramount.

🎯 Why It's Interesting for AI Security Researchers

The implications of this research are substantial for AI security researchers as it reveals a previously under-explored attack vector (visual prompt injection) that threatens the reliability of AI systems utilized across various fields. Understanding GHVPI vulnerabilities can lead to the development of more robust AI systems, contributing to safer AI deployment in real-world scenarios. Additionally, this research promotes further investigation into the robustness of AI models, a critical concern given the implications of adversarial attacks in sensitive applications.

Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper