Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions

Authors: Neha Nagaraja, Lan Zhang, Zhilong Wang, Bo Zhang, Pawan Patil

Published: 2026-03-04

arXiv ID: 2603.03637v1

Added to Library: 2026-03-05 03:01 UTC

Red Teaming

📄 Abstract

Multimodal Large Language Models (MLLMs) integrate vision and text to power applications, but this integration introduces new vulnerabilities. We study Image-based Prompt Injection (IPI), a black-box attack in which adversarial instructions are embedded into natural images to override model behavior. Our end-to-end IPI pipeline incorporates segmentation-based region selection, adaptive font scaling, and background-aware rendering to conceal prompts from human perception while preserving model interpretability. Using the COCO dataset and GPT-4-turbo, we evaluate 12 adversarial prompt strategies and multiple embedding configurations. The results show that IPI can reliably manipulate the output of the model, with the most effective configuration achieving up to 64% attack success under stealth constraints. These findings highlight IPI as a practical threat in black-box settings and underscore the need for defenses against multimodal prompt injection.

🔍 Key Points

  • Introduction of Image-based Prompt Injection (IPI) as a novel adversarial attack on Multimodal Large Language Models (MLLMs), exploiting the integration of vision and language to manipulate model outputs.
  • The study presents an end-to-end IPI framework that utilizes advanced techniques such as segmentation-based region selection, adaptive font scaling, and background-aware rendering to subtly embed prompts in images.
  • Empirical evaluation shows that IPI achieves up to 64% attack success under stealth constraints, demonstrating that the attack is practical in real-world black-box settings.
  • The research underscores significant vulnerabilities of MLLMs to prompt injection delivered through the visual modality, an attack surface that remains underexplored compared to text-based prompt injection.
  • Proposed preventive measures against IPI attacks signal a proactive approach to enhancing the security of multimodal AI systems, suggesting avenues for future research.
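To make the embedding steps concrete, below is a minimal sketch of how the background-aware rendering and adaptive font-scaling stages described above might be combined: pick a text color only slightly offset from the region's mean background color (subtle to a human viewer, but still resolvable by a vision encoder) and scale the font so the prompt fills the selected region. The helper name, the contrast offset, and the glyph-width ratio are illustrative assumptions, not the paper's actual implementation.

```python
def background_aware_style(region_pixels, region_w, region_h, text_len,
                           contrast_delta=12, fill_ratio=0.9):
    """Choose a low-contrast text color and font size for a target region.

    region_pixels  : iterable of (r, g, b) tuples sampled from the region
                     chosen by segmentation-based region selection.
    contrast_delta : per-channel offset from the mean background color;
                     an assumed value, tuned so the text stays subtle to
                     humans while remaining machine-readable.
    fill_ratio     : fraction of the region the rendered prompt may occupy.
    """
    n = 0
    sums = [0, 0, 0]
    for r, g, b in region_pixels:
        sums[0] += r
        sums[1] += g
        sums[2] += b
        n += 1
    mean = [s // n for s in sums]

    # Background-aware rendering: shift each channel slightly away from
    # the mean background value, clamped to the valid 8-bit range.
    color = tuple(
        min(255, c + contrast_delta) if c < 128 else max(0, c - contrast_delta)
        for c in mean
    )

    # Adaptive font scaling: the prompt must fit the region both ways.
    # Assumes glyph width ~ 0.6 x font size (a typical monospace ratio).
    size_by_width = int(fill_ratio * region_w / (0.6 * text_len))
    size_by_height = int(fill_ratio * region_h)
    font_size = max(1, min(size_by_width, size_by_height))

    return color, font_size
```

In a full pipeline, the returned color and size would be passed to an image-drawing routine (e.g. Pillow's `ImageDraw.text`) to render the adversarial prompt into the segmented region.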

💡 Why This Paper Matters

This paper is particularly relevant because it uncovers critical security vulnerabilities in Multimodal Large Language Models (MLLMs) through a new form of adversarial attack, Image-based Prompt Injection (IPI). It shows that attackers can embed subtle yet effective instructions within natural images, posing a significant threat to the integrity of AI systems that rely on multimodal inputs. The findings stress the urgency of developing robust defenses against such attacks to secure AI applications across domains, including autonomous systems and image-processing tools.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers would be interested in this paper because it addresses a pressing vulnerability in current AI systems, specifically the black-box manipulation of MLLMs through visual means. Understanding the mechanics and implications of prompt injection attacks like IPI will empower researchers to improve model robustness, design more secure AI systems, and address ethical concerns around AI behavior control and adversarial robustness. Furthermore, the proposed defense strategies could inform future research directions in securing AI infrastructure.
