AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents

📄 Abstract

Multimodal agents built on large vision-language models (LVLMs) are increasingly deployed in open-world settings but remain highly vulnerable to prompt injection, especially through visual inputs. We introduce AgentTypo, a black-box red-teaming framework that mounts adaptive typographic prompt injection by embedding optimized text into webpage images. Our automatic typographic prompt injection (ATPI) algorithm maximizes prompt reconstruction by substituting captioners while minimizing human detectability via a stealth loss, with a Tree-structured Parzen Estimator guiding black-box optimization over text placement, size, and color. To further enhance attack strength, we develop AgentTypo-pro, a multi-LLM system that iteratively refines injection prompts using evaluation feedback and retrieves successful past examples for continual learning. Effective prompts are abstracted into generalizable strategies and stored in a strategy repository, enabling progressive knowledge accumulation and reuse in future attacks. Experiments on the VWA-Adv benchmark across Classifieds, Shopping, and Reddit scenarios show that AgentTypo significantly outperforms the latest image-based attacks such as AgentAttack. On GPT-4o agents, our image-only attack raises the success rate from 0.23 to 0.45, with consistent results across GPT-4V, GPT-4o-mini, Gemini 1.5 Pro, and Claude 3 Opus. In image+text settings, AgentTypo achieves 0.68 ASR, also outperforming the latest baselines. Our findings reveal that AgentTypo poses a practical and potent threat to multimodal agents and highlight the urgent need for effective defense.

🔍 Key Points

Introduction of the AgentTypo framework, which utilizes adaptive typographic prompt injection to exploit vulnerabilities in multimodal agents using large vision-language models (LVLMs).
Development of an automatic typographic prompt injection (ATPI) algorithm that employs Bayesian optimization techniques to optimize text attributes like placement, font size, and stealthiness to enhance attack effectiveness while minimizing detection risks.
Proposal of AgentTypo-pro, a multi-LLM system that iteratively refines prompt injections through feedback, retrieves successful examples for continual learning, and abstracts effective prompts into generalizable strategies stored in a strategy repository.
Extensive experiments demonstrate that AgentTypo significantly outperforms existing baselines in various attack scenarios, achieving higher attack success rates (ASRs) across different LVLMs and indicating robust performance against multiple models. For instance, the image-only attack success rate improved from 0.23 to 0.45, illustrating real-world implications for AI security vulnerabilities.
Highlighting the urgent need for effective defenses against typographic prompt injection attacks as traditional detection mechanisms may be inadequate for such stealthy approaches.

💡 Why This Paper Matters

This paper presents significant advancements in the understanding of how typographic vulnerabilities can be weaponized against LVLM agents. The AgentTypo framework not only demonstrates the feasibility of such attacks but also establishes a methodological foundation for further exploration of multimodal vulnerabilities. The implications of this research resonate within the broader context of AI security, exposing critical risks inherent in deploying LVLM-based agents in real-world applications without adequate defenses.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper is critically relevant as it addresses a novel attack vector—typographic prompt injection—against multimodal agents that use LVLMs, a rapidly evolving domain. The detailed exploration of attack methodologies and the empirical evidence supporting their effectiveness provide crucial insights into potential security vulnerabilities that could be exploited in practical applications. Furthermore, the findings underscore the necessity for improved defensive strategies against such sophisticated attacks, making the paper an important contribution to the field of AI security research.

AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper