Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

Authors: Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu

Published: 2025-12-09

arXiv ID: 2512.08417v1

Added to Library: 2025-12-10 03:00 UTC

Red Teaming

📄 Abstract

Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt Injection (IPI) attacks, where instructions are injected via untrustworthy external data sources. This paper presents Rennervate, a defense framework to detect and prevent IPI attacks. Rennervate leverages attention features to detect the covert injection at a fine-grained token level, enabling precise sanitization that neutralizes IPI attacks while maintaining LLM functionalities. Specifically, the token-level detector is materialized with a 2-step attentive pooling mechanism, which aggregates attention heads and response tokens for IPI detection and sanitization. Moreover, we establish a fine-grained IPI dataset, FIPI, to be open-sourced to support further research. Extensive experiments verify that Rennervate outperforms 15 commercial and academic IPI defense methods, achieving high precision on 5 LLMs and 6 datasets. We also demonstrate that Rennervate is transferable to unseen attacks and robust against adaptive adversaries.

πŸ” Key Points

  • Introduction of Rennervate, a novel defense framework against Indirect Prompt Injection (IPI) attacks in Large Language Models (LLMs) using attention features for fine-grained detection and sanitization.
  • Development of a 2-step attentive pooling mechanism that aggregates attention heads and response tokens to improve detection accuracy and robustness against evasive IPI attacks.
  • Creation of a comprehensive fine-grained IPI dataset (FIPI) with 100,000 instances, aiding in the training and evaluation of defense methods against IPI attacks.
  • Extensive evaluations demonstrating Rennervate's superior performance compared to 15 existing defense methods, successfully detecting attacks with over 99% accuracy and achieving minimal impact on LLM functionality post-sanitization.
  • Demonstration that Rennervate transfers to unseen IPI attacks and remains robust against adaptive adversaries, supporting its practical applicability in real-world deployments.
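The summary above describes a 2-step attentive pooling mechanism that first aggregates attention heads and then response tokens to score context tokens for injection. The paper's actual architecture and parameters are not given here, so the following is a hypothetical NumPy sketch of that idea: the attention tensor shape, the learned query vectors (`head_query`, `token_query`), and the outlier-style threshold are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def two_step_attentive_pool(attn, head_query, token_query):
    """Score each external-context token for injection likelihood.

    attn:        (H, R, C) attention weights from R response tokens back to
                 C external-context tokens, across H heads.
    head_query:  (R * C,) hypothetical learned query for pooling over heads.
    token_query: (C,) hypothetical learned query for pooling over response tokens.
    Returns a (C,) vector of per-context-token scores.
    """
    H, R, C = attn.shape
    # Step 1: attentive pooling over heads -- weight each head by how well
    # its flattened attention pattern matches the head query.
    head_weights = softmax(attn.reshape(H, R * C) @ head_query)  # (H,)
    pooled = np.einsum("h,hrc->rc", head_weights, attn)          # (R, C)
    # Step 2: attentive pooling over response tokens -- weight each response
    # token's pooled attention row by the token query.
    token_weights = softmax(pooled @ token_query)                # (R,)
    return token_weights @ pooled                                # (C,)

# Illustrative usage with random attention maps (stand-ins for weights
# extracted from a real LLM forward pass).
rng = np.random.default_rng(0)
H, R, C = 8, 6, 12
attn = softmax(rng.normal(size=(H, R, C)))  # normalize over the context axis
scores = two_step_attentive_pool(
    attn,
    rng.normal(size=(R * C,)),
    rng.normal(size=(C,)),
)
# Flag context tokens whose score is an outlier; sanitization would drop
# the flagged tokens before the LLM acts on the external data.
suspicious = np.where(scores > scores.mean() + 2 * scores.std())[0]
```

In a trained detector the two query vectors would be learned on labeled data such as FIPI rather than drawn at random; the point of the sketch is only the two-stage reduction from a (heads, response, context) attention tensor down to one score per context token.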

💡 Why This Paper Matters

This paper significantly contributes to the field of AI security by addressing the vulnerability of LLMs to IPI attacks, which have emerged as a critical threat. By introducing Rennervate, the authors not only provide a practical solution for mitigating this risk but also advance the understanding of how attention mechanisms within LLMs can be leveraged for security purposes. The results affirm the effectiveness of this approach, highlighting the importance of robust defense mechanisms in the deployment of AI technologies in sensitive applications.

🎯 Why It's Interesting for AI Security Researchers

This paper tackles a pressing vulnerability specific to LLMs: Indirect Prompt Injection attacks. The methods presented, such as the attention-based detection framework and the fine-grained FIPI dataset, establish a foundation for further research in AI security. As LLM applications proliferate across industries, understanding and mitigating such threats becomes crucial, making this work a key contribution to safeguarding AI deployments.

📚 Read the Full Paper