
Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

Authors: Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu

Published: 2025-12-09

arXiv ID: 2512.08417v2

Added to Library: 2025-12-12 03:01 UTC

Red Teaming

📄 Abstract

Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt Injection (IPI) attacks, where instructions are injected via untrustworthy external data sources. This paper presents Rennervate, a defense framework to detect and prevent IPI attacks. Rennervate leverages attention features to detect the covert injection at a fine-grained token level, enabling precise sanitization that neutralizes IPI attacks while maintaining LLM functionalities. Specifically, the token-level detector is materialized with a 2-step attentive pooling mechanism, which aggregates attention heads and response tokens for IPI detection and sanitization. Moreover, we establish a fine-grained IPI dataset, FIPI, to be open-sourced to support further research. Extensive experiments verify that Rennervate outperforms 15 commercial and academic IPI defense methods, achieving high precision on 5 LLMs and 6 datasets. We also demonstrate that Rennervate is transferable to unseen attacks and robust against adaptive adversaries.

πŸ” Key Points

  • Introduction of Rennervate, a novel defense framework against Indirect Prompt Injection (IPI) attacks in Large Language Models (LLMs) that leverages attention mechanisms for fine-grained detection and sanitization.
  • Development of a token-level detector with a 2-step attentive pooling technique that enhances the robustness and accuracy of identifying and neutralizing injected instructions.
  • Creation of the Fine-grained Indirect Prompt Injection (FIPI) dataset to support IPI research, containing 100,000 IPI instances across various NLP tasks, providing essential resources for testing and validating defenses against IPI attacks.
  • Extensive experiments showing that Rennervate outperforms 15 commercial and academic IPI defense methods across 5 LLMs and 6 datasets, with strong transferability to unseen attacks.
  • Demonstration of robust performance against adaptive adversaries, showcasing the practicality and resilience of Rennervate in real-world scenarios.
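The 2-step attentive pooling described above, which first aggregates over attention heads and then over response tokens to score each context token, can be sketched as follows. This is a minimal numpy illustration of the general idea, not the paper's implementation: the pooling weights in Rennervate are learned, whereas here `head_logits`, `resp_logits`, the tensor shapes, and the 0.5 threshold are all hypothetical placeholders.

```python
import numpy as np

def attentive_pool(scores, weights_logits, axis):
    """Pool `scores` along `axis` with a softmax-weighted average.

    `weights_logits` plays the role of learned pooling weights
    (hypothetical here; in the paper they are trained parameters).
    """
    w = np.exp(weights_logits - weights_logits.max())
    w = w / w.sum()
    shape = [1] * scores.ndim
    shape[axis] = -1
    return (scores * w.reshape(shape)).sum(axis=axis)

def token_injection_scores(attn, head_logits, resp_logits):
    """2-step pooling: heads first, then response tokens.

    attn: attention tensor of shape (n_heads, n_response_tokens,
    n_context_tokens); returns one injection score per context token.
    """
    per_response = attentive_pool(attn, head_logits, axis=0)   # (resp, ctx)
    per_context = attentive_pool(per_response, resp_logits, axis=0)  # (ctx,)
    return per_context

def sanitize(context_tokens, scores, threshold=0.5):
    """Token-level sanitization: drop context tokens flagged as injected."""
    return [t for t, s in zip(context_tokens, scores) if s < threshold]

if __name__ == "__main__":
    # Toy example: 2 heads, 3 response tokens, 4 context tokens,
    # with anomalously high attention on context token index 2.
    attn = np.full((2, 3, 4), 0.1)
    attn[:, :, 2] = 0.7
    scores = token_injection_scores(attn, np.zeros(2), np.zeros(3))
    print(sanitize(["the", "weather", "INJECTED_INSTRUCTION", "today"], scores))
```

The point of the sketch is the shape of the computation: per-token scores (rather than a single prompt-level verdict) are what make fine-grained sanitization possible, since only the flagged tokens are removed while the rest of the external data is preserved.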

💡 Why This Paper Matters

This paper presents significant advancements in AI security through Rennervate, which addresses a critical vulnerability in LLMs: Indirect Prompt Injection attacks. By employing sophisticated attention-based techniques for detection and sanitization, the framework promises improved security for LLM-integrated applications, which are becoming increasingly prevalent in domains such as finance, healthcare, and automated systems. The establishment of the FIPI dataset further aids research efforts in developing more robust AI defenses.

🎯 Why It's Interesting for AI Security Researchers

The findings of this paper are crucial for AI security researchers due to the growing reliance on LLMs in sensitive applications, where security threats like Indirect Prompt Injection can have severe consequences. Understanding and mitigating these risks directly contributes to the trustworthiness and safety of AI systems. Furthermore, the innovative methodologies and comprehensive evaluations presented offer valuable insights and tools that can inspire future research and development of secure AI technologies.

📚 Read the Full Paper