
RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse

Authors: Mingrui Liu, Sixiao Zhang, Cheng Long, Kwok-Yan Lam

Published: 2026-02-02

arXiv ID: 2602.01795v1

Added to Library: 2026-02-03 08:01 UTC

Red Teaming

πŸ“„ Abstract

Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current defenses typically face a critical trade-off: prevention-based fine-tuning often degrades general utility via the "alignment tax", while detection-based filtering incurs prohibitive latency and memory costs. To bridge this gap, we propose RedVisor, a unified framework that synthesizes the explainability of detection systems with the seamless integration of prevention strategies. To the best of our knowledge, RedVisor is the first approach to leverage fine-grained reasoning paths to simultaneously detect attacks and guide the model's safe response. We implement this via a lightweight, removable adapter positioned atop the frozen backbone. This adapter serves a dual function: it first generates an explainable analysis that precisely localizes the injection and articulates the threat, which then explicitly conditions the model to reject the malicious command. Uniquely, the adapter is active only during this reasoning phase and is effectively muted during the subsequent response generation. This architecture yields two distinct advantages: (1) it mathematically preserves the backbone's original utility on benign inputs; and (2) it enables a novel KV Cache Reuse strategy, eliminating the redundant prefill computation inherent to decoupled pipelines. We further pioneer the integration of this defense into the vLLM serving engine with custom kernels. Experiments demonstrate that RedVisor outperforms state-of-the-art defenses in detection accuracy and throughput while incurring negligible utility loss.
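The abstract's central architectural claim is that the adapter contributes to the hidden state only during the security-reasoning phase and is muted afterward, so response generation is exactly the frozen backbone. A minimal sketch of that gating idea (illustrative only; the function names and toy states are assumptions, not the paper's implementation):

```python
# Toy sketch of a gated, removable adapter on a frozen backbone: the adapter's
# additive delta is applied only while the reasoning flag is set, and is muted
# otherwise, so the response phase is identical to the backbone alone.

def backbone_hidden(token_id: int) -> list[float]:
    """Stand-in for the frozen backbone's hidden state for one token."""
    return [float(token_id), float(token_id) * 0.5]

def adapter_delta(hidden: list[float]) -> list[float]:
    """Stand-in for the lightweight adapter's additive correction."""
    return [0.5 * h for h in hidden]

def forward(token_id: int, reasoning_phase: bool) -> list[float]:
    h = backbone_hidden(token_id)
    if reasoning_phase:  # adapter active: explainable security analysis
        d = adapter_delta(h)
        h = [hi + di for hi, di in zip(h, d)]
    return h             # adapter muted: backbone output is unchanged

# Outside the reasoning phase the output equals the frozen backbone's output,
# which is the sense in which benign-input utility is preserved exactly.
assert forward(4, reasoning_phase=False) == backbone_hidden(4)
```

The design point this illustrates: because the adapter's contribution is strictly additive and strictly gated, removing it (or gating it off) recovers the backbone's original behavior rather than approximating it.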

πŸ” Key Points

  • Introduction of RedVisor, a unified framework that combines detection and prevention strategies to mitigate prompt injection attacks on LLMs.
  • Utilization of a lightweight, removable adapter that operates only during security reasoning, preserving the original model utility during response generation.
  • Pioneering use of a Zero-Copy KV Cache Reuse strategy that cuts computational overhead by eliminating the redundant prefill computation between the detection and response-generation phases.
  • Demonstration of superior detection accuracy and throughput in extensive experiments across multiple benchmark datasets, outperforming state-of-the-art alternatives while maintaining low attack success rates.
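The KV Cache Reuse point can be made concrete with a toy counter (illustrative only; the class and prompt here are assumptions, not RedVisor's vLLM kernels): a decoupled detector-then-generator pipeline prefills the same prompt twice, while a pipeline that shares one cache prefills each token position once.

```python
# Toy model of prefill cost: a decoupled detection + generation pipeline builds
# two separate KV caches for the same prompt, while a unified pipeline reuses
# the cache populated during the reasoning pass for the response pass.

class ToyKVCache:
    def __init__(self):
        self.entries = {}     # token position -> (key, value) stand-ins
        self.prefill_ops = 0  # counts per-token prefill computations

    def prefill(self, tokens):
        for pos, tok in enumerate(tokens):
            if pos not in self.entries:  # reuse: skip already-cached positions
                self.entries[pos] = (tok, tok)
                self.prefill_ops += 1

prompt = list(range(128))

# Decoupled pipeline: detector and generator each prefill from scratch.
detector = ToyKVCache(); detector.prefill(prompt)
generator = ToyKVCache(); generator.prefill(prompt)
decoupled_ops = detector.prefill_ops + generator.prefill_ops  # 2x the prompt

# Unified pipeline: the response pass reuses every position cached during
# the security-reasoning pass, so no position is recomputed.
shared = ToyKVCache()
shared.prefill(prompt)  # reasoning/detection pass
shared.prefill(prompt)  # response pass: all positions hit the cache
assert shared.prefill_ops == len(prompt)
assert decoupled_ops == 2 * shared.prefill_ops
```

The "zero-copy" qualifier in the paper's title suggests the real system additionally avoids duplicating the cached tensors in memory, not just the computation; this sketch models only the eliminated recomputation.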

πŸ’‘ Why This Paper Matters

The paper presents RedVisor as an innovative solution to a pressing security challenge in large language models: prompt injection attacks. By integrating attack detection with safe response guidance while keeping latency and memory overhead low, it strengthens the reliability of AI applications in security-critical environments. Its ability to preserve the underlying LLM's utility on benign inputs while defending against injected instructions makes it a valuable advancement in AI security.

🎯 Why It's Interesting for AI Security Researchers

This paper addresses a critical vulnerability in LLMs: prompt injection attacks. As AI models are deployed across a growing range of applications, understanding and mitigating such threats is vital. The proposed methods, including the RedVisor adapter architecture and the Zero-Copy KV Cache Reuse strategy, offer ways to enhance model safety without sacrificing performance, making the work a significant contribution to the field.

πŸ“š Read the Full Paper