
Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models

Authors: Elias Hossain, Swayamjit Saha, Somshubhra Roy, Ravi Prasad

Published: 2025-10-20

arXiv ID: 2510.17098v1

Added to Library: 2025-11-14 23:09 UTC

Red Teaming

📄 Abstract

Even when prompts and parameters are secured, transformer language models remain vulnerable because their key-value (KV) cache during inference constitutes an overlooked attack surface. This paper introduces Malicious Token Injection (MTI), a modular framework that systematically perturbs cached key vectors at selected layers and timesteps with controlled magnitude and frequency, using additive Gaussian noise, zeroing, and orthogonal rotations. A theoretical analysis quantifies how these perturbations propagate through attention, linking logit deviations to the Frobenius norm of the corruption and the Lipschitz dynamics of the softmax. Empirical results show that MTI significantly alters next-token distributions and downstream task performance across GPT-2 and LLaMA-2/7B, and destabilizes retrieval-augmented and agentic reasoning pipelines. These findings identify cache integrity as a critical yet underexplored vulnerability in current LLM deployments, positioning cache corruption as a reproducible and theoretically grounded threat model for future robustness and security research.
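To make the attack surface concrete, below is a minimal, hypothetical sketch of the kind of key-cache corruption the abstract describes: additive Gaussian noise, zeroing, and orthogonal rotations applied to cached key vectors at chosen timesteps. The function name, tensor layout (a standard `(batch, heads, seq_len, head_dim)` key cache), and parameters are our assumptions, not the authors' implementation. One way to read the propagation claim: with attention logits q_t K^T / sqrt(d_k), replacing K by K + Δ shifts the logits by at most ||q_t|| · ||Δ||_F / sqrt(d_k), and the Lipschitz continuity of softmax keeps the resulting change in attention weights on the same order.

```python
# Hypothetical sketch of MTI-style key-cache perturbations (not the authors' code).
import torch

def perturb_keys(keys: torch.Tensor,
                 timesteps: torch.Tensor,
                 mode: str = "gaussian",
                 sigma: float = 0.1) -> torch.Tensor:
    """Corrupt cached key vectors at selected timesteps.

    keys:      (batch, heads, seq_len, head_dim) cached keys for one layer
    timesteps: 1-D LongTensor of cache positions to corrupt
    mode:      'gaussian' (additive noise), 'zero', or 'rotate' (orthogonal)
    """
    corrupted = keys.clone()
    target = corrupted[:, :, timesteps, :]            # (B, H, T', D)

    if mode == "gaussian":
        target = target + sigma * torch.randn_like(target)
    elif mode == "zero":
        target = torch.zeros_like(target)
    elif mode == "rotate":
        d = keys.shape[-1]
        # Random orthogonal matrix via QR decomposition of a Gaussian matrix.
        q, _ = torch.linalg.qr(torch.randn(d, d, device=keys.device))
        target = target @ q.to(keys.dtype)
    else:
        raise ValueError(f"unknown mode: {mode}")

    corrupted[:, :, timesteps, :] = target
    return corrupted


# Example: corrupt cache positions 3..7 of one layer's keys with additive noise.
# keys = ...  # taken from the model's per-layer KV cache during decoding
# keys = perturb_keys(keys, torch.arange(3, 8), mode="gaussian", sigma=0.2)
```

In a real decoding loop, `keys` would be pulled from the model's per-layer cache (for example one element of a Hugging Face-style `past_key_values` structure); the exact layout and update mechanics vary by framework.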

🔍 Key Points

  • Introduction of the Malicious Token Injection (MTI) attack framework that targets the key-value cache of transformer models during inference, revealing a significant attack surface that has been largely overlooked.
  • Theoretically quantifies the impact of cache perturbations on attention mechanisms, linking the extent of cache corruption to shifts in token distribution and downstream model performance.
  • Empirical results show that MTI consistently degrades task performance across NLP benchmarks, including classification and question answering, demonstrating that the vulnerability is exploitable at inference time.
  • Identifies specific vulnerabilities in retrieval-augmented generation systems and agentic reasoning pipelines, demonstrating that perturbations in cached representations can significantly impair their functionality.
  • Presents lightweight defense strategies such as cache resetting and dropout-mask randomization that offer partial mitigation against cache corruption, underlining that cache integrity is a prerequisite for model robustness (a speculative sketch of these defenses follows this list).
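The defenses above are only named in this summary, so the following is a speculative sketch of what cache resetting and dropout-mask randomization might look like in a decoding loop. The function names, reset interval, and dropout rate are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the lightweight mitigations mentioned above (not the paper's code).
from typing import Optional
import torch

def reset_cache_every(step: int, interval: int, cache: Optional[list]) -> Optional[list]:
    """Drop the KV cache every `interval` decoding steps, forcing recomputation
    from the raw context and discarding any corrupted cached entries."""
    if interval > 0 and step % interval == 0:
        return None   # many decoding loops rebuild the cache when it is None
    return cache

def randomize_key_dropout(keys: torch.Tensor, p: float = 0.05) -> torch.Tensor:
    """Apply a fresh Bernoulli dropout mask to cached keys so an attacker cannot
    rely on a fixed cached representation being read back unchanged."""
    mask = (torch.rand_like(keys) > p).to(keys.dtype)
    # Inverted-dropout scaling keeps the expected key values unchanged.
    return keys * mask / (1.0 - p)
```

The trade-off is extra recomputation (resets discard legitimate cache entries as well) and a small amount of noise injected into clean runs, which is consistent with the paper describing these strategies as partial mitigations.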

💡 Why This Paper Matters

This paper is highly relevant as it introduces a novel perspective on the vulnerabilities of large language models by focusing on the key-value cache during inference. The findings stress the importance of maintaining cache integrity to ensure robust and secure LLM deployments, particularly in safety-critical applications. It paves the way for future research on defending against cache-side attacks and understanding their broader implications in model behavior and reliability.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper is of significant interest as it not only exposes a critical and previously underexplored area of vulnerability in large language models but also provides a formalized methodology for evaluating such risks. The introduction of a systematic attack framework paired with theoretical and empirical validations presents a comprehensive case for reconsidering how LLM security is approached. As AI systems become more integrated into sensitive applications, understanding and mitigating these vulnerabilities is crucial.

📚 Read the Full Paper