
Memory Poisoning Attack and Defense on Memory Based LLM-Agents

Authors: Balachandra Devarangadi Sunil, Isheeta Sinha, Piyush Maheshwari, Shantanu Todmal, Shreyan Mallik, Shuchi Mishra

Published: 2026-01-09

arXiv ID: 2601.05504v2

Added to Library: 2026-01-13 04:01 UTC

Safety

πŸ“„ Abstract

Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks, in which adversaries inject malicious instructions through query-only interactions that corrupt the agent's long-term memory and influence future responses. Recent work demonstrated that MINJA (Memory Injection Attack) achieves an injection success rate of over 95% and an attack success rate of 70% under idealized conditions. However, the robustness of these attacks in realistic deployments and effective defensive mechanisms remain understudied. This work addresses these gaps through a systematic empirical evaluation of memory poisoning attacks and defenses in Electronic Health Record (EHR) agents. We investigate attack robustness by varying three critical dimensions: initial memory state, number of indication prompts, and retrieval parameters. Our experiments on GPT-4o-mini, Gemini-2.0-Flash, and Llama-3.1-8B-Instruct models using MIMIC-III clinical data reveal that realistic conditions with pre-existing legitimate memories dramatically reduce attack effectiveness. We then propose and evaluate two novel defense mechanisms: (1) Input/Output Moderation using composite trust scoring across multiple orthogonal signals, and (2) Memory Sanitization with trust-aware retrieval employing temporal decay and pattern-based filtering. Our defense evaluation reveals that effective memory sanitization requires careful trust threshold calibration to prevent both overly conservative rejection (blocking all entries) and insufficient filtering (missing subtle attacks), establishing important baselines for future adaptive defense mechanisms. These findings provide crucial insights for securing memory-augmented LLM agents in production environments.
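
The abstract describes the Memory Sanitization defense only at a high level. As a rough illustration, here is a minimal Python sketch of what trust-aware retrieval with temporal decay and pattern-based filtering could look like; the class and function names, regex patterns, half-life, and threshold values are illustrative assumptions, not the authors' implementation.

```python
# Illustrative only: trust-aware memory retrieval with temporal decay and
# pattern-based filtering. Names, patterns, and parameters are hypothetical.
import math
import re
import time
from dataclasses import dataclass

# Hypothetical regexes a sanitizer might flag as injection indicators.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"always (recommend|prescribe)",
    r"regardless of (the )?patient",
]

@dataclass
class MemoryEntry:
    text: str          # stored reasoning/response from a past interaction
    similarity: float  # relevance score from the retriever, in [0, 1]
    created_at: float  # unix timestamp when the entry was written
    trust: float       # composite trust score assigned at write time, in [0, 1]

def pattern_penalty(text: str) -> float:
    """Return 0.0 if any suspicious pattern matches, else 1.0."""
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return 0.0
    return 1.0

def trust_weighted_score(entry: MemoryEntry, now: float,
                         half_life_s: float = 7 * 24 * 3600) -> float:
    """Combine relevance, trust, exponential temporal decay, and pattern filtering."""
    decay = math.exp(-math.log(2) * (now - entry.created_at) / half_life_s)
    return entry.similarity * entry.trust * decay * pattern_penalty(entry.text)

def sanitized_retrieve(memory: list[MemoryEntry], k: int = 3,
                       trust_threshold: float = 0.3) -> list[MemoryEntry]:
    """Drop entries below the trust threshold, then rank the rest by score."""
    now = time.time()
    kept = [e for e in memory if e.trust >= trust_threshold]
    kept.sort(key=lambda e: trust_weighted_score(e, now), reverse=True)
    return kept[:k]
```

A real system would assign each entry's `trust` from multiple signals at write time and tune `trust_threshold` per deployment, which is exactly the calibration problem the abstract points to.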

πŸ” Key Points

  • The paper demonstrates the high vulnerability of memory-augmented LLM agents to Memory Injection Attacks (MINJA), which achieve over 95% injection success in controlled settings, underscoring the risk to real-world applications such as healthcare EHR systems.
  • The authors systematically evaluate MINJA under realistic operational conditions, showing that pre-existing legitimate memories can significantly reduce attack effectiveness, a factor not explored in prior research.
  • Two novel defense mechanisms are proposed: Input/Output Moderation, which uses composite trust scoring across orthogonal signals, and Memory Sanitization with trust-aware retrieval; both show promise in mitigating memory poisoning, with effectiveness hinging on careful trust threshold calibration (a minimal sketch follows this list).
  • The study highlights the delicate balance between security and usability: overly conservative defenses block legitimate memories and degrade utility, while trust-based defenses remain susceptible to adversarial manipulation.
  • The findings establish baselines for further research, suggesting the need for adaptive defenses that adjust based on observed trust distributions and specific application needs.
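
To make the threshold-calibration point concrete, the sketch below shows a hypothetical composite trust score built from a few orthogonal signals plus a single accept/reject threshold, in the spirit of the Input/Output Moderation defense. The signal functions, weights, and threshold semantics are assumptions for illustration, not the paper's method.

```python
# Illustrative only: composite trust scoring over orthogonal signals and the
# accept/reject threshold whose calibration the paper highlights.
# Signal functions and weights are assumptions, not the authors' implementation.
from typing import Callable

Signal = Callable[[str], float]  # maps a query or memory entry to a score in [0, 1]

def length_signal(text: str) -> float:
    """Treat unusually long, instruction-dense inputs as less trustworthy."""
    return 1.0 if len(text) < 2000 else 0.5

def imperative_signal(text: str) -> float:
    """Penalize blanket 'always/never' phrasing typical of injected rules."""
    lowered = text.lower()
    return 0.3 if ("always" in lowered or "never" in lowered) else 1.0

def provenance_signal(text: str) -> float:
    """Placeholder for provenance checks (e.g., verified user or system source)."""
    return 1.0

SIGNALS: tuple[Signal, ...] = (length_signal, imperative_signal, provenance_signal)
WEIGHTS: tuple[float, ...] = (0.3, 0.4, 0.3)  # weights sum to 1.0

def composite_trust(text: str) -> float:
    """Weighted average of independent trust signals, in [0, 1]."""
    return sum(w * s(text) for w, s in zip(WEIGHTS, SIGNALS))

def moderate(text: str, threshold: float) -> bool:
    """Accept only if composite trust clears the threshold.

    Set the threshold too high and legitimate entries are blocked (overly
    conservative rejection); set it too low and subtle injections slip
    through (insufficient filtering): this is the calibration tradeoff
    described above.
    """
    return composite_trust(text) >= threshold
```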

πŸ’‘ Why This Paper Matters

This paper provides essential insights into the vulnerabilities of memory-augmented language model agents and the effectiveness of defenses against them, particularly in high-stakes environments like healthcare. By uncovering the nuances of attack robustness and defense calibration, it lays the groundwork for developing more resilient AI systems that can operate securely in real-world settings.

🎯 Why It's Interesting for AI Security Researchers

This paper is crucial for AI security researchers because it addresses a growing area of concern: memory poisoning in language models. The findings detail the risks inherent in persistent memory systems and provide empirical evaluations of novel defense strategies that can inform future security protocols and AI design frameworks. This work is particularly relevant as AI systems become increasingly integrated into sensitive applications requiring robust security measures.

πŸ“š Read the Full Paper