← Back to Library

A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory

Authors: Qianshan Wei, Tengchao Yang, Yaochen Wang, Xinfeng Li, Lijun Li, Zhenfei Yin, Yi Zhan, Thorsten Holz, Zhiqiang Lin, XiaoFeng Wang

Published: 2025-09-29

arXiv ID: 2510.02373v1

Added to Library: 2025-10-06 04:02 UTC

Safety

📄 Abstract

Large Language Model (LLM) agents use memory to learn from past interactions, enabling autonomous planning and decision-making in complex environments. However, this reliance on memory introduces a critical security risk: an adversary can inject seemingly harmless records into an agent's memory to manipulate its future behavior. This vulnerability is characterized by two core aspects: First, the malicious effect of injected records is only activated within a specific context, making them hard to detect when individual memory entries are audited in isolation. Second, once triggered, the manipulation can initiate a self-reinforcing error cycle: the corrupted outcome is stored as precedent, which not only amplifies the initial error but also progressively lowers the threshold for similar attacks in the future. To address these challenges, we introduce A-MemGuard (Agent-Memory Guard), the first proactive defense framework for LLM agent memory. The core idea of our work is the insight that memory itself must become both self-checking and self-correcting. Without modifying the agent's core architecture, A-MemGuard combines two mechanisms: (1) consensus-based validation, which detects anomalies by comparing reasoning paths derived from multiple related memories and (2) a dual-memory structure, where detected failures are distilled into ``lessons'' stored separately and consulted before future actions, breaking error cycles and enabling adaptation. Comprehensive evaluations on multiple benchmarks show that A-MemGuard effectively cuts attack success rates by over 95% while incurring a minimal utility cost. This work shifts LLM memory security from static filtering to a proactive, experience-driven model where defenses strengthen over time. Our code is available in https://github.com/TangciuYueng/AMemGuard

🔍 Key Points

  • A-MemGuard is the first proactive defense framework designed specifically for LLM agents' memory, addressing vulnerabilities to memory poisoning attacks.
  • The framework employs a consensus-based validation mechanism that detects context-sensitive memory anomalies by comparing reasoning paths from multiple memories.
  • A dual-memory structure allows the LLM agents to learn from mistakes by storing detected failures as 'lessons' which help prevent recurrence of similar mistakes in future interactions.
  • Experimental results demonstrate A-MemGuard significantly reduces attack success rates by over 95%, while maintaining high performance and low utility loss in benign tasks.
  • The framework shows strong scalability and generalizability across multiple benchmarks, enhancing security without modifying the agent's core architecture.

💡 Why This Paper Matters

This paper introduces A-MemGuard, a critical advancement in securing LLM agents against memory poisoning attacks, an area previously underexplored in AI security. By proactively validating and correcting memory usage, it not only protects agent decision-making but also enables continual learning from previous errors, thus enhancing overall agent reliability. The demonstrated efficacy and minimal performance trade-offs make A-MemGuard a substantial contribution to the security of AI systems.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant for AI security researchers as it tackles a significant and emerging threat in the use of LLMs: memory poisoning attacks. It lays the groundwork for further exploration into proactive defense mechanisms, moving beyond reactive strategies that have been common in existing literature. The methodologies and findings presented provide a new perspective on enhancing the resilience of AI agents, making it a valuable resource for research in AI safety and security.

📚 Read the Full Paper