← Back to Library

AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

Authors: Ruoyao Wen, Hao Li, Chaowei Xiao, Ning Zhang

Published: 2026-02-07

arXiv ID: 2602.07398v1

Added to Library: 2026-02-10 03:03 UTC

Red Teaming

📄 Abstract

Indirect prompt injection threatens LLM agents by embedding malicious instructions in external content, enabling unauthorized actions and data theft. LLM agents maintain working memory through their context window, which stores interaction history for decision-making. Conventional agents indiscriminately accumulate all tool outputs and reasoning traces in this memory, creating two critical vulnerabilities: (1) injected instructions persist throughout the workflow, granting attackers multiple opportunities to manipulate behavior, and (2) verbose, non-essential content degrades decision-making capabilities. Existing defenses treat bloated memory as given and focus on remaining resilient, rather than reducing unnecessary accumulation to prevent the attack. We present AgentSys, a framework that defends against indirect prompt injection through explicit memory management. Inspired by process memory isolation in operating systems, AgentSys organizes agents hierarchically: a main agent spawns worker agents for tool calls, each running in an isolated context and able to spawn nested workers for subtasks. External data and subtask traces never enter the main agent's memory; only schema-validated return values can cross boundaries through deterministic JSON parsing. Ablations show isolation alone cuts attack success to 2.19%, and adding a validator/sanitizer further improves defense with event-triggered checks whose overhead scales with operations rather than context length. On AgentDojo and ASB, AgentSys achieves 0.78% and 4.25% attack success while slightly improving benign utility over undefended baselines. It remains robust to adaptive attackers and across multiple foundation models, showing that explicit memory management enables secure, dynamic LLM agent architectures. Our code is available at: https://github.com/ruoyaow/agentsys-memory.

🔍 Key Points

  • Introduction of AgentSys, a framework for managing LLM agent memory to mitigate indirect prompt injection attacks.
  • Implementation of hierarchical memory management through isolated worker agents that only allow schema-validated communication with the main agent.
  • Demonstrated significant reduction in attack success rates (ASR) to 0.78% on AgentDojo and 4.25% on ASB while maintaining or improving benign utility compared to undefended agents.
  • Evaluation shows robustness against adaptive attackers, effectively mitigating dimensions of attack persistence and utility degradation typically encountered in LLM systems.
  • Ablation studies validate the importance of explicit memory management in enhancing security without imposing excessive overhead.

💡 Why This Paper Matters

This paper is relevant and important because it addresses a pressing security vulnerability in large language models (LLMs) by proposing a novel architecture for memory management that successfully mitigates risks from indirect prompt injection. As LLMs become more integrated into critical applications, clear frameworks like AgentSys enhance both their security and effectiveness, making it a vital resource for AI systems development.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of particular interest to AI security researchers as it presents innovative defense mechanisms against emerging threats in LLMs. By showcasing a systematic approach to memory management, it contributes to the ongoing dialogue on securing AI models, addressing key vulnerabilities that could compromise data integrity and user trust. The findings and methodologies proposed could inspire new lines of research in protecting LLMs and advancing secure AI practices.

📚 Read the Full Paper