Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

Authors: Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh

Published: 2026-03-31

arXiv ID: 2603.30016v1

Added to Library: 2026-04-01 03:01 UTC

Red Teaming

📄 Abstract

AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and security policy updates are often necessary for dynamic tasks and realistic environments; (2) certain context-dependent security decisions would still require LLMs (or other learned models), but should only be made within system designs that strictly constrain what the model can observe and decide; (3) in inherently ambiguous cases, personalization and human interaction should be treated as core design considerations. In addition to our main positions, we discuss limitations of existing benchmarks that can create a false sense of utility and security. We also highlight the value of system-level defenses, which serve as the skeleton of agentic systems by structuring and controlling agent behaviors, integrating rule-based and model-based security checks, and enabling more targeted research on model robustness and human interaction.

🔍 Key Points

  • The paper argues that dynamic replanning and security policy updates are often necessary for agents operating on dynamic tasks in realistic environments, directly addressing their vulnerability to indirect prompt injection attacks.
  • It holds that certain context-dependent security decisions still require LLMs (or other learned models), but only within system designs that strictly constrain what the model can observe and decide, balancing security against model expressivity for complex decision-making.
  • For inherently ambiguous cases, personalization and human-in-the-loop interaction should be treated as core design considerations, with interactive feedback strengthening the overall security of the agent.
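The layered decision process described above (hard rules first, then a narrowly scoped model-based check, then escalation to a human) can be sketched in code. This is an illustrative sketch only, not the paper's implementation: the names (`ActionGate`, `Action`, `risk_model`) and the single-threshold design are assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from typing import Callable

ALLOW, DENY, ASK_HUMAN = "allow", "deny", "ask_human"

@dataclass
class Action:
    tool: str       # e.g. "send_email", "delete_account"
    argument: str   # kept deliberately minimal

class ActionGate:
    """System-level gate around agent actions (hypothetical sketch)."""

    def __init__(self, blocked_tools: set,
                 risk_model: Callable[[str], float],
                 risk_threshold: float = 0.5):
        self.blocked_tools = blocked_tools
        # The model-based check only observes the tool name, never the
        # full untrusted context -- mirroring the position that learned
        # security checks should see a strictly constrained input.
        self.risk_model = risk_model
        self.risk_threshold = risk_threshold

    def decide(self, action: Action, from_untrusted_data: bool) -> str:
        # 1. Hard rule-based policy: some tools are never allowed.
        if action.tool in self.blocked_tools:
            return DENY
        # 2. Actions not influenced by untrusted data pass through.
        if not from_untrusted_data:
            return ALLOW
        # 3. Model-based check on a constrained view of the action;
        #    ambiguous or risky cases escalate to the human in the loop.
        if self.risk_model(action.tool) >= self.risk_threshold:
            return ASK_HUMAN
        return ALLOW

# Toy usage with a stand-in risk model:
gate = ActionGate(
    blocked_tools={"delete_account"},
    risk_model=lambda tool: 0.9 if tool == "send_email" else 0.1,
)
```

In this sketch the rule-based layer is the non-bypassable "skeleton", the learned risk model handles context-dependent cases within tight constraints, and `ASK_HUMAN` is the hook for personalization and interactive feedback.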

💡 Why This Paper Matters

This paper presents a comprehensive framework for designing secure AI agents against indirect prompt injection attacks, blending system-level defenses with model-based decision-making, and underscoring the necessity of human oversight. Its insights provide a much-needed blueprint for improving the safety and robustness of AI agents in real-world applications, especially as their use proliferates in sensitive domains.

🎯 Why It's Interesting for AI Security Researchers

The findings in this paper are highly relevant to AI security researchers, as they address growing concerns about the vulnerability of AI agents to indirect prompt injection attacks. By providing new perspectives on system architecture, dynamic security policies, and the integration of human feedback, this research contributes to the development of more secure, resilient AI systems. As AI continues to be integrated into critical infrastructure and applications, understanding these challenges and the proposed solutions is vital for building trust in AI technologies.

📚 Read the Full Paper