
Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

Authors: Wenhui Zhu, Xuanzhao Dong, Xiwen Chen, Rui Cai, Peijie Qiu, Zhipeng Wang, Oana Frunza, Shao Tang, Jindong Gu, Yalin Wang

Published: 2026-04-04

arXiv ID: 2604.03870v1

Added to Library: 2026-04-07 02:01 UTC

Red Teaming

📄 Abstract

The rapid deployment of open-source frameworks has significantly advanced the development of modern multi-agent systems. However, expanded action spaces, including uncontrolled privilege exposure and hidden inter-system interactions, pose severe security challenges. Specifically, Indirect Prompt Injections (IPI), which conceal malicious instructions within third-party content, can trigger unauthorized actions such as data exfiltration during normal operations. While current security evaluations predominantly rely on isolated single-turn benchmarks, the systemic vulnerabilities of these agents within complex dynamic environments remain critically underexplored. To bridge this gap, we systematically evaluate six defense strategies against four sophisticated IPI attack vectors across nine LLM backbones. Crucially, we conduct our evaluation entirely within dynamic multi-step tool-calling environments to capture the true attack surface of modern autonomous agents. Moving beyond binary success rates, our multidimensional analysis reveals pronounced fragility. Advanced injections successfully bypass nearly all baseline defenses, and some surface-level mitigations even produce counterproductive side effects. Furthermore, while agents execute malicious instructions almost instantaneously, their internal states exhibit abnormally high decision entropy. Motivated by this latent hesitation, we investigate Representation Engineering (RepE) as a robust detection strategy. By extracting hidden states at the tool-input position, we show that a RepE-based circuit breaker successfully identifies and intercepts unauthorized actions before the agent commits to them, achieving high detection accuracy across diverse LLM backbones. This study exposes the limitations of current IPI defenses and provides a highly practical paradigm for building resilient multi-agent architectures.
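To make the detection idea concrete, here is a minimal, hedged sketch of RepE-style gating: read the last-layer hidden state at the position where the agent is about to emit tool arguments, score it with a linear probe, and block the call if the probe flags it. The model name, probe weights, and function names are illustrative assumptions, not the paper's implementation; in practice the probe would be fit on hidden states from labeled benign vs. injected tool calls.

```python
# Hedged sketch of a RepE-style circuit breaker (illustrative, not the
# paper's code). Assumes a trained linear probe (probe_w, probe_b) over
# last-layer hidden states; here the probe is a random placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # placeholder backbone; any chat LLM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def hidden_at_tool_input(prompt: str) -> torch.Tensor:
    """Last-layer hidden state at the final prompt token, i.e. the
    position where the agent is about to commit to tool arguments."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[-1][0, -1]  # shape: (hidden_dim,)

# In practice, fit these by logistic regression on hidden states of
# known-benign vs. known-injected tool calls; random here as a stand-in.
probe_w = torch.randn(model.config.hidden_size)
probe_b = torch.zeros(())

def circuit_breaker(prompt: str, threshold: float = 0.5) -> bool:
    """Return True to intercept (block) the pending tool call."""
    h = hidden_at_tool_input(prompt).float()
    score = torch.sigmoid(h @ probe_w + probe_b).item()
    return score > threshold
```

Reading the hidden state at the tool-input position, rather than classifying the injected text itself, is what lets the detector fire before the action executes: the probe sees the model's internal decision state, not just the third-party content.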

🔍 Key Points

  • The paper identifies a core vulnerability of Large Language Model (LLM) agents: Indirect Prompt Injection (IPI) attacks that smuggle disguised malicious instructions into the third-party content an agent consumes.
  • A comprehensive evaluation of six defense strategies against four sophisticated IPI attack vectors, across nine LLM backbones, shows that current defenses are brittle and that some surface-level mitigations even backfire.
  • The study introduces Representation Engineering (RepE) as a detection strategy that intercepts unauthorized actions before agents execute them, achieving high detection accuracy across diverse LLM backbones.
  • While agents execute malicious instructions almost instantaneously, their internal states exhibit abnormally high decision entropy, indicating internal conflict; this latent hesitation is the signal the RepE detector exploits (a minimal entropy sketch follows this list).
  • The experimental framework moves beyond binary success rates to a multidimensional evaluation that captures vulnerabilities under dynamic, multi-step tool-calling interactions.
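
The decision-entropy observation in the fourth point can be approximated with a short proxy computation: the Shannon entropy of the next-token distribution at the position where the agent picks its action. This is an illustrative stand-in for the paper's metric, reusing the `model` and `tok` objects from the sketch above.

```python
# Hedged proxy for "decision entropy": Shannon entropy (in nats) of the
# next-token distribution at the agent's action-decision position.
import torch

def decision_entropy(logits: torch.Tensor) -> float:
    """logits: (vocab_size,) logits at the action-decision position."""
    log_p = torch.log_softmax(logits, dim=-1)
    return float(-(log_p.exp() * log_p).sum())

# Usage with the model/tokenizer from the previous sketch (assumed):
#   out = model(**tok(prompt, return_tensors="pt"))
#   h = decision_entropy(out.logits[0, -1])
# High entropy alongside a near-instant malicious tool call is the
# "latent hesitation" signal the authors report.
```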

💡 Why This Paper Matters

This paper sheds light on the vulnerability of modern LLM agents to sophisticated attacks like Indirect Prompt Injection and exposes the limitations of conventional defense mechanisms. By proposing a robust detection strategy based on Representation Engineering, it offers a practical paradigm for hardening autonomous agents deployed in dynamic environments.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find empirical insights into emerging threats against LLM agents, evidence of the deficiencies of current mitigation strategies, and a promising alternative for building safer agent architectures. The paper underscores the importance of robust assessment frameworks and points toward security protocols for safeguarding LLM-integrated systems in real-world applications.

📚 Read the Full Paper