← Back to Library

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Authors: Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Piotr Błaszczyk, Will Howard, Lukas Aichberger, Chris Russell, Philip H. S. Torr, Adam Mahdi, Adel Bibi

Published: 2025-12-29

arXiv ID: 2512.23128v1

Added to Library: 2026-01-07 10:07 UTC

Red Teaming

📄 Abstract

Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25\% of tasks on average (13\% for GPT-5 to 43\% for DeepSeek-R1), with small interface or contextual changes often doubling success rates and revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled experiments on high-fidelity website clones, allowing for further benchmark expansion.

🔍 Key Points

  • Introduction of the Task-Redirecting Agent Persuasion Benchmark (TRAP) for evaluating prompt injection attacks on web-based agents.
  • Demonstrated average susceptibility of 25% to prompt injections across six LLM models, highlighting model-specific vulnerabilities.
  • Developed a modular attack framework that integrates components of persuasion and interaction, enabling versatile experimental setups and comprehensive analysis.
  • Presented empirical findings that effective manipulation techniques vary based on user interface forms (buttons vs hyperlinks) and contextual tailoring.
  • Identified significant differences in attack success rates across models, emphasizing the correlation between model robustness and vulnerability.

💡 Why This Paper Matters

The TRAP benchmark provides an essential tool for assessing and improving the security of LLM-driven web agents against sophisticated prompt injection attacks. By systematically exploring the vulnerabilities of various models and exposing their weaknesses, this paper highlights critical areas for enhancing the resilience and reliability of AI systems in real-world applications.

🎯 Why It's Interesting for AI Security Researchers

This paper is especially pertinent for AI security researchers as it not only identifies and categorizes vulnerabilities within LLM agents but also offers a structured framework for evaluating and defending against such attacks. By detailing specific manipulation methods and their effectiveness, the findings contribute to a deeper understanding of AI security challenges and inform the development of stronger protective measures.

📚 Read the Full Paper