← Back to Library

Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service

Authors: Jingyu Zhang

Published: 2025-12-30

arXiv ID: 2512.24415v1

Added to Library: 2026-01-07 10:05 UTC

Red Teaming

📄 Abstract

Customer-service LLM agents increasingly make policy-bound decisions (refunds, rebooking, billing disputes), but the same ``helpful'' interaction style can be exploited: a small fraction of users can induce unauthorized concessions, shifting costs to others and eroding trust in agentic workflows. We present a cross-domain benchmark of profit-seeking direct prompt injection in customer-service interactions, spanning 10 service domains and 100 realistic attack scripts grouped into five technique families. Across five widely used models under a unified rubric with uncertainty reporting, attacks are highly domain-dependent (airline support is most exploitable) and technique-dependent (payload splitting is most consistently effective). We release data and evaluation code to support reproducible auditing and to inform the design of oversight and recovery workflows for trustworthy, human centered agent interfaces.

🔍 Key Points

  • Introduced a benchmark for profit-seeking prompt injection in customer service interactions across 10 service domains and 100 attack scripts.
  • Demonstrated that vulnerability to prompt injection is highly domain-dependent, with airline support being the most exploitable.
  • Identified that payload splitting is the most consistently effective attack technique among the evaluated families.
  • Provided a unified protocol for comparing vulnerabilities across five widely used language models.
  • Released data and evaluation code to support replicability and auditing for improved oversight in AI systems.

💡 Why This Paper Matters

This paper is crucial as it highlights the vulnerabilities of language model agents in customer service settings, demonstrating how seemingly helpful interactions can be exploited by malicious users. The benchmark established serves as an essential tool for assessing and improving the security and trustworthiness of AI-driven customer service systems, thereby contributing to the development of safer AI applications.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers would find this paper particularly interesting because it addresses the emerging threat of prompt injection attacks on language models, an area that has not been extensively benchmarked. The paper provides empirical data on model vulnerabilities and attack effectiveness across different domains, which can inform the design of more resilient AI systems and help in formulating defense mechanisms against adversarial tactics.

📚 Read the Full Paper