← Back to Library

SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents

Authors: Jianshuo Dong, Sheng Guo, Hao Wang, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu

Published: 2025-09-28

arXiv ID: 2509.23694v1

Added to Library: 2025-09-30 04:06 UTC

Safety

📄 Abstract

Search agents connect LLMs to the Internet, enabling access to broader and more up-to-date information. However, unreliable search results may also pose safety threats to end users, establishing a new threat surface. In this work, we conduct two in-the-wild experiments to demonstrate both the prevalence of low-quality search results and their potential to misguide agent behaviors. To counter this threat, we introduce an automated red-teaming framework that is systematic, scalable, and cost-efficient, enabling lightweight and harmless safety assessments of search agents. Building on this framework, we construct the SafeSearch benchmark, which includes 300 test cases covering five categories of risks (e.g., misinformation and indirect prompt injection). Using this benchmark, we evaluate three representative search agent scaffolds, covering search workflow, tool-calling, and deep research, across 7 proprietary and 8 open-source backend LLMs. Our results reveal substantial vulnerabilities of LLM-based search agents: when exposed to unreliable websites, the highest ASR reached 90.5% for GPT-4.1-mini under a search workflow setting. Moreover, our analysis highlights the limited effectiveness of common defense practices, such as reminder prompting. This emphasizes the value of our framework in promoting transparency for safer agent development. Our codebase and test cases are publicly available: https://github.com/jianshuod/SafeSearch.

🔍 Key Points

  • Introduction of a systematic, automated red-teaming framework to enhance the safety of LLM-based search agents.
  • Development of the SafeSearch benchmark comprising 300 test cases that cover five categories of risks including misinformation and prompt injection.
  • Real-world experiments demonstrating substantial vulnerabilities of LLM-based search agents, with up to 90.5% attack success rate when exposed to unreliable websites.
  • Evaluation of the effectiveness of common defense strategies, revealing limited success, particularly with reminder prompting and outlining a significant knowledge-action gap.
  • Results underscore the importance of balancing safety and helpfulness in agent design, suggesting that safer agents can still provide valuable assistance.

💡 Why This Paper Matters

This paper is crucial as it not only highlights the significant risks associated with LLM-integrated search agents due to unreliable internet results, but it also proposes a novel evaluative framework (SafeSearch) and benchmark to systematically identify and mitigate these risks. Its findings contribute to the understanding of how LLMs can be safely operated in real-world environments, paving the way for safer AI applications.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers would find this paper relevant due to its focus on enhancing the safety and reliability of AI systems that interact with external information sources. The introduction of a testing framework to proactively identify vulnerabilities is an important step that addresses growing concerns around misinformation and user safety, making it a valuable resource for those aiming to improve AI safety standards.

📚 Read the Full Paper