← Back to Library

SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents

Authors: Jianshuo Dong, Sheng Guo, Hao Wang, Xun Chen, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu

Published: 2025-09-28

arXiv ID: 2509.23694v3

Added to Library: 2025-12-08 18:01 UTC

Red Teaming

📄 Abstract

Search agents connect LLMs to the Internet, enabling access to broader and more up-to-date information. However, unreliable search results may also pose safety threats to end users, establishing a new threat surface. In this work, we conduct two in-the-wild experiments to demonstrate both the prevalence of low-quality search results and their potential to misguide agent behaviors. To counter this threat, we introduce an automated red-teaming framework that is systematic, scalable, and cost-efficient, enabling lightweight and harmless safety assessments of search agents. Building on this framework, we construct the SafeSearch benchmark, which includes 300 test cases covering five categories of risks (e.g., misinformation and indirect prompt injection). Using this benchmark, we evaluate three representative search agent scaffolds, covering search workflow, tool-calling, and deep research, across 7 proprietary and 8 open-source backend LLMs. Our results reveal substantial vulnerabilities of LLM-based search agents: when exposed to unreliable websites, the highest ASR reached 90.5% for GPT-4.1-mini under a search workflow setting. Moreover, our analysis highlights the limited effectiveness of common defense practices, such as reminder prompting. This emphasizes the value of our framework in promoting transparency for safer agent development. Our codebase and test cases are publicly available: https://github.com/jianshuod/SafeSearch.

🔍 Key Points

  • Introduces an automated red-teaming framework for assessing the safety of LLM-based search agents, which is systematic, scalable, and cost-effective.
  • Developed the SafeSearch benchmark comprising 300 test cases across five categories of risks, including misinformation and harmful outputs, allowing for thorough assessment of search agent vulnerabilities.
  • Conducted experiments revealing vulnerabilities of LLM-based search agents, with findings that demonstrate significant risks posed by unreliable search results, such as up to 90.5% attack success rate in certain settings.
  • Evaluated defense strategies, highlighting the limited effectiveness of common practices like reminder prompting and emphasizing the need for more robust safety mechanisms.
  • Demonstrated the ability of search agents to achieve both safety and helpfulness through better design, particularly under complex scaffold systems.

💡 Why This Paper Matters

This paper makes significant strides in understanding and mitigating the risks associated with LLM-based search agents, a critical area as these agents become more integrated into information retrieval systems. By providing a systematic approach to identifying safety vulnerabilities, this work lays the groundwork for creating safer AI systems that can navigate the complexities of real-world internet content. This is invaluable for the development of AI technologies that must responsibly handle information.

🎯 Why It's Interesting for AI Security Researchers

As AI systems are increasingly deployed in real-world scenarios, understanding and ensuring their safety becomes paramount. This paper addresses crucial aspects of AI security, particularly regarding the vulnerabilities that arise when LLMs interact with real-time search results. As such, it is highly relevant for AI security researchers interested in developing robust defense mechanisms and safety protocols for intelligent systems that leverage external information sources.

📚 Read the Full Paper