SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents

Authors: Jianshuo Dong, Sheng Guo, Hao Wang, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu

Published: 2025-09-28

arXiv ID: 2509.23694v2

Added to Library: 2025-10-02 01:01 UTC

Safety

📄 Abstract

Search agents connect LLMs to the Internet, enabling access to broader and more up-to-date information. However, unreliable search results may also pose safety threats to end users, establishing a new threat surface. In this work, we conduct two in-the-wild experiments to demonstrate both the prevalence of low-quality search results and their potential to misguide agent behaviors. To counter this threat, we introduce an automated red-teaming framework that is systematic, scalable, and cost-efficient, enabling lightweight and harmless safety assessments of search agents. Building on this framework, we construct the SafeSearch benchmark, which includes 300 test cases covering five categories of risks (e.g., misinformation and indirect prompt injection). Using this benchmark, we evaluate three representative search agent scaffolds, covering search workflow, tool-calling, and deep research, across 7 proprietary and 8 open-source backend LLMs. Our results reveal substantial vulnerabilities of LLM-based search agents: when exposed to unreliable websites, the highest ASR reached 90.5% for GPT-4.1-mini under a search workflow setting. Moreover, our analysis highlights the limited effectiveness of common defense practices, such as reminder prompting. This emphasizes the value of our framework in promoting transparency for safer agent development. Our codebase and test cases are publicly available: https://github.com/jianshuod/SafeSearch.

🔍 Key Points

Introduction of the SafeSearch benchmark with 300 test cases categorized into five risk types, aimed at assessing the safety of LLM-based search agents.
Development of an automated red-teaming framework that provides systematic, scalable, and cost-efficient evaluations of search agents against unreliable search results.
Demonstration of substantial vulnerabilities in LLM-based search agents, highlighting a maximum Attack Success Rate (ASR) of 90.5% in certain settings, emphasizing the ineffectiveness of current defense strategies like reminder prompting.
Findings suggest that search agents can misguide users due to unreliable web content, with the analysis revealing the interplay between agent scaffolds, backend LLM choice, and agent safety outcomes.
Contribution to the understanding of the limitations of existing safety evaluation practices, advocating for the necessity of transparent evaluation frameworks in search agent development.

💡 Why This Paper Matters

This paper makes a significant contribution to the field of AI safety by addressing the emerging threat landscape for LLM-based search agents. The proposed SafeSearch benchmark and red-teaming framework collectively enhance the rigor of safety evaluations, enabling developers to identify vulnerabilities while promoting transparency and accountability in AI systems. These advancements are crucial for building more reliable AI systems that can safely interact with web content.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers would find this paper highly relevant due to its comprehensive approach to identifying vulnerabilities in LLM-based search systems. The introduction of systematic evaluation methods and the direct empirical evidence of risks posed by unreliable search results provide a critical foundation for improving AI safety protocols.

SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper