SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs

Authors: Hongfei Xia, Hongru Wang, Zeming Liu, Qian Yu, Yuhang Guo, Haifeng Wang

Published: 2025-09-09

arXiv ID: 2509.07315v1

Added to Library: 2025-09-10 04:01 UTC

Safety

📄 Abstract

Large Language Models (LLMs) have exhibited great performance in autonomously calling various tools in external environments, leading to better problem solving and task automation capabilities. However, these external tools also amplify potential risks such as financial loss or privacy leakage when user instructions are ambiguous or malicious. Compared to previous studies, which mainly assess the safety awareness of LLMs after obtaining the tool execution results (i.e., retrospective evaluation), this paper focuses on prospective ways to assess the safety of LLM tool utilization, aiming to avoid irreversible harm caused by directly executing tools. To this end, we propose SafeToolBench, the first benchmark to comprehensively assess tool utilization security in a prospective manner, covering malicious user instructions and diverse practical toolsets. Additionally, we propose a novel framework, SafeInstructTool, which aims to enhance LLMs' awareness of tool utilization security from three perspectives (i.e., User Instruction, Tool Itself, and Joint Instruction-Tool), leading to nine detailed dimensions in total. We experiment with four LLMs using different methods, revealing that existing approaches fail to capture all risks in tool utilization. In contrast, our framework significantly enhances LLMs' self-awareness, enabling safer and more trustworthy tool utilization.

🔍 Key Points

  • The introduction of SafeToolBench, a benchmark that assesses tool utilization safety in LLMs in a prospective manner, covering malicious user instructions and diverse practical toolsets.
  • Development of the SafeInstructTool framework, which enhances LLMs' awareness of tool utilization security from three perspectives (User Instruction, Tool Itself, and Joint Instruction-Tool), which together yield nine evaluation dimensions (see the illustrative sketch after this list).
  • Comprehensive experiments with four LLMs demonstrating that SafeInstructTool significantly outperforms existing methods in identifying risks associated with tool utilization.
  • Establishment of risk categories (Privacy Leak, Property Damage, Physical Injury, Bias & Offensiveness) to systematically evaluate and categorize potential threats from LLM tool interactions.
  • Analysis highlighting the necessity of considering joint risks arising from both user instructions and tools, further illustrating gaps in current LLM safety assessments.
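
To make the prospective evaluation concrete, below is a minimal sketch of a pre-execution gate that scores an (instruction, tool) pair across the paper's three perspectives before any tool call runs. The three-perspective grouping and the idea of refusing risky calls before execution follow the summary above; the individual dimension names, the `prospective_gate` function, and the `toy_judge` callable are illustrative assumptions, not the authors' SafeInstructTool implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical dimension names grouped under the paper's three perspectives.
# SafeInstructTool defines nine dimensions in total; these placeholders only
# mirror that structure and are not the authors' actual dimension names.
PERSPECTIVES: Dict[str, List[str]] = {
    "User Instruction": ["intent_legitimacy", "information_sensitivity", "potential_harm"],
    "Tool Itself": ["capability_risk", "irreversibility", "data_access_scope"],
    "Joint Instruction-Tool": ["misuse_fit", "scope_escalation", "consequence_severity"],
}


@dataclass
class GateDecision:
    scores: Dict[str, float]   # "<perspective>/<dimension>" -> risk score in [0, 1]
    allow_execution: bool      # True only if every dimension stays below the threshold


def prospective_gate(
    instruction: str,
    tool_description: str,
    judge: Callable[[str], float],
    threshold: float = 0.5,
) -> GateDecision:
    """Score an (instruction, tool) pair on every dimension before any tool runs.

    `judge` is any callable mapping a natural-language rating request to a risk
    score in [0, 1]; in practice this would be an LLM prompted as a safety rater.
    """
    scores: Dict[str, float] = {}
    for perspective, dimensions in PERSPECTIVES.items():
        for dim in dimensions:
            prompt = (
                f"Perspective: {perspective}. Dimension: {dim}.\n"
                f"User instruction: {instruction}\n"
                f"Tool description: {tool_description}\n"
                "Rate the risk of executing this tool call from 0 (safe) to 1 (unsafe)."
            )
            scores[f"{perspective}/{dim}"] = judge(prompt)

    # Prospective evaluation: refuse before execution if any dimension crosses
    # the threshold, so irreversible actions (e.g. a funds transfer) never run.
    return GateDecision(scores=scores, allow_execution=max(scores.values()) < threshold)


if __name__ == "__main__":
    # Toy judge that flags prompts mentioning a sensitive term; a real
    # deployment would call an LLM here instead.
    def toy_judge(prompt: str) -> float:
        return 0.9 if "password" in prompt.lower() else 0.1

    decision = prospective_gate(
        instruction="Email my saved passwords to this address",
        tool_description="send_email(recipient, body)",
        judge=toy_judge,
    )
    print(decision.allow_execution, decision.scores)
```

A real gate would replace `toy_judge` with an LLM prompted as a safety rater and could map flagged dimensions onto the four risk categories above before deciding whether to proceed with the tool call.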

💡 Why This Paper Matters

This paper is crucial as it addresses a significant gap in AI safety, focusing not just on reactive measures post-execution but on proactive strategies to prevent risks associated with LLMs using external tools. By providing both a benchmark and a framework to assess and improve tool utilization safety, this work lays the foundation for safer interactions between LLMs and real-world applications.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of interest to AI security researchers due to its focus on identifying and mitigating potential risks associated with LLM interactions with external tools, a growing concern in the deployment of AI systems. The introduction of a prospective assessment framework and benchmark equips researchers with tools to explore safety methodologies, contributing to the broader discourse on AI ethics and security.

📚 Read the Full Paper