TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

📄 Abstract

Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration expands LLM capabilities, it also introduces a new prompt-injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing defenses mainly detect anomalous instructions and remain ineffective against implicit TPAs. In this paper, we present TRUSTDESC, the first framework for preventing tool poisoning by automatically generating trusted tool descriptions from implementations. TRUSTDESC derives implementation-faithful descriptions through a three-stage pipeline. SliceMin performs reachability-aware static analysis and LLM-guided debloating to extract minimal tool-relevant code slices. DescGen synthesizes descriptions from these slices while mitigating misleading or adversarial code artifacts. DynVer refines descriptions through dynamic verification by executing synthesized tasks and validating behavioral claims. We evaluate TRUSTDESC on 52 real-world tools across multiple tool ecosystems. Results show that TRUSTDESC produces accurate tool descriptions that improve task completion rates while mitigating implicit TPAs at their root, with minimal time and monetary overhead.

🔍 Key Points

TrustDesc is the first framework to generate trusted tool descriptions from implementations, tackling tool poisoning attacks at their source.
The framework employs a three-stage pipeline: SliceMin for reachability-aware static analysis, DescGen for description synthesis, and DynVer for dynamic verification of descriptions.
Evaluation shows that TrustDesc improves task completion rates by 4.3% on average across 52 real-world tools, significantly mitigating both explicit and implicit tool poisoning attacks.
TrustDesc demonstrates minimal time and cost overhead in generating trusted descriptions, making it a practical solution for enhancing LLM applications.
The approach reveals latent security constraints within tool implementations, thus potentially improving the overall safety and reliability of LLM-integrated systems.

💡 Why This Paper Matters

This paper introduces TrustDesc, a groundbreaking framework that enhances the security and reliability of tool descriptions in LLM applications by addressing vulnerabilities posed by tool poisoning attacks. By automating the generation of trusted descriptions, it offers a significant improvement in task success rates while minimizing operational costs. The approach is crucial as reliance on potentially misleading or erroneous descriptions can lead to significant security risks in real-world applications, thus making TrustDesc a vital innovation for future developments in AI security.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper of particular interest as it addresses the growing concern of tool poisoning attacks in LLM applications, which can undermine the efficacy and safety of AI systems. TrustDesc's novel methodology combines static and dynamic analysis to provide a robust defense against these threats, presenting a significant advancement in the field of AI security. The paper not only contributes to the theoretical understanding of security vulnerabilities but also offers practical solutions that can be implemented in real-world applications, making it highly relevant for researchers focused on secure AI deployments.

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper