← Back to Library

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models

Authors: Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong

Published: 2025-10-17

arXiv ID: 2510.15476v1

Added to Library: 2025-10-20 04:00 UTC

Red Teaming

📄 Abstract

Large Language Models (LLMs) have rapidly become integral to real-world applications, powering services across diverse sectors. However, their widespread deployment has exposed critical security risks, particularly through jailbreak prompts that can bypass model alignment and induce harmful outputs. Despite intense research into both attack and defense techniques, the field remains fragmented: definitions, threat models, and evaluation criteria vary widely, impeding systematic progress and fair comparison. In this Systematization of Knowledge (SoK), we address these challenges by (1) proposing a holistic, multi-level taxonomy that organizes attacks, defenses, and vulnerabilities in LLM prompt security; (2) formalizing threat models and cost assumptions into machine-readable profiles for reproducible evaluation; (3) introducing an open-source evaluation toolkit for standardized, auditable comparison of attacks and defenses; (4) releasing JAILBREAKDB, the largest annotated dataset of jailbreak and benign prompts to date; and (5) presenting a comprehensive evaluation and leaderboard of state-of-the-art methods. Our work unifies fragmented research, provides rigorous foundations for future studies, and supports the development of robust, trustworthy LLMs suitable for high-stakes deployment.

🔍 Key Points

  • Proposed a comprehensive, multi-level taxonomy that systematically organizes attacks, defenses, and vulnerabilities in prompt security of large language models.
  • Introduced an open-source evaluation toolkit that allows standardized and reproducible comparison of prompt attack and defense methods.
  • Released JAILBREAKDB, a large and well-annotated dataset of jailbreak and benign prompts, facilitating future research in LLM security.
  • Formalized threat models in machine-readable profiles to enable consistent evaluation of LLM vulnerabilities and defenses.
  • Conducted a comprehensive evaluation of state-of-the-art attacks and defenses to identify strengths and weaknesses across various methods.

💡 Why This Paper Matters

This paper plays a crucial role in unifying fragmented research on prompt security in large language models by providing a structured framework for understanding vulnerabilities and defenses. Its empirical toolkit and extensive datasets enable systematic evaluation and comparison, paving the way for more robust AI systems.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper highly relevant as it addresses pressing security concerns surrounding large language models, particularly through its proposed methodologies for identifying and evaluating vulnerabilities. The systematic approach and resources offered can aid in the development of effective defenses against adversarial attacks and contribute to the overall safety and reliability of AI applications.

📚 Read the Full Paper