An Early Categorization of Prompt Injection Attacks on Large Language Models

Authors: Sippo Rossi, Alisia Marianne Michel, Raghava Rao Mukkamala, Jason Bennett Thatcher

Published: 2024-01-31

arXiv ID: 2402.00898v1

Added to Library: 2025-11-11 14:13 UTC

Red Teaming

📄 Abstract

Large language models and AI chatbots have been at the forefront of democratizing artificial intelligence. However, the releases of ChatGPT and other similar tools have been followed by growing concerns regarding the difficulty of controlling large language models and their outputs. Currently, we are witnessing a cat-and-mouse game where users attempt to misuse the models with a novel attack called prompt injections. In contrast, the developers attempt to discover the vulnerabilities and block the attacks simultaneously. In this paper, we provide an overview of these emergent threats and present a categorization of prompt injections, which can guide future research on prompt injections and act as a checklist of vulnerabilities in the development of LLM interfaces. Moreover, based on previous literature and our own empirical research, we discuss the implications of prompt injections to LLM end users, developers, and researchers.

🔍 Key Points

Introduction of a categorization framework for prompt injection attacks on large language models (LLMs), helping to systematically identify and understand vulnerabilities.
Identification of two main categories of prompt injections: direct and indirect, further subdivided into specific classes, highlighting the diversity of attack vectors.
Empirical research illustrating various examples and implications of prompt injection attacks, showcasing the significant risks they pose to end-users and developers of LLMs.
Discussion of the ethical considerations and limitations associated with prompt injection research and developing countermeasures.
Recommendations for future research directions, focusing on understanding and mitigating the vulnerabilities of LLM interfaces against prompt injections.

💡 Why This Paper Matters

This paper is significant as it provides a foundational understanding and framework for emerging research on prompt injections, which are critical vulnerabilities in the deployment and use of large language models. By categorizing and analyzing these attacks, the authors contribute to a growing field that seeks to ensure the safe use of AI technologies in practical applications.

🎯 Why It's Interesting for AI Security Researchers

This paper is crucial for AI security researchers because it addresses a new and rapidly evolving threat landscape associated with LLMs. Understanding prompt injections not only aids in developing more secure AI systems but also informs developers and organizations about potential attack vectors and the necessary defenses required to protect against them.

An Early Categorization of Prompt Injection Attacks on Large Language Models

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper