← Back to Library

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

Authors: Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong

Published: 2023-10-19

arXiv ID: 2310.12815v4

Added to Library: 2025-11-11 14:35 UTC

Red Teaming

📄 Abstract

A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.

🔍 Key Points

  • The research introduces a formal framework for understanding and implementing prompt injection attacks in LLM-integrated applications, helping to bridge the gap in existing literature that largely focuses on case studies.
  • Ten distinct defense strategies against prompt injection attacks are evaluated systematically, with both prevention- and detection-based defenses categorized and assessed for effectiveness.
  • A comprehensive benchmark is established through an evaluation of five existing prompt injection attacks and ten defenses across ten large language models (LLMs) and seven different tasks, facilitating future research in this security domain.
  • Novel attacks are developed by combining existing methods, demonstrating that more effective prompt injection attacks can be created through a systematic understanding of patterns in prior attacks.
  • The authors make their experimental platform publicly available, promoting collaboration and further investigation within the research community.

💡 Why This Paper Matters

The study significantly advances the field of security related to large language models by providing a structured methodology for identifying and evaluating prompt injection attacks and defenses. It not only formalizes the understanding of these attacks but also lays the groundwork for future research in developing better defense mechanisms. This contribution is critical as LLM-integrated applications become increasingly prevalent, highlighting the necessity of robust security strategies to protect against new attack vectors.

🎯 Why It's Interesting for AI Security Researchers

This paper is highly relevant to AI security researchers as it tackles a pressing issue in the deployment of LLMs, which are vulnerable to manipulative inputs leading to undesired outputs. With the increasing application of LLMs in sensitive domains such as hiring, finance, and content moderation, understanding and mitigating vulnerabilities through structured methodologies is imperative. The framework proposed can serve as a foundation for further study and development of more resilient AI systems, making it crucial for researchers focused on ensuring the reliability and safety of AI applications.

📚 Read the Full Paper