PromptLocate: Localizing Prompt Injection Attacks

Authors: Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong

Published: 2025-10-14

arXiv ID: 2510.12252v2

Added to Library: 2025-11-11 14:00 UTC

Red Teaming

📄 Abstract

Prompt injection attacks deceive a large language model into completing an attacker-specified task instead of its intended task by contaminating its input data with an injected prompt, which consists of injected instruction(s) and data. Localizing the injected prompt within contaminated data is crucial for post-attack forensic analysis and data recovery. Despite its growing importance, prompt injection localization remains largely unexplored. In this work, we bridge this gap by proposing PromptLocate, the first method for localizing injected prompts. PromptLocate comprises three steps: (1) splitting the contaminated data into semantically coherent segments, (2) identifying segments contaminated by injected instructions, and (3) pinpointing segments contaminated by injected data. We show PromptLocate accurately localizes injected prompts across eight existing and eight adaptive attacks.

🔍 Key Points

Introduction of PromptLocate, the first method for localizing prompt injection attacks, significantly improving post-attack analysis capabilities.
The three-step methodology includes semantically segmenting contaminated data, identifying instruction-contaminated segments using a tailored oracle detector, and pinpointing data-contaminated segments based on contextual inconsistency.
Extensive empirical evaluation across multiple datasets and attacks demonstrating high accuracy and resilience against adaptive attack variants, notably outperforming existing attribution methods.
Practical applications in post-attack forensic analysis and data recovery, illustrating how localized prompts can aid in tracing malicious users and restoring task performance.
Analysis of potential limitations, including the reliance on detector accuracy and the challenges of handling adaptive attacks effectively.

💡 Why This Paper Matters

The paper presents PromptLocate as a crucial advancement in the forensic analysis of prompt injection attacks, addressing an underexplored area in AI security. Its innovative approach to segmenting and localizing injected prompts not only enhances understanding of the attack vectors but also provides vital tools for mitigating risks associated with large language models. This research contributes to establishing safer AI systems by offering a definitive method for identifying and mitigating the impacts of prompt injection.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper highly relevant because it tackles a pressing issue of prompt injection attacks that can manipulate language models into executing undesired actions. The methodological advancements of PromptLocate improve detection and localization, which are critical for developing robust defensive measures against such vulnerabilities in AI systems. The empirical results affirm its efficacy, making it a valuable resource for security researchers focused on enhancing AI safety and reliability.

PromptLocate: Localizing Prompt Injection Attacks

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper