← Back to Library

External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Authors: Yu He, Yifei Chen, Yiming Li, Shuo Shao, Leyi Qi, Boheng Li, Dacheng Tao, Zhan Qin

Published: 2025-10-03

arXiv ID: 2510.02964v1

Added to Library: 2025-10-06 04:01 UTC

Red Teaming

πŸ“„ Abstract

In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG alleviates issues like outdated knowledge and, crucially, insufficient domain expertise. While effective, RAG introduces new risks of external data extraction attacks (EDEAs), where sensitive or copyrighted data in its knowledge base may be extracted verbatim. These risks are particularly acute when RAG is used to customize specialized LLM applications with private knowledge bases. Despite initial studies exploring these risks, they often lack a formalized framework, robust attack performance, and comprehensive evaluation, leaving critical questions about real-world EDEA feasibility unanswered. In this paper, we present the first comprehensive study to formalize EDEAs against retrieval-augmented LLMs. We first formally define EDEAs and propose a unified framework decomposing their design into three components: extraction instruction, jailbreak operator, and retrieval trigger, under which prior attacks can be considered instances within our framework. Guided by this framework, we develop SECRET: a Scalable and EffeCtive exteRnal data Extraction aTtack. Specifically, SECRET incorporates (1) an adaptive optimization process using LLMs as optimizers to generate specialized jailbreak prompts for EDEAs, and (2) cluster-focused triggering, an adaptive strategy that alternates between global exploration and local exploitation to efficiently generate effective retrieval triggers. Extensive evaluations across 4 models reveal that SECRET significantly outperforms previous attacks, and is highly effective against all 16 tested RAG instances. Notably, SECRET successfully extracts 35% of the data from RAG powered by Claude 3.7 Sonnet for the first time, whereas other attacks yield 0% extraction. Our findings call for attention to this emerging threat.

πŸ” Key Points

  • First comprehensive study of External Data Extraction Attacks (EDEAs) against Retrieval-Augmented Large Language Models (RA-LLMs).
  • Proposes a unified framework that decomposes EDEAs into three functional components: extraction instruction, jailbreak operator, and retrieval trigger.
  • Introduces SECRET, an effective and scalable external data extraction attack that significantly outperforms existing attacks and is effective against multiple commercial LLMs.
  • Findings highlight that SECRET can extract sensitive information from RA-LLMs with a success rate much higher than previously documented methods.
  • The paper extensively evaluates the attack's performance across multiple models and datasets, establishing its robustness even against potential defenses.

πŸ’‘ Why This Paper Matters

This paper is highly relevant as it addresses a significant and emerging threat in AI securityβ€”EDEAs against RA-LLMs. The introduction of a formal framework and novel attack methodology not only enhances the understanding of the risks associated with using retrieval-augmented techniques in LLM applications but also calls for urgent attention to data privacy implications in critical domains like healthcare and finance.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper invaluable as it uncovers potential vulnerabilities in widely-adopted RA-LLM systems, providing a structured method to evaluate and craft adversarial attacks. The proposed framework and techniques can inform the development of more robust defenses against such attacks, thereby contributing to safer AI deployment in sensitive applications.

πŸ“š Read the Full Paper