DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection

Authors: Jerry Wang, Fang Yu

Published: 2025-07-20

arXiv ID: 2507.15042v1

Added to Library: 2025-11-11 14:06 UTC

Red Teaming

📄 Abstract

Adversarial prompt attacks can significantly undermine the reliability of Retrieval-Augmented Generation (RAG) systems by manipulating document rankings so that the system produces incorrect outputs. In this paper, we present a novel method that applies Differential Evolution (DE) to optimize adversarial prompt suffixes for RAG-based question answering. Our approach is gradient-free: it treats the RAG pipeline as a black box and evolves a population of candidate suffixes to maximize the retrieval rank of a targeted incorrect document, a setting closer to real-world attack scenarios. We conducted experiments on the BEIR QA datasets to evaluate attack success at fixed retrieval rank thresholds across multiple retrieval applications. Our results demonstrate that DE-based prompt optimization attains competitive (and in some cases higher) success rates compared to GGPP against dense retrievers and PRADA against sparse retrievers, while using only a small number of tokens (≤ 5) in the adversarial suffix. Furthermore, we introduce a readability-aware suffix construction strategy, validated by a statistically significant reduction in masked-language-model (MLM) negative log-likelihood under Welch's t-test. Through evaluations with a BERT-based adversarial suffix detector, we show that DE-generated suffixes evade detection, yielding near-chance detection accuracy.

🔍 Key Points

  • Introduction of DeRAG, a black-box adversarial attack framework specifically targeting Retrieval-Augmented Generation (RAG) systems using prompt suffix perturbations.
  • Application of Differential Evolution (DE) as a novel technique for optimizing adversarial suffixes, which treats the RAG model as a black box, thus enabling effective attacks without requiring gradient access.
  • Experiments show that DE-based prompt suffix optimization achieves competitive and sometimes superior attack success rates compared to existing methods, while using significantly fewer tokens (≤ 5).
  • The study finds that shorter suffixes yield high success rates, highlighting the inefficiency of longer perturbations in adversarial attacks, and introduces a readability-aware construction strategy for suffixes.
  • DE-generated suffixes reliably evade a BERT-based adversarial suffix detector, yielding near-chance detection accuracy, which indicates a significant threat to the robustness of RAG systems.
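The core idea of the attack can be sketched in a few lines: Differential Evolution maintains a population of continuous vectors, combines them via mutation and crossover, decodes each vector into a short token suffix, and keeps whichever candidate scores better on the black-box objective. The sketch below is illustrative only and is not the paper's implementation: the vocabulary, the `black_box_score` surrogate (a toy overlap score standing in for the real retriever's rank of the targeted document), and all hyperparameter values are assumptions for demonstration.

```python
import random

# --- Hypothetical stand-ins; DeRAG queries a real black-box retriever ---
VOCAB = ["the", "secret", "answer", "is", "always", "trust", "this",
         "document", "only", "best", "source", "official", "verified"]
SUFFIX_LEN = 5          # the paper reports suffixes of <= 5 tokens
POP_SIZE = 20
GENERATIONS = 30
F, CR = 0.8, 0.9        # DE mutation factor and crossover rate (assumed values)

def black_box_score(suffix_tokens):
    """Toy surrogate for the retriever: counts overlap between the suffix
    and a targeted incorrect document. In the real attack this would be
    the (black-box) retrieval score/rank of the target document."""
    target_doc = {"secret", "document", "official", "verified", "trust"}
    return sum(1.0 for t in suffix_tokens if t in target_doc)

def decode(vec):
    """Map a continuous DE vector to discrete vocabulary tokens."""
    return [VOCAB[int(x) % len(VOCAB)] for x in vec]

def evolve(seed=0):
    rng = random.Random(seed)
    n = len(VOCAB)
    pop = [[rng.uniform(0, n) for _ in range(SUFFIX_LEN)]
           for _ in range(POP_SIZE)]
    fit = [black_box_score(decode(v)) for v in pop]
    for _ in range(GENERATIONS):
        for i in range(POP_SIZE):
            a, b, c = rng.sample([j for j in range(POP_SIZE) if j != i], 3)
            # DE/rand/1 mutation with binomial crossover
            trial = []
            j_rand = rng.randrange(SUFFIX_LEN)
            for j in range(SUFFIX_LEN):
                if rng.random() < CR or j == j_rand:
                    trial.append((pop[a][j] + F * (pop[b][j] - pop[c][j])) % n)
                else:
                    trial.append(pop[i][j])
            ts = black_box_score(decode(trial))
            if ts >= fit[i]:    # greedy selection on the black-box score
                pop[i], fit[i] = trial, ts
    best = max(range(POP_SIZE), key=fit.__getitem__)
    return decode(pop[best]), fit[best]

best_suffix, best_score = evolve()
```

Note that the only feedback the loop consumes is the scalar score returned by `black_box_score`, which is what makes the approach gradient-free: swapping in an actual retriever API call is all that changes between this toy and a real black-box setting.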

💡 Why This Paper Matters

This paper is relevant and important as it expands the understanding of vulnerabilities in RAG systems and introduces a robust technique to exploit these weaknesses through adversarial attacks. By leveraging DE for prompt optimization, the authors highlight critical security challenges posed by increasingly complex AI models, thereby contributing to the conversation on AI safety and robustness.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of interest to AI security researchers because it not only demonstrates a practical and effective method for executing black-box attacks on state-of-the-art retrieval systems, but also raises awareness about the inherent vulnerabilities in AI applications that rely on external data. The findings challenge the assumptions of security in RAG systems and propose essential considerations for developing more resilient architectures against adversarial manipulations.

📚 Read the Full Paper