
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search

Authors: Zeyu Shen, Basileal Imana, Tong Wu, Chong Xiang, Prateek Mittal, Aleksandra Korolova

Published: 2025-09-27

arXiv ID: 2509.23519v1

Added to Library: 2025-12-08 18:02 UTC

Red Teaming

📄 Abstract

Retrieval-Augmented Generation (RAG) enhances Large Language Models by grounding their outputs in external documents. These systems, however, remain vulnerable to attacks on the retrieval corpus, such as prompt injection. RAG-based search systems (e.g., Google's Search AI Overview) present an interesting setting for studying and protecting against such threats, as defense algorithms can benefit from built-in reliability signals -- like document ranking -- and represent a non-LLM challenge for the adversary due to decades of work to thwart SEO. Motivated by, but not limited to, this scenario, this work introduces ReliabilityRAG, a framework for adversarial robustness that explicitly leverages reliability information of retrieved documents. Our first contribution adopts a graph-theoretic perspective to identify a "consistent majority" among retrieved documents to filter out malicious ones. We introduce a novel algorithm based on finding a Maximum Independent Set (MIS) on a document graph where edges encode contradiction. Our MIS variant explicitly prioritizes higher-reliability documents and provides provable robustness guarantees against bounded adversarial corruption under natural assumptions. Recognizing the computational cost of exact MIS for large retrieval sets, our second contribution is a scalable weighted sample and aggregate framework. It explicitly utilizes reliability information, preserving some robustness guarantees while efficiently handling many documents. We present empirical results showing ReliabilityRAG provides superior robustness against adversarial attacks compared to prior methods, maintains high benign accuracy, and excels in long-form generation tasks where prior robustness-focused methods struggled. Our work is a significant step towards more effective, provably robust defenses against retrieved corpus corruption in RAG.
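The MIS idea from the abstract can be illustrated with a small sketch. This is not the paper's algorithm, just one plausible reading: build a graph whose nodes are retrieved documents and whose edges mark contradicting pairs, then pick the independent set with the greatest total reliability (so higher-ranked documents are preferred). Exact MIS is expensive in general, but brute force is fine for a typical top-k retrieval set; the `docs` and `contradicts` structures here are hypothetical.

```python
from itertools import combinations

def reliability_weighted_mis(docs, contradicts):
    """Illustrative sketch (not the paper's exact method).

    docs: list of (doc_id, reliability) pairs, e.g. reliability from search rank.
    contradicts: set of frozenset({id_a, id_b}) pairs whose documents contradict.
    Returns the contradiction-free subset with maximum total reliability,
    found by brute force over subsets (acceptable for small top-k retrievals).
    """
    best, best_weight = [], 0.0
    n = len(docs)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            # Skip subsets containing any contradicting pair (not independent).
            if any(frozenset((docs[i][0], docs[j][0])) in contradicts
                   for i, j in combinations(subset, 2)):
                continue
            weight = sum(docs[i][1] for i in subset)
            if weight > best_weight:
                best_weight = weight
                best = [docs[i][0] for i in subset]
    return best

# Example: d3 contradicts both d1 and d2, so the consistent majority is {d1, d2}.
retrieved = [("d1", 1.0), ("d2", 0.9), ("d3", 0.5)]
conflicts = {frozenset(("d1", "d3")), frozenset(("d2", "d3"))}
kept = reliability_weighted_mis(retrieved, conflicts)  # → ["d1", "d2"]
```

A generation step would then condition the LLM only on the surviving documents, which is the intuition behind filtering out a bounded number of adversarial injections.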

🔍 Key Points

  • Introduction of ReliabilityRAG, a framework utilizing reliability metrics for increased robustness in Retrieval-Augmented Generation (RAG) systems.
  • Development of a graph-theoretic algorithm that finds a Maximum Independent Set (MIS) to filter out contradictory and potentially malicious documents from the retrieved corpus.
  • Scalable weighted sample and aggregate framework to efficiently handle larger document retrievals while maintaining robustness guarantees.
  • Empirical evaluations demonstrate that ReliabilityRAG outperforms prior methods in robustness against adversarial attacks while maintaining high benign accuracy, and excels in long-form generation tasks where earlier robustness-focused defenses struggled.
  • The approach builds on reliability signals such as document ranking, addressing a gap in current RAG defenses, which treat retrieved documents as an unordered set and overlook retrieval quality.
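The scalable variant in the key points can also be sketched. The following is an assumption-laden illustration, not the paper's implementation: draw several reliability-weighted subsets of the retrieved documents, answer from each subset independently, and return the majority answer, so that a few corrupted documents rarely dominate the vote. The `answer_fn` callback stands in for an LLM call and is hypothetical.

```python
import random
from collections import Counter

def weighted_sample_and_aggregate(docs, weights, answer_fn,
                                  n_samples=9, subset_size=3, seed=0):
    """Illustrative sketch of a weighted sample-and-aggregate defense.

    docs: retrieved documents; weights: per-document reliability scores.
    answer_fn: maps a document subset to an answer (stand-in for an LLM).
    Draws `n_samples` subsets without replacement, weighted by reliability,
    and returns the plurality answer across subsets.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        pool, w = list(docs), list(weights)
        subset = []
        for _ in range(min(subset_size, len(pool))):
            # Weighted draw without replacement: pick one, then remove it.
            i = rng.choices(range(len(pool)), weights=w)[0]
            subset.append(pool.pop(i))
            w.pop(i)
        votes[answer_fn(subset)] += 1
    return votes.most_common(1)[0][0]

# Example: the zero-weight "bad" document is never sampled, so the
# aggregated answer comes only from reliable documents.
docs = ["good1", "good2", "good3", "bad"]
weights = [1.0, 1.0, 1.0, 0.0]
answer = weighted_sample_and_aggregate(
    docs, weights,
    answer_fn=lambda s: "wrong" if "bad" in s else "right",
)  # → "right"
```

In practice the weights would come from the same reliability signals (e.g., search ranking) used by the MIS filter, and the aggregation rule determines which robustness guarantees carry over.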

💡 Why This Paper Matters

The paper presents ReliabilityRAG, a significant advancement in protecting RAG-based systems against adversarial attacks. By leveraging document reliability information, it offers a theoretically grounded and empirically validated strategy that enhances the quality of generated outputs while maintaining robustness against manipulation. Its contributions are crucial for building more secure and reliable AI-driven information retrieval systems, particularly in sensitive applications requiring high accuracy and trustworthiness.

🎯 Why It's Interesting for AI Security Researchers

This paper addresses a critical vulnerability in RAG systems, making it essential reading for AI security researchers focused on adversarial machine learning and robust AI deployment. Its provable robustness guarantees and empirical results contribute to foundational knowledge in AI safety while offering practical defenses against corpus-corruption attacks that manipulate generated outputs and undermine trust in AI systems.
