Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG

Authors: Haoze Guo, Ziqi Wei

Published: 2026-01-16

arXiv ID: 2601.10923v1

Added to Library: 2026-01-19 03:01 UTC

Red Teaming

📄 Abstract

Retrieval-augmented generation (RAG) systems increasingly ground their responses in user-generated content found on the Web, which amplifies both their usefulness and their attack surface. Of particular concern are indirect prompt injection and retrieval poisoning delivered through web-native carriers that survive ingestion pipelines. We provide OpenRAG-Soc, a compact, reproducible benchmark-and-harness for web-facing RAG evaluation under these threats, in a discrete data package. The suite combines a social corpus with interchangeable sparse and dense retrievers and deployable mitigations: HTML/Markdown sanitization, Unicode normalization, and attribution-gated answering. It standardizes end-to-end evaluation from ingestion to generation and reports attack success at answer time, rank shifts in both sparse and dense retrievers, utility, and latency, allowing for apples-to-apples comparisons across carriers and defenses. OpenRAG-Soc targets practitioners who need fast, realistic tests to track risk and harden deployments.
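
To make two of the mitigations named in the abstract concrete, the following is a minimal sketch of an ingestion-time sanitizer that combines HTML stripping with Unicode normalization. It is not the OpenRAG-Soc implementation: the function name `sanitize`, the regular expressions, and the set of invisible characters are all assumptions chosen for illustration, and a production pipeline would more likely use a real HTML parser than regular expressions.

```python
import html
import re
import unicodedata

# Illustrative assumption: zero-width and bidirectional control characters
# that can hide text from human readers. Not the benchmark's actual list.
_INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff\u202a-\u202e\u2066-\u2069]")

# HTML comments are invisible when rendered but survive naive text extraction.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Elements hidden with an inline "display:none" style (simplified regex match;
# nested elements with the same tag name are not handled).
_HIDDEN_ELEMENT = re.compile(
    r"<(\w+)[^>]*style\s*=\s*[\"'][^\"']*display\s*:\s*none[^\"']*[\"'][^>]*>.*?</\1>",
    re.DOTALL | re.IGNORECASE,
)

# Any remaining tag; only the markup is dropped, the visible text is kept.
_TAG = re.compile(r"<[^>]+>")


def sanitize(text: str) -> str:
    """Return the text a human reader would plausibly see (hypothetical helper)."""
    text = _HTML_COMMENT.sub("", text)
    text = _HIDDEN_ELEMENT.sub("", text)
    text = _TAG.sub("", text)
    text = html.unescape(text)
    # NFKC folds compatibility forms (e.g. fullwidth letters) to canonical ones.
    text = unicodedata.normalize("NFKC", text)
    text = _INVISIBLE.sub("", text)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

Running documents through a step like this before indexing means the retriever and the generator both see roughly what a human reader would see, which is the intuition behind sanitization and normalization as defenses against hidden carriers.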

🔍 Key Points

  • Introduction of OpenRAG-Soc, a benchmark designed to evaluate Retrieval-Augmented Generation (RAG) systems against indirect prompt injection and retrieval poisoning attacks in real-world web scenarios.
  • The benchmark integrates a social corpus and introduces various methods for sanitization, normalization, and attribution gating to enhance response security without significantly increasing latency.
  • The paper quantifies the effectiveness of these defenses using metrics such as attack success rate, retrieval rank shift, and utility impact, highlighting trade-offs between defense configurations.
  • Results indicate that the proposed defenses, particularly sanitization and attribution gating, significantly reduce the ability of attackers to exploit RAG systems, while maintaining reasonable levels of utility and low latency.
  • The framework also emphasizes reproducibility and ease of use, providing practitioners with tools for fast and effective evaluation of RAG systems in the face of evolving security threats.
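
The two security metrics mentioned above can be sketched as follows. This is an illustrative reading, not the paper's definitions: the `Trial` record, the substring-based success criterion, and the convention of ranking a missing document one position past the end of the list are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trial:
    answer: str         # text generated by the RAG system
    attack_marker: str  # assumed criterion: marker present means the injection succeeded


def attack_success_rate(trials: Sequence[Trial]) -> float:
    """Fraction of trials whose answer contains the attacker's marker."""
    if not trials:
        return 0.0
    hits = sum(t.attack_marker.lower() in t.answer.lower() for t in trials)
    return hits / len(trials)


def rank_of(doc_id: str, ranking: Sequence[str]) -> Optional[int]:
    """1-based rank of doc_id in a retriever's result list, or None if absent."""
    for position, candidate in enumerate(ranking, start=1):
        if candidate == doc_id:
            return position
    return None


def rank_shift(doc_id: str, baseline: Sequence[str], defended: Sequence[str]) -> int:
    """Defended rank minus baseline rank for one document.

    Positive values mean the document moved down the list under the defense.
    A document missing from a list is treated as ranked one past its end
    (an assumption made here so the shift is always defined).
    """
    base = rank_of(doc_id, baseline) or len(baseline) + 1
    after = rank_of(doc_id, defended) or len(defended) + 1
    return after - base
```

For example, a poisoned post that a retriever ranks first without defenses and third with sanitization enabled has a rank shift of 2 under this convention.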

💡 Why This Paper Matters

This paper addresses the security risks that indirect prompt injection and retrieval poisoning pose to Retrieval-Augmented Generation systems, a growing concern as AI models increasingly rely on web content. By introducing OpenRAG-Soc, the authors provide a reproducible framework for evaluating these risks and comparing mitigations, helping practitioners harden RAG systems before deploying them in real-world applications.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper is relevant because it both outlines the vulnerabilities introduced by insecure ingestion of web content and offers structured mitigations. Its methodology and findings could inform future research on securing AI models that draw on public data sources against sophisticated adversarial strategies.
