โ† Back to Library

Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks

Authors: Haowei Fu, Bo Ni, Han Xu, Kunpeng Liu, Dan Lin, Tyler Derr

Published: 2025-12-01

arXiv ID: 2512.03100v1

Added to Library: 2025-12-04 03:01 UTC

Safety

📄 Abstract

Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs) with external knowledge for diverse, knowledge-intensive tasks. However, while such knowledge injection improves performance, it also exposes new attack surfaces. Membership Inference Attacks (MIAs), which aim to determine whether a given data sample was included in a model's training set, pose serious threats to privacy and trust in sensitive domains. To this end, we first systematically evaluate the vulnerability of RAG- and SFT-based LLMs to various MIAs. Then, to address the privacy risk, we further introduce a novel, model-agnostic defense framework, Ensemble Privacy Defense (EPD), which aggregates and evaluates the outputs of a knowledge-injected LLM, a base LLM, and a dedicated judge model to enhance resistance against MIAs. Comprehensive experiments show that, on average, EPD reduces MIA success by up to 27.8% for SFT and 526.3% for RAG compared to an inference-time baseline, while maintaining answer quality.
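The attacks evaluated in the paper all reduce to scoring how "familiar" a sample looks to the model. As a concrete illustration of the attack class (a generic loss-based MIA from the broader literature, not the paper's specific attacks), the sketch below flags a sample as a training member when the model's language-modeling loss on it falls below a calibrated threshold. The model name and threshold are placeholder assumptions.

```python
# Minimal loss-based membership inference sketch (generic MIA, not the
# paper's specific attacks). Model name and threshold are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any causal LM under test
THRESHOLD = 3.5      # assumption: calibrated on known non-member data

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def membership_score(text: str) -> float:
    """Mean per-token negative log-likelihood; lower hints at membership."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # labels=input_ids makes the model compute the LM loss directly
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return loss.item()

def predict_member(text: str) -> bool:
    # The attacker claims "member" when the sample is suspiciously
    # well-predicted relative to the calibrated threshold.
    return membership_score(text) < THRESHOLD
```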

๐Ÿ” Key Points

  • The paper introduces Ensemble Privacy Defense (EPD), a model-agnostic framework designed to mitigate membership inference attacks on knowledge-intensive large language models (LLMs); a schematic sketch of the ensemble idea follows this list.
  • Comprehensive experiments revealed that EPD significantly reduces MIA success rates, by up to 27.8% for Supervised Finetuning (SFT) models and 526.3% for Retrieval-Augmented Generation (RAG) models, while maintaining high answer quality.
  • The study benchmarks the vulnerability of the RAG and SFT paradigms against various MIAs, finding that RAG is generally more resistant to membership inference than SFT, whose vulnerability stems from the parameter updates it makes during training.
  • The evaluation of EPD showed that, even for RAG models which already exhibit strong MIA defenses, the ensemble approach enhances privacy protection without requiring retraining, making it highly practical for real-world applications.
  • Overall, this work provides insights into the effectiveness of different MIA defense strategies, suggesting that hybrid systems combining high-accuracy components with privacy-preserving ones can contribute to safer AI deployments.
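The summary names EPD's three components but not its aggregation rule, so the following is only a hypothetical rendering of the ensemble idea: generate candidate answers from the knowledge-injected and base models, then let a judge model decide which answer to release. The class, prompt, and selection heuristic below are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch of a three-model ensemble in the spirit of EPD.
# Everything here (names, prompt, selection logic) is an illustrative
# assumption; the paper defines the actual aggregation rule.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EnsembleDefense:
    knowledge_llm: Callable[[str], str]  # RAG- or SFT-augmented model
    base_llm: Callable[[str], str]       # plain model; never saw the private data
    judge_llm: Callable[[str], str]      # evaluates candidate answers

    def answer(self, question: str) -> str:
        # Generate candidates from both the knowledge-injected and base models.
        candidates = {
            "knowledge": self.knowledge_llm(question),
            "base": self.base_llm(question),
        }
        # Ask the judge which candidate to release.
        verdict = self.judge_llm(
            f"Question: {question}\n"
            f"A: {candidates['knowledge']}\n"
            f"B: {candidates['base']}\n"
            "Reply with the letter of the better, privacy-safe answer."
        )
        if verdict.strip().upper().startswith("A"):
            return candidates["knowledge"]
        return candidates["base"]

# Example wiring with trivial stand-in callables (real use would wrap
# API or local-model calls):
epd = EnsembleDefense(
    knowledge_llm=lambda q: "answer grounded in the private corpus",
    base_llm=lambda q: "answer from the plain base model",
    judge_llm=lambda prompt: "B",
)
print(epd.answer("What does the contract say about termination?"))
```

The privacy intuition behind such an ensemble is that routing some queries through the base model, which never saw the private data, weakens the membership signal an attacker can read off the released answers.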

💡 Why This Paper Matters

This paper is significant because it addresses critical privacy concerns surrounding the deployment of large language models in sensitive domains. It provides a novel solution in the form of the Ensemble Privacy Defense framework, which strengthens MIA resistance while preserving the quality and accuracy of generated responses. The insights gained from systematic evaluations of RAG and SFT against membership inference attacks contribute valuable knowledge to both AI safety and model design.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper offers meaningful advances in understanding and mitigating membership inference attacks, which pose a central threat to sensitive data in AI models. The EPD framework and its demonstrated effectiveness across both knowledge-injection paradigms open avenues for applying ensemble-based defenses in other contexts, supporting the development of more robust and trustworthy AI systems.

📚 Read the Full Paper