
Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation

Authors: Wenkai Guo, Xuefeng Liu, Haolin Wang, Jianwei Niu, Shaojie Tang, Jing Yuan

Published: 2025-09-25

arXiv ID: 2509.20680v1

Added to Library: 2025-09-26 04:00 UTC

Safety

📄 Abstract

Fine-tuning large language models (LLMs) with local data is a widely adopted approach for organizations seeking to adapt LLMs to their specific domains. Given the shared characteristics in data across different organizations, the idea of collaboratively fine-tuning an LLM using data from multiple sources presents an appealing opportunity. However, organizations are often reluctant to share local data, making centralized fine-tuning impractical. Federated learning (FL), a privacy-preserving framework, enables clients to retain local data while sharing only model parameters for collaborative training, offering a potential solution. While fine-tuning LLMs on centralized datasets risks data leakage through next-token prediction, the iterative aggregation process in FL results in a global model that encapsulates generalized knowledge, which some believe protects client privacy. In this paper, however, we present contradictory findings through extensive experiments. We show that attackers can still extract training data from the global model, even using straightforward generation methods, with leakage increasing as the model size grows. Moreover, we introduce an enhanced attack strategy tailored to FL, which tracks global model updates during training to intensify privacy leakage. To mitigate these risks, we evaluate privacy-preserving techniques in FL, including differential privacy, regularization-constrained updates and adopting LLMs with safety alignment. Our results provide valuable insights and practical guidelines for reducing privacy risks when training LLMs with FL.
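The "straightforward generation methods" mentioned above amount to simply sampling from the aggregated global model and checking whether generations reproduce client training data. Below is a minimal sketch of such an extraction probe, assuming a Hugging Face `transformers` checkpoint of the global model; the checkpoint path, prompts, and sampling parameters are illustrative placeholders, not the paper's exact setup.

```python
# Illustrative extraction probe: sample from the fine-tuned global model with
# generic prompts and collect candidate generations for leakage checking.
# The checkpoint path, prompts, and sampling settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/federated_global_model"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).eval()

prompts = ["The", "Patient record:", "Dear customer,"]  # generic, low-information prefixes
candidates = []
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,              # plain top-k sampling, no adversarial tooling
            top_k=40,
            max_new_tokens=128,
            num_return_sequences=8,
            pad_token_id=tokenizer.eos_token_id,
        )
    candidates += [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# In an evaluation setting, leakage is quantified by matching candidates against
# the clients' training corpora (e.g., exact or near-duplicate substring overlap).
```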

🔍 Key Points

  • Federated Learning (FL) is shown not to adequately protect privacy in LLM training, contradicting the common assumption of its effectiveness against data leakage.
  • The study introduces basic and enhanced attack strategies that exploit the iterative nature of FL aggregation, revealing significant risks of extracting training data, especially as model size increases.
  • Extensive experiments demonstrate that attackers can retrieve sensitive data even with straightforward generation methods, highlighting vulnerabilities in current privacy-preserving frameworks.
  • The authors evaluate several defenses, including differential privacy and regularization-constrained updates, and conclude that while these mitigate the risk, they also degrade model performance (a defense sketch follows this list).
  • The paper calls for new algorithms that better balance privacy protection and model performance in FL settings.
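As a concrete illustration of the update-level defenses evaluated in the paper, the sketch below implements DP-FedAvg-style server aggregation: each client's model delta is clipped in L2 norm and Gaussian noise is added before averaging. This is a generic construction under standard assumptions; the clipping bound, noise multiplier, and data layout are illustrative, not the authors' exact configuration.

```python
# Minimal DP-FedAvg-style aggregation: clip each client's update and add
# Gaussian noise before averaging. All hyperparameters are illustrative.
import torch

def dp_aggregate(client_deltas, clip_norm=1.0, noise_multiplier=0.5):
    """client_deltas: list of dicts mapping parameter name -> update tensor."""
    clipped = []
    for delta in client_deltas:
        # Global L2 norm of this client's full update vector.
        total_norm = torch.sqrt(sum((t.float() ** 2).sum() for t in delta.values()))
        scale = min(1.0, clip_norm / (total_norm.item() + 1e-12))
        clipped.append({name: t * scale for name, t in delta.items()})

    n = len(clipped)
    aggregated = {}
    for name in clipped[0]:
        mean = torch.stack([c[name] for c in clipped]).mean(dim=0)
        # Noise scale follows the clipping bound, averaged over participating clients.
        noise = torch.randn_like(mean) * (noise_multiplier * clip_norm / n)
        aggregated[name] = mean + noise
    return aggregated  # the server adds this to the previous global model
```

The privacy/utility trade-off reported in the paper shows up directly here: a tighter clip_norm and a larger noise_multiplier reduce leakage but also blunt the useful signal in the averaged update.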

💡 Why This Paper Matters

This paper provides critical insights into the privacy vulnerabilities inherent in Federated Learning systems used for fine-tuning large language models. By challenging widely held beliefs about the privacy-preserving capabilities of FL, the authors demonstrate that extracting sensitive training data from the shared global model is feasible in practice. Their experiments underline the considerable privacy risks carried by shared model updates and the need for robust defense mechanisms, making the paper a valuable addition to the literature on AI security and data privacy.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper relevant as it identifies and analyzes serious privacy vulnerabilities in Federated Learning, a rapidly growing field in AI. The novel attack strategies presented in the study not only highlight existing weaknesses in commonly used methods but also stress the urgent need for improved privacy-preserving techniques. As researchers continue to develop and deploy LLMs in sensitive domains, understanding these risks is essential for creating secure and robust AI systems.

📚 Read the Full Paper