Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs

Authors: Dong Yan, Jian Liang, Ran He, Tieniu Tan

Published: 2026-02-12

arXiv ID: 2602.11528v1

Added to Library: 2026-02-13 03:01 UTC

Safety

📄 Abstract

Recent studies have shown that large language models (LLMs) can infer private user attributes (e.g., age, location, gender) from user-generated text shared online, enabling rapid and large-scale privacy breaches. Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements. Moreover, they are inherently limited as altering user text to hide sensitive cues still allows attribute inference to occur through models' reasoning capabilities. To address these limitations, we propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS). TRACE leverages attention mechanisms and inference chain generation to identify and anonymize privacy-leaking textual elements, while RPS employs a lightweight two-stage optimization strategy to induce model rejection behaviors, thereby preventing attribute inference. Evaluations across diverse LLMs show that TRACE-RPS reduces attribute inference accuracy from around 50% to below 5% on open-source models. In addition, our approach offers strong cross-model generalization, prompt-variation robustness, and utility-privacy tradeoffs. Our code is available at https://github.com/Jasper-Yan/TRACE-RPS.

🔍 Key Points

  • Proposes TRACE-RPS, a unified framework combining fine-grained anonymization with inference-preventing optimization to protect user attributes against inference attacks by large language models (LLMs).
  • TRACE uses attention mechanisms and inference chain generation to identify and anonymize privacy-leaking textual elements, reducing attribute inference accuracy to below 5% on some models.
  • RPS applies a lightweight two-stage optimization that induces refusal behaviors in LLMs, steering them to reject attempts to infer sensitive attributes while preserving the original context of the user text.
  • Extensive experiments across diverse models and attributes demonstrate robust performance and generalization of the defense against varying attack types and model variations.
  • Analyzes the utility-privacy tradeoff, showing strong privacy protection without excessively degrading the utility of the user text.
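To make the "word-level precision" idea concrete, here is a minimal, hypothetical sketch of the anonymization step. The paper's TRACE identifies privacy-leaking elements via attention scores and inference chains; this sketch does not reproduce that detection logic. It simply assumes the leaking character spans are already known and shows how masking only those spans (rather than rewriting the whole text) preserves the surrounding context. All names, spans, and the example post below are illustrative, not from the paper.

```python
def anonymize(text: str, leaking_spans: list[tuple[int, int]],
              mask: str = "[REDACTED]") -> str:
    """Replace each (start, end) character span with a mask token,
    leaving all other text untouched (word-level precision).

    `leaking_spans` stands in for the output of a detector such as
    TRACE; here they are supplied by hand for illustration."""
    out = []
    prev = 0
    for start, end in sorted(leaking_spans):
        out.append(text[prev:start])  # keep text before the span
        out.append(mask)              # mask only the leaking span
        prev = end
    out.append(text[prev:])           # keep the remainder
    return "".join(out)


post = "Just moved to Berlin last week, still getting used to my 30s."
# Suppose a detector flagged the location cue "Berlin" and age cue "30s":
spans = [(14, 20), (57, 60)]
print(anonymize(post, spans))
# → Just moved to [REDACTED] last week, still getting used to my [REDACTED].
```

The design point this illustrates is the contrast with coarse-grained defenses: only the cues that leak attributes are removed, so the rest of the post stays usable, which is where the utility-privacy tradeoff discussed in the paper comes from.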

💡 Why This Paper Matters

This paper addresses a critical privacy concern raised by large language models: their ability to infer personal attributes from everyday user text. By combining fine-grained anonymization with inference-preventing optimization, the TRACE-RPS framework lets users proactively safeguard their privacy when interacting with these systems, a meaningful step toward user-controlled privacy.

🎯 Why It's Interesting for AI Security Researchers

These findings should interest AI security researchers on two fronts: the paper extends the body of knowledge on privacy and security in AI systems, and it introduces methods applicable to real-world settings where privacy breaches via large language models are a growing concern. Its fine-grained anonymization and proactive defense against inference attacks also align with ongoing efforts to make AI technologies more secure and ethically deployed.

📚 Read the Full Paper