
AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Authors: Zhenyi Wang, Siyu Luan

Published: 2026-03-25

arXiv ID: 2603.24857v1

Added to Library: 2026-03-27 03:01 UTC

Red Teaming

📄 Abstract

As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, lacking a coherent framework to expose their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses. Crucially, the two foundational assets of ML, data and models, are no longer independent: vulnerabilities in one directly compromise the other. The absence of a holistic framework leaves open questions about how these bidirectional risks propagate across the ML pipeline. To address this critical gap, we propose a unified closed-loop threat taxonomy that explicitly frames model-data interactions along four directional axes. Our framework offers a principled lens for analyzing and defending foundation models. The resulting four classes of security threats represent distinct but interrelated categories of attacks: (1) Data→Data (D→D), including data decryption attacks and watermark removal attacks; (2) Data→Model (D→M), including poisoning, harmful fine-tuning attacks, and jailbreak attacks; (3) Model→Data (M→D), including model inversion, membership inference attacks, and training data extraction attacks; (4) Model→Model (M→M), including model extraction attacks. Our unified framework elucidates the underlying connections among these security threats and establishes a foundation for developing scalable, transferable, and cross-modal security strategies, particularly within the landscape of foundation models.
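
To make the four axes concrete, here is a minimal sketch (in Python, not from the paper) that encodes the closed-loop taxonomy as a small data structure. The enum and dictionary names are illustrative assumptions; the attack lists simply transcribe the abstract's examples.

```python
from enum import Enum

class ThreatAxis(Enum):
    """The four directional axes of the closed-loop threat taxonomy."""
    D2D = "Data -> Data"
    D2M = "Data -> Model"
    M2D = "Model -> Data"
    M2M = "Model -> Model"

# Attack families per axis, transcribed from the abstract.
ATTACKS = {
    ThreatAxis.D2D: ["data decryption", "watermark removal"],
    ThreatAxis.D2M: ["poisoning", "harmful fine-tuning", "jailbreak"],
    ThreatAxis.M2D: ["model inversion", "membership inference",
                     "training data extraction"],
    ThreatAxis.M2M: ["model extraction"],
}

if __name__ == "__main__":
    for axis, attacks in ATTACKS.items():
        print(f"{axis.value}: {', '.join(attacks)}")
```

Framing the axes this way makes the closed loop easy to see: data recovered by an M→D attack could, in principle, seed a later D→M attack, which is exactly the kind of cross-axis propagation the taxonomy is meant to surface.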

🔍 Key Points

  • Proposes a unified closed-loop threat taxonomy categorizing AI security threats across four interrelated axes: Data→Data (D→D), Data→Model (D→M), Model→Data (M→D), and Model→Model (M→M).
  • Details attack techniques within each category, such as data decryption, poisoning, harmful fine-tuning, and jailbreak attacks, along with their mathematical formalizations (see the membership-inference sketch after this list for one concrete example).
  • Identifies interdependencies among attacks, demonstrating how the compromise of one component (data or model) propagates risks through the ML pipeline.
  • Discusses defensive strategies for each type of attack, highlighting the need for comprehensive and adaptive safeguards against AI security threats.
  • Provides experimental evaluations and case studies illustrating the vulnerability and defense mechanisms applicable across different attack categories.
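
To ground one of the attack families above, the following is a minimal, hedged sketch of a loss-threshold membership inference test in the spirit of Yeom et al.; it is not the paper's formalization. The `model_loss_fn` interface, the calibration samples, and the midpoint threshold rule are all assumptions made for illustration.

```python
import numpy as np

def loss_threshold_mia(model_loss_fn, xs, ys, threshold):
    """Flag a sample as a likely training-set member when the model's
    per-sample loss on it falls below `threshold` (training members are
    typically fit better, hence lower loss). `model_loss_fn(x, y)` is an
    assumed interface returning a scalar loss."""
    losses = np.array([model_loss_fn(x, y) for x, y in zip(xs, ys)])
    return losses < threshold  # True => predicted member

def calibrate_threshold(member_losses, nonmember_losses):
    """A simple calibration rule (an assumption, not from the paper):
    the midpoint between mean losses on known members and non-members."""
    return 0.5 * (np.mean(member_losses) + np.mean(nonmember_losses))

# Toy usage with synthetic losses: members tend to have lower loss.
rng = np.random.default_rng(0)
thr = calibrate_threshold(rng.normal(0.2, 0.05, 100),
                          rng.normal(0.8, 0.10, 100))

# A fake loss oracle for demonstration: absolute error.
flags = loss_threshold_mia(lambda x, y: abs(x - y),
                           xs=[0.1, 0.9], ys=[0.1, 0.1], threshold=thr)
print(flags)  # [ True False]
```

Loss thresholding is only the simplest instance; the survey's M→D axis also covers richer attacks such as model inversion and training data extraction.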

💡 Why This Paper Matters

This paper addresses the growing complexity of AI security with a cohesive framework that makes the relationships among interrelated security threats explicit. By identifying and categorizing these threats along four directional axes, the unified framework serves as a foundational resource for developing effective defenses and a valuable contribution to the field of AI security.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper relevant because it both surveys existing security threats in machine learning and offers a framework for analyzing the interdependencies among attacks. Its treatment of attack propagation and defensive strategies can guide future research on robust AI systems and practical methods for securing machine learning models against a wide array of adversarial threats.
