iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

Authors: Zixun Xiong, Gaoyi Wu, Qingyang Yu, Mingyu Derek Ma, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang

Published: 2025-11-12

arXiv ID: 2511.08905v2

Added to Library: 2025-12-01 04:00 UTC

📄 Abstract

Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model thief fully controls the LLM's inference process. In such settings, attackers may share prompt-response pairs to enable fingerprint unlearning or manipulate outputs to evade exact-match verification. We propose iSeal, the first fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner. It injects unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy. These components are resistant to verification-time attacks, including collusion-based fingerprint unlearning and response manipulation, backed by both theoretical analysis and empirical results. iSeal achieves 100 percent Fingerprint Success Rate (FSR) on 12 LLMs against more than 10 attacks, while baselines fail under unlearning and response manipulations.
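The similarity-based verification strategy is what lets ownership checks survive response manipulation, where an attacker rewords outputs to break exact-match checks. Below is a minimal sketch, assuming verification compares a suspected model's responses against expected fingerprint responses via a similarity threshold rather than exact string equality; the function names, the toy token-count "embedding", and the thresholds are illustrative assumptions, not the paper's actual procedure.

```python
# Hypothetical illustration of similarity-based fingerprint verification.
# Nothing here comes from the iSeal paper; it only contrasts similarity
# matching with brittle exact-match verification.
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over simple token-count vectors (a stand-in for a real embedding)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def verify_fingerprint(responses: list[str], expected: list[str], threshold: float = 0.8) -> bool:
    """Declare ownership if enough responses stay close to the expected fingerprint
    outputs, even when the model owner perturbs the wording (response manipulation)."""
    scores = [cosine_similarity(r, e) for r, e in zip(responses, expected)]
    return sum(s >= threshold for s in scores) / len(scores) >= 0.5

# Exact-match verification fails as soon as the attacker rephrases an output;
# similarity-based verification tolerates such manipulation.
print(verify_fingerprint(
    ["the secret marker is alpha seven", "fingerprint token beta nine"],
    ["secret marker is alpha seven", "fingerprint token beta nine"],
))  # True
```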

🔍 Key Points

  • Introduction of iSeal, the first LLM fingerprinting method designed for reliable ownership verification when the model thief controls the suspected LLM's inference end-to-end.
  • Injection of unique features into both the model and an external module, so that verification does not hinge on the suspected model's raw outputs alone.
  • An error-correction mechanism and a similarity-based verification strategy that resist verification-time attacks, including collusion-based fingerprint unlearning and response manipulation (see the sketch after this list).
  • Robustness claims supported by both theoretical analysis and empirical results.
  • 100 percent Fingerprint Success Rate (FSR) on 12 LLMs against more than 10 attacks, while baseline methods fail under fingerprint unlearning and response manipulation.
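The error-correction mechanism plays a complementary role: even if some fingerprint queries are unlearned or their responses corrupted, enough redundancy should remain to recover the embedded signature. The sketch below illustrates that idea using a simple repetition code as a stand-in for whatever code iSeal actually employs; all names and parameters are hypothetical.

```python
# Hypothetical illustration of why an error-correction layer helps fingerprint
# verification. A repetition code with majority voting stands in for the
# paper's actual mechanism, which is not specified in this summary.

def encode_repetition(bits: list[int], k: int = 3) -> list[int]:
    """Encode each fingerprint bit k times so isolated corruptions can be outvoted."""
    return [b for b in bits for _ in range(k)]

def decode_repetition(received: list[int], k: int = 3) -> list[int]:
    """Recover each bit by majority vote over its k copies."""
    return [1 if sum(received[i:i + k]) * 2 > k else 0
            for i in range(0, len(received), k)]

fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = encode_repetition(fingerprint)

# Simulate an attacker corrupting a few fingerprint responses.
corrupted = codeword.copy()
for i in (2, 7, 13):
    corrupted[i] ^= 1

assert decode_repetition(corrupted) == fingerprint  # verification still succeeds
```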

💡 Why This Paper Matters

This paper addresses a practical gap in LLM intellectual property protection: existing fingerprinting methods assume a passive verification setting, whereas iSeal targets the realistic case in which the model thief fully controls the suspected LLM's inference. By pairing in-model fingerprints with an external module, an error-correction mechanism, and similarity-based verification, it keeps ownership verification reliable under collusion-based fingerprint unlearning and response manipulation, making it a relevant read for model owners, AI developers, and security researchers concerned with safeguarding expensive-to-train models.

🎯 Why It's Interesting for AI Security Researchers

The findings are particularly relevant for AI security researchers because they highlight a largely overlooked attack surface: the verification process itself. The paper's threat model, in which an adversary controlling the suspected LLM end-to-end can share prompt-response pairs to drive fingerprint unlearning or manipulate outputs to evade exact-match checks, together with its reported 100 percent FSR on 12 LLMs against more than 10 attacks, offers both a benchmark and a design template for ownership verification schemes that remain robust against adaptive, verification-time adversaries.

📚 Read the Full Paper