iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

Authors: Zixun Xiong, Gaoyi Wu, Qingyang Yu, Mingyu Derek Ma, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang

Published: 2025-11-12

arXiv ID: 2511.08905v1

Added to Library: 2025-11-14 23:01 UTC

📄 Abstract

Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model thief fully controls the LLM's inference process. In such settings, attackers may share prompt-response pairs to enable fingerprint unlearning or manipulate outputs to evade exact-match verification. We propose iSeal, the first fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner. It injects unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy. These components are resistant to verification-time attacks, including collusion-based fingerprint unlearning and response manipulation, backed by both theoretical analysis and empirical results. iSeal achieves 100 percent Fingerprint Success Rate (FSR) on 12 LLMs against more than 10 attacks, while baselines fail under unlearning and response manipulations.
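
The abstract contrasts iSeal's similarity-based verification with the exact-match checks that response manipulation defeats. Below is a minimal sketch of that general idea, not the paper's actual procedure: the verifier queries the suspected model with fingerprint prompts, scores each response against the expected one, and accepts ownership only if enough responses clear a similarity threshold. The string-ratio metric, thresholds, and prompt-response pairs are illustrative placeholders; iSeal's own similarity measure and protocol are defined in the paper.

```python
# Hedged sketch of similarity-based fingerprint verification (not iSeal's
# exact procedure). Instead of requiring verbatim reproduction of fingerprint
# responses, each observed response is scored against the expected one and
# ownership is decided by an aggregate vote over all fingerprint queries.
from difflib import SequenceMatcher


def response_similarity(expected: str, observed: str) -> float:
    """Similarity in [0, 1]; a semantic embedding metric could be swapped in."""
    return SequenceMatcher(None, expected.lower(), observed.lower()).ratio()


def verify_ownership(fingerprint_pairs, query_model,
                     match_threshold=0.8, vote_threshold=0.7):
    """fingerprint_pairs: list of (trigger_prompt, expected_response).
    query_model: callable mapping a prompt to the suspected LLM's response.
    Returns True if enough responses are close enough to the fingerprint."""
    hits = 0
    for prompt, expected in fingerprint_pairs:
        observed = query_model(prompt)
        if response_similarity(expected, observed) >= match_threshold:
            hits += 1
    return hits / len(fingerprint_pairs) >= vote_threshold


if __name__ == "__main__":
    # Toy usage: a "suspected model" that lightly perturbs its outputs, as a
    # response-manipulation attacker might; verification still succeeds.
    pairs = [("trigger-A", "sealed response alpha"),
             ("trigger-B", "sealed response beta")]
    suspected = {"trigger-A": "Sealed response alpha!",
                 "trigger-B": "sealed responce beta"}
    print(verify_ownership(pairs, lambda p: suspected[p]))  # True
```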

🔍 Key Points

  • Introduction of iSeal, the first LLM fingerprinting method designed for reliable ownership verification when the model thief controls the suspected model's inference process end to end.
  • Identification of verification-time attacks overlooked by prior work: colluding attackers can share prompt-response pairs to unlearn fingerprints, or manipulate outputs to evade exact-match verification.
  • Injection of unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy (see the toy error-correction sketch after this list).
  • Theoretical analysis and empirical evidence that these components resist collusion-based fingerprint unlearning and response manipulation.
  • 100 percent Fingerprint Success Rate (FSR) on 12 LLMs against more than 10 attacks, where baselines fail under unlearning and response manipulation.
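
The abstract credits an error-correction mechanism for robustness without specifying the code it uses. The toy sketch below uses a simple repetition code with majority-vote decoding purely to illustrate why error correction helps: even if an attacker flips a few responses during verification, the underlying fingerprint bits are still recovered. All names and parameters here are hypothetical and are not taken from the paper.

```python
# Hedged illustration of an error-correction layer for fingerprint decoding.
# A repetition code with majority vote stands in for whatever code iSeal uses.
from collections import Counter


def encode_repetition(bits, k=5):
    """Repeat each fingerprint bit k times (one verification query per copy)."""
    return [b for b in bits for _ in range(k)]


def decode_repetition(noisy_bits, k=5):
    """Recover each original bit by majority vote over its k noisy copies."""
    decoded = []
    for i in range(0, len(noisy_bits), k):
        block = noisy_bits[i:i + k]
        decoded.append(Counter(block).most_common(1)[0][0])
    return decoded


fingerprint = [1, 0, 1, 1]
sent = encode_repetition(fingerprint)

# An attacker manipulates a few responses during verification:
received = sent.copy()
for idx in (0, 7, 13):
    received[idx] ^= 1

assert decode_repetition(received) == fingerprint  # fingerprint still recovered
```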

💡 Why This Paper Matters

This paper closes a critical gap in LLM intellectual property protection: existing fingerprinting schemes implicitly assume a cooperative verification setting, yet a model thief who controls the suspected LLM's inference end to end can unlearn fingerprints or manipulate responses to defeat them. By designing verification to survive exactly these attacks, and backing that design with theoretical analysis and a 100 percent FSR across 12 LLMs, iSeal makes ownership claims enforceable under the adversarial conditions that matter in practice.

🎯 Why It's Interesting for AI Security Researchers

The findings are especially relevant to AI security researchers because they treat the verification process itself as an attack surface, cataloging verification-time threats such as collusion-based fingerprint unlearning and response manipulation that break exact-match baselines. The combination of fingerprint injection into both the model and an external module, error correction, and similarity-based matching offers a concrete template for ownership verification that holds up against an adaptive adversary, a problem of growing importance as stolen or fine-tuned model weights circulate.

📚 Read the Full Paper