Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

Authors: Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger

Published: 2025-06-08

arXiv ID: 2506.06975v3

Added to Library: 2025-06-12 01:01 UTC

Red Teaming

📄 Abstract

As API access becomes a primary interface to large language models (LLMs), users often interact with black-box systems that offer little transparency into the deployed model. To reduce costs or maliciously alter model behaviors, API providers may discreetly serve quantized or fine-tuned variants, which can degrade performance and compromise safety. Detecting such substitutions is difficult, as users lack access to model weights and, in most cases, even output logits. To tackle this problem, we propose a rank-based uniformity test that can verify the behavioral equality of a black-box LLM to a locally deployed authentic model. Our method is accurate, query-efficient, and avoids detectable query patterns, making it robust to adversarial providers that reroute or mix responses upon the detection of testing attempts. We evaluate the approach across diverse threat scenarios, including quantization, harmful fine-tuning, jailbreak prompts, and full model substitution, showing that it consistently achieves superior statistical power over prior methods under constrained query budgets.
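
To make the mechanics concrete, below is a minimal, hypothetical sketch of a rank-based uniformity audit, not the paper's exact procedure. It assumes the auditor scores each API response with some scalar statistic computed under the locally deployed authentic model (for example, its log-likelihood), ranks that score among `k` samples drawn locally from the reference model for the same prompt, and then tests the pooled normalized ranks against Uniform(0, 1) with a Kolmogorov-Smirnov test. The callables `query_api`, `sample_reference`, and `score`, the default `k`, and the choice of KS as the uniformity test are illustrative assumptions.

```python
# Illustrative sketch only: one plausible instantiation of a rank-based
# uniformity audit, under the assumptions stated above.
from typing import Callable, Optional, Sequence

import numpy as np
from scipy.stats import kstest


def rank_uniformity_audit(
    prompts: Sequence[str],
    query_api: Callable[[str], str],
    sample_reference: Callable[[str, int], Sequence[str]],
    score: Callable[[str, str], float],
    k: int = 19,
    rng: Optional[np.random.Generator] = None,
):
    """Return (KS statistic, p-value).

    A small p-value suggests the API is not serving the local reference model.
    """
    rng = rng or np.random.default_rng()
    normalized_ranks = []
    for prompt in prompts:
        # One API call per prompt; score the response under the reference model.
        api_score = score(prompt, query_api(prompt))
        ref_scores = [score(prompt, r) for r in sample_reference(prompt, k)]
        # Rank of the API response among k local reference samples, with
        # random tie-breaking, smoothed into (0, 1). If the API serves the
        # identical model, this quantity is uniformly distributed.
        below = sum(s < api_score for s in ref_scores)
        ties = sum(s == api_score for s in ref_scores)
        rank = below + rng.integers(0, ties + 1)
        normalized_ranks.append((rank + rng.uniform()) / (k + 1))
    # Test the pooled normalized ranks against Uniform(0, 1).
    result = kstest(normalized_ranks, "uniform")
    return result.statistic, result.pvalue
```

Because each prompt costs exactly one API call and the prompts themselves can be ordinary user-style queries, an audit in this style avoids distinctive query patterns that a provider could detect and route around.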

🔍 Key Points

  • Introduction of the Rank-Based Uniformity Test (RUT), which provides a statistically principled method for auditing the behavior of black-box LLM APIs.
  • Demonstration that RUT detects diverse forms of model substitution, including quantization, harmful fine-tuning, jailbreak prompts, and full model replacement, with greater statistical power than existing methods under constrained query budgets.
  • Empirical evaluations in realistic scenarios showing RUT's robustness to adversarial evasion techniques, such as providers that reroute or mix responses when they suspect an audit.
  • Query efficiency by design: RUT requires only one API call per prompt and issues queries that resemble ordinary user traffic, avoiding detectable testing patterns and making it practical for real-world deployment.
  • Thorough experimental results validating RUT across multiple model families (e.g., Llama, Mistral, Gemma), providing a comprehensive assessment of the different threat models considered.

💡 Why This Paper Matters

This paper presents a significant advance in auditing API-served large language models: a practical method for reliably detecting covert model substitution. As reliance on black-box API access to LLMs grows, verifying model integrity becomes crucial. The proposed test fills a gap in current auditing capabilities and offers a scalable way to maintain the reliability and safety of AI applications built on these APIs.

🎯 Why It's Interesting for AI Security Researchers

This paper will interest AI security researchers because it addresses critical challenges in the transparency and accountability of deployed AI systems. As LLMs are increasingly served through APIs, verifying that the model behind an endpoint has not been covertly altered is vital. The methods presented here contribute to the broader field of AI safety by offering a practical strategy for detecting covert alterations that could degrade performance or compromise safety.

📚 Read the Full Paper