
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

Authors: Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger

Published: 2025-06-08

arXiv ID: 2506.06975v1

Added to Library: 2025-06-10 04:03 UTC

Red Teaming

📄 Abstract

As API access becomes a primary interface to large language models (LLMs), users often interact with black-box systems that offer little transparency into the deployed model. To reduce costs or maliciously alter model behaviors, API providers may discreetly serve quantized or fine-tuned variants, which can degrade performance and compromise safety. Detecting such substitutions is difficult, as users lack access to model weights and, in most cases, even output logits. To tackle this problem, we propose a rank-based uniformity test that can verify the behavioral equality of a black-box LLM to a locally deployed authentic model. Our method is accurate, query-efficient, and avoids detectable query patterns, making it robust to adversarial providers that reroute or mix responses upon the detection of testing attempts. We evaluate the approach across diverse threat scenarios, including quantization, harmful fine-tuning, jailbreak prompts, and full model substitution, showing that it consistently achieves superior statistical power over prior methods under constrained query budgets.

🔍 Key Points

  • Introduction of the rank-based uniformity test (RUT) for auditing LLM APIs, which tests in a statistically principled way whether a black-box API's outputs are behaviorally equivalent to those of a locally deployed reference model (see the sketch after this list).
  • RUT achieves higher accuracy and query efficiency than existing methods such as Maximum Mean Discrepancy (MMD) and Kolmogorov–Smirnov (KS) tests across a variety of scenarios, including model quantization and jailbreak prompts.
  • Extensive empirical evaluation across diverse threat scenarios (quantization, harmful fine-tuning, model substitution) demonstrates the robustness and sensitivity of RUT under constrained query budgets.
  • Validation on real-world LLM API deployments shows that RUT is practical for identifying significant deviations in models served by external providers.
  • Analysis of failure modes and detection power underscores the need for robust auditing mechanisms in the rapidly evolving landscape of LLMs.
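The core idea can be illustrated with a small sketch. The following is a minimal, hypothetical implementation of a rank-based uniformity test, assuming each response is reduced to a scalar score (for example, its log-probability under the local reference model), ties are broken at random, and uniformity of the resulting ranks is checked with a one-sample Kolmogorov–Smirnov test against Uniform(0, 1); the paper's exact scoring function, rank construction, and test statistic may differ.

```python
# Minimal sketch of a rank-based uniformity test for auditing a black-box API.
# Hypothetical assumptions: each response is reduced to a scalar via a scoring
# function chosen by the auditor, ties are broken uniformly at random, and the
# normalized ranks are tested against Uniform(0, 1) with a one-sample KS test.

import numpy as np
from scipy import stats


def rank_uniformity_audit(api_scores, local_scores, rng=None):
    """api_scores: shape (Q,), one scalar score per audited API response.
    local_scores: shape (Q, N), scores of N samples drawn from the local
    reference model for the same Q prompts.
    Returns the KS p-value; small values indicate behavioral deviation."""
    rng = np.random.default_rng() if rng is None else rng
    api_scores = np.asarray(api_scores, dtype=float)
    local_scores = np.asarray(local_scores, dtype=float)
    Q, N = local_scores.shape

    # Rank of each API score among its N local reference scores, with ties
    # broken at random so the rank is exchangeable under the null hypothesis.
    below = (local_scores < api_scores[:, None]).sum(axis=1)
    ties = (local_scores == api_scores[:, None]).sum(axis=1)
    ranks = below + rng.integers(0, ties + 1)      # integer rank in {0, ..., N}

    # Probability integral transform: if the API serves the authentic model,
    # the smoothed ranks are i.i.d. Uniform(0, 1).
    u = (ranks + rng.uniform(size=Q)) / (N + 1)

    # One-sample Kolmogorov-Smirnov test of the ranks against Uniform(0, 1).
    return stats.kstest(u, "uniform").pvalue


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic check: a matching model should give a large p-value,
    # while a shifted (substituted) model should give a small one.
    local = rng.normal(size=(200, 20))
    same = rng.normal(size=200)
    shifted = rng.normal(loc=0.8, size=200)
    print("authentic API p-value:", rank_uniformity_audit(same, local, rng))
    print("substituted API p-value:", rank_uniformity_audit(shifted, local, rng))
```

In this sketch, a small p-value flags a likely deviation from the authentic model. Because each audit prompt looks like an ordinary user query, the provider has no obvious pattern to reroute on, and increasing the number of local reference samples per prompt only refines the rank resolution rather than changing the query footprint.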

💡 Why This Paper Matters

This paper advances the auditing of LLM APIs by introducing a practical method for verifying model integrity in a black-box setting. RUT is query-efficient and robust to providers that attempt to detect and evade auditing, making it a useful tool for anyone who depends on API access to deployed models. Better detection of covert model substitution and other deceptive deployment practices directly improves the trustworthiness and safety of API-based AI systems.

🎯 Why It's Interesting for AI Security Researchers

As commercial and research applications increasingly rely on API-based large language models (LLMs), verifying the integrity and performance of the served models is essential for both responsible AI development and user safety. The paper is relevant to AI security researchers because it addresses emerging risks such as covert model substitution, malicious fine-tuning, and other adversarial provider behavior, and it provides concrete methodology for monitoring and auditing deployed AI systems.

📚 Read the Full Paper