
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

Authors: Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger

Published: 2025-06-08

arXiv ID: 2506.06975v2

Added to Library: 2025-06-11 01:00 UTC

Red Teaming

📄 Abstract

As API access becomes a primary interface to large language models (LLMs), users often interact with black-box systems that offer little transparency into the deployed model. To reduce costs or maliciously alter model behaviors, API providers may discreetly serve quantized or fine-tuned variants, which can degrade performance and compromise safety. Detecting such substitutions is difficult, as users lack access to model weights and, in most cases, even output logits. To tackle this problem, we propose a rank-based uniformity test that can verify the behavioral equality of a black-box LLM to a locally deployed authentic model. Our method is accurate, query-efficient, and avoids detectable query patterns, making it robust to adversarial providers that reroute or mix responses upon the detection of testing attempts. We evaluate the approach across diverse threat scenarios, including quantization, harmful fine-tuning, jailbreak prompts, and full model substitution, showing that it consistently achieves superior statistical power over prior methods under constrained query budgets.
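
To make the core idea concrete, the snippet below is a minimal sketch of the per-prompt rank statistic such a test can be built on, assuming the auditor scores each response under the locally deployed reference model. The function name, the scoring rule, and the randomized tie-breaking are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch (not the paper's implementation): the rank of the black-box
# API response's score among N samples drawn from the trusted local model.
import numpy as np

def per_prompt_rank(api_score: float, local_scores: np.ndarray,
                    rng: np.random.Generator) -> float:
    """Map the API response's score to a rank in (0, 1), breaking ties at random."""
    n = len(local_scores)
    below = int(np.sum(local_scores < api_score))   # local samples scored below the API response
    ties = int(np.sum(local_scores == api_score))   # ties broken uniformly at random
    r = below + rng.integers(0, ties + 1)
    return (r + rng.uniform()) / (n + 1)            # Uniform(0, 1) under the null
```

Under the null hypothesis that the API serves the authentic model, the API response is exchangeable with the local samples, so each collected rank is Uniform(0, 1); systematic deviation from uniformity across many prompts is evidence of substitution.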

🔍 Key Points

  • Introduction of a novel Rank-Based Uniformity Test (RUT) for auditing LLM APIs, which evaluates the behavioral equality of a black-box API against a trusted, locally deployed reference model.
  • RUT demonstrates superior statistical power over existing methods such as Maximum Mean Discrepancy (MMD) and Kolmogorov-Smirnov (KS) tests, particularly under constrained query budgets and across varied threat scenarios.
  • Comprehensive empirical validation covers diverse model-substitution threats, including quantization, harmful fine-tuning, jailbreak prompts, and complete model replacement, confirming the test's efficacy in detecting each class of substitution.
  • The test maintains high statistical power even under adversarial manipulation by API providers, and it is query-efficient, requiring only one API call per prompt (see the end-to-end sketch after this list).
  • The approach includes real-world evaluations of live commercial LLM APIs, demonstrating practical applicability for ensuring model integrity.
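
The query-efficiency point above can be illustrated with a hypothetical end-to-end audit loop: exactly one black-box call per prompt, N local samples per prompt, and a standard uniformity test on the collected ranks. Here query_api, sample_local, and score are placeholder callables supplied by the auditor, and the Kolmogorov-Smirnov test against Uniform(0, 1) is one natural choice of uniformity test; this is a sketch under those assumptions, not the paper's implementation.

```python
# Hypothetical audit loop (a sketch under assumed interfaces, not the paper's code).
# query_api(prompt) -> str, sample_local(prompt, n) -> list[str], and
# score(prompt, text) -> float are placeholders supplied by the auditor.
import numpy as np
from scipy import stats

def audit(prompts, query_api, sample_local, score, n_local=64, alpha=0.05, seed=0):
    """Collect one rank per prompt, then test the ranks for uniformity."""
    rng = np.random.default_rng(seed)
    ranks = []
    for prompt in prompts:
        api_text = query_api(prompt)                  # exactly one black-box API call per prompt
        local_texts = sample_local(prompt, n_local)   # samples from the trusted local model
        s_api = score(prompt, api_text)
        s_loc = np.array([score(prompt, t) for t in local_texts])
        below = int(np.sum(s_loc < s_api))
        ties = int(np.sum(s_loc == s_api))
        r = below + rng.integers(0, ties + 1)         # randomized tie-breaking
        ranks.append((r + rng.uniform()) / (n_local + 1))
    # Under behavioral equality the ranks should be Uniform(0, 1); test that directly.
    p_value = stats.kstest(ranks, "uniform").pvalue
    return p_value, p_value < alpha                   # True in the second slot -> flag substitution
```

Because each audited prompt produces a single ordinary-looking request, the query pattern is hard for a provider to distinguish from normal traffic, which is what the paper leverages to resist providers that reroute or mix responses once they suspect an audit.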

💡 Why This Paper Matters

This paper addresses a significant vulnerability in how LLMs are consumed today: API access offers no visibility into which model is actually being served. The proposed RUT enables users to reliably audit the behavior of a black-box API, verifying that they receive the model they expect while guarding against silent quantization, fine-tuning, or outright substitution. By providing a robust, query-efficient way to protect the integrity of AI services, this research contributes to safer and more trustworthy AI deployments.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, the paper highlights a concrete attack surface: the opaque API interfaces through which most large language models are consumed. Its methodology and experiments, including evaluations against adversarial providers that reroute or mix responses when they detect testing, provide practical tools to detect and mitigate model substitutions and malicious modifications, a capability that is essential for maintaining security in deployed AI applications.

📚 Read the Full Paper