
Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

Authors: Junjie Chu, Xinyue Shen, Ye Leng, Michael Backes, Yun Shen, Yang Zhang

Published: 2026-03-03

arXiv ID: 2603.04459v2

Added to Library: 2026-03-13 03:04 UTC

📄 Abstract

The rapid growth of research in LLM safety makes it hard to track all advances. Benchmarks are therefore crucial for capturing key trends and enabling systematic comparisons. Yet, it remains unclear why certain benchmarks gain prominence, and no systematic assessment has been conducted on their academic influence or code quality. This paper fills this gap by presenting the first multi-dimensional evaluation of the influence (based on five metrics) and code quality (based on both automated and human assessment) of LLM safety benchmarks, analyzing 31 benchmarks and 382 non-benchmarks across prompt injection, jailbreak, and hallucination. We find that benchmark papers show no significant advantage in academic influence (e.g., citation count and density) over non-benchmark papers. We uncover a key misalignment: while author prominence correlates with paper influence, neither author prominence nor paper influence shows a significant correlation with code quality. Our results also indicate substantial room for improvement in code and supplementary materials: only 39% of repositories are ready-to-use, 16% include flawless installation guides, and a mere 6% address ethical considerations. Given that the work of prominent researchers tends to attract greater attention, they need to lead the effort in setting higher standards.
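The abstract mentions an automated component in the code-quality assessment. As a rough illustration only, the sketch below checks a repository directory for common quality signals (README, installation guide, license, ethics statement); the file names, check categories, and logic are assumptions for illustration, not the authors' actual rubric.

```python
import os

# Hypothetical quality signals, loosely inspired by the paper's themes
# (ready-to-use code, installation guides, ethical considerations).
# These file lists are assumptions, not the paper's criteria.
CHECKS = {
    "has_readme": ["README.md", "README.rst", "README.txt"],
    "has_install_guide": ["INSTALL.md", "requirements.txt",
                          "setup.py", "pyproject.toml"],
    "has_license": ["LICENSE", "LICENSE.md", "LICENSE.txt"],
    "has_ethics_statement": ["ETHICS.md"],
}

def assess_repo(path: str) -> dict:
    """Return one boolean per check, based on files in the repo root."""
    try:
        entries = set(os.listdir(path))
    except FileNotFoundError:
        entries = set()
    return {check: any(name in entries for name in names)
            for check, names in CHECKS.items()}
```

A real assessment would go much further (e.g., attempting installation, running examples, human review), but a file-presence scan like this is a typical cheap first pass over hundreds of repositories.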

🔍 Key Points

  • First multi-dimensional evaluation of LLM safety benchmarks, measuring academic influence (five metrics) and code quality (automated and human assessment) across 31 benchmark and 382 non-benchmark papers in prompt injection, jailbreak, and hallucination.
  • Benchmark papers show no significant advantage in academic influence (e.g., citation count and density) over non-benchmark papers.
  • A key misalignment: author prominence correlates with paper influence, but neither author prominence nor paper influence correlates significantly with code quality.
  • Substantial room for improvement in code and supplementary materials: only 39% of repositories are ready-to-use, 16% include flawless installation guides, and 6% address ethical considerations.
  • Calls on prominent researchers, whose work attracts the most attention, to lead in setting higher standards for benchmark code and documentation.

💡 Why This Paper Matters

Benchmarks steer how the LLM safety community tracks progress and compares methods, yet until now no systematic assessment existed of why certain benchmarks gain prominence or whether their code is dependable. This paper fills that gap with the first multi-dimensional evaluation of both academic influence and code quality. Its central finding, that neither author prominence nor paper influence predicts code quality, combined with evidence that most benchmark repositories fall short on usability, installation documentation, and ethical considerations, gives the community concrete grounds for raising the standards of benchmark releases.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of great interest to AI security researchers because safety benchmarks for prompt injection, jailbreak, and hallucination directly shape which attacks and defenses are considered effective. The finding that only 39% of benchmark repositories are ready-to-use, and that influence metrics such as citation count do not track code quality, warns researchers against equating a benchmark's popularity with its reliability. It motivates more careful vetting of the artifacts underpinning safety evaluations, and places responsibility on prominent researchers to set higher standards for the benchmarks the field depends on.
