
LLM Robustness Leaderboard v1 - Technical Report

Authors: Pierre Peigné-Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe

Published: 2025-08-08

arXiv ID: 2508.06296v2

Added to Library: 2025-08-14 23:13 UTC

Red Teaming

📄 Abstract

This technical report accompanies the LLM robustness leaderboard published by PRISM Eval for the Paris AI Action Summit. We introduce PRISM Eval Behavior Elicitation Tool (BET), an AI system performing automated red-teaming through Dynamic Adversarial Optimization that achieves 100% Attack Success Rate (ASR) against 37 of 41 state-of-the-art LLMs. Beyond binary success metrics, we propose a fine-grained robustness metric estimating the average number of attempts required to elicit harmful behaviors, revealing that attack difficulty varies by over 300-fold across models despite universal vulnerability. We introduce primitive-level vulnerability analysis to identify which jailbreaking techniques are most effective for specific hazard categories. Our collaborative evaluation with trusted third parties from the AI Safety Network demonstrates practical pathways for distributed robustness assessment across the community.
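The report describes its fine-grained robustness metric only at a high level here. The following is a minimal sketch, not PRISM Eval's implementation, assuming hypothetical per-model attack logs, of how an Attack Success Rate and an average attempts-to-elicitation score could be computed side by side:

```python
# Sketch only: assumed data model for attack logs, not BET's actual output format.
from dataclasses import dataclass
from statistics import mean

@dataclass
class AttackRun:
    model: str
    behavior: str        # targeted harmful behavior
    attempts_used: int   # adversarial prompts tried before success (or until budget exhausted)
    succeeded: bool      # did any attempt elicit the behavior?

def attack_success_rate(runs: list[AttackRun]) -> float:
    """Fraction of targeted behaviors that were successfully elicited."""
    return sum(r.succeeded for r in runs) / len(runs)

def avg_attempts_to_elicit(runs: list[AttackRun]) -> float:
    """Average number of attempts on successful runs; higher = harder to attack."""
    successes = [r.attempts_used for r in runs if r.succeeded]
    return mean(successes) if successes else float("inf")

# Hypothetical logs for two models: both fully vulnerable (100% ASR),
# but with very different attack difficulty.
logs = [
    AttackRun("model-a", "b1", attempts_used=2,   succeeded=True),
    AttackRun("model-a", "b2", attempts_used=3,   succeeded=True),
    AttackRun("model-b", "b1", attempts_used=450, succeeded=True),
    AttackRun("model-b", "b2", attempts_used=620, succeeded=True),
]
for name in ("model-a", "model-b"):
    runs = [r for r in logs if r.model == name]
    print(name, attack_success_rate(runs), avg_attempts_to_elicit(runs))
```

Separating the two numbers is the point of the metric: two models can both sit at 100% ASR while differing by orders of magnitude in how many attempts an attacker needs.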

🔍 Key Points

  • Introduction of the PRISM Eval Behavior Elicitation Tool (BET) for automated red-teaming of LLMs through Dynamic Adversarial Optimization, achieving a 100% Attack Success Rate against 37 of 41 state-of-the-art models.
  • Proposal of a fine-grained robustness metric that estimates the average number of attempts required to elicit harmful behaviors from LLMs, revealing an over 300-fold variation in attack difficulty across models despite near-universal vulnerability.
  • Development of primitive-level vulnerability analysis that identifies how effective individual jailbreaking techniques are across different hazard categories, enabling a more nuanced understanding of model vulnerabilities (see the sketch after this list).
  • Validation of the LLM evaluator against human judgments, achieving 91.58% agreement and demonstrating the reliability of automated assessments.
  • Collaborative evaluation methodology demonstrating the potential for distributed and objective assessments across multiple organizations, suggesting pathways for community-wide robustness testing.
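For the primitive-level analysis, the report does not publish its aggregation code; a minimal sketch, assuming a hypothetical record format of (primitive, hazard category, outcome), of how per-primitive success rates could be broken down by hazard category:

```python
# Sketch only: the primitive and category names below are illustrative, not from the paper.
from collections import defaultdict

# Each record: (jailbreak primitive, hazard category, attack succeeded?)
records = [
    ("roleplay_framing", "cbrn",       True),
    ("roleplay_framing", "cbrn",       False),
    ("payload_encoding", "cbrn",       True),
    ("roleplay_framing", "cybercrime", True),
    ("payload_encoding", "cybercrime", False),
]

def primitive_effectiveness(records):
    """Return {(primitive, category): success_rate} over the given records."""
    totals, hits = defaultdict(int), defaultdict(int)
    for primitive, category, ok in records:
        key = (primitive, category)
        totals[key] += 1
        hits[key] += ok
    return {key: hits[key] / totals[key] for key in totals}

for (primitive, category), rate in sorted(primitive_effectiveness(records).items()):
    print(f"{category:>10} | {primitive:<16} | {rate:.0%}")
```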

💡 Why This Paper Matters

This technical report advances the state of LLM safety evaluation by introducing methodologies for systematically assessing robustness against adversarial threats. The finding that nearly all current models can be jailbroken, while differing widely in how much effort that takes, underscores the urgency of improved safety measures and makes this work directly relevant to AI developers and researchers.

🎯 Why It's Interesting for AI Security Researchers

This paper is of great interest to AI security researchers as it not only addresses the current challenges in LLM safety evaluation but also provides practical methodologies and metrics for understanding the vulnerabilities of LLMs. By revealing how different models withstand adversarial inputs, it lays the groundwork for future research in enhancing model robustness and informs the development of more effective safety mechanisms.

📚 Read the Full Paper