← Back to Library

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

Authors: Jarrod Barnes

Published: 2026-01-28

arXiv ID: 2601.21083v2

Added to Library: 2026-02-03 08:09 UTC

Red Teaming

📄 Abstract

As large language models improve, so do their offensive applications: frontier agents now generate working exploits for under $50 in compute (Heelan, 2026). Defensive incident response (IR) agents must keep pace, but existing benchmarks conflate action execution with correct execution, hiding calibration failures when agents process adversarial evidence. We introduce OpenSec, a dual-control reinforcement learning environment that evaluates IR agents under realistic prompt injection scenarios. Unlike static capability benchmarks, OpenSec scores world-state-changing containment actions under adversarial evidence via execution-based metrics: time-to-first-containment (TTFC), blast radius (false positives per episode), and injection violation rates. Evaluating four frontier models on 40 standard-tier episodes, we find consistent over-triggering in this setting: GPT-5.2, Gemini 3, and DeepSeek execute containment in 100% of episodes with 90-97% false positive rates. Claude Sonnet 4.5 shows partial calibration (85% containment, 72% FP), demonstrating that OpenSec surfaces a calibration failure mode hidden by aggregate success metrics. Code available at https://github.com/jbarnes850/opensec-env.

🔍 Key Points

  • Introduction of OpenSec, a dual-control reinforcement learning environment that assesses incident response agents under adversarial evidence, addressing the shortcomings of existing benchmarks that confuse action execution and correctness.
  • Evaluation of four frontier language models reveals high false positive rates among agents, with GPT-5.2, Gemini 3, and DeepSeek showing 90-97% false positives for containment actions despite achieving high overall containment rates.
  • Demonstration that OpenSec effectively surfaces calibration failures that are often hidden in traditional metrics, particularly showing that Sonnet 4.5 has better calibration than other models despite all achieving high containment rates.
  • Development of a scoring system based on execution rather than reported success, thereby providing more realistic assessments of agent performance in incident response scenarios.
  • Establishment of trust-aware scenario designs and injection vulnerability metrics, which highlight the importance of accurate evidence processing in real-world cybersecurity incidents.

💡 Why This Paper Matters

This paper is an important contribution to the field of cybersecurity and AI as it provides a novel framework (OpenSec) to evaluate incident response agents against adversarial scenarios. By revealing the calibration failures of leading models, it emphasizes the critical need for reliable agents capable of distinguishing genuine threats from false alarms. The findings highlight significant challenges in the deployment of AI in cybersecurity, urging the need for refined evaluation methods that align closely with operational realities.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper valuable due to its innovative approach to benchmarking incident response agents, particularly in adversarial contexts. The insights gained from OpenSec about model calibration, false positives, and the evaluation of agent behaviors offer critical implications for improving AI reliability in cybersecurity applications. The findings also raise awareness about the limitations of conventional evaluation metrics, prompting further investigation into robust AI solutions for defense against evolving cyber threats.

📚 Read the Full Paper