
Quantifying Return on Security Controls in LLM Systems

Authors: Richard Helder Moulton, Austin O'Brien, John D. Hastings

Published: 2025-12-17

arXiv ID: 2512.15081v1

Added to Library: 2026-01-07 10:10 UTC

Red Teaming

📄 Abstract

Although large language models (LLMs) are increasingly used in security-critical workflows, practitioners lack quantitative guidance on which safeguards are worth deploying. This paper introduces a decision-oriented framework and reproducible methodology that together quantify residual risk, convert adversarial probe outcomes into financial risk estimates and return-on-control (RoC) metrics, and enable monetary comparison of layered defenses for LLM-based systems. A retrieval-augmented generation (RAG) service is instantiated using the DeepSeek-R1 model over a corpus containing synthetic personally identifiable information (PII), and subjected to automated attacks with Garak across five vulnerability classes: PII leakage, latent context injection, prompt injection, adversarial attack generation, and divergence. For each (vulnerability, control) pair, attack success probabilities are estimated via Laplace's Rule of Succession and combined with loss triangle distributions, calibrated from public breach-cost data, in 10,000-run Monte Carlo simulations to produce loss exceedance curves and expected losses. Three widely used mitigations are then compared to a baseline RAG configuration: attribute-based access control (ABAC), named entity recognition (NER) redaction using Microsoft Presidio, and NeMo Guardrails. The baseline system exhibits very high attack success rates (>= 0.98 for PII, latent injection, and prompt injection), yielding a total simulated expected loss of $313k per attack scenario. ABAC collapses success probabilities for PII and prompt-related attacks to near zero and reduces the total expected loss by ~94%, achieving an RoC of 9.83. NER redaction likewise eliminates PII leakage and attains an RoC of 5.97, while NeMo Guardrails provides only marginal benefit (RoC of 0.05).
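
The risk-quantification pipeline described in the abstract (probe outcomes, Laplace's Rule of Succession, triangular loss severities, 10,000-run Monte Carlo, expected loss and loss exceedance curve) can be summarized in a short sketch. This is a minimal illustration rather than the paper's code: the probe counts and the triangle parameters (low/mode/high) below are placeholder values, not the calibrated breach-cost figures used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def laplace_success_probability(successes: int, trials: int) -> float:
    """Laplace's Rule of Succession: (s + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)

def simulate_losses(p_success: float, low: float, mode: float, high: float,
                    n_runs: int = 10_000) -> np.ndarray:
    """Monte Carlo loss per attack scenario: a loss is incurred only when the
    attack succeeds; severity is drawn from a triangular distribution."""
    attack_succeeds = rng.random(n_runs) < p_success
    severity = rng.triangular(low, mode, high, size=n_runs)
    return np.where(attack_succeeds, severity, 0.0)

# Illustrative probe results for one (vulnerability, control) pair.
p = laplace_success_probability(successes=98, trials=100)        # ~0.97
losses = simulate_losses(p, low=50_000, mode=100_000, high=250_000)

expected_loss = losses.mean()
print(f"attack success probability: {p:.3f}")
print(f"simulated expected loss:    ${expected_loss:,.0f}")

# Loss exceedance curve: P(loss >= threshold) over a range of thresholds.
thresholds = np.linspace(0, losses.max(), 50)
exceedance = [(losses >= t).mean() for t in thresholds]
```

Summing the expected losses across vulnerability classes for a given control, and comparing against the baseline configuration, yields the figures that feed the RoC comparison reported in the abstract.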

🔍 Key Points

  • Introduces a decision-oriented framework for quantifying residual risk in LLM systems through a reproducible methodology that assesses the effectiveness of different security controls.
  • Demonstrates the empirical evaluation of three common mitigation strategies (ABAC, NER, and NeMo Guardrails) against multiple vulnerability classes using adversarial testing and Monte Carlo simulations.
  • Findings show that ABAC reduces attack success rates for PII and prompt-related attacks to near zero, leading to a substantial reduction in projected financial losses and achieving an RoC of 9.83, indicating high cost-effectiveness (see the RoC sketch after this list).
  • NER effectively eliminates PII leakage but is less effective against other vulnerabilities, achieving an RoC of 5.97, highlighting the need for layered security approaches.
  • NeMo Guardrails provides only marginal benefit, illustrating the limitations of output-only controls when significant vulnerabilities remain in the model's reasoning and retrieved context.
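
The RoC figures above can be read as a ROSI-style ratio. The exact formula is not spelled out in this summary, so the sketch below assumes the common definition: (expected-loss reduction minus control cost) divided by control cost. The control cost used here is a hypothetical figure chosen only to show how an RoC near 9.83 could arise from the $313k baseline expected loss and the ~94% reduction reported for ABAC.

```python
def return_on_control(baseline_loss: float, residual_loss: float,
                      control_cost: float) -> float:
    """ROSI-style return on control: net risk reduction per dollar spent.
    Assumed definition; the paper's exact formula may differ."""
    risk_reduction = baseline_loss - residual_loss
    return (risk_reduction - control_cost) / control_cost

# Placeholder figures: $313k baseline expected loss per attack scenario,
# ~94% reduction (as reported for ABAC), and a hypothetical control cost.
roc = return_on_control(baseline_loss=313_000,
                        residual_loss=313_000 * 0.06,
                        control_cost=27_000)
print(f"RoC: {roc:.2f}")   # ~9.9 with these placeholder inputs
```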

💡 Why This Paper Matters

This paper is highly relevant as it provides a systematic approach to quantify and compare the effectiveness of security controls in large language models, thereby aiding organizations in making informed, data-driven decisions regarding the deployment of such systems in security-critical environments. The empirical results underscore the importance of implementing robust controls, particularly ABAC, to mitigate serious vulnerabilities effectively.

🎯 Why It's Interesting for AI Security Researchers

The work is particularly interesting to AI security researchers as it bridges the gap between theoretical vulnerability identification and practical, quantitative assessment of security controls. By applying advanced probabilistic methods and simulations, the research offers actionable insights that can enhance the security posture of LLMs, which are becoming increasingly integrated into sensitive workflows. Furthermore, the comprehensive evaluation of different mitigation strategies contributes valuable knowledge to the ongoing discourse on AI robustness and trustworthiness.
