
Autonomous Action Runtime Management (AARM): A System Specification for Securing AI-Driven Actions at Runtime

Authors: Herman Errico

Published: 2026-02-10

arXiv ID: 2602.09433v1

Added to Library: 2026-02-11 03:02 UTC

📄 Abstract

As artificial intelligence systems evolve from passive assistants into autonomous agents capable of executing consequential actions, the security boundary shifts from model outputs to tool execution. Traditional security paradigms - log aggregation, perimeter defense, and post-hoc forensics - cannot protect systems where AI-driven actions are irreversible, execute at machine speed, and originate from potentially compromised orchestration layers. This paper introduces Autonomous Action Runtime Management (AARM), an open specification for securing AI-driven actions at runtime. AARM defines a runtime security system that intercepts actions before execution, accumulates session context, evaluates against policy and intent alignment, enforces authorization decisions, and records tamper-evident receipts for forensic reconstruction. We formalize a threat model addressing prompt injection, confused deputy attacks, data exfiltration, and intent drift. We introduce an action classification framework distinguishing forbidden, context-dependent deny, and context-dependent allow actions. We propose four implementation architectures - protocol gateway, SDK instrumentation, kernel eBPF, and vendor integration - with distinct trust properties, and specify minimum conformance requirements for AARM-compliant systems. AARM is model-agnostic, framework-agnostic, and vendor-neutral, treating action execution as the stable security boundary. This specification aims to establish industry-wide requirements before proprietary fragmentation forecloses interoperability.
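The runtime pipeline the abstract describes (intercept before execution, accumulate session context, evaluate against policy, enforce a decision, record a tamper-evident receipt) and the three-way action classification (forbidden, context-dependent deny, context-dependent allow) can be illustrated with a minimal sketch. This is not the paper's reference implementation; the tool names, context flags, and classification rules below are hypothetical, and the receipt chain uses a simple SHA-256 hash chain as one possible tamper-evidence mechanism.

```python
import hashlib
import json
from dataclasses import dataclass, field
from enum import Enum


class ActionClass(Enum):
    FORBIDDEN = "forbidden"                      # never executed
    CONTEXT_DEPENDENT_DENY = "ctx_deny"          # denied unless context justifies
    CONTEXT_DEPENDENT_ALLOW = "ctx_allow"        # allowed unless context flags it


@dataclass
class Session:
    """Accumulated session context plus a hash-chained receipt log."""
    context: list = field(default_factory=list)
    receipts: list = field(default_factory=list)
    last_hash: str = "0" * 64

    def record_receipt(self, action: dict, decision: str) -> None:
        # Tamper-evident: each receipt commits to the previous receipt's
        # digest, so altering any entry breaks every later hash.
        body = json.dumps(
            {"action": action, "decision": decision, "prev": self.last_hash},
            sort_keys=True,
        )
        self.last_hash = hashlib.sha256(body.encode()).hexdigest()
        self.receipts.append({"body": body, "hash": self.last_hash})


def classify(action: dict) -> ActionClass:
    # Hypothetical classification rules, for illustration only.
    if action["tool"] == "delete_prod_db":
        return ActionClass.FORBIDDEN
    if action["tool"] == "send_email":
        return ActionClass.CONTEXT_DEPENDENT_DENY
    return ActionClass.CONTEXT_DEPENDENT_ALLOW


def authorize(session: Session, action: dict) -> str:
    """Intercept an action, evaluate it against policy + context, enforce."""
    cls = classify(action)
    if cls is ActionClass.FORBIDDEN:
        decision = "deny"
    elif cls is ActionClass.CONTEXT_DEPENDENT_DENY:
        decision = "allow" if "user_approved_email" in session.context else "deny"
    else:  # CONTEXT_DEPENDENT_ALLOW
        decision = "deny" if "exfil_flagged" in session.context else "allow"
    session.record_receipt(action, decision)  # recorded even for denials
    return decision
```

Note the ordering: classification and context evaluation happen before the tool ever runs, and the receipt is recorded for both allowed and denied actions, so the chain supports forensic reconstruction of the full session rather than only of executed actions.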

🔍 Key Points

  • Introduces AARM, an open, model-agnostic, framework-agnostic, and vendor-neutral specification that treats action execution, rather than model output, as the stable security boundary for autonomous agents.
  • Defines a runtime pipeline that intercepts actions before execution, accumulates session context, evaluates against policy and intent alignment, enforces authorization decisions, and records tamper-evident receipts for forensic reconstruction.
  • Formalizes a threat model covering prompt injection, confused deputy attacks, data exfiltration, and intent drift.
  • Proposes an action classification framework distinguishing forbidden, context-dependent deny, and context-dependent allow actions.
  • Specifies four implementation architectures (protocol gateway, SDK instrumentation, kernel eBPF, and vendor integration) with distinct trust properties, along with minimum conformance requirements for AARM-compliant systems.

💡 Why This Paper Matters

As AI systems evolve from passive assistants into autonomous agents that execute consequential, often irreversible actions at machine speed, traditional security paradigms such as log aggregation, perimeter defense, and post-hoc forensics no longer suffice. By specifying runtime interception, policy evaluation, and tamper-evident receipts at the point of action execution, AARM aims to establish industry-wide requirements before proprietary fragmentation forecloses interoperability.

🎯 Why It's Interesting for AI Security Researchers

This work is particularly relevant to AI security researchers because it formalizes a threat model, covering prompt injection, confused deputy attacks, data exfiltration, and intent drift, that gives the field a shared vocabulary for agent-level attacks on tool execution. The four proposed implementation architectures, each with distinct trust properties, and the minimum conformance requirements provide concrete targets for evaluating and comparing runtime enforcement systems, while the vendor-neutral framing invites interoperable rather than proprietary defenses.
