
Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice

Authors: Yuxu Ge

Published: 2026-03-07

arXiv ID: 2603.07191v1

Added to Library: 2026-03-10 03:00 UTC

Red Teaming

📄 Abstract

Autonomous agents powered by large language models introduce a class of execution-layer vulnerabilities -- prompt injection, retrieval poisoning, and uncontrolled tool invocation -- that existing guardrails fail to address systematically. In this work, we propose the Layered Governance Architecture (LGA), a four-layer framework comprising execution sandboxing (L1), intent verification (L2), zero-trust inter-agent authorization (L3), and immutable audit logging (L4). To evaluate LGA, we construct a bilingual benchmark (Chinese original, English via machine translation) of 1,081 tool-call samples -- covering prompt injection, RAG poisoning, and malicious skill plugins -- and apply it to OpenClaw, a representative open-source agent framework. Experimental results on Layer 2 intent verification with four local LLM judges (Qwen3.5-4B, Llama-3.1-8B, Qwen3.5-9B, Qwen2.5-14B) and one cloud judge (GPT-4o-mini) show that all five LLM judges intercept 93.0-98.5% of TC1/TC2 malicious tool calls, while lightweight NLI baselines remain below 10%. TC3 (malicious skill plugins) proves harder, with interception rates (IR) of 75-94% among judges with meaningful precision-recall balance, motivating complementary enforcement at Layers 1 and 3. Qwen2.5-14B achieves the best local balance (98% IR, approximately 10-20% false-positive rate (FPR)); a two-stage cascade (Qwen3.5-9B -> GPT-4o-mini) achieves 91.9-92.6% IR with 1.9-6.7% FPR; a fully local cascade (Qwen3.5-9B -> Qwen2.5-14B) achieves 94.7-95.6% IR with 6.0-9.7% FPR for data-sovereign deployments. An end-to-end pipeline evaluation (n=100) demonstrates that all four layers operate in concert with 96% IR and a total P50 latency of approximately 980 ms, of which the non-judge layers contribute only approximately 18 ms. Generalization to the external InjecAgent benchmark yields 99-100% interception, confirming robustness beyond our synthetic data.
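The two-stage cascade described above (a fast local judge screening every call, with low-confidence verdicts escalated to a stronger judge) can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the judge interface (a callable returning a verdict and a confidence score) and the `escalate_below` threshold are assumptions for the sketch.

```python
# Hypothetical sketch of a two-stage judge cascade for intent verification.
# Each judge is any callable: tool_call -> (is_malicious: bool, confidence: float).
# Stage 1 (e.g. a fast local model) screens every call; only low-confidence
# verdicts are escalated to stage 2 (e.g. a stronger, slower judge).

def cascade_verify(tool_call, stage1, stage2, escalate_below=0.8):
    """Return True if the tool call should be intercepted."""
    malicious, confidence = stage1(tool_call)
    if confidence >= escalate_below:
        return malicious  # stage 1 is confident; accept its verdict
    # Uncertain: defer to the stronger second-stage judge.
    malicious, _ = stage2(tool_call)
    return malicious
```

In this design, the cheap judge bounds the common-case latency while the strong judge only pays its cost on ambiguous calls, which is the trade-off the reported cascade IR/FPR numbers reflect.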

🔍 Key Points

  • The paper introduces the Layered Governance Architecture (LGA), a four-layer framework designed to mitigate execution-layer vulnerabilities in autonomous agents powered by large language models.
  • LGA consists of execution sandboxing (L1), intent verification (L2), zero-trust inter-agent authorization (L3), and immutable audit logging (L4) to ensure a comprehensive defense against security threats.
  • The experimental evaluation demonstrates that intent verification alone intercepts 93.0-98.5% of malicious tool calls, while lightweight natural language inference baselines intercept fewer than 10%.
  • The paper identifies three specific execution-layer threat classes (prompt injection, retrieval-augmented generation (RAG) poisoning, and malicious skill plugins) that current security measures inadequately address.
  • An end-to-end pipeline evaluation incorporating all four layers of LGA achieved a combined 96% interception rate at a P50 latency of approximately 980 ms, of which the non-judge layers contribute only about 18 ms, validating the architecture's efficiency.

💡 Why This Paper Matters

This paper advances the governance and security of autonomous agent systems by formalizing a comprehensive architecture (LGA) that systematically addresses critical execution-layer vulnerabilities. Its findings are timely: autonomous agents are being deployed at scale while the attacks the paper catalogs -- prompt injection, retrieval poisoning, and malicious plugins -- remain poorly mitigated. By implementing the proposed layers of defense, practitioners can harden their systems against sophisticated attacks while keeping operational latency acceptable.

🎯 Why It's Interesting for AI Security Researchers

The paper's taxonomy of execution-layer vulnerabilities and the governance framework it proposes are directly useful to AI security researchers, particularly as security incidents involving autonomous systems become more common. Its methodology and findings offer concrete guidance for building more resilient AI applications, contributing to the broader conversation about safety and compliance in AI deployments.

📚 Read the Full Paper