← Back to Library

Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

Authors: Saikat Maiti

Published: 2026-03-18

arXiv ID: 2603.17419v1

Added to Library: 2026-03-19 02:01 UTC

Red Teaming

📄 Abstract

Autonomous AI agents powered by large language models are being deployed in production with capabilities including shell execution, file system access, database queries, and multi-party communication. Recent red teaming research demonstrates that these agents exhibit critical vulnerabilities in realistic settings: unauthorized compliance with non-owner instructions, sensitive information disclosure, identity spoofing, cross-agent propagation of unsafe practices, and indirect prompt injection through external resources [7]. In healthcare environments processing Protected Health Information, every such vulnerability becomes a potential HIPAA violation. This paper presents a security architecture deployed for nine autonomous AI agents in production at a healthcare technology company. We develop a six-domain threat model for agentic AI in healthcare covering credential exposure, execution capability abuse, network egress exfiltration, prompt integrity failures, database access risks, and fleet configuration drift. We implement four-layer defense in depth: (1) kernel level workload isolation using gVisor on Kubernetes, (2) credential proxy sidecars preventing agent containers from accessing raw secrets, (3) network egress policies restricting each agent to allowlisted destinations, and (4) a prompt integrity framework with structured metadata envelopes and untrusted content labeling. We report results from 90 days of deployment including four HIGH severity findings discovered and remediated by an automated security audit agent, progressive fleet hardening across three VM image generations, and defense coverage mapped to all eleven attack patterns from recent literature. All configurations, audit tooling, and the prompt integrity framework are released as open source.

🔍 Key Points

  • Development of a six-domain threat model for autonomous AI agents in healthcare, mapping vulnerabilities to HIPAA provisions.
  • Implementation of a comprehensive four-layer security architecture that includes kernel isolation, credential proxy, network egress policies, and a prompt integrity framework.
  • Deployment of an automated security audit agent that continuously monitors the fleet of AI agents, resulting in rapid identification and remediation of security vulnerabilities.
  • Empirical results showing progressive hardening of a production fleet over 90 days, addressing high-severity security findings effectively.
  • Open sourcing of the security architecture, tools, and findings to promote transparency and collaboration in tackling AI security challenges.

💡 Why This Paper Matters

The paper details a critical approach to securing autonomous AI agents in healthcare, emphasizing the importance of addressing vulnerabilities associated with the handling of sensitive data under the HIPAA regulations. By showcasing a robust security architecture and continuous monitoring mechanisms, it provides a significant contribution to the development of secure AI systems. This architecture not only mitigates risks but also fosters trust in the deployment of AI in sensitive environments, ultimately improving patient safety and data integrity.

🎯 Why It's Interesting for AI Security Researchers

This paper is highly relevant to AI security researchers as it addresses the emerging security challenges posed by autonomous AI agents, particularly in regulated environments such as healthcare. Researchers can benefit from the detailed threat model, innovative defense mechanisms, and practical implementation experiences shared. Moreover, the release of open-source tools and configurations enables further research and development in building secure AI systems, promoting collaboration within the AI security community.

📚 Read the Full Paper