
Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents

Authors: Zhibo Liang, Tianze Hu, Zaiye Chen, Mingjie Tang

Published: 2025-12-07

arXiv ID: 2512.06716v1

Added to Library: 2025-12-09 03:00 UTC

Red Teaming

📄 Abstract

Autonomous Large Language Model (LLM) agents exhibit significant vulnerability to Indirect Prompt Injection (IPI) attacks. These attacks hijack agent behavior by polluting external information sources, exploiting the fundamental trade-off between security and functionality in existing defense mechanisms, and lead to malicious, unauthorized tool invocations that divert agents from their original objectives. The success of complex IPIs reveals a deeper systemic fragility: although current defenses demonstrate some effectiveness, most defense architectures are inherently fragmented. Consequently, they fail to provide full integrity assurance across the entire task execution pipeline, forcing unacceptable multi-dimensional compromises among security, functionality, and efficiency. Our method is predicated on a core insight: no matter how subtle an IPI attack is, its pursuit of a malicious objective must ultimately manifest as a detectable deviation of the action trajectory from the expected legitimate plan. Based on this insight, we propose the Cognitive Control Architecture (CCA), a holistic framework for full-lifecycle cognitive supervision. CCA constructs an efficient, dual-layered defense system through two synergistic pillars: (i) proactive, preemptive control-flow and data-flow integrity enforcement via a pre-generated "Intent Graph"; and (ii) an innovative "Tiered Adjudicator" that, upon deviation detection, initiates deep reasoning based on multi-dimensional scoring and is specifically designed to counter complex conditional attacks. Experiments on the AgentDojo benchmark substantiate that CCA not only withstands sophisticated attacks that challenge other advanced defense methods but also achieves uncompromised security with notable efficiency and robustness, thereby reconciling the aforementioned multi-dimensional trade-off.
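
The abstract describes the Intent Graph only at a high level. As a minimal illustration of how pre-generated control-flow and data-flow integrity checks could gate an agent's tool calls before execution, consider the Python sketch below. All names (IntentGraph, ToolCall, arg_provenance) and the specific checks are assumptions made for this example, not the paper's actual design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """One proposed agent action: a tool name plus argument provenance."""
    tool: str
    arg_provenance: frozenset  # labels of the sources each argument came from


@dataclass
class IntentGraph:
    """Allowed tools and orderings, fixed from the user's task *before*
    the agent reads any untrusted external content."""
    allowed_tools: set
    allowed_edges: set  # permitted (prev_tool, next_tool) transitions
    trusted_sources: set = field(default_factory=lambda: {"user"})

    def check(self, prev_tool, call):
        """Return None if the call conforms to the plan, else a deviation message."""
        if call.tool not in self.allowed_tools:
            return f"tool '{call.tool}' is outside the planned tool set"
        if prev_tool is not None and (prev_tool, call.tool) not in self.allowed_edges:
            return f"transition {prev_tool} -> {call.tool} violates the planned order"
        tainted = call.arg_provenance - self.trusted_sources
        if tainted:  # data-flow check: arguments built from untrusted output
            return f"arguments derived from untrusted sources: {sorted(tainted)}"
        return None


# Example: an email-summarization plan; a payment call injected via an
# email body is flagged immediately, before it is ever executed.
graph = IntentGraph(
    allowed_tools={"read_email", "summarize"},
    allowed_edges={("read_email", "summarize")},
)
print(graph.check("read_email", ToolCall("send_money", frozenset({"email_body"}))))
# -> tool 'send_money' is outside the planned tool set
```

The essential property the sketch tries to capture is that the graph is fixed before any untrusted content is read, so an injected instruction can trigger a new tool call only by visibly deviating from the plan.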

🔍 Key Points

  • Introduces the Cognitive Control Architecture (CCA), a holistic framework for securing Large Language Model (LLM) agents against Indirect Prompt Injection (IPI) attacks.
  • CCA rests on two synergistic pillars: a proactive Intent Graph that enforces control-flow and data-flow integrity, and a Tiered Adjudicator that escalates detected deviations to deeper multi-dimensional reasoning (a sketch of this escalation loop follows the list).
  • On the AgentDojo benchmark, CCA significantly reduces the Attack Success Rate (ASR) while maintaining high task-execution capability, outperforming prior defense mechanisms in both security and efficiency.
  • By providing full-lifecycle cognitive supervision, CCA addresses the fragmented nature of existing defenses and eases the multi-dimensional trade-off between security, functionality, and efficiency.
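
As a rough sketch of the second pillar, the function below runs the cheap structural check first and escalates to a deeper, multi-dimensional judgment only when a deviation is detected. It reuses the `IntentGraph.check` interface from the earlier sketch; the scoring dimensions, the averaging rule, and the threshold are illustrative assumptions, not the paper's scheme.

```python
def adjudicate(graph, prev_tool, call, deep_judge, threshold=0.5):
    """Tier 1: cheap structural check; Tier 2: deep reasoning on deviation."""
    deviation = graph.check(prev_tool, call)
    if deviation is None:
        return "allow"  # Fast path: the call conforms to the planned trajectory.

    # Tier 2: score the flagged deviation along several dimensions in [0, 1],
    # where higher means more suspicious, then aggregate by a simple mean.
    scores = deep_judge(
        deviation=deviation,
        call=call,
        dimensions=("goal_alignment", "source_trust", "action_risk"),
    )
    mean_score = sum(scores.values()) / len(scores)
    return "block" if mean_score >= threshold else "allow_with_logging"


# A trivial stand-in for an LLM-based judge that treats every deviation as
# maximally suspicious; a real adjudicator would reason over the deviation.
def paranoid_judge(deviation, call, dimensions):
    return {d: 1.0 for d in dimensions}
```

This tiering is what keeps benign trajectories on a fast path, consistent with the paper's claim of reconciling security with efficiency: deep reasoning is paid for only when the plan is violated.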

💡 Why This Paper Matters

This paper presents the Cognitive Control Architecture (CCA), a framework for securing LLM-based agents that must operate in unpredictable environments exposed to IPI threats. Its design hardens agents against sophisticated attacks while preserving their operational effectiveness, supporting the safe deployment of autonomous AI in critical real-world tasks.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, the paper's value lies in its end-to-end defense against indirect prompt injection in LLM-based agents. Rather than adding another point defense, the CCA aims for integrity assurance across the full task execution pipeline, making it directly relevant to work on securing autonomous systems and on the trade-offs between security, functionality, and efficiency that IPI attacks exploit.

📚 Read the Full Paper

https://arxiv.org/abs/2512.06716v1