Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework

Authors: Alsharif Abuadbba, Nazatul Sultan, Surya Nepal, Sanjay Jha

Published: 2026-02-02

arXiv ID: 2602.01942v1

Added to Library: 2026-02-03 08:00 UTC

📄 Abstract

AI is moving from domain-specific autonomy in closed, predictable settings to large-language-model-driven agents that plan and act in open, cross-organizational environments. As a result, the cybersecurity risk landscape is changing in fundamental ways. Agentic AI systems can plan, act, collaborate, and persist over time, functioning as participants in complex socio-technical ecosystems rather than as isolated software components. Although recent work has strengthened defenses against model and pipeline level vulnerabilities such as prompt injection, data poisoning, and tool misuse, these system centric approaches may fail to capture risks that arise from autonomy, interaction, and emergent behavior. This article introduces the 4C Framework for multi-agent AI security, inspired by societal governance. It organizes agentic risks across four interdependent dimensions: Core (system, infrastructure, and environmental integrity), Connection (communication, coordination, and trust), Cognition (belief, goal, and reasoning integrity), and Compliance (ethical, legal, and institutional governance). By shifting AI security from a narrow focus on system-centric protection to the broader preservation of behavioral integrity and intent, the framework complements existing AI security strategies and offers a principled foundation for building agentic AI systems that are trustworthy, governable, and aligned with human values.

πŸ” Key Points

  • Introduces the 4C Framework for agentic AI security, inspired by human societal governance, to address risks that arise as LLM-driven agents move from closed, predictable settings into open, cross-organizational environments.
  • Organizes agentic risks across four interdependent dimensions: Core (system, infrastructure, and environmental integrity), Connection (communication, coordination, and trust), Cognition (belief, goal, and reasoning integrity), and Compliance (ethical, legal, and institutional governance).
  • Argues that system-centric defenses against model- and pipeline-level vulnerabilities such as prompt injection, data poisoning, and tool misuse fail to capture risks stemming from autonomy, interaction, and emergent behavior.
  • Treats agentic AI systems as participants in complex socio-technical ecosystems rather than as isolated software components, since they can plan, act, collaborate, and persist over time.
  • Shifts the focus of AI security from narrow system-centric protection to the broader preservation of behavioral integrity and intent, complementing existing AI security strategies.

💡 Why This Paper Matters

This paper is significant because it reframes AI security for a landscape in which autonomous, LLM-driven agents plan and act across organizational boundaries. By introducing the 4C Framework, the authors provide a principled, governance-inspired structure for reasoning about risks that system-centric defenses miss, spanning infrastructure integrity, communication and trust, reasoning integrity, and institutional compliance. The framework offers a foundation for building agentic AI systems that are trustworthy, governable, and aligned with human values.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of keen interest to AI security researchers because it addresses a fundamental shift in the field: securing agents that plan, act, collaborate, and persist over time, rather than isolated models or pipelines. The 4C Framework complements existing defenses against prompt injection, data poisoning, and tool misuse with dimensions covering coordination, trust, belief and goal integrity, and governance. Its societal-governance perspective suggests new directions for analyzing emergent, multi-agent threats and for designing institutional controls around autonomous AI systems.

📚 Read the Full Paper