
Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions

Authors: Piercosma Bisconti, Marcello Galisai, Federico Pierucci, Marcantonio Bracale, Matteo Prandi

Published: 2025-12-02

arXiv ID: 2512.02682v1

Added to Library: 2025-12-03 03:00 UTC

Safety

📄 Abstract

This paper examines why safety mechanisms designed for human-model interaction do not scale to environments where large language models (LLMs) interact with each other. Most current governance practices still rely on single-agent safety measures (containment, prompts, fine-tuning, and moderation layers) that constrain individual model behavior but leave the dynamics of multi-model interaction ungoverned. These mechanisms assume a dyadic setting: one model responding to one user under stable oversight. Yet research and industrial development are rapidly shifting toward LLM-to-LLM ecosystems, where outputs are recursively reused as inputs across chains of agents. In such systems, local compliance can aggregate into collective failure even when every model is individually aligned. We propose a conceptual transition from model-level safety to system-level safety, introducing the framework of the Emergent Systemic Risk Horizon (ESRH) to formalize how instability arises from interaction structure rather than from isolated misbehavior. The paper contributes (i) a theoretical account of collective risk in interacting LLMs, (ii) a taxonomy connecting micro-, meso-, and macro-level failure modes, and (iii) a design proposal for Institutional AI, an architecture for embedding adaptive oversight within multi-agent systems.
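
The abstract's central mechanism, locally compliant steps compounding into collective failure across a recursive chain, can be made concrete with a toy numerical sketch. Everything below (the drift variable, the thresholds, the agent count) is an illustrative assumption, not the paper's ESRH formalism.

```python
"""Toy illustration (not from the paper): each agent in a chain applies a
safety check only to its own step, yet small per-step drifts compound.
All names and thresholds here are illustrative assumptions."""

import random

LOCAL_THRESHOLD = 0.1    # max drift any single agent tolerates in its own step
SYSTEM_THRESHOLD = 0.5   # drift at which the *system-level* output counts as unsafe


def agent_step(cumulative_drift: float) -> float:
    """One agent transforms the running content, adding a small drift.

    Each individual step stays below LOCAL_THRESHOLD, so every agent is
    'locally compliant'; no agent ever sees the trajectory of the whole chain.
    """
    step_drift = random.uniform(0.0, LOCAL_THRESHOLD)
    assert step_drift <= LOCAL_THRESHOLD  # the local check passes by construction
    return cumulative_drift + step_drift


def run_chain(n_agents: int = 12, seed: int = 0) -> None:
    random.seed(seed)
    drift = 0.0
    for i in range(n_agents):
        drift = agent_step(drift)
        print(f"agent {i:2d}: cumulative drift = {drift:.3f}")
    if drift > SYSTEM_THRESHOLD:
        print("system-level failure: every agent passed its local check, "
              "but the aggregate output crossed the systemic threshold")


if __name__ == "__main__":
    run_chain()
```

The point of the sketch is only that per-step checks and whole-trajectory checks measure different things; the paper's contribution is to formalize when the latter is the one that matters.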

🔍 Key Points

  • Proposes the Emergent Systemic Risk Horizon (ESRH) as a framework for understanding how individual-level safety mechanisms can lead to collective failures in LLM-to-LLM interactions.
  • Introduces a comprehensive taxonomy of risks associated with multi-agent language models, categorizing them into micro, meso, and macro levels to better understand the dynamics of emergent behavior.
  • Describes the concept of Institutional AI, which embeds governance directly within multi-agent systems to enable adaptive oversight and norm maintenance rather than relying solely on external human supervision (see the sketch after this list).
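
To make the third point more tangible, here is a minimal sketch of what "governance embedded within the system" might look like in code. It is an interpretation under stated assumptions: the InstitutionalMonitor class, the relay method, and the length-based norm are hypothetical illustrations, not the architecture the paper specifies.

```python
"""Minimal sketch (an interpretation, not the paper's actual design): a monitor
sits on the message path between agents and enforces a system-level norm over
the whole interaction trace, in addition to whatever per-agent filters exist."""

from dataclasses import dataclass, field
from typing import Callable, List

Message = str
Agent = Callable[[Message], Message]   # an LLM agent: text in, text out


@dataclass
class InstitutionalMonitor:
    """Tracks the whole interaction, not just the most recent hop."""
    norm: Callable[[List[Message]], bool]       # False when the trace violates a norm
    trace: List[Message] = field(default_factory=list)

    def relay(self, msg: Message) -> Message:
        """Pass a message between agents while keeping a system-level view."""
        self.trace.append(msg)
        if not self.norm(self.trace):
            # Adaptive oversight: intervene on the interaction itself
            raise RuntimeError("institutional norm violated; halting the chain for review")
        return msg


def run_pipeline(agents: List[Agent], monitor: InstitutionalMonitor, prompt: Message) -> Message:
    """Chain agents, routing every inter-agent message through the monitor."""
    msg = prompt
    for agent in agents:
        msg = monitor.relay(agent(msg))
    return msg


# Hypothetical usage: a norm bounding how far the relayed content may grow,
# standing in for a real systemic-risk signal.
if __name__ == "__main__":
    echo_and_extend: Agent = lambda m: m + " ... elaboration"
    monitor = InstitutionalMonitor(norm=lambda trace: len(trace[-1]) < 200)
    try:
        print(run_pipeline([echo_and_extend] * 20, monitor, "seed prompt"))
    except RuntimeError as err:
        print(err)
```

The design choice the sketch highlights is that oversight lives inside the interaction loop (on the message path) rather than outside it, which is what distinguishes the paper's proposal from external human review of individual model outputs.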

💡 Why This Paper Matters

This paper matters because it addresses a gap in current safety practice: risks that arise from LLM-to-LLM interactions, where locally compliant behavior can compound into systemic failure. By proposing the ESRH framework and its accompanying taxonomy, it lays the groundwork for AI systems whose oversight can adapt to emergent risks during multi-agent interaction.

🎯 Why It's Interesting for AI Security Researchers

This paper is relevant to AI security researchers because it tackles the increasingly common problem of managing risks posed by autonomous AI systems interacting in complex networks. Understanding systemic risks in these settings is essential for building safety protocols and governance mechanisms that can prevent or mitigate collective failures.

📚 Read the Full Paper