ATA: A Neuro-Symbolic Approach to Implement Autonomous and Trustworthy Agents

Authors: David Peer, Sebastian Stabinger

Published: 2025-10-18

arXiv ID: 2510.16381v1

Added to Library: 2025-11-14 23:09 UTC

📄 Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities, yet their deployment in high-stakes domains is hindered by inherent limitations in trustworthiness, including hallucinations, instability, and a lack of transparency. To address these challenges, we introduce a generic neuro-symbolic approach, which we call Autonomous Trustworthy Agents (ATA). The core of our approach lies in decoupling tasks into two distinct phases: Offline knowledge ingestion and online task processing. During knowledge ingestion, an LLM translates an informal problem specification into a formal, symbolic knowledge base. This formal representation is crucial as it can be verified and refined by human experts, ensuring its correctness and alignment with domain requirements. In the subsequent task processing phase, each incoming input is encoded into the same formal language. A symbolic decision engine then utilizes this encoded input in conjunction with the formal knowledge base to derive a reliable result. Through an extensive evaluation on a complex reasoning task, we demonstrate that a concrete implementation of ATA is competitive with state-of-the-art end-to-end reasoning models in a fully automated setup while maintaining trustworthiness. Crucially, with a human-verified and corrected knowledge base, our approach significantly outperforms even larger models, while exhibiting perfect determinism, enhanced stability against input perturbations, and inherent immunity to prompt injection attacks. By generating decisions grounded in symbolic reasoning, ATA offers a practical and controllable architecture for building the next generation of transparent, auditable, and reliable autonomous agents.
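The abstract outlines a two-phase architecture but does not fix a particular formal language or decision engine. The sketch below is one minimal reading of that split, under assumptions of my own: `llm_translate` and `llm_encode` are hypothetical stand-ins for the LLM calls, the loan-approval rule is an invented illustration, and a small forward-chaining loop plays the role of the symbolic decision engine.

```python
# Minimal sketch of the two-phase ATA pattern described in the abstract.
# Assumptions: a toy rule-based formal language and a hand-rolled forward-chaining
# engine stand in for whatever formalism and engine the paper actually uses.

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premises: frozenset   # facts that must all hold
    conclusion: str       # fact derived when they do


def llm_translate(informal_spec: str) -> list:
    """Offline phase: an LLM turns an informal policy into formal rules.
    The resulting rules can be inspected and corrected by human experts."""
    raise NotImplementedError("hypothetical LLM call; output reviewed by experts")


def llm_encode(raw_input: str) -> frozenset:
    """Online phase: encode each incoming input into facts of the same language."""
    raise NotImplementedError("hypothetical LLM call; output is a set of atomic facts")


def decide(facts: frozenset, kb: list) -> frozenset:
    """Symbolic decision engine: deterministic forward chaining over the KB."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in kb:
            if rule.premises <= derived and rule.conclusion not in derived:
                derived.add(rule.conclusion)
                changed = True
    return frozenset(derived)


# Example with a hand-written (i.e., expert-verified) knowledge base:
kb = [Rule(frozenset({"income_verified", "no_defaults"}), "approve_loan")]
facts = frozenset({"income_verified", "no_defaults"})
print("approve_loan" in decide(facts, kb))  # True
```

The point of the split, as the abstract describes it, is that everything the engine consumes (the knowledge base and the encoded facts) is inspectable and correctable, which is where the human verification step fits in.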

🔍 Key Points

  • Introduces Autonomous Trustworthy Agents (ATA), a generic neuro-symbolic architecture that decouples tasks into an offline knowledge-ingestion phase and an online task-processing phase.
  • During knowledge ingestion, an LLM translates an informal problem specification into a formal, symbolic knowledge base that human experts can verify and refine, ensuring correctness and alignment with domain requirements.
  • During task processing, each incoming input is encoded into the same formal language, and a symbolic decision engine combines the encoded input with the knowledge base to derive a reliable result.
  • On a complex reasoning task, a fully automated ATA implementation is competitive with state-of-the-art end-to-end reasoning models; with a human-verified and corrected knowledge base, it significantly outperforms even larger models.
  • Because decisions are grounded in symbolic reasoning, the approach exhibits perfect determinism, enhanced stability against input perturbations, and inherent immunity to prompt injection attacks (see the sketch after this list), yielding transparent and auditable agents.
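
One plausible reading of the prompt-injection claim (my inference from the abstract, not code from the paper): the decision engine only ever consumes facts drawn from a closed, expert-approved vocabulary, so instructions smuggled into the raw input have no channel through which to add rules or rewrite the knowledge base. `ALLOWED_FACTS` and `sanitize` below are hypothetical names continuing the toy example above.

```python
# Sketch of the injection argument: only approved facts reach the symbolic engine.
ALLOWED_FACTS = {"income_verified", "no_defaults", "high_risk_country"}


def sanitize(encoded: set) -> frozenset:
    """Keep only facts from the approved vocabulary; anything else is dropped."""
    return frozenset(f for f in encoded if f in ALLOWED_FACTS)


# Even if the LLM encoder is tricked into emitting "ignore_all_rules", that token
# is not a known fact and never reaches the decision engine or the knowledge base.
print(sanitize({"income_verified", "ignore_all_rules"}))  # frozenset({'income_verified'})
```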

💡 Why This Paper Matters

This paper matters because it offers a practical, controllable architecture for deploying LLM-based agents in high-stakes domains, where hallucinations, instability, and a lack of transparency currently block adoption. By grounding decisions in a formal knowledge base that human experts can verify and correct, ATA shows that competitive reasoning performance and trustworthiness need not be traded off against each other, and it provides a concrete template for building transparent, auditable, and reliable autonomous agents.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, the most notable property is the architectural immunity to prompt injection: the symbolic decision engine consumes only formally encoded inputs together with an expert-verified knowledge base, so adversarial instructions embedded in raw inputs have no channel through which to alter the agent's rules or behavior. Combined with perfect determinism and stability under input perturbations, this makes ATA-style agents easier to audit, test, and certify than end-to-end LLM agents, and it positions neuro-symbolic decoupling as a concrete defense pattern for agentic systems.

📚 Read the Full Paper