
AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs

Authors: Yihan Wang, Huanqi Yang, Shantanu Pal, Weitao Xu

Published: 2025-12-24

arXiv ID: 2512.20986v1

Added to Library: 2026-01-07 10:08 UTC

Tags: Red Teaming

📄 Abstract

The integration of Large Language Models (LLMs) into wearable sensing is creating a new class of mobile applications capable of nuanced human activity understanding. However, the reliability of these systems is critically undermined by their vulnerability to prompt injection attacks, in which attackers deliberately feed deceptive instructions to LLMs. Traditional defenses, based on static filters and rigid rules, are insufficient to address the semantic complexity of these new attacks. We argue that a paradigm shift is needed: from passive filtering to active protection and autonomous reasoning. We introduce AegisAgent, an autonomous agent system designed to ensure the security of LLM-driven HAR systems. Instead of merely blocking threats, AegisAgent functions as a cognitive guardian. It autonomously perceives potential semantic inconsistencies, reasons about the user's true intent by consulting a dynamic memory of past interactions, and acts by generating and executing a multi-step verification and repair plan. We implement AegisAgent as a lightweight, full-stack prototype and conduct a systematic evaluation on 15 common attacks against five state-of-the-art LLM-based HAR systems on three public datasets. Results show it reduces the attack success rate by 30% on average while incurring only 78.6 ms of latency overhead on a GPU workstation. Our work takes a first step toward building secure and trustworthy LLM-driven HAR systems.
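To make the loop described in the abstract concrete, here is a minimal Python sketch of the perceive-reason-act cycle. Everything in it (`SemanticMemory`, the exact-match context lookup, the fallback policy) is a hypothetical illustration of the idea, not the paper's implementation, which reasons over semantics with an LLM rather than a lookup table.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticMemory:
    """Hypothetical dynamic memory of past (context, activity) interactions."""
    history: list = field(default_factory=list)

    def typical_activities(self, context: str) -> set:
        # Activities previously confirmed in similar sensor contexts.
        return {act for ctx, act in self.history if ctx == context}

    def record(self, context: str, activity: str) -> None:
        self.history.append((context, activity))


def aegis_loop(sensor_summary: str, llm_output: str,
               memory: SemanticMemory) -> str:
    """Sketch of the perceive -> reason -> act cycle from the abstract."""
    # Perceive: does the LLM's answer contradict what memory says is
    # typical for this sensor context?
    known = memory.typical_activities(sensor_summary)
    if not known or llm_output in known:
        # No precedent to contradict, or the output matches past behavior.
        memory.record(sensor_summary, llm_output)
        return llm_output

    # Reason: memory suggests the user's true intent lies elsewhere.
    # Act: the real system generates and executes a multi-step
    # verification and repair plan (re-querying the LLM); this sketch
    # simply falls back to a plausible remembered activity.
    return next(iter(known))
```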

🔍 Key Points

  • Introduces AegisAgent, an autonomous defense agent that counters prompt injection attacks in LLM-driven Human Activity Recognition (LLM-HAR) systems, shifting the defense paradigm from passive filtering to active protection and reasoning.
  • AegisAgent uses a multi-layered architecture comprising an Input Sanitizer, a Consistency Verifier, and a Robust Reasoner, enabling it to detect, correct, and recover from injection attacks without human intervention (see the pipeline sketch after this list).
  • A systematic evaluation demonstrates AegisAgent's effectiveness: an average 30% reduction in attack success rate and 85% detection accuracy across five state-of-the-art LLM-HAR systems and three public datasets.
  • The paper analyzes 15 common prompt injection attack methods, outlining their severity and impact and motivating automated defenses tailored to LLM-driven HAR systems.
  • AegisAgent is model-agnostic and requires no model-specific training, making it straightforward to integrate across different models and applications.
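
The three layers named above suggest a simple composition. The sketch below shows one way they could wrap an arbitrary model; the interfaces are assumptions made for illustration (`HARModel`, the regex patterns, and the label set are not the paper's API), and real sanitization would be semantic rather than regex-based.

```python
import re
from typing import Callable

# Any underlying LLM-HAR system: prompt in, activity label out.
HARModel = Callable[[str], str]

# Illustrative patterns only; the actual Input Sanitizer is semantic.
SUSPICIOUS = re.compile(
    r"ignore (all )?previous instructions|always (answer|output)",
    re.IGNORECASE,
)

VALID_LABELS = {"walking", "running", "sitting", "standing", "lying"}


def input_sanitizer(prompt: str) -> str:
    """Layer 1: strip instruction-like spans from the sensor prompt."""
    return SUSPICIOUS.sub("[removed]", prompt)


def consistency_verifier(output: str) -> bool:
    """Layer 2: check the answer stays within the closed label set."""
    return output.strip().lower() in VALID_LABELS


def aegis_wrap(model: HARModel) -> HARModel:
    """Model-agnostic guard: wraps any prompt->label HAR model."""
    def guarded(prompt: str) -> str:
        clean = input_sanitizer(prompt)
        output = model(clean)
        if consistency_verifier(output):
            return output
        # Layer 3 (Robust Reasoner): re-query with an explicit
        # grounding instruction before giving up.
        retry = model("Answer with one activity label, using only "
                      "the sensor readings below.\n" + clean)
        return retry if consistency_verifier(retry) else "unknown"
    return guarded
```

Because `aegis_wrap` assumes nothing beyond a prompt-in, label-out callable, the same guard could sit in front of any of the five evaluated LLM-HAR systems, which is what the model-agnostic claim amounts to in practice.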

💡 Why This Paper Matters

The paper positions AegisAgent as a significant advance in securing LLM-HAR systems against prompt injection attacks. By providing a defense mechanism that operates autonomously, it addresses a critical vulnerability, improving not only the reliability of such systems but also the safety and accountability of AI applications that rely on human activity understanding.

🎯 Why It's Interesting for AI Security Researchers

This work is particularly relevant for AI security researchers because it tackles a newly emerging category of attacks on multimodal AI systems, highlighting the vulnerabilities of LLMs when integrated with sensory data. It presents a novel defense mechanism that not only mitigates these risks but also deepens the understanding of how to secure AI systems against complex, emergent threats in real-world settings.
