
The Cognitive Firewall: Securing Browser-Based AI Agents Against Indirect Prompt Injection via Hybrid Edge-Cloud Defense

Authors: Qianlong Lan, Anuj Kaul

Published: 2026-03-24

arXiv ID: 2603.23791v1

Added to Library: 2026-03-26 02:02 UTC

📄 Abstract

Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage split-compute architecture that distributes security checks across the client and the cloud. The system consists of a local visual Sentinel, a cloud-based Deep Planner, and a deterministic Guard that enforces execution-time policies. Across 1,000 adversarial samples, edge-only defenses fail to detect 86.9% of semantic attacks. In contrast, the full hybrid architecture reduces the overall attack success rate (ASR) to below 1% (0.88% under static evaluation and 0.67% under adaptive evaluation), while maintaining deterministic constraints on side-effecting actions. By filtering presentation-layer attacks locally, the system avoids unnecessary cloud inference and achieves an approximately 17,000x latency advantage over cloud-only baselines. These results indicate that deterministic enforcement at the execution boundary can complement probabilistic language models, and that split-compute provides a practical foundation for securing interactive LLM agents.
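The abstract's three stages read naturally as a short-circuiting pipeline: the Sentinel blocks cheap presentation-layer attacks on device, the Deep Planner handles semantic analysis in the cloud, and the Guard applies deterministic policy regardless of what the models conclude. The sketch below illustrates that control flow only; the class names, the string heuristic in `sentinel_flags`, the stubbed planner call, and the allowlist are all hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A browser action proposed by the agent."""
    kind: str           # e.g. "click", "type", "submit"
    target: str         # CSS selector or URL
    side_effecting: bool

def sentinel_flags(page_text: str) -> bool:
    """Stage 1 (edge): cheap local check for presentation-layer
    injection. A toy heuristic; the paper's Sentinel is visual."""
    suspicious = ("ignore previous instructions", "you are now", "system:")
    return any(s in page_text.lower() for s in suspicious)

def deep_planner_flags(page_text: str) -> bool:
    """Stage 2 (cloud): semantic analysis of content the Sentinel
    passed through. Stubbed here; a real system would call an LLM."""
    return False  # placeholder verdict

ALLOWED_SIDE_EFFECTS = {"#search-submit"}  # hypothetical policy table

def guard_permits(action: Action) -> bool:
    """Stage 3: deterministic execution-time policy. Side-effecting
    actions must be explicitly allowlisted, whatever the models say."""
    return (not action.side_effecting) or action.target in ALLOWED_SIDE_EFFECTS

def should_execute(page_text: str, action: Action) -> bool:
    if sentinel_flags(page_text):      # blocked locally: no cloud round-trip
        return False
    if deep_planner_flags(page_text):  # blocked by cloud semantic analysis
        return False
    return guard_permits(action)       # deterministic constraint either way
```

Note the design point the abstract emphasizes: the Guard's check runs on every action, so even content that fools both models cannot trigger a side-effecting action outside the policy.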

🔍 Key Points

  • The paper introduces the Cognitive Firewall, a three-stage split-compute architecture that defends browser-based LLM agents against Indirect Prompt Injection (IPI) by distributing security checks across the client and the cloud.
  • The architecture pairs a local visual Sentinel and a cloud-based Deep Planner with a deterministic Guard that enforces execution-time policies on side-effecting actions.
  • Across 1,000 adversarial samples, edge-only defenses fail to detect 86.9% of semantic attacks, whereas the full hybrid pipeline reduces the attack success rate to 0.88% under static evaluation and 0.67% under adaptive evaluation (a minimal reading of this metric is sketched after this list).
  • Filtering presentation-layer attacks locally lets the system skip unnecessary cloud inference, yielding an approximately 17,000x latency advantage over cloud-only baselines.
  • The results indicate that deterministic enforcement at the execution boundary can complement probabilistic language models, making split-compute a practical foundation for securing interactive LLM agents.
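To make the headline numbers concrete: attack success rate (ASR) here is the fraction of adversarial samples whose injected behavior survives the defense. The loop below is a minimal sketch of that metric under an assumed interface; the `defends` callback and sample format are hypothetical, not the paper's evaluation harness.

```python
def attack_success_rate(samples, defends) -> float:
    """ASR = fraction of adversarial samples the defense fails to block.

    samples -- iterable of adversarial inputs (format assumed here)
    defends -- callable returning True when the pipeline blocks the
               injected action for a given sample (hypothetical API)
    """
    samples = list(samples)
    successes = sum(1 for s in samples if not defends(s))
    return successes / len(samples)
```

At the reported 0.88% static ASR over 1,000 samples, roughly 9 injected actions would slip through the full pipeline end to end.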

💡 Why This Paper Matters

This paper is highly relevant because it addresses a core obstacle to deploying LLMs as autonomous browser agents: indirect prompt injection delivered through web content the agent merely observes. By showing that edge-only defenses miss the large majority of semantic attacks while a hybrid edge-cloud design drives the attack success rate below 1%, the authors make a concrete case that agent security must combine local filtering, cloud-scale semantic analysis, and deterministic enforcement at the execution boundary.

🎯 Why It's Interesting for AI Security Researchers

The findings are notable for AI security researchers on two fronts. First, they quantify the gap between lightweight on-device defenses and semantic attacks (an 86.9% miss rate for edge-only checks), grounding the intuition that IPI cannot be solved at the presentation layer alone. Second, they demonstrate that deterministic, policy-based constraints on side-effecting actions can complement probabilistic language models, offering a reusable architectural pattern for interactive agents beyond the browser setting.

📚 Read the Full Paper