← Back to Library

The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

Authors: J Alex Corll

Published: 2026-03-12

arXiv ID: 2603.11875v1

Added to Library: 2026-03-13 03:01 UTC

📄 Abstract

Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first screening layer, however, the requirements are different: the detector runs on every request and therefore must be fast, deterministic, non-promptable, and auditable. We introduce Mirror, a data-curation design pattern that organizes prompt injection corpora into matched positive and negative cells so that a classifier learns control-plane attack mechanics rather than incidental corpus shortcuts. Using 5,000 strictly curated open-source samples -- the largest corpus supportable under our public-data validity contract -- we define a 32-cell mirror topology, fill 31 of those cells with public data, train a sparse character n-gram linear SVM, compile its weights into a static Rust artifact, and obtain 95.97\% recall and 92.07\% F1 on a 524-case holdout at sub-millisecond latency with no external model runtime dependencies. On the same holdout, our next line of defense, a 22-million-parameter Prompt Guard~2 model reaches 44.35\% recall and 59.14\% F1 at 49\,ms median and 324\,ms p95 latency. Linear models still leave residual semantic ambiguities such as use-versus-mention for later pipeline layers, but within that scope our results show that for L1 prompt injection screening, strict data geometry can matter more than model scale.

🔍 Key Points

  • The paper introduces a Cascade Red Teaming Framework that maps attacker goals and capabilities to a curated set of algorithmic, software, and hardware attack gadgets, enabling the composition of end-to-end attack chains in Compound AI systems.
  • It demonstrates two novel attacks that exploit traditional software and hardware vulnerabilities alongside algorithmic weaknesses to compromise AI safety and user confidentiality.
  • The authors systematize attack primitives and create a comprehensive corpus of vulnerabilities across various layers of the Compound AI stack, including traditional software vulnerabilities and hardware side channels.
  • The research highlights how system-level vulnerabilities can amplify adversarial threats in complex AI pipelines, where traditional defenses may not be sufficient, emphasizing the need for a more holistic security approach.
  • Case studies are presented showing effective multi-stage attacks on AI systems, illustrating how attackers can bypass guardrails and specific defenses through systematic exploitation of both algorithmic and system-level vulnerabilities.

💡 Why This Paper Matters

This paper is highly relevant as it addresses a critical intersection between traditional cybersecurity vulnerabilities and modern AI systems. It highlights the often-overlooked traditional software and hardware vulnerabilities that can be exploited in conjunction with adversarial attacks on AI, providing not only a detailed analysis of potential attack vectors but also a framework for understanding and testing these vulnerabilities. By demonstrating the risks inherent in Compound AI systems, it lays the groundwork for bolstering defenses and developing robust security strategies.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper of great interest as it expands the understanding of attack surfaces in AI systems beyond algorithmic vulnerabilities to include systemic risks from traditional software and hardware flaws. The introduction of the Cascade Red Teaming Framework offers a structured method for evaluating security across complex AI pipelines, which is crucial for enhancing the resilience of these systems against sophisticated attacks. Furthermore, the paper's experimental validation of various attack methods provides direct insights into potential real-world implications, making it a valuable resource for researchers focused on improving AI security.

📚 Read the Full Paper