
Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

Authors: Md Takrim Ul Alam, Akif Islam, Mohd Ruhul Ameen, Abu Saleh Musa Miah, Jungpil Shin

Published: 2026-03-19

arXiv ID: 2603.18433v1

Added to Library: 2026-03-20 03:00 UTC

Category: Safety

📄 Abstract

Large language models (LLMs) deployed behind APIs and retrieval-augmented generation (RAG) stacks are vulnerable to prompt injection attacks that may override system policies, subvert intended behavior, and induce unsafe outputs. Existing defenses often treat prompts as flat strings and rely on ad hoc filtering or static jailbreak detection. This paper proposes Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models each request as a structured composition of system, developer, user, and retrieved-document segments. PCFI applies a three-stage middleware pipeline (lexical heuristics, role-switch detection, and hierarchical policy enforcement) before forwarding requests to the backend LLM. We implement PCFI as a FastAPI-based gateway for deployed LLM APIs and evaluate it on a custom benchmark of synthetic and semi-realistic prompt-injection workloads. On the evaluated benchmark suite, PCFI intercepts all attack-labeled requests, maintains a 0% false positive rate, and introduces a median processing overhead of only 0.04 ms. These results suggest that provenance- and priority-aware prompt enforcement is a practical and lightweight defense for deployed LLM systems.

πŸ” Key Points

  • Proposes Prompt Control-Flow Integrity (PCFI) to address prompt injection attacks on LLMs by preserving the authority structure of prompts through a structured representation
  • PCFI introduces a three-stage middleware pipeline that includes lexical heuristics, role-switch detection, and hierarchical policy enforcement to assess and control prompt requests before reaching LLMs
  • Achieves a 0% attack pass-through rate and a 0% false positive rate on the evaluated benchmark, intercepting all attack-labeled prompts with minimal latency overhead (median 0.04 ms)
  • Implements PCFI as a practical FastAPI-based middleware for real-world applications, emphasizing ease of integration without requiring model retraining
  • Identifies gaps in existing defense mechanisms, highlighting the need for approaches that consider the intricate structure of prompt assembly in LLM systems

💡 Why This Paper Matters

This paper is significant as it provides a robust and lightweight defense mechanism specifically designed for large language model systems against prompt injection attacks. The proposed PCFI framework not only strengthens the security of LLM applications but also enhances their reliability and safety in practical deployments. By modeling prompts as structured entities with an authority hierarchy, it successfully mitigates risks associated with adversarial prompts, thereby contributing to safer AI interactions.

🎯 Why It's Interesting for AI Security Researchers

This paper is of interest to AI security researchers because it tackles a relevant and fast-growing threat: prompt injection in LLM systems. The defense strategy presented through PCFI addresses existing vulnerabilities in deployed AI setups and offers a systematic approach to maintaining the integrity of prompt-based interactions. Its emphasis on lightweight, practical enforcement provides useful guidance for building robust security frameworks around AI applications, making it a pertinent study for researchers focused on defending against increasingly sophisticated adversarial techniques.

📚 Read the Full Paper