
LLMZ+: Contextual Prompt Whitelist Principles for Agentic LLMs

Authors: Tom Pawelek, Raj Patel, Charlotte Crowell, Noorbakhsh Amiri, Sudip Mittal, Shahram Rahimi, Andy Perkins

Published: 2025-09-23

arXiv ID: 2509.18557v1

Added to Library: 2025-09-24 04:01 UTC

Red Teaming

📄 Abstract

Compared to traditional models, agentic AI represents a highly valuable target for potential attackers because such agents possess privileged access to data sources and API tools, which are traditionally not incorporated into classical agents. Unlike a typical software application residing in a Demilitarized Zone (DMZ), agentic LLMs rely by design on the nondeterministic behavior of the model: only the final goal is defined, and path selection is left to the LLM. This characteristic introduces substantial risk to both operational security and information security. The most common existing defense mechanisms rely on detecting malicious intent and preventing it from reaching the LLM agent, thereby protecting against jailbreak attacks such as prompt injection. In this paper, we present an alternative approach, LLMZ+, which moves beyond traditional detection-based approaches by implementing prompt whitelisting. Through this method, only contextually appropriate and safe messages are permitted to interact with the agentic LLM. By leveraging the specificity of context, LLMZ+ guarantees that all exchanges between external users and the LLM conform to predefined use cases and operational boundaries. Our approach streamlines the security framework, enhances its long-term resilience, and reduces the resources required for sustaining LLM information security. Our empirical evaluation demonstrates that LLMZ+ provides strong resilience against the most common jailbreak prompts. At the same time, legitimate business communications are not disrupted, and authorized traffic flows seamlessly between users and the agentic LLM. We measure the effectiveness of the approach using false positive and false negative rates, both of which can be reduced to zero in our experimental setting.
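
The abstract describes the whitelisting idea only in prose. As a minimal sketch of what a contextual ingress check could look like, the snippet below asks an LLM judge whether a message fits a predefined use case; the use-case text, prompt wording, and the `judge` callable are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a contextual ingress whitelist check (illustrative only).
# `judge` is any callable that sends a prompt to an LLM and returns its text reply;
# the use-case description and prompt wording are assumptions, not the paper's exact setup.

from typing import Callable

USE_CASE = (
    "You guard a customer-support agent for an airline. "
    "Only messages about flight bookings, cancellations, and baggage policy are in scope."
)

def ingress_allowed(message: str, judge: Callable[[str], str]) -> bool:
    """Return True only if the judge deems the message within the whitelisted context."""
    verdict = judge(
        f"{USE_CASE}\n\n"
        f"User message:\n{message}\n\n"
        "Answer with exactly one word: ALLOW if the message fits the use case, "
        "otherwise BLOCK."
    )
    return verdict.strip().upper().startswith("ALLOW")
```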

🔍 Key Points

  • Introduction of LLMZ+, a new security framework for agentic LLMs based on contextual prompt whitelisting rather than traditional detection methods.
  • The framework aims to enhance security against prompt injection attacks ('jailbreaks') by only allowing contextually appropriate messages to interact with the LLM, thus reducing the attack surface.
  • LLMZ+ uses a dual-filter mechanism (ingress and egress) to evaluate user input and model output for compliance with predefined operational boundaries (see the sketch after this list).
  • Empirical evaluations reduced false positive and false negative rates to zero in controlled experiments, demonstrating the framework's effectiveness relative to conventional detection-based methods.
  • The paper discusses practical deployment strategies and performance considerations for LLMZ+, making it applicable in real-world scenarios.
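
As a rough illustration of the points above, the sketch below composes hypothetical ingress and egress checks around an agent call and tallies false positive / false negative rates over a labeled test set. All names are placeholders and the metric convention is an assumption (here "positive" means a message was blocked); the paper's actual filters, prompts, and definitions may differ.

```python
# Illustrative ingress/egress composition and FP/FN accounting (not the paper's code).
# `agent`, `ingress_allowed`, and `egress_allowed` are hypothetical callables.

from typing import Callable, Iterable, Tuple

REFUSAL = "This request is outside the supported use case."

def guarded_agent(message: str,
                  agent: Callable[[str], str],
                  ingress_allowed: Callable[[str], bool],
                  egress_allowed: Callable[[str], bool]) -> str:
    """Only contextually whitelisted traffic reaches the agent, and only
    compliant output leaves it."""
    if not ingress_allowed(message):
        return REFUSAL
    reply = agent(message)
    return reply if egress_allowed(reply) else REFUSAL

def fp_fn_rates(samples: Iterable[Tuple[str, bool]],
                ingress_allowed: Callable[[str], bool]) -> Tuple[float, float]:
    """samples: (message, is_legitimate) pairs.
    False positive: a legitimate message is blocked.
    False negative: an attack message is admitted."""
    fp = fn = legit = attack = 0
    for message, is_legitimate in samples:
        allowed = ingress_allowed(message)
        if is_legitimate:
            legit += 1
            fp += not allowed
        else:
            attack += 1
            fn += allowed
    return fp / max(legit, 1), fn / max(attack, 1)
```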

💡 Why This Paper Matters

The paper is significant as it addresses growing security concerns regarding agentic LLMs, which are susceptible to prompt injection attacks that can lead to serious data breaches and unauthorized actions. By presenting LLMZ+ as a proactive security mechanism, the authors push the boundaries of AI security practices toward more dynamic and context-aware solutions, marking a crucial advancement in the field.

🎯 Why It's Interesting for AI Security Researchers

This paper is of considerable interest to AI security researchers as it introduces a novel approach to securing AI systems from increasingly sophisticated adversarial attacks. The method's reliance on contextual understanding rather than heuristic detection offers a promising pathway for improving the robustness of AI systems. Furthermore, the empirical results and practical insights provide valuable data for future research and development efforts in AI security.

📚 Read the Full Paper: https://arxiv.org/abs/2509.18557