Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI

📄 Abstract

Large Language Model (LLM) applications are vulnerable to prompt injection and context manipulation attacks that traditional security models cannot prevent. We introduce two novel primitives--authenticated prompts and authenticated context--that provide cryptographically verifiable provenance across LLM workflows. Authenticated prompts enable self-contained lineage verification, while authenticated context uses tamper-evident hash chains to ensure integrity of dynamic inputs. Building on these primitives, we formalize a policy algebra with four proven theorems providing protocol-level Byzantine resistance--even adversarial agents cannot violate organizational policies. Five complementary defenses--from lightweight resource controls to LLM-based semantic validation--deliver layered, preventative security with formal guarantees. Evaluation against representative attacks spanning 6 exhaustive categories achieves 100% detection with zero false positives and nominal overhead. We demonstrate the first approach combining cryptographically enforced prompt lineage, tamper-evident context, and provable policy reasoning--shifting LLM security from reactive detection to preventative guarantees.

🔍 Key Points

Identification of hidden-comment prompt injection as a vulnerability in LLM Skills, enabling attackers to embed malicious instructions that could alter the model's tool-call intentions.
Demonstration through experiments shows that common LLMs are susceptible to these hidden-comment injections, which can lead to severe security implications.
Proposed defensive mechanisms, including a prompt-level guardrail that treats Skills as untrusted and forces the model to identify suspicious instructions, effectively prevent these attacks while maintaining legitimate functionalities.
Design implications highlight the need for clearer separation between user-visible documentation and model-consumed content to reduce the risk of user misinterpretation and improve overall system security.
Emphasis on the human factors in LLM interactions, showcasing how hidden instruction injections exploit the gap between user perceptions and model behaviors.

💡 Why This Paper Matters

This paper is significant as it sheds light on a previously unaddressed vulnerability in LLM agents related to the use of Skills, demonstrating how hidden comments can be a vector for malicious prompt injection. It establishes a foundation for improving security protocols and enhancing user trust in the use of LLMs in sensitive environments. The proposed defensive strategies have important implications for system design to mitigate exploitation risks without compromising usability.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers would find this paper relevant as it addresses a critical aspect of security concerning large language models—specifically, the manipulation of LLM behavior through hidden comment injections. It uncovers a nuanced attack vector that could be exploited in various applications, from software development tools to conversational agents. The findings prompt further investigation into securing LLMs against similar threats, informing best practices for model development and deployment.

Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper