
KAIJU: An Executive Kernel for Intent-Gated Execution of LLM Agents

Authors: Cormac Guerin, Frank Guerin

Published: 2026-03-31

arXiv ID: 2604.02375v1

Added to Library: 2026-04-06 02:06 UTC

📄 Abstract

Tool-calling autonomous agents built on large language models using ReAct exhibit three limitations: serial latency, quadratic context growth, and vulnerability to prompt injection and hallucination. Recent work moves towards separating planning from execution, but in each case the model remains coupled to the execution mechanics. We introduce a system-level abstraction for LLM agents that decouples the execution of agent workflows from the LLM reasoning layer. We define two first-class abstractions: (1) Intent-Gated Execution (IGX), a security paradigm that enforces intent at execution, and (2) an Executive Kernel that manages scheduling, tool dispatch, dependency resolution, failures, and security. In KAIJU, the LLM plans upfront, optimistically scheduling tools in parallel with dependency-aware parameter injection. Tools are authorised via IGX based on four independent variables: scope, intent, impact, and clearance (external approval). KAIJU supports three adaptive execution modes (Reflect, nReflect, and Orchestrator), providing progressively finer-grained execution control suited to complex investigation and deep analysis or research. Empirical evaluation against a ReAct baseline shows that KAIJU incurs a latency penalty on simple queries due to planning overhead, converges with the baseline at moderate complexity, and holds a structural advantage on computational queries requiring parallel data gathering. Beyond latency, the separation enforces behavioural guarantees that ReAct cannot match through prompting alone. Code available at https://github.com/compdeep/kaiju
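
The four-variable gate is concrete enough to sketch. The Python below is a minimal, hypothetical illustration of IGX-style authorisation, not the paper's implementation: every name (ToolRequest, IGXPolicy, authorise) and the impact classification are assumptions layered on the abstract's description.

```python
# Hypothetical sketch of Intent-Gated Execution (IGX): a tool call is
# authorised only when four independent variables -- scope, intent,
# impact, and clearance -- all pass. Names are illustrative; the paper's
# actual interfaces may differ.
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    READ_ONLY = 1      # e.g. fetching data
    REVERSIBLE = 2     # e.g. creating a draft
    IRREVERSIBLE = 3   # e.g. sending an email, deleting a record


@dataclass
class ToolRequest:
    tool: str              # tool the plan wants to invoke
    declared_intent: str   # intent stated in the upfront plan
    impact: Impact         # impact class of this call


@dataclass
class IGXPolicy:
    allowed_scope: set     # tools this task is scoped to
    plan_intents: set      # intents declared when the plan was made
    max_unattended_impact: Impact = Impact.REVERSIBLE

    def authorise(self, req: ToolRequest, clearance: bool = False) -> bool:
        """Gate a tool call on scope, intent, impact, and clearance."""
        in_scope = req.tool in self.allowed_scope
        intent_ok = req.declared_intent in self.plan_intents
        # High-impact calls additionally need external approval (clearance).
        impact_ok = (req.impact.value <= self.max_unattended_impact.value
                     or clearance)
        return in_scope and intent_ok and impact_ok


policy = IGXPolicy(allowed_scope={"search", "send_email"},
                   plan_intents={"gather_sources"})

# In scope, intent matches, read-only: authorised without clearance.
print(policy.authorise(ToolRequest("search", "gather_sources", Impact.READ_ONLY)))    # True
# Injected call: the tool is in scope, but this intent was never planned.
print(policy.authorise(ToolRequest("send_email", "exfiltrate", Impact.IRREVERSIBLE)))  # False
```

The point of evaluating four independent variables is that a prompt-injected call fails unless it also matches the scope and intent fixed at planning time, regardless of what the model is persuaded to emit mid-run.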

🔍 Key Points

  • Intent-Gated Execution (IGX), a security paradigm that authorises each tool call at execution time on four independent variables: scope, intent, impact, and clearance (external approval).
  • An Executive Kernel that decouples agent workflow execution from the LLM reasoning layer, managing scheduling, tool dispatch, dependency resolution, failures, and security.
  • Upfront planning by the LLM, with tools optimistically scheduled in parallel and upstream results fed to downstream calls via dependency-aware parameter injection (see the sketch after this list).
  • Three adaptive execution modes (Reflect, nReflect, and Orchestrator) offering progressively finer-grained execution control for complex investigation and deep analysis or research.
  • Empirical evaluation against a ReAct baseline: a planning-overhead latency penalty on simple queries, convergence at moderate complexity, and a structural advantage on computational queries requiring parallel data gathering.
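
The optimistic-scheduling bullet above can also be made concrete. The following sketch is a guess at the general technique, not the KAIJU kernel: the plan format, the "$step_id" placeholder convention, and all tool functions are invented for illustration.

```python
# Minimal sketch of dependency-aware optimistic scheduling: the plan is a
# DAG produced upfront, and the scheduler dispatches every step whose
# dependencies have resolved in parallel, injecting upstream results into
# downstream parameters. All names are illustrative assumptions.
import asyncio


async def fetch_price(ticker: str) -> float:
    await asyncio.sleep(0.1)          # stand-in for a real API call
    return {"AAPL": 190.0, "MSFT": 410.0}[ticker]


async def compare(a: float, b: float) -> str:
    return "AAPL" if a > b else "MSFT"


# Each step names its tool, its args, and its dependencies; "$step_id"
# marks a parameter to be filled from an earlier step's result.
PLAN = {
    "p1":  {"tool": fetch_price, "args": {"ticker": "AAPL"}, "deps": []},
    "p2":  {"tool": fetch_price, "args": {"ticker": "MSFT"}, "deps": []},
    "cmp": {"tool": compare, "args": {"a": "$p1", "b": "$p2"}, "deps": ["p1", "p2"]},
}


async def run_plan(plan: dict) -> dict:
    results: dict = {}
    pending = dict(plan)
    while pending:
        # Optimistically launch every step whose dependencies are done.
        ready = {sid: s for sid, s in pending.items()
                 if all(d in results for d in s["deps"])}
        if not ready:
            raise RuntimeError("unresolvable dependencies in plan")
        for sid in ready:
            del pending[sid]
        # Inject upstream results in place of "$step_id" placeholders.
        coros = []
        for step in ready.values():
            args = {k: results[v[1:]]
                    if isinstance(v, str) and v.startswith("$") else v
                    for k, v in step["args"].items()}
            coros.append(step["tool"](**args))
        done = await asyncio.gather(*coros)
        results.update(zip(ready, done))
    return results


print(asyncio.run(run_plan(PLAN))["cmp"])
```

Here p1 and p2 have no dependencies, so the scheduler dispatches them concurrently; cmp launches as soon as both results exist, with their values injected into its parameters.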

💡 Why This Paper Matters

ReAct-style tool-calling agents suffer from serial latency, quadratic context growth, and vulnerability to prompt injection and hallucination, and prior attempts to separate planning from execution still leave the model coupled to the execution mechanics. KAIJU moves the execution of agent workflows into a system-level Executive Kernel, so scheduling, dispatch, dependency resolution, failure handling, and authorisation are enforced by the system rather than requested of the model. This separation yields both a structural latency advantage on parallelisable workloads and behavioural guarantees that prompting alone cannot provide, offering a template for deploying LLM agents where execution control matters.

🎯 Why It's Interesting for AI Security Researchers

The paper reframes prompt-injection defence as an authorisation problem: rather than trusting the model to behave, the kernel gates every tool call on scope, intent, impact, and clearance at execution time. Because intent is declared in the upfront plan and enforced at execution, instructions injected mid-run cannot widen what the agent is permitted to do, and high-impact actions can be held for external approval. For researchers studying agent security, IGX is a concrete, evaluable alternative to prompt-level guardrails, with the implementation available at the linked repository for inspection and attack evaluation.
