CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

📄 Abstract

LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code execution capabilities remains underexplored. Existing benchmarks are limited to static datasets or simulated environments, failing to capture the security risks arising from dynamic code execution, tool interactions, and multi-turn context. To bridge this gap, we introduce CIBER, an automated benchmark that combines dynamic attack generation, isolated secure sandboxing, and state-aware evaluation to systematically assess the vulnerability of code interpreter agents against four major types of adversarial attacks: Direct/Indirect Prompt Injection, Memory Poisoning, and Prompt-based Backdoor. We evaluate six foundation models across two representative code interpreter agents (OpenInterpreter and OpenCodeInterpreter), incorporating a controlled study of identical models. Our results reveal that Interpreter Architecture and Model Alignment Set the Security Baseline. Structural integration enables aligned specialized models to outperform generic SOTA models. Conversely, high intelligence paradoxically increases susceptibility to complex adversarial prompts due to stronger instruction adherence. Furthermore, we identify a "Natural Language Disguise" Phenomenon, where natural language functions as a significantly more effective input modality than explicit code snippets (+14.1% ASR), thereby bypassing syntax-based defenses. Finally, we expose an alarming Security Polarization, where agents exhibit robust defenses against explicit threats yet fail catastrophically against implicit semantic hazards, highlighting a fundamental blind spot in current pattern-matching protection approaches.

🔍 Key Points

Introduction of CIBER, an automated benchmark framework for evaluating the security of code interpreter agents against dynamic adversarial attacks.
Identification of four major types of attacks: Direct/Indirect Prompt Injection, Memory Poisoning, and Prompt-Based Backdoor, revealing significant security vulnerabilities in existing models.
Experimental results showing that architecture and model alignment significantly impact security, with specialized models outperforming generic ones in defensive capabilities.
Discovery of the 'Natural Language Disguise' effect, where natural language descriptions bypass syntax-based defenses, highlighting limitations in current threat detection methods.
The establishment of a three-tier vulnerability hierarchy that exposes the inherent weaknesses of agent defenses against explicit vs. implicit threats.

💡 Why This Paper Matters

The paper presents a significant advancement in the field of AI security by introducing the CIBER framework, which systematically evaluates the security of code interpreter agents in real-world scenarios. By uncovering critical vulnerabilities through empirical testing across various models and attack strategies, this work not only enhances our understanding of the security landscape but also sets the stage for future research and development of more robust AI systems. Its insights pave the way for better security practices in the deployment of AI agents, which are increasingly being utilized in sensitive applications.

🎯 Why It's Interesting for AI Security Researchers

This paper is highly relevant for AI security researchers because it addresses an underexplored area concerning the vulnerabilities of large language models and code interpreters. With the increasing integration of AI in critical systems, understanding the security implications of code execution capabilities is vital. The novel benchmarking method and substantial findings on attack success rates provide a foundational framework for future investigations into AI security. Additionally, the paper highlights the importance of architectural considerations in defense strategies, prompting further research on secure AI deployment.

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper