Philipp Zimmermann
Paper Library

Collection of AI Security research papers

Showing 1331 papers total

March 30 - April 05, 2026

12 papers

KAIJU: An Executive Kernel for Intent-Gated Execution of LLM Agents

Cormac Guerin, Frank Guerin
2026-03-31
2604.02375v1

A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation

Ha Na Cho
2026-03-31
safety
2604.00249v1

EnsembleSHAP: Faithful and Certifiably Robust Attribution for Random Subspace Method

Yanting Wang, Jinyuan Jia
2026-03-31
red teaming
2603.30034v1

Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh
2026-03-31
red teaming
2603.30016v1

Performative Scenario Optimization

Quanyan Zhu, Zhengye Han
2026-03-31
red teaming
2603.29982v1

Adversarial Prompt Injection Attack on Multimodal Large Language Models

Meiwen Ding, Song Xia, Chenqi Kong, Xudong Jiang
2026-03-31
red teaming
2603.29418v1

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

Bilgehan Sel, Xuanli He, Alwin Peng, Ming Jin, Jerry Wei
2026-03-30
red teaming
2603.29038v1

Crossing the NL/PL Divide: Information Flow Analysis Across the NL/PL Boundary in LLM-Integrated Code

Zihao Xu, Xiao Cheng, Ruijie Meng, Yuekang Li
2026-03-30
2603.28345v1

Evaluating Privilege Usage of Agents on Real-World Tools

Quan Zhang, Lianhang Fu, Lvsi Lian, Gwihwan Go, Yujue Wang, Chijin Zhou, Yu Jiang, Geguang Pu
2026-03-30
red teaming
2603.28166v1

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

Haochuan Kevin Wang
2026-03-30
red teaming
2603.28013v2

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

Haochuan Kevin Wang
2026-03-30
red teaming
2603.28013v1

Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur
2026-03-30
red teaming
2603.27918v1

March 23 - March 29, 2026

12 papers

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong, Songze Li
2026-03-29
red teaming
2603.27522v1

A Systematic Taxonomy of Security Vulnerabilities in the OpenClaw AI Agent Framework

Surada Suwansathit, Yuxuan Zhang, Guofei Gu
2026-03-29
red teaming
2603.27517v1

GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small Language Models

Md Jueal Mia, Joaquin Molto, Yanzhao Wu, M. Hadi Amini
2026-03-28
red teaming
2603.28817v1

From Inference Routing to Agent Orchestration: Declarative Policy Compilation with Cross-Layer Verification

Huamin Chen, Xunzhuo Liu, Bowei He, Xue Liu
2026-03-28
2603.27299v1

SafeClaw-R: Towards Safe and Secure Multi-Agent Personal Assistants

Haoyu Wang, Zibo Xiao, Yedi Zhang, Christopher M. Poskitt, Jun Sun
2026-03-28
2603.28807v1

Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models

Hieu Xuan Le, Benjamin Goh, Quy Anh Tang
2026-03-26
red teaming
2603.25176v1

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Haozhen Wang, Haoyue Liu, Jionghao Zhu, Zhichao Wang, Yongxin Guo, Xiaoying Tang
2026-03-26
red teaming
2603.25164v1

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan
2026-03-25
red teaming
2603.24857v1

Analysing the Safety Pitfalls of Steering Vectors

Yuxiao Li, Alina Fastowski, Efstratios Zaradoukas, Bardh Prenkaj, Gjergji Kasneci
2026-03-25
red teaming
2603.24543v1

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, Maksym Andriushchenko
2026-03-25
red teaming
2603.24511v1

Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search

Yulin Shen, Xudong Pan, Geng Hong, Min Yang
2026-03-25
red teaming
2603.24203v1

The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense

Qianlong Lan, Anuj Kaul
2026-03-24
2603.23791v1