Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1160 papers total

February 23 - March 01, 2026

10 papers

ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

Che Wang, Fuyao Zhang, Jiaming Zhang, Ziqi Zhang, Yinghui Wang, Longtao Huang, Jianbo Gao, Zhong Chen, Wei Yang Bryan Lim
2026-02-24
red teaming
2602.20708v1

An LLM-driven Scenario Generation Pipeline Using an Extended Scenic DSL for Autonomous Driving Safety Validation

Fida Khandaker Safa, Yupeng Jiang, Xi Zheng
2026-02-24
safety
2602.20644v1

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko
2026-02-23
red teaming
2602.20156v3

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko
2026-02-23
red teaming
2602.20156v2

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko
2026-02-23
red teaming
2602.20156v1

BarrierSteer: LLM Safety via Learning Barrier Steering

Thanh Q. Tran, Arun Verma, Kiwan Wong, Bryan Kian Hsiang Low, Daniela Rus, Wei Xiao
2026-02-23
safety
2602.20102v1

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

Zac Garby, Andrew D. Gordon, David Sands
2026-02-23
red teaming
2602.20064v1

Beyond the Binary: A nuanced path for open-weight advanced AI

Bengüsu Özcan, Alex Petropoulos, Max Reddel
2026-02-23
2602.19682v1

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

Lei Ba, Qinbin Li, Songze Li
2026-02-23
red teaming
2602.19547v1

Hiding in Plain Text: Detecting Concealed Jailbreaks via Activation Disentanglement

Amirhossein Farzam, Majid Behabahani, Mani Malek, Yuriy Nevmyvaka, Guillermo Sapiro
2026-02-23
red teaming
2602.19396v1

February 16 - February 22, 2026

14 papers

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Emmanuel Bamidele
2026-02-22
2603.04443v1

MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs

Chun Yan Ryan Kan, Tommy Tran, Vedant Yadav, Ava Cai, Kevin Zhu, Ruizhe Li, Maheep Chaudhary
2026-02-21
safety
2602.18782v1

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Mirae Kim, Seonghun Jeong, Youngjun Kwak
2026-02-20
red teaming
2602.18154v1

Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models

Manuel Wirth
2026-02-19
red teaming
2602.18514v1

Fail-Closed Alignment for Large Language Models

Zachary Coalson, Beth Sohler, Aiden Gabriel, Sanghyun Hong
2026-02-19
red teaming
2602.16977v1

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

Arnold Cartagena, Ariane Teixeira
2026-02-18
safety
2602.16943v1

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

Justin Albrethsen, Yash Datta, Kunal Kumar, Sharath Rajasekar
2026-02-18
red teaming
2602.16935v1

NeST: Neuron Selective Tuning for LLM Safety

Sasha Behrouzi, Lichao Wu, Mohamadreza Rostami, Ahmad-Reza Sadeghi
2026-02-18
safety
2602.16835v1

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

Priyaranjan Pattnayak, Sanchari Chowdhuri
2026-02-18
red teaming
2602.16832v1

Policy Compiler for Secure Agentic Systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, Somesh Jha
2026-02-18
2602.16708v2

Policy Compiler for Secure Agentic Systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, Mihai Christodorescu, Somesh Jha
2026-02-18
2602.16708v1

Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

Yuyan Bu, Xiaohao Liu, ZhaoXing Ren, Yaodong Yang, Juntao Dai
2026-02-18
safety
2602.16660v1

Recursive language models for jailbreak detection: a procedural defense for tool-augmented agents

Doron Shavit
2026-02-18
red teaming
2602.16520v1

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

Nivya Talokar, Ayush K Tarun, Murari Mandal, Maksym Andriushchenko, Antoine Bosselut
2026-02-18
red teaming
2602.16346v2