Philipp Zimmermann
← Back to Newsletter

Paper Library

Collection of AI Security research papers

Showing 1331 papers total

February 23 - March 01, 2026

8 papers

February 16 - February 22, 2026

16 papers

Prompt Injection as Role Confusion

Charles Ye, Jasmine Cui, Dylan Hadfield-Menell
2026-02-22
red teaming
2603.12277v2

Prompt Injection as Role Confusion

Charles Ye, Jasmine Cui, Dylan Hadfield-Menell
2026-02-22
red teaming
2603.12277v1

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Emmanuel Bamidele
2026-02-22
2603.04443v1

MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs

Chun Yan Ryan Kan, Tommy Tran, Vedant Yadav, Ava Cai, Kevin Zhu, Ruizhe Li, Maheep Chaudhary
2026-02-21
safety
2602.18782v1

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Mirae Kim, Seonghun Jeong, Youngjun Kwak
2026-02-20
red teaming
2602.18154v1

Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models

Manuel Wirth
2026-02-19
red teaming
2602.18514v1

Fail-Closed Alignment for Large Language Models

Zachary Coalson, Beth Sohler, Aiden Gabriel, Sanghyun Hong
2026-02-19
red teaming
2602.16977v1

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

Arnold Cartagena, Ariane Teixeira
2026-02-18
safety
2602.16943v1

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

Justin Albrethsen, Yash Datta, Kunal Kumar, Sharath Rajasekar
2026-02-18
red teaming
2602.16935v1

NeST: Neuron Selective Tuning for LLM Safety

Sasha Behrouzi, Lichao Wu, Mohamadreza Rostami, Ahmad-Reza Sadeghi
2026-02-18
safety
2602.16835v1

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

Priyaranjan Pattnayak, Sanchari Chowdhuri
2026-02-18
red teaming
2602.16832v1

Policy Compiler for Secure Agentic Systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, Somesh Jha
2026-02-18
2602.16708v2

Policy Compiler for Secure Agentic Systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, Mihai Christodorescu, Somesh Jha
2026-02-18
2602.16708v1

Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

Yuyan Bu, Xiaohao Liu, ZhaoXing Ren, Yaodong Yang, Juntao Dai
2026-02-18
safety
2602.16660v1

Recursive language models for jailbreak detection: a procedural defense for tool-augmented agents

Doron Shavit
2026-02-18
red teaming
2602.16520v1

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

Nivya Talokar, Ayush K Tarun, Murari Mandal, Maksym Andriushchenko, Antoine Bosselut
2026-02-18
red teaming
2602.16346v2