Philipp Zimmermann

Paper Library

A collection of AI Security research papers

Showing 1169 papers total

February 16 - February 22, 2026

13 papers

NeST: Neuron Selective Tuning for LLM Safety

Sasha Behrouzi, Lichao Wu, Mohamadreza Rostami, Ahmad-Reza Sadeghi
2026-02-18
safety
2602.16835v1

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

Priyaranjan Pattnayak, Sanchari Chowdhuri
2026-02-18
red teaming
2602.16832v1

Policy Compiler for Secure Agentic Systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi, Prasad Chalasani, Somesh Jha
2026-02-18
2602.16708v2

Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

Yuyan Bu, Xiaohao Liu, ZhaoXing Ren, Yaodong Yang, Juntao Dai
2026-02-18
safety
2602.16660v1

Recursive language models for jailbreak detection: a procedural defense for tool-augmented agents

Doron Shavit
2026-02-18
red teaming
2602.16520v1

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

Nivya Talokar, Ayush K Tarun, Murari Mandal, Maksym Andriushchenko, Antoine Bosselut
2026-02-18
red teaming
2602.16346v2

The Vulnerability of LLM Rankers to Prompt Injection Attacks

Yu Yin, Shuai Wang, Bevan Koopman, Guido Zuccon
2026-02-18
red teaming
2602.16752v1

Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis

Scott Thornton
2026-02-18
safety
2602.16741v1

Intent Laundering: AI Safety Datasets Are Not What They Seem

Shahriar Golchin, Marc Wetter
2026-02-17
red teaming
2602.16729v1

Boundary Point Jailbreaking of Black-Box LLMs

Xander Davies, Giorgi Giglemiani, Edmund Lau, Eric Winsor, Geoffrey Irving, Yarin Gal
2026-02-16
red teaming
2602.15001v2

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

Lukas Struppek, Adam Gleave, Kellin Pelrine
2026-02-16
red teaming
2602.14689v1

Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models

In Chong Choi, Jiacheng Zhang, Feng Liu, Yiliao Song
2026-02-16
red teaming
2602.14399v1

A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, Wenjie Wang
2026-02-16
2602.14364v1

February 09 - February 15, 2026

8 papers