Philipp Zimmermann
Paper Library

Collection of AI Security research papers

Showing 1169 papers total

March 02 - March 08, 2026

6 papers

February 23 - March 01, 2026

18 papers

JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks

Masahiro Kaneko, Ayana Niwa, Timothy Baldwin
2026-03-01
red teaming
2603.01291v1

Tracking Capabilities for Safer Agents

Martin Odersky, Yaoyu Zhao, Yichen Xu, Oliver Bračevac, Cao Nguyen Pham
2026-03-01
2603.00991v1

MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

Yilian Liu, Xiaojun Jia, Guoshun Nan, Jiuyang Lyu, Zhican Chen, Tao Guan, Shuyuan Luo, Zhongyi Zhai, Yang Liu
2026-02-28
red teaming
2603.00565v1

From Goals to Aspects, Revisited: An NFR Pattern Language for Agentic AI Systems

Yijun Yu
2026-02-28
2603.00472v1

SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems

Jialiang Fan, Weizhe Xu, Mengyu Liu, Oleg Sokolsky, Insup Lee, Fangxin Kong
2026-02-27
safety
2602.24235v1

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu
2026-02-27
red teaming
2602.24009v3

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu
2026-02-27
red teaming
2602.24009v2

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu
2026-02-27
red teaming
2602.24009v1

SwitchCraft: Training-Free Multi-Event Video Generation with Attention Controls

Qianxun Xu, Chenxi Song, Yujun Cai, Chi Zhang
2026-02-27
2602.23956v1

LiaisonAgent: A Multi-Agent Framework for Autonomous Risk Investigation and Governance

Chuanming Tang, Ling Qing, Shifeng Chen
2026-02-27
2603.00200v1

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Xun Huang, Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang, Yang Liu, Xiaojun Jia
2026-02-26
red teaming
2602.22983v2

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Xun Huang, Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang, Yang Liu, Xiaojun Jia
2026-02-26
red teaming
2602.22983v1

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

Tian Zhang, Yiwei Xu, Juan Wang, Keyan Guo, Xiaoyang Xu, Bowen Xiao, Quanlong Guan, Jinlin Fan, Jiawei Liu, Zhiquan Liu, Hongxin Hu
2026-02-26
red teaming
2602.22724v1

Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection

Marcus Graves
2026-02-26
red teaming
2603.00164v1

CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety

Umid Suleymanov, Rufiz Bayramov, Suad Gafarli, Seljan Musayeva, Taghi Mammadov, Aynur Akhundlu, Murat Kantarcioglu
2026-02-26
safety
2602.22557v1

Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace

Qianlong Lan, Anuj Kaul, Shaun Jones, Stephanie Westrum
2026-02-25
red teaming
2602.22450v1

A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios

Kimberly T. Mai, Anna Gausen, Magda Dubois, Mona Murad, Bessie O'Dell, Nadine Staes-Polet, Christopher Summerfield, Andrew Strait
2026-02-25
red teaming
2602.21831v2

A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios

Kimberly T. Mai, Anna Gausen, Magda Dubois, Mona Murad, Bessie O'Dell, Nadine Staes-Polet, Christopher Summerfield, Andrew Strait
2026-02-25
2602.21831v1