Philipp Zimmermann
← Back to Newsletter

Paper Library

Collection of AI Security research papers

Showing 1331 papers total

March 16 - March 22, 2026

24 papers

The Autonomy Tax: Defense Training Breaks LLM Agents

Shawn Li, Yue Zhao
2026-03-19
red teaming
2603.19423v1

On Optimizing Multimodal Jailbreaks for Spoken Language Models

Aravind Krishnan, Karolina Stańczak, Dietrich Klakow
2026-03-19
red teaming
2603.19127v1

CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer

Yue Zhao, Yujia Gong, Ruigang Liang, Shenchen Zhu, Kai Chen, Xuejing Yuan, Wangjun Zhang
2026-03-19
safety
2603.18449v1

Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

Md Takrim Ul Alam, Akif Islam, Mohd Ruhul Ameen, Abu Saleh Musa Miah, Jungpil Shin
2026-03-19
safety
2603.18433v1

Who Tests the Testers? Systematic Enumeration and Coverage Audit of LLM Agent Tool Call Safety

Xuan Chen, Lu Yan, Ruqi Zhang, Xiangyu Zhang
2026-03-18
safety
2603.18245v1

IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia

Priyaranjan Pattnayak, Sanchari Chowdhuri
2026-03-18
safety
2603.17915v1

The Verifier Tax: Horizon Dependent Safety Success Tradeoffs in Tool Using LLM Agents

Tanmay Sah, Vishal Srivastava, Dolly Sah, Kayden Jordan
2026-03-18
safety
2603.19328v1

Parameter-Efficient Modality-Balanced Symmetric Fusion for Multimodal Remote Sensing Semantic Segmentation

Haocheng Li, Juepeng Zheng, Shuangxi Miao, Ruibo Lu, Guosheng Cai, Haohuan Fu, Jianxi Huang
2026-03-18
2603.17705v1

VeriGrey: Greybox Agent Validation

Yuntong Zhang, Sungmin Kang, Ruijie Meng, Marcel Böhme, Abhik Roychoudhury
2026-03-18
red teaming
2603.17639v1

Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

Saikat Maiti
2026-03-18
red teaming
2603.17419v1

Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift

Zhihua Wei, Qiang Li, Jian Ruan, Zhenxin Qin, Leilei Wen, Dongrui Liu, Wen Shen
2026-03-18
red teaming
2603.17372v1

Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations

Haozheng Luo, Yimin Wang, Jiahao Yu, Binghui Wang, Yan Chen
2026-03-18
red teaming
2603.17305v1

MCP-38: A Comprehensive Threat Taxonomy for Model Context Protocol Systems (v1.0)

Yi Ting Shen, Kentaroh Toyoda, Alex Leung
2026-03-18
red teaming
2603.18063v1

Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure

Caglar Yildirim
2026-03-17
red teaming
2603.16734v1

CoMAI: A Collaborative Multi-Agent Framework for Robust and Equitable Interview Evaluation

Gengxin Sun, Ruihao Yu, Liangyi Yin, Yunqi Yang, Bin Zhang, Zhiwei Xu
2026-03-17
2603.16215v1

Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models

Xiaobing Sun, Perry Lam, Shaohua Li, Zizhou Wang, Rick Siow Mong Goh, Yong Liu, Liangli Zhen
2026-03-17
red teaming
2603.16192v1

SIA: A Synthesize-Inject-Align Framework for Knowledge-Grounded and Secure E-commerce Search LLMs with Industrial Deployment

Zhouwei Zhai, Mengxiang Chen, Anmeng Zhang
2026-03-17
2603.16137v2

SIA: A Synthesize-Inject-Align Framework for Knowledge-Grounded and Secure E-commerce Search LLMs with Industrial Deployment

Zhouwei Zhai, Mengxiang Chen, Anmeng Zhang
2026-03-17
2603.16137v1

Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory

Ce Zhang, Jinxi He, Junyi He, Katia Sycara, Yaqi Xie
2026-03-16
2603.15800v1

Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury, Jing Liu, Toshiaki Koike-Akino, Ming Jin, Ye Wang
2026-03-16
red teaming
2603.15417v1

SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration

Yu Pan, Wenlong Yu, Tiejun Wu, Xiaohu Ye, Qiannan Si, Guangquan Xu, Bin Wu
2026-03-16
2603.15397v1

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu, Micha Nowak, Nick Winter, Eliot Jones, Andy Zou, Lama Ahmad, Kamalika Chaudhuri, Sahana Chennabasappa, Xander Davies, Lauren Deason, Benjamin L. Edelman, Tanner Emek, Ivan Evtimov, Jim Gust, Maia Hamin, Kat He, Klaudia Krawiecka, Riccardo Patana, Neil Perry, Troy Peterson, Xiangyu Qi, Javier Rando, Zifan Wang, Zihan Wang, Spencer Whitman, Eric Winsor, Arman Zharmagambetov, Matt Fredrikson, Zico Kolter
2026-03-16
red teaming
2603.15714v1

Directional Embedding Smoothing for Robust Vision Language Models

Ye Wang, Jing Liu, Toshiaki Koike-Akino
2026-03-16
red teaming
2603.15259v1

Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI

Jinhu Qi, Yifan Li, Minghao Zhao, Wentao Zhang, Zijian Zhang, Yaoman Li, Irwin King
2026-03-16
red teaming
2603.14987v1