Philipp Zimmermann
← Back to Newsletter

Paper Library

Collection of AI Security research papers

Showing 1331 papers total

October 27 - November 02, 2025

24 papers

DRIP: Defending Prompt Injection via Token-wise Representation Editing and Residual Instruction Fusion

Ruofan Liu, Yun Lin, Zhiyong Huang, Jin Song Dong
2025-11-01
2511.00447v2

DRIP: Defending Prompt Injection via De-instruction Training and Residual Fusion Model Architecture

Ruofan Liu, Yun Lin, Jin Song Dong
2025-11-01
red teaming
2511.00447v1

Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs

Mina Taraghi, Yann Pequignot, Amin Nikanjam, Mohamed Amine Merzouk, Foutse Khomh
2025-11-01
safety
2511.00382v1

Exploiting Latent Space Discontinuities for Building Universal LLM Jailbreaks and Data Extraction Attacks

Kayua Oleques Paim, Rodrigo Brandao Mansilha, Diego Kreutz, Muriel Figueredo Franco, Weverton Cordeiro
2025-11-01
red teaming
2511.00346v1

Diffusion LLMs are Natural Adversaries for any LLM

David Lüdke, Tom Wollschläger, Paul Ungermann, Stephan Günnemann, Leo Schwinn
2025-10-31
red teaming
2511.00203v1

Prevalence of Security and Privacy Risk-Inducing Usage of AI-based Conversational Agents

Kathrin Grosse, Nico Ebert
2025-10-31
2510.27275v1

Consistency Training Helps Stop Sycophancy and Jailbreaks

Alex Irpan, Alexander Matt Turner, Mark Kurzeja, David K. Elson, Rohin Shah
2025-10-31
2510.27062v1

Reasoning Up the Instruction Ladder for Controllable Language Models

Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar
2025-10-30
red teaming
2511.04694v3

Reasoning Up the Instruction Ladder for Controllable Language Models

Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar
2025-10-30
red teaming
2511.04694v2

CATCH: A Modular Cross-domain Adaptive Template with Hook

Xinjin Li, Yulie Lu, Jinghan Cao, Yu Ma, Zhenglin Li, Yeyang Zhou
2025-10-30
2510.26582v1

Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token

Shaked Zychlinski, Yuval Kainan
2025-10-30
red teaming
2510.26847v1

Chain-of-Thought Hijacking

Jianli Zhao, Tingchen Fu, Rylan Schaeffer, Mrinank Sharma, Fazl Barez
2025-10-30
red teaming
2510.26418v1

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
2025-10-30
red teaming
2510.26328v1

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models

Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang
2025-10-30
red teaming
2510.26096v1

RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline

André V. Duarte, Xuying li, Bin Zeng, Arlindo L. Oliveira, Lei Li, Zhuo Li
2025-10-29
red teaming
2510.25941v1

VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning

Baolu Li, Yiming Zhang, Qinghe Wang, Liqian Ma, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Zhenfei Yin, Yunzhi Zhuge, Huchuan Lu, Xu Jia
2025-10-29
2510.25772v1

Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models

Juan Ren, Mark Dras, Usman Naseem
2025-10-29
red teaming
2510.25179v1

Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection

Chanhyeong Yang, Taehoon Song, Jihwan Park, Hyunwoo J. Kim
2025-10-29
2510.25094v1

Compositional Image Synthesis with Inference-Time Scaling

Minsuk Ji, Sanghyeok Lee, Namhyuk Ahn
2025-10-28
2510.24133v1

Fortytwo: Swarm Inference with Peer-Ranked Consensus

Vladyslav Larin, Ihor Naumenko, Aleksei Ivashov, Ivan Nikitin, Alexander Firsov
2025-10-27
2510.24801v1

ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents

Zhenyu Zhang, Tianyi Chen, Weiran Xu, Alex Pentland, Jiaxin Pei
2025-10-27
2510.23822v1

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Yuanyuan Yuan, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v3

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, and Yuanyuan Yuan, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v2

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v1