Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1169 papers total

September 08 - September 14, 2025

14 papers

ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs

Yibo Zhang, Liang Lin
2025-09-14
2509.11128v1

Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding

Seongho Joo, Hyukhun Koh, Kyomin Jung
2025-09-13
red teaming
2509.10931v1

Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications

Janis Keuper
2025-09-12
red teaming
2509.10248v3

Realism Control One-step Diffusion for Real-World Image Super-Resolution

Zongliang Wu, Siming Zheng, Peng-Tao Jiang, Xin Yuan
2025-09-12
2509.10122v2

When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review

Changjia Zhu, Junjie Xiong, Renkai Ma, Zhicong Lu, Yao Liu, Lingyao Li
2025-09-12
red teaming
2509.09912v1

Steering MoE LLMs via Expert (De)Activation

Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Ryan Rossi, Trung Bui, Hinrich Schütze, Nanyun Peng
2025-09-11
red teaming
2509.09660v1

Improving LLM Safety and Helpfulness using SFT and DPO: A Study on OPT-350M

Piyush Pant
2025-09-10
safety
2509.09055v1

PromptGuard: An Orchestrated Prompting Framework for Principled Synthetic Text Generation for Vulnerable Populations using LLMs with Enhanced Safety, Fairness, and Controllability

Tung Vu, Lam Nguyen, Quynh Dao
2025-09-10
safety
2509.08910v1

X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates

Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park
2025-09-10
red teaming
2509.08729v1

Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations

Ron F. Del Rosario, Klaudia Krawiecka, Christian Schroeder de Witt
2025-09-10
2509.08646v1

ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation

Kai Ye, Liangcai Su, Chenxiong Qian
2025-09-09
red teaming
2509.07941v1

Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling

Minghui Li, Hao Zhang, Yechao Zhang, Wei Wan, Shengshan Hu, Xiaobing Pei, Jing Wang
2025-09-09
red teaming
2509.07617v1

SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs

Hongfei Xia, Hongru Wang, Zeming Liu, Qian Yu, Yuhang Guo, Haifeng Wang
2025-09-09
safety
2509.07315v1

Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?

Junjie Mu, Zonghao Ying, Zhekui Fan, Zonglei Jing, Yaoyuan Zhang, Zhengmin Yu, Wenxin Zhang, Quanchen Zou, Xiangzheng Zhang
2025-09-08
red teaming
2509.06350v1

September 01 - September 07, 2025

10 papers

Measuring the Vulnerability Disclosure Policies of AI Vendors

Yangheran Piao, Jingjie Li, Daniel W. Woods
2025-09-07
2509.06136v1

Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs

Andrew Yeo, Daeseon Choi
2025-09-07
red teaming, safety
2509.05883v1

AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs

Debdeep Sanyal, Manodeep Ray, Murari Mandal
2025-09-06
red teaming
2509.08000v1

EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System

Pavan Reddy, Aditya Sanjay Gujral
2025-09-06
red teaming
2509.10540v1

Behind the Mask: Benchmarking Camouflaged Jailbreaks in Large Language Models

Youjia Zheng, Mohammad Zandsalimy, Shanu Sushmita
2025-09-05
red teaming
2509.05471v1

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models

Danielle Ensign, Henry Sleight, Kyle Fish
2025-09-05
2509.04781v1

NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models

Chuhan Zhang, Ye Zhang, Bowen Shi, Yuyou Gan, Tianyu Du, Shouling Ji, Dazhan Deng, Yingcai Wu
2025-09-04
red teaming
2509.03985v1

Between a Rock and a Hard Place: Exploiting Ethical Reasoning to Jailbreak LLMs

Shei Pern Chua, Thai Zhen Leng, Teh Kai Jun, Xiao Li, Xiaolin Hu
2025-09-04
red teaming
2509.05367v1

SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models

Jigang Fan, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang, Zaixi Zhang
2025-09-03
red teaming
2509.03487v1

BioBlue: Notable runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

Roland Pihlakas, Sruthi Kuriakose
2025-09-02
safety
2509.02655v1