Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1172 papers total

October 27 - November 02, 2025

11 papers

VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning

Baolu Li, Yiming Zhang, Qinghe Wang, Liqian Ma, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Zhenfei Yin, Yunzhi Zhuge, Huchuan Lu, Xu Jia
2025-10-29
2510.25772v1

Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models

Juan Ren, Mark Dras, Usman Naseem
2025-10-29
red teaming
2510.25179v1

Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection

Chanhyeong Yang, Taehoon Song, Jihwan Park, Hyunwoo J. Kim
2025-10-29
2510.25094v1

Compositional Image Synthesis with Inference-Time Scaling

Minsuk Ji, Sanghyeok Lee, Namhyuk Ahn
2025-10-28
2510.24133v1

Fortytwo: Swarm Inference with Peer-Ranked Consensus

Vladyslav Larin, Ihor Naumenko, Aleksei Ivashov, Ivan Nikitin, Alexander Firsov
2025-10-27
2510.24801v1

ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents

Zhenyu Zhang, Tianyi Chen, Weiran Xu, Alex Pentland, Jiaxin Pei
2025-10-27
2510.23822v1

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Yuanyuan Yuan, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v3

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Yuanyuan Yuan, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v2

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v1

Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

Jing-Xuan Zhang, Genshun Wan, Jin Li, Jianqing Gao
2025-10-27
2510.22961v1

FAME: Fairness-aware Attention-modulated Video Editing

Zhangkai Wu, Xuhui Fan, Zhongyuan Xie, Kaize Shi, Zhidong Li, Longbing Cao
2025-10-27
2510.22960v1

October 20 - October 26, 2025

13 papers

Sentra-Guard: A Multilingual Human-AI Framework for Real-Time Defense Against Adversarial LLM Jailbreaks

Md. Mehedi Hasan, Ziaur Rahman, Rafid Mostafiz, Md. Abir Hossain
2025-10-26
red teaming, safety
2510.22628v1

Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models

Pavlos Ntais
2025-10-24
red teaming
2510.22085v1

Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models

Sarah Ball, Niki Hasrati, Alexander Robey, Avi Schwarzschild, Frauke Kreuter, Zico Kolter, Andrej Risteski
2025-10-24
red teaming
2510.22014v1

Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks

Havva Alizadeh Noughabi, Julien Serbanescu, Fattane Zarrinkalam, Ali Dehghantanha
2025-10-24
red teaming
2510.21983v1

Characterizing Low-Latency Sky Localization in Multi-Detector Gravitational-Wave Networks

Amazigh Ouzriat, Viola Sordini, Francesco Di Renzo
2025-10-24
2510.21930v1

Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks

Mahavir Dabas, Tran Huynh, Nikhil Reddy Billa, Jiachen T. Wang, Peng Gao, Charith Peris, Yao Ma, Rahul Gupta, Ming Jin, Prateek Mittal, Ruoxi Jia
2025-10-24
red teaming
2510.21910v1

FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models

Zihao Fu, Ryan Brown, Shun Shao, Kai Rawal, Eoin Delaney, Chris Russell
2025-10-24
2510.21363v1

When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails

Yingzhi Mao, Chunkang Zhang, Junxiang Wang, Xinyan Guan, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
2025-10-24
red teaming
2510.21285v2

When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails

Yingzhi Mao, Chunkang Zhang, Junxiang Wang, Xinyan Guan, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
2025-10-24
red teaming
2510.21285v1

Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses

Xingwei Zhong, Kar Wai Fok, Vrizlynn L. L. Thing
2025-10-24
red teaming
2510.21214v1

The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning

Mingrui Liu, Sixiao Zhang, Cheng Long, Kwok Yan Lam
2025-10-24
red teaming
2510.21190v1

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang
2025-10-24
red teaming
2510.21189v1

NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge

Hanyu Zhu, Lance Fiondella, Jiawei Yuan, Kai Zeng, Long Jiao
2025-10-24
red teaming
2510.21144v1