Paper Library
Collection of AI Security research papers
Showing 770 papers total
October 27 - November 02, 2025
16 papers
Reasoning Up the Instruction Ladder for Controllable Language Models
Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar
2025-10-30
red teaming
2511.04694v2
CATCH: A Modular Cross-domain Adaptive Template with Hook
Xinjin Li, Yulie Lu, Jinghan Cao, Yu Ma, Zhenglin Li, Yeyang Zhou
2025-10-30
2510.26582v1
Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token
Shaked Zychlinski, Yuval Kainan
2025-10-30
red teaming
2510.26847v1
Chain-of-Thought Hijacking
Jianli Zhao, Tingchen Fu, Rylan Schaeffer, Mrinank Sharma, Fazl Barez
2025-10-30
red teaming
2510.26418v1
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
2025-10-30
red teaming
2510.26328v1
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang
2025-10-30
red teaming
2510.26096v1
RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline
André V. Duarte, Xuying Li, Bin Zeng, Arlindo L. Oliveira, Lei Li, Zhuo Li
2025-10-29
red teaming
2510.25941v1
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
Baolu Li, Yiming Zhang, Qinghe Wang, Liqian Ma, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Zhenfei Yin, Yunzhi Zhuge, Huchuan Lu, Xu Jia
2025-10-29
2510.25772v1
Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models
Juan Ren, Mark Dras, Usman Naseem
2025-10-29
red teaming
2510.25179v1
Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection
Chanhyeong Yang, Taehoon Song, Jihwan Park, Hyunwoo J. Kim
2025-10-29
2510.25094v1
Compositional Image Synthesis with Inference-Time Scaling
Minsuk Ji, Sanghyeok Lee, Namhyuk Ahn
2025-10-28
2510.24133v1
Fortytwo: Swarm Inference with Peer-Ranked Consensus
Vladyslav Larin, Ihor Naumenko, Aleksei Ivashov, Ivan Nikitin, Alexander Firsov
2025-10-27
2510.24801v1
ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents
Zhenyu Zhang, Tianyi Chen, Weiran Xu, Alex Pentland, Jiaxin Pei
2025-10-27
2510.23822v1
QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents
Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v1
Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition
Jing-Xuan Zhang, Genshun Wan, Jin Li, Jianqing Gao
2025-10-27
2510.22961v1
FAME: Fairness-aware Attention-modulated Video Editing
Zhangkai Wu, Xuhui Fan, Zhongyuan Xie, Kaize Shi, Zhidong Li, Longbing Cao
2025-10-27
2510.22960v1
October 20 - October 26, 2025
8 papers
Sentra-Guard: A Multilingual Human-AI Framework for Real-Time Defense Against Adversarial LLM Jailbreaks
Md. Mehedi Hasan, Ziaur Rahman, Rafid Mostafiz, Md. Abir Hossain
2025-10-26
red teaming
safety
2510.22628v1
Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models
Pavlos Ntais
2025-10-24
red teaming
2510.22085v1
Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
Sarah Ball, Niki Hasrati, Alexander Robey, Avi Schwarzschild, Frauke Kreuter, Zico Kolter, Andrej Risteski
2025-10-24
red teaming
2510.22014v1
Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks
Havva Alizadeh Noughabi, Julien Serbanescu, Fattane Zarrinkalam, Ali Dehghantanha
2025-10-24
red teaming
2510.21983v1
Characterizing Low-Latency Sky Localization in Multi-Detector Gravitational-Wave Networks
Amazigh Ouzriat, Viola Sordini, Francesco Di Renzo
2025-10-24
2510.21930v1
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Mahavir Dabas, Tran Huynh, Nikhil Reddy Billa, Jiachen T. Wang, Peng Gao, Charith Peris, Yao Ma, Rahul Gupta, Ming Jin, Prateek Mittal, Ruoxi Jia
2025-10-24
red teaming
2510.21910v1
FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models
Zihao Fu, Ryan Brown, Shun Shao, Kai Rawal, Eoin Delaney, Chris Russell
2025-10-24
2510.21363v1
When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails
Yingzhi Mao, Chunkang Zhang, Junxiang Wang, Xinyan Guan, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
2025-10-24
red teaming
2510.21285v2