← Back to Newsletter
Paper Library
Collection of AI Security research papers
Search papers:
Filter by topic:
All Topics
Red Teaming
Safety
Risk & Governance
🔍 Search
Showing 1331 papers total
October 27 - November 02, 2025
24 papers
DRIP: Defending Prompt Injection via Token-wise Representation Editing and Residual Instruction Fusion
Ruofan Liu, Yun Lin, Zhiyong Huang, Jin Song Dong
2025-11-01
2511.00447v2
DRIP: Defending Prompt Injection via De-instruction Training and Residual Fusion Model Architecture
Ruofan Liu, Yun Lin, Jin Song Dong
2025-11-01
red teaming
2511.00447v1
Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs
Mina Taraghi, Yann Pequignot, Amin Nikanjam, Mohamed Amine Merzouk, Foutse Khomh
2025-11-01
safety
2511.00382v1
Exploiting Latent Space Discontinuities for Building Universal LLM Jailbreaks and Data Extraction Attacks
Kayua Oleques Paim, Rodrigo Brandao Mansilha, Diego Kreutz, Muriel Figueredo Franco, Weverton Cordeiro
2025-11-01
red teaming
2511.00346v1
Diffusion LLMs are Natural Adversaries for any LLM
David Lüdke, Tom Wollschläger, Paul Ungermann, Stephan Günnemann, Leo Schwinn
2025-10-31
red teaming
2511.00203v1
Prevalence of Security and Privacy Risk-Inducing Usage of AI-based Conversational Agents
Kathrin Grosse, Nico Ebert
2025-10-31
2510.27275v1
Consistency Training Helps Stop Sycophancy and Jailbreaks
Alex Irpan, Alexander Matt Turner, Mark Kurzeja, David K. Elson, Rohin Shah
2025-10-31
2510.27062v1
Reasoning Up the Instruction Ladder for Controllable Language Models
Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar
2025-10-30
red teaming
2511.04694v3
Reasoning Up the Instruction Ladder for Controllable Language Models
Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar
2025-10-30
red teaming
2511.04694v2
CATCH: A Modular Cross-domain Adaptive Template with Hook
Xinjin Li, Yulie Lu, Jinghan Cao, Yu Ma, Zhenglin Li, Yeyang Zhou
2025-10-30
2510.26582v1
Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token
Shaked Zychlinski, Yuval Kainan
2025-10-30
red teaming
2510.26847v1
Chain-of-Thought Hijacking
Jianli Zhao, Tingchen Fu, Rylan Schaeffer, Mrinank Sharma, Fazl Barez
2025-10-30
red teaming
2510.26418v1
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
2025-10-30
red teaming
2510.26328v1
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang
2025-10-30
red teaming
2510.26096v1
RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline
André V. Duarte, Xuying li, Bin Zeng, Arlindo L. Oliveira, Lei Li, Zhuo Li
2025-10-29
red teaming
2510.25941v1
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
Baolu Li, Yiming Zhang, Qinghe Wang, Liqian Ma, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Zhenfei Yin, Yunzhi Zhuge, Huchuan Lu, Xu Jia
2025-10-29
2510.25772v1
Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models
Juan Ren, Mark Dras, Usman Naseem
2025-10-29
red teaming
2510.25179v1
Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection
Chanhyeong Yang, Taehoon Song, Jihwan Park, Hyunwoo J. Kim
2025-10-29
2510.25094v1
Compositional Image Synthesis with Inference-Time Scaling
Minsuk Ji, Sanghyeok Lee, Namhyuk Ahn
2025-10-28
2510.24133v1
Fortytwo: Swarm Inference with Peer-Ranked Consensus
Vladyslav Larin, Ihor Naumenko, Aleksei Ivashov, Ivan Nikitin, Alexander Firsov
2025-10-27
2510.24801v1
ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents
Zhenyu Zhang, Tianyi Chen, Weiran Xu, Alex Pentland, Jiaxin Pei
2025-10-27
2510.23822v1
QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents
Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Yuanyuan Yuan, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v3
QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents
Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, and Yuanyuan Yuan, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v2
QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents
Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
2025-10-27
red teaming
2510.23675v1
‹
1
2
3
...
29
30
31
...
54
55
56
›