Paper Library
A collection of AI security research papers
Showing 1172 papers total
October 20 - October 26, 2025
6 papers
CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
Xu Zhang, Hao Li, Zhichao Lu
2025-10-20
red teaming
2510.17687v1
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
Zhengshen Zhang, Hao Li, Yalun Dai, Zhengbang Zhu, Lei Zhou, Chenchen Liu, Dong Wang, Francis E. H. Tay, Sijin Chen, Ziwei Liu, Yuxiao Liu, Xinghang Li, Pan Zhou
2025-10-20
2510.17439v1
Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks
Xinkai Wang, Beibei Li, Zerui Shao, Ao Liu, Shouling Ji
2025-10-20
red teaming
2510.17277v1
JT-Safe: Intrinsically Enhancing the Safety and Trustworthiness of LLMs
Junlan Feng, Fanyu Meng, Chong Long, Pengyu Cong, Duqing Wang, Yan Zheng, Yuyao Zhang, Xuanchang Gao, Ye Yuan, Yunfei Ma, Zhijie Ren, Fan Yang, Na Wu, Di Jin, Chao Deng
2025-10-20
safety
2510.17918v1
Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models
Elias Hossain, Swayamjit Saha, Somshubhra Roy, Ravi Prasad
2025-10-20
red teaming
2510.17098v1
Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation
Guoqing Luo, Iffat Maab, Lili Mou, Junichi Yamagishi
2025-10-20
2510.17062v1
October 13 - October 19, 2025
15 papers
SafeSearch: Do Not Trade Safety for Utility in LLM Search Agents
Qiusi Zhan, Angeline Budiman-Chan, Abdelrahman Zayed, Xingzhi Guo, Daniel Kang, Joo-Kyung Kim
2025-10-19
safety
2510.17017v2
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
Masahiro Kaneko, Zeerak Talat, Timothy Baldwin
2025-10-19
red teaming
2510.17006v1
Bits Leaked per Query: Information-Theoretic Bounds on Adversarial Attacks against LLMs
Masahiro Kaneko, Timothy Baldwin
2025-10-19
red teaming
2510.17000v1
BreakFun: Jailbreaking LLMs via Schema Exploitation
Amirkia Rafiei Oskooei, Mehmet S. Aktas
2025-10-19
red teaming
2510.17904v1
Black-box Optimization of LLM Outputs by Asking for Directions
Jie Zhang, Meng Ding, Yang Liu, Jue Hong, Florian Tramèr
2025-10-19
red teaming
2510.16794v1
Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety
Vamshi Krishna Bonagiri, Ponnurangam Kumaraguru, Khanh Nguyen, Benjamin Plaut
2025-10-18
safety
2510.16492v1
VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion
Jaekyun Park, Hye Won Chung
2025-10-18
2510.16446v1
ATA: A Neuro-Symbolic Approach to Implement Autonomous and Trustworthy Agents
David Peer, Sebastian Stabinger
2025-10-18
2510.16381v1
TokenAR: Multiple Subject Generation via Autoregressive Token-level enhancement
Haiyue Sun, Qingdong He, Jinlong Peng, Peng Tang, Jiangning Zhang, Junwei Zhu, Xiaobin Hu, Shuicheng Yan
2025-10-18
2510.16332v1
Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense
Zhehao Zhang, Weijie Xu, Shixian Cui, Chandan K. Reddy
2025-10-17
red teaming
2510.16259v1
Prompt injections as a tool for preserving identity in GAI image descriptions
Kate Glazko, Jennifer Mankoff
2025-10-17
2510.16128v1
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong
2025-10-17
red teaming
2510.15476v2
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Shuang Liang, Zhihao Xu, Jialing Tao, Hui Xue, Xiting Wang
2025-10-17
red teaming
2510.15430v2
Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling
Deyue Zhang, Dongdong Yang, Junjie Mu, Quancheng Zou, Zonghao Ying, Wenzhuo Xu, Zhao Liu, Xuan Wang, Xiangzheng Zhang
2025-10-16
red teaming
2510.15068v1
Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks
ChenYu Wu, Yi Wang, Yang Liao
2025-10-16
red teaming
2510.15017v1