Paper Library
Collection of AI Security research papers
1172 papers total
October 13 - October 19, 2025
24 papers
Shot2Tactic-Caption: Multi-Scale Captioning of Badminton Videos for Tactical Understanding
Ning Ding, Keisuke Fujii, Toru Tamaki
2025-10-16
2510.14617v1
Assessing Socio-Cultural Alignment and Technical Safety of Sovereign LLMs
Kyubyung Chae, Gihoon Kim, Gyuseong Lee, Taesup Kim, Jaejin Lee, Heejin Kim
2025-10-16
safety
2510.14565v1
Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers
Andrew Zhao, Reshmi Ghosh, Vitor Carvalho, Emily Lawton, Keegan Hines, Gao Huang, Jack W. Stokes
2025-10-16
red teaming
2510.14381v1
Towards Agentic Self-Learning LLMs in Search Environment
Wangtao Sun, Xiang Cheng, Jialin Fan, Yao Xu, Xing Yu, Shizhu He, Jun Zhao, Kang Liu
2025-10-16
2510.14253v2
Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks
Trilok Padhi, Pinxian Lu, Abdulkadir Erol, Tanmay Sutar, Gauri Sharma, Mina Sonmez, Munmun De Choudhury, Ugur Kursuncu
2025-10-16
red teaming
2510.14207v2
PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features
Wei Zou, Yupei Liu, Yanting Wang, Ying Chen, Neil Gong, Jinyuan Jia
2025-10-15
red teaming
2510.14005v2
Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions
Siying Liu, Shisheng Zhang, Indu Bala
2025-10-15
safety
2510.13931v1
In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers
Avihay Cohen
2025-10-15
red teaming
2510.13543v1
Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems
Karthik Avinash, Nikhil Pareek, Rishav Hada
2025-10-15
2510.13351v1
Prompt-based Adaptation in Large-scale Vision Models: A Survey
Xi Xiao, Yunbei Zhang, Lin Zhao, Yiyang Liu, Xiaoying Liao, Zheda Mai, Xingjian Li, Xiao Wang, Hao Xu, Jihun Hamm, Xue Lin, Min Xu, Qifan Wang, Tianyang Wang, Cheng Han
2025-10-15
2510.13219v1
SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
Juan Ren, Mark Dras, Usman Naseem
2025-10-15
red teaming
2510.13190v1
A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation
João A. Leite, Arnav Arora, Silvia Gargova, João Luz, Gustavo Sampaio, Ian Roberts, Carolina Scarton, Kalina Bontcheva
2025-10-14
red teaming
2510.12993v1
SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents
Simon Sinong Zhan, Yao Liu, Philip Wang, Zinan Wang, Qineng Wang, Zhian Ruan, Xiangyu Shi, Xinyu Cao, Frank Yang, Kangrui Wang, Huajie Shao, Manling Li, Qi Zhu
2025-10-14
safety
2510.12985v1
CADE 2.5 - ZeResFDG: Frequency-Decoupled, Rescaled and Zero-Projected Guidance for SD/SDXL Latent Diffusion Models
Denis Rychkovskiy
2025-10-14
2510.12954v2
RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs
Tuan T. Nguyen, John Le, Thai T. Vu, Willy Susilo, Heath Cooper
2025-10-14
red teaming
2510.13901v1
UniFusion: Vision-Language Model as Unified Encoder in Image Generation
Kevin Li, Manuel Brack, Sudeep Katakol, Hareesh Ravi, Ajinkya Kale
2025-10-14
2510.12789v1
Keep Calm and Avoid Harmful Content: Concept Alignment and Latent Manipulation Towards Safer Answers
Ruben Belo, Marta Guimaraes, Claudia Soares
2025-10-14
2510.12672v2
Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection
Olga E. Sorokoletova, Francesco Giarrusso, Vincenzo Suriani, Daniele Nardi
2025-10-14
red teaming
2510.13893v1
PromptLocate: Localizing Prompt Injection Attacks
Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong
2025-10-14
red teaming
2510.12252v2
MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
Dongsen Zhang, Zekun Li, Xu Luo, Xuannan Liu, Peipei Li, Wenjun Xu
2025-10-14
red teaming
2510.15994v1
SafeMT: Multi-turn Safety for Multimodal Language Models
Han Zhu, Juntao Dai, Jiaming Ji, Haoran Li, Chengkun Cai, Pengcheng Wen, Chi-Min Chan, Boyuan Chen, Yaodong Yang, Sirui Han, Yike Guo
2025-10-14
red teaming
2510.12133v1
Deep Research Brings Deeper Harm
Shuo Chen, Zonggen Li, Zhen Han, Bailan He, Tong Liu, Haokun Chen, Georg Groh, Philip Torr, Volker Tresp, Jindong Gu
2025-10-13
red teaming
2510.11851v2