Paper Library
Collection of AI Security research papers
Showing 773 papers total
October 13 - October 19, 2025
16 papers
Keep Calm and Avoid Harmful Content: Concept Alignment and Latent Manipulation Towards Safer Answers
Ruben Belo, Marta Guimaraes, Claudia Soares
2025-10-14
2510.12672v2
Keep Calm and Avoid Harmful Content: Concept Alignment and Latent Manipulation Towards Safer Answers
Ruben Belo, Claudia Soares, Marta Guimaraes
2025-10-14
2510.12672v1
Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection
Olga E. Sorokoletova, Francesco Giarrusso, Vincenzo Suriani, Daniele Nardi
2025-10-14
red teaming
2510.13893v1
PromptLocate: Localizing Prompt Injection Attacks
Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong
2025-10-14
red teaming
2510.12252v2
MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
Dongsen Zhang, Zekun Li, Xu Luo, Xuannan Liu, Peipei Li, Wenjun Xu
2025-10-14
red teaming
2510.15994v1
SafeMT: Multi-turn Safety for Multimodal Language Models
Han Zhu, Juntao Dai, Jiaming Ji, Haoran Li, Chengkun Cai, Pengcheng Wen, Chi-Min Chan, Boyuan Chen, Yaodong Yang, Sirui Han, Yike Guo
2025-10-14
red teaming
2510.12133v1
Deep Research Brings Deeper Harm
Shuo Chen, Zonggen Li, Zhen Han, Bailan He, Tong Liu, Haokun Chen, Georg Groh, Philip Torr, Volker Tresp, Jindong Gu
2025-10-13
red teaming
2510.11851v2
Deep Research Brings Deeper Harm
Shuo Chen, Zonggen Li, Zhen Han, Bailan He, Tong Liu, Haokun Chen, Georg Groh, Philip Torr, Volker Tresp, Jindong Gu
2025-10-13
red teaming
2510.11851v1
Countermind: A Multi-Layered Security Architecture for Large Language Models
Dominik Schwarz
2025-10-13
2510.11837v1
Don't Walk the Line: Boundary Guidance for Filtered Generation
Sarah Ball, Andreas Haupt
2025-10-13
2510.11834v1
Bag of Tricks for Subverting Reasoning-based Safety Guardrails
Shuo Chen, Zhen Han, Haokun Chen, Bailan He, Shengyun Si, Jingpei Wu, Philip Torr, Volker Tresp, Jindong Gu
2025-10-13
red teaming
2510.11570v1
Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation
Jiaying Wu, Zihang Fu, Haonan Wang, Fanxiao Li, Min-Yen Kan
2025-10-13
governance
2510.11423v1
Attacks by Content: Automated Fact-checking is an AI Security Issue
Michael Schlichtkrull
2025-10-13
2510.11238v1
TypePilot: Leveraging the Scala Type System for Secure LLM-generated Code
Alexander Sternfeld, Andrei Kucharavy, Ljiljana Dolamic
2025-10-13
2510.11151v1
Demystifying Numerosity in Diffusion Models -- Limitations and Remedies
Yaqi Zhao, Xiaochen Wang, Li Dong, Wentao Zhang, Yuhui Yuan
2025-10-13
2510.11117v1
SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model
Honghui Yuan, Keiji Yanai
2025-10-13
2510.10910v1
October 06 - October 12, 2025
8 papers
ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wang, Kuo-Hui Yeh
2025-10-11
red teaming
2510.10281v1
MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation
Wentian Zhu, Zhen Xiang, Wei Niu, Le Guan
2025-10-11
red teaming
2510.10271v1
MemPromptTSS: Persistent Prompt Memory for Iterative Multi-Granularity Time Series State Segmentation
Ching Chang, Ming-Chih Lo, Chiao-Tung Chan, Wen-Chih Peng, Tien-Fu Chen
2025-10-11
2510.09930v1
Learning Bug Context for PyTorch-to-JAX Translation with LLMs
Hung Phan, Son Le Vu, Ali Jannesari
2025-10-10
2510.09898v1
Text Prompt Injection of Vision Language Models
Ruizhe Zhu
2025-10-10
red teaming
2510.09849v1
A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages
Raoyuan Zhao, Yihong Liu, Hinrich Schütze, Michael A. Hedderich
2025-10-10
2510.09555v1
Multimodal Policy Internalization for Conversational Agents
Zhenhailong Wang, Jiateng Liu, Amin Fazel, Ritesh Sarkhel, Xing Fan, Xiang Li, Chenlei Guo, Heng Ji, Ruhi Sarikaya
2025-10-10
2510.09474v1
Getting Your Indices in a Row: Full-Text Search for LLM Training Data for Real World
Ines Altemir Marinas, Anastasiia Kucherenko, Alexander Sternfeld, Andrei Kucharavy
2025-10-10
2510.09471v1