Paper Library
A collection of AI security research papers
Showing 770 papers total
November 10 - November 16, 2025
8 papers
Why does weak-OOD help? A Further Step Towards Understanding Jailbreaking VLMs
Yuxuan Zhou, Yuzhao Peng, Yang Bai, Kuofeng Gao, Yihao Zhang, Yechao Zhang, Xun Chen, Tao Yu, Tao Dai, Shu-Tao Xia
2025-11-11
red teaming
2511.08367v1
Alignment-Aware Quantization for LLM Safety
Sunghyun Wee, Suyoung Kim, Hyeonjin Kim, Kyomin Hwang, Nojun Kwak
2025-11-11
safety
2511.07842v1
JPRO: Automated Multimodal Jailbreaking via Multi-Agent Collaboration Framework
Yuxuan Zhou, Yang Bai, Kuofeng Gao, Tao Dai, Shu-Tao Xia
2025-11-10
red teaming
2511.07315v1
EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers
Yilin Jiang, Mingzi Zhang, Xuanyu Yin, Sheng Jin, Suyu Lu, Zuocan Ying, Zengyi Yu, Xiangjie Kong
2025-11-10
safety
2511.06890v1
Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment
Peng Zhang, Peijie Sun
2025-11-10
safety
2511.06852v2
Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment
Peng Zhang, Peijie Sun
2025-11-10
red teaming
safety
2511.06852v1
SAFENLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces
Ruiheng Liu, XiaoBing Chen, Jinyu Zhang, Qiongwen Zhang, Yu Zhang, Bailong Yang
2025-11-10
safety
2511.06778v2
SAFENLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces
Ruiheng Liu, XiaoBing Chen, Jinyu Zhang, Qiongwen Zhang, Yu Zhang, Bailong Yang
2025-11-10
safety
2511.06778v1
November 03 - November 09, 2025
16 papers
EASE: Practical and Efficient Safety Alignment for Small Language Models
Haonan Shi, Guoli Wang, Tu Ouyang, An Wang
2025-11-09
red teaming
2511.06512v1
KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs
Shuyuan Liu, Jiawei Chen, Xiao Yang, Hang Su, Zhaoxia Yin
2025-11-09
red teaming
2511.07480v1
Efficient LLM Safety Evaluation through Multi-Agent Debate
Dachuan Lin, Guobin Shen, Zihao Yang, Tianrong Liu, Dongcheng Zhao, Yi Zeng
2025-11-09
red teaming
safety
2511.06396v1
RelightMaster: Precise Video Relighting with Multi-plane Light Images
Weikang Bian, Xiaoyu Shi, Zhaoyang Huang, Jianhong Bai, Qinghe Wang, Xintao Wang, Pengfei Wan, Kun Gai, Hongsheng Li
2025-11-09
2511.06271v1
RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework
Seif Ikbarieh, Kshitiz Aryal, Maanak Gupta
2025-11-09
red teaming
2511.06212v1
Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci
2025-11-08
red teaming
2511.05919v2
Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci
2025-11-08
red teaming
2511.05919v1
MCP-RiskCue: Can LLM Infer Risk Information From MCP Server System Logs?
Jiayi Fu, Qiyao Sun
2025-11-08
2511.05867v2
When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins
Yigitcan Kaya, Anton Landerer, Stijn Pletinckx, Michelle Zimmermann, Christopher Kruegel, Giovanni Vigna
2025-11-08
red teaming
2511.05797v1
Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models
Teqi Hao, Xiaoyu Tan, Shaojie Shi, Yinghui Xu, Xihe Qiu
2025-11-07
2511.05286v1
Large Language Models for Cyber Security
Raunak Somani, Aswani Kumar Cherukuri
2025-11-06
2511.04508v1
AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research
Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann
2025-11-06
red teaming
2511.04316v1
Secure Code Generation at Scale with Reflexion
Arup Datta, Ahmed Aljohani, Hyunsook Do
2025-11-05
2511.03898v1
Whisper Leak: a side-channel attack on Large Language Models
Geoff McDonald, Jonathan Bar Or
2025-11-05
2511.03675v1
Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design - A2A, AP2, ERC-8004, and Beyond
Botao 'Amber' Hu, Helena Rong
2025-11-05
red teaming
2511.03434v1
Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs
Yize Liu, Yunyun Hou, Aina Sui
2025-11-05
red teaming
2511.03271v1