Paper Library
A collection of AI security research papers
Showing 1169 papers total
December 08 - December 14, 2025
15 papers
ceLLMate: Sandboxing Browser AI Agents
Luoxi Meng, Henry Feng, Ilia Shumailov, Earlence Fernandes
2025-12-14
2512.12594v1
Detecting Prompt Injection Attacks Against Application Using Classifiers
Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid, Md. Abrar Faiaz Khan, Md. Omar Faruk, Yaseen Nur
2025-12-14
red teaming
2512.12583v1
Challenges of Evaluating LLM Safety for User Welfare
Manon Kempermann, Sai Suresh Macharla Vasu, Mahalakshmi Raveenthiran, Theo Farrell, Ingmar Weber
2025-12-11
safety
2512.10687v1
When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Jahnvi Singh, Vinay Chamola, Yash Sinha, Murari Mandal, Dhruv Kumar
2025-12-11
red teaming
2512.10449v3
How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation
Devanshu Sahoo, Vasudev Majhi, Arjun Neekhra, Yash Sinha, Murari Mandal, Dhruv Kumar
2025-12-11
red teaming
2512.10415v1
Phishing Email Detection Using Large Language Models
Najmul Hasan, Prashanth BusiReddyGari, Haitao Zhao, Yihao Ren, Jinsheng Xu, Shaohu Zhang
2025-12-10
red teaming
2512.10104v2
CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance
Jinru Ding, Chao Ding, Wenrao Pang, Boyi Xiao, Zhiqiang Liu, Pengcheng Chen, Jiayuan Chen, Tiantian Yuan, Junming Guan, Yidong Jiang, Dawei Cheng, Jie Xu
2025-12-10
red teaming
2512.09506v1
Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs
Sohely Jahan, Ruimin Sun
2025-12-10
safety
2512.09403v1
ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data
Reachal Wang, Yuqi Jia, Neil Zhenqiang Gong
2025-12-10
red teaming
2512.09321v3
Insured Agents: A Decentralized Trust Insurance Mechanism for Agentic Economy
Botao 'Amber' Hu, Bangdao Chen
2025-12-09
2512.08737v1
Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs
Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu
2025-12-09
red teaming
2512.08417v2
Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem
Shiva Gaire, Srijan Gyawali, Saroj Mishra, Suman Niroula, Dilip Thakur, Umesh Yadav
2025-12-09
red teaming
2512.08290v2
A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties
Jinghao Wang, Ping Zhang, Carter Yagemann
2025-12-09
red teaming
2512.08185v1
RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models
Xiqiao Xiong, Ouxiang Li, Zhuo Liu, Moxin Li, Wentao Shi, Fuli Feng, Xiangnan He
2025-12-08
red teaming
2512.07761v1
Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models
Fenghua Weng, Chaochao Lu, Xia Hu, Wenqi Shao, Wenjie Wang
2025-12-08
2512.07141v1
December 01 - December 07, 2025
3 papers
SoK: Trust-Authorization Mismatch in LLM Agent Interactions
Guanquan Shi, Haohua Du, Zhiqiang Wang, Xiaoyu Liang, Weiwenpei Liu, Song Bian, Zhenyu Guan
2025-12-07
red teaming
2512.06914v2
Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents
Zhibo Liang, Tianze Hu, Zaiye Chen, Mingjie Tang
2025-12-07
red teaming
2512.06716v2
RunawayEvil: Jailbreaking the Image-to-Video Generative Models
Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, Caifeng Shan
2025-12-07
red teaming
2512.06674v1