Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1169 papers total

December 08 - December 14, 2025

20 papers

ceLLMate: Sandboxing Browser AI Agents

Luoxi Meng, Henry Feng, Ilia Shumailov, Earlence Fernandes
2025-12-14
2512.12594v1

Detecting Prompt Injection Attacks Against Application Using Classifiers

Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid, Md. Abrar Faiaz Khan, Md. Omar Faruk, Yaseen Nur
2025-12-14
red teaming
2512.12583v1

Challenges of Evaluating LLM Safety for User Welfare

Manon Kempermann, Sai Suresh Macharla Vasu, Mahalakshmi Raveenthiran, Theo Farrell, Ingmar Weber
2025-12-11
safety
2512.10687v1

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Jahnvi Singh, Vinay Chamola, Yash Sinha, Murari Mandal, Dhruv Kumar
2025-12-11
red teaming
2512.10449v3

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Jahnvi Singh, Vinay Chamola, Yash Sinha, Murari Mandal, Dhruv Kumar
2025-12-11
red teaming
2512.10449v1

How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation

Devanshu Sahoo, Vasudev Majhi, Arjun Neekhra, Yash Sinha, Murari Mandal, Dhruv Kumar
2025-12-11
red teaming
2512.10415v1

Phishing Email Detection Using Large Language Models

Najmul Hasan, Prashanth BusiReddyGari, Haitao Zhao, Yihao Ren, Jinsheng Xu, Shaohu Zhang
2025-12-10
red teaming
2512.10104v2

LLM-PEA: Leveraging Large Language Models Against Phishing Email Attacks

Najmul Hassan, Prashanth BusiReddyGari, Haitao Zhao, Yihao Ren, Jinsheng Xu, Shaohu Zhang
2025-12-10
red teaming
2512.10104v1

CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance

Jinru Ding, Chao Ding, Wenrao Pang, Boyi Xiao, Zhiqiang Liu, Pengcheng Chen, Jiayuan Chen, Tiantian Yuan, Junming Guan, Yidong Jiang, Dawei Cheng, Jie Xu
2025-12-10
red teaming
2512.09506v1

Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs

Sohely Jahan, Ruimin Sun
2025-12-10
safety
2512.09403v1

ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data

Reachal Wang, Yuqi Jia, Neil Zhenqiang Gong
2025-12-10
red teaming
2512.09321v3

ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data

Ruiqi Wang, Yuqi Jia, Neil Zhenqiang Gong
2025-12-10
red teaming
2512.09321v1

Insured Agents: A Decentralized Trust Insurance Mechanism for Agentic Economy

Botao 'Amber' Hu, Bangdao Chen
2025-12-09
2512.08737v1

Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu
2025-12-09
red teaming
2512.08417v2

Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu
2025-12-09
red teaming
2512.08417v1

Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem

Shiva Gaire, Srijan Gyawali, Saroj Mishra, Suman Niroula, Dilip Thakur, Umesh Yadav
2025-12-09
red teaming
2512.08290v2

Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem

Shiva Gaire, Srijan Gyawali, Saroj Mishra, Suman Niroula, Dilip Thakur, Umesh Yadav
2025-12-09
red teaming
2512.08290v1

A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties

Jinghao Wang, Ping Zhang, Carter Yagemann
2025-12-09
red teaming
2512.08185v1

RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models

Xiqiao Xiong, Ouxiang Li, Zhuo Liu, Moxin Li, Wentao Shi, Fuli Feng, Xiangnan He
2025-12-08
red teaming
2512.07761v1

Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models

Fenghua Weng, Chaochao Lu, Xia Hu, Wenqi Shao, Wenjie Wang
2025-12-08
2512.07141v1

December 01 - December 07, 2025

4 papers