Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 947 papers total

December 08 - December 14, 2025

16 papers

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Jahnvi Singh, Vinay Chamola, Yash Sinha, Murari Mandal, Dhruv Kumar
2025-12-11
red teaming
2512.10449v1

How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation

Devanshu Sahoo, Vasudev Majhi, Arjun Neekhra, Yash Sinha, Murari Mandal, Dhruv Kumar
2025-12-11
red teaming
2512.10415v1

Phishing Email Detection Using Large Language Models

Najmul Hasan, Prashanth BusiReddyGari, Haitao Zhao, Yihao Ren, Jinsheng Xu, Shaohu Zhang
2025-12-10
red teaming
2512.10104v2

LLM-PEA: Leveraging Large Language Models Against Phishing Email Attacks

Najmul Hassan, Prashanth BusiReddyGari, Haitao Zhao, Yihao Ren, Jinsheng Xu, Shaohu Zhang
2025-12-10
red teaming
2512.10104v1

CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance

Jinru Ding, Chao Ding, Wenrao Pang, Boyi Xiao, Zhiqiang Liu, Pengcheng Chen, Jiayuan Chen, Tiantian Yuan, Junming Guan, Yidong Jiang, Dawei Cheng, Jie Xu
2025-12-10
red teaming
2512.09506v1

Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs

Sohely Jahan, Ruimin Sun
2025-12-10
safety
2512.09403v1

ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data

Reachal Wang, Yuqi Jia, Neil Zhenqiang Gong
2025-12-10
red teaming
2512.09321v3

ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data

Ruiqi Wang, Yuqi Jia, Neil Zhenqiang Gong
2025-12-10
red teaming
2512.09321v1

Insured Agents: A Decentralized Trust Insurance Mechanism for Agentic Economy

Botao 'Amber' Hu, Bangdao Chen
2025-12-09
2512.08737v1

Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu
2025-12-09
red teaming
2512.08417v2

Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu
2025-12-09
red teaming
2512.08417v1

Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem

Shiva Gaire, Srijan Gyawali, Saroj Mishra, Suman Niroula, Dilip Thakur, Umesh Yadav
2025-12-09
red teaming
2512.08290v2

Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem

Shiva Gaire, Srijan Gyawali, Saroj Mishra, Suman Niroula, Dilip Thakur, Umesh Yadav
2025-12-09
red teaming
2512.08290v1

A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties

Jinghao Wang, Ping Zhang, Carter Yagemann
2025-12-09
red teaming
2512.08185v1

RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models

Xiqiao Xiong, Ouxiang Li, Zhuo Liu, Moxin Li, Wentao Shi, Fuli Feng, Xiangnan He
2025-12-08
red teaming
2512.07761v1

Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models

Fenghua Weng, Chaochao Lu, Xia Hu, Wenqi Shao, Wenjie Wang
2025-12-08
2512.07141v1

December 01 - December 07, 2025

8 papers

Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents

Zhibo Liang, Tianze Hu, Zaiye Chen, Mingjie Tang
2025-12-07
red teaming
2512.06716v1

RunawayEvil: Jailbreaking the Image-to-Video Generative Models

Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, Caifeng Shan
2025-12-07
red teaming
2512.06674v1

GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering

Jehyeok Yeon, Federico Cinus, Yifan Wu, Luca Luceri
2025-12-07
red teaming, safety
2512.06655v1

OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Xiaojun Jia, Jie Liao, Qi Guo, Teng Ma, Simeng Qin, Ranjie Duan, Tianlin Li, Yihao Huang, Zhitao Zeng, Dongxian Wu, Yiming Li, Wenqi Ren, Xiaochun Cao, Yang Liu
2025-12-06
red teaming
2512.06589v1

Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks

Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel, Negar Shahabi, Foutse Khomh, Naser Ezzati-Jivan
2025-12-06
red teaming
2512.06556v1

Metaphor-based Jailbreaking Attacks on Text-to-Image Models

Chenyu Zhang, Yiwen Ma, Lanjun Wang, Wenhui Li, Yi Tu, An-An Liu
2025-12-06
2512.10766v1

Beyond Model Jailbreak: Systematic Dissection of the "Ten Deadly Sins" in Embodied Intelligence

Yuhang Huang, Junchao Li, Boyang Ma, Xuelong Dai, Minghui Xu, Kaidi Xu, Yue Zhang, Jianping Wang, Xiuzhen Cheng
2025-12-06
2512.06387v1

Automated Data Enrichment using Confidence-Aware Fine-Grained Debate among Open-Source LLMs for Mental Health and Online Safety

Junyu Mao, Anthony Hills, Talia Tseriotou, Maria Liakata, Aya Shamir, Dan Sayda, Dana Atzil-Slonim, Natalie Djohari, Arpan Mandal, Silke Roth, Pamela Ugwudike, Mahesan Niranjan, Stuart E. Middleton
2025-12-06
safety
2512.06227v1