Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1172 papers total

November 03 - November 09, 2025

24 papers

EASE: Practical and Efficient Safety Alignment for Small Language Models

Haonan Shi, Guoli Wang, Tu Ouyang, An Wang
2025-11-09
red teaming
2511.06512v1

KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs

Shuyuan Liu, Jiawei Chen, Xiao Yang, Hang Su, Zhaoxia Yin
2025-11-09
red teaming
2511.07480v1

Efficient LLM Safety Evaluation through Multi-Agent Debate

Dachuan Lin, Guobin Shen, Zihao Yang, Tianrong Liu, Dongcheng Zhao, Yi Zeng
2025-11-09
red teaming, safety
2511.06396v1

RelightMaster: Precise Video Relighting with Multi-plane Light Images

Weikang Bian, Xiaoyu Shi, Zhaoyang Huang, Jianhong Bai, Qinghe Wang, Xintao Wang, Pengfei Wan, Kun Gai, Hongsheng Li
2025-11-09
2511.06271v1

RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

Seif Ikbarieh, Kshitiz Aryal, Maanak Gupta
2025-11-09
red teaming
2511.06212v1

Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci
2025-11-08
red teaming
2511.05919v2

Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci
2025-11-08
red teaming
2511.05919v1

Can LLM Infer Risk Information From MCP Server System Logs?

Jiayi Fu, Yuansen Zhang, Yinggui Wang
2025-11-08
2511.05867v3

MCP-RiskCue: Can LLM Infer Risk Information From MCP Server System Logs?

Jiayi Fu, Qiyao Sun
2025-11-08
2511.05867v2

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

Yigitcan Kaya, Anton Landerer, Stijn Pletinckx, Michelle Zimmermann, Christopher Kruegel, Giovanni Vigna
2025-11-08
red teaming
2511.05797v1

Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models

Teqi Hao, Xiaoyu Tan, Shaojie Shi, Yinghui Xu, Xihe Qiu
2025-11-07
2511.05286v1

Large Language Models for Cyber Security

Raunak Somani, Aswani Kumar Cherukuri
2025-11-06
2511.04508v1

AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research

Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann
2025-11-06
red teaming
2511.04316v1

Secure Code Generation at Scale with Reflexion

Arup Datta, Ahmed Aljohani, Hyunsook Do
2025-11-05
2511.03898v1

Whisper Leak: a side-channel attack on Large Language Models

Geoff McDonald, Jonathan Bar Or
2025-11-05
2511.03675v1

Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond

Botao 'Amber' Hu, Helena Rong
2025-11-05
red teaming
2511.03434v1

Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs

Yize Liu, Yunyun Hou, Aina Sui
2025-11-05
red teaming
2511.03271v1

Death by a Thousand Prompts: Open Model Vulnerability Analysis

Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, Adam Swanda
2025-11-05
red teaming
2511.03247v1

Jailbreaking in the Haystack

Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena, Ziqian Zhong, Alexander Robey, Aditi Raghunathan
2025-11-05
red teaming
2511.04707v1

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

Aashray Reddy, Andrew Zagula, Nicholas Saban, Kevin Zhu
2025-11-04
red teaming
2511.02376v2

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

Aashray Reddy, Andrew Zagula, Nicholas Saban
2025-11-04
red teaming
2511.02376v1

LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context

Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang
2025-11-04
safety
2511.02366v1

An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks

Xu Liu, Yan Chen, Kan Ling, Yichi Zhu, Hengrun Zhang, Guisheng Fan, Huiqun Yu
2025-11-04
red teaming
2511.02356v1

LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models

Ahmad Tahmasivand, Noureldin Zahran, Saba Al-Sayouri, Mohammed Fouda, Khaled N. Khasawneh
2025-11-03
2511.02866v1