Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1172 papers total

November 03 - November 09, 2025

24 papers

EASE: Practical and Efficient Safety Alignment for Small Language Models

Haonan Shi, Guoli Wang, Tu Ouyang, An Wang
2025-11-09
red teaming
2511.06512v1

KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs

Shuyuan Liu, Jiawei Chen, Xiao Yang, Hang Su, Zhaoxia Yin
2025-11-09
red teaming
2511.07480v1

Efficient LLM Safety Evaluation through Multi-Agent Debate

Dachuan Lin, Guobin Shen, Zihao Yang, Tianrong Liu, Dongcheng Zhao, Yi Zeng
2025-11-09
red teaming, safety
2511.06396v1

RelightMaster: Precise Video Relighting with Multi-plane Light Images

Weikang Bian, Xiaoyu Shi, Zhaoyang Huang, Jianhong Bai, Qinghe Wang, Xintao Wang, Pengfei Wan, Kun Gai, Hongsheng Li
2025-11-09
2511.06271v1

RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

Seif Ikbarieh, Kshitiz Aryal, Maanak Gupta
2025-11-09
red teaming
2511.06212v1

Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci
2025-11-08
red teaming
2511.05919v2

Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci
2025-11-08
red teaming
2511.05919v1

Can LLM Infer Risk Information From MCP Server System Logs?

Jiayi Fu, Yuansen Zhang, Yinggui Wang
2025-11-08
2511.05867v3

MCP-RiskCue: Can LLM Infer Risk Information From MCP Server System Logs?

Jiayi Fu, Qiyao Sun
2025-11-08
2511.05867v2

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

Yigitcan Kaya, Anton Landerer, Stijn Pletinckx, Michelle Zimmermann, Christopher Kruegel, Giovanni Vigna
2025-11-08
red teaming
2511.05797v1

Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models

Teqi Hao, Xiaoyu Tan, Shaojie Shi, Yinghui Xu, Xihe Qiu
2025-11-07
2511.05286v1

Large Language Models for Cyber Security

Raunak Somani, Aswani Kumar Cherukuri
2025-11-06
2511.04508v1

AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research

Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann
2025-11-06
red teaming
2511.04316v1

Secure Code Generation at Scale with Reflexion

Arup Datta, Ahmed Aljohani, Hyunsook Do
2025-11-05
2511.03898v1

Whisper Leak: a side-channel attack on Large Language Models

Geoff McDonald, Jonathan Bar Or
2025-11-05
2511.03675v1

Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond

Botao 'Amber' Hu, Helena Rong
2025-11-05
red teaming
2511.03434v1

Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs

Yize Liu, Yunyun Hou, Aina Sui
2025-11-05
red teaming
2511.03271v1

Death by a Thousand Prompts: Open Model Vulnerability Analysis

Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, Adam Swanda
2025-11-05
red teaming
2511.03247v1

Jailbreaking in the Haystack

Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena, Ziqian Zhong, Alexander Robey, Aditi Raghunathan
2025-11-05
red teaming
2511.04707v1

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

Aashray Reddy, Andrew Zagula, Nicholas Saban, Kevin Zhu
2025-11-04
red teaming
2511.02376v2

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

Aashray Reddy, Andrew Zagula, Nicholas Saban
2025-11-04
red teaming
2511.02376v1

LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context

Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang
2025-11-04
safety
2511.02366v1

An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks

Xu Liu, Yan Chen, Kan Ling, Yichi Zhu, Hengrun Zhang, Guisheng Fan, Huiqun Yu
2025-11-04
red teaming
2511.02356v1

LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models

Ahmad Tahmasivand, Noureldin Zahran, Saba Al-Sayouri, Mohammed Fouda, Khaled N. Khasawneh
2025-11-03
2511.02866v1