Philipp Zimmermann
Paper Library

Collection of AI Security research papers

Showing 1331 papers total

November 03 - November 09, 2025

20 papers

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

Yigitcan Kaya, Anton Landerer, Stijn Pletinckx, Michelle Zimmermann, Christopher Kruegel, Giovanni Vigna
2025-11-08
red teaming
2511.05797v1

Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models

Teqi Hao, Xiaoyu Tan, Shaojie Shi, Yinghui Xu, Xihe Qiu
2025-11-07
2511.05286v1

Large Language Models for Cyber Security

Raunak Somani, Aswani Kumar Cherukuri
2025-11-06
2511.04508v1

AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research

Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann
2025-11-06
red teaming
2511.04316v1

Secure Code Generation at Scale with Reflexion

Arup Datta, Ahmed Aljohani, Hyunsook Do
2025-11-05
2511.03898v1

Whisper Leak: a side-channel attack on Large Language Models

Geoff McDonald, Jonathan Bar Or
2025-11-05
2511.03675v1

Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond

Botao 'Amber' Hu, Helena Rong
2025-11-05
red teaming
2511.03434v1

Let the Bees Find the Weak Spots: A Path Planning Perspective on Multi-Turn Jailbreak Attacks against LLMs

Yize Liu, Yunyun Hou, Aina Sui
2025-11-05
red teaming
2511.03271v1

Death by a Thousand Prompts: Open Model Vulnerability Analysis

Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, Adam Swanda
2025-11-05
red teaming
2511.03247v1

Jailbreaking in the Haystack

Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena, Ziqian Zhong, Alexander Robey, Aditi Raghunathan
2025-11-05
red teaming
2511.04707v1

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

Aashray Reddy, Andrew Zagula, Nicholas Saban, Kevin Zhu
2025-11-04
red teaming
2511.02376v2

LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context

Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang
2025-11-04
safety
2511.02366v1

An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks

Xu Liu, Yan Chen, Kan Ling, Yichi Zhu, Hengrun Zhang, Guisheng Fan, Huiqun Yu
2025-11-04
red teaming
2511.02356v1

LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models

Ahmad Tahmasivand, Noureldin Zahran, Saba Al-Sayouri, Mohammed Fouda, Khaled N. Khasawneh
2025-11-03
2511.02866v1

Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models

Daniyal Ganiuly, Assel Smaiyl
2025-11-03
red teaming
2511.01634v2

Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models

Xiaoyu Zhan, Wenxuan Huang, Hao Sun, Xinyu Fu, Changfeng Ma, Shaosheng Cao, Bohan Jia, Shaohui Lin, Zhenfei Yin, Lei Bai, Wanli Ouyang, Yuanqi Li, Jie Guo, Yanwen Guo
2025-11-03
2511.01618v1

Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges

Hamin Koo, Minseon Kim, Jaehyung Kim
2025-11-03
red teaming
2511.01375v1

MIQ-SAM3D: From Single-Point Prompt to Multi-Instance Segmentation via Competitive Query Refinement

Jierui Qu, Jianchun Zhao
2025-11-03
2511.01345v1

"Give a Positive Review Only": An Early Investigation Into In-Paper Prompt Injection Attacks and Defenses for AI Reviewers

Qin Zhou, Zhexin Zhang, Zhi Li, Limin Sun
2025-11-03
red teaming
2511.01287v1

October 27 - November 02, 2025

4 papers