Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1331 papers total

October 13 - October 19, 2025

24 papers

Bits Leaked per Query: Information-Theoretic Bounds on Adversarial Attacks against LLMs

Masahiro Kaneko, Timothy Baldwin
2025-10-19
red teaming
2510.17000v1

BreakFun: Jailbreaking LLMs via Schema Exploitation

Amirkia Rafiei Oskooei, Mehmet S. Aktas
2025-10-19
red teaming
2510.17904v1

Black-box Optimization of LLM Outputs by Asking for Directions

Jie Zhang, Meng Ding, Yang Liu, Jue Hong, Florian Tramèr
2025-10-19
red teaming
2510.16794v1

Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety

Vamshi Krishna Bonagiri, Ponnurangam Kumaraguru, Khanh Nguyen, Benjamin Plaut
2025-10-18
safety
2510.16492v1

VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion

Jaekyun Park, Hye Won Chung
2025-10-18
2510.16446v1

ATA: A Neuro-Symbolic Approach to Implement Autonomous and Trustworthy Agents

David Peer, Sebastian Stabinger
2025-10-18
2510.16381v1

TokenAR: Multiple Subject Generation via Autoregressive Token-level Enhancement

Haiyue Sun, Qingdong He, Jinlong Peng, Peng Tang, Jiangning Zhang, Junwei Zhu, Xiaobin Hu, Shuicheng Yan
2025-10-18
2510.16332v1

Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense

Zhehao Zhang, Weijie Xu, Shixian Cui, Chandan K. Reddy
2025-10-17
red teaming
2510.16259v1

Prompt injections as a tool for preserving identity in GAI image descriptions

Kate Glazko, Jennifer Mankoff
2025-10-17
2510.16128v1

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models

Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong
2025-10-17
red teaming
2510.15476v2

Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models

Shuang Liang, Zhihao Xu, Jialing Tao, Hui Xue, Xiting Wang
2025-10-17
red teaming
2510.15430v2

Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling

Deyue Zhang, Dongdong Yang, Junjie Mu, Quancheng Zou, Zonghao Ying, Wenzhuo Xu, Zhao Liu, Xuan Wang, Xiangzheng Zhang
2025-10-16
red teaming
2510.15068v1

Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks

ChenYu Wu, Yi Wang, Yang Liao
2025-10-16
red teaming
2510.15017v1

Shot2Tactic-Caption: Multi-Scale Captioning of Badminton Videos for Tactical Understanding

Ning Ding, Keisuke Fujii, Toru Tamaki
2025-10-16
2510.14617v1

Assessing Socio-Cultural Alignment and Technical Safety of Sovereign LLMs

Kyubyung Chae, Gihoon Kim, Gyuseong Lee, Taesup Kim, Jaejin Lee, Heejin Kim
2025-10-16
safety
2510.14565v1

Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers

Andrew Zhao, Reshmi Ghosh, Vitor Carvalho, Emily Lawton, Keegan Hines, Gao Huang, Jack W. Stokes
2025-10-16
red teaming
2510.14381v1

Towards Agentic Self-Learning LLMs in Search Environment

Wangtao Sun, Xiang Cheng, Jialin Fan, Yao Xu, Xing Yu, Shizhu He, Jun Zhao, Kang Liu
2025-10-16
2510.14253v2

Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks

Trilok Padhi, Pinxian Lu, Abdulkadir Erol, Tanmay Sutar, Gauri Sharma, Mina Sonmez, Munmun De Choudhury, Ugur Kursuncu
2025-10-16
red teaming
2510.14207v2

PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features

Wei Zou, Yupei Liu, Yanting Wang, Ying Chen, Neil Gong, Jinyuan Jia
2025-10-15
red teaming
2510.14005v2

Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions

Siying Liu, Shisheng Zhang, Indu Bala
2025-10-15
safety
2510.13931v1

In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers

Avihay Cohen
2025-10-15
red teaming
2510.13543v1