Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1172 papers total

October 13 - October 19, 2025

9 papers

October 06 - October 12, 2025

15 papers

ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wang, Kuo-Hui Yeh
2025-10-11
red teaming
2510.10281v1

MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation

Wentian Zhu, Zhen Xiang, Wei Niu, Le Guan
2025-10-11
red teaming
2510.10271v1

MemPromptTSS: Persistent Prompt Memory for Iterative Multi-Granularity Time Series State Segmentation

Ching Chang, Ming-Chih Lo, Chiao-Tung Chan, Wen-Chih Peng, Tien-Fu Chen
2025-10-11
2510.09930v1

Learning Bug Context for PyTorch-to-JAX Translation with LLMs

Hung Phan, Son Le Vu, Ali Jannesari
2025-10-10
2510.09898v1

Text Prompt Injection of Vision Language Models

Ruizhe Zhu
2025-10-10
red teaming
2510.09849v1

A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages

Raoyuan Zhao, Yihong Liu, Hinrich Schütze, Michael A. Hedderich
2025-10-10
2510.09555v1

Multimodal Policy Internalization for Conversational Agents

Zhenhailong Wang, Jiateng Liu, Amin Fazel, Ritesh Sarkhel, Xing Fan, Xiang Li, Chenlei Guo, Heng Ji, Ruhi Sarikaya
2025-10-10
2510.09474v1

Getting Your Indices in a Row: Full-Text Search for LLM Training Data for Real World

Ines Altemir Marinas, Anastasiia Kucherenko, Alexander Sternfeld, Andrei Kucharavy
2025-10-10
2510.09471v1

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou, Caglar Gulcehre, Maksym Andriushchenko, Ameya Prabhu, Jonas Geiping
2025-10-10
red teaming
2510.09462v1

Stable Video Infinity: Infinite-Length Video Generation with Error Recycling

Wuyang Li, Wentao Pan, Po-Chien Luan, Yang Gao, Alexandre Alahi
2025-10-10
2510.09212v1

Exploiting Web Search Tools of AI Agents for Data Exfiltration

Dennis Rall, Bernhard Bauer, Mohit Mittal, Thomas Fraunholz
2025-10-10
red teaming
2510.09093v1

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections

Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V. Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Andreas Terzis, Florian Tramèr
2025-10-10
red teaming, safety
2510.09023v1

SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management

Nan Lu, Yurong Hu, Jiaquan Fang, Yan Liu, Rui Dong, Yiming Wang, Rui Lin, Shaoyi Xu
2025-10-10
2510.08948v2

SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management

Nan Lu, Yurong Hu, Jiaquan Fang, Yan Liu, Rui Dong, Yiming Wang, Rui Lin, Shaoyi Xu
2025-10-10
governance
2510.08948v1

"I know it's not right, but that's what it said to do": Investigating Trust in AI Chatbots for Cybersecurity Policy

Brandon Lit, Edward Crowder, Daniel Vogel, Hassan Khan
2025-10-10
2510.08917v1