Philipp Zimmermann
← Back to Newsletter

Paper Library

Collection of AI Security research papers

Showing 1331 papers total

October 06 - October 12, 2025

24 papers

ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh
2025-10-11
red teaming
2510.10281v1

MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation

Wentian Zhu, Zhen Xiang, Wei Niu, Le Guan
2025-10-11
red teaming
2510.10271v1

MemPromptTSS: Persistent Prompt Memory for Iterative Multi-Granularity Time Series State Segmentation

Ching Chang, Ming-Chih Lo, Chiao-Tung Chan, Wen-Chih Peng, Tien-Fu Chen
2025-10-11
2510.09930v1

Learning Bug Context for PyTorch-to-JAX Translation with LLMs

Hung Phan, Son Le Vu, Ali Jannesari
2025-10-10
2510.09898v1

Text Prompt Injection of Vision Language Models

Ruizhe Zhu
2025-10-10
red teaming
2510.09849v1

A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages

Raoyuan Zhao, Yihong Liu, Hinrich Schütze, Michael A. Hedderich
2025-10-10
2510.09555v1

Multimodal Policy Internalization for Conversational Agents

Zhenhailong Wang, Jiateng Liu, Amin Fazel, Ritesh Sarkhel, Xing Fan, Xiang Li, Chenlei Guo, Heng Ji, Ruhi Sarikaya
2025-10-10
2510.09474v1

Getting Your Indices in a Row: Full-Text Search for LLM Training Data for Real World

Ines Altemir Marinas, Anastasiia Kucherenko, Alexander Sternfeld, Andrei Kucharavy
2025-10-10
2510.09471v1

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou, Caglar Gulcehre, Maksym Andriushchenko, Ameya Prabhu, Jonas Geiping
2025-10-10
red teaming
2510.09462v1

Stable Video Infinity: Infinite-Length Video Generation with Error Recycling

Wuyang Li, Wentao Pan, Po-Chien Luan, Yang Gao, Alexandre Alahi
2025-10-10
2510.09212v1

Exploiting Web Search Tools of AI Agents for Data Exfiltration

Dennis Rall, Bernhard Bauer, Mohit Mittal, Thomas Fraunholz
2025-10-10
red teaming
2510.09093v1

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections

Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V. Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Andreas Terzis, Florian Tramèr
2025-10-10
red teaming safety
2510.09023v1

SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management

Nan Lu, Yurong Hu, Jiaquan Fang, Yan Liu, Rui Dong, Yiming Wang, Rui Lin, Shaoyi Xu
2025-10-10
2510.08948v2

SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management

Nan Lu, Yurong Hu, Jiaquan Fang, Yan Liu, Rui Dong, Yiming Wang, Rui Lin, Shaoyi Xu
2025-10-10
governance
2510.08948v1

"I know it's not right, but that's what it said to do": Investigating Trust in AI Chatbots for Cybersecurity Policy

Brandon Lit, Edward Crowder, Daniel Vogel, Hassan Khan
2025-10-10
2510.08917v1

Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models

Ragib Amin Nihal, Rui Wen, Kazuhiro Nakadai, Jun Sakuma
2025-10-09
red teaming
2510.08859v1

CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization

Debeshee Das, Luca Beurer-Kellner, Marc Fischer, Maximilian Baader
2025-10-09
2510.08829v1

VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands

Aofan Liu, Lulu Tang
2025-10-09
red teaming
2510.09699v1

An Adaptive Multi Agent Bitcoin Trading System

Aadi Singhi
2025-10-09
2510.08068v2

An Adaptive Multi Agent Bitcoin Trading System

Aadi Singhi
2025-10-09
2510.08068v1

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Kazuki Egashira, Robin Staab, Thibaud Gloaguen, Mark Vero, Martin Vechev
2025-10-09
red teaming
2510.07985v2

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Kazuki Egashira, Robin Staab, Thibaud Gloaguen, Mark Vero, Martin Vechev
2025-10-09
red teaming
2510.07985v1

From Defender to Devil? Unintended Risk Interactions Induced by LLM Defenses

Xiangtao Meng, Tianshuo Cong, Li Wang, Wenyu Chen, Zheng Li, Shanqing Guo, Xiaoyun Wang
2025-10-09
safety
2510.07968v1

MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation

Weisen Jiang, Sinno Jialin Pan
2025-10-09
2510.07835v1