Philipp Zimmermann

Paper Library

A collection of AI security research papers

Showing 770 papers total

June 09 - June 15, 2025

10 papers

AdversariaL attacK sAfety aLIgnment (ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement - Introducing Adversarial Vulnerability Quality Index (AVQI)

Danush Khanna, Krishna Kumar, Basab Ghosh, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das
2025-06-10
red teaming, safety
2506.08885v1

Design Patterns for Securing LLM Agents against Prompt Injections

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Crețu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn
2025-06-10
2506.08837v3

Empirical Evaluation of the Security and Alignment of ChatGPT and Gemini: A Comparative Analysis of Vulnerabilities through Jailbreak Experiments (in French)

Rafaël Nouailles
2025-06-10
red teaming
2506.10029v1

AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

Shuo Yang, Qihui Zhang, Yuyang Liu, Yue Huang, Xiaojun Jia, Kunpeng Ning, Jiayu Yao, Jigang Wang, Hailiang Dai, Yibing Song, Li Yuan
2025-06-10
safety
2506.08473v2

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

Jingnan Zheng, Xiangtian Ji, Yijun Lu, Chenhang Cui, Weixiang Zhao, Gelei Deng, Zhenkai Liang, An Zhang, Tat-Seng Chua
2025-06-09
red teaming
2506.07736v2

LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

Haoyang Li, Huan Gao, Zhiyuan Zhao, Zhiyu Lin, Junyu Gao, Xuelong Li
2025-06-09
red teaming
2506.10022v1

Evaluating LLMs Robustness in Less Resourced Languages with Proxy Models

Maciej Chrabąszcz, Katarzyna Lorenc, Karolina Seweryn
2025-06-09
red teaming
2506.07645v1

TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts

Torsten Krauß, Hamid Dashtbani, Alexandra Dmitrienko
2025-06-09
red teaming
2506.07596v1

When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment

Yuxin Xiao, Sana Tonekaboni, Walter Gerych, Vinith Suriyakumar, Marzyeh Ghassemi
2025-06-09
red teaming
2506.07452v1

Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures

Yukai Zhou, Sibei Yang, Wenjie Wang
2025-06-09
red teaming
2506.07402v1

June 02 - June 08, 2025

10 papers

Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations

Zhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani
2025-06-08
2506.09067v1

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint

Leheng Sheng, Changshuo Shen, Weixiang Zhao, Junfeng Fang, Xiaohao Liu, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua
2025-06-08
red teaming
2506.07022v1

Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger
2025-06-08
red teaming
2506.06975v3

From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment

Kyubyung Chae, Hyunbin Jin, Taesup Kim
2025-06-07
red teaming
2506.10020v1

Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Ruizhong Qiu, Gaotang Li, Tianxin Wei, Jingrui He, Hanghang Tong
2025-06-06
safety
2506.06444v1

Small Models, Big Support: A Local LLM Framework for Teacher-Centric Content Creation and Assessment using RAG and CAG

Zarreen Reza, Alexander Mazur, Michael T. Dugdale, Robin Ray-Chaudhuri
2025-06-06
2506.05925v1

To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt

Zhilong Wang, Neha Nagaraja, Lan Zhang, Hayretdin Bahsi, Pawan Patil, Peng Liu
2025-06-06
red teaming
2506.05739v1

Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets

Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang
2025-06-05
red teaming, safety
2506.05346v1

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety

Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau
2025-06-05
safety
2506.05451v1

Sentinel: SOTA model to protect against prompt injections

Dror Ivry, Oran Nahum
2025-06-05
2506.05446v1