Paper Library

Reasoning Up the Instruction Ladder for Controllable Language Models

Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar

2025-10-30

red teaming

2511.04694v2

CATCH: A Modular Cross-domain Adaptive Template with Hook

Xinjin Li, Yulie Lu, Jinghan Cao, Yu Ma, Zhenglin Li, Yeyang Zhou

2025-10-30

2510.26582v1

Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token

Shaked Zychlinski, Yuval Kainan

2025-10-30

red teaming

2510.26847v1

Chain-of-Thought Hijacking

Jianli Zhao, Tingchen Fu, Rylan Schaeffer, Mrinank Sharma, Fazl Barez

2025-10-30

red teaming

2510.26418v1

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko

2025-10-30

red teaming

2510.26328v1

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models

Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang

2025-10-30

red teaming

2510.26096v1

RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline

André V. Duarte, Xuying li, Bin Zeng, Arlindo L. Oliveira, Lei Li, Zhuo Li

2025-10-29

red teaming

2510.25941v1

VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning

Baolu Li, Yiming Zhang, Qinghe Wang, Liqian Ma, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Zhenfei Yin, Yunzhi Zhuge, Huchuan Lu, Xu Jia

2025-10-29

2510.25772v1

Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models

Juan Ren, Mark Dras, Usman Naseem

2025-10-29

red teaming

2510.25179v1

Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection

Chanhyeong Yang, Taehoon Song, Jihwan Park, Hyunwoo J. Kim

2025-10-29

2510.25094v1

Compositional Image Synthesis with Inference-Time Scaling

Minsuk Ji, Sanghyeok Lee, Namhyuk Ahn

2025-10-28

2510.24133v1

Fortytwo: Swarm Inference with Peer-Ranked Consensus

Vladyslav Larin, Ihor Naumenko, Aleksei Ivashov, Ivan Nikitin, Alexander Firsov

2025-10-27

2510.24801v1

ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents

Zhenyu Zhang, Tianyi Chen, Weiran Xu, Alex Pentland, Jiaxin Pei

2025-10-27

2510.23822v1

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She

2025-10-27

red teaming

2510.23675v1

Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

Jing-Xuan Zhang, Genshun Wan, Jin Li, Jianqing Gao

2025-10-27

2510.22961v1

FAME: Fairness-aware Attention-modulated Video Editing

Zhangkai Wu, Xuhui Fan, Zhongyuan Xie, Kaize Shi, Zhidong Li, Longbing Cao

2025-10-27

2510.22960v1

October 27 - November 02, 2025

Reasoning Up the Instruction Ladder for Controllable Language Models

CATCH: A Modular Cross-domain Adaptive Template with Hook

Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token

Chain-of-Thought Hijacking

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models

RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline

VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning

Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models

Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection

Compositional Image Synthesis with Inference-Time Scaling

Fortytwo: Swarm Inference with Peer-Ranked Consensus

ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

FAME: Fairness-aware Attention-modulated Video Editing

October 20 - October 26, 2025

Sentra-Guard: A Multilingual Human-AI Framework for Real-Time Defense Against Adversarial LLM Jailbreaks

Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models

Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models

Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks

Characterizing Low-Latency Sky Localization in Multi-Detector Gravitational-Wave Networks

Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks

FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models

When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails