Philipp Zimmermann

Paper Library

A collection of AI security research papers

Showing 1172 papers total

November 24 - November 30, 2025

2 papers

November 17 - November 23, 2025

19 papers

Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification

Yanxi Li, Ruocheng Shan
2025-11-23
red teaming
2511.21752v1

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization

Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia
2025-11-23
red teaming
2511.18581v2

Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems

Xiaoqing Wang, Keman Huang, Bin Liang, Hongyu Li, Xiaoyong Du
2025-11-23
safety
2511.18467v1

Z-Space: A Multi-Agent Tool Orchestration Framework for Enterprise-Grade LLM Automation

Qingsong He, Jing Nan, Jiayu Jiao, Liangjie Tang, Xiaodong Xu, Mengmeng Sun, Qingyao Wang, Minghui Yan
2025-11-23
2511.19483v1

Curvature-Aware Safety Restoration In LLMs Fine-Tuning

Thong Bach, Thanh Nguyen-Tang, Dung Nguyen, Thao Minh Le, Truyen Tran
2025-11-22
safety
2511.18039v1

Building Browser Agents: Architecture, Security, and Practical Solutions

Aram Vardanyan
2025-11-22
2511.19477v1

Beyond Jailbreak: Unveiling Risks in LLM Applications Arising from Blurred Capability Boundaries

Yunyi Zhang, Shibo Cui, Baojun Liu, Jingkai Yu, Min Zhang, Fan Shi, Han Zheng
2025-11-22
2511.17874v1

Evaluating Adversarial Vulnerabilities in Modern Large Language Models

Tom Perel
2025-11-21
red teaming
2511.17666v1

The Shawshank Redemption of Embodied AI: Understanding and Benchmarking Indirect Environmental Jailbreaks

Chunyang Li, Zifeng Kang, Junwei Zhang, Zhuo Ma, Anda Cheng, Xinghua Li, Jianfeng Ma
2025-11-20
red teaming
2511.16347v1

"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios

Zhen Sun, Zongmin Zhang, Deqi Liang, Han Sun, Yule Liu, Yun Shen, Xiangshan Gao, Yilong Yang, Shuai Liu, Yutao Yue, Xinlei He
2025-11-20
red teaming
2511.16278v1

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security

Wei Zhao, Zhe Li, Yige Li, Jun Sun
2025-11-20
red teaming
2511.16229v1

An Image Is Worth Ten Thousand Words: Verbose-Text Induction Attacks on VLMs

Zhi Luo, Zenghui Yuan, Wenqi Wei, Daizong Liu, Pan Zhou
2025-11-20
red teaming
2511.16163v1

D4C: Data-free Quantization for Contrastive Language-Image Pre-training Models

Wenlun Zhang, Yunshan Zhong, Zihao Ding, Xinyu Li, Kentaro Yoshioka
2025-11-19
2511.15411v1

CrossCheck-Bench: Diagnosing Compositional Failures in Multimodal Conflict Resolution

Baoliang Tian, Yuxuan Si, Jilong Wang, Lingyao Li, Zhongyuan Bao, Zineng Zhou, Tao Wang, Sixu Li, Ziyao Xu, Mingze Wang, Zhouzhuo Zhang, Zhihao Wang, Yike Yun, Ke Tian, Ning Yang, Minghui Qiu
2025-11-19
2511.21717v1

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Piercosma Bisconti, Matteo Prandi, Federico Pierucci, Francesco Giarrusso, Marcantonio Bracale, Marcello Galisai, Vincenzo Suriani, Olga Sorokoletova, Federico Sartore, Daniele Nardi
2025-11-19
red teaming
2511.15304v2

Securing AI Agents Against Prompt Injection Attacks

Badrinath Ramakrishnan, Akshaya Balaji
2025-11-19
red teaming
2511.15759v1

Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks

Zimo Ji, Xunguang Wang, Zongjie Li, Pingchuan Ma, Yudong Gao, Daoyuan Wu, Xincheng Yan, Tian Tian, Shuai Wang
2025-11-19
red teaming, safety
2511.15203v1

SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models

Xin Gao, Shaohan Yu, Zerui Chen, Yueming Lyu, Weichen Yu, Guanghao Li, Jiyao Liu, Jianxiong Gao, Jian Liang, Ziwei Liu, Chenyang Si
2025-11-19
2511.15169v2

Unified Defense for Large Language Models against Jailbreak and Fine-Tuning Attacks in Education

Xin Yi, Yue Li, Dongsheng Shi, Linlin Wang, Xiaoling Wang, Liang He
2025-11-18
2511.14423v1