Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1169 papers total

January 05 - January 11, 2026

24 papers

How Secure is Secure Code Generation? Adversarial Prompts Put LLM Defenses to the Test

Melissa Tessa, Iyiola E. Olatunji, Aicha War, Jacques Klein, Tegawendé F. Bissyandé
2026-01-11
safety
2601.07084v1

Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems

Hongyan Chang, Ergute Bao, Xinjian Luo, Ting Yu
2026-01-11
red teaming
2601.07072v1

Paraphrasing Adversarial Attack on LLM-as-a-Reviewer

Masahiro Kaneko
2026-01-11
red teaming
2601.06884v1

VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit

Junda Lin, Zhaomeng Zhou, Zhi Zheng, Shuochen Liu, Tong Xu, Yong Chen, Enhong Chen
2026-01-09
red teaming
2601.05755v2

VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit

Junda Lin, Zhaomeng Zhou, Zhi Zheng, Shuochen Liu, Tong Xu, Yong Chen, Enhong Chen
2026-01-09
red teaming
2601.05755v1

The Echo Chamber Multi-Turn LLM Jailbreak

Ahmad Alobaid, Martí Jordà Roca, Carlos Castillo, Joan Vendrell
2026-01-09
red teaming
2601.05742v1

PII-VisBench: Evaluating Personally Identifiable Information Safety in Vision Language Models Along a Continuum of Visibility

G M Shahariar, Zabir Al Nazi, Md Olid Hasan Bhuiyan, Zhouxing Shi
2026-01-09
2601.05739v1

Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making

Jua Han, Jaeyoon Seo, Jungbin Min, Jean Oh, Jihie Kim
2026-01-09
safety
2601.05529v2

Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making

Jua Han, Jaeyoon Seo, Jungbin Min, Jean Oh, Jihie Kim
2026-01-09
safety
2601.05529v1

Memory Poisoning Attack and Defense on Memory Based LLM-Agents

Balachandra Devarangadi Sunil, Isheeta Sinha, Piyush Maheshwari, Shantanu Todmal, Shreyan Mallik, Shuchi Mishra
2026-01-09
safety
2601.05504v2

Memory Poisoning Attack and Defense on Memory Based LLM-Agents

Balachandra Devarangadi Sunil, Isheeta Sinha, Piyush Maheshwari, Shantanu Todmal, Shreyan Mallik, Shuchi Mishra
2026-01-09
safety
2601.05504v1

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Zhi Yang, Runguo Li, Qiqi Qiang, Jiashun Wang, Fangqi Lou, Mengping Li, Dongpo Cheng, Rui Xu, Heng Lian, Shuo Zhang, Xiaolong Liang, Xiaoming Huang, Zheng Wei, Zhaowei Liu, Xin Guo, Huacan Wang, Ronghao Chen, Liwen Zhang
2026-01-09
red teaming
2601.07853v1

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Zhaoqi Wang, Zijian Zhang, Daqing He, Pengtao Kou, Xin Li, Jiamou Liu, Jincheng An, Yong Liu
2026-01-09
red teaming
2601.05466v1

Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models

Songze Li, Ruishi He, Xiaojun Jia, Jun Wang, Zhihui Fu
2026-01-09
red teaming
2601.05445v1

Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models

Badhan Chandra Das, Md Tasnim Jawad, Joaquin Molto, M. Hadi Amini, Yanzhao Wu
2026-01-08
red teaming
2601.05339v1

$PC^2$: Politically Controversial Content Generation via Jailbreaking Attacks on GPT-based Text-to-Image Models

Wonwoo Choi, Minjae Seo, Minkyoo Song, Hwanjo Heo, Seungwon Shin, Myoungsung You
2026-01-08
red teaming
2601.05150v1

From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs)

Suyash Mishra, Qiang Li, Srikanth Patil, Anubhav Girdhar
2026-01-08
2601.05059v1

Defense Against Indirect Prompt Injection via Tool Result Parsing

Qiang Yu, Xinran Cheng, Chuanyi Liu
2026-01-08
red teaming
2601.04795v1

Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

Zhiyuan Chang, Mingyang Li, Yuekai Huang, Ziyou Jiang, Xiaojun Jia, Qian Xiong, Junjie Wang, Zhaoyang Li, Qing Wang
2026-01-08
2601.04666v1

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks

Hoagy Cunningham, Jerry Wei, Zihan Wang, Andrew Persic, Alwin Peng, Jordan Abderrachid, Raj Agarwal, Bobby Chen, Austin Cohen, Andy Dau, Alek Dimitriev, Rob Gilson, Logan Howard, Yijin Hua, Jared Kaplan, Jan Leike, Mu Lin, Christopher Liu, Vladimir Mikulik, Rohit Mittapalli, Clare O'Hara, Jin Pan, Nikhil Saxena, Alex Silverstein, Yue Song, Xunjie Yu, Giulio Zhou, Ethan Perez, Mrinank Sharma
2026-01-08
red teaming
2601.04603v1

Autonomous Agents on Blockchains: Standards, Execution Models, and Trust Boundaries

Saad Alqithami
2026-01-08
2601.04583v1

MiJaBench: Revealing Minority Biases in Large Language Models via Hate Speech Jailbreaking

Iago Alves Brito, Walcy Santos Rezende Rios, Julia Soares Dollis, Diogo Fernandes Costa Silva, Arlindo Rodrigues Galvão Filho
2026-01-07
red teaming
2601.04389v1

SearchAttack: Red-Teaming LLMs against Real-World Threats via Framing Unsafe Web Information-Seeking Tasks

Yu Yan, Sheng Sun, Mingfeng Li, Zheming Yang, Chiwei Zhu, Fei Ma, Benfeng Xu, Min Liu
2026-01-07
red teaming
2601.04093v1

When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life

Xinyue Lou, Jinan Xu, Jingyi Yin, Xiaolong Wang, Zhaolu Kang, Youwei Liao, Yixuan Wang, Xiangyu Shi, Fengran Mo, Su Yao, Kaiyu Huang
2026-01-07
safety
2601.04043v1