Philipp Zimmermann

Paper Library

A collection of AI Security research papers

Showing 1331 papers total

April 06 - April 12, 2026

3 papers

March 30 - April 05, 2026

19 papers

LLM-Enabled Open-Source Systems in the Wild: An Empirical Study of Vulnerabilities in GitHub Security Advisories

Fariha Tanjim Shifat, Hariswar Baburaj, Ce Zhou, Jaydeb Sarker, Mia Mohammad Imran
2026-04-05
2604.04288v1

FreakOut-LLM: The Effect of Emotional Stimuli on Safety Alignment

Daniel Kuznetsov, Ofir Cohen, Karin Shistik, Rami Puzis, Asaf Shabtai
2026-04-05
red teaming
2604.04992v1

Predict, Don't React: Value-Based Safety Forecasting for LLM Streaming

Pride Kavumba, Koki Wataoka, Huy H. Nguyen, Jiaxuan Li, Masaya Ohagi
2026-04-05
safety
2604.03962v1

Automating Cloud Security and Forensics Through a Secure-by-Design Generative AI Framework

Dalal Alharthi, Ivan Roberto Kawaminami Garcia
2026-04-05
red teaming
2604.03912v1

Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

Wenhui Zhu, Xuanzhao Dong, Xiwen Chen, Rui Cai, Peijie Qiu, Zhipeng Wang, Oana Frunza, Shao Tang, Jindong Gu, Yalin Wang
2026-04-04
red teaming
2604.03870v1

AttackEval: A Systematic Empirical Study of Prompt Injection Attack Effectiveness Against Large Language Models

Jackson Wang
2026-04-04
red teaming
2604.03598v1

Automated Analysis of Global AI Safety Initiatives: A Taxonomy-Driven LLM Approach

Takayuki Semitsu, Naoto Kiribuchi, Kengo Zenitani
2026-04-04
safety
2604.03533v1

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng, Yuekang Li, Yanjun Zhang, Jianting Ning, Leo Yu Zhang, Lei Ma, Zhiqiang Li
2026-04-03
red teaming
2604.03070v1

LogicPoison: Logical Attacks on Graph Retrieval-Augmented Generation

Yilin Xiao, Jin Chen, Qinggang Zhang, Yujing Zhang, Chuang Zhou, Longhao Yang, Lingfei Ren, Xin Yang, Xiao Huang
2026-04-03
red teaming
2604.02954v1

Generalization Limits of Reinforcement Learning Alignment

Haruhi Shida, Koo Imai, Keigo Kansa
2026-04-03
red teaming
2604.02652v1

Understanding the Effects of Safety Unalignment on Large Language Models

John T. Halloran
2026-04-02
red teaming
2604.02574v1

Low-Effort Jailbreak Attacks Against Text-to-Image Safety Filters

Ahmed B Mustafa, Zihan Ye, Yang Lu, Michael P Pound, Shreyank N Gowda
2026-04-02
red teaming
2604.01888v1

CRaFT: Circuit-Guided Refusal Feature Selection via Cross-Layer Transcoders

Su-Hyeon Kim, Hyundong Jin, Yejin Lee, Yo-Sub Han
2026-04-02
red teaming
2604.01604v1

SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska, Jiahao Xu
2026-04-01
red teaming
2604.01473v1

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

Weidi Luo, Xiaofei Wen, Tenghao Huang, Hongyi Wang, Zhen Xiang, Chaowei Xiao, Kristina Gligorić, Muhao Chen
2026-04-01
red teaming
2604.01444v2

ClawSafety: "Safe" LLMs, Unsafe Agents

Bowen Wei, Yunbei Zhang, Jinhao Pan, Kai Mei, Xiao Wang, Jihun Hamm, Ziwei Zhu, Yingqiang Ge
2026-04-01
red teaming
2604.01438v2

AgentWatcher: A Rule-based Prompt Injection Monitor

Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia
2026-04-01
2604.01194v1

Multi-Agent LLM Governance for Safe Two-Timescale Reinforcement Learning in SDN-IoT Defense

Saeid Jamshidi, Negar Shahabi, Foutse Khomh, Carol Fung, Mohammad Hamdaqa
2026-04-01
safety governance
2604.01127v1

The Persistent Vulnerability of Aligned AI Systems

Aengus Lynch
2026-03-31
red teaming
2604.00324v1