Paper Library
Collection of AI Security research papers
Showing 1331 papers total
March 23 - March 29, 2026
11 papers
Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs
Wenyu Chen, Xiangtao Meng, Chuanchao Zang, Li Wang, Xinyu Gao, Jianing Wang, Peng Zhan, Zheng Li, Shanqing Guo
2026-03-24
red teaming
2603.23269v1
SafeSeek: Universal Attribution of Safety Circuits in Language Models
Miao Yu, Siyuan Fu, Moayad Aloqaily, Zhenhong Zhou, Safa Otoum, Xing Fan, Kun Wang, Yufei Guo, Qingsong Wen
2026-03-24
red teaming
2603.23268v1
Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution
Yechao Zhang, Shiqian Zhao, Jie Zhang, Gelei Deng, Jiawen Zhang, Xiaogeng Liu, Chaowei Xiao, Tianwei Zhang
2026-03-24
2603.23064v3
SoK: The Attack Surface of Agentic AI -- Tools, and Autonomy
Ali Dehghantanha, Sajad Homayoun
2026-03-24
red teaming
2603.22928v1
TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration
Chunxiao Li, Lijun Li, Jing Shao
2026-03-24
red teaming
2603.22882v1
LLMON: An LLM-native Markup Language to Leverage Structure and Semantics at the LLM Interface
Michael Hind, Basel Shbita, Bo Wu, Farhan Ahmed, Chad DeLuca, Nathan Fulton, David Cox, Dan Gutfreund
2026-03-23
2603.22519v2
Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning
Charoes Huang, Xin Huang, Ngoc Phu Tran, Amin Milani Fard
2026-03-23
red teaming
2603.22489v1
Principled Steering via Null-space Projection for Jailbreak Defense in Vision-Language Models
Xingyu Zhu, Beier Zhu, Shuo Wang, Junfeng Fang, Kesen Zhao, Hanwang Zhang, Xiangnan He
2026-03-23
red teaming
2603.22094v2
SecureBreak -- A dataset towards safe and secure models
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera
2026-03-23
red teaming
2603.21975v1
Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models
Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee
2026-03-23
red teaming
2603.21697v1
Are AI-assisted Development Tools Immune to Prompt Injection?
Charoes Huang, Xin Huang, Amin Milani Fard
2026-03-23
red teaming
2603.21642v1
March 16 - March 22, 2026
8 papers
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
Huamin Chen, Xunzhuo Liu, Bowei He, Fuyuan Lyu, Yankai Chen, Xue Liu, Yuhan Liu, Junchen Jiang
2026-03-22
2603.21354v1
JANUS: A Lightweight Framework for Jailbreaking Text-to-Image Models via Distribution Optimization
Haolun Zheng, Yu He, Tailun Chen, Shuo Shao, Zhixuan Chu, Hongbin Zhou, Lan Tao, Zhan Qin, Kui Ren
2026-03-22
red teaming
2603.21208v2
Detection of adversarial intent in Human-AI teams using LLMs
Abed K. Musaffar, Ambuj Singh, Francesco Bullo
2026-03-21
red teaming
2603.20976v1
The production of meaning in the processing of natural language
Christopher J. Agostino, Quan Le Thien, Nayan D'Souza, Louis van der Elst
2026-03-20
2603.20381v1
Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models
Wenjing Hong, Zhonghua Rong, Li Wang, Feng Chang, Jian Zhu, Ke Tang, Zexuan Zhu, Yew-Soon Ong
2026-03-20
red teaming
2603.20122v1
Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance
Fazhong Liu, Zhuoyan Chen, Tu Lan, Haozhen Tan, Zhenyu Xu, Xiang Li, Guoxing Chen, Yan Meng, Haojin Zhu
2026-03-20
red teaming
2603.19974v1
The Causal Impact of Tool Affordance on Safety Alignment in LLM Agents
Shasha Yu, Fiona Carroll, Barry L. Bentley
2026-03-19
safety
2603.20320v1
A Framework for Formalizing LLM Agent Security
Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang, Neil Gong, Chenguang Wang, Dawn Song
2026-03-19
red teaming
2603.19469v1