Paper Library
Collection of AI Security research papers
Showing 770 papers total
November 24 - November 30, 2025
24 papers
When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals
Riad Ahmed Anonto, Md Labid Al Nahiyan, Md Tanvir Hassan, Ch. Md. Rakin Haider
2025-11-30
safety
2512.01037v1
Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis
Mintong Kang, Chong Xiang, Sanjay Kariyappa, Chaowei Xiao, Bo Li, Edward Suh
2025-11-30
red teaming
2512.00966v1
On the Regulatory Potential of User Interfaces for AI Agent Governance
K. J. Kevin Feng, Tae Soo Kim, Rock Yuren Pang, Faria Huq, Tal August, Amy X. Zhang
2025-11-30
2512.00742v1
Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model
Junshu Tang, Jiacheng Liu, Jiaqi Li, Longhuang Wu, Haoyu Yang, Penghao Zhao, Siruis Gong, Xiang Yuan, Shuai Shao, Qinglin Lu
2025-11-28
2511.23429v1
Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering
Qiming Li, Xiaocheng Feng, Yixuan Ma, Zekai Ye, Ruihan Chen, Xiachong Feng, Bing Qin
2025-11-28
2511.23231v1
Are LLMs Good Safety Agents or a Propaganda Engine?
Neemesh Yadav, Francesco Ortu, Jiarui Liu, Joeun Yook, Bernhard Schölkopf, Rada Mihalcea, Alberto Cazzaniga, Zhijing Jin
2025-11-28
red teaming
safety
2511.23174v1
Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks
Richard J. Young
2025-11-27
red teaming
2511.22047v1
Distillability of LLM Security Logic: Predicting Attack Success Rate of Outline Filling Attack via Ranking Regression
Tianyu Zhang, Zihang Xi, Jingyu Hua, Sheng Zhong
2025-11-27
red teaming
2511.22044v1
DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models
Mingue Park, Prin Phunyaphibarn, Phillip Y. Lee, Minhyuk Sung
2025-11-26
2511.21415v1
Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines
Yuhang Wang, Yanxu Zhu, Dongyuan Lu, Jitao Sang
2025-11-26
2511.21214v2
Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines
Yuhang Wang, Yanxu Zhu, Dongyuan Lu, Jitao Sang
2025-11-26
2511.21214v1
Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs
Dongkyu Derek Cho, Huan Song, Arijit Ghosh Chowdhury, Haotian An, Yawei Wang, Rohit Thekkanal, Negin Sokhandan, Sharlina Keshava, Hannah Marlowe
2025-11-26
safety
2511.21050v1
CameraMaster: Unified Camera Semantic-Parameter Control for Photography Retouching
Qirui Yang, Yang Yang, Ying Zeng, Xiaobin Hu, Bo Li, Huanjing Yue, Jingyu Yang, Peng-Tao Jiang
2025-11-26
2511.21024v1
BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents
Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley, Jerry Ma, Denis Yarats, Ninghui Li
2025-11-25
red teaming
2511.20597v1
Adversarial Confusion Attack: Disrupting Multimodal Large Language Models
Jakub Hoscilowicz, Artur Janicki
2025-11-25
red teaming
2511.20494v3
Adversarial Confusion Attack: Disrupting Multimodal Large Language Models
Jakub Hoscilowicz, Artur Janicki
2025-11-25
red teaming
2511.20494v2
Adversarial Confusion Attack: Disrupting Multimodal Large Language Models
Jakub Hoscilowicz, Artur Janicki
2025-11-25
red teaming
2511.20494v1
A Training-Free Approach for Multi-ID Customization via Attention Adjustment and Spatial Control
Jiawei Lin, Guanlong Jiao, Jianjin Xu
2025-11-25
2511.20401v1
Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge
Yuhang Wang, Heye Huang, Zhenhua Xu, Kailai Sun, Baoshen Guo, Jinhua Zhao
2025-11-25
safety
2511.20726v1
SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
Lin Chen, Yingjian Zhu, Qi Yang, Xin Niu, Kun Ding, Shiming Xiang
2025-11-25
2511.20027v1
NOEM³A: A Neuro-Symbolic Ontology-Enhanced Method for Multi-Intent Understanding in Mobile Agents
Ioannis Tzachristas, Aifen Sui
2025-11-24
2511.19780v1
Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts
Steven Peh
2025-11-24
2511.19727v1
LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context
Jingzhi Bao, Hongze Chen, Lingting Zhu, Chenyu Liu, Runze Zhang, Keyang Luo, Zeyu Hu, Weikai Chen, Yingda Yin, Xin Wang, Zehong Lin, Jun Zhang, Xiaoguang Han
2025-11-24
2511.19437v1
Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization
Xurui Li, Kaisong Song, Rui Zhu, Pin-Yu Chen, Haixu Tang
2025-11-24
red teaming
safety
2511.19218v2