Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 770 papers total

June 16 - June 22, 2025

7 papers

From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem

Yanxu Mao, Tiehan Cui, Peipei Liu, Datao You, Hongsong Zhu
2025-06-18
red teaming, safety
2506.15170v1

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar
2025-06-18
red teaming
2506.15751v1

LLM Jailbreak Oracle

Shuyi Lin, Anshuman Suri, Alina Oprea, Cheng Tan
2025-06-17
red teaming
2506.17299v1

Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer-wise Pooled Representations

Abhilekh Borah, Chhavi Sharma, Danush Khanna, Utkarsh Bhatt, Gurpreet Singh, Hasnat Md Abdullah, Raghav Kaushik Ravi, Vinija Jain, Jyoti Patel, Shubham Singh, Vasu Sharma, Arpita Vats, Rahul Raja, Aman Chadha, Amitava Das
2025-06-16
red teaming
2506.13901v1

Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions

Junfeng Jiao, Saleh Afroogh, Kevin Chen, Abhejay Murali, David Atkinson, Amit Dhurandhar
2025-06-16
red teaming, safety
2506.13510v2

Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-AI Interactions

Junfeng Jiao, Saleh Afroogh, Kevin Chen, Abhejay Murali, David Atkinson, Amit Dhurandhar
2025-06-16
red teaming, safety
2506.13510v1

Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs

Houcheng Jiang, Zetong Zhao, Junfeng Fang, Haokai Ma, Ruipeng Wang, Yang Deng, Xiang Wang, Xiangnan He
2025-06-16
safety
2506.13285v1

June 09 - June 15, 2025

17 papers

Jailbreak Strength and Model Similarity Predict Transferability

Rico Angell, Jannik Brinkmann, He He
2025-06-15
red teaming
2506.12913v1

Universal Jailbreak Suffixes Are Strong Attention Hijackers

Matan Ben-Tov, Mor Geva, Mahmood Sharif
2025-06-15
red teaming
2506.12880v1

Governments Should Mandate Tiered Anonymity on Social-Media Platforms to Counter Deepfakes and LLM-Driven Mass Misinformation

David Khachaturov, Roxanne Schnyder, Robert Mullins
2025-06-15
governance
2506.12814v1

SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H. Abdi, Yuqing Yang, Lili Qiu
2025-06-15
red teaming, safety
2506.12707v1

Alphabet Index Mapping: Jailbreaking LLMs through Semantic Dissimilarity

Bilal Saleh Husain
2025-06-15
red teaming
2506.12685v1

Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025

Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, Jiwei Wei, Shiyuan He, Yang Yang, Xiaohai Xu, Ke Ma, Qianqian Xu, Qingming Huang, Shi Lin, Xun Wang, Changting Lin, Meng Han, Yilei Jiang, Siqi Lai, Yaozhi Zheng, Yifei Song, Xiangyu Yue, Zonglei Jing, Tianyuan Zhang, Zhilei Zhu, Aishan Liu, Jiakai Wang, Siyuan Liang, Xianglong Kong, Hainan Li, Junjie Mu, Haotong Qin, Yue Yu, Lei Chen, Felix Juefei-Xu, Qing Guo, Xinyun Chen, Yew Soon Ong, Xianglong Liu, Dawn Song, Alan Yuille, Philip Torr, Dacheng Tao
2025-06-14
red teaming
2506.12430v1

Exploring the Secondary Risks of Large Language Models

Jiawei Chen, Zhengwei Fang, Xiao Yang, Chao Yu, Zhaoxia Yin, Hang Su
2025-06-14
2506.12382v1

QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety

Taegyeong Lee, Jeonghwa Yoo, Hyoungseo Cho, Soo Yong Kim, Yunho Maeng
2025-06-14
safety
2506.12299v1

InfoFlood: Jailbreaking Large Language Models with Information Overload

Advait Yadav, Haibo Jin, Man Luo, Jun Zhuang, Haohan Wang
2025-06-13
red teaming
2506.12274v1

Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models

Jinming Wen, Xinyi Wu, Shuai Zhao, Yanhao Jia, Yuwen Li
2025-06-13
red teaming
2506.11521v1

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

Hao Li, Xiaogeng Liu, Hung-Chun Chiu, Dianqi Li, Ning Zhang, Chaowei Xiao
2025-06-13
safety
2506.12104v1

How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?

Sohee Yang, Sang-Woo Lee, Nora Kassner, Daniela Gottesman, Sebastian Riedel, Mor Geva
2025-06-12
red teaming
2506.10979v1

SoK: Evaluating Jailbreak Guardrails for Large Language Models

Xunguang Wang, Zhenlan Ji, Wenxuan Wang, Zongjie Li, Daoyuan Wu, Shuai Wang
2025-06-12
red teaming
2506.10597v1

LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin
2025-06-11
red teaming
2506.09956v1

Effective Red-Teaming of Policy-Adherent Agents

Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby-Tavor
2025-06-11
red teaming
2506.09600v1

Risks & Benefits of LLMs & GenAI for Platform Integrity, Healthcare Diagnostics, Cybersecurity, Privacy & AI Safety: A Comprehensive Survey, Roadmap & Implementation Blueprint

Kiarash Ahi
2025-06-10
safety
2506.12088v1

AdversariaL attacK sAfety aLIgnment (ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement - Introducing Adversarial Vulnerability Quality Index (AVQI)

Danush Khanna, Krishna Kumar, Basab Ghosh, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das
2025-06-10
red teaming, safety
2506.08885v2