Philipp Zimmermann

Paper Library

A collection of AI security research papers

Showing 235 papers total

August 04 - August 10, 2025

21 papers

Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models: A Unified and Accurate Approach

Shuang Liang, Zhihao Xu, Jialing Tao, Hui Xue, Xiting Wang
2025-08-08
red teaming
2508.09201v1

LLM Robustness Leaderboard v1 --Technical report

Pierre Peigné-Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe
2025-08-08
red teaming
2508.06296v2

LLM Robustness Leaderboard v1 --Technical report

Pierre Peigné-Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe
2025-08-08
red teaming
2508.06296v1

Beyond Uniform Criteria: Scenario-Adaptive Multi-Dimensional Jailbreak Evaluation

Lai Jiang, Yuekang Li, Xiaohan Zhang, Youtao Ding, Li Pan
2025-08-08
red teaming
2508.06194v1

Fine-Grained Safety Neurons with Training-Free Continual Projection to Reduce LLM Fine Tuning Risks

Bing Han, Feifei Zhao, Dongcheng Zhao, Guobin Shen, Ping Wu, Yu Shi, Yi Zeng
2025-08-08
safety
2508.09190v1

Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM

Chi Zhang, Changjia Zhu, Junjie Xiong, Xiaoran Xu, Lingyao Li, Yao Liu, Zhuo Lu
2025-08-07
red teaming
2508.05775v2

Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation

Chi Zhang, Changjia Zhu, Junjie Xiong, Xiaoran Xu, Lingyao Li, Yao Liu, Zhuo Lu
2025-08-07
red teaming
2508.05775v1

AI vs. Human Moderators: A Comparative Evaluation of Multimodal LLMs in Content Moderation for Brand Safety

Adi Levi, Or Levi, Sardhendu Mishra, Jonathan Morra
2025-08-07
safety
2508.05527v1

JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering

Renmiao Chen, Shiyao Cui, Xuancheng Huang, Chengwei Pan, Victor Shea-Jay Huang, QingLin Zhang, Xuan Ouyang, Zhexin Zhang, Hongning Wang, Minlie Huang
2025-08-07
red teaming
2508.05087v1

Adversarial Attacks and Defenses on Graph-aware Large Language Models (LLMs)

Iyiola E. Olatunji, Franziska Boenisch, Jing Xu, Adam Dziedzic
2025-08-06
safety
2508.04894v1

ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments

Yuquan Wang, Mi Zhang, Yining Wang, Geng Hong, Xiaoyu You, Min Yang
2025-08-06
red teaming
2508.04204v1

Eliciting and Analyzing Emergent Misalignment in State-of-the-Art Large Language Models

Siddhant Panpatil, Hiskias Dingeto, Haon Park
2025-08-06
red teaming
2508.04196v1

Risk Analysis Techniques for Governed LLM-based Multi-Agent Systems

Alistair Reid, Simon O'Callaghan, Liam Carroll, Tiberio Caetano
2025-08-06
governance
2508.05687v1

Evo-MARL: Co-Evolutionary Multi-Agent Reinforcement Learning for Internalized Safety

Zhenyu Pan, Yiting Zhang, Yutong Zhang, Jianshu Zhang, Haozheng Luo, Yuwei Han, Dennis Wu, Hong-Yu Chen, Philip S. Yu, Manling Li, Han Liu
2025-08-05
red teaming
2508.03864v1

When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs

Bodam Kim, Hiskias Dingeto, Taeyoun Kwon, Dasol Choi, DongGeon Lee, Haon Park, JaeHoon Lee, Jongho Shin
2025-08-05
red teaming
2508.03365v1

Beyond Surface-Level Detection: Towards Cognitive-Driven Defense Against Jailbreak Attacks via Meta-Operations Reasoning

Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang
2025-08-05
2508.03054v1

CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space Features of Contextual Co-occurrence Tensors

Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
2025-08-05
red teaming
2508.02997v2

CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space Features of Contextual Co-occurrence Tensors

Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
2025-08-05
red teaming
2508.02997v1

Highlight & Summarize: RAG without the jailbreaks

Giovanni Cherubin, Andrew Paverd
2025-08-04
2508.02872v1

Large Reasoning Models Are Autonomous Jailbreak Agents

Thilo Hagendorff, Erik Derner, Nuria Oliver
2025-08-04
red teaming
2508.04039v1

TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs

Amitava Das, Vinija Jain, Aman Chadha
2025-08-04
2508.02063v1

July 28 - August 03, 2025

3 papers