Paper Library

Showing 1331 papers total

December 29 - January 04, 2026

4 papers

Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

Panagiotis Theocharopoulos, Ajinkya Kulkarni, Mathew Magimai. -Doss

2025-12-29

red teaming

2512.23684v1

Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks

Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty

2025-12-29

2512.23557v1

Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems

Armstrong Foundjem, Lionel Nganyewou Tidjon, Leuson Da Silva, Foutse Khomh

2025-12-29

red teaming

2512.23132v1

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Piotr Błaszczyk, Will Howard, Lukas Aichberger, Chris Russell, Philip H. S. Torr, Adam Mahdi, Adel Bibi

2025-12-29

red teaming

2512.23128v1

December 22 - December 28, 2025

7 papers

Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization

Kerem Zaman, Shashank Srivastava

2025-12-28

2512.23032v1

Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs

Jiayu Hu, Beibei Li, Jiangwei Xia, Yanjun Qin, Bing Ji, Zhongshi He

2025-12-26

2512.21999v1

AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs

Yihan Wang, Huanqi Yang, Shantanu Pal, Weitao Xu

2025-12-24

red teaming

2512.20986v1

ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected

Kanchon Gharami, Sanjiv Kumar Sarkar, Yongxin Liu, Shafika Showkat Moni

2025-12-23

red teaming

2512.20405v2

AprielGuard

Jaykumar Kasundra, Anjaneya Praharaj, Sourabh Surana, Lakshmi Sirisha Chodisetty, Sourav Sharma, Abhigya Verma, Abhishek Bhardwaj, Debasish Kanhar, Aakash Bhagat, Khalil Slimi, Seganrasan Subramanian, Sathwik Tejaswi Madhusudhan, Ranga Prasad Chenna, Srinivas Sunkara

2025-12-23

red teaming

2512.20293v2

PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline

Akshaj Prashanth Rao, Advait Singh, Saumya Kumaar Saksena, Dhruv Kumar

2025-12-22

red teaming

2512.19011v2

Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline

Akshaj Prashanth Rao, Advait Singh, Saumya Kumaar Saksena, Dhruv Kumar

2025-12-22

2512.19011v1

December 15 - December 21, 2025

7 papers

Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness

Haotian Deng, Chris Farber, Jiyoon Lee, David Tang

2025-12-21

red teaming

2601.08843v1

Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid, Tahsin Zaman Jilan

2025-12-18

red teaming

2512.16307v1

MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval

Saksham Sahai Srivastava, Haoyu He

2025-12-18

red teaming

2512.16962v1

Quantifying Return on Security Controls in LLM Systems

Richard Helder Moulton, Austin O'Brien, John D. Hastings

2025-12-17

red teaming

2512.15081v1

Trust in LLM-controlled Robotics: a Survey of Security Threats, Defenses and Challenges

Xinyu Huang, Shyam Karthick V B, Taozhao Chen, Mitch Bryson, Thomas Chaffey, Huaming Chen, Kim-Kwang Raymond Choo, Ian R. Manchester

2025-12-17

red teaming

2601.02377v1

Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks

Viet K. Nguyen, Mohammad I. Husain

2025-12-16

red teaming

2512.14860v1

Cisco Integrated AI Security and Safety Framework Report

Amy Chang, Tiffany Saade, Sanket Mendapara, Adam Swanda, Ankit Garg

2025-12-15

red teaming

2512.12921v1

December 08 - December 14, 2025

6 papers

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Jahnvi Singh, Vinay Chamola, Yash Sinha, Murari Mandal, Dhruv Kumar

2025-12-11

red teaming

2512.10449v3

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Jahnvi Singh, Vinay Chamola, Yash Sinha, Murari Mandal, Dhruv Kumar

2025-12-11

red teaming

2512.10449v1

How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation

Devanshu Sahoo, Vasudev Majhi, Arjun Neekhra, Yash Sinha, Murari Mandal, Dhruv Kumar

2025-12-11

red teaming

2512.10415v1

‹ 1 2 3 ... 20 21 22 ... 54 55 56 ›

December 29 - January 04, 2026

Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks

Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

December 22 - December 28, 2025

Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization

Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs

AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs

ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected

AprielGuard

PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline

Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline

December 15 - December 21, 2025

Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness

Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval

Quantifying Return on Security Controls in LLM Systems

Trust in LLM-controlled Robotics: a Survey of Security Threats, Defenses and Challenges

Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks

Cisco Integrated AI Security and Safety Framework Report

December 08 - December 14, 2025

ceLLMate: Sandboxing Browser AI Agents

Detecting Prompt Injection Attacks Against Application Using Classifiers

Challenges of Evaluating LLM Safety for User Welfare

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation