Paper Library
A collection of AI security research papers
770 papers total
September 23 - September 29, 2024
1 paper
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu, Yangguang Shao, Hanwen Miao, Junzheng Shi
2024-09-23
red teaming
2409.14729v2
September 16 - September 22, 2024
2 papers
Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection
Md Abdur Rahman, Hossain Shahriar, Fan Wu, Alfredo Cuzzocrea
2024-09-20
red teaming
2409.13331v1
A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration
Zhang Zheng
2024-09-16
2409.10403v1
August 05 - August 11, 2024
1 paper
Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection
Subaru Kimura, Ryota Tanaka, Shumpei Miyawaki, Jun Suzuki, Keisuke Sakaguchi
2024-08-07
red teaming
2408.03554v1
July 29 - August 04, 2024
1 paper
PIP: Prototypes-Injected Prompt for Federated Class Incremental Learning
Muhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy, Lin Liu, Habibullah Habibullah, Ryszard Kowalczyk
2024-07-30
2407.20705v1
July 01 - July 07, 2024
1 paper
Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning
Simon Ostermann, Kevin Baum, Christoph Endres, Julia Masloh, Patrick Schramowski
2024-07-03
red teaming
2407.03391v1
May 27 - June 02, 2024
1 paper
Exfiltration of personal information from ChatGPT via prompt injection
Gregory Schwartzman
2024-05-31
red teaming
2406.00199v2
April 01 - April 07, 2024
1 paper
Goal-guided Generative Prompt Injection Attack on Large Language Models
Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiaobo Jin
2024-04-06
red teaming
2404.07234v4
March 25 - March 31, 2024
1 paper
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong
2024-03-26
red teaming
2403.17710v5
March 18 - March 24, 2024
1 paper
Defending Against Indirect Prompt Injection Attacks With Spotlighting
Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman
2024-03-20
red teaming
2403.14720v1
March 04 - March 10, 2024
1 paper
Automatic and Universal Prompt Injection Attacks against Large Language Models
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao
2024-03-07
red teaming
2403.04957v1
February 05 - February 11, 2024
1 paper
StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen, Julien Piet, Chawin Sitawarin, David Wagner
2024-02-09
2402.06363v2
January 29 - February 04, 2024
1 paper
An Early Categorization of Prompt Injection Attacks on Large Language Models
Sippo Rossi, Alisia Marianne Michel, Raghava Rao Mukkamala, Jason Bennett Thatcher
2024-01-31
red teaming
2402.00898v1
January 15 - January 21, 2024
1 paper
Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
Xuchen Suo
2024-01-15
2401.07612v1
December 25 - December 31, 2023
1 paper
Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner
2023-12-29
2312.17673v2
December 11 - December 17, 2023
1 paper
Maatphor: Automated Variant Analysis for Prompt Injection Attacks
Ahmed Salem, Andrew Paverd, Boris Köpf
2023-12-12
red teaming
2312.11513v1
November 20 - November 26, 2023
1 paper
Assessing Prompt Injection Risks in 200+ Custom GPTs
Jiahao Yu, Yuhang Wu, Dong Shu, Mingyu Jin, Sabrina Yang, Xinyu Xing
2023-11-20
red teaming
2311.11538v2
October 30 - November 05, 2023
1 paper
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
2023-11-02
red teaming
2311.01011v1
October 16 - October 22, 2023
1 paper
Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong
2023-10-19
red teaming
2310.12815v5
August 14 - August 20, 2023
1 paper
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Zekun Li, Baolin Peng, Pengcheng He, Xifeng Yan
2023-08-17
red teaming
2308.10819v3
July 31 - August 06, 2023
2 papers
PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification
Hongwei Yao, Jian Lou, Kui Ren, Zhan Qin
2023-08-05
2308.02816v2
From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?
Rodrigo Pedro, Daniel Castro, Paulo Carreira, Nuno Santos
2023-08-03
red teaming
2308.01990v4
June 05 - June 11, 2023
1 paper
Prompt Injection attack against LLM-integrated Applications
Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, Yang Liu
2023-06-08
red teaming
2306.05499v2