← Back to Library

"I know it's not right, but that's what it said to do": Investigating Trust in AI Chatbots for Cybersecurity Policy

Authors: Brandon Lit, Edward Crowder, Daniel Vogel, Hassan Khan

Published: 2025-10-10

arXiv ID: 2510.08917v1

Added to Library: 2025-11-14 23:13 UTC

📄 Abstract

AI chatbots are an emerging security attack vector, vulnerable to threats such as prompt injection, and rogue chatbot creation. When deployed in domains such as corporate security policy, they could be weaponized to deliver guidance that intentionally undermines system defenses. We investigate whether users can be tricked by a compromised AI chatbot in this scenario. A controlled study (N=15) asked participants to use a chatbot to complete security-related tasks. Without their knowledge, the chatbot was manipulated to give incorrect advice for some tasks. The results show how trust in AI chatbots is related to task familiarity, and confidence in their ownn judgment. Additionally, we discuss possible reasons why people do or do not trust AI chatbots in different scenarios.

🔍 Key Points

  • Systematic evaluation of indirect prompt injection attacks on large language models (LLMs), highlighting vulnerabilities across various models and implementations.
  • Identification of key factors influencing model susceptibility including size and architecture, revealing persistent weaknesses even in advanced models.
  • Development of novel obfuscation techniques utilized in the attack scenarios, allowing adversaries to exploit models through hidden instructions embedded in seemingly benign inputs.
  • Empirical evidence showcasing varying degrees of resilience among different LLMs, with some models exhibiting alarming rates of successful attacks despite advanced security mechanisms.
  • Recommendations for improving LLM defenses, including a centralized database of attack vectors and the integration of security into model training processes.

💡 Why This Paper Matters

This paper is critically relevant in addressing the emerging threats posed by indirect prompt injection attacks on LLMs, underscoring the necessity for enhanced security frameworks in AI applications. The findings not only highlight significant vulnerabilities in existing models but also provide a structured approach for future developments in AI security protocols, making it a pivotal resource for safeguarding corporate data against unprecedented threats.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper offers crucial insights into the evolving landscape of vulnerabilities associated with LLMs, particularly in the context of their integration with external data sources. The empirical data on attack success rates, coupled with a comprehensive analysis of obfuscation techniques, serves as a foundational study for understanding and mitigating security threats in generative AI systems. Furthermore, the proposed frameworks and defensive strategies can guide researchers in developing robust countermeasures against increasingly sophisticated adversarial tactics.

📚 Read the Full Paper