Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

Authors: Nicolás E. Díaz Ferreyra, Monika Swetha Gurupathi, Zadia Codabux, Nalin Arachchilage, Riccardo Scandariato

Published: 2026-04-09

arXiv ID: 2604.08352v1

Added to Library: 2026-04-10 03:01 UTC

Red Teaming

📄 Abstract

Generative Artificial Intelligence (GenAI) has become a central component of many development tools (e.g., GitHub Copilot) that support software practitioners across multiple programming tasks, including code completion, documentation, and bug detection. However, current research has identified significant limitations and open issues in GenAI, including reliability, non-determinism, bias, and copyright infringement. While prior work has primarily focused on assessing the technical performance of these technologies for code generation, less attention has been paid to emerging concerns of software developers, particularly in the security realm. OBJECTIVE: This work explores security concerns regarding the use of GenAI-based coding assistants by analyzing challenges voiced by developers and software enthusiasts in public online forums. METHOD: We retrieved posts, comments, and discussion threads addressing security issues in GitHub Copilot from three popular platforms, namely Stack Overflow, Reddit, and Hacker News. These discussions were clustered using BERTopic and then synthesized using thematic analysis to identify distinct categories of security concerns. RESULTS: Four major concern areas were identified, including potential data leakage, code licensing, adversarial attacks (e.g., prompt injection), and insecure code suggestions, underscoring critical reflections on the limitations and trade-offs of GenAI in software engineering. IMPLICATIONS: Our findings contribute to a broader understanding of how developers perceive and engage with GenAI-based coding assistants, while highlighting key areas for improving their built-in security features.

🔍 Key Points

Identification of major security concerns related to GitHub Copilot: data exposure, insecure code suggestions, legal licensing issues, and trust erosion among developers.
Utilization of BERTopic and thematic analysis to categorize insights from discussions on platforms like Stack Overflow, Reddit, and Hacker News, yielding a nuanced understanding of developers' perspectives.
Empirical evidence of developers' skepticism towards GenAI tools, indicating a disparity between the tool's functionality and the security expectations of users.
Highlights the need for improved transparency and safeguards in GenAI systems to address issues like data poisoning and intellectual property violations.

💡 Why This Paper Matters

This paper significantly contributes to the understanding of security concerns surrounding generative AI coding assistants by capturing the voices of developers engaging in online discussions. By identifying key concern areas that influence trust and usability, it lays a foundation for more secure and legally compliant AI-based coding tools.

🎯 Why It's Interesting for AI Security Researchers

This paper would interest AI security researchers as it addresses crucial security vulnerabilities specific to generative AI, especially in code generation, and emphasizes the practical concerns developers have in deploying these tools. The findings highlight important areas for future research, particularly in improving the security and compliance frameworks of AI technologies.

Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper