Malicious Repurposing of Open Science Artefacts by Using Large Language Models

Authors: Zahra Hashemi, Zhiqiang Zhong, Jun Pang, Wei Zhao

Published: 2026-01-26

arXiv ID: 2601.18998v1

Added to Library: 2026-01-28 03:01 UTC

Red Teaming

📄 Abstract

The rapid evolution of large language models (LLMs) has fuelled enthusiasm about their role in advancing scientific discovery, with studies exploring LLMs that autonomously generate and evaluate novel research ideas. However, little attention has been given to the possibility that such models could be exploited to produce harmful research by repurposing open science artefacts for malicious ends. We fill the gap by introducing an end-to-end pipeline that first bypasses LLM safeguards through persuasion-based jailbreaking, then reinterprets NLP papers to identify and repurpose their artefacts (datasets, methods, and tools) by exploiting their vulnerabilities, and finally assesses the safety of these proposals using our evaluation framework across three dimensions: harmfulness, feasibility of misuse, and soundness of technicality. Overall, our findings demonstrate that LLMs can generate harmful proposals by repurposing ethically designed open artefacts; however, we find that LLMs acting as evaluators strongly disagree with one another on evaluation outcomes: GPT-4.1 assigns higher scores (indicating greater potential harms, higher soundness and feasibility of misuse), Gemini-2.5-pro is markedly stricter, and Grok-3 falls between these extremes. This indicates that LLMs cannot yet serve as reliable judges in a malicious evaluation setup, making human evaluation essential for credible dual-use risk assessment.

🔍 Key Points

  • Introduces a novel end-to-end pipeline for malicious proposal generation using Large Language Models (LLMs), showcasing the dual-use risks of open science artefacts.
  • Demonstrates how persuasion-based jailbreaking enables the extraction and repurposing of datasets, methods, and tools from legitimate research for harmful applications.
  • Establishes an AI-safety evaluation framework that scores generated proposals across harmfulness, feasibility of misuse, and technical soundness, providing a structured way to assess dual-use risk.
  • Finds significant variability in assessment outcomes across different LLM evaluators, indicating that they are currently unreliable judges and underscoring the necessity of human oversight (a minimal sketch of such multi-judge scoring follows this list).
  • Highlights the pressing need for safeguards as open science artefacts can be exploited for malicious purposes, posing a potential threat to society.
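To make the judge-disagreement finding concrete, below is a minimal, hypothetical sketch of how scores from multiple LLM judges on the three evaluation dimensions could be aggregated and their spread measured. The judge names mirror the models discussed in the abstract, but the 1–5 scale, the example scores, and the use of standard deviation as a disagreement measure are illustrative assumptions, not the paper's implementation.

```python
from statistics import mean, pstdev

# Hypothetical sketch of the three-dimension safety evaluation described above.
# The 1-5 scale, example scores, and spread metric are assumptions for
# illustration; they are not taken from the paper's actual pipeline.
DIMENSIONS = ("harmfulness", "feasibility_of_misuse", "technical_soundness")

# Example scores a single repurposing proposal might receive from three LLM judges.
judge_scores = {
    "gpt-4.1":        {"harmfulness": 4, "feasibility_of_misuse": 4, "technical_soundness": 5},
    "gemini-2.5-pro": {"harmfulness": 2, "feasibility_of_misuse": 1, "technical_soundness": 2},
    "grok-3":         {"harmfulness": 3, "feasibility_of_misuse": 3, "technical_soundness": 3},
}

def summarize(scores: dict) -> dict:
    """Per-dimension mean score and spread (population std dev) across judges."""
    summary = {}
    for dim in DIMENSIONS:
        values = [judge[dim] for judge in scores.values()]
        summary[dim] = {"mean": mean(values), "spread": pstdev(values)}
    return summary

if __name__ == "__main__":
    for dim, stats in summarize(judge_scores).items():
        print(f"{dim}: mean={stats['mean']:.2f}, spread={stats['spread']:.2f}")
```

A large per-dimension spread, as in this toy example, is the kind of signal the authors point to when arguing that LLM judges alone cannot yet anchor a credible dual-use risk assessment.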

💡 Why This Paper Matters

This paper matters because it highlights vulnerabilities in using LLMs for research, showing how they can inadvertently facilitate the creation of harmful applications. It underscores the importance of rigorously evaluating the safety mechanisms currently in place and advocates for robust safeguards against misuse in the growing field of AI-assisted research.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper valuable because it not only exposes the risks inherent in using LLMs for research but also presents a detailed methodology for evaluating those risks. It opens a new dimension of dual-use concerns, emphasizing that while LLMs can accelerate scientific discovery, they also pose significant threats that must be understood and mitigated.