Models as Lego Builders: Assembling Malice from Benign Blocks via Semantic Blueprints

Authors: Chenxi Li, Xianggan Liu, Dake Shen, Yaosong Du, Zhibo Yao, Hao Jiang, Linyi Jiang, Chengwei Cao, Jingzhe Zhang, RanYi Peng, Peiling Bai, Xiande Huang

Published: 2026-03-08

arXiv ID: 2603.07590v1

Added to Library: 2026-03-10 03:01 UTC

Red Teaming

📄 Abstract

Despite the rapid progress of Large Vision-Language Models (LVLMs), the integration of visual modalities introduces new safety vulnerabilities that adversaries can exploit to elicit biased or malicious outputs. In this paper, we demonstrate an underexplored vulnerability via semantic slot filling, where LVLMs complete missing slot values with unsafe content even when the slot types are deliberately crafted to appear benign. Building on this finding, we propose StructAttack, a simple yet effective single-query jailbreak framework under black-box settings. StructAttack decomposes a harmful query into a central topic and a set of benign-looking slot types, then embeds them as structured visual prompts (e.g., mind maps, tables, or sunburst diagrams) with small random perturbations. Paired with a completion-guided instruction, LVLMs automatically recompose the concealed semantics and generate unsafe outputs without triggering safety mechanisms. Although each slot appears benign in isolation (local benignness), StructAttack exploits LVLMs' reasoning to assemble these slots into coherent harmful semantics. Extensive experiments on multiple models and benchmarks show the efficacy of our proposed StructAttack.
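The decomposition pipeline described in the abstract can be illustrated structurally. The sketch below is not the authors' code: it shows only the shape of a semantic blueprint (a central topic plus slot types rendered as a text mind map with blank values, paired with a completion-guided instruction), using a deliberately benign hypothetical topic; the class and function names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticBlueprint:
    """A central topic plus slot types whose values the model is asked to fill."""
    topic: str
    slot_types: list[str] = field(default_factory=list)

    def to_mind_map(self) -> str:
        """Render the blueprint as a text mind map (a stand-in for the
        structured visual prompts the paper describes, e.g. mind maps
        or tables)."""
        lines = [self.topic]
        for i, slot in enumerate(self.slot_types):
            branch = "└─" if i == len(self.slot_types) - 1 else "├─"
            lines.append(f"{branch} {slot}: ____")
        return "\n".join(lines)


def completion_instruction(blueprint: SemanticBlueprint) -> str:
    """A completion-guided instruction asking the model to fill the blanks."""
    return (f"Fill in the missing value for each branch of the diagram "
            f"about '{blueprint.topic}'.")


# Benign example: each slot type looks innocuous in isolation.
bp = SemanticBlueprint(
    topic="home bread baking",
    slot_types=["required materials", "step-by-step procedure", "common pitfalls"],
)
print(bp.to_mind_map())
print(completion_instruction(bp))
```

The point of the structure is the paper's "local benignness" property: no single slot type carries the full semantics; only the model's recomposition of topic and slots does.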

🔍 Key Points

  • Introduction of a new vulnerability in Large Vision-Language Models (LVLMs) pertaining to semantic slot filling, highlighting how benign-seeming prompts can yield harmful outputs.
  • Proposal of StructAttack, a novel black-box jailbreak framework requiring only a single query to elicit unsafe output from LVLMs by exploiting their reasoning and semantic slot decompositions.
  • Extensive evaluations of StructAttack across multiple models and datasets demonstrate its effectiveness, achieving higher Attack Success Rates (ASR) than existing jailbreak methods and showing robustness against defense mechanisms.
  • Ablation studies confirm the contribution of each component within StructAttack, providing insight into its design and effectiveness.
  • Exploration of the semantic slot decomposition process reveals the potential for structured visual prompts to trigger LVLMs to generate harmful content without direct malicious input.

💡 Why This Paper Matters

This paper matters because it identifies a previously overlooked class of vulnerabilities in LVLMs. By developing StructAttack, the authors demonstrate a simple yet effective way to bypass current safety measures, underscoring the urgent need for stronger defensive strategies in AI systems. The findings highlight how inputs that appear benign in isolation can be recombined into harmful outcomes, contributing to the broader discourse on AI safety.

🎯 Why It's Interesting for AI Security Researchers

This paper should interest AI security researchers because it both uncovers a critical vulnerability in LVLMs and provides a practical framework for probing it. The implications of StructAttack extend beyond theory, showing how easily adversaries can steer AI systems toward producing harmful content. Researchers focused on the security of AI systems can use these findings to build more robust defensive mechanisms.
