LLMON: An LLM-native Markup Language to Leverage Structure and Semantics at the LLM Interface

📄 Abstract

Textual Large Language Models (LLMs) provide a simple and familiar interface: a string of text is used for both input and output. However, the information conveyed to an LLM often has a richer structure and semantics, which is not conveyed in a string. For example, most prompts contain both instructions ("Summarize this paper into a paragraph") and data (the paper to summarize), but these are usually not distinguished when passed to the model. This can lead to model confusion and security risks, such as prompt injection attacks. This work addresses this shortcoming by introducing an LLM-native mark-up language, LLMON (LLM Object Notation, pronounced "Lemon"), that enables the structure and semantic metadata of the text to be communicated in a natural way to an LLM. This information can then be used during model training, model prompting, and inference implementation, leading to improvements in model accuracy, safety, and security. This is analogous to how programming language types can be used for many purposes, such as static checking, code generation, dynamic checking, and IDE highlighting. We discuss the general design requirements of an LLM-native markup language, introduce the LLMON markup language and show how it meets these design requirements, describe how the information contained in a LLMON artifact can benefit model training and inference implementation, and provide some preliminary empirical evidence of its value for both of these use cases. We also discuss broader issues and research opportunities that are enabled with an LLM-native approach.

🔍 Key Points

SLMs are found to be significantly more vulnerable to jailbreak attacks than LLMs, with the proposed empirical evaluation demonstrating this vulnerability using nine jailbreak attack strategies across various models.
The introduction of GUARD-SLM represents a novel approach to defense that relies on lightweight token activation analysis in the representation space of SLMs, detecting malicious prompts with high accuracy and low computational overhead.
The study showcases that hidden representations at multiple layers contain discernible patterns for different input types, providing a foundation for effective activation-space adversarial prompt filtering.
Extensive experimental validation on multiple SLMs and LLMs highlights the robustness of GUARD-SLM, achieving near-zero jailbreak success rates for various attack categories while ensuring real-time performance.
The findings reveal that layered analysis of internal representations can lead to better understanding and defenses against jailbreaking, emphasizing the need for layered-based intrusion detection in language models.

💡 Why This Paper Matters

This paper is crucial as it addresses the growing concern of jailbreak attacks on small language models, a domain often overshadowed by larger models. By presenting a novel defense mechanism in GUARD-SLM that leverages token activation analysis, the research provides a practical and scalable solution to enhance the security of language models, particularly in resource-constrained environments. The insights gained through layer-wise sensitivity analysis also contribute significantly to the overall understanding of model vulnerabilities, making this work relevant not only for immediate defensive strategies but also for future research in model safety.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper particularly interesting as it sheds light on vulnerabilities specific to small language models, which are increasingly deployed in various applications. The innovative method of using internal layer activations for prompt filtering represents a significant shift towards proactive security measures in AI systems. Furthermore, as adversarial techniques like jailbreak attacks evolve, understanding and improving defenses becomes imperative, positioning this research as a valuable contribution to the field of AI safety and security.

LLMON: An LLM-native Markup Language to Leverage Structure and Semantics at the LLM Interface

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper