EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model

Authors: Yiqing Yang, Man-Wai Mak

Published: 2025-09-19

arXiv ID: 2509.15775v1

Added to Library: 2025-12-08 18:04 UTC

📄 Abstract

The performance of speech emotion recognition (SER) is limited by the insufficient emotion information in unimodal systems and the feature alignment difficulties in multimodal systems. Recently, multimodal large language models (MLLMs) have made progress in SER. However, MLLMs still suffer from hallucination and misclassification problems in complex emotion reasoning. To address these problems, we propose an MLLM-based framework called EmoQ, which generates query embeddings that fuse multimodal information through an EmoQ-Former and uses multi-objective affective learning (MAL) to achieve co-optimization. The framework also provides a soft-prompt injection strategy to inject multimodal representations into the LLM. This end-to-end architecture achieves state-of-the-art performance on the IEMOCAP and MELD datasets, providing a new multimodal fusion paradigm for SER.
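The core fusion idea described in the abstract — learnable query embeddings that attend to multimodal features before being handed to the LLM — can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: the feature dimensions, query count, and single-head attention are illustrative assumptions, and real Q-Former-style modules use multi-head attention with learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # queries: (n_q, d); keys_values: (T, d)
    # scaled dot-product attention: each query embedding aggregates
    # information from the concatenated multimodal feature sequence
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ keys_values  # (n_q, d) fused query embeddings

rng = np.random.default_rng(0)
d, n_q = 64, 8
learnable_queries = rng.normal(size=(n_q, d)) * 0.02  # trained in practice
speech_feats = rng.normal(size=(50, d))  # e.g. 50 acoustic frames (hypothetical)
text_feats = rng.normal(size=(12, d))    # e.g. 12 transcript token embeddings

# Fuse both modalities into a fixed number of query embeddings; in the
# paper's framework, such embeddings would be projected and injected into
# the LLM as soft prompts.
fused = cross_attend(learnable_queries,
                     np.concatenate([speech_feats, text_feats], axis=0))
print(fused.shape)  # (8, 64)
```

The fixed query count is the key design property: regardless of utterance length, the LLM receives a constant-size soft-prompt prefix.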

🔍 Key Points

  • Introduction of EmoQ, an MLLM-based framework for speech emotion recognition (SER) that addresses the limited emotion information of unimodal systems and the feature-alignment difficulties of multimodal systems.
  • An EmoQ-Former that generates query embeddings fusing multimodal information, mitigating the hallucination and misclassification problems MLLMs exhibit in complex emotion reasoning.
  • Multi-objective affective learning (MAL) for co-optimization, together with a soft-prompt injection strategy that injects the fused multimodal representations into the LLM.
  • An end-to-end architecture that achieves state-of-the-art performance on the IEMOCAP and MELD benchmark datasets.
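The multi-objective affective learning (MAL) mentioned in the abstract combines several training objectives. The paper does not spell out its loss terms here, so the following is a hypothetical two-term sketch — an emotion-classification cross-entropy plus an auxiliary cross-modal alignment term — with the weighting `alpha` and the alignment form chosen purely for illustration.

```python
import numpy as np

def cross_entropy(logits, label):
    # standard softmax cross-entropy for the emotion-classification objective
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def mal_loss(logits, label, speech_emb, text_emb, alpha=0.5):
    # Hypothetical multi-objective affective loss: classification term plus
    # an auxiliary term pulling speech and text embeddings together.
    cls_term = cross_entropy(logits, label)
    align_term = np.mean((speech_emb - text_emb) ** 2)
    return cls_term + alpha * align_term

rng = np.random.default_rng(1)
loss = mal_loss(rng.normal(size=4), 2,           # 4 emotion classes, gold label 2
                rng.normal(size=16),             # speech query embedding
                rng.normal(size=16))             # text query embedding
print(loss > 0)  # True
```

Co-optimizing such terms lets the shared representation serve both recognition accuracy and cross-modal consistency, rather than tuning each in isolation.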

💡 Why This Paper Matters

The paper tackles two persistent obstacles in speech emotion recognition: unimodal systems lack sufficient emotion information, while multimodal systems struggle to align features across modalities. By fusing modalities through query embeddings and co-optimizing multiple affective objectives, EmoQ reduces the hallucination and misclassification problems that MLLMs exhibit in complex emotion reasoning, and its state-of-the-art results on IEMOCAP and MELD suggest a practical new fusion paradigm for SER.

🎯 Why It's Interesting for Speech and Affective Computing Researchers

This paper is relevant to researchers working on multimodal learning and affective computing because it shows how a Q-Former-style bridge and soft-prompt injection can connect speech representations to an LLM end to end. The multi-objective affective learning scheme offers a concrete recipe for co-optimizing fusion and recognition, and the approach invites follow-up work on applying the same query-based fusion paradigm to other paralinguistic tasks.
