AIPsychoBench: Understanding the Psychometric Differences between LLMs and Humans

Authors: Wei Xie, Shuoyoucheng Ma, Zhenhua Wang, Enze Wang, Kai Chen, Xiaobing Sun, Baosheng Wang

Published: 2025-09-20

arXiv ID: 2509.16530v1

Added to Library: 2025-09-23 04:02 UTC

📄 Abstract

Large Language Models (LLMs) with hundreds of billions of parameters have exhibited human-like intelligence by learning from vast amounts of internet-scale data. However, the uninterpretability of large-scale neural networks raises concerns about the reliability of LLMs. Studies have attempted to assess the psychometric properties of LLMs by borrowing concepts from human psychology to enhance their interpretability, but they fail to account for the fundamental differences between LLMs and humans. This results in high rejection rates when human scales are reused directly. Furthermore, these scales do not support measuring variations in the psychological properties of LLMs across languages. This paper introduces AIPsychoBench, a specialized benchmark tailored to assess the psychological properties of LLMs. It uses a lightweight role-playing prompt to bypass LLM alignment, improving the average effective response rate from 70.12% to 90.40%. Meanwhile, the average biases are only 3.3% (positive) and 2.1% (negative), significantly lower than the biases of 9.8% and 6.9%, respectively, caused by traditional jailbreak prompts. Furthermore, among the 112 psychometric subcategories, score deviations for seven languages relative to English ranged from 5% to 20.2% in 43 subcategories, providing the first comprehensive evidence of the linguistic impact on LLM psychometrics.
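The abstract does not reproduce the role-playing prompt or the scoring pipeline, but the evaluation it describes, wrapping each scale item in a persona prompt and counting how often the model returns a usable on-scale rating, can be sketched roughly as below. Everything here (the prompt wording, `ask_model`, the regex-based parser) is an illustrative assumption, not the paper's actual implementation.

```python
import re
from typing import Callable, Optional

# Hypothetical role-playing wrapper: the paper only calls its prompt
# "lightweight"; this wording is an illustrative guess, not the original.
ROLE_PLAY_TEMPLATE = (
    "You are playing the role of a survey participant. "
    "Answer honestly with a single number from 1 (strongly disagree) "
    "to 5 (strongly agree).\n\nStatement: {item}"
)

def parse_likert(reply: str) -> Optional[int]:
    """Extract a 1-5 rating; return None for refusals or off-scale replies."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None

def effective_response_rate(items: list[str],
                            ask_model: Callable[[str], str]) -> float:
    """Fraction of scale items that yield a parseable on-scale rating.

    `ask_model` is a stand-in for whatever LLM API the benchmark calls.
    """
    usable = sum(
        parse_likert(ask_model(ROLE_PLAY_TEMPLATE.format(item=item))) is not None
        for item in items
    )
    return usable / len(items)

if __name__ == "__main__":
    # Toy stand-in model: refuses one item, answers the other two.
    replies = iter(["4", "I cannot answer that.", "My answer is 2."])
    rate = effective_response_rate(
        ["I enjoy meeting new people.",
         "I often feel anxious.",
         "I plan ahead carefully."],
        ask_model=lambda prompt: next(replies),
    )
    print(f"effective response rate: {rate:.2%}")  # 66.67% in this toy run
```

Under this reading, the paper's headline numbers (70.12% without the wrapper, 90.40% with it) are simply this rate averaged over models and scales.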

🔍 Key Points

  • Introduces AIPsychoBench, a benchmark for measuring the psychological properties of LLMs that accounts for the fundamental differences between LLMs and humans instead of reusing human psychometric scales directly.
  • Employs a lightweight role-playing prompt to bypass LLM alignment, raising the average effective response rate from 70.12% to 90.40%.
  • Keeps measurement distortion low: average biases of 3.3% (positive) and 2.1% (negative), versus 9.8% and 6.9% for traditional jailbreak prompts (a rough sketch of these metrics follows this list).
  • Covers 112 psychometric subcategories across eight languages; in 43 subcategories, scores for the seven non-English languages deviate from English by 5% to 20.2%, the first comprehensive evidence of linguistic impact on LLM psychometrics.
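The abstract reports bias and cross-language deviation as percentages without defining the formulas. One plausible reading, the per-item relative score shift against a reference administration averaged separately by sign, plus a percent deviation per subcategory against the English score, is sketched below; both definitions are assumptions for illustration, and the paper's exact formulas may differ.

```python
def signed_biases(scores: list[float], reference: list[float]) -> tuple[float, float]:
    """Average positive and negative relative deviation from a reference run.

    Assumed definition: per-item shift (score - ref) / ref, averaged
    separately over upward and downward shifts (both returned as
    positive magnitudes). Not necessarily the paper's formula.
    """
    shifts = [(s - r) / r for s, r in zip(scores, reference)]
    pos = [d for d in shifts if d > 0]
    neg = [-d for d in shifts if d < 0]
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return avg(pos), avg(neg)

def language_deviation(lang_score: float, english_score: float) -> float:
    """Percent deviation of a subcategory score from its English score."""
    return abs(lang_score - english_score) / english_score * 100

pos, neg = signed_biases([3.1, 2.8, 4.2], [3.0, 3.0, 4.0])
print(f"positive bias {pos:.1%}, negative bias {neg:.1%}")
print(f"deviation vs English: {language_deviation(3.6, 3.0):.1f}%")
```

Read this way, the 3.3%/2.1% figures say the role-playing wrapper barely perturbs the underlying scores, whereas jailbreak prompts (9.8%/6.9%) shift them substantially.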

💡 Why This Paper Matters

This paper addresses a core obstacle in LLM interpretability research: human psychometric scales cannot be reused on LLMs directly without high rejection rates, and aggressive workarounds such as jailbreak prompts distort the measurements they are meant to enable. By building AIPsychoBench around a lightweight role-playing prompt, the authors obtain far more usable responses with markedly lower bias, and by measuring 112 subcategories across eight languages they show that an LLM's measured psychological properties shift with the language of administration. Together these make psychometric evaluation of LLMs both more reliable and more comparable across languages.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find two results particularly relevant. First, a lightweight role-playing prompt is enough to bypass LLM alignment for survey-style questions, and it does so with far less measurement distortion than traditional jailbreak prompts, which is informative both for evaluation methodology and for understanding the limits of current alignment techniques. Second, the documented cross-language score deviations of 5% to 20.2% in 43 of 112 psychometric subcategories suggest that behavioral and safety assessments performed in English may not transfer to other languages, an important consideration when evaluating multilingual deployments.

📚 Read the Full Paper