
ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge

Authors: Chaoyue He, Xin Zhou, Yi Wu, Xinjia Yu, Yan Zhang, Lei Zhang, Di Wang, Shengfei Lyu, Hong Xu, Xiaoqiao Wang, Wei Liu, Chunyan Miao

Published: 2025-06-02

arXiv ID: 2506.01646v1

Added to Library: 2025-06-04 04:04 UTC

Risk & Governance

📄 Abstract

We introduce ESGenius, a comprehensive benchmark for evaluating and enhancing the proficiency of Large Language Models (LLMs) in Environmental, Social and Governance (ESG) and sustainability-focused question answering. ESGenius comprises two key components: (i) ESGenius-QA, a collection of 1,136 multiple-choice questions generated by LLMs and rigorously validated by domain experts, covering a broad range of ESG pillars and sustainability topics. Each question is systematically linked to its corresponding source text, enabling transparent evaluation and supporting retrieval-augmented generation (RAG) methods; and (ii) ESGenius-Corpus, a meticulously curated repository of 231 foundational frameworks, standards, reports and recommendation documents from seven authoritative sources. Moreover, to fully assess the capabilities and adaptation potential of the model, we implement a rigorous two-stage evaluation protocol: Zero-Shot and RAG. Extensive experiments across 50 LLMs (ranging from 0.5B to 671B parameters) demonstrate that state-of-the-art models achieve only moderate performance in zero-shot settings, with accuracies typically around 55-70%, highlighting ESGenius's challenging nature for LLMs in interdisciplinary contexts. However, models employing RAG show significant performance improvements, particularly for smaller models. For example, "DeepSeek-R1-Distill-Qwen-14B" improves from 63.82% (zero-shot) to 80.46% with RAG. These results underscore the necessity of grounding responses in authoritative sources for enhanced ESG understanding. To the best of our knowledge, ESGenius is the first benchmark curated for LLMs and the relevant enhancement technologies that focuses on ESG and sustainability topics.

🔍 Key Points

  • Introduction of ESGenius, the first comprehensive benchmark dedicated to evaluating LLM proficiency in Environmental, Social, and Governance (ESG) knowledge.
  • Creation of ESGenius-QA with 1,136 multiple-choice questions, rigorously validated by domain experts, linked to authoritative sources for enhanced transparency and responsible AI use.
  • Development of ESGenius-Corpus, a curated collection of 231 frameworks, standards, reports, and recommendation documents from seven authoritative sources, which supports retrieval during evaluation.
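  • Implementation of a two-stage evaluation protocol (Zero-Shot and RAG) applied to 50 LLMs ranging from 0.5B to 671B parameters: zero-shot accuracy typically falls around 55-70%, while RAG yields significant gains, particularly for smaller models (for example, DeepSeek-R1-Distill-Qwen-14B improves from 63.82% to 80.46%).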
  • Open source initiative promoting community collaboration by providing full access to the ESGenius dataset, experimental code, and evaluation tools.
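
To make the two-stage protocol concrete, here is a minimal Python sketch of how zero-shot and RAG accuracy might be compared on multiple-choice questions. It is not the authors' evaluation code. The `MCQuestion` record layout, the prompt wording, and the `toy_model` stub are all hypothetical, and the RAG stage is simplified by passing each question's linked source text directly as context instead of retrieving it from a corpus.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class MCQuestion:
    """Hypothetical record layout (assumption, not the dataset's real schema)."""
    question: str
    options: dict[str, str]  # option letter -> option text
    answer: str              # gold option letter
    source_text: str         # passage the question is linked to


def build_prompt(q: MCQuestion, use_rag: bool) -> str:
    """Build a zero-shot prompt, or prepend the linked passage for the RAG stage."""
    lines = []
    if use_rag:
        lines.append("Context:\n" + q.source_text + "\n")
    lines.append("Question: " + q.question)
    for letter in sorted(q.options):
        lines.append(f"{letter}. {q.options[letter]}")
    lines.append("Answer with a single option letter.")
    return "\n".join(lines)


def accuracy(
    questions: Sequence[MCQuestion],
    model: Callable[[str], str],
    use_rag: bool,
) -> float:
    """Return the percentage of questions whose first reply character matches the gold letter."""
    if not questions:
        raise ValueError("questions must not be empty")
    correct = 0
    for q in questions:
        reply = model(build_prompt(q, use_rag)).strip().upper()
        if reply[:1] == q.answer.upper():
            correct += 1
    return 100.0 * correct / len(questions)


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call: answers "B" only when context is supplied."""
    return "B" if prompt.startswith("Context:") else "A"


if __name__ == "__main__":
    # Placeholder questions for illustration only; not taken from ESGenius-QA.
    demo = [
        MCQuestion("Placeholder question 1?", {"A": "option", "B": "option"}, "B", "placeholder passage"),
        MCQuestion("Placeholder question 2?", {"A": "option", "B": "option"}, "B", "placeholder passage"),
    ]
    zero_shot = accuracy(demo, toy_model, use_rag=False)
    rag = accuracy(demo, toy_model, use_rag=True)
    print(f"zero-shot: {zero_shot:.2f}%  RAG: {rag:.2f}%  gain: {rag - zero_shot:.2f} points")
```

Swapping `toy_model` for a real model call keeps the comparison unchanged: the same questions are scored twice, and the difference between the two accuracies is the gain in percentage points attributable to the added context.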

💡 Why This Paper Matters

The introduction of ESGenius fills a crucial gap in the evaluation of LLMs in the interdisciplinary fields of ESG and sustainability, providing a standardized method to assess and enhance AI models' understanding of complex environmental, social, and governance issues. By linking questions to authoritative sources and employing advanced evaluation techniques, ESGenius promotes the development of responsible AI systems that can contribute meaningfully to sustainability-focused initiatives.

🎯 Why It's Interesting for AI Security Researchers

The paper is relevant to AI security researchers because it shows the value of grounding AI responses in authoritative sources, which mitigates the risks of misinformation and misinterpretation in high-stakes domains such as ESG. The moderate zero-shot accuracy it reports also shows where ungrounded models can be unreliable, encouraging the development of more robust systems that meet the ethical and accuracy requirements of areas affecting public interest and corporate responsibility.
