DataGovBench: Benchmarking LLM Agents for Real-World Data Governance Workflows

Authors: Zhou Liu, Zhaoyang Han, Guochen Yan, Hao Liang, Bohan Zeng, Xing Chen, Yuanfeng Song, Wentao Zhang

Published: 2025-12-04

arXiv ID: 2512.04416v2

Added to Library: 2025-12-09 03:03 UTC

Risk & Governance

📄 Abstract

Data governance ensures data quality, security, and compliance through policies and standards, a critical foundation for scaling modern AI development. Recently, large language models (LLMs) have emerged as a promising solution for automating data governance by translating user intent into executable transformation code. However, existing benchmarks for automated data science often emphasize snippet-level coding or high-level analytics, failing to capture the unique challenge of data governance: ensuring the correctness and quality of the data itself. To bridge this gap, we introduce DataGovBench, a benchmark featuring 150 diverse tasks grounded in real-world scenarios and built on data from actual cases. DataGovBench employs a novel "reversed-objective" methodology to synthesize realistic noise and utilizes rigorous metrics to assess end-to-end pipeline reliability. Our analysis of DataGovBench reveals that current models struggle with complex, multi-step workflows and lack robust error-correction mechanisms. Consequently, we propose DataGovAgent, a framework utilizing a Planner-Executor-Evaluator architecture that integrates constraint-based planning, retrieval-augmented generation, and sandboxed feedback-driven debugging. Experimental results show that DataGovAgent significantly boosts the Average Task Score (ATS) on complex tasks from 39.7 to 54.9 and reduces debugging iterations by over 77.9 percent compared to general-purpose baselines.
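The abstract does not spell out the "reversed-objective" synthesis procedure, but a minimal reading is: start from a clean, already-governed table, programmatically inject the kinds of noise a governance pipeline is supposed to remove, and treat the clean table as the ground-truth target for scoring. The pandas sketch below only illustrates that idea under those assumptions; the corruption types, column names, and the `corrupt` function are hypothetical stand-ins, not the benchmark's actual generators.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)


def corrupt(clean: pd.DataFrame, frac: float = 0.1) -> pd.DataFrame:
    """Hypothetical 'reversed-objective' corruption: degrade a clean, governed
    table so that the clean version becomes the ground-truth target."""
    noisy = clean.copy()

    # 1) Break formatting on string columns (inconsistent case, stray whitespace).
    for col in noisy.select_dtypes(include="object").columns:
        mask = rng.random(len(noisy)) < frac
        noisy.loc[mask, col] = noisy.loc[mask, col].str.upper() + "  "

    # 2) Blank out a fraction of values to simulate missing data.
    for col in noisy.columns:
        mask = rng.random(len(noisy)) < frac
        noisy.loc[mask, col] = np.nan

    # 3) Duplicate a few rows to simulate ingestion glitches.
    dupes = noisy.sample(frac=frac, random_state=0)
    return pd.concat([noisy, dupes], ignore_index=True)


# Toy "clean" table; a governance task would ask an agent to recover it (or a
# stated property of it) from the corrupted copy, scoring against the clean target.
clean = pd.DataFrame({
    "customer_id": [f"C{i:04d}" for i in range(1, 101)],
    "email": [f"user{i}@example.com" for i in range(1, 101)],
    "spend": np.round(rng.normal(100.0, 25.0, size=100), 2),
})
noisy = corrupt(clean)
```

Seeding the generator keeps each synthesized task reproducible, which is what makes exact end-to-end scoring against the clean target feasible.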

🔍 Key Points

  • Introduction of DataGovBench, a hierarchical benchmark for automated data governance with 150 diverse tasks based on real-world scenarios.
  • Proposal of the DataGovAgent framework, which uses a Planner-Executor-Evaluator architecture to improve the efficiency and accuracy of data governance workflows (see the control-flow sketch after this list).
  • Experimental results showing that DataGovAgent significantly outperforms general-purpose baselines, raising the Average Task Score (ATS) on complex tasks from 39.7 to 54.9 and cutting debugging iterations by over 77.9%.
  • Use of a novel "reversed-objective" methodology for noise synthesis, addressing the challenge of ensuring data correctness and quality within governance tasks.
  • Identification of a gap in existing benchmarks, which do not adequately assess the unique challenges of data governance, such as ensuring data quality and compliance.
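A rough reading of the Planner-Executor-Evaluator loop referenced above: a planner decomposes the request into constraint-annotated steps, an executor drafts transformation code (optionally conditioning on retrieved examples), and an evaluator runs the draft in a sandbox and feeds concrete failures back into the next attempt. In the sketch below every component callable is a stub standing in for LLM calls, retrieval, and the sandbox; the names, the retry budget, and the wiring are assumptions for illustration, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    ok: bool
    error: str = ""


@dataclass
class Agent:
    """Hypothetical Planner-Executor-Evaluator loop for one governance request."""
    plan: Callable[[str], List[str]]         # request -> ordered step descriptions
    generate: Callable[[str, str], str]      # (step, feedback) -> candidate code
    run_sandboxed: Callable[[str], str]      # code -> captured output / traceback
    evaluate: Callable[[str, str], Verdict]  # (output, step) -> pass/fail + reason
    max_debug_iters: int = 5                 # retry budget (assumed value)

    def solve(self, request: str) -> bool:
        for step in self.plan(request):                  # Planner
            feedback = ""
            for _ in range(self.max_debug_iters):
                code = self.generate(step, feedback)      # Executor (often RAG-assisted)
                output = self.run_sandboxed(code)         # isolate side effects
                verdict = self.evaluate(output, step)     # Evaluator checks constraints
                if verdict.ok:
                    break
                feedback = verdict.error                  # feedback-driven debugging
            else:
                return False                              # retry budget exhausted
        return True


# Toy wiring: every component is a stub, just to show the control flow.
agent = Agent(
    plan=lambda req: ["drop duplicate rows", "standardize email casing"],
    generate=lambda step, fb: f"# code for: {step} (feedback: {fb!r})",
    run_sandboxed=lambda code: "ok",
    evaluate=lambda out, step: Verdict(ok=(out == "ok")),
)
print(agent.solve("clean the customer table"))  # True with these stubs
```

The for/else retry loop is the feedback-driven debugging piece: the evaluator's error message becomes part of the next generation attempt, and a step that exhausts its budget fails the task rather than silently producing low-quality data.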

💡 Why This Paper Matters

This paper pairs a benchmark with an agent built for the same problem: DataGovBench makes the quality and reliability of governed data directly measurable, and DataGovAgent shows how constraint-based planning, retrieval-augmented generation, and sandboxed debugging close part of the measured gap. Together they give research and practice in AI-driven data management a concrete evaluation target and a stronger baseline for the reliable, high-quality data governance that AI-reliant organizations depend on.

🎯 Why It's Interesting for AI Security Researchers

Effective data governance underpins data integrity and regulatory compliance, so this work bears directly on AI security. The proposed methodologies can strengthen data-security controls, help prevent breaches and compliance violations, and increase trust in AI systems that depend on high-quality datasets. The benchmark's metrics and the agent framework also give researchers a way to evaluate and improve the resilience of AI models against adversarial data manipulation.

📚 Read the Full Paper