EU AI Act Article 10: Data Governance Checklist

Key Takeaways

  • Article 10 requires documented data governance for all high-risk AI systems
  • Only 28% of organizations feel prepared for EU AI Act compliance (Forrester 2024)
  • Non-compliance can result in fines up to €15 million or 3% of global annual turnover, whichever is higher
  • High-risk systems must comply by August 2026 (24 months after entry into force)
  • Implementation typically adds 10-15% to AI project budgets

Only 28% of organizations feel “highly” or “very highly” prepared for the EU AI Act. If you’re an ML engineer at a Fortune 500 company, that statistic should concern you. The EU AI Act is not a legal abstraction—it’s a set of engineering requirements that will reshape how you build, document, and monitor AI systems.

Article 10 sits at the heart of this regulation. It mandates specific data governance practices for any “high-risk” AI system. This isn’t about legal compliance for its own sake. It’s about building AI systems that are auditable, reproducible, and trustworthy.

This guide translates Article 10 into an engineering checklist. No legal jargon—just the technical controls, documentation, and monitoring you need to implement.

What Article 10 Actually Requires

Article 10 of the EU AI Act addresses “Data and data governance” for high-risk AI systems. The full regulatory text spans six paragraphs, but here’s what it means for your engineering team:

| Requirement | Article Reference | Engineering Implication |
| --- | --- | --- |
| Appropriate data governance and management practices | Art. 10(2) | Implement a version-controlled, auditable data pipeline. Document every step from data sourcing to pre-processing using tools like DVC and MLflow. Create a “Data Governance Charter” for each project. |
| Relevant design choices and data collection processes | Art. 10(2)(a)-(b) | Create and maintain a “Datasheet for Datasets.” This document must detail the provenance of the data, the collection methodology, and the rationale for its use. |
| Data-preparation processing operations | Art. 10(2)(c) | Automate and log all data transformation steps. Implement quality assurance workflows for annotation and labelling, including inter-annotator agreement (IAA) metrics. Use data quality tools like Great Expectations to validate data at each stage. |
| Examination in view of possible biases | Art. 10(2)(f) | Integrate bias detection libraries (e.g., Fairlearn, AIF360) as a mandatory CI/CD step in the training pipeline. Analyze data for demographic parity, equal opportunity, and other fairness metrics. |
| Data sets must be relevant, sufficiently representative and, to the best extent possible, free of errors and complete | Art. 10(3) | Implement automated data validation and profiling to check for errors, missing values, and distributional drift. The data must statistically reflect the production environment and target user population. |
| Account for the specific geographical, contextual, behavioural or functional setting | Art. 10(4) | Your training data must include representative samples for all target markets/contexts. Use stratified sampling and data augmentation techniques. This must be explicitly tested and documented. |
| Handling of third-party data | Art. 10(2)-(4) (the quality criteria apply regardless of data origin) | For any external datasets, you are responsible for their compliance. This requires technical due diligence: demand datasheets, audit collection practices, and run your own quality and bias checks before integration. |
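
To make the Art. 10(3) row concrete, here is a minimal validation-gate sketch in plain pandas, standing in for a fuller framework like Great Expectations. The file path, column names, thresholds, and the 40-60% target band are illustrative assumptions, not values from the regulation.

```python
# Minimal data validation gate: a plain-pandas stand-in for a
# framework like Great Expectations. Column names and thresholds
# below are illustrative assumptions.
import pandas as pd

def validate_training_data(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []

    # Completeness: no column may exceed 1% missing values.
    missing = df.isna().mean()
    for col, rate in missing[missing > 0.01].items():
        failures.append(f"{col}: {rate:.1%} missing values (limit 1%)")

    # Error checks: domain constraints on known columns (assumed schema).
    if "age" in df.columns and not df["age"].between(0, 120).all():
        failures.append("age: values outside plausible range 0-120")

    # Representativeness: compare a key demographic split against the
    # documented target population share (assumed figure).
    if "gender" in df.columns:
        share = (df["gender"] == "female").mean()
        if not 0.40 <= share <= 0.60:
            failures.append(f"gender: female share {share:.1%} outside 40-60% band")

    return failures

if __name__ == "__main__":
    df = pd.read_csv("data/train.csv")  # hypothetical path
    problems = validate_training_data(df)
    if problems:
        raise SystemExit("Data gate failed:\n" + "\n".join(problems))
```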

Enforcement Timeline

The EU AI Act follows a phased rollout. Here’s what matters for Article 10 compliance:

  • August 2024: The AI Act entered into force (20 days after publication in the Official Journal)
  • February 2025: Prohibited AI practices banned (6 months)
  • August 2025: General-purpose AI model rules apply (12 months)
  • August 2026: High-risk AI system rules apply, including Article 10 (24 months)

If you’re building a high-risk AI system today, you have until August 2026 to implement compliant data governance. That sounds like plenty of time until you factor in the scope of changes required.

The Engineering Checklist

Here’s what a compliant data pipeline looks like, mapped to specific Article 10 requirements:

Pipeline Architecture Components

Your data pipeline needs these components to satisfy Article 10:

  1. Data Lake/Warehouse with strict Access Controls (e.g., AWS S3 with IAM, Snowflake)
  2. Data Ingestion Layer with source and version logging
  3. Automated Data Quality & Validation Gate (e.g., Great Expectations)
  4. Data Transformation & Pre-processing Layer (logged and versioned)
  5. Bias & Fairness Analysis Step (e.g., Fairlearn on pre-processed data)
  6. Data Versioning System (e.g., DVC) to snapshot training, validation, and test sets
  7. Feature Store for governed, reusable features
  8. Experiment Tracking System (e.g., MLflow) to link data versions to model artifacts (see the sketch after this list)
  9. Documentation Generator that produces Datasheets/Model Cards from pipeline metadata
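
As an example of how components 6 and 8 connect, here is a sketch that logs the content hash DVC recorded for a dataset into an MLflow run, so every model artifact can be traced back to the exact training bytes. It assumes a dataset already tracked with `dvc add data/train.csv` (hypothetical path); the tag names are our own convention, not an MLflow standard.

```python
# Sketch: link the DVC-tracked dataset version (component 6) to an
# MLflow run (component 8). Assumes a DVC metafile at the hypothetical
# path data/train.csv.dvc.
import yaml
import mlflow

def dataset_md5(dvc_metafile: str) -> str:
    """Read the content hash DVC recorded for the tracked dataset."""
    with open(dvc_metafile) as f:
        meta = yaml.safe_load(f)
    return meta["outs"][0]["md5"]

with mlflow.start_run(run_name="training"):
    # Auditors can now map this run's model artifact back to the exact
    # bytes of the training set via the recorded hash.
    mlflow.set_tag("dataset.path", "data/train.csv")
    mlflow.set_tag("dataset.md5", dataset_md5("data/train.csv.dvc"))
    # ... training code, mlflow.log_metric(...), model logging ...
```

Recording these two tags at training time is what makes the auditor lookup later in this guide a one-minute task instead of a forensic exercise.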

Required Documentation

Article 10 compliance requires these artifacts:

  • Datasheets for Datasets: A living document for each dataset covering its motivation, composition, collection process, preprocessing, and distribution (a minimal generator sketch follows this list)
  • Data Governance Plan: Outlines the policies, roles (e.g., Data Stewards), and procedures for managing the data lifecycle
  • Bias Assessment Report: Documents the fairness metrics used, the groups analyzed, the results of the bias scan, and the mitigation steps taken
  • Data Lineage Graph: A visual and machine-readable representation of data flow from source to model
  • Technical Documentation (per Art. 11): An umbrella document containing all of the above, required for the conformity assessment
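
The Datasheets artifact and the “Documentation Generator” pipeline component can start very small: a script that renders pipeline metadata into a markdown datasheet. The sketch below assumes that approach; the dataclass fields mirror the Datasheets for Datasets headings, while the example values and naming convention are hypothetical.

```python
# Sketch of a documentation generator: render pipeline metadata into a
# "Datasheet for Datasets"-style markdown file. The fields and output
# format are an assumed convention, not a mandated schema.
from dataclasses import dataclass, fields

@dataclass
class Datasheet:
    name: str
    motivation: str
    composition: str
    collection_process: str
    preprocessing: str
    distribution: str

def render_markdown(ds: Datasheet) -> str:
    lines = [f"# Datasheet: {ds.name}", ""]
    for f in fields(ds):
        if f.name == "name":
            continue
        lines.append(f"## {f.name.replace('_', ' ').title()}")
        lines.append(getattr(ds, f.name))
        lines.append("")
    return "\n".join(lines)

# Hypothetical dataset metadata, normally pulled from pipeline logs.
sheet = Datasheet(
    name="loan-applications-v3",
    motivation="Credit-risk scoring model for the EU market.",
    composition="120k rows, 34 features, 2019-2024, anonymized.",
    collection_process="Exported from core banking system; opt-in records only.",
    preprocessing="Deduplicated; PII masked; categorical encoding v2.",
    distribution="Internal only; access via governed feature store.",
)
print(render_markdown(sheet))
```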

Technical Controls

Implement these controls across your pipeline:

  • Role-Based Access Control (RBAC) for all data assets
  • Immutable Data Versioning: Use content-addressable storage or tools like DVC to ensure training data cannot be altered
  • Automated PII/Sensitive Data Scanning and masking in pre-processing pipelines
  • CI/CD Quality Gates: Pipeline fails if data quality checks or bias thresholds are not met (see the gate sketch after this list)
  • Audit Logging: Every access, transformation, and use of data must be logged in an immutable ledger
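
Here is a sketch of the CI/CD bias gate described above, using Fairlearn’s `demographic_parity_difference`: the step exits nonzero when the selection-rate gap between groups exceeds a threshold, which fails the build. The 0.10 threshold, file path, and column names are illustrative assumptions.

```python
# CI/CD bias gate sketch: exit nonzero when demographic parity drifts
# past a threshold, failing the pipeline. Threshold, path, and column
# names are illustrative assumptions.
import sys
import pandas as pd
from fairlearn.metrics import demographic_parity_difference

THRESHOLD = 0.10  # maximum allowed selection-rate gap between groups

def main() -> int:
    scored = pd.read_csv("artifacts/validation_predictions.csv")  # hypothetical
    dpd = demographic_parity_difference(
        y_true=scored["label"],
        y_pred=scored["prediction"],
        sensitive_features=scored["gender"],
    )
    print(f"demographic parity difference: {dpd:.3f} (limit {THRESHOLD})")
    return 0 if dpd <= THRESHOLD else 1

if __name__ == "__main__":
    sys.exit(main())
```

Wired into CI, the nonzero exit blocks the merge or deployment and leaves a logged, timestamped record of every bias check, which is the kind of evidence the audit section below calls for.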

Monitoring Requirements

Article 10 compliance is not a one-time certification. You need ongoing monitoring:

  • Data Quality Monitoring: Continuously monitor production data streams for schema changes, errors, and completeness
  • Data Drift Detection: Track statistical drift between training data and live inference data to know when retraining is necessary (see the sketch after this list)
  • Bias Monitoring: Periodically re-run fairness assessments on production data to ensure the model’s behavior hasn’t become biased over time
  • Incident Response Plan: A documented procedure for what to do when a data quality or bias issue is detected in production
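
One way to implement the drift-detection item above is a per-feature two-sample Kolmogorov-Smirnov test comparing the training snapshot against a recent window of production data. The sketch below uses SciPy; the p-value cutoff and file paths are illustrative assumptions, and the alerting is reduced to a print.

```python
# Drift-detection sketch: two-sample Kolmogorov-Smirnov test per numeric
# feature, training snapshot vs. a window of live inference data.
# Cutoff and paths are illustrative assumptions.
import pandas as pd
from scipy.stats import ks_2samp

P_CUTOFF = 0.01  # flag drift when distributions differ at this significance

train = pd.read_csv("data/train.csv")       # hypothetical training snapshot
live = pd.read_csv("data/live_window.csv")  # hypothetical last-7-days window

drifted = []
for col in train.select_dtypes("number").columns:
    if col in live.columns:
        stat, p = ks_2samp(train[col].dropna(), live[col].dropna())
        if p < P_CUTOFF:
            drifted.append((col, stat))

if drifted:
    # In production this would open an incident per the response plan,
    # not just print.
    for col, stat in drifted:
        print(f"drift detected in {col}: KS statistic {stat:.3f}")
```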

What Auditors Look For

When regulators or auditors assess your Article 10 compliance, they’re looking for evidence of systematic data governance. Here’s what raises red flags:

Auditor Red Flags

  • Inability to produce the exact version of the dataset used to train a specific model version (see the lookup sketch after this list)
  • Missing documentation on why certain data sources were chosen and others were rejected
  • Vague or non-existent records of data cleaning and transformation steps
  • No quantitative evidence of bias examination across relevant demographic groups
  • Lack of a clear data retention policy and procedure for handling data subject requests (e.g., deletion)
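
The first red flag is straightforward to avoid if training runs are tagged as in the DVC/MLflow sketch earlier. Given a registered model version, a few lines of MLflow client code recover the dataset version on demand; the model name and version here are hypothetical.

```python
# Sketch: answer an auditor's "which data trained model version 7?"
# by walking from a registered model version back to its training run
# and reading the dataset tags logged at training time. Tag names
# follow the earlier DVC/MLflow convention; the model name is hypothetical.
from mlflow.tracking import MlflowClient

client = MlflowClient()
mv = client.get_model_version(name="credit-risk-scorer", version="7")
run = client.get_run(mv.run_id)

print("model version :", mv.version)
print("training run  :", mv.run_id)
print("dataset path  :", run.data.tags.get("dataset.path", "<not recorded>"))
print("dataset md5   :", run.data.tags.get("dataset.md5", "<not recorded>"))
```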

Real-World Failure Examples

These cases illustrate what happens when data governance fails:

Clearview AI: Fined by multiple European data protection authorities (including France’s CNIL and the UK’s ICO) for scraping billions of images from the web without a legal basis to train a facial recognition model. This is a direct violation of the data provenance and collection principles now codified in Article 10. Fines totaled tens of millions of euros, with orders to delete data of EU citizens.

Amazon Recruiting AI: A historical (2018) but highly relevant case where an internal AI recruiting tool was found to be biased against women. The model was trained on historical hiring data, which reflected existing societal biases. This exemplifies the risk described in Art. 10(2)(f). The project was scrapped, highlighting the reputational and financial cost of rectifying bias after development.

i-PRO Cameras: The city of Utrecht (Netherlands) banned the use of i-PRO’s AI cameras for crowd analysis due to concerns about demographic bias (e.g., misclassifying gender, age). This shows that deployers and the public are becoming sensitive to these issues, leading to market access problems.

Common Mistakes

Avoid these common Article 10 preparation errors:

  1. Data Archaeology: Trying to document data provenance and quality after the model is built, which is nearly impossible and always incomplete
  2. One-and-Done Bias Check: Running a single fairness report before deployment and never looking at it again, ignoring post-deployment data drift and feedback loops
  3. Ignoring Upstream Data: Assuming data received from another team or a vendor is compliant without conducting independent validation and quality checks
  4. Tool-Fixation: Believing that buying a “compliance tool” is sufficient without integrating it into a robust governance process and engineering culture
  5. Treating Unstructured Data Differently: Applying less rigor to the governance of unstructured data (images, text, audio) compared to structured data

Cost and Timeline Reality

Article 10 compliance has real costs. Here’s what to budget:

Cost Estimates

Industry analysts estimate that robust AI governance and compliance can add 10-15% to the total budget of a high-risk AI project. This spans:

  • Tooling: Data governance platforms, monitoring solutions
  • Personnel: ML Compliance Engineers, Data Stewards
  • Consulting: Legal and technical advisory

The cost of non-compliance is higher: beyond fines (up to €15 million or 3% of global annual turnover, whichever is higher, for Article 10 violations), you face reputational damage, loss of EU market access, and the cost of mandatory system redesign or withdrawal.

Timeline Estimates

How long does compliance implementation take?

  • Small organization (single high-risk system, mature data culture): 6-9 months
  • Medium organization (portfolio of models, establishing governance frameworks): 9-18 months
  • Large enterprise (multiple business units, full-scale transformation): 18-36 months

Team Requirements

A dedicated “AI Governance” pod of 3-5 people is common for a portfolio of high-risk systems, working with multiple ML engineering teams:

  • ML Compliance Engineer: Implements the technical controls within the MLOps pipeline
  • Data Steward: Owns the quality, documentation, and lifecycle of specific datasets
  • AI Governance Manager: Oversees the entire compliance program and interfaces with legal/risk teams

Build vs. Buy

Should you build compliance infrastructure in-house or buy from vendors?

Recommendation: A hybrid “Buy-and-Integrate” approach is most effective.

Buy a foundational data infrastructure platform that provides core governance, lineage, and sovereignty capabilities. Integrate specialized open-source tools (like Great Expectations, Fairlearn) for specific tasks like quality testing and bias detection.

Key factors:

  • Time-to-Compliance: Buying a platform is significantly faster than building a full governance suite from scratch
  • Core Competency: Your team’s expertise is in building models, not building compliance infrastructure. Focus on what creates business value.
  • Maintainability: Commercial platforms are maintained and updated by the vendor to keep pace with evolving regulations and standards
  • Flexibility: A hybrid approach avoids vendor lock-in and allows you to use the best tool for each specific job

Existing Tools

Several tools can help you implement Article 10 requirements:

| Tool | Type | Key Compliance Features |
| --- | --- | --- |
| Databricks Unity Catalog | Data Governance Platform | Centralized data discovery, fine-grained access control, automated data lineage tracking |
| IBM Watson OpenScale | AI Observability | Bias and fairness monitoring, drift detection, explainability (LIME, SHAP) |
| Great Expectations | Open Source Data Quality | Automated data validation and testing, data documentation generation, pipeline integration for quality gates |
| DVC (Data Version Control) | Open Source Data Versioning | Git-based versioning for large datasets, reproducible ML pipelines, connects data versions to code and model versions |
| Fairlearn | Open Source Fairness Toolkit | Bias assessment dashboards, bias mitigation algorithms, quantifies fairness metrics across subgroups |
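
As a small illustration of the Fairlearn row, `MetricFrame` computes a metric per subgroup, so both the per-group values and the largest gap can go straight into a Bias Assessment Report. The inline toy data is purely for shape.

```python
# Illustration of Fairlearn's MetricFrame: one metric computed per
# subgroup, plus the largest gap between subgroups. Toy data only.
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
group  = ["a", "a", "a", "a", "b", "b", "b", "b"]

mf = MetricFrame(
    metrics=accuracy_score,
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.by_group)      # accuracy per subgroup
print(mf.difference())  # largest gap between subgroups
```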

Next Steps

Article 10 compliance is not optional for high-risk AI systems operating in the EU. The August 2026 deadline will arrive faster than you expect, and retrofitting compliance is far more expensive than building it in from the start.

Start with an audit of your current data pipeline. Identify the gaps between what you have and what Article 10 requires. Prioritize the documentation and technical controls that will take the longest to implement.

The organizations that treat this as an engineering challenge—not just a legal checkbox—will build AI systems that are not only compliant but genuinely more reliable and trustworthy.


Frequently Asked Questions

When does EU AI Act Article 10 come into force?
The EU AI Act entered into force in August 2024. High-risk AI system rules (including Article 10) apply from August 2026, giving organizations 24 months to prepare.
What are the penalties for Article 10 non-compliance?
Non-compliance with Article 10 for high-risk systems can result in fines up to €15 million or 3% of global annual turnover, whichever is higher.
Does Article 10 apply to my AI system?
Article 10 applies to high-risk AI systems as defined in Annex III, including systems used in biometric identification, critical infrastructure, employment, education, and law enforcement.
How long does Article 10 compliance take to implement?
Implementation timelines vary: 6-9 months for small organizations with a single high-risk system, 9-18 months for medium organizations, and 18-36 months for large enterprises with multiple business units.

Need Help with EU AI Act Compliance?

YPAI provides sovereign AI data infrastructure designed for regulatory compliance.