---
title: "AI Governance in Practice: Implementing ISO/IEC 42001 for Medical ML"
description: "How ISO/IEC 42001 and EU AI Act principles can be translated into practical controls for medical machine learning systems."
date: "2026-07-04"
tags: [AI Governance, ISO 42001, EU AI Act, Medical AI, Responsible AI]
keywords: [AI governance, ISO IEC 42001, EU AI Act, medical machine learning, responsible AI, model risk management]
image: "/My.jpeg"
imageAlt: "Murat Tut portfolio image"
aiSummary: "This article explains how AI governance principles become concrete software practices in medical machine learning, including role ownership, data quality controls, bias risk, explainability, documentation, and operational monitoring."
---

*How we integrated the ISO/IEC 42001 standard and EU AI Act constraints into a medical machine learning educational platform.*

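As machine learning systems move into clinical diagnostic workflows, two frameworks shape how they must be built and documented:
1. **The EU AI Act (Regulation (EU) 2024/1689)**: Classifies AI that is part of a regulated medical device, which typically includes clinical decision-support software, as high-risk. High-risk systems must meet requirements for risk management, data governance (including examination for bias), technical documentation, and transparency.
2. **ISO/IEC 42001:2023**: Specifies requirements and controls for establishing, maintaining, and improving an **AI Management System (AIMS)**.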

During my work on **DrAI**—an educational ML platform for healthcare professionals funded by the EU Erasmus+ program—I served as the QA and Documentation Lead. My primary goal was to implement the ISO/IEC 42001 standard and the EU AI Act's risk management principles directly into our software design.

Here is how we translated policy constraints into software features.

---

## 1. Governance Roles (ISO 42001 Clause 5.3)

ISO 42001 requires clear lines of organizational accountability for AI systems. We mapped each Scrum role to a compliance role:

```
            ┌───────────────────────────────────────────────┐
            │          Scrum / Compliance Mapping           │
            ├───────────────────────┬───────────────────────┤
            │      Scrum Role       │   ISO 42001 Role      │
            ├───────────────────────┼───────────────────────┤
            │     Product Owner     │    AI System Owner    │
            │     Tech Lead         │    Technical Lead     │
            │     Developer         │    Data Steward       │
            │     QA Lead (My Role) │    Ethics Reviewer    │
            └───────────────────────┴───────────────────────┘
```

By formalizing these roles, we ensured that every analytical choice—such as data preprocessing configuration or training splits—had a designated owner responsible for its ethical and technical implications.
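
One lightweight way to make that ownership checkable is a registry that every configurable decision must appear in. The sketch below is illustrative only: the decision names and the `owner_for` helper are hypothetical and are not taken from DrAI's codebase. Only the role names come from the mapping above.

```python
from enum import Enum


class ComplianceRole(Enum):
    AI_SYSTEM_OWNER = "AI System Owner"
    TECHNICAL_LEAD = "Technical Lead"
    DATA_STEWARD = "Data Steward"
    ETHICS_REVIEWER = "Ethics Reviewer"


# Hypothetical decision names, used only for illustration.
DECISION_OWNERS = {
    "preprocessing_config": ComplianceRole.DATA_STEWARD,
    "train_test_split": ComplianceRole.TECHNICAL_LEAD,
    "bias_gate_threshold": ComplianceRole.ETHICS_REVIEWER,
    "release_approval": ComplianceRole.AI_SYSTEM_OWNER,
}


def owner_for(decision: str) -> ComplianceRole:
    """Return the accountable role, failing loudly if a decision has no owner."""
    try:
        return DECISION_OWNERS[decision]
    except KeyError:
        raise LookupError(f"No accountable owner registered for '{decision}'") from None
```

Failing on an unregistered decision, rather than returning a default, keeps new configuration options from shipping without an accountable role.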

---

## 2. Managing the Accuracy Paradox (Annex A.7)

A key control area in ISO 42001 is **data quality (Annex A.7.4)**. Clinical datasets are often highly imbalanced (e.g., only 6% of patients have an active diagnosis).

A model trained on such data can reach **94% accuracy** simply by predicting "Negative" for every patient, while missing every active case (0% sensitivity). This is known as the **Accuracy Paradox**.
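
The effect is easy to reproduce with synthetic labels. The snippet below is a minimal illustration using invented data at the 6% prevalence mentioned above. It does not use any DrAI dataset.

```python
# Synthetic labels: 6 positive patients out of 100 (1 = active diagnosis)
y_true = [1] * 6 + [0] * 94
# A "model" that predicts Negative for everyone
y_pred = [0] * 100

correct = sum(t == p for t, p in zip(y_true, y_pred))
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(y_true)

accuracy = correct / len(y_true)
sensitivity = true_positives / actual_positives

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}")
# accuracy=0.94, sensitivity=0.00
```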

To address the imbalance, we integrated **SMOTE** (Synthetic Minority Over-sampling Technique) and adapt its `k_neighbors` parameter to the size of the minority class, so that resampling does not crash when only a handful of positive samples are available:

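```python
import numpy as np
from imblearn.over_sampling import SMOTE

def apply_adaptive_smote(X_train, y_train):
    # Count minority samples (assumes the positive/minority class is labeled 1)
    minority_count = int(np.sum(np.asarray(y_train) == 1))
    # SMOTE interpolates between neighbors, so it needs at least two minority samples
    if minority_count <= 1:
        return X_train, y_train

    # Dynamically scale k-neighbors based on minority sample size
    k = min(5, max(1, minority_count - 1))
    smote = SMOTE(k_neighbors=k, random_state=42)
    return smote.fit_resample(X_train, y_train)
```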

We apply SMOTE to the training split *only*, keeping the test split clean to ensure our final metric evaluation represents real-world clinical conditions.
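
The adaptive rule inside `apply_adaptive_smote` can be checked in isolation. The helper name `choose_k_neighbors` below is hypothetical and exists only to show which `k` the rule picks for a few minority-class sizes. The comment about the `k + 1` requirement reflects my understanding of how SMOTE's neighbor search works and should be confirmed against the imbalanced-learn documentation.

```python
def choose_k_neighbors(minority_count: int, default_k: int = 5) -> int:
    """Mirror of the k selection in apply_adaptive_smote.

    The neighbor search needs at least k + 1 minority samples, so k is
    capped at minority_count - 1 and never drops below 1.
    """
    return min(default_k, max(1, minority_count - 1))


for n in (2, 3, 6, 50):
    print(n, choose_k_neighbors(n))
# 2 1
# 3 2
# 6 5
# 50 5
```

With very few minority samples the synthetic points are interpolated between the same handful of patients, so the resampled set adds little new information even though SMOTE runs without error.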

---

## 3. Subgroup Bias Auditing & Performance Quality Gates

Under the EU AI Act (Article 10), the data used to build high-risk systems must be examined for possible biases. In practice, this means auditing model behavior across demographic groups.

In DrAI, we built statistical validation gates. The auditor segments predictions by demographic attributes (e.g., Age or Gender) and evaluates sensitivity (recall) for each group:

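```python
from sklearn.metrics import recall_score

def audit_demographic_bias(y_true, y_pred, demographic_col):
    """Return per-group sensitivity; expects row-aligned pandas Series."""
    group_scores = {}
    for group in demographic_col.unique():
        mask = (demographic_col == group)
        # Sensitivity is undefined for a group with no positive cases
        # (positive label assumed to be 1), so skip it instead of scoring 0
        if (y_true[mask] == 1).sum() == 0:
            continue
        # Calculate sensitivity: TP / (TP + FN)
        group_scores[group] = recall_score(y_true[mask], y_pred[mask])

    # Trigger high-visibility alert if sensitivity gap exceeds 10 percentage points
    if group_scores:
        sens_gap = max(group_scores.values()) - min(group_scores.values())
        if sens_gap > 0.10:
            # trigger_ui_warning is the application's alert hook, defined elsewhere
            trigger_ui_warning(sens_gap)

    return group_scores
```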

If the sensitivity gap between subgroups exceeds **10 percentage points**, the system displays a warning banner in the clinical configuration dashboard, blocking deployment until bias is resolved.
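
To show how the 10-point threshold behaves as a blocking gate, here is a dependency-free sketch on synthetic data. The function names `subgroup_sensitivity` and `passes_bias_gate` are hypothetical, and the real DrAI gate may be wired differently. The point is only that the audit result reduces to a pass/fail decision a deployment step can check.

```python
from collections import defaultdict


def subgroup_sensitivity(y_true, y_pred, groups):
    """Sensitivity (TP / (TP + FN)) per group; groups without positives are omitted."""
    tp = defaultdict(int)
    fn = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            if pred == 1:
                tp[group] += 1
            else:
                fn[group] += 1
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(tp) | set(fn)}


def passes_bias_gate(scores, max_gap=0.10):
    """Return True when the largest sensitivity gap is within max_gap."""
    if len(scores) < 2:
        return True
    return max(scores.values()) - min(scores.values()) <= max_gap


# Synthetic example: group "A" catches 9 of 10 positives, group "B" only 6 of 10
y_true = [1] * 20
y_pred = [1] * 9 + [0] * 1 + [1] * 6 + [0] * 4
groups = ["A"] * 10 + ["B"] * 10

scores = subgroup_sensitivity(y_true, y_pred, groups)
print(scores["A"], scores["B"], passes_bias_gate(scores))
# 0.9 0.6 False
```

Small subgroups make the per-group estimate noisy, so a gate like this is best read together with the number of positive cases behind each score.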

---

## 4. Understanding Medical Metrics

To help non-technical clinicians evaluate models safely, we map data science metrics to clinical concepts:

* **Sensitivity (Recall)**: *Patients correctly identified.* Measures the ratio of true positive classifications over actual positives:
  $$\text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
* **Specificity**: *Healthy patients correctly cleared.* Measures the ratio of true negatives over actual negatives:
  $$\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}$$
* **Positive Predictive Value (PPV/Precision)**: *Likelihood that a patient with a positive result actually has the disease.*
  $$\text{PPV} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
* **Negative Predictive Value (NPV)**: *Likelihood that a patient with a negative result is actually disease-free.*
  $$\text{NPV} = \frac{\text{TN}}{\text{TN} + \text{FN}}$$

By displaying these formulas alongside concrete patient numbers (e.g., *"This model missed 12 active cancer cases out of 100"*), the system makes AI limitations transparent to medical professionals.
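
As a worked example, the four formulas can be computed directly from confusion-matrix counts. The counts below are invented for illustration (1,000 patients, 100 of them positive), and the helper names `clinical_metrics` and `missed_cases_message` are hypothetical.

```python
def clinical_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute the four metrics above from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def missed_cases_message(tp: int, fn: int) -> str:
    """Phrase sensitivity as a concrete patient count."""
    return f"This model missed {fn} active cases out of {tp + fn}."


# Invented counts for a screening set of 1,000 patients, 100 of them positive
m = clinical_metrics(tp=88, fn=12, fp=45, tn=855)
print({name: round(value, 3) for name, value in m.items()})
# {'sensitivity': 0.88, 'specificity': 0.95, 'ppv': 0.662, 'npv': 0.986}
print(missed_cases_message(tp=88, fn=12))
# This model missed 12 active cases out of 100.
```

Note that PPV is only 0.662 here even though sensitivity and specificity both look strong. At 10% prevalence, the 45 false positives are a large share of all positive results. The sketch does not guard against empty denominators, which a production version would need to handle.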

---

## 5. Privacy by Design: Volatile Memory

Patient records are subject to strict data protection laws (such as Turkey's KVKK or the EU's GDPR). To minimize the risk of data leakage, DrAI uses a **zero-persistence** memory architecture:
* **No Database Storage**: Uploaded files are processed in-memory using FastAPI volatile sessions (`SessionData`).
* **Auto-Expiration**: Sessions are deleted after 1 hour of inactivity or once they reach a hard 4-hour TTL, whichever comes first.
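
The two expiry rules can be sketched as a small in-memory store. This is an assumption-laden illustration, not DrAI's actual `SessionData` implementation: the class name `VolatileSessionStore`, its methods, and the injectable clock are all hypothetical. Only the 1-hour idle limit and 4-hour hard TTL come from the description above.

```python
import time
from dataclasses import dataclass

IDLE_TIMEOUT_S = 60 * 60      # 1 hour without access
HARD_TTL_S = 4 * 60 * 60      # 4 hours since creation, regardless of activity


@dataclass
class Session:
    payload: dict
    created_at: float
    last_access: float


class VolatileSessionStore:
    """In-memory session store; this class never writes to disk."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sessions = {}

    def put(self, session_id: str, payload: dict) -> None:
        now = self._clock()
        self._sessions[session_id] = Session(payload, now, now)

    def get(self, session_id: str):
        """Return the payload, or None if the session is missing or expired."""
        session = self._sessions.get(session_id)
        if session is None:
            return None
        now = self._clock()
        if self._expired(session, now):
            del self._sessions[session_id]
            return None
        session.last_access = now  # any access resets the idle timer
        return session.payload

    def purge_expired(self) -> int:
        """Drop all expired sessions; meant to be called periodically."""
        now = self._clock()
        stale = [sid for sid, s in self._sessions.items() if self._expired(s, now)]
        for sid in stale:
            del self._sessions[sid]
        return len(stale)

    @staticmethod
    def _expired(session: Session, now: float) -> bool:
        idle = now - session.last_access > IDLE_TIMEOUT_S
        too_old = now - session.created_at > HARD_TTL_S
        return idle or too_old
```

Checking expiry on read means a stale session is never served even if the periodic purge has not run yet. The purge exists so that abandoned uploads do not linger in memory.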

---

## Engineering Takeaways

1. **Focus on Sensitivity (Recall)**: In healthcare AI, raw accuracy is a misleading metric. Prioritize sensitivity to avoid the accuracy paradox and minimize missed diagnoses.
2. **Audit demographic subgroups**: A model can look accurate on average while exhibiting significant bias against specific subgroups. Build automated subgroup validation gates into the release process.
3. **Minimize data retention**: Enforce volatile, self-expiring memory structures for processing PII to simplify regulatory compliance.
