Missingness as a Fairness Issue in Machine Learning: Who Gets Modeled, and Who Gets Left Behind
Executive Summary
Fairness discussions in machine learning usually begin after the model is built.
They focus on:
- subgroup performance,
- calibration parity,
- error rates by demographic category.
But long before a model can be unfair, something more basic happens:
Some people are measured. Some people are partially measured. Some people are not measured at all.
Missing data is not just a statistical inconvenience. It is often the first fairness filter in the pipeline.
This post argues that missingness is a fairness issue upstream of the algorithm, and that ethical ML requires confronting who gets represented, who gets excluded, and whose uncertainty is acknowledged (Rajkomar et al. 2018; Martínez-Plumed et al. 2021; Zhang and Long 2021).
Fairness Does Not Start at the Loss Function
Most fairness frameworks assume:
- complete data,
- stable labels,
- equal opportunity to be observed.
Clinical and operational data violate all three.
Before asking:
“Is the model fair?”
We must ask:
“Who made it into the dataset, and under what conditions?”
That question is central because fairness metrics are only as informative as the population that survived measurement, documentation, and preprocessing (Barocas et al. 2023; Suresh and Guttag 2021).
Missingness Is Rarely Random Across Groups
Data is more likely to be missing for people who:
- are critically ill,
- move frequently across care settings,
- receive care under time pressure,
- are treated in under-resourced environments,
- face language or access barriers.
```r
library(dplyr)

# Proportion (0 to 1) of records missing the key predictor, by group
data %>%
  group_by(group_variable) %>%
  summarise(
    n = n(),
    pct_missing = mean(is.na(key_predictor))
  )
```
When missingness correlates with group membership, dropping incomplete records removes some groups disproportionately, and that induces structural bias.
No fairness metric downstream can undo this, because the bias entered before the model ever saw a loss function (Martínez-Plumed et al. 2021; Zhang and Long 2021).
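Rather than eyeballing the per-group percentages, one option is a chi-squared test of independence on the group-by-missingness table. A minimal base-R sketch on simulated data. The group labels, sample size, and missingness probabilities are assumptions chosen only to illustrate the check.

```r
set.seed(42)

# Simulated data (assumption): group "B" is far more likely to be missing the predictor
n <- 1000
group <- sample(c("A", "B"), n, replace = TRUE)
key_predictor <- rnorm(n)
p_missing <- ifelse(group == "B", 0.30, 0.05)
key_predictor[runif(n) < p_missing] <- NA

# Group-by-missingness contingency table
tab <- table(group = group, missing = is.na(key_predictor))
print(tab)

# Chi-squared test of independence between group membership and missingness
test <- chisq.test(tab)
print(test$p.value)
```

A small p-value only says that missingness and group are associated in this sample. It does not tell you the mechanism, and it does not tell you how much bias dropping those rows will cause.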
Exclusion Is a Form of Representation Bias
When we exclude observations with missing data, we are not just reducing noise.
We are deciding:
- whose outcomes matter,
- whose experiences shape the model,
- whose uncertainty is tolerated.
This creates a feedback loop:
- under-measured groups are excluded,
- models work best where data is clean,
- deployment favors already well-measured populations.
This is not algorithmic bias. It is representation bias driven by missingness (Suresh and Guttag 2021; Gianfrancesco et al. 2018).
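The representation effect of exclusion can be made concrete by comparing each group's share of the data before and after incomplete rows are dropped. A minimal base-R sketch. `representation_shift()` is a hypothetical helper, and the 5% and 40% missingness rates are assumed for illustration.

```r
# Hypothetical helper: each group's share of the data before and after
# complete-case exclusion, plus the ratio of the two (1 = unchanged)
representation_shift <- function(group, complete) {
  before <- prop.table(table(group))
  after  <- prop.table(table(group[complete]))[names(before)]
  data.frame(
    group        = names(before),
    share_before = as.numeric(before),
    share_after  = as.numeric(after),
    ratio        = as.numeric(after) / as.numeric(before)
  )
}

set.seed(7)
group <- rep(c("well_measured", "under_measured"), times = c(700, 300))
# Assumed missingness rates: 5% for well-measured, 40% for under-measured records
complete <- runif(length(group)) >= ifelse(group == "under_measured", 0.40, 0.05)

shift <- representation_shift(group, complete)
print(shift)
```

A ratio below 1 means the group is a smaller part of the training data than of the population it came from, before any model has been fit.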
Clean Data Is Often Privileged Data
Data completeness often reflects:
- staffing levels,
- institutional resources,
- time availability,
- workflow stability.
“High-quality data” is frequently a proxy for system advantage.
Models trained only on clean data may:
- overestimate performance,
- underestimate risk in marginalized settings,
- silently fail where stakes are highest (Rajkomar et al. 2018; Obermeyer et al. 2019; Seyyed-Kalantari et al. 2021).
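A small simulation shows the "underestimate risk" point. Suppose a severity measure is more likely to go unrecorded for the sickest patients. Then an outcome rate computed only on complete records will be biased downward. Everything here is an assumption chosen for illustration: the latent severity scale, the logistic links, and the coefficients.

```r
set.seed(2024)
n <- 20000

# Latent severity; higher values mean sicker patients (assumed standard normal)
severity <- rnorm(n)

# Outcome risk rises with severity (assumed logistic relationship)
outcome <- rbinom(n, 1, plogis(-2 + 1.5 * severity))

# The sickest patients are the least likely to have severity recorded (MNAR)
recorded <- rbinom(n, 1, plogis(2 - 1.5 * severity)) == 1

true_rate          <- mean(outcome)
complete_case_rate <- mean(outcome[recorded])

cat(sprintf("All patients:     %.3f\n", true_rate))
cat(sprintf("Complete records: %.3f\n", complete_case_rate))
```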
Fairness Metrics Can Mask Missingness Bias
It is possible for a model to:
- pass fairness audits,
- satisfy parity metrics,
- appear well-calibrated,
while still being unfair — because entire subpopulations were never fully represented.
Fairness metrics evaluate errors conditional on inclusion.
Missingness determines who gets included. A model can satisfy parity conditions among included observations while still inheriting unfairness from upstream exclusion (Zhang and Long 2021; Martínez-Plumed et al. 2021).
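One way to keep inclusion visible during an audit is to report each group's inclusion rate next to the conditional metric. A minimal sketch. `audit_with_coverage()` is a hypothetical helper. The toy data are built so that error rates among included records are identical across groups while coverage is not.

```r
# Hypothetical audit helper: per-group error rate among included rows,
# reported alongside the fraction of each group that was included at all
audit_with_coverage <- function(group, included, correct) {
  groups <- sort(unique(group))
  data.frame(
    group = groups,
    inclusion_rate = vapply(groups, function(g) mean(included[group == g]),
                            numeric(1), USE.NAMES = FALSE),
    error_rate = vapply(groups, function(g) mean(!correct[group == g & included]),
                        numeric(1), USE.NAMES = FALSE)
  )
}

group    <- rep(c("A", "B"), each = 20)
included <- c(rep(TRUE, 20),                 # group A: every record reached the audit
              rep(TRUE, 10), rep(FALSE, 10)) # group B: half were excluded upstream
correct  <- c(rep(TRUE, 18), rep(FALSE, 2),  # group A: 2 errors among 20 included
              rep(TRUE, 9), FALSE,           # group B: 1 error among 10 included
              rep(NA, 10))                   # group B: excluded, never scored

audit <- audit_with_coverage(group, included, correct)
print(audit)
```

A parity check on `error_rate` alone would pass here. Only the `inclusion_rate` column reveals that half of group B never reached the audit.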
Missingness-Aware Modeling Is a Fairness Intervention
Fairness is not only about constraints. It is about uncertainty allocation.
Ethically preferable models:
- widen uncertainty where data is sparse,
- avoid confident predictions for under-measured groups,
- explicitly model missingness where informative.
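```r
library(dplyr)

# Flag whether the predictor was observed, so missingness can enter the model
# (and the audit) as information instead of disappearing with dropped rows
data <- data %>%
  mutate(
    predictor_missing = is.na(predictor)
  )
```
This does not “fix” fairness. It makes inequity visible, which is often the first ethically necessary step in a defensible deployment workflow (Rojas et al. 2022; Rose et al. 2023).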
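A missingness indicator is commonly used alongside a filled-in copy of the predictor, an approach known as the missing-indicator method. This keeps rows with a missing value in the model instead of dropping them. A minimal base-R sketch on simulated data. The mean fill, the variable names, and the coefficients are illustrative assumptions. The missing-indicator method can itself be biased under some mechanisms, so treat it as a transparency device, not a correction.

```r
set.seed(11)
n <- 5000
predictor <- rnorm(n)
outcome   <- rbinom(n, 1, plogis(-1 + 0.8 * predictor))

# Assumed informative missingness: high predictor values go unrecorded more often
predictor[runif(n) < plogis(-2 + 1.2 * predictor)] <- NA

df <- data.frame(outcome = outcome, predictor = predictor)
df$predictor_missing <- is.na(df$predictor)

# Mean-fill the predictor so incomplete rows stay in the model; the indicator
# lets the model estimate a separate intercept shift for unmeasured rows
df$predictor_filled <- ifelse(df$predictor_missing,
                              mean(df$predictor, na.rm = TRUE),
                              df$predictor)

fit <- glm(outcome ~ predictor_filled + predictor_missing,
           family = binomial, data = df)
print(summary(fit)$coefficients)

# Rows retained by this model versus a complete-case fit
print(c(indicator_model = nobs(fit), complete_case = sum(!df$predictor_missing)))
```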
Hierarchy and Fairness Interact Through Missingness
In hierarchical systems, missingness often clusters by:
- site,
- service,
- geography,
- time.
```r
library(dplyr)

# Proportion (0 to 1) of records missing the predictor at each site
data %>%
  group_by(site) %>%
  summarise(
    pct_missing = mean(is.na(predictor))
  )
```
Partial pooling can:
- stabilize estimates,
- but also shrink under-measured groups toward the majority.
This is not wrong. But it must be acknowledged.
Fairness requires knowing who is borrowing strength from whom. Because the sites with the sparsest data are shrunk the hardest, partial pooling can conceal asymmetric data sparsity if the missingness pattern is ignored (Gelman et al. 2013; Little and Rubin 2019).
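The textbook normal-normal case with known variances shows the trade-off. Each site's estimate is theta_j = w_j * ybar_j + (1 - w_j) * mu, with weight w_j = (n_j / sigma^2) / (n_j / sigma^2 + 1 / tau^2). Sites with fewer observed (non-missing) values get smaller weights and are pulled harder toward the overall mean. A base-R sketch. The site names, counts, sigma, tau, and mu are assumed known for illustration, whereas a fitted hierarchical model would estimate them.

```r
# Assumed known variance components (a fitted model would estimate these)
sigma <- 1.0   # within-site standard deviation of the measurement
tau   <- 0.5   # between-site standard deviation of site means
mu    <- 0.0   # overall (population) mean

# Hypothetical sites: same raw mean, very different numbers of observed values
sites <- data.frame(
  site       = c("tertiary_center", "forward_site"),
  n_observed = c(400, 4),
  raw_mean   = c(1.0, 1.0)
)

# Precision weight on each site's own data
sites$weight <- (sites$n_observed / sigma^2) /
                (sites$n_observed / sigma^2 + 1 / tau^2)

# Partially pooled estimate: shrinks toward mu as the weight falls
sites$pooled_mean <- sites$weight * sites$raw_mean + (1 - sites$weight) * mu
print(sites)
```

Both sites report the same raw mean. The under-measured site's estimate is nonetheless cut in half, and nothing in the pooled output says why unless sparsity is reported alongside it.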
MNAR Is Often a Fairness Signal
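MNAR (missing not at random) missingness, where the chance that a value is missing depends on the unobserved value itself, frequently reflects: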
- escalation of care,
- severity beyond routine measurement,
- system overload.
Treating MNAR as a nuisance:
- privileges stable care pathways,
- penalizes patients in crisis.
Sensitivity analysis becomes a fairness tool when it asks:
How different could conclusions be for those we barely measured?
That question matters especially when missingness is plausibly informative rather than ignorable (Little and Rubin 2019; National Research Council 2010).
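One common way to pose that question formally is a delta-adjustment (tipping-point) sensitivity analysis. You impute the missing values under a baseline assumption, shift the imputed values by a range of offsets representing how much worse the unmeasured might be, and watch how a subgroup summary moves. A deliberately simple base-R sketch. The single mean imputation, the delta grid, and the simulated severity scores are assumptions. A real analysis would typically use multiple imputation.

```r
set.seed(3)
# Hypothetical severity scores for one subgroup; 70 of 200 (35%) are unrecorded
severity <- rnorm(200, mean = 5, sd = 2)
severity[sample(200, 70)] <- NA
is_missing <- is.na(severity)
observed_mean <- mean(severity, na.rm = TRUE)

# Delta adjustment: each unrecorded value is assumed to sit `delta` units above
# the observed mean (delta = 0 treats unrecorded patients like recorded ones)
deltas <- c(0, 0.5, 1, 2, 3)
adjusted_means <- vapply(deltas, function(delta) {
  filled <- severity
  filled[is_missing] <- observed_mean + delta
  mean(filled)
}, numeric(1))

print(data.frame(delta = deltas, subgroup_mean = round(adjusted_means, 2)))
```

If a plausible delta would change a clinical or deployment conclusion for the subgroup, that conclusion depends on the missingness assumption and should be reported that way.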
Ethical Deployment Requires Fairness Documentation
An audit-ready, fairness-aware model documents:
- missingness rates by subgroup,
- exclusion counts and reasons,
- uncertainty differences across groups,
- sensitivity of conclusions to missing data assumptions.
Example language:
Missingness rates differed substantially across care settings and patient subgroups, reflecting workflow and resource constraints. Model outputs should therefore be interpreted with greater uncertainty in under-measured contexts. We report sensitivity analyses to bound potential impacts and avoid overconfident use in populations with limited representation.
This is not a disclaimer. It is responsible deployment guidance—the kind of documentation needed if fairness claims are to survive operational scrutiny (Sendak et al. 2020; Barocas et al. 2023).
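The first two documentation items can come straight from an exclusion log kept during preprocessing. A minimal base-R sketch. The `exclusion_reason` column, its values, and the subgroup labels are hypothetical stand-ins for whatever a real pipeline records at each filtering step.

```r
# Hypothetical exclusion log: one row per record; NA reason means retained
records <- data.frame(
  subgroup = c("site_A", "site_A", "site_A", "site_B", "site_B", "site_B", "site_B"),
  exclusion_reason = c(NA, NA, "missing_vitals",
                       "missing_vitals", "missing_vitals", "missing_outcome", NA),
  stringsAsFactors = FALSE
)

# Exclusion rate by subgroup
excluded <- !is.na(records$exclusion_reason)
exclusion_rate <- tapply(excluded, records$subgroup, mean)
print(round(exclusion_rate, 2))

# Counts by reason and subgroup, with retained records listed explicitly
reason <- ifelse(excluded, records$exclusion_reason, "retained")
counts <- table(reason = reason, subgroup = records$subgroup)
print(counts)
```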
Fairness Is About Who Bears the Risk of Uncertainty
When uncertainty is hidden:
- marginalized groups bear the risk.
When uncertainty is explicit:
- decision-makers can respond appropriately.
Ethical ML does not promise equal accuracy. It promises honest uncertainty.
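Part of honest uncertainty is simple arithmetic. Two groups can show the same observed event rate, yet the group with fewer usable records will have a much wider interval. A base-R sketch using exact (Clopper-Pearson) intervals from `binom.test()`. The group sizes and the 20% event rate are made-up numbers.

```r
# Hypothetical groups: same observed event rate (20%), very different usable n
groups <- data.frame(group = c("well_measured", "under_measured"),
                     n_usable = c(1000, 50),
                     stringsAsFactors = FALSE)
groups$events <- round(0.2 * groups$n_usable)

# Exact 95% confidence interval for each group's event rate
ci <- t(mapply(function(x, n) binom.test(x, n)$conf.int,
               groups$events, groups$n_usable))
groups$lower <- ci[, 1]
groups$upper <- ci[, 2]
groups$width <- groups$upper - groups$lower
print(groups)
```

Reporting only the point estimate would make the two groups look interchangeable. The interval widths show where decision-makers should be more cautious.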
A Fairness-Oriented Missingness Checklist
Before deployment, ask:
- Which groups have higher missingness?
- Who was excluded, and why?
- Does the model express higher uncertainty where data is sparse?
- Would decisions differ if missingness were handled differently?
- Have we documented this clearly?
If these questions are unanswered, fairness claims are incomplete.
If prehospital documentation quality varies systematically by theater of operation, unit type, or provider training level, then models that treat missingness as ignorable will tend to perform better for patients with complete records. They will perform worse for those whose care was delivered under the most austere conditions, which is exactly the inverse of what a fair triage support system should do. A model trained on Department of Defense Trauma Registry (DoDTR) data will implicitly learn from better-documented casualties. It will then be deployed in forward settings where documentation is sparse, with no mechanism to flag that its confidence estimates are no longer calibrated for that population. This is not statistical noise. It is a fairness failure embedded in the data-generation process, upstream of any modeling decision. Auditing the model for bias without auditing the missingness structure of the training data will miss the problem entirely.
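One possible shape for the missing flag is to record how complete the training records were, then mark any deployment record that is less complete than nearly everything the model saw. A minimal base-R sketch. The `flag_sparse_records()` helper, the field names, and the 5th-percentile cutoff are all hypothetical. The field names are placeholders, not taken from the DoDTR schema.

```r
# Hypothetical helper: flag deployment records that are less complete than
# almost all training records, so their scores can be shown as lower-confidence
flag_sparse_records <- function(train, deploy, fields, quantile_cutoff = 0.05) {
  # Fraction of the listed fields that are recorded, per row
  completeness <- function(d) unname(rowMeans(!is.na(d[, fields, drop = FALSE])))
  # Training completeness at the chosen lower quantile
  threshold <- quantile(completeness(train), probs = quantile_cutoff, names = FALSE)
  deploy_completeness <- completeness(deploy)
  data.frame(completeness   = deploy_completeness,
             low_confidence = deploy_completeness < threshold)
}

fields <- c("sbp", "heart_rate", "gcs", "resp_rate")

# Training data (simulated): 100 records, 10 of them missing one field
set.seed(5)
m <- matrix(rnorm(400), ncol = 4, dimnames = list(NULL, fields))
m[1:10, "gcs"] <- NA
train <- as.data.frame(m)

# Deployment records: one complete, one missing two of four fields
deploy <- data.frame(sbp = c(110, NA), heart_rate = c(95, 120),
                     gcs = c(15, NA), resp_rate = c(18, 22))

flags <- flag_sparse_records(train, deploy, fields)
print(flags)
```

Completeness is a crude proxy. It does not prove that calibration has failed for a flagged record. It only marks where the model is operating outside the documentation conditions it was trained on.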
Closing: Fairness Begins Before the Model
Machine learning systems do not become unfair only at prediction time.
They become unfair when:
- data is missing unevenly,
- exclusions are silent,
- uncertainty is suppressed,
- and clean data is mistaken for neutral data.
Missingness is not just a statistical detail. It is a moral boundary in the modeling pipeline.
Ethical ML starts by asking:
Who gets measured — and who doesn’t?
This post is part of the Missing Data Toolkit — a companion reference with fairness-aware missing data frameworks, subgroup representation diagnostics, imputation bias templates, and equity audit scaffolds.
Series Callout
This post is part of a broader Ethics in Trauma Registry Analysis Series:
- Opacity Is Sometimes Ethical: When Black Boxes Save Lives
- Accountability Without Interpretability: Who Owns a Model’s Decision?
- Bias Isn’t Always Where You Think It Is: Ethical Failure Modes in Registry Data
- Prediction vs Responsibility: Why Risk Scores Can Be Ethically Dangerous
- Human-in-the-Loop Is Not a Panacea (and Sometimes a Lie)
- The Ethical Implications of Excluding “Messy” Patients
- Missingness as a Fairness Issue in Machine Learning
- You Can’t Trust What You Don’t Track: AI Performance Monitoring in Clinical Systems
- From Weeks to Minutes: The Ethics of Automating CPG Compliance
- Ontology Is Not Optional: Semantic Infrastructure as Ethical Foundation
- What Responsible AI in Clinical Guidance Actually Requires
- Modernizing the DOD Trauma Registry: An Ethical and Technical Imperative