Bias Isn’t Always Where You Think It Is: Ethical Failure Modes in Registry Data

Ethics in Trauma Registry Analysis

How ethical failure in registry analytics often begins upstream in measurement, abstraction, exclusion, and governance.

Published

November 1, 2024

Modified

June 9, 2026

Executive Summary

When people talk about bias in machine learning, they usually point to:

algorithms,
features,
or model outputs.

In registry-based analytics, that focus is often misplaced.

The most consequential biases are usually:

upstream,
structural,
invisible in tidy datasets,
and reinforced by “clean” preprocessing.

This post explains where bias actually enters registry data, why it is ethically dangerous, and how to confront it honestly in applied analyses.

That framing is consistent with recent work emphasizing that harms in machine learning often arise across the entire data and deployment pipeline, not only inside the final model (Suresh and Guttag 2021; Barocas et al. 2023).

The Myth: Bias Is a Model Problem

A common narrative goes like this:

“If the model is biased, we can fix it with better algorithms.”

In registry data, this is rarely true.

Most bias is introduced by:

who gets measured,
when they get measured,
how data is recorded,
and which records survive abstraction.

In healthcare AI, that pattern has been documented repeatedly, including cases where widely used algorithms encoded access-to-care or documentation patterns rather than underlying need (Obermeyer et al. 2019; Seyyed-Kalantari et al. 2021).

By the time modeling begins, many ethical failures are already locked in.

Registries Reflect Systems, Not Populations

Registries are not neutral mirrors of reality.

They reflect:

clinical workflows,
documentation priorities,
resource constraints,
institutional incentives,
and historical practice patterns.

This means registry data is selectively observed reality, not population truth.

Treating it otherwise is an ethical error.

Selection Bias: Who Enters the Registry (and Who Doesn’t)

The first ethical filter happens at inclusion.

Ask:

Who qualifies for registry entry?
Who gets excluded due to rapid death?
Who is transferred before documentation?
Who lacks complete identifiers?

These exclusions are rarely random.

Dropping “incomplete” records often removes:

the sickest,
the most unstable,
the most marginalized.

That’s not data cleaning — it’s population distortion.
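A small simulation shows how this happens. It is a sketch under stated assumptions, not an analysis of real registry data. It assumes a hypothetical cohort in which injury severity drives both mortality and the chance that a lab value (a made-up `lactate` column) goes unrecorded. All coefficients are illustrative.

```r
set.seed(42)
n <- 10000

# Hypothetical cohort: severity on a 0-1 scale drives both death and missingness
severity <- runif(n)
died     <- rbinom(n, 1, plogis(-4 + 6 * severity))

# Sicker patients are more likely to lack a lab value (e.g., resuscitation took priority)
lab_missing <- rbinom(n, 1, plogis(-3 + 5 * severity)) == 1
lactate     <- ifelse(lab_missing, NA, 1 + 4 * severity + rnorm(n, sd = 0.5))

cohort <- data.frame(severity, died, lactate)

# The "cleaning" step: keep complete cases only
complete <- cohort[complete.cases(cohort), ]

comparison <- data.frame(
  cohort        = c("all patients", "complete cases"),
  n             = c(nrow(cohort), nrow(complete)),
  mortality     = c(mean(cohort$died), mean(complete$died)),
  mean_severity = c(mean(cohort$severity), mean(complete$severity))
)
print(comparison)
```

In runs like this one, the complete-case cohort is smaller, less severely injured, and has lower mortality than the full cohort. No individual value was recorded incorrectly.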

Measurement Bias: What Gets Recorded Under Pressure

In high-stress environments:

not all variables are equally observable,
absence of data often reflects urgency, not neglect.

Examples:

labs missing because resuscitation took priority,
vitals missing during transport,
assessments skipped due to instability.

Treating missingness as noise erases clinical reality.
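One way to keep that reality in view is to record missingness explicitly. You can then check whether it carries outcome information before deciding how to handle it. The toy values below are invented for illustration, and the column names (`sbp`, `died`) are assumptions.

```r
# Hypothetical toy extract: NA means the vital sign was never recorded
vitals <- data.frame(
  sbp  = c(120, NA, 95, NA, 130, NA, 110, 88, NA, 125),
  died = c(0,   1,  0,  1,  0,   0,  0,   1,  1,  0)
)

# Keep an explicit indicator instead of silently dropping or imputing
vitals$sbp_missing <- is.na(vitals$sbp)

# Outcome rate among patients without (FALSE) and with (TRUE) a missing value
mortality_by_missing <- tapply(vitals$died, vitals$sbp_missing, mean)
print(mortality_by_missing)
```

Here mortality is 0.75 among patients with no recorded systolic pressure and about 0.17 among those with one. A gap like that suggests the missingness is informative rather than noise, and the indicator itself may belong in the analysis.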

Documentation Bias: What the System Rewards

Registries inherit incentives from:

billing,
quality reporting,
regulatory definitions,
performance metrics.

This shapes:

which diagnoses are emphasized,
how severity is coded,
what counts as an “event.”

Variables that look objective often encode institutional priorities, not physiology.

Survivorship Bias: The Quietest Failure Mode

Many registry analyses implicitly condition on survival:

final severity scores,
discharge diagnoses,
completed interventions.

Patients who die early often:

have fewer measurements,
less detailed documentation,
lower apparent severity in structured fields.

When these patients drop out of the analysis or look less severe than they were, aggressive care can look safer than it is. That is ethically dangerous.
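A hypothetical simulation can sketch this mechanism. In it, deaths occur early and vitals are charted roughly once per hour. The 15% mortality, the timelines, and the "at least 3 charted vitals" inclusion rule are illustrative assumptions. They were chosen only to show how a routine data-quality rule interacts with short timelines.

```r
set.seed(7)
n <- 5000

# Hours observed (arrival to death or discharge): deaths come early,
# mean ~6 h versus ~72 h for survivors
died  <- rbinom(n, 1, 0.15)
hours <- ifelse(died == 1, rexp(n, rate = 1 / 6), rexp(n, rate = 1 / 72))

# Assume vitals are charted about once per hour while the patient is observed
n_vitals <- rpois(n, lambda = hours)

patients <- data.frame(died, hours, n_vitals)

# A common "data quality" rule: require at least 3 charted vitals
analyzed <- patients[patients$n_vitals >= 3, ]

result <- c(
  mortality_all      = mean(patients$died),
  mortality_analyzed = mean(analyzed$died),
  deaths_dropped     = sum(patients$died) - sum(analyzed$died)
)
print(result)
```

Early deaths rarely accumulate three charted vitals. As a result, the rule removes a much larger share of deaths than of survivors, and the analyzed cohort shows lower mortality than the population it came from.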

Temporal Bias: Using the Future to Explain the Past

A subtle but common error:

using adjudicated or final values to explain early decisions.

Examples:

final Abbreviated Injury Scale (AIS) scores to model early triage,
discharge diagnoses to assess ED decision-making.

This leaks future information backward and:

flatters models,
misrepresents decision contexts,
and distorts accountability.
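A basic guard is to build features only from values that existed when the modeled decision was made. The sketch assumes a hypothetical long-format table with a `recorded_at` timestamp on every value and a per-patient `decision_at` time. The variable names are illustrative, and a given registry may not store timestamps at this resolution.

```r
fmt <- "%Y-%m-%d %H:%M"

# Hypothetical long-format extract: one row per recorded value, with its timestamp
obs <- data.frame(
  patient_id  = c(1, 1, 1, 2, 2),
  variable    = c("gcs", "ais_head", "gcs", "sbp", "ais_head"),
  value       = c(13, 4, 9, 92, 3),
  recorded_at = as.POSIXct(c("2024-01-01 10:05", "2024-01-03 09:00",
                             "2024-01-01 12:30", "2024-01-02 22:10",
                             "2024-01-05 14:00"), tz = "UTC", format = fmt),
  stringsAsFactors = FALSE
)

# Hypothetical time of the decision being modeled (e.g., ED triage) per patient
decisions <- data.frame(
  patient_id  = c(1, 2),
  decision_at = as.POSIXct(c("2024-01-01 10:15", "2024-01-02 22:30"),
                           tz = "UTC", format = fmt)
)

# Keep only values that already existed when the decision was made;
# adjudicated scores entered days later (ais_head here) are excluded
joined   <- merge(obs, decisions, by = "patient_id")
features <- joined[joined$recorded_at <= joined$decision_at, ]

print(features[, c("patient_id", "variable", "value")])
```

Only two values pass the filter: the 10:05 GCS for patient 1 and the 22:10 systolic pressure for patient 2. The later GCS and both adjudicated AIS values are excluded.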

“Fairness” Metrics Can Miss the Real Problem

Post-hoc fairness checks often ask:

does performance differ by group?

But registry bias often comes from:

who is missing entirely,
whose data is thin,
whose outcomes are delayed or obscured.

You cannot fix exclusion bias with reweighting.

Ethics requires confronting what isn’t there.
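The arithmetic behind that claim is short. Inverse-probability weighting scales up the records that remain, but it has nothing to scale when a group contributes none. The sketch assumes hypothetical subgroup counts. It also assumes, optimistically, an external count of eligible patients, which a registry may not have.

```r
# Hypothetical subgroup counts: patients eligible vs. records left after exclusions
groups <- data.frame(
  group    = c("A", "B", "C"),
  eligible = c(500, 300, 40),
  retained = c(450, 210, 0),   # every record from group C was excluded
  stringsAsFactors = FALSE
)

groups$retention <- groups$retained / groups$eligible

# Inverse-probability-of-retention weight for each retained record;
# undefined (NA) for a group with no retained records
groups$weight <- ifelse(groups$retained > 0, 1 / groups$retention, NA)

# Weighted count restores the eligible total only where records survived
groups$weighted_n <- groups$retained * groups$weight
print(groups)
```

Weighting restores the counts for groups A and B, but only if retained patients resemble excluded ones within each group. For group C the weight is undefined because no one is left to represent the group.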

How Bias Gets Amplified by “Good” Modeling Practice

Ironically, best practices can amplify bias when applied blindly:

complete-case analysis,
aggressive feature selection,
harmonization without provenance,
collapsing time and context.

Each step can quietly favor:

well-documented patients,
stable workflows,
high-resource settings.

Clean data is not always ethical data.

What Ethical Analysis Actually Requires

Ethical registry analysis demands:

explicit cohort definitions,
counts of excluded records (and why),
missingness patterns by subgroup,
sensitivity analyses,
acknowledgment of structural limitations.

This is not weakness.
It is intellectual honesty.
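The second item, counts of excluded records and the reason for each, can be produced as an exclusion waterfall. Here is a minimal sketch. It assumes a hypothetical extract whose flag columns, rule names, and exclusion rates are invented for illustration.

```r
set.seed(1)
n <- 1000

# Hypothetical registry extract with flags that drive common exclusion rules
registry <- data.frame(
  id             = seq_len(n),
  has_identifier = runif(n) > 0.03,
  transferred    = runif(n) < 0.08,
  died_in_ed     = runif(n) < 0.05,
  ais_recorded   = runif(n) > 0.10
)

# Exclusion rules, applied in this order; each returns TRUE for rows to drop
rules <- list(
  "missing identifier" = function(d) !d$has_identifier,
  "transferred out"    = function(d) d$transferred,
  "died in ED"         = function(d) d$died_in_ed,
  "no AIS recorded"    = function(d) !d$ais_recorded
)

remaining <- registry
waterfall <- data.frame(step = "registry extract", excluded = NA_integer_,
                        remaining = nrow(registry), stringsAsFactors = FALSE)

for (reason in names(rules)) {
  drop      <- rules[[reason]](remaining)
  remaining <- remaining[!drop, ]
  waterfall <- rbind(waterfall, data.frame(step = reason, excluded = sum(drop),
                                           remaining = nrow(remaining),
                                           stringsAsFactors = FALSE))
}
print(waterfall)
```

The count for each step depends on the order of the rules, so the order should be reported alongside the table. Publishing the waterfall with results makes rules such as "died in ED" visible. Those rules remove the very patients the selection-bias discussion is concerned with.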

Practical Checks for Ethical Failure Modes

Before modeling, ask:

Who is systematically missing?
Which variables fail under stress?
Where does documentation thin out?
Which patients have the shortest timelines?
What data was revised after the fact?

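```r
library(naniar)

# `data` is a placeholder for your registry extract (a data frame)
# vis_miss() plots every record and variable, shading the cells that are missing
vis_miss(data)
```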

Patterns tell stories. Ignoring them silences voices.
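`vis_miss()` shows the overall pattern. To see where documentation thins out for particular groups, it helps to tabulate missingness within subgroups. Here is a base-R sketch. It assumes a hypothetical extract with a `setting` column as the subgroup, and the values are invented for illustration.

```r
# Hypothetical extract; `setting` is the subgroup of interest
registry <- data.frame(
  setting = c("low_resource", "low_resource", "low_resource",
              "high_resource", "high_resource", "high_resource"),
  sbp     = c(NA, 100, NA, 120, 115, 130),
  gcs     = c(NA, NA, 14, 15, 13, 15),
  lactate = c(NA, NA, NA, 2.1, NA, 1.8),
  stringsAsFactors = FALSE
)

vars <- c("sbp", "gcs", "lactate")

# Proportion of missing values for each variable within each setting
miss_by_group <- aggregate(
  is.na(registry[vars]),
  by  = list(setting = registry$setting),
  FUN = mean
)
print(miss_by_group)
```

In this toy extract, `lactate` is never recorded in the low-resource setting, and `sbp` is missing for two of the three patients there. The high-resource rows are nearly complete.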

Bias Is Often a Governance Problem

Many ethical failures trace back to:

unclear data standards,
inconsistent abstraction rules,
undocumented revisions,
lack of provenance tracking.

These are governance issues, not modeling flaws.

That is why fairness work that begins only at the modeling stage is often already too late for the most important harms (Barocas et al. 2023; Suresh and Guttag 2021).

You can’t fairness-adjust your way out of governance failures.
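Provenance tracking can start small. If dated extracts of the same registry are retained, a keyed comparison surfaces which values changed after the fact. Here is a minimal sketch. It assumes two hypothetical snapshots keyed by `patient_id` with identical columns. `iss` (standing in for an Injury Severity Score) and `disposition` are illustrative names.

```r
# Hypothetical extracts of the same records taken months apart
snap_old <- data.frame(patient_id = 1:3, iss = c(9, 25, 16),
                       disposition = c("ICU", "OR", "ward"),
                       stringsAsFactors = FALSE)
snap_new <- data.frame(patient_id = 1:3, iss = c(9, 34, 16),
                       disposition = c("ICU", "OR", "ICU"),
                       stringsAsFactors = FALSE)

both   <- merge(snap_old, snap_new, by = "patient_id", suffixes = c("_old", "_new"))
fields <- setdiff(names(snap_old), "patient_id")

# One row per value that differs between snapshots (including NA <-> value changes)
revisions <- do.call(rbind, lapply(fields, function(f) {
  old <- both[[paste0(f, "_old")]]
  new <- both[[paste0(f, "_new")]]
  changed <- which(old != new | is.na(old) != is.na(new))
  if (length(changed) == 0) return(NULL)
  data.frame(patient_id = both$patient_id[changed], field = f,
             old = as.character(old[changed]), new = as.character(new[changed]),
             stringsAsFactors = FALSE)
}))
print(revisions)
```

The output lists the severity revision for patient 2 and the disposition change for patient 3. A diff like this shows that a value changed, not why. Recording the reason is the governance step.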

Where This Shows Up in AI/ML

Automation bias, the tendency to defer to algorithmic recommendations even when clinical judgment conflicts, has been documented in sepsis and deterioration alert studies. It is amplified in high-tempo military trauma settings, where cognitive load is extreme. Consider a triage decision-support tool that is wrong 15% of the time in a battalion aid station. It produces systematic errors, not random ones, because providers under pressure tend to err toward the algorithm.

Mission creep occurs when a mortality prediction model validated for ICU resource planning is repurposed for individual triage decisions. That use case carries different performance requirements and ethical stakes, and the model was never designed to meet them.

Feedback loops emerge when model outputs change clinical behavior. The changed behavior shifts the outcome distribution, which in turn degrades the model’s calibration, and no automatic mechanism detects the drift.

Each of these failure modes is predictable. Deploying a system without explicitly designing against them is therefore not an oversight but a choice.
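Feedback loops can erode calibration with no built-in alarm, so monitoring has to be designed in. One minimal check, calibration-in-the-large, compares mean predicted risk with the observed event rate in each period. The sketch simulates a hypothetical model whose predictions stay fixed while changes in practice lower the true risk after deployment. The quarterly grouping and the 0.03 alert threshold are arbitrary illustrations, not recommended values.

```r
set.seed(2024)

# Hypothetical monitoring log: 2,000 scored patients per quarter
quarters  <- c("Q1", "Q2", "Q3", "Q4")
quarter   <- rep(quarters, each = 2000)
predicted <- runif(length(quarter), 0.05, 0.35)   # model's risk estimates

# Simulated feedback: after Q2, changed practice lowers the true risk
shift    <- c(Q1 = 1.0, Q2 = 1.0, Q3 = 0.6, Q4 = 0.4)[quarter]
observed <- rbinom(length(quarter), 1, predicted * shift)

# Calibration-in-the-large per quarter: mean predicted vs. observed event rate
monitor <- data.frame(
  quarter  = quarters,
  expected = as.numeric(tapply(predicted, quarter, mean)[quarters]),
  observed = as.numeric(tapply(observed, quarter, mean)[quarters])
)
monitor$gap   <- monitor$observed - monitor$expected
monitor$alert <- abs(monitor$gap) > 0.03   # illustrative threshold only
print(monitor)
```

The later quarters trip the alert because the observed event rate falls well below what the unchanged model still predicts.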

Closing: Ethics Begins Before the First Line of Code

Bias in registry data is rarely malicious. It is structural, historical, and systemic.

But ignoring it — or pretending models can “fix” it — is an ethical choice.

Responsible analysis does not ask:

“Is the model biased?”

It asks:

“Whose reality does this data represent — and whose does it miss?”

That question belongs at the beginning of every analysis.

📚 Go Deeper: Prediction Modeling Toolkit

This post is part of the Prediction Modeling Toolkit — a companion reference with bias detection templates, registry data quality checklists, ethical failure mode analysis scaffolds, and measurement bias diagnostics.


Series Callout


This post is part of a broader Ethics in Trauma Registry Analysis Series:

Opacity Is Sometimes Ethical: When Black Boxes Save Lives
Accountability Without Interpretability: Who Owns a Model’s Decision?
Bias Isn’t Always Where You Think It Is: Ethical Failure Modes in Registry Data
Prediction vs Responsibility: Why Risk Scores Can Be Ethically Dangerous
Human-in-the-Loop Is Not a Panacea (and Sometimes a Lie)
The Ethical Implications of Excluding “Messy” Patients
Missingness as a Fairness Issue in Machine Learning
You Can’t Trust What You Don’t Track: AI Performance Monitoring in Clinical Systems
From Weeks to Minutes: The Ethics of Automating CPG Compliance
Ontology Is Not Optional: Semantic Infrastructure as Ethical Foundation
What Responsible AI in Clinical Guidance Actually Requires
Modernizing the DOD Trauma Registry: An Ethical and Technical Imperative

Series: Ethics & Philosophy of AI


References

Barocas, Solon, Moritz Hardt, and Arvind Narayanan. 2023. Fairness and Machine Learning: Limitations and Opportunities. MIT Press. https://fairmlbook.org/.

Obermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations.” Science 366 (6464): 447–53. https://doi.org/10.1126/science.aax2342.

Seyyed-Kalantari, Laleh, Haoran Zhang, Matthew B. A. McDermott, Irene Y. Chen, and Marzyeh Ghassemi. 2021. “Underdiagnosis Bias of Artificial Intelligence Algorithms Applied to Chest Radiographs in Underserved Patient Populations.” Nature Medicine 27 (12): 2176–82. https://doi.org/10.1038/s41591-021-01595-0.

Suresh, Harini, and John Guttag. 2021. “A Framework for Understanding Sources of Harm Throughout the Machine Learning Life Cycle.” Equity and Access in Algorithms, Mechanisms, and Optimization, 1–9. https://doi.org/10.1145/3465416.3483305.