Ontology Is Not Optional: Semantic Infrastructure as Ethical Foundation

Ethics in Trauma Registry Analysis

Why building a trauma registry on local codes and tribal vocabulary is an ethical failure, and how semantic infrastructure makes data interoperable, auditable, and fair.

Published: June 1, 2025

Modified: June 9, 2026

Executive Summary

Most trauma registries are built on local codes.

Institutions develop their own variable names, their own severity scales, their own definitions of key events.

The result is a landscape where:

“hemorrhagic shock” means different things at different facilities,
“prehospital blood product” is coded differently across theater and garrison,
and two registries that both claim to track traumatic brain injury cannot be directly compared.

This is not a formatting problem.

It is a knowledge problem with ethical consequences.

When registry data cannot be compared across institutions, variation in care quality becomes invisible.

When clinical concepts drift across sites and time, the harm signal — the pattern that would otherwise warn clinicians — disappears into terminological noise.

This post argues that semantic infrastructure — the vocabulary network that gives clinical data meaning — is not an optional technical refinement.

It is the ethical foundation of any registry that claims to generate generalizable knowledge.

What a Semantic/Ontology Network Is

An ontology is a formal, shared vocabulary that defines concepts, their meanings, and the relationships between them.

In clinical data, ontologies govern questions like:

What is the canonical definition of “major trauma”?
Is ICD-10 code S06.0 the same concept as the registry variable “concussion”?
Does “early blood product transfusion” mean the same thing at Level I trauma centers and forward surgical teams?

A semantic network connects clinical data to these shared concept definitions.

It means that when a trauma registry records “hemorrhagic shock,” that record points to a defined, stable concept — not a local string that may or may not correspond to what the next institution means.

Standard terminologies and data models in clinical medicine include:

SNOMED CT (clinical findings and procedures),
ICD-10-CM (diagnoses and injuries),
LOINC (laboratory and clinical observations),
and the OMOP Common Data Model, which harmonizes these vocabulary systems into a shared analytical structure (OHDSI Community 2019; Reich et al. 2024).

Local Codes Are an Ethical Problem

Local codes fragment knowledge.

When every registry speaks its own language:

multicenter studies require months of manual harmonization,
cross-institutional benchmarking becomes impossible without expensive data curation,
rare events cannot be identified because the cases are distributed across incompatible vocabularies,
and errors in local code definitions propagate silently into analyses and clinical decisions.
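The rare-event point can be illustrated with a minimal sketch. The records, site names, local strings, and concept identifiers below are all invented for illustration (none are real registry values or real OMOP or SNOMED CT concept IDs). The sketch counts the same cases twice: once by local string, and once after mapping to a shared concept.

```python
from collections import Counter

# Hypothetical records from three sites; every local string is invented.
records = [
    {"site": "A", "local_code": "HEM_SHOCK"},
    {"site": "A", "local_code": "HEM_SHOCK"},
    {"site": "B", "local_code": "shock-hemorrhagic"},
    {"site": "C", "local_code": "HS3"},
    {"site": "C", "local_code": "TBI_MILD"},
]

# Hypothetical mapping from (site, local code) to a shared concept label.
# The labels are placeholders, not real OMOP or SNOMED CT identifiers.
LOCAL_TO_SHARED = {
    ("A", "HEM_SHOCK"): "CONCEPT:hemorrhagic_shock",
    ("B", "shock-hemorrhagic"): "CONCEPT:hemorrhagic_shock",
    ("C", "HS3"): "CONCEPT:hemorrhagic_shock",
    ("C", "TBI_MILD"): "CONCEPT:concussion",
}


def count_by_local_code(rows):
    """Count cases the way a naive pooled analysis would: by raw string."""
    return Counter(row["local_code"] for row in rows)


def count_by_shared_concept(rows):
    """Count cases after translating each local code to a shared concept."""
    return Counter(
        LOCAL_TO_SHARED.get((row["site"], row["local_code"]), "UNMAPPED")
        for row in rows
    )


local_counts = count_by_local_code(records)
shared_counts = count_by_shared_concept(records)

print(local_counts)   # the hemorrhagic shock cases are split across three strings
print(shared_counts)  # the same cases are pooled under one concept
```

In this toy data no single local string appears more than twice, while the shared concept collects four cases. The hard part in practice is producing and governing the mapping table, which the sketch simply assumes exists.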

This is not merely inefficient.

It is a systematic barrier to generating the evidence that would otherwise improve care for patients who need trauma systems most.

The populations most harmed by fragmented registry data are, predictably, those with the least institutional representation:

patients treated at smaller facilities with less abstraction infrastructure,
populations with high prehospital mortality, which leaves their records incomplete,
and military personnel whose care crosses civilian-military system boundaries (Pinto Junior et al. 2023; Wilkinson et al. 2016).

Fixing this is not a database administration task.

It is an equity intervention.

OMOP CDM as Shared Semantic Infrastructure

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) provides a clinical data architecture designed around shared vocabularies rather than local codes.

Its core properties make it ethically significant:

Concept standardization: clinical events are mapped to standard concept IDs drawn from curated vocabulary systems, not stored as raw local strings.

Vocabulary governance: the vocabulary system is maintained, versioned, and updated by a community of practice — not by individual institutions.

Concept relationships: the ontology encodes which concepts are broader, narrower, or equivalent — allowing queries to be written at the right level of specificity.

Reproducibility: a query written against the OMOP CDM at one institution can, in principle, be re-run at another and produce comparable results.

This is not just technical convenience.

It is what makes multi-institutional clinical research possible without requiring each collaboration to rebuild the shared vocabulary from scratch.
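A small sketch of why concept relationships matter for queries. It uses an in-memory toy hierarchy rather than a real OMOP vocabulary; the concept IDs, names, and the `descendants` and `select_cohort` helpers are assumptions made for illustration. In an actual OMOP deployment the vocabulary tables would supply the hierarchy.

```python
# Toy concept table and "is a" edges. All IDs are placeholders,
# not real OMOP concept IDs.
CONCEPTS = {
    1: "Traumatic brain injury",
    2: "Concussion",
    3: "Diffuse axonal injury",
    4: "Hemorrhagic shock",
}
PARENT_TO_CHILDREN = {1: [2, 3]}


def descendants(concept_id):
    """Return the concept itself plus every narrower concept beneath it."""
    found = {concept_id}
    stack = [concept_id]
    while stack:
        current = stack.pop()
        for child in PARENT_TO_CHILDREN.get(current, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def select_cohort(events, concept_id):
    """Keep events coded with the concept or any of its descendants."""
    wanted = descendants(concept_id)
    return [event for event in events if event["concept_id"] in wanted]


# Hypothetical coded events from a registry.
events = [
    {"patient": "p1", "concept_id": 2},
    {"patient": "p2", "concept_id": 3},
    {"patient": "p3", "concept_id": 4},
]

# Querying at the broad level picks up both narrower injury concepts.
tbi_cohort = select_cohort(events, 1)
print([event["patient"] for event in tbi_cohort])
```

Because the query names a concept rather than a list of local strings, the same `select_cohort` call could in principle run at any site that shares the hierarchy. That is the reproducibility property described above.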

The Civilian-Military Translation Problem

The military trauma context creates a specific and difficult variant of the interoperability problem.

Military trauma registries and civilian trauma systems use overlapping but distinct terminologies.

Key concepts that require translation include:

injury mechanism (blast vs. blunt vs. penetrating categories differ),
care setting (Role 1 through Role 4 care levels have no direct civilian equivalent),
intervention timing (tactical field care windows do not map to ED arrival-to-treatment metrics),
and injury severity scoring conventions.

When a patient is evacuated from a forward surgical team to a military treatment facility and then transferred to a civilian Level I center, their clinical record crosses three terminological systems.

The semantic network that tracks that patient must be able to represent all three — not by pretending they are the same, but by making the translation explicit, governed, and auditable (Arvanitis 2014; Gazzarata et al. 2024).

That translation layer is not a technical nicety.

It is the precondition for knowing whether that patient received appropriate care across the entire continuum.
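One way to picture an explicit, governed, and auditable translation layer is as a table of rules that state how good each mapping is and who approved it. The sketch below is hypothetical throughout: the `TranslationRule` fields, the equivalence labels, the example rules, and the approver name are assumptions for illustration, not an existing standard or a clinical recommendation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranslationRule:
    source_system: str
    source_term: str
    target_system: str
    target_term: Optional[str]  # None when no equivalent exists
    equivalence: str            # "exact", "broader", or "no_equivalent"
    rule_version: str
    approved_by: str


# Illustrative rules only; real rules would come from a governance process.
RULES = [
    TranslationRule("military", "blast", "civilian", "other mechanism",
                    "broader", "0.1", "hypothetical review board"),
    TranslationRule("military", "Role 2", "civilian", None,
                    "no_equivalent", "0.1", "hypothetical review board"),
]

# Every applied rule is appended here so translations can be audited later.
audit_log = []


def translate(source_system, term, target_system):
    """Look up a governed rule; refuse to guess when none exists."""
    for rule in RULES:
        key = (rule.source_system, rule.source_term, rule.target_system)
        if key == (source_system, term, target_system):
            audit_log.append(rule)
            return rule
    raise KeyError(f"No governed rule for {source_system!r} term {term!r}")


mechanism = translate("military", "blast", "civilian")
care_setting = translate("military", "Role 2", "civilian")

print(mechanism.target_term, mechanism.equivalence)
print(care_setting.target_term, care_setting.equivalence)
```

The design choice worth noting is that a missing or lossy mapping is represented rather than hidden. A "broader" or "no_equivalent" label travels with the translated value, and an unlisted term raises an error instead of being silently coerced.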

Value-Level Metadata as Moral Clarity

One of the most important and overlooked components of semantic infrastructure is value-level metadata.

A trauma registry that records “prehospital tourniquet: yes” is storing a data element.

But what does “yes” mean?

Was the tourniquet applied before, during, or after arrival at a care facility?
By a medic, a civilian bystander, or the patient themselves?
Was application time recorded, or imputed from a fixed protocol window?
Does “yes” include improvised tourniquets, or only commercial ones?

Without value-level metadata — definitions, collection rules, and transformation logic attached to individual data values — every recorded “yes” is an ambiguous symbol.

Clinical decisions made on the basis of ambiguous symbols are not evidence-based.

They depend on coincidence: they hold only when the site's meaning of “yes” happens to match the analyst's.

The ethical obligation to define what registry variables mean is not satisfied by a data dictionary that lists variable names.

It requires meaning at the value level, governed and versioned like the vocabulary it describes (Wilkinson et al. 2016).
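As a rough sketch of what value-level metadata could look like, the tourniquet questions above can be turned into fields attached to the value “yes” itself. The `ValueDefinition` structure, its field names, and both site definitions are hypothetical; they illustrate the idea and are not a published schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ValueDefinition:
    variable: str
    value: str
    timing: str                   # when the event must occur to count
    applied_by: Tuple[str, ...]   # who may have applied the tourniquet
    includes_improvised: bool     # improvised devices counted or not
    time_source: str              # "recorded" or "imputed"
    version: str


# The attributes that determine what a recorded value actually means.
SEMANTIC_FIELDS = ("timing", "applied_by", "includes_improvised", "time_source")


def differences(a, b):
    """List the semantic fields on which two value definitions disagree."""
    return [name for name in SEMANTIC_FIELDS if getattr(a, name) != getattr(b, name)]


# Two hypothetical sites that both record "prehospital tourniquet: yes".
site_a_yes = ValueDefinition(
    variable="prehospital_tourniquet", value="yes",
    timing="before facility arrival", applied_by=("medic",),
    includes_improvised=False, time_source="recorded", version="1.0",
)
site_b_yes = ValueDefinition(
    variable="prehospital_tourniquet", value="yes",
    timing="before facility arrival",
    applied_by=("medic", "bystander", "self"),
    includes_improvised=True, time_source="imputed", version="1.0",
)

print(differences(site_a_yes, site_b_yes))
```

Here both sites store the same string, but the comparison shows three attributes on which the two “yes” values differ. A data dictionary that lists only variable names would not surface this.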

Ontology Governance Is a Political Act

The choice of what concepts to include in a shared vocabulary is not technically neutral.

Every ontology reflects:

whose clinical experience shaped the concept definitions,
whose patient populations were considered representative at development,
and whose care patterns were treated as the norm from which variation is measured.

When military trauma medicine adopts a civilian clinical ontology wholesale, military-specific concepts — tactical resuscitation, prolonged field care, damage control surgery under fire — may have no adequate representation.

When trauma registries use vocabularies developed primarily from urban academic medical center populations, rural and combat medicine presentations are systematically treated as deviations from the norm, or left out altogether.

Vocabulary governance — the process of deciding which concepts exist, how they are defined, and how they relate to each other — is an act of institutional power.

Whoever controls the ontology controls what can be seen in the data.

That is why vocabulary governance should not be delegated entirely to software vendors or international standards bodies.

It must include the clinical communities whose patients are being represented.

Semantic Infrastructure Enables Accountability

Beyond interoperability and equity, semantic infrastructure has a third ethical function:

It makes accountability traceable.

When a clinical quality measure shows that patients with a specific injury pattern are consistently not receiving a recommended intervention, the ability to investigate that finding depends on having data that is:

consistently defined across institutions,
traceable to its source records,
and comparable across time periods.

None of that is possible if the registry is built on local codes that evolve with each new analyst, each system upgrade, and each facility-specific abstraction protocol.

Semantic infrastructure is what allows a quality finding to be investigated rather than explained away by terminological ambiguity.

A Practical Checklist for Registry Semantic Design

Before claiming a trauma registry is ready for cross-institutional analytics, ask:

Are clinical variables mapped to standard vocabulary concepts, not local strings?
Is the vocabulary versioned and change-controlled?
Is there explicit documentation of concept coverage for military-specific clinical events?
Are value-level definitions documented alongside variable definitions?
Does the translation layer between civilian and military terminology exist, and is it governed?
Can the same query run unchanged at each institution using the same CDM and produce comparable results?
Who owns the vocabulary governance process, and does it include clinical domain experts?
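A few of these questions can be checked mechanically. The following is a minimal sketch, assuming a hypothetical row layout, a hypothetical version label, and an arbitrary 95% coverage threshold. It relies on the OMOP convention that a concept ID of 0 marks a record with no matching standard concept. The governance questions still need human answers.

```python
def mapping_coverage(rows):
    """Share of rows mapped to a non-zero concept ID.

    In OMOP, concept ID 0 marks a record with no matching standard concept.
    """
    if not rows:
        return 0.0
    mapped = sum(1 for row in rows if row.get("concept_id", 0) != 0)
    return mapped / len(rows)


def checklist_report(rows, vocabulary_version, value_definitions, min_coverage=0.95):
    """Answer the mechanically checkable checklist items.

    min_coverage is an arbitrary illustrative threshold, not a standard.
    """
    variables = {row["variable"] for row in rows}
    return {
        "mapped_to_standard_concepts": mapping_coverage(rows) >= min_coverage,
        "vocabulary_versioned": bool(vocabulary_version),
        "value_definitions_documented": variables <= set(value_definitions),
    }


# Hypothetical registry rows; the non-zero concept IDs are placeholders.
rows = [
    {"variable": "prehospital_tourniquet", "concept_id": 101},
    {"variable": "prehospital_tourniquet", "concept_id": 101},
    {"variable": "injury_mechanism", "concept_id": 202},
    {"variable": "injury_mechanism", "concept_id": 0},  # unmapped local string
]

report = checklist_report(
    rows,
    vocabulary_version="v2025-01",                 # hypothetical version label
    value_definitions={"prehospital_tourniquet"},  # only one variable documented
)
print(mapping_coverage(rows))
print(report)
```

In this toy case the registry fails two of the three automated checks: a quarter of its rows are unmapped, and one variable has no value-level definition.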

Where This Shows Up in AI/ML

When Department of Defense Trauma Registry (DoDTR) data is mapped to the OMOP CDM for interoperability, clinical concepts are translated through controlled vocabularies (SNOMED CT, ICD, LOINC) that were not designed for combat casualty care.

Blast injury mechanism, tourniquet application time, tactical evacuation category, and hemorrhage control method have no clean OMOP representations, so abstractors make local mapping decisions that vary across sites and are not recorded in the standardized dataset a downstream model consumer receives.

Models trained on OMOP-harmonized DoDTR data inherit these translation losses silently: the dataset looks complete and standardized, but the concepts most specific to military trauma have been flattened, approximated, or dropped.

This is an ethical issue, not just a technical one, because the losses are invisible.

A researcher building a mortality prediction model on harmonized data has no way to know that “blast injury” was mapped to a generic trauma code, losing the mechanism-specific information that drives outcome differences.

The standardization that enables interoperability also enables the confident misuse of data whose meaning has been quietly degraded.
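One hedged way to make such losses visible is to carry a mapping-quality label alongside each harmonized value and summarize it per feature before any model is trained. The `mapping_quality` field, its labels, the example rows, and the 20% review threshold are all assumptions for illustration. They are not part of the OMOP CDM or of any DoDTR export.

```python
from collections import defaultdict

# Hypothetical harmonized rows with a per-value mapping-quality label.
rows = [
    {"feature": "injury_mechanism", "source_value": "blast", "mapping_quality": "broader"},
    {"feature": "injury_mechanism", "source_value": "blunt", "mapping_quality": "exact"},
    {"feature": "hemorrhage_control", "source_value": "junctional tourniquet", "mapping_quality": "unmapped"},
    {"feature": "hemorrhage_control", "source_value": "tourniquet", "mapping_quality": "exact"},
    {"feature": "age", "source_value": "27", "mapping_quality": "exact"},
]


def lossy_share_by_feature(rows):
    """Fraction of values per feature whose mapping was not exact."""
    totals = defaultdict(int)
    lossy = defaultdict(int)
    for row in rows:
        totals[row["feature"]] += 1
        if row["mapping_quality"] != "exact":
            lossy[row["feature"]] += 1
    return {feature: lossy[feature] / totals[feature] for feature in totals}


def features_needing_review(rows, max_lossy_share=0.2):
    """Features whose lossy share exceeds an illustrative threshold."""
    shares = lossy_share_by_feature(rows)
    return sorted(f for f, share in shares.items() if share > max_lossy_share)


shares = lossy_share_by_feature(rows)
print(shares)
print(features_needing_review(rows))
```

A summary like this does not repair the flattened concepts. It only ensures that a model builder sees which features were approximated or dropped before treating the dataset as complete.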

Closing: The Vocabulary You Choose Is the Knowledge You Can Generate

A trauma registry that cannot be compared to any other registry is not a registry.

It is an institutional archive with limited scientific value and no capacity for the multicenter learning that saves lives at scale.

The semantic foundation — the shared vocabulary, the ontology network, the concept governance process — determines what questions the registry can answer.

That is not a technical specification.

It is a scientific and ethical commitment:

Choosing the vocabulary of a trauma registry is choosing the limits of the knowledge it can produce.

Those limits will be lived by patients.

Those limits should be chosen deliberately, equitably, and with full awareness of what will be invisible if the vocabulary is wrong.

📚 Go Deeper: OMOP & Interoperability Toolkit

This post is part of the OMOP & Interoperability Toolkit — a companion reference with CDM mapping templates, value-level metadata schemas, trauma-specific vocabulary extension patterns, and civilian-military concept translation frameworks.


This post is part of a broader Ethics in Trauma Registry Analysis Series:

Opacity Is Sometimes Ethical: When Black Boxes Save Lives
Accountability Without Interpretability: Who Owns a Model’s Decision?
Bias Isn’t Always Where You Think It Is: Ethical Failure Modes in Registry Data
Prediction vs Responsibility: Why Risk Scores Can Be Ethically Dangerous
Human-in-the-Loop Is Not a Panacea (and Sometimes a Lie)
The Ethical Implications of Excluding “Messy” Patients
Missingness as a Fairness Issue in Machine Learning
You Can’t Trust What You Don’t Track: AI Performance Monitoring in Clinical Systems
From Weeks to Minutes: The Ethics of Automating CPG Compliance
Ontology Is Not Optional: Semantic Infrastructure as Ethical Foundation
What Responsible AI in Clinical Guidance Actually Requires
Modernizing the DOD Trauma Registry: An Ethical and Technical Imperative

Series: Ethics & Philosophy of AI


References

Arvanitis, Theodoros N. 2014. “Interoperability in Digital Health: Global and National Initiatives.” Yearbook of Medical Informatics 9: 30–34. https://doi.org/10.15265/IY-2014-0003.

Gazzarata, Roberta, Maurizio Vergari, Cristina Napolitano, et al. 2024. “HL7 FHIR for Interoperability in Health Research: A Scoping Review.” International Journal of Medical Informatics 184: 105356. https://doi.org/10.1016/j.ijmedinf.2024.105356.

OHDSI Community. 2019. The Book of OHDSI: Observational Health Data Sciences and Informatics. OHDSI. https://ohdsi.github.io/TheBookOfOhdsi/.

Pinto Junior, Everton Pimentel, Carlos Augusto Souza Pires, Nayara Cristina Almeida Medeiros, et al. 2023. “Observational Health Data Sciences and Informatics in the Global South: Opportunities and Challenges.” Journal of the American Medical Informatics Association 30 (11): 1762–69. https://doi.org/10.1093/jamia/ocad167.

Reich, Christian, Anna Ostropolets, Patrick Ryan, et al. 2024. “OMOP Common Data Model and Standardized Vocabularies for Observational Research.” Journal of the American Medical Informatics Association 31 (3): 583–90. https://doi.org/10.1093/jamia/ocad247.

Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3: 160018. https://doi.org/10.1038/sdata.2016.18.