Prediction ≠ Causation: How to Use Each Correctly in Applied Statistics

A practical guide to separating prediction from causation in applied statistics, healthcare analytics, and clinical modeling.

Published

June 1, 2024

Modified

June 9, 2026

Executive Summary

One of the most common—and costly—mistakes in applied statistics is confusing prediction with causation.

They are not competing philosophies.
They are tools built for different purposes.

Using a predictive model to answer a causal question doesn’t make the model “less interpretable.”
It makes the conclusion wrong.

This post explains:

what each approach is actually for,
where each fails,
and how to use both responsibly in real-world analyses.

Two Questions That Sound Similar (But Aren’t)

Let’s start with two questions that often get conflated:

Can we predict who will experience this outcome?
Will changing this variable change the outcome?

The first is a prediction problem.
The second is a causal problem.

That distinction is foundational: explanatory and predictive modeling serve different purposes, and good work starts by naming the purpose correctly (Shmueli 2010; Pearl 2009).

Same data.
Very different logic.

What Prediction Models Are Designed to Do

Prediction models answer this question:

“Given everything we know right now, how well can we forecast an outcome?”

They optimize:

accuracy,
discrimination,
calibration.

They do not care why a variable works—only that it improves prediction.

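library(tidymodels)

# Elastic-net logistic regression; `outcome` must be a factor.
# The glmnet engine needs an explicit penalty. 0.01 is a placeholder: tune it in practice.
fit_pred <- logistic_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet") %>%
  fit(outcome ~ ., data = data)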

If a variable helps prediction, it stays. Even if it’s downstream, redundant, or non-actionable.

That’s a feature—not a bug.
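The three targets named above (accuracy, discrimination, calibration) can each be checked on held-out data. The sketch below uses base R only and simulated data; the variable names, coefficients, and the 50/50 split are illustrative assumptions, not part of any real registry analysis.

```r
set.seed(1)
n  <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-1 + 1.2 * x1 + 0.5 * x2))  # simulated binary outcome
dat <- data.frame(y, x1, x2)

train <- dat[1:2500, ]
test  <- dat[2501:5000, ]

fit   <- glm(y ~ x1 + x2, data = train, family = binomial)
p_hat <- predict(fit, newdata = test, type = "response")

# Accuracy: Brier score (mean squared error of predicted probabilities)
brier <- mean((p_hat - test$y)^2)

# Discrimination: AUC from the rank-sum (Mann-Whitney) formula
auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_test <- auc(p_hat, test$y)

# Calibration: slope from regressing the outcome on the logit of predicted risk.
# A slope near 1 suggests predictions are neither too extreme nor too timid.
test$lp   <- qlogis(p_hat)
cal_fit   <- glm(y ~ lp, data = test, family = binomial)
cal_slope <- unname(coef(cal_fit)["lp"])

round(c(brier = brier, auc = auc_test, cal_slope = cal_slope), 3)
```

None of these metrics asks whether x1 or x2 causes the outcome. That is the point: they score the forecast, not the mechanism.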

What Causal Models Are Designed to Do

Causal models answer this question:

“What would happen if we intervened?”

They care deeply about:

confounding,
temporality,
mechanisms,
identifiability.

They often sacrifice predictive performance to gain a valid, interpretable estimate of one specific effect.

# `confounders` is a placeholder for the adjustment variables you have justified.
fit_causal <- lm(outcome ~ treatment + confounders, data = data)

A causal model that predicts poorly can still be correct.
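A small simulation makes this concrete. It is a sketch under strong assumptions: treatment is randomized (so there is no confounding by construction), the true effect is set to 1, and the outcome is very noisy. All names and numbers are invented for illustration.

```r
set.seed(2)
n <- 20000
treatment <- rbinom(n, 1, 0.5)                    # randomized by construction
outcome   <- 1.0 * treatment + rnorm(n, sd = 5)   # true effect = 1, heavy noise

fit <- lm(outcome ~ treatment)

est <- unname(coef(fit)["treatment"])  # should land close to the true effect of 1
r2  <- summary(fit)$r.squared          # should be tiny: the model predicts poorly

round(c(estimate = est, r_squared = r2), 3)
```

The effect estimate is trustworthy even though almost none of the outcome variance is explained. Low R-squared describes the noise in individual outcomes, not the validity of the estimated effect.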

Why High Predictive Performance Proves Nothing Causal

A model can:

predict mortality extremely well,
include variables like lactate, shock index, or transfusion,
and still tell you nothing about what to change.

Why?

Because many predictors are:

consequences, not causes,
proxies for severity,
signals of downstream processes.

Prediction answers who. Causation answers what to do.
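Here is a minimal sketch of a predictor that is a consequence rather than a cause. It assumes a simulated world in which an unobserved severity variable drives both lactate and death, and lactate has no effect of its own. The coefficients and the one-unit "intervention" are hypothetical.

```r
set.seed(3)
n <- 10000
severity <- rnorm(n)                                    # unobserved driver
lactate  <- severity + rnorm(n, sd = 0.5)               # marker of severity, not a cause
death    <- rbinom(n, 1, plogis(-2 + 1.5 * severity))   # depends on severity only

# Lactate is a strong predictor of death
fit <- glm(death ~ lactate, family = binomial)
coef_lactate <- unname(coef(fit)["lactate"])

# What the predictive model "promises" if lactate were lowered by 1 unit
risk_model_before <- mean(predict(fit, type = "response"))
risk_model_after  <- mean(predict(fit,
                                  newdata = data.frame(lactate = lactate - 1),
                                  type = "response"))

# What happens in the simulated world: lowering lactate leaves severity untouched,
# so outcomes are regenerated from the same severity values
death_do <- rbinom(n, 1, plogis(-2 + 1.5 * severity))

round(c(coef_lactate    = coef_lactate,
        model_before    = risk_model_before,
        model_after     = risk_model_after,
        observed_before = mean(death),
        observed_after  = mean(death_do)), 3)
```

The model forecasts a large drop in mortality; the simulated intervention produces essentially none. The predictor was a good flag for who is at risk and a useless lever.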

The Most Common (and Dangerous) Error

This mistake shows up everywhere:

“Variable X is important in the model, therefore changing X will change the outcome.”

This is false.

Variable importance ≠ causal effect.

Elastic net, random forests, and boosting are especially prone to this misuse because they report coefficients or “importance” rankings that carry no causal meaning.

DAGs: The Simplest Way to Avoid Confusion

Before modeling, ask one question:

“What causes what?”

Directed acyclic graphs (DAGs) force you to:

define assumptions,
separate causes from consequences,
and avoid adjusting for the wrong variables.

They do not prove causation, but they make the assumed causal structure explicit, which is often the first protection against analytic confusion (Greenland et al. 1999; Hernán and Robins 2020).

library(dagitty)  # Textor et al. 2016

dag <- dagitty("
dag {
  Injury -> Severity
  Severity -> Outcome
  Treatment -> Outcome
  Injury -> Treatment
}
")

No math. Just clarity.
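Once the diagram is written down, dagitty can also report which variables would need adjustment under it. The sketch below repeats the same DAG so it runs on its own, then uses the package's adjustmentSets() and isAdjustmentSet() functions. Treating Treatment as the exposure and Outcome as the outcome is an assumption about how this diagram would be used.

```r
library(dagitty)

dag <- dagitty("
dag {
  Injury -> Severity
  Severity -> Outcome
  Treatment -> Outcome
  Injury -> Treatment
}
")

# Minimal sets of variables that, if this DAG is correct, block every
# backdoor path from Treatment to Outcome
adj_sets <- adjustmentSets(dag, exposure = "Treatment", outcome = "Outcome")
print(adj_sets)

# Check one candidate set directly
ok_injury <- isAdjustmentSet(dag, "Injury",
                             exposure = "Treatment", outcome = "Outcome")
ok_injury
```

In this diagram the only backdoor path runs Treatment <- Injury -> Severity -> Outcome, so adjusting for either Injury or Severity closes it. The answer is only as good as the arrows you drew: a different assumed DAG gives a different adjustment set.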

When Prediction Is the Right Tool

Use prediction when:

the goal is early warning,
triage or risk stratification,
resource allocation,
decision support under uncertainty.

Examples:

hemorrhage risk scores,
ICU deterioration alerts,
readmission risk.

In these cases, asking why may be secondary to how well.

When Causation Is the Right Tool

Use causal methods when:

evaluating interventions,
informing policy,
changing practice,
comparing strategies.

Examples:

does early intervention reduce mortality?
does protocol A outperform protocol B?
does timing matter?

Prediction models cannot answer these safely.

Why “Interpretable ML” Doesn’t Fix the Problem

Making a predictive model interpretable does not make it causal.

SHAP values (Lundberg and Lee 2017), partial dependence plots, and coefficients explain:

how the model behaves,
not how the world works.

Interpretability ≠ causality.

This distinction matters.
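A coefficient can describe the model faithfully and still mislead about the world. In the sketch below, a simulated treatment works entirely through a mediator. The structure, names, and effect sizes are assumptions chosen to make the contrast visible.

```r
set.seed(4)
n <- 20000
treatment <- rbinom(n, 1, 0.5)            # randomized for simplicity
mediator  <- 2 * treatment + rnorm(n)     # treatment acts through the mediator
outcome   <- 1.5 * mediator + rnorm(n)    # no direct treatment effect; total effect = 3

# Predictive-style model: include everything that helps prediction
fit_all <- lm(outcome ~ treatment + mediator)

# Model aimed at the total effect: do not adjust for the mediator
fit_total <- lm(outcome ~ treatment)

coef_all   <- unname(coef(fit_all)["treatment"])    # should be near 0
coef_total <- unname(coef(fit_total)["treatment"])  # should be near 3

round(c(with_mediator = coef_all, total_effect = coef_total), 3)
```

The first coefficient is a correct description of how that model uses treatment once the mediator is known. Read as "treatment does nothing," it is wrong. No interpretability tool can tell you which of the two models matches your question.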

The Right Way to Combine Prediction and Causation

The strongest applied workflows use both:

Prediction to identify high-risk populations
Causal analysis to evaluate interventions within them

This avoids:

treating risk factors as levers,
deploying models that can’t guide action.
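The two-step workflow can be sketched in a few lines. This is a simulation, not a template for a real analysis: it assumes severity and age are the only confounders, that both are measured before treatment, and that the top risk quartile is a sensible definition of "high risk." Every name and number here is hypothetical.

```r
set.seed(5)
n <- 20000
severity  <- rnorm(n)
age       <- rnorm(n)
treatment <- rbinom(n, 1, plogis(0.8 * severity))   # sicker patients treated more often
death     <- rbinom(n, 1,
                    plogis(-2 + 1.2 * severity + 0.4 * age - 0.7 * treatment))
dat <- data.frame(death, treatment, severity, age)

# Step 1 (prediction): baseline risk from pre-treatment covariates only
risk_fit  <- glm(death ~ severity + age, data = dat, family = binomial)
dat$risk  <- predict(risk_fit, type = "response")
high_risk <- dat[dat$risk >= quantile(dat$risk, 0.75), ]

# Step 2 (causal): treatment effect within the high-risk group,
# adjusting for the confounders assumed above
effect_fit <- glm(death ~ treatment + severity + age,
                  data = high_risk, family = binomial)
log_or <- unname(coef(effect_fit)["treatment"])  # true value in this simulation: -0.7

round(c(n_high_risk = nrow(high_risk), log_odds_ratio = log_or), 3)
```

The risk model decides where to look. The adjusted model, with its stated assumptions, decides what the intervention appears to do there. Neither step substitutes for the other.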

A Practical Decision Tree

Ask yourself:

Am I trying to forecast? → Prediction
Am I trying to intervene? → Causation
Am I ranking risk? → Prediction
Am I changing practice? → Causation

If you can’t answer this clearly, stop and reset.

How to Explain This to Non-Statisticians

Try this:

“Prediction tells us who is likely to have a bad outcome. Causation tells us what might prevent it. Confusing the two leads to confident but wrong decisions.”

That usually lands.

Where This Shows Up in AI/ML

The intervention fallacy (using a predictive model’s coefficient or feature importance to justify a clinical intervention) is one of the most dangerous misapplications of clinical AI in trauma care. A model trained on the Department of Defense Trauma Registry (DoDTR) that identifies “time to OR” as a strong predictor of mortality does not establish that reducing time to OR will reduce mortality by the predicted amount. The association may reflect injury severity channeling the sickest patients to the fastest surgery, not a causal pathway the model can identify. Deploying a MAVEN alert that nudges surgeons toward faster OR access based on this association could harm patients if the true mechanism is injury acuity, not surgical delay. Predictive models answer “who is at risk”; only causal models, backed by trial data or rigorous quasi-experimental analysis, can answer “what should we do differently.”
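The channeling mechanism described above is easy to reproduce in a simulation. The sketch below is not registry data. It assumes a world in which acuity drives both time to OR and mortality, and time to OR has zero causal effect by construction. The variable names, units, and coefficients are invented.

```r
set.seed(6)
n <- 20000
acuity     <- rnorm(n)                                  # injury acuity
time_to_or <- 60 - 15 * acuity + rnorm(n, sd = 10)      # sicker patients reach the OR faster (minutes)
death      <- rbinom(n, 1, plogis(-2 + 1.5 * acuity))   # time_to_or plays no role

# Predictive view: time to OR alone is strongly associated with death
naive <- glm(death ~ time_to_or, family = binomial)

# Adjusting for acuity, the association should largely vanish
adjusted <- glm(death ~ time_to_or + acuity, family = binomial)

b_naive    <- unname(coef(naive)["time_to_or"])
b_adjusted <- unname(coef(adjusted)["time_to_or"])

round(c(naive = b_naive, adjusted = b_adjusted), 4)
```

In this simulated world the unadjusted coefficient is clearly nonzero, and its sign reflects who gets fast surgery rather than what fast surgery does. An alert built on that number would be acting on acuity in disguise. Real data are harder still, because acuity is never measured this cleanly.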

Closing: Different Tools, Different Responsibilities

Prediction models are powerful—and dangerous—when misused.

Causal models are fragile—and essential—when decisions change systems.

Applied statistics isn’t about choosing sides. It’s about choosing the right tool for the right question.

If you don’t know which question you’re answering, the model won’t save you.

📚 Go Deeper: Causal Inference Toolkit

This post is part of the Causal Inference Toolkit — a companion reference with DAG templates, collider bias examples, and reviewer-safe language for distinguishing prediction from causal claims.

Series Callout

This post is part of a broader Trauma Registry and Other Topics Series:

Why Most Clinical Models Fail in the Real World (and How to Fix Them in R)
Audit-Ready Applied Statistics: How to Make Your R Analysis Defensible
Bayesian Models for Clinicians Who Hate Math (But Love Good Decisions)
Missing Data Is the Real Model: Practical Strategies in R
From Registry to Knowledge: How to Analyze Messy Trauma Data Without Lying to Yourself
Why Statistical Significance Is a Terrible Stopping Rule
Hierarchical Models Are Not Optional in Healthcare (Here’s Why)
Prediction ≠ Causation: How to Use Each Correctly in Applied Statistics
How to Evaluate Models When the Outcome Is Rare (and Lives Are at Stake)
Building Clinical Decision Support That Doesn’t Collapse Under Scrutiny
Rare Event Modeling in Clinical Prediction: Why 1% Outcomes Break Your Model (And What to Do in R)
Calibration Under Drift: How Clinical Models Become Confident and Wrong (And How to Monitor It in R)
Audit-Ready Bayesian Workflows: Why Transparency Is a Process, Not a Model Feature
Missing Data in Hierarchical Clinical Models: Why Structure Changes the Problem
MNAR Sensitivity Analysis for Applied Work: What to Do When Missingness Depends on Reality

References

Greenland, Sander, Judea Pearl, and James M. Robins. 1999. “Causal Diagrams for Epidemiologic Research.” Epidemiology 10 (1): 37–48. https://doi.org/10.1097/00001648-199901000-00008.

Hernán, Miguel A., and James M. Robins. 2020. Causal Inference: What If. Chapman & Hall/CRC.

Lundberg, Scott M., and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” Advances in Neural Information Processing Systems 30: 4765–74.

Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press.

Shmueli, Galit. 2010. “To Explain or to Predict?” Statistical Science 25 (3): 289–310. https://doi.org/10.1214/10-STS330.