From Correlation to Causation: Rubin’s Framework for AI Insights

Advanced Statistics
Causal Inference
An applied introduction to potential outcomes, causal estimands, and the logic of causal inference for AI, healthcare, and observational data.
Published: November 15, 2025

Modified: June 9, 2026

## Executive Summary

One of the biggest mistakes in data analysis is to assume that a strong association automatically implies a causal effect (Rubin 1974; Hernán and Robins 2020; Pearl 2009).

It does not.

A treatment can look beneficial because healthier people were more likely to receive it. A recommendation system can appear effective because engaged users were already more likely to click. A policy can look successful because the treated group differed systematically from the untreated group before the intervention began.

This is why causal inference matters.

Causal inference asks a different kind of question from prediction:

  • not only what tends to happen,
  • but what would happen if we intervened differently.

One of the most influential frameworks for this is the Rubin Causal Model, also called the potential outcomes framework.

This post introduces:

  • potential outcomes,
  • counterfactual reasoning,
  • average treatment effects,
  • randomized versus observational data,
  • and the difference between association and causation.

Along the way, I use a simple A/B test versus observational-bias example to show why causal reasoning is essential in both statistics and AI.

Causal inference matters because the hardest question in data science is often not what is associated with an outcome, but what would change the outcome if we acted differently.

## Causation Requires More Than Association

Most statistical and machine learning models are fundamentally associational.

They can answer questions like:

  • which variables predict the outcome?
  • which features are most informative?
  • how accurately can we classify or forecast?

Those are useful questions.

But causal inference asks something different:

what is the effect of changing an exposure, treatment, or action?

This is a fundamentally counterfactual question.

It is not about what happened under the observed world alone. It is about what would have happened under an alternative world in which the treatment assignment changed.

That is why causal inference needs its own conceptual framework.

## The Rubin Causal Model Starts with Potential Outcomes

The Rubin Causal Model formalizes causal effects using potential outcomes.

For each unit, imagine two possible outcomes:

  • $Y(1)$: the outcome if the unit receives treatment
  • $Y(0)$: the outcome if the unit does not receive treatment

The individual causal effect is:

$$ Y(1) - Y(0) $$

This is conceptually elegant, but it creates a problem.

For any one person, we can only observe one of these outcomes:

  • the treated outcome,
  • or the untreated outcome,

but not both at once.

This is often called the fundamental problem of causal inference.

We never directly observe the individual-level counterfactual.

## Counterfactuals Are Central Even Though We Cannot Observe Them Directly

The term counterfactual refers to the unobserved potential outcome:

  • what would have happened under the treatment state that did not occur.

For a treated patient, the untreated outcome is counterfactual. For an untreated patient, the treated outcome is counterfactual.

This is why causal inference is hard.

The target of interest is always partly hidden.
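
To make the hidden half concrete, here is a minimal sketch in base R. The five units and their outcome values are invented purely for illustration; the point is only that each row of the data an analyst actually sees has exactly one missing potential outcome.

```r
# Hypothetical five-unit example; all values are made up for illustration.
units <- data.frame(
  id        = 1:5,
  treatment = c(1, 0, 1, 0, 0),
  y0        = c(48, 52, 47, 55, 50),  # potential outcome without treatment
  y1        = c(51, 55, 50, 58, 53)   # potential outcome with treatment
)

# What an analyst actually sees: only the potential outcome that matches
# the treatment received. The other one is missing (NA).
observed <- data.frame(
  id        = units$id,
  treatment = units$treatment,
  y0_seen   = ifelse(units$treatment == 0, units$y0, NA),
  y1_seen   = ifelse(units$treatment == 1, units$y1, NA)
)
observed$outcome <- ifelse(observed$treatment == 1,
                           observed$y1_seen, observed$y0_seen)

print(observed)

# Count the missing potential outcomes in each row.
missing_per_row <- rowSums(is.na(observed[, c("y0_seen", "y1_seen")]))
```

In real data the `units` table never exists; only something like `observed` does.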

Good causal design and analysis therefore try to create situations where the observed outcomes of one group can serve as a credible stand-in for the unobserved counterfactual outcomes of another group.

That is the logic behind randomization, matching, weighting, and other causal adjustment methods.

## Average Treatment Effects Shift the Focus from Individuals to Populations

Because individual causal effects are not directly observable, analysts often focus on population-level averages.

The most common target is the Average Treatment Effect (ATE):

$$ \text{ATE} = E[Y(1) - Y(0)] $$

This is the average causal effect across the population.

Other important estimands include:

  • ATT — Average Treatment Effect on the Treated
  • ATC — Average Treatment Effect on the Controls

These distinctions matter because the relevant population may differ depending on the scientific or policy question.
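
Written in the same notation, and using $T$ for the treatment indicator (a symbol assumed here for illustration), the two conditional estimands can be sketched as:

$$ \text{ATT} = E[Y(1) - Y(0) \mid T = 1], \qquad \text{ATC} = E[Y(1) - Y(0) \mid T = 0] $$

They are tied to the ATE by the law of total expectation:

$$ \text{ATE} = P(T = 1)\,\text{ATT} + P(T = 0)\,\text{ATC} $$

When the treatment effect is the same constant for every unit, all three coincide. They can diverge when effects vary with characteristics that also influence who gets treated.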

Still, the ATE is the most common starting point and one of the clearest ways to introduce the framework.

## A Randomized A/B Test Is the Cleanest Causal Design

Randomized experiments are valuable because randomization makes treatment assignment independent of the potential outcomes by design. Covariate balance then holds in expectation, although any single sample can still show chance imbalance.

That means treated and untreated groups should be comparable except for random variation.

In the potential outcomes language, randomization helps justify using:

  • the observed outcomes in the control group as a stand-in for
  • the missing untreated outcomes of the treated group,

and vice versa.

This is why A/B tests are often the gold standard for causal estimation.
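
One way to spell out the stand-in argument, writing $T$ for the treatment indicator and $Y$ for the observed outcome (notation assumed here), is the following chain. The first equality uses consistency, meaning the observed outcome equals the potential outcome for the treatment actually received. The second uses the independence of $T$ from $(Y(0), Y(1))$ that randomization provides.

$$ E[Y \mid T = 1] - E[Y \mid T = 0] = E[Y(1) \mid T = 1] - E[Y(0) \mid T = 0] = E[Y(1)] - E[Y(0)] = \text{ATE} $$

Without independence the second step fails, which is the situation observational data create.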

They do not make causal inference trivial, but they make the key identification assumptions much more plausible.

## A Simple Randomized Trial Example Makes the Idea Concrete

Let us simulate a simple randomized experiment.

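# Packages used in this post; tidyr is called below via tidyr:: and must be installed
library(dplyr)
library(tibble)
library(ggplot2)

# No random seed is set, so exact numbers will vary from run to run
n <- 500

rct_df <- tibble::tibble(
  id = 1:n,
  baseline_risk = rnorm(n, mean = 0, sd = 1),
  # Randomized assignment: a fair coin flip, unrelated to baseline risk
  treatment = rbinom(n, size = 1, prob = 0.5)
) |>
  dplyr::mutate(
    # Potential outcome without treatment: lower at higher baseline risk
    y0 = 50 - 4 * baseline_risk + rnorm(n, 0, 4),
    # Potential outcome with treatment: a constant +3 effect for every unit
    y1 = y0 + 3,
    # Only the potential outcome matching the assigned arm is observed
    outcome = if_else(treatment == 1, y1, y0)
  )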

In this simulation, we actually know both potential outcomes because we constructed them.

That allows us to compare the true ATE to what we would estimate from the randomized data.

true_ate <- mean(rct_df$y1 - rct_df$y0)

estimated_ate_rct <- rct_df |>
  dplyr::group_by(treatment) |>
  dplyr::summarise(
    mean_outcome = mean(outcome),
    .groups = "drop"
  ) |>
  tidyr::pivot_wider(
    names_from = treatment,
    values_from = mean_outcome
  ) |>
  dplyr::mutate(
    estimated_ate = `1` - `0`
  ) |>
  dplyr::pull(estimated_ate)

tibble::tibble(
  true_ate = true_ate,
  estimated_ate_rct = estimated_ate_rct
)
# A tibble: 1 × 2
  true_ate estimated_ate_rct
     <dbl>             <dbl>
1        3              3.28
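
A single point estimate says nothing about sampling variability. As a hedged sketch, regressing the outcome on the treatment indicator reproduces the difference in means and adds a confidence interval. The block below re-simulates the same design in base R so that it runs on its own; the seed and the names `rct_sim`, `fit_rct`, and `ci_95` are arbitrary choices, so its numbers will not match the output printed above.

```r
set.seed(2025)  # arbitrary seed, used only to make this sketch repeatable

# Same data-generating process as the randomized example, in base R.
n <- 500
baseline_risk <- rnorm(n)
treatment <- rbinom(n, size = 1, prob = 0.5)
y0 <- 50 - 4 * baseline_risk + rnorm(n, 0, 4)
y1 <- y0 + 3
outcome <- ifelse(treatment == 1, y1, y0)
rct_sim <- data.frame(treatment, outcome)

# With a single binary regressor, the slope is the difference in group means.
fit_rct <- lm(outcome ~ treatment, data = rct_sim)
diff_in_means <- unname(coef(fit_rct)["treatment"])

# 95% confidence interval for that difference.
ci_95 <- confint(fit_rct, "treatment", level = 0.95)

diff_in_means
ci_95
```

The interval is what lets a reader judge whether a gap such as 3.28 versus 3 is within ordinary sampling noise.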

Because treatment was randomized, the simple difference in means (3.28 in this run) lands close to the true effect of 3, and the remaining gap is sampling noise.

## Observational Data Break the Easy Comparison

Now consider a nonrandomized setting.

Suppose treatment is more likely to be given to lower-risk individuals. Then the treated group may already have better outcomes even before treatment has any effect.

This creates confounding.

In observational data, the treated and untreated groups may differ systematically in ways that affect the outcome.

That means the raw difference in means may mix together:

  • the treatment effect,
  • and preexisting group differences.
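
This mixing can be written as an exact decomposition. Using $T$ for the treatment indicator and $Y$ for the observed outcome (notation assumed here), and assuming consistency, adding and subtracting $E[Y(0) \mid T = 1]$ gives:

$$ E[Y \mid T = 1] - E[Y \mid T = 0] = \underbrace{E[Y(1) - Y(0) \mid T = 1]}_{\text{ATT}} + \underbrace{E[Y(0) \mid T = 1] - E[Y(0) \mid T = 0]}_{\text{selection bias}} $$

The second term is the difference the two groups would show even if nobody were treated. Randomization makes it zero in expectation; observational assignment generally does not.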

This is where causal inference becomes much harder.

## A Simulated Observational Example Shows the Bias Clearly

Let us simulate treatment assignment that depends on baseline risk.

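obs_df <- tibble::tibble(
  id = 1:n,
  baseline_risk = rnorm(n, mean = 0, sd = 1)
) |>
  dplyr::mutate(
    # Assignment now depends on baseline risk: lower risk means a higher chance of treatment
    treat_prob = plogis(-0.8 - 1.0 * baseline_risk),
    treatment = rbinom(n, size = 1, prob = treat_prob),
    # Same outcome model and same constant +3 effect as in the randomized example
    y0 = 50 - 4 * baseline_risk + rnorm(n, 0, 4),
    y1 = y0 + 3,
    outcome = if_else(treatment == 1, y1, y0)
  )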

Now compare the true ATE with the naïve observational estimate.

true_ate_obs <- mean(obs_df$y1 - obs_df$y0)

naive_ate_obs <- obs_df |>
  dplyr::group_by(treatment) |>
  dplyr::summarise(
    mean_outcome = mean(outcome),
    .groups = "drop"
  ) |>
  tidyr::pivot_wider(
    names_from = treatment,
    values_from = mean_outcome
  ) |>
  dplyr::mutate(
    naive_ate = `1` - `0`
  ) |>
  dplyr::pull(naive_ate)

tibble::tibble(
  true_ate = true_ate_obs,
  naive_observational_estimate = naive_ate_obs
)
# A tibble: 1 × 2
  true_ate naive_observational_estimate
     <dbl>                        <dbl>
1        3                         6.22

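Because lower-risk people were more likely to get treatment, and lower risk means a higher outcome in this simulation, the treated group starts out ahead. The naïve estimate of 6.22 is roughly double the true effect of 3.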
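
Because the simulation exposes both potential outcomes, the bias can be split into its parts. The sketch below re-simulates the same confounded design in base R (the seed and variable names are arbitrary, so the numbers will differ from the output above) and checks that the naïve difference equals the effect among the treated plus a selection term computed from `y0` alone.

```r
set.seed(42)  # arbitrary seed, used only to make this sketch repeatable

# Same confounded data-generating process as the observational example.
n <- 500
baseline_risk <- rnorm(n)
treatment <- rbinom(n, 1, plogis(-0.8 - 1.0 * baseline_risk))
y0 <- 50 - 4 * baseline_risk + rnorm(n, 0, 4)
y1 <- y0 + 3
outcome <- ifelse(treatment == 1, y1, y0)

treated <- treatment == 1

# Naive comparison of observed group means.
naive <- mean(outcome[treated]) - mean(outcome[!treated])

# Average effect among the treated (knowable only because we simulated y0 and y1).
att <- mean(y1[treated] - y0[treated])

# How different the groups would be with no treatment at all.
selection_bias <- mean(y0[treated]) - mean(y0[!treated])

c(naive = naive, att = att, selection_bias = selection_bias)
```

In the sample, `naive` equals `att + selection_bias` exactly. The selection term is what real observational data never let us compute directly.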

This is the central associational-versus-causal problem.

## The Difference Between Randomized and Observational Data Is the Assignment Mechanism

A key insight from causal inference is that the difference between an experiment and an observational study is not only the data structure.

It is the assignment mechanism.

In a randomized trial, treatment assignment is externally controlled and designed to be independent of potential outcomes.

In an observational study, treatment assignment is typically influenced by:

  • patient characteristics,
  • clinician decisions,
  • access,
  • preferences,
  • severity,
  • or institutional processes.

That means the assignment mechanism itself becomes part of the modeling problem.

This is one reason causal inference always requires thinking about design, not only estimation.

## Confounding Is What Blurs Correlation into False Causation

A confounder is a variable that affects both:

  • treatment assignment,
  • and the outcome.

If confounders are not properly accounted for, the estimated effect of treatment may be biased.

In the simulated observational example:

  • baseline risk affects treatment probability,
  • and baseline risk affects the outcome.

That makes it a confounder.

This is why the crude difference between treated and untreated groups does not isolate the treatment effect.
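
A common diagnostic for this kind of imbalance is the standardized mean difference of a covariate between groups. The sketch below is a minimal base R version under the same simulated design; the helper `smd()` is a hypothetical function written for this example, not part of any package, and the seed is arbitrary.

```r
set.seed(7)  # arbitrary seed, used only to make this sketch repeatable

n <- 2000
baseline_risk <- rnorm(n)

# Two assignment mechanisms applied to the same covariate.
t_random     <- rbinom(n, 1, 0.5)
t_confounded <- rbinom(n, 1, plogis(-0.8 - 1.0 * baseline_risk))

# Hypothetical helper: difference in group means scaled by a pooled SD.
smd <- function(x, t) {
  m1 <- mean(x[t == 1])
  m0 <- mean(x[t == 0])
  s_pooled <- sqrt((var(x[t == 1]) + var(x[t == 0])) / 2)
  (m1 - m0) / s_pooled
}

smd_random     <- smd(baseline_risk, t_random)
smd_confounded <- smd(baseline_risk, t_confounded)

c(randomized = smd_random, confounded = smd_confounded)
```

Under randomization the value should sit near zero. Under the confounded mechanism it is strongly negative, because treated units have lower baseline risk on average. A frequently cited rule of thumb treats absolute values above roughly 0.1 as worth attention, though that threshold is a convention rather than a law.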

It combines treatment with imbalance.

## Adjustment Tries to Reconstruct a Fair Comparison

In observational causal inference, the goal is often to adjust for confounding so that the treated and untreated groups become more comparable.

Common approaches include:

  • regression adjustment,
  • matching,
  • propensity scores,
  • inverse probability weighting,
  • and doubly robust methods.

At a high level, these methods all try to answer the same question:

can we use the observed data to create a more credible stand-in for the missing counterfactual?
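
As one illustration of that shared logic, here is a minimal sketch of inverse probability weighting in base R. It assumes the propensity model is correctly specified as a logistic regression on `baseline_risk`, which holds here only because the data are simulated that way. The seed, sample size, and variable names are arbitrary choices for the example.

```r
set.seed(11)  # arbitrary seed, used only to make this sketch repeatable

# Confounded design from the observational example, with a larger n
# so the weighted estimate is less noisy.
n <- 5000
baseline_risk <- rnorm(n)
treatment <- rbinom(n, 1, plogis(-0.8 - 1.0 * baseline_risk))
y0 <- 50 - 4 * baseline_risk + rnorm(n, 0, 4)
outcome <- ifelse(treatment == 1, y0 + 3, y0)

# Step 1: estimate each unit's probability of treatment (propensity score).
ps_fit <- glm(treatment ~ baseline_risk, family = binomial())
ps <- fitted(ps_fit)

# Step 2: weight each unit by the inverse probability of the arm it received.
w <- ifelse(treatment == 1, 1 / ps, 1 / (1 - ps))

# Step 3: compare weighted group means (normalized weights).
ipw_ate <- weighted.mean(outcome[treatment == 1], w[treatment == 1]) -
  weighted.mean(outcome[treatment == 0], w[treatment == 0])

naive <- mean(outcome[treatment == 1]) - mean(outcome[treatment == 0])

c(naive = naive, ipw = ipw_ate)
```

Weighting up-weights the treated units who looked unlikely to be treated, and the untreated units who looked likely to be, so each weighted group resembles the full population on baseline risk.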

The details differ, but the logic is shared.

## A Simple Regression Adjustment Example Helps Show the Idea

For the observational example, we can adjust for baseline risk using regression.

fit_adj <- lm(outcome ~ treatment + baseline_risk, data = obs_df)

summary(fit_adj)$coefficients
               Estimate Std. Error   t value     Pr(>|t|)
(Intercept)   50.097076  0.2238028 223.84475 0.000000e+00
treatment      3.331767  0.3937226   8.46222 2.957606e-16
baseline_risk -3.685127  0.1860448 -19.80774 8.522505e-65

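The treatment coefficient, about 3.33, is now close to the true effect of 3, whereas the naïve difference in means was 6.22. Conditioning on baseline risk removes the confounding here because baseline risk is the only confounder in the simulation and the outcome model is linear with a constant treatment effect.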

This is not the final word in causal analysis, but it illustrates the idea that once confounding is recognized, raw group differences are not enough.
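
A closely related approach, often called standardization or g-computation, makes the link to potential outcomes more explicit: fit an outcome model, predict every unit's outcome under treatment and under control, and average the difference. The sketch below is a minimal base R version that assumes the linear outcome model is correct, which is true here only by construction. The seed and object names are arbitrary.

```r
set.seed(21)  # arbitrary seed, used only to make this sketch repeatable

n <- 2000
baseline_risk <- rnorm(n)
treatment <- rbinom(n, 1, plogis(-0.8 - 1.0 * baseline_risk))
y0 <- 50 - 4 * baseline_risk + rnorm(n, 0, 4)
outcome <- ifelse(treatment == 1, y0 + 3, y0)
sim <- data.frame(outcome, treatment, baseline_risk)

# Outcome model; the interaction lets the effect vary with baseline risk.
fit <- lm(outcome ~ treatment * baseline_risk, data = sim)

# Predict each unit's outcome with treatment set to 1, then to 0.
pred_treated <- predict(fit, newdata = transform(sim, treatment = 1))
pred_control <- predict(fit, newdata = transform(sim, treatment = 0))

# Average the unit-level predicted contrasts to estimate the ATE.
gcomp_ate <- mean(pred_treated - pred_control)
gcomp_ate
```

The two prediction vectors play the role of model-based stand-ins for $Y(1)$ and $Y(0)$; they are only as good as the outcome model and the no-unmeasured-confounding assumption behind it.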

We need a comparison that is more conditionally fair.

## The Potential Outcomes Framework Clarifies What the Model Is Trying to Recover

One of the strengths of the Rubin Causal Model is that it makes the causal target explicit.

Instead of saying vaguely:

  • “we want the effect of treatment,”

it says more precisely:

  • “we want the contrast between potential outcomes under treatment and no treatment.”

That is helpful because it forces the analyst to define:

  • what the treatment is,
  • what the outcome is,
  • what the target population is,
  • and which causal estimand is being pursued.

This clarity is one reason the potential outcomes framework became so influential in both statistics and applied causal ML.

## Do-Calculus and Intervention Thinking Point in the Same Direction

Pearl's do-notation expresses the same distinction in a different language, so it is useful to connect it to Rubin's framework without leaving that framework behind.

The intervention notation:

$$ E[Y \mid \operatorname{do}(T = 1)] $$

means the expected outcome if we were to intervene and set treatment to 1.

This is different from:

$$ E[Y \mid T = 1] $$

which is only the average outcome among those who happened to receive treatment.

That distinction is exactly the correlation-versus-causation divide.

Even though the Rubin Causal Model and Pearl-style intervention notation come from different traditions, they overlap strongly in this core idea:

  • observational conditioning is not the same as causal intervention.
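
The two notations can be linked in one line. Suppose a set of measured covariates $X$ (a symbol assumed here) is sufficient to control confounding, which is conditional exchangeability in Rubin's language and the backdoor criterion in Pearl's, and suppose consistency and positivity hold. Then, under the usual correspondence between the frameworks, for each treatment level $t$:

$$ E[Y \mid \operatorname{do}(T = t)] = E[Y(t)] = E_X\big[\, E[Y \mid T = t, X] \,\big] $$

The right-hand side averages the conditional mean over the whole population's distribution of $X$, not over the distribution of $X$ among those who happened to receive $t$. That difference in weighting is what separates it from $E[Y \mid T = t]$.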

That is one of the most important conceptual bridges in modern causal inference.

## Recommendation Systems and Policy AI Need Causal Thinking Too

Causal inference is not limited to medicine.

It matters in AI/ML settings such as:

  • recommendation systems,
  • ad targeting,
  • pricing,
  • policy evaluation,
  • and platform interventions.

For example, if a recommender shows a product to users already likely to click, then click-through association does not necessarily reveal the causal effect of showing the product.

Similarly, in policy AI, a program’s recipients may differ systematically from nonrecipients.

This is why predictive accuracy is not enough in intervention-focused systems. The model must also separate selection effects from treatment effects.

## A/B Testing Is the Clean Causal Benchmark for Many AI Problems

In many product and AI deployment settings, randomized experiments remain the strongest benchmark for causal learning.

Why?

Because they provide the cleanest estimate of what changes because of the intervention.

This is why A/B testing is so central in:

  • recommender systems,
  • interface changes,
  • notification strategies,
  • and ranking interventions.

Observational AI may predict behavior very well. But if the goal is to understand what will happen under an intervention, causal design still matters.

That is why causal inference is not only a statistical theory topic. It is a deployment topic.

## Causal Inference Requires Assumptions Even When the Framework Is Clear

The potential outcomes framework is conceptually strong, but identification still requires assumptions.

Common ones include:

  • consistency — the observed outcome under the received treatment equals the relevant potential outcome
  • exchangeability / ignorability — no unmeasured confounding, conditional on observed covariates
  • positivity — every relevant subgroup has some chance of receiving each treatment
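
In symbols, with $T$ the binary treatment indicator, $Y$ the observed outcome, and $X$ the observed covariates (notation assumed here), one common way to state the three assumptions is:

$$
\begin{aligned}
\text{Consistency:} \quad & Y = T\,Y(1) + (1 - T)\,Y(0) \\
\text{Exchangeability:} \quad & \big(Y(0),\, Y(1)\big) \perp\!\!\!\perp T \mid X \\
\text{Positivity:} \quad & 0 < P(T = 1 \mid X = x) < 1 \ \text{ for every } x \text{ that occurs in the population}
\end{aligned}
$$

Only positivity can be checked from the observed data, and even then only for the covariates that were measured. Exchangeability is an assumption about variables that may never have been recorded.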

These assumptions are critical.

The framework tells us what the estimand is. The assumptions tell us whether the data can identify it.

That is one reason causal inference is both elegant and demanding.

## Simulation Helps Separate the Estimand from the Estimator

One of the best ways to teach causal inference is through simulation because simulation lets us know the full potential outcomes.

That means we can compare:

  • the true causal effect,
  • the naïve associational estimate,
  • and the adjusted estimate.

This makes the conceptual distinction visible in a way that real observational data usually cannot.
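
A single simulated dataset mixes bias with sampling noise. Repeating the simulation many times separates the two. The sketch below is a minimal base R Monte Carlo under the same confounded design; the helper `one_rep()`, the 200 repetitions, and the seed are arbitrary choices made for this example.

```r
set.seed(99)  # arbitrary seed, used only to make this sketch repeatable

# Hypothetical helper: simulate one confounded dataset and return two estimates.
one_rep <- function(n = 500) {
  baseline_risk <- rnorm(n)
  treatment <- rbinom(n, 1, plogis(-0.8 - 1.0 * baseline_risk))
  y0 <- 50 - 4 * baseline_risk + rnorm(n, 0, 4)
  outcome <- ifelse(treatment == 1, y0 + 3, y0)  # true effect is 3
  c(
    naive    = mean(outcome[treatment == 1]) - mean(outcome[treatment == 0]),
    adjusted = unname(coef(lm(outcome ~ treatment + baseline_risk))["treatment"])
  )
}

# 2 x 200 matrix: one row per estimator, one column per simulated dataset.
estimates <- replicate(200, one_rep())

# Average error of each estimator relative to the known estimand.
bias <- rowMeans(estimates) - 3
bias
```

The estimand stays fixed at 3 throughout. What changes is the estimator: the naïve one stays far from 3 on average no matter how many datasets are drawn, while the adjusted one centers on it.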

In real data, the counterfactuals are hidden. In simulation, they are known. That is why simple simulated examples are so useful for understanding Rubin’s framework.

## Trauma Registry Application: Why Causation Matters for Treatment Comparisons

Trauma registries are rich observational datasets — but they are not randomized experiments.

A patient transferred to a Level I trauma center is not randomly assigned. A patient receiving massive transfusion protocol is not randomly assigned. A patient admitted to a surgical versus non-surgical service is not randomly assigned.

That means every unadjusted comparison in a trauma registry is potentially a causal inference problem.

The potential outcomes framework clarifies what question is actually being asked:

  • What would mortality have been if this patient had been transferred versus treated in place?
  • What would the outcome have been under protocol A versus protocol B?

Those are counterfactual questions. Regression alone — even well-specified regression — does not automatically answer them (Hernán and Robins 2020; Rubin 1974).

Trauma researchers using registry data need causal thinking built into the design, not just the model.


## A Practical Checklist for Applied Work

Before claiming a causal effect, ask:

  • What is the treatment or intervention?
  • What is the outcome?
  • What causal estimand am I trying to estimate: ATE, ATT, or something else?
  • Is the data source randomized or observational?
  • What variables plausibly confound treatment and outcome?
  • Does the analysis target an intervention effect or only an association?
  • Are the identifying assumptions plausible enough to support the claim?

These questions usually matter more than the choice of software package.

Note: Where This Shows Up in AI/ML

An EHR-trained sepsis model that identifies “antibiotics administered early” as a predictor of 30-day mortality is not telling you that antibiotics cause death — it is capturing confounding by indication, where sicker patients receive antibiotics sooner. Deploying that model as if the association is causal and using it to justify withholding early antibiotics would be harmful; the counterfactual it needs to answer (what would have happened without treatment?) was never identified from observational data alone. DoDTR-based models face the same structure: damage control resuscitation is applied to the most severely injured patients, so its association with outcomes in raw registry data reflects selection, not effect. Causal inference frameworks — potential outcomes, DAGs, target trial emulation — are the tools that force the analyst to state which question is actually being answered before fitting any model.

## Closing: Rubin’s Framework Forces Us to Ask the Right Question

The Rubin Causal Model remains foundational because it formalizes one of the deepest ideas in applied statistics and AI:

causation is about comparing what did happen to what would have happened under a different treatment state.

That comparison is fundamentally counterfactual.

Potential outcomes make the causal target explicit. Randomization makes the comparison credible. Observational data make confounding a central challenge. Adjustment methods try to recover a fairer stand-in for the missing counterfactuals.

Causal inference matters because correlation describes the observed world, but causation asks how the world would change if we intervened.

Tip: Go Deeper: Causal Inference Toolkit

This post is part of the Causal Inference Toolkit — a companion reference with potential outcomes frameworks, DAG templates, ATE/ATT estimation scaffolds, and reviewer-safe language for distinguishing association from causation.

## References

Hernán, Miguel A., and James M. Robins. 2020. Causal Inference: What If. Chapman & Hall/CRC.
Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66 (5): 688–701. https://doi.org/10.1037/h0037350.