---
title: "Advanced Statistics — Master Speaker Notes"
subtitle: "Instructor Teaching Guide · 4-Lecture Series"
author: "Jonathan D. Stallings, PhD, MS"
date: "Summer 2026"
format:
  html:
    toc: true
    toc-depth: 3
    toc-title: "Lecture Navigator"
    number-sections: true
    theme: cosmo
---

> **How to use this guide.** Instructor-facing notes for the 4-lecture Advanced Statistics series. Prerequisite: Applied Statistics series (or equivalent). Audience: analysts, fellows, clinician-researchers who need to work with observational registry data and critically evaluate causal claims.

---

# Lecture 1 — Missing Data, Mechanisms & Imputation

**Posts covered:** 01 (MCAR/MAR/MNAR mechanisms), 02 (Imputation), 03 (Sensitivity analysis for missing data)

## Teaching strategy

Open with the provocation: "When a data element is missing in a trauma registry, what do you assume happened?" Most audiences will say "the measurement wasn't taken" or "it was recorded somewhere else." Rarely: "this variable was missing *because* of the patient's clinical state."

That last possibility points toward MNAR: when missingness depends on the unobserved value itself, even after accounting for everything that was recorded, it is the hardest mechanism to handle and the most dangerous to ignore. The goal of Lecture 1 is to get the audience thinking about missingness as signal, not noise.

## Key talking points

**Slide: MCAR vs. MAR vs. MNAR**
Use the lactate example throughout:
- MCAR: lab machine broke, random patients missing
- MAR: lactate more often missing in stable patients (missingness depends on observed variables like GCS, BP)
- MNAR: lactate more often missing in the *most* unstable patients who were too critical to wait for a lab (missingness depends on the lactate value itself)

In MNAR, the missing patients are systematically different from the observed patients in the very variable that's missing. No amount of imputation based on observed data fixes this — it requires sensitivity analysis.
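
To make the three mechanisms concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the cohort is synthetic, `severity` stands in for observed covariates such as GCS and BP, and the missingness probabilities are invented. It compares the complete-case mean with a covariate-adjusted mean under each mechanism.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical cohort: lactate rises with observed severity plus unexplained noise.
patients = []
for _ in range(20000):
    severity = rng.gauss(0.0, 1.0)
    lactate = 2.5 + 1.0 * severity + rng.gauss(0.0, 0.5)
    patients.append((severity, lactate))

true_mean = statistics.fmean(lac for _, lac in patients)


def analyze(p_missing):
    """Return (complete-case mean, covariate-adjusted mean) for one mechanism."""
    observed = [(sev, lac) for sev, lac in patients
                if rng.random() > p_missing(sev, lac)]
    xs = [sev for sev, _ in observed]
    ys = [lac for _, lac in observed]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # Least-squares fit of lactate on severity, using observed cases only
    slope = (sum((x - mx) * (y - my) for x, y in observed)
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    # Predict lactate for every patient from the observed covariate, then average
    adjusted = statistics.fmean(intercept + slope * sev for sev, _ in patients)
    return my, adjusted


results = {
    # MCAR: a flat 30% chance of missingness, unrelated to anything
    "MCAR": analyze(lambda sev, lac: 0.30),
    # MAR: stable (low-severity) patients are more often missing
    "MAR": analyze(lambda sev, lac: logistic(-1.0 - 1.5 * sev)),
    # MNAR: patients with high lactate itself are more often missing
    "MNAR": analyze(lambda sev, lac: logistic(-1.0 + 1.5 * (lac - 2.5))),
}

print(f"true mean lactate: {true_mean:.2f}")
for name, (cc, adj) in results.items():
    print(f"{name:5s} complete-case {cc:.2f}   covariate-adjusted {adj:.2f}")
```

In this setup the covariate adjustment should pull the MAR estimate back toward the truth, while the MNAR estimate stays biased even after adjustment, which is the point of the slide.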

**Slide: Multiple Imputation vs. Mean Imputation**
The density plot visualization is the key teaching moment. Mean imputation destroys the distribution shape: it creates an artificial spike at the center, shrinks the variance, and weakens correlations with other variables. For a right-skewed variable like ISS, this is especially damaging. Multiple imputation preserves distributional properties by drawing plausible values from a model fit to the observed data, and it carries the uncertainty about those values into the final standard errors.
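
The variance shrinkage can be shown numerically as well as visually. The sketch below is a simplified illustration, not a full multiple-imputation procedure: the data are synthetic, the score is assumed log-normal given one covariate `x`, and the imputation step draws from a fitted regression without also drawing the regression parameters (proper MI would do both, and would pool estimates with Rubin's rules).

```python
import math
import random
import statistics

rng = random.Random(7)
n = 10000

# Hypothetical right-skewed score: log-normal given one observed covariate x
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
score = [math.exp(2.3 + 0.3 * xi + rng.gauss(0.0, 0.4)) for xi in x]
missing = [rng.random() < 0.30 for _ in range(n)]  # 30% missing completely at random
obs = [i for i in range(n) if not missing[i]]

sd_full = statistics.stdev(score)  # benchmark only; unknowable with real data

# Mean imputation: every missing value gets the observed mean
obs_mean = statistics.fmean(score[i] for i in obs)
mean_filled = [obs_mean if missing[i] else score[i] for i in range(n)]
sd_mean_filled = statistics.stdev(mean_filled)

# Regression of log(score) on x among observed cases
log_obs = {i: math.log(score[i]) for i in obs}
mx = statistics.fmean(x[i] for i in obs)
my = statistics.fmean(log_obs.values())
slope = (sum((x[i] - mx) * (log_obs[i] - my) for i in obs)
         / sum((x[i] - mx) ** 2 for i in obs))
intercept = my - slope * mx
resid_sd = math.sqrt(
    sum((log_obs[i] - intercept - slope * x[i]) ** 2 for i in obs) / (len(obs) - 2)
)


def draw_completed_dataset():
    """One completed dataset: missing values drawn from the fitted model plus noise."""
    return [
        math.exp(intercept + slope * x[i] + rng.gauss(0.0, resid_sd))
        if missing[i] else score[i]
        for i in range(n)
    ]


m = 20  # number of imputed datasets
sd_multiple = statistics.fmean(
    statistics.stdev(draw_completed_dataset()) for _ in range(m)
)

print(f"SD, full data:          {sd_full:.2f}")
print(f"SD, mean imputation:    {sd_mean_filled:.2f}")
print(f"SD, stochastic draws:   {sd_multiple:.2f}")
```

With 30% of values replaced by a constant, the standard deviation should fall noticeably below the benchmark, while the stochastic draws should stay close to it.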

**Slide: Delta-Adjustment for MNAR**
The delta-adjustment curve is a communication tool, not just an analysis tool. Show it to the audience and ask: "At what delta value does your conclusion change? Does that delta value seem clinically plausible?" If the finding reverses only when you shift all missing values to extreme values that are physiologically implausible, the conclusion is robust.
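
A minimal sketch of how a delta-adjustment curve and its tipping point can be computed follows. The two arms, the outcome scale, and the number of missing values are all hypothetical, and the baseline imputation is a simple resampling of observed treated values used only to keep the example short.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical continuous outcome (higher is better) in two arms
treated_obs = [rng.gauss(1.0, 1.0) for _ in range(300)]
control_obs = [rng.gauss(0.6, 1.0) for _ in range(400)]
n_missing_treated = 100

# MAR-style baseline imputations: resampled from the observed treated values
base_imputed = [rng.choice(treated_obs) for _ in range(n_missing_treated)]


def effect_under_delta(delta):
    """Treated-minus-control mean after shifting every imputed value by delta."""
    treated_all = treated_obs + [v + delta for v in base_imputed]
    return statistics.fmean(treated_all) - statistics.fmean(control_obs)


step = 0.25
deltas = [-step * k for k in range(17)]  # 0.0, -0.25, ..., -4.0
curve = [(d, effect_under_delta(d)) for d in deltas]

# Tipping point: the first delta on the grid at which the benefit disappears
tipping = next((d for d, eff in curve if eff <= 0), None)

for d, eff in curve:
    print(f"delta {d:5.2f}  estimated effect {eff:6.3f}")
print("tipping point:", tipping)
```

The clinical question for the audience is then whether a shift of that size in the unobserved patients is believable.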

## Timing
- Mechanisms (MCAR/MAR/MNAR): 20 min
- Imputation methods: 20 min
- Sensitivity analysis: 15 min
- Discussion: 5 min

## Common questions
- *"Can I just use complete-case analysis?"* Only if MCAR is plausible (rarely true in registries). And even then, you're sacrificing power. Always at minimum describe who is missing.
- *"How many imputations do I need?"* Rule of thumb: at least as many as the percent missing. 30% missing → at least 30 imputations.
- *"What if the outcome itself is missing?"* Then you have a much harder problem. If the outcome is the mortality indicator and it's missing, you cannot simply impute it; you need to investigate the missingness mechanism and consider bounds analysis.

---

# Lecture 2 — Causal Inference & Propensity Methods

**Posts covered:** 04 (Causal Inference fundamentals), 05 (Propensity score methods)

## Teaching strategy

The fundamental problem of causal inference is simple to state: "We can never observe both potential outcomes for the same patient." A patient either receives treatment or doesn't. We observe only one arm. Everything in causal inference is an attempt to construct a valid comparison in the absence of randomization.

Start with the RCT as the gold standard — not because it's perfect, but because it controls confounding by design. Then ask: "What do we do when we can't randomize?" Answer: we try to make the comparison groups as similar as possible on observed characteristics. That's propensity score methods.

## Key talking points

**Slide: RCT vs. Observational Study — The Bias Recovery Simulation**
The simulation demonstrates that propensity score weighting (IPTW) can recover the true treatment effect from a confounded observational dataset — but only when the propensity model is correctly specified and all important confounders are measured. This is the fundamental limitation: unmeasured confounders cannot be adjusted for.
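
For instructors who want a stripped-down version of the bias recovery idea, the sketch below uses a single binary confounder so the propensity score can be estimated by simple stratum proportions. All numbers are invented, and the true risk difference is set by construction so the naive and weighted estimates can be compared against it.

```python
import random
import statistics

rng = random.Random(11)
TRUE_RISK_DIFFERENCE = -0.05  # set by construction in this simulation

# Hypothetical cohort: severe patients are treated more often and die more often
rows = []
for _ in range(50000):
    severe = rng.random() < 0.4
    treated = rng.random() < (0.7 if severe else 0.2)
    p_death = (0.40 if severe else 0.15) + (TRUE_RISK_DIFFERENCE if treated else 0.0)
    died = 1 if rng.random() < p_death else 0
    rows.append((severe, treated, died))

# Naive comparison: raw mortality in treated minus untreated
naive = (statistics.fmean(d for _, t, d in rows if t)
         - statistics.fmean(d for _, t, d in rows if not t))

# Propensity score: proportion treated within each severity stratum
propensity = {
    s: statistics.fmean(1 if t else 0 for sev, t, _ in rows if sev == s)
    for s in (True, False)
}


def weighted_risk(arm):
    """Inverse-probability-weighted mortality in one treatment arm."""
    num = den = 0.0
    for severe, treated, died in rows:
        if treated != arm:
            continue
        e = propensity[severe]
        w = 1.0 / e if arm else 1.0 / (1.0 - e)
        num += w * died
        den += w
    return num / den


iptw = weighted_risk(True) - weighted_risk(False)

print(f"true risk difference: {TRUE_RISK_DIFFERENCE:+.3f}")
print(f"naive estimate:       {naive:+.3f}")
print(f"IPTW estimate:        {iptw:+.3f}")
```

The recovery works here only because severity is the sole confounder and it is measured; dropping `severe` from the propensity model would leave the weighted estimate as biased as the naive one.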

**Slide: Propensity Score Overlap**
Before any IPTW or matching analysis, check overlap: is there a range of propensity scores in which both treated and untreated patients exist? Outside that range the data cannot tell you the treatment effect, and any estimate there is extrapolation.

**Slide: The Love Plot**
The love plot shows standardized mean differences (SMDs) for each covariate before and after weighting. The target: all absolute SMDs below 0.1 after weighting. If some covariates remain imbalanced, the weighting model may be misspecified or the sample may be too small to achieve balance.
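
The quantity behind each point on the plot can be sketched as follows. The data are synthetic, the covariate is a standardized stand-in for something like ISS, and the weights use the propensity score that generated the data, which is a luxury of simulation (in practice the score is estimated). The denominator uses the unweighted pooled standard deviation, one common convention among several.

```python
import math
import random
import statistics

rng = random.Random(5)

# Hypothetical cohort: (covariate, treated, propensity used to assign treatment)
rows = []
for _ in range(20000):
    z = rng.gauss(0.0, 1.0)
    e = 1.0 / (1.0 + math.exp(-0.8 * z))
    rows.append((z, rng.random() < e, e))


def smd(rows, weighted):
    """Standardized mean difference of the covariate, treated minus untreated."""
    means, variances = {}, {}
    for arm in (True, False):
        group = [(z, e) for z, t, e in rows if t == arm]
        if weighted:
            # ATE weights: 1/e for treated, 1/(1 - e) for untreated
            wts = [1.0 / e if arm else 1.0 / (1.0 - e) for _, e in group]
        else:
            wts = [1.0] * len(group)
        means[arm] = sum(w * z for w, (z, _) in zip(wts, group)) / sum(wts)
        variances[arm] = statistics.variance(z for z, _ in group)
    pooled_sd = math.sqrt((variances[True] + variances[False]) / 2.0)
    return (means[True] - means[False]) / pooled_sd


smd_before = smd(rows, weighted=False)
smd_after = smd(rows, weighted=True)

print(f"SMD before weighting: {smd_before:.3f}")
print(f"SMD after weighting:  {smd_after:.3f}")
```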

**Slide: Estimand Clarity — ATE vs. ATT**
ATE: what is the average effect of treating everyone versus treating no one in the whole population? ATT: what is the effect in the patients who actually received treatment? These answer different questions. For trauma care, ATT is usually more relevant: you're asking "did the treatment help the patients who received it?" not "would treatment help everyone?"
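
The two estimands, and the weights commonly used to target each, can be written out for the slide. The notation is an assumption introduced here: $Y(1)$ and $Y(0)$ are the potential outcomes, $T_i$ is the treatment indicator, and $e(X_i)$ is the propensity score.

$$
\text{ATE} = E[\,Y(1) - Y(0)\,], \qquad
\text{ATT} = E[\,Y(1) - Y(0) \mid T = 1\,]
$$

$$
w_i^{\text{ATE}} = \frac{T_i}{e(X_i)} + \frac{1 - T_i}{1 - e(X_i)}, \qquad
w_i^{\text{ATT}} = T_i + (1 - T_i)\,\frac{e(X_i)}{1 - e(X_i)}
$$

Under ATT weighting the treated patients keep weight 1 and the untreated patients are reweighted to resemble them, which is why the choice of estimand changes the analysis and not just the interpretation.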

## Timing
- Counterfactual framework: 15 min
- Propensity score estimation and diagnostics: 25 min
- Sensitivity to unmeasured confounding (E-value): 15 min

## Discussion prompt
"You're doing an IPTW analysis comparing early vs. late tourniquet application. You achieve SMDs < 0.1 for ISS, mechanism, and facility type. But you couldn't measure time from injury to first care — which is correlated with tourniquet timing and with outcome. Discuss."

---

# Lecture 3 — Identification Strategies & Sensitivity Analysis

**Posts covered:** 06 (Instrumental Variables), 07 (Confounding and Bias), 08 (Target Trial Emulation)

## Teaching strategy

Lecture 3 builds on Lecture 2 by presenting methods for when propensity scores aren't sufficient — typically because there's important unmeasured confounding. The three tools (IV, target trial emulation, sensitivity analysis) represent fundamentally different philosophies for handling this problem.

IV says: "Find a variable that creates quasi-random variation in treatment assignment." Target trial says: "Specify the RCT you would have run, then use observational data to emulate it." Sensitivity analysis says: "Show how robust your conclusion is to deviations from your assumptions."

## Key talking points

**Slide: Instrumental Variables — The Three Assumptions**
Relevance (the instrument is correlated with treatment), exclusion restriction (the instrument only affects outcome through the treatment), and independence (the instrument is as good as randomly assigned). All three must hold. The exclusion restriction is never directly testable — it's an assumption that must be argued on subject-matter grounds.

**Slide: 2SLS in Practice**
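Walk through the two stages manually. Stage 1: regress treatment on the instrument to get predicted treatment. Stage 2: regress outcome on predicted treatment. Under the additional monotonicity assumption (no patients who do the opposite of what the instrument encourages), the IV estimate is the effect of treatment in the complier subgroup (patients whose treatment status changes with the instrument). This is not the ATE; it is the LATE (Local Average Treatment Effect). Remind the audience that standard errors from a hand-run second stage are not valid, so real analyses should use a dedicated IV routine.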
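
The two stages can be walked through in a few lines. The sketch below is a toy simulation under stated assumptions: a binary instrument `z`, an unmeasured confounder `u`, and a constant treatment effect fixed at 2.0, so the LATE and the ATE coincide here by construction. It gives point estimates only.

```python
import random
import statistics

rng = random.Random(21)
TRUE_EFFECT = 2.0  # set by construction

z, d, y = [], [], []
for _ in range(50000):
    zi = 1.0 if rng.random() < 0.5 else 0.0       # instrument, as good as random
    u = rng.gauss(0.0, 1.0)                       # unmeasured confounder
    di = 1.0 if 2.0 * zi + u + rng.gauss(0.0, 1.0) > 0.5 else 0.0
    yi = TRUE_EFFECT * di + 1.5 * u + rng.gauss(0.0, 1.0)
    z.append(zi)
    d.append(di)
    y.append(yi)


def ols(x, y):
    """Simple least squares with one predictor; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# Naive regression of outcome on treatment: confounded by u
naive_slope = ols(d, y)[1]

# Stage 1: regress treatment on the instrument, keep the predicted treatment
a1, b1 = ols(z, d)
d_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress outcome on predicted treatment
iv_slope = ols(d_hat, y)[1]

print(f"true effect:        {TRUE_EFFECT:.2f}")
print(f"naive OLS estimate: {naive_slope:.2f}")
print(f"first-stage slope:  {b1:.2f}")
print(f"2SLS estimate:      {iv_slope:.2f}")
```

The first-stage slope doubles as a check on relevance: if it is close to zero, the second-stage estimate becomes unstable.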

**Slide: E-Value — Sensitivity to Unmeasured Confounding**
The E-value answers: "How strong would unmeasured confounding have to be, on both the exposure-confounder and confounder-outcome associations, to fully explain away the observed association?" If the E-value is 3.5, you'd need an unmeasured confounder associated with both the exposure and the outcome by a risk ratio of at least 3.5 each, beyond the measured covariates. That's a very strong confounder; if you've already adjusted for the major confounders, this may not be plausible.
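
For a risk ratio, the E-value has a closed form, RR + sqrt(RR × (RR − 1)), with protective estimates inverted first. A small sketch follows; the function name `e_value` is chosen here for illustration, and estimates on other scales (for example odds ratios for common outcomes) would need converting to an approximate risk ratio before use.

```python
import math


def e_value(rr):
    """E-value for an observed risk ratio (point estimate or a confidence limit)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective effects: work with the inverse
    return rr + math.sqrt(rr * (rr - 1.0))


print(round(e_value(2.0), 2))  # 3.41
print(round(e_value(0.5), 2))  # 3.41, same strength in the protective direction
print(e_value(1.0))            # 1.0, no confounding needed to explain a null
```

Applying the same function to the confidence limit closest to the null shows how much confounding would be needed to move the interval to include 1.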

**Slide: Target Trial Emulation**
The elegance of target trial emulation is that it forces you to specify the causal question precisely before looking at the data. "What is the eligibility criterion? What is the assigned treatment? What is the primary outcome? What is the follow-up period?" If you can't specify these as you would for an RCT protocol, you shouldn't be making causal claims from the observational analysis.

## Timing
- IV methods: 20 min
- E-value and sensitivity: 15 min
- G-computation and target trial: 20 min

---

# Lecture 4 — Evidence Synthesis & Transportability

**Posts covered:** 09 (Meta-analysis and systematic review), 10 (Generalizability and transportability)

## Teaching strategy

End the series with the question: "You've done a careful analysis at your institution. Can others use it? Can you use theirs?" Evidence synthesis is about combining information across studies. Transportability is about whether results from one population apply to another.

The forest plot is the canonical output of meta-analysis and needs careful interpretation. Heterogeneity (I²) is among the most misinterpreted statistics in meta-analysis. An I² of 70% does not mean the studies are "inconsistent" in a qualitative sense; it means roughly 70% of the observed variability in effect estimates reflects between-study differences rather than within-study sampling error. Because I² is a proportion, it says nothing about how large those differences are in absolute terms, and high values are common in clinical research.

## Key talking points

**Slide: Fixed vs. Random Effects**
Fixed effects model: assumes one true effect, all studies estimate it. Random effects model: assumes the true effect varies across studies, we're estimating the mean of the distribution of true effects. For most clinical meta-analyses, random effects is the correct default — unless there's strong reason to believe all studies are estimating the same quantity.
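
The difference between the two models, and where I² comes from, can be shown with a short calculation. The five log odds ratios and standard errors below are made up for illustration, and the between-study variance uses the DerSimonian-Laird estimator, one common choice among several.

```python
import math

# Hypothetical study results: log odds ratios and their standard errors
log_or = [-0.60, -0.10, -0.45, 0.15, -0.80]
se = [0.20, 0.15, 0.25, 0.18, 0.30]
k = len(log_or)

# Fixed effect: inverse-variance weights, one common true effect assumed
w = [1.0 / s ** 2 for s in se]
fixed = sum(wi * yi for wi, yi in zip(w, log_or)) / sum(w)
se_fixed = math.sqrt(1.0 / sum(w))

# Cochran's Q and I^2: share of variability beyond sampling error
q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_or))
df = k - 1
i2 = max(0.0, (q - df) / q)

# DerSimonian-Laird estimate of the between-study variance tau^2
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random effects: weights also include tau^2, so small studies count for more
w_re = [1.0 / (s ** 2 + tau2) for s in se]
random_effect = sum(wi * yi for wi, yi in zip(w_re, log_or)) / sum(w_re)
se_random = math.sqrt(1.0 / sum(w_re))

print(f"fixed effect:  {fixed:.3f} (SE {se_fixed:.3f})")
print(f"random effect: {random_effect:.3f} (SE {se_random:.3f})")
print(f"Q = {q:.2f} on {df} df, I^2 = {100 * i2:.0f}%, tau^2 = {tau2:.3f}")
```

The random-effects standard error is never smaller than the fixed-effect one, which is the practical cost of admitting that the true effect varies across studies.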

**Slide: Publication Bias — Funnel Plot**
Show the asymmetric funnel and explain: small studies with null results are less likely to be published, which biases the published literature toward positive results. Caution that asymmetry can also arise from genuine heterogeneity or other small-study effects, and that the funnel plot asymmetry test (Egger's test) has low power in small meta-analyses.

**Slide: Covariate Shift and Transportability**
This is the connection to registry analytics. Your meta-analysis pooled studies from European trauma centers, but you're at a Level 1 trauma center seeing a different mechanism mix. The covariate distribution in your population differs from the meta-analysis population. Transportability reweighting adjusts for this.
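
In its simplest form, the reweighting is a standardization of stratum-specific effects to the target population's covariate mix. The sketch below assumes, purely for illustration, that injury mechanism is the only effect modifier that differs between populations; the effects and proportions are invented.

```python
# Hypothetical stratum-specific risk differences from the source studies
effect_by_mechanism = {"blunt": -0.06, "penetrating": -0.01}

# Share of each mechanism in the source (pooled studies) and target (your center)
source_mix = {"blunt": 0.85, "penetrating": 0.15}
target_mix = {"blunt": 0.50, "penetrating": 0.50}


def standardized_effect(effects, mix):
    """Average the stratum-specific effects over a population's covariate mix."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix proportions must sum to 1")
    return sum(effects[k] * mix[k] for k in effects)


source_effect = standardized_effect(effect_by_mechanism, source_mix)
target_effect = standardized_effect(effect_by_mechanism, target_mix)

# Equivalent view: weight each source stratum by target share / source share
weights = {k: target_mix[k] / source_mix[k] for k in source_mix}
reweighted = sum(
    effect_by_mechanism[k] * source_mix[k] * weights[k] for k in source_mix
)

print(f"effect in source population:  {source_effect:+.4f}")
print(f"effect transported to target: {target_effect:+.4f}")
print(f"same result via reweighting:  {reweighted:+.4f}")
```

With individual-level data the same idea is applied through weights built from a model for membership in the source versus target population, and it only helps for effect modifiers that were actually measured in both.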

**Slide: When Not to Pool**
If the included studies are too heterogeneous in design, population, or intervention to be meaningfully combined — don't pool. A systematic review without a meta-analysis is a legitimate output. A poorly pooled meta-analysis produces a precise answer to the wrong question.

## Timing
- Fixed and random effects meta-analysis: 20 min
- Forest plots, heterogeneity, publication bias: 15 min
- Transportability and external validity: 20 min

## Series-Level Discussion Questions

1. A propensity-weighted analysis finds a tourniquet timing benefit with OR 0.62 (95% CI: 0.44–0.88). The E-value is 2.1. Is this finding robust? What additional information would you want?

2. A systematic review of trauma fluid resuscitation includes 15 studies across three continents, five trauma mechanisms, and three fluid protocols. I² = 78%. Should you pool? What would you do instead?

3. You want to use target trial emulation to study the effect of early tracheostomy on ventilator days in a military trauma registry. Specify the target trial: eligibility, intervention, comparator, outcome, follow-up, assignment strategy.

4. A colleague argues that propensity score matching is "basically as good as randomization." Under what conditions is this approximately true? Under what conditions does it break down badly?

5. A meta-analysis of registry studies in civilian trauma shows mortality benefit for a new protocol. Your setting is forward-deployed military trauma with different injury mechanisms and care echelons. Walk through the steps of a transportability analysis.