This curriculum maps the Data InDeed blog series into a structured, clinically grounded learning pathway for physicians at every stage of training. It is not designed for PhD statisticians. It is designed for clinicians who need to read, evaluate, and act on data: during training, in practice, and when patients are affected by AI tools, registry benchmarks, or observational research.
No coding required. Every module emphasizes interpretation and critical evaluation, not derivation or implementation. Estimated time per module: 3–5 hours of reading + 1–2 hours for the practical exercise.
Lecture slides are available for all six blog series: RevealJS slide decks with live R visualizations, speaker notes, and chalkboard mode. See each module below for direct slide links, or browse the full lecture library.
How to Use This Curriculum
Self-paced: Read one post per day, 30–45 minutes. Work through modules in any order within a tier. Complete the practical exercise before moving to the next module.
Course-based: Each tier maps to one semester of protected seminars (6 sessions for Foundation, 5 for Applied, 5 for Advanced). Assign readings in advance; use the exercise as the in-session case discussion anchor.
| Tier | Audience | Modules | Est. Hours | Lecture Series |
|---|---|---|---|---|
| Foundation | MS1–MS2 | 6 | 25–35 hrs | Applied Stats (10 lectures) · DOE (4) · Advanced Stats (4) |
| Applied | MS3–MS4, PGY-1 | 5 | 20–30 hrs | Trauma Registry (5 lectures) · Ethics (4) · Advanced Stats (4) |
| Advanced | Fellows, clinician-researchers | 5 | 25–35 hrs | Advanced Stats (4 lectures) · Trauma Registry (5) · OMOP (2) · Ethics (4) |
Tier 1: Foundation
For MS1–MS2 · Build statistical vocabulary and the habit of asking "how do we know this?" before accepting clinical evidence at face value.
F1 · How Evidence Is Made: Probability, Uncertainty, and the Limits of Knowing
- L1 ① Probability: manipulate event probabilities and see joint/conditional/marginal relationships update live; anchors the diagnostic reasoning framing
- L1 ② Bayes: adjust prior and likelihood; watch the posterior shift; directly maps to sensitivity/specificity and pre-test probability
- L1 ③ Random Variables: vary distribution parameters and observe how expectation and variance change
- L1 ⑤ Distributions: overlay clinical examples (LOS, lab values, mortality) on named distributions; use to illustrate shape → family choice
- L2 ① CLT Explorer: sample repeatedly from a skewed population; watch the sampling distribution converge to normal as n grows
- L2 ③ LLN Convergence: show how the running mean stabilizes as n increases; grounds expected value in simulation
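
Nothing in this module requires running code. For readers who want to see the idea behind the CLT Explorer in numbers, here is a minimal Python sketch. The exponential population, the sample sizes, and the function name `sample_means` are illustrative assumptions, not taken from the lecture slides.

```python
import random
import statistics

def sample_means(n, draws=2000, seed=0):
    """Draw `draws` samples of size n from a right-skewed (exponential)
    population with mean 1 and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(draws)
    ]

for n in (2, 30, 200):
    means = sample_means(n)
    print(
        f"n={n:>3}  mean of sample means={statistics.fmean(means):.3f}  "
        f"SD of sample means={statistics.stdev(means):.3f}  "
        f"theory 1/sqrt(n)={n ** -0.5:.3f}"
    )
```

As n grows, the sample means cluster more tightly around the population mean of 1, and their spread tracks 1/sqrt(n) even though the underlying population is strongly skewed.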
F2 · Reading a Study Without Being Fooled: p-Values, Confidence Intervals, and Statistical Significance
- L3 ① p-value Explorer: shift effect size, sample size, and variance; watch the p-value move; illustrates why large n produces small p even for trivial effects
- L3 ② Type I / II Error: move the α threshold and see the trade-off between false positives and false negatives in real time; essential for the discussion of why α = 0.05 is arbitrary
- L3 ③ CI Coverage: repeatedly sample and observe how often the interval contains the true value; corrects the common misconception about what 95% means
- L3 ④ CI Precision: show how n and variance drive interval width independent of the p-value
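
The claim that a large n produces a small p even for a trivial effect can be checked with a few lines of arithmetic. The sketch below is optional and assumes a two-sample z-test with equal group sizes and an observed difference of exactly 0.02 standard deviations; the function name `two_sided_p` and all numbers are illustrative.

```python
import math

def two_sided_p(effect_sd, n_per_group):
    """Two-sided p-value for a two-sample z-test with equal group sizes.
    effect_sd is the observed mean difference in standard-deviation units."""
    z = effect_sd * math.sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# The same clinically trivial effect (0.02 SD) at increasing sample sizes.
for n in (100, 1_000, 10_000, 100_000):
    print(f"n per group={n:>7}  p={two_sided_p(0.02, n):.4g}")
```

The effect size never changes; only n does. The p-value falls because the standard error shrinks, which is why a small p alone says nothing about clinical importance.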
F3 · Study Design: Why the Architecture of Evidence Determines What You Can Conclude
- L2 ⑤ Power & Sample Size: vary effect size, α, and power; show how each design choice has a sample size consequence; connects study design decisions to resource constraints
- L2 ④ Survivorship Bias: illustrate how selection at enrollment vs. follow-up distorts study findings; applicable to registry-based observational designs
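
To make the sample size consequence concrete, the following optional sketch uses the standard normal-approximation formula for comparing two means, n per group = 2(z_{1-α/2} + z_{power})² / d², where d is the effect size in SD units. The function name `n_per_group` and the example inputs are assumptions for illustration; a real protocol would use dedicated power software and a t-based correction.

```python
import math
from statistics import NormalDist

def n_per_group(effect_sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation; effect in SD units)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_sd ** 2)

for d in (0.8, 0.5, 0.25):
    print(f"effect={d} SD  n per group={n_per_group(d)}")
```

Because the effect size enters as a square in the denominator, halving the effect you hope to detect roughly quadruples the number of patients required.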
F4 · Regression and Prediction: What Clinical Models Actually Do
- L5 ① OLS Explorer: drag data points; watch the regression line, residuals, and R² update; builds intuition for what "best fit" means and why outliers are influential
- L5 ② LINE Diagnostics: toggle assumption violations (heteroscedasticity, non-linearity); show what "bad" residual plots look like vs. well-behaved ones
- L5 ③ Logistic Regression: vary predictor values; see how the log-odds to probability transformation works; ground the odds ratio interpretation
- L5 ④ Calibration: compare a well-calibrated model to one that systematically over- or under-predicts; show the calibration curve and Brier score
- L8 ① Bias-Variance: move model complexity; watch training vs. test error diverge; the clearest possible illustration of overfitting
- L9 ① ROC & AUC: shift the classification threshold; show the sensitivity/specificity trade-off live; explain why AUC 0.5 = chance and what 0.75 vs. 0.90 means clinically
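
The log-odds to probability transformation and the odds ratio interpretation can be seen in a short, optional sketch. The function names and the baseline risks are illustrative assumptions, not values from the lecture.

```python
import math

def prob_from_log_odds(log_odds):
    """Inverse logit: convert a log-odds value to a probability."""
    return 1 / (1 + math.exp(-log_odds))

def apply_odds_ratio(baseline_risk, odds_ratio):
    """Risk after multiplying the baseline odds by an odds ratio."""
    odds = baseline_risk / (1 - baseline_risk)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)

# The same odds ratio of 2 applied to three hypothetical baseline risks.
for baseline in (0.01, 0.10, 0.50):
    risk = apply_odds_ratio(baseline, 2.0)
    print(f"baseline={baseline:.2f}  new risk={risk:.3f}  "
          f"absolute change={risk - baseline:.3f}")
```

An odds ratio of 2 roughly doubles a 1% risk but moves a 50% risk only to about 67%. That is why an odds ratio should not be read as a relative risk unless the outcome is rare.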
F5 · Missing Data: The Patients Who Aren't in the Table
F6 · Survival Analysis: Reading Time-to-Event Evidence
- L6 ③ Censoring & KM: add and move censoring events; watch the KM curve update and the risk table change; illustrates why the right tail is unreliable
- L6 ④ Cox Model: vary covariate values; show how the hazard ratio shifts and how proportional hazards can be violated; grounds interpretation of published Cox tables
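
For readers curious how a Kaplan-Meier curve and its risk table are built, here is an optional minimal sketch of the product-limit estimator. The follow-up times and event indicators are hypothetical, and `kaplan_meier` is an illustrative helper, not a library function.

```python
def kaplan_meier(times, events):
    """Return [(time, number_at_risk, survival)] at each distinct event time.
    events[i] is 1 if the event was observed at times[i], 0 if censored."""
    subjects = sorted(zip(times, events))
    at_risk = len(subjects)
    survival = 1.0
    curve = []
    i = 0
    while i < len(subjects):
        t = subjects[i][0]
        deaths = removed = 0
        # Group everyone who leaves the risk set at time t.
        while i < len(subjects) and subjects[i][0] == t:
            deaths += subjects[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, at_risk, survival))
        at_risk -= removed   # censored patients drop out without a step down
    return curve

# Hypothetical follow-up in months; 0 marks a censored observation.
times = [2, 3, 3, 5, 8, 8, 12, 14]
events = [1, 1, 0, 1, 0, 1, 0, 1]
for t, n, s in kaplan_meier(times, events):
    print(f"t={t:>2}  at risk={n}  S(t)={s:.3f}")
```

In this toy data the final drop in the curve rests on a single patient still at risk, which is the pattern behind the warning about the right tail.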
Tier 2: Applied
For MS3–MS4 and PGY-1 Residents · Connect statistical reasoning to clinical decision-making encountered on wards, in journal clubs, and when evaluating AI tools in practice.
A1 · Confounding and Causation: Why "Associated With" Is Not "Causes"
A2 · Clinical Prediction Tools and AI: What to Ask Before You Trust a Score
- L9 ① ROC & AUC: shift prevalence of the rare outcome; watch how a fixed AUC translates into very different PPV; makes the rare-event problem viscerally clear
- L9 ② Calibration: compare a model's predicted risk to observed outcomes by decile; shows when a model with high AUC still fails clinically due to miscalibration
- L9 ③ Variable Importance: toggle predictors; observe how importance rankings shift with data; grounds the discussion of which features are driving a black-box score
- L8 ⑤ k-Fold CV: show the variance in performance across folds; illustrates why a single train/test split can give a misleading (often optimistic) estimate of true model performance
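
The rare-event problem comes down to one line of arithmetic. In this optional sketch, sensitivity and specificity are held fixed at 0.90 (a single operating point on the ROC curve, so the AUC does not change) while prevalence varies. The function name `ppv` and all numbers are illustrative assumptions.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from test characteristics and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same model, same threshold, three hypothetical populations.
for prevalence in (0.50, 0.01, 0.001):
    value = ppv(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:<6}  PPV={value:.3f}")
```

At 1% prevalence, fewer than one in ten positive flags is a true positive, even though the model's discrimination is unchanged. A useful question for any vendor is what the PPV is at the prevalence of your own unit.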
A3 · Real-World Evidence: What Registry Data and EHR Studies Can and Cannot Show
A4 · Ethics, Bias, and Accountability in Clinical AI
A5 · Bayesian Thinking at the Bedside: Updating Beliefs With Evidence
- L4 ① Bayesian Updater: set a prior (flat, weakly informative, or strong), add observed data, watch the posterior shift; use with a clinical prior probability scenario
- L4 ② Posterior Predictive: show how the posterior distribution generates predictions for new observations; grounds "what does this model say about the next patient?"
- L4 ③ Monte Carlo: run repeated sampling to approximate intractable integrals; illustrates why simulation-based inference works when formulas don't
- L4 ④ MCMC Explorer: visualize a Markov chain converging to the posterior; optional for clinical audiences but powerful for research fellows who want the "why it works" explanation
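
The behavior of the Bayesian Updater can be sketched with the simplest conjugate case: a Beta prior on a proportion, updated with binomial data. This is an optional illustration; the three priors, the data (6 responders out of 10), and the helper names are assumptions, not the settings used in the lecture.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Beta(a, b) prior plus binomial data gives a
    Beta(a + successes, b + failures) posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical priors: the last two are both centered on a 20% response rate.
priors = {"flat": (1, 1), "weakly informative": (2, 8), "strong": (20, 80)}
successes, failures = 6, 4   # hypothetical data: 6 responders out of 10

for name, (a, b) in priors.items():
    post_a, post_b = update_beta(a, b, successes, failures)
    print(f"{name:<19} prior mean={beta_mean(a, b):.2f}  "
          f"posterior mean={beta_mean(post_a, post_b):.3f}")
```

The same ten patients move the flat prior a long way and the strong prior very little, which is the practical meaning of how much a prior is worth in data.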
Tier 3: Advanced
For Clinical Fellows and Clinician-Researchers · Develop analytical independence to critique, commission, and contribute to clinical research, including registry-based studies, AI validation, and real-world evidence generation.
Adv1 · Hierarchical Models, Clustering, and Why Healthcare Data Have Structure
- L7 ① PCA Explorer: rotate through PC axes; color points by severity, neuro injury, or shock; shows how PCA reveals clinical phenotypes without using the outcome as a label
- L7 ② Scree & Loadings: move the "retain k PCs" slider; show cumulative variance captured; grounds the practical decision of how many dimensions to keep
- L7 ③ k-Means: vary k and seed; show how the elbow plot guides selection; illustrates that cluster labels are arbitrary; their clinical meaning must be interpreted
- L7 ④ Hierarchical: switch linkage methods (Ward vs. single vs. complete); show how the dendrogram shape changes; connects to the clustering-as-phenotyping narrative in the trauma registry context
- L7 ⑤ Curse of Dims: increase p; watch the max/min distance ratio collapse toward 1; makes the case for dimensionality reduction before clustering high-dimensional registry data
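
The collapse of the max/min distance ratio is easy to reproduce. This optional sketch assumes uniformly random points in a unit hypercube and measures Euclidean distances from one query point. The function name, the point count, and the dimensions tried are illustrative choices.

```python
import math
import random

def max_min_distance_ratio(p, n_points=200, seed=0):
    """Ratio of the farthest to the nearest neighbor distance from one
    query point, for uniformly random points in p dimensions."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(p)] for _ in range(n_points)]
    query = points[0]
    dists = [math.dist(query, other) for other in points[1:]]
    return max(dists) / min(dists)

for p in (2, 10, 100, 1000):
    print(f"p={p:>4}  max/min distance ratio={max_min_distance_ratio(p):.2f}")
```

When the ratio approaches 1, every patient is nearly as far from every other patient, so "nearest" stops carrying information. Distance-based clustering on raw high-dimensional data then has little to work with.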
Adv2 · Causal Inference for Clinician-Researchers: Moving Beyond Association
Adv3 · Missing Data, Sensitivity Analysis, and Analytic Honesty
Adv4 · Emerging Real-World Evidence: Synthetic Data, Digital Twins, and AI-Enabled Evidence
Adv5 · Data Standards, Interoperability, and the Architecture of Research Data
Recommended Reading Paths
Shortest coherent path through all three tiers
Primarily interested in clinical AI evaluation
Entering a research fellowship
Building a registry-based research project
Video-first: all lecture series in sequence
What This Curriculum Does Not Do
This curriculum does not teach coding, does not require statistical software, and does not prepare learners to conduct original statistical analyses. That is intentional.
The goal is a clinician who can:
- Read a methods section critically
- Ask the right questions before trusting a model or registry report
- Recognize when a statistical claim exceeds what the design supports
- Evaluate AI tools before they affect patient care
- Contribute meaningfully to a research team without delegating all methodological judgment
Interested in consulting support for curriculum implementation, faculty development, or statistical programming for clinical research? Get in touch.
All readings in this curriculum are freely available through the Data InDeed blog series. No subscription or software is required.