---
title: "Applied Statistics — Master Speaker Notes"
subtitle: "Instructor Teaching Guide · 10-Lecture Series"
author: "Jonathan D. Stallings, PhD, MS"
date: "Summer 2026"
format:
  html:
    toc: true
    toc-depth: 3
    toc-title: "Lecture Navigator"
    number-sections: true
    theme: cosmo
    code-fold: false
---

> **How to use this guide.** These are instructor-facing notes to accompany the 10-lecture Applied Statistics slide deck series. Each section covers: what to emphasize, how to handle the live R slides, likely student questions, and timing guidance. This guide does not reproduce slide content verbatim — it expands the reasoning and provides context that helps a presenter deliver confidently and connect material to clinical practice.

---

# Lecture 1 — Probability Foundations

**Posts covered:** 01 (Probability & AI), 02 (Random Variables), 03 (Distributions)
**Target audience:** Clinicians with no prior statistics background
**Tone:** Demystifying. Frame math as a language, not a test.

## Teaching strategy

Open by asking the audience to tell you the probability that a positive troponin means MI. Let them guess. Most will say something confident. Then reveal: it depends entirely on the pre-test probability of the population you're testing. This immediately grounds probability in clinical reasoning they already do — imperfectly — every day.

The Monty Hall problem (if you include it) lands with most audiences as genuinely surprising. Use that discomfort productively: "The right answer feels wrong because our probability intuition is badly calibrated. Medicine runs on probability. This matters."

## Key talking points

**Slide: Joint vs. Conditional Probability**
The most important concept in this lecture. Spend extra time. Almost every clinical reasoning error — overtesting low-risk patients, missing rare diagnoses in high-risk ones — is a failure to correctly condition on the right evidence. Draw a 2×2 table on the whiteboard and fill it in with a disease prevalence example.
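
For the whiteboard, it can help to write the two quantities side by side. In the notation below (an assumption of this sketch, not taken from the slides), $D$ is "disease present" and $T^{+}$ is "test positive":

$$
P(D \cap T^{+}) = P(T^{+} \mid D)\,P(D), \qquad P(D \mid T^{+}) = \frac{P(D \cap T^{+})}{P(T^{+})}
$$

The joint probability is a single cell of the 2×2 table divided by the grand total; the conditional probability is the same cell divided by its row or column total.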

**Slide: Distributions**
Don't get lost in the math. The core message: distributions are models for uncertainty. A normal distribution says "we expect values to cluster here, with spread that looks like this." A binomial distribution says "we're counting things that either happen or don't." Ask the audience which distribution describes: mortality in a trauma cohort (binomial), length of stay (right-skewed, maybe log-normal), lab values in healthy patients (normal).

**Slide: Expected Value**
Use the framing "what would happen on average if you ran this scenario 10,000 times." This maps cleanly to frequentist thinking and sets up later lectures on CIs.
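
If you want a live counterpart to the "run it 10,000 times" framing, a minimal sketch in R might look like the following. The cohort size and mortality risk are hypothetical values chosen for illustration.

```r
# Hypothetical scenario: 200 trauma patients, each with an assumed 10% mortality risk.
set.seed(42)                # reproducible draws
n_patients <- 200
p_death    <- 0.10

# Run the "same scenario" 10,000 times and record the death count each time.
deaths <- rbinom(10000, size = n_patients, prob = p_death)

mean(deaths)            # long-run average across the simulated runs
n_patients * p_death    # theoretical expected value: n * p = 20
```

The simulated average should land very close to the theoretical value, which is the point to make: the expected value is a long-run average, not a prediction for any single cohort.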

## Timing
- Part 1 (Probability): 20 min
- Part 2 (Random Variables): 15 min
- Part 3 (Distributions): 15 min
- Q&A / whiteboard: 10 min

## Common questions
- *"Is a p-value a probability?"* Yes — but of what? Save the full answer for Lecture 3; here just say "it's a conditional probability, and we'll get into exactly what that condition is."
- *"What distribution should I use for my outcome variable?"* Depends on the outcome type. Binary outcome → logistic model (binomial). Count → Poisson. Continuous → normal as a starting point.

## Interactive App — `lecture-01`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-01/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① Probability | Opening the lecture | Set P(A) and P(B\|A); toggle independence on/off; show how joint probability changes — grounds the 2×2 diagnostic table discussion |
| ② Bayes | After the diagnostic reasoning framing | Set disease prevalence to 2%, sensitivity to 90%, specificity to 85%; show how PPV collapses to ~11%; this surprises nearly every clinician |
| ③ Random Variables | During the expectation/variance section | Vary mean and variance sliders; show how spread changes without changing the center — useful for LOS vs. mortality examples |
| ⑤ Distributions | Closing Part 3 | Overlay a right-skewed distribution (LOS) and a binomial (mortality); ask the audience which family applies to each clinical outcome they work with |
| ⑥ Distribution Shift | If time permits | Show what happens to a model trained on one distribution when the test population shifts — foreshadows the model deployment lectures |

---

# Lecture 2 — Statistical Laws & Distributions

**Posts covered:** 04 (CLT), 05 (Bayes' Theorem), 06 (Sampling)
**Key pivot:** From describing individual observations to describing behavior of estimates.

## Teaching strategy

The Central Limit Theorem is the statistical miracle that makes most of applied statistics work. Make it concrete: run the simulation live (the slides already do this), and let the audience watch a skewed distribution's sample means converge to normal as n increases. Then say: "This is why we can use t-tests on non-normal data. The statistic is approximately normal — the data doesn't have to be."

Bayes' Theorem should be taught as an update rule, not a formula. "You started with a prior belief. You saw new data. How much should you update?" Use sensitivity/specificity of a clinical test and walk through the full Bayes calculation for a hypothetical patient.

## Key talking points

**Slide: CLT — Why Sample Size Matters**
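The practical implication: once n reaches roughly 30, the sampling distribution of the mean is close enough to normal for many purposes, though heavily skewed outcomes (length of stay, costs) can need considerably more. This is why confidence intervals have the shape they do, and why sample size calculations work.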
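
As a fallback if the app is unavailable, the convergence can be sketched in a few lines of R. The exponential population and the helper names `sample_means` and `skewness` are assumptions of this sketch, not part of the slide deck.

```r
set.seed(1)

# Population: exponential with mean 1 and sd 1 (strongly right-skewed).
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

# Simple moment-based skewness; 0 for a symmetric distribution.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

for (n in c(5, 30, 100)) {
  m <- sample_means(n)
  cat(sprintf("n = %3d  sd of means = %.3f  (theory %.3f)  skewness = %.2f\n",
              n, sd(m), 1 / sqrt(n), skewness(m)))
}
```

The skewness of the sample means should shrink toward zero as n grows, while their spread tracks σ/√n.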

**Slide: Bayes' Theorem — Diagnostic Reasoning**
Draw the 2×2 manually if you can. Show: high-prevalence disease + imperfect test → PPV lower than you'd expect. Low-prevalence disease + same test → PPV dramatically worse. This surprises nearly every clinician.
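
The same calculation can be sketched in R to show how PPV moves with prevalence. The function name `ppv` and the prevalence grid are illustrative assumptions; the 90% sensitivity and 85% specificity reuse the Lecture 1 app settings.

```r
# Positive predictive value from Bayes' theorem.
ppv <- function(prevalence, sensitivity, specificity) {
  true_pos  <- sensitivity * prevalence              # P(T+ and D)
  false_pos <- (1 - specificity) * (1 - prevalence)  # P(T+ and no D)
  true_pos / (true_pos + false_pos)                  # P(D | T+)
}

prevalence <- c(0.02, 0.10, 0.30, 0.50)
round(ppv(prevalence, sensitivity = 0.90, specificity = 0.85), 3)
# 0.109 0.400 0.720 0.857
```

The test itself never changes across the four columns; only the population does.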

**Slide: Sampling Methods**
Connect to their daily experience: who ends up in a trauma registry? Not random. Who gets enrolled in RCTs? Not random. What are the consequences of convenience sampling for generalizability?

## Timing
- Part 1 (CLT): 20 min
- Part 2 (Bayes): 20 min
- Part 3 (Sampling): 15 min
- Discussion: 5 min

## Discussion prompt
"Your hospital is benchmarking its sepsis mortality rate against a national registry. The registry uses a different inclusion criterion. What sampling issue is that? What does it do to the comparison?"

## Interactive App — `lecture-02`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-02/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① CLT Explorer | Core of Part 1 | Sample from an exponential or uniform population; increase n from 5 → 30 → 100; show the sample mean distribution converging to normal — this is the moment the audience sees *why* t-tests work on non-normal data |
| ② Standard Error | After CLT | Show how SE = σ/√n; increase n and watch the SE shrink; grounds the intuition that bigger studies produce more precise estimates |
| ③ LLN Convergence | After expected value definition | Show the running mean bouncing widely at first then stabilizing; use to define expected value as a long-run average |
| ④ Survivorship Bias | After sampling methods | Illustrate how conditioning on survival at enrollment changes what you can conclude; apply to the registry question of who gets to a trauma center |
| ⑤ Power & Sample Size | If time permits here (or defer to L3) | Move effect size and α; show how required n balloons as effect size shrinks — motivates why underpowered studies are not just unlucky, they're uninformative |

---

# Lecture 3 — Inference, Sampling & Survival

**Posts covered:** 07 (Hypothesis Testing), 08 (Confidence Intervals), 09 (Survival Analysis)
**This lecture is high-stakes** — hypothesis testing is where most clinical misinterpretation lives.

## Teaching strategy

Start with the provocation: "A p-value does not tell you the probability that the null hypothesis is true. Almost everyone thinks it does." Pause. Let that land. Then walk through exactly what a p-value does tell you: P(data at least this extreme | H₀ is true).
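
One way to make the conditioning on H₀ tangible is to simulate studies where the null is true by construction. This is a minimal sketch; the sample size and number of simulated studies are arbitrary choices.

```r
set.seed(7)

# Simulate 5,000 two-group studies in which H0 is true by construction.
p_values <- replicate(5000, {
  control   <- rnorm(30)   # both arms drawn from the same distribution
  treatment <- rnorm(30)
  t.test(control, treatment)$p.value
})

mean(p_values < 0.05)   # share of "significant" results; should be close to 0.05
hist(p_values, breaks = 20, main = "p-values when H0 is true")  # roughly flat
```

About one study in twenty crosses the 0.05 line even though no effect exists anywhere in the simulation.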

The confidence interval section should emphasize that CIs are about the procedure, not about this particular interval. "95% of intervals constructed this way will contain the true value" is subtly different from "there's a 95% probability the true value is in this interval." If the audience doesn't see why that distinction matters yet, that's fine — Lecture 4 (Bayesian) will make it concrete.
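
The "procedure, not this interval" point can be sketched as a coverage simulation. The true mean, standard deviation, and sample size below are hypothetical and are known only because the data are simulated.

```r
set.seed(11)
true_mean <- 50   # hypothetical true value, known only because we simulate

# Build 2,000 intervals from 2,000 independent samples of 25 patients each.
covered <- replicate(2000, {
  x  <- rnorm(25, mean = true_mean, sd = 10)
  ci <- t.test(x)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

mean(covered)   # proportion of intervals containing the true mean; near 0.95
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the recipe succeeds.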

Survival analysis is best taught visually. Walk through the Kaplan-Meier curve step by step: what happens at each event time, what censoring looks like, what the thinning tail means for the width of the confidence band.

## Key talking points

**Slide: Type I and Type II Errors**
Clinical framing: Type I error = false alarm (treating a patient who doesn't need treatment, approving a drug that doesn't work). Type II error = missed signal (not treating a patient who needs it, not detecting a drug that works). The tradeoff is real and deliberate.

**Slide: Why p < 0.05 Is Arbitrary**
R.A. Fisher chose 0.05 as a convenient convention in the 1920s — for agricultural experiments. It has no principled clinical justification. This doesn't mean abandon significance testing; it means don't treat the threshold as sacred.

**Slide: Kaplan-Meier Curves**
Ask: "What does the right tail of a KM curve tell you?" Answer: less than you think — that region has very few patients still at risk, so the confidence band is wide and the curve is unstable. Many clinicians read the right tail as reliable data.
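
To back this up with numbers, a short sketch using the `survival` package (shipped with standard R installations) is shown below. The bundled `lung` dataset is only a stand-in for a registry, and the follow-up times are arbitrary choices.

```r
library(survival)

# `lung` is an example dataset bundled with the survival package;
# status == 2 codes death in this dataset.
fit <- survfit(Surv(time, status == 2) ~ 1, data = lung)

# Patients still at risk, and CI width relative to the estimate,
# at a few follow-up times (days).
s <- summary(fit, times = c(100, 300, 600, 900))
data.frame(time         = s$time,
           n_risk       = s$n.risk,
           survival     = round(s$surv, 3),
           rel_ci_width = round((s$upper - s$lower) / s$surv, 2))

plot(fit, conf.int = TRUE, xlab = "Days", ylab = "Survival probability")
```

The number at risk falls sharply over follow-up while the uncertainty relative to the estimate grows, which is the reason to distrust the right tail.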

## Timing
- Hypothesis testing: 20 min
- Confidence intervals: 15 min
- Survival analysis: 20 min

## Interactive App — `lecture-03`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-03/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① p-value Explorer | Opening the hypothesis testing section | Set a true effect of 0; show how p-values are uniformly distributed under H₀ — this destroys the "small p = important finding" intuition immediately. Then set n=5000 with effect=0.001; show p<0.001 despite a trivially small effect |
| ② Type I / II Error | After defining the two error types | Move the α slider and show the red/blue regions shift; demonstrate the direct trade-off: at a fixed sample size and effect size, lowering α (fewer false alarms) always increases β (more missed signals) |
| ③ CI Coverage | After the frequentist CI definition | Repeatedly sample and show how many intervals contain the true value; the visual of ~95% of intervals capturing the parameter corrects the "this interval has 95% probability of containing the truth" misconception |
| ④ CI Precision | Connecting n to interval width | Show that a wider CI from a small study is not just unlucky — it's the correct answer given less data; motivation for adequate sample sizes |
| ⑤ MLE Explorer | Advanced audiences or research fellows | Show the likelihood function and how the MLE sits at its peak; grounds the "why these estimates?" question for logistic and Cox models |

---

# Lecture 4 — Bayesian Inference & Simulation

**Posts covered:** 10 (Bayesian Inference), 05 (Monte Carlo — if included)

## Teaching strategy

This lecture pairs naturally with Lecture 3. Where Lecture 3 covers what frequentist inference says (and doesn't say), Lecture 4 covers the Bayesian alternative. Key message: "Bayesian statistics lets you make the statement you actually want to make: there's a 95% probability the parameter is in this range."

The prior sensitivity demonstration is crucial. Show a flat prior, a weakly informative prior, and a strongly informative prior, and show how each shifts the posterior. The lesson: priors are not bias — they're honesty about what you knew before data arrived.
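
A conjugate beta-binomial update is one simple way to sketch this comparison outside the app. The observed data and the three Beta priors below are hypothetical choices for illustration; the strongly informative prior is centered at 8% to echo the discussion prompt.

```r
# Hypothetical data: 3 complications observed in 20 patients.
events <- 3
n      <- 20

# Beta(a, b) priors on the complication rate (illustrative choices).
priors <- list(
  flat                 = c(a = 1, b = 1),    # uniform on [0, 1]
  weakly_informative   = c(a = 2, b = 18),   # centered at 10%, worth about 20 patients
  strongly_informative = c(a = 8, b = 92)    # centered at 8%, worth about 100 patients
)

# Conjugate update: the posterior is Beta(a + events, b + n - events).
posterior <- sapply(priors, function(p) {
  a_post <- p[["a"]] + events
  b_post <- p[["b"]] + n - events
  c(mean  = a_post / (a_post + b_post),
    lower = qbeta(0.025, a_post, b_post),
    upper = qbeta(0.975, a_post, b_post))
})
round(t(posterior), 3)   # posterior mean and 95% credible interval per prior
```

With only 20 patients, the strongly informative prior pulls the posterior mean well below the raw 15% event rate; rerunning with a larger `n` shows the data taking over.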

## Key talking points

**Slide: Prior as Transparency**
A frequentist analysis uses no prior at all, although its point estimates often match what a Bayesian analysis with a flat prior (complete ignorance) would give. Bayesian statistics makes the prior explicit. Which is more honest when you have 15 years of trauma surgery experience?

**Slide: Monte Carlo Simulation**
The simplest framing: "We can't derive the answer analytically. But we can simulate it 100,000 times and count." Power calculations, expected value of information, sample size under uncertainty — all can be done via simulation when the math is intractable.
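
A power calculation by simulation makes a compact example of "simulate and count." The design below (two arms, standardized effect of 0.5) and the function name `simulated_power` are assumptions of this sketch; it uses 2,000 repetitions rather than 100,000 to keep the run time short.

```r
set.seed(3)

# Assumed design: two arms, true standardized effect 0.5, alpha = 0.05.
simulated_power <- function(n_per_arm, effect = 0.5, reps = 2000, alpha = 0.05) {
  significant <- replicate(reps, {
    control   <- rnorm(n_per_arm, mean = 0)
    treatment <- rnorm(n_per_arm, mean = effect)
    t.test(treatment, control)$p.value < alpha
  })
  mean(significant)   # share of simulated trials that reject H0
}

simulated_power(64)                                # simulation estimate
power.t.test(n = 64, delta = 0.5, sd = 1)$power    # analytic check, about 0.80
```

Here an analytic answer exists, so it serves as a check on the simulation; the value of the approach is that the same loop still works when the design is too complicated for a formula.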

## Discussion prompt
"You're designing a study on a rare complication. You have strong prior data from a similar institution suggesting 8% rate. An uninformative prior would pretend you don't know this. A Bayesian informative prior would use it. When is using that prior knowledge ethical? When is it questionable?"

## Interactive App — `lecture-04`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-04/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① Bayesian Updater | Core of Part 1 | Set a flat prior; add small data; show minor posterior shift. Then add a strong informative prior (8% complication rate from institutional data); show how the posterior is pulled toward prior with small n but dominated by data at large n — directly answers "when does prior matter?" |
| ② Posterior Predictive | After posterior derivation | Sample from the posterior; show the distribution of predictions for a new patient; distinguishes parameter uncertainty from prediction uncertainty |
| ③ Monte Carlo | During the simulation section | Estimate π or an integral by sampling; show convergence as n grows; grounds the "why simulation?" narrative for cases where analytical posteriors don't exist |
| ④ MCMC Explorer | Research fellows / advanced audiences | Show the chain traversing the posterior landscape; demonstrate burn-in and mixing; explain why MCMC produces samples rather than a closed-form answer |
| ⑤ Entropy & KL | If covering information theory | Show KL divergence between two distributions shifting as they diverge; optional for most clinical audiences but useful for AI/ML-focused groups |

---

# Lecture 5 — Regression Methods

**Posts covered:** 11 (Linear Regression), 12 (Logistic Regression), 13 (Multiple Regression)

## Teaching strategy

Most clinicians have seen regression output but have never had the concepts explained from first principles. Start with the simplest case: one predictor, one outcome, draw a line through data. That's linear regression. Everything else is an elaboration.

For logistic regression, the odds ratio explanation is the most important. Use a concrete 2×2 example before showing any formula. Interpretation: "For a one-unit increase in ISS, the odds of mortality multiply by this factor."

## Key talking points

**Slide: The Regression Equation**
The intercept is "the predicted value when all predictors are zero." For most clinical models, that's not a meaningful patient — so the intercept alone rarely has clinical interpretation.

**Slide: Coefficient Interpretation in Logistic Regression**
This is where most clinical readers get lost. Walk through: coefficient → exponentiate → odds ratio → convert to approximate risk ratio for rare outcomes. The log-odds scale is mathematically necessary but clinically unintuitive.
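
The coefficient-to-odds-ratio walk-through can be sketched on simulated data. The cohort, the intercept, and the "true" slope of 0.08 per ISS point are assumptions made up for this example.

```r
set.seed(5)

# Simulated cohort; the true log-odds slope for ISS is set to 0.08 by assumption.
n   <- 5000
iss <- round(runif(n, min = 1, max = 50))
mortality <- rbinom(n, size = 1, prob = plogis(-4 + 0.08 * iss))

fit <- glm(mortality ~ iss, family = binomial)

coef(fit)["iss"]             # log-odds change per one-point increase in ISS
exp(coef(fit)["iss"])        # odds ratio per point; true value is exp(0.08), about 1.083
exp(10 * coef(fit)["iss"])   # odds ratio for a 10-point increase in ISS
```

Rescaling to a 10-point increase often gives clinicians a more intuitive number than the per-point odds ratio.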

**Slide: Multiple Regression — Adjustment**
Key message: every coefficient in a multiple regression model means "the effect of this variable, holding all other variables in the model constant." That's what adjustment means. Confounding is controlled only for confounders that are measured and included; unmeasured confounders remain a threat.

## Common questions
- *"How do I know if my regression assumptions are met?"* Residual plots. Show what "bad" residuals look like vs. "good."
- *"What's the difference between logistic regression and classification?"* Same model, different framing. Logistic regression gives probabilities; classification adds a threshold.

## Interactive App — `lecture-05`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-05/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① OLS Explorer | Opening Part 1 | Add a high-leverage outlier; watch the regression line pulled toward it while R² drops; illustrates why outlier detection matters before modeling. Then remove it and show the line snap back |
| ② LINE Diagnostics | After assumption listing | Toggle heteroscedasticity (variance that grows with fitted values); show the fan-shaped residual plot; toggle non-linearity; show the curved residual pattern — makes assumption violations recognizable from the output, not just theory |
| ③ Logistic Regression | Part 2 opener | Vary ISS and age; show the S-shaped probability curve; show how increasing ISS shifts the curve; demonstrate that "coefficient = log-odds change per unit" by reading the app output |
| ④ Calibration | After predicted probability discussion | Set a model that systematically overestimates risk; show the calibration curve bending above the diagonal; ask: "would you trust this model's predicted 20% mortality?" |
| ⑤ GLM Family | Research fellows or if time permits | Switch outcome family (Poisson, gamma, binomial); show how the link function changes; grounds the "which GLM?" question for count outcomes and time-to-event |

---

# Lecture 6 — Comparing Groups & Nonparametrics

**Posts covered:** 14 (ANOVA), 15 (Chi-Square), 16 (Nonparametric Tests), 17 (Correlation)

## Teaching strategy

This lecture is often the most directly applicable to clinical audiences — these are the tests they see in journal club papers every week. Focus heavily on interpretation and appropriate use, less on mechanics.

The chi-square test section is a good place to emphasize the difference between statistical and clinical significance using a large-n example: in a registry with several million patients per group, a 0.1 percentage-point difference in adverse events (say, 5.0% vs. 5.1%) can be highly significant but is almost certainly clinically meaningless.

## Key talking points

**Slide: ANOVA — When to Use It**
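ANOVA extends the two-group t-test to three or more groups with a single omnibus test, which avoids the inflated Type I error of running every pairwise t-test separately. The F-test tells you "at least one mean differs." Post-hoc tests (with their own multiple-comparison adjustment, e.g., Tukey) tell you which pairs differ.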
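
The omnibus-then-post-hoc workflow can be sketched with base R. The three centers and their length-of-stay distributions are simulated, with center C deliberately given a higher true mean.

```r
set.seed(9)

# Simulated length of stay (days) at three hypothetical trauma centers;
# center C is given a higher true mean.
los <- data.frame(
  center = factor(rep(c("A", "B", "C"), each = 40)),
  days   = c(rnorm(40, 6, 2), rnorm(40, 6, 2), rnorm(40, 8, 2))
)

fit <- aov(days ~ center, data = los)
summary(fit)     # omnibus F-test: does at least one mean differ?
TukeyHSD(fit)    # pairwise differences with family-wise error control
```

The F-test alone cannot say that C is the odd one out; the pairwise table does.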

**Slide: Nonparametric Tests**
When to use: small n (< 30), obvious non-normality, ordinal outcomes (pain scales, GCS). Cost: less power than parametric equivalents when normality holds.

**Slide: Correlation ≠ Causation**
Every medical statistics audience needs this reminder with a concrete clinical example. ISS and mortality are correlated, but the score itself doesn't cause death; the underlying injuries drive both the score and the outcome.

## Interactive App — `lecture-06`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-06/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① ANOVA Explorer | Opening Part 1 | Compare means across 3 groups (e.g., trauma centers A/B/C); increase variance within groups until F-test loses significance; illustrates why within-group spread matters as much as between-group differences |
| ② Multiple Comparisons | After the ANOVA result | Show unadjusted vs. Bonferroni vs. FDR-adjusted p-values across many pairwise comparisons; demonstrate how the family-wise Type I error rate inflates without correction: the multiple-comparisons problem made visual |
| ③ Censoring & KM | Part 3 opener | Add censored observations by moving the slider; watch the KM curve update step-by-step; show how the confidence band widens at the right tail as the risk set thins — resolves the most common clinical misreading of KM plots |
| ④ Cox Model | After KM discussion | Vary a covariate (age, ISS) and show the hazard ratio shift; toggle a time-varying effect and show how the proportional hazards assumption can be violated; grounds interpretation of Cox output in published tables |
| ⑤ Non-Parametric | Closing the lecture | Switch from t-test to Wilcoxon; show power loss for normal data; show power gain for heavy-tailed data — makes the "when to use nonparametrics" rule concrete rather than rule-of-thumb |

---

# Lecture 7 — Dimensionality & Unsupervised Learning

**Posts covered:** 16 (Clustering), 17 (PCA), 18 (Factor Analysis — if covered)

## Teaching strategy

Frame dimensionality reduction as answering: "Can we find structure in data without an outcome variable?" This is different from every lecture so far — no prediction, no inference, just pattern discovery.

PCA is best taught geometrically: you have data in high-dimensional space; PCA finds the directions of maximum variance. The first PC explains the most variance. Show the biplot and ask the audience what the PCs seem to represent in clinical terms.

## Key talking points

**Slide: k-Means Clustering**
The algorithm is simple: assign points to the nearest centroid, update centroids, repeat. The hard problem is choosing k. Show the elbow plot. Discuss stability: do the clusters replicate if you run the algorithm twice?
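
Both the elbow plot and the stability question can be sketched with base R's `kmeans()`. The toy data below have three built-in groups, which is an assumption of the example; real registry data will rarely be this clean.

```r
set.seed(21)

# Two simulated features with three built-in groups.
x <- rbind(
  cbind(rnorm(50, 0),   rnorm(50, 0)),
  cbind(rnorm(50, 5),   rnorm(50, 0)),
  cbind(rnorm(50, 2.5), rnorm(50, 5))
)

# Total within-cluster sum of squares for k = 1..8; nstart reruns the
# algorithm from several random starts and keeps the best solution.
wss <- sapply(1:8, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")

# Stability check: do two independent single-start runs agree?
run1 <- kmeans(x, centers = 3, nstart = 1)$cluster
run2 <- kmeans(x, centers = 3, nstart = 1)$cluster
table(run1, run2)   # a stable solution shows one dominant cell per row
```

Cluster labels are arbitrary, so agreement shows up as one dominant cell per row rather than a filled diagonal.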

**Slide: PCA in Clinical Context**
A common use case: a trauma registry has 40 lab values for each patient. PCA reduces them to 5–10 dimensions that capture most of the variance. This makes downstream analysis tractable.
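
A minimal sketch of that workflow with `prcomp()` follows. The 40 "lab values" are simulated from three latent factors plus noise, a structure assumed purely for illustration, so the number of components needed here says nothing about a real registry.

```r
set.seed(13)

# Simulated stand-in for a registry: 300 patients x 40 lab values driven by
# 3 latent factors plus noise.
n <- 300; p <- 40
latent   <- matrix(rnorm(n * 3), n, 3)
loadings <- matrix(rnorm(3 * p), 3, p)
labs     <- latent %*% loadings + matrix(rnorm(n * p, sd = 0.5), n, p)

# Standardize first: real lab values come in very different units.
pca <- prcomp(labs, center = TRUE, scale. = TRUE)

cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
round(cum_var[1:10], 2)       # cumulative variance explained by PC1..PC10
which(cum_var >= 0.90)[1]     # smallest number of PCs reaching 90%
```

The cumulative-variance vector is the numeric version of the scree plot and is what drives the "how many PCs?" decision.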

## Interactive App — `lecture-07`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-07/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① PCA Explorer | Part 1 opener | Set correlation to 0.7 (high inter-variable correlation); color by "High lactate (severity)"; show PC1 vs. PC2 — the severity group should separate cleanly. Then color by "None" and ask the audience what structure they see before revealing the label. This demonstrates unsupervised discovery |
| ② Scree & Loadings | After PCA geometry | Move the "retain k PCs" slider; read the cumulative variance; show the loadings heatmap to identify which variables drive each PC (e.g., PC1 = physiologic stress, PC2 = neurologic status); grounds the "how many PCs?" decision |
| ③ k-Means | Part 2 — clustering introduction | Set k=3; show the elbow plot; then set k=2 and k=6; ask the audience which k is defensible. Key point: the algorithm always produces clusters — the question is whether they mean anything |
| ④ Hierarchical | After k-means | Switch linkage from Ward to single; show how the dendrogram shape changes dramatically; use to explain why Ward is preferred for compact, equal-sized clinical phenotypes |
| ⑤ Curse of Dims | Closing the lecture | Increase p from 2 to 100; watch the distance ratio collapse toward 1; show the histogram where high-dimensional distances concentrate; delivers the punchline — distance-based methods like kNN and k-means lose meaning as p grows, which is why PCA first then cluster is the clinical data science standard |

---

# Lecture 8 — Model Building & Validation

**Posts covered:** 19 (Feature Selection), 20 (Bias-Variance), 21 (Cross-Validation)

## Teaching strategy

This is the lecture that connects statistics to machine learning in the clinical audience's mind. The bias-variance tradeoff is the central concept — the fundamental tension between a model that is too simple (high bias, underfits) and too complex (high variance, overfits).

The cross-validation demonstration is the most important live computation: show that in-sample performance is optimistically biased and almost always looks better than out-of-sample performance. This is why you should never evaluate a model on the data used to train it.

## Key talking points

**Slide: The Overfitting Problem**
A model that memorizes the training data has perfect training accuracy and poor test accuracy. Clinical analogy: a resident who memorizes exam question answers but cannot reason through novel clinical presentations.
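
The gap between training and test error can be sketched with polynomial fits of increasing degree. The sine-wave truth, noise level, and sample sizes are assumptions of this toy example.

```r
set.seed(17)

# Noisy data from a smooth curve (assumed truth: one sine cycle plus noise).
make_data <- function(n) {
  x <- runif(n, 0, 1)
  data.frame(x = x, y = sin(2 * pi * x) + rnorm(n, sd = 0.3))
}
train <- make_data(30)
test  <- make_data(1000)

# Root mean squared prediction error of a fitted model on a dataset.
rmse <- function(fit, data) sqrt(mean((data$y - predict(fit, data))^2))

for (degree in c(1, 3, 15)) {
  fit <- lm(y ~ poly(x, degree), data = train)
  cat(sprintf("degree %2d  train RMSE = %.2f  test RMSE = %.2f\n",
              degree, rmse(fit, train), rmse(fit, test)))
}
```

Training error can only fall as the degree rises; the test column is where the cost of memorizing noise appears.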

**Slide: Regularization — Ridge and Lasso**
Lasso performs variable selection by shrinking some coefficients exactly to zero. Ridge shrinks all coefficients toward zero but doesn't eliminate them. When to use: whenever you have many predictors relative to observations.

**Slide: Cross-Validation Properly Done**
K-fold: split into K folds, train on K-1, test on 1, rotate. Temporal validation: if data has time structure (registry data always does), validate on a later time period — not a random split. This is the clinical standard.
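
K-fold cross-validation is short enough to write by hand, which helps demystify it. The simulated cohort, its coefficients, and the choice of the Brier score as the metric are assumptions of this sketch.

```r
set.seed(23)

# Simulated cohort with a binary outcome (coefficients are assumptions).
n   <- 1000
dat <- data.frame(age = rnorm(n, 50, 15), iss = round(runif(n, 1, 50)))
dat$died <- rbinom(n, 1, plogis(-5 + 0.02 * dat$age + 0.07 * dat$iss))

k     <- 5
folds <- sample(rep(1:k, length.out = n))   # random fold assignment

# Brier score (mean squared error of predicted probabilities) per held-out fold.
brier <- sapply(1:k, function(i) {
  fit  <- glm(died ~ age + iss, family = binomial, data = dat[folds != i, ])
  pred <- predict(fit, newdata = dat[folds == i, ], type = "response")
  mean((dat$died[folds == i] - pred)^2)
})

round(brier, 3)   # one estimate per fold
mean(brier)       # cross-validated estimate
```

For temporal validation, the random `folds` vector would be replaced by a split on admission date, training on earlier patients and testing on later ones.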

## Interactive App — `lecture-08`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-08/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① Bias-Variance | Core of Part 1 | Drag the complexity slider from left (underfit) to right (overfit); show training error decreasing monotonically while test error forms a U-shape; this single visualization carries the entire bias-variance concept — spend 5 minutes here |
| ② Overfitting Demo | After bias-variance | Fit a degree-15 polynomial to noisy data; show it threading every training point with a test error far above training error; clinical analogy: a model that "memorized" which patients in the training set died |
| ③ Regularization | Part 2 opener | Move the λ slider from 0 (OLS) to large (heavy regularization); show coefficients shrinking toward zero in the path plot; show Lasso driving some to exactly zero (variable selection) vs. Ridge which retains all |
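| ④ CV & λ Selection | After regularization | Show cross-validation error curve as a function of λ; identify both the error-minimizing λ and the 1-SE rule choice (the largest λ whose error is within one standard error of the minimum); explains how λ is actually chosen in practice, not by intuition |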
| ⑤ k-Fold CV | Closing Part 3 | Show how performance estimates vary across folds; compare 5-fold vs. 10-fold vs. LOOCV variance; ask: "why not always use LOOCV?" — answers: computational cost and high variance |

---

# Lecture 9 — Evaluation & Ensembles

**Posts covered:** 22 (ROC/AUC), 23 (Calibration), 29 (Model Evaluation), 24–25 (Ensemble Methods)

## Teaching strategy

This lecture makes the point the trauma registry series builds on: AUC and calibration are both necessary, and they measure different things. A model can rank patients correctly (good AUC) while its risk estimates are systematically wrong (poor calibration). For clinical decision-making, calibration often matters more.

The decision curve analysis slide is clinically powerful: it shows net benefit across a range of threshold probabilities. Ask the audience: "At what threshold probability would you treat? 10%? 30%? What does the DCA say about whether the model adds value at your threshold?"

## Key talking points

**Slide: AUC — What It Does and Doesn't Mean**
AUC = P(the model ranks a randomly selected event above a randomly selected non-event). Perfect discrimination = 1.0. Random guessing = 0.5. It says nothing about whether the risk estimates are accurate.
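
The probabilistic definition can be computed directly by comparing every event with every non-event. The simulated scores below are assumptions for illustration; the second half of the sketch distorts the predicted risks with a monotone transformation to show that the ranking, and therefore the AUC, does not change.

```r
set.seed(29)

# Simulated risk scores: events tend to score higher (assumed separation).
score_event    <- rnorm(200, mean = 1)
score_nonevent <- rnorm(800, mean = 0)

# AUC as a probability: compare all event/non-event pairs; ties count one half.
pairs <- outer(score_event, score_nonevent, "-")
auc   <- mean(pairs > 0) + 0.5 * mean(pairs == 0)
auc   # theoretical value for this setup is pnorm(1 / sqrt(2)), about 0.76

# A monotone transformation changes every predicted risk but not the ranking.
pairs_t <- outer(plogis(score_event)^3, plogis(score_nonevent)^3, "-")
auc_transformed <- mean(pairs_t > 0) + 0.5 * mean(pairs_t == 0)
auc_transformed
```

Two models with identical AUC can therefore hand the clinician very different risk numbers, which is why calibration has to be checked separately.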

**Slide: Calibration — The Hosmer-Lemeshow Test and Beyond**
HL test: oversensitive in large datasets (even trivial miscalibration is significant at n=5000) and underpowered in small ones, so the p-value says little about how bad the miscalibration is. Calibration plots are better. Show the decile calibration plot.

**Slide: Ensemble Methods — Random Forest and Boosting**
Random forest: many trees, each trained on a bootstrap sample, with a random subset of features considered at each split. Predictions averaged. This reduces variance without increasing bias much. Gradient boosting: sequential trees, each correcting the errors of the previous. Generally higher performance, but more tuning required.

## Interactive App — `lecture-09`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-09/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① ROC & AUC | Part 1 opener | Set prevalence to 5% (rare outcome); move the threshold slider; show how sensitivity and specificity trade off; then show how PPV collapses even with AUC=0.85 at low prevalence — the most important clinical AI literacy point in the series |
| ② Calibration | After AUC | Show a model with perfect AUC (1.0) but catastrophic calibration (all predicted risks wrong); then show a model with AUC 0.75 but good calibration; ask the audience which they'd rather use for a treatment threshold decision |
| ③ Variable Importance | After ensemble discussion | Toggle between permutation importance, SHAP, and Gini; show how rankings differ; grounds the discussion of "why does the model use this variable?" — and why variable importance is model-specific, not universal |
| ④ Time Series | Part 3 opener | Show a registry volume trend; decompose into trend, seasonality, residual; explains why naive year-over-year comparisons miss seasonal confounding |
| ⑤ ARIMA Forecast | After time series decomposition | Fit ARIMA to a registry series; show forecast intervals widening into the future; illustrates both the utility and the uncertainty of time-series forecasting for capacity planning |

---

# Lecture 10 — Mathematical Foundations

**Posts covered:** 26–30 (Linear Algebra, Optimization, Information Theory, Deep Learning)

## Teaching strategy

This is the bridge lecture for audiences who want to understand *why* the algorithms work, not just what they produce. Calibrate to audience: for a clinical audience, spend more time on information theory (entropy, information gain) and less on matrix algebra. For a more technical audience, the eigenvalue/eigenvector slides are the entry point to PCA and neural networks.

Frame deep learning appropriately: it is not magic. It is gradient descent applied to a very large composite function with a lot of parameters. The "magic" is that the optimization works — and that the representations learned are often surprisingly meaningful.

## Key talking points

**Slide: Gradient Descent**
Intuition: you're standing on a hilly landscape in fog. You want to reach the lowest point. You look at the slope under your feet and take a step downhill. Repeat until you stop moving. That's gradient descent. The "landscape" is the loss function.
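
The foggy-hillside loop is only a few lines of R. The quadratic loss with its minimum at 3, the function names, and both learning rates are assumptions chosen to make the behavior easy to see.

```r
# Toy loss with a known minimum at w = 3 (chosen for illustration).
loss     <- function(w) (w - 3)^2
gradient <- function(w) 2 * (w - 3)

gradient_descent <- function(w_start, learning_rate, steps = 50) {
  w <- w_start
  for (i in seq_len(steps)) {
    w <- w - learning_rate * gradient(w)   # step downhill along the slope
  }
  w
}

w_small <- gradient_descent(w_start = 0, learning_rate = 0.1)
w_large <- gradient_descent(w_start = 0, learning_rate = 1.1)

c(w = w_small, loss = loss(w_small))   # ends very close to the minimum at w = 3
c(w = w_large, loss = loss(w_large))   # step too large: overshoots and diverges
```

The only thing that differs between the two runs is the step size, which sets up the learning-rate discussion.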

**Slide: Neural Networks — What a Layer Does**
Each layer applies a linear transformation followed by a nonlinear activation function. Stacking layers allows the model to learn increasingly abstract representations. The final layer maps to the output space.
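
In symbols, one layer can be written as follows. The notation is an assumption of this sketch: $h^{(l-1)}$ is the layer's input, $W^{(l)}$ and $b^{(l)}$ are its weights and biases, and $\sigma$ is the activation function (for example, ReLU):

$$
h^{(l)} = \sigma\!\left(W^{(l)} h^{(l-1)} + b^{(l)}\right)
$$

Without the nonlinearity $\sigma$, a stack of layers would collapse into a single linear transformation, which is why the activation is essential.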

**Slide: When to Use Deep Learning**
Deep learning wins when: you have very large datasets, the input is high-dimensional (images, text, sequences), and feature engineering would be burdensome. For tabular clinical data with n < 50,000, gradient boosting typically matches or outperforms neural networks while needing less data and less compute.

## Interactive App — `lecture-10`

Run with: `shiny::runApp("shiny_apps/applied-statistics/lecture-10/app.R")`

| Tab | When to use it | What to show |
|---|---|---|
| ① Gradient Descent | Part 1 opener | Start with a large learning rate; show the loss oscillating or diverging; then reduce it; show smooth convergence — delivers the core intuition of optimization without a single equation |
| ② Learning Rate | After the gradient descent concept | Compare learning rate schedules (constant vs. decay vs. cyclic); show how the loss curve behaves differently for each; grounds the "why does my model not converge?" question |
| ③ SVD | Part 2 — linear algebra | Show a matrix being decomposed into singular vectors; reconstruct at rank 1, 3, 10; use the image reconstruction analogy to explain why SVD underlies PCA, recommendation systems, and embeddings |
| ④ Eigenvalues | After SVD | Show a 2D transformation by a matrix with large eigenvalue ratio; visualize how eigenvectors define the axes of maximum stretch — connects directly to PCA geometry from Lecture 7 |
| ⑤ Calculus & Backprop | Closing the lecture | Step through a two-layer network's forward pass and gradient computation; show how the chain rule propagates error signals backward; frames deep learning as "applied calculus at scale," not magic |

---

# Series-Level Discussion Questions

Use these at the end of the series or as capstone discussion prompts:

1. A new risk score for trauma mortality has AUC 0.88 in its derivation paper. What four additional pieces of information would you need before using it in your practice?

2. A hospital's quality team shows that mortality rates dropped 15% after implementing a new protocol. What design do they need to make that claim? What confounders are most likely?

3. A vendor claims their sepsis prediction model was validated on 200,000 patients. What questions would you ask about the validation methodology?

4. You're reviewing a paper that reports p < 0.001 for a treatment effect. Effect size: OR = 1.04. Does this finding support changing practice? What should the authors have reported instead?

5. A registry analysis finds that Black patients have 12% higher mortality than white patients after adjusting for ISS and age. Name three ways this result could be biased — and what additional information would help distinguish them.