Applied Statistics for AI & Clinical Decision-Making — Lecture 1 of 10
Data InDeed | dataindeed.org
2026-01-01
Probability is not a prerequisite.
It is the language.
Three posts · One lecture · ~50 minutes
Post 1 How Probability Powers AI
Post 2 Random Variables
Post 3 Key Distributions
From spam filters to clinical triage
AI does not eliminate uncertainty. It formalizes decisions under uncertainty.
Every AI/ML system that outputs a “prediction” is really computing a probability:
In trauma care, vitals may be incomplete, the reported mechanism may be wrong, and hemorrhage may be occult.
Probability gives us a disciplined way to reason under incomplete information.
All of probability theory rests on three rules:
Axiom 1 (Non-negativity): \(P(A) \geq 0\)
Axiom 2 (Normalization): \(P(S) = 1\)
Axiom 3 (Additivity): If A, B are mutually exclusive: \(P(A \cup B) = P(A) + P(B)\)
Why they matter for AI: A classifier that outputs incoherent probabilities (e.g., probabilities that don’t sum to 1) is mathematically invalid, not just poorly calibrated.
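As a minimal sketch of what such a coherence check might look like, the hypothetical helper `is_coherent()` below (not part of any package) tests a vector of class probabilities against the non-negativity and normalization axioms, allowing a small tolerance for floating-point error. The example probabilities are made up for illustration.

```r
# Hypothetical helper: checks Axiom 1 (non-negativity) and Axiom 2 (normalization)
# for a vector of mutually exclusive, exhaustive class probabilities.
is_coherent <- function(p, tol = 1e-8) {
  all(p >= 0) && abs(sum(p) - 1) < tol
}

# Illustrative classifier outputs (assumed values)
valid_output   <- c(spam = 0.7, not_spam = 0.3)
invalid_output <- c(spam = 0.7, not_spam = 0.5)  # sums to 1.2

is_coherent(valid_output)    # TRUE
is_coherent(invalid_output)  # FALSE
```

A check like this catches incoherent outputs only; it says nothing about whether coherent probabilities are well calibrated.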
The three types you’ll use constantly:
| Type | Definition | Example |
|---|---|---|
| Joint | \(P(A \cap B)\) | P(contains “free” AND is spam) |
| Marginal | \(P(A) = \sum_B P(A,B)\) | P(spam) regardless of words |
| Conditional | \(P(A \mid B) = \frac{P(A,B)}{P(B)}\) | P(spam \| contains “free”) |
Trauma translation:
A 2×2 table gives you the entire joint probability space:
| contains “free” \ spam | No | Yes |
|---|---|---|
| No | 0.4 | 0.1 |
| Yes | 0.1 | 0.4 |
Row sums → marginal P(contains_free)
Column sums → marginal P(spam)
Each cell → joint P(contains_free, spam)
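The same bookkeeping can be sketched in a few lines of base R. The matrix below encodes the 2×2 table with rows as contains_free and columns as spam; the object names (`joint`, `p_free`, `p_spam`) are illustrative choices, not part of the lecture materials.

```r
# Joint distribution from the 2x2 table: rows = contains "free", columns = spam
joint <- matrix(c(0.4, 0.1,
                  0.1, 0.4),
                nrow = 2, byrow = TRUE,
                dimnames = list(contains_free = c("No", "Yes"),
                                spam = c("No", "Yes")))

p_free <- rowSums(joint)  # marginal P(contains_free)
p_spam <- colSums(joint)  # marginal P(spam)

# Conditional = joint / marginal: P(spam = Yes | contains_free = Yes)
p_spam_given_free <- unname(joint["Yes", "Yes"] / p_free["Yes"])
p_spam_given_free  # 0.8
```

Here the marginal P(spam) is 0.5, but conditioning on the word “free” raises it to 0.8, which is exactly the kind of update a spam filter exploits.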
\[P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta) \cdot P(\theta)}{P(\text{data})}\]
Posterior ∝ Likelihood × Prior
In clinical triage:
Why it matters for AI:
Every Bayesian model, naive Bayes classifier, and probabilistic neural network is applying this one formula.
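# Scenario: 10,000 trauma patients, 15% have occult hemorrhage
# Positive FAST exam sensitivity = 0.85, specificity = 0.92
n <- 10000      # cohort size; not used in the analytic calculation below
p_hem <- 0.15   # prevalence (prior probability of hemorrhage)
sens <- 0.85
spec <- 0.92
# Compute positive predictive value (PPV): Bayes in one formula
p_pos_given_hem <- sens          # P(positive | hemorrhage)
p_pos_given_no_hem <- 1 - spec   # P(positive | no hemorrhage), the false positive rate
# Law of total probability: overall chance of a positive exam
p_pos <- p_pos_given_hem * p_hem + p_pos_given_no_hem * (1 - p_hem)
ppv <- (p_pos_given_hem * p_hem) / p_pos
# FDR (false discovery rate) = 1 - PPV
tibble::tibble(PPV = round(ppv, 3), FDR = round(1 - ppv, 3))
# A tibble: 1 × 2
    PPV   FDR
  <dbl> <dbl>
1 0.652 0.348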
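The PPV above is computed analytically. As a rough cross-check, the same quantity can be estimated by simulation. The sketch below assumes a simulated cohort of 100,000 patients and an arbitrary seed (both are illustrative choices) and reuses the stated prevalence, sensitivity, and specificity.

```r
set.seed(42)      # arbitrary seed, for reproducibility only
n_sim <- 100000   # assumed simulation size
p_hem <- 0.15
sens  <- 0.85
spec  <- 0.92

# True hemorrhage status for each simulated patient
hem <- rbinom(n_sim, 1, p_hem) == 1

# Exam is positive with probability sens if hemorrhage, 1 - spec otherwise
pos <- rbinom(n_sim, 1, ifelse(hem, sens, 1 - spec)) == 1

# PPV = fraction of positive exams that are true hemorrhages
ppv_sim <- mean(hem[pos])
ppv_sim  # should land close to the analytic value of 0.652
```

The simulated value fluctuates around the analytic PPV, and the gap shrinks as `n_sim` grows.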
The mathematical bridge between events and numbers
A random variable maps outcomes of a random process to numbers we can compute with.
Discrete random variables take countable values:
Continuous random variables take any value in a range:
The distribution of a random variable tells you how likely each value (or range of values) is.
Two numbers that summarize any distribution:
Expected value (mean): \(E[X] = \sum_x x \cdot P(X=x)\)
Variance: \(\text{Var}(X) = E[(X - \mu)^2]\)
Standard deviation: \(\text{SD}(X) = \sqrt{\text{Var}(X)}\)
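To make the three definitions concrete, here is a small worked sketch for a discrete random variable. The values and probabilities are invented for illustration (think of X as a count taking values 0 to 3) and do not come from real data.

```r
# Hypothetical discrete distribution (assumed values, for illustration only)
x <- 0:3
p <- c(0.50, 0.30, 0.15, 0.05)   # must be non-negative and sum to 1

mu    <- sum(x * p)              # E[X] = sum of x * P(X = x)
var_x <- sum((x - mu)^2 * p)     # Var(X) = E[(X - mu)^2]
sd_x  <- sqrt(var_x)             # SD(X) = sqrt(Var(X))

c(mean = mu, variance = var_x, sd = sd_x)
# mean = 0.75, variance = 0.7875, sd is about 0.887
```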
Two models with the same mean prediction can have very different clinical implications:
set.seed(123)
preds <- tibble::tibble(
model_a = rnorm(500, mean = 0.3, sd = 0.05),
model_b = rnorm(500, mean = 0.3, sd = 0.18)
) |>
tidyr::pivot_longer(everything(), names_to = "model", values_to = "pred_risk")
ggplot2::ggplot(preds, ggplot2::aes(x = pred_risk, fill = model)) +
ggplot2::geom_density(alpha = 0.5) +
ggplot2::labs(title = "Same mean risk — very different uncertainty",
x = "Predicted Risk", y = "Density") +
theme_di()  # assumed to be the series' custom ggplot2 theme, defined elsewhere
The models behind the data
| Distribution | Data type | Classic use in trauma/AI |
|---|---|---|
| Bernoulli | Binary outcome (0/1) | Mortality, complication (yes/no) |
| Binomial | Count of successes in n trials | # of CPG-compliant cases out of 20 |
| Poisson | Count of rare events | ED arrivals per hour, rare complications |
| Normal | Continuous, symmetric | Lab values, physiologic scores |
| Exponential | Time until event | Time to hemorrhage control |
| Beta | Proportions (0–1) | Prior on compliance rate |
| Gamma / Weibull | Skewed positive continuous | Survival time, ICU LOS |
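Two rows of this table combine naturally with the "Posterior ∝ Likelihood × Prior" idea: a Beta prior on a compliance rate, updated by a Binomial count of compliant cases. The sketch below is only an illustration. The uniform Beta(1, 1) prior and the audit result of 16 compliant cases out of 20 are assumptions, not data from the lecture.

```r
# Assumed prior and hypothetical audit data (illustration only)
a_prior <- 1
b_prior <- 1          # Beta(1, 1) is a uniform prior on the compliance rate
n_cases     <- 20
n_compliant <- 16     # hypothetical: 16 of 20 cases were CPG-compliant

# With a Beta prior and a Binomial likelihood, the posterior is again a Beta:
# add the successes to a and the failures to b
a_post <- a_prior + n_compliant
b_post <- b_prior + (n_cases - n_compliant)

post_mean <- a_post / (a_post + b_post)               # 17 / 22
cred_int  <- qbeta(c(0.025, 0.975), a_post, b_post)   # 95% credible interval

post_mean
cred_int
```

A more informative prior (larger `a_prior` and `b_prior`) would pull the posterior mean further from the raw proportion 16/20.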
x <- seq(-4, 4, length.out = 300)
tibble::tibble(x = x, y = dnorm(x)) |>
ggplot2::ggplot(ggplot2::aes(x, y)) +
ggplot2::geom_line(linewidth = 1.2, color = "#2563eb") +
ggplot2::geom_area(fill = "#2563eb", alpha = 0.12) +
ggplot2::labs(title = "Standard Normal N(0,1)",
x = "Standard deviations from mean", y = "Density") +
theme_di()
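The 68-95-99.7 rule: about 68% of values fall within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD.
It is ubiquitous because of the central limit theorem (CLT): sums of many independent variables with finite variance are approximately Normal (Lecture 2).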
Clinical: Systolic BP, hematocrit, temperature — all approximately Normal in stable populations.
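The 68-95-99.7 figures can be checked directly from the standard Normal CDF. This is a minimal sketch using base R's `pnorm()`; the variable names are illustrative.

```r
# Probability mass within k standard deviations of the mean, for k = 1, 2, 3
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)

round(coverage, 4)  # 0.6827 0.9545 0.9973
```

Because the calculation uses standardized units, the same coverage holds for any Normal distribution regardless of its mean and SD.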
lambdas <- c(1, 3, 8)
pois_df <- purrr::map_dfr(lambdas, function(l) {
tibble::tibble(
x = 0:20,
prob = dpois(0:20, lambda = l),
lambda = paste0("λ = ", l)
)
})
ggplot2::ggplot(pois_df, ggplot2::aes(x, prob, fill = lambda)) +
ggplot2::geom_col(position = "dodge", alpha = 0.85) +
ggplot2::scale_fill_brewer(palette = "Blues") +
ggplot2::labs(title = "Poisson distribution — three rates",
x = "Count", y = "P(X = x)", fill = NULL) +
theme_di()  # assumed to be the series' custom ggplot2 theme, defined elsewhere
Clinical application: Rare adverse events such as anastomotic leaks, intraoperative cardiac arrest, and battlefield tourniquet failures. When mean ≈ variance and events are independent, Poisson is a natural first model.
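One quick way to probe the "mean ≈ variance" condition is the ratio of sample variance to sample mean. The sketch below uses simulated counts, not clinical data. The helper name `dispersion()`, the seed, and the parameter values are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulated counts: one Poisson sample, one overdispersed (negative binomial) sample
counts_pois <- rpois(5000, lambda = 3)
counts_nb   <- rnbinom(5000, mu = 3, size = 1)  # variance = mu + mu^2 / size = 12

# Hypothetical helper: variance-to-mean ratio (about 1 under a Poisson model)
dispersion <- function(x) var(x) / mean(x)

dispersion(counts_pois)  # near 1
dispersion(counts_nb)    # well above 1, suggesting overdispersion
```

A ratio well above 1 is a hint that a Negative Binomial model may fit better than Poisson; it is a screening heuristic, not a formal test.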
Is the outcome binary (0/1)?
→ Bernoulli / Binomial
Is the outcome a count?
→ Poisson (if mean ≈ variance)
→ Negative Binomial (if overdispersed)
Is the outcome time-to-event?
→ Exponential / Weibull / Cox (Lecture 6)
Is the outcome continuous and symmetric?
→ Normal
Is the outcome continuous and right-skewed?
→ Gamma / Log-Normal
Is the outcome a proportion (0–1)?
→ Beta (as a prior) / logistic regression
Probability
Random Variables
Distributions
Clinical Application
The Laws That Make Statistics Work
Posts 04, 05, 06:
These three results are why statistics can make reliable inferences from incomplete data.
Read Before Lecture 2
Blog posts for this lecture:
Data InDeed · Jonathan D. Stallings, PhD · dataindeed.org
Data InDeed · Applied Statistics Series · Lecture 1