Survival Analysis Toolkit (Time Zero, Cox Models, Calibration)

A practical toolkit for time-to-event analysis, including time-zero definition, Kaplan-Meier estimation, Cox modeling, delayed entry, proportional hazards diagnostics, and reporting templates.

Published: April 8, 2026

Modified: June 9, 2026

Executive Summary

This toolkit is a reusable framework for time-to-event analysis in clinical and operational data.

It includes:

time-zero definition prompts
Kaplan-Meier estimation templates
Cox proportional hazards model scaffolds
delayed-entry guidance
proportional-hazards checks
event-rate and censoring summaries
reviewer-facing reporting language

Survival analysis is not only about fitting a Cox model. It is about aligning time correctly, defining risk sets honestly, and making the event process interpretable under censoring (Kaplan and Meier 1958; Cox 1972; Harrell 2015).

1. Define Time Zero First

The most important step in a survival workflow is often the least glamorous: defining when follow-up begins.

Before modeling, document:

what starts follow-up?
when is a subject first at risk?
when does treatment become eligible?
what event ends follow-up?
what produces censoring?

Many survival problems are really time-alignment problems in disguise. A sophisticated model cannot rescue a badly defined risk origin (Hernán and Robins 2020).
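One way to make these decisions auditable is to derive time and event from dated source fields in a single place. The sketch below is a minimal illustration. It assumes hypothetical columns time_zero_date, event_date (NA when no event was recorded) and censor_date (the last date the subject could have been observed). derive_time_event() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper (illustrative column names): derive follow-up time in days
# and a 0/1 event indicator from dates.
#   time_zero_date: start of follow-up (time zero)
#   event_date:     date of the event, NA if none recorded
#   censor_date:    last date the subject could have been observed
derive_time_event <- function(df) {
  # An event counts only if it occurred on or before the censoring date
  event_observed <- !is.na(df$event_date) & df$event_date <= df$censor_date
  end_date <- df$censor_date
  end_date[event_observed] <- df$event_date[event_observed]
  df$time <- as.numeric(end_date - df$time_zero_date)  # Date difference in days
  df$event <- as.integer(event_observed)
  # Follow-up ending at or before time zero usually signals a definition error
  df$bad_time <- df$time <= 0
  df
}

toy <- data.frame(
  time_zero_date = as.Date(c("2024-01-01", "2024-01-10", "2024-02-01")),
  event_date     = as.Date(c("2024-01-31", NA, "2024-06-01")),
  censor_date    = as.Date(c("2024-03-01", "2024-03-01", "2024-03-01"))
)
derive_time_event(toy)
# time: 30, 51, 29; event: 1, 0, 0
```

The third toy subject shows one design choice. An event recorded after the censoring date is treated as unobserved, so the subject is censored at the cutoff. This keeps the event definition consistent with the follow-up window.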

2. Basic Data Structure

Assume a dataset named data with:

time = observed follow-up time
event = 1 if event occurred, 0 if censored
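optional entry_time for delayed entry (the time, on the same clock as time, at which the subject enters the risk set; it must be less than time)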
baseline predictors measured before or at time zero

required_pkgs <- c("survival", "dplyr", "tibble", "ggplot2")
missing_pkgs <- required_pkgs[
  !vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
]
if (length(missing_pkgs) > 0) {
  stop("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}

# Example:
# data <- readRDS("data_processed/time_to_event_df.rds")
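Before summarizing, it can help to fail fast on structural problems in the data layout described above. The following is a minimal sketch. check_surv_data() is a hypothetical helper, not a function from any package. Treating zero or negative times as problems is an assumption; some analyses legitimately have day-0 events.

```r
# Hypothetical validation helper for the time/event/entry_time layout.
# Returns a character vector of problems (character(0) if none were found).
check_surv_data <- function(df, time_var = "time", event_var = "event",
                            entry_var = NULL) {
  problems <- character(0)
  time <- df[[time_var]]
  event <- df[[event_var]]
  if (anyNA(time)) problems <- c(problems, "missing follow-up times")
  # Zero or negative follow-up is flagged; it often indicates a time-zero error
  if (any(time <= 0, na.rm = TRUE)) problems <- c(problems, "non-positive follow-up times")
  # %in% returns FALSE for NA, so missing event codes are also caught here
  if (!all(event %in% c(0, 1))) problems <- c(problems, "event must be coded 0/1 with no NA")
  if (!is.null(entry_var)) {
    entry <- df[[entry_var]]
    # Counting-process (start, stop] data require entry < exit on every row
    if (any(entry >= time, na.rm = TRUE)) problems <- c(problems, "entry_time not before time")
    if (any(entry < 0, na.rm = TRUE)) problems <- c(problems, "negative entry_time")
  }
  problems
}

ok  <- data.frame(time = c(5, 12, 30), event = c(1, 0, 1), entry_time = c(0, 2, 10))
bad <- data.frame(time = c(5, 0, 30), event = c(1, 2, 1), entry_time = c(0, 2, 40))
check_surv_data(ok, entry_var = "entry_time")   # character(0)
check_surv_data(bad, entry_var = "entry_time")  # three problems reported
```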

Quick summary:

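survival_summary <- function(df, time_var, event_var) {
  tibble::tibble(
    n = nrow(df),
    n_events = sum(df[[event_var]] == 1, na.rm = TRUE),
    n_censored = sum(df[[event_var]] == 0, na.rm = TRUE),
    # Median of all observed times (events and censorings pooled). This is
    # not median survival, and it is not the reverse Kaplan-Meier estimate
    # of median follow-up.
    median_followup = stats::median(df[[time_var]], na.rm = TRUE)
  )
}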

# survival_summary(data, "time", "event")

3. Kaplan-Meier Template

The Kaplan-Meier estimator is a nonparametric estimator of the survival function (Kaplan and Meier 1958).

\[ \hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right) \]

where the product runs over the distinct event times \(t_i \le t\), \(d_i\) is the number of events at time \(t_i\), and \(n_i\) is the number of subjects at risk just before \(t_i\).
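To make the product-limit formula concrete, here is a small base-R sketch that computes \(n_i\), \(d_i\) and \(\hat S(t)\) directly on a six-subject toy example. It is for illustration only; in practice survival::survfit() should be used. Ties follow the usual convention that a subject censored at \(t_i\) is still in the risk set at \(t_i\).

```r
# Product-limit (Kaplan-Meier) estimate computed by hand.
# time: observed follow-up time; event: 1 = event, 0 = censored.
km_by_hand <- function(time, event) {
  event_times <- sort(unique(time[event == 1]))                    # distinct t_i
  n_risk  <- vapply(event_times, function(t) sum(time >= t), integer(1))              # n_i
  n_event <- vapply(event_times, function(t) sum(time == t & event == 1), integer(1)) # d_i
  surv <- cumprod(1 - n_event / n_risk)                            # S-hat at each t_i
  data.frame(time = event_times, n_risk = n_risk, n_event = n_event, surv = surv)
}

toy_time  <- c(2, 3, 3, 5, 6, 8)
toy_event <- c(1, 1, 0, 1, 0, 1)
km_by_hand(toy_time, toy_event)
# surv column: 5/6, 2/3, 4/9, 0
```

The survival column should match what survival::survfit(survival::Surv(toy_time, toy_event) ~ 1) reports at the same event times.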

# Example:
# km_fit <- survival::survfit(survival::Surv(time, event) ~ 1, data = data)
# summary(km_fit)
# plot(km_fit)

Grouped curves:

# Example:
# km_group <- survival::survfit(survival::Surv(time, event) ~ treatment, data = data)
# plot(km_group, col = 1:2, lty = 1:2)
# legend("bottomleft", legend = levels(factor(data$treatment)), col = 1:2, lty = 1:2)
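Grouped curves are easier to review when they come with survival estimates and numbers at risk at a few fixed times. The sketch below uses summary() with its times argument on a survfit object. The lung dataset shipped with the survival package (status 1 = censored, 2 = dead) and the 180- and 365-day landmarks are illustrative choices, not part of the template above.

```r
library(survival)

# lung: status 1 = censored, 2 = dead; sex 1 = male, 2 = female
km_sex <- survfit(Surv(time, status) ~ sex, data = lung)

# Survival, 95% CI, and numbers at risk at fixed landmark times, per group
km_sum <- summary(km_sex, times = c(180, 365))
risk_table <- data.frame(
  group  = km_sum$strata,
  time   = km_sum$time,
  n_risk = km_sum$n.risk,
  surv   = round(km_sum$surv, 3),
  lower  = round(km_sum$lower, 3),
  upper  = round(km_sum$upper, 3)
)
risk_table
```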

4. Cox Proportional Hazards Template

The Cox model is:

\[ h(t \mid X) = h_0(t) \exp(X^T \beta) \]

where \(h_0(t)\) is an unspecified baseline hazard shared by all subjects and \(\beta\) is the vector of regression coefficients, each a log hazard ratio (Cox 1972).

# Example:
# fit_cox <- survival::coxph(
#   survival::Surv(time, event) ~ age + severity + treatment,
#   data = data
# )
#
# summary(fit_cox)

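Holding the other covariates fixed, the hazard ratio for a one-unit increase in covariate \(X_j\) is:

\[ HR_j = \exp(\beta_j) \]

Under the proportional hazards assumption, this ratio is constant over follow-up time.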
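One way to check the \(HR = \exp(\beta)\) interpretation, along with the code that extracts it, is to fit the template to simulated data with a known coefficient. The sketch assumes an exponential baseline hazard, independent uniform censoring, and a single standard-normal covariate x with true \(\beta = 0.5\). Under these assumptions the true hazard ratio per one-unit increase is \(\exp(0.5) \approx 1.65\).

```r
library(survival)

set.seed(2026)
n <- 2000
true_beta <- 0.5
x <- rnorm(n)
# Exponential event times with hazard 0.1 * exp(true_beta * x)
event_time  <- rexp(n, rate = 0.1 * exp(true_beta * x))
censor_time <- runif(n, min = 0, max = 20)   # independent random censoring
sim <- data.frame(
  time  = pmin(event_time, censor_time),
  event = as.integer(event_time <= censor_time),
  x     = x
)

fit <- coxph(Surv(time, event) ~ x, data = sim)

# Hazard ratio = exp(beta) with its 95% confidence interval
ci <- summary(fit)$conf.int
hr_table <- data.frame(
  term  = rownames(ci),
  hr    = ci[, "exp(coef)"],
  lower = ci[, "lower .95"],
  upper = ci[, "upper .95"]
)
hr_table
```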

5. Delayed Entry / Left Truncation Template

If subjects only become observable or eligible after some time has already elapsed, delayed entry should be handled directly in the risk-set definition.

# Example:
# fit_left_trunc <- survival::coxph(
#   survival::Surv(entry_time, time, event) ~ age + severity + treatment,
#   data = data
# )
#
# summary(fit_left_trunc)

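This is essential when treatment timing or cohort entry rules would otherwise count person-time before a subject was actually under observation. Ignoring delayed entry places subjects in risk sets they could not yet have belonged to, which can create immortal time bias or artificially favorable exposure groups (Hernán and Robins 2020).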
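The consequence of ignoring delayed entry can be seen in a small simulation. The sketch makes several illustrative assumptions:

- event times are exponential, so the true \(S(2) = e^{-0.4} \approx 0.67\)
- entry times are uniform between 0 and 5
- a subject is observed only if still event-free at entry

It compares a Kaplan-Meier estimate that treats everyone as at risk from time 0 with one that uses Surv(entry_time, time, event).

```r
library(survival)

set.seed(1)
n <- 10000
true_time  <- rexp(n, rate = 0.2)          # true S(t) = exp(-0.2 t)
entry_time <- runif(n, min = 0, max = 5)   # time at which each subject enters the cohort
observed   <- true_time > entry_time       # only subjects event-free at entry are ever seen

cohort <- data.frame(
  entry_time = entry_time[observed],
  time       = pmin(true_time[observed], 15),        # administrative censoring at t = 15
  event      = as.integer(true_time[observed] <= 15)
)

# Naive: everyone counted as at risk from time 0 (ignores delayed entry)
km_naive <- survfit(Surv(time, event) ~ 1, data = cohort)
# Left-truncation aware: subjects join the risk set only at entry_time
km_ltrunc <- survfit(Surv(entry_time, time, event) ~ 1, data = cohort)

s_naive  <- summary(km_naive,  times = 2)$surv
s_ltrunc <- summary(km_ltrunc, times = 2)$surv
c(truth = exp(-0.4), naive = s_naive, left_truncated = s_ltrunc)
```

The naive curve is biased upward here. Every observed subject is known to have survived to entry, yet is credited with follow-up from time 0.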

6. Proportional Hazards Diagnostics

The proportional hazards assumption should be checked rather than assumed.

# Example:
# ph_test <- survival::cox.zph(fit_cox)
# ph_test
# plot(ph_test)

A failure of the PH assumption does not automatically invalidate the analysis, but it does change how coefficients should be interpreted and may motivate stratification, time-varying effects, or alternative modeling choices.
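The sketch below shows both the diagnostic and one remedy. It uses simulated data in which the assumption is violated by construction:

- a binary group whose hazards cross: constant in one group, increasing with time in the other
- a covariate x that does satisfy proportional hazards, with true log hazard ratio 0.5

These data-generating choices are assumptions for illustration. cox.zph() should flag group. Stratifying on group with strata() lets each group keep its own baseline hazard while x retains a single hazard ratio.

```r
library(survival)

set.seed(7)
n <- 2000
x <- rnorm(n)
group <- rbinom(n, size = 1, prob = 0.5)
u <- rexp(n)                 # unit-exponential draws, inverted through each cumulative hazard
risk_mult <- exp(0.5 * x)    # proportional effect of x (true log HR = 0.5)
# group 0: H(t) = 0.1 * t * risk_mult       (constant hazard)
# group 1: H(t) = 0.01 * t^2 * risk_mult    (hazard rising with t; curves cross at t = 5)
event_time <- ifelse(group == 1, sqrt(u / (0.01 * risk_mult)), u / (0.1 * risk_mult))
censor_time <- runif(n, 0, 20)
sim <- data.frame(time = pmin(event_time, censor_time),
                  event = as.integer(event_time <= censor_time),
                  x = x, group = group)

fit <- coxph(Surv(time, event) ~ x + group, data = sim)
zph <- cox.zph(fit)
zph$table   # per-term and GLOBAL tests; a small p-value for group signals non-PH

# One remedy: stratify on the non-proportional variable
fit_strat <- coxph(Surv(time, event) ~ x + strata(group), data = sim)
coef(fit_strat)
```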

7. Simple Censoring and Event Tables

event_table <- function(df, event_var) {
  tibble::tibble(
    status = c("Event", "Censored"),
    n = c(
      sum(df[[event_var]] == 1, na.rm = TRUE),
      sum(df[[event_var]] == 0, na.rm = TRUE)
    )
  ) |>
    dplyr::mutate(pct = 100 * n / sum(n))
}

# event_table(data, "event")

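This table should accompany most survival analyses. Precision depends on the number of events rather than the number of subjects, and heavy censoring means the later part of any survival curve rests on few subjects at risk.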

8. Why Median Follow-Up Needs Care

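Median survival (event-free) time and median follow-up time are not interchangeable. Median survival is the time at which the Kaplan-Meier curve falls to 0.5, and it may never be reached. Median follow-up describes how long subjects were observed.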

A useful reporting convention is to describe:

median observed follow-up
number of events
censoring proportion
whether median survival was reached

This avoids overstating how much information the data actually contain (Harrell 2015).
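The distinction is easy to show in code. A common way to summarize follow-up is the reverse Kaplan-Meier method, which swaps the roles of events and censoring so that censoring becomes the "event." The sketch below uses the lung data shipped with the survival package (status 1 = censored, 2 = dead) as an illustrative example. It contrasts three quantities:

- median survival
- the reverse Kaplan-Meier median follow-up
- the naive median of all observed times, which mixes event and censoring times

```r
library(survival)

# Median survival: time at which the Kaplan-Meier curve crosses 0.5
km <- survfit(Surv(time, status == 2) ~ 1, data = lung)
median_survival <- unname(summary(km)$table["median"])

# Reverse Kaplan-Meier: treat censoring (status == 1) as the "event"
km_reverse <- survfit(Surv(time, status == 1) ~ 1, data = lung)
median_followup_reverse_km <- unname(summary(km_reverse)$table["median"])

# Naive median of observed times (events and censorings pooled)
median_observed_time <- median(lung$time)

c(median_survival = median_survival,
  median_followup_reverse_km = median_followup_reverse_km,
  median_observed_time = median_observed_time)
```

Either median from survfit can be NA when the curve never drops to 0.5. That outcome should be reported rather than silently replaced.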

9. Risk Prediction and Calibration in Survival Models

When a survival model is used for prediction, ranking metrics are not enough. Calibration at clinically meaningful time horizons should also be examined (Steyerberg 2019).

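If a model predicts 30-day mortality risk, calibration should be evaluated against observed 30-day event probabilities, estimated in a way that accounts for censoring (for example, Kaplan-Meier estimates within groups of predicted risk), rather than relying on concordance alone.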
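A minimal sketch of grouped calibration at a fixed horizon makes several assumptions:

- simulated data with one covariate x, time measured in days
- a 30-day horizon
- decile grouping of predicted risk

Predicted risk comes from the fitted Cox model. Observed risk in each decile is 1 minus the Kaplan-Meier estimate at the horizon, so censored subjects are handled properly. The check is in-sample for brevity; in practice it would be run on validation data.

```r
library(survival)

set.seed(42)
n <- 2000
x <- rnorm(n)
event_time  <- rexp(n, rate = 0.01 * exp(0.8 * x))   # time unit assumed to be days
censor_time <- runif(n, 0, 100)
sim <- data.frame(time = pmin(event_time, censor_time),
                  event = as.integer(event_time <= censor_time),
                  x = x)
horizon <- 30

fit <- coxph(Surv(time, event) ~ x, data = sim)

# Predicted risk by the horizon. Under proportional hazards,
# S(t | x) = S(t | x = 0)^exp(beta * x), so one baseline curve suffices.
base <- survfit(fit, newdata = data.frame(x = 0))
s0_h <- summary(base, times = horizon)$surv
sim$pred_risk <- 1 - s0_h^exp(coef(fit)[["x"]] * sim$x)

# Group subjects into deciles of predicted risk
breaks <- quantile(sim$pred_risk, probs = seq(0, 1, by = 0.1))
sim$risk_decile <- cut(sim$pred_risk, breaks = breaks,
                       include.lowest = TRUE, labels = FALSE)

# Observed risk per decile: 1 - Kaplan-Meier at the horizon (respects censoring)
calib <- do.call(rbind, lapply(split(sim, sim$risk_decile), function(g) {
  km <- survfit(Surv(time, event) ~ 1, data = g)
  data.frame(decile = g$risk_decile[1],
             mean_predicted = mean(g$pred_risk),
             observed_km = 1 - summary(km, times = horizon)$surv)
}))
calib
```

Plotting mean_predicted against observed_km, with a 45-degree reference line, gives the usual grouped calibration plot at that horizon.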

10. Reviewer-Facing Reporting Language

Use language like this in methods or appendices:

Time-to-event methods were used because the analysis required explicit treatment of differential follow-up and censoring. We defined time zero before modeling, described the event and censoring mechanisms, estimated survival nonparametrically with Kaplan-Meier methods, and used Cox proportional hazards regression for adjusted analyses. Proportional hazards assumptions were evaluated rather than assumed (Kaplan and Meier 1958; Cox 1972).

11. Minimum Reporting Checklist

time zero defined explicitly
event and censoring rules described
delayed entry handled if present
event and censoring counts reported
nonparametric survival summaries shown when appropriate
hazard ratios interpreted cautiously
PH diagnostics reported
predictive use distinguished from etiologic use

Where This Shows Up in AI/ML

Deep survival models, such as DeepSurv, DeepHit, and transformer-based time-to-event architectures, replace the linear predictor of classical survival models with flexible networks to capture nonlinear covariate effects. DeepSurv keeps the Cox proportional hazards structure with a neural-network risk score. DeepHit models the distribution of event times directly and was designed with competing risks in mind. These models are increasingly applied to EHR-based readmission and mortality prediction within systems like MHS GENESIS and Epic.

In DoDTR-based analyses, competing risks are pervasive. A casualty can die from the index injury, die from a complication, be administratively transferred, or survive to discharge. Treating every non-event as ordinary censoring produces biased survival estimates and miscalibrated prediction models.

A readmission model that censors at death will systematically overestimate readmission risk for the sickest patients. Censoring assumes those patients could still be readmitted later, when in fact death has removed them from the risk of readmission. A deployed survival model in a trauma context that ignores competing risks therefore builds a known bias into its predicted probabilities, and that bias is largest where competing mortality is highest.
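The direction of this bias can be checked with a small simulation. The sketch assumes constant cause-specific hazards for readmission (0.02 per day) and death (0.03 per day), plus independent random censoring; these numbers are illustrative only. It compares two estimates:

- the naive estimate: 1 minus a Kaplan-Meier curve that treats death as censoring
- the Aalen-Johansen cumulative incidence, which survfit() computes when the status variable is a factor whose first level denotes censoring

Under these assumptions the true 30-day readmission cumulative incidence is \((0.02/0.05)(1 - e^{-1.5}) \approx 0.31\). The naive estimate instead targets \(1 - e^{-0.6} \approx 0.45\).

```r
library(survival)

set.seed(11)
n <- 2000
t_readmit <- rexp(n, rate = 0.02)   # latent time to readmission
t_death   <- rexp(n, rate = 0.03)   # latent time to death (competing event)
t_censor  <- runif(n, 0, 100)       # independent censoring
time <- pmin(t_readmit, t_death, t_censor)
status <- ifelse(time == t_censor, "censored",
                 ifelse(time == t_readmit, "readmit", "death"))
# For multi-state survfit(), the first factor level must be the censoring code
cr <- data.frame(time = time,
                 status = factor(status, levels = c("censored", "readmit", "death")))
horizon <- 30

# Naive: 1 - KM for readmission, treating death as ordinary censoring
km_naive <- survfit(Surv(time, status == "readmit") ~ 1, data = cr)
naive_risk <- 1 - summary(km_naive, times = horizon)$surv

# Aalen-Johansen cumulative incidence of readmission, death as a competing event
aj <- survfit(Surv(time, status) ~ 1, data = cr)
idx <- max(which(aj$time <= horizon))          # step function: last time <= horizon
aj_risk <- aj$pstate[idx, which(aj$states == "readmit")]

c(naive_1_minus_km = naive_risk, aalen_johansen = aj_risk,
  truth = 0.4 * (1 - exp(-1.5)))
```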

12. Closing

A good survival workflow is less about forcing every problem into a Cox model and more about aligning time, eligibility, censoring, and interpretation.

Time-to-event analysis becomes much more defensible when the risk-set logic is made explicit.

Series Callout

This post is part of a broader Toolkit Series for Applied Statistics, AI, and Clinical Analytics:

Bayesian Workflow Toolkit
Calibration Toolkit
Missing Data Toolkit
Rare Events Toolkit
Causal Inference Toolkit
Survival Analysis Toolkit
Prediction Modeling Toolkit
Real-World Evidence Toolkit
OMOP and Interoperability Toolkit
Trauma Registry Analytics Toolkit


References

Cox, D. R. 1972. “Regression Models and Life-Tables.” Journal of the Royal Statistical Society: Series B (Methodological) 34 (2): 187–220. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x.

Harrell, Jr., Frank E. 2015. Regression Modeling Strategies. 2nd ed. Springer.

Hernán, Miguel A., and James M. Robins. 2020. Causal Inference: What If. Chapman & Hall/CRC.

Kaplan, Edward L., and Paul Meier. 1958. “Nonparametric Estimation from Incomplete Observations.” Journal of the American Statistical Association 53 (282): 457–81. https://doi.org/10.1080/01621459.1958.10501452.

Steyerberg, Ewout W. 2019. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd ed. Springer.