required_pkgs <- c("survival", "dplyr", "tibble", "ggplot2")
missing_pkgs <- required_pkgs[
  !vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
]
if (length(missing_pkgs) > 0) {
  stop("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
# Example:
# data <- readRDS("data_processed/time_to_event_df.rds")
Survival Analysis Toolkit (Time Zero, Cox Models, Calibration)
Executive Summary
This toolkit is a reusable framework for time-to-event analysis in clinical and operational data.
It includes:
- time-zero definition prompts
- Kaplan-Meier estimation templates
- Cox proportional hazards model scaffolds
- delayed-entry guidance
- proportional-hazards checks
- event-rate and censoring summaries
- reviewer-facing reporting language
Survival analysis is not only about fitting a Cox model. It is about aligning time correctly, defining risk sets honestly, and making the event process interpretable under censoring (Kaplan and Meier 1958; Cox 1972; Harrell 2015).
1. Define Time Zero First
The most important step in a survival workflow is often the least glamorous: defining when follow-up begins.
Before modeling, document:
- what starts follow-up?
- when is a subject first at risk?
- when does treatment become eligible?
- what event ends follow-up?
- what produces censoring?
Many survival problems are really time-alignment problems in disguise. A sophisticated model cannot rescue a badly defined risk origin (Hernán and Robins 2020).
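One way to make the time-zero decision auditable is to derive time and event in code from explicit dates rather than accepting a precomputed duration. The sketch below is an illustration only. It assumes three hypothetical date columns (index_date, event_date, last_seen) and an administrative end date; your column names and censoring rules will differ.

```r
# Minimal sketch: derive time and event from explicit dates.
# Hypothetical columns (rename to match your data):
#   index_date = time zero, when follow-up starts
#   event_date = date of the event of interest, NA if not observed
#   last_seen  = last date the subject was known to be under observation
derive_time_event <- function(df, admin_end) {
  # Follow-up stops at the earliest of event, last contact, or study end
  stop_date <- pmin(df$event_date, df$last_seen, admin_end, na.rm = TRUE)
  # Count the event only if it occurred on or before the stop date
  event <- as.integer(!is.na(df$event_date) & df$event_date <= stop_date)
  # Days from time zero to the stop date
  time <- as.numeric(stop_date - df$index_date, units = "days")
  if (any(time < 0, na.rm = TRUE)) {
    stop("Follow-up ends before time zero for some rows; check index_date")
  }
  data.frame(df, time = time, event = event)
}

# Usage with toy data
d <- data.frame(
  index_date = as.Date(c("2020-01-01", "2020-01-01", "2020-03-01")),
  event_date = as.Date(c("2020-01-31", NA, "2021-06-01")),
  last_seen  = as.Date(c("2020-06-01", "2020-04-10", "2021-06-01"))
)
derive_time_event(d, admin_end = as.Date("2020-12-31"))
# The third subject's event falls after admin_end, so it is censored at day 305
```

Writing the stop rule as a single pmin() call makes the censoring mechanism explicit. That rule is exactly what the questions above ask you to document.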
2. Basic Data Structure
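Assume a dataset named data with:
- time = observed follow-up time, measured from time zero
- event = 1 if the event occurred, 0 if censored
- entry_time (optional) = time of entry into the risk set, for delayed entry, on the same time scale as time
- baseline predictors measured before or at time zero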
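Before summarizing, a quick structural check can catch coding problems that silently distort risk sets. This is a minimal base-R sketch. The helper name check_surv_data and its default column names are illustrative and not part of any package.

```r
# Minimal sketch: flag common structural problems before any modeling.
# check_surv_data is a hypothetical helper, not part of any package.
check_surv_data <- function(df, time_var = "time", event_var = "event",
                            entry_var = NULL) {
  problems <- character(0)
  t <- df[[time_var]]
  e <- df[[event_var]]
  if (anyNA(t)) problems <- c(problems, "missing follow-up time")
  if (any(t < 0, na.rm = TRUE)) problems <- c(problems, "negative follow-up time")
  # NA events also fail this check, since NA %in% c(0, 1) is FALSE
  if (!all(e %in% c(0, 1))) problems <- c(problems, "event missing or not coded 0/1")
  if (!is.null(entry_var)) {
    s <- df[[entry_var]]
    # Counting-process data need entry strictly before exit
    if (any(s >= t, na.rm = TRUE)) problems <- c(problems, "entry_time not before time")
  }
  problems
}

# Usage: an empty character vector means no problems were found
# check_surv_data(data, entry_var = "entry_time")
```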
Quick summary:
survival_summary <- function(df, time_var, event_var) {
  tibble::tibble(
    n = nrow(df),
    n_events = sum(df[[event_var]] == 1, na.rm = TRUE),
    n_censored = sum(df[[event_var]] == 0, na.rm = TRUE),
    # Median observed time (events and censored combined); see Section 8
    median_followup = stats::median(df[[time_var]], na.rm = TRUE)
  )
}
# survival_summary(data, "time", "event")
3. Kaplan-Meier Template
The Kaplan-Meier estimator is a nonparametric estimator of the survival function (Kaplan and Meier 1958).
\[ \hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right) \]
where the product runs over the distinct event times \(t_i \le t\), \(d_i\) is the number of events at time \(t_i\), and \(n_i\) is the number of subjects at risk just before \(t_i\).
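To make the product-limit formula concrete, the sketch below computes it directly in base R on a toy dataset. km_by_hand is a hypothetical helper for illustration only; in practice use survival::survfit. For simple right-censored data like this, survfit should give the same survival values at the event times. Censored subjects whose time ties an event time are counted as at risk, which is the usual convention.

```r
# Product-limit estimate computed directly from the formula above.
km_by_hand <- function(time, event) {
  ti <- sort(unique(time[event == 1]))                            # distinct event times t_i
  d  <- vapply(ti, function(s) sum(time == s & event == 1), numeric(1))  # d_i
  n  <- vapply(ti, function(s) sum(time >= s), numeric(1))        # n_i, at risk just before t_i
  data.frame(time = ti, n_risk = n, n_event = d, surv = cumprod(1 - d / n))
}

time  <- c(2, 3, 3, 5, 8, 8, 9)
event <- c(1, 1, 0, 1, 0, 1, 0)
km <- km_by_hand(time, event)
km
# surv column: 6/7, 5/7, 15/28, 5/14 (about 0.857, 0.714, 0.536, 0.357)
```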
# Example:
# km_fit <- survival::survfit(survival::Surv(time, event) ~ 1, data = data)
# summary(km_fit)
# plot(km_fit)
Grouped curves:
# Example:
# km_group <- survival::survfit(survival::Surv(time, event) ~ treatment, data = data)
# plot(km_group, col = 1:2, lty = 1:2)
# legend("bottomleft", legend = levels(factor(data$treatment)), col = 1:2, lty = 1:2)
4. Cox Proportional Hazards Template
The Cox model is:
\[ h(t \mid X) = h_0(t) \exp(X^T \beta) \]
where \(h_0(t)\) is an unspecified baseline hazard and \(\beta\) is the vector of regression coefficients, that is, the log hazard ratios for the covariates in \(X\) (Cox 1972).
# Example:
# fit_cox <- survival::coxph(
# survival::Surv(time, event) ~ age + severity + treatment,
# data = data
# )
#
# summary(fit_cox)
The hazard ratio for a one-unit increase in a covariate, holding the other covariates fixed, is:
\[ HR = \exp(\beta) \]
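Because the model is linear on the log-hazard scale, a \(k\)-unit increase multiplies the hazard by \(\exp(k\beta) = HR^k\). A Wald 95% interval is \(\exp(\beta \pm 1.96 \cdot SE)\); these correspond to the exp(coef) and lower/upper .95 columns that summary() prints for a coxph fit. The coefficient and standard error below are made-up values used only to show the arithmetic.

```r
# Worked hazard-ratio arithmetic. beta_age and se_age are made-up
# illustrative values, not estimates from any real model.
beta_age <- 0.03   # hypothetical log hazard ratio per year of age
se_age   <- 0.01   # hypothetical standard error of beta_age

hr_1yr  <- exp(beta_age)        # HR per 1-year increase, about 1.030
hr_10yr <- exp(10 * beta_age)   # HR per 10-year increase = hr_1yr^10, about 1.350
ci_1yr  <- exp(beta_age + c(-1, 1) * qnorm(0.975) * se_age)  # Wald 95% CI

round(c(hr_1yr = hr_1yr, hr_10yr = hr_10yr), 3)
round(ci_1yr, 3)
```

Reporting the HR per clinically meaningful unit (for example per 10 years of age) is often easier to read than the per-unit value.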
5. Delayed Entry / Left Truncation Template
If subjects only become observable or eligible after some time has already elapsed, delayed entry should be handled directly in the risk-set definition.
# Example:
# fit_left_trunc <- survival::coxph(
# survival::Surv(entry_time, time, event) ~ age + severity + treatment,
# data = data
# )
#
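# summary(fit_left_trunc)
This is essential when treatment timing or cohort entry rules would otherwise create immortal time bias or artificially favorable exposure groups (Hernán and Robins 2020). In this (start, stop] format, time must be measured from the same origin as entry_time, not as the duration since entry, and each row needs time > entry_time.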
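The effect of delayed entry on the risk set can be seen by counting who is at risk at a given time with and without entry times. The base-R sketch below assumes entry_time and time share the same origin. It mirrors the (start, stop] convention that survival::Surv uses for counting-process data, in which a subject is at risk at time s only if entry_time < s <= time. n_at_risk is a hypothetical helper.

```r
# Number at risk at time s, with and without honoring delayed entry.
# Assumes entry_time and time are both measured from the same time zero.
n_at_risk <- function(s, time, entry_time = NULL) {
  if (is.null(entry_time)) return(sum(time >= s))
  sum(entry_time < s & time >= s)   # at risk on (entry_time, time]
}

entry_time <- c(0, 0, 30, 60, 90)
time       <- c(20, 100, 120, 80, 150)

n_at_risk(10, time)              # 5: naive count includes subjects not yet observed
n_at_risk(10, time, entry_time)  # 2: only subjects already under observation
```

Ignoring entry times credits late entrants with event-free time they could never have failed in, which is the mechanism behind immortal time bias.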
6. Proportional Hazards Diagnostics
The proportional hazards assumption should be checked rather than assumed.
# Example:
# ph_test <- survival::cox.zph(fit_cox)
# ph_test
# plot(ph_test)
A failure of the PH assumption does not automatically invalidate the analysis, but it does change how coefficients should be interpreted and may motivate stratification, time-varying effects, or alternative modeling choices.
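Two of those options can be sketched with the survival package. The example uses the lung dataset bundled with survival purely for illustration. It does not claim that sex violates proportional hazards in that dataset. strata() gives each level its own baseline hazard. tt() lets a coefficient change over follow-up. The log(t) form used here is an assumption you would choose after inspecting the cox.zph plots.

```r
library(survival)  # attached so strata() and tt() resolve inside formulas

# lung ships with survival; status is coded 1 = censored, 2 = dead,
# which Surv() accepts directly.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
zph <- cox.zph(fit)
print(zph)  # per-covariate and GLOBAL tests based on scaled Schoenfeld residuals

# Option 1: stratify on a covariate whose effect looks non-proportional.
# Each stratum gets its own baseline hazard; no coefficient is estimated for sex.
fit_strat <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

# Option 2: let the log hazard ratio for sex change linearly in log(time).
fit_tt <- coxph(Surv(time, status) ~ age + sex + tt(sex), data = lung,
                tt = function(x, t, ...) x * log(t))
summary(fit_tt)
```

Stratification is convenient when the offending variable is a nuisance adjustment. A time-varying coefficient is preferable when that variable's effect is itself of interest.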
7. Simple Censoring and Event Tables
event_table <- function(df, event_var) {
  tibble::tibble(
    status = c("Event", "Censored"),
    n = c(
      sum(df[[event_var]] == 1, na.rm = TRUE),
      sum(df[[event_var]] == 0, na.rm = TRUE)
    )
  ) |>
    dplyr::mutate(pct = 100 * n / sum(n))
}
# event_table(data, "event")
This table should accompany most survival analyses because censoring burden shapes interpretability.
8. Why Median Follow-Up Needs Care
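Median event-free (survival) time, the time by which half the cohort has had the event, and median follow-up time, how long subjects were actually observed, are not interchangeable.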
A useful reporting convention is to describe:
- median observed follow-up
- number of events
- censoring proportion
- whether median survival was reached
This avoids overstating how much information the data actually contain (Harrell 2015).
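The sketch below compares three quantities that are easy to conflate, using the same toy data as the Kaplan-Meier example. The second is the reverse Kaplan-Meier estimate of follow-up, which simply treats censoring as the "event". All helper names are hypothetical. A median is reported as NA when the curve never falls to 0.5, which is the "median not reached" case.

```r
# Product-limit curve at the distinct event times.
km_steps <- function(time, event) {
  ti <- sort(unique(time[event == 1]))
  d  <- vapply(ti, function(s) sum(time == s & event == 1), numeric(1))
  n  <- vapply(ti, function(s) sum(time >= s), numeric(1))
  data.frame(time = ti, surv = cumprod(1 - d / n))
}

# Median of a KM curve: first time the curve drops to 0.5 or below (NA if never).
km_median <- function(curve) {
  hit <- curve$time[curve$surv <= 0.5]
  if (length(hit) == 0) NA_real_ else hit[1]
}

followup_report <- function(time, event) {
  c(
    median_observed_time = stats::median(time),
    # Reverse KM: treat censoring as the "event" to estimate potential follow-up
    median_followup_reverse_km = km_median(km_steps(time, 1 - event)),
    median_survival = km_median(km_steps(time, event))
  )
}

followup_report(time  = c(2, 3, 3, 5, 8, 8, 9),
                event = c(1, 1, 0, 1, 0, 1, 0))
# median_observed_time = 5, median_followup_reverse_km = 9, median_survival = 8
```

In this toy example the median observed time (5) understates follow-up because early events cut observation short, while the reverse Kaplan-Meier estimate does not.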
9. Risk Prediction and Calibration in Survival Models
When a survival model is used for prediction, ranking metrics are not enough. Calibration at clinically meaningful time horizons should also be examined (Steyerberg 2019).
For example, if a model predicts 30-day mortality risk, then calibration should be evaluated against observed 30-day event probabilities rather than only reporting concordance.
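A minimal sketch of grouped calibration at a single horizon h is shown below. It assumes you already have each subject's predicted risk of the event by h (pred_risk, a hypothetical input, for example derived from a fitted model's predicted survival at h). Subjects are grouped by quantiles of predicted risk. Observed risk in each group is 1 - KM(h), which accounts for censoring within the group but assumes censoring is independent of the outcome. This is one common grouped approach, not the only way to assess survival calibration.

```r
# Observed risk by horizon h from a KM curve: 1 - S(h).
km_risk_at <- function(time, event, h) {
  ti <- sort(unique(time[event == 1 & time <= h]))
  d  <- vapply(ti, function(s) sum(time == s & event == 1), numeric(1))
  n  <- vapply(ti, function(s) sum(time >= s), numeric(1))
  1 - prod(1 - d / n)   # prod(numeric(0)) is 1, so no events gives risk 0
}

calibration_at_horizon <- function(pred_risk, time, event, h, n_groups = 5) {
  # Group subjects by quantiles of predicted risk
  probs  <- seq(0, 1, length.out = n_groups + 1)
  breaks <- unique(stats::quantile(pred_risk, probs = probs))
  grp <- cut(pred_risk, breaks = breaks, include.lowest = TRUE)
  rows <- lapply(split(seq_along(pred_risk), grp, drop = TRUE), function(idx) {
    data.frame(
      n = length(idx),
      mean_predicted = mean(pred_risk[idx]),
      observed_km = km_risk_at(time[idx], event[idx], h)
    )
  })
  do.call(rbind, rows)
}

# Usage on simulated data where the predicted 30-day risk is the true risk
set.seed(1)
x <- rnorm(500)
rate <- 0.01 * exp(0.7 * x)
event_time <- rexp(500, rate)
cens_time  <- runif(500, 0, 200)
sim_time  <- pmin(event_time, cens_time)
sim_event <- as.integer(event_time <= cens_time)
pred_30 <- 1 - exp(-rate * 30)
calibration_at_horizon(pred_30, sim_time, sim_event, h = 30)
```

Because pred_30 is the true risk in this simulation, the mean_predicted and observed_km columns should track each other up to sampling noise. A miscalibrated model would show a systematic gap.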
10. Reviewer-Facing Reporting Language
Use language like this in methods or appendices:
Time-to-event methods were used because the analysis required explicit treatment of differential follow-up and censoring. We defined time zero before modeling, described the event and censoring mechanisms, estimated survival nonparametrically with Kaplan-Meier methods, and used Cox proportional hazards regression for adjusted analyses. Proportional hazards assumptions were evaluated rather than assumed (Kaplan and Meier 1958; Cox 1972).
11. Minimum Reporting Checklist
- time zero defined explicitly
- event and censoring rules described
- delayed entry handled if present
- event and censoring counts reported
- nonparametric survival summaries shown when appropriate
- hazard ratios interpreted cautiously
- PH diagnostics reported
- predictive use distinguished from etiologic use
Deep survival models (DeepSurv, DeepHit, and transformer-based time-to-event architectures) go beyond linear Cox regression to capture nonlinear covariate effects and, in some architectures, non-proportional hazards. They are increasingly applied to EHR-based readmission and mortality prediction within systems like MHS GENESIS and Epic. Flexible models do not remove the need for a correct event definition, however. In DoDTR-based analyses, competing risks are pervasive: a casualty can die from the index injury, die from a complication, be administratively transferred, or survive to discharge. Treating every outcome other than the event of interest as ordinary censoring assumes those patients would have remained at risk of that event. This produces biased survival estimates and miscalibrated prediction models. A readmission model that censors deaths will systematically overestimate readmission risk for the sickest patients, because many of them die before they can be readmitted, yet the model treats them as if they were still at risk. Any deployed survival model in a trauma context that does not account for competing risks has baked a known bias into every probability it outputs.
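This bias can be seen directly by comparing the Aalen-Johansen cumulative incidence with the naive 1 - Kaplan-Meier curve that treats the competing event as censoring. The base-R sketch below assumes a hypothetical status coding: 0 = censored, 1 = event of interest (for example, readmission), and 2 = competing event (for example, death). For real analyses, survival::survfit computes Aalen-Johansen estimates when the status variable is a factor whose first level denotes censoring.

```r
# status: 0 = censored, 1 = event of interest, 2 = competing event (assumed coding)
cif_vs_naive <- function(time, status, cause = 1) {
  ti <- sort(unique(time[status != 0]))
  s_all   <- 1   # overall event-free survival just before t_i
  cif     <- 0   # Aalen-Johansen cumulative incidence for `cause`
  naive_s <- 1   # KM that treats competing events as censoring
  out <- data.frame(time = ti, cif = NA_real_, naive = NA_real_)
  for (k in seq_along(ti)) {
    s <- ti[k]
    n <- sum(time >= s)                          # at risk just before s
    d_cause <- sum(time == s & status == cause)  # events of interest at s
    d_any   <- sum(time == s & status != 0)      # events of any type at s
    cif <- cif + s_all * d_cause / n             # only event-free subjects contribute
    s_all <- s_all * (1 - d_any / n)
    naive_s <- naive_s * (1 - d_cause / n)
    out$cif[k] <- cif
    out$naive[k] <- 1 - naive_s
  }
  out
}

time   <- c(1, 2, 3, 4, 5, 6)
status <- c(2, 1, 2, 1, 0, 1)
cif_vs_naive(time, status, cause = 1)
# Final row: cif = 2/3, naive = 1
```

The naive curve keeps subjects who had the competing event in the denominator of future risk, as if they could still experience the event of interest. As a result it always equals or exceeds the cumulative incidence.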
12. Closing
A good survival workflow is less about forcing every problem into a Cox model and more about aligning time, eligibility, censoring, and interpretation.
Time-to-event analysis becomes much more defensible when the risk-set logic is made explicit.
Series Callout
This post is part of a broader Toolkit Series for Applied Statistics, AI, and Clinical Analytics:
- Bayesian Workflow Toolkit
- Calibration Toolkit
- Missing Data Toolkit
- Rare Events Toolkit
- Causal Inference Toolkit
- Survival Analysis Toolkit
- Prediction Modeling Toolkit
- Real-World Evidence Toolkit
- OMOP and Interoperability Toolkit
- Trauma Registry Analytics Toolkit