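library(dplyr)
library(tibble)
library(ggplot2)
# Simulated setting: 12 clinics with 80 patients each (960 patients in total)
set.seed(20260316)  # seed before drawing covariates so the dataset is reproducible
n_clusters <- 12
patients_per_cluster <- 80
# One row per patient; expand.grid stores clinic as a factor by default
prag_df <- expand.grid(
clinic = paste0("Clinic_", 1:n_clusters),
patient = 1:patients_per_cluster
) |>
tibble::as_tibble() |>
dplyr::mutate(
# Patient-level covariates: age in years, standardized severity and comorbidity scores
age = rnorm(dplyr::n(), mean = 63, sd = 12),
severity = rnorm(dplyr::n(), mean = 0, sd = 1),
comorbidity = rnorm(dplyr::n(), mean = 0, sd = 1)
)
Real-Life Testing: Pragmatic Trials for Practical AI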
Executive Summary
A trial can be internally rigorous and still tell us too little about what happens in routine care.
That is the core motivation for pragmatic trials.
Traditional explanatory trials are often designed to answer:
- can this intervention work under ideal conditions?
Pragmatic trials shift the question toward:
- does this intervention work in real practice, with real patients, real clinicians, and real systems?
That difference matters.
Pragmatic trials typically aim to:
- embed research in ordinary care,
- reduce unnecessary exclusions,
- use clinically meaningful outcomes,
- and reflect the messy implementation conditions that shape real-world effectiveness.
This explanatory-versus-pragmatic contrast is central to modern trial design frameworks and is commonly discussed alongside tools such as PRECIS-2 for making design intent explicit (Ford and Norrie 2016; Loudon et al. 2015).
This matters in both biostatistics and AI/ML.
In biostatistics, pragmatic trials help bridge efficacy to effectiveness. In AI/ML, they are especially important for testing whether a model, alerting system, workflow tool, or decision aid still performs usefully once deployed in operational care rather than in a curated development environment.
This post introduces:
- what pragmatic trials are,
- how they differ from explanatory trials,
- why routine-care embedding matters,
- when cluster randomization becomes useful,
- and why pragmatic evaluation is essential for scalable healthcare AI.
Pragmatic trials matter because many interventions look promising in controlled settings, but only real-life testing shows whether they remain useful once the messiness of actual care is allowed back in.
1. Pragmatic Trials Begin with a Different Question
The key question in a pragmatic trial is not simply:
can this intervention work under tightly controlled conditions?
It is:
does this intervention improve outcomes when implemented in ordinary practice?
That shift sounds small, but it changes the entire design philosophy.
A pragmatic trial is usually less focused on maximizing experimental control at all costs and more focused on preserving the operational realities of routine care.
That means the design is often more tolerant of:
- heterogeneous patients,
- variable adherence,
- clinician discretion,
- site-to-site practice variation,
- and real implementation constraints.
This is not weaker science. It is science answering a different question (Ford and Norrie 2016; Loudon et al. 2015).
3. Routine-Care Embedding Is One of the Main Design Features
A pragmatic trial often tries to embed the intervention directly into normal care delivery.
That may mean:
- using existing EHR workflows,
- enrolling broad patient populations,
- minimizing extra study visits,
- collecting outcomes from routine records,
- and allowing the usual treating clinicians, rather than research specialists, to deliver the intervention.
The idea is that the trial should resemble real implementation as closely as possible.
This is especially relevant in AI/ML because many digital or decision-support interventions interact directly with routine clinical systems.
If the study environment is too artificial, the result may not tell us much about real deployment.
4. Fewer Exclusions Usually Improve Real-World Relevance
A major feature of pragmatic trials is that they tend to minimize unnecessary exclusions.
Traditional explanatory trials may exclude participants because of:
- comorbidity,
- age,
- polypharmacy,
- variable adherence likelihood,
- or operational complexity.
That can improve internal control, but it may also produce a trial population that looks very different from real practice.
Pragmatic trials often accept more heterogeneity because they are asking whether the intervention works in the population that would actually receive it.
This improves applicability, even if it sometimes increases noise.
5. A Broad Eligibility Example Makes the Pragmatic Logic Concrete
Suppose a healthcare system wants to evaluate an AI-guided medication adherence intervention.
An explanatory version might include only:
- adults aged 40–65
- with one condition
- on one medication
- at one academic site
- with strong digital engagement
A pragmatic version might instead include:
- all adults meeting a basic clinical eligibility rule
- across multiple clinics
- with ordinary variation in adherence, literacy, comorbidity, and follow-up patterns
The second design is messier. But it is often much closer to the real deployment population.
That is the pragmatic tradeoff.
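A back-of-the-envelope calculation shows how quickly explanatory criteria shrink the population. It assumes patient age follows the Normal(mean = 63, sd = 12) distribution used in this post's simulation code. The numbers describe that assumed distribution, not any real clinic.

```r
# Assumed age distribution (taken from the simulation): Normal(63, 12)
age_mean <- 63
age_sd   <- 12

# Share of patients inside an explanatory 40-65 age window
share_in_window <- pnorm(65, mean = age_mean, sd = age_sd) -
  pnorm(40, mean = age_mean, sd = age_sd)

share_in_window       # about 0.54
1 - share_in_window   # about 0.46 excluded by the age rule alone
```

Each further restriction (a single condition, a single medication, strong digital engagement) multiplies this share down again. That is why the explanatory population can end up looking very different from the deployment population.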
6. Pragmatic Trials Often Use Outcomes That Matter Operationally
Pragmatic trials tend to prioritize outcomes that matter to patients, clinicians, and systems in routine care.
Examples include:
- hospitalization
- emergency visits
- treatment discontinuation
- symptom burden
- workflow burden
- clinician uptake
- time to action
- or all-cause utilization
These may differ from tightly controlled surrogate endpoints used in explanatory trials.
The key principle is that a pragmatic outcome should reflect whether the intervention improves care in a way that would matter outside the trial.
This is especially important in AI studies, where a model can improve a technical metric without improving actual clinical or operational outcomes.
7. A Simple Pragmatic Trial Scenario Helps Frame the Analysis
To illustrate, we will simulate a healthcare-system-style pragmatic trial comparing:
- usual care
- versus an AI-assisted outreach intervention
The outcome is a binary event, such as hospitalization during follow-up.
The key idea is that the intervention is deployed in routine settings, across multiple clinics with varying baseline risk.
Using the patient-level data (prag_df) created in the simulation setup code, now assign the intervention at the clinic level, as might happen in a system-wide workflow rollout.
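set.seed(20260316)
# Cluster randomization: each clinic (not each patient) is assigned to an arm.
# Draws are independent per clinic, so the two arms need not be the same size.
intervention_map <- tibble::tibble(
clinic = unique(prag_df$clinic),
intervention = sample(c(0, 1), size = n_clusters, replace = TRUE)
)
prag_df <- prag_df |>
dplyr::left_join(intervention_map, by = "clinic") |>
dplyr::mutate(
# Fixed clinic-level shift shared by all patients in a clinic (0.1 for Clinic_1 up to 1.2 for Clinic_12)
clinic_effect = as.numeric(as.factor(clinic)) / 10,
# Logistic risk model; the +0.5 intervention coefficient raises event risk in this simulation
event_prob = plogis(-2.4 + 0.5 * intervention + 0.9 * severity + 0.8 * comorbidity + 0.02 * age + clinic_effect),
outcome = rbinom(dplyr::n(), size = 1, prob = event_prob)
)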
prag_df |>
dplyr::summarise(
n = dplyr::n(),
event_rate = mean(outcome)
)
# A tibble: 1 × 2
n event_rate
<int> <dbl>
1 960 0.448
This gives a pragmatic-style clustered dataset: 960 patients nested in 12 clinics, with an overall event rate of about 45%.
8. Pragmatic Trials Often Need Cluster Randomization
In many real-world settings, the intervention is not naturally assigned at the individual level.
For example:
- one clinic gets a decision-support tool
- one ward uses a new workflow
- one practice adopts a digital alert
- one hospital unit gets a staffing or implementation intervention
In those cases, cluster randomization is often more realistic than individual randomization (Donner and Klar 2004; Ford and Norrie 2016).
Cluster randomization assigns groups rather than individuals.
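The simulation in section 7 assigns each clinic with an independent coin flip. In the clinic table in section 10, that produced 5 intervention clinics and 7 usual-care clinics. With only 12 clusters, this kind of imbalance is common. One simple alternative is to randomly permute a fixed 6/6 allocation; it is sketched here as an option, not as the method used in this post. Stratified or pair-matched cluster randomization extends the same idea.

```r
set.seed(20260316)
n_clusters <- 12
clinics <- paste0("Clinic_", seq_len(n_clusters))

# Unrestricted cluster randomization: an independent coin flip per clinic,
# so the arm sizes themselves are random
coin_flip <- sample(c(0, 1), size = n_clusters, replace = TRUE)

# Balanced allocation: randomly permute a fixed list of six 0s and six 1s,
# so exactly half of the clinics receive the intervention
balanced <- sample(rep(c(0, 1), length.out = n_clusters))

allocation <- data.frame(clinic = clinics, coin_flip = coin_flip, balanced = balanced)
table(allocation$balanced)  # always 6 and 6
```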
This is especially common in pragmatic trials because routine-care interventions are often delivered at the level of:
- clinic
- unit
- hospital
- physician
- or practice
That makes the design operationally feasible, even though it introduces correlation within clusters.
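A standard way to express the cost of that correlation, for equal cluster sizes m and intracluster correlation ICC, is the design effect DEFF = 1 + (m - 1) * ICC. Dividing the total sample size by DEFF gives an approximate "effective" number of independent patients. The sketch below uses this post's 12 clinics of 80 patients and a hypothetical ICC of 0.02; that ICC is an assumption, not an estimate from the simulated data.

```r
# Design effect for equal cluster sizes: DEFF = 1 + (m - 1) * ICC
design_effect <- function(m, icc) 1 + (m - 1) * icc

m   <- 80    # patients per clinic, as in the simulation
k   <- 12    # number of clinics
icc <- 0.02  # hypothetical intracluster correlation

deff  <- design_effect(m, icc)   # 1 + 79 * 0.02 = 2.58
n_eff <- (m * k) / deff          # 960 / 2.58, roughly 372 effective patients
c(deff = deff, n_eff = n_eff)
```

Even a small ICC matters when clusters are large: here 960 randomized patients carry roughly the information of 372 independent ones.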
9. Cluster Randomization Improves Feasibility, but Changes the Analysis
Once a trial is randomized by clinic or site, outcomes within the same cluster are no longer independent.
Patients within a clinic may resemble one another because of:
- shared staff,
- shared workflows,
- shared implementation quality,
- or shared local population structure.
This means the analysis must account for clustering.
Ignoring the cluster structure can lead to:
- underestimated standard errors,
- overly optimistic significance,
- and misleading inference.
That is why pragmatic trials often need cluster-aware methods.
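A small simulation makes the standard-error problem visible. The assumptions are all hypothetical: 6 clinics per arm, 80 patients per clinic, no true intervention effect, and a between-clinic SD of 0.6 on the log-odds scale. The naive standard error treats all 960 patients as independent. The cluster-level standard error treats each clinic's event rate as one observation, which respects the unit of randomization.

```r
set.seed(42)
k_per_arm <- 6
m <- 80
sigma_u <- 0.6   # hypothetical between-clinic SD on the logit scale

arm    <- rep(c(0, 1), each = k_per_arm)
u      <- rnorm(2 * k_per_arm, mean = 0, sd = sigma_u)  # clinic random effects
p      <- plogis(-0.4 + u)                              # no true intervention effect
events <- rbinom(2 * k_per_arm, size = m, prob = p)     # events per clinic
rate   <- events / m

# Naive SE for a difference in proportions, ignoring clustering
n_arm <- k_per_arm * m
p1 <- sum(events[arm == 1]) / n_arm
p0 <- sum(events[arm == 0]) / n_arm
se_naive <- sqrt(p1 * (1 - p1) / n_arm + p0 * (1 - p0) / n_arm)

# Cluster-level SE: one observation (event rate) per clinic
se_cluster <- sqrt(var(rate[arm == 1]) / k_per_arm + var(rate[arm == 0]) / k_per_arm)

c(se_naive = se_naive, se_cluster = se_cluster)
```

With this much between-clinic variation, the cluster-level standard error is typically two to three times the naive one. Confidence intervals built from the naive value would therefore be far too narrow.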
10. A Simple Cluster-Level Summary Is Often a Good First Check
Before fitting models, it is often useful to summarize outcomes at the cluster level.
cluster_tbl <- prag_df |>
dplyr::group_by(clinic, intervention) |>
dplyr::summarise(
n = dplyr::n(),
event_rate = mean(outcome),
.groups = "drop"
)
cluster_tbl
# A tibble: 12 × 4
clinic intervention n event_rate
<fct> <dbl> <int> <dbl>
1 Clinic_1 1 80 0.338
2 Clinic_2 0 80 0.438
3 Clinic_3 0 80 0.362
4 Clinic_4 0 80 0.338
5 Clinic_5 1 80 0.488
6 Clinic_6 0 80 0.45
7 Clinic_7 1 80 0.5
8 Clinic_8 0 80 0.375
9 Clinic_9 1 80 0.55
10 Clinic_10 0 80 0.475
11 Clinic_11 0 80 0.438
12 Clinic_12 1 80 0.625
This helps show whether outcome rates differ across clinics and whether the intervention contrast appears consistent or highly variable by site.
That variability is a major part of pragmatic evidence.
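One simple cluster-aware check compares the clinic-level rates directly, with each clinic as one observation. So that the sketch runs on its own, it re-enters the rounded rates from the clinic table in this section and applies a Welch two-sample t-test. It ignores patient covariates and weights clinics equally. In this simulation the intervention coefficient is +0.5 on the log-odds scale, so a higher rate in the intervention arm is expected by construction.

```r
# Rounded clinic-level event rates, copied from the clinic table
cluster_rates <- data.frame(
  clinic       = paste0("Clinic_", 1:12),
  intervention = c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1),
  event_rate   = c(0.338, 0.438, 0.362, 0.338, 0.488, 0.45,
                   0.5, 0.375, 0.55, 0.475, 0.438, 0.625)
)

# Mean clinic event rate by arm (each clinic counts once)
arm_means <- tapply(cluster_rates$event_rate, cluster_rates$intervention, mean)
arm_means   # about 0.411 (usual care) and 0.500 (intervention)

# Welch t-test with clinics, not patients, as the unit of analysis
cl_test <- t.test(event_rate ~ intervention, data = cluster_rates)
cl_test
```

With only 5 and 7 clinics per arm, this test has limited power. That limited power is an honest reflection of how little independent information a 12-cluster trial contains.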
11. Pragmatic Trials Accept Real-World Noise Rather Than Designing It Away
One of the defining features of pragmatic trials is that they allow much of the natural variation of real care to remain.
That means the study may include:
- nonadherence,
- treatment delays,
- site heterogeneity,
- incomplete uptake,
- and co-interventions.
Traditional explanatory designs often try to suppress these features.
Pragmatic designs often leave them in because they are part of the real implementation question.
This is one reason pragmatic trials may look noisier. But that noise is often exactly what makes the result more relevant.
12. Intention-to-Treat Thinking Is Especially Important Here
Because pragmatic trials preserve real-world variation, intention-to-treat reasoning becomes especially important.
The intervention is often evaluated according to assignment, even when:
- implementation is imperfect,
- uptake is incomplete,
- clinicians override the tool,
- or patients do not fully adhere.
This is not a flaw. It reflects the pragmatic estimand:
- what happens when this strategy is implemented in routine care, not only when everyone follows it perfectly?
That is usually the decision-relevant question for healthcare systems.
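A toy simulation makes the intention-to-treat logic concrete. All quantities are hypothetical. Only 60% of patients assigned to the intervention actually receive it, and receiving it lowers event risk by 10 percentage points. Comparing arms by assignment gives the diluted, system-level effect of about 0.6 × (-0.10) = -0.06, which is what a health system would see after rollout.

```r
set.seed(2026)
n <- 200000

assigned <- rbinom(n, size = 1, prob = 0.5)             # randomized assignment
took_up  <- assigned * rbinom(n, size = 1, prob = 0.6)  # hypothetical: 60% uptake if assigned
risk     <- 0.30 - 0.10 * took_up                       # hypothetical: benefit only if received
outcome  <- rbinom(n, size = 1, prob = risk)

# Intention-to-treat: compare by assignment, regardless of uptake
itt_effect <- mean(outcome[assigned == 1]) - mean(outcome[assigned == 0])

# Effect among recipients; unbiased here only because uptake is random by construction
received_effect <- mean(outcome[took_up == 1]) - mean(outcome[assigned == 0])

round(c(itt = itt_effect, received = received_effect), 3)  # close to -0.06 and -0.10
```

In real trials, uptake is rarely random. Comparisons restricted to patients who "received" the intervention can therefore be confounded, while the ITT contrast keeps the protection of randomization.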
13. A Simple Cluster-Aware Model Can Reflect the Design
For a binary outcome, one pragmatic approach is a mixed-effects logistic model with a clinic-level random intercept.
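Written out, the model fitted by the glmer call in this section is as follows, with i indexing patients and j indexing clinics:

$$
\operatorname{logit} \Pr(Y_{ij} = 1) = \beta_0 + \beta_1\,\mathrm{intervention}_j + \beta_2\,\mathrm{age}_{ij} + \beta_3\,\mathrm{severity}_{ij} + \beta_4\,\mathrm{comorbidity}_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
$$

Here $\beta_1$ is a conditional (clinic-specific) log odds ratio. A population-averaged contrast, such as one from a GEE model, would generally be somewhat smaller in magnitude.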
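# Stop early with a clear message if lme4 is not installed
required_pkgs <- c("lme4")
missing_pkgs <- required_pkgs[
!vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
]
if (length(missing_pkgs) > 0) {
stop("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
# Logistic model with fixed effects for arm and covariates, plus a
# clinic-level random intercept (1 | clinic) for within-clinic correlation.
# Age is on a much larger scale than the other covariates; lme4 may suggest rescaling.
fit_prag <- lme4::glmer(
outcome ~ intervention + age + severity + comorbidity + (1 | clinic),
data = prag_df,
family = binomial()
)
summary(fit_prag)
This is not the only valid approach, but it makes the cluster structure explicit.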
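To gauge how strong the clustering is, the clinic variance from a random-intercept logistic model is often converted to a latent-scale intracluster correlation: ICC = sigma_u^2 / (sigma_u^2 + pi^2 / 3). Here pi^2 / 3 is the variance of the standard logistic distribution. With the fitted model, sigma_u could be read from, for example, as.data.frame(lme4::VarCorr(fit_prag)). The sketch below uses a hypothetical value so it runs on its own. This latent-scale ICC is not identical to the outcome-scale ICC used in design-effect calculations.

```r
# Latent-scale ICC for a random-intercept logistic model
latent_icc <- function(sigma_u) {
  sigma_u^2 / (sigma_u^2 + pi^2 / 3)
}

sigma_u <- 0.3        # hypothetical clinic-level SD on the log-odds scale
latent_icc(sigma_u)   # about 0.027
```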
14. Pragmatic Trials Are Especially Relevant for AI Interventions
Pragmatic trial design is extremely important in healthcare AI because many AI interventions are not drugs. They are:
- alerts,
- triage systems,
- decision-support tools,
- prioritization models,
- documentation aids,
- or workflow-integrated recommendations.
These interventions often behave differently in deployment than in sandbox testing.
For example, a model may show excellent discrimination in retrospective validation but have weak real-world impact because:
- clinicians ignore it,
- alert fatigue develops,
- the workflow is disrupted,
- or the population differs from the training data.
Pragmatic trials are often the right design for evaluating whether the AI system actually improves care when embedded in practice.
15. The Difference Between Efficacy and Implementation Is Central for AI
A model can be technically accurate and still operationally ineffective.
That is one of the key reasons pragmatic trials matter for AI.
The real question is often not:
- can the model classify well in a held-out dataset?
It is:
- does this system improve decisions, outcomes, or workflow when used under routine conditions?
That is a pragmatic trial question.
This is one reason AI evaluation increasingly needs study-design language, not only model-performance language.
16. The Salford Lung Study Is a Useful Mental Model for Pragmatic Thinking
A helpful way to frame pragmatic trials is to think of them as studies that preserve routine care instead of replacing it with a highly curated research environment.
A classic example often discussed in this context is the Salford Lung Study, which emphasized real-world care conditions, broad inclusion, and routine practice integration.
The important lesson is not to memorize one specific case. It is to see the design principle:
- when the goal is real-world applicability, the trial should resemble the real world closely enough for the result to travel there.
That is the pragmatic design mindset.
17. External Validity Is One of the Major Strengths of Pragmatic Trials
Traditional efficacy trials often maximize internal validity at the expense of real-world applicability.
Pragmatic trials shift that balance.
Because they include:
- broader patients,
- ordinary clinicians,
- routine systems,
- and more realistic adherence patterns,
their results often have stronger external relevance for real deployment.
That is especially important for AI/ML interventions intended for healthcare systems, where transportability across messy environments matters at least as much as ideal-condition performance.
18. Pragmatic Trials Are Not Looser Science: They Are Different Science
A common misunderstanding is that pragmatic trials are simply less rigorous because they are less controlled.
That is not the right view.
Pragmatic trials answer different questions (Ford and Norrie 2016; Loudon et al. 2015). They prioritize:
- applicability,
- implementation reality,
- system-level usefulness,
- and ordinary-care effectiveness.
That may reduce some forms of internal control, but it can increase decision relevance substantially.
The right comparison is not:
- strict versus sloppy.
It is:
- efficacy-focused versus effectiveness-focused.
That is a much better way to teach the design tradeoff.
19. A Practical Checklist for Applied Work
Before designing or interpreting a pragmatic trial, ask:
- Is the study trying to estimate efficacy or effectiveness?
- Are the eligibility criteria broad enough to reflect the real target population?
- Is the intervention embedded in ordinary care rather than an artificial research environment?
- Should randomization occur at the patient level or the cluster level?
- Are the outcomes meaningful in routine practice?
- Is the analysis aligned with intention-to-treat logic?
- Will the result still matter once the intervention is deployed at scale?
These questions usually matter more than whether the trial looks neat on paper.
The PRECIS-2 framework for rating how pragmatic a trial is maps directly onto the gap between clinical AI validation studies and deployment reality. Most published clinical AI validations sit at the explanatory end of the PRECIS-2 domains: controlled patient populations, single academic sites, curated and complete data, and supervised implementation with expert oversight. None of these conditions exists at a forward operating base or at a rural military treatment facility (MTF) running MHS GENESIS, the Military Health System's electronic health record.
The PRECIS-2 profile of the original validation study (its domain-by-domain ratings) is a reasonable first-order indicator of how much performance will degrade at deployment: the more explanatory the validation, the larger the gap. Epic-embedded sepsis prediction tools validated at academic centers have shown 30–50% drops in positive predictive value when deployed at community hospitals. The same dynamic is likely to apply to any model derived from the Department of Defense Trauma Registry (DoDTR) and fielded outside the large trauma centers that dominate the registry.
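One mechanism behind PPV drops of this kind, even before any workflow effects, is prevalence shift. With sensitivity and specificity held fixed, PPV falls when the condition is rarer in the deployment population. The sketch below uses hypothetical values: sensitivity 0.80, specificity 0.90, and prevalence of 10% versus 4%. It illustrates the arithmetic only and does not reproduce any published sepsis-model evaluation.

```r
# PPV from sensitivity, specificity and prevalence (Bayes' rule)
ppv <- function(sens, spec, prev) {
  (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
}

sens <- 0.80  # hypothetical
spec <- 0.90  # hypothetical

ppv_academic  <- ppv(sens, spec, prev = 0.10)  # about 0.47
ppv_community <- ppv(sens, spec, prev = 0.04)  # 0.25
1 - ppv_community / ppv_academic               # relative drop of about 47%
```

Real deployments add further losses on top of this arithmetic, such as case-mix differences, data quality, and how clinicians respond to alerts. Those are exactly the effects a pragmatic evaluation is designed to capture.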
Closing: Pragmatic Trials Test Whether an Intervention Survives Contact with Reality
Pragmatic trials matter because many interventions look better in controlled settings than they do in the real world.
By embedding the study in routine care, minimizing unnecessary exclusions, and using outcomes that matter operationally, pragmatic trials test whether an intervention remains valuable once real patients, real clinicians, and real systems are allowed back into the picture.
That makes them especially important for modern healthcare AI, where deployment conditions often determine success as much as algorithm quality does.
Pragmatic trials matter because the most useful evidence is not only about whether an intervention can work, but whether it still works when routine care stops protecting it from reality.
This post is part of the Real-World Evidence Toolkit — a companion reference with pragmatic trial reporting templates, cluster randomization analysis scaffolds, PRECIS-2 checklist guidance, and effectiveness evaluation workflows.
Series Callout
This post concludes the series on Design of Experiments for Biostats and AI/ML:
- Randomized controlled trials
- Observational study designs
- Cross-sectional study design
- Longitudinal study design
- Sample size and power analysis
- Stratification and randomization techniques
- Blinding and placebo controls
- Adaptive study designs
- Pragmatic trials
- Quasi-experimental designs