IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    Planning for a 3-arm cluster randomized trial with a nested intervention and a time-to-event outcome

    ouR data generation发表于 2025-05-20 00:00:00
    love 0
    [This article was first published on ouR data generation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

    A researcher recently approached me for advice on a cluster-randomized trial he is developing. He is interested in testing the effectiveness of two interventions and wondered whether a 2×2 factorial design might be the best approach.

    As we discussed the interventions (I’ll call them \(A\) and \(B\)), it became clear that \(A\) was the primary focus. Intervention \(B\) might enhance the effectiveness of \(A\), but on its own, \(B\) was not expected to have much impact. (It’s also possible that \(A\) alone doesn’t work, but once \(B\) is in place, the combination may reap benefits.) Given this, it didn’t seem worthwhile to randomize clinics or providers to receive B alone. We agreed that a three-arm cluster-randomized trial—with (1) control, (2) \(A\) alone, and (3) \(A + B\)—would be a more efficient and relevant design.

    A while ago, I wrote about a proposal to conduct a three-arm trial using a two-step randomization scheme. That design assumes that outcomes in the enhanced arm (\(A + B\)) are uncorrelated with those in the standalone arm \(A\) within the same cluster. For this project, that assumption didn’t seem plausible, so I recommended sticking with a standard cluster-level randomization.

    The study has three goals:

    • Assess the effectiveness of \(A\) versus control
    • Compare \(A + B\) versus \(A\) alone
    • If \(A\) alone is ineffective, compare \(A + B\) versus control

    In other words, we want to make three pairwise comparisons. Initially, we were concerned about needing to adjust our tests for multiple comparisons. However, we decided to use a gatekeeping strategy that maintains the overall Type I error rate at 5% while allowing each test to be performed at \(\alpha = 0.05\).

    This post describes how I set up simulations to evaluate sample size requirements for the proposed trial. The primary outcome is a time-to-event measure: the time from an index physician visit to a follow-up visit, which the intervention aims to shorten. I first generated survival data based on estimates from the literature, then simulated the study design under various sample size assumptions. For each scenario, I generated multiple data sets and applied the gatekeeping hypothesis testing framework to estimate statistical power.

    Preliminaries

    Before getting started, here are the R packages used in this post. In addition, I’ve set a randomization seed so that if you try to replicate the approach taken here, our results should align.

    library(simstudy)
    library(data.table)
    library(survival)
    library(coxme)
    library(broom)
    
    set.seed(8271)

    Generating time-to-event data

    When simulating time-to-event outcomes, one of the first decisions is what the underlying survival curves should look like. I typically start by defining a curve for the control (baseline) condition, and then generate curves for the intervention arms relative to that baseline.

    Getting parameters that define survival curve

    We identified a comparable study that reported quintiles for the time-to-event outcome. Specifically, 20% of participants had a follow-up within 1.4 months, 40% by 4.7 months, 60% by 8.7 months, and 80% by just over 15 months. We used the survGetParams function from the simstudy package to estimate the Weibull distribution parameters—the intercept in the Weibull formula and the shape—that characterize this baseline survival curve.

    q20 <- c(1.44, 4.68, 8.69, 15.32)
    
    points <- list(c(q20[1], 0.80), c(q20[2], 0.60), c(q20[3], 0.40), c(q20[4], 0.20))
    s <- survGetParams(points)
    
    s
    ## [1] -1.868399  1.194869

    We can visualize the idealized survival curve that will be generated using these parameters stored in the vector s:

    survParamPlot(f = s[1], shape = s[2], points, limits = c(0, 20))

    Generating data for a simpler two-arm RCT

    Before getting into the more complicated three-armed cluster randomized trial, I started with a simpler, two-armed randomized controlled trial. The only covariate at the individual level is the binary treatment indicator \(A\) which takes on values of \(0\) (control) and \(1\) (treatment). The time-to-event outcome is a function of the Weibull parameters we just generated based on the quintiles, along with the treatment indicator.

    def <- defData(varname = "A", formula = "1;1", dist = "trtAssign")
    
    defS <- 
      defSurv(varname = "time", formula = "..int + A * ..eff", shape = "..shape") |>
      defSurv(varname = "censor", formula = -40, scale = 0.5, shape = 0.10)

    I generated a large data set to that we can recreate the idealized curve from above. I assumed a hazard ratio of 2 (which is actually parameterized on the log scale):

    int <- s[1]
    shape <- s[2]
    
    eff <- log(2)
    
    dd <- genData(100000, def)
    dd <- genSurv(dd, defS, timeName = "time", censorName = "censor")

    Here are the quintiles (and median) from the control arm, which are fairly close to the quintiles from the study:

    dd[A==0, round(quantile(time, probs = c(0.20, 0.40, 0.50, 0.60, 0.80)), 1)]
    ##  20%  40%  50%  60%  80% 
    ##  1.6  4.2  6.1  8.5 16.4

    Visualizing the curve and assessing its properties

    A plot of the survival curves from the two arms is shown below, with the control arm in yellow and the intervention arm in red:

    Fitting a model

    I fit a Cox proportional hazards model just to make sure I could recover the hazard ratio I used in generating the data:

    fit <- coxph(Surv(time, event) ~  factor(A), data = dd)
    tidy(fit, exponentiate = TRUE)
    ## # A tibble: 1 × 5
    ##   term       estimate std.error statistic p.value
    ##   <chr>         <dbl>     <dbl>     <dbl>   <dbl>
    ## 1 factor(A)1     1.99   0.00665      103.       0

    Relationship of HR to median time-to-event

    My collaborator is especially interested in how the interventions might shift the median time-to-event. This essentially raises the question of how the hazard ratio translates to a change in the median. To explore this, I generated a series of data sets using hazard ratios ranging from 1 to 2 and recorded the observed median for each. As expected, when the hazard ratio is 1, the median closely aligns with that of the baseline distribution.

    getmedian <- function(eff = 0) {
      
      dd <- genData(100000, def)
      dd <- genSurv(dd, defS, timeName = "time", censorName = "censor")
      dd[A == 1, round(median(time), 1)]
      
    }
    
    dm <- rbindlist(lapply(
      log(seq(1, 2, by = .1)), 
      function(x) data.table(HR = exp(x), median = getmedian(x))
    ))

    Simulating the three-arm study data

    In the proposed three-arm cluster randomized trial, there are three levels of measurement: patient, provider, and clinic. Randomization is conducted at the provider level, stratified by clinic. The hazard for individual \(i\), treated by provider \(j\) in clinic \(k\), is modeled as:

    \[ h_{ijk}(t) = h_0(t) \exp\left( \beta_1 A_{ijk} + \beta_2 AB_{ijk} + b_j + g_k \right) \]

    where:

    • \(h_0(t)\) is the baseline hazard function,
    • \(\beta_1\) is the effect of treatment \(A\) alone,
    • \(\beta_2\) is the effect of the combination of \(A + B\),
    • \(b_j \sim N(0, \sigma^2_b)\) is the random effect for provider \(j\),
    • \(g_k \sim N(0, \sigma^2_g)\) is the random effect for clinic \(k\),
    • \(A_{ijk} = 1\) if provider \(j\) is randomized to treatment \(A\), and \(0\) otherwise,
    • \(AB_{ijk} = 1\) if provider \(j\) is randomized to treatment \(A + B\), and 0 otherwise.

    Data definititions

    While the model is semi-parametric (i.e., it does not assume a specific distribution for event times), the data generation process is fully parametric, based on the Weibull distribution. Despite this difference, the two are closely aligned: if all goes well, we should be able to recover the parameters used for data generation when fitting the semi-parametric model as we did in the simpler RCT case above.

    Data generation occurs in three broad steps:

    1. Clinic-level: generate the clinic-specific random effect \(g\).
    2. Provider-level: generate the provider-specific random effect \(b\) and assign treatment.
    3. Patient-level: generate individual time-to-event outcomes.

    These steps are implemented using the following definitions:

    defC <- defData(varname = "g", formula = 0, variance = "..s2_clinic")
    
    defP <- 
      defDataAdd(varname = "b", formula = 0, variance = "..s2_prov") |>
      defDataAdd(varname = "A", formula = "1;1;1", variance = "clinic", dist = "trtAssign")
    
    defS <- defSurv(
      varname = "eventTime", 
      formula = "..int + b + g + (A==2)*..eff_A + (A==3)*..eff_AB", 
      shape = "..shape") 

    Data generation

    For this simulation, we assumed 16 clinics, each with 6 providers, and 48 patients per provider. A key element of the study is that recruitment occurs over 12 months, and patients are followed for up to 6 months after their recruitment period ends. Thus, follow-up duration varies depending on when a patient enters the study: patients recruited earlier have longer potential follow-up, while those recruited later are more likely to be censored.

    This staggered follow-up is implemented in the final step of data generation:

    nC <- 16               # number of clinics (clusters)
    nP <- 6                # number of providers per clinic
    nI <- 48               # number of patients per provider
    s2_clinic <- 0.10      # variation across clinics (g)
    s2_prov <- 0.25        # variation across providers (b)
    eff_A <- log(c(1.4))   # log HR of intervention A (compared to control)
    eff_AB <- log(c(1.6))  # log HR of combined A+B (compared to control)
    
    ds <- genData(nC, defC, id = "clinic")
    dp <- genCluster(ds, "clinic", nP, "provider")
    dp <- addColumns(defP, dp)
    dd <- genCluster(dp, "provider", nI, "id")
    dd <- genSurv(dd, defS)
    
    # assign a patient to a particular month - 4 per month
    
    dd <- trtAssign(dd, nTrt = 12, strata = "provider", grpName = "month")
      
    dd[, event := as.integer(eventTime <= 18 - month)]
    dd[, obsTime := pmin(eventTime, 18 - month)]

    Below is a Kaplan-Meier plot showing survival curves for each provider within each clinic, color-coded by study arm:

    The mixed-effects Cox model recovers the variance components and coefficients used in data generation:

    me.fit <- coxme(
      Surv(eventTime, event) ~ factor(A) + (1|provider) + (1|clinic), 
      data = dd
    )
    
    summary(me.fit)
    ## Mixed effects coxme model
    ##  Formula: Surv(eventTime, event) ~ factor(A) + (1 | provider) + (1 | clinic) 
    ##     Data: dd 
    ## 
    ##   events, n = 3411, 4608
    ## 
    ## Random effects:
    ##      group  variable        sd   variance
    ## 1 provider Intercept 0.5437050 0.29561511
    ## 2   clinic Intercept 0.3051615 0.09312354
    ##                    Chisq    df p    AIC   BIC
    ## Integrated loglik  919.4  4.00 0  911.4 886.9
    ##  Penalized loglik 1251.0 86.81 0 1077.3 544.8
    ## 
    ## Fixed effects:
    ##              coef exp(coef) se(coef)    z       p
    ## factor(A)2 0.3207    1.3781   0.1493 2.15 0.03173
    ## factor(A)3 0.4650    1.5921   0.1472 3.16 0.00158

    Typically, I would use this data generation and model fitting code to estimate power or sample size requirements. While I did carry out those steps, I’ve left them out here so that you can try them yourself (though I’m happy to share my code if you’re interested). Beyond estimating sample size, simulation studies like this can also be used to evaluate the Type I error rates of the gatekeeping hypothesis testing framework.

    References:

    Proschan, M.A. and Brittain, E.H., 2020. A primer on strong vs weak control of familywise error rate. Statistics in medicine, 39(9), pp.1407-1413.

    To leave a comment for the author, please follow the link and comment on their blog: ouR data generation.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
    Continue reading: Planning for a 3-arm cluster randomized trial with a nested intervention and a time-to-event outcome


沪ICP备19023445号-2号
友情链接