In my simulations of Response Adaptive Randomization, I discovered it performs comparably to fixed 50-50 allocation in identifying treatment effects. The adaptive approach does appear to work! However, with only 10 trials, I’ve merely scratched the surface. Important limitations exist – temporal bias risks, statistical inefficiency, and complex multiplicity adjustments in Bayesian frameworks.
Response-Adaptive Randomization (RAR) is a technique used in clinical trials where the allocation of patients to different treatment arms changes based on interim results collected during the trial. Unlike traditional fixed randomization where treatment allocation ratios remain constant throughout the study, RAR adjusts the probability of assignment to treatment groups as data accumulates, typically to assign more patients to treatments that appear to be performing better. This approach is designed to achieve two main goals: to maximize the information gained about superior treatments while minimizing the number of patients exposed to inferior treatments during the trial itself.
The primary motivation for using RAR is ethical – it aims to treat more trial participants effectively while still gathering sufficient data for scientific conclusions. It’s particularly considered in trials involving serious conditions with high mortality rates or in multi-arm trials where several experimental treatments are being compared. RAR has received considerable theoretical attention since the 1930s but remains controversial in practice, with proponents citing its potential ethical advantages while critics point to concerns about statistical validity, potential bias from temporal trends, and increased complexity in trial design and analysis.
REMAP-CAP was the first study I came across that introduced me to this design, and I found the theory quite intriguing! It almost seems too good to be true, and I could not quite get the intuition for how adaptive randomization manages to identify the better treatment. Fortunately, there is another way we can convince ourselves that it works: simulate it for ourselves, Bayesian style!
A commonly used formula, the Thall and Wathen approach, adjusts the randomization probabilities based on posterior probabilities. For a two-arm trial, the probability of assigning a patient to treatment 1 is:
\begin{gather}
\pi_{1,i} = \frac{\left[P(p_1 > p_0 \mid \text{data})\right]^{c}}{\left[P(p_1 > p_0 \mid \text{data})\right]^{c} + \left[1 - P(p_1 > p_0 \mid \text{data})\right]^{c}}
\end{gather}
Where P(p₁ > p₀|data) is the posterior probability that treatment 1 is better than treatment 0, and c is a tuning parameter that controls adaptation speed. When c=0, this reduces to equal randomization; when c=1, it becomes Thompson Sampling.
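To make the formula concrete, here is a small helper (my own sketch, not from any package) that computes the treatment-1 allocation probability for a given posterior probability and tuning parameter:

```r
## Allocation probability for treatment 1 given the posterior probability that
## treatment 1 beats treatment 0 and the tuning parameter c
rar_allocation <- function(post_prob, c = 0.5) {
  post_prob^c / (post_prob^c + (1 - post_prob)^c)
}

rar_allocation(0.80, c = 0)    # 0.50  -> equal randomization
rar_allocation(0.80, c = 0.5)  # ~0.67 -> leans toward treatment 1
rar_allocation(0.80, c = 1)    # 0.80  -> Thompson Sampling
```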
There may be other approaches to RAR, but this is the one that makes sense to me and is easy to implement. Dive deeper
Hopefully simulating this multiple times might give me a better idea of how it will work in practice. How do we do that?
We will set the true treatment effect to a log odds of -1.09, which means the treatment is better than the control at reducing the proportion of events. The model is log odds = b0 + b1 * treatment, where b0 is 0 and b1 is our treatment effect.
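As a quick sanity check (my own, not part of the original script), here is what that log odds translates to on the probability scale, using the same plogis() function the simulation below relies on:

```r
plogis(0)      # control event probability with b0 = 0: 0.50
plogis(-1.09)  # treatment event probability: ~0.25
```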
We will compare Response Adaptive Randomization against fixed 50-50 allocation over 10 simulated sets, one per seed. Each set has 4 sequential analyses with sampling. The first analysis enrolls 50 patients randomized with 50% allocation. At subsequent interim analyses, the response adaptive randomization group updates its allocation probability based on the responses observed so far, whereas the 50-50 allocation group stays at 50%. For adaptive randomization, we will use the Thall and Wathen approach with c between 0 and 0.5, updating c with n / (2*N), where n is the number of patients sampled thus far plus the ones to be sampled at the current interim analysis, and N is the maximum number of patients in the trial. As you can see, once a total of 200 patients has been sampled, c reaches 0.5 at the last interim analysis.
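To make that schedule explicit, here is a quick check (my own) of the value c takes at each adaptive interim analysis, mirroring the c_param() function in the script below:

```r
## c = n / (2 * max_n) with max_n = 200; n counts the patients sampled so far
## plus the upcoming batch of 50 at the interim where the allocation is recomputed
max_n <- 200
n_at_interim <- c(100, 150, 200)
n_at_interim / (2 * max_n)  # 0.250 0.375 0.500
```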
So how do Response Adaptive Randomization and 50-50 allocation compare in terms of treatment effect estimates? Will RAR have the same accuracy at identifying the treatment effect? Will RAR identify the treatment effect faster than 50-50 allocation?

```r
library(tidyverse)
library(cmdstanr)
library(glue)

## comparing 50-50 allocation vs Adaptive Randomization
method_list <- c("50_50", "adaptive")
seeds <- sample(1:100000, size = 10, replace = F)

## set up empty dataframe
result <- tibble(
  b0 = numeric(), b0_lower = numeric(), b0_upper = numeric(),
  b1 = numeric(), b1_lower = numeric(), b1_upper = numeric(),
  prob = numeric(), treatment_num = numeric(), num_sample = numeric(),
  treatment_num_cum = numeric(), num = numeric(),
  method = character(), seed = numeric()
)

for (seed in seeds) {
  for (method in method_list) {
    treatment_effect <- -1.09

    ### Set up entire population, knowing both the placebo and treatment outcome of each patient
    set.seed(seed)
    N <- 100000
    x0 <- replicate(N, 0)
    y0 <- rbinom(N, 1, plogis(treatment_effect * x0))
    x1 <- replicate(N, 1)
    y1 <- rbinom(N, 1, plogis(treatment_effect * x1))
    df <- tibble(y0, y1) |>
      mutate(id = row_number())

    ### Max sample
    max_n <- 200

    ### How many sampling trials
    n <- 50
    sample_number <- max_n / n

    b0 = b0_lower = b0_upper = b1 = b1_lower = b1_upper = prob_vec =
      treatment_num = num_vec = vector(mode = "numeric", length = sample_number)
    x_vec <- c()

    ### Changing c parameter
    n_sum <- 0
    c_param <- function(x) {
      value <- x / (2 * max_n)
      return(value)
    }

    for (i in 0:sample_number) {
      ## Set initial parameters and don't log it
      if (i == 0) {
        b0mu <- 0
        b0sd <- 10
        b1mu <- 0
        b1sd <- 2.5
        diff <- 0.5
        x <- rbinom(n, 1, 0.5)
        num_vec[i + 1] <- n
        n_sum <- n
      } else {
        ## Update prior
        b0mu <- fit$summary("beta0")[['median']]
        b0sd <- fit$summary("beta0")[['sd']]
        b1mu <- fit$summary("beta1")[['median']]
        b1sd <- fit$summary("beta1")[['sd']]
        b0[i] <- b0mu
        b0_lower[i] <- fit$summary("beta0")[["q5"]]
        b0_upper[i] <- fit$summary("beta0")[["q95"]]
        b1[i] <- b1mu
        b1_lower[i] <- b1_lower_i
        b1_upper[i] <- b1_upper_i
        n_sum <- n_sum + n

        ## Assignment of sampling proportion; 50% allocation at all times vs Adaptive
        if (method == "50_50") {
          diff <- 0.5
        } else {
          prob <- beta1 |>
            mutate(treatment_benefit = ifelse(beta1 < 0, 1, 0)) |>
            summarize(prop = mean(treatment_benefit)) |>
            pull()
          c <- c_param(n_sum)
          # diff <- min(max(prob, 0.1), 0.9)
          diff <- prob^c / (prob^c + (1 - prob)^c)
        }
        prob_vec[i] <- diff
        treatment_num[i] = sum(x)

        ## Each sampling is 50 patients
        num_vec[i + 1] <- n
        x <- rbinom(n, 1, diff)
      }

      ## Sampling from population
      df_list <- df |>
        slice_sample(n = n) |>
        bind_cols(x = x) |>
        mutate(y = case_when(
          x == 1 ~ y1,
          x == 0 ~ y0
        ))

      ## Bayesian Model
      ## main model
      stan_model <- glue("
      data {{
        int<lower=0> N;
        array[N] int x;
        array[N] int y;
      }}
      parameters {{
        real beta0;
        real beta1;
      }}
      model {{
        beta0 ~ normal({b0mu},{b0sd});
        beta1 ~ normal({b1mu},{b1sd});
        y ~ bernoulli_logit(beta0 + beta1*to_vector(x));
      }}
      ")

      ## compile model
      mod <- write_stan_file(stan_model)
      model <- cmdstan_model(mod)

      fit <- model$sample(
        data = list(N = nrow(df_list), x = df_list$x, y = df_list$y),
        chains = 4,
        iter_sampling = 2000,
        iter_warmup = 1000,
        seed = 1,
        parallel_chains = 4
      )

      ## Remove patients who are already sampled from the population
      sample <- df_list |> pull(id)
      df <- df |> filter(!id %in% sample)
      x_vec <- c(x_vec, x)
      print(i)

      ## Extract MCMC
      mcmc <- as.data.frame(fit$draws(inc_warmup = F))

      ## Assess probability that treatment effect is < 0
      beta1 <- mcmc |>
        select(contains("beta1")) |>
        pivot_longer(cols = everything(), names_to = "column", values_to = "beta1")
      b1_lower_i <- beta1 |>
        summarize(lower = quantile(beta1, 0.025)) |>
        pull(lower)
      b1_upper_i <- beta1 |>
        summarize(upper = quantile(beta1, 0.975)) |>
        pull(upper)
    }

    ## log results
    result <- result |>
      bind_rows(
        tibble(
          b0 = b0, b0_lower = b0_lower, b0_upper = b0_upper,
          b1 = b1, b1_lower = b1_lower, b1_upper = b1_upper,
          prob = prob_vec, treatment_num = treatment_num,
          num_sample = num_vec[1:(length(num_vec) - 1)]
        ) |>
          mutate(
            treatment_num_cum = cumsum(treatment_num),
            num = cumsum(num_sample),
            method = method,
            seed = seed
          ) |>
          slice_head(n = 30)
      )
  }
}
```
```r
## Visualize the result
result |>
  ggplot(aes(x = num, y = b1)) +
  geom_point() +
  geom_line() +
  geom_ribbon(aes(ymin = b1_lower, ymax = b1_upper), alpha = 0.5) +
  geom_hline(yintercept = 0) +
  geom_hline(yintercept = treatment_effect, color = "blue") +
  geom_label(aes(x = num, y = b1, label = treatment_num_cum), size = 3) +
  ggtitle("Response Adaptive Randomization vs 50-50 Fixed Randomization",
          subtitle = "Blue line = True value, Black line = No difference between treatment & placebo, Labeled numbers = Cumulative number of patients on treatment") +
  xlab("Patients") +
  theme_bw() +
  xlim(40, 210) +
  facet_grid(method ~ seed)
```
The above plot shows the beta1 coefficient from the posterior distribution. The 10 columns are the seeds, one per simulated set of trials. The rows are the allocation method, either 50-50 or adaptive randomization. The blue line is the true treatment effect, -1.09. The black line marks no effect (0), and the grey shaded area is the 95% credible interval. A simple heuristic is that once the grey area sits entirely below the black line, there is at least a 95% probability of a treatment effect. Now let's see if we can answer the questions we had.
It does not seem like it; both 50-50 and RAR appear to need a similar number of patients before demonstrating a treatment effect.
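One way to check this more formally than eyeballing the ribbons is to ask the result tibble directly when each simulated trial first meets the heuristic; a minimal sketch (my own), assuming the result object built above:

```r
## For each method and seed, flag whether the 95% credible interval for beta1 ever
## sits entirely below zero, and at how many patients that first happens
result |>
  group_by(method, seed) |>
  summarize(
    detected = any(b1_upper < 0),
    first_detect_n = if (any(b1_upper < 0)) min(num[b1_upper < 0]) else NA_real_,
    .groups = "drop"
  )
```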
Yes. The patterns for both methods appear quite similar.
Maybe? Both 50-50 and adaptive randomization seem able to identify that there is no treatment effect. However, the 50-50 allocation seems more stable than the adaptive randomization. This is only from 10 trials, so it is hard to say. We also did not set criteria for stopping for futility.
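For reference, a futility rule can be as simple as a threshold on the posterior probability of benefit, P(beta1 < 0 | data), which the simulation already computes as prob at each interim. A minimal sketch (my own, with an arbitrary 5% threshold):

```r
## Stop for futility if the posterior probability that the treatment is beneficial
## falls below a chosen threshold at an interim analysis
stop_for_futility <- function(prob_benefit, threshold = 0.05) {
  prob_benefit < threshold
}

stop_for_futility(0.03)  # TRUE  -> treatment very unlikely to help, stop enrolling
stop_for_futility(0.40)  # FALSE -> continue to the next interim analysis
```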
On a side note, notice how with seeds 26561 and 87907 we would falsely conclude that there is a treatment effect when there is none.
According to "Resist the Temptation of Response-Adaptive Randomization", Response-Adaptive Randomization (RAR) causes many problems, including bias from temporal trends, statistical inefficiency, and added complexity in trial design and analysis.
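To see how a temporal trend can bias RAR even when the treatment does nothing, here is a deliberately exaggerated sketch (my own, with made-up drift values and large blocks so the bias is visible above noise): the background event rate improves over the trial while the allocation drifts toward the treatment arm, so treated patients are enrolled disproportionately in the later, better-outcome periods.

```r
## No true treatment effect, but outcomes improve over time and allocation drifts
## toward treatment (as RAR could cause); a naive comparison then favors treatment.
set.seed(42)
n_per_block <- 5000                               # large blocks so the bias shows clearly
p_event_by_block <- c(0.50, 0.45, 0.40, 0.35)     # background event rate improving over time
p_treat_by_block <- c(0.50, 0.60, 0.70, 0.80)     # allocation drifting toward treatment

sim <- do.call(rbind, lapply(seq_along(p_event_by_block), function(b) {
  x <- rbinom(n_per_block, 1, p_treat_by_block[b])   # treatment assignment
  y <- rbinom(n_per_block, 1, p_event_by_block[b])   # outcome, independent of treatment
  data.frame(block = b, x = x, y = y)
}))

tapply(sim$y, sim$x, mean)  # event rate looks lower on treatment despite no true effect
```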
Bayesian adaptive designs also face a practical dilemma around multiplicity adjustment. While Bayesian inference does not, in theory, require corrections for multiple looks at the data, regulators typically demand Type I error control for confirmatory trials regardless of the statistical approach. Case studies show that unadjusted interim analyses with early stopping for efficacy inflate the Type I error rate, while stopping only for futility decreases Type I error but reduces power. For researchers implementing Bayesian adaptive designs with early efficacy stopping, the stopping boundaries need to be adjusted as the number of analyses grows, creating tension between Bayesian philosophy and regulatory demands, especially in trials aiming to change clinical practice.
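To see the inflation rather than take it on faith, here is a small sketch (my own, using a conjugate Beta-Binomial model instead of Stan to keep it fast): simulate trials with no true effect, take four unadjusted looks, and declare efficacy whenever the posterior probability that the treatment arm has fewer events exceeds 0.975 at any look.

```r
## Under the null, a single look has ~2.5% chance of crossing the 0.975 threshold,
## but taking the "best" of four unadjusted looks pushes the overall rate higher.
set.seed(123)
n_sims <- 1000
looks <- c(50, 100, 150, 200)   # cumulative patients at each interim look
p_event <- 0.5                  # same event rate in both arms (null scenario)

false_positive <- replicate(n_sims, {
  x <- rbinom(max(looks), 1, 0.5)       # 50-50 treatment assignment
  y <- rbinom(max(looks), 1, p_event)   # outcomes, independent of treatment
  any(sapply(looks, function(k) {
    xt <- x[1:k]; yt <- y[1:k]
    ## Beta(1,1) prior on each arm's event probability
    p1 <- rbeta(2000, 1 + sum(yt[xt == 1]), 1 + sum(xt == 1) - sum(yt[xt == 1]))
    p0 <- rbeta(2000, 1 + sum(yt[xt == 0]), 1 + sum(xt == 0) - sum(yt[xt == 0]))
    mean(p1 < p0) > 0.975               # "efficacy" declared at this look
  }))
})
mean(false_positive)  # noticeably above 0.025 because of the repeated looks
```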
Read more: Do we need to adjust for interim analyses in a Bayesian adaptive trial design?
I had sent the script to my friend Alec Wong earlier for proofreading. Here are his comments, which I am keeping for future reference.
Thanks Alec! I’ll work on these!