Introduction

In this vignette, we demonstrate how to perform Bayesian model averaging using data from two melanoma clinical trials conducted by the Eastern Cooperative Oncology Group (ECOG): E1684 and E1690. The E1684 trial is treated as external data, while the E1690 trial serves as the current trial data. Both trials evaluated the effect of high-dose interferon alfa-2b (IFN) on relapse-free survival (RFS) compared to an observation arm (control). RFS is defined as the time from randomization to either cancer relapse or death.

The goal of this analysis is to estimate the log hazard ratio (log HR) of RFS between the treated and control groups. To account for model uncertainty, we consider multiple combinations of outcome models and prior specifications, including a non-informative reference prior and several informative priors that incorporate external information. For these candidate models, we compute model averaging weights using approaches such as Bayesian model averaging (BMA, Madigan and Raftery (1994)), pseudo-BMA, and stacking (Yao et al. (2018)). We then draw posterior samples from the ensemble distribution to obtain model-averaged estimates of the treatment effect.

library(hdbayes)
library(survival)
library(posterior)
library(dplyr)
library(ggplot2)
library(ggthemes)

Load Data

## replace 0 failure times with 0.50 days
E1684$failtime[E1684$failtime == 0] <- 0.50/365.25
E1690$failtime[E1690$failtime == 0] <- 0.50/365.25
data.list  <- list(currdata = E1690, histdata = E1684)

fmla <- survival::Surv(failtime, failcens) ~ treatment

For comparison, we first estimate the treatment effect (i.e., the log hazard ratio between the treated and control groups) by fitting a proportional hazards (PH) regression model that includes only the treatment indicator as a covariate, using the coxph() function from the survival package.

## fit proportional hazards models on E1690 and E1684 data
fit.mle.cur <- coxph(formula = fmla, data = data.list$currdata)

summary(fit.mle.cur)
#> Call:
#> coxph(formula = fmla, data = data.list$currdata)
#> 
#>   n= 426, number of events= 240 
#> 
#>              coef exp(coef) se(coef)      z Pr(>|z|)  
#> treatment -0.2411    0.7858   0.1293 -1.865   0.0622 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>           exp(coef) exp(-coef) lower .95 upper .95
#> treatment    0.7858      1.273    0.6099     1.012
#> 
#> Concordance= 0.533  (se = 0.017 )
#> Likelihood ratio test= 3.48  on 1 df,   p=0.06
#> Wald test            = 3.48  on 1 df,   p=0.06
#> Score (logrank) test = 3.49  on 1 df,   p=0.06
suppressMessages(confint(fit.mle.cur))
#>                2.5 %     97.5 %
#> treatment -0.4944693 0.01233594

Fitting PWEPH Models with Different Priors

We now proceed to the Bayesian analysis using the functions provided in the hdbayes package. For the outcome model, we consider a PH model with a piecewise constant baseline hazard, referred to as the piecewise exponential proportional hazards (PWEPH) model. The PWEPH model assumes a constant baseline hazard within each of \(J\) prespecified time intervals, defined by the partition \(0 = s_0 < s_1 < \ldots < s_{J-1} < s_J = \infty\). Let \(\beta = (\beta_1, \ldots, \beta_p)'\) denote the vector of regression coefficients, and \(\lambda = (\lambda_1, \ldots, \lambda_J)'\) denote the baseline hazards. The conditional survival function for a subject with covariate vector \(x\) is given by \[ \begin{align*} S(t \mid \beta, \lambda) = \exp\big\{-\sum_{j=1}^{J} \lambda_j \cdot \big[ (s_j - s_{j-1}) \cdot I(t > s_j) + (t - s_{j-1}) \cdot I(s_{j-1} < t \leq s_j)\big] \exp\{x'\beta\}\big\}, \end{align*} \] where \(I(\cdot)\) is the indicator function.

In our analysis, we consider two values of the number of intervals \(J\) in the PWEPH model: \(J = 2\) and \(J = 5\). For each \(J\), the cutpoints are chosen so that each interval contains approximately the same number of events in the current trial (E1690). We then fit PWEPH models under three prior specifications for \((\beta, \lambda)\):

Non-informative reference prior.
Power prior (PP, Ibrahim and Chen (2000)): Incorporates external data from E1684 with a discounting parameter \(a_0 = 0.5\).
Commensurate prior (CP, Hobbs, Sargent, and Carlin (2012)): Links the parameters of the external and current data models through commensurability parameters that determine the degree of borrowing from the external data.

nchains       <- 4 # number of Markov chains
ncores        <- 4 # maximum number of MCMC chains to run in parallel
iter_warmup   <- 1000 # warmup per chain for MCMC sampling
iter_sampling <- 2500 # number of samples post warmup per chain

base.pars     <- "treatment"

# function to pull out the posterior summaries in a convenient form
get_summaries <- function(fit, pars.interest, digits = 2) {
  out <- suppressWarnings(
    fit %>%
      select(all_of(pars.interest)) %>%
      posterior::summarise_draws(
        mean,
        sd,
        `q2.5` = ~ posterior::quantile2(.x, probs = 0.025),
        `q97.5` = ~ posterior::quantile2(.x, probs = 0.975),
        rhat,
        ess_bulk,
        ess_tail
      ) %>%
      mutate(across(where(is.numeric), \(x) round(x, digits)))
  )
  return(out)
}

## determine breaks/cutpoints for PWEPH models

## J = 2
nbreaks   <- 2
probs     <- 1:nbreaks / nbreaks
breaks_J2 <- as.numeric(
  quantile(E1690[E1690$failcens==1, ]$failtime, probs = probs)
)
breaks_J2 <- c(0, breaks_J2)
## set the last cutpoint to a large value to cover the tail
breaks_J2[length(breaks_J2)] <- max(10000, 1000 * breaks_J2[length(breaks_J2)])

## J = 5
nbreaks   <- 5
probs     <- 1:nbreaks / nbreaks
breaks_J5 <- as.numeric(
  quantile(E1690[E1690$failcens == 1, ]$failtime, probs = probs)
)
breaks_J5 <- c(0, breaks_J5)
breaks_J5[length(breaks_J5)] <- max(10000, 1000 * breaks_J5[length(breaks_J5)])

Reference Prior

In the reference prior, we specify \(\beta_1, \ldots, \beta_p \sim N(0, 10^2)\) and \(\lambda_1, \ldots, \lambda_J \sim N^{+}(0, 10^2)\), where \(\beta = (\beta_1, \ldots, \beta_p)'\) is the vector of regression coefficients, \(\lambda = (\lambda_1, \ldots, \lambda_J)'\) is the vector of baseline hazards, and \(N^{+}\) denotes the half-normal distribution.

if (instantiate::stan_cmdstan_exists()) {
  fit_post_J2 <- pwe.post(
    formula = fmla, data.list = data.list,
    breaks = breaks_J2,
    beta.sd = 10,
    base.hazard.sd = 10,
    get.loglik = TRUE,
    chains = nchains, parallel_chains = ncores,
    iter_warmup = iter_warmup, iter_sampling = iter_sampling,
    refresh = 0
  )
  fit_post_J5 <- pwe.post(
    formula = fmla, data.list = data.list,
    breaks = breaks_J5,
    beta.sd = 10,
    base.hazard.sd = 10,
    get.loglik = TRUE,
    chains = nchains, parallel_chains = ncores,
    iter_warmup = iter_warmup, iter_sampling = iter_sampling,
    refresh = 0
  )
}
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 0.9 seconds.
#> Chain 2 finished in 0.9 seconds.
#> Chain 3 finished in 0.9 seconds.
#> Chain 4 finished in 0.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.9 seconds.
#> Total execution time: 1.0 seconds.
#> 
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 1.0 seconds.
#> Chain 2 finished in 1.1 seconds.
#> Chain 3 finished in 1.1 seconds.
#> Chain 4 finished in 1.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.0 seconds.
#> Total execution time: 1.1 seconds.

summ_ref <- dplyr::bind_rows(
  get_summaries(fit_post_J2, pars.interest = base.pars) %>%
    mutate(prior = "Reference", J = 2),
  get_summaries(fit_post_J5, pars.interest = base.pars) %>%
    mutate(prior = "Reference", J = 5)
) %>%
  relocate(J, prior, .before = 1)

summ_ref
#> # A tibble: 2 × 10
#>       J prior     variable   mean    sd  q2.5 q97.5  rhat ess_bulk ess_tail
#>   <dbl> <chr>     <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1     2 Reference treatment -0.27  0.13 -0.52 -0.02     1    5013.    5849.
#> 2     5 Reference treatment -0.29  0.13 -0.54 -0.03     1    7350.    7341.

Power Prior

The power prior (PP) takes the form: \[\pi_{\text{PP}}(\beta, \lambda) \propto \mathcal{L}(\beta, \lambda \mid D_0)^{a_0} ~\pi_0(\beta, \lambda),\]

where \(\mathcal{L}(\beta, \lambda \mid D_0)\) is the likelihood of the PWEPH model based on the external data \(D_0\), and \(\pi_0(\beta, \lambda)\) is the initial prior, taken here to be the same as the reference prior described above. The discounting parameter is set to \(a_0 = 0.5\), which downweights the external information by \(50\%\).

if (instantiate::stan_cmdstan_exists()) {
  fit_pp_J2 <- pwe.pp(
    formula = fmla, data.list = data.list,
    breaks = breaks_J2,
    a0 = 0.5,
    beta.sd = 10,
    base.hazard.sd = 10,
    get.loglik = TRUE,
    chains = nchains, parallel_chains = ncores,
    iter_warmup = iter_warmup, iter_sampling = iter_sampling,
    refresh = 0
  )
  fit_pp_J5 <- pwe.pp(
    formula = fmla, data.list = data.list,
    breaks = breaks_J5,
    a0 = 0.5,
    beta.sd = 10,
    base.hazard.sd = 10,
    get.loglik = TRUE,
    chains = nchains, parallel_chains = ncores,
    iter_warmup = iter_warmup, iter_sampling = iter_sampling,
    refresh = 0
  )
}
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 2 finished in 1.2 seconds.
#> Chain 1 finished in 1.3 seconds.
#> Chain 3 finished in 1.2 seconds.
#> Chain 4 finished in 1.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.3 seconds.
#> Total execution time: 1.4 seconds.
#> 
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 2 finished in 1.4 seconds.
#> Chain 4 finished in 1.3 seconds.
#> Chain 1 finished in 1.4 seconds.
#> Chain 3 finished in 1.4 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.4 seconds.
#> Total execution time: 1.5 seconds.

summ_pp <- dplyr::bind_rows(
  get_summaries(fit_pp_J2, pars.interest = base.pars) %>%
    mutate(prior = "PP", J = 2),
  get_summaries(fit_pp_J5, pars.interest = base.pars) %>%
    mutate(prior = "PP", J = 5)
) %>%
  relocate(J, prior, .before = 1)

summ_pp
#> # A tibble: 2 × 10
#>       J prior variable   mean    sd  q2.5 q97.5  rhat ess_bulk ess_tail
#>   <dbl> <chr> <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1     2 PP    treatment -0.32  0.11 -0.53  -0.1     1    5060.    5649.
#> 2     5 PP    treatment -0.32  0.11 -0.53  -0.1     1    7953.    7659.

Commensurate Prior

Under the commensurate prior (CP), the parameters of the current and external data models are linked through commensurability parameters that determine the degree of borrowing from the external data. Let \(\beta_0 = (\beta_{01}, \ldots, \beta_{0p})'\) and \(\lambda_0 = (\lambda_{01}, \ldots, \lambda_{0J})'\) denote the vectors of regression coefficients and baseline hazards for the external data model, respectively. For \(j = 1, \ldots, p\), we assume \[\beta_j \sim N(\beta_{0j}, \tau_j^{-1}),\] where \(\tau_j\) is referred as the commensurability parameter. The \(\tau_j\)’s are treated as random and assigned a spike-and-slab prior, which is specified as a mixture of two half-normal priors. Specifically, for \(j = 1, \ldots, p\), the priors on \(\beta_{0j}\) and \(\tau_j\) are

\[ \begin{align*} \beta_{0j} &\sim N(0, 10^2), \\ \tau_j &\sim 0.1 \cdot N^{+}(200, 0.1^2) + 0.9 \cdot N^{+}(0, 5^2). \end{align*} \]

if (instantiate::stan_cmdstan_exists()) {
  fit_cp_J2 <- pwe.commensurate(
    formula = fmla, data.list = data.list,
    breaks = breaks_J2,
    beta0.sd = 10,
    base.hazard.sd = 10,
    get.loglik = TRUE,
    chains = nchains, parallel_chains = ncores,
    iter_warmup = iter_warmup, iter_sampling = iter_sampling,
    refresh = 0
  )
  fit_cp_J5 <- pwe.commensurate(
    formula = fmla, data.list = data.list,
    breaks = breaks_J5,
    beta0.sd = 10,
    base.hazard.sd = 10,
    get.loglik = TRUE,
    chains = nchains, parallel_chains = ncores,
    iter_warmup = iter_warmup, iter_sampling = iter_sampling,
    refresh = 0
  )
}
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 1.6 seconds.
#> Chain 4 finished in 1.5 seconds.
#> Chain 2 finished in 1.6 seconds.
#> Chain 3 finished in 1.6 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.6 seconds.
#> Total execution time: 1.7 seconds.
#> 
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 1.6 seconds.
#> Chain 3 finished in 1.6 seconds.
#> Chain 4 finished in 1.6 seconds.
#> Chain 2 finished in 1.7 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.6 seconds.
#> Total execution time: 1.8 seconds.

summ_cp <- dplyr::bind_rows(
  get_summaries(fit_cp_J2, pars.interest = base.pars) %>%
    mutate(prior = "CP", J = 2),
  get_summaries(fit_cp_J5, pars.interest = base.pars) %>%
    mutate(prior = "CP", J = 5)
) %>%
  relocate(J, prior, .before = 1)

summ_cp
#> # A tibble: 2 × 10
#>       J prior variable   mean    sd  q2.5 q97.5  rhat ess_bulk ess_tail
#>   <dbl> <chr> <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1     2 CP    treatment -0.28  0.13 -0.54 -0.04     1    7093.    7529.
#> 2     5 CP    treatment -0.29  0.12 -0.54 -0.05     1    6715.    7729.

Model Averaging and Ensemble Inference

We combine the fitted PWEPH models using several model averaging approaches: Bayesian model averaging (BMA), pseudo‐BMA, pseudo‐BMA+ (pseudo‐BMA with the Bayesian bootstrap), and stacking. In hdbayes, we provide the wrapper function compute.ensemble.weights() that use the loo package (Vehtari et al. (2024)) to compute the pseudo‐BMA, pseudo‐BMA+, and stacking weights. Posterior samples are then drawn from the resulting ensemble distribution to produce model-averaged estimates of the log hazard ratio.

## Compute Model Weights
fit.list <- list(fit_post_J2, fit_post_J5, fit_pp_J2, fit_pp_J5, fit_cp_J2, fit_cp_J5)

## BMA
wts_bma <- compute.ensemble.weights(
  fit.list = fit.list,
  type = "bma",
  chains = nchains, parallel_chains = ncores,
  iter_warmup = iter_warmup + 1000, iter_sampling = iter_sampling,
  refresh = 0
)
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 0.5 seconds.
#> Chain 2 finished in 0.5 seconds.
#> Chain 3 finished in 0.5 seconds.
#> Chain 4 finished in 0.5 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.5 seconds.
#> Total execution time: 0.6 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 0.6 seconds.
#> Chain 2 finished in 0.6 seconds.
#> Chain 3 finished in 0.6 seconds.
#> Chain 4 finished in 0.6 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.6 seconds.
#> Total execution time: 0.7 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 2 finished in 0.7 seconds.
#> Chain 3 finished in 0.6 seconds.
#> Chain 1 finished in 0.8 seconds.
#> Chain 4 finished in 0.7 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.7 seconds.
#> Total execution time: 0.8 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 0.7 seconds.
#> Chain 4 finished in 0.7 seconds.
#> Chain 3 finished in 0.8 seconds.
#> Chain 2 finished in 1.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.8 seconds.
#> Total execution time: 1.1 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5

## Pseudo-BMA
wts_pseudoBMA <- compute.ensemble.weights(
  fit.list = fit.list,
  type = "pseudobma",
  loo.args = list(cores = ncores),
  loo.wts.args = list(BB = FALSE, cores = ncores)
)

## Pseudo-BMA+
wts_pseudoBMA_BB <- compute.ensemble.weights(
  fit.list = fit.list,
  type = "pseudobma+",
  loo.args = list(cores = ncores),
  loo.wts.args = list(BB = TRUE, cores = ncores)
)

## Stacking
wts_stacking <- compute.ensemble.weights(
  fit.list = fit.list,
  type = "stacking",
  loo.args = list(cores = ncores),
  loo.wts.args = list(cores = ncores)
)

The table below reports point estimates, (posterior) standard deviations, and \(95\%\) credible intervals (Bayesian) or confidence intervals (frequentist) for the log hazard ratio from the Cox PH model and PWEPH models with \(J = 2\) and \(J = 5\) intervals under three prior specifications: reference, power prior (PP), and commensurate prior (CP). For the Bayesian models, model averaging weights from BMA, pseudo-BMA, pseudo-BMA+, and stacking are also provided.

estim.tab <- rbind(summ_ref, summ_pp, summ_cp)
estim.tab$J <- paste0("PWEPH (J = ", estim.tab$J, ")")
colnames(estim.tab)[1] <- "model"
estim.tab <- estim.tab[, c("model", "prior", "mean", "sd", "q2.5", "q97.5")]

estim.tab <- rbind(
  c("Cox PH", "MLE", round(as.numeric(fit.mle.cur$coefficients), 2),
    round(summary(fit.mle.cur)$coefficients[, "se(coef)"], 2),
    round(suppressMessages(confint(fit.mle.cur))[1], 2),
    round(suppressMessages(confint(fit.mle.cur))[2], 2)),
  estim.tab
) %>%
  as.data.frame()

tab               <- estim.tab
tab$BMA           <- c("--", round(wts_bma$weights, 2))
tab$`Pseudo-BMA`  <- c("--", round(wts_pseudoBMA$weights, 2))
tab$`Pseudo-BMA+` <- c("--", round(wts_pseudoBMA_BB$weights, 2))
tab$Stacking      <- c("--", round(wts_stacking$weights, 2))

tab
#>           model     prior  mean   sd  q2.5 q97.5 BMA Pseudo-BMA Pseudo-BMA+ Stacking
#> 1        Cox PH       MLE -0.24 0.13 -0.49  0.01  --         --          --       --
#> 2 PWEPH (J = 2) Reference -0.27 0.13 -0.52 -0.02   0          0           0        0
#> 3 PWEPH (J = 5) Reference -0.29 0.13 -0.54 -0.03   0       0.28        0.27     0.24
#> 4 PWEPH (J = 2)        PP -0.32 0.11 -0.53  -0.1   0          0           0        0
#> 5 PWEPH (J = 5)        PP -0.32 0.11 -0.53  -0.1   1       0.44        0.46     0.72
#> 6 PWEPH (J = 2)        CP -0.28 0.13 -0.54 -0.04   0          0           0     0.04
#> 7 PWEPH (J = 5)        CP -0.29 0.12 -0.54 -0.05   0       0.28        0.27        0

Posterior Summaries from Individual and Ensemble Methods

Building on the results in the previous table, we now include model‐averaged estimates obtained by combining the Bayesian models using BMA, pseudo‐BMA, pseudo‐BMA+, and stacking. For each ensemble method, the table reports the posterior mean, posterior standard deviation, and \(95\%\) credible interval for the log hazard ratio, alongside the corresponding results from the individual models shown earlier.

samples.mtx <- lapply(fit.list, function(f){
  f[["treatment"]]
})
samples.mtx <- do.call(cbind, samples.mtx)

ensemble.bma          <- sample.ensemble(wts_bma$weights, samples.mtx)
ensemble.pseudoBMA    <- sample.ensemble(wts_pseudoBMA$weights, samples.mtx)
ensemble.pseudoBMA_BB <- sample.ensemble(wts_pseudoBMA_BB$weights, samples.mtx)
ensemble.stacking     <- sample.ensemble(wts_stacking$weights, samples.mtx)

## summarize ensemble posterior samples
summ_ensemble <- function(samples, method) {
  mean_val <- mean(samples)
  sd_val   <- sd(samples)
  ci_vals  <- as.numeric( quantile2(samples, probs = c(0.025, 0.975)) )

  data.frame(
    model  = paste0("Ensemble (", method, ")"),
    prior  = "--",
    mean   = round(mean_val, 2),
    sd     = round(sd_val, 2),
    q2.5   = round(ci_vals[1], 2),
    q97.5  = round(ci_vals[2], 2)
  )
}

## create table for each ensemble method
estim.ens <- rbind(
  summ_ensemble(ensemble.bma,          "BMA"),
  summ_ensemble(ensemble.pseudoBMA,    "Pseudo-BMA"),
  summ_ensemble(ensemble.pseudoBMA_BB, "Pseudo-BMA+"),
  summ_ensemble(ensemble.stacking,     "Stacking")
)

## combine with previous table
estim.combined <- rbind(estim.tab, estim.ens) %>%
  as.data.frame()
estim.combined
#>                     model     prior  mean   sd  q2.5 q97.5
#> 1                  Cox PH       MLE -0.24 0.13 -0.49  0.01
#> 2           PWEPH (J = 2) Reference -0.27 0.13 -0.52 -0.02
#> 3           PWEPH (J = 5) Reference -0.29 0.13 -0.54 -0.03
#> 4           PWEPH (J = 2)        PP -0.32 0.11 -0.53  -0.1
#> 5           PWEPH (J = 5)        PP -0.32 0.11 -0.53  -0.1
#> 6           PWEPH (J = 2)        CP -0.28 0.13 -0.54 -0.04
#> 7           PWEPH (J = 5)        CP -0.29 0.12 -0.54 -0.05
#> 8          Ensemble (BMA)        -- -0.32 0.11 -0.53  -0.1
#> 9   Ensemble (Pseudo-BMA)        --  -0.3 0.12 -0.54 -0.07
#> 10 Ensemble (Pseudo-BMA+)        --  -0.3 0.12 -0.53 -0.07
#> 11    Ensemble (Stacking)        -- -0.31 0.12 -0.53 -0.08

Posterior Densities from Ensemble Methods

We compare the posterior densities of the log hazard ratio obtained from the four ensemble methods, BMA, pseudo‐BMA, pseudo‐BMA+, and stacking, alongside the maximum likelihood estimate (MLE) from fitting the Cox PH model to the current data (shown as a dashed line). The posterior densities are largely similar across methods, suggesting that, in this example, the choice of ensemble weighting approach has little influence on the resulting posterior distribution.

data.ens <- data.frame(
  logHR = c(ensemble.bma, ensemble.pseudoBMA, ensemble.pseudoBMA_BB, ensemble.stacking),
  method = rep(c("Ensemble (BMA)", "Ensemble (Pseudo-BMA)", "Ensemble (pseudo-BMA+)",
                   "Ensemble (Stacking)"), each = length(ensemble.bma)))

mle_curr <- as.numeric(coef(fit.mle.cur)["treatment"])

ggplot(data.ens, aes(x = logHR, colour = method)) +
  geom_density(linewidth = 0.9, adjust = 1) +
  geom_vline(xintercept = mle_curr, linetype = 2) +
  labs(
    title = "Posterior densities of log HR from ensemble methods",
    subtitle = "Dashed line = Cox PH MLE",
    x = "Log hazard ratio (treatment vs control)",
    y = "Posterior density",
    colour = "Ensemble method"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "bottom",
    legend.title = element_text(size = 10),
    legend.text  = element_text(size = 9)
  ) +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE)) +
  scale_color_tableau("Superfishel Stone")

plot of posterior densities from ensemble methods

References

Hobbs, Brian P, Daniel J Sargent, and Bradley P Carlin. 2012. “Commensurate Priors for Incorporating Historical Information in Clinical Trials Using General and Generalized Linear Models.” Bayesian Analysis (Online) 7 (3): 639.

Ibrahim, Joseph G, and Ming-Hui Chen. 2000. “Power Prior Distributions for Regression Models.” Statistical Science, 46–60.

Madigan, David, and Adrian E. Raftery. 1994. “Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam’s Window.” Journal of the American Statistical Association 89 (428): 1535–46. https://doi.org/10.2307/2291017.

Vehtari, Aki, Jonah Gabry, Mans Magnusson, Yuling Yao, Paul-Christian Bürkner, Topi Paananen, and Andrew Gelman. 2024. Loo: Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models. https://mc-stan.org/loo/.

Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Using Stacking to Average Bayesian Predictive Distributions (with Discussion).” Bayesian Analysis 13 (3): 917–1007. https://doi.org/10.1214/17-BA1091.

Model-Averaging-Demonstration