This tutorial uses the Treatment Selection package to analyze the example data provided in the package.
First, you need to download and install the package:
install.packages("TreatmentSelection")If you would like the current version of the package directly from github, use the devtools package:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("TreatmentSelection", "mdbrown")First, load the example data set for this package called tsdata. Four markers are included in the data example, a ‘’weak’‘and a’‘strong’’ marker (\(Y1\) and \(Y2\) respectively), along with a weak/strong discrete markers.
library(TreatmentSelection)
set.seed(12321)
data(tsdata)
tsdata[1:5, ]##   trt event      Y1      Y2 Y1_disc Y2_disc
## 1   1     1 39.9120 -0.8535       1       0
## 2   1     0  6.6820  0.2905       0       1
## 3   1     0  6.5820  0.0800       0       1
## 4   0     0  1.3581  1.1925       0       1
## 5   0     0  7.6820 -0.2070       0       0Once we have the package and our data loaded into R, we need to create a treatment selection R object using the function trtsel. This function takes as inputs a data.frame of treatment indicators, adverse event status, marker values, and other optional information. Once we have created this object, we can then use it to plot risk/treatment effect curves, estimate summary measures, and check model calibration.
tsdata contains indicators for event and treatment. There are also two continous biomarkers Y1 and Y2 along with dichotomous versions of the two markers. First let’s create a trtsel object using the marker Y1, and take a look at it’s contents:
tsdata$outcome <- tsdata$event
trtsel.Y1 <- trtsel(outcome ~ Y1*trt, 
                    treatment.name = "trt", 
                    data = tsdata, 
                    study.design = "randomized cohort",
                    link = "logit", 
                    default.trt = "trt all")
trtsel.Y1## Study design: randomized cohort 
## 
## Model Fit:
## 
##  Link function: logit 
## 
##  Coefficients: 
## NULL
## 
## 
## Derived Data: (first ten rows)
## 
##    outcome      Y1 trt fittedrisk.t0 fittedrisk.t1    trt.effect  marker
## 1        1 39.9120   1    0.35016583     0.2583742  0.0917916549 39.9120
## 2        0  6.6820   1    0.09974358     0.1340472 -0.0343036269  6.6820
## 3        0  6.5820   1    0.09931697     0.1337641 -0.0344471266  6.5820
## 4        0  1.3581   0    0.07918316     0.1196652 -0.0404820847  1.3581
## 5        0  7.6820   0    0.10410005     0.1369063 -0.0328062456  7.6820
## 6        0 41.1720   0    0.36393311     0.2643117  0.0996213622 41.1720
## 7        0 19.4920   1    0.16933976     0.1746644 -0.0053246137 19.4920
## 8        1 20.8220   1    0.17843231     0.1793943 -0.0009620341 20.8220
## 9        0  6.9620   0    0.10094678     0.1348426 -0.0338958439  6.9620
## 10       0  2.5020   0    0.08324538     0.1226384 -0.0393929781  2.5020
##    marker.neg
## 1           0
## 2           1
## 3           1
## 4           1
## 5           1
## 6           0
## 7           1
## 8           1
## 9           1
## 10          1As we see above, the object contains information about the study design, model fit, fitted risks given treatment, and estimated treatment effect for each individual.
Now create a trtsel object using a discrete marker.
# Y2_disc = as.numeric(Y2>0)
trtsel.Y2_disc <- trtsel(outcome ~ trt*(Y2_disc), 
                         treatment.name = "trt",
                         data = tsdata, 
                         study.design = "randomized cohort", 
                         link = "logit")See ?trtsel for more information. Now that we have created trtsel objects, we can plot, evaluate, calibrate and compare them.
Plot risk curves:
plot(trtsel.Y1, 
            main = "Y1: Oncotype-DX-like marker", 
            plot.type = "risk", 
            ci = "horizontal", 
            conf.bands = TRUE, 
            bootstraps = 50,       #more bootstraps should be run than this in practice!
            trt.names = c("chemo.", "no chemo."), 
            show.marker.axis = FALSE)For a binary marker, we calculate vertical confidence bands:
tmp <- plot(trtsel.Y2_disc,
                   main = "Discrete version of Y2", 
                   plot.type = "risk", 
                   ci = "vertical", 
                   conf.bands = TRUE, 
                   offset = 0.01, 
                   bootstraps = 50, 
                   trt.names = c("chemo.", "no chemo."))tmp is now a list with elements plot that holds the ggplot output, and ci.bounds which holds the information regarding the confidence bounds.
tmp$ci.bounds##         risk trt marker      lower      upper
## 1 0.35984848   1      0 0.32422707 0.39520792
## 2 0.06198347   1      1 0.03510314 0.09270948
## 4 0.32061069   0      1 0.26155639 0.37440180
## 5 0.17241379   0      0 0.12972671 0.20252033We can also plot the distribution of treatment effects.
plot(trtsel.Y1, 
            plot.type = "treatment effect", 
            ci = "horizontal", 
            conf.bands = TRUE, 
            bootstraps = 50)plot.trtsel(trtsel.Y2_disc, 
            plot.type = "treatment effect", 
            conf.bands = TRUE, 
            bootstraps = 50)## [1] "vertical confidence intervals will be plotted."Plot ‘selection impact’ curves, which show the estimated event rate if different proportions of observations where treated based off the marker of interest.
plot.trtsel(trtsel.Y1, 
            plot.type = "selection impact", 
            ci = "vertical", 
            conf.bands = TRUE, 
            bootstraps = 50)Calculate summary measures of marker performance along with bootstrap confidence intervals.
tmp <- evaluate(trtsel.Y1, bootstraps = 50)
tmp## 
##   Summary Measure Estimates (with 95% confidence intervals) 
##  -----------------------------------------------------------
##   Decrease in event rate under marker-based treatment (Theta)
##     Empirical:    0.013 (0.013,0.013) 
##     Model Based:  0.01 (0.01,0.01) 
## 
##   Proportion marker negative:
##    0.461 (0.461,0.461) 
##   Proportion marker positive:
##    0.539 (0.539,0.539) 
## 
##   Average benefit of no treatment among marker-negatives (B.neg)
##     Empirical:    0.029 (0.029,0.029) 
##     Model Based:  0.023 (0.023,0.023) 
## 
##   Average benefit of treatment among marker-positives (B.pos)
##     Empirical:    0.089 (0.089,0.089) 
##     Model Based:  0.098 (0.098,0.098) 
## 
## 
##   Variance in estimated treatment effect: 
##     0.007 (0.007,0.007) 
##   Total Gain: 
##     0.066 (0.066,0.066) 
## 
##   Marker positivity threshold:  21.082
## 
##   Event Rates:
##  --------------------
##              Treat all       Treat None    Marker-based Treatment
##  Empirical:     0.217           0.251          0.204    
##             (0.217,0.217)   (0.251,0.251)   (0.204,0.204) 
##  Model Based:   0.214           0.257          0.204    
##             (0.214,0.214)   (0.257,0.257)   (0.204,0.204)# access the estimates
tmp$estimates##   p.neg p.pos  B.neg.emp  B.neg.mod  B.pos.emp  B.pos.mod  Theta.emp
## 1 0.461 0.539 0.02864779 0.02252989 0.08866547 0.09846275 0.01320663
##    Theta.mod   Var.Delta         TG ER.trt0.emp ER.trt0.mod ER.trt1.emp
## 1 0.01038628 0.007416402 0.06579359   0.2510121   0.2567964   0.2173913
##   ER.trt1.mod ER.mkrbased.emp ER.mkrbased.mod Marker.Thresh
## 1   0.2141113       0.2041847        0.203725        21.082# discrete marker
evaluate(trtsel.Y2_disc, bootstraps = 50)## 
##   Summary Measure Estimates (with 95% confidence intervals) 
##  -----------------------------------------------------------
##   Decrease in event rate under marker-based treatment (Theta)
##     Empirical:    0.093 (0.093,0.093) 
##     Model Based:  0.093 (0.093,0.093) 
## 
##   Proportion marker negative:
##    0.496 (0.496,0.496) 
##   Proportion marker positive:
##    0.504 (0.504,0.504) 
## 
##   Average benefit of no treatment among marker-negatives (B.neg)
##     Empirical:    0.187 (0.187,0.187) 
##     Model Based:  0.187 (0.187,0.187) 
## 
##   Average benefit of treatment among marker-positives (B.pos)
##     Empirical:    0.259 (0.259,0.259) 
##     Model Based:  0.259 (0.259,0.259) 
## 
## 
##   Event Rates:
##  --------------------
##              Treat all       Treat None    Marker-based Treatment
##  Empirical:     0.217           0.251          0.124    
##             (0.217,0.217)   (0.251,0.251)   (0.124,0.124) 
##  Model Based:   0.210           0.247          0.117    
##             (0.210,0.210)   (0.247,0.247)   (0.117,0.117)Currently, model calibration is only available for continuous markers.
calibrate(trtsel.Y1, 
                 groups = 10, 
                 plot = "calibration", 
                 trt.names = c("chemo.", "no chemo."))## 
##   Hosmer - Lemeshow test for model calibration
##  ----------------------------------------------
## 
##    Number of Groups: 10 
## 
##    No Treatment (trt = 0):
##     Test Statistic = 4.496,   DF = 8,   p value = 0.8098813
## 
##    Treated (trt = 1):
##     Test Statistic = 4.986,   DF = 8,   p value = 0.7591213See ?calibrate.trtsel for more plot options.
To compare markers, the trt and event labels must be identical for the two markers. Plots can not be generated if comparing a discrete marker with a continuous marker.
# trtsel object for the stronger marker 2
trtsel.Y2 <- trtsel(event~trt*(Y2 + Y1), 
                    treatment.name = "trt", 
                    data = tsdata, 
                    default.trt = "trt all")
# Compare the markers based on summary measures
mycompare <- compare(trtsel1 = trtsel.Y1, 
                     trtsel2 = trtsel.Y2, 
                     model.names = c("Model 1", "Model 2"), 
                     bootstraps = 50, 
                     plot = TRUE, 
                     ci = "vertical", 
                     offset = 0.01, 
                     conf.bands = TRUE)mycompare##                       Summary Measure Estimates 
##                     (with  95 % confidence intervals) 
## 
##                 Model 1     |     Model 2  |   difference    (p-value)
##  ------------------------------------------------------------------------
## 
## Decrease in event rate under marker-based treatment (Theta)
##  Empirical:     0.013      |     0.092     |     -0.079         (< 0.02)
##             (0.019,0.051) | (0.172,0.275) | (-0.188,-0.167) 
##  Model Based:   0.010      |     0.097      |     -0.087         (< 0.02)
##             (0.041,0.181)  | (0.194,0.265)  | (-0.172,-0.060) 
## 
## Proportion marker negative:
##                 0.461      |     0.414      |     0.047         (< 0.02)
##             (-2.500,-1.903)  | (0.164,0.232)  | (-2.689,-2.108) 
## Proportion marker positive:
##                 0.539      |     0.586      |     -0.047         (< 0.02)
##             (-0.041,-0.001)  | (0.170,0.223)  | (-0.248,-0.199) 
## 
## Average benefit of no treatment among marker-negatives (B.neg)
##  Empirical:     0.029      |     0.223     |     -0.194         (0.68)
##             (0.002,0.694) | (0.324,0.478) | (-0.321,0.220) 
##  Model Based:   0.023      |     0.234      |     -0.212         (0.66)
##             (0.306,0.998)  | (0.522,0.676)  | (-0.220,0.321) 
## 
## Average benefit of treatment among marker-positives (B.pos)
##  Empirical:     0.089      |     0.219     |     -0.130         (< 0.02)
##             (-0.050,0.077) | (0.166,0.281) | (-0.315,-0.118) 
##  Model Based:   0.098      |     0.236      |     -0.137         (< 0.02)
##             (0.000,0.051)  | (0.199,0.275)  | (-0.259,-0.167) 
## 
## 
## Variance in estimated treatment effect : 
##                 0.007      |     0.094      |     -0.086         (< 0.02)
##             (-0.004,0.043)  | (0.072,0.120)  | (-0.104,-0.044) 
## 
## Total Gain: 
##                 0.066      |     0.231      |     -0.165         (< 0.02)
##             (0.000,0.034)  | (0.072,0.122)  | (-0.098,-0.068)## Compare two discrete markers Y1_disc = as.numeric(Y1>mean(Y1))
trtsel.Y1_disc <- trtsel(event~Y1_disc*trt, 
                         treatment.name = "trt", 
                         data = tsdata, 
                         study.design = "randomized cohort", 
                         link = "logit")
compare.trtsel(trtsel1 = trtsel.Y1_disc, 
               trtsel2 = trtsel.Y2_disc, 
               ci = "vertical", 
               offset = 0.2, 
               bootstraps = 50, 
               plot = TRUE, 
               conf.bands = TRUE, 
               annotate.plot = FALSE)##                       Summary Measure Estimates 
##                     (with  95 % confidence intervals) 
## 
##                 Model 1     |     Model 2  |   difference    (p-value)
##  ------------------------------------------------------------------------
## 
## Decrease in event rate under marker-based treatment (Theta)
##  Empirical:     0.011      |     0.093     |     -0.082         (< 0.02)
##             (-0.015,0.196) | (0.064,0.313) | (-0.108,-0.064) 
##  Model Based:   0.011      |     0.093      |     -0.082         (< 0.02)
##             (-0.015,0.057)  | (0.064,0.126)  | (-0.108,-0.046) 
## 
## Proportion marker negative:
##                 0.570      |     0.496      |     0.074         (< 0.02)
##             (0.541,0.597)  | (0.465,0.525)  | (0.043,0.108) 
## Proportion marker positive:
##                 0.430      |     0.504      |     -0.074         (< 0.02)
##             (0.403,0.459)  | (0.475,0.535)  | (-0.108,-0.043) 
## 
## Average benefit of no treatment among marker-negatives (B.neg)
##  Empirical:     0.019      |     0.187     |     -0.168         (< 0.02)
##             (-0.026,0.099) | (0.129,0.253) | (-0.217,-0.099) 
##  Model Based:   0.019      |     0.187      |     -0.168         (< 0.02)
##             (-0.026,0.099)  | (0.129,0.253)  | (-0.217,-0.099) 
## 
## Average benefit of treatment among marker-positives (B.pos)
##  Empirical:     0.106      |     0.259     |     -0.153         (< 0.02)
##             (0.019,0.196) | (0.203,0.313) | (-0.218,-0.064) 
##  Model Based:   0.106      |     0.259      |     -0.153         (< 0.02)
##             (0.019,0.196)  | (0.203,0.313)  | (-0.218,-0.064) 
## 
## 
## Variance in estimated treatment effect : 
##                 0.004      |     0.050      |     -0.046         (< 0.02)
##             (0.000,0.013)  | (0.032,0.064)  | (-0.058,-0.027) 
## 
## Total Gain: 
##                 0.061      |     0.223      |     -0.162         (< 0.02)
##             (0.013,0.113)  | (0.178,0.252)  | (-0.212,-0.087)See ?compare.trtsel for more options.
Alternative to including a marker and fitting a logistic model, the user can specify fitted risks for trt = 0 and trt = 1. In this case, no model fitting will be implemented and all bootstrap confidence intervals will be conditional on the provided fitted model.
#calculate model fit
mymod <- glm(event~trt*(Y2), data= tsdata, family = binomial("logit"))
tsdata$fitted.t0 <- predict(mymod, newdata=data.frame(trt = 0,Y1 = tsdata$Y1, Y2 = tsdata$Y2), type = "response")
tsdata$fitted.t1 <- predict(mymod, newdata=data.frame(trt = 1,Y1 = tsdata$Y1, Y2 = tsdata$Y2), type = "response")
myfitted.trtsel <- trtsel( event~trt, treatment.name = "trt", 
                         data = tsdata,
                         fittedrisk.t0 = "fitted.t0",
                         fittedrisk.t1 = "fitted.t1",
                         study.design = "randomized cohort", 
                         default.trt = "trt all")We can now use this trtsel object just as before, but confidence intervals will be smaller because we do not account for the variation due to model fitting.
plot.trtsel(myfitted.trtsel, bootstraps = 50, plot.type = "risk",
            ci = "horizontal", show.marker.axis = FALSE)estimate_measures(event = tsdata$event, trt = tsdata$trt, marker.neg = 1-tsdata$Y1_disc)## Warning in estimate_measures(event = tsdata$event, trt = tsdata$trt,
## marker.neg = 1 - : Estimates of trt.effect are not provided. Only empirical
## estimates will be calculated.## 
##   Summary Measure Estimates
##  -----------------------------------------------------------
##   Decrease in event rate under marker-based treatment (Theta)
##     Empirical:    0.011
##     Model Based:  NA
## 
##   Proportion marker negative:    0.57
##   Proportion marker positive:    0.43
## 
##   Average benefit of no treatment among marker-negatives (B.neg)
##     Empirical:    0.019
##     Model Based:  NA
## 
##   Average benefit of treatment among marker-positives (B.pos)
##     Empirical:    0.106
##     Model Based:  NA
## 
##   Variance in estimated treatment effect:   NA
##   Total Gain:     NA
## 
##   Event Rates:
##  --------------------
##              Treat all       Treat None    Marker-based Treatment
##  Empirical:     0.217           0.251          0.207    
## mod <- glm(event~trt*Y1_disc,  data = tsdata, family = binomial())
tsdata.0 <- tsdata; 
tsdata.0$trt = 0 
tsdata.1 <- tsdata;
tsdata.1$trt = 1
delta.hat <- predict(mod, newdata= tsdata.0, type = "response") - predict(mod, newdata= tsdata.1, type = "response")
estimate_measures(event = tsdata$event, trt = tsdata$trt, marker.neg = 1- tsdata$Y1_disc, trt.effect = delta.hat )## 
##   Summary Measure Estimates
##  -----------------------------------------------------------
##   Decrease in event rate under marker-based treatment (Theta)
##     Empirical:    0.011
##     Model Based:  0.011
## 
##   Proportion marker negative:    0.57
##   Proportion marker positive:    0.43
## 
##   Average benefit of no treatment among marker-negatives (B.neg)
##     Empirical:    0.019
##     Model Based:  0.019
## 
##   Average benefit of treatment among marker-positives (B.pos)
##     Empirical:    0.106
##     Model Based:  0.106
## 
##   Variance in estimated treatment effect:   0.004
##   Total Gain:     0.061
## 
##   Event Rates:
##  --------------------
##              Treat all       Treat None    Marker-based Treatment
##  Empirical:     0.217           0.251          0.207    
## Janes H, Brown MD, Huang Y, Pepe MS. 2014. An Approach to Evaluating and Comparing Biomarkers for Patient Treatment Selection. International Journal of Biostatistics