| Title: | Cox Models by Likelihood Based Boosting for a Single Survival Endpoint or Competing Risks |
| Version: | 1.5.1 |
| Description: | Provides routines for fitting Cox models by likelihood based boosting for single event survival data with right censoring or in the presence of competing risks. The methodology is described in Binder and Schumacher (2008) <doi:10.1186/1471-2105-9-14> and Binder et al. (2009) <doi:10.1093/bioinformatics/btp088>. |
| License: | MIT + file LICENSE |
| Depends: | R (≥ 3.6.0) |
| Imports: | Matrix, survival |
| Suggests: | parallel, prodlim, snowfall |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | yes |
| Packaged: | 2025-12-06 01:25:35 UTC; john |
| Author: | John Zobolas |
| Maintainer: | John Zobolas <bblodfon@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-11 13:30:24 UTC |
CoxBoost: Cox Models by Likelihood Based Boosting for a Single Survival Endpoint or Competing Risks
Description
Provides routines for fitting Cox models by likelihood based boosting for single event survival data with right censoring or in the presence of competing risks. The methodology is described in Binder and Schumacher (2008) doi:10.1186/1471-2105-9-14 and Binder et al. (2009) doi:10.1093/bioinformatics/btp088.
Author(s)
Maintainer: John Zobolas bblodfon@gmail.com (ORCID)
Authors:
Harald Binder harald.binder@uniklinik-freiburg.de (ORCID)
Fit a Cox model by likelihood based boosting
Description
CoxBoost is used to fit a Cox proportional hazards model by
componentwise likelihood based boosting. It is especially suited for models
with a large number of predictors and allows for mandatory covariates with
unpenalized parameter estimates.
Usage
CoxBoost(
time,
status,
x,
unpen.index = NULL,
standardize = TRUE,
subset = 1:length(time),
weights = NULL,
stratum = NULL,
stepno = 100,
penalty = 9 * sum(status[subset] == 1),
criterion = c("pscore", "score", "hpscore", "hscore"),
cmprsk = c("sh", "csh", "ccsh"),
coupled.strata = TRUE,
stepsize.factor = 1,
sf.scheme = c("sigmoid", "linear"),
pendistmat = NULL,
connected.index = NULL,
x.is.01 = FALSE,
return.score = TRUE,
trace = FALSE
)
Arguments
time |
vector of length |
status |
censoring indicator, i.e., vector of length |
x |
|
unpen.index |
vector of length |
standardize |
logical value indicating whether covariates should be standardized for estimation. This does not apply for mandatory covariates, i.e., these are not standardized. |
subset |
a vector specifying a subset of observations to be used in the fitting process. |
weights |
optional vector of length |
stratum |
vector specifying different groups of individuals for a
stratified Cox regression. In |
stepno |
number of boosting steps ( |
penalty |
penalty value for the update of an individual element of the parameter vector in each boosting step. |
criterion |
indicates the criterion to be used for selection in each
boosting step. |
cmprsk |
type of competing risk, specific hazards or cause-specific |
coupled.strata |
logical value indicating whether strata should be
coupled during variable selection in each boosting step. If |
stepsize.factor |
determines the step-size modification factor by which
the natural step size of boosting steps should be changed after a covariate
has been selected in a boosting step. The default (value |
sf.scheme |
scheme for changing step sizes (via
|
pendistmat |
connection matrix with entries ranging between 0 and 1,
with entry |
connected.index |
indices of the |
x.is.01 |
logical value indicating whether (the non-mandatory part of)
|
return.score |
logical value indicating whether the value of the score
statistic (or penalized score statistic, depending on |
trace |
logical value indicating whether progress in estimation should be indicated by printing the name of the covariate updated. |
Details
In contrast to gradient boosting (implemented e.g. in the glmboost
routine in the R package mboost, using the CoxPH loss
function), CoxBoost is not based on gradients of loss functions, but
adapts the offset-based boosting approach from Tutz and Binder (2007) for
estimating Cox proportional hazards models. In each boosting step the
previous boosting steps are incorporated as an offset in penalized partial
likelihood estimation, which is employed for obtain an update for one single
parameter, i.e., one covariate, in every boosting step. This results in
sparse fits similar to Lasso-like approaches, with many estimated
coefficients being zero. The main model complexity parameter, which has to
be selected (e.g. by cross-validation using cv.CoxBoost), is
the number of boosting steps stepno. The penalty parameter
penalty can be chosen rather coarsely, either by hand or using
optimCoxBoostPenalty.
The advantage of the offset-based approach compared to gradient boosting is
that the penalty structure is very flexible. In the present implementation
this is used for allowing for unpenalized mandatory covariates, which
receive a very fast coefficient build-up in the course of the boosting
steps, while the other (optional) covariates are subjected to penalization.
For example in a microarray setting, the (many) microarray features would be
taken to be optional covariates, and the (few) potential clinical covariates
would be taken to be mandatory, by including their indices in
unpen.index.
If a group of correlated covariates has influence on the response, e.g.
genes from the same pathway, componentwise boosting will often result in a
non-zero estimate for only one member of this group. To avoid this,
information on the connection between covariates can be provided in
pendistmat. If then, in addition, a penalty updating scheme with
stepsize.factor < 1 is chosen, connected covariates are more likely
to be chosen in future boosting steps, if a directly connected covariate has
been chosen in an earlier boosting step (see Binder and Schumacher, 2009b).
Value
CoxBoost returns an object of class CoxBoost.
n, p |
number of observations and number of covariates. |
stepno |
number of boosting steps. |
xnames |
vector of length
|
are used.
coefficients |
|
scoremat |
|
meanx, sdx |
vector of mean values and standard deviations used for standardizing the covariates. |
unpen.index |
indices of the mandatory
covariates in the original covariate matrix |
penalty |
If
|
time |
observed times given in the
|
status |
censoring indicator given in the
|
event.times |
vector with event times from the
data given in the |
linear.predictors |
|
Lambda |
matrix with the Breslow estimate for the
cumulative baseline hazard for boosting steps |
logplik |
partial log-likelihood of the fitted model in the final boosting step. |
Author(s)
Written by Harald Binder binderh@uni-mainz.de.
References
Binder, H., Benner, A., Bullinger, L., and Schumacher, M. (2013). Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures. Statistics in Medicine, doi: 10.1002/sim.5490.
Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J. (2009). Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics, 25:890-896.
Binder, H. and Schumacher, M. (2009). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 10:18.
Binder, H. and Schumacher, M. (2008). Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics. 9:14.
Tutz, G. and Binder, H. (2007) Boosting ridge regression. Computational Statistics & Data Analysis, 51(12):6044-6059.
Fine, J. P. and Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 94:496-509.
See Also
predict.CoxBoost, cv.CoxBoost.
Examples
# Generate some survival data with 10 informative covariates
n <- 200; p <- 100
beta <- c(rep(1,10),rep(0,p-10))
x <- matrix(rnorm(n*p),n,p)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
status <- ifelse(real.time <= cens.time,1,0)
obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)
# Fit a Cox proportional hazards model by CoxBoost
cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=100,penalty=100)
summary(cbfit)
# ... with covariates 1 and 2 being mandatory
cbfit.mand <- CoxBoost(time=obs.time,status=status,x=x,unpen.index=c(1,2),
stepno=100,penalty=100)
summary(cbfit.mand)
Coeffients from CoxBoost fit
Description
Extracts the coefficients from the specified boosting steps of a CoxBoost
object fitted by CoxBoost.
Usage
## S3 method for class 'CoxBoost'
coef(object, at.step = NULL, scaled = TRUE, ...)
Arguments
object |
fitted CoxBoost object from a |
at.step |
scalar or vector of boosting step(s) at which prediction is wanted. If no step is given, the final boosting step is used. |
scaled |
logical value indicating whether coefficients should be
returned scaled to be at the level of the original covariate values, or
unscaled as used internally when |
... |
miscellaneous arguments, none of which is used at the moment. |
Value
For a vector of length p (number of covariates)
(at.step being a scalar) or a length(at.step) * p matrix
(at.step being a vector).
Author(s)
Harald Binder binderh@uni-mainz.de
Determines the optimal number of boosting steps by cross-validation
Description
Performs a K-fold cross-validation for CoxBoost in search for
the optimal number of boosting steps.
Usage
cv.CoxBoost(
time,
status,
x,
subset = 1:length(time),
weights = NULL,
stratum = NULL,
maxstepno = 100,
K = 10,
type = c("verweij", "naive"),
cmprsk = c("sh", "csh", "ccsh"),
parallel = FALSE,
upload.x = TRUE,
multicore = FALSE,
folds = NULL,
trace = FALSE,
...
)
Arguments
time |
vector of length |
status |
censoring indicator, i.e., vector of length |
x |
|
subset |
a vector specifying a subset of observations to be used in the fitting process. |
weights |
weights to be passed to |
stratum |
vector specifying different groups of individuals for a
stratified Cox regression. In |
maxstepno |
maximum number of boosting steps to evaluate, i.e, the
returned “optimal” number of boosting steps will be in the range
|
K |
number of folds to be used for cross-validation. If |
type |
way of calculating the partial likelihood contribution of the
observation in the hold-out folds: |
cmprsk |
type of competing risk, specific hazards or cause-specific |
parallel |
logical value indicating whether computations in the
cross-validation folds should be performed in parallel on a compute cluster,
using package |
upload.x |
logical value indicating whether |
multicore |
indicates whether computations in the cross-validation
folds should be performed in parallel, using package |
folds |
if not |
trace |
logical value indicating whether progress in estimation should be indicated by printing the number of the cross-validation fold and the index of the covariate updated. |
... |
miscellaneous parameters for the calls to
|
Value
List with the following components:
mean.logplik |
vector of
length |
se.logplik |
vector with standard error estimates for the mean partial log-likelihood criterion for each boosting step. |
optimal.step |
optimal boosting step number, i.e., with minimum mean partial log-likelihood. |
folds |
list of length |
Author(s)
Harald Binder binderh@uni-mainz.de
References
Verweij, P. J. M. and van Houwelingen, H. C. (1993). Cross-validation in survival analysis. Statistics in Medicine, 12(24):2305-2314.
See Also
CoxBoost, optimCoxBoostPenalty
Examples
# Generate some survival data with 10 informative covariates
n <- 200; p <- 100
beta <- c(rep(1,10),rep(0,p-10))
x <- matrix(rnorm(n*p),n,p)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
status <- ifelse(real.time <= cens.time,1,0)
obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)
# 10-fold cross-validation
cv.res <- cv.CoxBoost(time=obs.time,status=status,x=x,maxstepno=500,
K=10,type="verweij",penalty=100)
# examine mean partial log-likelihood in the course of the boosting steps
plot(cv.res$mean.logplik)
# Fit with optimal number of boosting steps
cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=cv.res$optimal.step,
penalty=100)
summary(cbfit)
Control parameters for cross-validation in iCoxBoost
Description
This function allows to set the control parameters for cross-validation to
be passed into a call to iCoxBoost.
Usage
cvcb.control(
K = 10,
type = c("verweij", "naive"),
parallel = FALSE,
upload.x = TRUE,
multicore = FALSE,
folds = NULL
)
Arguments
K |
number of folds to be used for cross-validation. If |
type |
way of calculating the partial likelihood contribution of the
observation in the hold-out folds: |
parallel |
logical value indicating whether computations in the
cross-validation folds should be performed in parallel on a compute cluster,
using package |
upload.x |
logical value indicating whether |
multicore |
indicates whether computations in the cross-validation
folds should be performed in parallel, using package |
folds |
if not |
Value
List with elements corresponding to the call arguments.
Author(s)
Written by Harald Binder binderh@uni-mainz.de.
References
Verweij, P. J. M. and van Houwelingen, H. C. (1993). Cross-validation in survival analysis. Statistics in Medicine, 12(24):2305-2314.
See Also
Estimate p-values for a model fitted by CoxBoost
Description
Performs permutation-based p-value estimation for the optional covariates in
a fit from CoxBoost.
Usage
estimPVal(
object,
x,
permute.n = 10,
per.covariate = FALSE,
parallel = FALSE,
multicore = FALSE,
trace = FALSE,
...
)
Arguments
object |
fit object obtained from |
x |
|
permute.n |
number of permutations employed for obtaining a null distribution. |
per.covariate |
logical value indicating whether a separate null distribution should be considered for each covariate. A larger number of permutations will be needed if this is wanted. |
parallel |
logical value indicating whether computations for obtaining
a null distribution via permutation should be performed in parallel on a
compute cluster. Parallelization is performed via the package
|
multicore |
indicates whether computations in the permuted data sets
should be performed in parallel, using package |
trace |
logical value indicating whether progress in estimation should be indicated by printing the number of the permutation that is currently being evaluated. |
... |
miscellaneous parameters for the calls to
|
Details
As p-value estimates are based on permutations, random numbers are drawn for
determining permutation indices. Therfore, the results depend on the state
of the random number generator. This can be used to explore the variability
due to random variation and help to determine an adequate value for
permute.n. A value of 100 should be sufficient, but this can be quite
slow. If there is a considerable number of covariates, e.g., larger than
100, a much smaller number of permutations, e.g., 10, might already work
well. The estimates might also be negatively affected, if only a small
number of boosting steps (say <50) was employed for the original fit.
Value
Vector with p-value estimates, one value for each optional covariate
specificed in the original call to CoxBoost.
Author(s)
Harald Binder binderh@uni-mainz.de
References
Binder, H., Porzelius, C. and Schumacher, M. (2009). Rank-based p-values for sparse high-dimensional risk prediction models fitted by componentwise boosting. FDM-Preprint Nr. 101, University of Freiburg, Germany.
See Also
Examples
# Generate some survival data with 10 informative covariates
n <- 200; p <- 100
beta <- c(rep(1,10),rep(0,p-10))
x <- matrix(rnorm(n*p),n,p)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
status <- ifelse(real.time <= cens.time,1,0)
obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)
# Fit a Cox proportional hazards model by CoxBoost
cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=100,
penalty=100)
# estimate p-values
p1 <- estimPVal(cbfit,x,permute.n=10)
# get a second vector of estimates for checking how large
# random variation is
p2 <- estimPVal(cbfit,x,permute.n=10)
plot(p1,p2,xlim=c(0,1),ylim=c(0,1),xlab="permute 1",ylab="permute 2")
Interface for cross-validation and model fitting using a formula description
Description
Formula interface for fitting a Cox proportional hazards model by
componentwise likelihood based boosting (via a call to
CoxBoost), where cross-validation can be performed
automatically for determining the number of boosting steps (via a call to
cv.CoxBoost).
Usage
iCoxBoost(
formula,
data = NULL,
weights = NULL,
subset = NULL,
mandatory = NULL,
cause = 1,
cmprsk = c("sh", "csh", "ccsh"),
standardize = TRUE,
stepno = 200,
criterion = c("pscore", "score", "hpscore", "hscore"),
nu = 0.1,
stepsize.factor = 1,
varlink = NULL,
cv = cvcb.control(),
trace = FALSE,
...
)
Arguments
formula |
A formula describing the model to be fitted, similar to a
call to |
data |
data frame containing the variables described in the formula. |
weights |
optional vector, for specifying weights for the individual observations. |
subset |
a vector specifying a subset of observations to be used in the fitting process. |
mandatory |
vector containing the names of the covariates whose effect is to be estimated un-regularized. |
cause |
cause of interest in a competing risks setting, when the
response is specified by |
cmprsk |
type of competing risk, specific hazards or cause-specific |
standardize |
logical value indicating whether covariates should be standardized for estimation. This does not apply for mandatory covariates, i.e., these are not standardized. |
stepno |
maximum number of boosting steps to be evaluated when determining the number of boosting steps by cross-validation, otherwise the number of boosting seps itself. |
criterion |
indicates the criterion to be used for selection in each
boosting step. |
nu |
(roughly) the fraction of the partial maximum likelihood estimate
used for the update in each boosting step. This is converted into a penalty
for the call to |
stepsize.factor |
determines the step-size modification factor by which
the natural step size of boosting steps should be changed after a covariate
has been selected in a boosting step. The default (value |
varlink |
list for specifying links between covariates, used to
re-distribute step sizes when |
cv |
|
trace |
logical value indicating whether progress in estimation should be indicated by printing the name of the covariate updated. |
... |
miscellaneous arguments, passed to the call to
|
Details
In contrast to gradient boosting (implemented e.g. in the glmboost
routine in the R package mboost, using the CoxPH loss
function), CoxBoost is not based on gradients of loss functions, but
adapts the offset-based boosting approach from Tutz and Binder (2007) for
estimating Cox proportional hazards models. In each boosting step the
previous boosting steps are incorporated as an offset in penalized partial
likelihood estimation, which is employed for obtain an update for one single
parameter, i.e., one covariate, in every boosting step. This results in
sparse fits similar to Lasso-like approaches, with many estimated
coefficients being zero. The main model complexity parameter, the number of
boosting steps, is automatically selected by cross-validation using a call
to cv.CoxBoost). Note that this will introduce random
variation when repeatedly calling iCoxBoost, i.e. it is advised to
set/save the random number generator state for reproducible results.
The advantage of the offset-based approach compared to gradient boosting is
that the penalty structure is very flexible. In the present implementation
this is used for allowing for unpenalized mandatory covariates, which
receive a very fast coefficient build-up in the course of the boosting
steps, while the other (optional) covariates are subjected to penalization.
For example in a microarray setting, the (many) microarray features would be
taken to be optional covariates, and the (few) potential clinical covariates
would be taken to be mandatory, by including their names in
mandatory.
If a group of correlated covariates has influence on the response, e.g.
genes from the same pathway, componentwise boosting will often result in a
non-zero estimate for only one member of this group. To avoid this,
information on the connection between covariates can be provided in
varlink. If then, in addition, a penalty updating scheme with
stepsize.factor < 1 is chosen, connected covariates are more likely
to be chosen in future boosting steps, if a directly connected covariate has
been chosen in an earlier boosting step (see Binder and Schumacher, 2009b).
Value
iCoxBoost returns an object of class iCoxBoost, which
also has class CoxBoost. In addition to the elements from
CoxBoost it has the following elements:
call, formula, terms |
call, formula and terms from the formula interface. |
cause |
cause of interest. |
cv.res |
result from
|
Author(s)
Written by Harald Binder binderh@uni-mainz.de.
References
Binder, H., Benner, A., Bullinger, L., and Schumacher, M. (2013). Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures. Statistics in Medicine, doi: 10.1002/sim.5490.
Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J. (2009). Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics, 25:890-896.
Binder, H. and Schumacher, M. (2009). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 10:18.
Binder, H. and Schumacher, M. (2008). Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics. 9:14.
Tutz, G. and Binder, H. (2007) Boosting ridge regression. Computational Statistics & Data Analysis, 51(12):6044-6059.
Fine, J. P. and Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 94:496-509.
See Also
predict.iCoxBoost, CoxBoost,
cv.CoxBoost.
Examples
# Generate some survival data with 10 informative covariates
n <- 200; p <- 100
beta <- c(rep(1,2),rep(0,p-2))
x <- matrix(rnorm(n*p),n,p)
actual.data <- as.data.frame(x)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
actual.data$status <- ifelse(real.time <= cens.time,1,0)
actual.data$time <- ifelse(real.time <= cens.time,real.time,cens.time)
# Fit a Cox proportional hazards model by iCoxBoost
cbfit <- iCoxBoost(Surv(time,status) ~ .,data=actual.data)
summary(cbfit)
plot(cbfit)
# ... with covariates 1 and 2 being mandatory
cbfit.mand <- iCoxBoost(Surv(time,status) ~ .,data=actual.data,mandatory=c("V1"))
summary(cbfit.mand)
plot(cbfit.mand)
Coarse line search for adequate penalty parameter
Description
This routine helps in finding a penalty value that leads to an “optimal” number of boosting steps for CoxBoost, determined by cross-validation, that is not too small/in a specified range.
Usage
optimCoxBoostPenalty(
time,
status,
x,
minstepno = 50,
maxstepno = 200,
start.penalty = 9 * sum(status == 1),
iter.max = 10,
upper.margin = 0.05,
parallel = FALSE,
trace = FALSE,
...
)
Arguments
time |
vector of length |
status |
censoring indicator, i.e., vector of length |
x |
|
minstepno, maxstepno |
range of boosting steps in which the “optimal” number of boosting steps is wanted to be. |
start.penalty |
start value for the search for the appropriate penalty. |
iter.max |
maximum number of search iterations. |
upper.margin |
specifies the fraction of |
parallel |
logical value indicating whether computations in the
cross-validation folds should be performed in parallel on a compute cluster.
Parallelization is performed via the package |
trace |
logical value indicating whether information on progress should be printed. |
... |
miscellaneous parameters for |
Details
The penalty parameter for CoxBoost has to be chosen only very
coarsely. In Tutz and Binder (2006) it is suggested for likelihood based
boosting just to make sure, that the optimal number of boosting steps,
according to some criterion such as cross-validation, is larger or equal to
50. With a smaller number of steps, boosting may become too “greedy” and
show sub-optimal performance. This procedure uses a very coarse line search
and so one should specify a rather large range of boosting steps.
Value
List with element penalty containing the “optimal” penalty
and cv.res containing the corresponding result of cv.CoxBoost.
Author(s)
Written by Harald Binder binderh@uni-mainz.de.
References
Tutz, G. and Binder, H. (2006) Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics, 62:961-971.
See Also
Examples
# Generate some survival data with 10 informative covariates
n <- 200; p <- 100
beta <- c(rep(1,10),rep(0,p-10))
x <- matrix(rnorm(n*p),n,p)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
status <- ifelse(real.time <= cens.time,1,0)
obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)
# determine penalty parameter
optim.res <- optimCoxBoostPenalty(time=obs.time,status=status,x=x,
trace=TRUE,start.penalty=500)
# Fit with obtained penalty parameter and optimal number of boosting
# steps obtained by cross-validation
cbfit <- CoxBoost(time=obs.time,status=status,x=x,
stepno=optim.res$cv.res$optimal.step,
penalty=optim.res$penalty)
summary(cbfit)
Plot coefficient paths from CoxBoost fit
Description
Plots coefficient paths, i.e. the parameter estimates plotted against the
boosting steps as obtained from a CoxBoost object fitted by
CoxBoost.
Usage
## S3 method for class 'CoxBoost'
plot(
x,
line.col = "dark grey",
label.cex = 0.6,
xlab = NULL,
ylab = NULL,
xlim = NULL,
ylim = NULL,
main = NULL,
...
)
Arguments
x |
fitted CoxBoost object from a |
line.col |
color of the lines of the coefficient path |
label.cex |
scaling factor for the variable labels. |
xlab |
label for the x axis, default label when |
ylab |
label for the y axis, default label when |
xlim, ylim |
plotting ranges, default label when |
main |
a main title for the plot |
... |
miscellaneous arguments, passed to the plot routine. |
Value
No value is returned, but a plot is generated.
Author(s)
Harald Binder binderh@uni-mainz.de
Predict method for CoxBoost fits
Description
Obtains predictions at specified boosting steps from a CoxBoost object
fitted by CoxBoost.
Usage
## S3 method for class 'CoxBoost'
predict(
object,
newdata = NULL,
newtime = NULL,
newstatus = NULL,
subset = NULL,
weights = NULL,
stratum = NULL,
at.step = NULL,
times = NULL,
type = c("lp", "logplik", "risk", "CIF"),
...
)
Arguments
object |
fitted CoxBoost object from a |
newdata |
|
newtime, newstatus |
vectors with observed time and censoring indicator
(0 for censoring, 1 for no censoring, and any other values for competing
events in a competing risks setting) for new observations, where prediction
is wanted. Only required if predicted partial log-likelihood is wanted,
i.e., if |
subset |
an optional vector specifying a subset of observations to be used for evaluation. |
weights |
weights for each time. |
stratum |
vector specifying different groups of individuals for a
stratified Cox regression. In |
at.step |
scalar or vector of boosting step(s) at which prediction is
wanted. If |
times |
vector with |
type |
type of prediction to be returned: |
... |
miscellaneous arguments, none of which is used at the moment. |
Value
For type="lp" and type="logplik" a vector of length
n.new (at.step being a scalar) or a n.new *
length(at.step) matrix (at.step being a vector) with predictions is
returned. For type="risk" or type="CIF" a n.new * T
matrix with predicted probabilities at the specific time points is returned.
Author(s)
Harald Binder binderh@uni-mainz.de
Examples
# Generate some survival data with 10 informative covariates
n <- 200; p <- 100
beta <- c(rep(1,10),rep(0,p-10))
x <- matrix(rnorm(n*p),n,p)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
status <- ifelse(real.time <= cens.time,1,0)
obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)
# define training and test set
train.index <- 1:100
test.index <- 101:200
# Fit CoxBoost to the training data
cbfit <- CoxBoost(time=obs.time[train.index],status=status[train.index],
x=x[train.index,],stepno=300,penalty=100)
# mean partial log-likelihood for test set in every boosting step
step.logplik <- predict(cbfit,newdata=x[test.index,],
newtime=obs.time[test.index],
newstatus=status[test.index],
at.step=0:300,type="logplik")
plot(step.logplik)
# names of covariates with non-zero coefficients at boosting step
# with maximal test set partial log-likelihood
print(cbfit$xnames[cbfit$coefficients[which.max(step.logplik),] != 0])
Predict method for iCoxBoost fits
Description
Obtains predictions at specified boosting steps from a iCoxBoost object
fitted by iCoxBoost.
Usage
## S3 method for class 'iCoxBoost'
predict(
object,
newdata = NULL,
subset = NULL,
at.step = NULL,
times = NULL,
type = c("lp", "logplik", "risk", "CIF"),
...
)
Arguments
object |
fitted CoxBoost object from a |
newdata |
data frame with new covariate values (for |
subset |
an optional vector specifying a subset of observations to be used for evaluation. |
at.step |
scalar or vector of boosting step(s) at which prediction is
wanted. If |
times |
vector with |
type |
type of prediction to be returned: |
... |
miscellaneous arguments, none of which is used at the moment. |
Value
For type="lp" and type="logplik" a vector of length
n.new (at.step being a scalar) or a n.new *
length(at.step) matrix (at.step being a vector) with predictions is
returned. For type="risk" or type="CIF" a n.new * T
matrix with predicted probabilities at the specific time points is returned.
Author(s)
Harald Binder binderh@uni-mainz.de
Examples
n <- 200; p <- 100
beta <- c(rep(1,2),rep(0,p-2))
x <- matrix(rnorm(n*p),n,p)
actual.data <- as.data.frame(x)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
actual.data$status <- ifelse(real.time <= cens.time,1,0)
actual.data$time <- ifelse(real.time <= cens.time,real.time,cens.time)
# define training and test set
train.index <- 1:100
test.index <- 101:200
# Fit a Cox proportional hazards model by iCoxBoost
cbfit <- iCoxBoost(Surv(time,status) ~ .,data=actual.data[train.index,],
stepno=300,cv=FALSE)
# mean partial log-likelihood for test set in every boosting step
step.logplik <- predict(cbfit,newdata=actual.data[test.index,],
at.step=0:300,type="logplik")
plot(step.logplik)
# names of covariates with non-zero coefficients at boosting step
# with maximal test set partial log-likelihood
print(coef(cbfit,at.step=which.max(step.logplik)-1))
Print a CoxBoost type
Description
Print a CoxBoost type
Usage
## S3 method for class 'CoxBoost'
print(x, ...)
Arguments
x |
a CoxBoost object |
... |
not used |
Value
Invisibly returns NULL. The function is called for its
side-effects, printing for each cause–stratum combination:
the number of boosting steps,
the number of non-zero coefficients,
the partial log-likelihood.
Performs resampling for a stratified weighted Cox model by componentwise likelihood based boosting for developing subgroup covariate signatures
Description
resample.CoxBoost is used to perform resampling for stratified Cox
proportional hazards models by componentwise likelihood based boosting with
weights for developing subgroup covariate signatures. Specifically, this
weighted Cox regression approach is for performing a subgroup analysis, not
for the standard subgroup analysis but for a weighted form of it where the
observations of the not analyzed subgroup are down-weighted.
resample.CoxBoost calls cv.CoxBoost and CoxBoost.
Usage
resample.CoxBoost(
time,
status,
x,
rep = 100,
maxstepno = 200,
multicore = TRUE,
mix.list = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99),
stratum,
stratnotinfocus = 0,
penalty = sum(status) * (1/0.02 - 1),
criterion = "hscore",
unpen.index = NULL,
trace = FALSE
)
Arguments
time |
vector of length |
status |
vector of length |
x |
|
rep |
number of resampling data sets used for the evaluation of model
stability. Should be equal to |
maxstepno |
maximum number of boosting steps needed for
cross-validation in |
multicore |
logical value: If |
mix.list |
vector of different relative weights between zero and one, for specifying weights for the individual observations for the weighted regression approach. For each weight, a stratified Cox regression by componentwise boosting is fitted. |
stratum |
vector specifying different groups of individuals for a
stratified Cox regression. In |
stratnotinfocus |
value for the group which is not analyzed but should
be weighted down in the analysis with the different weights element of
|
penalty |
penalty value for the covariates for the update of an
individual element in each boosting step. The mandatory covariates should
get a penalty value of zero, which can be realized with |
criterion |
indicates the criterion to be used for selection in each
boosting step in |
unpen.index |
index for mandatory covariates, which should get no penalty value in the log-partial likelihood function. |
trace |
logical value indicating whether progress in estimation should
be indicated by printing the number of the cross-validation fold and the
index of the covariate updated in the |
Details
The CoxBoost function can be used for performing variable selection
for time-to-event data based on componentwise likelihood based boosting
(Binder and Schumacher, 2009; Binder et al., 2013) for obtaining signatures
for high dimensional data, such as gene expression or high throughput
methylation measurements. If there is any heterogeneity due to known
patient subgroups, it would be interesting for developing subgroup
signatures to account for this heterogeneity. To account for heterogeneity
in the data a stratified Cox model via stratum can be performed,
which allows for different baseline hazards in each subgroup. For performing
a stratified CoxBoost for such heterogeneous subgroups the
stratum can be set to a subgroup variable, containing also the
subgroup for which a subgroup signature should be developed. A standard
subgroup analysis may lead to a decreased statistical power. With a
weighted approach based on componentwise likelihood-based boosting one can
develop subgroup signatures without losing statistical power as it is the
case in standard subgroup analysis. This approach focuses on building a risk
prediction signature for a specific stratum by down-weighting the
observations (Simon, 2002) from the other strata using a range of weights.
The not analyzed subgroup has to be indicated in stratnotinfocus, the
observations of this subgroup, which is not in focus but should be retained
in the analysis, are weighted down by different relative weights. Binder et
al. (2012) propose such a weighting approach for high dimensional data but
not for time-to-event endpoints. resample.CoxBoost performs a
weighted regression approach for time-to-event data while using resampling
over rep resampling data sets, specifically for evaluating the
results. The different weights can be entered by mix.list which
should be from a range between zero and one. For examining the variable
selection stability for specific covariates as a function of the weights the
beta coefficients are an output value of resample.CoxBoost, so that
resampling inclusion frequencies (Sauerbrei et. al, 2011) can be calculated
and visualized via the functions stabtrajec and weightfreqmap.
For mandatory variables (Binder and Schumacher, 2008), which are only
included for adjusting the model no penalty should be added, these can be
indicated by the indices in unpen.index.
Value
resample.CoxBoost returns a list of length rep list
elements with each list element being further a list of the following two
objects:
CV.opt |
number of optimal boosting steps obtained from
|
beta |
beta coefficients obtained
from |
Author(s)
Written by Veronika Weyer weyer@uni-mainz.de and Harald Binder binderh@uni-mainz.de.
References
Binder, H., Benner, A., Bullinger, L., Schumacher, M. (2013). Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures. Statistics in medicine 32(10), 1778-1791
Binder, H., Müller, T., Schwender, H., Golka, K., Steffens, M., G., H.J., Ickstadt, K., and Schumacher, M (2012). Cluster-Localized Sparse Logistic Regression for SNP Data. Statistical applications in genetics and molecular biology, 11(4), 1-31
Binder, H. and Schumacher, M. (2009). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 10:18.
Binder, H. and Schumacher, M. (2008). Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics. 9:14.
Sauerbrei, W., Boulesteix, A.-L., Binder, H. (2011). Stability investigations of multivariable regression models derived from low- and high-dimensional data. Journal of biopharmaceutical statistics 21(6), 1206-31
Simon, R. (2002). Bayesian subset analysis: application to studying treatment-by-gender interactions. Statistics in medicine 21(19), 2909-16
See Also
Examples
# Generate survival data with five informative covariates for one subgroup
n <- 400; p <- 1000
set.seed(129)
group<-rbinom(n,1,0.5)
x <- matrix(rnorm(n*p,0,1),n,p)
beta.vec1 <- c(c(1,1,1,1,1),rep(0,p-5))
beta.vec0 <- c(c(0,0,0,0,0),rep(0,p-5))
linpred<-ifelse(group==1,x %*% beta.vec1,x %*% beta.vec0)
set.seed(1234)
real.time<- (-(log(runif(n)))/(1/20*exp(linpred)))
cens.time <- rexp(n,rate=1/20)
obs.status <- ifelse(real.time <= cens.time,1,0)
obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)
# Fit a stratified weighted Cox proportional hazards model
# by \code{resample.CoxBoost}
mix.list=c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99)
RIF <- resample.CoxBoost(
time=obs.time,status=obs.status,x=x,
# use more repetitions (eg `rep = 100`) for more stable results
rep=5,
maxstepno=200,multicore=FALSE,
mix.list=mix.list,
stratum=group,stratnotinfocus=0,penalty=sum(obs.status)*(1/0.02-1),
criterion="hscore",unpen.index=NULL)
# RIF is a list with number of resampling data sets list objects, each list
# object has further two list objects, the beta coefficients for all
# weights and the selected number of boosting steps.
# For each list object i \code{RIF[[i]]$beta} contains the beta
# coefficients with length mix.list*p for p covariates and with
# \code{RIF[[i]]$CV.opt}
# For getting an insight in the RIF Distribution of different weights
# use the following code:
RIF1<-c()
for (i in 1: length(RIF)){RIF1<-c(RIF1,RIF[[i]][[1]])}
freqmat <-matrix(apply(matrix(unlist(RIF1), ncol=length(RIF))!=0,1,mean),
ncol=length(mix.list))
# freqmat is a matrix with p rows and \code{length(mix.list)} columns
# which contains resampling inclusion frequencies for the different
# covariates and different weights.
# two plotting functions are available for the resulting object:
stabtrajec(RIF)
weightfreqmap(RIF)
Plots stability trajectories from resCoxBoost fits
Description
Plots the stability trajectories, i.e. the variable selection stability is
plotted in form of resampling inclusion frequencies for different weights
and a selection of different stably selected genes as obtained from a
resCoxBoost object fitted by resample.CoxBoost.
Usage
stabtrajec(
RIF,
mix.list = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99),
plotmix = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99),
my.colors = grDevices::gray(seq(0.99, 0, len = 10)),
yupperlim = 1,
huge = 0.6,
lowerRIFlimit = 0.6,
legendval = 4.5
)
weightfreqmap(
RIF,
mix.list = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99),
plotmix = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99),
lowerRIFlimit = 0.5,
method = "complete"
)
Arguments
RIF |
list obtained from a |
mix.list |
vector of weights which were also entered in the
|
plotmix |
vector of weights which should be plotted in the stability trajectory plot. |
my.colors |
vector with length(plotmix) of different colors for plotting the different weights, default are gray shades. |
yupperlim |
value for the upper y coordinate. |
huge |
size of the labels plottet on the x-axis. |
lowerRIFlimit |
cutoff RIF value: All covariates which have an resampling inclusion frequency (RIF) greater or equal this lowerRIFlimit value at any weight are presented in the stability trajectory plot. |
legendval |
space between the last plotted covariate and the legend on the right side of the plot. |
method |
passed to |
Details
The stabtrajec function produces a visualization tool (stability
trajectory plot), which shows the RIFs as a function of weights for stably
selected covariates selected via resample.CoxBoost. With this tool it
can be shown that with weights between zero and one and so the additional
information from the observations of the not analyzed subgroup in comparison
to the standard subgroup analysis (weight near zero) can improve the
variable selection stability. The number of plotted covariates in this plot
depends on lowerRIFlimit. How many weights are plotted which are
element of mix.list can be changed with plotmix.
Value
No value is returned, but the stability trajectory plot is generated.
Author(s)
Veronika Weyer weyer@uni-mainz.de and Harald Binder binderh@uni-mainz.de
Summary of a CoxBoost type
Description
Summary of a CoxBoost type
Usage
## S3 method for class 'CoxBoost'
summary(object, ...)
Arguments
object |
a CoxBoost object |
... |
not used |
Value
Invisibly returns NULL. The function is called for its
side-effects, printing for each cause–stratum combination:
the number of boosting steps and non-zero coefficients,
the partial log-likelihood,
the covariates with positive coefficients,
the covariates with negative coefficients,
and, if applicable, the coefficient estimates of mandatory (unpenalized) covariates.