| Type: | Package |
| Title: | A Robust Boosting Algorithm |
| Version: | 0.2 |
| Date: | 2024-11-17 |
| Description: | An implementation of robust boosting algorithms for regression in R. This includes the RRBoost method proposed in the paper "Robust Boosting for Regression Problems" (Ju, X. and Salibian-Barrera, M., 2020) <doi:10.1016/j.csda.2020.107065>. It also implements the previously proposed boosting algorithms used in the simulation section of the paper: L2Boost, LADBoost, MBoost (Friedman, J. H. (2001) <doi:10.1214/aos/1013203451>) and Robloss (Lutz et al. (2008) <doi:10.1016/j.csda.2007.11.006>). |
| Depends: | R (≥ 3.5.0) |
| Imports: | stats, rpart, RobStatTM, methods |
| License: | GPL (≥ 3) |
| LazyData: | True |
| RoxygenNote: | 7.3.1 |
| Encoding: | UTF-8 |
| Suggests: | testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| Author: | Xiaomeng Ju [aut, cre], Matias Salibian-Barrera [aut] |
| Maintainer: | Xiaomeng Ju <xiaomeng.ju@stat.ubc.ca> |
| NeedsCompilation: | no |
| Packaged: | 2024-11-19 18:42:40 UTC; xmengju |
| Repository: | CRAN |
| Date/Publication: | 2024-11-19 19:00:02 UTC |
Robust Boosting for regression
Description
This function implements the RRBoost robust boosting algorithm for regression, as well as other robust and non-robust boosting algorithms for regression.
Usage
Boost(
x_train,
y_train,
x_val,
y_val,
x_test,
y_test,
type = "RRBoost",
error = c("rmse", "aad"),
niter = 200,
y_init = "LADTree",
max_depth = 1,
tree_init_provided = NULL,
control = Boost.control()
)
Arguments
x_train |
predictor matrix for training data (matrix/dataframe) |
y_train |
response vector for training data (vector/dataframe) |
x_val |
predictor matrix for validation data (matrix/dataframe) |
y_val |
response vector for validation data (vector/dataframe) |
x_test |
predictor matrix for test data (matrix/dataframe; optional, required when make_prediction = TRUE in control) |
y_test |
response vector for test data (vector/dataframe; optional, required when make_prediction = TRUE in control) |
type |
type of the boosting method: "L2Boost", "LADBoost", "MBoost", "Robloss", "SBoost", "RRBoost" (character string) |
error |
a character string (or vector of character strings) indicating the type of error metrics to be evaluated on the test set. Valid options are: "rmse" (root mean squared error), "aad" (average absolute deviation), and "trmse" (trimmed root mean squared error) |
niter |
number of boosting iterations (for RRBoost: |
y_init |
a string indicating the initial estimator to be used. Valid options are: "median" or "LADTree" (character string) |
max_depth |
the maximum depth of the tree learners (numeric) |
tree_init_provided |
an optional pre-fitted initial tree (an rpart object) |
control |
a named list of control parameters, as returned by Boost.control() |
Details
This function implements a robust boosting algorithm for regression (RRBoost).
It also includes the following robust and non-robust boosting algorithms
for regression: L2Boost, LADBoost, MBoost, Robloss, and SBoost. This function
uses the functions available in the rpart package to construct binary regression trees.
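The boosting recursion behind all of these variants has the same shape: start from an initial estimator, repeatedly fit a base learner to the current (pseudo-)residuals, and add a shrunken copy of it to the running fit. As a minimal illustration, here is a base-R sketch of the non-robust L2Boost variant with hand-rolled depth-1 stumps; fit_stump, predict_stump, and l2boost are hypothetical helper names for this sketch only, and the package itself fits its base learners with rpart and also supports robust losses.

```r
# Sketch only: L2Boost with exhaustive-search regression stumps.
# Not the package's implementation (which uses rpart trees).

fit_stump <- function(x, r) {
  # Find the single split (variable j, threshold s) minimizing squared error.
  best <- list(sse = Inf)
  for (j in seq_len(ncol(x))) {
    for (s in unique(x[, j])) {
      left <- x[, j] <= s
      if (all(left)) next                        # degenerate: no split
      pl <- mean(r[left]); pr <- mean(r[!left])  # leaf predictions
      sse <- sum((r - ifelse(left, pl, pr))^2)
      if (sse < best$sse)
        best <- list(sse = sse, j = j, s = s, pl = pl, pr = pr)
    }
  }
  best
}

predict_stump <- function(st, x) ifelse(x[, st$j] <= st$s, st$pl, st$pr)

l2boost <- function(x, y, niter = 50, shrinkage = 0.1) {
  f <- rep(mean(y), length(y))   # initial estimator: the sample mean
  for (t in seq_len(niter)) {
    st <- fit_stump(x, y - f)    # fit a stump to the current residuals
    f <- f + shrinkage * predict_stump(st, x)
  }
  f
}

set.seed(1)
x <- matrix(runif(200), 100, 2)
y <- x[, 1] + rnorm(100, sd = 0.1)
f <- l2boost(x, y, niter = 20, shrinkage = 0.5)
```

Each iteration can only reduce the training sum of squares, since the stump is fitted to the residuals under the same squared loss; the robust variants replace this loss (and the initial estimator) with robust alternatives.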
Value
A list with the following components:
type |
which boosting algorithm was run. One of: "L2Boost", "LADBoost", "MBoost", "Robloss", "SBoost", "RRBoost" (character string) |
control |
the list of control parameters used |
niter |
number of iterations for the boosting algorithm (for RRBoost |
error |
if |
tree_init |
if y_init = "LADTree", the fitted initial tree |
tree_list |
if save_tree = TRUE in control, a list of the trees fitted at each boosting iteration |
f_train_init |
a vector of the initialized estimator of the training data |
alpha |
a vector of base learners' coefficients |
early_stop_idx |
early stopping iteration |
when_init |
if |
loss_train |
a vector of training loss values (one per iteration) |
loss_val |
a vector of validation loss values (one per iteration) |
err_val |
a vector of validation aad errors (one per iteration) |
err_train |
a vector of training aad errors (one per iteration) |
err_test |
a matrix of test errors before and at the early stopping iteration (returned if make_prediction = TRUE in control); the matrix dimension is the early stopping iteration by the number of error types (matches the error argument) |
f_train |
a matrix of training function estimates at all iterations (returned if save_f = TRUE in control); each column corresponds to the fitted values of the predictor at each iteration |
f_val |
a matrix of validation function estimates at all iterations (returned if save_f = TRUE in control); each column corresponds to the fitted values of the predictor at each iteration |
f_test |
a matrix of test function estimates before and at the early stopping iteration (returned if save_f = TRUE and make_prediction = TRUE in control); each column corresponds to the fitted values of the predictor at each iteration |
var_select |
a vector of variable selection indicators (one per explanatory variable; 1 if the variable was selected by at least one of the base learners, and 0 otherwise) |
var_importance |
a vector of permutation variable importance scores (one per explanatory variable, and returned if cal_imp = TRUE in control) |
Author(s)
Xiaomeng Ju, xiaomeng.ju@stat.ubc.ca
See Also
Boost.validation, Boost.control.
Examples
data(airfoil)
n <- nrow(airfoil)
n0 <- floor( 0.2 * n )
set.seed(123)
idx_test <- sample(n, n0)
idx_train <- sample((1:n)[-idx_test], floor( 0.6 * n ) )
idx_val <- (1:n)[ -c(idx_test, idx_train) ]
xx <- airfoil[, -6]
yy <- airfoil$y
xtrain <- xx[ idx_train, ]
ytrain <- yy[ idx_train ]
xval <- xx[ idx_val, ]
yval <- yy[ idx_val ]
xtest <- xx[ idx_test, ]
ytest <- yy[ idx_test ]
model_RRBoost_LADTree <- Boost(x_train = xtrain, y_train = ytrain,
x_val = xval, y_val = yval, x_test = xtest, y_test = ytest,
type = "RRBoost", error = "rmse", y_init = "LADTree",
max_depth = 1, niter = 3, ## to keep the running time low
control = Boost.control(max_depth_init = 2,
min_leaf_size_init = 20, make_prediction = TRUE,
cal_imp = FALSE))
Tuning and control parameters for the robust boosting algorithm
Description
Tuning and control parameters for the RRBoost robust boosting algorithm, including the initial fit.
Usage
Boost.control(
n_init = 100,
eff_m = 0.95,
bb = 0.5,
trim_prop = NULL,
trim_c = 3,
max_depth_init = 3,
min_leaf_size_init = 10,
cal_imp = TRUE,
save_f = FALSE,
make_prediction = TRUE,
save_tree = FALSE,
precision = 4,
shrinkage = 1,
trace = FALSE
)
Arguments
n_init |
number of iterations for the SBoost step of RRBoost (numeric, defaults to 100) |
eff_m |
scalar between 0 and 1 indicating the efficiency (measured in a linear model with Gaussian errors) of Tukey's loss function used in the 2nd stage of RRBoost. |
bb |
breakdown point of the M-scale estimator used in the SBoost step (numeric) |
trim_prop |
trimming proportion when 'trmse' is used as the performance metric (numeric). 'trmse' computes the root mean squared error of the residuals r satisfying |r| < quantile(|r|, 1 - trim_prop) (e.g. trim_prop = 0.1 ignores the 10% of observations with the largest absolute residuals and computes the RMSE of the rest). If both |
trim_c |
the trimming constant when 'trmse' is used as the performance metric (numeric, defaults to 3). 'trmse' computes the root mean squared error of the residuals r that lie between median(r) - trim_c * mad(r) and median(r) + trim_c * mad(r). If both |
max_depth_init |
the maximum depth of the initial LADTree (numeric, defaults to 3) |
min_leaf_size_init |
the minimum number of observations per node of the initial LADTree (numeric, defaults to 10) |
cal_imp |
logical indicating whether to calculate variable importance (defaults to TRUE) |
save_f |
logical indicating whether to save the function estimates at all iterations (defaults to FALSE) |
make_prediction |
logical indicating whether to make predictions on the test set using x_test (defaults to TRUE) |
save_tree |
logical indicating whether to save the trees at all iterations (defaults to FALSE) |
precision |
number of significant digits to keep when using validation error to calculate early stopping time (numeric, defaults to 4) |
shrinkage |
shrinkage parameter in boosting (numeric, defaults to 1 which corresponds to no shrinkage) |
trace |
logical indicating whether to print the number of completed iterations (and, for RRBoost, the completed combinations of LADTree hyperparameters) for monitoring progress (defaults to FALSE) |
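To make the error metrics concrete, here is a small base-R sketch of "rmse", "aad", and the two trimmed variants of "trmse" following the descriptions of trim_prop and trim_c above; the function names are illustrative helpers, not the package's internal code.

```r
# Illustrative error metrics on a residual vector r (sketch, not package code).
rmse <- function(r) sqrt(mean(r^2))
aad  <- function(r) mean(abs(r))    # average absolute deviation

# trmse via trim_prop: drop the trim_prop fraction with the largest |r|
trmse_prop <- function(r, trim_prop = 0.1) {
  keep <- abs(r) < quantile(abs(r), 1 - trim_prop)
  sqrt(mean(r[keep]^2))
}

# trmse via trim_c: keep residuals within median(r) +/- trim_c * mad(r)
trmse_c <- function(r, trim_c = 3) {
  keep <- abs(r - median(r)) < trim_c * mad(r)
  sqrt(mean(r[keep]^2))
}

r <- c(-1, 0, 1, 0.5, -0.5, 100)    # residuals with one gross outlier
# rmse(r) is dominated by the outlier; both trimmed versions are not.
```

The trimmed metrics matter for robust fits: a single large residual inflates the RMSE even when the fit is good on the bulk of the data.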
Details
Various tuning and control parameters for the RRBoost robust boosting algorithm implemented in the
function Boost, including options for the initial fit.
Value
A list of all input parameters
Author(s)
Xiaomeng Ju, xiaomeng.ju@stat.ubc.ca
Examples
data(airfoil)
n <- nrow(airfoil)
n0 <- floor( 0.2 * n )
set.seed(123)
idx_test <- sample(n, n0)
idx_train <- sample((1:n)[-idx_test], floor( 0.6 * n ) )
idx_val <- (1:n)[ -c(idx_test, idx_train) ]
xx <- airfoil[, -6]
yy <- airfoil$y
xtrain <- xx[ idx_train, ]
ytrain <- yy[ idx_train ]
xval <- xx[ idx_val, ]
yval <- yy[ idx_val ]
xtest <- xx[ idx_test, ]
ytest <- yy[ idx_test ]
my.control <- Boost.control(max_depth_init = 2,
min_leaf_size_init = 20, make_prediction = TRUE,
cal_imp = FALSE)
model_RRBoost_LADTree <- Boost(x_train = xtrain, y_train = ytrain,
x_val = xval, y_val = yval, x_test = xtest, y_test = ytest,
type = "RRBoost", error = "rmse", y_init = "LADTree",
max_depth = 1, niter = 3, ## to keep the running time low
control = my.control)
Robust Boosting for regression with initialization parameters chosen on a validation set
Description
A function to fit RRBoost (see also Boost) where the initialization parameters are chosen
based on the performance on the validation set.
Usage
Boost.validation(
x_train,
y_train,
x_val,
y_val,
x_test,
y_test,
type = "RRBoost",
error = c("rmse", "aad"),
niter = 1000,
max_depth = 1,
y_init = "LADTree",
max_depth_init_set = c(1, 2, 3, 4),
min_leaf_size_init_set = c(10, 20, 30),
control = Boost.control()
)
Arguments
x_train |
predictor matrix for training data (matrix/dataframe) |
y_train |
response vector for training data (vector/dataframe) |
x_val |
predictor matrix for validation data (matrix/dataframe) |
y_val |
response vector for validation data (vector/dataframe) |
x_test |
predictor matrix for test data (matrix/dataframe, optional, required when |
y_test |
response vector for test data (vector/dataframe, optional, required when |
type |
type of the boosting method: "L2Boost", "LADBoost", "MBoost", "Robloss", "SBoost", "RRBoost" (character string) |
error |
a character string (or vector of character strings) indicating the types of error metrics to be evaluated on the test set. Valid options are: "rmse" (root mean squared error), "aad" (average absolute deviation), and "trmse" (trimmed root mean squared error) |
niter |
number of iterations (for RRBoost |
max_depth |
the maximum depth of the tree learners (numeric) |
y_init |
a string indicating the initial estimator to be used. Valid options are: "median" or "LADTree" (character string) |
max_depth_init_set |
a vector of candidate values for the maximum depth of the initial LADTree, from which the algorithm chooses |
min_leaf_size_init_set |
a vector of candidate values for the minimum number of observations per node of the initial LADTree, from which the algorithm chooses |
control |
a named list of control parameters, as returned by Boost.control() |
Details
This function runs the RRBoost algorithm (see Boost) on different combinations of the
parameters for the initial fit, and chooses the optimal set based on the performance on the validation set.
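Generically, this selection strategy is a small grid search: fit one model per candidate initialization setting, score each candidate on the validation set, and keep the one with the smallest validation error. The base-R sketch below uses lm() with polynomial degree purely as a stand-in for the LADTree initialization parameters; select_by_validation is a hypothetical helper, not a package function.

```r
# Sketch of validation-based model selection (illustrative only; lm() and
# polynomial degree stand in for LADTree depth / minimum leaf size).
select_by_validation <- function(x_tr, y_tr, x_val, y_val, degrees = 1:4) {
  errs <- sapply(degrees, function(d) {
    fit <- lm(y ~ poly(x, d), data = data.frame(x = x_tr, y = y_tr))
    pred <- predict(fit, newdata = data.frame(x = x_val))
    sqrt(mean((y_val - pred)^2))   # validation RMSE for this candidate
  })
  list(best = degrees[which.min(errs)], errs = errs)
}

x_tr <- seq(-1, 1, length.out = 21);   y_tr <- x_tr^2
x_val <- seq(-0.9, 0.9, length.out = 15); y_val <- x_val^2
sel <- select_by_validation(x_tr, y_tr, x_val, y_val)
```

On this quadratic toy data, the linear candidate underfits and a degree of at least 2 is selected; Boost.validation applies the same idea over max_depth_init_set and min_leaf_size_init_set.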
Value
A list with components
model |
an object returned by Boost, trained with the selected initialization parameters |
param |
a vector of the selected initialization parameters ((0, 0) is returned if the selected initialization is the median of the training responses) |
Author(s)
Xiaomeng Ju, xiaomeng.ju@stat.ubc.ca
See Also
Boost, Boost.control.
Examples
## Not run:
data(airfoil)
n <- nrow(airfoil)
n0 <- floor( 0.2 * n )
set.seed(123)
idx_test <- sample(n, n0)
idx_train <- sample((1:n)[-idx_test], floor( 0.6 * n ) )
idx_val <- (1:n)[ -c(idx_test, idx_train) ]
xx <- airfoil[, -6]
yy <- airfoil$y
xtrain <- xx[ idx_train, ]
ytrain <- yy[ idx_train ]
xval <- xx[ idx_val, ]
yval <- yy[ idx_val ]
xtest <- xx[ idx_test, ]
ytest <- yy[ idx_test ]
model_RRBoost_cv_LADTree <- Boost.validation(x_train = xtrain,
y_train = ytrain, x_val = xval, y_val = yval,
x_test = xtest, y_test = ytest, type = "RRBoost", error = "rmse",
y_init = "LADTree", max_depth = 1, niter = 1000,
max_depth_init_set = 1:5,
min_leaf_size_init_set = c(10,20,30),
control = Boost.control(make_prediction = TRUE,
cal_imp = TRUE))
## End(Not run)
Airfoil data
Description
A NASA data set obtained from a series of aerodynamic and acoustic tests of airfoil blade sections conducted in an anechoic wind tunnel.
Usage
data(airfoil)
Format
An object of class "data.frame".
Details
There are 1503 observations and 6 variables: y (scaled sound pressure level, in decibels), frequency (in Hz), angle (angle of attack, in degrees), chord_length (in meters), velocity (free-stream velocity, in meters per second), and thickness (suction side displacement thickness, in meters).
Source
The UCI Archive: https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise.
References
Brooks, T. F., Pope, D. S., and Marcolini, M. A. (1989). Airfoil self-noise and prediction. NASA Reference Publication-1218, document id: 9890016302.
Examples
data(airfoil)
Variable importance scores for the robust boosting algorithm RRBoost
Description
This function calculates variable importance scores for a previously
computed RRBoost fit.
Usage
cal_imp_func(model, x_val, y_val, trace = FALSE)
Arguments
model |
an object returned by Boost |
x_val |
predictor matrix for validation data (matrix/dataframe) |
y_val |
response vector for validation data (vector/dataframe) |
trace |
logical indicating whether to print the variable currently being processed, for monitoring progress (defaults to FALSE) |
Details
This function computes permutation variable importance scores
given an object returned by Boost and a validation data set.
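The idea of permutation importance can be sketched in a few lines of base R: permute one predictor at a time in the validation data, re-predict, and record how much the validation error inflates. The sketch below uses an lm() fit for illustration; perm_importance is a hypothetical helper, not the package's implementation, which applies the same idea to a fitted RRBoost model.

```r
# Sketch of permutation variable importance (illustrative; not cal_imp_func).
perm_importance <- function(fit, x_val, y_val, nrep = 20) {
  base_err <- mean(abs(y_val - predict(fit, newdata = x_val)))
  imp <- setNames(numeric(ncol(x_val)), colnames(x_val))
  for (j in seq_len(ncol(x_val))) {
    errs <- replicate(nrep, {
      x_perm <- x_val
      x_perm[[j]] <- sample(x_perm[[j]])  # break predictor j's link to y
      mean(abs(y_val - predict(fit, newdata = x_perm)))
    })
    imp[j] <- mean(errs) - base_err       # error inflation = importance
  }
  imp
}

set.seed(1)
dat <- data.frame(x1 = runif(200), x2 = runif(200))
dat$y <- 3 * dat$x1 + rnorm(200, sd = 0.1)  # only x1 drives the response
fit <- lm(y ~ x1 + x2, data = dat)          # stand-in for an RRBoost fit
imp <- perm_importance(fit, dat[, c("x1", "x2")], dat$y)
```

Permuting the informative predictor x1 inflates the validation error far more than permuting the noise predictor x2, which is exactly the signal the importance score captures.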
Value
a vector of permutation variable importance scores (one per explanatory variable)
Author(s)
Xiaomeng Ju, xiaomeng.ju@stat.ubc.ca
Examples
## Not run:
data(airfoil)
n <- nrow(airfoil)
n0 <- floor( 0.2 * n )
set.seed(123)
idx_test <- sample(n, n0)
idx_train <- sample((1:n)[-idx_test], floor( 0.6 * n ) )
idx_val <- (1:n)[ -c(idx_test, idx_train) ]
xx <- airfoil[, -6]
yy <- airfoil$y
xtrain <- xx[ idx_train, ]
ytrain <- yy[ idx_train ]
xval <- xx[ idx_val, ]
yval <- yy[ idx_val ]
xtest <- xx[ idx_test, ]
ytest <- yy[ idx_test ]
model <- Boost(x_train = xtrain, y_train = ytrain,
x_val = xval, y_val = yval,
type = "RRBoost", error = "rmse",
y_init = "LADTree", max_depth = 1, niter = 1000,
control = Boost.control(max_depth_init = 2,
min_leaf_size_init = 10, save_tree = TRUE,
make_prediction = FALSE, cal_imp = FALSE))
var_importance <- cal_imp_func(model, x_val = xval, y_val = yval)
## End(Not run)
cal_predict
Description
A function to make predictions and calculate test error given an object returned by Boost and test data
Usage
cal_predict(model, x_test, y_test)
Arguments
model |
an object returned by Boost |
x_test |
predictor matrix for test data (matrix/dataframe) |
y_test |
response vector for test data (vector/dataframe) |
Details
A function to make predictions and calculate test error given an object returned by Boost and test data
Value
A list with the following components:
f_t_test |
predicted values with model at the early stopping iteration using x_test as the predictors |
err_test |
a matrix of test errors before and at the early stopping iteration (returned if make_prediction = TRUE in control); the matrix dimension is the early stopping iteration by the number of error types (matches the error argument) |
f_test |
a matrix of test function estimates at all iterations (returned if save_f = TRUE in control) |
value |
a vector of test errors evaluated at the early stopping iteration |
Author(s)
Xiaomeng Ju, xiaomeng.ju@stat.ubc.ca
Examples
## Not run:
data(airfoil)
n <- nrow(airfoil)
n0 <- floor( 0.2 * n )
set.seed(123)
idx_test <- sample(n, n0)
idx_train <- sample((1:n)[-idx_test], floor( 0.6 * n ) )
idx_val <- (1:n)[ -c(idx_test, idx_train) ]
xx <- airfoil[, -6]
yy <- airfoil$y
xtrain <- xx[ idx_train, ]
ytrain <- yy[ idx_train ]
xval <- xx[ idx_val, ]
yval <- yy[ idx_val ]
xtest <- xx[ idx_test, ]
ytest <- yy[ idx_test ]
model <- Boost(x_train = xtrain, y_train = ytrain,
x_val = xval, y_val = yval,
type = "RRBoost", error = "rmse",
y_init = "LADTree", max_depth = 1, niter = 1000,
control = Boost.control(max_depth_init = 2,
min_leaf_size_init = 10, save_tree = TRUE,
make_prediction = FALSE, cal_imp = FALSE))
prediction <- cal_predict(model, x_test = xtest, y_test = ytest)
## End(Not run)