Type: Package
Title: Self-Validated Ensemble Models with Elastic Net Regression
Version: 1.4.0
Date: 2025-08-16
Maintainer: Andrew T. Karl <akarl@asu.edu>
Description: Implements Self-Validated Ensemble Models (SVEM, Lemkus et al. (2021) <doi:10.1016/j.chemolab.2021.104439>) using Elastic Net regression via 'glmnet' (Friedman et al. <doi:10.18637/jss.v033.i01>). SVEM averages predictions from multiple models fitted to fractionally weighted bootstraps of the data, tuned with anti-correlated validation weights. Also implements the randomized permutation whole model test for SVEM (Karl (2024) <doi:10.1016/j.chemolab.2024.105122>). Code for the whole model test was taken from the supplementary material of Karl (2024). Development of this package was assisted by 'GPT o1-preview' for code structure and documentation.
Depends: R (>= 3.5.0)
Imports: glmnet, stats, gamlss, gamlss.dist, ggplot2, lhs, doParallel, parallel, foreach
VignetteBuilder: knitr
Suggests: knitr, rmarkdown
License: GPL-2 | GPL-3
Encoding: UTF-8
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-08-18 17:53:56 UTC; andre
Author: Andrew T. Karl [cre, aut]
Repository: CRAN
Date/Publication: 2025-08-18 18:10:20 UTC

SVEMnet: Self-Validated Ensemble Models with Elastic Net Regression

Description

The SVEMnet package implements Self-Validated Ensemble Models (SVEM) using Elastic Net (including lasso and ridge) regression via glmnet. SVEM averages predictions from multiple models fitted to fractionally weighted bootstraps of the data, tuned with anti-correlated validation weights.

Functions

SVEMnet

Fit an SVEMnet model using Elastic Net regression.

svem_significance_test

Perform a whole-model significance test for SVEM models.

svem_significance_test_parallel

Perform a whole-model significance test for SVEM models. Parallelized version.

predict.svem_model

Predict method for SVEM models.

plot.svem_model

Plot method for SVEM models.

coef.svem_model

Coefficient nonzero percentages for SVEM models.

glmnet_with_cv

Wrapper for cv.glmnet

Acknowledgments

Development of this package was assisted by GPT o1-preview, which helped in constructing the structure of some of the code and the roxygen documentation. The code for the significance test is taken from the supplementary material of Karl (2024) (it was handwritten by that author).

Author(s)

Maintainer: Andrew T. Karl akarl@asu.edu (ORCID)

References

Gotwalt, C., & Ramsey, P. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference. https://community.jmp.com/t5/Abstracts/Model-Validation-Strategies-for-Designed-Experiments-Using/ev-p/849873/redirect_from_archived_page/true

Karl, A. T. (2024). A randomized permutation whole-model test heuristic for Self-Validated Ensemble Models (SVEM). Chemometrics and Intelligent Laboratory Systems, 249, 105122. doi:10.1016/j.chemolab.2024.105122

Karl, A., Wisnowski, J., & Rushing, H. (2022). JMP Pro 17 Remedies for Practical Struggles with Mixture Experiments. JMP Discovery Conference. doi:10.13140/RG.2.2.34598.40003/1

Lemkus, T., Gotwalt, C., Ramsey, P., & Weese, M. L. (2021). Self-Validated Ensemble Models for Design of Experiments. Chemometrics and Intelligent Laboratory Systems, 219, 104439. doi:10.1016/j.chemolab.2021.104439

Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap. The American Statistician, 74(4), 345–358. doi:10.1080/00031305.2020.1731599

Ramsey, P., Gaudard, M., & Levin, W. (2021). Accelerating Innovation with Space Filling Mixture Designs, Neural Networks and SVEM. JMP Discovery Conference. https://community.jmp.com/t5/Abstracts/Accelerating-Innovation-with-Space-Filling-Mixture-Designs/ev-p/756841

Ramsey, P., & Gotwalt, C. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference - Europe. https://community.jmp.com/t5/Abstracts/Model-Validation-Strategies-for-Designed-Experiments-Using/ev-p/849647/redirect_from_archived_page/true

Ramsey, P., Levin, W., Lemkus, T., & Gotwalt, C. (2021). SVEM: A Paradigm Shift in Design and Analysis of Experiments. JMP Discovery Conference - Europe. https://community.jmp.com/t5/Abstracts/SVEM-A-Paradigm-Shift-in-Design-and-Analysis-of-Experiments-2021/ev-p/756634

Ramsey, P., & McNeill, P. (2023). CMC, SVEM, Neural Networks, DOE, and Complexity: It’s All About Prediction. JMP Discovery Conference.


Fit an SVEMnet Model

Description

Wrapper for 'glmnet' (Friedman et al. 2010) to fit an ensemble of Elastic Net models using the Self-Validated Ensemble Model method (SVEM, Lemkus et al. 2021). Allows searching over multiple alpha values in the Elastic Net penalty.

Usage

SVEMnet(
  formula,
  data,
  nBoot = 200,
  glmnet_alpha = c(0, 0.5, 1),
  weight_scheme = c("SVEM", "FWR", "Identity"),
  objective = c("wAICc", "wAIC", "wSSE"),
  standardize = TRUE,
  ...
)

Arguments

formula

A formula specifying the model to be fitted.

data

A data frame containing the variables in the model.

nBoot

Number of bootstrap iterations (default is 200).

glmnet_alpha

Elastic Net mixing parameter(s) (default is c(0, 0.5, 1)). Can be a vector of alpha values, where alpha = 1 is Lasso and alpha = 0 is Ridge.

weight_scheme

Weighting scheme for SVEM (default "SVEM"). One of "SVEM", "FWR", or "Identity".

objective

Objective used to pick lambda on each bootstrap path (default "wAICc"). One of "wSSE", "wAIC", or "wAICc".

Internally, SVEM creates complementary fractional weights for the same rows (training vs. validation) and normalizes the validation weights so that sum(w_valid) = n. Let SSE_w = sum(w_valid * resid^2) and k be the number of estimated parameters including the intercept (i.e., glmnet df + 1). The objectives are:

  • "wSSE": Weighted sum of squared errors. Uses SSE_w directly. This matches the original SVEM paper's selection rule. It tends to favor more complex fits near the n ~ p boundary because it has no complexity penalty; use mainly as a baseline.

  • "wAIC": Weighted AIC for Gaussian errors. Uses AIC = n * log(SSE_w / n) + 2 * k. Here n = sum(w_valid) (after normalization it equals the training sample size). Candidates with k >= n are excluded to avoid undefined likelihood / zero residual df.

  • "wAICc": Small-sample corrected weighted AIC. First compute AIC as above, then add the correction 2 * k * (k + 1) / (n_eff - k - 1) using the effective validation size n_eff = (sum(w_valid)^2) / sum(w_valid^2). n_eff is clipped to [5, n] for stability. Candidates with k >= n or k >= n_eff - 1 are excluded. In practice this strongly discourages over-parameterized models when n is close to k and removes the familiar "spike" in selection around n - p = 0.

Notes on k: We use k = df + 1 where df is glmnet's (non-intercept) degrees of freedom at each lambda. For numerical robustness near the boundary, penalties are evaluated with a conservative k_eff = pmin(pmax(1, k), n - 2), but admissibility still requires k < n (and k < n_eff - 1 for AICc).

When to use which? "wAICc" is generally safer for small or borderline designs (n not much larger than the model size). "wAIC" can be slightly better when n >> k and signal is strong. "wSSE" is provided for completeness and comparison with the original SVEM.

standardize

Logical; passed to glmnet (default TRUE).

...

Additional args to glmnet().

Details

The Self-Validated Ensemble Model (SVEM, Lemkus et al., 2021) framework provides a bootstrap approach to improve predictions from various base learning models, including Elastic Net regression as implemented in 'glmnet'. SVEM is particularly suited for situations where a complex response surface is modeled with relatively few experimental runs.

In each of the 'nBoot' iterations, SVEMnet applies random exponentially distributed weights to the observations. Anti-correlated weights are used for validation.
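One way to picture the weighting step is the construction described by Lemkus et al. (2021), in which a single uniform draw per row yields both weights. The sketch below assumes that construction (training weight -log(U), validation weight -log(1 - U)); the variable names are illustrative and this is not the package's internal code. The rescaling of the validation weights follows the normalization described under the objective argument.

```r
set.seed(10)
n <- 12
u <- runif(n)                           # one uniform draw per observation

w_train <- -log(u)                      # exponentially distributed training weights
w_valid <- -log(1 - u)                  # exponentially distributed, anti-correlated with w_train
w_valid <- w_valid * n / sum(w_valid)   # rescale so that sum(w_valid) equals n

# Rows that weigh heavily in training weigh lightly in validation
cor(w_train, w_valid)
```

Because both weights are monotone transformations of the same draw in opposite directions, the sample correlation is always negative.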

SVEMnet allows for the Elastic Net mixing parameter ('glmnet_alpha') to be a vector, enabling the function to search over multiple 'alpha' values within each bootstrap iteration. Within each iteration, the model is fit for each specified 'alpha', and the best 'alpha' is selected based on the specified 'objective'.

objective options:

"wSSE"

Weighted Sum of Squared Errors. Selects the lambda that minimizes the weighted validation error without penalizing model complexity. While this may lead to models that overfit when the number of parameters is large relative to the number of observations, SVEM mitigates overfitting (high prediction variance) by averaging over multiple bootstrap models. This is the objective function used by Lemkus et al. (2021) with weight_scheme="SVEM".

"wAIC"

Weighted Akaike Information Criterion. Balances model fit with complexity by penalizing the number of parameters. It is calculated as AIC = n * log(wSSE / n) + 2 * k, where wSSE is the weighted sum of squared errors, n is the number of observations, and k is the number of estimated parameters including the intercept (glmnet df + 1). Typically used with weight_scheme="FWR" or weight_scheme="Identity".

"wAICc"

Small-sample corrected weighted AIC. First compute AIC as above, then add the correction 2 * k * (k + 1) / (n_eff - k - 1), where n_eff = (sum(w_valid)^2) / sum(w_valid^2) is the effective validation size. For stability, n_eff is clipped to [5, n]. Candidates with k >= n or k >= n_eff - 1 are excluded. In practice this strongly discourages over-parameterized models when n is close to k and removes the familiar spike in selection around n - p = 0.
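The three objectives can be written out for a single candidate fit as follows. This is a minimal sketch rather than the package's internal code: the function name svem_objectives and its arguments are hypothetical, it takes k = df + 1 as stated in the argument description, and it omits the conservative k_eff safeguard mentioned there.

```r
# Hypothetical helper: score one candidate lambda on the validation weights.
#   resid   : residuals of the candidate fit on the n rows
#   w_valid : raw validation weights (normalized below so they sum to n)
#   df      : glmnet's non-intercept degrees of freedom at this lambda
svem_objectives <- function(resid, w_valid, df) {
  n <- length(resid)
  w <- w_valid * n / sum(w_valid)       # normalize so that sum(w) equals n
  sse_w <- sum(w * resid^2)             # weighted sum of squared errors
  k <- df + 1                           # parameters including the intercept

  n_eff <- sum(w)^2 / sum(w^2)          # effective validation size
  n_eff <- min(max(n_eff, 5), n)        # clip to [5, n]

  # Inadmissible candidates are scored as Inf so they are never selected
  aic <- if (k < n) n * log(sse_w / n) + 2 * k else Inf
  aicc <- if (k < n && k < n_eff - 1) {
    aic + 2 * k * (k + 1) / (n_eff - k - 1)
  } else {
    Inf
  }
  c(wSSE = sse_w, wAIC = aic, wAICc = aicc)
}

set.seed(1)
r <- rnorm(20)
w <- rexp(20)
scores <- svem_objectives(r, w, df = 3)
scores
```

In a selection loop, the candidate with the smallest value of the chosen objective would be kept.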

weight_scheme options:

"SVEM"

Uses anti-correlated fractional weights for training and validation sets, improving model generalization by effectively simulating multiple training-validation splits (Lemkus et al., 2021). Published results (Lemkus et al., 2021; Karl, 2024) use objective="wSSE". However, unpublished simulation results suggest improved performance from using objective="wAIC" with weight_scheme="SVEM". See the SVEMnet vignette for details.

"FWR"

Fractional Weight Regression as described by Xu et al. (2020). Weights are the same for both training and validation sets. This method does not provide the self-validation benefits of SVEM but is included for comparison. Used with objective="wAIC".

"Identity"

Uses weights of 1 for both training and validation. This uses the full dataset for both training and validation, effectively disabling the self-validation mechanism. Use with objective="wAIC" and nBoot=1 to fit the Elastic Net on the AIC of the training data.

A debiased fit is output along with the standard fit. It is provided so that the user can match the output of JMP, which returns a debiased fit whenever nBoot >= 10; see https://www.jmp.com/support/help/en/18.1/?utm_source=help&utm_medium=redirect#page/jmp/overview-of-selfvalidated-ensemble-models.shtml. The debiasing coefficients are always calculated by SVEMnet(), and predict() returns either the raw or the debiased predictions according to its debias argument. The default is debias = FALSE, based on performance in unpublished simulation results.

Model output: the returned object is a list of class svem_model.

Value

An object of class svem_model.

References

Gotwalt, C., & Ramsey, P. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference. https://community.jmp.com/t5/Abstracts/Model-Validation-Strategies-for-Designed-Experiments-Using/ev-p/849873/redirect_from_archived_page/true

Karl, A. T. (2024). A randomized permutation whole-model test heuristic for Self-Validated Ensemble Models (SVEM). Chemometrics and Intelligent Laboratory Systems, 249, 105122. doi:10.1016/j.chemolab.2024.105122

Karl, A., Wisnowski, J., & Rushing, H. (2022). JMP Pro 17 Remedies for Practical Struggles with Mixture Experiments. JMP Discovery Conference. doi:10.13140/RG.2.2.34598.40003/1

Lemkus, T., Gotwalt, C., Ramsey, P., & Weese, M. L. (2021). Self-Validated Ensemble Models for Design of Experiments. Chemometrics and Intelligent Laboratory Systems, 219, 104439. doi:10.1016/j.chemolab.2021.104439

Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap. The American Statistician, 74(4), 345–358. doi:10.1080/00031305.2020.1731599

Ramsey, P., Gaudard, M., & Levin, W. (2021). Accelerating Innovation with Space Filling Mixture Designs, Neural Networks and SVEM. JMP Discovery Conference. https://community.jmp.com/t5/Abstracts/Accelerating-Innovation-with-Space-Filling-Mixture-Designs/ev-p/756841

Ramsey, P., & Gotwalt, C. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference - Europe. https://community.jmp.com/t5/Abstracts/Model-Validation-Strategies-for-Designed-Experiments-Using/ev-p/849647/redirect_from_archived_page/true

Ramsey, P., Levin, W., Lemkus, T., & Gotwalt, C. (2021). SVEM: A Paradigm Shift in Design and Analysis of Experiments. JMP Discovery Conference - Europe. https://community.jmp.com/t5/Abstracts/SVEM-A-Paradigm-Shift-in-Design-and-Analysis-of-Experiments-2021/ev-p/756634

Ramsey, P., & McNeill, P. (2023). CMC, SVEM, Neural Networks, DOE, and Complexity: It’s All About Prediction. JMP Discovery Conference.

Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. doi:10.18637/jss.v033.i01

Examples

# Simulate data
set.seed(0)
n <- 21
X1 <- runif(n)
X2 <- runif(n)
X3 <- runif(n)
y <- 1 + 2*X1 + 3*X2 + X1*X2 + X1^2  + rnorm(n)
data <- data.frame(y, X1, X2, X3)

# Fit the SVEMnet model with a formula
model <- SVEMnet(
  y ~ (X1 + X2 + X3)^2 + I(X1^2) + I(X2^2) + I(X3^2),
  glmnet_alpha = c(1),
  data = data
)
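coef(model)           # percentage of bootstrap fits in which each coefficient is nonzero
plot(model)           # actual versus predicted values
predict(model, data)  # ensemble predictions for the training data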


Plot Coefficient Nonzero Percentages from a SVEMnet Model

Description

This function calculates the percentage of bootstrap iterations in which each coefficient is nonzero.

Usage

## S3 method for class 'svem_model'
coef(object, ...)

Arguments

object

An object of class svem_model returned by the SVEMnet function.

...

Other arguments to pass.

Value

Invisibly returns a data frame containing the percentage of bootstraps where each coefficient is nonzero.
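The quantity reported here can be illustrated with a toy coefficient matrix. The sketch below assumes one row per bootstrap iteration and one column per term; coef_mat and the output column names are made-up stand-ins, not the objects stored inside an svem_model.

```r
# Toy stand-in: 5 bootstrap fits (rows) by 3 terms (columns)
coef_mat <- rbind(
  c(1.2, 0.0, 0.4),
  c(0.9, 0.0, 0.0),
  c(1.1, 0.3, 0.0),
  c(1.0, 0.0, 0.5),
  c(1.3, 0.0, 0.0)
)
colnames(coef_mat) <- c("X1", "X2", "X1:X2")

# Percentage of bootstrap fits in which each coefficient is nonzero
pct_nonzero <- 100 * colMeans(coef_mat != 0)
data.frame(Variable = names(pct_nonzero), Percent = unname(pct_nonzero))
```

Here X1 is nonzero in every fit (100), X2 in one of five (20), and the interaction in two of five (40).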


Fit a glmnet Model with Cross-Validation

Description

A wrapper function for cv.glmnet that takes input arguments in a manner similar to SVEMnet. This function searches over multiple alpha values by running cv.glmnet() for each provided alpha, and then selects the combination of alpha and lambda with the best cross-validation performance.

Usage

glmnet_with_cv(
  formula,
  data,
  glmnet_alpha = c(0, 0.5, 1),
  standardize = TRUE,
  nfolds = 10,
  ...
)

Arguments

formula

A formula specifying the model to be fitted.

data

A data frame containing the variables in the model.

glmnet_alpha

Elastic Net mixing parameter(s) (default is c(0, 0.5, 1)). If multiple values are provided, cv.glmnet is run for each alpha, and the model with the lowest cross-validation error is selected.

standardize

Logical flag passed to glmnet. If TRUE (default), each variable is standardized before model fitting.

nfolds

Number of cross-validation folds (default is 10).

...

Additional arguments passed to cv.glmnet.

Details

This function uses cv.glmnet to fit a generalized linear model with elastic net regularization, performing k-fold cross-validation to select the regularization parameter lambda. If multiple alpha values are provided, it selects the best-performing alpha-lambda pair based on the minimal cross-validation error.

After fitting, the function calculates a debiasing linear model (if possible). This is done by regressing the actual responses on the fitted values obtained from the selected model. The resulting linear model is stored in debias_fit.
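The debiasing step described above amounts to a linear calibration of the predictions. A minimal base R sketch, using simulated stand-ins for the fitted values (the names y, yhat, and debias_fit_sketch are illustrative and are not objects created by the package):

```r
set.seed(2)
y <- rnorm(30, mean = 5)
# Stand-in predictions that are shrunken toward the mean, as penalized fits often are
yhat <- 5 + 0.6 * (y - 5) + rnorm(30, sd = 0.2)

# Regress the observed responses on the fitted values
debias_fit_sketch <- lm(y ~ yhat)

# Debiased predictions apply the fitted intercept and slope to raw predictions
yhat_new <- c(4.5, 5.0, 5.5)
debiased <- predict(debias_fit_sketch, newdata = data.frame(yhat = yhat_new))
debiased
```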

Value

A list containing the selected fit, including the debiasing model debias_fit described in Details.

References

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. doi:10.18637/jss.v033.i01

See Also

glmnet, cv.glmnet, SVEMnet

Examples

set.seed(0)
n <- 50
X1 <- runif(n)
X2 <- runif(n)
y <- 1 + 2*X1 + 3*X2 + rnorm(n)
data <- data.frame(y, X1, X2)

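# Search over three alpha values; the best alpha-lambda pair is chosen by cross-validation error
model_cv <- glmnet_with_cv(y ~ X1 + X2, data = data, glmnet_alpha = c(0, 0.5, 1))
# Predict on the training data
predictions <- predict_cv(model_cv, data)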


Plot Method for SVEM Models

Description

Plots actual versus predicted values for an svem_model using ggplot2.

Usage

## S3 method for class 'svem_model'
plot(x, plot_debiased = FALSE, ...)

Arguments

x

An object of class svem_model.

plot_debiased

Logical; if TRUE, includes debiased predictions if available (default is FALSE).

...

Additional arguments passed to ggplot2 functions.

Details

This function creates an actual vs. predicted plot for the SVEM model. If plot_debiased is TRUE and debiased predictions are available, it includes them in the plot.

Value

A ggplot object showing actual versus predicted values.


Plot SVEM Significance Test Results for Multiple Responses

Description

Plots the Mahalanobis distances for the original and permuted data from multiple SVEM significance test results.

Usage

## S3 method for class 'svem_significance_test'
plot(..., labels = NULL)

Arguments

...

One or more objects of class svem_significance_test, which are the outputs from svem_significance_test.

labels

Optional character vector of labels for the responses. If not provided, the function uses the response variable names.

Details

This function creates a combined plot of the Mahalanobis distances (d_Y and d_pi_Y) for the original and permuted data from multiple SVEM significance test results. It groups the data by response and source type, displaying original and permutation distances side by side for each response.

Value

A ggplot object showing the distributions of Mahalanobis distances for all responses.


Predict Method for SVEM Models

Description

Generates predictions from a fitted svem_model.

Usage

## S3 method for class 'svem_model'
predict(object, newdata, debias = FALSE, se.fit = FALSE, ...)

Arguments

object

An object of class svem_model.

newdata

A data frame of new predictor values.

debias

Logical; if TRUE, returns debiased predictions (default is FALSE).

se.fit

Logical; if TRUE, returns standard errors (default is FALSE).

...

Additional arguments.

Details

A debiased fit is available along with the standard fit. It is provided so that the user can match the output of JMP; see https://www.jmp.com/support/help/en/18.1/?utm_source=help&utm_medium=redirect#page/jmp/overview-of-selfvalidated-ensemble-models.shtml. The debiasing coefficients are always calculated by SVEMnet(), and predict() returns either the raw or the debiased predictions according to the debias argument. The default is FALSE, based on performance in unpublished simulation studies.
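Since SVEM averages predictions over its bootstrap members, the core of prediction can be sketched as a matrix product followed by a row mean. This is an illustrative outline under stated assumptions, not the package's code: B is a hypothetical nBoot x (p + 1) coefficient matrix, and using the standard deviation across members as a measure of spread is an assumption made here for illustration only, since the documentation does not say how se.fit is computed.

```r
set.seed(3)
n_boot <- 50
# Hypothetical bootstrap coefficients: intercept, X1, X2 (one row per bootstrap fit)
B <- cbind(1 + rnorm(n_boot, sd = 0.1),
           2 + rnorm(n_boot, sd = 0.2),
           3 + rnorm(n_boot, sd = 0.2))

newdata <- data.frame(X1 = c(0.1, 0.5, 0.9), X2 = c(0.2, 0.4, 0.6))
X <- model.matrix(~ X1 + X2, newdata)    # design matrix with intercept column

member_preds <- X %*% t(B)               # rows: new points, columns: bootstrap fits
pred_mean <- rowMeans(member_preds)      # ensemble prediction
pred_sd <- apply(member_preds, 1, sd)    # spread across members (assumption, see above)
```

Because the prediction is linear in the coefficients, averaging member predictions gives the same result as predicting with the averaged coefficients.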

Value

Predictions or a list containing predictions and standard errors.


Predict Method for glmnet_with_cv Objects

Description

Generates predictions from a fitted object returned by glmnet_with_cv().

Usage

predict_cv(object, newdata, debias = FALSE, strict = FALSE, ...)

Arguments

object

A list returned by glmnet_with_cv().

newdata

A data frame of new predictor values.

debias

Logical; if TRUE and a debiasing fit is available, apply it (default FALSE).

strict

Logical; if TRUE, require exact column-name match with training design (default FALSE).

...

Additional arguments (currently unused).

Details

Columns of newdata are aligned to the training design by name. With strict = TRUE, a column-name mismatch raises an error.

Value

A numeric vector of predictions.


Print Method for SVEM Significance Test

Description

Prints the p-value from an object of class svem_significance_test.

Usage

## S3 method for class 'svem_significance_test'
print(x, ...)

Arguments

x

An object of class svem_significance_test.

...

Additional arguments (not used).


Generate a Random Prediction Table for a Fitted SVEMnet Model

Description

This utility function generates a random sample of points from the predictor space and computes the corresponding predicted responses from a fitted SVEMnet model. It can be used to explore the fitted response surface in a way analogous to JMP's "Output Random Table" feature. The function recognizes mixture factor groups and draws Dirichlet-distributed compositions within the specified bounds so that mixture variables sum to a user-supplied total. Continuous non-mixture variables are sampled uniformly across their observed ranges using a maximin Latin hypercube design, and categorical variables are sampled from their observed levels. No random noise is added to the predicted responses.

Usage

svem_random_table(
  formula,
  data,
  n = 1000,
  mixture_groups = NULL,
  nBoot = 200,
  glmnet_alpha = c(1),
  weight_scheme = c("SVEM"),
  objective = c("wAIC", "wSSE"),
  debias = FALSE,
  ...
)

Arguments

formula

A formula specifying the fitted model. This should be the same formula used when fitting the SVEMnet model.

data

A data frame containing the variables in the model.

n

Number of random points to generate (default: 1000).

mixture_groups

Optional list describing mixture factor groups. Each element should be a list with components 'vars' (character vector of mixture variable names), 'lower' (numeric vector of lower bounds), 'upper' (numeric vector of upper bounds) and 'total' (scalar sum). See 'svem_significance_test()' for details. Defaults to 'NULL' (no mixtures).

nBoot

Number of bootstrap iterations to use when fitting the SVEMnet model (default: 200).

glmnet_alpha

Elastic net mixing parameter(s) passed to 'SVEMnet' (default: 'c(1)').

weight_scheme

Weighting scheme for SVEM (default: "SVEM").

objective

Objective function for SVEM ("wAIC" or "wSSE"; default: "wAIC").

debias

Logical; if 'TRUE', the debiasing coefficients of the fitted model are applied when predicting (default: 'FALSE').

...

Additional arguments passed to 'SVEMnet()' and then to 'glmnet()'.

Details

This function first fits an SVEMnet model using the supplied parameters. It then generates a random grid of points in the predictor space, honouring mixture constraints if 'mixture_groups' is provided. Predictions are computed from the fitted model on these points. No random noise is added; the predictions come directly from the model. If you wish to explore the uncertainty of predictions, consider adding noise separately or using the standard error output from 'predict.svem_model()'.
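For the non-mixture predictors, the sampling idea can be sketched in base R. The package states that it uses a maximin Latin hypercube; the sketch below substitutes a plain (non-maximin) Latin hypercube so that it has no dependencies. The helper simple_lhs and the toy training data are hypothetical, and mixture groups are not handled here.

```r
# Plain Latin hypercube on (0, 1)^d: exactly one point per stratum in each dimension
simple_lhs <- function(n, d) {
  sapply(seq_len(d), function(j) (sample(n) - runif(n)) / n)
}

set.seed(4)
train <- data.frame(
  X = runif(20, 2, 8),
  Z = runif(20, -1, 1),
  F = factor(sample(c("a", "b", "c"), 20, replace = TRUE))
)

n_new <- 10
u <- simple_lhs(n_new, 2)
rng_X <- range(train$X)
rng_Z <- range(train$Z)

grid <- data.frame(
  X = rng_X[1] + u[, 1] * diff(rng_X),   # scale to the observed range of X
  Z = rng_Z[1] + u[, 2] * diff(rng_Z),   # scale to the observed range of Z
  F = factor(sample(levels(train$F), n_new, replace = TRUE),
             levels = levels(train$F))   # draw from the observed levels
)
head(grid)
```

A data frame like grid would then be passed to predict() to obtain the response column.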

Value

A data frame containing the sampled predictor values and the corresponding predicted responses. The response column is named according to the left-hand side of 'formula'.

See Also

'SVEMnet', 'predict.svem_model', 'svem_significance_test'.

Examples


set.seed(42)
n <- 40

# Helper to generate training data mixtures with bounds
sample_trunc_dirichlet <- function(n, lower, upper, total) {
  k <- length(lower)
  min_sum <- sum(lower); max_sum <- sum(upper)
  stopifnot(total >= min_sum, total <= max_sum)
  avail <- total - min_sum
  out <- matrix(NA_real_, n, k)
  i <- 1
  while (i <= n) {
    g <- rgamma(k, 1, 1)
    w <- g / sum(g)
    x <- lower + avail * w
    if (all(x <= upper + 1e-12)) {
      out[i, ] <- x; i <- i + 1
    }
  }
  out
}

# Three mixture factors (A, B, C) with distinct bounds; sum to total = 1
lower <- c(0.10, 0.20, 0.05)
upper <- c(0.60, 0.70, 0.50)
total <- 1.0
ABC   <- sample_trunc_dirichlet(n, lower, upper, total)
A <- ABC[, 1]; B <- ABC[, 2]; C <- ABC[, 3]

# Additional predictors
X <- runif(n)
F <- factor(sample(letters[1:3], n, replace = TRUE))

# Response
y <- 1 + 2*A + 3*B + 1.5*C + 0.5*X +
     ifelse(F == "a", 0, ifelse(F == "b", 1, -1)) +
     rnorm(n, sd = 0.3)

dat <- data.frame(y = y, A = A, B = B, C = C, X = X, F = F)

# Mixture specification for the random table generator
mix_spec <- list(
  list(
    vars  = c("A", "B", "C"),
    lower = c(0.10, 0.20, 0.05),
    upper = c(0.60, 0.70, 0.50),
    total = 1.0
  )
)

# Fit SVEMnet and generate 50 random points
rand_tab <- svem_random_table(
  y ~ A + B + C + X + F,
  data           = dat,
  n              = 50,
  mixture_groups = mix_spec,
  nBoot          = 50,
  glmnet_alpha   = c(1),
  weight_scheme  = "SVEM",
  objective      = "wAIC",
  debias         = FALSE
)

# Check mixture validity in the generated table
stopifnot(all(abs((rand_tab$A + rand_tab$B + rand_tab$C) - 1) < 1e-8))
summary(rand_tab[c("A","B","C")])
head(rand_tab)


SVEM Significance Test with Mixture Support

Description

Performs a whole-model significance test using the SVEM framework and allows the user to specify mixture factor groups. Mixture factors are sets of continuous variables that are constrained to sum to a constant (the mixture total) and have optional lower and upper bounds. When mixture groups are supplied, the grid of evaluation points is generated by sampling Dirichlet variates over the mixture simplex rather than by independently sampling each continuous predictor. Non-mixture continuous predictors are sampled via a maximin Latin hypercube over their observed ranges, and categorical predictors are sampled from their observed levels. The remainder of the algorithm follows 'svem_significance_test()', computing standardized predictions on the grid, refitting SVEM on permutations of the response, and calculating a Mahalanobis distance for the original and permutation fits.

Usage

svem_significance_test(
  formula,
  data,
  mixture_groups = NULL,
  nPoint = 2000,
  nSVEM = 5,
  nPerm = 125,
  percent = 85,
  nBoot = 200,
  glmnet_alpha = c(1),
  weight_scheme = c("SVEM"),
  objective = c("wAIC", "wSSE"),
  verbose = TRUE,
  ...
)

Arguments

formula

A formula specifying the model to be tested.

data

A data frame containing the variables in the model.

mixture_groups

Optional list describing one or more mixture factor groups. Each element of the list should be a list with components 'vars' (character vector of column names), 'lower' (numeric vector of lower bounds of the same length as 'vars'), 'upper' (numeric vector of upper bounds of the same length), and 'total' (scalar specifying the sum of the mixture variables). All mixture variables must be included in 'vars', and no variable can appear in more than one mixture group. Defaults to 'NULL' (no mixtures).

nPoint

Number of random points in the factor space (default: 2000).

nSVEM

Number of SVEM fits on the original data (default: 5).

nPerm

Number of SVEM fits on permuted responses for the reference distribution (default: 125).

percent

Percentage of variance to capture in the SVD (default: 85).

nBoot

Number of bootstrap iterations within each SVEM fit (default: 200).

glmnet_alpha

The alpha parameter(s) for glmnet (default: 'c(1)').

weight_scheme

Weighting scheme for SVEM (default: "SVEM").

objective

Objective function for SVEM ("wAIC" or "wSSE", default: "wAIC").

verbose

Logical; if 'TRUE', displays progress messages (default: 'TRUE').

...

Additional arguments passed to 'SVEMnet()' and then to 'glmnet()'.

Details

This function extends 'svem_significance_test()' by allowing the user to specify mixture factor groups. In a mixture group, the specified variables are jointly sampled from a Dirichlet distribution so that their values sum to the specified 'total'. Lower and upper bounds can be supplied to shift and scale the mixture simplex. Feasibility is checked ('sum(lower) <= total <= sum(upper)'), and samples are generated as 'lower + (total - sum(lower)) * w' for Dirichlet weights 'w', with rejection of any draws violating the upper bounds. This guarantees the correct total while respecting all bounds.

If no mixture groups are supplied, this function behaves identically to 'svem_significance_test()'.
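The percent argument refers to the share of variance retained from a singular value decomposition. As a rough illustration of that idea only, the number of retained components could be chosen as in the sketch below. The matrix P of predictions is simulated, the variable names are hypothetical, and the exact construction used inside the test follows Karl (2024), which this sketch does not attempt to reproduce.

```r
set.seed(5)
# Stand-in for predictions: 40 fits (rows) evaluated at 200 grid points (columns),
# with column scales that decay so a few directions dominate
P <- matrix(rnorm(40 * 200), nrow = 40) %*% diag(seq(2, 0.1, length.out = 200))
P <- scale(P, center = TRUE, scale = FALSE)   # center each column

s <- svd(P)
var_share <- s$d^2 / sum(s$d^2)               # share of variance per component
percent <- 85

# Smallest number of components whose cumulative share reaches the target
n_keep <- which(cumsum(var_share) >= percent / 100)[1]
n_keep
```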

Value

A list of class 'svem_significance_test' containing the test results.

See Also

'svem_significance_test()'

Examples


  # Construct a small data set with a three-component mixture (A, B, C)
  # Each has distinct lower/upper bounds and they sum to 1
  set.seed(123)
  n <- 30

  # Helper used only for generating training data in this example
  sample_trunc_dirichlet <- function(n, lower, upper, total) {
    k <- length(lower)
    min_sum <- sum(lower); max_sum <- sum(upper)
    stopifnot(total >= min_sum, total <= max_sum)
    avail <- total - min_sum
    out <- matrix(NA_real_, n, k)
    i <- 1L
    while (i <= n) {
      g <- rgamma(k, 1, 1)
      w <- g / sum(g)
      x <- lower + avail * w
      if (all(x <= upper + 1e-12)) {
        out[i, ] <- x
        i <- i + 1L
      }
    }
    out
  }

  # Three mixture components with distinct bounds; sum to 1
  lower <- c(0.10, 0.20, 0.05)  # for A, B, C
  upper <- c(0.60, 0.70, 0.50)
  total <- 1.0
  ABC   <- sample_trunc_dirichlet(n, lower, upper, total)
  A <- ABC[, 1]; B <- ABC[, 2]; C <- ABC[, 3]

  # Additional predictors
  X <- runif(n)
  F <- factor(sample(c("red", "blue"), n, replace = TRUE))

  # Response
  y <- 2 + 3*A + 1.5*B + 1.2*C + 0.5*X + 1*(F == "red") + rnorm(n, sd = 0.3)
  dat <- data.frame(y = y, A = A, B = B, C = C, X = X, F = F)

  # Specify the mixture group for A, B, C
  mix_spec <- list(
    list(
      vars  = c("A", "B", "C"),
      lower = c(0.10, 0.20, 0.05),
      upper = c(0.60, 0.70, 0.50),
      total = 1.0
    )
  )

  # Run the whole-model significance test on this mixture model
  test_res <- svem_significance_test(
    y ~ A + B + C + X + F,
    data           = dat,
    mixture_groups = mix_spec,
    nPoint         = 200,
    nSVEM          = 3,
    nPerm          = 50,
    nBoot          = 100,
    glmnet_alpha   = c(1),
    weight_scheme  = "SVEM",
    objective      = "wAIC",
    verbose        = FALSE
  )

  print(test_res)
  plot(test_res)


SVEM Significance Test with Mixture Support (Parallel Version)

Description

Whole-model significance test using SVEM with support for mixture factor groups, parallelizing the SVEM fits for originals and permutations.

Usage

svem_significance_test_parallel(
  formula,
  data,
  mixture_groups = NULL,
  nPoint = 2000,
  nSVEM = 5,
  nPerm = 125,
  percent = 85,
  nBoot = 200,
  glmnet_alpha = c(1),
  weight_scheme = c("SVEM"),
  objective = c("wAIC", "wSSE"),
  verbose = TRUE,
  nCore = parallel::detectCores(),
  seed = NULL,
  ...
)

Arguments

formula

A formula specifying the model to be tested.

data

A data frame containing the variables in the model.

mixture_groups

Optional list describing one or more mixture factor groups. Each element of the list should be a list with components 'vars' (character vector of column names), 'lower' (numeric vector of lower bounds of the same length as 'vars'), 'upper' (numeric vector of upper bounds of the same length), and 'total' (scalar specifying the sum of the mixture variables). All mixture variables must be included in 'vars', and no variable can appear in more than one mixture group. Defaults to 'NULL' (no mixtures).

nPoint

Number of random points in the factor space (default: 2000).

nSVEM

Number of SVEM fits on the original data (default: 5).

nPerm

Number of SVEM fits on permuted responses for the reference distribution (default: 125).

percent

Percentage of variance to capture in the SVD (default: 85).

nBoot

Number of bootstrap iterations within each SVEM fit (default: 200).

glmnet_alpha

The alpha parameter(s) for glmnet (default: 'c(1)').

weight_scheme

Weighting scheme for SVEM (default: "SVEM").

objective

Objective function for SVEM ("wAIC" or "wSSE", default: "wAIC").

verbose

Logical; if 'TRUE', displays progress messages (default: 'TRUE').

nCore

Number of CPU cores for parallel processing (default: all available cores).

seed

Optional integer seed for reproducible parallel RNG (default: NULL).

...

Additional arguments passed to 'SVEMnet()' and then to 'glmnet()'.

Details

Identical to svem_significance_test() but runs the expensive SVEM refits in parallel using foreach + doParallel. Random draws (including permutations) use RNGkind("L'Ecuyer-CMRG") for parallel-suitable streams.
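The L'Ecuyer-CMRG generator is suited to parallel work because it provides independent, reproducible streams that can be handed to separate tasks. The dependency-free sketch below shows that idea using only the base parallel package. It does not start a cluster and is not the package's foreach code; permute_with_stream is a hypothetical helper.

```r
RNGkind("L'Ecuyer-CMRG")
set.seed(2024)

# Derive one RNG stream per task from the seeded generator
n_tasks <- 3
streams <- vector("list", n_tasks)
s <- .Random.seed
for (i in seq_len(n_tasks)) {
  streams[[i]] <- s
  s <- parallel::nextRNGStream(s)   # jump ahead to the next independent stream
}

# A task (for example, one permutation of the response) run on its own stream
permute_with_stream <- function(stream, y) {
  assign(".Random.seed", stream, envir = .GlobalEnv)
  sample(y)
}

y <- 1:10
perm_a <- permute_with_stream(streams[[2]], y)
perm_b <- permute_with_stream(streams[[2]], y)   # same stream, same permutation
identical(perm_a, perm_b)
```

Assigning each task a fixed stream makes the result independent of which worker runs it and in what order.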

Value

A list of class 'svem_significance_test' containing the test results.

See Also

svem_significance_test, svem_significance_test_parallel