SVEMnet Vignette

Andrew T. Karl

August 18, 2025

Version

version 1.4.0

Summary

SVEMnet implements Self-Validated Ensemble Models (SVEM; Lemkus et al. 2021) and the SVEM whole model test (Karl 2024) using Elastic Net regression via the glmnet package (Friedman et al. 2010). This vignette provides an overview of the package’s functionality and usage.

Preface - Note from the author

The motivation to create the SVEMnet package was primarily to have a personal sandbox to explore SVEM performance in different scenarios and with various modifications to its structure. As noted in the documentation, I used GPT o1-preview to help form the code structure of the package and to write the Roxygen structure of the documentation, and I have subsequently used more recent versions for auditing. The SVEM significance test R code comes from the supplementary material of Karl (2024). I wrote that code by hand and validated each step (excluding the creation of the SVEM predictions) against corresponding results in JMP (the supplementary material of Karl (2024) provides the matching JSL script). For the SVEMnet() code, assuming only a single value of alpha is passed to glmnet, the heart of the SVEM loop is simply:

#partial code for illustration of the SVEM loop
coef_matrix <- matrix(NA, nrow = nBoot, ncol = p + 1)
for (i in 1:nBoot) {
  # anti-correlated fractional (exponential) weights for train/validation
  U <- runif(n)
  w_train <- -log(U)
  w_valid <- -log(1 - U)
  # match glmnet normalization of the training weight vector
  w_train <- w_train * (n / sum(w_train))
  w_valid <- w_valid * (n / sum(w_valid))
  fit <- glmnet(
    X, y_numeric,
    alpha = alpha,
    weights = w_train,
    intercept = TRUE,
    standardize = standardize,
    maxit = 1e6,
    nlambda = 500
  )
  pred_valid <- predict(fit, newx = X)
  # validation-weighted SSE for each lambda on the path
  val_errors <- colSums(w_valid * (y_numeric - pred_valid)^2)
  k_values <- fit$df
  n_obs <- length(y_numeric)
  aic_values <- n_obs * log(val_errors / n_obs) + 2 * k_values
  # choose lambda by the selected objective
  if (objective == "wSSE") {
    idx_min <- which.min(val_errors)
    lambda_opt <- fit$lambda[idx_min]
    val_error <- val_errors[idx_min]
  } else if (objective == "wAIC") {
    idx_min <- which.min(aic_values)
    lambda_opt <- fit$lambda[idx_min]
    val_error <- aic_values[idx_min]
  }
  coef_matrix[i, ] <- as.vector(coef(fit, s = lambda_opt))
}
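
As a quick check on the weighting scheme (illustrative only, not package code): −log(U) and −log(1 − U) are each standard exponential draws, but they are anti-correlated through the shared uniform U, which is what forces each bootstrap’s training and validation weights to disagree.

# Sanity check of the SVEM fractional-weight scheme (illustrative)
set.seed(1)
U <- runif(1e5)
c(mean(-log(U)), mean(-log(1 - U)))  # both means are approximately 1
cor(-log(U), -log(1 - U))            # strongly negative correlation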

However, to turn this into a stable implementation with error and warning handling and the structure needed for S3 methods such as predict(), coef(), and plot(), it was only practical for me to use help from GPT o1-preview. I simply would not have taken the time to add that structure otherwise, and my implementation would have been inferior. I reviewed all of the code generated by this tool before integrating it and corrected its occasional mistakes. If someone would like to create a purely human-written set of code for a similar purpose, let me know and I will be happy to add links to your package and a description to the SVEMnet documentation.

Later revisions make use of more recent versions of GPT for code auditing, stress testing, and simulation. Many of the later entries in this vignette were written with GPT (code, analysis, summary). I will defend this by noting that the material is not being submitted for publication; it is presented to document various validation attempts for deviating from the published SVEM SSE objective and as a form of hypothesis generation for any future researchers who might be interested in working with SVEM.

SVEMnet Example 1

library(SVEMnet)

# Example data
data <- iris
svem_model <- SVEMnet(Sepal.Length ~ ., data = data, nBoot = 300)
coef(svem_model)
##                   Percent of Bootstraps Nonzero
## Sepal.Width                           100.00000
## Petal.Length                          100.00000
## Speciesvirginica                       94.66667
## Petal.Width                            92.00000
## Speciesversicolor                      92.00000

Generate a plot of actual versus predicted values:

plot(svem_model)

Predict outcomes for new data using the predict() function:

predictions <- predict(svem_model, data)
print(predictions)
##        1        2        3        4        5        6        7        8 
## 5.004969 4.746061 4.775963 4.871504 5.056751 5.381124 4.927210 5.026849 
##        9       10       11       12       13       14       15       16 
## 4.694279 4.897482 5.182194 5.100510 4.772039 4.551054 5.116554 5.492709 
##       17       18       19       20       21       22       23       24 
## 5.086478 4.978991 5.355321 5.207998 5.174172 5.130238 4.762104 5.044457 
##       25       26       27       28       29       30       31       32 
## 5.321495 4.893384 5.048555 5.078631 4.953187 4.996947 4.945166 4.974893 
##       33       34       35       36       37       38       39       40 
## 5.415298 5.367440 4.871504 4.702301 4.931307 5.082728 4.672400 5.026849 
##       41       42       43       44       45       46       47       48 
## 4.905330 4.283951 4.775963 5.048381 5.476666 4.720083 5.307637 4.849624 
##       49       50       51       52       53       54       55       56 
## 5.182194 4.901406 6.466747 6.293446 6.536311 5.511060 6.159981 6.138275 
##       57       58       59       60       61       62       63       64 
## 6.466573 5.125143 6.263718 5.618547 5.065340 5.968898 5.537211 6.311402 
##       65       66       67       68       69       70       71       72 
## 5.527103 6.193981 6.189883 5.869781 5.775630 5.592917 6.436497 5.769967 
##       73       74       75       76       77       78       79       80 
## 6.225621 6.311576 6.042734 6.142199 6.333282 6.506235 6.138101 5.376030 
##       81       82       83       84       85       86       87       88 
## 5.467473 5.419790 5.670502 6.450530 6.189883 6.371031 6.388988 5.805706 
##       89       90       91       92       93       94       95       96 
## 5.947192 5.614623 5.987028 6.289522 5.692382 5.073362 5.865509 6.046831 
##       97       98       99      100      101      102      103      104 
## 5.969072 6.042734 4.929963 5.843629 6.966991 6.149214 6.841896 6.647063 
##      105      106      107      108      109      110      111      112 
## 6.742256 7.357526 5.655637 7.162693 6.587259 7.195997 6.382144 6.296537 
##      113      114      115      116      117      118      119      120 
## 6.547250 5.946012 6.071107 6.451534 6.625183 7.819463 7.319429 5.920556 
##      121      122      123      124      125      126      127      128 
## 6.746180 6.027695 7.353603 6.027869 6.849917 7.097054 6.005989 6.183213 
##      129      130      131      132      133      134      135      136 
## 6.517348 6.898123 6.937611 7.650434 6.491370 6.304907 6.595629 6.937263 
##      137      138      139      140      141      142      143      144 
## 6.750104 6.676964 6.109552 6.525370 6.594760 6.252430 6.149214 6.893503 
##      145      146      147      148      149      150 
## 6.746007 6.274310 5.971989 6.352243 6.628759 6.330537

Whole Model Significance Testing

This is the serial version of the significance test. It is slower, but its code is easier to read than that of the faster parallel version.

test_result <- svem_significance_test(Sepal.Length ~ ., data = data)
print(test_result)
plot(test_result)
SVEM Significance Test p-value:
[1] 0
Whole model test result

Note that there is a parallelized version that runs much faster:

test_result <- svem_significance_test_parallel(Sepal.Length ~ ., data = data)
print(test_result)
plot(test_result)
SVEM Significance Test p-value:
[1] 0

SVEMnet Example 2

# Simulate data
set.seed(1)
n <- 25
X1 <- runif(n)
X2 <- runif(n)
X3 <- runif(n)
X4 <- runif(n)
X5 <- runif(n)

#y only depends on X1 and X2
y <- 1 + X1 +  X2 + X1 * X2 + X1^2 + rnorm(n)
data <- data.frame(y, X1, X2, X3, X4, X5)

# Perform the SVEM significance test
test_result <- svem_significance_test_parallel(
  y ~ (X1 + X2 + X3)^2 + I(X1^2) + I(X2^2) + I(X3^2),
  data = data
)

# View the p-value
print(test_result)
SVEM Significance Test p-value:
[1] 0.009399093


test_result2 <- svem_significance_test_parallel(
  y ~ (X1 + X2)^2 + I(X1^2) + I(X2^2),
  data = data
)

# View the p-value
print(test_result2)
SVEM Significance Test p-value:
[1] 0.006475736

#note that the response does not depend on X4 or X5
test_result3 <- svem_significance_test_parallel(
  y ~ (X4 + X5)^2 + I(X4^2) + I(X5^2),
  data = data
)

# View the p-value
print(test_result3)
SVEM Significance Test p-value:
[1] 0.8968502

# Plot the Mahalanobis distances
plot(test_result, test_result2, test_result3)
Whole Model Test Results for Example 2

21DEC2024: Add cv.glmnet wrapper

Added glmnet_with_cv(), a wrapper for cv.glmnet(), to compare the performance of SVEM against glmnet’s native cross-validation implementation.
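
A short usage sketch (the argument names mirror the simulation code later in this vignette; glmnet_with_cv() and predict_cv() are the wrapper’s fitting and prediction functions):

# Illustrative use of the CV wrapper on the Example 1 data
cv_fit <- glmnet_with_cv(
  Sepal.Length ~ ., data = iris,
  glmnet_alpha = c(0, 0.5, 1),
  standardize = TRUE
)
cv_pred <- predict_cv(cv_fit, newdata = iris, debias = FALSE)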

14 AUG 2025 — Simulation highlights (fixed‑settings study)

We ran focused simulations to examine when debiasing and objective choice help, under standardize = TRUE, glmnet_alpha = c(1) (lasso only), SVEM weights, and disjoint train/test sets:

## ======================================================================
## SVEMnet boundary sims: wAIC vs wSSE + debias(TRUE/FALSE)
## - standardize = TRUE
## - glmnet_alpha = c(1)
## - No leakage; test is disjoint from train; debias refit check
## - n_train ranges chosen to cross n_train_used vs p_full boundary
## - Outputs:
##   * res ........ per-fit records (you can model this)
##   * A tidy summary with 4 win-rates:
##       winrate_debias_TRUE, winrate_debias_FALSE,
##       winrate_wAIC, winrate_wSSE
##   * Two simple plots (comment out prints if running headless)
## ======================================================================

if (!requireNamespace("SVEMnet", quietly = TRUE)) install.packages("SVEMnet")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(SVEMnet)
library(ggplot2)

## ---------- Logging / Progress ----------
VERBOSE     <- TRUE     # per-fit messages
ECHO_EVERY  <- 1        # print every k-th fit
PROGRESS    <- TRUE     # progress bar

stamp   <- function() format(Sys.time(), "%H:%M:%S")
log_msg <- function(...) if (VERBOSE) {
  cat(sprintf("[%s] %s\n", stamp(), sprintf(...))); flush.console()
}
options(warn = 1)       # surface warnings immediately

set.seed(20250814)

## ------------------ Controls ------------------
REPS      <- 10           # increase to 20+ for more power
NBOOT     <- 200
R2_GRID   <- c(0.30, 0.70, 0.90)
RHO_GRID  <- c(0.2, 0.6, 0.9)
P_GRID    <- c(4, 8, 12)
NTEST     <- 1000
MODELS    <- c( "main_plus_int", "full_quadratic")

## ------------------ Helpers -------------------
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

safe_predict <- function(fit, newdata, debias=FALSE) {
  out <- try(predict(fit, newdata=newdata, debias=debias), silent=TRUE)
  if (inherits(out, "try-error")) return(rep(NA_real_, nrow(newdata)))
  as.numeric(out)
}

## Count p_full (columns in model.matrix minus intercept) WITHOUT response 'y'
count_p_full <- function(fm, p) {
  rhs_fml <- stats::reformulate(
    attr(stats::delete.response(stats::terms(fm)), "term.labels")
  )
  tmp <- as.data.frame(matrix(rnorm(4 * p), nrow = 4))
  names(tmp) <- paste0("X", seq_len(p))
  mm <- model.matrix(rhs_fml, data = tmp)
  sum(colnames(mm) != "(Intercept)")
}

verify_no_leakage <- function(fit, train_df_used, test_df) {
  ok_disjoint <- !any(rownames(train_df_used) %in% rownames(test_df))
  raw_pred_train <- safe_predict(fit, train_df_used, debias=FALSE)
  y_train <- train_df_used$y
  ref <- lm(y_train ~ raw_pred_train)
  co_ref <- coef(ref)
  if (is.null(fit$debias_fit)) {
    return(list(ok=FALSE, ok_disjoint=ok_disjoint, coef_close=FALSE))
  }
  co_pkg <- coef(fit$debias_fit)
  same_len <- length(co_pkg) == length(co_ref)
  coef_close <- same_len && all(abs(co_pkg - co_ref) < 1e-8)
  list(ok = (ok_disjoint && coef_close), ok_disjoint=ok_disjoint, coef_close=coef_close)
}

build_formulas <- function(p) {
  vars <- paste0("X", seq_len(p))
  main_terms <- paste(vars, collapse=" + ")
  int_terms  <- paste0("(", main_terms, ")^2")
  sq_terms   <- paste(sprintf("I(%s^2)", vars), collapse=" + ")
  list(
    main_effects   = as.formula(paste0("y ~ ", main_terms)),
    main_plus_int  = as.formula(paste0("y ~ ", int_terms)),
    full_quadratic = as.formula(paste0("y ~ ", int_terms, " + ", sq_terms))
  )
}

## Sparse truth
truth_for_p <- function(p) {
  list(
    active_main  = seq_len(min(3, p)),
    active_pairs = if (p >= 3) rbind(c(1,2), c(2,3)) else matrix(numeric(), ncol=2),
    active_sq    = if (p >= 2) 2L else integer(0)
  )
}

## Equal scales; AR-ish correlation
gen_X <- function(n, p, rho=0.5) {
  Z <- matrix(rnorm(n*p), n, p)
  if (p >= 2 && abs(rho) > 0) {
    for (j in 2:p) {
      Z[, j] <- rho * scale(Z[, j-1], TRUE, TRUE) + sqrt(1-rho^2) * scale(Z[, j], TRUE, TRUE)
    }
  }
  X <- Z
  colnames(X) <- paste0("X", seq_len(p))
  as.data.frame(X)
}

## Data with targeted in-sample R^2
make_sparse_data_R2 <- function(n, p, rho, target_R2,
                                active_main, active_pairs, active_sq,
                                coef_main=2, coef_int=1.5, coef_sq=1.0, seed=NULL) {
  if (!is.null(seed)) set.seed(seed)
  X <- gen_X(n, p, rho)
  mu <- 1
  y_sig <- rep(mu, n)
  for (j in active_main) y_sig <- y_sig + coef_main * X[[paste0("X", j)]]
  if (length(active_sq)) for (j in active_sq) y_sig <- y_sig + coef_sq * (X[[paste0("X", j)]]^2)
  if (nrow(active_pairs) > 0) {
    for (k in seq_len(nrow(active_pairs))) {
      i <- active_pairs[k,1]; j <- active_pairs[k,2]
      y_sig <- y_sig + coef_int * (X[[paste0("X", i)]] * X[[paste0("X", j)]])
    }
  }
  vs <- var(y_sig); sd_sig <- sqrt(max(vs, .Machine$double.eps))
  sd_eps <- sd_sig * sqrt((1 - target_R2) / target_R2)
  y <- y_sig + rnorm(n, 0, sd_eps)
  df <- cbind.data.frame(y=y, X)
  df$y_signal <- y_sig
  rownames(df) <- sprintf("row%06d", seq_len(n))
  df
}

## ------------------ RUN ------------------
rows <- list(); rid <- 1
leaks <- list()
TOTAL_FITS <- sum(sapply(P_GRID, function(p) length(MODELS)*length(RHO_GRID)*length(R2_GRID)*REPS))
fit_counter <- 0L
if (PROGRESS) pb <- txtProgressBar(min = 0, max = TOTAL_FITS, style = 3)

for (p in P_GRID) {
  fms <- build_formulas(p)
  thr <- truth_for_p(p)

  for (model in MODELS) {
    fm <- fms[[model]]
    p_full <- count_p_full(fm, p)

    ## choose n_train to cross the boundary; clamp [10, 80]
    n_grid_raw <- round(c(0.5, 0.75, 1.0, 1.25, 1.75, 2.5) * p_full)
    n_grid <- sort(unique(pmin(pmax(n_grid_raw, 10), 80)))

    for (rho in RHO_GRID) {
      for (R2_tgt in R2_GRID) {
        for (rep in seq_len(REPS)) {
          n_tr <- sample(n_grid, 1L)
          n_te <- NTEST

          df <- make_sparse_data_R2(
            n = n_tr + n_te, p = p, rho = rho, target_R2 = R2_tgt,
            active_main  = thr$active_main,
            active_pairs = thr$active_pairs,
            active_sq    = thr$active_sq,
            seed = 1000 + 7*rep + p + round(100*R2_tgt) + round(100*rho)
          )

          idx <- sample(seq_len(nrow(df)), size = n_tr)
          train_df <- df[idx, ]
          test_df  <- df[-idx, ]

          ## PRE-FILTER complete cases for THIS formula (avoid NA crash)
          mf_all <- model.frame(fm, data = train_df, na.action = na.pass)
          keep <- complete.cases(mf_all)
          if (sum(keep) < 2) {
            warning(sprintf("Skipping: n_train=%d but <2 complete cases for %s (p=%d)", n_tr, model, p))
            next
          }
          train_used <- train_df[keep, , drop = FALSE]
          n_used <- nrow(train_used)

          ## realized true R^2
          r2_true <- function(d) var(d$y_signal)/var(d$y)
          R2_train_true <- r2_true(train_used)
          R2_test_true  <- r2_true(test_df)

          fit_counter <- fit_counter + 1L
          if (PROGRESS) setTxtProgressBar(pb, fit_counter)
          if (fit_counter %% max(1L, ECHO_EVERY) == 0L) {
            log_msg("Start #%d | p=%d (p_full=%d) | model=%s | rho=%.2f | R2=%.2f | n_train=%d (used=%d) | ratio=%.2f",
                    fit_counter, p, p_full, model, rho, R2_tgt, n_tr, n_used, n_used/p_full)
          }

          ## ---- Fit wAIC ----
          t0 <- proc.time()[3]
          fit_waic <- SVEMnet(
            formula = fm, data = train_used,
            nBoot = NBOOT,
            glmnet_alpha = c(1),
            weight_scheme = "SVEM",
            objective = "wAIC",
            standardize = TRUE
          )
          t1 <- proc.time()[3]

          ## ---- Fit wSSE ----
          t2 <- proc.time()[3]
          fit_wsse <- SVEMnet(
            formula = fm, data = train_used,
            nBoot = NBOOT,
            glmnet_alpha = c(1),
            weight_scheme = "SVEM",
            objective = "wSSE",
            standardize = TRUE
          )
          t3 <- proc.time()[3]

          ## Predictions
          pr_tr_raw_waic <- safe_predict(fit_waic, train_used, debias = FALSE)
          pr_te_raw_waic <- safe_predict(fit_waic, test_df,   debias = FALSE)
          pr_tr_deb_waic <- safe_predict(fit_waic, train_used, debias = TRUE)
          pr_te_deb_waic <- safe_predict(fit_waic, test_df,    debias = TRUE)

          pr_tr_raw_wsse <- safe_predict(fit_wsse, train_used, debias = FALSE)
          pr_te_raw_wsse <- safe_predict(fit_wsse, test_df,   debias = FALSE)
          pr_tr_deb_wsse <- safe_predict(fit_wsse, train_used, debias = TRUE)
          pr_te_deb_wsse <- safe_predict(fit_wsse, test_df,    debias = TRUE)

          ## RMSEs
          rmse_tr_raw_waic <- rmse(train_used$y, pr_tr_raw_waic)
          rmse_te_raw_waic <- rmse(test_df$y,    pr_te_raw_waic)
          rmse_tr_deb_waic <- rmse(train_used$y, pr_tr_deb_waic)
          rmse_te_deb_waic <- rmse(test_df$y,    pr_te_deb_waic)

          rmse_tr_raw_wsse <- rmse(train_used$y, pr_tr_raw_wsse)
          rmse_te_raw_wsse <- rmse(test_df$y,    pr_te_raw_wsse)
          rmse_tr_deb_wsse <- rmse(train_used$y, pr_tr_deb_wsse)
          rmse_te_deb_wsse <- rmse(test_df$y,    pr_te_deb_wsse)

          log_msg("Done  #%d | AIC raw=%.3f deb=%.3f | SSE raw=%.3f deb=%.3f | Δ_deb(raw): AIC=%.3f SSE=%.3f | %.1fs+%.1fs",
                  fit_counter, rmse_te_raw_waic, rmse_te_deb_waic,
                  rmse_te_raw_wsse, rmse_te_deb_wsse,
                  rmse_te_deb_waic - rmse_te_raw_waic,
                  rmse_te_deb_wsse - rmse_te_raw_wsse,
                  (t1 - t0), (t3 - t2))

          ## Save row
          rows[[rid]] <- data.frame(
            p = p, p_full = p_full, model = model, rho = rho, R2_target = R2_tgt,
            n_train = n_tr, n_train_used = n_used, n_test = n_te,
            ratio_n_over_p = n_used / p_full,
            above_boundary = as.integer(n_used > p_full),
            R2_true_train = R2_train_true, R2_true_test = R2_test_true,
            rmse_test_raw_waic = rmse_te_raw_waic,
            rmse_test_deb_waic = rmse_te_deb_waic,
            rmse_test_raw_wsse = rmse_te_raw_wsse,
            rmse_test_deb_wsse = rmse_te_deb_wsse,
            rmse_train_raw_waic = rmse_tr_raw_waic,
            rmse_train_deb_waic = rmse_tr_deb_waic,
            rmse_train_raw_wsse = rmse_tr_raw_wsse,
            rmse_train_deb_wsse = rmse_tr_deb_wsse,
            time_sec_waic = round(t1 - t0, 2),
            time_sec_wsse = round(t3 - t2, 2),
            stringsAsFactors = FALSE
          )
          rid <- rid + 1

          ## light leakage check
          if (rep <= 2) {
            chk <- verify_no_leakage(fit_waic, train_used, test_df)
            leaks[[length(leaks) + 1]] <- data.frame(
              p = p, model = model, rho = rho, R2_target = R2_tgt, n_train = n_tr, n_train_used = n_used,
              ok = chk$ok, ok_disjoint = chk$ok_disjoint, coef_close = chk$coef_close
            )
          }
        }
      }
    }
  }
}

if (!length(rows)) stop("No successful fits were recorded. Check logs/warnings above.")
res <- do.call(rbind, rows)
leak_df <- if (length(leaks)) do.call(rbind, leaks) else NULL

## Ratio bins for readability on plots/tables
res$ratio_bin <- cut(
  res$ratio_n_over_p,
  breaks = c(-Inf, 0.75, 1.0, 1.25, 1.75, Inf),
  labels = c("<0.75", "0.75–1.0", "1.0–1.25", "1.25–1.75", ">1.75"),
  right = TRUE
)

## ------------------ SUMMARY (with 4 win-rates) ------------------

## CI helper
ci95 <- function(x) {
  x <- x[is.finite(x)]
  n <- length(x); if (!n) return(c(mean=NA, lo=NA, hi=NA, n=0))
  m <- mean(x); se <- sd(x)/sqrt(n)
  c(mean=m, lo=m-1.96*se, hi=m+1.96*se, n=n)
}

## Grouping key
grp_vars <- c("model","R2_target","ratio_bin","above_boundary")

## Build summary per group using by() to avoid unpack glitches
summ <- do.call(rbind, by(res, INDICES = res[grp_vars], FUN = function(d){
  ## win indicators (row-wise)
  deb_win_waic <- as.numeric(d$rmse_test_deb_waic < d$rmse_test_raw_waic)
  deb_win_wsse <- as.numeric(d$rmse_test_deb_wsse < d$rmse_test_raw_wsse)
  deb_lose_waic <- as.numeric(d$rmse_test_deb_waic > d$rmse_test_raw_waic)
  deb_lose_wsse <- as.numeric(d$rmse_test_deb_wsse > d$rmse_test_raw_wsse)

  waic_win_raw <- as.numeric(d$rmse_test_raw_waic < d$rmse_test_raw_wsse)
  waic_win_deb <- as.numeric(d$rmse_test_deb_waic < d$rmse_test_deb_wsse)
  wsse_win_raw <- as.numeric(d$rmse_test_raw_wsse < d$rmse_test_raw_waic)
  wsse_win_deb <- as.numeric(d$rmse_test_deb_wsse < d$rmse_test_deb_waic)

  ## pooled winrates (redundant on purpose)
  winrate_debias_TRUE  <- mean(c(deb_win_waic,  deb_win_wsse),  na.rm = TRUE)
  winrate_debias_FALSE <- mean(c(deb_lose_waic, deb_lose_wsse), na.rm = TRUE)
  winrate_wAIC         <- mean(c(waic_win_raw,  waic_win_deb),  na.rm = TRUE)
  winrate_wSSE         <- mean(c(wsse_win_raw,  wsse_win_deb),  na.rm = TRUE)

  ## RMSE means + CI
  a_raw <- ci95(d$rmse_test_raw_waic)
  a_deb <- ci95(d$rmse_test_deb_waic)
  s_raw <- ci95(d$rmse_test_raw_wsse)
  s_deb <- ci95(d$rmse_test_deb_wsse)

  ## debias deltas (for context)
  dA <- ci95(d$rmse_test_deb_waic - d$rmse_test_raw_waic)
  dS <- ci95(d$rmse_test_deb_wsse - d$rmse_test_raw_wsse)

  ## realized true R2
  rtr <- ci95(d$R2_true_train); rte <- ci95(d$R2_true_test)

  ## timings
  tt  <- ci95(rowMeans(cbind(d$time_sec_waic, d$time_sec_wsse), na.rm = TRUE))

  data.frame(
    model = d$model[1],
    R2_target = d$R2_target[1],
    ratio_bin = d$ratio_bin[1],
    above_boundary = d$above_boundary[1],

    R2_true_train_mean = rtr["mean"], R2_true_test_mean = rte["mean"],

    rmse_raw_waic_mean = a_raw["mean"],  rmse_raw_waic_ci95_lo = a_raw["lo"],  rmse_raw_waic_ci95_hi = a_raw["hi"],
    rmse_deb_waic_mean = a_deb["mean"],  rmse_deb_waic_ci95_lo = a_deb["lo"],  rmse_deb_waic_ci95_hi = a_deb["hi"],
    rmse_raw_wsse_mean = s_raw["mean"],  rmse_raw_wsse_ci95_lo = s_raw["lo"],  rmse_raw_wsse_ci95_hi = s_raw["hi"],
    rmse_deb_wsse_mean = s_deb["mean"],  rmse_deb_wsse_ci95_lo = s_deb["lo"],  rmse_deb_wsse_ci95_hi = s_deb["hi"],

    ## the four requested winrates
    winrate_debias_TRUE  = winrate_debias_TRUE,
    winrate_debias_FALSE = winrate_debias_FALSE,
    winrate_wAIC         = winrate_wAIC,
    winrate_wSSE         = winrate_wSSE,

    ## debias deltas by objective (context)
    deb_raw_diff_AIC_mean = dA["mean"], deb_raw_diff_AIC_ci95_lo = dA["lo"], deb_raw_diff_AIC_ci95_hi = dA["hi"],
    deb_raw_diff_SSE_mean = dS["mean"], deb_raw_diff_SSE_ci95_lo = dS["lo"], deb_raw_diff_SSE_ci95_hi = dS["hi"],

    mean_time = tt["mean"],
    row.names = NULL,
    check.names = FALSE
  )
}))
rownames(summ) <- NULL
summ <- summ[order(summ$model, summ$R2_target, summ$ratio_bin, summ$above_boundary), ]

## Leakage summary
cat("\n---- Leakage summary (should be ~100%) ----\n")
if (!is.null(leak_df)) {
  print(aggregate(cbind(ok, ok_disjoint, coef_close) ~ model, data = leak_df,
                  FUN = function(x) mean(x, na.rm=TRUE)), row.names = FALSE)
} else {
  cat("No leakage samples collected.\n")
}

## Print compact summary columns
ord <- c("model","R2_target","ratio_bin","above_boundary",
         "R2_true_train_mean","R2_true_test_mean",
         "rmse_raw_waic_mean","rmse_deb_waic_mean",
         "rmse_raw_wsse_mean","rmse_deb_wsse_mean",
         "winrate_debias_TRUE","winrate_debias_FALSE",
         "winrate_wAIC","winrate_wSSE",
         "deb_raw_diff_AIC_mean","deb_raw_diff_AIC_ci95_lo","deb_raw_diff_AIC_ci95_hi",
         "deb_raw_diff_SSE_mean","deb_raw_diff_SSE_ci95_lo","deb_raw_diff_SSE_ci95_hi",
         "mean_time")
cat("\n================ SUMMARY (with 4 win-rates) ================\n")
print(summ[, ord], row.names = FALSE, digits = 4)

## ------------------ Plots (optional) ------------------

## Debias − Raw (negative helps) for each objective
plot_df <- within(res, {
  deb_raw_AIC  <- rmse_test_deb_waic - rmse_test_raw_waic
  deb_raw_SSE  <- rmse_test_deb_wsse - rmse_test_raw_wsse
})
ggA <- ggplot(plot_df, aes(ratio_n_over_p, deb_raw_AIC)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_point(alpha = 0.2) +
  geom_smooth(se = TRUE, method = "loess", span = 0.7) +
  facet_grid(R2_target ~ model) +
  labs(x = expression(paste(n[train[used]], " / ", p[full])),
       y = "Debias − Raw (TEST RMSE) under wAIC",
       title = "When does debiasing help? (wAIC)") +
  theme_bw()

ggS <- ggplot(plot_df, aes(ratio_n_over_p, deb_raw_SSE)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_point(alpha = 0.2) +
  geom_smooth(se = TRUE, method = "loess", span = 0.7) +
  facet_grid(R2_target ~ model) +
  labs(x = expression(paste(n[train[used]], " / ", p[full])),
       y = "Debias − Raw (TEST RMSE) under wSSE",
       title = "When does debiasing help? (wSSE)") +
  theme_bw()

print(ggA)
print(ggS)

## ------------------ Export (for your own modeling) ------------------
keep_cols <- c("p","p_full","model","rho","R2_target",
               "n_train","n_train_used","n_test",
               "ratio_n_over_p","above_boundary",
               "R2_true_train","R2_true_test",
               "rmse_test_raw_waic","rmse_test_deb_waic",
               "rmse_test_raw_wsse","rmse_test_deb_wsse",
               "time_sec_waic","time_sec_wsse")
res_export <- res[, keep_cols]
cat("\nFirst few rows of res_export:\n")
print(utils::head(res_export), row.names = FALSE)

## Write to CSV if you like:
## write.csv(res_export, "res_export.csv", row.names = FALSE)

## ---- Oneway-style plots: block on Row and show ANOM-like needles --------
library(dplyr)
library(tidyr)
library(ggplot2)
library(patchwork)

# If your results frame is called res_export, just do:
# res <- res_export

# 1) Long format with a Row (block) ID and log(RMSE)
df <- res_export %>%
  mutate(Row = row_number()) %>%                           # block label = run id
  pivot_longer(
    cols = c(rmse_test_deb_waic, rmse_test_deb_wsse,
             rmse_test_raw_waic, rmse_test_raw_wsse),
    names_to  = "settings",
    values_to = "rmse"
  ) %>%
  mutate(
    settings = factor(settings,
                      levels = c("rmse_test_deb_waic","rmse_test_deb_wsse",
                                 "rmse_test_raw_waic","rmse_test_raw_wsse")),
    lmrse    = log(pmax(rmse, .Machine$double.eps))
  )

# 2) Block-centering (remove Row mean, then add back overall mean so the scale is familiar)
overall_mean <- mean(df$lmrse, na.rm = TRUE)
df <- df %>%
  group_by(Row) %>%
  mutate(lmrse_block = lmrse - mean(lmrse, na.rm = TRUE) + overall_mean) %>%
  ungroup()

# 3) Fit two-way fixed effects model to get a pooled MSE (settings + block)
fit <- lm(lmrse_block ~ settings + factor(Row), data = df)
MSE <- summary(fit)$sigma^2
df_resid <- df.residual(fit)

# Effective sample size per setting (usually each setting appears once per Row)
n_i <- as.numeric(table(df$settings))
SE_i <- sqrt(MSE / n_i)            # SE for each setting mean (RCBD-style)
names(SE_i) <- names(table(df$settings))

# 4) Means per setting and an ANOM-like CI
alpha <- 0.05
tcrit <- qt(1 - alpha/2, df = df_resid)

means <- df %>%
  group_by(settings) %>%
  summarise(mean_lmrse = mean(lmrse_block, na.rm = TRUE), .groups = "drop") %>%
  mutate(
    SE   = SE_i[as.character(settings)],
    lo   = mean_lmrse - tcrit * SE,
    hi   = mean_lmrse + tcrit * SE
  )

# Grand mean and a constant decision band
grand <- mean(df$lmrse_block, na.rm = TRUE)
SE_bar <- mean(SE_i)                      # one band for the panel
UDL <- grand + tcrit * SE_bar
LDL <- grand - tcrit * SE_bar
band_df <- data.frame(LDL = LDL, UDL = UDL)

# 5) Panel 1: “Block Centered lmrse” (violin + dots)
p1 <- ggplot(df, aes(settings, lmrse_block)) +
  geom_violin(fill = "grey90", color = "grey40", scale = "width") +
  geom_jitter(width = 0.10, height = 0, alpha = 0.25, size = 1) +
  geom_hline(yintercept = grand, linetype = 2, color = "grey35") +
  labs(title = "Oneway Analysis of lmrse by settings",
       subtitle = "Block: Row (run ID) — block-centered values",
       y = "Block-Centered lmrse", x = "settings") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# 6) Panel 2: “Analysis of Means” (lollipops + CI + decision band)
p2 <- ggplot(means, aes(settings, mean_lmrse)) +
  # decision band
  geom_rect(data = band_df,
            aes(xmin = -Inf, xmax = Inf, ymin = LDL, ymax = UDL),
            inherit.aes = FALSE, alpha = 0.12, fill = "skyblue") +
  geom_hline(yintercept = grand, color = "grey25") +
  # “needles”
  geom_segment(aes(x = settings, xend = settings, y = grand, yend = mean_lmrse),
               linewidth = 0.6, color = "black") +
  # point + CI
  geom_point(size = 3, color = "firebrick") +
  geom_errorbar(aes(ymin = lo, ymax = hi), width = 0.15) +
  annotate("text", x = Inf, y = UDL, hjust = 1.05, vjust = -0.4,
           label = sprintf("UDL=%.4f", UDL), size = 3.3) +
  annotate("text", x = Inf, y = grand, hjust = 1.05, vjust = -0.4,
           label = sprintf("Avg = %.4f", grand), size = 3.3) +
  annotate("text", x = Inf, y = LDL, hjust = 1.05, vjust = 1.2,
           label = sprintf("LDL=%.4f", LDL), size = 3.3) +
  labs(title = "Analysis of Means (ANOM-style)",
       subtitle = expression(paste("95% t-band; pooled MSE from ",
                                   "lm(lmrse_block ~ settings + block)")),
       y = "lmrse", x = "settings") +
  coord_cartesian(ylim = range(c(means$lo, means$hi, LDL, UDL))) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# 7)
(p1 / p2) + plot_annotation(
  title = "Blocked Oneway (Row as Block) for lmrse vs. settings",
  theme = theme(plot.title = element_text(face = "bold"))
)
Simulation-blocked Residual LRMSE for different objective functions and debiasing

15 AUG 2025 — Objective choice and SVEM vs CV (focused simulations)

We ran a larger simulation study spanning mixtures of factor structures (main-effects, interactions, quadratics), signal levels (R² ∈ {0.30, 0.50, 0.70, 0.90}), factor correlations ρ ∈ {−0.9, −0.5, 0, 0.5, 0.9}, sparsity levels (density ∈ {0.05, 0.10, 0.20, 0.30, 0.50, 0.75}), and a range of undersaturated to oversaturated settings via offsets in n_used − p_full. All fits used standardize=TRUE. Both SVEM and cv.glmnet searched α ∈ {0, 0.5, 1}. SVEM used nBoot=200 and no debiasing.

Headline findings

Guidance

Notes: All comparisons use disjoint train/test and log-RMSE summaries; both SVEM and CV searched α ∈ {0, 0.5, 1} with standardize=TRUE. Debiasing was off because it generally degraded test performance in these settings. Results are representative of the tested grid; users should feel free to focus simulations on their local regimes of interest.

How we compute wAIC/wAICc (and what if n < k?)

For each λ on the glmnet path, we score models on the validation side using SVEM’s fractional weights. Let w_i be the validation weights, normalized so that Σw_i ≈ n. We define the weighted MSE as MSE_w = SSE_w / Σw_i and replace n with an effective validation size n_eff = (Σw_i)^2 / Σw_i^2. The degrees of freedom k come from glmnet’s df plus the intercept. We then form:

wAIC = n_eff * log(MSE_w) + 2k
wAICc = wAIC + 2k(k + 1) / (n_eff − k − 1)

To avoid pathologies we use a conservative k_eff = max(1, min(k, n−2)) and we mask any λ where the correction is undefined: if k ≥ n or k_eff ≥ n_eff − 1, that λ is marked inadmissible (score = +∞) and cannot be selected. So if someone asks, “how do you compute AICc when n < k?”, the answer is: we don’t—those candidates are explicitly excluded under the mask, and we only compare AICc where the correction is valid.
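
A minimal sketch of this scoring logic (illustrative only, not the package internals; the inputs pred_path and df_path are assumed to come from a single glmnet path fit):

# Illustrative wAIC/wAICc scoring along a glmnet path
# Assumed inputs: y = validation response, pred_path = n x L prediction matrix,
# w_valid = fractional validation weights, df_path = glmnet df per lambda
score_path <- function(y, pred_path, w_valid, df_path, objective = "wAICc") {
  sw    <- sum(w_valid)
  n_eff <- sw^2 / sum(w_valid^2)                    # effective validation size
  mse_w <- colSums(w_valid * (y - pred_path)^2) / sw
  k     <- df_path + 1                              # glmnet df plus intercept
  score <- n_eff * log(mse_w) + 2 * k               # wAIC
  if (objective == "wAICc") {
    n     <- length(y)
    k_eff <- pmax(1, pmin(k, n - 2))                # conservative k_eff
    score <- score + 2 * k_eff * (k_eff + 1) / (n_eff - k_eff - 1)
    score[k >= n | k_eff >= n_eff - 1] <- Inf       # mask inadmissible lambdas
  }
  score
}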

15 AUG 2025 — Mixture + categorical factor study (True.CDF)

Design & metric. We simulated mixture factors A–D (sum-to-one) and a two-level categorical factor E. Each method proposes an optimal recipe (A–E), and we score it by True.CDF (closer to 1 is better). We analyze within-block comparisons where a “block” is one simulated dataset (File.Name), so all methods face the same data. Methods: SVEM objectives AIC, AICc, SSE; baselines cv.glmnet (reported as “glmnet”) and JMP’s SVEM-LASSO_w_int.

Debiasing note. JMP’s SVEM-LASSO_w_int performs a post-lasso unpenalized refit (“debiasing”) by design and this cannot be turned off. SVEMnet runs with debias = FALSE (our earlier LRMSE grid showed debiasing often hurt test performance near/under the n_used ≲ p_full boundary). Despite that asymmetry, the True.CDF analysis below shows SVEMnet(AICc) tracks—and slightly edges—JMP on average regret and mean delta.

Regret definition (for a higher-is-better metric like True.CDF). For each block b and method m:

score_{m,b} = True.CDF_{m,b}
best_b = max_m score_{m,b}
regret_{m,b} = best_b − score_{m,b} (≥ 0; 0 if the method is best or tied best)

We report mean regret across blocks. (For lower-is-better metrics like RMSE, either negate the metric or take score − min_m score.)
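
A minimal sketch of this computation, assuming a long data frame scores with (hypothetical) columns block, method, and score:

# Illustrative mean regret per method for a higher-is-better metric
library(dplyr)
regret_tbl <- scores %>%
  group_by(block) %>%
  mutate(regret = max(score, na.rm = TRUE) - score) %>%  # 0 for the block winner
  ungroup() %>%
  group_by(method) %>%
  summarise(mean_regret = mean(regret, na.rm = TRUE), .groups = "drop")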

SVEM objective choice (AIC vs AICc vs SSE). Across 2,296 paired blocks:

• Winner counts (SVEM-only): AICc = 1155, AIC = 659, SSE = 482.

• Paired True.CDF (means/medians):

Method Mean True.CDF Median True.CDF
AIC 0.8375 0.9854
AICc 0.8694 0.9901
SSE 0.8170 0.9719

• Paired deltas (positive favors the first method):

– AICc − AIC: mean +0.03195, median ≈ 0, [10%, 90%] = [−0.1789, +0.3566]
– AICc − SSE: mean +0.05244, median ≈ 0, [10%, 90%] = [−0.1944, +0.4683]
– AIC − SSE: mean +0.02050, median ≈ 0, [10%, 90%] = [−0.00261, +0.02687]

• Mean regret vs best SVEM objective (lower is better):

Always pick… Mean regret
AIC 0.0908
AICc 0.0589
SSE 0.1110

• By n_total × R² (means + AICc win rates): AICc tends to dominate AIC and SSE more clearly as R² rises and/or n_total increases. Examples:

n_total R² mean(AIC) mean(AICc) mean(SSE) WR[AICc>AIC] WR[AICc>SSE]
16 0.3 0.728 0.801 0.719 0.572 0.591
26 0.3 0.787 0.883 0.717 0.610 0.697
26 0.7 0.939 0.976 0.896 0.655 0.721

SVEM AICc vs cv.glmnet. Overall, cv.glmnet remains a strong, fast baseline; AICc is close and sometimes wins in tighter/low-signal strata.

• Win rate: P(AICc > CV) = 0.453.

• Paired delta: mean(AICc − CV) = −0.00443, median = 0.

• Mean regret: Always-CV 0.0491 vs Always-AICc 0.0536.

• Stratified highlights: AICc slightly beats CV at n_total = 16, R² ∈ {0.3, 0.5} (means: 0.801 vs 0.781; 0.851 vs 0.843), breaks even at n_total = 26, R² ≈ 0.5–0.7, and trails CV in many high-R² bins.

SVEM AICc vs JMP’s SVEM-LASSO_w_int (debiasing ON). AICc closely tracks JMP and is modestly better on average.

• Win rate: P(AICc > JMP) = 0.497 (essentially a tie).

• Paired delta: mean(AICc − JMP) = +0.0111, median = 0.

• Mean regret: Always-JMP 0.0668 vs Always-AICc 0.0556 (AICc smaller).

League table (all blocks):

Metric AIC AICc SSE CV JMP
Mean True.CDF 0.8375 0.8694 0.8170 0.8739 0.8583
Median True.CDF 0.9854 0.9901 0.9719 0.9907 0.9898
N blocks (non-NA) 2296 2296 2296 2296 2296

Takeaways for SVEMnet.

• Default objective: use objective = "wAICc". It wins the most blocks among SVEM objectives, has the best average/median True.CDF, and the lowest regret. SSE is not recommended as a default in this setting.

• Validation vs baselines: AICc is very close to cv.glmnet on this metric (slightly behind on average) and slightly ahead of JMP’s debiased SVEM-LASSO_w_int on mean regret and mean delta.

• When to prefer CV: if you want the best average across all scenarios and speed, cv.glmnet remains a pragmatic choice. If you want SVEM’s ensembling and more robust tails, use AICc inside SVEM.

Relation to our earlier LRMSE study. The earlier grid (evaluated on test RMSE / LRMSE) also found AICc ahead of both AIC and wSSE inside SVEM, with cv.glmnet slightly ahead of AICc on average (GM ratio AICc/CV ≈ 1.019). This True.CDF study, implemented in a different simulation environment and targeting a different metric, confirms the same hierarchy: inside SVEM, AICc is the most reliable objective; cv.glmnet edges AICc on average, but by a small margin. The new study additionally verifies that SVEMnet(AICc, no debias) performs on par with JMP’s debiased SVEM-LASSO_w_int, and slightly lowers regret relative to it.

## ============================================================
## SVEM (wAIC / wAICc / wSSE, no debias) vs glmnet_with_cv (defaults)
## Focus: small training sizes; compare test RMSE head-to-head
## Parallelized with future.apply (N_WORKERS workers)
## ============================================================

## ----- Packages -----
if (!requireNamespace("SVEMnet", quietly = TRUE)) install.packages("SVEMnet")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("tidyr", quietly = TRUE)) install.packages("tidyr")
if (!requireNamespace("patchwork", quietly = TRUE)) install.packages("patchwork")
if (!requireNamespace("future.apply", quietly = TRUE)) install.packages("future.apply")
# Optional progress bar:
# if (!requireNamespace("progressr", quietly = TRUE)) install.packages("progressr")

library(SVEMnet)
library(ggplot2)
library(dplyr)
library(tidyr)
library(patchwork)
library(future.apply)
# library(progressr)  # uncomment if you want progress bars

## ----- Controls -----
set.seed(425)
REPS         <- 20
NBOOT        <- 200
R2_GRID      <- c(0.30, 0.50, 0.70, 0.90)
RHO_GRID     <- c(0, -0.5, 0.5, -0.9, 0.9)  # correlation among factors
P_GRID       <- c(3, 5, 7)
MODELS       <- c("main_effects","main_plus_int", "full_quadratic")
NTEST        <- 1000
DENSITY_GRID <- c(.05, .10, .20, .30, .50, .75)  # fraction of non-intercept terms active
N_WORKERS    <- 7

## ----- Helpers -----
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

safe_predict_svem <- function(fit, newdata, agg = "mean") {
  out <- try(predict(fit, newdata = newdata, debias = FALSE, agg = agg), silent = TRUE)
  if (inherits(out, "try-error")) return(rep(NA_real_, nrow(newdata)))
  as.numeric(out)
}
safe_predict_cv <- function(fitcv, newdata) {
  out <- try(predict_cv(fitcv, newdata = newdata, debias = FALSE), silent = TRUE)
  if (inherits(out, "try-error")) return(rep(NA_real_, nrow(newdata)))
  as.numeric(out)
}

## Count p_full (columns in model matrix – intercept)
count_p_full <- function(fm, p) {
  rhs <- stats::reformulate(attr(stats::delete.response(stats::terms(fm)), "term.labels"))
  tmp <- as.data.frame(matrix(rnorm(4 * p), nrow = 4))
  names(tmp) <- paste0("X", seq_len(p))
  mm <- model.matrix(rhs, data = tmp)
  sum(colnames(mm) != "(Intercept)")
}

build_formulas <- function(p) {
  vars <- paste0("X", seq_len(p))
  main_terms <- paste(vars, collapse = " + ")
  int_terms  <- paste0("(", main_terms, ")^2")
  sq_terms   <- paste(sprintf("I(%s^2)", vars), collapse = " + ")
  list(
    main_effects   = as.formula(paste0("y ~ ", main_terms)),
    main_plus_int  = as.formula(paste0("y ~ ", int_terms)),
    full_quadratic = as.formula(paste0("y ~ ", int_terms, " + ", sq_terms))
  )
}

gen_X <- function(n, p, rho = 0.5) {
  Z <- matrix(rnorm(n * p), n, p)
  if (p >= 2 && abs(rho) > 0) {
    for (j in 2:p) {
      Z[, j] <- rho * scale(Z[, j - 1], TRUE, TRUE) +
        sqrt(1 - rho^2) * scale(Z[, j], TRUE, TRUE)
    }
  }
  X <- Z
  colnames(X) <- paste0("X", seq_len(p))
  as.data.frame(X)
}

## Density-driven truth generator using the selected formula fm
make_sparse_data_R2 <- function(n, p, rho, target_R2, fm, density = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  Xdf <- gen_X(n, p, rho); names(Xdf) <- paste0("X", seq_len(p))
  terms_obj <- stats::delete.response(stats::terms(fm, data = transform(Xdf, y = 0)))
  mf  <- model.frame(terms_obj, data = Xdf)
  MM  <- model.matrix(terms_obj, mf)   # n x (M+1) with intercept at column 1

  M <- ncol(MM) - 1L
  if (M < 1L) stop("Formula produces no predictors beyond intercept.")
  n_active   <- max(1L, floor(density * M))
  active_idx <- sample.int(M, n_active, replace = FALSE) + 1L  # skip intercept

  beta <- numeric(M + 1L)
  beta[active_idx] <- rexp(n_active) - rexp(n_active)
  y_signal <- drop(MM %*% beta)

  sd_sig <- sqrt(max(var(y_signal), .Machine$double.eps))
  sd_eps <- sd_sig * sqrt((1 - target_R2) / target_R2)
  y <- y_signal + rnorm(n, 0, sd_eps)

  out <- cbind.data.frame(y = y, Xdf)
  out$y_signal <- y_signal
  rownames(out) <- sprintf("row%06d", seq_len(n))
  out
}

## Convenience wrapper to fit SVEM for a given objective (namespaced; robust)
fit_and_predict_svem <- function(fm, train_used, test_df, objective, nBoot = NBOOT, agg = "mean") {
  t0 <- proc.time()[3]
  mod <- try(SVEMnet::SVEMnet(
    formula = fm, data = train_used,
    nBoot = nBoot,
    glmnet_alpha = c(0,.5,1),
    weight_scheme = "SVEM",
    objective = objective,
    standardize = TRUE
  ), silent = TRUE)
  elapsed <- round(proc.time()[3] - t0, 2)
  if (inherits(mod, "try-error")) {
    preds <- rep(NA_real_, nrow(test_df))
  } else {
    preds <- safe_predict_svem(mod, test_df, agg = agg)
  }
  list(pred = preds, time = elapsed)
}

## ----- Build scenario grid -----
grid <- expand.grid(
  p       = P_GRID,
  model   = MODELS,
  rho     = RHO_GRID,
  R2_tgt  = R2_GRID,
  dens    = DENSITY_GRID,
  rep     = seq_len(REPS),
  stringsAsFactors = FALSE
)

## ----- Parallel plan (N_WORKERS workers) -----
old_plan <- future::plan()
on.exit(future::plan(old_plan), add = TRUE)
future::plan(future::multisession, workers = N_WORKERS)

## Optional progress bar:
# handlers(global = TRUE)
# with_progress({
#   pbar <- progressor(along = seq_len(nrow(grid)))
#   ... call future_lapply with on.exit(pbar()) ...
# })

## ----- One scenario run -----
run_one <- function(row) {
  p      <- row$p
  model  <- row$model
  rho    <- row$rho
  R2_tgt <- row$R2_tgt
  dens   <- row$dens
  repi   <- row$rep

  fms    <- build_formulas(p)
  fm     <- fms[[model]]
  p_full <- count_p_full(fm, p)

  # n around the boundary; clamp to [12, 40]
  n_grid <- sort(unique(pmin(pmax(p_full + c(-12,-8,-6,-4,-2,0,2,4,6,8,12), 12), 40)))

  # per-row deterministic seed (so sampling is reproducible too)
  seed_i <- 3000 +
    19 * repi + 100 * p +
    1000 * match(model, MODELS) +
    round(1000 * R2_tgt) +
    round(1000 * rho) +
    round(1000 * dens)
  set.seed(seed_i)

  n_tr <- sample(n_grid, 1L)
  n_te <- NTEST

  df <- make_sparse_data_R2(
    n = n_tr + n_te, p = p, rho = rho, target_R2 = R2_tgt,
    fm = fm, density = dens, seed = seed_i + 1L
  )

  set.seed(seed_i + 2L)
  idx      <- sample(seq_len(nrow(df)), size = n_tr)
  train_df <- df[idx, ]
  test_df  <- df[-idx, ]

  keep <- complete.cases(model.frame(fm, data = train_df, na.action = stats::na.pass))
  if (sum(keep) < 2) return(NULL)
  train_used <- train_df[keep, , drop = FALSE]
  n_used     <- nrow(train_used)

  r2_true <- function(d) var(d$y_signal) / var(d$y)
  R2_train_true <- r2_true(train_used)
  R2_test_true  <- r2_true(test_df)

  # SVEM variants (alpha grid c(0, 0.5, 1); no debias)
  sv_waic  <- fit_and_predict_svem(fm, train_used, test_df, "wAIC",  nBoot = NBOOT)
  sv_waicc <- fit_and_predict_svem(fm, train_used, test_df, "wAICc", nBoot = NBOOT)
  sv_wsse  <- fit_and_predict_svem(fm, train_used, test_df, "wSSE",  nBoot = NBOOT)

  # glmnet_with_cv (alpha grid c(0, 0.5, 1); no debias)
  t1 <- proc.time()[3]
  fit_cv <- glmnet_with_cv(
    formula = fm, data = train_used,
    glmnet_alpha = c(0,.5,1),
    standardize = TRUE
  )
  pr_te_cv <- safe_predict_cv(fit_cv, test_df)
  time_cv  <- round(proc.time()[3] - t1, 2)

  data.frame(
    p = p, model = model, rho = rho, R2_target = R2_tgt, density = dens,
    p_full = p_full, n_train = n_tr, n_train_used = n_used, n_test = n_te,
    n_train_minus_p_full = n_tr - p_full,
    ratio_n_over_p = n_used / p_full,
    above_boundary = as.integer(n_used > p_full),
    R2_true_train = R2_train_true, R2_true_test = R2_test_true,
    rmse_test_svem_waic  = rmse(test_df$y, sv_waic$pred),
    rmse_test_svem_waicc = rmse(test_df$y, sv_waicc$pred),
    rmse_test_svem_wsse  = rmse(test_df$y, sv_wsse$pred),
    rmse_test_glmnet_cv  = rmse(test_df$y, pr_te_cv),
    time_sec_svem_waic   = sv_waic$time,
    time_sec_svem_waicc  = sv_waicc$time,
    time_sec_svem_wsse   = sv_wsse$time,
    time_sec_cv          = time_cv,
    stringsAsFactors = FALSE
  )
}

## ----- Run in parallel -----
res_list <- future.apply::future_lapply(
  seq_len(nrow(grid)),
  function(i) {
    # if using progressr:
    # on.exit(pbar(), add = TRUE)
    run_one(grid[i, , drop = FALSE])
  },
  future.seed = TRUE
)

res <- dplyr::bind_rows(Filter(Negate(is.null), res_list))
stopifnot(nrow(res) > 0)

## ---------- Summary & win-rates (robust, single block) ----------
stopifnot(all(c("rmse_test_svem_waic","rmse_test_svem_waicc","rmse_test_svem_wsse","rmse_test_glmnet_cv") %in% names(res)))

# Long format for plotting and summaries
res_long <- res %>%
  mutate(
    Row           = dplyr::row_number(),
    lmrse_waic    = log(pmax(rmse_test_svem_waic,  .Machine$double.eps)),
    lmrse_waicc   = log(pmax(rmse_test_svem_waicc, .Machine$double.eps)),
    lmrse_wsse    = log(pmax(rmse_test_svem_wsse,  .Machine$double.eps)),
    lmrse_cv      = log(pmax(rmse_test_glmnet_cv,  .Machine$double.eps))
  ) %>%
  tidyr::pivot_longer(
    cols = c(lmrse_waic, lmrse_waicc, lmrse_wsse, lmrse_cv),
    names_to = "metric", values_to = "lmrse"
  ) %>%
  mutate(
    settings = dplyr::recode(metric,
                             lmrse_waic  = "SVEM_wAIC",
                             lmrse_waicc = "SVEM_wAICc",
                             lmrse_wsse  = "SVEM_wSSE",
                             lmrse_cv    = "glmnet_cv"),
    settings = factor(settings, levels = c("SVEM_wAIC","SVEM_wAICc","SVEM_wSSE","glmnet_cv"))
  )

# Head-to-head win-rates vs glmnet_cv (paired by row)
win_rates_vs_cv <- res %>%
  summarise(
    SVEM_wAIC  = mean(rmse_test_svem_waic  < rmse_test_glmnet_cv, na.rm = TRUE),
    SVEM_wAICc = mean(rmse_test_svem_waicc < rmse_test_glmnet_cv, na.rm = TRUE),
    SVEM_wSSE  = mean(rmse_test_svem_wsse  < rmse_test_glmnet_cv, na.rm = TRUE)
  ) %>%
  tidyr::pivot_longer(everything(), names_to = "settings", values_to = "winrate_vs_cv")

# Overall summary (log RMSE)
summ <- res_long %>%
  group_by(settings) %>%
  summarise(
    mean_lmrse = mean(lmrse, na.rm = TRUE),
    sd_lmrse   = sd(lmrse,   na.rm = TRUE),
    n          = dplyr::n(),
    se         = sd_lmrse / sqrt(pmax(n, 1)),
    ci_lo      = mean_lmrse - 1.96 * se,
    ci_hi      = mean_lmrse + 1.96 * se,
    .groups    = "drop"
  ) %>%
  left_join(win_rates_vs_cv, by = "settings")

cat("\n================ SUMMARY (log RMSE) ================\n")
print(summ, row.names = FALSE, digits = 4)

## ---------- By-boundary summaries (attach boundary to long) ----------
if ("above_boundary" %in% names(res)) {
  bound_key <- res %>%
    mutate(Row = dplyr::row_number()) %>%
    select(Row, above_boundary)
} else if (all(c("n_train_used","p_full") %in% names(res))) {
  bound_key <- res %>%
    mutate(Row = dplyr::row_number(),
           above_boundary = as.integer(n_train_used > p_full)) %>%
    select(Row, above_boundary)
} else if ("ratio_n_over_p" %in% names(res)) {
  bound_key <- res %>%
    mutate(Row = dplyr::row_number(),
           above_boundary = as.integer(ratio_n_over_p > 1)) %>%
    select(Row, above_boundary)
} else {
  bound_key <- res %>%
    mutate(Row = dplyr::row_number(), above_boundary = NA_integer_) %>%
    select(Row, above_boundary)
}

res_long <- res_long %>% left_join(bound_key, by = "Row")

if ("above_boundary.x" %in% names(res_long) || "above_boundary.y" %in% names(res_long)) {
  res_long <- res_long %>%
    dplyr::mutate(
      above_boundary = dplyr::coalesce(
        if ("above_boundary.y" %in% names(.)) .data$above_boundary.y else NA_integer_,
        if ("above_boundary.x" %in% names(.)) .data$above_boundary.x else NA_integer_
      )
    ) %>%
    dplyr::select(-dplyr::any_of(c("above_boundary.x","above_boundary.y")))
}

# Means of log RMSE by boundary & setting
summ_by_boundary <- res_long %>%
  group_by(above_boundary, settings) %>%
  summarise(
    mean_lmrse = mean(lmrse, na.rm = TRUE),
    sd_lmrse   = sd(lmrse,   na.rm = TRUE),
    n          = dplyr::n(),
    .groups    = "drop"
  )

cat("\n---- Means by boundary (n_train_used > p_full) ----\n")
print(summ_by_boundary, row.names = FALSE, digits = 4)

# Boundary-specific win-rate for each SVEM variant vs glmnet
win_by_boundary <- res %>%
  mutate(
    above_boundary = if ("above_boundary" %in% names(.)) above_boundary
    else if (all(c("n_train_used","p_full") %in% names(.))) as.integer(n_train_used > p_full)
    else if ("ratio_n_over_p" %in% names(.)) as.integer(ratio_n_over_p > 1)
    else NA_integer_,
    win_waic  = rmse_test_svem_waic  < rmse_test_glmnet_cv,
    win_waicc = rmse_test_svem_waicc < rmse_test_glmnet_cv,
    win_wsse  = rmse_test_svem_wsse  < rmse_test_glmnet_cv
  ) %>%
  group_by(above_boundary) %>%
  summarise(
    winrate_waic  = mean(win_waic,  na.rm = TRUE),
    winrate_waicc = mean(win_waicc, na.rm = TRUE),
    winrate_wsse  = mean(win_wsse,  na.rm = TRUE),
    n             = sum(!is.na(win_waic) | !is.na(win_waicc) | !is.na(win_wsse)),
    .groups       = "drop"
  )

cat("\n---- Win-rate vs glmnet (by boundary) ----\n")
print(win_by_boundary, row.names = FALSE, digits = 4)

## ========== ANOM-style, row-blocked plot ==========
res_long_anom <- res_long %>%
  dplyr::mutate(Row = as.factor(Row)) %>%
  dplyr::filter(is.finite(lmrse))

fit_anom <- lm(lmrse ~ settings + Row, data = res_long_anom)
aov_tbl  <- anova(fit_anom)

MSE    <- aov_tbl[["Mean Sq"]][nrow(aov_tbl)]
df_res <- aov_tbl[["Df"]][nrow(aov_tbl)]

t_methods <- nlevels(res_long_anom$settings)
b_blocks  <- nlevels(res_long_anom$Row)
grand_mu  <- mean(res_long_anom$lmrse, na.rm = TRUE)

# Var(ȳ_i. − ȳ..) = σ^2 * (t − 1) / (t * b) under RBD
se_group <- sqrt(MSE * (t_methods - 1) / (t_methods * b_blocks))

alpha <- 0.05
crit  <- qt(1 - alpha / (2 * t_methods), df = df_res)
UCL   <- grand_mu + crit * se_group
LCL   <- grand_mu - crit * se_group

means_df <- res_long_anom %>%
  dplyr::group_by(settings) %>%
  dplyr::summarise(mean_lmrse = mean(lmrse, na.rm = TRUE), .groups = "drop") %>%
  dplyr::mutate(flag = dplyr::case_when(
    mean_lmrse > UCL ~ "Above UCL",
    mean_lmrse < LCL ~ "Below LCL",
    TRUE             ~ "Within Limits"
  ))

p_anom <- ggplot(means_df, aes(x = settings, y = mean_lmrse)) +
  geom_hline(yintercept = grand_mu, linetype = 2) +
  geom_hline(yintercept = UCL,      linetype = 3) +
  geom_hline(yintercept = LCL,      linetype = 3) +
  geom_segment(aes(xend = settings, y = grand_mu, yend = mean_lmrse), linewidth = 1) +
  geom_point(aes(color = flag), size = 3) +
  scale_color_manual(values = c("Within Limits" = "black",
                                "Above UCL"     = "red",
                                "Below LCL"     = "red")) +
  labs(
    title = "Blocked ANOM-style plot for lmrse (Row = block)",
    subtitle = sprintf("Grand mean = %.3f | Limits = [%.3f, %.3f] | df = %d",
                       grand_mu, LCL, UCL, df_res),
    x = NULL, y = "Mean lmrse", color = NULL
  ) +
  theme_bw() +
  theme(plot.title = element_text(face = "bold"))

p_pairs <- ggplot(res_long_anom, aes(x = settings, y = lmrse, group = Row)) +
  geom_line(alpha = 0.25) +
  geom_point(alpha = 0.6, position = position_dodge(width = 0.05)) +
  stat_summary(fun = mean, geom = "point", size = 3, color = "black") +
  labs(title = "Paired runs by Row", x = NULL, y = "lmrse") +
  theme_bw()

(p_anom / p_pairs)

## ---------- Geometric-mean RMSE ratio (SVEM wAIC vs CV) ----------
eps <- .Machine$double.eps
rmse_ratio <- with(res, pmax(rmse_test_svem_waic, eps) / pmax(rmse_test_glmnet_cv, eps))
logdiff    <- log(rmse_ratio)

gm_ratio <- exp(mean(logdiff, na.rm = TRUE))           # primary
q75      <- quantile(rmse_ratio, 0.75, na.rm = TRUE)   # tail
q90      <- quantile(rmse_ratio, 0.90, na.rm = TRUE)
winrate  <- mean(rmse_ratio < 1, na.rm = TRUE)

# simple bootstrap CI for GM ratio
set.seed(123)
B <- 2000
idx <- replicate(B, sample.int(length(logdiff), replace = TRUE))
gm_boot <- apply(idx, 2, function(ii) exp(mean(logdiff[ii], na.rm = TRUE)))
gm_ci   <- quantile(gm_boot, c(0.025, 0.975), na.rm = TRUE)

cat(sprintf(
  "\nPaired geometric-mean RMSE ratio (SVEM/CV): %.3f  (95%% CI %.3f – %.3f)\n",
  gm_ratio, gm_ci[1], gm_ci[2]
))
cat(sprintf("Win rate (SVEM better): %.1f%%\n", 100*winrate))
cat(sprintf("Tail ratios: 75th=%.3f, 90th=%.3f\n", q75, q90))

# Trim top/bottom 5% of logdiff before averaging
trim <- function(x, p=0.05) x[x >= quantile(x, p, na.rm=TRUE) & x <= quantile(x, 1-p, na.rm=TRUE)]
gm_ratio_trim <- exp(mean(trim(logdiff), na.rm = TRUE))
cat(sprintf("Trimmed GM ratio (5%%/5%%): %.3f\n", gm_ratio_trim))

Default SVEMnet settings (Aug 2025)

Added objective = "wAICc".

Chosen default: SVEMnet(..., objective = "wAICc") and predict(..., debias = FALSE).

Rationale from focused simulations (head-to-head test RMSE, disjoint train/test):

- AICc vs AIC (inside SVEM): AICc consistently outperformed AIC. In paired comparisons on log-RMSE, the geometric-mean (GM) ratio favored AICc (e.g., GM ≈ 0.875 when debias = FALSE), indicating materially lower test error for AICc across diverse settings.

- Debiasing: turning debias on generally increased test error for both SVEM and glmnet (typical GM inflation ≈ +3%). Debiasing helped only in high-signal slices (e.g., R² ≳ 0.7), but tended to hurt near or below the n/p_full boundary. As a broad default, debias = FALSE is safer.

- SVEM objectives hierarchy: wAICc > wAIC > wSSE in aggregate performance and tail risk. wSSE especially underperformed near boundary regimes.

- glmnet (for context only): cv.glmnet is a strong baseline and was very close to SVEM-AICc on average. However, since this package’s goal is to provide a robust SVEM implementation with consistent defaults across regimes, we select AICc with debias off as the package default and use glmnet primarily for comparison.

Key practical implication: Unless you have strong prior evidence of a high-signal regime and want JMP-style debiased outputs, keep debias = FALSE. AICc’s small-sample penalty stabilizes selection near tight designs and lowers tail losses without noticeably sacrificing average accuracy.

Default usage:

# Fit with the recommended defaults
fit <- SVEMnet(
  y ~ (X1 + X2 + X3)^2 + I(X1^2) + I(X2^2) + I(X3^2),
  data = dat,
  objective = "wAICc",
  nBoot = 200,
  glmnet_alpha = c(0, 0.5, 1),
  weight_scheme = "SVEM",
  standardize = TRUE
)

# Predict with debias turned off (recommended default)
pred <- predict(fit, newdata = dat, debias = FALSE)
## ============================================================
## SVEM (wAICc & wAIC) vs glmnet:
## debias = FALSE vs TRUE for all methods
## Focus: small training sizes; head-to-head test RMSE
## Parallelized with future.apply (N_WORKERS)
## Includes paired analyses + interactions + extra slices
## ============================================================

## ----- Packages -----
if (!requireNamespace("SVEMnet", quietly = TRUE)) install.packages("SVEMnet")
if (!requireNamespace("glmnet",  quietly = TRUE)) install.packages("glmnet")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
if (!requireNamespace("dplyr",   quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("tidyr",   quietly = TRUE)) install.packages("tidyr")
if (!requireNamespace("tibble",  quietly = TRUE)) install.packages("tibble")
if (!requireNamespace("patchwork", quietly = TRUE)) install.packages("patchwork")
if (!requireNamespace("future.apply", quietly = TRUE)) install.packages("future.apply")
# Optional progress bar:
# if (!requireNamespace("progressr", quietly = TRUE)) install.packages("progressr")

library(SVEMnet)
library(glmnet)     # glmnet_with_cv likely depends on this
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)
library(patchwork)
library(future.apply)
# library(progressr)

## ----- Controls -----
set.seed(67)
REPS         <- 10
NBOOT        <- 200
R2_GRID      <- seq(.1, .9, .2)
RHO_GRID     <- c(0, -0.9, 0.9)
P_GRID       <- c(4, 5, 6, 7)
MODELS       <- c("full_quadratic", "full_cubic")  # quadratic + cubic truths
NTEST        <- 1000
DENSITY_GRID <- seq(.1, .9, .2)
N_WORKERS    <- 7

## ----- Helpers -----
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

safe_predict_svem <- function(fit, newdata, debias = FALSE, agg = "mean") {
  out <- try(predict(fit, newdata = newdata, debias = debias, agg = agg), silent = TRUE)
  if (inherits(out, "try-error")) return(rep(NA_real_, nrow(newdata)))
  as.numeric(out)
}

## Robust wrapper around glmnet_with_cv() + predict_cv() (both must exist in the environment)
safe_predict_cv <- function(object, newdata, debias = FALSE) {
  out <- try(predict_cv(object, newdata = newdata, debias = debias), silent = TRUE)
  if (inherits(out, "try-error")) return(rep(NA_real_, nrow(newdata)))
  as.numeric(out)
}

## Count p_full (columns in the model matrix, excluding the intercept)
count_p_full <- function(fm, p) {
  rhs <- stats::reformulate(attr(stats::delete.response(stats::terms(fm)), "term.labels"))
  tmp <- as.data.frame(matrix(rnorm(4 * p), nrow = 4))
  names(tmp) <- paste0("X", seq_len(p))
  mm <- model.matrix(rhs, data = tmp)
  sum(colnames(mm) != "(Intercept)")
}

build_formulas <- function(p) {
  vars <- paste0("X", seq_len(p))
  main_terms <- paste(vars, collapse = " + ")
  int2_terms <- paste0("(", main_terms, ")^2")  # up to 2-way interactions
  int3_terms <- paste0("(", main_terms, ")^3")  # up to 3-way interactions
  sq_terms   <- paste(sprintf("I(%s^2)", vars), collapse = " + ")
  cube_terms <- paste(sprintf("I(%s^3)", vars), collapse = " + ")
  list(
    main_effects   = as.formula(paste0("y ~ ", main_terms)),
    main_plus_int  = as.formula(paste0("y ~ ", int2_terms)),
    full_quadratic = as.formula(paste0("y ~ ", int2_terms, " + ", sq_terms)),
    full_cubic     = as.formula(paste0("y ~ ", int3_terms, " + ", sq_terms, " + ", cube_terms))
  )
}
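A quick illustrative sanity check of these two helpers (not part of the simulation): a full quadratic in p = 4 factors has 4 main effects, 6 two-way interactions, and 4 squared terms, so count_p_full() should return 14.

# Illustrative check of the helpers above: 4 + 6 + 4 = 14 terms beyond the intercept
stopifnot(count_p_full(build_formulas(4)$full_quadratic, 4) == 14)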

gen_X <- function(n, p, rho = 0.5) {
  Z <- matrix(rnorm(n * p), n, p)
  if (p >= 2 && abs(rho) > 0) {
    for (j in 2:p) {
      Z[, j] <- rho * scale(Z[, j - 1], TRUE, TRUE) +
        sqrt(1 - rho^2) * scale(Z[, j], TRUE, TRUE)
    }
  }
  X <- Z
  colnames(X) <- paste0("X", seq_len(p))
  as.data.frame(X)
}
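The chained construction gives each column (after the first) a correlation of approximately rho with its predecessor, decaying roughly as rho^|j-k| between more distant columns. A quick illustrative check:

# Illustrative check (not part of the simulation loop): neighbor correlation ~ rho
Xchk <- gen_X(5000, 3, rho = 0.9)
round(cor(Xchk), 2)  # expect ~0.9 on the first off-diagonal, ~0.81 on the second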

## Density-driven truth generator using the selected formula fm
make_sparse_data_R2 <- function(n, p, rho, target_R2, fm, density = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  Xdf <- gen_X(n, p, rho); names(Xdf) <- paste0("X", seq_len(p))
  terms_obj <- stats::delete.response(stats::terms(fm, data = transform(Xdf, y = 0)))
  mf  <- model.frame(terms_obj, data = Xdf)
  MM  <- model.matrix(terms_obj, mf)   # n x (M+1) with intercept at column 1

  M <- ncol(MM) - 1L
  if (M < 1L) stop("Formula produces no predictors beyond intercept.")
  n_active   <- max(1L, floor(density * M))
  active_idx <- sample.int(M, n_active, replace = FALSE) + 1L  # skip intercept

  beta <- numeric(M + 1L)
  beta[active_idx] <- rexp(n_active) - rexp(n_active)
  y_signal <- drop(MM %*% beta)

  sd_sig <- sqrt(max(var(y_signal), .Machine$double.eps))
  sd_eps <- sd_sig * sqrt((1 - target_R2) / target_R2)
  y <- y_signal + stats::rnorm(n, 0, sd_eps)

  out <- cbind.data.frame(y = y, Xdf)
  out$y_signal <- y_signal
  rownames(out) <- sprintf("row%06d", seq_len(n))
  out
}
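The noise scale is what pins the realized R²: with sd_eps = sd_sig * sqrt((1 - R²)/R²), Var(noise) = Var(signal) * (1 - R²)/R², so the signal-to-total variance ratio is R² in expectation. An illustrative check using the helpers above:

# Illustrative calibration check: realized true R2 should land near the target
dchk <- make_sparse_data_R2(n = 5000, p = 4, rho = 0, target_R2 = 0.6,
                            fm = build_formulas(4)$full_quadratic,
                            density = 0.3, seed = 123)
var(dchk$y_signal) / var(dchk$y)  # expect ~0.60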

## SVEM: fit once for a given objective, then predict for debias=FALSE/TRUE
fit_and_predict_obj <- function(fm, train_used, test_df, objective, nBoot = NBOOT, agg = "mean") {
  t0 <- proc.time()[3]
  mod <- try(SVEMnet::SVEMnet(
    formula = fm, data = train_used,
    nBoot = nBoot,
    glmnet_alpha = c(0, .5, 1),
    weight_scheme = "SVEM",
    objective = objective,  # "wAICc" or "wAIC"
    standardize = TRUE
  ), silent = TRUE)
  elapsed <- round(proc.time()[3] - t0, 2)
  if (inherits(mod, "try-error")) {
    pred_false <- rep(NA_real_, nrow(test_df))
    pred_true  <- rep(NA_real_, nrow(test_df))
  } else {
    pred_false <- safe_predict_svem(mod, test_df, debias = FALSE, agg = agg)
    pred_true  <- safe_predict_svem(mod, test_df, debias = TRUE,  agg = agg)
  }
  list(pred_false = pred_false, pred_true = pred_true, time = elapsed)
}

## glmnet comparator using glmnet_with_cv() + predict_cv() (must exist in the environment)
fit_and_predict_glmnet_cv <- function(fm, train_used, test_df, alphas = c(0, 0.5, 1)) {
  if (!exists("glmnet_with_cv")) stop("glmnet_with_cv() not found in environment.")
  if (!exists("predict_cv"))    stop("predict_cv() not found in environment.")
  t0 <- proc.time()[3]
  mod <- try(glmnet_with_cv(formula = fm, data = train_used, glmnet_alpha = alphas), silent = TRUE)
  elapsed <- round(proc.time()[3] - t0, 2)
  if (inherits(mod, "try-error")) {
    return(list(pred_false = rep(NA_real_, nrow(test_df)),
                pred_true  = rep(NA_real_, nrow(test_df)),
                time = elapsed))
  }
  pred_false <- safe_predict_cv(mod, test_df, debias = FALSE)
  pred_true  <- safe_predict_cv(mod, test_df, debias = TRUE)
  list(pred_false = pred_false, pred_true = pred_true, time = elapsed)
}

## ----- Build scenario grid -----
grid <- expand.grid(
  p       = P_GRID,
  model   = MODELS,
  rho     = RHO_GRID,
  R2_tgt  = R2_GRID,
  dens    = DENSITY_GRID,
  rep     = seq_len(REPS),
  stringsAsFactors = FALSE
)
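For scale (as a quick check), the grid above crosses 4 values of p, 2 model families, 3 correlations, 5 R² targets, 5 densities, and 10 reps:

nrow(grid)  # 4 * 2 * 3 * 5 * 5 * 10 = 6000 scenario rows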

## ----- Parallel plan -----
old_plan <- future::plan()
on.exit(future::plan(old_plan), add = TRUE)
future::plan(future::multisession, workers = N_WORKERS)

## ----- One scenario run -----
run_one <- function(row) {
  p      <- row$p
  model  <- row$model
  rho    <- row$rho
  R2_tgt <- row$R2_tgt
  dens   <- row$dens
  repi   <- row$rep

  fms    <- build_formulas(p)
  fm     <- fms[[model]]
  p_full <- count_p_full(fm, p)

  # n around the boundary; clamp to [12, 40]
  n_grid <- sort(unique(pmin(pmax(p_full + c(-12, -8, -6, -4, -2, 0, 2), 12), 40)))

  # per-row deterministic seed
  seed_i <- 3000 +
    19 * repi + 100 * p +
    1000 * match(model, MODELS) +
    round(1000 * R2_tgt) +
    round(1000 * rho) +
    round(1000 * dens)
  set.seed(seed_i)

  n_tr <- sample(n_grid, 1L)
  n_te <- NTEST

  df <- make_sparse_data_R2(
    n = n_tr + n_te, p = p, rho = rho, target_R2 = R2_tgt,
    fm = fm, density = dens, seed = seed_i + 1L
  )

  set.seed(seed_i + 2L)
  idx      <- sample(seq_len(nrow(df)), size = n_tr)
  train_df <- df[idx, ]
  test_df  <- df[-idx, ]

  keep <- complete.cases(model.frame(fm, data = train_df, na.action = stats::na.pass))
  if (sum(keep) < 2) return(NULL)
  train_used <- train_df[keep, , drop = FALSE]
  n_used     <- nrow(train_used)

  r2_true <- function(d) var(d$y_signal) / var(d$y)
  R2_train_true <- r2_true(train_used)
  R2_test_true  <- r2_true(test_df)

  # SVEM (wAICc & wAIC)
  sv_wAICc <- fit_and_predict_obj(fm, train_used, test_df, "wAICc", nBoot = NBOOT)
  sv_wAIC  <- fit_and_predict_obj(fm, train_used, test_df, "wAIC",  nBoot = NBOOT)

  # glmnet comparator (cv + built-in debias)
  glmn <- fit_and_predict_glmnet_cv(fm, train_used, test_df)

  data.frame(
    p = p, model = model, rho = rho, R2_target = R2_tgt, density = dens,
    p_full = p_full, n_train = n_tr, n_train_used = n_used, n_test = n_te,
    n_train_minus_p_full = n_tr - p_full,
    ratio_n_over_p = n_used / p_full,
    above_boundary = as.integer(n_used > p_full),
    R2_true_train = R2_train_true, R2_true_test = R2_test_true,

    rmse_test_wAICc_debiasFalse = rmse(test_df$y, sv_wAICc$pred_false),
    rmse_test_wAICc_debiasTrue  = rmse(test_df$y, sv_wAICc$pred_true),
    rmse_test_wAIC_debiasFalse  = rmse(test_df$y, sv_wAIC$pred_false),
    rmse_test_wAIC_debiasTrue   = rmse(test_df$y, sv_wAIC$pred_true),
    rmse_test_glmnet_debiasFalse= rmse(test_df$y, glmn$pred_false),
    rmse_test_glmnet_debiasTrue = rmse(test_df$y, glmn$pred_true),

    time_sec_fit_wAICc          = sv_wAICc$time,
    time_sec_fit_wAIC           = sv_wAIC$time,
    time_sec_fit_glmnet         = glmn$time,
    stringsAsFactors = FALSE
  )
}

## ----- Run in parallel -----
# progressr::with_progress({
#   pbar <- progressr::progressor(along = seq_len(nrow(grid)))
#   res_list <- future.apply::future_lapply(
#     seq_len(nrow(grid)),
#     function(i) { on.exit(pbar(), add = TRUE); run_one(grid[i, , drop = FALSE]) },
#     future.seed = TRUE
#   )
# })
res_list <- future.apply::future_lapply(
  seq_len(nrow(grid)),
  function(i) run_one(grid[i, , drop = FALSE]),
  future.seed = TRUE
)

res <- dplyr::bind_rows(Filter(Negate(is.null), res_list))
stopifnot(nrow(res) > 0)

## ---------- Summaries (marginal + paired + interactions) ----------
need_cols <- c("rmse_test_wAICc_debiasFalse","rmse_test_wAICc_debiasTrue",
               "rmse_test_wAIC_debiasFalse","rmse_test_wAIC_debiasTrue",
               "rmse_test_glmnet_debiasFalse","rmse_test_glmnet_debiasTrue")
stopifnot(all(need_cols %in% names(res)))

## Long format (for marginal summaries & ANOM)
eps <- .Machine$double.eps
res_long <- res %>%
  mutate(Row = dplyr::row_number()) %>%
  transmute(
    Row, model, p, rho, R2_target, density,
    above_boundary, ratio_n_over_p,
    wAICc_FALSE = log(pmax(rmse_test_wAICc_debiasFalse, eps)),
    wAICc_TRUE  = log(pmax(rmse_test_wAICc_debiasTrue,  eps)),
    wAIC_FALSE  = log(pmax(rmse_test_wAIC_debiasFalse,  eps)),
    wAIC_TRUE   = log(pmax(rmse_test_wAIC_debiasTrue,   eps)),
    glmnet_FALSE= log(pmax(rmse_test_glmnet_debiasFalse,eps)),
    glmnet_TRUE = log(pmax(rmse_test_glmnet_debiasTrue, eps))
  ) %>%
  tidyr::pivot_longer(
    cols = c(wAICc_FALSE, wAICc_TRUE, wAIC_FALSE, wAIC_TRUE, glmnet_FALSE, glmnet_TRUE),
    names_to = "setting", values_to = "lmrse"
  ) %>%
  mutate(
    method    = case_when(grepl("^glmnet", setting) ~ "glmnet",
                          grepl("^wAICc", setting) ~ "SVEM-wAICc",
                          TRUE                      ~ "SVEM-wAIC"),
    debias    = ifelse(grepl("TRUE$",  setting), "TRUE", "FALSE"),
    method    = factor(method, levels = c("SVEM-wAICc","SVEM-wAIC","glmnet")),
    debias    = factor(debias, levels = c("FALSE","TRUE")),
    setting   = factor(paste0(method, " (debias=", debias, ")"),
                       levels = c("SVEM-wAICc (debias=FALSE)","SVEM-wAICc (debias=TRUE)",
                                  "SVEM-wAIC (debias=FALSE)","SVEM-wAIC (debias=TRUE)",
                                  "glmnet (debias=FALSE)","glmnet (debias=TRUE)"))
  )

## ---------- Which combo is best overall? (per-row winner among 6; exact ties, if any, credit each tied setting)
winners <- res %>%
  mutate(Row = dplyr::row_number()) %>%
  transmute(
    Row,
    `SVEM-wAICc (debias=FALSE)` = rmse_test_wAICc_debiasFalse,
    `SVEM-wAICc (debias=TRUE)`  = rmse_test_wAICc_debiasTrue,
    `SVEM-wAIC (debias=FALSE)`  = rmse_test_wAIC_debiasFalse,
    `SVEM-wAIC (debias=TRUE)`   = rmse_test_wAIC_debiasTrue,
    `glmnet (debias=FALSE)`     = rmse_test_glmnet_debiasFalse,
    `glmnet (debias=TRUE)`      = rmse_test_glmnet_debiasTrue
  ) %>%
  tidyr::pivot_longer(-Row, names_to = "setting", values_to = "rmse") %>%
  group_by(Row) %>%
  filter(is.finite(rmse), rmse == min(rmse, na.rm = TRUE)) %>%
  ungroup()

best_counts <- winners %>% count(setting, name = "wins") %>%
  mutate(share = wins / sum(wins)) %>%
  arrange(desc(share))
cat("\n==== Which (method, debias) wins most often? ====\n")
print(best_counts, row.names = FALSE, digits = 4)

## ---------- Marginal summary (log RMSE) — info only
summ_marg <- res_long %>%
  group_by(setting) %>%
  summarise(
    mean_lmrse = mean(lmrse, na.rm = TRUE),
    sd_lmrse   = sd(lmrse,   na.rm = TRUE),
    n          = dplyr::n(),
    se         = sd_lmrse / sqrt(pmax(n, 1)),
    ci_lo      = mean_lmrse - 1.96 * se,
    ci_hi      = mean_lmrse + 1.96 * se,
    .groups    = "drop"
  ) %>%
  arrange(mean_lmrse)
cat("\n================ SUMMARY (log RMSE; marginal) ================\n")
print(summ_marg, row.names = FALSE, digits = 4)

## ---------- Debias effect within each method (paired)
paired_effect <- function(num, den, label) {
  logdiff <- log(pmax(num, eps)) - log(pmax(den, eps))
  logdiff <- logdiff[is.finite(logdiff)]
  n <- length(logdiff)
  mu <- mean(logdiff); se <- sd(logdiff)/sqrt(n); ci <- mu + c(-1,1) * qt(0.975, n-1) * se
  tibble(contrast = label, n = n,
         gm_ratio = exp(mu), gm_lo = exp(ci[1]), gm_hi = exp(ci[2]),
         winrate = mean(exp(logdiff) < 1))
}

debias_wAICc <- paired_effect(res$rmse_test_wAICc_debiasTrue,  res$rmse_test_wAICc_debiasFalse, "SVEM-wAICc: TRUE / FALSE")
debias_wAIC  <- paired_effect(res$rmse_test_wAIC_debiasTrue,   res$rmse_test_wAIC_debiasFalse,  "SVEM-wAIC: TRUE / FALSE")
debias_glm   <- paired_effect(res$rmse_test_glmnet_debiasTrue, res$rmse_test_glmnet_debiasFalse,"glmnet: TRUE / FALSE")

cat("\n==== Debias effect within each method (GM ratios; <1 favors debias=TRUE) ====\n")
print(bind_rows(debias_wAICc, debias_wAIC, debias_glm), row.names = FALSE, digits = 4)

## ---------- Objective effect at fixed debias (SVEM only; paired)
obj_FALSE <- paired_effect(res$rmse_test_wAICc_debiasFalse, res$rmse_test_wAIC_debiasFalse,
                           "SVEM: wAICc / wAIC (debias=FALSE)")
obj_TRUE  <- paired_effect(res$rmse_test_wAICc_debiasTrue,  res$rmse_test_wAIC_debiasTrue,
                           "SVEM: wAICc / wAIC (debias=TRUE)")
cat("\n==== Objective effect (SVEM) at each debias (GM ratios; <1 favors wAICc) ====\n")
print(bind_rows(obj_FALSE, obj_TRUE), row.names = FALSE, digits = 4)

## ---------- SVEM vs glmnet at fixed debias (paired; optional but handy)
svem_wAICc_vs_glm_FALSE <- paired_effect(res$rmse_test_wAICc_debiasFalse, res$rmse_test_glmnet_debiasFalse,
                                         "SVEM-wAICc / glmnet  (debias=FALSE)")
svem_wAICc_vs_glm_TRUE  <- paired_effect(res$rmse_test_wAICc_debiasTrue,  res$rmse_test_glmnet_debiasTrue,
                                         "SVEM-wAICc / glmnet  (debias=TRUE)")
svem_wAIC_vs_glm_FALSE  <- paired_effect(res$rmse_test_wAIC_debiasFalse,  res$rmse_test_glmnet_debiasFalse,
                                         "SVEM-wAIC / glmnet   (debias=FALSE)")
svem_wAIC_vs_glm_TRUE   <- paired_effect(res$rmse_test_wAIC_debiasTrue,   res$rmse_test_glmnet_debiasTrue,
                                         "SVEM-wAIC / glmnet   (debias=TRUE)")
cat("\n==== SVEM vs glmnet (GM ratios; <1 favors SVEM) ====\n")
print(bind_rows(svem_wAICc_vs_glm_FALSE, svem_wAICc_vs_glm_TRUE,
                svem_wAIC_vs_glm_FALSE, svem_wAIC_vs_glm_TRUE),
      row.names = FALSE, digits = 4)

## ---------- Interaction: method × debias (Row-blocked + DiD style)
# assumes res_long is already built
res_long_dm <- res_long %>%
  dplyr::group_by(Row) %>%
  dplyr::mutate(lmrse_dm = lmrse - mean(lmrse)) %>%
  dplyr::ungroup()

fit_interact_fast <- lm(lmrse_dm ~ method * debias, data = res_long_dm)
anova(fit_interact_fast)

# Equivalent one-way fit on "setting" (Row effects absorbed by the demeaning; see the optional check below):
fit_setting_fast <- lm(lmrse_dm ~ setting, data = res_long_dm)
anova(fit_setting_fast)
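Row-demeaning before the one-way fit absorbs the per-Row block effects (the Frisch-Waugh device); with balanced, complete (non-NA) rows, the setting coefficients match those from the much larger fixed-effects fit. An optional, slow check, left commented out:

# Optional equivalence check (slow: fits one dummy per Row):
# fit_fe <- lm(lmrse ~ setting + factor(Row), data = res_long)
# all.equal(unname(coef(fit_setting_fast)[2:6]),
#           unname(coef(fit_fe)[2:6]), tolerance = 1e-6)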

## ---------- By-boundary (marginal means over 6 settings)
ab_key <- res %>% mutate(Row = dplyr::row_number()) %>% select(Row, above_boundary)
res_long_ab <- res_long %>%
  select(-dplyr::any_of(c("above_boundary"))) %>%
  left_join(ab_key, by = "Row")

summ_by_boundary <- res_long_ab %>%
  group_by(above_boundary, setting) %>%
  summarise(
    mean_lmrse = mean(lmrse, na.rm = TRUE),
    sd_lmrse   = sd(lmrse,   na.rm = TRUE),
    n          = dplyr::n(),
    .groups    = "drop"
  )
cat("\n---- Means by boundary (n_train_used > p_full; marginal) ----\n")
print(summ_by_boundary, row.names = FALSE, digits = 4)

## ---------- Win-rate by boundary (which setting wins among 6; paired within row)
win_by_boundary6 <- res %>%
  mutate(Row = dplyr::row_number(),
         above_boundary = as.integer(n_train_used > p_full)) %>%
  transmute(
    Row, above_boundary,
    `SVEM-wAICc (debias=FALSE)` = rmse_test_wAICc_debiasFalse,
    `SVEM-wAICc (debias=TRUE)`  = rmse_test_wAICc_debiasTrue,
    `SVEM-wAIC (debias=FALSE)`  = rmse_test_wAIC_debiasFalse,
    `SVEM-wAIC (debias=TRUE)`   = rmse_test_wAIC_debiasTrue,
    `glmnet (debias=FALSE)`     = rmse_test_glmnet_debiasFalse,
    `glmnet (debias=TRUE)`      = rmse_test_glmnet_debiasTrue
  ) %>%
  tidyr::pivot_longer(-c(Row, above_boundary), names_to = "setting", values_to = "rmse") %>%
  group_by(Row) %>%
  filter(is.finite(rmse), rmse == min(rmse, na.rm = TRUE)) %>%
  ungroup() %>%
  count(above_boundary, setting, name = "wins") %>%
  group_by(above_boundary) %>%
  mutate(share = wins / sum(wins)) %>%
  ungroup()

cat("\n---- Win shares by boundary (among 6 settings) ----\n")
print(win_by_boundary6, row.names = FALSE, digits = 4)

## ---------- Extra slices: wins by density & by R2_target
win_by_density <- winners %>%
  left_join(res %>% mutate(Row = dplyr::row_number(), density_fac = factor(density)), by = "Row") %>%
  count(density_fac, setting, name = "wins") %>%
  group_by(density_fac) %>% mutate(share = wins / sum(wins)) %>% ungroup()
cat("\n---- Win shares by density ----\n")
print(win_by_density, row.names = FALSE, digits = 4)

win_by_R2 <- winners %>%
  left_join(res %>% mutate(Row = dplyr::row_number(), R2_fac = factor(R2_target)), by = "Row") %>%
  count(R2_fac, setting, name = "wins") %>%
  group_by(R2_fac) %>% mutate(share = wins / sum(wins)) %>% ungroup()
cat("\n---- Win shares by R2_target ----\n")
print(win_by_R2, row.names = FALSE, digits = 4)

## ---------- ANOM-style, row-blocked plot (now 6 settings)
## res_long_anom: Row as a factor so nlevels() works in the block counts below
res_long_anom <- res_long %>%
  dplyr::mutate(Row = factor(Row))

means_df <- res_long_anom %>%
  group_by(setting) %>%
  summarise(mean_lmrse = mean(lmrse, na.rm = TRUE), .groups = "drop")

## ---- FAST ANOM pieces (avoid lm(... + Row)) ----
# 1) Row-demean (recomputed here so this block is self-contained)
res_long_dm <- res_long %>%
  dplyr::group_by(Row) %>%
  dplyr::mutate(lmrse_dm = lmrse - mean(lmrse)) %>%
  dplyr::ungroup()

# 2) One-way on setting using the demeaned response
fit_setting_fast <- lm(lmrse_dm ~ setting, data = res_long_dm)
aov_tbl <- anova(fit_setting_fast)

# 3) Extract the quantities needed for the ANOM decision limits
MSE      <- aov_tbl[["Mean Sq"]][nrow(aov_tbl)]
df_res   <- aov_tbl[["Df"]][nrow(aov_tbl)]
t_methods<- nlevels(res_long_anom$setting)
b_blocks <- nlevels(res_long_anom$Row)
grand_mu <- mean(res_long_anom$lmrse, na.rm = TRUE)

# ANOM group SE and limits (same formulas as before)
se_group <- sqrt(MSE * (t_methods - 1) / (t_methods * b_blocks))
crit     <- qt(1 - 0.05 / (2 * t_methods), df = df_res)
UCL      <- grand_mu + crit * se_group
LCL      <- grand_mu - crit * se_group

p_anom <- ggplot(means_df, aes(x = setting, y = mean_lmrse)) +
  geom_hline(yintercept = grand_mu, linetype = 2) +
  geom_hline(yintercept = UCL,      linetype = 3) +
  geom_hline(yintercept = LCL,      linetype = 3) +
  geom_segment(aes(xend = setting, y = grand_mu, yend = mean_lmrse), linewidth = 1) +
  geom_point(size = 3) +
  labs(title = "Blocked ANOM-style plot (6 settings)", x = NULL, y = "Mean log RMSE") +
  theme_bw()

p_pairs <- ggplot(res_long_anom, aes(x = setting, y = lmrse, group = Row)) +
  geom_line(alpha = 0.2) +
  geom_point(alpha = 0.5, position = position_dodge(width = 0.05)) +
  stat_summary(fun = mean, geom = "point", size = 3, color = "black") +
  labs(title = "Paired runs by Row", x = NULL, y = "log RMSE") +
  theme_bw()

p_anom / p_pairs  # stack both plots via patchwork

## ---------- (Optional) Timing summary
time_summary <- res %>%
  summarise(
    median_wAICc = median(time_sec_fit_wAICc, na.rm = TRUE),
    median_wAIC  = median(time_sec_fit_wAIC,  na.rm = TRUE),
    median_glm   = median(time_sec_fit_glmnet,na.rm = TRUE)
  )
cat("\n---- Median fit times (sec) ----\n")
print(time_summary, row.names = FALSE, digits = 4)

## Win shares by n/p deciles (how “over/under” the boundary affects winners)
win_by_np <- winners %>%
  left_join(res %>% mutate(Row = dplyr::row_number(),
                           np_decile = ntile(ratio_n_over_p, 10)), by = "Row") %>%
  count(np_decile, setting, name = "wins") %>%
  group_by(np_decile) %>% mutate(share = wins / sum(wins)) %>% ungroup()

print(win_by_np, n = 60)

## Winners by model family (quadratic vs. cubic)
win_by_model <- winners %>%
  left_join(res %>% mutate(Row = dplyr::row_number(),
                           model = factor(model)), by = "Row") %>%
  count(model, setting, name = "wins") %>%
  group_by(model) %>% mutate(share = wins / sum(wins)) %>% ungroup()

print(win_by_model, n = 20)

## Paired GM-ratio summary within levels of a stratifier (debias effects by slice)
paired_by <- function(res, num_col, den_col, by) {
  eps <- .Machine$double.eps
  res %>%
    mutate(Row = dplyr::row_number()) %>%
    select(Row, !!by, num = all_of(num_col), den = all_of(den_col)) %>%
    filter(is.finite(num), is.finite(den)) %>%
    group_by(!!sym(by)) %>%
    summarise(
      n = n(),
      gm_ratio = exp(mean(log(pmax(num, eps)) - log(pmax(den, eps)))),
      gm_lo = exp(t.test(log(pmax(num, eps)) - log(pmax(den, eps)))$conf.int[1]),
      gm_hi = exp(t.test(log(pmax(num, eps)) - log(pmax(den, eps)))$conf.int[2]),
      winrate = mean(num < den),
      .groups = "drop"
    )
}

# Prepare stratifiers
res_np <- res %>% mutate(np_decile = ntile(ratio_n_over_p, 10))
res_den <- res %>% mutate(density_fac = factor(density))
res_r2  <- res %>% mutate(R2_fac = factor(R2_target))

# SVEM-wAICc debias effect
db_wAICc_np <- paired_by(res_np,  "rmse_test_wAICc_debiasTrue",  "rmse_test_wAICc_debiasFalse", "np_decile")
db_wAICc_den<- paired_by(res_den, "rmse_test_wAICc_debiasTrue",  "rmse_test_wAICc_debiasFalse", "density_fac")
db_wAICc_r2 <- paired_by(res_r2,  "rmse_test_wAICc_debiasTrue",  "rmse_test_wAICc_debiasFalse", "R2_fac")

# glmnet debias effect
db_glm_np  <- paired_by(res_np,  "rmse_test_glmnet_debiasTrue", "rmse_test_glmnet_debiasFalse", "np_decile")
db_glm_den <- paired_by(res_den, "rmse_test_glmnet_debiasTrue", "rmse_test_glmnet_debiasFalse", "density_fac")
db_glm_r2  <- paired_by(res_r2,  "rmse_test_glmnet_debiasTrue", "rmse_test_glmnet_debiasFalse", "R2_fac")

# SVEM-wAIC debias effect
db_wAIC_np  <- paired_by(res_np,  "rmse_test_wAIC_debiasTrue",  "rmse_test_wAIC_debiasFalse",  "np_decile")
db_wAIC_den <- paired_by(res_den, "rmse_test_wAIC_debiasTrue",  "rmse_test_wAIC_debiasFalse",  "density_fac")
db_wAIC_r2  <- paired_by(res_r2,  "rmse_test_wAIC_debiasTrue",  "rmse_test_wAIC_debiasFalse",  "R2_fac")

db_wAICc_np; db_glm_np; db_wAIC_np
db_wAICc_den; db_glm_den; db_wAIC_den
db_wAICc_r2; db_glm_r2; db_wAIC_r2

## SVEM-wAICc vs glmnet within strata, at a fixed debias setting
svem_vs_glm_by <- function(res, by, debias_flag = c("False", "True")) {
  debias_flag <- match.arg(debias_flag)  # guard against the unevaluated vector default
  eps <- .Machine$double.eps
  col_svem <- paste0("rmse_test_wAICc_debias", debias_flag)
  col_glm  <- paste0("rmse_test_glmnet_debias", debias_flag)
  res %>%
    mutate(Row = dplyr::row_number()) %>%
    select(Row, !!by, svem = all_of(col_svem), glm = all_of(col_glm)) %>%
    filter(is.finite(svem), is.finite(glm)) %>%
    group_by(!!sym(by)) %>%
    summarise(
      n = n(),
      gm_ratio = exp(mean(log(pmax(svem, eps)) - log(pmax(glm, eps)))),
      gm_lo = exp(t.test(log(pmax(svem, eps)) - log(pmax(glm, eps)))$conf.int[1]),
      gm_hi = exp(t.test(log(pmax(svem, eps)) - log(pmax(glm, eps)))$conf.int[2]),
      winrate = mean(svem < glm),
      .groups = "drop"
    ) %>%
    mutate(debias = debias_flag)
}

sv_glm_np_F <- svem_vs_glm_by(res_np,  "np_decile",  "False")
sv_glm_np_T <- svem_vs_glm_by(res_np,  "np_decile",  "True")
sv_glm_den_F<- svem_vs_glm_by(res_den, "density_fac","False")
sv_glm_den_T<- svem_vs_glm_by(res_den, "density_fac","True")
sv_glm_r2_F <- svem_vs_glm_by(res_r2,  "R2_fac",     "False")
sv_glm_r2_T <- svem_vs_glm_by(res_r2,  "R2_fac",     "True")

rbind(sv_glm_np_F,  sv_glm_np_T)  %>% arrange(np_decile,   debias)
rbind(sv_glm_den_F, sv_glm_den_T) %>% arrange(density_fac, debias)
rbind(sv_glm_r2_F,  sv_glm_r2_T)  %>% arrange(R2_fac,      debias)

## NA rates by method/debias combination (failed fits or predictions)
na_rates <- res %>%
  transmute(
    glmnet_FALSE = is.na(rmse_test_glmnet_debiasFalse),
    glmnet_TRUE  = is.na(rmse_test_glmnet_debiasTrue),
    wAICc_FALSE  = is.na(rmse_test_wAICc_debiasFalse),
    wAICc_TRUE   = is.na(rmse_test_wAICc_debiasTrue),
    wAIC_FALSE   = is.na(rmse_test_wAIC_debiasFalse),
    wAIC_TRUE    = is.na(rmse_test_wAIC_debiasTrue)
  ) %>%
  summarise(across(everything(), ~mean(.)))
print(na_rates, digits = 3)

## Accuracy vs compute: mean log RMSE against median fit time per setting
frontier <- res_long %>%
  group_by(setting) %>%
  summarise(mean_lmrse = mean(lmrse, na.rm = TRUE), .groups="drop") %>%
  mutate(median_time = case_when(
    grepl("^SVEM-wAICc", setting) ~ median(res$time_sec_fit_wAICc, na.rm = TRUE),
    grepl("^SVEM-wAIC",  setting) ~ median(res$time_sec_fit_wAIC,  na.rm = TRUE),
    grepl("^glmnet",     setting) ~ median(res$time_sec_fit_glmnet,na.rm = TRUE)
  ))
frontier
# ggplot(frontier, aes(median_time, mean_lmrse, label = setting)) + geom_point() + ggrepel::geom_text_repel()

## ---------- Winner margins: how decisive is the per-row winner?
winner_margins <- res %>%
  mutate(Row = dplyr::row_number()) %>%
  transmute(
    Row,
    `SVEM-wAICc (debias=FALSE)` = rmse_test_wAICc_debiasFalse,
    `SVEM-wAICc (debias=TRUE)`  = rmse_test_wAICc_debiasTrue,
    `SVEM-wAIC (debias=FALSE)`  = rmse_test_wAIC_debiasFalse,
    `SVEM-wAIC (debias=TRUE)`   = rmse_test_wAIC_debiasTrue,
    `glmnet (debias=FALSE)`     = rmse_test_glmnet_debiasFalse,
    `glmnet (debias=TRUE)`      = rmse_test_glmnet_debiasTrue
  ) %>%
  tidyr::pivot_longer(-Row, names_to="setting", values_to="rmse") %>%
  group_by(Row) %>%
  summarise(
    best_setting = setting[which.min(rmse)],
    best_rmse = min(rmse, na.rm=TRUE),
    second_rmse = { vals <- rmse[is.finite(rmse)] ; sort(vals, partial=2)[2] },
    margin_pct = (second_rmse / best_rmse) - 1,
    .groups="drop"
  )

# Overall margins (including “how often is it basically a tie?”)
winner_margins %>%
  summarise(
    median_margin_pct = median(margin_pct, na.rm=TRUE),
    p90_margin_pct    = quantile(margin_pct, 0.9, na.rm=TRUE),
    share_lt_2pct     = mean(margin_pct < 0.02, na.rm=TRUE),
    share_lt_5pct     = mean(margin_pct < 0.05, na.rm=TRUE)
  )

# Margins by winner (who wins “by a lot”?)
winner_margins %>%
  group_by(best_setting) %>%
  summarise(
    n = n(),
    med = median(margin_pct, na.rm=TRUE),
    p90 = quantile(margin_pct, 0.9, na.rm=TRUE),
    share_gt_5pct  = mean(margin_pct > 0.05, na.rm=TRUE),
    share_gt_10pct = mean(margin_pct > 0.10, na.rm=TRUE),
    .groups="drop"
  ) %>%
  arrange(desc(med))

margins_np_r2 <- winner_margins %>%
  left_join(res %>% mutate(
    Row = dplyr::row_number(),
    np_decile = dplyr::ntile(ratio_n_over_p, 10),
    R2_fac = factor(R2_target)
  ), by="Row") %>%
  group_by(R2_fac, np_decile) %>%
  summarise(
    med_margin = median(margin_pct, na.rm=TRUE),
    p90_margin = quantile(margin_pct, 0.9, na.rm=TRUE),
    .groups="drop"
  )
margins_np_r2

tau <- 0.02  # try 0.02 (2%) or 0.05 (5%)

policy_eval <- winner_margins %>%
  mutate(choice = ifelse(best_setting == "SVEM-wAICc (debias=FALSE)" & margin_pct > tau,
                         "SVEM-wAICc (debias=FALSE)",
                         "glmnet (debias=FALSE)")) %>%
  left_join(
    res %>% mutate(Row = dplyr::row_number()) %>%
      transmute(Row,
                `SVEM-wAICc (debias=FALSE)`=rmse_test_wAICc_debiasFalse,
                `SVEM-wAICc (debias=TRUE)` =rmse_test_wAICc_debiasTrue,
                `SVEM-wAIC (debias=FALSE)` =rmse_test_wAIC_debiasFalse,
                `SVEM-wAIC (debias=TRUE)`  =rmse_test_wAIC_debiasTrue,
                `glmnet (debias=FALSE)`    =rmse_test_glmnet_debiasFalse,
                `glmnet (debias=TRUE)`     =rmse_test_glmnet_debiasTrue
      ) %>% tidyr::pivot_longer(-Row, names_to="setting", values_to="rmse"),
    by=c("Row","choice"="setting")
  ) %>%
  rename(rmse_choice = rmse) %>%
  mutate(regret_pct = (rmse_choice / best_rmse) - 1) %>%
  left_join(
    res %>% mutate(Row = dplyr::row_number()) %>%
      select(Row, time_sec_fit_wAICc, time_sec_fit_wAIC, time_sec_fit_glmnet),
    by="Row"
  ) %>%
  mutate(time_choice = dplyr::case_when(
    grepl("^glmnet", choice)      ~ time_sec_fit_glmnet,
    grepl("^SVEM-wAICc", choice)  ~ time_sec_fit_wAICc,
    grepl("^SVEM-wAIC", choice)   ~ time_sec_fit_wAIC
  ))

policy_eval %>%
  summarise(
    median_regret_pct = median(regret_pct, na.rm=TRUE),
    p90_regret_pct    = quantile(regret_pct, 0.9, na.rm=TRUE),
    share_no_regret   = mean(regret_pct <= 0, na.rm=TRUE),
    median_time_sec   = median(time_choice, na.rm=TRUE)
  )

References and Citations

  1. Lemkus, T., Gotwalt, C., Ramsey, P., & Weese, M. L. (2021). Self-Validated Ensemble Models for Elastic Net Regression.
    Chemometrics and Intelligent Laboratory Systems, 219, 104439.
    DOI: 10.1016/j.chemolab.2021.104439

  2. Karl, A. T. (2024). A Randomized Permutation Whole-Model Test for SVEM.
    Chemometrics and Intelligent Laboratory Systems, 249, 105122.
    DOI: 10.1016/j.chemolab.2024.105122

  3. Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent.
    Journal of Statistical Software, 33(1), 1–22.
    DOI: 10.18637/jss.v033.i01

  4. Gotwalt, C., & Ramsey, P. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals.
    JMP Discovery Conference.
    Link

  5. Ramsey, P., Gaudard, M., & Levin, W. (2021). Accelerating Innovation with Space-Filling Mixture Designs, Neural Networks, and SVEM.
    JMP Discovery Conference.
    Link

  6. Ramsey, P., & Gotwalt, C. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals.
    JMP Discovery Summit Europe.
    Link

  7. Ramsey, P., Levin, W., Lemkus, T., & Gotwalt, C. (2021). SVEM: A Paradigm Shift in Design and Analysis of Experiments.
    JMP Discovery Summit Europe.
    Link

  8. Ramsey, P., & McNeill, P. (2023). CMC, SVEM, Neural Networks, DOE, and Complexity: It’s All About Prediction.
    JMP Discovery Conference.

  9. Karl, A., Wisnowski, J., & Rushing, H. (2022). JMP Pro 17 Remedies for Practical Struggles with Mixture Experiments.
    JMP Discovery Conference.
    Link

  10. Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap.
    The American Statistician, 74(4), 345–358.
    Link

  11. Karl, A. T. (2024). SVEMnet: Self-Validated Ensemble Models with Elastic Net Regression.
    R package

  12. JMP Help Documentation. Overview of Self-Validated Ensemble Models.
    Link