It is important to note that nct is based on the
NCT function in the R package
NetworkComparisonTest. Key extensions to the
methodology include nonconvex regularization, the de-sparsified glasso
estimator, additional default test-statistics (e.g., KL-divergence), and
user-defined test-statistics.
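These extensions are available directly through nct's arguments. The
following is a minimal sketch, assuming data matrices y1 and y2 are
already in hand; the "atan" penalty value is an assumption about
GGMncv's nonconvex options and is not used in the comparison below.

# sketch: nonconvex regularization and the de-sparsified estimator
# (y1, y2, and the "atan" penalty value are assumed for illustration)
fit <- nct(Y_g1 = y1,
           Y_g2 = y2,
           iter = 1000,
           penalty = "atan",   # nonconvex penalty instead of lasso
           desparsify = TRUE,  # de-sparsified glasso estimator
           progress = FALSE)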
Furthermore, nct should be much faster for two primary
reasons:

1. glassoFast, introduced in Sustik and Calderhead (2012), is used instead of glasso.

2. Parallel computing is implemented via parallel's
parLapply, whereas NetworkComparisonTest
uses a serially executed for loop (with cores = 1,
GGMncv uses lapply; see the sketch after this list).
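To make the second point concrete, here is a minimal, self-contained
sketch of the two execution strategies. It is illustrative only and not
GGMncv's internal code, with boot_fun standing in for a single
permutation iteration.

library(parallel)

# stand-in for one permutation iteration
boot_fun <- function(i) mean(rnorm(100))

# serial execution: one iteration after another (cores = 1)
res_serial <- lapply(1:1000, boot_fun)

# parallel execution: iterations distributed across two workers
cl <- makeCluster(2)
res_parallel <- parLapply(cl, 1:1000, boot_fun)
stopCluster(cl)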
The gains in speed will be most notable in larger networks, as shown in the following comparison.
In the following, computing time, as a function of network size, is
investigated for several numbers of cores and compared to
NCT. Note that all settings in GGMncv are
set to be comparable to NCT, for example, using
the lasso penalty and the same number of tuning parameters (i.e.,
50). One important distinction is that
GGMncv computes four default test-statistics instead of two
(as in NCT).
library(GGMncv)
library(dplyr)
library(ggplot2)
library(NetworkComparisonTest)
p <- seq(10, 25, 5)

sim_fun <- function(x){
  # generate a true network, then sample two groups from it
  main <- gen_net(p = x, edge_prob = 0.1)
  y1 <- MASS::mvrnorm(n = 500,
                      mu = rep(0, x),
                      Sigma = main$cors)
  y2 <- MASS::mvrnorm(n = 500,
                      mu = rep(0, x),
                      Sigma = main$cors)
  # GGMncv: 1 core
  st_1 <- system.time({
    fit_1 <- nct(Y_g1 = y1,
                 Y_g2 = y2,
                 iter = 1000,
                 desparsify = FALSE,
                 penalty = "lasso",
                 cores = 1,
                 progress = FALSE)
  })
  # GGMncv: 2 cores
  st_2 <- system.time({
    fit_2 <- nct(Y_g1 = y1,
                 Y_g2 = y2,
                 iter = 1000,
                 desparsify = FALSE,
                 penalty = "lasso",
                 cores = 2,
                 progress = FALSE)
  })

  # GGMncv: 4 cores
  st_4 <- system.time({
    fit_4 <- nct(Y_g1 = y1,
                 Y_g2 = y2,
                 iter = 1000,
                 desparsify = FALSE,
                 penalty = "lasso",
                 cores = 4,
                 progress = FALSE)
  })

  # GGMncv: 8 cores
  st_8 <- system.time({
    fit_8 <- nct(Y_g1 = y1,
                 Y_g2 = y2,
                 iter = 1000,
                 desparsify = FALSE,
                 penalty = "lasso",
                 cores = 8,
                 progress = FALSE)
  })
  # NetworkComparisonTest (serial)
  st_NCT <- system.time({
    fit_NCT <- NCT(data1 = y1,
                   data2 = y2,
                   it = 1000,
                   progressbar = FALSE)
  })
  # collect elapsed times (third element of system.time)
  ret <- data.frame(
    times = c(st_1[3], st_2[3],
              st_4[3], st_8[3], st_NCT[3]),
    models = c("one", "two",
               "four", "eight", "NCT"),
    nodes = x
  )
  return(ret)
}
sim_res <- list()
reps <- 5

for(i in seq_along(p)){
  print(paste("nodes", p[i]))
  sim_res[[i]] <- do.call(rbind.data.frame,
                          replicate(reps, sim_fun(p[i]),
                                    simplify = FALSE))
}
do.call(rbind.data.frame, sim_res) %>%
  group_by(models, nodes) %>%
  summarise(mu = mean(times),
            std = sd(times)) %>%
  ggplot(aes(x = nodes, y = mu, group = models)) +
  theme_bw() +
  geom_ribbon(aes(ymax = mu + std, ymin = mu - std),
              alpha = 0.1) +
  geom_line(aes(color = models), size = 2) +
  ylab("Seconds") +
  xlab("Nodes") +
  scale_color_discrete(name = "Model",
                       labels = c("GGMncv(8)",
                                  "GGMncv(4)",
                                  "GGMncv(2)",
                                  "GGMncv(1)",
                                  "NCT"),
                       breaks = c("eight",
                                  "four",
                                  "two",
                                  "one",
                                  "NCT"))

The performance gains are clear, especially when using 8 cores with GGMncv.