The seqwrap function lets the user specify which modeling algorithm to use when modeling data provided as a data frame and metadata describing each sample. Target-specific models are passed to the results together with optional user-specified summary and model-evaluation functions. The function aims not to limit the user to the preferences of specific models. This allows for comparisons between modeling frameworks and flexibility for many types of high-dimensional data. In this vignette, three popular packages are applied to the same underlying data with varying model definitions.
Fitting count data to a negative binomial model using glmmTBM
We will start by simulating data from a simple experiment where count data has been gathered from 25 clusters in two conditions (e.g. treatment and control).
library(seqwrap)
library(glmmTMB)
Warning: package 'glmmTMB' was built under R version 4.5.3
# Simulate data
dat <- seqwrap::simcounts(n_genes = 1000,
n_samples = 50,
beta_0 = 2,
sigma_0 = 0.5,
beta_1 = 0.5,,
sigma_1 = 2,
b_0 = 1,
clusters = 25,
overdispersion_min = c(1, 10))
# Calculate library sizes
dat$metadata$libsize <- log(colSums(dat$data[, -1]) )
Using seqwrap_compose we can populate a swcontainer object with all needed information to fit models. The swcontainer objects is basically a list with data, meta data and arguments that gets passed the model algorithm. The swcontainer object also contain reference to the desired model function.
swobject <- seqwrap_compose(
data = dat$data, # A data frame with the first column as the target identifier
metadata = dat$metadata, # Meta data bout samples, containing predictors
samplename = "sample", # The name sample id column
modelfun = glmmTMB::glmmTMB, # The model fitting function
# A list of argumnets supplied to the modelfunction
arguments = list(
formula = y ~ x + (1 | cluster),
family = glmmTMB::nbinom1
),
summary_fun = NULL,
eval_fun = NULL,
additional_vars = NULL
)
The seqwrap function will iterate over all targets in the data and fit the model and apply the summary and evaluation function to each model. Notice that we can use return_models = TRUE to save model in the output. Below we also use subset = 1:10 to only do the first 10 targets. This is useful when we want to check if teh summary and evaluation functions work as intended. The swobject created above is updated when we supply ne w arguments to seqwrap.
models <- seqwrap(swobject,
return_models = TRUE,
subset = 1:10)
── seqwrap ─────────────────────────────────────────────────────────────────────
Initiating clusters for parallel processing with 1 core
Merging and modelling data
mai 15 2026 16:17:17: Completed model fitting and evaluation. Elapsed time was
0.07 minutes
The resulting object can be summarized using seqwrap_summarise
modelsum <- seqwrap_summarise(models)
── seqwrap summarise ───────────────────────────────────────────────────────────
• A total of 50 samples and 10 targets where used in `glmmTMB::glmmTMB` with
arguments `list(formula = y ~ x + (1 | cluster), family = glmmTMB::nbinom1)`
• Attempting to combine results from the provided summary function.
• Attempting to combine results from the provided evaluations function.
• Attempting to summarise errors and warnings from the fitting process.
── Model summaries ─────────────────────────────────────────────────────────────
• 10 targets have associated summaries
── Model evaluations ───────────────────────────────────────────────────────────
• 10 targets have associated evaluation results
── Combined results ────────────────────────────────────────────────────────────
ℹ Combined results have been generated and were
silently returned.
We have left the arguments summary_fun and eval_fun as NULL. This means that seqwrap will try to combine model parameter estimates and performance using broom.mixed::/broom::tidy and DHARMa. These evaluations are thought of as minimal summaries of the model and might not completly satisfy the needs of the user. From our example above, the fitted parameters from the models are returned in modelsum$summaries and model evaluations are returned in modelsum$evaluations.
The functions are exported from the seqwrap package and we can inspect them:
function(x) {
out <- broom.mixed::tidy(x)
return(out)
}
<bytecode: 0x0000026cba89f1c0>
<environment: namespace:seqwrap>
seqwrap::generic_evaluation
function(x) {
# Simulate residuals using DHARMa
sim_resid <- DHARMa::simulateResiduals(x, n = 250, plot = FALSE)
# Combine potential metrics
out <- tibble::tibble(
uniformity = DHARMa::testUniformity(sim_resid, plot = FALSE)$p.value,
dispersion = DHARMa::testDispersion(sim_resid, plot = FALSE)$p.value,
outliers = DHARMa::testOutliers(sim_resid, plot = FALSE)$p.value
)
return(out)
}
<bytecode: 0x0000026cba913ee0>
<environment: namespace:seqwrap>
We encourage users to define their own custom functions.
Instead of counts, let’s say we are interested in analyzing counts per million, expressed as $\text{CPM} = \frac{\text{Count}}{\text{Library size}} \times 10^6$. We will convert our simulated count data to CPM.
lib_sizes <- colSums(dat$data[,-1]) / 1e6
cpm <- sweep(dat$data[,-1], MARGIN = 2,
STATS = lib_sizes,
FUN = "/")
dat$data <- data.frame(cbind(dat$data[,1], cpm))
We will create a new swcontainer using lme4::lmer as our fitting algorithm.
swobject <- seqwrap_compose(
data = dat$data, # A data frame with the first column as the target identifier
metadata = dat$metadata, # Meta data bout samples, containing predictors
samplename = "sample", # The name sample id column
modelfun = lme4::lmer, # The model fitting function
# A list of argumnets supplied to the modelfunction
arguments = list(
formula = y ~ x + (1 | cluster)
),
summary_fun = NULL,
eval_fun = NULL,
additional_vars = NULL
)
Next we can run seqwrap to get estimates for our genes, to save some time we will start by using a subset of the simulated genes.
models <- seqwrap(swobject,
return_models = TRUE,
subset = 1:10)
── seqwrap ─────────────────────────────────────────────────────────────────────
Initiating clusters for parallel processing with 1 core
Merging and modelling data
mai 15 2026 16:17:19: Completed model fitting and evaluation. Elapsed time was
0.03 minutes
The generic summary function will provide fitted parameter statistics.
modelsum <- seqwrap_summarise(models)
── seqwrap summarise ───────────────────────────────────────────────────────────
• A total of 50 samples and 10 targets where used in `lme4::lmer` with
arguments `list(formula = y ~ x + (1 | cluster))`
• Attempting to combine results from the provided summary function.
• Attempting to combine results from the provided evaluations function.
• Attempting to summarise errors and warnings from the fitting process.
── Model summaries ─────────────────────────────────────────────────────────────
• 10 targets have associated summaries
── Model evaluations ───────────────────────────────────────────────────────────
• 10 targets have associated evaluation results
── Combined results ────────────────────────────────────────────────────────────
ℹ Combined results have been generated and were
silently returned.
# A tibble: 40 × 7
target effect group term estimate std.error statistic
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 gene1 fixed <NA> (Intercept) 540. 62.7 8.61
2 gene1 fixed <NA> x -534. 88.7 -6.03
3 gene1 ran_pars cluster sd__(Intercept) 0.0000221 NA NA
4 gene1 ran_pars Residual sd__Observation 313. NA NA
5 gene10 fixed <NA> (Intercept) 570. 50.0 11.4
6 gene10 fixed <NA> x -536. 70.7 -7.58
7 gene10 ran_pars cluster sd__(Intercept) 0 NA NA
8 gene10 ran_pars Residual sd__Observation 250. NA NA
9 gene2 fixed <NA> (Intercept) 846. 120. 7.05
10 gene2 fixed <NA> x -832. 168. -4.94
# ℹ 30 more rows
Using nlme::lme and nlme::gls, need for additional_vars
We will now try to fit the same model as previously fitted in lme4::lmer using nlme::lme. We can use the same swcontainer object and update the relevant lines. Notice that nlme::lme requires a different syntax for defining the model. Unsurprisingly, the results are the same Figure 1.
models_lme <- seqwrap(swobject,
modelfun = nlme::lme,
arguments = alist(fixed = y ~ x,
random = ~ 1|cluster),
return_models = TRUE,
eval_fun = NULL,
subset = 1:10)
── seqwrap ─────────────────────────────────────────────────────────────────────
Initiating clusters for parallel processing with 1 core
Merging and modelling data
mai 15 2026 16:17:21: Completed model fitting and evaluation. Elapsed time was
0.02 minutes
ℹ Some targets had associated errors or warnings
• Modeling algorithm (errors): n = 0 (0%)
• Modeling algorithm (warnings): n = 0 (0%)
• Summary function (errors): n = 0 (0%)
• Summary function (warnings): n = 0 (0%)
• Evaluation function (errors): n = 0 (0%)
• Evaluation function (warnings): n = 10 (100%)
modelsum_lme <- seqwrap_summarise(models_lme)
── seqwrap summarise ───────────────────────────────────────────────────────────
• A total of 50 samples and 10 targets where used in `The model function has
been updated` with arguments `list(formula = y ~ x + (1 | cluster))`
• Attempting to combine results from the provided summary function.
• Attempting to combine results from the provided evaluations function.
• Attempting to summarise errors and warnings from the fitting process.
── Model summaries ─────────────────────────────────────────────────────────────
• 10 targets have associated summaries
── Model evaluations ───────────────────────────────────────────────────────────
ℹ No evaluations available
── Combined results ────────────────────────────────────────────────────────────
ℹ Combined results have been generated and were
silently returned.
lme_est <- data.frame(modelsum_lme$summaries[, c(1,4,5, 6,9)])
lmer_est <- data.frame(modelsum$summaries[, c(1,4,5, 6)])
seqwrap will attempt to extract relevant metadata for use in modeling. This is done by extracting variable names from model formulations. In some cases, this can be difficult because we use additional functionality in some packages. Models constructed in nlme::gls are an example.
We can construct a similar model to nlme::lme using nlme::CorCompSymm, resulting in a model where the correlations between observations within clusters are uniform (see ?nlme::CorCompSymm). To help seqwrap understand that we want to include cluster from the metadata in all fitting objects, we need to include it in additional_vars.
models_gls <- seqwrap(swobject,
modelfun = nlme::gls,
arguments = alist(
model = y ~ x,
correlation = nlme::corCompSymm(form = ~ 1|cluster) ),
additional_vars = "cluster",
return_models = TRUE,
eval_fun = NULL,
subset = 1:10)
── seqwrap ─────────────────────────────────────────────────────────────────────
Initiating clusters for parallel processing with 1 core
Merging and modelling data
mai 15 2026 16:17:23: Completed model fitting and evaluation. Elapsed time was
0.02 minutes
ℹ Some targets had associated errors or warnings
• Modeling algorithm (errors): n = 0 (0%)
• Modeling algorithm (warnings): n = 0 (0%)
• Summary function (errors): n = 0 (0%)
• Summary function (warnings): n = 0 (0%)
• Evaluation function (errors): n = 0 (0%)
• Evaluation function (warnings): n = 10 (100%)
modelsum_gls <- seqwrap_summarise(models_gls)
── seqwrap summarise ───────────────────────────────────────────────────────────
• A total of 50 samples and 10 targets where used in `The model function has
been updated` with arguments `list(formula = y ~ x + (1 | cluster))`
• Attempting to combine results from the provided summary function.
• Attempting to combine results from the provided evaluations function.
• Attempting to summarise errors and warnings from the fitting process.
── Model summaries ─────────────────────────────────────────────────────────────
• 10 targets have associated summaries
── Model evaluations ───────────────────────────────────────────────────────────
ℹ No evaluations available
── Combined results ────────────────────────────────────────────────────────────
ℹ Combined results have been generated and were
silently returned.
As evident from Figure 2 we can see that the two models will produce slightly different p-values.
gls_est <- data.frame(modelsum_gls$summaries[, c(1,2,3,4,6)])