Working with lme4 and nlme in seqwrap

The seqwrap function lets the user specify which modeling algorithm to use when modeling data provided as a data frame and metadata describing each sample. Target-specific models are passed to the results together with optional user-specified summary and model-evaluation functions. The function aims not to limit the user to the preferences of specific models. This allows for comparisons between modeling frameworks and flexibility for many types of high-dimensional data. In this vignette, three popular packages are applied to the same underlying data with varying model definitions.

Fitting count data to a negative binomial model using glmmTBM

We will start by simulating data from a simple experiment where count data has been gathered from 25 clusters in two conditions (e.g. treatment and control).

library(seqwrap)
library(glmmTMB)

Warning: package 'glmmTMB' was built under R version 4.5.3

# Simulate data
dat <- seqwrap::simcounts(n_genes = 1000, 
                    n_samples = 50, 
                    beta_0 = 2, 
                    sigma_0 = 0.5, 
                    beta_1 = 0.5,, 
                    sigma_1 = 2, 
                    b_0 = 1,
                    clusters = 25,
                    overdispersion_min = c(1, 10))

# Calculate library sizes
dat$metadata$libsize <- log(colSums(dat$data[, -1]) )

Using seqwrap_compose we can populate a swcontainer object with all needed information to fit models. The swcontainer objects is basically a list with data, meta data and arguments that gets passed the model algorithm. The swcontainer object also contain reference to the desired model function.

swobject <- seqwrap_compose(
  data = dat$data, # A data frame with the first column as the target identifier
  metadata = dat$metadata, # Meta data bout samples, containing predictors
  samplename = "sample", # The name sample id column 
  modelfun = glmmTMB::glmmTMB, # The model fitting function
  # A list of argumnets supplied to the modelfunction
  arguments = list(
    formula = y ~ x + (1 | cluster),
    family = glmmTMB::nbinom1
  ),
  summary_fun = NULL,
  eval_fun = NULL,
  additional_vars = NULL
)

The seqwrap function will iterate over all targets in the data and fit the model and apply the summary and evaluation function to each model. Notice that we can use return_models = TRUE to save model in the output. Below we also use subset = 1:10 to only do the first 10 targets. This is useful when we want to check if teh summary and evaluation functions work as intended. The swobject created above is updated when we supply ne w arguments to seqwrap.

models <- seqwrap(swobject, 
                  return_models = TRUE,
                  subset = 1:10)

── seqwrap ─────────────────────────────────────────────────────────────────────

Initiating clusters for parallel processing with 1 core
Merging and modelling data
mai 15 2026 16:17:17: Completed model fitting and evaluation. Elapsed time was
0.07 minutes

modelsum <- seqwrap_summarise(models)

── seqwrap summarise ───────────────────────────────────────────────────────────

• A total of 50 samples and 10 targets where used in `glmmTMB::glmmTMB` with
arguments `list(formula = y ~ x + (1 | cluster), family = glmmTMB::nbinom1)`

• Attempting to combine results from the provided summary function.

• Attempting to combine results from the provided evaluations function.

• Attempting to summarise errors and warnings from the fitting process.

── Model summaries ─────────────────────────────────────────────────────────────

• 10 targets have associated summaries

── Model evaluations ───────────────────────────────────────────────────────────

• 10 targets have associated evaluation results

── Combined results ────────────────────────────────────────────────────────────

ℹ Combined results have been generated and were
silently returned.

We have left the arguments summary_fun and eval_fun as NULL. This means that seqwrap will try to combine model parameter estimates and performance using broom.mixed::/broom::tidy and DHARMa. These evaluations are thought of as minimal summaries of the model and might not completly satisfy the needs of the user. From our example above, the fitted parameters from the models are returned in modelsum$summaries and model evaluations are returned in modelsum$evaluations.

seqwrap::generic_summary

function(x) {
  out <- broom.mixed::tidy(x)
  return(out)
}
<bytecode: 0x0000026cba89f1c0>
<environment: namespace:seqwrap>

seqwrap::generic_evaluation

function(x) {
  # Simulate residuals using DHARMa
  sim_resid <- DHARMa::simulateResiduals(x, n = 250, plot = FALSE)
  # Combine potential metrics
  out <- tibble::tibble(
    uniformity = DHARMa::testUniformity(sim_resid, plot = FALSE)$p.value,
    dispersion = DHARMa::testDispersion(sim_resid, plot = FALSE)$p.value,
    outliers = DHARMa::testOutliers(sim_resid, plot = FALSE)$p.value
  )
  return(out)
}
<bytecode: 0x0000026cba913ee0>
<environment: namespace:seqwrap>

Fitting transformed count data to a Gaussian model using lme4

Instead of counts, let’s say we are interested in analyzing counts per million, expressed as $\text{CPM} = \frac{\text{Count}}{\text{Library size}} \times 10^6$. We will convert our simulated count data to CPM.

lib_sizes <- colSums(dat$data[,-1]) / 1e6

cpm <- sweep(dat$data[,-1], MARGIN = 2, 
             STATS = lib_sizes, 
             FUN = "/")

dat$data <- data.frame(cbind(dat$data[,1], cpm))

swobject <- seqwrap_compose(
  data = dat$data, # A data frame with the first column as the target identifier
  metadata = dat$metadata, # Meta data bout samples, containing predictors
  samplename = "sample", # The name sample id column 
  modelfun = lme4::lmer, # The model fitting function
  # A list of argumnets supplied to the modelfunction
  arguments = list(
    formula = y ~ x + (1 | cluster)
  ),
  summary_fun = NULL,
  eval_fun = NULL,
  additional_vars = NULL
)

Next we can run seqwrap to get estimates for our genes, to save some time we will start by using a subset of the simulated genes.

models <- seqwrap(swobject, 
                  return_models = TRUE,
                  subset = 1:10)

── seqwrap ─────────────────────────────────────────────────────────────────────

Initiating clusters for parallel processing with 1 core
Merging and modelling data
mai 15 2026 16:17:19: Completed model fitting and evaluation. Elapsed time was
0.03 minutes

modelsum <- seqwrap_summarise(models)

── seqwrap summarise ───────────────────────────────────────────────────────────

• A total of 50 samples and 10 targets where used in `lme4::lmer` with
arguments `list(formula = y ~ x + (1 | cluster))`

• Attempting to combine results from the provided summary function.

• Attempting to combine results from the provided evaluations function.

• Attempting to summarise errors and warnings from the fitting process.

── Model summaries ─────────────────────────────────────────────────────────────

• 10 targets have associated summaries

── Model evaluations ───────────────────────────────────────────────────────────

• 10 targets have associated evaluation results

── Combined results ────────────────────────────────────────────────────────────

ℹ Combined results have been generated and were
silently returned.

modelsum$summaries

# A tibble: 40 × 7
   target effect   group    term                estimate std.error statistic
   <chr>  <chr>    <chr>    <chr>                  <dbl>     <dbl>     <dbl>
 1 gene1  fixed    <NA>     (Intercept)      540.             62.7      8.61
 2 gene1  fixed    <NA>     x               -534.             88.7     -6.03
 3 gene1  ran_pars cluster  sd__(Intercept)    0.0000221      NA       NA   
 4 gene1  ran_pars Residual sd__Observation  313.             NA       NA   
 5 gene10 fixed    <NA>     (Intercept)      570.             50.0     11.4 
 6 gene10 fixed    <NA>     x               -536.             70.7     -7.58
 7 gene10 ran_pars cluster  sd__(Intercept)    0              NA       NA   
 8 gene10 ran_pars Residual sd__Observation  250.             NA       NA   
 9 gene2  fixed    <NA>     (Intercept)      846.            120.       7.05
10 gene2  fixed    <NA>     x               -832.            168.      -4.94
# ℹ 30 more rows

Using nlme::lme and nlme::gls, need for additional_vars

We will now try to fit the same model as previously fitted in lme4::lmer using nlme::lme. We can use the same swcontainer object and update the relevant lines. Notice that nlme::lme requires a different syntax for defining the model. Unsurprisingly, the results are the same Figure 1.

models_lme <- seqwrap(swobject, 
                  modelfun = nlme::lme,
                  
                  arguments = alist(fixed = y ~ x, 
                                    random = ~ 1|cluster), 
                  return_models = TRUE,
                  eval_fun = NULL,
                  subset = 1:10)

── seqwrap ─────────────────────────────────────────────────────────────────────

Initiating clusters for parallel processing with 1 core
Merging and modelling data
mai 15 2026 16:17:21: Completed model fitting and evaluation. Elapsed time was
0.02 minutes
ℹ Some targets had associated errors or warnings

• Modeling algorithm (errors): n = 0 (0%)
• Modeling algorithm (warnings): n = 0 (0%)
• Summary function (errors): n = 0 (0%)
• Summary function (warnings): n = 0 (0%)
• Evaluation function (errors): n = 0 (0%)
• Evaluation function (warnings): n = 10 (100%)

modelsum_lme <- seqwrap_summarise(models_lme)


── seqwrap summarise ───────────────────────────────────────────────────────────
• A total of 50 samples and 10 targets where used in `The model function has
been updated` with arguments `list(formula = y ~ x + (1 | cluster))`
• Attempting to combine results from the provided summary function.
• Attempting to combine results from the provided evaluations function.
• Attempting to summarise errors and warnings from the fitting process.

── Model summaries ─────────────────────────────────────────────────────────────
• 10 targets have associated summaries

── Model evaluations ───────────────────────────────────────────────────────────
ℹ No evaluations available

── Combined results ────────────────────────────────────────────────────────────
ℹ Combined results have been generated and were
silently returned.

lme_est <- data.frame(modelsum_lme$summaries[, c(1,4,5, 6,9)])
lmer_est <- data.frame(modelsum$summaries[, c(1,4,5, 6)])

Figure 1: Similar estimates between `lme4::lmer` and `nlme::lme` models

seqwrap will attempt to extract relevant metadata for use in modeling. This is done by extracting variable names from model formulations. In some cases, this can be difficult because we use additional functionality in some packages. Models constructed in nlme::gls are an example.

We can construct a similar model to nlme::lme using nlme::CorCompSymm, resulting in a model where the correlations between observations within clusters are uniform (see ?nlme::CorCompSymm). To help seqwrap understand that we want to include cluster from the metadata in all fitting objects, we need to include it in additional_vars.

models_gls <- seqwrap(swobject, 
                  modelfun = nlme::gls,
                  arguments = alist(
                  model = y ~ x,
                  correlation = nlme::corCompSymm(form = ~ 1|cluster) ),
                  additional_vars = "cluster",
                  return_models = TRUE,
                  eval_fun = NULL,
                  subset = 1:10)

── seqwrap ─────────────────────────────────────────────────────────────────────

Initiating clusters for parallel processing with 1 core
Merging and modelling data
mai 15 2026 16:17:23: Completed model fitting and evaluation. Elapsed time was
0.02 minutes
ℹ Some targets had associated errors or warnings

• Modeling algorithm (errors): n = 0 (0%)
• Modeling algorithm (warnings): n = 0 (0%)
• Summary function (errors): n = 0 (0%)
• Summary function (warnings): n = 0 (0%)
• Evaluation function (errors): n = 0 (0%)
• Evaluation function (warnings): n = 10 (100%)

modelsum_gls <- seqwrap_summarise(models_gls)


── seqwrap summarise ───────────────────────────────────────────────────────────
• A total of 50 samples and 10 targets where used in `The model function has
been updated` with arguments `list(formula = y ~ x + (1 | cluster))`
• Attempting to combine results from the provided summary function.
• Attempting to combine results from the provided evaluations function.
• Attempting to summarise errors and warnings from the fitting process.

── Model summaries ─────────────────────────────────────────────────────────────
• 10 targets have associated summaries

── Model evaluations ───────────────────────────────────────────────────────────
ℹ No evaluations available

── Combined results ────────────────────────────────────────────────────────────
ℹ Combined results have been generated and were
silently returned.

As evident from Figure 2 we can see that the two models will produce slightly different p-values.

Working with lme4 and nlme in seqwrap

Fitting count data to a negative binomial model using `glmmTBM`

Fitting transformed count data to a Gaussian model using `lme4`

Using `nlme::lme` and `nlme::gls`, need for `additional_vars`

Fitting count data to a negative binomial model using glmmTBM

Fitting transformed count data to a Gaussian model using lme4

Using nlme::lme and nlme::gls, need for additional_vars

Fitting count data to a negative binomial model using `glmmTBM`

Fitting transformed count data to a Gaussian model using `lme4`

Using `nlme::lme` and `nlme::gls`, need for `additional_vars`