Spatial modeling using the sommer package

Giovanny Covarrubias-Pazaran

2025-04-04

The sommer package was developed to provide R users with a powerful and reliable multivariate mixed model solver for different genetic (in diploid and polyploid organisms) and non-genetic analyses. The package allows the user to estimate variance components in a mixed model while specifying the variance-covariance structure of the random effects and heterogeneous variances, and to obtain other parameters such as BLUPs, BLUEs, residuals, fitted values, and variances for fixed and random effects. The core algorithms of the package are coded in C++ using the Armadillo library to optimize the dense matrix operations common in direct-inversion algorithms.
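To make terms such as BLUE, BLUP and direct solution concrete, the base R sketch below solves Henderson's mixed model equations (the system behind the `henderson = TRUE` option used later in this vignette) for a toy data set with a fixed intercept and one random factor. It assumes the variance components are already known; sommer estimates them iteratively, which this sketch does not attempt. The data, the factor `g` and the variance values are hypothetical, and the dense `solve()` call only illustrates the direct-solution idea, not sommer's C++ implementation.

```r
set.seed(1)
n <- 30                                      # number of observations
g <- factor(rep(1:6, each = 5))              # hypothetical random factor with 6 levels
X <- matrix(1, nrow = n, ncol = 1)           # fixed effects design: intercept only
Z <- model.matrix(~ g - 1)                   # incidence matrix of the random factor
sigma2_u <- 4                                # assumed (known) random-effect variance
sigma2_e <- 2                                # assumed (known) residual variance
y <- as.vector(10 + Z %*% rnorm(6, sd = sqrt(sigma2_u)) + rnorm(n, sd = sqrt(sigma2_e)))

lambda <- sigma2_e / sigma2_u                # variance ratio added to the diagonal of Z'Z
# Henderson's mixed model equations:
# [ X'X  X'Z              ] [b]   [X'y]
# [ Z'X  Z'Z + lambda * I ] [u] = [Z'y]
lhs <- rbind(cbind(crossprod(X),    crossprod(X, Z)),
             cbind(crossprod(Z, X), crossprod(Z) + diag(lambda, ncol(Z))))
rhs <- rbind(crossprod(X, y), crossprod(Z, y))
sol <- solve(lhs, rhs)                       # direct solution of the dense system

blue <- sol[1]                               # BLUE of the intercept
blup <- sol[-1]                              # BLUPs of the 6 levels of g
```

Because the toy design is balanced, the intercept BLUE equals the overall mean and each BLUP is the deviation of its level mean from that mean shrunk by the factor 5 / (5 + lambda), which is a handy check on the solution.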

This vignette focuses on showing how sommer fits spatial models using two-dimensional spline models.

SECTION 1: Introduction

  1. Background in tensor products

SECTION 2: Spatial models

  1. Two dimensional splines (multiple spatial components)
  2. Two dimensional splines (single spatial component)
  3. Spatial models in multiple trials at once

SECTION 1: Introduction

1) Background in tensor products

TBD

SECTION 2: Spatial models

1) Two dimensional splines (multiple spatial components)

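In this example we show how to fit with sommer the same two-dimensional P-spline ANOVA (PSANOVA) model that the SpATS package fits with its PSANOVA() term. This is achieved by building the incidence matrices of the spline components with the spl2Dmats function and fitting each of them as a random effect in mmes. The estimated variance components are not identical between the two packages, as the outputs below show.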
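Two-dimensional P-spline models such as PSANOVA (Lee, Durban and Eilers 2013) describe the spatial surface with the row-wise tensor product of two marginal B-spline bases, one along columns and one along rows. The base R sketch below builds such a basis for a hypothetical 6 x 10 field. The helper names `bbase()` and `rowwise_kron()` are ours, and we do not claim that spl2Dmats constructs its matrices exactly this way. Under the usual P-spline convention, `nseg` equally spaced segments with polynomial degree `deg` give `nseg + deg` basis functions per direction, and the tensor product has the product of the two.

```r
library(splines)  # ships with R; provides splineDesign()

# Marginal B-spline basis on equally spaced knots (P-spline convention)
bbase <- function(x, nseg, deg = 3) {
  dx <- (max(x) - min(x)) / nseg
  knots <- seq(min(x) - deg * dx, max(x) + deg * dx, by = dx)
  splineDesign(knots, x, ord = deg + 1, outer.ok = TRUE)
}

# Row-wise tensor (Kronecker) product of two bases with the same rows:
# column (j - 1) * ncol(B2) + k equals B1[, j] * B2[, k]
rowwise_kron <- function(B1, B2) {
  (B1 %x% matrix(1, 1, ncol(B2))) * (matrix(1, 1, ncol(B1)) %x% B2)
}

# Hypothetical field: 6 columns x 10 rows of plots
field <- expand.grid(col = 1:6, row = 1:10)
Bc  <- bbase(field$col, nseg = 3)   # 3 + 3 = 6 column basis functions
Br  <- bbase(field$row, nseg = 5)   # 5 + 3 = 8 row basis functions
B2D <- rowwise_kron(Bc, Br)         # 60 plots x 48 tensor-product basis functions
dim(B2D)
```

Under the same convention, `nseg = c(14,21)` with `degree = c(3,3)` would give 17 and 24 marginal basis functions and a 408-column tensor-product basis, which is why the full surface is usually split into smaller, separately penalized components.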

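library(sommer)
data(DT_yatesoats)
DT <- DT_yatesoats
# numeric coordinates, used to build the spline bases
DT$row <- as.numeric(as.character(DT$row))
DT$col <- as.numeric(as.character(DT$col))
# factor copies of the coordinates, used for the row and column random effects
DT$R <- as.factor(DT$row)
DT$C <- as.factor(DT$col)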

# SPATS MODEL
# m1.SpATS <- SpATS(response = "Y",
#                   spatial = ~ PSANOVA(col, row, nseg = c(14,21), degree = 3, pord = 2),
#                   genotype = "V", fixed = ~ 1,
#                   random = ~ R + C, data = DT,
#                   control = list(tolerance = 1e-04))
# 
# summary(m1.SpATS, which = "variances")
# 
# Spatial analysis of trials with splines 
# 
# Response:                   Y         
# Genotypes (as fixed):       V         
# Spatial:                    ~PSANOVA(col, row, nseg = c(14, 21), degree = 3, pord = 2)
# Fixed:                      ~1        
# Random:                     ~R + C    
# 
# 
# Number of observations:        72
# Number of missing data:        0
# Effective dimension:           17.09
# Deviance:                      483.405
# 
# Variance components:
#                   Variance            SD     log10(lambda)
# R                 1.277e+02     1.130e+01           0.49450
# C                 2.673e-05     5.170e-03           7.17366
# f(col)            4.018e-15     6.339e-08          16.99668
# f(row)            2.291e-10     1.514e-05          12.24059
# f(col):row        1.025e-04     1.012e-02           6.59013
# col:f(row)        8.789e+01     9.375e+00           0.65674
# f(col):f(row)     8.036e-04     2.835e-02           5.69565
# 
# Residual          3.987e+02     1.997e+01 

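# SOMMER MODEL
# incidence matrices of the PSANOVA spline components
M <- spl2Dmats(x.coord.name = "col", y.coord.name = "row", data = DT,
               nseg = c(14,21), degree = c(3,3), penaltyord = c(2,2))
# fC, fR: smooth main effects of column and row
# fC.R, C.fR: smooth-by-linear interactions; fC.fR: smooth-by-smooth interaction
# henderson = TRUE: solve through Henderson's mixed model equations
mix <- mmes(Y~V, henderson = TRUE,
            random=~ R + C + vsm(ism(M$fC)) + vsm(ism(M$fR)) + 
              vsm(ism(M$fC.R)) + vsm(ism(M$C.fR)) +
              vsm(ism(M$fC.fR)),
            rcov=~units, verbose=FALSE,
            data=M$data)
summary(mix)$varcomp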
##                   VarComp  VarCompSE    Zratio Constraint
## R:mu:mu       106.7372504 68.0312885 1.5689435   Positive
## C:mu:mu       177.6246506 85.9690310 2.0661469   Positive
## M:fC:mu:mu      1.5372882  4.6559585 0.3301765   Positive
## M:fR:mu:mu      0.2247536  0.5629912 0.3992134   Positive
## M:fC.R:mu:mu    0.4967322  1.5044611 0.3301729   Positive
## M:C.fR:mu:mu    0.1566605  0.3953450 0.3962628   Positive
## M:fC.fR:mu:mu   8.0964758  5.2934001 1.5295416   Positive
## units:mu:mu   490.8149644 86.6773655 5.6625506   Positive
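The `penaltyord = c(2,2)` argument sets the order of the difference penalty on the spline coefficients in each direction. In the P-spline mixed model formulation (Lee, Durban and Eilers 2013), the eigen-decomposition of the penalty D'D separates an unpenalized part (for order 2, a constant and a linear trend) from penalized directions that can be treated as random effects. Combining the fixed and random parts of the two marginal bases yields terms like f(col), f(row), f(col):row, col:f(row) and f(col):f(row) in the SpATS output, which the fC, fR, fC.R, C.fR and fC.fR matrices appear to mirror. The base R sketch below shows that decomposition for one marginal basis with a hypothetical K = 8 coefficients; it illustrates the standard reparametrization and is not sommer's internal code.

```r
# Second-order difference penalty for a hypothetical marginal basis
K <- 8                                   # number of B-spline coefficients (e.g. nseg = 5, degree 3)
d <- 2                                   # penalty order, as in penaltyord = 2
D <- diff(diag(K), differences = d)      # (K - d) x K difference matrix
P <- crossprod(D)                        # penalty matrix t(D) %*% D

# Eigen-decomposition of the penalty
e <- eigen(P, symmetric = TRUE)
pen <- e$values > 1e-8                   # penalized (non-null) directions
null_dim <- sum(!pen)                    # size of the unpenalized null space

# Constant and linear coefficient vectors are not penalized when d = 2
P %*% rep(1, K)                          # all zeros
P %*% seq_len(K)                         # all zeros

# Rescale the penalized directions so the penalty becomes the identity,
# i.e. their coefficients can be treated as iid random effects
Z_coef <- e$vectors[, pen] %*% diag(1 / sqrt(e$values[pen]))
round(crossprod(Z_coef, P %*% Z_coef), 10)  # identity matrix of size K - d
```

The null space has dimension equal to the penalty order, so with `penaltyord = 2` each direction contributes an intercept and a linear trend as fixed effects, and only the remaining K - 2 directions carry a variance component.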

2) Two dimensional splines in a single field (single spatial component)

To reduce the computational burden of fitting multiple spatial kernels, sommer provides a single spatial kernel method through the spl2Dc function, which fits one variance component for the whole two-dimensional spline surface. As shown below, this can produce results similar to those of the more flexible model. Use whichever better fits your needs.
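What a single spatial kernel implies can be sketched as follows. If one variance component sigma2_s is placed on all coefficients of a two-dimensional tensor-product spline basis B, and we assume (for illustration only) that those coefficients are independent, the spatial plot effects have covariance sigma2_s * B %*% t(B). Nearby plots share basis functions and are positively correlated, while plots far apart share none. The helpers, field size and `sigma2_s` value below are hypothetical, and this is not a description of how spl2Dc parameterizes its kernel internally.

```r
library(splines)  # ships with R; provides splineDesign()

# Marginal B-spline basis on equally spaced knots (P-spline convention)
bbase <- function(x, nseg, deg = 3) {
  dx <- (max(x) - min(x)) / nseg
  knots <- seq(min(x) - deg * dx, max(x) + deg * dx, by = dx)
  splineDesign(knots, x, ord = deg + 1, outer.ok = TRUE)
}

# Row-wise tensor product: column (j - 1) * ncol(B2) + k is B1[, j] * B2[, k]
rowwise_kron <- function(B1, B2) {
  (B1 %x% matrix(1, 1, ncol(B2))) * (matrix(1, 1, ncol(B1)) %x% B2)
}

# Hypothetical 6-column x 10-row field; plot 1 = (col 1, row 1), plot 60 = (col 6, row 10)
field <- expand.grid(col = 1:6, row = 1:10)
B <- rowwise_kron(bbase(field$col, nseg = 3), bbase(field$row, nseg = 5))

sigma2_s  <- 400                         # hypothetical single spatial variance
K_spatial <- sigma2_s * tcrossprod(B)    # 60 x 60 covariance of the spatial plot effects

K_spatial[1, 2]    # plot 1 vs its neighbour in the same row: positive
K_spatial[1, 60]   # plot 1 vs the opposite corner: zero up to rounding
```

A single variance component scales this whole covariance at once, whereas the multi-component model lets each PSANOVA term have its own variance, which is the extra flexibility traded for speed.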

# SOMMER MODEL
mix <- mmes(Y~V,
            random=~ R + C +
              spl2Dc(row,col),
            rcov=~units, verbose=FALSE,
            data=DT)
summary(mix)$varcomp
##                      VarComp VarCompSE    Zratio Constraint
## R:mu:mu             112.0476  84.81928 1.3210157   Positive
## C:mu:mu             157.5950 162.05252 0.9724933   Positive
## row:col:A:all:A:all 406.6870 450.21170 0.9033239   Positive
## units:mu:mu         405.3580 107.65889 3.7652071   Positive

3) Spatial models in multiple trials at once

Sometimes we want to fit heterogeneous variance components, e.g., when we have multiple trials or locations. Spatial models can also be fitted this way using the at.var and at.levels arguments. The first argument expects a variable whose levels define the groups for which separate variance components are fitted. The second argument lets the user restrict the spatial kernels to a subset of those levels (e.g., trials or fields) instead of fitting them for all levels.
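Conceptually, fitting a spatial kernel with at.var amounts to one copy of the spatial incidence matrix per level of the grouping variable, with the rows of the other levels set to zero, and a separate variance component for each copy; the covariance of the spatial effects is then block-diagonal across trials. The base R sketch below illustrates that structure with a random stand-in basis and hypothetical variances; it is not sommer's internal code.

```r
set.seed(2)
trial <- factor(rep(c("A", "B"), each = 4))       # 4 hypothetical plots per trial
B <- matrix(runif(8 * 3), nrow = 8, ncol = 3)     # random stand-in for a spatial basis

# One copy of the basis per trial, with rows of the other trial set to zero
Z_A <- B * (trial == "A")
Z_B <- B * (trial == "B")

sigma2_A <- 250                                   # hypothetical variance for trial A
sigma2_B <- 270                                   # hypothetical variance for trial B

# Covariance of the spatial effects: block-diagonal across trials
V_spatial <- sigma2_A * tcrossprod(Z_A) + sigma2_B * tcrossprod(Z_B)
round(V_spatial, 1)
```

Restricting at.levels to a subset of levels would correspond to keeping only the copies for those levels, so trials left out receive no spatial term.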

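# simulate a second trial: duplicate the data and add random noise to Y
# (no seed is set, so the estimates below will differ slightly between runs)
DT2 <- rbind(DT,DT)
DT2$Y <- DT2$Y + rnorm(length(DT2$Y))
DT2$trial <- c(rep("A",nrow(DT)),rep("B",nrow(DT)))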
head(DT2)
##   row col         Y   N          V  B         MP R C trial
## 1   1   1  90.15611 0.2    Victory B2    Victory 1 1     A
## 2   2   1  60.69612   0    Victory B2    Victory 2 1     A
## 3   3   1 119.85452 0.4 Marvellous B2 Marvellous 3 1     A
## 4   4   1 143.93575 0.6 Marvellous B2 Marvellous 4 1     A
## 5   5   1 148.26361 0.6 GoldenRain B2 GoldenRain 5 1     A
## 6   6   1 107.00112 0.2 GoldenRain B2 GoldenRain 6 1     A
# SOMMER MODEL
mix <- mmes(Y~V,
            random=~ R + C +
              spl2Dc(row,col, at.var = trial),
            rcov=~units, verbose=FALSE,
            data=DT2)
summary(mix)$varcomp
##                            VarComp VarCompSE    Zratio Constraint
## R:mu:mu                   188.0076  82.90759 2.2676763   Positive
## C:mu:mu                   179.6440 158.99523 1.1298702   Positive
## row:col:trial:A:all:A:all 246.4509 304.32889 0.8098176   Positive
## row:col:trial:B:all:B:all 265.0399 309.25820 0.8570181   Positive
## units:mu:mu               344.3400  59.26901 5.8097827   Positive

Literature

Bernardo R. 2010. Breeding for quantitative traits in plants. Second edition. Stemma Press. 390 pp.

Covarrubias-Pazaran G. 2016. Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6):1-15.

Covarrubias-Pazaran G. 2018. Software update: Moving the R package sommer to multivariate mixed models for genome-assisted prediction. bioRxiv. doi: https://doi.org/10.1101/354639

Gilmour et al. 1995. Average Information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4):1440-1450.

Henderson CR. 1975. Best linear unbiased estimation and prediction under a selection model. Biometrics 31(2):423-447.

Kang et al. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709-1723.

Lee D-J, Durban M, Eilers PHC. 2013. Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Computational Statistics and Data Analysis 61:22-37.

Lee et al. 2015. MTG2: An efficient algorithm for multivariate linear mixed model analysis based on genomic information. bioRxiv. doi: http://dx.doi.org/10.1101/027201

Maier et al. 2015. Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder. Am J Hum Genet 96(2):283-294.

Rodriguez-Alvarez MX, et al. 2018. Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics 23:52-71.

Searle SR. 1993. Applying the EM algorithm to calculating ML and REML estimates of variance components. Paper invited for the 1993 American Statistical Association Meeting, San Francisco.

Tunnicliffe Wilson G. 1989. On the use of marginal likelihood in time series model estimation. Journal of the Royal Statistical Society: Series B 51(1):15-27.

Yu et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics 38:203-208.