The sommer package was developed to provide R users with a powerful and reliable multivariate mixed model solver for different genetic (in diploid and polyploid organisms) and non-genetic analyses. This package allows the user to estimate variance components in a mixed model with the advantages of specifying the variance-covariance structure of the random effects, specifying heterogeneous variances, and obtaining other parameters such as BLUPs, BLUEs, residuals, fitted values, and variances for fixed and random effects. The core algorithms of the package are coded in C++ using the Armadillo library to optimize the dense matrix operations common in direct-inversion algorithms.
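To make BLUEs and BLUPs concrete, the sketch below builds Henderson's mixed model equations for a tiny one-way random model and solves them directly in base R. The toy data and the assumption that both variance components are already known are ours; sommer estimates the variances from the data, and its C++ implementation is not reproduced here.

```r
# Toy data (hypothetical): 6 records from 3 groups
y <- c(4.1, 5.3, 6.0, 5.5, 7.2, 6.8)
g <- factor(rep(c("a", "b", "c"), each = 2))
X <- matrix(1, nrow = length(y), ncol = 1)   # intercept (fixed effect)
Z <- model.matrix(~ g - 1)                   # group incidence (random effect)

s2u <- 2   # assumed known variance of the random group effects
s2e <- 1   # assumed known residual variance
lambda <- s2e / s2u

# Henderson's mixed model equations: coefficient matrix and right-hand side
C <- rbind(
  cbind(crossprod(X),    crossprod(X, Z)),
  cbind(crossprod(Z, X), crossprod(Z) + diag(lambda, ncol(Z)))
)
rhs <- c(crossprod(X, y), crossprod(Z, y))

# Solve the system directly (no iteration is needed when variances are known)
sol <- solve(C, rhs)
blue <- sol[1]    # BLUE of the intercept
blup <- sol[-1]   # BLUPs of the three group effects
```

The same BLUE and BLUPs follow from generalized least squares on V = s2u ZZ' + s2e I; the mixed model equations avoid forming and inverting the n x n matrix V, which matters as the number of records grows.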
This vignette shows how to fit spatial models in sommer using two-dimensional splines.
SECTION 1: Introduction
SECTION 2: Spatial models
TBD
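In this example we show how to fit in sommer the same P-spline ANOVA (PSANOVA) spatial model that the SpATS package fits. This is achieved with the spl2Dmats function, which builds the design matrices of the spline components (M$fC, M$fR, M$fC.R, M$C.fR and M$fC.fR below) so that they can be passed to mmes as random effects.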
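The nseg, degree and penaltyord arguments passed to spl2Dmats below describe a P-spline basis in each coordinate. A minimal base-R sketch of that idea, assuming equally spaced knots as in the Eilers and Marx construction (the helpers pspline_basis and difference_penalty are hypothetical, not sommer functions): nseg segments of cubic splines give nseg + degree basis functions, and a second-order difference penalty leaves constant and linear trends in the coefficients unpenalized.

```r
library(splines)

# Hypothetical helper (not part of sommer): equally spaced B-spline basis
# with `nseg` segments between min(x) and max(x)
pspline_basis <- function(x, nseg, degree = 3) {
  xl <- min(x)
  xr <- max(x)
  # Knots extend `degree` segments beyond each end of the data range
  knots <- xl + (xr - xl) * (-degree:(nseg + degree)) / nseg
  # ord = degree + 1; the result has nseg + degree columns
  splineDesign(knots, x, ord = degree + 1)
}

# Hypothetical helper: penalty t(D) %*% D, where D takes differences of
# order `pord` between adjacent spline coefficients
difference_penalty <- function(k, pord = 2) {
  D <- diff(diag(k), differences = pord)
  crossprod(D)
}

# Toy coordinates (not the DT_yatesoats layout), cubic basis, pord = 2
x <- 1:30
B_x <- pspline_basis(x, nseg = 14, degree = 3)
P_x <- difference_penalty(ncol(B_x), pord = 2)
dim(B_x)  # 30 rows, 14 + 3 = 17 columns
```

How spl2Dmats combines and reparameterizes such marginal bases into the five PSANOVA components is not shown here.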
library(sommer)
data(DT_yatesoats)
DT <- DT_yatesoats
# numeric coordinates for the spline bases
DT$row <- as.numeric(as.character(DT$row))
DT$col <- as.numeric(as.character(DT$col))
# factor copies for the row and column random effects
DT$R <- as.factor(DT$row)
DT$C <- as.factor(DT$col)
# SPATS MODEL (commented out; requires library(SpATS))
# m1.SpATS <- SpATS(response = "Y",
# spatial = ~ PSANOVA(col, row, nseg = c(14,21), degree = 3, pord = 2),
# genotype = "V", fixed = ~ 1,
# random = ~ R + C, data = DT,
# control = list(tolerance = 1e-04))
#
# summary(m1.SpATS, which = "variances")
#
# Spatial analysis of trials with splines
#
# Response: Y
# Genotypes (as fixed): V
# Spatial: ~PSANOVA(col, row, nseg = c(14, 21), degree = 3, pord = 2)
# Fixed: ~1
# Random: ~R + C
#
#
# Number of observations: 72
# Number of missing data: 0
# Effective dimension: 17.09
# Deviance: 483.405
#
# Variance components:
# Variance SD log10(lambda)
# R 1.277e+02 1.130e+01 0.49450
# C 2.673e-05 5.170e-03 7.17366
# f(col) 4.018e-15 6.339e-08 16.99668
# f(row) 2.291e-10 1.514e-05 12.24059
# f(col):row 1.025e-04 1.012e-02 6.59013
# col:f(row) 8.789e+01 9.375e+00 0.65674
# f(col):f(row) 8.036e-04 2.835e-02 5.69565
#
# Residual 3.987e+02 1.997e+01
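# SOMMER MODEL
# P-spline ANOVA matrices with the same settings as the SpATS call above
M <- spl2Dmats(x.coord.name = "col", y.coord.name = "row", data=DT,
               nseg = c(14,21), degree = c(3,3), penaltyord = c(2,2)
)
# fC, fR: smooth column and row trends; fC.R, C.fR: smooth-by-linear
# interactions; fC.fR: smooth-by-smooth interaction
mix <- mmes(Y~V, henderson = TRUE,
            random=~ R + C + vsm(ism(M$fC)) + vsm(ism(M$fR)) +
              vsm(ism(M$fC.R)) + vsm(ism(M$C.fR)) +
              vsm(ism(M$fC.fR)),
            rcov=~units, verbose=FALSE,
            data=M$data)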
summary(mix)$varcomp
## VarComp VarCompSE Zratio Constraint
## R:mu:mu 106.7372504 68.0312885 1.5689435 Positive
## C:mu:mu 177.6246506 85.9690310 2.0661469 Positive
## M:fC:mu:mu 1.5372882 4.6559585 0.3301765 Positive
## M:fR:mu:mu 0.2247536 0.5629912 0.3992134 Positive
## M:fC.R:mu:mu 0.4967322 1.5044611 0.3301729 Positive
## M:C.fR:mu:mu 0.1566605 0.3953450 0.3962628 Positive
## M:fC.fR:mu:mu 8.0964758 5.2934001 1.5295416 Positive
## units:mu:mu 490.8149644 86.6773655 5.6625506 Positive
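SpATS reports each variance component together with log10(lambda), whereas sommer reports the variances only. The printed SpATS values are consistent with lambda being the residual variance divided by the component's variance; the check below reproduces that column, to within rounding, from the numbers in the SpATS summary above (the vector names are ours).

```r
# Variance components copied from the SpATS summary above
spats_var <- c(
  "R"             = 1.277e+02,
  "C"             = 2.673e-05,
  "f(col)"        = 4.018e-15,
  "f(row)"        = 2.291e-10,
  "f(col):row"    = 1.025e-04,
  "col:f(row)"    = 8.789e+01,
  "f(col):f(row)" = 8.036e-04
)
spats_residual <- 3.987e+02

# Smoothing parameter: residual variance divided by each component's variance
log10_lambda <- log10(spats_residual / spats_var)
round(log10_lambda, 2)
```

The same ratio can be formed from sommer's VarComp column (units divided by a component) when the two fits are compared on a common scale; this is most direct for R and C, whose incidence matrices are identical in both models.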
To reduce the computational burden of fitting multiple spatial kernels, sommer provides a single spatial kernel method through the spl2Dc function. As shown below, this can produce results similar to those of the more flexible model; use whichever better fits your needs.
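One way to picture a single spatial kernel is one random effect whose design matrix is the row-wise tensor product of a row basis and a column basis, carrying a single variance component instead of the five of the PSANOVA model. The sketch below builds such a matrix in base R for a hypothetical 18 x 4 field. It illustrates the general technique under our own assumptions (equally spaced cubic B-splines, arbitrary nseg values, hypothetical helper names) and is not a transcription of what spl2Dc does internally.

```r
library(splines)

# Hypothetical helper: equally spaced B-spline basis with `nseg` segments
bspline_basis <- function(x, nseg, degree = 3) {
  xl <- min(x)
  xr <- max(x)
  knots <- xl + (xr - xl) * (-degree:(nseg + degree)) / nseg
  splineDesign(knots, x, ord = degree + 1)
}

# Hypothetical helper: row-wise tensor (Kronecker) product; for each plot,
# every product of its row-basis value and column-basis value
rowwise_tensor <- function(B1, B2) {
  k1 <- ncol(B1)
  k2 <- ncol(B2)
  B1[, rep(seq_len(k1), each = k2), drop = FALSE] *
    B2[, rep(seq_len(k2), times = k1), drop = FALSE]
}

# Hypothetical field: 18 rows x 4 columns = 72 plots
field <- expand.grid(row = 1:18, col = 1:4)
B_row <- bspline_basis(field$row, nseg = 6)   # 6 + 3 = 9 columns
B_col <- bspline_basis(field$col, nseg = 2)   # 2 + 3 = 5 columns
Z_2d <- rowwise_tensor(B_row, B_col)          # 9 * 5 = 45 columns
```

With one variance for all 45 coefficients, the spatial covariance among plots is that variance times Z_2d %*% t(Z_2d), which is why a single kernel needs only one spatial variance component.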
# SOMMER MODEL
mix <- mmes(Y~V,
random=~ R + C +
spl2Dc(row,col),
rcov=~units, verbose=FALSE,
data=DT)
summary(mix)$varcomp
## VarComp VarCompSE Zratio Constraint
## R:mu:mu 112.0476 84.81928 1.3210157 Positive
## C:mu:mu 157.5950 162.05252 0.9724933 Positive
## row:col:A:all:A:all 406.6870 450.21170 0.9033239 Positive
## units:mu:mu 405.3580 107.65889 3.7652071 Positive
Sometimes we want to fit heterogeneous variance components, e.g., when there are multiple trials or locations. The spatial models can also be fitted this way using the at.var and at.levels arguments. The first argument expects a variable whose levels define where separate variance components are fitted. The second lets the user restrict the spatial kernel to a subset of those levels (e.g., some trials or fields) when it should not be fitted for all of them.
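Conceptually, fitting a kernel at each level of a variable amounts to one copy of its design matrix per level, with the rows of observations from the other levels set to zero, so each copy can receive its own variance component; restricting the levels mimics at.levels. A minimal sketch with a hypothetical helper, split_by_level (not a sommer function), and a toy matrix:

```r
# Hypothetical helper (not part of sommer): one copy of the design matrix Z
# per level of `at`, with rows belonging to other levels set to zero
split_by_level <- function(Z, at, levels = unique(at)) {
  out <- lapply(levels, function(lv) Z * (at == lv))
  names(out) <- levels
  out
}

# Toy design matrix for 6 plots: 3 from trial A, 3 from trial B
Z <- matrix(1:12, nrow = 6)
trial <- rep(c("A", "B"), each = 3)

Z_het <- split_by_level(Z, trial)        # one block per trial, like at.var
Z_B   <- split_by_level(Z, trial, "B")   # only trial B, like at.levels
```

Giving Z_het$A and Z_het$B separate variances yields trial-specific spatial variances, analogous to the row:col:trial:A and row:col:trial:B rows in the summary below.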
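# Simulate a second trial: duplicate the data and add unit-variance noise
# (no seed is set, so the values will differ from those printed below)
DT2 <- rbind(DT,DT)
DT2$Y <- DT2$Y + rnorm(length(DT2$Y))
DT2$trial <- c(rep("A",nrow(DT)),rep("B",nrow(DT)))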
head(DT2)
## row col Y N V B MP R C trial
## 1 1 1 90.15611 0.2 Victory B2 Victory 1 1 A
## 2 2 1 60.69612 0 Victory B2 Victory 2 1 A
## 3 3 1 119.85452 0.4 Marvellous B2 Marvellous 3 1 A
## 4 4 1 143.93575 0.6 Marvellous B2 Marvellous 4 1 A
## 5 5 1 148.26361 0.6 GoldenRain B2 GoldenRain 5 1 A
## 6 6 1 107.00112 0.2 GoldenRain B2 GoldenRain 6 1 A
# SOMMER MODEL
mix <- mmes(Y~V,
random=~ R + C +
spl2Dc(row,col, at.var = trial),
rcov=~units, verbose=FALSE,
data=DT2)
summary(mix)$varcomp
## VarComp VarCompSE Zratio Constraint
## R:mu:mu 188.0076 82.90759 2.2676763 Positive
## C:mu:mu 179.6440 158.99523 1.1298702 Positive
## row:col:trial:A:all:A:all 246.4509 304.32889 0.8098176 Positive
## row:col:trial:B:all:B:all 265.0399 309.25820 0.8570181 Positive
## units:mu:mu 344.3400 59.26901 5.8097827 Positive