In crop and animal breeding programs, selection is rarely performed in a single stage. Breeders usually evaluate multiple traits across different stages of testing, successively discarding inferior genotypes and advancing superior ones. This multistage approach requires selection indices to account for the changes in variances and covariances induced by selection at prior stages.
Chapter 9 of the selection.index package focuses on the
mathematical formulation and practical application of Multistage Linear
Selection Indices. We introduce indices that properly adjust for prior
selection using the Cochran (1951) and Cunningham (1975) method,
calculating corrected covariance matrices to appropriately predict
subsequent selection responses. This vignette will demonstrate six
multistage indices utilizing phenotypic and genomic estimated breeding
values (GEBVs).
We will use the synthetic maize phenotypic and genotypic datasets
(maize_pheno and maize_geno) to illustrate
these complex functions. Let us first prepare the covariance
matrices.
For our examples, we will evaluate 3 quantitative traits from the dataset: Yield, PlantHeight, and DaysToMaturity. We assume that Stage 1 selection evaluates the first two traits (Yield, PlantHeight), and Stage 2 evaluates all 3 traits (adding DaysToMaturity).
library(selection.index)
# Estimate phenotypic and genotypic covariance matrices for the 3 traits
# The traits are Yield, PlantHeight, DaysToMaturity
traits <- c("Yield", "PlantHeight", "DaysToMaturity")
pmat <- phen_varcov(maize_pheno[, traits], maize_pheno$Environment, maize_pheno$Genotype)
gmat <- gen_varcov(maize_pheno[, traits], maize_pheno$Environment, maize_pheno$Genotype)
# Matrix limits for Stage 1 (Traits 1 to 2)
P1 <- pmat[1:2, 1:2]
G1 <- gmat[1:2, 1:2]
# Complete Matrices for Stage 2
P <- pmat
C <- gmat
# Economic weights for the 3 traits
weights <- c(10, -5, -2)The Multistage Linear Phenotypic Selection Index accounts for changes in phenotypic (\(\mathbf{P}\)) and genotypic (\(\mathbf{C}\)) covariance matrices due to previous selection cycles.
At Stage 1, the index coefficients are computed just as in the standard Smith-Hazel index: \[ \mathbf{b}_1 = \mathbf{P}_1^{-1}\mathbf{G}_1\mathbf{w}_1 \] where \(\mathbf{w}_1\) contains the economic weights for Stage 1 traits.
At Stage 2, the coefficients for the entire set of traits are: \[ \mathbf{b}_2 = \mathbf{P}^{-1}\mathbf{C}\mathbf{w} \]
However, due to selection at stage 1, the covariances \(\mathbf{P}\) and \(\mathbf{C}\) are adjusted for stage 2 evaluation: \[ \mathbf{P}^* = \mathbf{P} - u \frac{Cov(\mathbf{y},\mathbf{x}_1)\mathbf{b}_1\mathbf{b}_1'Cov(\mathbf{x}_1,\mathbf{y})}{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1} \] \[ \mathbf{C}^* = \mathbf{C} - u \frac{\mathbf{G}_1'\mathbf{b}_1\mathbf{b}_1'\mathbf{G}_1}{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1} \] where \(u = k_1(k_1 - \tau)\) calculates the effect of selection based on the standardized truncation point \(\tau\) and selection intensity \(k_1\).
The function mlpsi simultaneously performs adjustments
and metric estimations for both stages.
# We apply a selection proportion of 10% (0.10) per stage.
mlpsi_res <- mlpsi(
P1 = P1, P = P, G1 = G1, C = C,
wmat = weights,
selection_proportion = 0.1
)
#> Warning in .index_correlation(b1, b2, P1, P, stage1_indices): Invalid variance
#> for correlation calculation.
#> Warning in .adjust_phenotypic_matrix(P, P1, b1, k1, tau, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.
#> Warning in .adjust_genotypic_matrix(C, G1, b1, k1, tau, P1, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.
# Stage 1 metrics
mlpsi_res$summary_stage1
#> Stage Trait b E
#> 1 1 Yield -527740915826 NA
#> 2 1 PlantHeight 4066491692882 NA
# Stage 2 metrics
mlpsi_res$summary_stage2
#> Stage Trait b E
#> 1 2 Yield -1.167617e+12 857097221
#> 2 2 PlantHeight 1.254688e+13 22420339
#> 3 2 DaysToMaturity 7.244343e+13 12421006The MRLPSI method applies when the breeder aims to maintain one or
more quantitative traits without change over the multistage evaluation
(e.g., maintaining constant PlantHeight while optimizing
other variables).
The restricted coefficient vectors for Stage 1 and Stage 2 are defined as: \[ \mathbf{b}_{R_1} = \mathbf{K}_1 \mathbf{b}_1 \] \[ \mathbf{b}_{R_2} = \mathbf{K}_2 \mathbf{b}_2 \]
Where \(\mathbf{K}_1 = \mathbf{I}_1 - \mathbf{Q}_1\) and \(\mathbf{K}_2 = \mathbf{I}_2 - \mathbf{Q}_2\) are matrices that impose zero genetic gain vectors, derived from the constraint matrices \(\mathbf{C}_1\) and \(\mathbf{C}_2\).
# We constrain PlantHeight (Trait 2) at Stage 1
C1 <- matrix(0, nrow = 2, ncol = 1)
C1[2, 1] <- 1
# We constrain PlantHeight (Trait 2) at Stage 2
C2 <- matrix(0, nrow = 3, ncol = 1)
C2[2, 1] <- 1
mrlpsi_res <- mrlpsi(
P1 = P1, P = P, G1 = G1, C = C,
wmat = weights,
C1 = C1, C2 = C2,
selection_proportion = 0.1
)
#> Warning in .index_correlation(b_R1, b_R2, P1, P, stage1_indices): Invalid
#> variance for correlation calculation.
#> Warning in .adjust_phenotypic_matrix(P, P1, b_R1, k1, tau, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.
#> Warning in .adjust_genotypic_matrix(C, G1, b_R1, k1, tau, P1, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.
# Observe that Expected Gain (E) for PlantHeight is approximately 0
mrlpsi_res$summary_stage1
#> Stage Trait b_R E
#> 1 1 Yield -529172810900 NA
#> 2 1 PlantHeight 4077525117142 NAUnlike MRLPSI which imposes zero genetic gain bounds, MPPG-LPSI forces proportional changes mapped by the \(\mathbf{d}_1\) and \(\mathbf{d}_2\) restricted-difference vectors. At Stage \(i\), the difference vector creates an updated target trajectory.
# Target specific proportional gains
d1 <- c(2, 1) # Yield gains twice as much as PlantHeight at stage 1
d2 <- c(3, 1, 0.5) # Desired proportions at stage 2
mppg_res <- mppg_lpsi(
P1 = P1, P = P, G1 = G1, C = C,
wmat = weights,
d1 = d1, d2 = d2,
selection_proportion = 0.1
)
#> Warning in .index_correlation(b_M1, b_M2, P1, P, stage1_indices): Invalid
#> variance for correlation calculation.
#> Warning in .adjust_phenotypic_matrix(P, P1, b_M1, k1, tau, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.
#> Warning in .adjust_genotypic_matrix(C, G1, b_M1, k1, tau, P1, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.
# Observe the Expected Gain (E) in the resulting summary stats aligns with d1 proportions
mppg_res$summary_stage1
#> Stage Trait b_M d E Ratio
#> 1 1 Yield -522178847351 2 NA NA
#> 2 1 PlantHeight 4023633342185 1 NA NAIn modern breeding, Genomic Estimated Breeding Values (GEBVs) computed from whole-genome markers accelerate cyclical selection. For the Multi-Stage Genomic framework, we replace phenotypic estimators with GEBV matrix derivations: \(\mathbf{\Gamma}\) (GEBV variance-covariance) replaces \(\mathbf{P}\). \(\mathbf{A}\) (Covariance between GEBVs and true BVs) provides genomic mappings.
For illustrative purposes, we mock simulate the arrays via a pseudo-reliability scaling of the genetic matrices:
set.seed(42)
reliability <- 0.7 # Simulated genomic prediction reliability
Gamma1 <- reliability * G1
Gamma <- reliability * C
A1 <- reliability * G1
A <- C[, 1:2] # n x n1 covariance mappingmlgsi_res <- mlgsi(
Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A,
C = C, G1 = G1, P1 = P1,
wmat = weights,
selection_proportion = 0.1
)
#> Warning in mlgsi(Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = C, :
#> Invalid phenotypic variance at stage 1. Using unadjusted C.
mlgsi_res$summary_stage1
#> Stage Trait beta E
#> 1 1 Yield 10 4822.1921
#> 2 1 PlantHeight -5 126.4395Similarly, traits can be biologically constrained in multiple genome-assisted breeding cycles.
mrlgsi_res <- mrlgsi(
Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A,
C = C, G1 = G1, P1 = P1,
wmat = weights,
C1 = C1, C2 = C2,
selection_proportion = 0.1
)
mrlgsi_res$summary_stage2
#> Stage Trait beta_R E
#> 1 2 Yield 10.0000 NA
#> 2 2 PlantHeight -380.2161 NA
#> 3 2 DaysToMaturity -2.0000 NAThe procedure calculates predetermined gains over multiple cycles exclusively utilizing whole-genome predictions.
mppg_lgsi_res <- mppg_lgsi(
Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A,
C = C, G1 = G1, P1 = P1,
wmat = weights,
d1 = d1, d2 = d2,
selection_proportion = 0.1
)
#> Warning in mppg_lgsi(Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = C, :
#> Invalid phenotypic variance at stage 1. Using unadjusted C.
mppg_lgsi_res$summary_stage1
#> Stage Trait beta_P d E Ratio
#> 1 1 Yield -0.4151 2 3.2116 1.6058
#> 2 1 PlantHeight 15.8301 1 1.6058 1.6058For all the multistage indices above (Phenotypic and Genomic), we evaluate statistical properties to compare efficiency.
The accuracy (or correlation between the index and true breeding value) indicates the efficiency of the index: \[ \rho_{H} = \frac{\sigma_{H, I}}{\sigma_H \sigma_I} = \sqrt{ \frac{\mathbf{b}'\mathbf{P}\mathbf{b}}{\mathbf{w}'\mathbf{C}\mathbf{w}} } \] where \(\mathbf{P}\) and \(\mathbf{C}\) are substituted by their adjusted equivalents at Stage 2 (\(\mathbf{P}^*\) and \(\mathbf{C}^*\) or \(\mathbf{\Gamma}^*\)).
The overall selection response generated by the index evaluates the genetic superiority: \[ R = k \sigma_{I} = k \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} \] where \(k\) is the selection intensity.
The expected genetic gain per individual trait is given by the vector: \[ \mathbf{E} = k \frac{\mathbf{G}'\mathbf{b}}{\sigma_I} = k \frac{\mathbf{G}'\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}} \] For the Multistage Genomic Indices, \(\mathbf{G}'\) is replaced by the marker association mapping matrix \(\mathbf{A}'\).
Multistage Selection Indices effectively manage breeding resources by filtering inferior variants sequentially while accurately recalibrating covariance parameters using genomic and phenotypic variables across progressive evaluations. These analytical tools prevent severe variance distortion over breeding cycles.