Introduction

In crop and animal breeding programs, selection is rarely performed in a single stage. Breeders usually evaluate multiple traits across different stages of testing, successively discarding inferior genotypes and advancing superior ones. This multistage approach requires selection indices to account for the changes in variances and covariances induced by selection at prior stages.

Chapter 9 of the selection.index package focuses on the mathematical formulation and practical application of Multistage Linear Selection Indices. We introduce indices that properly adjust for prior selection using the Cochran (1951) and Cunningham (1975) method, calculating corrected covariance matrices to appropriately predict subsequent selection responses. This vignette will demonstrate six multistage indices utilizing phenotypic and genomic estimated breeding values (GEBVs).

We will use the synthetic maize phenotypic and genotypic datasets (maize_pheno and maize_geno) to illustrate these complex functions. Let us first prepare the covariance matrices.

Data Preparation

For our examples, we will evaluate 3 quantitative traits from the dataset: Yield, PlantHeight, and DaysToMaturity. We assume that Stage 1 selection evaluates the first two traits (Yield, PlantHeight), and Stage 2 evaluates all 3 traits (adding DaysToMaturity).

library(selection.index)

# Estimate phenotypic and genotypic covariance matrices for the 3 traits
# The traits are Yield, PlantHeight, DaysToMaturity
traits <- c("Yield", "PlantHeight", "DaysToMaturity")
pmat <- phen_varcov(maize_pheno[, traits], maize_pheno$Environment, maize_pheno$Genotype)
gmat <- gen_varcov(maize_pheno[, traits], maize_pheno$Environment, maize_pheno$Genotype)

# Matrix limits for Stage 1 (Traits 1 to 2)
P1 <- pmat[1:2, 1:2]
G1 <- gmat[1:2, 1:2]

# Complete Matrices for Stage 2
P <- pmat
C <- gmat

# Economic weights for the 3 traits
weights <- c(10, -5, -2)

1. Multistage Linear Phenotypic Selection Index (MLPSI)

The Multistage Linear Phenotypic Selection Index accounts for changes in phenotypic (\(\mathbf{P}\)) and genotypic (\(\mathbf{C}\)) covariance matrices due to previous selection cycles.

At Stage 1, the index coefficients are computed just as in the standard Smith-Hazel index: \[ \mathbf{b}_1 = \mathbf{P}_1^{-1}\mathbf{G}_1\mathbf{w}_1 \] where \(\mathbf{w}_1\) contains the economic weights for Stage 1 traits.

At Stage 2, the coefficients for the entire set of traits are: \[ \mathbf{b}_2 = \mathbf{P}^{-1}\mathbf{C}\mathbf{w} \]

However, due to selection at stage 1, the covariances \(\mathbf{P}\) and \(\mathbf{C}\) are adjusted for stage 2 evaluation: \[ \mathbf{P}^* = \mathbf{P} - u \frac{Cov(\mathbf{y},\mathbf{x}_1)\mathbf{b}_1\mathbf{b}_1'Cov(\mathbf{x}_1,\mathbf{y})}{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1} \] \[ \mathbf{C}^* = \mathbf{C} - u \frac{\mathbf{G}_1'\mathbf{b}_1\mathbf{b}_1'\mathbf{G}_1}{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1} \] where \(u = k_1(k_1 - \tau)\) calculates the effect of selection based on the standardized truncation point \(\tau\) and selection intensity \(k_1\).

The function mlpsi simultaneously performs adjustments and metric estimations for both stages.

# We apply a selection proportion of 10% (0.10) per stage.
mlpsi_res <- mlpsi(
  P1 = P1, P = P, G1 = G1, C = C,
  wmat = weights,
  selection_proportion = 0.1
)
#> Warning in .index_correlation(b1, b2, P1, P, stage1_indices): Invalid variance
#> for correlation calculation.
#> Warning in .adjust_phenotypic_matrix(P, P1, b1, k1, tau, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.
#> Warning in .adjust_genotypic_matrix(C, G1, b1, k1, tau, P1, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.

# Stage 1 metrics
mlpsi_res$summary_stage1
#>   Stage       Trait             b  E
#> 1     1       Yield -527740915826 NA
#> 2     1 PlantHeight 4066491692882 NA

# Stage 2 metrics
mlpsi_res$summary_stage2
#>   Stage          Trait             b         E
#> 1     2          Yield -1.167617e+12 857097221
#> 2     2    PlantHeight  1.254688e+13  22420339
#> 3     2 DaysToMaturity  7.244343e+13  12421006

2. Multistage Restricted Linear Phenotypic Selection Index (MRLPSI)

The MRLPSI method applies when the breeder aims to maintain one or more quantitative traits without change over the multistage evaluation (e.g., maintaining constant PlantHeight while optimizing other variables).

The restricted coefficient vectors for Stage 1 and Stage 2 are defined as: \[ \mathbf{b}_{R_1} = \mathbf{K}_1 \mathbf{b}_1 \] \[ \mathbf{b}_{R_2} = \mathbf{K}_2 \mathbf{b}_2 \]

Where \(\mathbf{K}_1 = \mathbf{I}_1 - \mathbf{Q}_1\) and \(\mathbf{K}_2 = \mathbf{I}_2 - \mathbf{Q}_2\) are matrices that impose zero genetic gain vectors, derived from the constraint matrices \(\mathbf{C}_1\) and \(\mathbf{C}_2\).

# We constrain PlantHeight (Trait 2) at Stage 1
C1 <- matrix(0, nrow = 2, ncol = 1)
C1[2, 1] <- 1

# We constrain PlantHeight (Trait 2) at Stage 2
C2 <- matrix(0, nrow = 3, ncol = 1)
C2[2, 1] <- 1

mrlpsi_res <- mrlpsi(
  P1 = P1, P = P, G1 = G1, C = C,
  wmat = weights,
  C1 = C1, C2 = C2,
  selection_proportion = 0.1
)
#> Warning in .index_correlation(b_R1, b_R2, P1, P, stage1_indices): Invalid
#> variance for correlation calculation.
#> Warning in .adjust_phenotypic_matrix(P, P1, b_R1, k1, tau, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.
#> Warning in .adjust_genotypic_matrix(C, G1, b_R1, k1, tau, P1, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.

# Observe that Expected Gain (E) for PlantHeight is approximately 0
mrlpsi_res$summary_stage1
#>   Stage       Trait           b_R  E
#> 1     1       Yield -529172810900 NA
#> 2     1 PlantHeight 4077525117142 NA

3. Multistage Predetermined Proportional Gain LPSI (MPPG-LPSI)

Unlike MRLPSI which imposes zero genetic gain bounds, MPPG-LPSI forces proportional changes mapped by the \(\mathbf{d}_1\) and \(\mathbf{d}_2\) restricted-difference vectors. At Stage \(i\), the difference vector creates an updated target trajectory.

# Target specific proportional gains
d1 <- c(2, 1) # Yield gains twice as much as PlantHeight at stage 1
d2 <- c(3, 1, 0.5) # Desired proportions at stage 2

mppg_res <- mppg_lpsi(
  P1 = P1, P = P, G1 = G1, C = C,
  wmat = weights,
  d1 = d1, d2 = d2,
  selection_proportion = 0.1
)
#> Warning in .index_correlation(b_M1, b_M2, P1, P, stage1_indices): Invalid
#> variance for correlation calculation.
#> Warning in .adjust_phenotypic_matrix(P, P1, b_M1, k1, tau, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.
#> Warning in .adjust_genotypic_matrix(C, G1, b_M1, k1, tau, P1, stage1_indices):
#> Invalid variance at stage 1 (b1'P1b1 <= 0). Returning unadjusted matrix.

# Observe the Expected Gain (E) in the resulting summary stats aligns with d1 proportions
mppg_res$summary_stage1
#>   Stage       Trait           b_M d  E Ratio
#> 1     1       Yield -522178847351 2 NA    NA
#> 2     1 PlantHeight 4023633342185 1 NA    NA

4. Multistage Linear Genomic Selection Index (MLGSI)

In modern breeding, Genomic Estimated Breeding Values (GEBVs) computed from whole-genome markers accelerate cyclical selection. For the Multi-Stage Genomic framework, we replace phenotypic estimators with GEBV matrix derivations: \(\mathbf{\Gamma}\) (GEBV variance-covariance) replaces \(\mathbf{P}\). \(\mathbf{A}\) (Covariance between GEBVs and true BVs) provides genomic mappings.

For illustrative purposes, we mock simulate the arrays via a pseudo-reliability scaling of the genetic matrices:

set.seed(42)
reliability <- 0.7 # Simulated genomic prediction reliability

Gamma1 <- reliability * G1
Gamma <- reliability * C
A1 <- reliability * G1
A <- C[, 1:2] # n x n1 covariance mapping

mlgsi_res <- mlgsi(
  Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A,
  C = C, G1 = G1, P1 = P1,
  wmat = weights,
  selection_proportion = 0.1
)
#> Warning in mlgsi(Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = C, :
#> Invalid phenotypic variance at stage 1. Using unadjusted C.

mlgsi_res$summary_stage1
#>   Stage       Trait beta         E
#> 1     1       Yield   10 4822.1921
#> 2     1 PlantHeight   -5  126.4395

5. Multistage Restricted Genomic Selection Index (MRLGSI)

Similarly, traits can be biologically constrained in multiple genome-assisted breeding cycles.

mrlgsi_res <- mrlgsi(
  Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A,
  C = C, G1 = G1, P1 = P1,
  wmat = weights,
  C1 = C1, C2 = C2,
  selection_proportion = 0.1
)

mrlgsi_res$summary_stage2
#>   Stage          Trait    beta_R  E
#> 1     2          Yield   10.0000 NA
#> 2     2    PlantHeight -380.2161 NA
#> 3     2 DaysToMaturity   -2.0000 NA

6. Multistage PPG Genomic Selection Index (MPPG-LGSI)

The procedure calculates predetermined gains over multiple cycles exclusively utilizing whole-genome predictions.

mppg_lgsi_res <- mppg_lgsi(
  Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A,
  C = C, G1 = G1, P1 = P1,
  wmat = weights,
  d1 = d1, d2 = d2,
  selection_proportion = 0.1
)
#> Warning in mppg_lgsi(Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = C, :
#> Invalid phenotypic variance at stage 1. Using unadjusted C.

mppg_lgsi_res$summary_stage1
#>   Stage       Trait  beta_P d      E  Ratio
#> 1     1       Yield -0.4151 2 3.2116 1.6058
#> 2     1 PlantHeight 15.8301 1 1.6058 1.6058

Statistical Properties

For all the multistage indices above (Phenotypic and Genomic), we evaluate statistical properties to compare efficiency.

Accuracy

The accuracy (or correlation between the index and true breeding value) indicates the efficiency of the index: \[ \rho_{H} = \frac{\sigma_{H, I}}{\sigma_H \sigma_I} = \sqrt{ \frac{\mathbf{b}'\mathbf{P}\mathbf{b}}{\mathbf{w}'\mathbf{C}\mathbf{w}} } \] where \(\mathbf{P}\) and \(\mathbf{C}\) are substituted by their adjusted equivalents at Stage 2 (\(\mathbf{P}^*\) and \(\mathbf{C}^*\) or \(\mathbf{\Gamma}^*\)).

Selection Response

The overall selection response generated by the index evaluates the genetic superiority: \[ R = k \sigma_{I} = k \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} \] where \(k\) is the selection intensity.

Expected Genetic Gain

The expected genetic gain per individual trait is given by the vector: \[ \mathbf{E} = k \frac{\mathbf{G}'\mathbf{b}}{\sigma_I} = k \frac{\mathbf{G}'\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}} \] For the Multistage Genomic Indices, \(\mathbf{G}'\) is replaced by the marker association mapping matrix \(\mathbf{A}'\).

Multistage Phenotypic Selection Indices