Stochastic simulation provides breeders with a comprehensive method to evaluate the long-term behavior of different selection indices across numerous sequential cycles. By running in silico (computer simulated) models, breeders can evaluate the expected genetic gains, selection accuracy, and the eventual decay of genetic variance over an extended selection timeframe.
Chapter 10 of the selection.index package focuses on comparing index
performance across 50 recurrent breeding cycles in a simulated genetic
population. The package provides the simulate_selection_cycles()
function to evaluate four linear phenotypic indices, two unrestricted
(LPSI and ESIM) and two restricted (RLPSI and RESIM).
The simulate_selection_cycles() function mimics a forward
recurrent selection scheme. Within each cycle, the simulation performs
the following biological steps:
1. Recombines parental haplotypes using Haldane’s mapping function to generate doubled-haploid lines via crossing.
2. Simulates quantitative trait variation under a fully pleiotropic model, applying sampled major and minor QTL effects.
3. Computes the phenotypic scores of the progeny by adding environmental variance derived from the expected heritabilities (\(h^2\)).
4. Evaluates the linear combination defined by each chosen index and selects the top-performing progeny.
5. Randomly mates the selected lines, which become the founders of the next selection cycle.
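Step 1 can be sketched in base R. This is a minimal illustration and not the package's internal code: the helper names `haldane_r` and `make_gamete`, the marker spacing, and the two parental haplotypes are all assumptions made for the example. Haldane's mapping function converts a map distance \(d\) (in Morgans) into a recombination fraction \(r = (1 - e^{-2d})/2\).

```r
# Haldane's mapping function: map distance d (Morgans) -> recombination fraction
haldane_r <- function(d) 0.5 * (1 - exp(-2 * d))

# Build one recombinant gamete from two parental haplotypes (hypothetical helper).
# hap1, hap2: allele vectors of equal length; dist: distances between adjacent loci.
make_gamete <- function(hap1, hap2, dist) {
  stopifnot(length(hap1) == length(hap2), length(dist) == length(hap1) - 1)
  r <- haldane_r(dist)
  # A crossover between two adjacent loci occurs with probability r
  crossover <- runif(length(r)) < r
  # Start on a random parental strand, then switch strands at each crossover
  start <- sample(0:1, 1)
  strand <- (start + c(0, cumsum(crossover))) %% 2
  ifelse(strand == 0, hap1, hap2)
}

set.seed(1)
hap_a <- rep(0L, 10)               # parental haplotype carrying allele 0 at every locus
hap_b <- rep(1L, 10)               # parental haplotype carrying allele 1 at every locus
gamete <- make_gamete(hap_a, hap_b, dist = rep(0.1, 9))  # loci spaced 10 cM apart
dh_line <- rbind(gamete, gamete)   # a doubled haploid carries two identical copies
```

With loci 10 cM apart, `haldane_r(0.1)` is about 0.0906, so most adjacent loci are inherited together. Doubling the gamete yields a fully homozygous line, which is why doubled haploids are convenient for this kind of simulation.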
We can run a customized, small-scale simulation to demonstrate the tracking functionality. We will analyze a scenario with three traits to trace their behavior across 5 generation cycles.
In accordance with the textbook paradigm, let us analyze three target traits, two of which are genetically correlated:
- Trait 1: Low heritability (\(h^2 = 0.2\)), negatively correlated with Trait 2.
- Trait 2: High heritability (\(h^2 = 0.5\)).
- Trait 3: Independent trait with high heritability (\(h^2 = 0.5\)).
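These heritabilities determine how much environmental noise is added to the genetic values: since \(h^2 = \sigma^2_G / (\sigma^2_G + \sigma^2_E)\), it follows that \(\sigma^2_E = \sigma^2_G (1 - h^2) / h^2\). The sketch below checks this relationship on made-up genetic values. The names `g`, `sigma2_e`, and `p` are illustrative and do not come from the package.

```r
set.seed(42)
h2 <- 0.2                                    # target heritability (as for Trait 1)
g  <- rnorm(5000, mean = 0, sd = 2)          # hypothetical genetic values
sigma2_g <- var(g)
sigma2_e <- sigma2_g * (1 - h2) / h2         # environmental variance implied by h2
# Phenotype = genetic value + independent environmental deviation
p  <- g + rnorm(length(g), sd = sqrt(sigma2_e))
h2_realized <- sigma2_g / var(p)             # close to 0.2, up to sampling error
```

For \(h^2 = 0.2\) the environmental variance is four times the genetic variance, whereas for \(h^2 = 0.5\) the two are equal, so Trait 1 is the hardest of the three traits to select on phenotype alone.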
First, we generate a known set of QTL effects, which determines the initial genotypic covariance structure of the traits and serves as the baseline for the simulation. For simplicity, we simulate a short 5-cycle breeding program with only 50 QTL.
n_traits <- 3
n_loci <- 50 # Number of segregating sites / markers
# Generate random base QTL effects for the markers across the 3 traits
# Negative correlation infused between trait 1 and 2
qtl_eff <- matrix(rnorm(n_loci * n_traits), nrow = n_loci, ncol = n_traits)
qtl_eff[, 2] <- -0.5 * qtl_eff[, 1] + 0.5 * qtl_eff[, 2]
# Define heritabilities and corresponding environmental variance
heritabilities <- c(0.2, 0.5, 0.5)
# Simulate base genetic variance to deduce correct environmental variance noise
base_gv <- apply(qtl_eff, 2, var) * n_loci
env_var <- base_gv * (1 - heritabilities) / heritabilities
All the indices will apply an equal starting economic weight. However, for the restricted indices (RLPSI and RESIM), we will force a constraint onto the first trait.
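To make the restriction concrete, the sketch below computes LPSI and RLPSI coefficients from a pair of covariance matrices, using the standard formulas \(b = P^{-1} G w\) and \(b_R = (I - Q)\, b\) with \(Q = P^{-1} G U (U' G P^{-1} G U)^{-1} U' G\) (Kempthorne and Nordskog). The matrices `G_mat` and `P_mat` are invented for illustration and are not produced by the package, and `U_mat` selects Trait 1 as the restricted trait.

```r
# Hypothetical genotypic covariance matrix: Traits 1 and 2 negatively correlated,
# Trait 3 independent
G_mat <- matrix(c( 1.0, -0.4, 0.0,
                  -0.4,  1.0, 0.0,
                   0.0,  0.0, 1.0), nrow = 3, byrow = TRUE)
h2    <- c(0.2, 0.5, 0.5)
E_mat <- diag(diag(G_mat) * (1 - h2) / h2)    # assumed uncorrelated environmental effects
P_mat <- G_mat + E_mat                         # phenotypic covariance matrix

w     <- rep(1, 3)                             # equal economic weights
U_mat <- matrix(c(1, 0, 0), ncol = 1)          # restrict Trait 1

P_inv  <- solve(P_mat)
b_lpsi <- P_inv %*% G_mat %*% w                # unrestricted LPSI coefficients

# Kempthorne-Nordskog restriction: project b so that the index is
# uncorrelated with the genetic value of the restricted trait
Q <- P_inv %*% G_mat %*% U_mat %*%
  solve(t(U_mat) %*% G_mat %*% P_inv %*% G_mat %*% U_mat) %*%
  t(U_mat) %*% G_mat
b_rlpsi <- (diag(3) - Q) %*% b_lpsi

# The expected response of each trait is proportional to G b
resp_lpsi  <- drop(G_mat %*% b_lpsi)
resp_rlpsi <- drop(G_mat %*% b_rlpsi)          # first element is zero up to rounding
```

Comparing `resp_lpsi` with `resp_rlpsi` shows the design trade-off: the restricted index gives up the expected gain in Trait 1 entirely, while Trait 3, which is independent of Trait 1 in this example, keeps the same expected response.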
The simulate_selection_cycles routine requires various
inputs covering biological configuration (individuals, cycles) and
selection configurations (weights, restricted traits, selection
proportion).
Note: To keep the vignette fast, we use small values for the number
of cycles and individuals. In real breeding scenarios, you would set
n_cycles = 50 and n_individuals = 10000 to
mirror the book’s 35,000 doubled-haploid evaluations.
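# Equal economic weights for all traits (the value 1 is an assumption;
# any common value gives the equal weighting described above)
weights <- rep(1, n_traits)
# Run the stochastic selection (may take a moment)
sim_results <- simulate_selection_cycles(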
n_cycles = 5,
n_individuals = 200,
n_loci = n_loci,
n_traits = n_traits,
qtl_effects = qtl_eff,
heritability = heritabilities,
economic_weights = weights,
selection_proportion = 0.25, # Select upper 25% progeny
restricted_traits = 1
)
The simulation tracks comprehensive metrics for all computed methods
(LPSI, ESIM, RLPSI, RESIM) across all generated cycles. The return
object provides tracking arrays for:
1. Genetic Gain (*_gain)
2. Genetic Variance (*_var)
3. Estimated Heritabilities (*_heritability)
4. Mean Phenotypic Value (*_mean)
We can extract the genetic gain trajectories to see how each index performs across the sequential cycles. Let’s look at the restricted index RLPSI:
# Expected: because Trait 1 is restricted (restricted_traits = 1) for RLPSI,
# its expected gain per cycle should stay near 0.
print(sim_results$rlpsi_gain)
#>             [,1]       [,2]     [,3]
#> [1,]  0.00000000  0.0000000 0.000000
#> [2,] -0.07991849  0.3576849 4.396779
#> [3,]  0.09595190  0.7166170 2.927811
#> [4,]  0.44428982 -0.2318664 2.712902
#> [5,]  1.14420568 -0.4358615 3.740713
As the output shows, the gain for Trait 1 stays close to \(0\) in the first cycles (-0.08 and 0.10) but drifts upward to 1.14 by cycle 5. This remains well below the gains for Trait 3, yet it is not zero: the Kempthorne-Nordskog restriction holds in expectation, and a run with only 200 individuals and 5 cycles is likely subject to considerable sampling noise. In comparison, unrestricted indices (such as LPSI) are expected to show gains across all correlated traits.
Additionally, intense selection over many generations depletes genetic variability. If the selection index consistently applies a stringent selection threshold, favorable alleles move toward fixation, the remaining allelic variance diminishes, and subsequent trait improvement plateaus.
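The loss of variance under repeated truncation selection can be illustrated with a toy model that is independent of the package. Everything in this sketch is an assumption chosen for simplicity: a single trait, inbred lines coded 0/1, selection directly on the genetic value (a heritability of 1), free recombination between loci, and the `toy_` names themselves.

```r
set.seed(123)
toy_loci <- 50; toy_ind <- 200; toy_cycles <- 10; toy_prop <- 0.25
toy_effects <- rnorm(toy_loci)                 # hypothetical additive QTL effects
# Inbred lines coded 0/1 at each locus, starting near allele frequency 0.5
toy_geno <- matrix(rbinom(toy_ind * toy_loci, 1, 0.5), nrow = toy_ind)

gen_var <- numeric(toy_cycles)
for (cycle in seq_len(toy_cycles)) {
  g_val <- drop(toy_geno %*% toy_effects)      # genetic value of each line
  gen_var[cycle] <- var(g_val)                 # variance before selection in this cycle
  # Truncation selection: keep the top 25% of lines
  keep <- order(g_val, decreasing = TRUE)[seq_len(toy_ind * toy_prop)]
  parents <- toy_geno[keep, , drop = FALSE]
  # Random mating with free recombination: each locus is inherited from
  # one of two randomly chosen parents with equal probability
  mum <- parents[sample(nrow(parents), toy_ind, replace = TRUE), , drop = FALSE]
  dad <- parents[sample(nrow(parents), toy_ind, replace = TRUE), , drop = FALSE]
  from_mum <- matrix(runif(toy_ind * toy_loci) < 0.5, nrow = toy_ind)
  toy_geno <- ifelse(from_mum, mum, dad)
}
round(gen_var, 2)                              # genetic variance per cycle
```

As favorable alleles approach fixation, `gen_var` shrinks from cycle to cycle, and the response to selection shrinks with it. A realistic simulation would add environmental noise and linkage, both of which slow this decay down.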
# Observe the variance arrays for the LPSI evaluations
print(sim_results$lpsi_var)
#> NULL
The call above prints NULL, which indicates that the returned list has no component named lpsi_var in this run; names(sim_results) shows the components that are actually available.
By simulating selection cycles stochastically, breeders can identify the mechanisms behind selection plateaus in multi-trait indices. Specifically:
- Restricted indices hold their target traits near the constrained value over multiple generations, at the cost of slightly lower overall expected gains.
- An independent trait does not benefit from the correlated (“ride-along”) gains that pleiotropy produces between positively correlated traits.
- Accuracies decline as the available genetic variance diminishes, reflecting limits built into simple additive pleiotropic models.