Kinship Links

Mason Garrison

Introduction

This vignette demonstrates how to use the discord package with non-NLSY data. Specifically, it shows how to construct kinship links from a pedigree, estimate additive genetic relatedness and shared environmental exposure, and fit a discordant-kinship regression model. We use data and tools from the {BGmisc} package to illustrate the process.

Loading Packages and Data

We begin by loading the required packages and loading a built-in toy dataset from {BGmisc}.

library(BGmisc)
library(tidyverse)
library(discord)

data(potter)

We rename the family ID column to avoid naming conflicts and generate a pedigree-encoded data frame. This creates a consistent family identifier for use in subsequent functions.

df_potter <- potter

names(df_potter)[names(df_potter) == "famID"] <- "oldfam"

df_potter <- ped2fam(df_potter,
  famID = "famID",
  personID = "personID"
)

Next, we compute two component matrices based on the pedigree:

add <- ped2add(df_potter)
cn <- ped2cn(df_potter)

We then verify and, if necessary, repair sex coding to ensure compatibility with downstream pedigree-based operations.

df_potter <- checkSex(df_potter,
  code_male = 1,
  code_female = 0,
  verbose = FALSE, repair = TRUE
)
Pedigree plot of the Potter dataset
Pedigree plot of the Potter dataset
#> named list()

Simulate Phenotypes and Fit Model

To illustrate the modeling framework, we simulate outcome variables for the cousin pairs using kinsim(). The simulated data reflect a known variance structure: additive genetic correlation = 0.125, no shared environment, and residual (unique environment) variance = 0.4. Latent component scores are excluded from the final dataset, but they can be useful for debugging and understanding the underlying structure of the data.

set.seed(1234)
syn_df <- discord::kinsim(
  mu_all = c(1, 1), cov_a = .4,
  cov_e = .4, c_all = 0,
  r_vector = df_cousin$addRel
) %>%
  select(-c(
    A1_1, A1_2, A2_1, A2_2,
    C1_1, C1_2, C2_1, C2_2,
    E1_1, E1_2, E2_1, E2_2,
    r
  ))

We bind the simulated outcome data to the cousin link data to prepare it for modeling.

data_demo <- cbind(df_cousin, syn_df)


summary(data_demo)
#>       ID1             ID2             addRel          cnuRel       y1_1        
#>  Min.   : 3.00   Min.   :  7.00   Min.   :0.125   Min.   :0   Min.   :-3.6557  
#>  1st Qu.:21.00   1st Qu.: 26.00   1st Qu.:0.125   1st Qu.:0   1st Qu.:-0.4007  
#>  Median :23.00   Median : 28.00   Median :0.125   Median :0   Median : 0.3537  
#>  Mean   :21.62   Mean   : 36.55   Mean   :0.125   Mean   :0   Mean   : 0.6055  
#>  3rd Qu.:24.00   3rd Qu.: 30.00   3rd Qu.:0.125   3rd Qu.:0   3rd Qu.: 1.6073  
#>  Max.   :28.00   Max.   :104.00   Max.   :0.125   Max.   :0   Max.   : 4.5506  
#>       y1_2              y2_1              y2_2               id      
#>  Min.   :-1.1505   Min.   :-2.2658   Min.   :-2.3618   Min.   : 1.0  
#>  1st Qu.:-0.0983   1st Qu.:-0.4629   1st Qu.: 0.3215   1st Qu.:12.5  
#>  Median : 1.1069   Median : 0.7087   Median : 1.4460   Median :24.0  
#>  Mean   : 1.1957   Mean   : 0.9443   Mean   : 1.1523   Mean   :24.0  
#>  3rd Qu.: 2.1121   3rd Qu.: 2.2073   3rd Qu.: 2.1726   3rd Qu.:35.5  
#>  Max.   : 5.0089   Max.   : 5.3431   Max.   : 4.2828   Max.   :47.0

We then use discord_regression() to fit a discordant-kinship model, predicting y1 from y2. Based on the structure of the data, we expect that there will be a significant association between the two outcome variables, as there is a known overlapping non-shared environment covariance.

The model is fit using the discord_regression() function, which takes the following arguments:

model_output <- discord_regression(
  data = data_demo,
  outcome = "y1",
  predictors = "y2",
  id = "id",
  sex = NULL,
  race = NULL,
  pair_identifiers = c("_1", "_2")
)
summary(model_output)
#> 
#> Call:
#> stats::lm(formula = stats::as.formula(paste(realOutcome, preds, 
#>     sep = " ~ ")), data = preppedData)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.6044 -0.9315 -0.1824  0.7204  2.7929 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  1.21796    0.24153   5.043  8.8e-06 ***
#> y1_mean      0.08375    0.13099   0.639   0.5260    
#> y2_diff      0.27022    0.08830   3.060   0.0038 ** 
#> y2_mean      0.11734    0.11812   0.993   0.3261    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.117 on 43 degrees of freedom
#> Multiple R-squared:  0.2033, Adjusted R-squared:  0.1477 
#> F-statistic: 3.657 on 3 and 43 DF,  p-value: 0.01959

The output of the model includes estimates of the regression coefficients, standard errors, and p-values for the association between the two outcome variables.

This example demonstrates how the discord package can be used with custom kinship data derived from pedigrees. By extracting component matrices for additive genetic relatedness and shared environment using {BGmisc}, and converting them to linked pair data, we can apply discordant-kinship regression outside of the NLSY context. This enables flexible applications of the method to any dataset where kin structure is known or derivable.

References