This vignette demonstrates how to use the discord package with non-NLSY data. Specifically, it shows how to construct kinship links from a pedigree, estimate additive genetic relatedness and shared environmental exposure, and fit a discordant-kinship regression model. We use data and tools from the {BGmisc} package to illustrate the process.
We begin by loading the required packages and loading a built-in toy dataset from {BGmisc}.
We rename the family ID column to avoid naming conflicts and generate a pedigree-encoded data frame. This creates a consistent family identifier for use in subsequent functions.
df_potter <- potter
names(df_potter)[names(df_potter) == "famID"] <- "oldfam"
df_potter <- ped2fam(df_potter,
famID = "famID",
personID = "personID"
)
Next, we compute two component matrices based on the pedigree:
add: the additive genetic relatedness matrix
cn: the shared environment matrix, indicating whether kin were raised together (1) or apart (0)
We then verify and, if necessary, repair sex coding to ensure compatibility with downstream pedigree-based operations.
#> named list()
The com2links
function generates a data frame of kinship
links based on the pedigree data. The writetodisk
argument
is set to FALSE to avoid writing the output to disk, and we filter out
upper triangular values to focus on unique pairs.
We use the com2links()
function to convert the additive
genetic and shared environment component matrices into a long-form data
frame of kin pairs. Each row in the resulting data frame represents a
pair of individuals, along with their additive genetic relatedness and
shared environmental status. Self-pairs and duplicate (upper-triangular)
entries are removed.
df_links <- com2links(
writetodisk = FALSE,
ad_ped_matrix = add, cn_ped_matrix = cn,
drop_upper_triangular = TRUE
) %>%
filter(ID1 != ID2)
df_links %>%
slice(1:10) %>%
knitr::kable()
ID1 | ID2 | addRel | cnuRel |
---|---|---|---|
1 | 2 | 0.500 | 1 |
3 | 4 | 0.500 | 1 |
1 | 6 | 0.500 | 0 |
2 | 6 | 0.250 | 0 |
3 | 6 | 0.500 | 0 |
4 | 6 | 0.250 | 0 |
3 | 7 | 0.250 | 0 |
4 | 7 | 0.500 | 0 |
5 | 7 | 0.500 | 0 |
6 | 7 | 0.125 | 0 |
From the full set of kin links, we extract two subsets based on expected genetic and environmental structure:
Full siblings: additive relatedness = 0.5 and shared environment = 1
Cousins: additive relatedness = 0.125 and shared environment = 0
To illustrate the modeling framework, we simulate outcome variables
for the cousin pairs using kinsim()
. The simulated data
reflect a known variance structure: additive genetic correlation =
0.125, no shared environment, and residual (unique environment) variance
= 0.4. Latent component scores are excluded from the final dataset, but
they can be useful for debugging and understanding the underlying
structure of the data.
set.seed(1234)
syn_df <- discord::kinsim(
mu_all = c(1, 1), cov_a = .4,
cov_e = .4, c_all = 0,
r_vector = df_cousin$addRel
) %>%
select(-c(
A1_1, A1_2, A2_1, A2_2,
C1_1, C1_2, C2_1, C2_2,
E1_1, E1_2, E2_1, E2_2,
r
))
We bind the simulated outcome data to the cousin link data to prepare it for modeling.
data_demo <- cbind(df_cousin, syn_df)
summary(data_demo)
#> ID1 ID2 addRel cnuRel y1_1
#> Min. : 3.00 Min. : 7.00 Min. :0.125 Min. :0 Min. :-3.6557
#> 1st Qu.:21.00 1st Qu.: 26.00 1st Qu.:0.125 1st Qu.:0 1st Qu.:-0.4007
#> Median :23.00 Median : 28.00 Median :0.125 Median :0 Median : 0.3537
#> Mean :21.62 Mean : 36.55 Mean :0.125 Mean :0 Mean : 0.6055
#> 3rd Qu.:24.00 3rd Qu.: 30.00 3rd Qu.:0.125 3rd Qu.:0 3rd Qu.: 1.6073
#> Max. :28.00 Max. :104.00 Max. :0.125 Max. :0 Max. : 4.5506
#> y1_2 y2_1 y2_2 id
#> Min. :-1.1505 Min. :-2.2658 Min. :-2.3618 Min. : 1.0
#> 1st Qu.:-0.0983 1st Qu.:-0.4629 1st Qu.: 0.3215 1st Qu.:12.5
#> Median : 1.1069 Median : 0.7087 Median : 1.4460 Median :24.0
#> Mean : 1.1957 Mean : 0.9443 Mean : 1.1523 Mean :24.0
#> 3rd Qu.: 2.1121 3rd Qu.: 2.2073 3rd Qu.: 2.1726 3rd Qu.:35.5
#> Max. : 5.0089 Max. : 5.3431 Max. : 4.2828 Max. :47.0
We then use discord_regression()
to fit a
discordant-kinship model, predicting y1 from y2. Based on the structure
of the data, we expect that there will be a significant association
between the two outcome variables, as there is a known overlapping
non-shared environment covariance.
The model is fit using the discord_regression()
function, which takes the following arguments:
model_output <- discord_regression(
data = data_demo,
outcome = "y1",
predictors = "y2",
id = "id",
sex = NULL,
race = NULL,
pair_identifiers = c("_1", "_2")
)
summary(model_output)
#>
#> Call:
#> stats::lm(formula = stats::as.formula(paste(realOutcome, preds,
#> sep = " ~ ")), data = preppedData)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.6044 -0.9315 -0.1824 0.7204 2.7929
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.21796 0.24153 5.043 8.8e-06 ***
#> y1_mean 0.08375 0.13099 0.639 0.5260
#> y2_diff 0.27022 0.08830 3.060 0.0038 **
#> y2_mean 0.11734 0.11812 0.993 0.3261
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.117 on 43 degrees of freedom
#> Multiple R-squared: 0.2033, Adjusted R-squared: 0.1477
#> F-statistic: 3.657 on 3 and 43 DF, p-value: 0.01959
The output of the model includes estimates of the regression coefficients, standard errors, and p-values for the association between the two outcome variables.
This example demonstrates how the discord package can be used with custom kinship data derived from pedigrees. By extracting component matrices for additive genetic relatedness and shared environment using {BGmisc}, and converting them to linked pair data, we can apply discordant-kinship regression outside of the NLSY context. This enables flexible applications of the method to any dataset where kin structure is known or derivable.