This package provides functionalities to perform a meta-analysis of genetic-wide association studies results in plant quantitative genetics. The present document displays the code and results corresponding to the analysis of the package data set.
First one need to install the metaGE
package.
if (!require('metaGE')){
install.packages('metaGE')
} #> Le chargement a nécessité le package : metaGE
library('metaGE')
The analysis requires the following R packages:
library(tidyverse)
library(DT)
## For graphical displays
library(data.table)
library(corrplot)
library(ggplot2)
The starting point of the analysis is the list of files corresponding to the association studies results performed in the different environments. Here we consider only three association files for simplification. These files are included in the package.
## Get the folder containing the association file
<- system.file("extdata", package = "metaGE")
RepData
## Get the complete list of association files
<- list.files(RepData ,full.names = TRUE) %>%
File.list tibble(Names = .)%>%
mutate(ShortNames = Names %>%
str_remove(pattern = paste0(RepData,"/")) %>%
str_remove(pattern = "_3DF.txt")) %>%
select(ShortNames,Names) %>%
deframe
File.list#> Gai12R
#> "C:/Users/Annaig/AppData/Local/Temp/RtmpkH3JOo/Rinst1acc39573a95/metaGE/extdata/Gai12R_3DF.txt"
#> Gai12W
#> "C:/Users/Annaig/AppData/Local/Temp/RtmpkH3JOo/Rinst1acc39573a95/metaGE/extdata/Gai12W_3DF.txt"
#> Gai13R
#> "C:/Users/Annaig/AppData/Local/Temp/RtmpkH3JOo/Rinst1acc39573a95/metaGE/extdata/Gai13R_3DF.txt"
Here is what the data look like in a single file:
## Have a look at the first one
fread(File.list[1]) %>% head() %>% datatable()
One now needs to collect the results of all association files into a
single dataset using the metaGE.collect
function.
Notice that all files may not have the same SNPs (due to filtering per environment). This will result in NAs when files are merged. By default, any marker with NAs is discarded. Specify NA.rmv = FALSE if you want to keep the marker with NAs.
###Build the dataset
## First provide the list of variable names
<- list(MARKER="Marker_Name",
Names.list CHR="Chromosome",
POS="Marker_Position",
FREQ="Maf",
EFFECT="SNP_Weight",
PVAL="Pvalue",
ALLELE0="Allele1",
ALLELE1="Allele2")
<- 0.07
MinFreq
## Now collect
<- metaGE.collect(FileNames = File.list, VariableNames = Names.list, MinFreq = MinFreq)
MetaData #> 106 duplicated markers removed
#> 4 markers with NAs removed
head(MetaData$Data) %>% datatable()
The association files can be in several folders. In this case, one can define the FileNames argument as a list of lists. The argument VariableNames may also be a list of lists.
## Get the list of the association files
<- File.list[1:2]
File.list1
## Get the list of the other association files
<- File.list[3]
File.list2
<- list("List1" = File.list1, "List2" = File.list2)
File.listc
File.listc#> $List1
#> Gai12R
#> "C:/Users/Annaig/AppData/Local/Temp/RtmpkH3JOo/Rinst1acc39573a95/metaGE/extdata/Gai12R_3DF.txt"
#> Gai12W
#> "C:/Users/Annaig/AppData/Local/Temp/RtmpkH3JOo/Rinst1acc39573a95/metaGE/extdata/Gai12W_3DF.txt"
#>
#> $List2
#> Gai13R
#> "C:/Users/Annaig/AppData/Local/Temp/RtmpkH3JOo/Rinst1acc39573a95/metaGE/extdata/Gai13R_3DF.txt"
###Build the dataset
## First provide the list of variable names
<- list(Names.list, Names.list)
Names.listc
<- 0.07
MinFreq
## Now collect
<- metaGE.collect(FileNames = File.listc, VariableNames = Names.listc, MinFreq = MinFreq)
MetaData #> 106 duplicated markers removed
#> 4 markers with NAs removed
head(MetaData$Data) %>% datatable()
From now on, we consider the dataset metaData
included
in the package metaGE
. This dataset contains the results of
10 different genetic association studies on maize lines testing the
association between a set of 27 411 markers and the grain yield.
# Import the data
data("metaData")
One can compute the correlations between the individual GWAS using
the metaGE.cor
function.
<- 0.8
Threshold <- metaGE.cor(metaData, Threshold = Threshold) MatCorr
Here is what the correlation matrix looks like:
corrplot(MatCorr,order = "hclust")
One may fit different models with the metaGE.fit
function depending on the argument Method
:
Method = 'Fe'
, then the Fixed Effect (Fe) model is
fitted, and a test to identify globally significant markers is
performed.Method ='Re'
, then Random Effect (Re) model is
fitted, and a test that allows some heterogeneity on the effect of the
marker is performed.Here are the two different models:
## Fixed Effect
<- metaGE.fit(metaData, MatCorr, Method = "Fe")
FeDF head(FeDF %>% select(CHR, POS, MARKER, Mu, Tau, PVALUE)) %>% datatable()
## Random Effect
<- metaGE.fit(metaData, MatCorr, Method = "Re")
ReDF head(ReDF %>% select(CHR, POS, MARKER, Mu, Tau, PVALUE)) %>% datatable()
One can have a look at the pvalues one gets for the different sets of
tests to check for any problem. This can be done using the
metaGE.pvalplot
function, which displays the p-value
distribution and the QQplot of the -log10(pvalues).
<- list('PVALUE.Fe'= FeDF$PVALUE, 'PVALUE.Re'= ReDF$PVALUE)
Pvalue.list <- map(names(Pvalue.list),~metaGE.pvalplot(Pvalue.list[[.x]], Main= .x) ) plott
First apply some multiple testing procedure (here Benjamini-Hochberg with \(H_0\) prior estimation).
<- function(Pvalues){
CorrectingPValues
## Get a p0 estimate
= min(2*sum(Pvalues > 0.5)/length(Pvalues),1-1/length(Pvalues))
p0
## Get the corrected p-values
<- stats::p.adjust(Pvalues, method = 'BH')*p0
CorrectedPval
return(CorrectedPval)
}
Here is the number of significant markers at a 0.05 threshold for the different sets of tests :
## FDR control
<- 0.05
Alpha <- lapply(Pvalue.list, FUN = function(i){
Signif return(CorrectingPValues(i) %>%
`<`(Alpha) %>%
which)})
lapply(X = Signif,FUN = length)
#> $PVALUE.Fe
#> [1] 40
#>
#> $PVALUE.Re
#> [1] 163
One can draw the corresponding manhattan plot using the
metaGE.manhattan
:
<-FeDF[Signif$PVALUE.Fe,]$PVALUE%>% max %>% max(.,0)
PvalThresholdFe <- metaGE.manhattan(Data = FeDF,VarName = 'PVALUE', Threshold = PvalThresholdFe,Main = '-log10(Pval) alongside the chromosome Fe method' )
manhattanFe print(manhattanFe)
<- ReDF[Signif$PVALUE.Re,]$PVALUE%>% max %>% max(.,0)
PvalThresholdRe <- metaGE.manhattan(Data = ReDF,VarName = 'PVALUE', Threshold = PvalThresholdRe,Main = '-log10(Pval) alongside the chromosome Re method')
manhattanRe print(manhattanRe)
One can draw the corresponding heatmap using the
metaGE.heatmap
:
<- metaGE.heatmap(Data = FeDF[Signif$PVALUE.Fe,],Prefix = "Z.",Main = "Z-scores of Fe significant markers across environments") heatmapFe
<- metaGE.heatmap(Data = ReDF[Signif$PVALUE.Re,],Prefix = "Z.", Main = "Z-scores of Re significant markers across environments") heatmapRe
One can use the score local approach developed by Fariello MI,
Boitard S, Mercier S, et al.(2017) using the metaGE.lscore
.
This approach aims to detect significant regions in a genome sequence by
accumulating single marker p-values. The technical details of the
computation can be found in Fariello MI, Boitard S, Mercier S, et
al. Accounting for linkage disequilibrium in genome scans for selection
without individual genotypes: The local score approach. Mol Ecol.
2017;00:1–15. https://doi.org/10.1111/mec.14141.
<- 3
x <- metaGE.lscore(Data = FeDF, PvalName = "PVALUE", xi = x) FeDF_ls
Here are the significant regions founded by the score local approach :
$SigZones %>% datatable() FeDF_ls
One can draw the manhattan plot of the score local and highlight the
significtives markers using the metaGE.manhattan
:
<- metaGE.manhattan(Data = FeDF_ls$Data,
manhattanFe_lscore VarName = "SCORE",
Score = T,
SigZones = FeDF_ls$SigZones )
print(manhattanFe_lscore+ggtitle('Score local alongside the chromosome Fe method'))
Different tests may be performed with the metaGE.test
function depending on the argument :
Incidence
is provided to the function and
Contrast = NULL
, then a test of contrast to identify
markers significant for at least one subclass of environments is
performed.Incidence
and Contrast
are provided to
the function, then the test of contrast specified is performed.Covariate
is provided to the function, then Random
Effect regression, a test to identify significant markers correlated to
environments covariate, is performed.One can perform several tests by providing a list of the arguments
Contrast
and/or Incidence
and/or
Covariate
.
Some covariates describing the environments are available in the
envDesc
data set :
data("envDesc")
%>% datatable() envDesc
First, one must build the incidence matrix using the
metaGE.incidence
function.
## Build the incidence matrix
<- metaGE.incidence(VarName = "Temp",Covariate = envDesc,EnvName = "ShortName", Data = metaData))
(Incidence.Temp #> Cool Hot Hot(day)
#> Cam12R 0 1 0
#> Cam12W 0 1 0
#> Deb12R 1 0 0
#> Deb12W 1 0 0
#> Gai12R 1 0 0
#> Gai12W 1 0 0
#> Kar12R 1 0 0
#> Kar12W 1 0 0
#> Ner12R 0 0 1
#> Ner12W 0 0 1
One can test whether the markers are significant in at least one
environment subclass by setting Contrast
to
NULL
. One can also identify significant markers with a
distinct effect for the different subclasses of environments by
specifying the appropriate Contrast
.
One can use the metaGE.test
function to perform tests of
contrast.
## Build the list of Incidence
<- list(Temp = Incidence.Temp,
Incidence.list Diff.Temp = Incidence.Temp)
#Build the list of Contrast
<- list(Temp = NULL,
Contrast.list Diff.Temp = matrix(c(1,-1,0,0,1,-1),2,byrow = T))
<- metaGE.test(Data = metaData, MatCorr = MatCorr,
ContrastDF Incidence = Incidence.list,
Contrast = Contrast.list)
Here are the number of significant markers at a 0.05 threshold on control FDR for the different sets of tests : (Benjamini-Hochberg with \(H_0\) prior estimation)
<- 0.05
Alpha <- apply(ContrastDF %>% select(contains("PVALUE.")),2,
SignifContrast FUN = function(i){return(CorrectingPValues(i) %>%
`<`(Alpha) %>%
which)})
lapply(SignifContrast,length)
#> $PVALUE.Temp
#> [1] 662
#>
#> $PVALUE.Diff.Temp
#> [1] 610
One can draw the corresponding heatmap using the
metaGE.heatmap
:
# Specify the groups of environment
<- metaGE.heatmap(Data = ContrastDF[SignifContrast$PVALUE.Temp,],EnvGroups = envDesc[,1:2],Prefix = "Z.",Main = "Z-scores of markers with contrasted effect according the temperature") heatmap_Temp
One may want to identify markers correlated to environments
covariate. These can be done by performing a meta-regression test thanks
to the function metaGE.test
by providing a data frame
containing one column corresponding to the environments (the first one)
and column(s) corresponding to the covariate(s) in the argument
Covariate
.One can test whether the markers are significant
in at least one environment subclass by setting Contrast
to
NULL
. One can also identify significant markers with a
distinct effect for the different subclasses of environments by
specifying the appropriate Contrast
.
One can use the metaGE.test
function to perform tests of
contrast.
<- metaGE.test(Data=metaData,MatCorr = MatCorr,Covariate = envDesc[,c(1,5,6)],EnvName = "ShortName") RegressionDF
Here are the number of significant markers at a 0.05 threshold on control FDR for the meta-regression tests : (Benjamini-Hochberg with \(H_0\) prior estimation)
<- apply(RegressionDF %>% select(contains("PVALUE.")),2,
SignifRegression FUN = function(i){return(CorrectingPValues(i) %>%
`<`(Alpha) %>%
which)})
lapply(SignifRegression,length)
#> $PVALUE.Tnight.mean
#> [1] 345
#>
#> $PVALUE.ET0.mean
#> [1] 0
Here are the five most significant markers correlated to the Tnight.mean covariate.
$PVALUE.Tnight.mean,] %>% select(CHR, POS, MARKER, PVALUE.Tnight.mean) %>% arrange(PVALUE.Tnight.mean) %>% head() %>% datatable() RegressionDF[SignifRegression
One can draw the graph of the marker effects according to the
covariate using the metaGE.regplot
.
metaGE.regplot(Data = metaData, Covariate = envDesc,VarName = "Tnight.mean",EnvName = "ShortName", MarkerName = "AX-90548528", aesCol = "Classification")