gwas2crispr prepares genome-wide association study
(GWAS) results for downstream clustered regularly interspaced short
palindromic repeats (CRISPR) workflows.
The package retrieves significant single-nucleotide polymorphisms (SNPs) for an Experimental Factor Ontology (EFO) trait from the EMBL-EBI GWAS Catalog REST API v2 and returns CRISPR-ready outputs for the GRCh38/hg38 human genome build.
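The two numeric settings used later in this document, p_cut and flank_bp, can be pictured with a toy table. The sketch below is not the package's internal code: the data frame, its column names, and the use of `<=` for the cut-off are assumptions made for illustration, and every value in it is invented. It only shows the general idea of filtering by a p-value threshold and building a flanking window in BED coordinates, which are 0-based and half-open.

```r
# Toy association table; every value here is invented for illustration.
snps <- data.frame(
  rsid  = c("rs_demo1", "rs_demo2", "rs_demo3"),
  chr   = c("chr1", "chr2", "chr3"),
  pos   = c(1000000L, 2500000L, 150L),  # 1-based positions
  p_val = c(3e-9, 0.02, 5e-7),
  stringsAsFactors = FALSE
)

p_cut    <- 1e-6
flank_bp <- 300L

# Keep only associations at or below the threshold (assumed comparison).
sig <- snps[snps$p_val <= p_cut, ]

# BED intervals are 0-based and half-open, so a window of flank_bp on each
# side of a 1-based position spans [pos - 1 - flank_bp, pos + flank_bp).
sig$start <- pmax(sig$pos - 1L - flank_bp, 0L)  # clamp at the chromosome start
sig$end   <- sig$pos + flank_bp

print(sig[, c("chr", "start", "end", "rsid")])
```

Clamping the start at zero matters for variants close to the beginning of a chromosome, where the full window would otherwise run into negative coordinates.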
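The main outputs are a CSV file of the retrieved SNPs, a BED file of their hg38 coordinates, and an optional FASTA file of flanking sequences.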
Install from CRAN:

install.packages("gwas2crispr")

Optional packages for FASTA output:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c(
  "Biostrings",
  "GenomeInfoDb",
  "BSgenome.Hsapiens.UCSC.hg38"
))

Development version:
By default, no files are written.
To write output files, provide out_prefix. In examples, point it at a location under tempdir().
out_prefix <- file.path(tempdir(), "lung")
res <- run_gwas2crispr(
  efo_id = "EFO_0000707",
  p_cut = 1e-6,
  flank_bp = 300,
  out_prefix = out_prefix,
  verbose = FALSE
)
res$written

Expected output paths:
paste0(out_prefix, "_snps_full.csv")
paste0(out_prefix, "_snps_hg38.bed")
paste0(out_prefix, "_snps_flank300.fa")

The FASTA file is created only when the optional genome packages are available.
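
After a run, it can help to check which of the expected files were actually written and whether the FASTA prerequisites are installed. The following is a minimal sketch in base R only. The helper expected_paths() is hypothetical and not part of gwas2crispr, and building the FASTA suffix from flank_bp is an assumption inferred from the flank300 file name shown above.

```r
# Hypothetical helper: rebuild the expected paths from the prefix and flank size.
# The "_snps_flank<flank_bp>.fa" pattern is assumed from the example file name.
expected_paths <- function(out_prefix, flank_bp = 300) {
  c(
    csv   = paste0(out_prefix, "_snps_full.csv"),
    bed   = paste0(out_prefix, "_snps_hg38.bed"),
    fasta = paste0(out_prefix, sprintf("_snps_flank%d.fa", as.integer(flank_bp)))
  )
}

# Optional packages named in the installation section above.
fasta_pkgs <- c("Biostrings", "GenomeInfoDb", "BSgenome.Hsapiens.UCSC.hg38")
has_pkg <- vapply(fasta_pkgs, requireNamespace, logical(1), quietly = TRUE)

paths <- expected_paths(file.path(tempdir(), "lung"), flank_bp = 300)

# One row per expected file, with a flag for whether it is on disk.
status <- data.frame(
  path      = unname(paths),
  exists    = unname(file.exists(paths)),
  row.names = names(paths)
)
print(status)

if (!all(has_pkg)) {
  message(
    "FASTA output is not expected; missing packages: ",
    paste(fasta_pkgs[!has_pkg], collapse = ", ")
  )
}
```

If run_gwas2crispr() has not been called with this prefix, every row reports exists = FALSE, which is consistent with the default of writing no files.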
sessionInfo()
#> R version 4.4.3 (2025-02-28 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 22621)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C LC_CTYPE=Arabic_Libya.utf8
#> [3] LC_MONETARY=Arabic_Libya.utf8 LC_NUMERIC=C
#> [5] LC_TIME=Arabic_Libya.utf8
#>
#> time zone: Africa/Tripoli
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.56
#> [5] cachem_1.1.0 knitr_1.51 htmltools_0.5.9 rmarkdown_2.30
#> [9] lifecycle_1.0.5 cli_3.6.5 sass_0.4.10 jquerylib_0.1.4
#> [13] compiler_4.4.3 rstudioapi_0.18.0 tools_4.4.3 evaluate_1.0.5
#> [17] bslib_0.10.0 yaml_2.3.10 otel_0.2.0 jsonlite_2.0.0
#> [21] rlang_1.1.6