| Type: | Package |
| Title: | Surrogate Functional False Discovery Rates for Genome-Wide Association Studies |
| Version: | 1.1.0 |
| Maintainer: | Andrew Bass <ab3105@cam.ac.uk> |
| Description: | Pleiotropy-informed significance analysis of genome-wide association studies with surrogate functional false discovery rates (sfFDR). The sfFDR framework adapts the fFDR to leverage informative data from multiple sets of GWAS summary statistics to increase power in study while accommodating for linkage disequilibrium. sfFDR provides estimates of key FDR quantities in a significance analysis such as the functional local FDR and $q$-value, and uses these estimates to derive a functional $p$-value for type I error rate control and a functional local Bayes' factor for post-GWAS analyses (e.g., fine mapping and colocalization). |
| URL: | https://github.com/ajbass/sffdr |
| License: | LGPL-2 | LGPL-2.1 | LGPL-3 [expanded from: LGPL] |
| Encoding: | UTF-8 |
| VignetteBuilder: | knitr |
| LazyData: | true |
| Depends: | R (≥ 4.1.0) |
| Imports: | locfit, splines, ggplot2 (≥ 3.5.1), patchwork (≥ 1.3.0), qvalue, fastglm, withr |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-02-11 11:24:56 UTC; andrewbass |
| Author: | Andrew Bass [aut, cre], Chris Wallace [aut] |
| Repository: | CRAN |
| Date/Publication: | 2026-02-11 11:40:02 UTC |
sffdr: Surrogate Functional False Discovery Rate
Description
Estimate functional p-values, q-values, and local false discovery rates for GWAS data using summary statistics of related traits. The surrogate functional FDR (sfFDR) methodology extends the functional FDR framework to leverage informative covariates (e.g., functional annotations, GWAS summary statistics) for increased power while controlling false discovery rates.
Main Functions
pi0_modelBuild spline model for functional pi0 with adaptive knot selection
fpi0estEstimate functional pi0 using GLM with constrained binomial
sffdrCompute functional p-values, q-values, and local FDRs
plot.sffdrVisualize sfFDR results
Workflow
Prepare informative variables (i.e., p-values from related GWAS)
Build functional pi0 model:
mpi0 <- pi0_model(z)Estimate functional pi0:
fpi0 <- fpi0est(p, z = mpi0$zt, pi0_model = mpi0$fmod)Apply sfFDR:
sffdr_out <- sffdr(p, fpi0 = fpi0$fpi0)Visualize:
plot(sffdr_out)
Author(s)
Andrew J. Bass
See Also
pi0_model, fpi0est, sffdr, plot.sffdr
Subset of p-values from the UK Biobank
Description
A dataset containing a subset of p-values from the UK Biobank.
Format
A data frame with 10,000 rows and 4 columns:
- bmi
Body mass index
- bfp
Body fat percentage
- cholesterol
Cholesterol
- triglycerides
Triglycerides
Examples
# Import data
data(bmi)
# Separate main p-values and informative p-values
p <- sumstats$bmi
z <- as.matrix(sumstats[, -1])
Fit Bivariate Density with Adaptive Downsampling
Description
Internal helper for 2D kernel density estimation with signal prioritization.
Usage
fit_bivariate_density(
train_s,
tail_threshold,
target_null,
nn,
maxit,
maxk,
...
)
Arguments
train_s |
2-column matrix on probit scale (p-values, covariate). |
tail_threshold |
Z-score threshold for signal detection. |
target_null |
Maximum null SNPs to retain after downsampling. |
nn |
Nearest-neighbor bandwidth (NULL for automatic). |
maxit |
Maximum iterations for locfit. |
maxk |
Maximum fitting points for locfit. |
... |
Additional arguments for lp(). |
Value
Fitted locfit object or NULL.
Fit locfit with Multiple Fallback Strategies
Description
Internal helper function that attempts to fit a locfit model using multiple strategies with different kernels, degrees, and smoothing parameters.
Usage
fit_strategy_final(
data,
valid_data,
nn_high,
nn_safe,
maxit,
maxk,
verbose = TRUE,
...
)
Arguments
data |
Data frame with columns V1, V2, and w (weights). |
valid_data |
Subset of data used for validation (typically signal-rich region). |
nn_high |
First (higher resolution) nn value to try. |
nn_safe |
Second (safer, more conservative) nn value. |
maxit |
Maximum iterations for locfit. |
maxk |
Maximum number of fitting points for locfit. |
verbose |
Logical; print progress messages. |
... |
Additional arguments passed to lp() in locfit. |
Value
A fitted locfit object, or NULL if all strategies fail.
Fit Univariate Density
Description
Internal helper for 1D kernel density estimation.
Usage
fit_univariate_density(train_s, nn, maxit, maxk, ...)
Arguments
train_s |
Numeric vector on probit scale. |
nn |
Nearest-neighbor bandwidth (NULL for automatic). |
maxit |
Maximum iterations for locfit. |
maxk |
Maximum fitting points for locfit. |
... |
Additional arguments for lp(). |
Value
Fitted locfit object.
Estimate Functional Proportion of Null Tests
Description
Estimates the functional proportion of null tests (pi0) using a GLM approach with a constrained binomial family. This function fits models across multiple lambda thresholds and selects the optimal estimate via MISE minimization.
Usage
fpi0est(
p,
pi0_model_obj = NULL,
z = NULL,
pi0_model = NULL,
indep_snps = NULL,
lambda = seq(0.05, 0.95, 0.05),
constrained.p = TRUE,
tol = 1e-09,
maxit = 200,
verbose = TRUE,
...
)
Arguments
p |
Numeric vector of p-values. |
pi0_model_obj |
Optional list object returned by |
z |
Optional data frame or matrix of rank-transformed covariates.
Required if |
pi0_model |
Optional formula (as character string or formula object).
Required if |
indep_snps |
Optional logical vector indicating independent SNPs for model fitting. Default is NULL (all SNPs used). |
lambda |
Numeric vector of lambda thresholds. Default is |
constrained.p |
Logical; use constrained binomial family. Default is TRUE. |
tol |
Numeric; convergence tolerance. Default is 1e-9. |
maxit |
Integer; maximum iterations. Default is 200. |
verbose |
Logical; print progress messages. Default is TRUE. |
... |
Additional arguments passed to |
Details
Algorithm:
For each lambda threshold, fit a binomial GLM:
P(p \ge \lambda | z)Use constrained binomial family to ensure predictions in (0, 1)
Select optimal lambda via MISE minimization
Usage Patterns:
Pattern 1 (Recommended): Use output from pi0_model
mpi0 <- pi0_model(z) # Clean syntax: fpi0_out <- fpi0est(p, mpi0)
Pattern 2: Manually specify formula and covariates
fpi0_out <- fpi0est(p, z = z_matrix, pi0_model = formula_obj)
Value
An object of class fpi0 (a list) containing:
- fpi0
Numeric vector of functional pi0 estimates for each test.
- tableLambda
A data frame summarizing results for each lambda value.
- MISE
The Mean Integrated Squared Error (MISE) for the chosen model.
- lambda
The selected optimal lambda value.
Examples
# Import data
data(bmi)
# Separate main p-values and conditioning p-values
p <- sumstats$bmi
z <- as.matrix(sumstats[, -1])
# Apply pi0_model to create model (uses adaptive knot selection)
fmod <- pi0_model(z)
# Estimate functional pi0
fpi0_out <- fpi0est(p, fmod)
fpi0 <- fpi0_out$fpi0
# Apply sffdr
sffdr_out <- sffdr(p, fpi0)
Functional p-values
Description
Calculate functional p-values from functional local FDRs.
Usage
fpvalues(lfdr, p = NULL)
Arguments
lfdr |
A vector of functional local FDRs. |
p |
A vector of p-values for ranking purposes. Default is NULL. |
Value
A list containing:
fp |
Functional p-values. |
fq |
Functional q-values. |
Kernel Density Estimation for GWAS P-values
Description
Performs kernel density estimation on p-values (univariate) or joint p-value/covariate pairs (bivariate) using local regression on the probit scale. The estimator is optimized for GWAS data with linkage disequilibrium (LD) and uses adaptive downsampling to prioritize signal-rich regions while maintaining computational efficiency.
Usage
kernelEstimator(
x,
eval.points = x,
epsilon = 1e-15,
epsilon.max = 1 - 1e-04,
maxk = 5e+05,
maxit = 200,
target_null = 1e+05,
trim = 0,
nn = NULL,
tail_threshold = -2,
...
)
Arguments
x |
Numeric vector of p-values (for 1D density) or a 2-column matrix where the first column contains p-values and the second column contains an informative covariate/surrogate. All p-values must be in (0, 1) and the covariate/surrogate must be rank-transformed to be (0,1]. |
eval.points |
Points at which to evaluate the density estimate. Defaults to
|
epsilon |
Lower bound for p-values to prevent numerical issues. P-values
below this are clamped to |
epsilon.max |
Upper bound for p-values. P-values above this are clamped to
|
maxk |
Maximum number of fitting points passed to |
maxit |
Maximum number of iterations for local regression fitting. Default:
|
target_null |
Maximum number of null SNPs to include in the weighted fit
(bivariate case only). SNPs in the signal-enriched tail (defined by
|
trim |
For 1D estimation, fixes the density estimate to constant values on
the intervals |
nn |
Nearest-neighbor bandwidth parameter for |
tail_threshold |
Z-score threshold on the probit-transformed covariate scale
(bivariate case only). SNPs with z < |
... |
Additional arguments passed to |
Details
The function implements a multi-stage density estimation procedure:
-
Probit transformation: P-values are transformed to the normal scale via
qnormto stabilize variance and handle extreme values. -
Adaptive downsampling (bivariate only): To handle large GWAS datasets efficiently, the null region (where the covariate suggests low signal) is downsampled to
target_nullSNPs, with inverse-probability weighting to preserve the density. Signal-enriched SNPs (tail) are always retained. -
Cascade fitting: Multiple fitting strategies are attempted in sequence, with decreasing resolution and increasing robustness, until a valid fit is obtained.
-
Jacobian correction: Density estimates are transformed back to the original p-value scale using the Jacobian of the probit transformation.
The nearest-neighbor bandwidth nn controls smoothing and LD robustness.
By targeting ~5000 neighbors (default), the estimator naturally averages over
multiple LD blocks (~30-50 blocks in European ancestry populations), reducing
spurious local structure while preserving true signal-covariate relationships.
Value
A data frame with columns:
- x
Evaluation points (original scale).
- fx
Estimated density at evaluation points (original scale).
- s
Evaluation points on probit scale (
qnorm(x)).- fs
Estimated density on probit scale.
The returned object has an attribute "lfit" containing the fitted
locfit object for diagnostics.
See Also
Examples
## Not run:
# 1D density estimation
p <- runif(10000, 0, 1)
dens <- kernelEstimator(p)
plot(dens$x, dens$fx, type = "l")
# 2D density with informative covariate
p <- runif(10000, 0, 1)
z <- runif(10000) # rank-norm transformed covariate
x_mat <- cbind(p, z)
dens <- kernelEstimator(x_mat)
## End(Not run)
Build Model for Functional Proportion of Null Tests (fpi0)
Description
Generates a natural spline model for the functional proportion of null tests (fpi0) by adaptively selecting knots based on an FDR threshold.
Usage
pi0_model(
z,
indep_snps = NULL,
fdr_threshold = 0.25,
min_discoveries = 2500,
min_snps_per_knot = 100,
n_knots = 5,
verbose = TRUE
)
Arguments
z |
Matrix/data.frame of p-values (rows=tests, cols=traits). |
indep_snps |
Logical vector indicating independent SNPs (training subset). If NULL, uses all tests. Default: NULL. |
fdr_threshold |
FDR threshold for signal definition. Default: 0.25. |
min_discoveries |
Min significant hits required to include trait. Default: 2500. |
min_snps_per_knot |
Min significant SNPs per knot interval. Default: 100. Note that this is the minimum independent SNPs required per knot interval if 'indep_snps' is provided. |
n_knots |
Target knot count. Default: 5. Automatically reduced if insufficient discoveries (via 'min_snps_per_knot') or capped by 'max_knots'. |
verbose |
Print selection details. Default: TRUE. |
Details
Independent SNPs determine knot placement; all SNPs train the model to capture signal shape. For smaller datasets (<100K), consider reducing 'min_snps_per_knot' and 'min_discoveries' to improve knot placement.
Value
List containing:
fmod |
Model formula (using |
zt |
Data frame of globally rank-transformed p-values |
Plotting function for sffdr object
Description
Graphical display of the sffdr object
Usage
## S3 method for class 'sffdr'
plot(x, rng = c(0, 5e-08), ...)
Arguments
x |
A sffdr object. |
rng |
Significance region to show. Optional. |
... |
Additional arguments. Currently unused. |
Value
Nothing of interest.
Author(s)
Andrew J. Bass
See Also
Examples
# import data
data(bmi)
# Separate main p-values and conditioning p-values
p <- sumstats$bmi
z <- as.matrix(sumstats[, -1])
# Apply pi0_model to create model
fmod <- pi0_model(z)
# Estimate functional pi0
fpi0_out <- fpi0est(p, z = fmod$zt, pi0_model = fmod$fmod)
fpi0 <- fpi0_out$fpi0
# Apply sffdr
sffdr_out <- sffdr(p, fpi0)
# Plot significance results
plot(sffdr_out, rng = c(0, 5e-4))
Surrogate Functional False Discovery Rate Analysis
Description
Estimate functional p-values, q-values, and local false discovery rates (lfdr) for GWAS data leveraging summary statistics from related traits. Functional p-values map from the functional q-value (FDR-based measure) to a p-value for type I error rate control, accounting for pleiotropy that impacts the prior probability of association.
Usage
sffdr(
p.value,
fpi0,
surrogate = NULL,
epsilon = .Machine$double.xmin,
nn = NULL,
fp_ties = TRUE,
seed = 2026,
verbose = TRUE,
...
)
Arguments
p.value |
Numeric vector of p-values to analyze. |
fpi0 |
Numeric vector of functional pi0 estimates, obtained
from |
surrogate |
Optional numeric vector (same length as |
epsilon |
Numeric; lower bound for p-value clamping during density estimation.
Default is |
nn |
Numeric; nearest-neighbor bandwidth for |
fp_ties |
Logical; whether to break ties in functional p-values using the original p-value ordering. Default is TRUE. |
seed |
Integer; random seed for reproducibility of rank tie-breaking. Default is 2026. |
verbose |
Logical; print progress messages. Default is TRUE. |
... |
Additional arguments passed to |
Details
The surrogate functional FDR (sfFDR) methodology extends the functional FDR framework to leverage multiple informative variables (e.g., functional annotations, GWAS summary statistics) for increased power while controlling false discovery rates.
Workflow:
Estimate functional pi0 (proportion of nulls) using
fpi0estCall
sffdr()with p-values and estimated functional pi0Use returned functional p-values/q-values/local FDRs for significance testing
Surrogate Variable: If not specified, the estimated functional pi0 is used as the surrogate variable.
Value
An S3 object of class "sffdr" containing:
call |
The function call. |
pvalues |
Original p-values. |
fpvalues |
Functional p-values. |
fqvalues |
Functional q-values. |
flfdr |
Functional local false discovery rates. |
fpi0 |
Functional pi0 estimates. |
fx |
Joint density estimates at observed (p-value, surrogate) pairs. |
Author(s)
Andrew J. Bass
See Also
fpi0est, plot.sffdr, kernelEstimator
Examples
# Import data
data(bmi)
# Separate main p-values and conditioning p-values
p <- sumstats$bmi
z <- as.matrix(sumstats[, -1])
# Apply pi0_model to create model
# (note: use indep_snps argument to specify independent SNPs for training)
fmod <- pi0_model(z)
# Estimate functional pi0
# (note: use indep_snps argument to specify independent SNPs for training)
fpi0_out <- fpi0est(p, fmod)
fpi0 <- fpi0_out$fpi0
# Apply sffdr
sffdr_out <- sffdr(p, fpi0)
# Plot significance results
plot(sffdr_out)
# Extract functional quantities
fp <- sffdr_out$fpvalues
fq <- sffdr_out$fqvalues
flfdr <- sffdr_out$flfdr