Type: | Package |
Title: | Covariance Measure Tests for Conditional Independence |
Version: | 0.2-1 |
Description: | Covariance measure tests for conditional independence testing against conditional covariance and nonlinear conditional mean alternatives. The package implements versions of the generalised covariance measure test (Shah and Peters, 2020, <doi:10.1214/19-aos1857>) and projected covariance measure test (Lundborg et al., 2023, <doi:10.1214/24-AOS2447>). The tram-GCM test, for censored responses, is implemented including the Cox model and survival forests (Kook et al., 2024, <doi:10.1080/01621459.2024.2395588>). Application examples to variable significance testing and modality selection can be found in Kook and Lundborg (2024, <doi:10.1093/bib/bbae475>). |
Depends: | R (≥ 4.2.0) |
Imports: | ranger, glmnet, Formula, survival, coin, Rcpp |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
Suggests: | testthat (≥ 3.0.0), ggplot2, tidyr, dplyr, xgboost, lightgbm |
Config/testthat/edition: | 3 |
URL: | https://github.com/LucasKook/comets |
BugReports: | https://github.com/LucasKook/comets/issues |
LinkingTo: | Rcpp |
NeedsCompilation: | yes |
Packaged: | 2025-09-10 08:41:54 UTC; lkook |
Author: | Lucas Kook |
Maintainer: | Lucas Kook <lucasheinrich.kook@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-09-10 11:10:02 UTC |
Covariance measure tests with formula interface
Description
Covariance measure tests with formula interface
Usage
comet(formula, data, test = c("gcm", "pcm", "wgcm", "kgcm"), ...)
comets(formula, data, test = c("gcm", "pcm", "wgcm", "kgcm"), ...)
Arguments
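formula |
Formula of the form Y ~ X | Z, with response Y, covariates X, and conditioning variables Z. |
data |
Data.frame containing the variables in formula. |
test |
Character string; one of "gcm", "pcm", "wgcm", or "kgcm". |
... |
Additional arguments passed to the function implementing the selected test. |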
Details
Formula-based interface for the generalised and projected covariance measure tests.
Value
Object of class "gcm", "wgcm", "kgcm", or "pcm" and "htest". See gcm, wgcm, kgcm, pcm for details.
References
Kook, L., & Lundborg, A. R. (2024). Algorithm-agnostic significance testing in supervised learning with multimodal data. Briefings in Bioinformatics, 25(6), bbae475. doi:10.1093/bib/bbae475
Examples
tn <- 1e2
df <- data.frame(y = rnorm(tn), x1 = rnorm(tn), x2 = rnorm(tn), z = rnorm(tn))
comet(y ~ x1 + x2 | z, data = df, test = "gcm")
Generalised covariance measure test
Description
Generalised covariance measure test
Usage
gcm(
Y,
X,
Z,
alternative = c("two.sided", "less", "greater"),
reg_YonZ = "rf",
reg_XonZ = "rf",
args_YonZ = NULL,
args_XonZ = NULL,
type = c("quadratic", "max", "scalar"),
B = 499L,
coin = FALSE,
cointrol = list(distribution = "asymptotic"),
return_fitted_models = FALSE,
multivariate = c("none", "YonZ", "XonZ", "both"),
...
)
Arguments
Y |
Vector or matrix of response values. |
X |
Matrix or data.frame of covariates. |
Z |
Matrix or data.frame of covariates. |
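alternative |
A character string specifying the alternative hypothesis, must be one of "two.sided" (default), "less", or "greater". |
reg_YonZ |
Character string or function specifying the regression for Y on Z. See "Implemented regression methods" for details. |
reg_XonZ |
Character string or function specifying the regression for X on Z. See "Implemented regression methods" for details. |
args_YonZ |
A list of named arguments passed to reg_YonZ. |
args_XonZ |
A list of named arguments passed to reg_XonZ. |
type |
Type of test statistic, one of "quadratic" (default), "max", or "scalar". |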
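B |
Number of bootstrap samples. Only applies if type = "max" is used. |
coin |
Logical; whether or not to use the coin package for computing the test statistic and p-value (default is FALSE). |
cointrol |
List; further arguments passed to independence_test from the coin package. |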
return_fitted_models |
Logical; whether to return the fitted regressions (default is FALSE). |
multivariate |
Character; specifying which regression can handle multivariate outcomes ("none", "YonZ", "XonZ", or "both"). |
... |
Additional arguments passed to |
Details
The generalised covariance measure test tests whether the conditional covariance of Y and X given Z is zero.
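As a rough illustration of what the test computes in the univariate case, the following base-R sketch forms the normalised mean of the residual products described in Shah and Peters (2020). It is not the package's implementation: linear models (lm) stand in for the default "rf" regressions, and the simulated data and all variable names are assumptions made for this example.

```r
# Minimal base-R sketch of a univariate GCM statistic.
# Assumption: lm() replaces the package's default "rf" regressions.
set.seed(1)
n <- 200
Z <- rnorm(n)
X <- Z + rnorm(n)
Y <- Z + rnorm(n)              # Y is conditionally independent of X given Z

rY <- residuals(lm(Y ~ Z))     # residuals of the Y on Z regression
rX <- residuals(lm(X ~ Z))     # residuals of the X on Z regression

R <- rY * rX                   # products of residuals
# Normalised mean; the denominator is the empirical standard deviation of R
stat <- sqrt(n) * mean(R) / sqrt(mean(R^2) - mean(R)^2)
pval <- 2 * pnorm(-abs(stat))  # two-sided p-value from the normal limit
c(statistic = stat, p.value = pval)
```

Under the null of zero conditional covariance, stat is approximately standard normal, so a large absolute value speaks against the null.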
Value
Object of class 'gcm' and 'htest' with the following components:
statistic |
The value of the test statistic. |
p.value |
The p-value for the hypothesis. |
parameter |
In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis |
String specifying the null hypothesis. |
null.value |
String specifying the null hypothesis. |
method |
The string |
data.name |
A character string giving the name(s) of the data. |
rY |
Residuals for the Y on Z regression. |
rX |
Residuals for the X on Z regression. |
models |
List of fitted regressions if return_fitted_models is TRUE. |
References
Shah, R. D., & Peters, J. (2020). The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics, 48(3), 1514-1538. doi:10.1214/19-aos1857
Examples
n <- 1e2
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(gcm1 <- gcm(Y, X, Z))
Kernel generalised covariance measure test
Description
Kernel generalised covariance measure test
Usage
kgcm(
Y,
X,
Z,
reg_YonZ = "rf",
reg_XonZ = "rf",
args_YonZ = NULL,
args_XonZ = NULL,
B = 499L,
return_fitted_models = FALSE,
multivariate = c("none", "XonZ"),
bandwidth = NULL,
...
)
Arguments
Y |
Vector of response values. |
X |
Matrix or data.frame of covariates. |
Z |
Matrix or data.frame of covariates. |
reg_YonZ |
Character string or function specifying the regression for Y on Z. See "Implemented regression methods" for details. |
reg_XonZ |
Character string or function specifying the regression for X on Z. See "Implemented regression methods" for details. |
args_YonZ |
A list of named arguments passed to reg_YonZ. |
args_XonZ |
A list of named arguments passed to reg_XonZ. |
B |
Number of wild bootstrap samples. |
return_fitted_models |
Logical; whether to return the fitted regressions (default is FALSE). |
multivariate |
Character; specifying which regression can handle multivariate outcomes ("none" or "XonZ"). |
bandwidth |
Numeric; value of the bandwidth for the Gaussian kernel. Defaults to NULL. |
... |
Currently ignored |
Details
The kernelised generalised covariance measure test tests whether the weighted conditional covariance of Y and X given Z is zero.
Value
Object of class 'kgcm' and 'htest' with the following components:
statistic |
The value of the test statistic. |
p.value |
The p-value for the hypothesis. |
parameter |
In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis |
String specifying the null hypothesis. |
null.value |
String specifying the null hypothesis. |
method |
The string |
data.name |
A character string giving the name(s) of the data. |
rY |
Residuals for the Y on Z regression. |
rX |
Residuals for the X on Z regression. |
models |
List of fitted regressions if return_fitted_models is TRUE. |
References
Fernández, T., & Rivera, N. (2024). A general framework for the analysis of kernel-based tests. Journal of Machine Learning Research, 25(95), 1-40.
Examples
n <- 1e2
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(kgcm1 <- kgcm(Y, X, Z))
Projected covariance measure test for conditional mean independence
Description
Projected covariance measure test for conditional mean independence
Usage
pcm(
Y,
X,
Z,
rep = 1,
est_vhat = TRUE,
reg_YonXZ = "rf",
reg_YonZ = "rf",
reg_YhatonZ = "rf",
reg_VonXZ = "rf",
reg_RonZ = "rf",
args_YonXZ = NULL,
args_YonZ = NULL,
args_YhatonZ = NULL,
args_VonXZ = NULL,
args_RonZ = NULL,
frac = 0.5,
indices = NULL,
coin = FALSE,
cointrol = NULL,
return_fitted_models = FALSE,
...
)
Arguments
Y |
Vector of response values. Can be supplied as a numeric vector or a single column matrix. |
X |
Matrix or data.frame of covariates. |
Z |
Matrix or data.frame of covariates. |
rep |
Number of repetitions with which to repeat the PCM test |
est_vhat |
Logical; whether to estimate the variance functional |
reg_YonXZ |
Character string or function specifying the regression for Y on X and Z, default is "rf". |
reg_YonZ |
Character string or function specifying the regression for Y on Z, default is "rf". |
reg_YhatonZ |
Character string or function specifying the regression for the predicted values of reg_YonXZ on Z, default is "rf". |
reg_VonXZ |
Character string or function specifying the regression for estimating the conditional variance of Y given X and Z, default is "rf". |
reg_RonZ |
Character string or function specifying the regression for the estimated transformation of Y, X, and Z on Z, default is "rf". |
args_YonXZ |
A list of named arguments passed to reg_YonXZ. |
args_YonZ |
A list of named arguments passed to reg_YonZ. |
args_YhatonZ |
A list of named arguments passed to reg_YhatonZ. |
args_VonXZ |
A list of named arguments passed to reg_VonXZ. |
args_RonZ |
A list of named arguments passed to reg_RonZ. |
frac |
Relative size of train split. |
indices |
A numeric vector of indices specifying the observations used for estimating the direction (the other observations will be used for computing the final test statistic). Default is NULL. |
coin |
Logical; whether or not to use the coin package for computing the test statistic and p-value (default is FALSE). |
cointrol |
List; further arguments passed to independence_test from the coin package. |
return_fitted_models |
Logical; whether to return the fitted regressions (default is FALSE). |
... |
Additional arguments currently ignored. |
Details
The projected covariance measure test tests whether the conditional mean of Y given X and Z is independent of X.
Value
Object of class 'pcm' and 'htest' with the following components:
statistic |
The value of the test statistic. |
p.value |
The p-value for the hypothesis. |
parameter |
In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis |
Null hypothesis of conditional mean independence. |
null.value |
Null hypothesis of conditional mean independence. |
method |
The string |
data.name |
A character string giving the name(s) of the data. |
check.data |
A |
models |
List of fitted regressions if return_fitted_models is TRUE. |
References
Lundborg, A. R., Kim, I., Shah, R. D., & Samworth, R. J. (2022). The Projected Covariance Measure for assumption-lean variable significance testing. arXiv preprint. doi:10.48550/arXiv.2211.02039
Examples
n <- 1e2
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(pcm1 <- pcm(Y, X, Z))
Equivalence test for the parameter in a partially linear model
Description
Equivalence test for the parameter in a partially linear model
Usage
plm_equiv_test(Y, X, Z, from, to, scale = c("plm", "cov", "cor"), ...)
Arguments
Y |
Vector or matrix of response values. |
X |
Matrix or data.frame of covariates. |
Z |
Matrix or data.frame of covariates. |
from |
Lower bound of the equivalence margin |
to |
Upper bound of the equivalence margin |
scale |
Scale on which to specify the equivalence margin. Default "plm". |
... |
Further arguments passed to gcm. |
Details
The partially linear model postulates
Y = X \theta + g(Z) + \epsilon,
and the target of inference is \theta. The target is closely related to the conditional covariance between Y and X given Z:
\theta = E[cov(X, Y | Z)] / E[Var(X | Z)].
The equivalence test (based on the GCM test) tests
H_0: \theta \notin [\mathtt{from}, \mathtt{to}] versus H_1: \theta \in [\mathtt{from}, \mathtt{to}].
Y, X (and \theta) can only be one-dimensional. There are no restrictions on Z. The equivalence test can also be performed on the conditional covariance scale directly (using scale = "cov") or on the conditional correlation scale,
E[cov(X, Y | Z)] / \sqrt{E[Var(X | Z)] E[Var(Y | Z)]},
using scale = "cor".
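The relation between \theta and the residual products can be sketched in base R. This is an illustration only, not the package's implementation: lm replaces the package's regressions, the p-value is obtained from two one-sided normal tests with a plug-in standard error (an assumption of this example), and the simulated data and all names are hypothetical.

```r
# Sketch: estimate theta from residual products and test equivalence
# with two one-sided tests. Assumptions: lm() regressions, normal
# approximation, plug-in standard error.
set.seed(1)
n <- 500
Z <- rnorm(n)
X <- Z + rnorm(n)
Y <- 0.5 * X + Z + rnorm(n)        # true theta = 0.5

rY <- residuals(lm(Y ~ Z))         # residuals of Y on Z
rX <- residuals(lm(X ~ Z))         # residuals of X on Z

# Sample analogue of E[cov(X, Y | Z)] / E[Var(X | Z)]
theta_hat <- mean(rY * rX) / mean(rX^2)

# Plug-in standard error based on the estimated influence function
infl <- rX * (rY - theta_hat * rX) / mean(rX^2)
se <- sd(infl) / sqrt(n)

from <- 0
to <- 1
p_lower <- pnorm((theta_hat - from) / se, lower.tail = FALSE)  # H0: theta <= from
p_upper <- pnorm((theta_hat - to) / se)                        # H0: theta >= to
p_equiv <- max(p_lower, p_upper)   # small value supports theta in [from, to]
c(theta_hat = theta_hat, p.value = p_equiv)
```

With linear regressions, theta_hat coincides with the coefficient of X in lm(Y ~ X + Z), which gives a quick sanity check of the formula.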
Value
Object of class 'gcm' and 'htest'
Examples
n <- 150
X <- rnorm(n)
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X^2 + Z[, 2] + rnorm(n)
plm_equiv_test(Y, X, Z, from = -1, to = 1)
Plotting methods for COMETs
Description
Plotting methods for COMETs
Usage
## S3 method for class 'gcm'
plot(x, plot = TRUE, ...)
## S3 method for class 'kgcm'
plot(x, plot = TRUE, ...)
## S3 method for class 'pcm'
plot(x, plot = TRUE, ...)
## S3 method for class 'wgcm'
plot(x, plot = TRUE, ...)
Arguments
x |
Object of class 'gcm', 'kgcm', 'pcm', or 'wgcm'. |
plot |
Logical; whether to print the plot (default: TRUE). |
... |
Currently ignored. |
Implemented regression methods
Description
Implemented regression methods
Usage
rf(y, x, ...)
survforest(y, x, ...)
qrf(y, x, ...)
lrm(y, x, ...)
glrm(y, x, ...)
lasso(y, x, s = "lambda.min", ...)
ridge(y, x, s = "lambda.min", ...)
postlasso(y, x, s = "lambda.min", ...)
cox(y, x, ...)
tuned_rf(
y,
x,
max.depths = 1:5,
mtrys = list(1, function(p) ceiling(sqrt(p)), identity),
verbose = FALSE,
...
)
xgb(y, x, nrounds = 2L, verbose = 0L, ...)
tuned_xgb(
y,
x,
nfold,
folds,
etas = c(0.1, 0.5, 1),
max_depths = 1:5,
nrounds = c(2, 10, 50),
verbose = 0,
metrics = list("rmse"),
...
)
lgbm(y, x, nrounds = 100L, verbose = -1L, ...)
Arguments
y |
Vector (or matrix) of response values. |
x |
Design matrix of predictors. |
... |
Additional arguments passed to the underlying regression method.
In case of |
s |
Which lambda to use for prediction, defaults to "lambda.min". |
max.depths |
Values for max.depth in ranger. |
mtrys |
for |
verbose |
See |
nrounds |
See |
nfold |
Number of folds for |
folds |
Specify folds for cross validation. |
etas |
Values for eta in xgboost. |
max_depths |
Values for max_depth in xgboost. |
metrics |
See |
Details
The implemented choices are "rf" for random forests as implemented in ranger, "lasso" for cross-validated Lasso regression and "ridge" for cross-validated ridge regression (the penalty used for prediction is set by s, which defaults to "lambda.min"), "cox" for the Cox proportional hazards model as implemented in survival, and "qrf" or "survforest" for quantile and survival random forests, respectively. The "postlasso" option refers to a cross-validated Lasso followed by an OLS regression on the selected variables. The "lrm" option implements a standard linear regression model. The "xgb" and "tuned_xgb" options require the xgboost package.
The "tuned_rf" regression method tunes the mtry and max.depth parameters in ranger based on the out-of-bag error.
The "tuned_xgb" regression method uses k-fold cross-validation to tune the nrounds, eta and max_depth parameters via xgb.cv.
New regression methods can also be implemented and supplied, provided they have the following structure. The regression method "custom_reg" needs to take arguments y, x, ..., fit the model using y and x as matrices and return an object of a user-specified class, for instance, 'custom'. For the GCM test, implementing a residuals.custom method is sufficient, which should take arguments object, response = NULL, data = NULL, .... For the PCM test, a predict.custom method is necessary for out-of-sample prediction and computation of residuals.
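The structure described above might look as follows for an ordinary least squares fit. This is a hedged sketch: the names custom_reg and 'custom' are the placeholders used in the text, the least squares fit is only an example, and the data argument of predict.custom is an assumption, since only the arguments of residuals.custom are spelled out above.

```r
# Hypothetical custom regression method: least squares via lm.fit()
custom_reg <- function(y, x, ...) {
  x <- as.matrix(x)
  fit <- lm.fit(x = cbind(1, x), y = as.numeric(y))  # intercept plus predictors
  structure(list(coefficients = fit$coefficients), class = "custom")
}

# Out-of-sample prediction (argument name 'data' is assumed here)
predict.custom <- function(object, data = NULL, ...) {
  drop(cbind(1, as.matrix(data)) %*% object$coefficients)
}

# Residuals with the argument list described in the text
residuals.custom <- function(object, response = NULL, data = NULL, ...) {
  as.numeric(response) - predict(object, data = data)
}

# Stand-alone check on simulated data
set.seed(1)
z <- matrix(rnorm(200), ncol = 2)
y <- z[, 1] + rnorm(100)
fit <- custom_reg(y, z)
r <- residuals(fit, response = y, data = z)
summary(r)
```

Since the reg_* arguments of the tests accept a character string or a function, a method like this could then be supplied there, for example through reg_YonZ and reg_XonZ.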
GCM test with pre-computed residuals
Description
GCM test with pre-computed residuals
Usage
rgcm(
rY,
rX,
alternative = "two.sided",
coin = FALSE,
B = 499L,
type = c("quadratic", "max", "scalar"),
...
)
Arguments
rY |
Vector or matrix of residuals from the Y on Z regression. |
rX |
Vector or matrix of residuals from the X on Z regression. |
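alternative |
A character string specifying the alternative hypothesis, must be one of "two.sided" (default), "less", or "greater". |
coin |
Logical; whether or not to use the coin package for computing the test statistic and p-value (default is FALSE). |
B |
Number of bootstrap samples. Only applies if type = "max" is used. |
type |
Type of test statistic, one of "quadratic" (default), "max", or "scalar". |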
... |
Further arguments passed to |
Value
Object of class 'gcm' and 'htest' with the following components:
statistic |
The value of the test statistic. |
p.value |
The p-value for the hypothesis. |
parameter |
In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis |
String specifying the null hypothesis. |
null.value |
String specifying the null hypothesis. |
method |
The string |
data.name |
A character string giving the name(s) of the data. |
rY |
Residuals for the Y on Z regression. |
rX |
Residuals for the X on Z regression. |
Weighted generalised covariance measure test
Description
Weighted generalised covariance measure test
Usage
wgcm(
Y,
X,
Z,
reg_YonZ = "rf",
reg_XonZ = "rf",
reg_wfun = "rf",
args_YonZ = NULL,
args_XonZ = NULL,
args_wfun = NULL,
frac = 0.5,
B = 499L,
coin = TRUE,
cointrol = NULL,
return_fitted_models = FALSE,
multivariate = c("none", "YonZ", "XonZ", "both"),
...
)
Arguments
Y |
Vector of response values. Can be supplied as a numeric vector or a single column matrix. |
X |
Matrix or data.frame of covariates. |
Z |
Matrix or data.frame of covariates. |
reg_YonZ |
Character string or function specifying the regression for Y on Z. See "Implemented regression methods" for details. |
reg_XonZ |
Character string or function specifying the regression for X on Z. See "Implemented regression methods" for details. |
reg_wfun |
Character string or function specifying the regression for estimating the weighting function. See "Implemented regression methods" for details. |
args_YonZ |
A list of named arguments passed to reg_YonZ. |
args_XonZ |
A list of named arguments passed to reg_XonZ. |
args_wfun |
Additional arguments passed to reg_wfun. |
frac |
Relative size of train split. |
B |
Number of bootstrap samples. Only applies if |
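coin |
Logical; whether or not to use the coin package for computing the test statistic and p-value (default is TRUE). |
cointrol |
List; further arguments passed to independence_test from the coin package. |
return_fitted_models |
Logical; whether to return the fitted regressions (default is FALSE). |
multivariate |
Character; specifying which regression can handle multivariate outcomes ("none", "YonZ", "XonZ", or "both"). |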
... |
Additional arguments currently ignored. |
Details
The weighted generalised covariance measure test tests whether a weighted version of the conditional covariance of Y and X given Z is zero.
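To make the idea of weighting concrete, here is a hedged base-R sketch that loosely follows the sample-splitting idea in Scheidegger et al. (2022): a weight function is estimated on one part of the data and the weighted residual products are tested on the other part. It is not the package's implementation. lm replaces the default "rf" regressions, using the sign of the fitted values as the weight is an assumption of this example, and the simulated data and all names are illustrative.

```r
# Sketch of a weighted GCM statistic with sample splitting.
# Assumptions: lm() regressions, weight = sign of the fitted residual product.
set.seed(2)
n <- 400
Z <- rnorm(n)
X <- rnorm(n)
Y <- Z * X + rnorm(n)   # conditional covariance of X and Y given Z equals Z

d <- data.frame(Y = Y, X = X, Z = Z)
train <- seq_len(n) <= 0.5 * n   # first half, mirroring frac = 0.5

# Product of the residuals from the Y on Z and X on Z regressions
res_prod <- function(dat) {
  residuals(lm(Y ~ Z, data = dat)) * residuals(lm(X ~ Z, data = dat))
}

# Step 1: estimate the weight function on the training part
dtr <- d[train, ]
dtr$P <- res_prod(dtr)
wfit <- lm(P ~ Z, data = dtr)

# Step 2: weighted residual products on the held-out part
dte <- d[!train, ]
w <- sign(predict(wfit, newdata = dte))
R <- res_prod(dte) * w
stat <- sqrt(length(R)) * mean(R) / sqrt(mean(R^2) - mean(R)^2)
pval <- pnorm(stat, lower.tail = FALSE)   # one-sided: large values reject
c(statistic = stat, p.value = pval)
```

In this example the conditional covariance changes sign with Z and averages to zero, which is the kind of alternative an unweighted statistic can miss.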
Value
Object of class 'wgcm' and 'htest' with the following components:
statistic |
The value of the test statistic. |
p.value |
The p-value for the hypothesis. |
parameter |
In case X is multidimensional, this is the degrees of freedom used for the chi-squared test. |
hypothesis |
String specifying the null hypothesis. |
null.value |
String specifying the null hypothesis. |
method |
The string |
data.name |
A character string giving the name(s) of the data. |
rY |
Residuals for the Y on Z regression. |
rX |
Weighted residuals for the X on Z regression. |
W |
Estimated weights. |
models |
List of fitted regressions if return_fitted_models is TRUE. |
References
Scheidegger, C., Hörrmann, J., & Bühlmann, P. (2022). The weighted generalised covariance measure. Journal of Machine Learning Research, 23(273), 1-68.
Examples
n <- 100
X <- matrix(rnorm(2 * n), ncol = 2)
colnames(X) <- c("X1", "X2")
Z <- matrix(rnorm(2 * n), ncol = 2)
colnames(Z) <- c("Z1", "Z2")
Y <- X[, 2]^2 + Z[, 2] + rnorm(n)
(wgcm1 <- wgcm(Y, X, Z))