Type: | Package |
Title: | Block-Wise Rank in Similarity Graph Edge-Count Two-Sample Test (BRISE) |
Version: | 0.1.0 |
Maintainer: | Kejian Zhang <kejianzhang@u.nus.edu> |
Description: | Implements the Block-wise Rank in Similarity Graph Edge-count test (BRISE), a rank-based two-sample test designed for block-wise missing data. The method constructs (pattern) pair-wise similarity graphs and derives quadratic test statistics with asymptotic chi-square distribution or permutation-based p-values. It provides both vectorized and congregated versions for flexible inference. The methodology is described in Zhang, Liang, Maile, and Zhou (2025) <doi:10.48550/arXiv.2508.17411>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
Depends: | R (≥ 3.5.0) |
Imports: | stats |
Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown |
RoxygenNote: | 7.3.3 |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-09-24 10:54:58 UTC; keenov |
Author: | Kejian Zhang [aut, cre], Doudou Zhou |
Repository: | CRAN |
Date/Publication: | 2025-10-01 07:00:26 UTC |
Block-wise Rank In Similarity graph Edge-count (BRISE) Test
Description
BRISE implements a two-sample test that handles block-wise missingness. It identifies missing-data patterns, constructs a (block-wise) dissimilarity matrix, induces ranks via a k-nearest-neighbor-style graph, and computes a quadratic statistic in one of two versions: the congregated form (‘con’) or the vectorized form (‘vec’). Permutation p-values are optionally available.
Usage
BRISE(
X = NULL,
Y = NULL,
D = NULL,
ptn_list = NULL,
k = 10,
perm = 0,
skip = 1,
ver = "con"
)
Arguments
X |
Numeric matrix (m × p) of observations for X (Sample 1). Optional if |
Y |
Numeric matrix (n × p) of observations for Y (Sample 2). Optional if |
D |
Numeric square dissimilarity matrix (N × N), where N = m + n. Required when |
ptn_list |
List of integer vectors. Each element contains indices (1…N) of observations that share the same missing-data pattern. |
k |
Positive integer. Neighborhood size offset for rank truncation in nearest-neighbor ranking. Default is 10. |
perm |
Integer. Number of permutations for computing permutation p-value. Default is 0 (no permutation). |
skip |
Integer (0 or 1). When set to 1 (default), skip rank-based dissimilarity for modality pairs with no shared observed variables; setting to 0 computes them (slower). |
ver |
Character. Version of the test statistic: |
Details
If both X and Y are supplied, Identify_mods is used to detect missing patterns and reorganize variables by modality. The dissimilarity matrix D is then constructed via Blockdist. Patterns with too few observations in either sample (e.g. fewer than 2), or patterns that are very small relative to the largest pattern, are filtered out for robustness. A symmetric rank matrix is built from truncated nearest-neighbor ranks. Under ver="con" the congregated statistic (two degrees of freedom) is used; under ver="vec" a higher-dimensional vector statistic is used. Asymptotic p-values use chi-square approximations; if perm > 0, empirical permutation p-values are also computed.
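The quadratic statistic and its chi-square approximation can be illustrated with a minimal sketch in base R. The numbers for `U`, `mu`, and `Cov` below are made up for illustration, and the form `(U - mu)' Cov^{-1} (U - mu)` is the usual quadratic form, assumed here to match what the congregated version computes; it is not taken from the package source.

```r
# Hypothetical inputs: U is an observed (Ux, Uy) vector, mu and Cov are its
# null expectation and covariance (illustrative values only).
U   <- c(130, 95)
mu  <- c(100, 100)
Cov <- matrix(c( 400, -150,
                -150,  400), nrow = 2, byrow = TRUE)

# Quadratic form (U - mu)' Cov^{-1} (U - mu); solve(Cov, dev) avoids
# forming the inverse explicitly.
dev <- U - mu
quad_stat <- as.numeric(t(dev) %*% solve(Cov, dev))

# Upper-tail chi-square probability with 2 degrees of freedom.
pval_approx <- stats::pchisq(quad_stat, df = 2, lower.tail = FALSE)
```

For the vectorized version the same pattern would apply with a longer `U` and a correspondingly larger degrees-of-freedom value, which this sketch does not attempt to derive.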
Value
A list with elements:
- test.statistic
Numeric. The computed test statistic.
- pval.approx
Numeric. Asymptotic p-value (chi-square based).
- Cov
Covariance matrix used in computing the test statistic.
- pval.perm
(Optional) Permutation p-value, returned if perm > 0.
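A permutation p-value of this kind can be sketched as follows, assuming labels are shuffled within each missing-data pattern (the pattern-wise permutation null referred to elsewhere in this manual). The statistic `toy_stat`, the helper `permute_within`, and the toy data are hypothetical stand-ins, and the exact p-value convention used by the package (for example, whether 1 is added to numerator and denominator) is not stated in the source.

```r
set.seed(42)

# Hypothetical toy setup: 12 observations, labels 1 = X and 2 = Y,
# and two missing-data patterns (indices 1..6 and 7..12).
lab0     <- rep(c(1, 2), times = 6)
ptn_list <- list(1:6, 7:12)

# Toy symmetric "rank" matrix with zero diagonal (random, illustrative).
R <- matrix(sample(0:3, 144, replace = TRUE), 12, 12)
R <- R + t(R)
diag(R) <- 0

# Stand-in statistic (NOT the BRISE statistic): sum of R within sample X.
toy_stat <- function(lab) sum(R[lab == 1, lab == 1])

# Shuffle labels separately inside each pattern, so every pattern keeps
# its own X/Y counts.
permute_within <- function(lab, ptn_list) {
  for (idx in ptn_list) {
    lab[idx] <- lab[idx[sample.int(length(idx))]]
  }
  lab
}

obs_stat   <- toy_stat(lab0)
B          <- 200
perm_stats <- replicate(B, toy_stat(permute_within(lab0, ptn_list)))

# Proportion of permuted statistics at least as large as the observed one.
pval_perm <- mean(perm_stats >= obs_stat)
```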
References
Zhang, K., Liang, M., Maile, R. & Zhou, D. (2025). Two-Sample Testing with Block-wise Missingness in Multi-source Data. arXiv preprint arXiv:2508.17411.
See Also
BRISE_Rank, Cov_mu.c, Cov_mu.v
Examples
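set.seed(1)
# Two samples of 50 observations on 200 features; Y is mean-shifted by 0.3
X <- matrix(rnorm(50*200, mean = 0), nrow = 50)
Y <- matrix(rnorm(50*200, mean = 0.3), nrow = 50)
# Set column blocks to 0 for subsets of rows to create block-wise patterns
X[1:20, 1:100] <- 0
X[30:50, 101:200] <- 0
Y[1:10, 1:100] <- 0
Y[30:40, 101:200] <- 0
# Congregated version with k = 5 and 1000 permutations
out <- BRISE(X = X, Y = Y, k = 5, perm = 1000, ver = "con")
print(out$test.statistic)
print(out$pval.approx)
print(out$pval.perm)  # returned because perm > 0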
Rank Induction within- and cross-pattern similarity blocks
Description
Compute row-wise ranks of a similarity matrix for two cases:
- method = "row": within-pattern block S_ii (square). Because self-pairs exist, the diagonal (self-similarity) is first forced below the minimum entry of S so that self-neighbors are always ranked last and thus excluded when top-k truncation is applied downstream.
- method = "rowij": cross-pattern block S_ij (rectangular, i ≠ j). There are no self-pairs, so no diagonal adjustment is needed.
Ranks are computed row-wise with rank() and then shifted by 1 (i.e., the function returns rank - 1).
Usage
BRISE_Rank(S, method = "row")
Arguments
S |
Numeric similarity matrix: (Sii) (square) when |
method |
Character, either |
Value
A numeric matrix with the same dimensions as S containing row-wise ranks minus one.
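The two cases can be mimicked in base R with a short sketch. `rank_sketch` is a hypothetical helper written from the description of BRISE_Rank, not the package source; in particular, pushing the diagonal to `min(S) - 1` is just one way to force it below the minimum entry, and ties are handled by the default behaviour of `rank()`.

```r
# Hypothetical helper mirroring the documented behaviour of BRISE_Rank.
rank_sketch <- function(S, method = "row") {
  if (method == "row") {
    # Push self-similarities below every other entry so they rank lowest.
    diag(S) <- min(S) - 1
  }
  # rank() each row, then shift so the smallest rank becomes 0.
  # apply() stacks per-row results as columns, hence the transpose.
  t(apply(S, 1, rank)) - 1
}

# Within-pattern (square) block: the diagonal ends up with rank 0.
S <- matrix(c(1.0, 0.2, 0.7,
              0.2, 1.0, 0.4,
              0.7, 0.4, 1.0), nrow = 3, byrow = TRUE)
R_within <- rank_sketch(S, "row")

# Cross-pattern (rectangular) block: no diagonal adjustment.
S_cross <- matrix(c(0.3, 0.9, 0.1,
                    0.5, 0.2, 0.8), nrow = 2, byrow = TRUE)
R_cross <- rank_sketch(S_cross, "rowij")
```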
Block-wise Statistic (Congregated Form)
Description
For the congregated version of BRISE (“con”), computes the within-sample sums of the rank matrix R over all observations in X and over all observations in Y, giving (Ux, Uy).
Usage
BRISE_c.stat(R, sample1ID, sample2ID)
Arguments
R |
Numeric symmetric rank matrix with zero diagonal. |
sample1ID |
Integer vector of indices for X. |
sample2ID |
Integer vector of indices for Y. |
Value
Numeric vector c(Ux, Uy), the within-sample rank sums for the two samples.
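A minimal sketch of these within-sample sums, using a made-up 4 × 4 rank matrix. It assumes every entry of the within-sample sub-matrix is summed, so each unordered pair contributes twice under symmetry; whether the package applies any further scaling is not stated in the source.

```r
# Toy symmetric rank matrix with zero diagonal (values are illustrative).
R <- matrix(c(0, 2, 1, 0,
              2, 0, 0, 1,
              1, 0, 0, 3,
              0, 1, 3, 0), nrow = 4, byrow = TRUE)
sample1ID <- 1:2   # observations from X
sample2ID <- 3:4   # observations from Y

# Sum the entries of R whose row and column both fall in the same sample.
Ux <- sum(R[sample1ID, sample1ID])
Uy <- sum(R[sample2ID, sample2ID])
U  <- c(Ux, Uy)
```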
Block-wise Statistic (Vectorized Form)
Description
For the vectorized version of BRISE, computes the within-pattern rank sums for both samples across all pattern pairs. Returns a concatenated vector of (Ux_ab, Uy_ab) for all blocks (a, b) with a>b.
Usage
BRISE_v.stat(R, sample1ID, sample2ID, ptn_list)
Arguments
R |
Numeric symmetric rank matrix (N × N). |
sample1ID |
Integer vector. Indices of observations in X. |
sample2ID |
Integer vector. Indices of observations in Y. |
ptn_list |
List of integer vectors that indexes observations sharing the same missing pattern. |
Value
Numeric vector containing the sums of R entries within X and Y, for each pattern pair.
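The block-wise sums can be sketched in base R as below. `block_sums` is a hypothetical helper, and two choices in it are assumptions rather than documented behaviour: within-pattern blocks (a = b) are included alongside cross-pattern blocks, and the output is ordered as (1,1), (2,1), (2,2), and so on.

```r
# Hypothetical helper: for each pattern pair (a, b) with b <= a, sum the
# entries of R between X-observations of pattern a and X-observations of
# pattern b, and likewise for Y.
block_sums <- function(R, sample1ID, sample2ID, ptn_list) {
  out <- numeric(0)
  for (a in seq_along(ptn_list)) {
    for (b in seq_len(a)) {
      xa <- intersect(ptn_list[[a]], sample1ID)
      xb <- intersect(ptn_list[[b]], sample1ID)
      ya <- intersect(ptn_list[[a]], sample2ID)
      yb <- intersect(ptn_list[[b]], sample2ID)
      out <- c(out, sum(R[xa, xb]), sum(R[ya, yb]))
    }
  }
  out
}

# Toy data: 6 observations, symmetric R with zero diagonal.
R <- outer(1:6, 1:6, function(i, j) abs(i - j))
sample1ID <- 1:3
sample2ID <- 4:6
ptn_list  <- list(c(1, 2, 4, 5), c(3, 6))

U_vec <- block_sums(R, sample1ID, sample2ID, ptn_list)
```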
Block-wise Distance Matrix Construction
Description
Constructs a symmetric dissimilarity matrix that accounts for missing-data patterns. Within blocks where both observations share a modality, standard Euclidean distances are used. Optionally, for observations without shared observed features (based on modality), a rank-based dissimilarity is computed (if skip = 0).
Usage
Blockdist(data, m, n, d, ptn_list, mod_id, modality, mod_bound, skip = 1)
Arguments
data |
List with |
m |
Integer. Number of rows (observations) in |
n |
Integer. Number of rows in |
d |
Integer. Number of features (columns). |
ptn_list |
List of integer vectors: each element indexes observations sharing the same missing pattern. |
mod_id |
Binary matrix (N × modality) indicating modality membership per observation. |
modality |
Integer. Number of modalities. |
mod_bound |
Integer vector. Feature index boundaries for each modality block. |
skip |
Integer (0 or 1). If set to 1, the dissimilarity for modality-disjoint pairs is skipped. If set to 0, a rank-based dissimilarity is computed for those pairs. |
Value
Numeric symmetric matrix (N × N) of pairwise dissimilarities.
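The Euclidean part of this construction can be illustrated for a single pair of observations. `shared_dist` and `mod_cols` are hypothetical names; the sketch assumes the distance is taken over all columns of the modalities both observations have, and it returns NA for modality-disjoint pairs as a placeholder. How the package scales or combines modalities, and how the rank-based fallback works, are not described in the source.

```r
# Hypothetical helper: Euclidean distance over the columns of modalities
# that both observations have observed.
shared_dist <- function(x_i, x_j, obs_i, obs_j, mod_cols) {
  shared <- which(obs_i == 1 & obs_j == 1)
  if (length(shared) == 0) return(NA_real_)  # no shared modality
  cols <- unlist(mod_cols[shared])
  sqrt(sum((x_i[cols] - x_j[cols])^2))
}

# Two modalities: columns 1-2 and columns 3-4 (illustrative layout).
mod_cols <- list(1:2, 3:4)
x1 <- c(1, 2, 0, 0); obs1 <- c(1, 0)   # modality 1 only
x2 <- c(4, 6, 1, 1); obs2 <- c(1, 1)   # both modalities
x3 <- c(0, 0, 2, 2); obs3 <- c(0, 1)   # modality 2 only

d12 <- shared_dist(x1, x2, obs1, obs2, mod_cols)  # uses columns 1-2
d23 <- shared_dist(x2, x3, obs2, obs3, mod_cols)  # uses columns 3-4
d13 <- shared_dist(x1, x3, obs1, obs3, mod_cols)  # no shared modality
```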
Covariance and Expectation (Congregated Form)
Description
Computes the 2×2 covariance matrix and expectation vector (mu) for the congregated BRISE statistic (Ux, Uy), under the pattern-wise permutation null distribution.
Usage
Cov_mu.c(R, m_, n_, ptn_list)
Arguments
R |
Numeric symmetric rank matrix (N × N). |
m_ |
Integer vector. X's sample sizes in each pattern. |
n_ |
Integer vector. Y's sample sizes in each pattern. |
ptn_list |
List of integer vectors that indexes observations sharing the same missing pattern. |
Value
List with two elements:
- Cov
2×2 covariance matrix for (Ux, Uy).
- mu
Numeric vector of length 2 giving the expected values of (Ux, Uy) under the null.
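The per-pattern sample sizes m_ and n_ can be derived from ptn_list with a couple of lines of base R. This sketch assumes the pooled indexing places the m observations of X first (indices 1..m) and the n observations of Y after them (m+1..m+n); that ordering is an assumption, not something the source states explicitly.

```r
m <- 3   # size of sample X (illustrative)
n <- 3   # size of sample Y (illustrative)

# Two missing-data patterns over the pooled indices 1..(m + n).
ptn_list <- list(c(1, 2, 4, 5), c(3, 6))

# Count, inside each pattern, how many indices belong to X and to Y.
m_ <- vapply(ptn_list, function(idx) sum(idx <= m), integer(1))
n_ <- vapply(ptn_list, function(idx) sum(idx > m),  integer(1))
```

By construction, `m_ + n_` equals the size of each pattern, and `sum(m_)` and `sum(n_)` recover m and n when every observation belongs to exactly one pattern.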
Covariance and Expectation (Vectorized Form)
Description
Computes the asymptotic covariance matrix and expectation vector (mu) for the vectorized BRISE statistic under the pattern-wise permutation null distribution, based on the rank matrix R and the list of pattern indices. These are used to form the quadratic statistic and its chi-square approximation.
Usage
Cov_mu.v(R, m_, n_, ptn_list)
Arguments
R |
Numeric symmetric rank matrix (N × N). |
m_ |
Integer vector. X's sample sizes in each pattern. |
n_ |
Integer vector. Y's sample sizes in each pattern. |
ptn_list |
List of integer vectors that indexes observations sharing the same missing pattern. |
Value
List with two elements:
- Cov
Covariance matrix corresponding to the vector of pair-wise statistics.
- mu
Expectation vector for those pair-wise statistics under the null.
Identify Data Modalities
Description
Detects modalities across the combined data (samples X and Y), rearranges variables/columns by modality, and produces identification structures used downstream for blockwise operations.
Usage
Identify_mods(data, m, n, d)
Arguments
data |
List with components |
m |
Integer. Number of rows in |
n |
Integer. Number of rows in |
d |
Integer. Number of features (columns) in |
Value
List with components:
- rearr_data
List with rearranged X, Y after grouping features by modality.
- modality
Integer. Number of distinct missing-data modalities.
- mod_bound
Integer vector. Cumulative boundaries of modalities among the features.
- mod_id
Binary matrix (N × modality) indicating, for each observation, whether each modality is observed (1) or missing (0).
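Given a binary mod_id matrix of this shape, observations can be grouped into missing-data patterns (the structure that ptn_list describes) with a short base R sketch. The toy matrix and the string-key grouping are illustrative; the order of patterns produced here is not claimed to match the package.

```r
# Toy indicator matrix: 5 observations, 2 modalities (1 = observed).
mod_id <- matrix(c(1, 1,
                   1, 0,
                   0, 1,
                   1, 1,
                   1, 0), ncol = 2, byrow = TRUE)

# Build one string key per row, then collect the row indices that share a key.
key      <- apply(mod_id, 1, paste, collapse = "")
ptn_list <- unname(split(seq_len(nrow(mod_id)), key))
```

Each element of `ptn_list` then holds the indices of observations with an identical observed/missing profile across modalities.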