| Type: | Package |
| Title: | Calculate Model-Based Metrics of Proportionality on Count-Based Compositional Data |
| Version: | 1.0.1 |
| Maintainer: | Kevin McGregor <kevinmcg@yorku.ca> |
| Description: | Calculates metrics of proportionality using the logit-normal multinomial model. It can also provide empirical and plugin estimates of these metrics. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| LazyData: | true |
| Depends: | R (≥ 3.5.0) |
| Imports: | glasso, compositions, parallel, zCompositions |
| RoxygenNote: | 7.2.3 |
| Suggests: | knitr, rmarkdown |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2023-08-17 13:53:01 UTC; kevin |
| Author: | Kevin McGregor [aut, cre, cph], Nneka Okaeme [aut] |
| Repository: | CRAN |
| Date/Publication: | 2023-08-18 06:12:38 UTC |
Extended Bayesian Information Criterion
Description
Calculates the Extended Bayesian Information Criterion (EBIC) of a model. Used for model selection to asses the fit of the multinomial logit-Normal model which includes a graphical lasso penalty.
Usage
ebic(l, n, d, df, gamma)
Arguments
l |
Log-likelihood estimates of the model |
n |
Number of rows of the data set for which the log-likelihood has been calculated |
d |
The size of the (k-1) by (k-1) covariance matrix of a k by k count-compositional data matrix |
df |
Degrees of freedom |
gamma |
A tuning parameter. Larger values means more penalization |
Value
The value of the EBIC.
Note
The graphical lasso penalty
is the sum of the absolute value of the elements of the covariance matrix Sigma.
The penalization parameter lambda controls the sparsity of Sigma.
Examples
data(singlecell)
mle <- mleLR(singlecell, lambda.gl=0.5)
log.lik_1 <- mle$est[[1]]$log.lik
n <- NROW(singlecell)
k <- NCOL(singlecell)
df_1 <- mle$est[[1]]$df
ebic(log.lik_1, n, k, df_1, 0.1)
Extended Bayesian Information Criterion Plot
Description
Plots the extended Bayesian information criterion (EBIC) of the model fit for
various penalization parameters lambda.
Usage
ebicPlot(fit, xlog = TRUE, col = "darkred")
Arguments
fit |
The model fit object from |
xlog |
TRUE or FALSE. Renders plot with the x-axis in the log-scale if |
col |
Colour of the plot (character) |
Value
Plot of the EBIC (y-axis) against each lambda (x-axis).
Examples
data(singlecell)
mle <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)
ebicPlot(mle, xlog = TRUE)
Log-Likelihood
Description
Calculates the log-likelihood, under the multinomial logit-Normal model.
Usage
logLik(v, y, ni, S, invSigma)
Arguments
v |
The additive log-ratio transform of y |
y |
Compositional dataset |
ni |
The row sums of y |
S |
Covariance of |
invSigma |
The inverse of the Sigma matrix |
Value
The estimated log-likelihood under the Multinomial logit-Normal distribution.
Examples
data(singlecell)
mle.sim <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)
n <- NROW(singlecell)
logLik(mle.sim$est.min$v,
singlecell,
n,
cov(mle.sim$est.min$v),
mle.sim$est.min$Sigma.inv)
Full logp Variance-Covariance
Description
Estimates the variance-covariance of the log of the proportions using a Taylor-series approximation.
Usage
logVarTaylorFull(
mu,
Sigma,
transf = c("alr", "clr"),
order = c("second", "first")
)
Arguments
mu |
The mean vector of the log-ratio-transformed data (ALR or CLR) |
Sigma |
The variance-covariance matrix of the log-ratio-transformed data (ALR or CLR) |
transf |
The desired transformation. If |
order |
The desired order of the Taylor Series approximation |
Value
The estimated variance-covariance matrix for log p.
Examples
data(singlecell)
mle <- mleLR(singlecell)
mu <- mle$mu
Sigma <- mle$Sigma
logVarTaylorFull(mu, Sigma)
Logit Normal Variation
Description
Estimates the variation matrix of count-compositional data based on a multinomial logit-Normal distribution. Estimation is performed using only the parameters of the distribution.
Usage
logitNormalVariation(
mu,
Sigma,
type = c("standard", "phi", "phis", "rho"),
order = c("second", "first")
)
Arguments
mu |
The mle estimate of the mu matrix |
Sigma |
The mle estimate of the Sigma matrix |
type |
Type of variation metric to be calculated: |
order |
The order of the Taylor-series approximation to be used in the estimation |
Value
An estimate of the requested metric of proportionality.
Examples
data(singlecell)
mle <- mleLR(singlecell)
mu.hat <- mle$mu
Sigma.hat <- mle$Sigma
logitNormalVariation(mu.hat, Sigma.hat)
logitNormalVariation(mu.hat, Sigma.hat, type="phi")
logitNormalVariation(mu.hat, Sigma.hat, type="rho")
Maximum Likelihood Estimate for multinomial logit-normal model
Description
Returns the maximum likelihood estimates of multinomial logit-normal model parameters given a count-compositional dataset. The MLE procedure is based on the multinomial logit-Normal distribution, using the EM algorithm from Hoff (2003).
Usage
mleLR(
y,
max.iter = 10000,
max.iter.nr = 100,
tol = 1e-06,
tol.nr = 1e-06,
lambda.gl = 0,
gamma = 0.1,
verbose = FALSE
)
Arguments
y |
Matrix of counts; samples are rows and features are columns. |
max.iter |
Maximum number of iterations |
max.iter.nr |
Maximum number of Newton-Raphson iterations |
tol |
Stopping rule |
tol.nr |
Stopping rule for the Newton-Raphson algorithm |
lambda.gl |
Penalization parameter lambda, for the graphical lasso penalty. Controls the sparsity of Sigma |
gamma |
Gamma value for EBIC calculation of the log-likelihood |
verbose |
If TRUE, print information as the functions run |
Value
The additive log-ratio of y (v); maximum likelihood estimates of
mu, Sigma, and Sigma.inv;
the log-likelihood (log.lik); the EBIC (extended Bayesian information criterion)
of the log-likelihood of the multinomial logit-Normal model with the
graphical lasso penalty (ebic); degrees of freedom of the Sigma.inv
matrix (df).
Note
The graphical lasso penalty
is the sum of the absolute value of the elements of the covariance matrix Sigma.
The penalization parameter lambda controls the sparsity of Sigma.
This function is also used within the mlePath() function.
Examples
data(singlecell)
mle <- mleLR(singlecell)
mle$mu
mle$Sigma
mle$ebic
Maximum Likelihood Estimator Paths
Description
Calculates the maximum likelihood estimates of the parameters for the
mutlinomial logit-Normal distribution under various values
of the penalization parameter lambda. Parameter lambda controls
the sparsity of the covariance matrix Sigma, and penalizes the false
large correlations that may arise in high-dimensional data.
Usage
mlePath(
y,
max.iter = 10000,
max.iter.nr = 100,
tol = 1e-06,
tol.nr = 1e-06,
lambda.gl = NULL,
lambda.min.ratio = 0.1,
n.lambda = 1,
n.cores = 1,
gamma = 0.1
)
Arguments
y |
Matrix of counts; samples are rows and features are columns. |
max.iter |
Maximum number of iterations |
max.iter.nr |
Maximum number of Newton-Raphson iterations |
tol |
Stopping rule |
tol.nr |
Stopping rule for the Newton Raphson algorithm |
lambda.gl |
Vector of penalization parameters lambda, for the graphical lasso penalty |
lambda.min.ratio |
Minimum lambda ratio of the maximum lambda, used for the sequence of lambdas |
n.lambda |
Number of lambdas to evaluate the model on |
n.cores |
Number of cores to use (for parallel computation) |
gamma |
Gamma value for EBIC calculation of the log-likelihood |
Value
The MLE estimates of y for each element lambda of lambda.gl, (est);
the value of the estimates which produce the minimum EBIC, (est.min);
the vector of lambdas used for graphical lasso, (lambda.gl); the index of
the minimum EBIC (extended Bayesian information criterion), (min.idx);
vector containing the EBIC for each lambda, (ebic).
Note
If using parallel computing, consider setting n.cores to be equal
to the number of lambdas being evaluated for, n.lambda.
The graphical lasso penalty
is the sum of the absolute value of the elements of the covariance matrix Sigma.
The penalization parameter lambda controls the sparsity of Sigma.
Examples
data(singlecell)
mle.sim <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)
mu.hat <- mle.sim$est.min$mu
Sigma.hat <- mle.sim$est.min$Sigma
Naive (Empirical) Variation
Description
Naive (empirical) estimates of proportionality metrics using only the observed counts.
Usage
naiveVariation(
counts,
pseudo.count = 0,
type = c("standard", "phi", "phis", "rho", "logp"),
impute.zeros = TRUE,
...
)
Arguments
counts |
Matrix of counts; samples are rows and features are columns |
pseudo.count |
Positive count to be added to all elements of count matrix. |
type |
Type of variation metric to be calculated: |
impute.zeros |
If TRUE, then |
... |
Optional arguments passed to zero-imputation function |
Value
An estimate of the requested metric of proportionality.
Examples
#' data(singlecell)
naiveVariation(singlecell)
naiveVariation(singlecell, type="phi")
naiveVariation(singlecell, type="rho")
Plugin Variation
Description
Estimates the variation matrix of count-compositional data
based on a the same approximation used in logitNormalVariation()
only for this function it uses empirical estimates of mu and Sigma.
Also performs zero-imputation using cmultRepl() from the zCompositions package.
Usage
pluginVariation(
counts,
type = c("standard", "phi", "phis", "rho"),
order = c("second", "first"),
impute.zeros = TRUE,
...
)
Arguments
counts |
Matrix of counts; samples are rows and features are columns. |
type |
Type of variation metric to be calculated: |
order |
The order of the Taylor-series approximation to be used in the estimation |
impute.zeros |
If TRUE, then |
... |
Optional arguments passed to zero-imputation function |
Value
An estimate of the requested metric of proportionality.
Examples
data(singlecell)
pluginVariation(singlecell)
pluginVariation(singlecell, type="phi")
pluginVariation(singlecell, type="rho")
Single cell sequencing data from mouse embryonic stem cells in G1 phase
Description
A subset of single cell data from Buettner et al. 2015. Contains single cell measurements from 96 mouse embryonic stem cells all in G1 phase.
Usage
data(singlecell)
Format
## 'singlecell' A matrix with 96 rows and 10 columns.
Source
<https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-2805>
Examples
data(singlecell)