| Title: | Hierarchical Probit Estimation for Dichotomized Data |
| Version: | 0.1.0 |
| Description: | Provides likelihood-based and hierarchical estimation methods for thresholded (binomial-probit) data. Supports fixed-mean and random-mean models with maximum likelihood estimation (MLE), generalized linear mixed model (GLMM), and Bayesian Markov chain Monte Carlo (MCMC) implementations. For methodological background, see Albert and Chib (1993) <doi:10.1080/01621459.1993.10476321> and McCulloch (1994) <doi:10.2307/2297959>. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | lme4, rstan, stats, statmod |
| NeedsCompilation: | no |
| Packaged: | 2025-11-12 18:52:57 UTC; zhaoze |
| Author: | Zhaoze Liu [aut, cre] |
| Maintainer: | Zhaoze Liu <786633848@qq.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-11-17 21:10:13 UTC |
bin2norm: A user-friendly interface to estimate normal distribution parameters from dichotomized data
Description
This function handles two data-collection settings for estimating normal parameters from threshold-based (dichotomized) data:
-
Single-threshold per study: Each of
Istudies reports one thresholdc_i, a sample sizen_i, and the observed proportionp_i^{obs}of samples above that threshold. We assume one normal distribution\mathcal{N}(\mu,\sigma^2)across all studies. Methods include"MLE"and"probit". -
Multiple-thresholds per study: Each study
ireportsK_ithresholds\{c_{ij}\}, each with an observed proportionp_{ij}^{obs}. We assume the study-specific mean\mu_i \sim \mathcal{N}(\mu_0,\tau^2)and within-study variance\sigma^2. Because each study has multiple cutpoints, one can estimate\mu_0, \sigma, \tau. Methods include"MLE_integration","GLMM", or"Bayesian"(MCMC).
Usage
bin2norm(
scenario = c("single_threshold", "multiple_thresholds"),
method = NULL,
n_i = NULL,
c_i = NULL,
p_i_obs = NULL,
data_list = NULL,
...
)
Arguments
scenario |
character string, either |
method |
character string indicating which estimation method to use.
|
n_i, c_i, p_i_obs |
used only if |
data_list |
used only if
|
... |
additional arguments passed to lower-level functions (e.g. |
Value
A list of estimated parameters, depending on the data-collection setting
(scenario) and the chosen method. Typically includes:
-
muormu0 -
sigma -
tau(only for multiple-threshold methods)
Examples
# Single-threshold example
n_i <- c(100, 120, 80)
c_i <- c(1.2, 1.0, 1.5)
p_i_obs <- c(0.30, 0.25, 0.40)
bin2norm(scenario="single_threshold", method="MLE", n_i=n_i, c_i=c_i, p_i_obs=p_i_obs)
# Multiple-thresholds example
data_list <- list(
n_i = c(100, 120),
c_ij = list(c(1.0,1.2), c(0.8,1.5,2.0)),
p_ij_obs = list(c(0.20,0.30), c(0.15,0.40,0.55))
)
# MLE with numeric integration
bin2norm(scenario="multiple_thresholds", method="MLE_integration",
data_list=data_list, gh_points=5)
# GLMM approximation
# library(lme4)
bin2norm(scenario="multiple_thresholds", method="GLMM",
data_list=data_list, use_lme4=TRUE)
# Bayesian MCMC approach
# library(rstan)
bin2norm(scenario="multiple_thresholds", method="Bayesian",
data_list=data_list, iter=1000, chains=2)
Get initial values from data
Description
Get initial values from data
Usage
estimate_initial_values_from_data(data_list)
Arguments
data_list |
your inputs |
Value
a named list of initial values
GLMM (Multiple Thresholds per Study, Probit Link, Random Intercepts)
Description
Creates a single data frame stacking all thresholds from all studies, then
calls lme4::glmer(..., family=binomial(link='probit')) to fit a random-intercept
model:
k_{ij} \sim \mathrm{Binomial}\bigl(n_i, \Phi(\alpha_i + \beta\, c_{ij})\bigr),
with \alpha_i \sim \mathcal{N}(0, \sigma_\alpha^2).
Interpreting results: \sigma = 1/|\,\beta\,|,
\tau^2 = \sigma^2 \times \sigma_\alpha^2,
\mu_0 = (\mathrm{Intercept}) \times \sigma (if not forced to 0).
Usage
estimate_multiThresh_GLMM(data_list, use_lme4 = TRUE)
Arguments
data_list |
same structure: |
use_lme4 |
logical; if |
Value
A list with mu0, sigma, tau, method="GLMM_probit".
Bayesian MCMC (Multiple Thresholds per Study) using rstan
Description
Builds an inline Stan model for multiple thresholds per study. The user must have
the rstan package installed. We place random effects \mu_i = \mu_0 + \tau * mu\_raw[i]
and use a binomial likelihood for each threshold. By default, uses simple weakly
informative priors.
Usage
estimate_multiThresh_MCMC(data_list, iter = 2000, chains = 2)
Arguments
data_list |
same structure as above: |
iter |
number of total iterations for each chain (default 2000) |
chains |
number of MCMC chains (default 2) |
Value
a list containing stan_fit (the full Stan fit object), plus
mu0_est, sigma_est, tau_est as posterior means, and
method="Bayesian_MCMC".
MLE with Numeric Integration (Multiple Thresholds per Study)
Description
Each study i has thresholds \{c_{ij}\}, each with an observed proportion
p_{ij}^{obs}. We assume \mu_i \sim \mathcal{N}(\mu_0,\tau^2) and
X_{ij} \sim \mathcal{N}(\mu_i,\sigma^2). The log-likelihood integrates out
\mu_i via Gauss-Hermite quadrature.
Usage
estimate_multiThresh_MLE(data_list, gh_points = 20)
Arguments
data_list |
A list with:
|
gh_points |
integer; number of Gauss-Hermite points (default 12). |
Value
A list with mu0, sigma, tau, method="MLE_integration".
MLE (Single Threshold per Study)
Description
Treats the count of "above threshold" in study i as binomial with probability
1 - \Phi((c_i - \mu)/\sigma). This uses numerical optimization (optim)
to maximize the binomial likelihood. Optionally uses Weighted OLS estimates as starting
values to improve convergence.
Usage
estimate_singleThresh_MLE(n_i, c_i, p_i_obs, use_wols_init = TRUE)
Arguments
n_i |
numeric vector of sample sizes |
c_i |
numeric vector of thresholds |
p_i_obs |
numeric vector of observed proportions above threshold |
use_wols_init |
logical; if |
Value
A list with mu, sigma, method="MLE".
Weighted OLS (Initial value in Single Threshold per Study MLE)
Description
Implements the formula c_i = \mu + \sigma * \Phi^{-1}(1 - p_i^{obs})
in a weighted least-squares sense, with weights = n_i.
Usage
estimate_singleThresh_WOLS(n_i, c_i, p_i_obs)
Arguments
n_i |
numeric vector |
c_i |
numeric vector |
p_i_obs |
numeric vector |
Value
A list with mu, sigma.
GLM probit (Single Threshold per Study)
Description
For each group i, we assume the data follows:
\Pr(Y_i = 1) = \Phi\left( \frac{\mu - c_i}{\sigma} \right)
where c_i is a known threshold, and \Phi is the standard normal CDF (the probit link).
The function reconstructs individual binary outcomes based on observed probabilities,
and estimates the parameters using generalized linear modeling with a probit link.
Usage
estimate_singleThresh_probit(n_i, c_i, p_i_obs)
Arguments
n_i |
numeric vector |
c_i |
numeric vector |
p_i_obs |
numeric vector |
Value
A list with mu, sigma, method="probit".
Minimal Gauss-Hermite Quadrature
Description
Returns (nodes, weights) for approximating \int f(x) e^{-x^2} dx,
ignoring any normalizing constant. This is a simple demonstration; for serious
applications, more robust libraries or expansions might be used.
Usage
gaussHermite(n)
Arguments
n |
integer number of quadrature points |
Value
list with nodes and weights