This vignette demonstrates the Parametric Exponential Tilting estimator for handling Not Missing at Random (NMAR). This method implements the maximum likelihood approach described in Riddles, Kim, and Im (2016)
This method is applicable when: - Data is individual-level with both the outcome variable and auxiliary covariates. - The outcome variable follows a known distribution (e.g., normal).
y).x).The estimation is configured using exptilt_engine.
Critical parameters include the assumed outcome density for respondents
and the variance estimation method
The model is specified via a formula: Outcome ~ Response_Covariates | Instrument.
# Exptilt engine configuration
exptilt_config <- exptilt_engine(
y_dens = 'normal', # Assume f(y|x, delta=1) is Normal
family = 'logit', # Logit link for response probability
variance_method = 'bootstrap',
bootstrap_reps = 5,
control = list(maxit = 10), # Solver control parameters
stopping_threshold = 0.01, # Convergence threshold
standardize = FALSE
)
formula = y ~x
res <- nmar(formula = formula, data = riddles_case1, engine = exptilt_config, trace_level = 0)The function returns an object containing the estimated coefficients for the response model and the adjusted mean estimate for the outcome.
The adjusted mean is calculated using propensity score weighting, where the weights are derived from the fitted nonignorable response model
print(res)
#> NMAR Result (Exponential tilting)
#> -------------------------------
#> y mean: -1.003197 (0.039987)
#> Converged: TRUE
#> Variance method: bootstrap
#> Estimator: exponential_tilting
#>
#> Exptilt diagnostics:
#> Loss value: 0.000000 0.000000
#> Iterations: 5
#> Variance method: bootstrap
#> Bootstrap reps: 5
#> Stopping threshold: 0.010000Comparing the results to the true population mean (known from the synthetic generation) demonstrates the bias correction.
y_true_mean <- mean(riddles_case1$y_true)
est_mean <- as.numeric(res$y_hat)
se <- res$se
cat('Estimated (NMAR) Mean:', sprintf('%.4f', est_mean),
' (SE:', sprintf('%.4f', se), ')\n')
#> Estimated (NMAR) Mean: -1.0032 (SE: 0.0400 )
cat('True Y Mean: ', sprintf('%.4f', y_true_mean), '\n')
#> True Y Mean: -1.0105
cat('Naive (MCAR) Mean: ', sprintf('%.4f', mean(riddles_case1$y, na.rm = TRUE)), '\n')
#> Naive (MCAR) Mean: -1.0716The engine supports survey.design objects. When provided, the algorithm incorporates sampling weights into the likelihood estimation and the final propensity score weighting.
The weights affect both the estimation of the respondent density f(y∣x) and the solution to the mean score equation, ensuring population-level inference
# if survey is installed
if (requireNamespace("survey", quietly = TRUE)) {
library('survey')
surv_test_weights <- runif(length(riddles_case1$y), 0.9, 1.1)
surv_test_data <- svydesign(ids = ~1, data = riddles_case1, weights = ~surv_test_weights)
exptilt_config <- exptilt_engine(
standardize = FALSE,
on_failure = "error",
bootstrap_reps = 5,
supress_warnings = FALSE,
control = list(maxit = 15, method = "Newton"),
family = "logit",
y_dens = "normal",
variance_method = "bootstrap",
stopping_threshold = 1
)
formula = y ~ x
res <- nmar(formula = formula, data = surv_test_data, engine = exptilt_config, trace_level = 3)
print(res)
}
#> [STEP-1] ============================================================
#> [STEP-1] EXPTILT ESTIMATION STARTED
#> [STEP-1] ============================================================
#> [INFO] Running with trace_level = 3
#> [INFO] Formula: y ~ x
#> [INFO]
#> [INFO] -- DATA SUMMARY --
#> [INFO] Total observations: 500
#> [INFO] Respondents: 368 (26.4%)
#> [INFO] Non-respondents: 132 (73.6%)
#> [INFO]
#> [INFO] -- MODEL SPECIFICATION --
#> [INFO] Outcome variable: y
#> [INFO] Auxiliary variables: x
#> [INFO] Missingness predictors: (intercept only)
#> [INFO] Response model: logit
#> [INFO] Outcome density: normal
#> [INFO] Standardization: disabled
#> [INFO]
#> [INFO] -- PARAMETER INITIALIZATION --
#> [INFO] Number of parameters: 2
#> [DETAIL-3]
#> [INFO] Initial values:
#> [INFO] (Intercept) = 0.0481 [response model intercept]
#> [INFO] y = 0.0928 [outcome effect on response]
#> [INFO]
#> [INFO] -- CONDITIONAL DENSITY ESTIMATION --
#> [INFO] Selected distribution: normal
#> [INFO] Density parameters: 3
#> [INFO] Fitted density model summary:
#> [INFO]
#>
#> Call:
#> glm(formula = formula, family = gaussian(), data = data)
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1.06281 0.04866 -21.84 <2e-16 ***
#> x 1.11333 0.06712 16.59 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 0.8711681)
#>
#> Null deviance: 558.57 on 367 degrees of freedom
#> Residual deviance: 318.85 on 366 degrees of freedom
#> AIC: 997.58
#>
#> Number of Fisher Scoring iterations: 2
#>
#> [INFO] Computing density matrices...
#> [INFO]
#> [INFO] -- NONLINEAR SOLVER (nleqslv) --
#> [INFO] Early stopping threshold: 1.0000
#> [INFO] Method: Newton
#> [INFO] Control parameters:
#> [INFO] maxit = 15
#> [INFO] Solving...
#> [INFO]
#> [INFO] OK Converged
#> [INFO] Iterations: 3
#> [INFO] Termination code: 1
#> [INFO] Max |score|: 0.000000
#> [INFO] Final parameter estimates (scaled):
#> [INFO] (Intercept) = 0.865942 [response intercept]
#> [INFO] y = -0.159226 [outcome -> response]
#> [INFO]
#> [INFO] -- VARIANCE ESTIMATION (Bootstrap) --
#> [INFO] Bootstrap replications: 5
#> [INFO] OK Bootstrap complete
#> [INFO] Standard error: 0.054672
#> [INFO]
#> [RESULT] ============================================================
#> [RESULT] ESTIMATION COMPLETE
#> [RESULT] ============================================================
#> [RESULT] Mean estimate: -1.004793
#> [RESULT] Standard error: 0.054672
#> [RESULT] 95% CI: [-1.111949, -0.897637]
#> [INFO]
#> [INFO] Response model coefficients:
#> [INFO] (Intercept) : 0.865942 (Intercept)
#> [INFO] y : -0.159226 (Effect of y on response prob.)
#> [RESULT] ============================================================
#> NMAR Result (Exponential tilting)
#> -------------------------------
#> y mean: -1.004793 (0.054672)
#> Converged: TRUE
#> Variance method: bootstrap
#> Estimator: exponential_tilting
#>
#> Exptilt diagnostics:
#> Loss value: 0.000000 0.000000
#> Iterations: 3
#> Variance method: bootstrap
#> Bootstrap reps: 5
#> Stopping threshold: 1.000000