---
title: "Getting Started with CIMEHR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with CIMEHR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4,
  eval = TRUE
)
```

## Introduction

Electronic Health Record (EHR) data are longitudinal where patients visit at irregular times, and even when a visit occurs, an outcome or biomarker may not be measured. **CIMEHR** (**C**linical **I**nformative **M**issingness for **E**lectronic **H**ealth **R**ecord data) provides methods for longitudinal EHR analyses where patients visit irregularly and outcomes may be selectively measured at visits. The primary methods jointly characterize three linked processes:

1. the **visiting process**: when a patient has an EHR visit;
2. the **observation process**: whether the outcome/biomarker is recorded at a visit;
3. the **longitudinal outcome process**: the outcome trajectory among the target population.

The main estimator is `CIMEHR()`. Two extensions, `CIMEHR_timevarying_integral()` and `CIMEHR_timevarying_ou()`, allow more flexible time-varying/serially correlated observation-process structures.

In addition, the R package **CIMEHR** provides simple benchmark methods and semiparametric joint models for analyses affected by informative visiting and informative observation.

This vignette starts with the data format, then moves from simpler baseline methods to the more complex CIMEHR estimators. Examples are kept small (a 500-subject subset of the bundled dataset) so each chunk runs in a few seconds.

```{r load-package, eval = TRUE}
library(CIMEHR)
```

## 1. Input data format

The analysis functions expect **long-format data**, with one row per subject-visit or subject-time record. The CIMEHR estimators expect at least the following column roles, with default column names shown:

| Column role             | Default name | Meaning                                                              |
|:------------------------|:-------------|:---------------------------------------------------------------------|
| Subject identifier      | `id`         | Unique subject identifier                                            |
| Visit time              | `time`       | Visit time; `NA` indicates a row without a visit                     |
| Outcome                 | `Y`          | Longitudinal outcome; usually `NA` when the outcome was not recorded |
| Observation indicator   | `R`          | 1 if the outcome was recorded at the visit, 0 otherwise              |
| Censoring/follow-up end | `C`          | End of follow-up for each subject                                    |
| Covariates              | user-named   | Baseline or model covariates                                         |

**A note on time units.** The methods do not assume a specific unit for `time` and `C` — months, weeks, days, or years all work, as long as a single unit is used consistently within a dataset. The choice affects only the interpretation of the slope on `time` in the longitudinal outcome model: that coefficient is the expected change in outcome per one unit of time. The bundled `sim_ehr_data` was generated with `time` and `C` in months over a 120-month (10-year) follow-up window, parameter values calibrated to match an outpatient HbA1c cadence. For your own data, choose the unit that gives a clinically meaningful slope (typically months or years for chronic-disease EHR cohorts) and document it alongside your analysis.

All package methods should be given long/panel data. The CIMEHR-family functions accept custom column names through `id_col`, `time_col`, `y_col`, `r_col`, and `censor_col`, so renaming your data is not required:

```{r custom-column-names, eval = FALSE}
fit <- CIMEHR(
  data       = my_data,
  id_col     = "patient_id",
  time_col   = "visit_time",
  y_col      = "hba1c",
  r_col      = "measured",
  censor_col = "followup_end",
  covars_visit_XV               = c("age", "female"),
  covars_outcome_fixed_XY       = c("age", "female"),
  covars_outcome_random_link_ZY = "female",
  covars_obs_fixed_XO           = c("age", "female"),
  covars_obs_random_link_ZO     = "female"
)
```

### 1.1 Converting short/wide data to long data

If your dataset has one row per subject and repeated visit/outcome columns, reshape it first:

```{r wide-to-long}
library(dplyr)
library(tidyr)

wide_dat <- tibble::tibble(
  id = 1:2,
  Age = c(0.4, -0.2),
  Gender = c(1, 0),
  NSES = c(0.2, -0.4),
  C = c(60, 60),
  time_1 = c(5, 8),  time_2 = c(20, NA), time_3 = c(45, 50),
  Y_1 = c(1.2, NA), Y_2 = c(1.6, NA), Y_3 = c(2.0, 1.1)
)

long_dat <- wide_dat %>%
  pivot_longer(
    cols = matches("^(time|Y)_\\d+$"),
    names_to = c(".value", "visit"),
    names_pattern = "(time|Y)_(\\d+)"
  ) %>%
  mutate(R = as.integer(!is.na(Y))) %>%
  arrange(id, time)

head(long_dat)
```


## 2. Example dataset

The package includes `sim_ehr_data`, a long-format simulated EHR cohort designed to look like an outpatient HbA1c extract. The bundled dataset is one replicate from the package's Monte Carlo simulation framework (see Section 8, [Simulation study](#simulation-study)), specifically the `continuous_phi0` scenario (focal Z = NSES, phi = 0). The columns are listed below. Latent quantities used internally by the data-generating process (the shared factor, the visiting frailty, and per-process random effects) are not included, since they are not observable in real EHR data.

| Column      | Type        | Description                                                  |
|:------------|:------------|:-------------------------------------------------------------|
| `id`        | integer     | Subject identifier                                           |
| `Age`       | continuous  | Standardized age                                             |
| `Gender`    | binary      | 1 = male, 0 = female                                         |
| `Marital`   | binary      | 1 = married, 0 = not                                         |
| `Black`     | binary      | 1 = Black race, 0 = otherwise                                |
| `Hispanic`  | binary      | 1 = Hispanic ethnicity, 0 = otherwise                        |
| `NSES`      | continuous  | Standardized neighborhood median household income            |
| `time`      | continuous  | Visit time on [0, 120] months; `NA` for subjects with no visits|
| `R`         | binary      | 1 if HbA1c was recorded at the visit, 0 otherwise            |
| `log_HbA1c` | continuous  | Natural log of HbA1c; recommended outcome for analysis. `NA` when `R = 0` |
| `C`         | continuous  | End-of-follow-up time (120 months for all subjects)       |

```{r load-data, eval = TRUE}
data("sim_ehr_data")
nrow(sim_ehr_data)
head(sim_ehr_data)
```

For the rest of this vignette, we work with a 500-subject subset to keep the executable examples quick:

```{r subset-data, eval = TRUE}
set.seed(1)
sub_ids <- sample(unique(sim_ehr_data$id), 500)
ehr500  <- sim_ehr_data[sim_ehr_data$id %in% sub_ids, ]
nrow(ehr500)
length(unique(ehr500$id))
```

The true simulation parameters are stored as an attribute and can be passed to `method_comparisons()` to populate true-value, bias, and RMSE columns:

```{r true-params, eval = TRUE}
true_params <- attr(sim_ehr_data, "true_params")
true_params$beta
true_params$gamma
```

## 3. Method comparison

This section compares the statistical methods for longitudinal EHR data implemented in (or referenced by) the package. Methods are differentiated by whether they account for informative visiting (IP) and informative observation (IO), the specific scope of the shared latent structure linking these processes to the outcome, and the adjustment mechanism applied. Symbols: ✓ = explicitly handled or modeled; ✗ = ignored or not handled.

**Abbreviations**: LMM, linear mixed model; VA/OA, visit-adjusted / observation-adjusted; JMVL, joint modeling of visiting and longitudinal data; IIRR, inverse intensity rate ratio; MAR, missing at random; MI, multiple imputation; LI, linear increment.

### Methods implemented in CIMEHR

The following methods have function implementations in the package:

| Method | Function name |
|:--|:--|
| Summary statistics | `Summary_stat()` |
| Standard LMM (Laird and Ware, 1982) | `Linear_mixed_model()` |
| Liang (Liang et al., 2009) | `Joint_modeling_visiting_and_longitudinal_Liang()` |
| IIRR-Weighting (Bůžková and Lumley, 2007) | `Inverse_intensity_rate_ratio_weighting()` |
| IIRR-Balanced (Yiu and Su, 2025) | `Inverse_intensity_rate_ratio_balancing()` |
| Pairwise Likelihood (Chen et al., 2015) | `Pairwise_likelihood()` |
| Multiple Imputation (Rubin, 1987) | `Multiple_imputation_IP()` |
| Linear Increment (Aalen and Gunnes, 2010) | `Linear_increment_IP()` |
| EHRJoint (Du et al., 2025) | `EHRJoint()` |
| Proposed (Yang, Shi, and Mukherjee, 2026) | `CIMEHR()`, `CIMEHR_timevarying_integral()`, `CIMEHR_timevarying_ou()` |

VA-LMM, OA-LMM, JMVL-G, and Adapted-Liang appear in the comparison tables below for context but are not implemented as separate functions; VA-LMM and OA-LMM are available through `Linear_mixed_model()` via the `visit_adjust` argument.

### Outcome-only

| Method | IP | IO | Shared Latent | Adjustment Mechanism | Package |
|:--|:--:|:--:|:--|:--|:--|
| Summary statistics | ✗ | ✗ | None | Reduces the longitudinal series to a single scalar (e.g., min, mean, median, max) for regression. | --- |
| Standard LMM (Laird and Ware, 1982) | ✗ | ✗ | None | Conditions on observed covariates (valid only under MAR). | lme4 |
| VA-LMM (Goldstein et al., 2016) | ✓ | ✗ | None | Includes *visit count* as a fixed covariate proxy. | lme4 |
| OA-LMM | ✗ | ✓ | None | Includes *observation count* as a fixed covariate proxy. | lme4 |

### IP-only

| Method | IP | IO | Shared Latent | Adjustment Mechanism | Package |
|:--|:--:|:--:|:--|:--|:--|
| Liang (Liang et al., 2009) | ✓ | ✗ | Visiting & Outcome | Joint likelihood estimation via shared frailty linking visiting and outcome. | IrregLong |
| JMVL-G (Gasparini et al., 2020) | ✓ | ✗ | Visiting & Outcome | Joint likelihood modeling inter-visit times via shared random effects. | merlin |
| IIRR-Weighting (Bůžková and Lumley, 2007) | ✓ | ✗ | None | Inverse intensity weighting based on visiting intensity. | --- |
| IIRR-Balanced (Yiu and Su, 2025) | ✓ | ✗ | None | Inverse intensity weighting adjusted for covariate balance. | --- |
| Pairwise Likelihood (Chen et al., 2015) | ✓ | ✗ | None | Constructs a likelihood from all within-subject pairs of observations; conditioning on observed visit times eliminates the visit intensity from the likelihood. | --- |

### Imputation + IP-only

| Method | IP | IO | Shared Latent | Adjustment Mechanism | Package |
|:--|:--:|:--:|:--|:--|:--|
| Multiple Imputation (MI) (Rubin, 1987) | ✓ | ✓ | Determined by IP-only | Multiple imputation based on posterior draws, followed by IP-only methods. | mice |
| Linear Increment Methods (LI) (Aalen and Gunnes, 2010) | ✓ | ✓ | Determined by IP-only | Deterministic linear interpolation between observed visits to impute missing values, followed by IP-only methods. | slim |

### IP+IO

| Method | IP | IO | Shared Latent | Adjustment Mechanism | Package |
|:--|:--:|:--:|:--|:--|:--|
| Adapted-Liang (Liang et al., 2009) | ✓ | ✓ | Visiting & Outcome | Joint model structure defined, but observation coefficients are constrained to zero. | --- |
| EHRJoint (Du et al., 2025) | ✓ | ✓ | Visiting & Outcome | Tripartite model; observation process modeled via covariates only (no latent link). | --- |
| Proposed (this package) | ✓ | ✓ | All three processes | Tripartite model; full shared latent structure links Visiting, Observation, and Outcome. | CIMEHR |

IP = informative visiting / informative presence adjustment; IO = informative observation adjustment.
The "Proposed" row corresponds to `CIMEHR()`; `CIMEHR_timevarying_integral()` and `CIMEHR_timevarying_ou()` extend this framework with more flexible observation-process modeling.

## 4. Simpler benchmark methods first

We define a covariate vector once and reuse it. The focal coefficient for the bundled dataset is `NSES`.

```{r covars-vec}
covars_all <- c("Age", "Gender", "Marital", "Black", "Hispanic", "NSES")
out_form   <- log_HbA1c ~ Age + Gender + Marital + Black + Hispanic + NSES + time
```

### 4.1 Summary-statistic regression

`Summary_stat()` is a naive baseline that collapses each subject's observed outcomes to a summary statistic and fits a cross-sectional regression.

```{r summarystat}
fit_ss <- Summary_stat(
  ehr500,
  outcome = "log_HbA1c",
  formula = ~ Age + Gender + Marital + Black + Hispanic + NSES
)
summary(fit_ss)
```

### 4.2 Linear mixed model

`Linear_mixed_model()` wraps `nlme::lme()` and supports standard, observation-adjusted, and visit-adjusted variants.

```{r lmm}
fit_lmm <- Linear_mixed_model(
  ehr500,
  fixed         = out_form,
  random        = ~ 1 | id,
  obs_indicator = "R"
)
summary(fit_lmm)
```

A visit-adjusted variant adds the running visit count as a fixed covariate before observation-indicator filtering:

```{r lmm-va}
fit_va <- Linear_mixed_model(
  ehr500,
  fixed         = out_form,
  random        = ~ 1 | id,
  obs_indicator = "R",
  visit_adjust  = "VA"
)
summary(fit_va)
```

### 4.3 Pairwise likelihood

`Pairwise_likelihood()` fits a pairwise composite likelihood model for observed outcomes. The outcome column is read from `y_col`, so we set `y_col = "log_HbA1c"` explicitly here (the default is `"Y"`).

```{r pairwise}
obs_data <- ehr500[!is.na(ehr500$R) & ehr500$R == 1, ]

fit_pl <- Pairwise_likelihood(
  data        = obs_data,
  formula     = out_form,
  y_col       = "log_HbA1c",
  pair_sample = 1000
)
summary(fit_pl)
```

## 5. Visiting-process and joint methods

### 5.1 Inverse intensity rate ratio weighting

```{r iirr}
fit_iirr <- Inverse_intensity_rate_ratio_weighting(
  ehr500,
  outcome    = "log_HbA1c",
  visit_covs = covars_all
)
summary(fit_iirr)
```

### 5.2 Inverse intensity rate ratio balancing

`Inverse_intensity_rate_ratio_balancing()` extends IIRR with covariate-balancing weights. The `balance_covs` argument defaults to `visit_covs` when omitted. The `phi` argument controls a sensitivity tuning parameter.

```{r iirr-bal}
fit_iirr_bal <- Inverse_intensity_rate_ratio_balancing(
  ehr500,
  outcome    = "log_HbA1c",
  visit_covs = covars_all
)
summary(fit_iirr_bal)
```

### 5.3 Liang--Lu--Ying joint visiting/outcome model

`Joint_modeling_visiting_and_longitudinal_Liang()` fits a joint model for the visiting process and the longitudinal outcome, returning model-based standard errors for the visiting stage and a clustered sandwich estimator for the outcome stage (conditional on the Stage 1 nuisance estimates).

```{r liang}
fit_liang <- Joint_modeling_visiting_and_longitudinal_Liang(
  ehr500,
  outcome     = "log_HbA1c",
  visit_covs  = covars_all,
  long_covs   = covars_all,
  random_covs = "NSES"
)
summary(fit_liang)
```

### 5.4 EHRJoint

`EHRJoint()` is a tripartite joint model. Its observation process uses a Generalized Linear Model (GLM) with a probit link by default (set `obs_link = "logit"` to switch). The fitted object returns model-based standard errors for the visiting and observation stages and a clustered sandwich estimator for the outcome stage.

```{r ehrjoint}
fit_ej <- EHRJoint(
  ehr500,
  outcome     = "log_HbA1c",
  visit_covs  = covars_all,
  long_covs   = covars_all,
  random_covs = "NSES"
)
summary(fit_ej)
```

## 6. Primary CIMEHR methods

### 6.1 Base CIMEHR

`CIMEHR()` jointly models informative visiting, informative observation, and the longitudinal outcome. Its visiting process uses Andersen--Gill partial likelihood with log-normal frailty, and its observation process uses a probit model with latent-random-link random effects when specified.

```{r cimehr}
fit_cimehr <- CIMEHR(
  data                          = ehr500,
  y_col                         = "log_HbA1c",
  covars_visit_XV               = covars_all,
  covars_outcome_fixed_XY       = covars_all,
  covars_outcome_random_link_ZY = "NSES",
  covars_obs_fixed_XO           = covars_all,
  covars_obs_random_link_ZO     = "NSES"
)
print(fit_cimehr) # Print function only prinnts the outcome process estimates
summary(fit_cimehr)
```

### 6.1.1 Stage-specific summaries and coefficient extraction

You can print one process at a time and extract a particular coefficient programmatically:

```{r stage-summary-extract}
summary_visiting(fit_cimehr)
summary_observation(fit_cimehr)
summary_outcome(fit_cimehr)

extract_coefficient(fit_cimehr, stage = "outcome",     parameter = "NSES")
extract_coefficient(fit_cimehr, stage = "visiting",    parameter = "NSES")
extract_coefficient(fit_cimehr, stage = "observation", parameter = "NSES")
```

### 6.2 CIMEHR with Gauss--Hermite quadrature

`CIMEHR_timevarying_integral()` uses Gauss--Hermite quadrature and online filtering for a more flexible observation process. Analytic standard errors are not currently exposed for this variant, so bootstrap inference is recommended (see Section 9).

```{r cimehr-integral,eval = FALSE}
fit_integral <- CIMEHR_timevarying_integral(
  data                          = ehr500,
  y_col                         = "log_HbA1c",
  covars_visit_XV               = covars_all,
  covars_outcome_fixed_XY       = covars_all,
  covars_outcome_random_link_ZY = "NSES",
  covars_obs_fixed_XO           = covars_all,
  covars_obs_random_link_ZO     = "NSES",
  gh_points_U = 7,
  gh_points_q = 5
)
```

### 6.3 CIMEHR with OU pairwise composite likelihood

`CIMEHR_timevarying_ou()` models serial correlation in the observation process using an Ornstein--Uhlenbeck process and pairwise composite likelihood.

```{r cimehr-ou,eval = FALSE}
fit_ou <- CIMEHR_timevarying_ou(
  data                          = ehr500,
  y_col                         = "log_HbA1c",
  covars_visit_XV               = covars_all,
  covars_outcome_fixed_XY       = covars_all,
  covars_outcome_random_link_ZY = "NSES",
  covars_obs_fixed_XO           = covars_all,
  covars_obs_random_link_ZO     = "NSES",
  pair_type = "adjacent"
)
```

## 7. Comparing methods with `method_comparisons()`

The helper `method_comparisons()` fits any user-specified combination of methods and returns separate tables for the visiting, observation, and outcome processes. If a method does not estimate a given process, that entry is shown as `NA`.

```{r method-comparisons-available}
available_comparison_methods()
```

The full call below fits five methods at once on the 500-subject subset. It takes a minute or two to run; if you are building the vignette and want to skip the runtime, change `eval = TRUE` to `eval = FALSE` in the chunk header.

```{r method-comparisons}
cmp_subset <- method_comparisons(
  data    = ehr500,
  methods = c("Summary_stat", "Linear_mixed_model",
              "Inverse_intensity_rate_ratio_weighting",
              "Joint_modeling_visiting_and_longitudinal_Liang",
              "CIMEHR"),
  method_args = list(
    Summary_stat = list(
      outcome = "log_HbA1c",
      formula = ~ Age + Gender + Marital + Black + Hispanic + NSES
    ),
    Linear_mixed_model = list(
      fixed = out_form, random = ~ 1 | id, obs_indicator = "R"
    ),
    Inverse_intensity_rate_ratio_weighting = list(
      outcome = "log_HbA1c", visit_covs = covars_all
    ),
    Joint_modeling_visiting_and_longitudinal_Liang = list(
      outcome = "log_HbA1c",
      visit_covs = covars_all, long_covs = covars_all, random_covs = "NSES"
    ),
    CIMEHR = list(
      y_col                         = "log_HbA1c",
      covars_visit_XV               = covars_all,
      covars_outcome_fixed_XY       = covars_all,
      covars_outcome_random_link_ZY = "NSES",
      covars_obs_fixed_XO           = covars_all,
      covars_obs_random_link_ZO     = "NSES"
    )
  ),
  printSE = TRUE,
  print95CI = TRUE,
  true_value_verbose = TRUE
)

print(cmp_subset)
cmp_subset$tables$outcome
```

If a method fails, `method_comparisons()` keeps going by default, stores the error in `cmp$errors`, and prints `NA` rows for that method.

## 8. Simulation study {#simulation-study}

A Monte Carlo simulation study is provided in `simulations/simulation_study.R`. The script crosses two focal-covariate types (continuous `NSES` and binary `Gender`) with four values of the Ornstein-Uhlenbeck serial-correlation parameter (`phi` $\in$ {0, 0.001, 0.1, 0.3}), producing eight scenarios. Each scenario runs `N_SIM = 1000` Monte Carlo replications; each replication generates a fresh dataset of `n = 2000` participants on the [0, 120] window via `sim_data_gen()` and fits four methods:

- `CIMEHR()`
- `Joint_modeling_visiting_and_longitudinal_Liang()`
- `Inverse_intensity_rate_ratio_weighting()`
- `IrregLong::iiwgee()` (external comparator)

For each method × scenario × parameter (beta on each of Age, Gender, Marital, Black, Hispanic, NSES), the script reports bias, empirical SE, mean analytic SE, SE ratio, RMSE, and 95% CI coverage. Coverage is reported as `---` for `Inverse_intensity_rate_ratio_weighting()` because that method does not return analytic standard errors; bootstrap inference would be needed there.

The bundled `sim_ehr_data` used throughout this vignette is itself one replicate from this framework (the `continuous_phi0` scenario, `sim_id = 1`), so the data-generating spec used by the study and the spec of the example dataset are identical aside from `phi` and the focal covariate.

To run the study:

```{bash, eval = FALSE}
Rscript simulations/simulation_study.R
Rscript simulations/simulation_study.R --cores 8 --nsim 100
Rscript simulations/simulation_study.R --scenarios continuous_phi0,binary_phi0.3
```

## 9. Bootstrap inference

`CIMEHR()`, `CIMEHR_timevarying_ou()`, `EHRJoint()`, and `Joint_modeling_visiting_and_longitudinal_Liang()` already return analytic standard errors (sandwich or model-based). `bootstrap()` is recommended for the methods that do not (`CIMEHR_timevarying_integral()`, `Inverse_intensity_rate_ratio_weighting()`, `Inverse_intensity_rate_ratio_balancing()`), and as an optional robustness check or for propagating Stage 1 nuisance uncertainty for the methods that do. The bootstrap resamples subjects with replacement and refits the selected method on each replicate.

```{r bootstrap, eval = FALSE}
bs <- bootstrap(
  fit_iirr,
  data = ehr500,
  B    = 200,
  seed = 1
)
print(bs)
summary(bs)
```

For CIMEHR-family methods, forward covariate specifications through `covars`:

```{r bootstrap-cimehr, eval = FALSE}
bs_cimehr <- bootstrap(
  fit_cimehr,
  data = ehr500,
  B    = 200,
  seed = 1,
  covars = list(
    covars_visit_XV               = covars_all,
    covars_outcome_fixed_XY       = covars_all,
    covars_outcome_random_link_ZY = "NSES",
    covars_obs_fixed_XO           = covars_all,
    covars_obs_random_link_ZO     = "NSES"
  )
)
```

## 10. Simulating custom data

`sim_data_gen()` accepts a `covariates` list spec; per-process coefficient vectors are keyed by the names declared there. The default spec is the generic two-covariate `Z`/`X` form used in the package's internal tests, but the EHR-style spec used to build `sim_ehr_data` is an instructive example:

```{r custom-sim}
ehr_covs <- list(
  Age      = list(type = "continuous", dist = "uniform",
                  min = 18, max = 99, standardize = TRUE),
  Gender   = list(type = "binary", prob = 0.5),
  Marital  = list(type = "binary", prob = 0.4),
  Black    = list(type = "binary", prob = 0.30),
  Hispanic = list(type = "binary", prob = 0.20),
  NSES     = list(type = "continuous", dist = "normal",
                  mean = 0, sd = 1, standardize = FALSE)
)

dat <- sim_data_gen(
  n          = 500,
  time.start = 0,
  time.end   = 120,
  seed       = 42,
  covariates = ehr_covs,
  gamma = c(intercept = -2.8, Age = 0.5, Gender = -0.10, Marital = -0.05,
            Black = 0.15, Hispanic = -0.05, NSES = 0.3),
  sigma_zeta = 0.5,
  alpha = c(intercept = 0.2, Age = -0.3, Gender = -0.10, Marital = 0.00,
            Black = 0.05, Hispanic = -0.05, NSES = -0.6),
  obs_random_covs = "NSES",
  delta   = c(intercept = -0.3, NSES = -0.3),
  sigma_q = c(intercept =  0.5, NSES =  0.2),
  phi     = 0,
  beta = c(intercept = -2.0, Age = 0.5, Gender = 0.10,
           Marital = -0.03, Black = 0.15, Hispanic = 0.05,
           NSES = -0.5),
  beta_t              = 0.01,
  outcome_random_covs = "NSES",
  theta   = c(intercept = 0.4, NSES = 0.0),
  Sigma_b = matrix(c(0.5, 0, 0, 0.8), 2, 2),
  sigma_y = 0.7,
  verbose = FALSE
)

head(dat$long_data)
```

## References

Aalen, O. O., Borgan, Ø., and Gjessing, H. K. (2008). *Survival and event history analysis: A process point of view*. Springer.

Aalen, O. O. and Gunnes, N. (2010). A dynamic approach for reconstructing missing longitudinal data using the linear increments model. *Biostatistics*, 11, 453--472.

Bůžková, P. and Lumley, T. (2007). Longitudinal data analysis for generalized linear models with follow-up dependent on outcome-related variables. *Canadian Journal of Statistics*, 35, 485--500.

Chen, Y., Ning, J., and Cai, C. (2015). Regression analysis of longitudinal data with irregular and informative observation times. *Biostatistics*, 16, 727--739.

Du, J., Shi, X., and Mukherjee, B. (2025). A new statistical approach for joint modeling of longitudinal outcomes measured in electronic health records with clinically informative presence and observation processes.

Gasparini, A., Abrams, K. R., Barrett, J. K., Major, R. W., Sweeting, M. J., Brunskill, N. J., and Crowther, M. J. (2020). Mixed-effects models for health care longitudinal data with an informative visiting process: A Monte Carlo simulation study. *Statistica Neerlandica*, 74, 5--23.

Goldstein, B. A., Bhavsar, N. A., Phelan, M., and Pencina, M. J. (2016). Controlling for informed presence bias due to the number of health encounters in an electronic health record. *American Journal of Epidemiology*, 184, 847--855.

Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data. *Biometrics*, 38, 963--974.

Liang, Y., Lu, W., and Ying, Z. (2009). Joint modeling and analysis of longitudinal data with informative observation times. *Biometrics*, 65, 377--384.

Pullenayegum, E. M. (2026). IrregLong: Analysis of Longitudinal Data with Irregular Observation Times. R package version 0.4.1.

Rubin, D. B. (1987). *Multiple imputation for nonresponse in surveys*. John Wiley & Sons, New York.

Yiu, S. and Su, L. (2025). Accommodating informative visit times for analysing irregular longitudinal data: A sensitivity analysis approach with balancing weights estimators. *Journal of the Royal Statistical Society Series C: Applied Statistics*, 74, 824--843.

Yang, C.-H., Shi, X., and Mukherjee, B. (2026). Joint modeling of longitudinal EHR data with shared random effects for informative visiting and observation processes. *arXiv:2602.15374*.
