---
title: "Introduction to PMLE4SCR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to PMLE4SCR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(kableExtra)
```

## Overview

The `PMLE4SCR` package implements two-stage pseudo maximum likelihood
estimation (PMLE) for copula-based regression models with semi-competing
risks data. In semi-competing risks data, a non-terminal event (e.g.,
disease relapse) is subject to dependent censoring by a terminal event
(e.g., death), but not vice versa.

The package provides two main functions:

- `PMLE4SCR()`: Two-stage pseudo maximum likelihood estimation
- `MLE4SCR()`: Simultaneous maximum likelihood estimation

## Data

We illustrate the package using the Bone Marrow Transplant (BMT) dataset
from the `SemiCompRisks` package. This dataset contains 137 patients with
acute leukemia aged 7 to 52. The non-terminal event is leukemia relapse
and the terminal event is death. Three disease groups are considered:
AML-low risk, AML-high risk, and ALL (acute lymphoblastic leukemia),
with AML-low as the baseline group.

```{r data}
library(PMLE4SCR)
data(BMT, package = "SemiCompRisks")
BMT$g <- factor(BMT$g, levels = c(2, 3, 1),
                labels = c("AML-low", "AML-high", "ALL"))
```

The key variables in the dataset are described below:

```{r data-table, echo = FALSE}
var_desc <- data.frame(
  Variable = c("T1", "T2", "delta1", "delta2", "g"),
  Description = c(
    "Time to death (terminal event)",
    "Time to leukemia relapse (non-terminal event)",
    "Death indicator (1 = observed, 0 = censored)",
    "Relapse indicator (1 = observed, 0 = censored)",
    "Disease group (AML-low, AML-high, ALL)"
  )
)
knitr::kable(var_desc, caption = "Table 1: Key variables in the BMT dataset")
```

A preview of the first six observations:

```{r head}
head(BMT[, c("T1", "T2", "delta1", "delta2", "g")])
```

## Fitting the Model with PMLE

We fit a copula-based model using the Clayton copula, with proportional
hazards (PH) marginals for both relapse and death. The disease group `g`
is included as a covariate for both marginals and the copula parameter.

```{r pmle}
myfit <- PMLE4SCR(BMT, time = "T2", death = "T1",
                  status_time = "delta2", status_death = "delta1",
                  T.fmla = ~ g, D.fmla = ~ g,
                  copula.family = "Clayton",
                  copula.control = list(formula = ~ g))
```

## Results

### Dependence between relapse and death

The estimated copula parameters and corresponding Kendall's $\tau$ values
measure the dependence between relapse and death times for each disease group.
A larger $\tau$ indicates stronger dependence.

```{r gamma}
knitr::kable(myfit$gamma,
             digits = 3,
             caption = "Table 2: Estimated copula parameters (PMLE)")
```

### Regression coefficients for relapse (non-terminal event)

The estimated regression coefficients for the marginal distribution of
relapse time. Positive values indicate higher risk of relapse compared
to the AML-low baseline group.

```{r betaT, echo = FALSE}
tab <- myfit$betaT
rownames(tab) <- NULL
tab$para <- c("AML-high", "ALL")
knitr::kable(tab,
             col.names = c("Group", "Estimate", "SE"),
             digits = 3,
             caption = "Table 3: Regression coefficients for relapse time (PMLE)")
```

### Regression coefficients for death (terminal event)

```{r betaD, echo = FALSE}
tab <- myfit$betaD
rownames(tab) <- NULL
tab$para <- c("AML-high", "ALL")
knitr::kable(tab,
             col.names = c("Group", "Estimate", "SE"),
             digits = 3,
             caption = "Table 4: Regression coefficients for death time (PMLE)")
```

## Comparison: PMLE vs Simultaneous MLE

For comparison, we also fit the simultaneous MLE and compare the estimates
from both methods side by side.

```{r mle}
myfit_mle <- MLE4SCR(BMT, time = "T2", death = "T1",
                     status_time = "delta2", status_death = "delta1",
                     T.fmla = ~ g, D.fmla = ~ g,
                     copula.family = "Clayton",
                     copula.control = list(formula = ~ g))
```

```{r comparison, echo = FALSE}
comp <- data.frame(
  Parameter = c("Copula (Intercept)", "Copula (AML-high)", "Copula (ALL)",
                "Relapse: AML-high", "Relapse: ALL",
                "Death: AML-high", "Death: ALL"),
  PMLE_Est = c(myfit$gamma$est,
               myfit$betaT$est,
               myfit$betaD$est),
  PMLE_SE  = c(myfit$gamma$se,
               myfit$betaT$se,
               myfit$betaD$se),
  MLE_Est  = c(myfit_mle$gamma$est,
               myfit_mle$betaT$est,
               myfit_mle$betaD$est),
  MLE_SE   = c(myfit_mle$gamma$se,
               myfit_mle$betaT$se,
               myfit_mle$betaD$se)
)
knitr::kable(comp,
             digits = 3,
             col.names = c("Parameter",
                           "Estimate", "SE",
                           "Estimate", "SE"),
             caption = "Table 5: Comparison of PMLE and simultaneous MLE estimates") |>
  kableExtra::add_header_above(c(" " = 1, "PMLE" = 2, "Simultaneous MLE" = 2))
```

The two methods produce similar estimates. The PMLE is computationally
more efficient and robust against copula misspecification for the
marginal parameters.

## Reference

Arachchige, S. J., Chen, X., and Zhou, Q. M. (2025). Two-stage pseudo
maximum likelihood estimation of semiparametric copula-based regression
models for semi-competing risks data. *Lifetime Data Analysis*, 31, 52-75.
