Time Dependent Covariate Use

library(Colossus)
library(data.table)

General Usage

For Cox Proportional Hazards regression, the model is generally assumed to be independent of event time. However for more complex models, Colossus has the capability to perform regression using covariates that change over time. These can be split into two general types of covariates, step functions changing with time and multiplicative interactions with time. Colossus generates the new dataset by splitting each row of the original dataset into smaller intervals. This assumes that over each interval the values of every covariate are approximately constant. For Cox Proportional Hazards, rows that do not contain a event time are not used for regression, so Colossus has an option to only use small intervals around each event time. With this option the time dependent covariate is evaluated only at event times. For data-sets with a small number of discrete event times this can save memory.

Multiplicative Interaction

The simplest type of time dependent covariate is an interaction term between time and another covariate. Suppose we have a row in a dataset with a factor covariate “group” and some arbitrary endpoints to the time interval. Colossus starts by using a user provided function to calculate the value of the time dependent covariate at the endpoints. We assume that the value of “group” is constant over the interval and time is changing linearly. Colossus calculates the value of the time dependent covariate over intervals by linearly interpolating between the value at the endpoints. This process assumes that the interaction is linear or the interval is small enough for the interaction to be approximately linear.

dft <- data.table("x" = c(1, 2, 3), "y" = c(2, 5, 10))
g <- ggplot2::ggplot(dft, ggplot2::aes(x = .data$x, y = .data$y)) +
  ggplot2::geom_point(color = "black") +
  ggplot2::geom_line(color = "black", alpha = 1) +
  ggplot2::labs(x = "age (days)", y = "Covariate Value")
x <- seq(1, 3, by = 0.1)
y <- 1 + x^2
dft <- data.table("x" = x, "y" = y)
g <- g + ggplot2::geom_line(
  data = dft, ggplot2::aes(x = .data$x, y = .data$y),
  color = "black", linetype = "dashed"
)
g

Linear Interpolated Function

\[ \begin{aligned} Y(x)=x^2 + 1 \end{aligned} \]

This is most helpful in a situation where the user has continuous data over a series of intervals and believes that the values can be interpolated within each interval.

Step Function Interaction

The second type of time dependent covariate is one which changes based on conditional statements. One example is a covariate to split data into bins by time. Colossus uses a string to identify where to change value. The user inputs a string of the form “#l?” for a time value “#”, a condition “l”, and a question mark as a delimiter. Colossus allows for four conditions:

So the following would be equivalent to “\(0g?6g?12g?\)

dft <- data.table("x" = c(-1, 1, 5, 8, 13), "y" = c(0, 1, 1, 2, 3))
g <- ggplot2::ggplot(dft, ggplot2::aes(x = .data$x, y = .data$y)) +
  ggplot2::geom_point(color = "black")
dft <- data.table("x" = c(-1, -0.01, 0, 1, 5.99, 6, 11.99, 12, 13), "y" = c(0, 0, 1, 1, 1, 2, 2, 3, 3))
g <- g + ggplot2::geom_line(data = dft, ggplot2::aes(x = .data$x, y = .data$y), color = "black") +
  ggplot2::labs(x = "age (days)", y = "Covariate Value")
g

Monotonic Step Function Applied

\[ \begin{aligned} Y(x)=\begin{cases} 0 &(x < 0) \\ 1 & (x \ge 0) \\ 2 &(x \ge 6) \\ 3 &(x \ge 12) \end{cases}\\ \end{aligned} \]

Meanwhile the following is equivalent to “\(0g?6g?12l?\)

dft <- data.table("x" = c(-1, 1, 5, 8, 13), "y" = c(1, 2, 2, 3, 2))
g <- ggplot2::ggplot(dft, ggplot2::aes(x = .data$x, y = .data$y)) +
  ggplot2::geom_point(color = "black")
dft <- data.table("x" = c(-1, -0.01, 0, 1, 5.99, 6, 11.99, 12, 13), "y" = c(1, 1, 2, 2, 2, 3, 3, 2, 2))
g <- g + ggplot2::geom_line(data = dft, ggplot2::aes(x = .data$x, y = .data$y), color = "black") +
  ggplot2::labs(x = "age (days)", y = "Covariate Value")
g

Step Function Applied

\[ \begin{aligned} Y(x)=\begin{cases} 1 &(x < 0) \\ 2 & (x \ge 0) \\ 3 &(x \ge 6) \\ 2 &(x \ge 12) \end{cases}\\ \end{aligned} \]

This is most helpful in situations where the user has reason to believe that the effect of a covariate on events is not uniform over time despite the covariate being constant over each interval. This allows the user to generate a list of factors to interact with any covariate of interest.