In a previous vignette, we introduced the “marginal effect” as a partial derivative. Since derivatives are only properly defined for continuous variables, we cannot use them to interpret the effects of changes in categorical variables. For this, we turn to contrasts between Adjusted predictions. In the context of this package, a “Contrast” is defined as:
The difference between two adjusted predictions, calculated for meaningfully different regressor values (e.g., College graduates vs. Others).
The marginaleffects() function automatically calculates contrasts instead of derivatives for factor, logical, or character variables. The comparisons() function gives users more powerful features to compute different contrasts.
Consider a simple model with a logical and a factor variable:
library(marginaleffects)
library(magrittr)
tmp <- mtcars
tmp$am <- as.logical(tmp$am)
mod <- lm(mpg ~ am + factor(cyl), tmp)The marginaleffects function automatically computes contrasts for each level of the categorical variables, relative to the baseline category (FALSE for logicals, and the reference level for factors), while holding all other values at their mode or mean:
mfx <- marginaleffects(mod)
summary(mfx)
#> Average marginal effects
#> Term Contrast Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 am TRUE - FALSE 2.560 1.298 1.973 0.04851 0.01675 5.103
#> 2 cyl 6 - 4 -6.156 1.536 -4.009 6.1077e-05 -9.16608 -3.146
#> 3 cyl 8 - 4 -10.068 1.452 -6.933 4.1147e-12 -12.91359 -7.222
#>
#> Model type: lm
#> Prediction type: responseThe summary printed above says that moving from the reference category 4 to the level 6 on the cyl factor variable is associated with a change of -6.156 in the adjusted prediction. Similarly, the contrast from FALSE to TRUE on the am variable is equal to 2.560.
We can obtain different contrasts by using the comparisons() function. For example:
comparisons(mod, contrast_factor = "sequential") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 am TRUE - FALSE 2.559954 1.297579 1.972869 4.851044e-02 0.01674586
#> 2 cyl 6 - 4 -6.156118 1.535723 -4.008612 6.107658e-05 -9.16607927
#> 3 cyl 8 - 6 -3.911442 1.470254 -2.660385 7.805144e-03 -6.79308707
#> conf.high
#> 1 5.103162
#> 2 -3.146156
#> 3 -1.029797
comparisons(mod, contrast_factor = "pairwise") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 am TRUE - FALSE 2.559954 1.297579 1.972869 4.851044e-02 0.01674586
#> 2 cyl 6 - 4 -6.156118 1.535723 -4.008612 6.107658e-05 -9.16607927
#> 3 cyl 8 - 4 -10.067560 1.452082 -6.933187 4.114709e-12 -12.91358877
#> 4 cyl 8 - 6 -3.911442 1.470254 -2.660385 7.805144e-03 -6.79308707
#> conf.high
#> 1 5.103162
#> 2 -3.146156
#> 3 -7.221530
#> 4 -1.029797
comparisons(mod, contrast_factor = "reference") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 am TRUE - FALSE 2.559954 1.297579 1.972869 4.851044e-02 0.01674586
#> 2 cyl 6 - 4 -6.156118 1.535723 -4.008612 6.107658e-05 -9.16607927
#> 3 cyl 8 - 4 -10.067560 1.452082 -6.933187 4.114709e-12 -12.91358877
#> conf.high
#> 1 5.103162
#> 2 -3.146156
#> 3 -7.221530For comparison, this code produces the same results using the emmeans package:
library(emmeans)
emm <- emmeans(mod, specs = "cyl")
contrast(emm, method = "revpairwise")
#> contrast estimate SE df t.ratio p.value
#> 6 - 4 -6.16 1.54 28 -4.009 0.0012
#> 8 - 4 -10.07 1.45 28 -6.933 <.0001
#> 8 - 6 -3.91 1.47 28 -2.660 0.0331
#>
#> Results are averaged over the levels of: am
#> P value adjustment: tukey method for comparing a family of 3 estimates
emm <- emmeans(mod, specs = "am")
contrast(emm, method = "revpairwise")
#> contrast estimate SE df t.ratio p.value
#> TRUE - FALSE 2.56 1.3 28 1.973 0.0585
#>
#> Results are averaged over the levels of: cylNote that these commands also work on for other types of models, such as GLMs, on different scales:
mod_logit <- glm(am ~ factor(gear), data = mtcars, family = binomial)
comparisons(mod_logit) %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 gear 4 - 3 0.6666667 1.360805e-01 4.899061 9.629569e-07 0.3999538
#> 2 gear 5 - 3 1.0000000 1.071403e-05 93335.529606 0.000000e+00 0.9999790
#> conf.high
#> 1 0.9333795
#> 2 1.0000210
comparisons(mod_logit, type = "link") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 gear 4 - 3 21.25922 4577.962 0.004643817 0.9962948 -8951.381 8993.90
#> 2 gear 5 - 3 41.13214 9155.924 0.004492407 0.9964156 -17904.148 17986.41We can also compute contrasts for differences in numeric variables. For example, we can see what happens to the adjusted predictions when we increment the hp variable by 1 unit (default) or by 5 units:
mod <- lm(mpg ~ hp, data = mtcars)
comparisons(mod) %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 hp +1 -0.06822828 0.0101193 -6.742389 1.558043e-11 -0.08806175
#> conf.high
#> 1 -0.04839481
comparisons(mod, contrast_numeric = 5) %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low
#> 1 hp +5 -0.3411414 0.05059652 -6.742389 1.558043e-11 -0.4403087
#> conf.high
#> 1 -0.241974Compare adjusted predictions for a change in the regressor between two arbitrary values:
comparisons(mod, contrast_numeric = c(90, 110)) %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp 110 - 90 -1.364566 0.2023861 -6.742389 1.558043e-11 -1.761235 -0.9678961Compare adjusted predictions when the regressor changes across the interquartile range, across one or two standard deviations about its mean, or from across its full range:
comparisons(mod, contrast_numeric = "iqr") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp IQR -5.697061 0.8449619 -6.742389 1.558043e-11 -7.353156 -4.040966
comparisons(mod, contrast_numeric = "sd") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp sd -4.677926 0.6938085 -6.742389 1.558043e-11 -6.037766 -3.318087
comparisons(mod, contrast_numeric = "2sd") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp 2sd -9.355853 1.387617 -6.742389 1.558043e-11 -12.07553 -6.636174
comparisons(mod, contrast_numeric = "minmax") %>% tidy()
#> term contrast estimate std.error statistic p.value conf.low conf.high
#> 1 hp Max - Min -19.3086 2.863763 -6.742389 1.558043e-11 -24.92147 -13.69573In models with multiplicative interactions, the contrasts of a categorical variable will depend on the values of the interacted variable:
We can now use the newdata argument of the marginaleffects function to compute contrasts for different values of the other regressors. As in the marginal effects vignette, the datagrid function can be handy. Since we only care about the logical am contrast, we use the variables to indicate the subset of results to report:
comparisons(mod_int, newdata = datagrid(cyl = tmp$cyl), variables = "am")
#> rowid term contrast comparison std.error am cyl
#> 1: 1 am TRUE - FALSE 1.441667 2.315925 FALSE 6
#> 2: 2 am TRUE - FALSE 5.175000 2.052848 FALSE 4
#> 3: 3 am TRUE - FALSE 0.350000 2.315925 FALSE 8Once again, we obtain the same results with emmeans:
emm <- emmeans(mod_int, specs = "am", by = "cyl")
contrast(emm, method = "revpairwise")
#> cyl = 4:
#> contrast estimate SE df t.ratio p.value
#> TRUE - FALSE 5.17 2.05 26 2.521 0.0182
#>
#> cyl = 6:
#> contrast estimate SE df t.ratio p.value
#> TRUE - FALSE 1.44 2.32 26 0.623 0.5390
#>
#> cyl = 8:
#> contrast estimate SE df t.ratio p.value
#> TRUE - FALSE 0.35 2.32 26 0.151 0.8810As described above, the marginaleffects package includes limited support to compute contrasts. Users who require more powerful features are encouraged to consider alternative packages such as emmeans, modelbased, or ggeffects. These packages offer useful features such as automatic back-transforms, p value correction for multiple comparisons, and more.