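In the context of this package, “marginal means” refer to the values obtained by this three-step process:

1. Construct a grid of predictor values that contains every combination of the categorical predictors, with numeric predictors held at their means.
2. Compute adjusted predictions for each cell of that grid.
3. Average those adjusted predictions across one dimension of the grid.
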
For example, consider a model with a numeric, a factor, and a logical predictor:
library(marginaleffects)
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
dat$am <- as.logical(dat$am)
mod <- lm(mpg ~ hp + cyl + am, data = dat)

Using the `predictions()` function, we set the `hp` variable at its mean and compute predictions for all combinations of `am` and `cyl`:
predictions(mod, variables = c("am", "cyl"))
#> rowid type predicted std.error conf.low conf.high mpg hp am
#> 1 1 response 21.03914 1.213043 18.55019 23.52810 20.09062 146.6875 TRUE
#> 2 2 response 24.96372 1.176830 22.54907 27.37838 20.09062 146.6875 TRUE
#> 3 3 response 21.43031 1.826126 17.68341 25.17721 20.09062 146.6875 TRUE
#> 4 4 response 16.88129 1.272938 14.26944 19.49314 20.09062 146.6875 FALSE
#> 5 5 response 20.80587 1.756564 17.20169 24.41004 20.09062 146.6875 FALSE
#> 6 6 response 17.27245 1.116885 14.98079 19.56411 20.09062 146.6875 FALSE
#> cyl
#> 1 6
#> 2 4
#> 3 8
#> 4 6
#> 5 4
#> 6 8

For illustration purposes, it is useful to reshape the above results:

| cyl | am = TRUE | am = FALSE | Marginal mean of cyl |
|---|---|---|---|
| 6 | 21.0 | 16.9 | 19.0 |
| 4 | 25.0 | 20.8 | 22.9 |
| 8 | 21.4 | 17.3 | 19.4 |
| Marginal mean of am | 22.5 | 18.3 | |
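The row and column averages in this table can be reproduced by hand, which makes the grid, predict, and average steps concrete. The sketch below is only an illustration for this linear model: it uses base R's `expand.grid()`, `predict()`, and `tapply()`, and it is not how `marginalmeans()` is implemented internally.

```r
# Hand-rolled marginal means for the mtcars model
# (illustration only, not the marginaleffects implementation)
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
dat$am <- as.logical(dat$am)
mod <- lm(mpg ~ hp + cyl + am, data = dat)

# Grid: every combination of cyl and am, with hp held at its mean
grid <- expand.grid(cyl = levels(dat$cyl), am = c(TRUE, FALSE))
grid$hp <- mean(dat$hp)

# Adjusted prediction for each cell of the grid
grid$predicted <- predict(mod, newdata = grid)

# Average the cell predictions over the other categorical predictor
mm_cyl <- tapply(grid$predicted, grid$cyl, mean)
mm_am <- tapply(grid$predicted, grid$am, mean)

print(round(mm_cyl, 2))
print(round(mm_am, 2))
```

Each element of `mm_cyl` is a row average of the table and each element of `mm_am` is a column average.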

The marginal means of `am` and `cyl` are obtained by averaging the adjusted predictions across the cells of the grid. The `marginalmeans()` function computes the same results directly:
marginalmeans(mod)
#> term value marginalmean std.error conf.low conf.high
#> 1 am FALSE 18.31987 0.7853925 16.78053 19.85921
#> 2 am TRUE 22.47772 0.8343346 20.84246 24.11299
#> 3 cyl 4 22.88479 1.3566479 20.22581 25.54378
#> 4 cyl 6 18.96022 1.0729360 16.85730 21.06313
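#> 5 cyl 8 19.35138 1.3770817 16.65235 22.05041

The same marginal means and standard errors can be obtained using the very powerful emmeans package. Its confidence intervals are slightly wider because they are based on a t distribution with 27 residual degrees of freedom, as reported in the `df` column: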
library(emmeans)
emmeans(mod, specs = "cyl")
#> cyl emmean SE df lower.CL upper.CL
#> 4 22.9 1.36 27 20.1 25.7
#> 6 19.0 1.07 27 16.8 21.2
#> 8 19.4 1.38 27 16.5 22.2
#>
#> Results are averaged over the levels of: am
#> Confidence level used: 0.95
emmeans(mod, specs = "am")
#> am emmean SE df lower.CL upper.CL
#> FALSE 18.3 0.785 27 16.7 19.9
#> TRUE 22.5 0.834 27 20.8 24.2
#>
#> Results are averaged over the levels of: cyl
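#> Confidence level used: 0.95

By default, the `marginalmeans()` function calculates marginal means for each categorical predictor one after the other, as in the example above. We can also compute marginal means for combinations of categories by setting `interaction = TRUE`. To illustrate, we fit a mixed-effects logistic regression of passenger survival on the Titanic data using the glmmTMB package: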
library(glmmTMB)
dat <- "https://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/Titanic.csv"
dat <- read.csv(dat)
titanic <- glmmTMB(
    Survived ~ Sex * PClass + Age + (1 | PClass),
    family = binomial,
    data = dat)

Regardless of the scale of the predictions (the `type` argument), `marginalmeans()` always computes standard errors using the delta method:
marginalmeans(titanic,
              type = "response",
              variables = c("Sex", "PClass"))
#> Sex PClass marginalmean std.error
#> 1 female 1st 0.9701724 0.01392585
#> 2 female 2nd 0.8803769 0.03608862
#> 3 female 3rd 0.3644761 0.05020662
#> 4 male 1st 0.4450399 0.05150155
#> 5 male 2nd 0.1422606 0.03045734
#> 6 male 3rd 0.1189557 0.02124176

When the model is linear or the predictions are on the link scale, `marginalmeans()` also reports confidence intervals:
marginalmeans(
    titanic,
    type = "link",
    variables = c("Sex", "PClass"))
#> Sex PClass marginalmean std.error conf.low conf.high
#> 1 female 1st 3.4820408 0.4811625 2.5389797 4.4251019
#> 2 female 2nd 1.9960037 0.3426777 1.3243678 2.6676396
#> 3 female 3rd -0.5559886 0.2167161 -0.9807443 -0.1312329
#> 4 male 1st -0.2207321 0.2085377 -0.6294584 0.1879942
#> 5 male 2nd -1.7966396 0.2495423 -2.2857336 -1.3075456
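#> 6 male 3rd -2.0023565 0.2025903 -2.3994262 -1.6052869

It is easy to transform these link-scale marginal means with an arbitrary function using the `transform_post` argument. Here, we apply the model's inverse link function, obtained with `insight::link_inverse()`, to return estimates and confidence intervals on the probability scale: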
marginalmeans(titanic,
              type = "link",
              transform_post = insight::link_inverse(titanic),
              variables = c("Sex", "PClass"))
#> Sex PClass marginalmean conf.low conf.high
#> 1 female 1st 0.9701724 0.92682967 0.9881687
#> 2 female 2nd 0.8803769 0.78990748 0.9350899
#> 3 female 3rd 0.3644761 0.27274413 0.4672388
#> 4 male 1st 0.4450399 0.34763335 0.5468606
#> 5 male 2nd 0.1422606 0.09231141 0.2128979
#> 6 male 3rd 0.1189557 0.08321647 0.1672440

When a model does not include interactions, `marginalmeans()` defaults to reporting marginal means for each categorical predictor separately, rather than for combinations of categories:
titanic2 <- glmmTMB(
    Survived ~ Sex + PClass + Age + (1 | PClass),
    family = binomial,
    data = dat)
marginalmeans(
    titanic2,
    variables = c("Sex", "PClass"))
#> term value marginalmean std.error
#> 1 PClass 1st 0.7065907 0.02889053
#> 2 PClass 2nd 0.4935160 0.02871382
#> 3 PClass 3rd 0.2910045 0.02680176
#> 4 Sex female 0.7408546 0.02402629
#> 5 Sex male 0.2532196 0.02031785

We can force `marginalmeans()` to report marginal means for combinations of categories by setting `interaction = TRUE`:
marginalmeans(
    titanic2,
    interaction = TRUE,
    variables = c("Sex", "PClass"))
#> Sex PClass marginalmean std.error
#> 1 female 1st 0.92882414 0.01610223
#> 2 female 2nd 0.78190513 0.03564791
#> 3 female 3rd 0.51183442 0.04583722
#> 4 male 1st 0.48435732 0.04680392
#> 5 male 2nd 0.20512692 0.03080665
#> 6 male 3rd 0.07017461 0.01354387

The `summary()`, `tidy()`, and `glance()` functions can also be used to summarize and manipulate the results:
mm <- marginalmeans(mod)
tidy(mm)
#> term value estimate std.error statistic p.value conf.low conf.high
#> 1 am FALSE 18.31987 0.7853925 23.32575 0 16.78053 19.85921
#> 2 am TRUE 22.47772 0.8343346 26.94090 0 20.84246 24.11299
#> 3 cyl 4 22.88479 1.3566479 16.86863 0 20.22581 25.54378
#> 4 cyl 6 18.96022 1.0729360 17.67134 0 16.85730 21.06313
#> 5 cyl 8 19.35138 1.3770817 14.05246 0 16.65235 22.05041
glance(mm)
#> aic bic r.squared adj.r.squared rmse nobs
#> 1 161.0033 169.7978 0.824875 0.7989306 2.482432 32
summary(mm)
#> Estimated marginal means
#> Term Value Mean Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 am FALSE 18.32 0.7854 23.33 < 2.22e-16 16.78 19.86
#> 2 am TRUE 22.48 0.8343 26.94 < 2.22e-16 20.84 24.11
#> 3 cyl 4 22.88 1.3566 16.87 < 2.22e-16 20.23 25.54
#> 4 cyl 6 18.96 1.0729 17.67 < 2.22e-16 16.86 21.06
#> 5 cyl 8 19.35 1.3771 14.05 < 2.22e-16 16.65 22.05
#>
#> Model type: lm
#> Prediction type: response

Thanks to these tidiers, we can also present the results in the style of a regression table using the modelsummary package. For examples, see the tables and plots vignette.
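The `statistic`, `p.value`, `conf.low`, and `conf.high` columns reported by `tidy()` appear to follow from each estimate and its standard error under a normal approximation (the `summary()` output labels the statistic a `z value`). As a rough check, the sketch below recomputes them for the `am = FALSE` row with base R's `pnorm()` and `qnorm()`. The two input numbers are copied from the output above, and the normal approximation is an assumption inferred from that output rather than taken from the package documentation.

```r
# Estimate and standard error copied from the tidy(mm) output (am = FALSE)
estimate <- 18.31987
std_error <- 0.7853925

# Wald z statistic for the null hypothesis that the marginal mean is zero
z <- estimate / std_error

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

# 95% confidence interval using the normal critical value (about 1.96)
crit <- qnorm(0.975)
conf_int <- estimate + c(-1, 1) * crit * std_error

print(round(z, 2))
print(p_value)
print(round(conf_int, 2))
```

Applying the same arithmetic to the other rows should reproduce the rest of the `tidy()` table.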
This example requires version 0.2.0 of the marginaleffects package.
To begin, we generate data and estimate a large model:
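library(nnet)
library(marginaleffects)

set.seed(1839)
n <- 1200

# Categorical predictor x with three levels
x <- factor(sample(letters[1:3], n, TRUE))

# Outcome y with three categories whose distribution depends on x
# (sample() rescales the probability weights, so they need not sum to 1)
y <- vector(length = n)
y[x == "a"] <- sample(letters[4:6], sum(x == "a"), TRUE)
y[x == "b"] <- sample(letters[4:6], sum(x == "b"), TRUE, c(1 / 4, 2 / 4, 1 / 4))
y[x == "c"] <- sample(letters[4:6], sum(x == "c"), TRUE, c(1 / 5, 3 / 5, 2 / 5))
dat <- data.frame(x = x, y = factor(y))

# Add 20 noise predictors, each taking three categorical values
tmp <- as.data.frame(replicate(20, factor(sample(letters[7:9], n, TRUE))))
dat <- cbind(dat, tmp)

# Fit a multinomial logit model; trace = FALSE silences the optimizer output
mod <- multinom(y ~ ., data = dat, trace = FALSE)

If we try to compute marginal means over all the predictors, we find that the prediction grid will not fit in memory: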
marginalmeans(mod, type = "probs")
#> Error: You are trying to create a prediction grid with more than 1 billion rows, which is likely to exceed the memory and computational power available on your local machine. Presumably this is because you are considering many variables with many levels. All of the functions in the `marginaleffects` package include arguments to specify a restricted list of variables over which to create a prediction grid.

Use the `variables` and `variables_grid` arguments to compute marginal means over a more reasonably sized grid: