A “marginal effect” (MFX) is a measure of the association between a change in a regressor, and a change in the response variable. More formally, the excellent margins vignette defines the concept as follows:
Marginal effects are partial derivatives of the regression equation with respect to each variable in the model for each unit in the data.
Put differently, the marginal effect measures the association between a change in a regressor \(x\), and a change in the response \(y\). Put differently, differently, the marginal effect is the slope of the prediction function, measured at a specific value of the regressor \(x\).
Marginal effects are extremely useful, because they are intuitive and easy to interpret. They are often the main quantity of interest in an empirical analysis.
In scientific practice, the “Marginal Effect” falls in the same toolbox as the “Contrast.” Both try to answer a counterfactual question: What would happen to \(y\) if \(x\) were different? They allow us to model the “effect” of a change/difference in the regressor \(x\) on the response \(y\).1
To illustrate the concept, consider this quadratic function:
\[y = -x^2\]
From the definition above, we know that the marginal effect is the partial derivative of \(y\) with respect to \(x\):
\[\frac{\partial y}{\partial x} = -2x\]
To get intuition about how to interpret this quantity, consider the response of \(y\) to \(x\). It looks like this:
When \(x\) increases, \(y\) starts to increase. But then, as \(x\) increases further, \(y\) creeps back down in negative territory.
A marginal effect is the slope of this response function at a certain value of \(x\). The next plot adds three tangent lines, highlighting the slopes of the response function for three values of \(x\). The slopes of these tangents tell us three things:
Below, we show how to reach the same conclusions in an estimation context, with simulated data and the marginaleffects function.
marginaleffects functionThe marginal effect is a unit-level measure of association between changes in a regressor and changes in the response. Except in the simplest linear models, the value of the marginal effect will be different from individual to individual, because it will depend on the values of the other covariates for each individual.
The marginaleffects function thus produces distinct estimates of the marginal effect for each row of the data used to fit the model. The output of marginaleffects is a simple data.frame, which can be inspected with all the usual R commands.
To show this, we load the library, download the Palmer Penguins data from the Rdatasets archive, and estimate a GLM model:
library(marginaleffects)
dat <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv")
dat$large_penguin <- ifelse(dat$body_mass_g > median(dat$body_mass_g, na.rm = TRUE), 1, 0)
mod <- glm(large_penguin ~ bill_length_mm + flipper_length_mm + species,
           data = dat, family = binomial)mfx <- marginaleffects(mod)
head(mfx)
#>   rowid     type           term contrast       dydx   std.error large_penguin
#> 1     1 response bill_length_mm          0.01762275 0.007837288             0
#> 2     2 response bill_length_mm          0.03584665 0.011917159             0
#> 3     3 response bill_length_mm          0.08443344 0.021119186             0
#> 4     4 response bill_length_mm          0.03471401 0.006506804             0
#> 5     5 response bill_length_mm          0.05087500 0.013407802             0
#> 6     6 response bill_length_mm          0.01650827 0.007252823             0
#>   bill_length_mm flipper_length_mm species
#> 1           39.1               181  Adelie
#> 2           39.5               186  Adelie
#> 3           40.3               195  Adelie
#> 4           36.7               193  Adelie
#> 5           39.3               190  Adelie
#> 6           38.9               181  AdelieA dataset with one marginal effect estimate per unit of observation is a bit unwieldy and difficult to interpret. Many analysts like to report the “Average Marginal Effect”, that is, the average of all the observation-specific marginal effects. These are easy to compute based on the full data.frame shown above, but the summary function is convenient:
summary(mfx)
#> Average marginal effects 
#>                Term           Contrast   Effect Std. Error z value   Pr(>|z|)
#> 1    bill_length_mm                     0.02757   0.005782  4.7679 1.8613e-06
#> 2 flipper_length_mm                     0.01058   0.002357  4.4897 7.1312e-06
#> 3           species Chinstrap - Adelie -0.41479   0.056900 -7.2898 3.1042e-13
#> 4           species    Gentoo - Adelie  0.06170   0.106856  0.5774    0.56367
#>       2.5 %  97.5 %
#> 1  0.016237  0.0389
#> 2  0.005962  0.0152
#> 3 -0.526312 -0.3033
#> 4 -0.147735  0.2711
#> 
#> Model type:  glm 
#> Prediction type:  responseNote that since marginal effects are derivatives, they are only properly defined for continuous numeric variables. When the model also includes categorical regressors, the summary function will try to display relevant (regression-adjusted) contrasts between different categories, as shown above.
You can also extract average marginal effects using tidy and glance methods which conform to the broom package specification:
tidy(mfx)
#>       type              term           contrast    estimate   std.error
#> 1 response    bill_length_mm                     0.02756962 0.005782306
#> 2 response flipper_length_mm                     0.01058151 0.002356822
#> 3 response           species Chinstrap - Adelie -0.41479008 0.056899735
#> 4 response           species    Gentoo - Adelie  0.06169819 0.106855875
#>    statistic      p.value    conf.low   conf.high
#> 1  4.7679280 1.861303e-06  0.01623651  0.03890273
#> 2  4.4897349 7.131187e-06  0.00596222  0.01520079
#> 3 -7.2898420 3.104184e-13 -0.52631151 -0.30326865
#> 4  0.5773963 5.636718e-01 -0.14773548  0.27113186
glance(mfx)
#>   null.deviance df.null    logLik      AIC      BIC deviance df.residual nobs
#> 1      473.8202     341 -84.92257 179.8451 199.0192 169.8451         337  342
#>          F      rmse
#> 1 15.66122 0.7099236Sometimes, we are not interested in all the unit-specific marginal effects, but would rather look at the estimated marginal effects for certain “typical” individuals, or for user-specified values of the regressors. The datagrid function helps us build a data grid full of “typical” rows. For example, to generate artificial Adelies and Gentoos with 180mm flippers:
datagrid(flipper_length_mm = 180,
         species = c("Adelie", "Gentoo"),
         model = mod)
#>   bill_length_mm flipper_length_mm species
#> 1       43.92193               180  Adelie
#> 2       43.92193               180  GentooThe same command can be used (omitting the model argument) to marginaleffects’s newdata argument to compute marginal effects for those (fictional) individuals:
marginaleffects(mod,
                newdata = datagrid(flipper_length_mm = 180,
                                   species = c("Adelie", "Gentoo")))
#>   rowid     type              term           contrast        dydx   std.error
#> 1     1 response    bill_length_mm                     0.06067504 0.033257106
#> 2     2 response    bill_length_mm                     0.08466334 0.040417468
#> 3     1 response flipper_length_mm                     0.02328764 0.005498309
#> 4     2 response flipper_length_mm                     0.03249469 0.008616513
#> 5     1 response           species Chinstrap - Adelie -0.21105002 0.103178237
#> 6     2 response           species Chinstrap - Adelie -0.21105002 0.103178237
#> 7     1 response           species    Gentoo - Adelie  0.15912111 0.303473386
#> 8     2 response           species    Gentoo - Adelie  0.15912111 0.303473386
#>   bill_length_mm flipper_length_mm species
#> 1       43.92193               180  Adelie
#> 2       43.92193               180  Gentoo
#> 3       43.92193               180  Adelie
#> 4       43.92193               180  Gentoo
#> 5       43.92193               180  Adelie
#> 6       43.92193               180  Gentoo
#> 7       43.92193               180  Adelie
#> 8       43.92193               180  GentooWhen variables are omitted from the datagrid call, they will automatically be set at their median or mode (depending on variable type).
The “Marginal Effect at the Mean” is a marginal effect calculated for a hypothetical observation where each regressor is set at its mean or mode. By default, the datagrid function that we used in the previous section sets all regressors to their means or modes. To calculate the MEM, we can simply type:
marginaleffects(mod, newdata = datagrid())
#>   rowid     type              term           contrast        dydx   std.error
#> 1     1 response    bill_length_mm                     0.05065165 0.011975692
#> 2     1 response flipper_length_mm                     0.01944084 0.005597224
#> 3     1 response           species Chinstrap - Adelie -0.80569873 0.073539686
#> 4     1 response           species    Gentoo - Adelie  0.08359420 0.114498716
#>   bill_length_mm flipper_length_mm species
#> 1       43.92193          200.9152  Adelie
#> 2       43.92193          200.9152  Adelie
#> 3       43.92193          200.9152  Adelie
#> 4       43.92193          200.9152  AdelieThe datagrid function allowed us look at completely fictional individuals. Setting the grid.type argument of this function to "counterfactual" lets us compute the marginal effects for the actual observations in our dataset, but with a few manipulated values. For example, this code will create a data.frame twice as long as the original dat, where each observation is repeated with different values of the flipper_length_mm variable:
We see that the rows 1, 2, and 3 of the original dataset have been replicated twice, with different values of the flipper_length_mm variable:
nd[nd$rowid %in% 1:3,]
#>     rowid_original bill_length_mm species flipper_length_mm
#> 1                1           39.1  Adelie               160
#> 2                2           39.5  Adelie               160
#> 3                3           40.3  Adelie               160
#> 343              1           39.1  Adelie               180
#> 344              2           39.5  Adelie               180
#> 345              3           40.3  Adelie               180Again, we can use this to compute average (or median, or anything else) marginal effects over the counterfactual individuals:
The plot_cme function can be used to draw “Conditional Marginal Effects.” This is useful when a model includes interaction terms and we want to plot how the marginal effect of a variable changes as the value of a “condition” (or “moderator”) variable changes:
The marginal effects in the plot above were computed with values of all regressors – except the effect and the condition – held at their means or modes, depending on variable type.
In the “Definition” section of this vignette, we considered how marginal effects can be computed analytically in a simple quadratic equation context. We can now use the marginaleffects function to replicate our analysis of the quadratic function in a regression application.
Say you estimate a linear regression model with a quadratic term:
\[Y = \beta_0 + \beta_1 X^2 + \varepsilon\]
and obtain estimates of \(\beta_0=1\) and \(\beta_1=2\). Taking the partial derivative with respect to \(X\) and plugging in our estimates gives us the marginal effect of \(X\) on \(Y\):
\[\partial Y / \partial X = \beta_0 + 2 \cdot \beta_1 X\] \[\partial Y / \partial X = 1 + 4X\]
This result suggests that the effect of a change in \(X\) on \(Y\) depends on the level of \(X\). When \(X\) is large and positive, an increase in \(X\) is associated to a large increase in \(Y\). When \(X\) is small and positive, an increase in \(X\) is associated to a small increase in \(Y\). When \(X\) is a large negative value, an increase in \(X\) is associated with a decrease in \(Y\).
marginaleffects arrives at the same conclusion in simulated data:
library(tidyverse)
N <- 1e5
quad <- data.frame(x = rnorm(N))
quad$y <- 1 + 1 * quad$x + 2 * quad$x^2 + rnorm(N)
mod <- lm(y ~ x + I(x^2), quad)
marginaleffects(mod, newdata = datagrid(x = -2:2))  %>%
    mutate(truth = 1 + 4 * x) %>%
    select(dydx, truth)
#>         dydx truth
#> 1 -6.9978857    -7
#> 2 -3.0010537    -3
#> 3  0.9957784     1
#> 4  4.9926104     5
#> 5  8.9894425     9We can also plot the result with the plot_cme function (see section below):
Again, the conclusion is the same. When \(x<0\), an increase in \(x\) is associated with an increase in \(y\). When \(x=0\), the marginal effect is equal to 0. When \(x>0\), an increase in \(x\) is associated with a decrease in \(y\).
The marginaleffect function takes the derivative of the fitted (or predicted) values of the model, as is typically generated by the predict(model) function. By default, predict produces predictions on the "response" scale, so the marginal effects should be interpreted on that scale. However, users can pass a string or a vector of strings to the type argument, and marginaleffects will consider different outcomes.
Typical values include "response" and "link", but users should refer to the documentation of the predict of the package they used to fit the model to know what values are allowable. documentation.
mod <- glm(am ~ mpg, family = binomial, data = mtcars)
mfx <- marginaleffects(mod, type = c("response", "link"))
summary(mfx)
#> Average marginal effects 
#>       type Term  Effect Std. Error z value   Pr(>|z|)   2.5 %  97.5 %
#> 1     link  mpg 0.30703   0.114842   2.673  0.0075066 0.08194 0.53211
#> 2 response  mpg 0.04649   0.008856   5.249 1.5285e-07 0.02913 0.06384
#> 
#> Model type:  glm 
#> Prediction type:  response linkAverage marginal effects are easy to display in a regression table using packages like modelsummary.
library(modelsummary)
library(marginaleffects)
# fit models and store them in a named list
mod <- list(
    "Short" = glm(large_penguin ~ flipper_length_mm, data = dat, family = binomial),
    "Long" = glm(large_penguin ~ flipper_length_mm + bill_length_mm, data = dat, family = binomial))
# apply the `marginaleffects` function to all the models using `lapply`
mfx <- lapply(mod, marginaleffects)
modelsummary(mfx)| Short | Long | |
|---|---|---|
| flipper_length_mm | 0.020 | 0.022 | 
| (0.001) | (0.001) | |
| bill_length_mm | -0.005 | |
| (0.004) | ||
| Num.Obs. | 342 | 342 | 
| AIC | 222.2 | 222.1 | 
| BIC | 229.9 | 233.6 | 
| Log.Lik. | -109.111 | -108.057 | 
| F | 94.415 | 45.936 | 
| RMSE | 0.80 | 0.80 | 
The same results can be presented in a coefficient plot:
When the models include contrasts, we can use modelsummary’s group argument to display them cleanly:
mod <- list(
    "Logit" = glm(large_penguin ~ flipper_length_mm + species, data = dat, family = binomial),
    "OLS" = lm(body_mass_g ~ flipper_length_mm + bill_length_mm + species, data = dat))
mfx <- lapply(mod, marginaleffects)
modelsummary(mfx, group = term + contrast ~ model)| Logit | OLS | ||
|---|---|---|---|
| flipper_length_mm | 0.016 | 27.429 | |
| (0.002) | (3.174) | ||
| species | Chinstrap - Adelie | -0.165 | -748.562 | 
| (0.043) | (81.534) | ||
| Gentoo - Adelie | 0.244 | 90.435 | |
| (0.158) | (88.647) | ||
| bill_length_mm | 61.736 | ||
| (7.126) | |||
| Num.Obs. | 342 | 342 | |
| R2 | 0.822 | ||
| R2 Adj. | 0.820 | ||
| AIC | 199.0 | 4964.7 | |
| BIC | 214.4 | 4987.8 | |
| Log.Lik. | -95.506 | -2476.373 | |
| F | 19.252 | 389.713 | |
| RMSE | 0.75 | 340.11 | 
In most cases, extending marginaleffects to support new models is easy. Imagine you want to add support for an object called model of class EXAMPLE with N observations.
marginaleffects default functions work:# returns a named vector of coefficients
get_coef(model)
# returns a named vector of predictions 
# returns a named matrix of size NxK for models with K levels (e.g., multinomial logit)
get_predict(model)
# returns a named square matrix of size equal to the number of coefficients
get_vcov(model)
# returns a new model object with different stored coefficients 
# calling get_predict(model) and get_predict(model_new) should produce different results
model_new <- set_coef(model, rep(0, length(get_coef(model))))
predict(model) != predict(model_new)If all of these functions work out-of-the-box, there’s a good chance your model will be supported automatically. If they do not work, move to…
Find the class name of your model by calling:
Then, create functions (methods) called get_coef.EXAMPLE, get_predict.EXAMPLE, vcov.EXAMPLE, and set_coef.EXAMPLE, with the “EXAMPLE” replaced by the name your model class.
Create a file called tests/testthat/test-PKGNAME.R and write a few tests. Ideally, we would like to compare the results obtained by marginaleffects to an external source, like the margins package for R, or the margins command for Stata.
Add your new model class to the lists of supported models in:
sanity_model function of the R/sanity.R file.data-raw/supported_models.csvDESCRIPTION file.The term “effect” is itself tricky. To be clear, this vignette does not use the word “effect” to imply “causality”.↩