| Type: | Package |
| Title: | Post-Estimation Functions for Generalized Linear Mixed Models |
| Version: | 1.2.2 |
| Description: | Several functions for working with mixed effects regression models for limited dependent variables. The functions facilitate post-estimation of model predictions or margins, and comparisons between model predictions for assessing or probing moderation. Additional helper functions facilitate model comparisons and implements simulation-based inference for model predictions of alternative-specific outcome models. See also, Melamed and Doan (2024, ISBN: 978-1032509518). |
| Depends: | R (≥ 3.5.0) |
| License: | GPL-2 | GPL-3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| Imports: | methods |
| Suggests: | tidyverse, ggplot2, MASS, emmeans, pscl, nnet, marginaleffects, ggpubr, knitr, rmarkdown, Epi, dplyr, ordinal, nlme, lme4 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2024-09-03 15:54:50 UTC; melamed.9 |
| Author: | David Melamed |
| Maintainer: | David Melamed <dmmelamed@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2024-09-04 17:30:02 UTC |
Data to replicate Long and Freese's (2006) count models (pp354-414)
Description
For replication purposes between Stata and R. Long and Freese (2006) analyze these data to illustrate regression models for count dependent variables.
Usage
data("LF06art")
Format
A data frame with 915 observations on the following 6 variables.
artcount response
femdummy for sex
mardummy for married
kid5number of children under five
phda numeric vector
menta numeric vector
Source
Long, Scott J. and Jeremy Freese. 2006. "Regression Models for Categorical Dependent Variables Using Stata." Austin, TX: Stata Press
Examples
data(LF06art)
head(LF06art)
Travel time example data for alternative-specific outcomes.
Description
Example data, also used in Long and Freese (2006), to illustrate conditional or fixed effects logistic regression. Also refered to as alternative-specific outcomes.
Usage
data("LF06travel")
Format
A data frame with 456 observations on the following 13 variables.
ida numeric vector denoting nested units (individuals) or strata
modea numeric vector denoting mode of transit
traina dummy variable for selecting the train
busa dummy variable for selecting the bus
cara dummy variable for selecting a car
timea numeric vector denoting transit time
invca numeric vector denoting invertng cost
choicea numeric vector denoting the choice of travel, i.e. the dependent variable
ttmea numeric vector
invta numeric vector
gca numeric vector
hinca numeric vector
psizea numeric vector
Source
Long, Scott J. and Jeremy Freese. 2006. "Regression Models for Categorical Dependent Variables Using Stata." Austin, TX: Stata Press
Examples
data(LF06travel)
head(LF06travel)
Add-Health Data analzed in Mize (2019)
Description
Mize (2019) illustrates how to establish moderation in the context of regression models for limited dependent variables. He illustrates using AddHealth data and provides Stata code to replicate the results. Catregs functions can replicate these results in R.
Usage
data("Mize19AH")
Format
A data frame with 4307 observations on the following 29 variables.
AIDa numeric vector
racea numeric vector
agea numeric vector
educa numeric vector
degreea numeric vector
collegea numeric vector
healtha numeric vector
rolea numeric vector
workrolea numeric vector
parrolea numeric vector
incomea numeric vector
wagesa numeric vector
logwagesa numeric vector
depBa numeric vector
alcBa numeric vector
womana numeric vector
edyrsa numeric vector
whiteBa numeric vector
X_est_prnoa numeric vector
X_est_prpara numeric vector
X_est_alcedmoda numeric vector
X_est_alcmoda numeric vector
race2a numeric vector
race3a numeric vector
race4a numeric vector
ed1a numeric vector
ed2a numeric vector
ed3a numeric vector
ed4a numeric vector
Source
Mize, Trenton D. 2019. "Best Practices for Estimating, Interpreting, and Presenting Nonlinear Interaction Effects" Sociological Science 6: 81-117.
Examples
data(Mize19AH)
head(Mize19AH)
General Social Survey Data analzed in Mize (2019)
Description
Mize (2019) illustrates how to establish nonlinear moderation in the context of regression models. He illustrates using General Social Survey (GSS) data and provides Stata code to replicate the results. Catregs functions can replicate these results in R.
Usage
data("Mize19GSS")
Format
A data frame with 19337 observations on the following 42 variables.
nosameBa numeric vector
sameokBa numeric vector
polviewsa character vector
agea numeric vector
age10a numeric vector
yeara numeric vector
ida numeric vector
degreea numeric vector
racea numeric vector
partyida character vector
natspaca character vector
natenvira character vector
natheala character vector
natcitya character vector
natcrimea character vector
natdruga character vector
nateduca character vector
natracea character vector
natarmsa character vector
nataida character vector
natfarea character vector
healtha character vector
helpnotBa character vector
conserva character vector
polviews3a character vector
employeda numeric vector
malea numeric vector
womana numeric vector
whitea numeric vector
collegea numeric vector
marrieda numeric vector
parenta character vector
edyrsa numeric vector
incomea numeric vector
hrsworka character vector
parttimea character vector
wagesa numeric vector
conviewSSa numeric vector
year2a numeric vector
yearcata numeric vector
year1976a numeric vector
year1976.2a numeric vector
Source
Mize, Trenton D. 2019. "Best Practices for Estimating, Interpreting, and Presenting Nonlinear Interaction Effects" Sociological Science 6: 81-117.
Examples
data(Mize19GSS)
head(Mize19GSS)
Compares two marginal effects (MEMs or AMEs). Estimate of uncertainty is from a simulated draw from a normal distribution.
Description
Given two marginal effects (AMEs or MEMs), as estimated via the margins package or via first.diff.fitted, this function simulates draws from the distribution of MEs defined by the estimates and their standard error, and computes the overlap in the two distributions. The p-value refers to proportion of times the two draws overlapped.
Usage
compare.margins(margins,margins.ses,seed=1234,rounded=3,nsim=10000)
Arguments
margins |
The two marginal effects that you want to compare. |
margins.ses |
The standard errors for the marginal effects you want to compare. |
seed |
Random number seed so that results are reproducible. |
rounded |
The number of decimal places to round the output. The default is 3. |
nsim |
The number of simulated AMEs to draw from each distribution. The default is 10,000. |
Value
differnce |
The observed difference in the two AMEs. |
p.value |
The p-value associated with the difference. This is the proportion of the simulated sample when the MEs overlapped. |
Author(s)
David Melamed
Examples
data("essUK")
m1 <- glm(safe ~ religious + minority*female + age,data=essUK,family="binomial")
des<-margins.des(m1,expand.grid(minority=c(0,1),female=c(0,1)))
des
ma1<-suppressWarnings(as.data.frame(marginaleffects::avg_slopes(m1,
variables="female",newdata=data.frame(minority=0,religious=3.6024,age=53.146))))
ma2<-as.data.frame(marginaleffects::avg_slopes(m1,variables="female",
newdata=data.frame(minority=1,religious=3.6024,age=53.146)))
cames <- rbind(ma2,ma1)
compare.margins(margins=cames$estimate,margins.ses=cames$std.error)
Fits four different count models and compares them.
Description
Given a Poisson model object, count.fit fits Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial models to the data. It reports results of Vuong tests between the zero-inflated and non-zer-inflated models, summarizes the information criteria of the four models, summarizes the model output of the four models, creates a ggplot object of coefficient plots for each model, and creates a ggplot object of model residuals.
Usage
count.fit(m1,y.range,rounded=3,use.color="yes")
Arguments
m1 |
A Poisson regression model, as estimated via the glm function. |
y.range |
The observed response range for the count outcome. For example, if the observed range is 0 to 18, this would be 0:18 |
rounded |
The number of decimal places to round the output. The default is 3. |
use.color |
Whether to use color in the ggplot objects. Default is "yes" |
Value
ic |
A data.frame summarizing the information criteria for the four models. Bayesian and Akaike's informaiton criteria are included. |
models |
A summary of the model estimates, including coefficients and standard errors. |
pic |
A ggplot object illustrating model residuals for each type of model. |
models.pic |
A ggplot object of coefficient plots from each type of model. |
Author(s)
David Melamed
Examples
data("LF06art")
p1 <- glm(art ~ fem + mar + kid5 + phd + ment , family = "poisson", data = LF06art)
table(LF06art$art)
fit<-count.fit(p1,0:19)
names(fit)
Computes diagnostics for generalized linear models.
Description
Given a glm object, diagn returns case-level diagnostics. For logistic, probit, Poisson, and negative binomial models, it returns Pearson residuals, standardized Pearson residuals, the diagonal of the hat matrix, delta-beta (Cook's D), and deviance residuals. For zero-inflated and hurdle models, it returns the Pearson residual and the observation number.
Usage
diagn(model)
Arguments
model |
A model object. The model should be regression model for limited dependent variables, such as a logistic regression. |
Value
out |
The output is a dataframe of diagnostic statistics. For logit, probit, Poisson, and negative binomial models, the output includes the Pearson residual (pearsonres), the diagonal of the Hat matrix (h), the standardized Pearson residual (stdpres), the delta-beta statistic (deltabeta), the observation number (obs), and the deviance residual (devres). For zero-inflated and hurdle models, the output includes the Pearson residual (pearsonres), and the observation number (obs). |
Author(s)
David Melamed
Examples
data("Mize19AH")
m1 <- glm(alcB ~woman*parrole + age + race2 + race3 +
race4 + income + ed1 + ed2 + ed3 + ed4,
family="binomial",data=Mize19AH)
head(diagn(m1))
A subset of data from the European Social Survey
Description
These are data from the European Social Survey used to illustrate mixed effects, multilevel, or hierarchical regression models.
Usage
data("ess")
Format
A data frame with 49519 observations on the following 37 variables.
countrya character vector
can.trust.peoplea numeric vector
people.try.faira numeric vector
say.in.govta factor with levels
Not at allVery littleSomeA lotA great dealtrust.legal.sysa numeric vector
trust.policea numeric vector
votea factor with levels
YesNoNot eligible to voteconservativea numeric vector
life.satisfactiona numeric vector
immigration.good.economya numeric vector
happya numeric vector
important.matters.peoplea character vector
walk.alone.darka factor with levels
Very unsafeUnsafeSafeVery safereligiousa numeric vector
ethnic.minoritya character vector
num.childrena numeric vector
ideal.age.parenta numeric vector
household.sizea numeric vector
gendera character vector
agea numeric vector
maritala character vector
educationa numeric vector
employmenta character vector
income.decilea numeric vector
father.educationa character vector
father.education.numa numeric vector
everyone.job.wanteda numeric vector
income.fairnessa factor with levels
Low, extremely unfairLow, very unfairLow, somewhat unfairLow, slightly unfairFairHigh, slightly unfairHigh, somewhat unfairHigh, very unfairHigh, extremely unfairunder.over.paida factor with levels
UnderpaidRight amountOverpaidincome.fairness.numa numeric vector
wealth.diff.faira factor with levels
Small, extremely unfairSmall, very unfairSmall, somewhat unfairSmall, slightly unfairFairLarge, slightly unfairLarge, somewhat unfairLarge, very unfairLarge, extremely unfairwealth.differencesa factor with levels
Too littleJust rightToo muchgdpa numeric vector
urban.populationa numeric vector
unemploymenta numeric vector
alcolhol.consumptiona numeric vector
suicide.ratea numeric vector
Source
European Social Survey.
Examples
data(ess)
head(ess)
A subset of data from the European Social Survey
Description
These are data from respondents in the United Kingdom from the European Social Survey. They are used to illustrate regression models for limited dependent variables.
Usage
data("essUK")
Format
A data frame with 2204 observations on the following 45 variables.
countrya character vector
can.trust.peoplea numeric vector
people.try.faira numeric vector
say.in.govta factor with levels
Not at allVery littleSomeA lotA great dealtrust.legal.sysa numeric vector
trust.policea numeric vector
votea factor with levels
YesNoNot eligible to voteconservativea numeric vector
life.satisfactiona numeric vector
immigration.good.economya numeric vector
happya numeric vector
important.matters.peoplea character vector
walk.alone.darka factor with levels
Very unsafeUnsafeSafeVery safereligiousa numeric vector
ethnic.minoritya character vector
num.childrena numeric vector
ideal.age.parenta numeric vector
household.sizea numeric vector
gendera character vector
agea numeric vector
maritala character vector
educationa numeric vector
employmenta character vector
income.decilea numeric vector
father.educationa character vector
father.education.numa numeric vector
everyone.job.wanteda numeric vector
income.fairnessa factor with levels
Low, extremely unfairLow, very unfairLow, somewhat unfairLow, slightly unfairFairHigh, slightly unfairHigh, somewhat unfairHigh, very unfairHigh, extremely unfairunder.over.paida factor with levels
UnderpaidRight amountOverpaidincome.fairness.numa numeric vector
wealth.diff.faira factor with levels
Small, extremely unfairSmall, very unfairSmall, somewhat unfairSmall, slightly unfairFairLarge, slightly unfairLarge, somewhat unfairLarge, very unfairLarge, extremely unfairwealth.differencesa factor with levels
Too littleJust rightToo muchgdpa numeric vector
urban.populationa numeric vector
unemploymenta numeric vector
alcolhol.consumptiona numeric vector
suicide.ratea numeric vector
safea numeric vector
minoritya numeric vector
femalea numeric vector
divorceda numeric vector
marrieda numeric vector
widowa numeric vector
highinca numeric vector
age2a numeric vector
Source
European Social Survey.
Examples
data(essUK)
head(essUK)
Computes the first difference in fitted values, or a series of first differences. Inference in supported via the delta method or bootstrapping.
Description
first.diff.fitted computes first differences between fitted values from a regression model.
Supported models include OLS regression via lm, logistic regression via glm, Poisson regression via glm, negative binomial regression via MASS:glm.nb, ordinal logistic regression via MASS::polr, partial proportional odds models via vgam::vglm, multinomial logistic regression via nnet::multinom, zero-inflated Poisson or negative binomial regression via pscl::zeroinfl, hurdle Poisson or negative binomial regression via pscl::hurdle, mixed effects logistic regression via lme4/lmerTest::glmer, mixed effects Poisson regression via lme4/lmerTest::glmer, mixed effects negative binomial regression via lme4/lmerTest::glmer.nb, and mixed effects ordinal logistic regression via ordinal::clmm.
Usage
first.diff.fitted(mod,design.matrix,compare,alpha=.05,rounded=3,
bootstrap="no",num.sample=1000,prop.sample=.9,data,seed=1234,cum.probs="no")
Arguments
mod |
A model object. The model should be regression model for limited dependent variables, such as a logistic regression. |
design.matrix |
Design matrix of values for the independent variables in the regression model. |
compare |
Pairs of rows in the design matrix to use for computing the fitted values. The first difference between the fitted values is then computed. For example, compare=c(4,2) means to compute the difference in the fitted values between predictions for row 4 of the design matrix and row 2 of the design matrix. If more than two rows are provided, the function uses them two at a time and computes multiple first differences. |
alpha |
The alpha value for confidence intervals. Default is .05. |
rounded |
The number of decimal places to round the output. The default is 3. |
bootstrap |
By default, inference is based on the Delta Method, as implemented in the marginaleffects package. Alternatively, inference can be based upon a bootstrapped sampling distirbution. To do so, change this to "yes." Note that bootstrapping is only supported for one first difference at a time. |
num.sample |
num.sample is the number samples drawn to compute the sampling distibution when using bootstrapping. Default is 1,000 |
prop.sample |
prop.sample is the proportion of the original sample (with replacement) to include in the sampling distibution samples when using bootstrapping. Default is .9 |
data |
For nonparametric inference, provide the data used in the original model statement. |
seed |
For models using bootstrapped inference. The seed ensures reproducible results across runs. Default is 1234, but may be changed. |
cum.probs |
For ordinal logistic regression models, including mixed effects models, do you want the first differences to be based on probabilities of the response categories or cumulative probabilities of the response categories. The default is cum.probs=="no" corresponding to non-cumulative probabilities. Change cum.probs to "yes" for cumulative probabilities. |
Value
out |
If using parametric inference (delta method): output is a dataframe including the first fitted value ("fitted1"), the second fitted value ("fitted2"), the difference in fitted values ("first.diff"), the standard error ("std.error"), the lower limit ("ll"), and upper limit ("ul") of the confidence interval. Of course, ll and ul are based on the alpha level. If using nonparametric inference (bootstrapping): output is a list of objects. obs.diff is the observed difference in the response or fitted values. boot.dist is the sorted bootstrapped distribution of differences in the samples. mean.boot.dist is the average of the differences in the responses or fitted values. sd.boot.dist is the standard deviation of the sampling distribution. ci.95 is the Lower and Upper limits of the confidence interval; despite it's name, the confidence interval is based upon the alpha level. model.class is just the class of the model that was used to generate the fitted values. |
Author(s)
David Melamed
Examples
data("Mize19AH")
m1 <- glm(alcB ~woman*parrole + age + race2 +
race3 + race4 + income + ed1 + ed2 + ed3 +
ed4,family="binomial",data=Mize19AH)
des2<-margins.des(m1,expand.grid(woman=c(0,1),parrole=c(0,1)))
des2
first.diff.fitted(m1,des2,compare=c(4,2))
# Pr(Drink | Mothers) - Pr(Drink | Childless Women)
first.diff.fitted(m1,des2,compare=c(3,1))
# Pr(Drink | Fathers) - Pr(Drink | Childless Men)
Data from the 2016 General Social Survey.
Description
Limited date from the 2016 General Social Survey on respondent and paternal class and occupational classifications.
Usage
data("gss2016")
Format
A data frame with 12498 observations on the following 13 variables.
pclassa factor with levels
Unskilled ManualSkilled ManualSelf-EmployedNon-Manual/ServiceProfessional, LowerProfessional, Highersclassa factor with levels
Unskilled ManualSkilled ManualSelf-EmployedNon-Manual/ServiceProfessional, LowerProfessional, Highereduca numeric vector
racea character vector
ida numeric vector
occ2a character vector
occa numeric vector
unskmanuala numeric vector
skmanuala numeric vector
selfempa numeric vector
servicea numeric vector
proflowa numeric vector
profhigha numeric vector
Source
The General Social Survey.
Examples
data(gss2016)
head(gss2016)
Transform glm and mixed model objects into model summaries that include coefficients, standard errors, exponentiated coefficients, confidence intervals and percent change.
Description
Given a glm model object or a mixed model model object, the function computes and returns: coefficients, standard errors, z-scores, confidence intervals, p-values, exponentiated coefficients, confidence intervals for exponentiated coefficients, and percent change.
Supported models include logistic regression via the glm function, ordinal regression via mass::polr, multinomial regression via nnet:multinom, Poisson regression via the glm function, negative binomial regression via mass::glm.nb, mixed effects models for continuous outcomes with serial correlation via nlme::lme, mixed effects logistic and poisson regression via lme4::glmer, mixed effects negative binomial regression via lme4::glmer.nb, and mixed effects ordinal regression via ordinal::clmm.
Usage
list.coef(model,rounded=3,alpha=.05)
Arguments
model |
A model object. The model should be regression model for limited dependent variables, such as a logistic regression, or a mixed model from nlme or lme4/lmerTest. |
rounded |
The number of decimal places to round the output. The default is 3. |
alpha |
The alpha value for confidence intervals. Default is .05. |
Value
b |
The estimated model coefficients from the model object. |
S.E. |
The estimated model standard errors from the model object. |
Z |
The z-statistic corresponding to the coefficient. |
LL CI |
Given the coefficient, standard error and alpha value (default=.05), the lower limit of the confidence interval around the coefficient is reported. |
UL CI |
Given the coefficient, standard error and alpha value (default=.05), the upper limit of the confidence interval around the coefficient is reported. |
p-val |
The p-value associated with the z-statistic. |
exp(b) |
The exponentiated model coefficients. That is, odds ratios in the case of a logistic regression, or incidence rate ratios in the case of a count model. |
LL CI for exp(b) |
Given the exponentiated coefficient, standard error and alpha value (default=.05), the lower limit of the confidence interval around the exponentiated coefficient is reported. |
UL CI for exp(b) |
Given the exponentiated coefficient, standard error and alpha value (default=.05), the upper limit of the confidence interval around the exponentiated coefficient is reported. |
Percent |
The coefficients in terms of percent change. That is, 100*(exp(coef(model))-1) |
Author(s)
David Melamed
Examples
data("Mize19AH")
m1 <- glm(alcB ~woman*parrole + age + race2 +
race3 + race4 + income + ed1 + ed2 + ed3 +
ed4,family="binomial",data=Mize19AH)
list.coef(m1,rounded=4)
list.coef(m1,rounded=4,alpha=.01)
Replication data for Logan's (1983) application of conditional logistic regression to mobility processes.
Description
Replication data for Logan's (1983) application of conditional logistic regression to mobility processes.
Usage
data("logan")
Format
A data frame with 4190 observations on the following 11 variables.
occupationrespondent occupation with levels
farmoperativescraftsmensalesprofessionalfoccpaternal occupation, i.e., father's occupation with levels
farmoperativescraftsmensalesprofessionaleducationa numeric vector
racea factor with levels
non-blackblackida numeric vector
tocca factor with levels
farmoperativescraftsmensalesprofessionalcasea numeric vector
craftsmena numeric vector
farma numeric vector
operativesa numeric vector
professionala numeric vector
References
Logan, John A. 1983. “A Multivariate Model for Mobility Tables.” American Journal of Sociology 89(2):324–349.
Examples
data(logan)
head(logan)
Add model predictions, standard errors and confidence intervals to a design matrix for a model object.
Description
Given a model object and a design matrix, this creates a data.frame of the design matrix, with model predictions, standard errors and lower/upper limits of confidence intervals around the predictions. This is a wrap around function for calls to emmeans; it adjusts the emmeans equation to return fitted values on the response scale.
Supported models include OLS regression via lm, logistic regression via glm, Poisson regression via glm, negative binomial regression via MASS:glm.nb, ordinal logistic regression via MASS::polr, multinomial logistic regression via nnet::multinom, zero-inflated Poisson or negative binomial regression via pscl::zeroinfl, hurdle Poisson or negative binomial regression via pscl::hurdle, linear mixed effects models with or without serial correlation via nlme::lme, linear mixed effects models via lme4/lmerTest::lmer, mixed effects logistic regression via lme4/lmerTest::glmer, mixed effects Poisson regression via lme4/lmerTest::glmer, mixed effects negative binomial regression via lme4/lmerTest::glmer.nb, and mixed effects ordinal logistic regression via ordinal::clmm.
For mixed effects ordinal logistic regression models, as estimated via the ordinal package, the outcome variable in the regression model (i.e., the clmm function) needs to be named "dv."
Given one of these model objects and an appropiate design matrix, the function detects the model response type and generates fitted values on the response scale. For example, a logistic regression model returns predicted probabilities, and a Poisson model returns the fitted counts. In addition to the fitted values, the function returns the delta method standard error for the fitted value and a confidence interval. The confidence interval is 95 percent by default, but that may be changed by the user.
Usage
margins.dat(mod,des,alpha=.05,rounded=3,cumulate="no",
pscl.data=data,num.sample=1000,prop.sample=.9,seed=1234)
Arguments
mod |
A regression model object. |
des |
Design matrix of values for the independent variables in the regression model. |
alpha |
The alpha value for confidence intervals. Default is .05. |
rounded |
The number of decimal places to round the output. The default is 3. |
cumulate |
Whether the fitted values should reflect cumulative probabilities. Default is "no." Intended for predicted probabilities drawn from ordinal logistic regression models (polr) or mixed effects ordinal logistic regression (clmm). |
pscl.data |
If generating predicted counts from Zero-Inflated models (either Poisson or negative binomial), you need to include the data that was specified in the model statement, i.e., the data in the "mod" object. |
num.sample |
num.sample is the number samples drawn to compute the sampling distibution. |
prop.sample |
prop.sample is the proportion of the original sample to include in the sampling distibution samples. Default is .9 |
seed |
For models using bootstrapped inference. The seed ensures reproducible results across runs. Default is 1234, but may be changed. |
Value
marginsdat |
Returns a data.frame containing the design matrix and additional columns for the fitted value on the response scale, the delta method standard error (except partial proportional odds models, which are bootstrapped), and the lower/upper limits on confidence intervals around the fitted value. |
Author(s)
David Melamed
Examples
data("Mize19AH")
m1 <- glm(alcB ~woman*parrole + age +
race2 + race3 + race4 + income + ed1 +
ed2 + ed3 + ed4,family="binomial",data=Mize19AH)
des2<-margins.des(m1,expand.grid(woman=c(0,1),parrole=c(0,1)))
margins.dat(m1,des2,rounded=5)
des1 <- margins.des(m1,expand.grid(parrole=1,woman=1))
margins.dat(m1,des1,rounded=5)
des3 <- margins.des(m1,expand.grid(age=seq(20,75,5),parrole=c(0,1)))
a<- margins.dat(m1,des3,rounded=5)
a # Then plot a using ggplot
Computes predicted probabilities for conditional and rank-order/exploded logistic regression models. Inference is based upon simulation techniques (requires the MASS package). Alternatively, bootstrapping is an option for conditional logistic regression models.
Description
Given a model object and a design matrix, this creates a data.frame of the design matrix, with predicted probabilities for each response category. Inferential information about the predict probabilities is supported with simulation. Bootstrapping may be added as an option for conditional logistic regression models.
Usage
margins.dat.clogit(mod,design.matrix,run.boot="no",num.sample=1000,
prop.sample=.9,alpha=.05,seed=1234,rounded=3)
Arguments
mod |
A conditional logistic regression model as estimated in the Epi package or an Exploded logistic regression model as estimated in the mlogit package. |
design.matrix |
Design matrix of values for the independent variables in the regression model. Unlike the design matrices in the margins.des function, the design matrix for a conditional logistic regression entails multiple rows, corresponding to the number of response options. |
run.boot |
Whether to compute confidence intervals around the predicted probabilities using bootstrapping. Defaul is "no." |
num.sample |
num.sample is the number samples drawn to compute the sampling distibution. |
prop.sample |
prop.sample is the proportion of the original sample to include in the sampling distibution samples. Default is .9 |
alpha |
The alpha value for confidence intervals. Default is .05. |
seed |
Sets a seed so that random results are reprodicible. |
rounded |
How many decimal places to show in the output. |
Value
des |
Returns a data.frame containing the design matrix and additional columns for the predicted probabilities. |
boot.dist |
The full bootstrapped distribution for the probabilities. |
Author(s)
David Melamed
Examples
data("LF06travel")
m1 <- Epi::clogistic(choice ~ train + bus +
time + invc, strata=id, data=LF06travel)
design <- data.frame(train=c(0,0,1),bus=c(0,1,0),time=200,invc=20)
design
margins.dat.clogit(m1,design)
Creates a design matrix of idealized data for illustrating model predictions.
Description
Create a data frame of idealized data for making model predictions/predicted margins that will be used with margins.dat for generating/plotting model predictions. Given a model object (generalized linear model or generalized linear mixed model), a grid of independent variable values, and a list of any variables (factor variables in particular) to exclude from the design matrix, the function returns the design matrix as a data.frame object. All covariates are set to their means in the data used to estimate the model object. If there are factors in the model, they need to be excluded using the "excl" option.
Supported models include OLS via the lm function, logistic and Poisson regression via the glm function, negative binomial regression via MASS::glm.nb, ordinal logistic regression via MASS::polr, multinomial logistic regression via nnet::multinom, partial proportional odds models via vgam::vglm, linear mixed effects models with or without serial correlation specified via nlme::lme, mixed effects logistic regression via lme4::glmer, mixed effects Poisson regression via lme4::glmer, mixed effects negative binomial regression via lme4::glmer.nb, and mixed effects ordinal logistic regression via ordinal::clmm. Zero-inflated and hurdle models are also supported via pscl::zeroinfl and pscl::hurdle, respectively.
For multinomial regression model, as estimated via the nnet package, you need to provide the data used in the nnet function that defined the model. For partial proportional odds models, as estimated via the vgam package, you need to specify an ordinal model via MASS::polr and provide that model to the margins.des function (the data for the model are not part of a vgam object.) For mixed effects regression models, as estimated by lme4/lmerTest, you need to provide the data used in the glmer or lmer function that defined the model. For mixed effects ordinal logistic regression models, as estimated via the ordinal package, the outcome variable in the regression model (i.e., the clmm function) needs to be named "dv."
Usage
margins.des(mod,ivs,excl="nonE",data)
Arguments
mod |
A model object. The model should be regression model for limited dependent variables, such as a logistic regression. Specifically, supported models include lm, glm, polr, multinom, vgam, zeroinf (pscl), amd hurdle (pscl). vgam is supported for partial proportional odds models, not models for count outcomes. zerotrunc is only supported with bootstrapped inference, and may take a while. |
ivs |
This should be an 'expand.grid' statement including all desired variables and their corresponding levels in the design matrix. |
excl |
If you want to exclude covariates from the design matrix, you can list them here. This is designed to exclude factor variables from the design matrix, as they do not have means, but may be useful in other specialized cases. Default is "nonE," corresponding to excluding none of the variables. |
data |
If the model is a multinomial model, you also need to provide the data. This is because nnet objects do not include the relevant information for computing the means of covariates. |
Value
design |
Returns a data.frame containing the design matrix for model predictions. |
Author(s)
David Melamed
Examples
data("Mize19AH")
m1 <- glm(alcB ~woman*parrole + age + race2 +
race3 + race4 + income + ed1 + ed2 + ed3 +
ed4,family="binomial",data=Mize19AH)
des1 <- margins.des(m1,expand.grid(parrole=1,woman=1))
des1
des2<-margins.des(m1,expand.grid(woman=c(0,1),parrole=c(0,1)))
des2
des3 <- margins.des(m1,expand.grid(age=seq(20,75,5),parrole=c(0,1)))
des3
Aggregate Standard Errors using Rubin's Rule.
Description
The function takes a vector of standard error estimates and it pools them using Rubin's rule.
Usage
rubins.rule(std.errors)
Arguments
std.errors |
A vector of standard errors to be aggregated using Rubin's rule. |
Value
r.r.std.error |
The aggregated standard error. |
Author(s)
David Melamed
References
Rubin, Donald B. 2004. Multiple Imputation for Nonresponse in Surveys. Vol. 81. John Wiley & Sons.
Computes the second difference in fitted values. Inference in supported via the delta method or bootstrapping.
Description
second.diff.fitted computes the second differences between fitted values, that is, the difference between two first differences, from a regression model.
Supported models include OLS regression via lm, logistic regression via glm, Poisson regression via glm, negative binomial regression via MASS:glm.nb, ordinal logistic regression via MASS::polr, partial proportional odds models via vgam::vglm, multinomial logistic regression via nnet::multinom, zero-inflated Poisson or negative binomial regression via pscl::zeroinfl, hurdle Poisson or negative binomial regression via pscl::hurdle, mixed effects logistic regression via lme4/lmerTest::glmer, mixed effects Poisson regression via lme4/lmerTest::glmer, mixed effects negative binomial regression via lme4/lmerTest::glmer.nb, and mixed effects ordinal logistic regression via ordinal::clmm.
Usage
second.diff.fitted(mod,design.matrix,compare,alpha=.05,rounded=3,
bootstrap="no",num.sample=1000,prop.sample=.9,
data,seed=1234,cum.probs="no")
Arguments
mod |
A model object. The model should be regression model for limited dependent variables, such as a logistic regression. |
design.matrix |
Design matrix of values for the independent variables in the regression model. |
compare |
A set of four rows in the design matrix to use for computing the fitted values that are used in the calculation of second differences. For example, compare(a,b,c,d) results in computing the fitted values for rows a, b, c, and d of the design matrix, respectively, and then computing the following second difference: (a - b) - (c - d). Only four rows may be compared at a time. |
alpha |
The alpha value for confidence intervals. Default is .05. |
rounded |
The number of decimal places to round the output. The default is 3. |
bootstrap |
By default, inference is based on the Delta Method, as implemented in the marginaleffects package. Alternatively, inference can be based upon a bootstrapped sampling distirbution. To do so, change this to "yes" |
num.sample |
num.sample is the number samples drawn to compute the sampling distibution when using bootstrapping. Default is 1,000 |
prop.sample |
prop.sample is the proportion of the original sample to include in the sampling distibution samples when using bootstrapping. Default is .9 |
data |
For nonparametric inference, provide the data used in the original model statement. |
seed |
For models using bootstrapped inference. The seed ensures reproducible results across runs. Default is 1234, but may be changed. |
cum.probs |
For ordinal logistic regression models, including mixed effects models, do you want the first differences to be based on probabilities of the response categories or cumulative probabilities of the response categories. The default is cum.probs=="no" corresponding to non-cumulative probabilities. Change cum.probs to "yes" for cumulative probabilities. |
Value
out |
If using parametric inference (delta method): output is a dataframe including the second difference in fitted values ("est"), the standard error ("std.error"), the lower limit ("ll"), and upper limit ("ul") of the confidence interval. Of course, ll and ul are based on the alpha level. If using nonparametric inference (bootstrapping): output is a list of objects. obs.diff is the observed second difference in the response or fitted values. boot.dist is the sorted bootstrapped distribution of second differences in the samples. mean.boot.dist is the average of the second differences in the responses or fitted values. sd.boot.dist is the standard deviation of the sampling distribution. ci.95 is the Lower and Upper limits of the confidence interval; despite it's name, the confidence interval is based upon the alpha level. model.class is just the class of the model that was used to generate the fitted values. |
Author(s)
David Melamed
Examples
data("Mize19AH")
m1 <- glm(alcB ~woman*parrole + age + race2 + race3 +
race4 + income + ed1 + ed2 + ed3 +
ed4,family="binomial",data=Mize19AH)
des2<-margins.des(m1,expand.grid(woman=c(0,1),parrole=c(0,1)))
des2
second.diff.fitted(m1,des2,compare=c(4,2,3,1),rounded=5)
# [Pr(Drink | Mothers) - Pr(Drink | Childless Women)] -
# [Pr(Drink | Fathers) - Pr(Drink | Childless Men)]
# Note that this is reported as the "Second Difference" in
# Table 3 of Mize (2019: 104, "Best Practices for Estimating,
# Interpreting, and Presenting Nonlinear Interaction Effect.
# Sociological Science. 6(4): 81-117.")
Data to illustrate mixed effects regression models with serial correlation.
Description
Replication data illustrating serial correlation specifications to adjust for correlated residuals.
Usage
data("wagepan")
Format
A data frame with 4360 observations on the following 51 variables.
nra numeric vector
yeara numeric vector
agrica numeric vector
blacka numeric vector
busa numeric vector
construca numeric vector
enta numeric vector
expera numeric vector
fina numeric vector
hispa numeric vector
poorhltha numeric vector
hoursa numeric vector
manufa numeric vector
marrieda numeric vector
mina numeric vector
nrthcena numeric vector
nrtheasta numeric vector
occ1a numeric vector
occ2a numeric vector
occ3a numeric vector
occ4a numeric vector
occ5a numeric vector
occ6a numeric vector
occ7a numeric vector
occ8a numeric vector
occ9a numeric vector
pera numeric vector
proa numeric vector
puba numeric vector
rura numeric vector
southa numeric vector
educa numeric vector
traa numeric vector
trada numeric vector
uniona numeric vector
lwagea numeric vector
d81a numeric vector
d82a numeric vector
d83a numeric vector
d84a numeric vector
d85a numeric vector
d86a numeric vector
d87a numeric vector
expersqa numeric vector
ra numeric vector
numa numeric vector
numbera numeric vector
mn_lwagea numeric vector
yearta numeric vector
yeart2a numeric vector
yeart3a numeric vector
Examples
data(wagepan)
head(wagepan)