The purpose of this tutorial is to introduce some techniques to generate postestimation quantities of interest after estimating multilevel models with lmer()
and glmer()
.
Please note that most of the plots (and models!) in this lab can and should be improved for publication-ready quality, but you will get to see some quick ways to present results from multilevel models in a somewhat more intuitive manner than a regression table.
While linear multilevel models are straightforward to interpret, you may wish to show some additional information about the model beyond regression coefficients. Here, I briefly show how to plot varying intercepts and varying slopes.
The (linear) models are reproduced from Lecture 7; please refer to that lecture for details on the data and models.
<- rio::import("mlbook2_r.dat", format = "tsv") dat
library("lme4")
<- lmer(langPOST ~ IQ_verb + (1 | schoolnr), data = dat)
m_vi summary(m_vi)
## Linear mixed model fit by REML ['lmerMod']
## Formula: langPOST ~ IQ_verb + (1 | schoolnr)
## Data: dat
##
## REML criterion at convergence: 24917.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.1952 -0.6378 0.0659 0.7098 3.2132
##
## Random effects:
## Groups Name Variance Std.Dev.
## schoolnr (Intercept) 9.909 3.148
## Residual 40.479 6.362
## Number of obs: 3758, groups: schoolnr, 211
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 41.05442 0.24402 168.24
## IQ_verb 2.50722 0.05439 46.09
##
## Correlation of Fixed Effects:
## (Intr)
## IQ_verb 0.003
Using the lattice::dotplot()
function produces a dot-and-whisker plot of the varying intercepts (which we extract from the model object using the ranef
function):
library("lattice")
dotplot(ranef(m_vi, condVar = TRUE))
## $schoolnr
We can also plot the varying intercepts and varying slopes if both are part of the model:
<- lmer(langPOST ~ IQ_verb + sch_iqv + (1 + IQ_verb | schoolnr),
m_vi_vs data = dat)
summary(m_vi_vs)
## Linear mixed model fit by REML ['lmerMod']
## Formula: langPOST ~ IQ_verb + sch_iqv + (1 + IQ_verb | schoolnr)
## Data: dat
##
## REML criterion at convergence: 24870.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2604 -0.6337 0.0676 0.7035 2.7622
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schoolnr (Intercept) 8.9787 2.9964
## IQ_verb 0.1995 0.4466 -0.63
## Residual 39.6858 6.2997
## Number of obs: 3758, groups: schoolnr, 211
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 41.1275 0.2347 175.248
## IQ_verb 2.4802 0.0645 38.453
## sch_iqv 1.0305 0.2633 3.913
##
## Correlation of Fixed Effects:
## (Intr) IQ_vrb
## IQ_verb -0.279
## sch_iqv -0.002 -0.187
## optimizer (nloptwrap) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.00224192 (tol = 0.002, component 1)
dotplot(ranef(m_vi_vs, condVar = TRUE))
## $schoolnr
You will note that the range of the varying slopes is much smaller than that of the intercepts. So it may be preferable to plot them alone. Here, it may make sense to extract the random effects and switch to ggplot2 for plotting. The code below should work for any ranef
object regardless of the names of the variables, as long as you provide the model name to ranef()
:
<- as.data.frame(ranef(m_vi_vs, condVar = TRUE))
m_vi_vs_ranef library("ggplot2")
ggplot(data = m_vi_vs_ranef,
aes(y = grp, x = condval)) +
geom_point() +
facet_wrap(~ term, scales = "free_x") +
geom_errorbarh(aes(xmin = condval - 2*condsd, xmax = condval + 2*condsd),
height = 0) +
scale_y_discrete(breaks = FALSE) +
theme_bw()
To plot just one of the random effects, filter the data frame for the variable for which you wish to plot the random effect - here, “IQ_verb”
<- as.data.frame(ranef(m_vi_vs, condVar = TRUE))
m_vi_vs_ranef suppressPackageStartupMessages(library("tidyverse"))
ggplot(data = filter(m_vi_vs_ranef, term == "IQ_verb"),
aes(y = reorder(grp, condval), x = condval)) +
geom_point() +
geom_errorbarh(aes(xmin = condval - 2*condsd, xmax = condval + 2*condsd),
height = 0) +
scale_y_discrete(breaks = FALSE) +
theme_bw()
Depending on the size of your data, you may want to plot varying slopes by group. To facilitate display, below I plot the lines for each of the 211 schools, but color them based on the decile (10th percentiles) of the average school-level IQ. I achieve this by combining the cut()
function with the quantile()
function. See the help files for each for more detail. In addition, I choose a colorblind-friendly palette using the viridis package.
library("viridis")
ggplot(dat, aes(x = IQ_verb, y = langPOST)) +
geom_jitter(alpha = 0.1) +
geom_line(aes(y = predict(m_vi_vs),
group = schoolnr,
color = cut(sch_iqv, breaks = quantile(sch_iqv, probs = seq(0, 1, by = 0.1)), include.lowest = TRUE)),
size = 0.25, alpha = 0.75) +
scale_color_viridis(discrete = TRUE, option = "plasma") +
labs(color = "Average school-level IQ") +
guides(color = guide_legend(override.aes = list(size = 2))) +
theme_bw()
If you estimated a slope to vary by schools, you can then plot the point estimates and confidence intervals around the slopes along the range of a level-2 covariate of interest, similar to Figure 13.2 (b) in Gelman & Hill.
First, estimate the model:
<- lmer(langPOST ~ IQ_verb + (1 + IQ_verb | schoolnr), data = dat) m_vi_vs_modeled
Then, extract the random effects and filter for the random effects of the slope:
<- as.data.frame(ranef(m_vi_vs_modeled, condVar = TRUE))
m_vi_vs_ranef <- filter(m_vi_vs_ranef, term == "IQ_verb") iq_slopes
Here, I convert the school ID into a numeric variable for merging:
$schoolnr <- as.numeric(as.character(iq_slopes$grp)) iq_slopes
Because the slope for each school is the combination of the “random effect” (in lmer language) and the “fixed effect” (2.519758), I add it to obtain the correct slope for each school:
$condval <- iq_slopes$condval + fixef(m_vi_vs_modeled)[2] iq_slopes
This slope can, by the way, also obtained using coef()
on the model object:
head(iq_slopes$condval)
## [1] 2.512030 3.141586 2.601269 2.369620 3.074282 2.835133
head(coef(m_vi_vs_modeled)$schoolnr$IQ_verb)
## [1] 2.512030 3.141586 2.601269 2.369620 3.074282 2.835133
Before plotting the slopes against school-average IQ, I need to create a dataset containing these school-averages:
<- summarize(group_by(dat, schoolnr),
iq_scores avg_iq = mean(sch_iqv))
Now, I can merge slopes and school-average IQ…
<- left_join(x = iq_slopes, y = iq_scores,
iq_sch_dat by = c("schoolnr"))
head(iq_sch_dat)
## grpvar term grp condval condsd schoolnr avg_iq
## 1 schoolnr IQ_verb 1 2.512030 0.2432017 1 -1.4039
## 2 schoolnr IQ_verb 2 3.141586 0.2856793 2 -2.8668
## 3 schoolnr IQ_verb 3 2.601269 0.3277434 3 -0.3668
## 4 schoolnr IQ_verb 9 2.369620 0.3588022 9 0.9059
## 5 schoolnr IQ_verb 10 3.074282 0.3380855 10 -1.3668
## 6 schoolnr IQ_verb 12 2.835133 0.2657368 12 -2.3374
… and plot my varying slopes:
ggplot(data = iq_sch_dat, aes(x = avg_iq, y = condval)) +
geom_smooth(method = "lm") +
geom_pointrange(aes(ymin = condval - 2*condsd, ymax = condval + 2*condsd), size = 0.25) +
theme_bw() + xlab("School-average IQ") + ylab("Slope for individual-level IQ")
Note that there is more discussion about how to calculate and compare confidence intervals of random effects (varying slopes); we will revisit this issue later during this workshop.
The main difference between generalized linear multilevel models and linear multilevel models is that coefficients from GLMLMs do not immediately translate into linear, continuous increases in \(y\) holding all other predictors constant. You can verify this by considering the equation for the multilevel logit model:
\[ \text{logit}(\text{Pr}(y_i = 1)) = \gamma_{0j[i]} + \beta_0 + \beta_1 x_{1i} + \varepsilon_i \]
Here, \(\beta_i\) is the effect of a one-unit increase in \(x_1\) on the logit transformation of the probability that \(y = 1\).
Per Gelman & Hill (2007: 82),
As a rule of convenience, we can take logistic regression coefficients (other than the constant term) and divide them by 4 to get an upper bound of the predictive difference corresponding to a unit difference in x. This upper bound is a reasonable approximation near the midpoint of the logistic curve, where probabilities are close to 0.5.
But you may also prefer to show other quantities, such as predicted probabilities, to facilitate interpretation of your results. In this lab, I show a couple of options on how to use convenient packages/functions in R to calculate such quantities.
Instead of re-estimating the multilevel logistic regression model from Lecture 9 (which involved quite a lot of observations), I’m using a different example here. I examine why people in 28 countries did or did not vote in the most recent election in their respective country. The data are adapted from the Comparative Study of Electoral Systems (CSES), which provides survey data from 28 countries from the mid-to late 2000s. Here, my outcome of interest is a binary indicator for whether a respondent voted or not. The covariates of interest are:
<- na.omit(rio::import("cses3_reduced.dta"))
d $employed_f <- factor(d$employed)
d$female_f <- factor(d$female)
d<- glmer(voted ~ employed_f + female_f + dissatisdem + age + education + gdpgrowth + (1 | cname),
m data = d,
family = binomial(link = "logit"))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
## - Rescale variables?
To take advantage of the nonlinear relationship estimated by the logit model, you should investigate the predicted probability of a “1” outcome across the range of each predictor. To do this, you can use the effects package. It works in two steps: first, calculate the Effect
of a predictor;1 second, plot
the predictor. The package provides a shorthand to plot all effects. We’ll look at this first. Note: because of the large number of parameters involved and large \(n\), this command takes some time to complete.
library("effects")
<- allEffects(mod = m)
all.effects summary(all.effects)
## model: voted ~ employed_f + female_f + dissatisdem + age + education +
## gdpgrowth
##
## employed_f effect
## employed_f
## 0 1
## 0.9119396 0.9257131
##
## Lower 95 Percent Confidence Limits
## employed_f
## 0 1
## 0.8677855 0.8876417
##
## Upper 95 Percent Confidence Limits
## employed_f
## 0 1
## 0.9423281 0.9515879
##
## female_f effect
## female_f
## 0 1
## 0.9214619 0.9178211
##
## Lower 95 Percent Confidence Limits
## female_f
## 0 1
## 0.8814798 0.8762592
##
## Upper 95 Percent Confidence Limits
## female_f
## 0 1
## 0.9487406 0.9462790
##
## dissatisdem effect
## dissatisdem
## 1 1.8 2.5 3.2 4
## 0.9477657 0.9321085 0.9149448 0.8939356 0.8644505
##
## Lower 95 Percent Confidence Limits
## dissatisdem
## 1 1.8 2.5 3.2 4
## 0.9197445 0.8969628 0.8722538 0.8423713 0.8011158
##
## Upper 95 Percent Confidence Limits
## dissatisdem
## 1 1.8 2.5 3.2 4
## 0.9663611 0.9558561 0.9442804 0.9300329 0.9098850
##
## age effect
## age
## 20 40 60 80 100
## 0.8330249 0.8987881 0.9405004 0.9656784 0.9804237
##
## Lower 95 Percent Confidence Limits
## age
## 20 40 60 80 100
## 0.7595620 0.8493253 0.9092665 0.9466718 0.9690744
##
## Upper 95 Percent Confidence Limits
## age
## 20 40 60 80 100
## 0.8873703 0.9332888 0.9614384 0.9780678 0.9876610
##
## education effect
## education
## 1 3 4 6 8
## 0.8342793 0.8789438 0.8971145 0.9263392 0.9477460
##
## Lower 95 Percent Confidence Limits
## education
## 1 3 4 6 8
## 0.7599596 0.8213435 0.8468829 0.8886288 0.9197422
##
## Upper 95 Percent Confidence Limits
## education
## 1 3 4 6 8
## 0.8889511 0.9197870 0.9321867 0.9519709 0.9663363
##
## gdpgrowth effect
## gdpgrowth
## -6 -3 0.9 5 8
## 0.9401372 0.9323203 0.9207431 0.9066452 0.8949437
##
## Lower 95 Percent Confidence Limits
## gdpgrowth
## -6 -3 0.9 5 8
## 0.8408327 0.8672317 0.8800233 0.8348691 0.7619235
##
## Upper 95 Percent Confidence Limits
## gdpgrowth
## -6 -3 0.9 5 8
## 0.9790307 0.9667241 0.9484522 0.9491242 0.9577618
Summarizing this object returns point estimates and 95% confidence intervals for predicted probabilities of attending religious services across the range of each predictor, while all other predictors are held at typical values. Typical values are, by default of the effects (or sjPlot) package, the mean for all covariates other than the one that is being plotted. This is an important point: these predicted probabilities do not show an “average” probability (or, by extension, an average change/difference in probabilities). Instead, they show predicted probabilities for an “average case.” Such an average case can be atypical (depending on the structure of the data) or, more importantly, it can limit inference to the “special” case that the effects package identifies as typical. You can find an informative discussion in Hanmer & Kalkan (2013).2
Considering the example below, what are the implications of the previous paragraph?
You can plot individual effects as follows, using the Effect()
function (see ?Effect
for more):
<- Effect(focal.predictors = "age", mod = m)
age.eff summary(age.eff)
##
## age effect
## age
## 20 40 60 80 100
## 0.8330249 0.8987881 0.9405004 0.9656784 0.9804237
##
## Lower 95 Percent Confidence Limits
## age
## 20 40 60 80 100
## 0.7595620 0.8493253 0.9092665 0.9466718 0.9690744
##
## Upper 95 Percent Confidence Limits
## age
## 20 40 60 80 100
## 0.8873703 0.9332888 0.9614384 0.9780678 0.9876610
plot(age.eff, type = "response", ylim = c(0, 1),
main = "Effect of age on Pr(Voting)",
xlab = "Age",
ylab = "Pr(Voting)")
The sjPlot package also offers some easy-to-access functions to plot or tabulate probabilities and marginal effects. Take a look at the vignette for some examples. Below are just a few.
library("sjPlot")
Predicted probability:
plot_model(m, type = "pred", terms = "age")
## Data were 'prettified'. Consider using `terms="age [all]"` to get smooth plots.
plot_model(m, type = "pred", terms = "dissatisdem")
Random effects (here, varying intercepts by country):
plot_model(m, type = "re")
## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.3.3
## Current Matrix version is 1.3.4
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
Odds ratios (percent change in the odds of voting…):
plot_model(m, type = "est") + scale_y_continuous()
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
The ggeffects package offers even more flexibility.
The margins package offers a marginal effects function similar to Stata’s margins
:
library("margins")
<- margins(m, data = d)
m_margins m_margins
## Average marginal effects
## dissatisdem age education gdpgrowth employed_f1 female_f1
## -0.03756 0.003107 0.01973 -0.004709 0.02004 -0.005309
As noted above, the approaches used so far calculate \(\text{Pr}(y_{ij} = 1)\) where all other covariates are set to “typical” values (their mean or median). Following Hanmer & Kalkan (2013), we can also predict probabilities for all observed cases and average them. Here is a quick illustration on how to do this. Two caveats:
?predict.merMod
.First, I re-estimate the model above with varying slopes for the dissatisdem
variable, capturing respondents’ dissatisfaction with democracy. This is just for illustration on how to show varying slopes carrying over into predicted probabilities.
<- glmer(voted ~ employed_f + female_f + dissatisdem + age + education + gdpgrowth + (1 + dissatisdem | cname),
m2 data = d,
family = binomial(link = "logit"))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
## - Rescale variables?
Next, I extract the “model frame” or dataset used to estimate the model (\(\mathbf{yX}\)) from the model object:
<- m2@frame mf
Now, I expand the dataset to contain the range of possible values of dissatisdem
for each respondent, in each country:
<- merge(data.frame(dissatisdem2 = 1:4), mf, by = NULL)
mf_sim $dissatisdem <- mf_sim$dissatisdem2
mf_sim<- select(mf_sim, -dissatisdem2)
mf_sim dim(mf_sim)
## [1] 139452 8
The resulting simulated dataset has 34863 \(\times\) 4 rows. Now, I predict \(\text{Pr}(y_{ij} = 1)\) for each observation. As a shortcut, I use the predict()
function, though I could also use the longer \(\text{Pr}(y_{ij} = 1) = \frac{\exp(\mathbf{Xb}}{1 + \exp(\mathbf{Xb}})\) formulation.
<- predict(object = m2,
m_pred_obs newdata = mf_sim,
type = "response")
I then add the newly generated \(\text{Pr}(y_{ij} = 1)\) (which is a probability, ranging from 0 to 1) to the simulated dataframe and summarize the predictions across levels of dissatisdem
for each country:
<- cbind(m_pred_obs, mf_sim)
mf_pred <- summarize(group_by(mf_pred, cname, dissatisdem),
m_pred_obs_sum pred = mean(m_pred_obs))
## `summarise()` has grouped output by 'cname'. You can override using the `.groups` argument.
Lastly, I plot the resulting probabilities:
ggplot(data = m_pred_obs_sum, aes(x = dissatisdem, y = pred)) +
geom_line() +
geom_point() +
facet_wrap(~ cname, ncol = 4) +
theme_bw() +
xlab("Dissatisfaction with democracy") +
ylab("Pr(Voted)")
As noted above, this method doesn’t generate standard errors or confidence intervals around predictions because these are not easily defined. See ?predict.merMod
.
Effect()
and plot_model()
come up with confidence intervals?These packages (sjPlot and ggeffects rely on the effects package) use the delta method to calculate uncertainty (standard errors and confidence intervals) around predicted probabilities. You can find more information on this method in, for instance, Scott Long’s 2005 article “Confidence Intervals for Predicted Outcomes in Regression Models for Categorical Outcomes”. Tl;dr:
Bayesian methods offer a “natural” option to carry estimation uncertainty over into predictions. Stay tuned for Tuesday…!
Note that the capital E
in the name of this function is important.↩︎
Hanmer, Michael J. and Kalkan, Kerem Ozan. 2013. “Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models.” American Journal of Political Science 57 (1): 263–277.↩︎