The purpose of this tutorial is to introduce some techniques to generate postestimation quantities of interest after estimating multilevel models with lmer() and glmer().

Please note that most of the plots (and models!) in this lab can and should be improved before they are publication-ready, but you will get to see some quick ways to present results from multilevel models in a somewhat more intuitive manner than a regression table.

Linear multilevel models

While linear multilevel models are straightforward to interpret, you may wish to show some additional information about the model beyond regression coefficients. Here, I briefly show how to plot varying intercepts and varying slopes.

The (linear) models are reproduced from Lecture 7; please refer to that lecture for details on the data and models.

dat <- rio::import("mlbook2_r.dat", format = "tsv")

Varying intercepts, unmodeled

library("lme4")
m_vi <- lmer(langPOST ~ IQ_verb + (1 | schoolnr), data = dat)
summary(m_vi)
## Linear mixed model fit by REML ['lmerMod']
## Formula: langPOST ~ IQ_verb + (1 | schoolnr)
##    Data: dat
## 
## REML criterion at convergence: 24917.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.1952 -0.6378  0.0659  0.7098  3.2132 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  schoolnr (Intercept)  9.909   3.148   
##  Residual             40.479   6.362   
## Number of obs: 3758, groups:  schoolnr, 211
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 41.05442    0.24402  168.24
## IQ_verb      2.50722    0.05439   46.09
## 
## Correlation of Fixed Effects:
##         (Intr)
## IQ_verb 0.003

Using the lattice::dotplot() function produces a dot-and-whisker plot of the varying intercepts (which we extract from the model object using the ranef function):

library("lattice")
dotplot(ranef(m_vi, condVar = TRUE))
## $schoolnr

Varying intercepts and slopes

We can also plot the varying intercepts and varying slopes if both are part of the model:

m_vi_vs <- lmer(langPOST ~ IQ_verb + sch_iqv + (1 + IQ_verb | schoolnr), 
                data = dat)
summary(m_vi_vs)
## Linear mixed model fit by REML ['lmerMod']
## Formula: langPOST ~ IQ_verb + sch_iqv + (1 + IQ_verb | schoolnr)
##    Data: dat
## 
## REML criterion at convergence: 24870.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.2604 -0.6337  0.0676  0.7035  2.7622 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  schoolnr (Intercept)  8.9787  2.9964        
##           IQ_verb      0.1995  0.4466   -0.63
##  Residual             39.6858  6.2997        
## Number of obs: 3758, groups:  schoolnr, 211
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  41.1275     0.2347 175.248
## IQ_verb       2.4802     0.0645  38.453
## sch_iqv       1.0305     0.2633   3.913
## 
## Correlation of Fixed Effects:
##         (Intr) IQ_vrb
## IQ_verb -0.279       
## sch_iqv -0.002 -0.187
## optimizer (nloptwrap) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.00224192 (tol = 0.002, component 1)
dotplot(ranef(m_vi_vs, condVar = TRUE))
## $schoolnr

You will note that the range of the varying slopes is much smaller than that of the intercepts, so it may be preferable to plot the slopes on their own. Here, it may make sense to extract the random effects and switch to ggplot2 for plotting. The code below should work for any ranef object regardless of the names of the variables, as long as you provide the model object to ranef():

m_vi_vs_ranef <- as.data.frame(ranef(m_vi_vs, condVar = TRUE))
library("ggplot2")
ggplot(data = m_vi_vs_ranef,
       aes(y = grp, x = condval)) +
        geom_point() + 
        facet_wrap(~ term, scales = "free_x") +
        geom_errorbarh(aes(xmin = condval - 2*condsd, xmax = condval + 2*condsd),
                       height = 0) + 
        scale_y_discrete(breaks = NULL) + 
        theme_bw()

To plot just one of the random effects, filter the data frame for the term whose random effect you wish to plot, here “IQ_verb”:

m_vi_vs_ranef <- as.data.frame(ranef(m_vi_vs, condVar = TRUE))
suppressPackageStartupMessages(library("tidyverse"))
ggplot(data = filter(m_vi_vs_ranef, term == "IQ_verb"),
       aes(y = reorder(grp, condval), x = condval)) +
        geom_point() + 
        geom_errorbarh(aes(xmin = condval - 2*condsd, xmax = condval + 2*condsd),
                       height = 0) + 
        scale_y_discrete(breaks = NULL) + 
        theme_bw()

Scatterplots with slopes

Depending on the size of your data, you may want to plot varying slopes by group. To facilitate display, below I plot the lines for each of the 211 schools, but color them based on the decile (10th percentiles) of the average school-level IQ. I achieve this by combining the cut() function with the quantile() function. See the help files for each for more detail. In addition, I choose a colorblind-friendly palette using the viridis package.
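
If the cut()/quantile() combination is new to you, here is a quick standalone look at what it produces (a minimal check using the dat object from above):

dec_breaks <- quantile(dat$sch_iqv, probs = seq(0, 1, by = 0.1))     # decile breakpoints of school-average IQ
table(cut(dat$sch_iqv, breaks = dec_breaks, include.lowest = TRUE))  # number of pupils in each decile bin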

library("viridis")
ggplot(dat, aes(x = IQ_verb, y = langPOST)) +
  geom_jitter(alpha = 0.1) +
  geom_line(aes(y = predict(m_vi_vs),
                group = schoolnr, 
                color = cut(sch_iqv, breaks = quantile(sch_iqv, probs = seq(0, 1, by = 0.1)), include.lowest = TRUE)), 
            size = 0.25, alpha = 0.75) + 
  scale_color_viridis(discrete = TRUE, option = "plasma") + 
  labs(color = "Average school-level IQ") + 
   guides(color = guide_legend(override.aes = list(size = 2))) + 
  theme_bw()

If you estimated a slope to vary by schools, you can then plot the point estimates and confidence intervals around the slopes along the range of a level-2 covariate of interest, similar to Figure 13.2 (b) in Gelman & Hill.

First, estimate the model:

m_vi_vs_modeled <- lmer(langPOST ~ IQ_verb + (1 + IQ_verb | schoolnr), data = dat)

Then, extract the random effects and filter for the random effects of the slope:

m_vi_vs_ranef <- as.data.frame(ranef(m_vi_vs_modeled, condVar = TRUE))
iq_slopes <- filter(m_vi_vs_ranef, term == "IQ_verb")

Here, I convert the school ID into a numeric variable for merging:

iq_slopes$schoolnr <- as.numeric(as.character(iq_slopes$grp)) 

Because the slope for each school is the combination of the “random effect” (in lmer language) and the “fixed effect” (2.519758), I add the fixed effect to each school’s random effect to obtain the correct slope for each school:

iq_slopes$condval <- iq_slopes$condval + fixef(m_vi_vs_modeled)[2]

This slope can, by the way, also be obtained using coef() on the model object:

head(iq_slopes$condval)
## [1] 2.512030 3.141586 2.601269 2.369620 3.074282 2.835133
head(coef(m_vi_vs_modeled)$schoolnr$IQ_verb)
## [1] 2.512030 3.141586 2.601269 2.369620 3.074282 2.835133

Before plotting the slopes against school-average IQ, I need to create a dataset containing these school-averages:

iq_scores <- summarize(group_by(dat, schoolnr),
                       avg_iq = mean(sch_iqv))

Now, I can merge slopes and school-average IQ…

iq_sch_dat <- left_join(x = iq_slopes, y = iq_scores,
                       by = c("schoolnr"))
head(iq_sch_dat)
##     grpvar    term grp  condval    condsd schoolnr  avg_iq
## 1 schoolnr IQ_verb   1 2.512030 0.2432017        1 -1.4039
## 2 schoolnr IQ_verb   2 3.141586 0.2856793        2 -2.8668
## 3 schoolnr IQ_verb   3 2.601269 0.3277434        3 -0.3668
## 4 schoolnr IQ_verb   9 2.369620 0.3588022        9  0.9059
## 5 schoolnr IQ_verb  10 3.074282 0.3380855       10 -1.3668
## 6 schoolnr IQ_verb  12 2.835133 0.2657368       12 -2.3374

… and plot my varying slopes:

ggplot(data = iq_sch_dat, aes(x = avg_iq, y = condval)) + 
  geom_smooth(method = "lm") + 
  geom_pointrange(aes(ymin = condval - 2*condsd, ymax = condval + 2*condsd), size = 0.25) + 
  theme_bw() + xlab("School-average IQ") + ylab("Slope for individual-level IQ")

Note that there is more to be said about how to calculate and compare confidence intervals for random effects (varying slopes); we will revisit this issue later in this workshop.

Generalized linear multilevel models

The main difference between generalized linear multilevel models and linear multilevel models is that coefficients from GLMLMs do not immediately translate into linear, continuous increases in \(y\) holding all other predictors constant. You can verify this by considering the equation for the multilevel logit model:

\[ \text{logit}(\text{Pr}(y_i = 1)) = \gamma_{0j[i]} + \beta_0 + \beta_1 x_{1i} \]

Here, \(\beta_1\) is the effect of a one-unit increase in \(x_1\) on the logit transformation of the probability that \(y = 1\). For example, if the linear predictor is 0 (a probability of 0.5), a coefficient of \(\beta_1 = 0.5\) means a one-unit increase in \(x_1\) raises the predicted probability to \(\text{logit}^{-1}(0.5) \approx 0.62\); the same coefficient implies a smaller change in probability further from the midpoint of the curve.

Per Gelman & Hill (2007: 82),

As a rule of convenience, we can take logistic regression coefficients (other than the constant term) and divide them by 4 to get an upper bound of the predictive difference corresponding to a unit difference in x. This upper bound is a reasonable approximation near the midpoint of the logistic curve, where probabilities are close to 0.5.

But you may also prefer to show other quantities, such as predicted probabilities, to facilitate interpretation of your results. In this lab, I show a couple of options on how to use convenient packages/functions in R to calculate such quantities.

Instead of re-estimating the multilevel logistic regression model from Lecture 9 (which involved quite a lot of observations), I’m using a different example here. I examine why people in 28 countries did or did not vote in the most recent election in their respective country. The data are adapted from the Comparative Study of Electoral Systems (CSES), which provides survey data from 28 countries from the mid- to late 2000s. Here, my outcome of interest is a binary indicator for whether a respondent voted or not. The covariates of interest are:

  • employment status
  • sex
  • dissatisfaction with democracy (an ordinal index)
  • age (measured in years)
  • education
  • and the country-level covariate GDP growth.

d <- na.omit(rio::import("cses3_reduced.dta"))
d$employed_f <- factor(d$employed)
d$female_f <- factor(d$female)
m <- glmer(voted ~ employed_f + female_f + dissatisdem + age + education + gdpgrowth + (1 | cname),
           data = d,
           family = binomial(link = "logit"))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
##  - Rescale variables?
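
As a quick first pass at interpretation, the divide-by-4 rule quoted above can be applied to the fixed effects of m. This gives a rough upper bound on the change in Pr(voting) per unit change in each predictor, not a substitute for predicted probabilities (a minimal sketch):

fixef(m)[-1] / 4  # drop the intercept; rough upper bound on changes in Pr(voted)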

Predicted probabilities from the effects package

Because the logit model estimates a nonlinear relationship, you should investigate the predicted probability of a “1” outcome across the range of each predictor. To do this, you can use the effects package. It works in two steps: first, calculate the Effect of a predictor;1 second, plot that effect. The package provides a shorthand to plot all effects, which we’ll look at first. Note: because of the large number of parameters involved and the large \(n\), this command takes some time to complete.

library("effects")
all.effects <- allEffects(mod = m)
summary(all.effects)
##  model: voted ~ employed_f + female_f + dissatisdem + age + education + 
##     gdpgrowth
## 
##  employed_f effect
## employed_f
##         0         1 
## 0.9119396 0.9257131 
## 
##  Lower 95 Percent Confidence Limits
## employed_f
##         0         1 
## 0.8677855 0.8876417 
## 
##  Upper 95 Percent Confidence Limits
## employed_f
##         0         1 
## 0.9423281 0.9515879 
## 
##  female_f effect
## female_f
##         0         1 
## 0.9214619 0.9178211 
## 
##  Lower 95 Percent Confidence Limits
## female_f
##         0         1 
## 0.8814798 0.8762592 
## 
##  Upper 95 Percent Confidence Limits
## female_f
##         0         1 
## 0.9487406 0.9462790 
## 
##  dissatisdem effect
## dissatisdem
##         1       1.8       2.5       3.2         4 
## 0.9477657 0.9321085 0.9149448 0.8939356 0.8644505 
## 
##  Lower 95 Percent Confidence Limits
## dissatisdem
##         1       1.8       2.5       3.2         4 
## 0.9197445 0.8969628 0.8722538 0.8423713 0.8011158 
## 
##  Upper 95 Percent Confidence Limits
## dissatisdem
##         1       1.8       2.5       3.2         4 
## 0.9663611 0.9558561 0.9442804 0.9300329 0.9098850 
## 
##  age effect
## age
##        20        40        60        80       100 
## 0.8330249 0.8987881 0.9405004 0.9656784 0.9804237 
## 
##  Lower 95 Percent Confidence Limits
## age
##        20        40        60        80       100 
## 0.7595620 0.8493253 0.9092665 0.9466718 0.9690744 
## 
##  Upper 95 Percent Confidence Limits
## age
##        20        40        60        80       100 
## 0.8873703 0.9332888 0.9614384 0.9780678 0.9876610 
## 
##  education effect
## education
##         1         3         4         6         8 
## 0.8342793 0.8789438 0.8971145 0.9263392 0.9477460 
## 
##  Lower 95 Percent Confidence Limits
## education
##         1         3         4         6         8 
## 0.7599596 0.8213435 0.8468829 0.8886288 0.9197422 
## 
##  Upper 95 Percent Confidence Limits
## education
##         1         3         4         6         8 
## 0.8889511 0.9197870 0.9321867 0.9519709 0.9663363 
## 
##  gdpgrowth effect
## gdpgrowth
##        -6        -3       0.9         5         8 
## 0.9401372 0.9323203 0.9207431 0.9066452 0.8949437 
## 
##  Lower 95 Percent Confidence Limits
## gdpgrowth
##        -6        -3       0.9         5         8 
## 0.8408327 0.8672317 0.8800233 0.8348691 0.7619235 
## 
##  Upper 95 Percent Confidence Limits
## gdpgrowth
##        -6        -3       0.9         5         8 
## 0.9790307 0.9667241 0.9484522 0.9491242 0.9577618

Summarizing this object returns point estimates and 95% confidence intervals for the predicted probabilities of voting across the range of each predictor, while all other predictors are held at typical values. Typical values are, by default in the effects (or sjPlot) package, the mean for all covariates other than the one that is being plotted. This is an important point: these predicted probabilities do not show an “average” probability (or, by extension, an average change/difference in probabilities). Instead, they show predicted probabilities for an “average case.” Such an average case can be atypical (depending on the structure of the data) or, more importantly, it can limit inference to the “special” case that the effects package identifies as typical. You can find an informative discussion in Hanmer & Kalkan (2013).2

Considering the example below, what are the implications of the previous paragraph?

You can plot individual effects as follows, using the Effect() function (see ?Effect for more):

age.eff <- Effect(focal.predictors = "age", mod = m)
summary(age.eff)
## 
##  age effect
## age
##        20        40        60        80       100 
## 0.8330249 0.8987881 0.9405004 0.9656784 0.9804237 
## 
##  Lower 95 Percent Confidence Limits
## age
##        20        40        60        80       100 
## 0.7595620 0.8493253 0.9092665 0.9466718 0.9690744 
## 
##  Upper 95 Percent Confidence Limits
## age
##        20        40        60        80       100 
## 0.8873703 0.9332888 0.9614384 0.9780678 0.9876610
plot(age.eff, type = "response", ylim = c(0, 1),
     main = "Effect of age on Pr(Voting)",
     xlab = "Age",
     ylab = "Pr(Voting)")

Marginal effects and predicted probabilities via the sjPlot and ggeffects packages

The sjPlot package also offers some easy-to-access functions to plot or tabulate probabilities and marginal effects. Take a look at the vignette for some examples. Below are just a few.

library("sjPlot")

Predicted probability:

plot_model(m, type = "pred", terms = "age")
## Data were 'prettified'. Consider using `terms="age [all]"` to get smooth plots.

plot_model(m, type = "pred", terms = "dissatisdem")

Random effects (here, varying intercepts by country):

plot_model(m, type = "re")
## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.3.3
## Current Matrix version is 1.3.4
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package

Odds ratios (which can be read as percentage changes in the odds of voting):

plot_model(m, type = "est") + scale_y_continuous()
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

The ggeffects package offers even more flexibility.
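
For example, ggeffects::ggpredict() returns the predicted probabilities as a data frame that you can inspect, reshape, or plot yourself (a minimal sketch using the model m from above):

library("ggeffects")
age_pred <- ggpredict(m, terms = "age")  # predicted Pr(voted) at representative age values
age_pred                                 # a data frame with predictions and confidence intervals
plot(age_pred)                           # quick default plot, a ggplot object you can modify further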

Marginal effects using the margins package

The margins package offers a marginal effects function similar to Stata’s margins:

library("margins")
m_margins <- margins(m, data = d)
m_margins
## Average marginal effects
##  dissatisdem      age education gdpgrowth employed_f1 female_f1
##     -0.03756 0.003107   0.01973 -0.004709     0.02004 -0.005309

Predicted probabilities using the observed-case approach

As noted above, the approaches used so far calculate \(\text{Pr}(y_{ij} = 1)\) where all other covariates are set to “typical” values (their mean or median). Following Hanmer & Kalkan (2013), we can also predict probabilities for all observed cases and average them. Here is a quick illustration on how to do this. Two caveats:

  • 85% of respondents stated that they voted, so the predicted probabilities will be pretty high across the board, especially in countries with mandatory voting.
  • The method below doesn’t generate standard errors or confidence intervals around predictions because these are not easily defined. See ?predict.merMod.

First, I re-estimate the model above with varying slopes for the dissatisdem variable, which captures respondents’ dissatisfaction with democracy. This is just to illustrate how varying slopes carry over into predicted probabilities.

m2 <- glmer(voted ~ employed_f + female_f + dissatisdem + age + education + gdpgrowth + (1 + dissatisdem | cname),
           data = d,
           family = binomial(link = "logit"))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
##  - Rescale variables?

Next, I extract the “model frame” or dataset used to estimate the model (\(\mathbf{yX}\)) from the model object:

mf <- m2@frame

Now, I expand the dataset to contain the range of possible values of dissatisdem for each respondent, in each country:

mf_sim <- merge(data.frame(dissatisdem2 = 1:4), mf, by = NULL)
mf_sim$dissatisdem <- mf_sim$dissatisdem2
mf_sim <- select(mf_sim, -dissatisdem2)
dim(mf_sim)
## [1] 139452      8

The resulting simulated dataset has 34863 \(\times\) 4 rows. Now, I predict \(\text{Pr}(y_{ij} = 1)\) for each observation. As a shortcut, I use the predict() function, though I could also use the longer \(\text{Pr}(y_{ij} = 1) = \frac{\exp(\mathbf{Xb})}{1 + \exp(\mathbf{Xb})}\) formulation.

m_pred_obs <- predict(object = m2,
                      newdata = mf_sim,
                      type = "response")

I then add the newly generated \(\text{Pr}(y_{ij} = 1)\) (which is a probability, ranging from 0 to 1) to the simulated dataframe and summarize the predictions across levels of dissatisdem for each country:

mf_pred <- cbind(m_pred_obs, mf_sim)
m_pred_obs_sum <- summarize(group_by(mf_pred, cname, dissatisdem),
                            pred = mean(m_pred_obs))
## `summarise()` has grouped output by 'cname'. You can override using the `.groups` argument.

Lastly, I plot the resulting probabilities:

ggplot(data = m_pred_obs_sum, aes(x = dissatisdem, y = pred)) + 
  geom_line() + 
  geom_point() +
  facet_wrap(~ cname, ncol = 4) + 
  theme_bw() + 
  xlab("Dissatisfaction with democracy") + 
  ylab("Pr(Voted)")

As noted above, this method doesn’t generate standard errors or confidence intervals around predictions because these are not easily defined. See ?predict.merMod.

How do Effect() and plot_model() come up with confidence intervals?

These packages (sjPlot and ggeffects, which rely on the effects package) use the delta method to calculate uncertainty (standard errors and confidence intervals) around predicted probabilities. You can find more information on this method in, for instance, Scott Long’s 2005 article “Confidence Intervals for Predicted Outcomes in Regression Models for Categorical Outcomes”. Tl;dr (a small sketch of the calculation follows the list below):

  • the delta method is an approximation for uncertainty around predictions and should be treated as such
  • the delta method allows us to generate uncertainty only for predictions based on “fixed effects”, but not “random effects” such as varying slopes.
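
To make the first point concrete, here is a minimal sketch of the delta method for a single covariate profile, using only the fixed effects of m; the profile values below are made-up assumptions for illustration:

# Hypothetical covariate profile, in the order of fixef(m):
# (Intercept), employed_f1, female_f1, dissatisdem, age, education, gdpgrowth
x_star <- c(1, 1, 0, 2, 40, 4, 2)
eta    <- sum(x_star * fixef(m))        # linear predictor on the logit scale
p_hat  <- plogis(eta)                   # predicted probability
se_eta <- sqrt(as.numeric(t(x_star) %*% as.matrix(vcov(m)) %*% x_star))  # SE of the linear predictor
se_p   <- p_hat * (1 - p_hat) * se_eta  # delta-method SE on the probability scale
c(estimate = p_hat, lower = p_hat - 1.96 * se_p, upper = p_hat + 1.96 * se_p)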

More postestimation for Bayesian multilevel models

Bayesian methods offer a “natural” option to carry estimation uncertainty over into predictions. Stay tuned for Tuesday…!


  1. Note that the capital E in the name of this function is important.↩︎

  2. Hanmer, Michael J. and Kalkan, Kerem Ozan. 2013. “Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models.” American Journal of Political Science 57 (1): 263–277.↩︎