This tutorial shows you:

  • how to specify regression models with interaction terms
  • (briefly) how to interpret interaction terms (refer to AR ch. 7 and Brambor et al. for a more detailed treatment)
  • how to graph marginal/conditional effects from regression estimates

As always, please note that this tutorial only accompanies the other course materials and that you are expected to have worked through assigned reading before tackling this tutorial.

Note: You must have read Brambor, Clark & Golder (2006) before working on this tutorial.

Interaction terms in regression

Interaction terms enter the regression equation as the product of two constitutive terms, \(x_1\) and \(x_2\). For this product term \(x_1 x_2\), the regression equation adds a separate coefficient \(\beta_3\):

\[ y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon \]
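In R's formula interface, such a model is typically specified with the * operator, which expands to the constitutive terms plus their product. Here is a minimal sketch, assuming a (hypothetical) data frame dat that contains y, x1, and x2:

m1 <- lm(y ~ x1 * x2, data = dat)          # x1 * x2 expands to x1 + x2 + x1:x2
m2 <- lm(y ~ x1 + x2 + x1:x2, data = dat)  # equivalent, spelled out explicitly

Both calls estimate the same model; the * shorthand simply guarantees that the constitutive terms are included alongside the product term.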

Example 1: one binary, one continuous term

When thinking about interaction terms, it helps to first work through the predictions of the regression equation for different values of the two predictors, \(x_1\) and \(x_2\). Imagine a continuous outcome \(y\), e.g. the income of Hollywood actors, that we predict with two variables. The first, \(x_1\), is a binary variable such as female gender; it takes on values of 0 (males) and 1 (females) only. The second, \(x_2\), is a continuous variable, ranging from \(-5\) to \(+5\), such as a (centered and standardized) measure of age. An interaction term expresses the idea that the effect of one variable depends on the value of the other variable (here, I use the term “effect” loosely and non-causally). With these variables, this means that the effect of age on actors’ income differs between male and female actors.

  • \(\beta_1\) is the effect of \(x_1\) on \(y\) when \(x_2\) is 0:

    • \(\hat{y} = \alpha + \beta_1 x_1 + \beta_2 \times 0 + \beta_3 x_1 \times 0\)
    • \(\hat{y} = \alpha + \beta_1 x_1 + 0 + 0\)
  • \(\beta_2\) is the effect of \(x_2\) on \(y\) when \(x_1\) is 0:

    • \(\hat{y} = \alpha + \beta_1 \times 0 + \beta_2 x_2 + \beta_3 \times 0 \times x_2\)
    • \(\hat{y} = \alpha + \beta_2 x_2 + 0\)
  • When neither \(x_1\) nor \(x_2\) is 0, \(\beta_3\) becomes important: the effect of \(x_1\) now varies with the value of \(x_2\) (and vice versa). We can plug in 1 for \(x_1\) and simplify the equation as follows (a short numerical illustration follows this list):

    • \(\hat{y} = \alpha + \beta_1 \times 1 + \beta_2 x_2 + \beta_3 \times 1 \times x_2\)
    • \(\hat{y} = \alpha + \beta_1 + \beta_2 x_2 + \beta_3 \times x_2\)
    • \(\hat{y} = (\alpha + \beta_1) + (\beta_2 + \beta_3) \times x_2\)
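To make the last step concrete, here is a small numerical illustration in R. The coefficient values are purely illustrative (they are the same values used in the simulation below): for \(x_1 = 0\) the prediction line has intercept \(\alpha\) and slope \(\beta_2\), while for \(x_1 = 1\) it has intercept \(\alpha + \beta_1\) and slope \(\beta_2 + \beta_3\).

a <- 5; b1 <- 3; b2 <- 4; b3 <- -3         # illustrative coefficient values
x2.vals <- -5:5                            # a few values of the continuous predictor
yhat.x1.0 <- a + b2 * x2.vals              # prediction when x1 = 0: intercept a, slope b2
yhat.x1.1 <- (a + b1) + (b2 + b3) * x2.vals  # prediction when x1 = 1: intercept a + b1, slope b2 + b3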

With simulated data, this can be illustrated easily. I begin by simulating a dataset with 200 observations, two predictors \(x_1\) (binary: male/female) and \(x_2\) (continuous: standardized and centered age), and create \(y\) (continuous: income) as the linear combination of \(x_1\), \(x_2\), and an interaction term of the two predictors.

set.seed(123)                                   # for reproducibility
n.sample <- 200                                 # number of observations
x1 <- rbinom(n.sample, size = 1, prob = 0.5)    # binary predictor (0 = male, 1 = female)
x2 <- runif(n.sample, -5, 5)                    # continuous predictor (centered, standardized age)
a <- 5                                          # intercept (alpha)
b1 <- 3                                         # coefficient of x1 (beta_1)
b2 <- 4                                         # coefficient of x2 (beta_2)
b3 <- -3                                        # coefficient of the interaction term (beta_3)
e <- rnorm(n.sample, 0, 5)                      # error term
y <- a + b1 * x1 + b2 * x2 + b3 * x1 * x2 + e   # outcome as a linear combination plus error
sim.dat <- data.frame(y, x1, x2)                # combine everything into a data frame
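As a quick sanity check (just a sketch, not required for what follows), one can fit the interaction model to the simulated data and verify that the estimated coefficients come close to the values used above:

m.sim <- lm(y ~ x1 * x2, data = sim.dat)   # x1 * x2 expands to x1 + x2 + x1:x2
coef(m.sim)                                # estimates should be close to a = 5, b1 = 3, b2 = 4, b3 = -3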

Here is what the simulated data look like:

par(mfrow = c(1, 3))   # arrange the three histograms side by side
hist(sim.dat$y)
hist(sim.dat$x1)
hist(sim.dat$x2)

par(mfrow = c(1, 1))   # reset the plotting layout

For convenience, you could also use the multi.hist() function from the “psych” package. It automatically adds a density curve (dashed line) and a normal curve (dotted line) to each histogram.

library("psych")
multi.hist(sim.dat, nrow = 1)