This tutorial shows you:

  • how to estimate a regression model with multiple predictors
  • how to present regression results graphically
  • how to calculate standardized regression coefficients

This tutorial only accompanies the other materials from our course. Be sure to consult those other course materials for a full coverage of this topic.

Basic setup

You’ve previously worked with one outcome variable \(y\) and one explanatory variable \(x\). The parameters you recovered via OLS were the intercept, \(\alpha\), and the slope coefficient \(\beta\). For today, you encounter a second explanatory variable \(x_2\), next to \(x_1\).

This also increases the number of slope coefficients to two: \(\beta_1\) is associated with \(x_1\), and \(\beta_2\) is associated with \(x_2\):

\[ \begin{aligned} y_i = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \end{aligned} \]

We can simulate the data-generating process for this \(y\) (cf. to lecture for explanation):

set.seed(123)
n.obs <- 50
x1 <- rnorm(n = n.obs, mean = 5, sd = 2)
x2 <- runif(n = n.obs, min = -7, max = 7)
i <- c(1:n.obs)
e <- rnorm(n = n.obs, mean = 0, sd = 2)
a <- 2
b1 <- 0.5
b2 <- -0.75
y <- a + b1 * x1 + b2 * x2 + e
sim.dat <- data.frame(y, x1, x2)

Here, I choose to create a new dataframe, sim.dat, that contains the three variables we’re working with: \(y\), \(x_1\), and \(x_2\).

When estimating a linear model in statistical software, in our case R, the only change you have to make to add the additional explanatory variable(s) to the formula for the model you estimate:

mod <- lm(y ~ x1 + x2, data = sim.dat)
summary(mod)
## 
## Call:
## lm(formula = y ~ x1 + x2, data = sim.dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1279 -1.2674 -0.2188  1.0713  4.2181 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.55417    0.71122   3.591 0.000784 ***
## x1           0.39331    0.13211   2.977 0.004589 ** 
## x2          -0.78539    0.05943 -13.215  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.709 on 47 degrees of freedom
## Multiple R-squared:  0.792,  Adjusted R-squared:  0.7832 
## F-statistic: 89.48 on 2 and 47 DF,  p-value: < 2.2e-16

This lm object has some elements that you can use later on. You can view the structure of the model object with the str() function:

str(mod)
## List of 12
##  $ coefficients : Named num [1:3] 2.554 0.393 -0.785
##   ..- attr(*, "names")= chr [1:3] "(Intercept)" "x1" "x2"
##  $ residuals    : Named num [1:50] 1.96 -0.722 -2.135 0.582 -0.279 ...
##   ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:50] -30.961 3.559 -22.578 -0.185 -0.485 ...
##   ..- attr(*, "names")= chr [1:50] "(Intercept)" "x1" "x2" "" ...
##  $ rank         : int 3
##  $ fitted.values: Named num [1:50] 2.98 6.178 5.872 -0.421 4.81 ...
##   ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
##  $ assign       : int [1:3] 0 1 2
##  $ qr           :List of 5
##   ..$ qr   : num [1:50, 1:3] -7.071 0.141 0.141 0.141 0.141 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:50] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:3] "(Intercept)" "x1" "x2"
##   .. ..- attr(*, "assign")= int [1:3] 0 1 2
##   ..$ qraux: num [1:3] 1.14 1.03 1.06
##   ..$ pivot: int [1:3] 1 2 3
##   ..$ tol  : num 1e-07
##   ..$ rank : int 3
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 47
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = y ~ x1 + x2, data = sim.dat)
##  $ terms        :Classes 'terms', 'formula'  language y ~ x1 + x2
##   .. ..- attr(*, "variables")= language list(y, x1, x2)
##   .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:3] "y" "x1" "x2"
##   .. .. .. ..$ : chr [1:2] "x1" "x2"
##   .. ..- attr(*, "term.labels")= chr [1:2] "x1" "x2"
##   .. ..- attr(*, "order")= int [1:2] 1 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(y, x1, x2)
##   .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:3] "y" "x1" "x2"
##  $ model        :'data.frame':   50 obs. of  3 variables:
##   ..$ y : num [1:50] 4.941 5.456 3.737 0.161 4.531 ...
##   ..$ x1: num [1:50] 3.88 4.54 8.12 5.14 5.26 ...
##   ..$ x2: num [1:50] 1.4 -2.34 -0.159 6.363 -0.239 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language y ~ x1 + x2
##   .. .. ..- attr(*, "variables")= language list(y, x1, x2)
##   .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:3] "y" "x1" "x2"
##   .. .. .. .. ..$ : chr [1:2] "x1" "x2"
##   .. .. ..- attr(*, "term.labels")= chr [1:2] "x1" "x2"
##   .. .. ..- attr(*, "order")= int [1:2] 1 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(y, x1, x2)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:3] "y" "x1" "x2"
##  - attr(*, "class")= chr "lm"

We can extract some of these quantities with commands that you’re already familiar with:

coef(mod)
## (Intercept)          x1          x2 
##   2.5541711   0.3933124  -0.7853943
plot(x = fitted.values(mod), 
     y = residuals(mod))