This tutorial shows you:
Note on copying & pasting code from the PDF version of this tutorial: Please note that you may run into trouble if you copy & paste code from the PDF version of this tutorial into your R script. When the PDF is created, some characters (for instance, quotation marks or indentations) are converted into non-text characters that R won’t recognize. To use code from this tutorial, please type it yourself into your R script or you may copy & paste code from the source file for this tutorial which is posted on my website.
Note on R functions discussed in this tutorial: I don't discuss many of these functions in detail here, so I encourage you to look up their help files or search the web for them before you use them. This will help you understand the functions better. Each of these functions is well documented, either in its help file (which you can access in R by typing ?ifelse, for instance) or on the web. The Companion to Applied Regression (see our syllabus) also provides many detailed explanations.
As always, please note that this tutorial only accompanies the other materials for Day 9 and that you are expected to have worked through the reading for that day before tackling this tutorial.
Some of you are working with so-called dummy variables in your replication assignments, so we will briefly explore how these variables are used in multiple regression. Dummy variables are also explained well in chapter 7 of AR (assigned on Day 11), but it doesn't hurt to explore them earlier. Dummy variables are binary indicators that are set to 1 for all observations that fall into a particular category and to 0 for all other observations. For instance, in a survey, a dummy variable for married respondents would be coded as follows:
\[ \text{married} = \begin{cases} 1, & \text{if respondent is married}\\ 0, & \text{otherwise} \end{cases} \]
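Here is a minimal sketch of this coding rule in R. It uses a small made-up data frame with a hypothetical marital variable, which is not part of the dataset used below. ifelse() builds the dummy. Regressing an outcome on a single dummy then recovers the difference in group means: the intercept is the mean of the 0 group, and the slope is the difference between the mean of the 1 group and the mean of the 0 group.

```r
# Hypothetical toy survey: marital status and a 0-100 feeling thermometer
survey <- data.frame(
  marital = c("married", "single", "married", "divorced", "single", "married"),
  therm   = c(70, 50, 80, 40, 60, 75)
)

# Dummy: 1 if the respondent is married, 0 otherwise
survey$married <- ifelse(survey$marital == "married", 1, 0)

# Regress the thermometer on the dummy
m.dummy <- lm(therm ~ married, data = survey)
coef(m.dummy)  # intercept 50, slope 25

# Compare with the group means: 50 (not married), 75 (married)
tapply(survey$therm, survey$married, mean)
```

The slope (25) is exactly the married mean minus the non-married mean (75 - 50). Once you add further predictors, the dummy's coefficient is interpreted as this difference, holding the other predictors constant.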
We'll start with an example dataset that I've taken from the accompanying materials to Kellstedt and Whitten's Fundamentals of Political Science Research. This dataset is a modified extract from the 1996 edition of the (American) National Election Studies. It has 1714 observations and 8 variables:
Variable | Description |
---|---|
demrep | Party identification (1 = strong Democrat, 7 = strong Republican) |
clinton.therm | Feeling thermometer toward Hillary Clinton (0 to 100) |
dem.therm | Feeling thermometer toward the Democrats (0 to 100) |
female | Female (1 = yes, 0 = no) |
age | Age in years |
educ | Education (1 = 8 grades or less, 7 = advanced degree) |
income | Income (1 = less than $2,999, 24 = $105,000 or more) |
region | Northeast, North Central, South, or West |
library(rio)  # provides import(), which reads the CSV file directly from the URL
nes.dat <- import("https://www.dropbox.com/s/24ktov8o7wcn3l2/nes1996subset.csv?dl=1")
summary(nes.dat)
## demrep clinton.therm dem.therm female
## Min. :1.000 Min. : 0.00 Min. : 0.00 Min. :0.0000
## 1st Qu.:3.000 1st Qu.: 30.00 1st Qu.: 40.00 1st Qu.:0.0000
## Median :4.000 Median : 60.00 Median : 60.00 Median :1.0000
## Mean :4.327 Mean : 52.81 Mean : 58.86 Mean :0.5519
## 3rd Qu.:5.000 3rd Qu.: 70.00 3rd Qu.: 70.00 3rd Qu.:1.0000
## Max. :7.000 Max. :100.00 Max. :100.00 Max. :1.0000
## NA's :385 NA's :29 NA's :27
## age educ income region
## Min. :18.00 Min. :1.000 Min. : 1.00 Length:1714
## 1st Qu.:34.00 1st Qu.:3.000 1st Qu.:11.00 Class :character
## Median :44.00 Median :4.000 Median :16.00 Mode :character
## Mean :47.54 Mean :4.105 Mean :15.03
## 3rd Qu.:61.00 3rd Qu.:6.000 3rd Qu.:20.00
## Max. :93.00 Max. :7.000 Max. :24.00
## NA's :2 NA's :3 NA's :150
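The summary shows that region is stored as a character variable (Class :character). In a model formula, R treats such a variable as categorical and builds the dummies itself. It creates one 0/1 column for every category except a reference category, and the mean of the reference category is absorbed by the intercept. Below is a minimal sketch using five made-up respondents. It assumes the region labels are spelled as in the table above, which I have not checked against the actual file. model.matrix() shows the dummy columns that lm() would use:

```r
# Hypothetical respondents; region labels assumed to match the codebook table
toy <- data.frame(
  region = c("Northeast", "North Central", "South", "West", "South")
)

# Set the levels explicitly so that Northeast is the reference category
# (without this, R would use the first level in alphabetical order)
toy$region <- factor(toy$region,
                     levels = c("Northeast", "North Central", "South", "West"))

# Intercept column plus one 0/1 dummy per non-reference region
X <- model.matrix(~ region, data = toy)
X
```

The Northeast respondent (row 1) has 0 in every dummy column. In a model such as lm(clinton.therm ~ region, data = nes.dat), each region coefficient would therefore be the difference in mean thermometer scores between that region and the reference region.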
Say you are interested in explaining why some respondents feel more warmly toward Hillary Clinton than others do. You could use bivariate regression to test the (somewhat obvious) argument that more Republican respondents rate Clinton less favorably than more Democratic respondents. First, you may want to center the party ID variable on its median (4, the midpoint of the 7-point scale) for ease of interpretation:
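# Center party ID on its median so that 0 = the median respondent
nes.dat$demrep.ctr <- nes.dat$demrep - median(nes.dat$demrep, na.rm = TRUE)
# Bivariate regression of the Clinton thermometer on centered party ID
m1 <- lm(clinton.therm ~ demrep.ctr, data = nes.dat)
# Scatterplot (jittered, because party ID takes only 7 values) with the fitted line
plot(x = jitter(nes.dat$demrep.ctr), y = nes.dat$clinton.therm,
     xlab = "Party ID (Democratic -> Republican)",
     ylab = "Clinton thermometer")
abline(m1, col = "red")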
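Centering changes only the intercept and leaves the slope unchanged. After you subtract the median, the intercept is the predicted thermometer score for a respondent at the median party ID, rather than at a party ID of 0 (a value outside the 1 to 7 scale). A quick check on simulated data makes this concrete. The data are made up: party and therm merely stand in for demrep and clinton.therm, and the coefficients used to generate them are arbitrary assumptions.

```r
set.seed(42)
# Simulated 7-point party ID and a thermometer that declines with it
party <- sample(1:7, 200, replace = TRUE)
therm <- 90 - 8 * party + rnorm(200, sd = 10)

# Center on the median, as in the tutorial code
party.ctr <- party - median(party)

m.raw <- lm(therm ~ party)
m.ctr <- lm(therm ~ party.ctr)

# The slope is identical in both models
coef(m.raw)[2]
coef(m.ctr)[2]

# The centered intercept equals the raw model's prediction at the median party ID
coef(m.ctr)[1]
predict(m.raw, newdata = data.frame(party = median(party)))
```

The same logic applies to m1: its intercept is the predicted Clinton thermometer score for a respondent at the median of demrep.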