**This tutorial shows you**:

- how to specify quadratic terms in regression models
- how to explore nonlinear relationships using lo(w)ess smoothers and generalized additive models
- how to use residuals to interpret model quality

As always, please note that this tutorial only accompanies the other course materials and that you are expected to have worked through assigned reading before tackling this tutorial.

So far, we have not encountered serious violations of the assumption of linearity - a linear relationship between predictors and outcome. But this assumption simply means that we impose a linear structure on the relationship between \(x\) and \(y\). Coefficient estimates from a regression model alone will not reveal whether the relationship between \(x\) and \(y\) in your data actually is linear, but a scatterplot is a useful way to investigate whether this might be the case.
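A quick way to run this kind of check is to overlay a smoother on the scatterplot. The following is a minimal sketch in base R using simulated data; the variable names, the seed, and the data-generating process are assumptions made for illustration and are not the data used elsewhere in this tutorial.

```r
# Simulated data for illustration only: y follows an inverted U in x.
set.seed(123)
n <- 200
x <- runif(n, min = -3, max = 3)
y <- 2 + 0.5 * x - 1.5 * x^2 + rnorm(n, sd = 1)

# Linear fit: imposes a straight line regardless of the true shape.
fit_lin <- lm(y ~ x)

# Scatterplot with the linear fit (dashed) and a lowess smoother (solid).
plot(x, y, pch = 16, col = "grey50",
     main = "Linear fit vs. lowess smoother")
abline(fit_lin, lty = 2, lwd = 2)
smooth <- lowess(x, y)  # list with sorted x values and smoothed y values
lines(smooth, lwd = 2)
```

If the solid smoother bends away from the dashed regression line, as it should with these simulated data, the linear structure is probably a poor description of the relationship.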

Theories often make predictions of the form, “as \(x\) increases, \(y\) first increases, and then drops again”. An example of this is the Kuznets curve in economics, which suggests that as countries developed, income inequality first increased, peaked, and then decreased (summarized, for instance, in Acemoglu and Robinson 2002). This implies a so-called curvilinear relationship between economic development and inequality: both poor and rich countries should have low inequality, while middle-income countries should exhibit high levels of inequality.
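One common way to write down such a curvilinear relationship is a quadratic specification. The symbols \(\beta_0\), \(\beta_1\), \(\beta_2\), and \(\varepsilon\) are notation introduced here for illustration:

\[
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon
\]

An inverted U of the kind the Kuznets curve describes corresponds to \(\beta_2 < 0\). Under this specification, the predicted value of \(y\) peaks where the slope with respect to \(x\) is zero:

\[
\beta_1 + 2 \beta_2 x = 0 \quad \Longrightarrow \quad x^{*} = -\frac{\beta_1}{2 \beta_2}
\]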

Take the following example:

```
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -91.650  -5.757   3.239   9.980  28.822
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -8.1480     0.8235  -9.895  < 2e-16 ***
## x             0.8155     0.2559   3.186  0.00155 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.02 on 425 degrees of freedom
## Multiple R-squared: 0.02333, Adjusted R-squared: 0.02103
## F-statistic: 10.15 on 1 and 425 DF, p-value: 0.001547
```

You might notice the low \(R^2\) value, but that in itself does not indicate a problem. Examining the residual plots, however, reveals that the model produces residuals that are grouped below 0 at low and high values of \(x\):

`## Loading required package: carData`
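A residual plot of this kind can be sketched in base R as follows. The data are simulated for illustration (the seed, coefficients, and variable names are assumptions, not the tutorial's data), and the numerical summary by thirds of \(x\) is an added convenience rather than part of the original analysis.

```r
# Simulated data for illustration only (not the tutorial's data).
set.seed(42)
n <- 300
x <- runif(n, min = -3, max = 3)
y <- 1 + 0.8 * x - 2 * x^2 + rnorm(n, sd = 2)

# Misspecified model: a straight line fitted to curved data.
fit_lin <- lm(y ~ x)

# Residuals against the predictor, with a lowess curve to show the pattern.
res <- resid(fit_lin)
plot(x, res, xlab = "x", ylab = "Residuals", pch = 16, col = "grey50")
abline(h = 0, lty = 2)
lines(lowess(x, res), lwd = 2)

# Mean residual in the lowest, middle, and highest third of x.
thirds <- cut(x, breaks = quantile(x, probs = c(0, 1/3, 2/3, 1)),
              include.lowest = TRUE, labels = c("low", "middle", "high"))
res_by_third <- tapply(res, thirds, mean)
print(round(res_by_third, 2))
```

With a well-specified model, the lowess curve should stay close to the dashed zero line and the three group means should all be near zero. Here the means are negative in the outer thirds and positive in the middle, which is the signature of an omitted curvilinear term.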