Day 1: First steps in R - Exercises

Intro to R (ESS 2023)
Affiliation: Ursinus College
Published: June 27, 2023

Goals

What you should be able to do on your own after this exercise:

  • Create a working directory on your computer
  • Create an R script
  • Load a package
  • Use basic R functions
  • Create figures and save them in your working directory
  • Look for help online

Setup

To begin this exercise, please quit and re-open RStudio on your computer.

Acknowledgement: Parts of this exercise draw on material from the book R for Data Science, the core text for this course. For citations of the R packages used here, run citation("packagename") in R.

1. Look for help online

You’re welcome to ask questions during our meeting while you complete this exercise, but I also encourage you to consult the main resources recommended in this course.

2. Create a directory for this exercise

Using your computer’s file manager or the “Files” tab on the bottom right in RStudio, create a folder named “Exercise 1” within your course folder.
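
If you prefer to create the folder from the R console, base R's dir.create() can do it. The following is a minimal sketch, not part of the original exercise. The variable course_dir is a hypothetical placeholder built on tempdir() so the code runs anywhere; replace it with the real path to your course folder.

```r
# Hypothetical course folder. tempdir() is used only so that this sketch
# runs on any computer; replace it with the path to your own course folder.
course_dir <- file.path(tempdir(), "IntroR_course")

# Build the path to the exercise folder and create it.
# recursive = TRUE also creates course_dir if it does not exist yet.
exercise_dir <- file.path(course_dir, "Exercise 1")
dir.create(exercise_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm that the folder now exists.
dir.exists(exercise_dir)
```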

3. Create an R script

Create an empty R script. Save it as “IntroR_Day1_Exercise.R” in your “Exercise 1” working directory. Within the script, type or copy & paste the following code in the first line to set the working directory to the same folder in which the script is located:

setwd(dirname(rstudioapi::getSourceEditorContext()$path))
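
Note that the line above relies on the rstudioapi package and only works when the script is run inside RStudio. A slightly more defensive variant, offered here as a sketch rather than as part of the original exercise, checks for RStudio first and then prints the working directory so you can confirm the result. The variable name script_path is an assumption made for illustration.

```r
# Only try to change the working directory when running inside RStudio;
# outside RStudio the rstudioapi call would stop with an error.
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  script_path <- rstudioapi::getSourceEditorContext()$path
  # A script that has not been saved yet has no usable path,
  # so only switch directories when a path is available.
  if (length(script_path) == 1 && nzchar(script_path)) {
    setwd(dirname(script_path))
  }
}

# Print the current working directory to check where files will be saved.
getwd()
```

If getwd() does not show your “Exercise 1” folder, save the script there first and run the code again.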

4. Load a package

To start your script, load the “gapminder” package in R. You may have installed the package during lecture. If not, install the package now.

You also need to load the “tidyverse” package, which provides ggplot2 and other functions used in this exercise.
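
One possible way to handle both steps (installing if needed, then loading) is sketched below. It assumes an internet connection for the installation step; the CRAN mirror URL and the variable names pkgs and is_installed are choices made for this sketch, not requirements of the course.

```r
# Packages this exercise relies on.
pkgs <- c("gapminder", "tidyverse")

# requireNamespace() returns TRUE if a package is installed and FALSE otherwise.
is_installed <- sapply(pkgs, requireNamespace, quietly = TRUE)

# Install only the packages that are missing (needs an internet connection).
if (any(!is_installed)) {
  install.packages(pkgs[!is_installed], repos = "https://cloud.r-project.org")
}

# Attach the packages so their functions and data can be used by name.
library(gapminder)
library(tidyverse)
```

Installing a package is needed only once per computer, whereas library() has to be run in every new R session.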

5. Use basic R functions

  1. The “gapminder” package has just one purpose: it provides a dataset in the form of an R package. (We will focus on importing other types of data on Wednesday.) To load the data into the workspace (so that it shows up in the “Environment” tab on the top right), use the data() function like so: data(gapminder)

  2. As the first task, please take a look at the first few rows of the gapminder dataset. Hint: use the head() function. What are the variables, and what does one row of the dataset capture?

  3. How many rows and how many columns does the gapminder dataset have? Hint: You can look at the “Environment” tab on the top right or use the dim() function (check ?dim for help on how it works).

  4. Next, you can access variables within the “gapminder” dataset using the $ operator in R, like so: gapminder$country. Please tabulate the continents in the dataset using the table() function.

  5. For the next task, please calculate the mean GDP per capita across all countries and years in the data. Then, compare it to the median GDP per capita across all countries and years in the data.
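
If you are unsure how the functions mentioned above behave, it can help to try them on a different dataset first. The sketch below uses iris, a small data frame that ships with base R, so it does not give away the answers for gapminder. The variable names iris_dim and species_counts are assumptions made for illustration.

```r
# iris is a small example data frame included with base R.
head(iris)                           # first six rows of the data frame

iris_dim <- dim(iris)                # number of rows, then number of columns
iris_dim

species_counts <- table(iris$Species)  # number of observations per category
species_counts

mean(iris$Sepal.Length)              # arithmetic mean of a numeric column
median(iris$Sepal.Length)            # middle value of the same column
```

The same calls work on gapminder once you swap in the dataset name and the variable you are interested in.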

6. Tell stories with graphs

  1. Do richer countries have higher life expectancy? Create a scatterplot to answer this question.

  2. Can you interpret the scatterplot in your own words?

  3. Taking a step back: what happens if you only run ggplot(data = gapminder)? Can you explain the result?

  4. Are there different patterns in life expectancy across regions? What happens if you create a scatterplot of life expectancy (on the y-axis) across regions (on the x-axis)? How useful is this plot?

  5. Are there different patterns in the relationship of wealth and life expectancy across regions? Create a version of your initial scatterplot that distinguishes data points by continent using different colors.

  6. That plot might look a bit cluttered, so… can you instead create small multiples of the initial scatterplots, one facet for each continent?

  7. Now, let’s try to add time as a dimension. In the small multiples plot you just created, try to color each dot by the year in which the variables were measured.

  8. What story does this graph tell?

  9. Now, let’s add one more dimension. Try to indicate whether countries have larger or smaller populations, using the size of each data point. Is this plot informative? What might you do differently?

  10. Let’s go back to the initial scatterplot and add a trend line. What relationship does the line suggest?

  11. Describe one variable in the dataset: how are the observations distributed across continents, and which continent provides the most observations? Hint: a bar chart works well here.

  12. How do you turn all bars purple? And how would you give the bar for each continent its own color?
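
As a reminder of the general ggplot2 pattern, here is a sketch that builds the same kinds of plots on a different dataset: mpg, which ships with ggplot2. It is meant as a template under that assumption, not as a solution; the object names (p_base, p_color, and so on) are made up for illustration.

```r
library(ggplot2)

# Scatterplot: engine size (displ) on the x-axis, highway mileage (hwy) on the y-axis.
p_base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Map a third variable to color by placing it inside aes().
p_color <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Small multiples: one panel for each value of class.
p_facet <- p_base + facet_wrap(~ class)

# Trend line: geom_smooth() gets its own x and y mapping here because
# the mapping above was set inside geom_point(), not in ggplot().
p_smooth <- p_base +
  geom_smooth(mapping = aes(x = displ, y = hwy))

# Bar chart: geom_bar() counts the observations in each category.
p_bar <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class))

# Printing a plot object draws it in the "Plots" tab.
print(p_base)
```

To adapt the template, replace mpg with gapminder and choose the variables that match each question.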

7. Save figures in your working directory

Pick the figure you liked most and save it as a .png file in your working directory. Hint: use the ggsave() function.
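
A minimal sketch of how ggsave() can be used, again with the mpg dataset from ggplot2 instead of gapminder. The file name, the plot size, and the resolution are arbitrary choices made for this example.

```r
library(ggplot2)

# Store the plot in an object so it can be passed to ggsave().
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Save the plot to the working directory.
# width and height are in inches by default; dpi sets the resolution.
ggsave(filename = "example_scatterplot.png", plot = p,
       width = 6, height = 4, dpi = 300)

# Check that the file was written.
file.exists("example_scatterplot.png")
```

If you leave out the plot argument, ggsave() saves the last plot that was displayed.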

8. Common error sources

Can you explain what is wrong or inefficient with each of the following lines of code?

ggplot(data = gapminder)
+ geom_point(mapping = aes(x = gdpPercap, y = lifeExp))

ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp, fill = "blue"))

ggplot(data = gapminder) +
  geom_bar(mapping = aes(x = continent, color = continent))

ggplot(data = gapminder) +
  geom_bar(mapping = aes(x = continent, color = "continent"))

ggplot(data = gapminder) +
  geom_bar(mapping = aes(x = continent), color = "blue")

9. Bonus question

Can you write the R code that created the figure below?

[Figure not shown. Console message printed along with the figure: `geom_smooth()` using method = 'loess' and formula = 'y ~ x']

Preview: further reading on creating high-quality plots

Thomas Leeper wrote a nice guide to printing high-quality figures in R for the Political Methodologist. You can find it here: http://thepoliticalmethodologist.com and in the print version of Volume 21, Issue 1. It explains the steps above in more detail and provides some additional information on how to produce good figures with other software (Excel, Stata) as well.