What you should be able to do on your own after this exercise:
To begin this exercise, please quit and re-open RStudio on your computer.
Acknowledgement: Parts of this exercise draw on material from the book R for Data Science, the core text for this course. For citations of the R packages used here, please run citation("packagename") in R.
You’re invited to ask questions while you complete this exercise during our meeting, but I also suggest consulting the main resources recommended for this course. These are:
Using your computer’s tools or the “Files” Tab on the bottom right in RStudio, create a folder “Exercise 1” within your course folder.
Create an empty R script. Save it as “IntroR_Day1_Exercise.R” in your “Exercise 1” working directory. Within the script, type or copy & paste the following code in the first line to set the working directory to the same folder in which the script is located:
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
To start your script, load the “gapminder” package in R. You may have installed the package during lecture. If not, install the package now.
You also need to use the “tidyverse” package so you can use ggplot2 and other functions.
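A minimal sketch of this setup step, assuming both packages are available from CRAN under these names. The installation line is commented out so that it only runs when you need it.

```r
# Run this line once if the packages are not installed yet:
# install.packages(c("gapminder", "tidyverse"))

# Load the packages at the start of every session
library(gapminder)
library(tidyverse)
```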
The “gapminder” package has just one purpose: it provides a dataset in the form of an R package. (We will focus on importing other types of data on Wednesday.) To load the data into the workspace (so that it shows up in the “Environment” tab on the top right), use the data() function like so:
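The call described above would look roughly like this, assuming the package is installed and the dataset carries the same name as the package:

```r
library(gapminder)

# Copy the dataset into the workspace so it appears in the Environment tab
data(gapminder)
```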
As the first task, please take a look at the first few rows of the gapminder dataset. Hint: use the head() function. What are the variables, and what does one row of the dataset capture?
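As a starting point, one way to follow the hint is shown below. The names() call is an optional extra that lists only the variable names.

```r
library(gapminder)

# Show the first six rows (the default for head())
head(gapminder)

# Optional: list the variable names only
names(gapminder)
```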
How many rows and how many columns does the gapminder dataset have? Hint: You can look at the “Environment” tab on the top right or use the dim() function (check ?dim for help on how it works).
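A short sketch of the dim() approach. nrow() and ncol() are base R alternatives that return the two numbers separately.

```r
library(gapminder)

# dim() returns two numbers: rows first, then columns
dim(gapminder)

# The same information, one number at a time
nrow(gapminder)
ncol(gapminder)
```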
Next, you can access variables within the “gapminder” dataset using the $ operator in R, like so: gapminder$country. Please tabulate the continents in the dataset using the table() function.
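One possible way to combine the $ operator with table(). The object name continent_counts is only an illustrative choice.

```r
library(gapminder)

# Count how many rows belong to each continent
continent_counts <- table(gapminder$continent)
continent_counts
```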
For the next task, please calculate the mean GDP per capita across all countries and years in the data. Then, compare it to the median GDP per capita across all countries and years in the data.
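A sketch of this comparison using the base R functions mean() and median(). The object names mean_gdp and median_gdp are arbitrary.

```r
library(gapminder)

# Mean and median of GDP per capita over all country-year rows
mean_gdp <- mean(gapminder$gdpPercap)
median_gdp <- median(gapminder$gdpPercap)

mean_gdp
median_gdp
```

If the two numbers differ a lot, that usually says something about how skewed the variable is.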
Do richer countries have higher life expectancy? Create a scatterplot to answer this question.
Can you interpret the scatterplot in your own words?
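A minimal version of such a scatterplot might look like the following. It assumes wealth is measured by gdpPercap and loads ggplot2 directly, which the tidyverse also attaches. The name p_scatter is arbitrary.

```r
library(gapminder)
library(ggplot2)

# GDP per capita on the x-axis, life expectancy on the y-axis
p_scatter <- ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp))
p_scatter
```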
Taking a step back: what happens if you only run ggplot(data = gapminder)? Can you explain the result?
Are there different patterns in life expectancy across regions? What happens if you create a scatterplot of life expectancy (on the y-axis) across regions (on the x-axis)? How useful is this plot?
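The plot asked for here could be sketched as below, assuming “regions” refers to the continent variable. The jittered version is not part of the task. It is only one possible way to reduce overplotting, and the width and alpha values are arbitrary.

```r
library(gapminder)
library(ggplot2)

# Scatterplot with a categorical x-axis: points stack in vertical lines
p_cont <- ggplot(data = gapminder) +
  geom_point(mapping = aes(x = continent, y = lifeExp))
p_cont

# Possible alternative: spread the points horizontally and make them transparent
p_jitter <- ggplot(data = gapminder) +
  geom_jitter(mapping = aes(x = continent, y = lifeExp),
              width = 0.2, height = 0, alpha = 0.4)
p_jitter
```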
Are there different patterns in the relationship of wealth and life expectancy across regions? Create a version of your initial scatterplot that distinguishes data points by continent using different colors.
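One way to do this is to map continent to the color aesthetic inside aes(). A sketch, with an arbitrary object name:

```r
library(gapminder)
library(ggplot2)

# color inside aes() gives each continent its own color and adds a legend
p_color <- ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp, color = continent))
p_color
```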
That plot might look a bit cluttered, so… can you instead create small multiples of the initial scatterplots, one facet for each continent?
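Small multiples can be created with facet_wrap(). The sketch below assumes one panel per continent and leaves all other settings at their defaults.

```r
library(gapminder)
library(ggplot2)

# One panel (facet) per continent, sharing the same axes
p_facets <- ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp)) +
  facet_wrap(~ continent)
p_facets
```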
Now, let’s try to add time as a dimension. In the small multiples plot you just created, try to color each dot by the year in which the variables were measured.
What story does this graph tell?
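A possible version of the colored small multiples is sketched here. Because year is numeric, ggplot2 uses a continuous color gradient rather than distinct colors.

```r
library(gapminder)
library(ggplot2)

# Color each point by the year of measurement, one panel per continent
p_year <- ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp, color = year)) +
  facet_wrap(~ continent)
p_year
```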
Now, let’s add one more dimension. Try to indicate whether countries have larger or smaller populations, using the size of each data point. Is this plot informative? What might you do differently?
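Population can be mapped to the size aesthetic in the same way as the other variables. The sketch below builds on the faceted plot. The alpha value is an assumption added here to make overlapping points easier to see, not something the task requires.

```r
library(gapminder)
library(ggplot2)

# size inside aes() scales each point by population
p_size <- ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp,
                           color = year, size = pop),
             alpha = 0.5) +  # fixed transparency, set outside aes()
  facet_wrap(~ continent)
p_size
```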
Let’s go back to the initial scatterplot and add a trend line. What relationship does the line suggest?
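A trend line can be added with geom_smooth(). In this sketch the mapping is placed in ggplot() so that both layers share it. The smoothing method is left at its default, which ggplot2 chooses based on the number of observations and reports in a message.

```r
library(gapminder)
library(ggplot2)

# Points plus a smoothed trend line; both layers inherit the same mapping
p_trend <- ggplot(data = gapminder,
                  mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth()
p_trend
```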
Describe one variable in the dataset: how are the observations distributed across continents, and which continent provides the most observations?
How do you turn all bars purple? And how would you assign a separate color for each continent to each bar?
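The two questions above can be approached with geom_bar(). The sketch assumes a bar chart of continent counts. It contrasts setting a fixed fill outside aes() with mapping fill to a variable inside aes().

```r
library(gapminder)
library(ggplot2)

# Basic bar chart: number of observations per continent
p_bar <- ggplot(data = gapminder) +
  geom_bar(mapping = aes(x = continent))
p_bar

# All bars purple: fill is set to a constant, outside aes()
p_purple <- ggplot(data = gapminder) +
  geom_bar(mapping = aes(x = continent), fill = "purple")
p_purple

# One color per continent: fill is mapped to a variable, inside aes()
p_fill <- ggplot(data = gapminder) +
  geom_bar(mapping = aes(x = continent, fill = continent))
p_fill
```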
Pick one figure that you liked the most, and save it as a .png file to your working directory. Hint: use the ggsave() function.
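A sketch of saving a plot with ggsave(). The file name, dimensions (in inches), and resolution are arbitrary example values. Passing the plot explicitly through the plot argument avoids relying on whichever plot was displayed last.

```r
library(gapminder)
library(ggplot2)

# Store the plot in an object first
p_scatter <- ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp))

# Write the plot to a .png file in the current working directory
ggsave(filename = "gdp_lifeexp.png", plot = p_scatter,
       width = 6, height = 4, dpi = 300)
```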
Can you explain what is wrong or inefficient with each of the following lines of code?
ggplot(data = gapminder)
+ geom_point(mapping = aes(x = gdpPercap, y = lifeExp))
ggplot(data = gapminder) +
geom_point(mapping = aes(x = gdpPercap, y = lifeExp, fill = "blue"))
ggplot(data = gapminder) +
geom_bar(mapping = aes(x = continent, color = continent))
ggplot(data = gapminder) +
geom_bar(mapping = aes(x = continent, color = "continent"))
ggplot(data = gapminder) +
geom_bar(mapping = aes(x = continent), color = "blue")
Can you write the R code that created the figure below?
[Figure not reproduced here. R message printed with the figure: `geom_smooth()` using method = 'loess' and formula = 'y ~ x']
Thomas Leeper wrote a nice guide to printing high-quality figures in R for The Political Methodologist. You can find it in the print version of Volume 21, Issue 1. It is a bit dated, but it also explains the foundations of producing good figures with other software (Excel, Stata).