What you should be able to do on your own after this exercise:
To begin this exercise, please quit and re-open RStudio on your computer.
Acknowledgement: Parts of this exercise are based on materials from the book R for Data Science, the core text for this course. For citations of the R packages used here, please refer to citation("packagename").
You’re welcome to ask questions while you work on this exercise during our meeting, but I also recommend consulting the main resources for our course. These are:
Using your computer’s file manager or the “Files” tab in the bottom-right pane of RStudio, create a folder “Exercise 2” within your course folder.
Create an empty R script and save it as “IntroR_Day2_Exercise.R” in your “Exercise 2” folder. In the first line of the script, type or copy and paste the following code, which sets the working directory to the folder that contains the script:
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
To start your script, load the following packages: “tidyverse”, “gapminder”, and “nycflights13”.
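A minimal way to do this is one library() call per package at the top of the script. This assumes the three packages are already installed; if one is missing, install.packages("gapminder") (or the relevant name) installs it once.

```r
# Load the packages used throughout this exercise
library(tidyverse)    # dplyr, ggplot2, and the other core tidyverse packages
library(gapminder)    # provides the gapminder data frame
library(nycflights13) # provides flights, airlines, airports, and related tables
```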
Create a named object that contains all integers (whole numbers) from 1960 to 2022. How many elements does this object have? Hint: use the seq() and length() functions and the approach discussed in lecture.
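One possible answer is sketched below. The object name years is an arbitrary choice. seq() uses a step of 1 by default, so it returns every whole number from 1960 to 2022. Writing 1960:2022 gives the same result.

```r
# All whole numbers from 1960 to 2022 (seq() steps by 1 by default)
years <- seq(from = 1960, to = 2022)

# Number of elements: 2022 - 1960 + 1
length(years)  # 63
```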
Using the gapminder data, find each of the following. Remember the tricks to print/view all observations!
country-years that had a life expectancy over 80
the country-year with the highest GDP per capita in the dataset
the 10 country-years with the highest GDP per capita in the dataset
the 10 countries with the highest GDP per capita in 2007 in the dataset
the average life expectancy for each continent in 1952 and 2007
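Here is one way to approach the five questions above with dplyr. The object names (lifeExp_over80, top1_gdpPercap, and so on) are only illustrative. print(n = Inf) is one of the ways to see every row of a tibble; View() opens the data in RStudio's viewer instead.

```r
library(tidyverse)
library(gapminder)

# Country-years with a life expectancy over 80
lifeExp_over80 <- gapminder |> filter(lifeExp > 80)
print(lifeExp_over80, n = Inf)  # show all rows, not just the first 10

# The single country-year with the highest GDP per capita
top1_gdpPercap <- gapminder |> slice_max(gdpPercap, n = 1)
top1_gdpPercap

# The 10 country-years with the highest GDP per capita
top10_gdpPercap <- gapminder |> arrange(desc(gdpPercap)) |> head(10)
top10_gdpPercap

# The 10 countries with the highest GDP per capita in 2007
top10_2007 <- gapminder |>
  filter(year == 2007) |>
  slice_max(gdpPercap, n = 10)
top10_2007

# Average life expectancy for each continent in 1952 and 2007
lifeExp_by_continent <- gapminder |>
  filter(year %in% c(1952, 2007)) |>
  group_by(continent, year) |>
  summarize(mean_lifeExp = mean(lifeExp), .groups = "drop")
lifeExp_by_continent
```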
Knowing that GDP per capita is a country’s GDP divided by its population, can you use the two variables pop and gdpPercap to recover countries’ GDP?
Can you create a histogram of this variable for the year 2007?
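The sketch below covers both questions. It assumes the new column is called gdp, since the exercise leaves the name up to you. Because GDP per capita equals GDP divided by population, multiplying it by pop recovers GDP. The bins = 30 setting is simply ggplot2's default made explicit.

```r
library(tidyverse)
library(gapminder)

# GDP = GDP per capita * population ("gdp" is our own column name)
gapminder <- gapminder |> mutate(gdp = gdpPercap * pop)

# Histogram of GDP in 2007
gdp_hist <- gapminder |>
  filter(year == 2007) |>
  ggplot(aes(x = gdp)) +
  geom_histogram(bins = 30)
gdp_hist
```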
You just created this new variable, but the histogram suggests that it might be useful to transform it. Section 5.5.1 in R4DS suggests the log transformation. Create a new variable, log2Gdp, and generate a histogram for this variable in 2007.
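A possible solution follows. It recreates the assumed gdp column so it runs on its own, then applies log2(), the base-2 logarithm.

```r
library(tidyverse)
library(gapminder)

gapminder <- gapminder |>
  mutate(gdp = gdpPercap * pop,  # GDP, as in the previous step
         log2Gdp = log2(gdp))    # base-2 logarithm of GDP

log2gdp_hist <- gapminder |>
  filter(year == 2007) |>
  ggplot(aes(x = log2Gdp)) +
  geom_histogram(bins = 30)
log2gdp_hist
```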
You decide that you’d rather use the natural log and delete the log2Gdp variable from the data. How do you perform both operations?
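One way to do both in a single mutate() call is shown below. Setting a column to NULL inside mutate() removes it, and log() is the natural logarithm in R. The name logGdp is hypothetical.

```r
library(tidyverse)
library(gapminder)

gapminder <- gapminder |>
  mutate(gdp = gdpPercap * pop, log2Gdp = log2(gdp))

# Add the natural-log version and drop log2Gdp in one step
gapminder <- gapminder |>
  mutate(logGdp = log(gdp),  # natural log
         log2Gdp = NULL)     # NULL removes the column

# Equivalent two-step version:
# gapminder |> mutate(logGdp = log(gdp)) |> select(-log2Gdp)
```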
For each country-year, calculate the difference between its life expectancy and the average (mean) life expectancy of that country’s continent in that given year.
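A grouped mutate() is one way to do this. Within each continent-year group, mean(lifeExp) is the group average, so each row receives its own deviation. This is the simple, unweighted mean across countries, not a population-weighted one. The column name diff_lifeExp_contMean matches the name used in later exercises.

```r
library(tidyverse)
library(gapminder)

gapminder <- gapminder |>
  group_by(continent, year) |>                               # one group per continent-year
  mutate(diff_lifeExp_contMean = lifeExp - mean(lifeExp)) |> # deviation from the group mean
  ungroup()                                                  # drop grouping for later steps
```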
Can you create a new object, named gapminder2007, that contains only observations from 2007?
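A minimal version uses filter(). Note the double equals sign: == tests equality, while = would be read as a (wrong) argument name.

```r
library(tidyverse)
library(gapminder)

# Keep only the rows where year equals 2007
gapminder2007 <- gapminder |> filter(year == 2007)
gapminder2007
```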
Now create a new object, gapminder2007_countries, that contains only the countries that are part of gapminder2007. What object class is this? And how long is it?
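If you use select(), as in this sketch, the result stays a tibble, which is a kind of data frame. For data frames, length() counts columns rather than rows, so nrow() is the more useful measure here.

```r
library(tidyverse)
library(gapminder)

gapminder2007 <- gapminder |> filter(year == 2007)

# select() keeps the result as a one-column tibble
gapminder2007_countries <- gapminder2007 |> select(country)

class(gapminder2007_countries)   # "tbl_df" "tbl" "data.frame"
nrow(gapminder2007_countries)    # number of rows (one per country): 142
length(gapminder2007_countries)  # for a data frame, this counts columns: 1
```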
Can you create this object so that it is a vector instead?
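One option is pull(), which extracts a single column as a vector. In gapminder, country is stored as a factor. as.character() then gives a plain character vector, for which length() counts the elements.

```r
library(tidyverse)
library(gapminder)

gapminder2007 <- gapminder |> filter(year == 2007)

# pull() returns the column itself (a factor here); as.character() makes it a character vector
gapminder2007_countries <- gapminder2007 |> pull(country) |> as.character()

class(gapminder2007_countries)   # "character"
length(gapminder2007_countries)  # number of elements: 142
```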
At this point, which objects are in your workspace?
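Besides the Environment tab in RStudio, ls() lists the names of the objects in your current workspace (the global environment):

```r
# Names of all objects currently defined in the global environment
ls()
```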
What happens if you try to access all gapminder observations from 2002 by typing gapminder2002 into R? Why?
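To check your answer, you can ask R directly whether an object with that name exists, as in the sketch below. It compares exists() with an explicit filter() on the original data.

```r
library(tidyverse)
library(gapminder)

# Is there an object called gapminder2002 in the workspace or in attached packages?
exists("gapminder2002")

# The 2002 observations have to be extracted from gapminder explicitly
gapminder |> filter(year == 2002)
```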
Explore the variation of life expectancy for each continent-year graphically.
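Box plots are one reasonable choice for showing variation. This sketch draws one box per year and one panel per continent. Treating year as a factor gives each year its own box, and the rotated axis labels are only a readability tweak.

```r
library(tidyverse)
library(gapminder)

# One box per year (year as a factor), one panel per continent
lifeExp_box <- ggplot(gapminder, aes(x = factor(year), y = lifeExp)) +
  geom_boxplot() +
  facet_wrap(~ continent) +
  labs(x = "Year", y = "Life expectancy (years)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
lifeExp_box
```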
This graph is a bit cluttered. Create a new dataset, gapminder_decade, with a new variable, lifeExpDecade, that takes the average (mean) of a country’s life expectancy by decade, and recreate the prior graph using that variable. Hint: use this trick to generate a decade variable (trunc() drops the decimal part of year / 10, so 1957 becomes 195 and then 1950):
gapminder <- gapminder |> mutate(decade = trunc(year / 10) * 10)
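Here is one possible construction, assuming one row per country-decade created with summarize(). The extra column gdpPercapDecade is a hypothetical addition. It is kept so the same dataset can be reused for the GDP per capita plot later in the exercise.

```r
library(tidyverse)
library(gapminder)

gapminder_decade <- gapminder |>
  mutate(decade = trunc(year / 10) * 10) |>  # 1952 -> 1950, 2007 -> 2000, etc.
  group_by(country, continent, decade) |>
  summarize(lifeExpDecade = mean(lifeExp),
            gdpPercapDecade = mean(gdpPercap),  # hypothetical extra column
            .groups = "drop")

# Same kind of plot as before, but with one box per decade
decade_box <- ggplot(gapminder_decade, aes(x = factor(decade), y = lifeExpDecade)) +
  geom_boxplot() +
  facet_wrap(~ continent) +
  labs(x = "Decade", y = "Mean life expectancy (years)")
decade_box
```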
Which continent shows the largest deviations in life expectancy from the continent mean? Use the variable diff_lifeExp_contMean from earlier and a box plot to explore this question.
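A sketch that recreates diff_lifeExp_contMean so it runs on its own is shown below. A wider box and longer whiskers indicate a larger spread of deviations around the continent mean. The dashed line at zero marks the mean itself.

```r
library(tidyverse)
library(gapminder)

gapminder <- gapminder |>
  group_by(continent, year) |>
  mutate(diff_lifeExp_contMean = lifeExp - mean(lifeExp)) |>
  ungroup()

# One box per continent; the spread of each box shows how far countries deviate
diff_box_continent <- ggplot(gapminder, aes(x = continent, y = diff_lifeExp_contMean)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # the continent mean
  labs(x = NULL, y = "Deviation from continent mean (years)")
diff_box_continent
```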
Over time, which country shows the largest deviations in life expectancy from the continent mean? Use the variable diff_lifeExp_contMean from earlier and a box plot to explore this question. Hint: see the end of section 7.5.1 in R4DS for an example.
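Following the reorder() and coord_flip() idea from that R4DS example, one approach is sketched below. Countries are sorted by their median deviation, so the most extreme countries end up at the top and bottom of the plot. Sorting by the median of absolute deviations would be a reasonable alternative.

```r
library(tidyverse)
library(gapminder)

gapminder <- gapminder |>
  group_by(continent, year) |>
  mutate(diff_lifeExp_contMean = lifeExp - mean(lifeExp)) |>
  ungroup()

# reorder() sorts countries by their median deviation; coord_flip() puts
# the many country names on the vertical axis so they stay readable
diff_box_country <- ggplot(gapminder,
                           aes(x = reorder(country, diff_lifeExp_contMean, FUN = median),
                               y = diff_lifeExp_contMean)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = NULL, y = "Deviation from continent mean (years)")
diff_box_country
```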
Explore the covariation of GDP per capita and life expectancy: does the relationship between the two variables change over time, and does it differ across continents? This builds on a plot you saw in yesterday’s exercise. I recommend using the gapminder_decade dataset from a few exercises above.
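One possible figure is a scatter plot with one panel per decade, colored by continent. It uses a log scale for GDP per capita and a linear trend line per continent. This sketch rebuilds gapminder_decade with the hypothetical gdpPercapDecade column (decade means of gdpPercap), and the choice of method = "lm" is only an illustration.

```r
library(tidyverse)
library(gapminder)

gapminder_decade <- gapminder |>
  mutate(decade = trunc(year / 10) * 10) |>
  group_by(country, continent, decade) |>
  summarize(lifeExpDecade = mean(lifeExp),
            gdpPercapDecade = mean(gdpPercap),  # hypothetical column name
            .groups = "drop")

# One panel per decade; color and trend lines by continent
gdp_lifeExp_plot <- ggplot(gapminder_decade,
                           aes(x = gdpPercapDecade, y = lifeExpDecade, color = continent)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +  # linear trend per continent and decade
  scale_x_log10() +                         # GDP per capita is strongly right-skewed
  facet_wrap(~ decade) +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")
gdp_lifeExp_plot
```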
Pick the figure you like most and save it to your working directory. Hint: use the ggsave() function.
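A minimal sketch follows. The file name my_figure.png and the size settings are placeholders. ggsave() writes to the working directory by default and picks the file type from the extension. If you leave out plot =, it saves the most recently displayed plot.

```r
library(tidyverse)
library(gapminder)

# Any ggplot object works here; this one is only an example
my_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  scale_x_log10()

# Save to the working directory; width and height are in inches by default
ggsave("my_figure.png", plot = my_plot, width = 8, height = 5, dpi = 300)
```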