What you should be able to do on your own after this exercise:
To begin this exercise, please quit and re-open RStudio on your computer.
Acknowledgement: Parts of this exercise draw on material from the book R for Data Science, the core text for this course. For citations of the R packages used here, please refer to citation("packagename").
You’re invited to ask questions while you complete this exercise during our meeting, but I also recommend consulting the main resources for our course. These are:
Using your computer’s tools or the “Files” Tab on the bottom right in RStudio, create a folder “Exercise 2” within your course folder.
Create an empty R script. Save it as “IntroR_Day2_Exercise.R” in your “Exercise 2” working directory. Within the script, type or copy & paste the following code in the first line to set the working directory to the same folder in which the script is located:
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
To start your script, load the following packages: “tidyverse”, “tidylog”, “gapminder”, and “nycflights13”. If necessary, install the package(s) via the package manager in RStudio.
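One way to begin the script is sketched below. It assumes all four packages are available from CRAN under the names given above; pkgs and missing_pkgs are names chosen for this illustration only. tidylog is attached after tidyverse so that its reporting versions of the dplyr and tidyr verbs take precedence.

```r
# Packages this exercise relies on
pkgs <- c("tidyverse", "tidylog", "gapminder", "nycflights13")

# Which of them are not installed yet? (empty if all are available)
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
missing_pkgs

# Install anything that is missing (alternatively, use the RStudio package manager)
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

library(tidyverse)
library(tidylog)       # attach after tidyverse so its wrappers mask the dplyr/tidyr verbs
library(gapminder)
library(nycflights13)
```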
Create a named object that contains all integers (whole numbers) from 1960 to 2022. How many elements does this object have? Hint: use the seq() and length() functions and the approach discussed in lecture.
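A minimal sketch of one possible approach follows. The lecture's own approach is not reproduced in this exercise, so both variants here are assumptions, and the object names years and years_seq are chosen for illustration.

```r
# Variant 1: the colon operator creates consecutive integers
years <- 1960:2022

# Variant 2: seq() with an explicit step size
years_seq <- seq(from = 1960, to = 2022, by = 1)

# Number of elements in the object
length(years)   # 63

# Both variants contain the same values
all(years == years_seq)
```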
Using the gapminder data, find each of the following. Remember the tricks to print/view all observations!
country-years that had a life expectancy over 80
the country-year with the highest GDP per capita in the dataset
the 10 country-years with the highest GDP per capita in the dataset
the 10 countries with the highest GDP per capita in 2007 in the dataset
the average life expectancy for each continent in 1952 and 2007
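The five tasks above could be approached with dplyr verbs roughly as follows. This is a sketch, not the only solution: the object names over80, top10_2007, and continent_means are made up for this example, and slice_max() is just one of several ways to pick the largest values (arrange() with desc() and head() would also work).

```r
library(tidyverse)
library(gapminder)

# Country-years with a life expectancy over 80; print(n = Inf) shows every row
over80 <- gapminder %>% filter(lifeExp > 80)
over80 %>% print(n = Inf)

# The single country-year with the highest GDP per capita
gapminder %>% slice_max(gdpPercap, n = 1)

# The 10 country-years with the highest GDP per capita
gapminder %>% slice_max(gdpPercap, n = 10)

# The 10 countries with the highest GDP per capita in 2007
top10_2007 <- gapminder %>%
  filter(year == 2007) %>%
  slice_max(gdpPercap, n = 10)
top10_2007

# Mean life expectancy for each continent in 1952 and 2007
continent_means <- gapminder %>%
  filter(year %in% c(1952, 2007)) %>%
  group_by(continent, year) %>%
  summarize(mean_lifeExp = mean(lifeExp), .groups = "drop")
continent_means
```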
Knowing that GDP per capita is a country’s GDP divided by its population, can you use the two variables pop and gdpPercap to “recover” countries’ GDP?
Can you create a histogram of this variable for the year 2007?
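Since GDP per capita equals GDP divided by population, multiplying gdpPercap by pop gives GDP back. The sketch below assumes the new variable is called gdp and the extended dataset gapminder_gdp; neither name is prescribed by the exercise, and the number of bins is an arbitrary choice.

```r
library(tidyverse)
library(gapminder)

# GDP per capita = GDP / population, so GDP = GDP per capita * population
gapminder_gdp <- gapminder %>%
  mutate(gdp = gdpPercap * pop)

# Histogram of GDP across countries in 2007
p_gdp <- gapminder_gdp %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdp)) +
  geom_histogram(bins = 30)
p_gdp
```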
You just created this new variable, but the histogram suggests that it might be useful to transform it. Section 5.5.1 in R4DS suggests the log transformation. Create a new variable, log2Gdp, and generate a histogram for this variable in 2007.
You decide that you’d rather use the natural log and delete the log2Gdp variable from the data. How do you perform both operations?
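One possible way to handle the two log transformations is sketched here. The names gdp, logGdp, and gapminder_gdp are assumptions made for this example; only log2Gdp comes from the exercise. In R, log2() is the base-2 logarithm and log() defaults to the natural logarithm.

```r
library(tidyverse)
library(gapminder)

# Recover GDP and add its base-2 logarithm
gapminder_gdp <- gapminder %>%
  mutate(gdp = gdpPercap * pop,
         log2Gdp = log2(gdp))

# Histogram of the log-transformed variable in 2007
gapminder_gdp %>%
  filter(year == 2007) %>%
  ggplot(aes(x = log2Gdp)) +
  geom_histogram(bins = 30)

# Add the natural log and drop log2Gdp in one pipeline
gapminder_gdp <- gapminder_gdp %>%
  mutate(logGdp = log(gdp)) %>%
  select(-log2Gdp)

names(gapminder_gdp)
```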
For each country-year, calculate the difference between that country’s life expectancy in that year and the average (mean) life expectancy of that country’s continent in that year, and store it in a new variable, diff_lifeExp_contMean. What are some of the countries with the biggest deviation in life expectancy from their continent’s typical values?
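A grouped mutate() is one way to compute such a deviation: grouping by continent and year makes mean() return the continent-year mean for every row in that group. In the sketch below, diff_lifeExp_contMean is the variable name used later in this exercise, while gapminder_diff and lifeExp_contMean are names assumed for illustration.

```r
library(tidyverse)
library(gapminder)

gapminder_diff <- gapminder %>%
  group_by(continent, year) %>%
  mutate(lifeExp_contMean = mean(lifeExp),                      # continent mean in that year
         diff_lifeExp_contMean = lifeExp - lifeExp_contMean) %>% # deviation from that mean
  ungroup()

# Country-years with the largest absolute deviation from their continent's mean
gapminder_diff %>%
  arrange(desc(abs(diff_lifeExp_contMean))) %>%
  select(country, continent, year, lifeExp, lifeExp_contMean, diff_lifeExp_contMean) %>%
  head(10)
```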
Can you create a new object, named gapminder2007, that contains only observations from 2007?
Now create a new object, gapminder2007_countries, that only contains the countries that are part of gapminder2007. What object class is this? And how long is it?
Can you create this object in a way so that it turns out to be a vector?
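The difference between a one-column data frame and a vector can be sketched as follows. The names gapminder2007_countries_vec and gapminder2007_countries_chr are assumptions for this example. Note that country is stored as a factor in gapminder, so pull() returns a factor; as.character() turns it into a plain character vector.

```r
library(tidyverse)
library(gapminder)

# Only observations from 2007
gapminder2007 <- gapminder %>% filter(year == 2007)

# select() keeps the result a data frame (tibble) with one column
gapminder2007_countries <- gapminder2007 %>% select(country)
class(gapminder2007_countries)
length(gapminder2007_countries)   # 1: length() of a data frame counts its columns
nrow(gapminder2007_countries)     # 142: one row per country

# pull() extracts the column itself, here a factor
gapminder2007_countries_vec <- gapminder2007 %>% pull(country)
class(gapminder2007_countries_vec)

# A plain character vector of country names
gapminder2007_countries_chr <- as.character(gapminder2007_countries_vec)
length(gapminder2007_countries_chr)   # 142
```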
At this point, which objects are in your workspace?
What happens if you try to access all gapminder observations from 2002 by typing gapminder2002 into R? Why?
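The two workspace questions can be explored with a few base R calls. This sketch assumes no object called gapminder2002 has been created in the session; msg is a name chosen for illustration, and the exact wording of the error message depends on your R version and language settings.

```r
library(tidyverse)
library(gapminder)

# List the objects currently in the workspace (global environment)
ls()

# Check whether an object with this name exists
exists("gapminder2002")

# Typing the name of an object that was never created stops with an error;
# tryCatch() captures the error message instead of halting the script
msg <- tryCatch(gapminder2002, error = function(e) conditionMessage(e))
msg

# The 2002 observations are available by filtering gapminder itself
gapminder %>% filter(year == 2002)
```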
Explore the variation of life expectancy for each continent-year graphically.
This graph is a bit cluttered. Create a new dataset, gapminder_decade, with a new variable, lifeExpDecade, that takes the average (mean) of a country’s life expectancy by decade, and recreate the prior graph using that variable. Hint: use this trick to generate a decade variable.
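A possible sketch of the decade aggregation is shown below. The trick the hint refers to is not reproduced in this text, so the modulo approach used here (year minus its remainder after division by 10) is an assumption. The prior graph is likewise assumed to be a set of box plots per continent.

```r
library(tidyverse)
library(gapminder)

gapminder_decade <- gapminder %>%
  mutate(decade = year - year %% 10) %>%      # e.g. 1957 becomes 1950
  group_by(country, continent, decade) %>%
  summarize(lifeExpDecade = mean(lifeExp), .groups = "drop")

# Variation of life expectancy by continent and decade
p_decade <- ggplot(gapminder_decade,
                   aes(x = factor(decade), y = lifeExpDecade)) +
  geom_boxplot() +
  facet_wrap(~ continent) +
  labs(x = "Decade", y = "Mean life expectancy")
p_decade
```

Each country contributes two survey years per decade, so the new dataset has half as many rows as gapminder.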
Which continent shows the largest deviations in life expectancy from the continent mean? Use the variable diff_lifeExp_contMean from earlier and a box plot to explore this question.
Over time, which country shows the largest deviations in life expectancy from the continent mean? Use the variable diff_lifeExp_contMean from earlier and a box plot to explore this question. Hint: see the end of section 7.5.1 in R4DS for an example.
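Both box plot questions can be approached along these lines. The sketch recomputes diff_lifeExp_contMean so that it runs on its own; gapminder_diff, p_continent, and p_country are assumed names. For the country plot it uses reorder() and coord_flip(), in the spirit of the R4DS example mentioned in the hint, to sort the 142 boxes by their median and keep the labels legible.

```r
library(tidyverse)
library(gapminder)

# Deviation of each country-year from its continent-year mean
gapminder_diff <- gapminder %>%
  group_by(continent, year) %>%
  mutate(diff_lifeExp_contMean = lifeExp - mean(lifeExp)) %>%
  ungroup()

# One box per continent: wider boxes and whiskers indicate larger deviations
p_continent <- ggplot(gapminder_diff,
                      aes(x = continent, y = diff_lifeExp_contMean)) +
  geom_boxplot()
p_continent

# One box per country (each box summarizes that country's 12 survey years),
# ordered by the median deviation and flipped so country names are readable
p_country <- ggplot(gapminder_diff,
                    aes(x = reorder(country, diff_lifeExp_contMean, FUN = median),
                        y = diff_lifeExp_contMean)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = "Country")
p_country
```

With this many countries the second plot is tall; filtering to a single continent first is one way to make it easier to read.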
Explore the covariation of GDP per capita and life expectancy: does the relationship between both variables change over time, and does it differ across continents? This builds on a plot you saw in yesterday’s exercise. I recommend using the gapminder_decade dataset from a few exercises above.
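One way to look at this covariation is a scatter plot split by decade and colored by continent. The sketch below is an assumption about what such a plot could look like, not the plot from yesterday’s exercise. For simplicity it adds a decade variable to the yearly data rather than using decade averages, and it puts GDP per capita on a log scale because the variable is strongly right-skewed.

```r
library(tidyverse)
library(gapminder)

p_cov <- gapminder %>%
  mutate(decade = year - year %% 10) %>%     # assumed way to build the decade variable
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  facet_wrap(~ decade) +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy")
p_cov
```

Replacing facet_wrap(~ decade) with facet_grid(continent ~ decade) gives one panel per continent and decade, which makes differences across continents easier to compare.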
Pick one figure that you liked the most, and save it to your working directory. Hint: use the ggsave() function.
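A minimal sketch of saving a figure follows. The plot, the file name my_favorite_figure.png, and the dimensions are placeholders chosen for this example. Because the file name contains no folder, ggsave() writes to the current working directory.

```r
library(tidyverse)
library(gapminder)

# Any ggplot object can be saved; this one is only a placeholder
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()

# Passing the plot explicitly avoids relying on "the last plot displayed"
ggsave(filename = "my_favorite_figure.png", plot = p,
       width = 7, height = 5, dpi = 300)

# Confirm that the file was written
file.exists("my_favorite_figure.png")
```

The file extension determines the output format, so changing .png to .pdf would produce a PDF instead.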