ProblemSet3_sampleanswers


STA130H1S – Fall 2022 Problem Set 3
() and STA130 Professors

Instructions

Complete the exercises in this .Rmd file and submit your .Rmd and .pdf output through Quercus on Thursday, September 29 by 5:00 p.m. ET.

Part 1: More Olympics Data

The code below loads the VGAMdata package (so you can access the data sets it contains) and the tidyverse package (so you can use the functions it contains) and glimpses the oly12 data set, which you will use for this question. Do not use the olympics data set from class to answer the prompts in this question.

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.8     v dplyr   1.0.10
## v tidyr   1.2.1     v stringr 1.4.1
## v readr   2.1.2     v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(VGAMdata) # install.packages("VGAMdata")
## Loading required package: VGAM
## Loading required package: stats4
## Loading required package: splines

names(oly12) # convenient function to quickly glance at data set column names
##  [1] "Name" "Country" "Age" "Height" "Weight" "Sex" "DOB"
##  [8] "PlaceOB" "Gold" "Silver" "Bronze" "Total" "Sport" "Event"

glimpse(oly12)
## Rows: 10,384
## Columns: 14
## $ Name <fct> Lamusi A, A G Kruger, Jamale Aarrass, Abdelhak Aatakni, Maria ~
## $ Country <fct> "People s Republic of China", "United States of America", "Fra~
## $ Age <int> 23, 33, 30, 24, 26, 27, 30, 23, 27, 19, 37, 28, 28, 28, 22, 19~
## $ Height <dbl> 1.70, 1.93, 1.87, NA, 1.78, 1.82, 1.82, 1.87, 1.90, 1.70, NA, ~
## $ Weight <int> 60, 125, 76, NA, 85, 80, 73, 75, 80, NA, NA, NA, 60, 64, 62, N~
## $ Sex <fct> M, M, M, M, F, M, F, M, M, M, M, M, F, F, M, F, M, M, M, M, F,~
## $ DOB <date> 1989-02-06, NA, NA, 1988-09-02, NA, 1984-06-09, NA, 1989-03-0~

## $ PlaceOB <fct> "NEIMONGGOL (CHN)", "Sheldon (USA)", "BEZONS (FRA)", "AIN SEBA~
## $ Gold <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Silver <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Bronze <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Total <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Sport <fct> "Judo", "Athletics", "Athletics", "Boxing", "Athletics", "Hand~
## $ Event <fct> "Men s -60kg", "Men s Hammer Throw", "Men s 1500m", "Men s Lig~

Question 1: Practice with filter()

(a) In this week’s class, we looked at data for each country which participated in the 2012 Olympics (e.g. size of each country’s Olympic team, number of medals won, etc.), and there was one observation (i.e. one row) for each participating country. What does each row in the oly12 dataset represent?

In the oly12 dataset, each row corresponds to one athlete who participated in the 2012 Olympic Games.

Hint: Type ?oly12 or help(oly12) in the console (in the bottom left corner of RStudio) to view the help file for the oly12 dataset in the Help tab (in the bottom right corner); or just search for “oly12” in the Help tab.

(b) Determine the number of athletes who represented Canada (Canada) or the United States (United States of America) in the 2012 Olympic Games.
# Using filter to keep only Canadian athletes,
# then glimpse to view the number of observations
oly12 %>%
  filter(Country == "Canada") %>%
  glimpse()
## Rows: 274
## Columns: 14
## $ Name <fct> Jennifer Abel, Natalie Achonwa, Mohammed Ahmed, Dylan Armstron~
## $ Country <fct> "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "C~
## $ Age <int> 20, 19, 21, 31, 28, 24, 20, 28, 23, 22, 21, 56, 29, 24, 23, 25~
## $ Height <dbl> 1.60, 1.92, 1.90, 1.93, 1.85, 1.83, 1.68, 1.86, 1.86, 1.68, 1.~
## $ Weight <int> 62, 83, 60, 139, 82, 78, 150, 90, 80, 58, 75, 78, 98, 48, 69, ~
## $ Sex <fct> F, F, M, M, F, F, M, M, M, F, M, M, M, F, F, F, M, M, F, F, M,~
## $ DOB <date> NA, NA, 1991-05-01, NA, NA, 1988-06-05, 1992-11-03, NA, NA, 1~
## $ PlaceOB <fct> "Montreal (CAN)", "", "Mogadishu (SOM)", "Kamloops (CAN)", "",~
## $ Gold <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Silver <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,~
## $ Bronze <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,~
## $ Total <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,~
## $ Sport <fct> "Diving", "Basketball", "Athletics", "Athletics", "Basketball"~
## $ Event <fct> "Women s 3m Springboard, Women s Synchronised 3m Springboard",~

oly12 %>%
  filter(Country == "United States of America") %>%
  glimpse()
## Rows: 518
## Columns: 14
## $ Name <fct> A G Kruger, Abdihakem Abdirahman, Amy Acuff, Cammile Adams, Na~
## $ Country <fct> "United States of America", "United States of America", "Unite~
## $ Age <int> 33, 35, 37, 20, 23, 24, 27, 23, 21, 20, 25, 28, 29, 38, 28, 30~
## $ Height <dbl> 1.93, 1.80, 1.88, 1.73, 2.01, 1.91, 1.85, 1.80, 1.73, 1.78, 2.~

## $ Weight <int> 125, 61, 66, 65, 102, 79, 74, 70, 64, 68, 93, 104, 77, 58, 75,~
## $ Sex <fct> M, M, F, F, M, F, M, F, F, F, M, M, F, F, F, M, F, F, F, M, F,~
## $ DOB <date> NA, 1977-01-01, NA, 1991-11-09, 1988-07-12, 1987-05-10, NA, N~
## $ PlaceOB <fct> "Sheldon (USA)", "HARGISA (SOM)", "Port Arthur (USA)", "Housto~
## $ Gold <int> 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,~
## $ Silver <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Bronze <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Total <int> 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,~
## $ Sport <fct> "Athletics", "Athletics", "Athletics", "Swimming", "Swimming",~
## $ Event <fct> "Men s Hammer Throw", "Men s Marathon", "Women s High Jump", "~

# add the above 2 numbers together

# Using filter to keep only Canadian or USA athletes,
# then count the number of rows in the resulting data frame
oly12 %>%
  filter(Country == "Canada" | Country == "United States of America") %>%
  nrow()
## [1] 792

# Use summarise to calculate the number of athletes for each country,
# then filter to keep only the rows for Canada and the United States
oly12 %>%
  group_by(Country) %>%
  summarise(team_size = n()) %>%
  filter(Country == "Canada" | Country == "United States of America")
## # A tibble: 2 x 2
##   Country                  team_size
##   <fct>                        <int>
## 1 Canada                         274
## 2 United States of America       518

274 + 518
## [1] 792

274 athletes represented Canada and 518 athletes represented the United States at the 2012 Olympic Games, so 792 athletes represented either Canada or the United States.

Hint: Apply the filter() function to the Country column of the oly12 dataset.

(c) Determine the number of female athletes who competed in classical gymnastics (Gymnastics - Artistic and Gymnastics - Rhythmic) or classical pool sports (Diving and Swimming).
oly12_FemaleClassicalGymPool <- oly12 %>%
  filter(Sex == "F") %>%
  filter(Sport == "Gymnastics - Rhythmic" | Sport == "Gymnastics - Artistic" |
           Sport == "Diving" | Sport == "Swimming")

oly12_FemaleClassicalGymPool %>%
  summarise(n = n())
##     n
## 1 685

Hint: You can see all the possible values for the Sport variable with levels(oly12$Sport), and count the number of possible levels with nlevels(oly12$Sport).
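The chained == comparisons used for part (c) can also be written with the %in% operator, which tests membership in a vector of values. The sketch below is one possible alternative, not part of the original sample answer. It runs on a small made-up data frame (toy and target_sports are hypothetical names, and the rows are not taken from oly12) so that it can be run without installing VGAMdata.

```r
library(dplyr)

# Toy data frame: made-up rows that only borrow the Sex and Sport column
# names from oly12. These are NOT real observations from the data set.
toy <- data.frame(
  Sex   = c("F", "F", "M", "F", "F", "M"),
  Sport = c("Diving", "Swimming", "Swimming", "Judo",
            "Gymnastics - Artistic", "Diving")
)

# Sports of interest from part (c), collected in one vector
target_sports <- c("Gymnastics - Artistic", "Gymnastics - Rhythmic",
                   "Diving", "Swimming")

# Approach 1: chained == comparisons joined with |, as in the answer above
n_chained <- toy %>%
  filter(Sex == "F") %>%
  filter(Sport == "Gymnastics - Rhythmic" | Sport == "Gymnastics - Artistic" |
           Sport == "Diving" | Sport == "Swimming") %>%
  nrow()

# Approach 2: a single filter() call; conditions separated by commas are
# combined with AND, and %in% replaces the chain of | comparisons
n_in <- toy %>%
  filter(Sex == "F", Sport %in% target_sports) %>%
  nrow()

# Both approaches keep the same rows, so the counts agree
n_chained == n_in
```

Replacing toy with oly12 (after library(VGAMdata)) should apply the same filter to the real data. Keeping the sports in a named vector makes the list easier to read and to extend than a long chain of | comparisons.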
