ProblemSet3_sampleanswers
STA130H1S – Fall 2022
Problem Set 3
() and STA130 Professors
Instructions
Complete the exercises in this .Rmd file and submit your .Rmd and .pdf output through Quercus on Thursday, September 29 by 5:00 p.m. ET.
Part 1: More Olympics Data
The code below loads the VGAMdata package (so you can access the data sets it contains) and the tidyverse package (so you can use the functions it contains) and glimpses the oly12 data set, which you will use for this question. Do not use the olympics data set from class to answer the prompts in this question.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.8     v dplyr   1.0.10
## v tidyr   1.2.1     v stringr 1.4.1
## v readr   2.1.2     v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(VGAMdata) # install.packages("VGAMdata")
## Loading required package: VGAM
## Loading required package: stats4
## Loading required package: splines
names(oly12) # convenient function to quickly glance at data set column names
##  [1] "Name"    "Country" "Age"     "Height"  "Weight"  "Sex"     "DOB"
##  [8] "PlaceOB" "Gold"    "Silver"  "Bronze"  "Total"   "Sport"   "Event"
glimpse(oly12)
## Rows: 10,384
## Columns: 14
## $ Name    <fct> Lamusi A, A G Kruger, Jamale Aarrass, Abdelhak Aatakni, Maria ~
## $ Country <fct> "People s Republic of China", "United States of America", "Fra~
## $ Age     <int> 23, 33, 30, 24, 26, 27, 30, 23, 27, 19, 37, 28, 28, 28, 22, 19~
## $ Height  <dbl> 1.70, 1.93, 1.87, NA, 1.78, 1.82, 1.82, 1.87, 1.90, 1.70, NA, ~
## $ Weight  <int> 60, 125, 76, NA, 85, 80, 73, 75, 80, NA, NA, NA, 60, 64, 62, N~
## $ Sex     <fct> M, M, M, M, F, M, F, M, M, M, M, M, F, F, M, F, M, M, M, M, F,~
## $ DOB     <date> 1989-02-06, NA, NA, 1988-09-02, NA, 1984-06-09, NA, 1989-03-0~
## $ PlaceOB <fct> "NEIMONGGOL (CHN)", "Sheldon (USA)", "BEZONS (FRA)", "AIN SEBA~
## $ Gold    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Silver  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Bronze  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Total   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Sport   <fct> "Judo", "Athletics", "Athletics", "Boxing", "Athletics", "Hand~
## $ Event   <fct> "Men s -60kg", "Men s Hammer Throw", "Men s 1500m", "Men s Lig~
Question 1: Practice with filter()
(a) In this week’s class, we looked at data for each country which participated in the 2012 Olympics (e.g. size of each country’s Olympic team, number of medals won, etc.), and there was one observation (i.e. one row) for each participating country. What does each row in the oly12 dataset represent?
In the oly12 dataset, each row corresponds to one athlete who participated in the 2012 Olympic Games.
Hint: Type ?oly12 or help(oly12) in the console (in the bottom left corner) to view the help file for the oly12 dataset in the Help tab (in the bottom right corner of RStudio); or, just search for “oly12” in the Help tab.
(b) Determine the number of athletes who represented Canada (Canada) or the United States (United States of America) in the 2012 Olympic Games.
# Using filter to keep only Canadian athletes,
# then glimpse to view the number of observations
oly12 %>% filter(Country == "Canada") %>% glimpse()
## Rows: 274
## Columns: 14
## $ Name    <fct> Jennifer Abel, Natalie Achonwa, Mohammed Ahmed, Dylan Armstron~
## $ Country <fct> "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "C~
## $ Age     <int> 20, 19, 21, 31, 28, 24, 20, 28, 23, 22, 21, 56, 29, 24, 23, 25~
## $ Height  <dbl> 1.60, 1.92, 1.90, 1.93, 1.85, 1.83, 1.68, 1.86, 1.86, 1.68, 1.~
## $ Weight  <int> 62, 83, 60, 139, 82, 78, 150, 90, 80, 58, 75, 78, 98, 48, 69, ~
## $ Sex     <fct> F, F, M, M, F, F, M, M, M, F, M, M, M, F, F, F, M, M, F, F, M,~
## $ DOB     <date> NA, NA, 1991-05-01, NA, NA, 1988-06-05, 1992-11-03, NA, NA, 1~
## $ PlaceOB <fct> "Montreal (CAN)", "", "Mogadishu (SOM)", "Kamloops (CAN)", "",~
## $ Gold    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Silver  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,~
## $ Bronze  <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,~
## $ Total   <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,~
## $ Sport   <fct> "Diving", "Basketball", "Athletics", "Athletics", "Basketball"~
## $ Event   <fct> "Women s 3m Springboard, Women s Synchronised 3m Springboard",~
oly12 %>% filter(Country == "United States of America") %>% glimpse()
## Rows: 518
## Columns: 14
## $ Name    <fct> A G Kruger, Abdihakem Abdirahman, Amy Acuff, Cammile Adams, Na~
## $ Country <fct> "United States of America", "United States of America", "Unite~
## $ Age     <int> 33, 35, 37, 20, 23, 24, 27, 23, 21, 20, 25, 28, 29, 38, 28, 30~
## $ Height  <dbl> 1.93, 1.80, 1.88, 1.73, 2.01, 1.91, 1.85, 1.80, 1.73, 1.78, 2.~
## $ Weight  <int> 125, 61, 66, 65, 102, 79, 74, 70, 64, 68, 93, 104, 77, 58, 75,~
## $ Sex     <fct> M, M, F, F, M, F, M, F, F, F, M, M, F, F, F, M, F, F, F, M, F,~
## $ DOB     <date> NA, 1977-01-01, NA, 1991-11-09, 1988-07-12, 1987-05-10, NA, N~
## $ PlaceOB <fct> "Sheldon (USA)", "HARGISA (SOM)", "Port Arthur (USA)", "Housto~
## $ Gold    <int> 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,~
## $ Silver  <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Bronze  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ Total   <int> 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,~
## $ Sport   <fct> "Athletics", "Athletics", "Athletics", "Swimming", "Swimming",~
## $ Event   <fct> "Men s Hammer Throw", "Men s Marathon", "Women s High Jump", "~
# add the above 2 numbers together
# Using filter to keep only Canadian or USA athletes,
# then count the number of rows in the resulting data frame
oly12 %>%
  filter(Country == "Canada" | Country == "United States of America") %>%
  nrow()
## [1] 792
# Use summarise to calculate the number of athletes for each country,
# then filter to keep only the rows for Canada and the United States
oly12 %>%
  group_by(Country) %>%
  summarise(team_size = n()) %>%
  filter(Country == "Canada" | Country == "United States of America")
## # A tibble: 2 x 2
##   Country                  team_size
##   <fct>                        <int>
## 1 Canada                         274
## 2 United States of America       518
274 + 518
## [1] 792
274 athletes represented Canada, and 518 athletes represented USA at the 2012 Olympic Games, thus 792
athletes represented either Canada or the USA at the 2012 Olympic Games.
Hint: Apply the filter() function to the Country column of the oly12 dataset.
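As a cross-check on counts like these, dplyr's count() gives the same per-group tallies as group_by() followed by summarise(n()). The sketch below is only illustrative: it uses a small made-up tibble (the name toy and its rows are assumptions, not the real oly12 data), so the numbers do not correspond to the answer above.

```r
library(dplyr)

# Hypothetical toy data standing in for oly12 (not the real data set)
toy <- tibble(
  Name    = c("A", "B", "C", "D", "E"),
  Country = c("Canada", "Canada", "United States of America", "France", "Canada")
)

# count() is shorthand for group_by(Country) %>% summarise(team_size = n())
team_sizes <- toy %>%
  filter(Country == "Canada" | Country == "United States of America") %>%
  count(Country, name = "team_size")

team_sizes
sum(team_sizes$team_size)  # total across the two countries
```

Summing the team_size column plays the same role as adding the two counts by hand.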
(c) Determine the number of female athletes who competed in classical gymnastics (Gymnastics - Artistic and Gymnastics - Rhythmic) or classical pool sports (Diving and Swimming).
oly12_FemaleClassicalGymPool <- oly12 %>%
  filter(Sex == "F") %>%
  filter(Sport == "Gymnastics - Rhythmic" | Sport == "Gymnastics - Artistic" |
           Sport == "Diving" | Sport == "Swimming")
oly12_FemaleClassicalGymPool %>% summarise(n = n())
##     n
## 1 685
Hint: You can see all the possible values for the Sport variable with levels(oly12$Sport), and count the number of possible levels with nlevels(oly12$Sport).
(d) Determine the number of athletes who competed in ANY gymnastic (Gymnastics - Artistic, Gymnastics - Rhythmic, Trampoline) or ANY pool sports (Diving, Swimming, Synchronised Swimming, and Water Polo).
oly12_GymnastsPoolers <- oly12 %>%
  filter(Sport %in% c("Gymnastics - Rhythmic", "Gymnastics - Artistic", "Diving",
                      "Trampoline", "Swimming", "Synchronised Swimming", "Water Polo"))
oly12_GymnastsPoolers %>% summarise(n = n())
##      n
## 1 1695
Hint: As indicated on Stack Overflow, the %in% comparison operator could be useful here with allGymnastics <- c("Gymnastics - Artistic", "Gymnastics - Rhythmic", "Trampoline") and allWaterPool <- c("Diving", "Swimming", "Synchronised Swimming", "Water Polo") and filter(Sport %in% allGymnastics | Sport %in% allWaterPool).
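To see why %in% is a convenient replacement for a chain of == comparisons joined by |, here is a minimal sketch on a made-up tibble (toy and its four rows are assumptions for illustration, not rows of oly12). Both filters should keep the same rows. Note that writing Sport == allGymnastics instead would compare element by element with recycling, which is usually not what is wanted.

```r
library(dplyr)

# Hypothetical toy data (not the real oly12 data set)
toy <- tibble(
  Name  = c("A", "B", "C", "D"),
  Sport = c("Diving", "Judo", "Trampoline", "Water Polo")
)

allGymnastics <- c("Gymnastics - Artistic", "Gymnastics - Rhythmic", "Trampoline")
allWaterPool  <- c("Diving", "Swimming", "Synchronised Swimming", "Water Polo")

# Membership test: TRUE when Sport matches any value in the vector
with_in <- toy %>% filter(Sport %in% allGymnastics | Sport %in% allWaterPool)

# The same rows selected with explicit comparisons
with_or <- toy %>%
  filter(Sport == "Diving" | Sport == "Trampoline" | Sport == "Water Polo")

identical(with_in$Name, with_or$Name)
```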
(e) Create the data subset oly12_FemaleArtisticRhythmicGymnasts which contains all female Olympic athletes who competed in artistic gymnastics or rhythmic gymnastics.
oly12_FemaleArtisticRhythmicGymnasts <- oly12 %>%
  filter(Sex == "F") %>%
  filter(Sport == "Gymnastics - Rhythmic" | Sport == "Gymnastics - Artistic")
Hint: names(oly12) shows all the column names of the data set.
(f) Use oly12_FemaleArtisticRhythmicGymnasts and ggplot2 to compare the age distribution of female Olympic athletes competing in artistic gymnastics to the age distribution of female Olympic athletes competing in rhythmic gymnastics using both boxplots and histograms.
oly12_FemaleArtisticRhythmicGymnasts %>%
  ggplot(aes(x = Sport, y = Age)) +
  geom_boxplot()
[Figure: side-by-side boxplots of Age (y-axis) by Sport (x-axis) for Gymnastics - Artistic and Gymnastics - Rhythmic]
oly12_FemaleArtisticRhythmicGymnasts %>%
  filter(Sport == "Gymnastics - Artistic") %>%
  ggplot(aes(x = Age)) +
  geom_histogram(bins = 12, color = "black", fill = "gray")
[Figure: histogram of Age (x-axis) against count (y-axis) for female artistic gymnasts]
oly12_FemaleArtisticRhythmicGymnasts %>%
  filter(Sport == "Gymnastics - Rhythmic") %>%
  ggplot(aes(x = Age)) +
  geom_histogram(bins = 12, color = "black", fill = "gray")
[Figure: histogram of Age (x-axis) against count (y-axis) for female rhythmic gymnasts]
Hint: don’t forget aes() and to use + rather than %>%.
(g) Answer the following questions based on the plots you created in (f).
• Are the age distributions of female rhythmic gymnasts and female artistic gymnasts symmetrical or skewed?
From the boxplots, we can see that the age distribution of female artistic gymnasts appears to be roughly symmetric, with a slight right skew (based on the outliers), and the age distribution of female rhythmic gymnasts appears to be right skewed. This can also be seen in the histograms of the age distributions.
• How do the medians, 25th percentiles, and 75th percentiles for ages of female rhythmic gymnasts and female artistic gymnasts compare?
From the boxplots, we can see that the median ages of female rhythmic gymnasts and female artistic gymnasts are similar (~18). The 25th percentile age of female rhythmic gymnasts is slightly higher than that of female artistic gymnasts (~18 and ~17, respectively). Lastly, the 75th percentiles of the ages of female rhythmic gymnasts and female artistic gymnasts are similar (~21).
• Based only on the histogram and boxplots, predict whether the standard deviation of the ages is similar or different. Justify your answer in 1-2 sentences.
I predict that the standard deviation of ages for female rhythmic gymnasts will be slightly smaller than the standard deviation of ages for female artistic gymnasts, because the IQR and range are smaller for the rhythmic gymnast group.
Question 2: Practice with summarise(), group_by(), and mutate()
(a) Create a summary table of oly12_FemaleArtisticRhythmicGymnasts reporting the minimum (min), maximum (max), mean, median, and standard deviation (sd) of ages for female rhythmic gymnasts and female artistic gymnasts. Were you correct in your guess about the standard deviation in part (g) of the last question?
oly12_FemaleArtisticRhythmicGymnasts %>%
  group_by(Sport) %>%
  summarise(min = min(Age),
            max = max(Age),
            mean = mean(Age),
            median = median(Age),
            sd = sd(Age))
## # A tibble: 2 x 6
##   Sport                   min   max  mean median    sd
##   <fct>                 <int> <int> <dbl>  <dbl> <dbl>
## 1 Gymnastics - Artistic    15    37  19.7     19  3.66
## 2 Gymnastics - Rhythmic    16    27  19.5     19  2.68
As predicted, the standard deviation of ages is slightly higher for female artistic gymnast athletes than for
female rhythmic gymnast athletes (3.66 vs 2.68), but they are very similar.
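For readers who want to see what sd() computes, the following sketch works through the sample standard deviation by hand, using the n - 1 denominator that R's sd() uses. The ages vector is made up for illustration and is not taken from oly12.

```r
# Made-up ages for illustration (not taken from oly12)
ages <- c(16, 18, 19, 19, 23)

n <- length(ages)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by (n - 1)
manual_sd <- sqrt(sum((ages - mean(ages))^2) / (n - 1))

manual_sd
sd(ages)                         # built-in function, same value
all.equal(manual_sd, sd(ages))
```

A wider spread of ages around the mean makes the squared deviations larger, which is why the group with the larger range and IQR also tends to have the larger standard deviation.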
(b) Create a new variable called total_medals and create a new tibble called oly12_OneMedalClub that contains athletes who won exactly one medal at the 2012 Olympics.
oly12_OneMedalClub <- oly12 %>%
  mutate(total_medals = Gold + Silver + Bronze) %>%
  filter(total_medals == 1)
(c) Uncomment the code below and run the glimpse of the data created in part (b).
# glimpse(oly12_OneMedalClub)
Question 3: Practice with select(), arrange(), desc(), and filter()
(a) Find the Name and Age of the 6 oldest athletes who competed in the 2012 Olympics.
# Type your code here
oly12 %>%
  arrange(desc(Age)) %>%
  head() %>%
  select(Name, Age)
##                Name Age
## 1   Hiroshi Hoketsu  71
## 2 Afanasijs Kuzmins  65
## 3        Ian Millar  65
## 4    Carl Bouckaert  58
## 5  Andrei Kavalenka  57
## 6        Mary Hanna  57
(b) Find the Name, Age and Sport of the 6 youngest female athletes who competed in the 2012 Olympics.
oly12 %>%
  filter(Sex == "F") %>%
  arrange(Age) %>%
  head() %>%
  select(Name, Age, Sport)
##                       Name Age    Sport
## 1              Adzo Kpossi  13 Swimming
## 2        Aurelie Fanchette  14 Swimming
## 3                 Suji Kim  14   Diving
## 4 Nafissatou Moussa Adamou  14 Swimming
## 5  Lea Melissa Moutoussamy  14  Fencing
## 6                Yuhan Qiu  14 Swimming
(c) Find the Name, Age, Sport, and Event for the 6 youngest and 6 oldest competitors who won gold medals at the 2012 Olympics. [This can be run as two pieces of code rather than one piece of combined code].
oly12 %>%
  filter(Gold > 0) %>%
  arrange(Age) %>%
  head() %>%
  select(Name, Age, Sport, Event)
##                Name Age                 Sport
## 1    Ruta Meilutyte  15              Swimming
## 2         Kyla Ross  15 Gymnastics - Artistic
## 3 Gabrielle Douglas  16 Gymnastics - Artistic
## 4      Yolane Kukla  16              Swimming
## 5  Mc Kayla Maroney  16 Gymnastics - Artistic
## 6         Shiwen Ye  16              Swimming
##   Event
## 1 Women s 50m Freestyle, Women s 100m Freestyle, Women s 100m Breaststroke
## 2 Women s Team, Women s Qualification
## 3 Women s Individual All-Around, Women s Team, Women s Qualification
## 4 Women s 4x100m Freestyle Relay
## 5 Women s Team, Women s Qualification
## 6 Women s 200m Individual Medley, Women s 400m Individual Medley, Women s 4x200m Freestyle Relay
oly12 %>%
  filter(Gold > 0) %>%
  arrange(desc(Age)) %>%
  head() %>%
  select(Name, Age, Sport, Event)
##                 Name Age          Sport
## 1      Peter Thomsen  51     Equestrian
## 2      Ingrid Klimke  44     Equestrian
## 3    Sergei Martynov  44       Shooting
## 4  Kristin Armstrong  38 Cycling - Road
## 5  Valentina Vezzali  38        Fencing
## 6 Alexandr Vinokurov  38 Cycling - Road
##   Event
## 1 Individual Eventing, Team Eventing, BARNY
## 2 Individual Eventing, Team Eventing, BUTTS ABRAXXAS
## 3 Men s 50m Rifle Prone
## 4 Women s Individual Time Trial, Women s Road Race
## 5 Women s Individual Foil, Women s Team Foil
## 6 Men s Individual Time Trial, Men s Road Race
# google "tidy get the first and last rows"
# https://stackoverflow.com/questions/31528981/select-first-and-last-row-from-grouped-data
oly12 %>%
  filter(Gold > 0) %>%
  arrange(Age) %>%
  filter(row_number() <= 6 | row_number() >= (n() - 6))
##                  Name                    Country Age Height Weight Sex
## 1      Ruta Meilutyte                  Lithuania  15   1.72     64   F
## 2           Kyla Ross   United States of America  15   1.57     NA   F
## 3   Gabrielle Douglas   United States of America  16   1.50     NA   F
## 4        Yolane Kukla                  Australia  16   1.68     61   F
## 5    Mc Kayla Maroney   United States of America  16   1.60     NA   F
## 6           Shiwen Ye People s Republic of China  16   1.72     64   F
## 7           Chris Hoy              Great Britain  36   1.85     93   M
## 8   Kristin Armstrong   United States of America  38   1.73     58   F
## 9   Valentina Vezzali                      Italy  38   1.64     53   F
## 10 Alexandr Vinokurov                 Kazakhstan  38   1.76     69   M
## 11      Ingrid Klimke                    Germany  44   1.72     59   F
## 12    Sergei Martynov                    Belarus  44   1.72     70   M
## 13      Peter Thomsen                    Germany  51   1.83     73   M
##           DOB            PlaceOB Gold Silver Bronze Total                 Sport
## 1        <NA>       Kaunas (LTU)    1      0      0     1              Swimming
## 2        <NA>     Honolulu (USA)    1      0      0     1 Gymnastics - Artistic
## 3        <NA> Newport News (USA)    2      0      0     2 Gymnastics - Artistic
## 4        <NA> AUCHENFLOWER (AUS)    1      0      0     1              Swimming
## 5  1995-09-12  ALISO VIEJO (USA)    1      0      0     1 Gymnastics - Artistic
## 6  1996-01-03     Zhejiang (CHN)    2      0      0     2              Swimming
## 7        <NA>    Edinburgh (GBR)    1      0      0     1       Cycling - Track
## 8  1973-11-08      Memphis (USA)    1      0      0     1        Cycling - Road
## 9        <NA>                       1      0      1     2               Fencing
## 10       <NA>     Pavlodar (KAZ)    1      0      0     1        Cycling - Road
## 11 1968-01-04      MUNSTER (GER)    1      0      0     1            Equestrian
## 12       <NA>       VEREIA (RUS)    1      0      0     1              Shooting
## 13 1961-04-04    Flensburg (GER)    1      0      0     1            Equestrian
##    Event
## 1  Women s 50m Freestyle, Women s 100m Freestyle, Women s 100m Breaststroke
## 2  Women s Team, Women s Qualification
## 3  Women s Individual All-Around, Women s Team, Women s Qualification
## 4  Women s 4x100m Freestyle Relay
## 5  Women s Team, Women s Qualification
## 6  Women s 200m Individual Medley, Women s 400m Individual Medley, Women s 4x200m Freestyle Relay
## 7  Men s Keirin, Men s Team Sprint
## 8  Women s Individual Time Trial, Women s Road Race
## 9  Women s Individual Foil, Women s Team Foil
## 10 Men s Individual Time Trial, Men s Road Race
## 11 Individual Eventing, Team Eventing, BUTTS ABRAXXAS
## 12 Men s 50m Rifle Prone
## 13 Individual Eventing, Team Eventing, BARNY
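Note that the condition row_number() >= (n() - 6) in the combined filter keeps the last seven rows rather than the last six, which is why the output above has 13 rows instead of 12. One possible way to get exactly six rows from each end is sketched below on a hypothetical tibble (toy, with 20 made-up athletes; not the real oly12 data), using either a strict inequality or dplyr's slice_head() and slice_tail().

```r
library(dplyr)

# Hypothetical toy data: 20 athletes with distinct ages 15..34
toy <- tibble(Name = paste0("athlete_", 1:20), Age = 15:34)

sorted <- toy %>% arrange(Age)

# Off-by-one version: keeps the first 6 and the last 7 rows
thirteen <- sorted %>%
  filter(row_number() <= 6 | row_number() >= (n() - 6))

# Strict inequality keeps exactly the last 6 rows
twelve <- sorted %>%
  filter(row_number() <= 6 | row_number() > (n() - 6))

# Equivalent result with the slice helpers
twelve_slice <- bind_rows(slice_head(sorted, n = 6), slice_tail(sorted, n = 6))

nrow(thirteen)
nrow(twelve)
```

With ties in Age, which six rows end up at each end depends on the row order after arrange(), so the choice among tied athletes is somewhat arbitrary in either version.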
Question 4: The Data Consultant
You have just been hired by a consultancy company. Congratulations! They are doing a report on each
Olympics for the past 10 years. Given your recent experience in STA130, you ask to be responsible for the
2012 summary. Write a short report to your boss on information that can be gleaned about the ages of
the athletes across sports. As it turns out, you happen to know that your new boss’ favourite sports are
badminton and weightlifting, so addressing these sports specifically might be an easy way to capture their
attention; but, other features of athletes’ ages which can be learned from your plots and tables will of course be
appreciated, too. The more interesting the better!
Question Constraints
This is a quick report for your boss, so use full sentences and communicate in a clear and professional manner. Grammar isn’t the main focus of the assessment, but don’t use slang or emojis.
• Avoid Analysis Paralysis: this is envisioned as a 30 minute exercise, so you don’t have time to exhaustively explore every aspect of the data set.
• Avoid Writer’s Block: this is envisioned as a 200-400 word exercise, so quickly find something you can communicate and write about.
(a) Watch this 7-minute video introduction to hedging.
Hedging is helpful whenever you can’t say something is 100% one way or another, as is often the case.
In statistics, hedging should always be used with respect to the limitations of data and the strength and
generalizability of the conclusions.
(b) Provide a small introduction of one or two sentences to draw your reader in and then explain what you’ll be discussing. Be definitive about what your data is, and use hedging to caveat the limitations of the data.
(c) Provide one or two clearly titled and labeled figures addressing interesting features of
athletes’ ages.
(d) Provide one or two clearly labeled summary tables addressing interesting features of athletes’ ages.
(e) Watch this 8-minute video introduction to plagiarism.
You don’t need to cite any outside references for your report to your boss, but you will be referring to your
own created figures and tables. We’ll use this as an excuse to get started early thinking about this important
topic, and also use it as an exercise to start getting into the right referencing habits. It’s easy and natural
and makes your writing better (not to mention that it avoids potential serious academic integrity violations...)
(f) Describe the interesting features of athletes’ ages that you’ve found, referencing the figures
and summary tables created in (c) and (d) just above. Use at least two of the vocabulary words
listed below; but, your boss isn’t a statistician, so make sure to clearly define and explain the
vocabulary you use.
(g) Finish with a conclusion to remind your boss of the key take home points from your summary about the athletes’ ages. Be definitive about what your findings are, but use hedging to caveat the limitations of the conclusion more generally.
Vocabulary
• Cleaning data
• Tidy data
• Handling missing values (NAs)
• Removing a column
• Extracting a subset of variables
• Filtering a tibble based on a condition (e.g. based on the values in one or more of the variables/columns)
• Sorting data based on the values of a variable
• Defining new variables
• Renaming the variables
• Producing new data frames
• Grouping categories
• Creating summary tables
You may also find these vocabulary words from last week useful with your writing this week
• location/center (mean, median, mode) and scale/spread (range, IQR, var, sd)
  – note: interpreting center and spread relative to each other can be helpful
• shape (symmetric, left-skewed, right-skewed, unimodal, bimodal, multimodal, uniform)
• outliers/extreme values
  – note: this can be related to the tails of a distribution (heavy-tailed, thin-tailed)
• frequency (most, least, pattern tendencies)
Part 2: OPTIONAL but Recommended
You may complete these questions for practice if you wish.
You are not required to complete these
questions as they ARE NOT included as part of your mark.
Question 5: Amazon Books
The code below reads in data about books sold on Amazon.
- Note that the height (Height), width (Width), and thickness (Thick) of books in this data frame are measured in inches.
library(tidyverse) # Load the tidyverse package so it is available to use
books <- read.csv("amazonbooks.csv")
(a) What is the name of the book(s) with the smallest number of pages in this sample of books,
and how many pages does it have?
books %>%
  arrange(NumPages) %>%
  select(Title, NumPages) %>%
  head()
##                                                               Title NumPages
## 1                                          Big Dog . . . Little Dog       24
## 2                            The Berenstain Bears He Bear, She Bear       24
## 3 The Shape of Me and Other Stuff: Dr. Seuss s Surprising Word Book       24
## 4                                 Cloudy With a Chance of Meatballs       32
## 5                                                Go the F**k Asleep       32
## 6                                                          Madeline       54
(b) Create a summary table which reports the total number of books written by each author
and the mean and variance of the number of pages per book for each author, for the books
represented in this sample of books.
books %>%
  group_by(Author) %>%
  summarise(n = n(),
            mean_pages = sum(NumPages) / n,
            var_pages = var(NumPages))
## # A tibble: 256 x 4
##    Author                 n mean_pages var_pages
##    <chr>              <int>      <dbl>     <dbl>
##  1 ""                     1        432        NA
##  2 "Abraham Verghese"     1        667        NA
##  3 "Adam Goodheart"       1        460        NA
##  4 "Adam Hochschild"      1        480        NA
##  5 "Adam Mansbach"        1         32        NA
##  6 "Alaa Aswany"          1        255        NA
##  7 "Alice Munro"          2        320      2048
##  8 "Alice Schroeder"      1        832        NA
##  9 "Allen, Toorawa"       1        200        NA
## 10 "Andrea Warren"        1        160        NA
## # ... with 246 more rows
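The NA values in var_pages appear for authors with a single book: the sample variance divides by n - 1, so it is undefined when n = 1. A minimal sketch with made-up data (toy_books and its author names are assumptions, not rows read from amazonbooks.csv) shows the same behaviour.

```r
library(dplyr)

# Hypothetical toy data (not the contents of amazonbooks.csv)
toy_books <- tibble(
  Author   = c("Author A", "Author A", "Author B"),
  NumPages = c(288, 352, 460)
)

summary_tbl <- toy_books %>%
  group_by(Author) %>%
  summarise(n = n(),
            mean_pages = mean(NumPages),
            var_pages = var(NumPages))

summary_tbl

var(c(460))       # NA: one observation has no sample variance
var(c(288, 352))  # 2048: ((288 - 320)^2 + (352 - 320)^2) / (2 - 1)
```

Here mean(NumPages) is used in place of sum(NumPages) / n; the two give the same result when there are no missing values.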
(c) Modify your code from (b) to create a new summary table which contains only information for authors who wrote more than 2 books, and sort them in decreasing order of number of books written.
books %>%
  group_by(Author) %>%
  summarise(n = n(),
            mean_pages = sum(NumPages) / n,
            var_pages = var(NumPages)) %>%
  filter(n > 2) %>%
  arrange(desc(n))
## # A tibble: 16 x 4
##    Author               n mean_pages var_pages
##    <chr>            <int>      <dbl>     <dbl>
##  1 Jodi Picoult         7       414.     1658.
##  2 Vladimir Nabokov     7       316     20528
##  3 Lewis                4       266.    18820.
##  4 Murakami             4       354.     9838.
##  5 Ben Mezrich          3       299       571
##  6 Bruce Ballenger      3       448      9472
##  7 Christensen          3       245.    24917.
##  8 Collins              3       370.     1920.
##  9 Drucker              3       304     11008
## 10 Ha Jin               3       300      5232
## 11 James Patterson      3       438.     1408.
## 12 John Steinbeck       3       392.    63632.
## 13 S.E. Hinton          3       181.      341.
## 14 Seuss                3        56       768
## 15 Shel Silverstein     3       149.     5461.
## 16 William Faulker      3       339       763
Part 3: OPTIONAL for Additional Practice
You may complete these questions for practice if you wish.
You are not required to complete these
questions as they ARE NOT included as part of your mark.
Question 6: Titanic Data
At the time it departed from England in April 1912, the RMS Titanic was the largest ship in the world. In
the night of April 14th to April 15th, the Titanic struck an iceberg and sank approximately 600km south of
Newfoundland (a province in eastern Canada). Many people perished in this accident. The code below loads
data about the passengers who were on board the Titanic at the time of the accident.
titanic <- read_csv("titanic.csv")
## Rows: 2208 Columns: 14
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (12): Name, Survived, Boarded, Class, MWC, Adut_or_Chld, Sex, Ticket_No,...
## dbl  (2): Age, Paid
##
## i Use spec() to retrieve the full column specification for this data.
## i Specify the column types or set show_col_types = FALSE to quiet this message.
glimpse(titanic)
## Rows: 2,208
## Columns: 14
## $ Name         <chr> "ABBING, Mr Anthony", "ABBOTT, Mr Ernest Owen", "ABBOTT, ~
## $ Survived     <chr> "Dead", "Dead", "Dead", "Dead", "Alive", "Alive", "Alive"~
## $ Boarded      <chr> "Southampton", "Southampton", "Southampton", "Southampton~
## $ Class        <chr> "3", "Crew", "3", "3", "3", "3", "3", "2", "2", "3", "3",~
## $ MWC          <chr> "Man", "Man", "Child", "Man", "Woman", "Woman", "Man", "M~
## $ Age          <dbl> 42.00, 21.00, 14.00, 16.00, 39.00, 16.00, 25.00, 30.00, 2~
## $ Adut_or_Chld <chr> "Adult", "Adult", "Child", "Adult", "Adult", "Adult", "Ad~
## $ Sex          <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Male~
## $ Paid         <dbl> 7.550000, NA, 20.250000, 20.250000, 20.250000, 7.650000, ~
## $ Ticket_No    <chr> "5547", NA, "CA2673", "CA2673", "CA2673", "348125", "3481~
## $ Boat_or_Body <chr> NA, NA, NA, "[190]", "A", "16", "A", NA, "10", "15", "C",~
## $ Job          <chr> "Blacksmith", "Lounge Pantry Steward", "Scholar", "Jewell~
## $ Class_Dept   <chr> "3rd Class Passenger", "Victualling Crew", "3rd Class Pas~
## $ Class_Full   <chr> "3", "V", "3", "3", "3", "3", "3", "2", "2", "3", "3", "E~
(a) Often, before you start working with a dataset you need to clean it.
• The variable Adut_or_Chld indicates which passengers were adults and which were children. Use the rename() function to change the name of this variable to Adult_or_Child. The variable MWC records whether the passenger was a man, woman or child. Use the rename() function to change the name of this variable to Man_Woman_or_Child to make this clear.
titanic <- titanic %>%
  rename(Adult_or_Child = Adut_or_Chld,
         Man_Woman_or_Child = MWC)
Hint: Unless the transformed tibble is saved into a new object or overwrites the original tibble, like oly12 <- oly12 %>% rename(Place_of_birth = PlaceOB), the changes won’t be permanent.
• Since many of their values are missing or unclear, modify the titanic data frame by removing the following variables: Ticket_No, Boat_or_Body, Class_Dept, Class_Full.
titanic <- titanic %>%
  select(Name, Survived, Boarded, Class, Man_Woman_or_Child, Age,
         Adult_or_Child, Sex, Paid, Job)
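An alternative to listing every column to keep is to name only the columns to drop, using a minus sign inside select(). The sketch below assumes a small made-up tibble (toy) that borrows the four column names being removed; it is not the real titanic data.

```r
library(dplyr)

# Hypothetical toy data reusing the names of the columns to be dropped
toy <- tibble(
  Name         = c("A", "B"),
  Age          = c(42, 21),
  Ticket_No    = c("5547", NA),
  Boat_or_Body = c(NA, "A"),
  Class_Dept   = c("3rd Class Passenger", "Victualling Crew"),
  Class_Full   = c("3", "V")
)

# Negative selection: keep everything except the named columns
trimmed <- toy %>%
  select(-c(Ticket_No, Boat_or_Body, Class_Dept, Class_Full))

names(trimmed)
```

Both approaches should give the same columns; dropping by name is shorter when only a few columns are removed from a wide data frame.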
(b) Create a summary table reporting the number of passengers on the Titanic (n), the number of passengers who survived (n_surv), and the proportion of passengers who survived (prop_surv).
titanic %>%
  summarise(n = n(),
            n_surv = sum(Survived != "Dead"),
            prop_surv = n_surv / n)
## # A tibble: 1 x 3
##       n n_surv prop_surv
##   <int>  <int>     <dbl>
## 1  2208    712     0.322
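The summary above relies on the fact that R treats TRUE as 1 and FALSE as 0, so sum() of a logical vector counts the TRUE values and mean() gives their proportion. A small sketch with a made-up vector (survived is illustrative, not the Titanic column) follows, together with a check of the arithmetic reported in the table.

```r
# Made-up survival statuses for illustration (not the real Titanic data)
survived <- c("Dead", "Alive", "Dead", "Alive", "Alive")

n_surv    <- sum(survived != "Dead")    # number of TRUE values
prop_surv <- mean(survived != "Dead")   # proportion of TRUE values

n_surv
prop_surv

# Arithmetic check of the table above: 712 survivors out of 2208
round(712 / 2208, 3)
```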
(c) Calculate the proportion of deaths for the following groups of passengers.
• For men, women, and children:
titanic %>%
  group_by(Man_Woman_or_Child) %>%
  summarise(n = n(),
            n_died = sum(Survived == "Dead"),
            proportion = n_died / n)
## # A tibble: 3 x 4
##   Man_Woman_or_Child     n n_died proportion
##   <chr>              <int>  <int>      <dbl>
## 1 Child                124     60      0.484
## 2 Man                 1652   1331      0.806
## 3 Woman                432    105      0.243
• For passengers aged between 25-40 years of age:
titanic %>%
  filter(Age >= 25 & Age <= 40) %>%
  summarise(n = n(),
            n_died = sum(Survived == "Dead"),
            proportion = n_died / n)
## # A tibble: 1 x 3
##       n n_died proportion
##   <int>  <int>      <dbl>
## 1  1067    739      0.693
• For men, women, and children among the passengers who paid more than 50 British pounds for their tickets:
titanic %>%
  filter(Paid > 50) %>%
  group_by(Man_Woman_or_Child) %>%
  summarise(n = n(),
            n_died = sum(Survived == "Dead"),
            proportion = n_died / n)
## # A tibble: 3 x 4
##   Man_Woman_or_Child     n n_died proportion
##   <chr>              <int>  <int>      <dbl>
## 1 Child                 13      7     0.538
## 2 Man                  100     70     0.7
## 3 Woman                126      4     0.0317
• Write several sentences interpreting the summary tables created for the three groups above.
Survival rates on the Titanic were associated with whether the passenger was a man, woman or child and with the cost of their ticket. About 24% of all women passengers on the Titanic died. Unfortunately, men and children had considerably higher death rates (0.81 and 0.48, respectively). Among the passengers who paid more than 50 pounds for their tickets, death rates were lower for the adult passengers, since only 3% of these women and 70% of these men died, but higher for children (54% of these children died).
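The percentages quoted for passengers who paid more than 50 pounds can be checked directly from the counts in the summary table (13, 100 and 126 passengers; 7, 70 and 4 deaths). A short sketch of that arithmetic, with the counts typed in by hand:

```r
# Counts copied from the summary table for tickets costing more than 50 pounds
n      <- c(Child = 13, Man = 100, Woman = 126)
n_died <- c(Child = 7,  Man = 70,  Woman = 4)

# Death rates as whole-number percentages
pct_died <- round(100 * n_died / n)
pct_died
```

The child group contains only 13 passengers, so its rate is far less stable than the rates for men and women and should probably be interpreted with caution.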
(d) What was the most common job among passengers of the Titanic? Write 1-2 sentences explaining your answer.
titanic %>%
  group_by(Job) %>%
  summarise(n = n()) %>%
  arrange(desc(n))
## # A tibble: 358 x 2
##    Job                            n
##    <chr>                      <int>
##  1 <NA>                         631
##  2 General Labourer             162
##  3 Fireman                      161
##  4 Trimmer                       73
##  5 Saloon Steward                56
##  6 Farm Labourer                 49
##  7 Farmer                        48
##  8 Saloon Steward (1st class)    48
##  9 Greaser                       33
## 10 Able Seaman                   28
## # ... with 348 more rows
631 of the passengers do not have a job listed (NA). The job recorded for the largest number of passengers is
“General Labourer” (162), although there were also 161 firemen.
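Because the missing values form the largest group, it can be helpful to drop them before ranking jobs. The sketch below shows one way to do this, assuming a small made-up tibble (toy; not the real titanic data), with is.na() to remove missing jobs and count(sort = TRUE) as a shorthand for group_by(), summarise(n()) and arrange(desc(n)).

```r
library(dplyr)

# Hypothetical toy data with some missing jobs
toy <- tibble(Job = c("Fireman", NA, "Trimmer", "Fireman", NA, NA))

job_counts <- toy %>%
  filter(!is.na(Job)) %>%       # drop rows with no job recorded
  count(Job, sort = TRUE)       # tally and sort in decreasing order

job_counts
```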
(e) Plot the age distribution for passengers with the job “General Labourer”, and describe
this distribution in 1-2 sentences.
titanic %>%
  filter(Job == "General Labourer") %>%
  ggplot(aes(x = "", y = Age)) +
  geom_boxplot()
[Figure: boxplot of Age (y-axis) for general labourers]
titanic %>%
  filter(Job == "General Labourer") %>%
  ggplot(aes(x = Age)) +
  geom_histogram(bins = 30, color = "black", fill = "gray")
[Figure: histogram of Age (x-axis) against count (y-axis) for general labourers]
General labourers on the Titanic ranged in age from under 15 to just over 50. The age distribution is slightly
right skewed, with a few outliers in the right tail corresponding to older individuals (over age 43). The median
age of general labourers on the Titanic is close to 25 years, with an interquartile range of approximately 9
years (21 to 30 years).
(f) Were any of the general labourers on the Titanic women? If so, how many?
# there are several ways to do this
titanic %>%
  filter(Job == "General Labourer") %>%
  ggplot(aes(x = Man_Woman_or_Child)) +
  geom_bar()
[Figure: bar chart of count (y-axis) by Man_Woman_or_Child (x-axis: Child, Man, Woman) for general labourers]
titanic %>%
  filter(Job == "General Labourer") %>%
  group_by(Man_Woman_or_Child) %>%
  summarise(n = n())
## # A tibble: 3 x 2
##   Man_Woman_or_Child     n
##   <chr>              <int>
## 1 Child                  1
## 2 Man                  160
## 3 Woman                  1
titanic %>%
  filter(Job == "General Labourer" & Sex == "Female")
## # A tibble: 1 x 10
##   Name             Survi~1 Boarded Class Man_W~2   Age Adult~3 Sex    Paid Job
##   <chr>            <chr>   <chr>   <chr> <chr>   <dbl> <chr>   <chr> <dbl> <chr>
## 1 HAAS, Miss Aloi~ Dead    Southa~ 3     Woman      24 Adult   Fema~  8.85 Gene~
## # ... with abbreviated variable names 1: Survived, 2: Man_Woman_or_Child,
## #   3: Adult_or_Child
Of the 162 general labourers on the Titanic, 160 were men, 1 was a child and 1 was a woman.
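As the code comment above says, there are several ways to get these counts. One more option, sketched here on a small made-up data frame (the `toy` tibble and its values are illustrative assumptions, not the `titanic` data), is `count()` from dplyr, which combines the `group_by()` and `summarise(n = n())` steps into one call:

```r
library(dplyr)

# Made-up stand-in for the titanic data; the rows are illustrative only
toy <- tibble(
  Job = c("General Labourer", "General Labourer", "Fireman", "General Labourer"),
  Man_Woman_or_Child = c("Man", "Woman", "Man", "Man")
)

# count() groups by the given column and returns one row per group with a column n
labourer_counts <- toy %>%
  filter(Job == "General Labourer") %>%
  count(Man_Woman_or_Child)

labourer_counts
```

On the real data, the same pipeline with `titanic` in place of `toy` should reproduce the table of counts shown above.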
(g) What are the names of the passengers with the top 4 most expensive tickets? Did these
passengers survive the accident?
titanic %>%
  arrange(desc(Paid)) %>%
  select(Name, Paid, Survived)
## # A tibble: 2,208 x 3
##    Name                               Paid Survived
##    <chr>                             <dbl> <chr>
##  1 CARDEZA, Mr Thomas Drake Martinez  512. Alive
##  2 CARDEZA, Mrs Charlotte Wardle      512. Alive
##  3 LESUEUR, Mr Gustave J.             512. Alive
##  4 WARD, Miss Annie Moore             512. Alive
##  5 FORTUNE, Miss Alice Elizabeth      263  Alive
##  6 FORTUNE, Miss Ethel Flora          263  Alive
##  7 FORTUNE, Miss Mabel Helen          263  Alive
##  8 FORTUNE, Mr Charles Alexander      263  Dead
##  9 FORTUNE, Mr Mark                   263  Dead
## 10 FORTUNE, Mrs Mary                  263  Alive
## # ... with 2,198 more rows
The four most expensive tickets were sold to Mr Thomas Drake Martinez CARDEZA, Mrs Charlotte Wardle
CARDEZA, Mr Gustave J. LESUEUR, and Miss Annie Moore WARD. All four of these passengers paid 512.32
British pounds for their tickets, and they all survived the accident.
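If only the top rows are wanted rather than the whole sorted table, dplyr's `slice_max()` is one alternative. The sketch below uses a made-up `toy` tibble (names and fares are hypothetical, not taken from `titanic`). Note that `slice_max()` keeps ties by default (`with_ties = TRUE`), so it can return more than `n` rows when several passengers share the cut-off fare.

```r
library(dplyr)

# Made-up fares for illustration only
toy <- tibble(
  Name     = c("A", "B", "C", "D", "E", "F"),
  Paid     = c(512, 512, 263, 30, 8, 263),
  Survived = c("Alive", "Alive", "Dead", "Alive", "Dead", "Alive")
)

# Keep the rows with the 4 largest values of Paid, sorted from highest to lowest
top4 <- toy %>%
  slice_max(Paid, n = 4) %>%
  select(Name, Paid, Survived)

top4
```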
(h) In this question, you will compare the distribution of ticket prices for survivors and
non-survivors of the Titanic using both visualizations and summary tables.
• Construct two histograms to visualize the distribution of ticket prices for survivors and non-survivors
(i.e. one histogram for survivors and one for non-survivors). Write 2-3 sentences comparing the two
distributions based on these plots.
titanic %>%
  filter(Survived == "Alive") %>%
  ggplot(aes(x = Paid)) +
  geom_histogram(color = "black", fill = "gray", bins = 30)
[Figure: histogram of Paid for survivors; x-axis Paid (0 to 500), y-axis count (0 to 150)]
titanic %>%
  filter(Survived == "Dead") %>%
  ggplot(aes(x = Paid)) +
  geom_histogram(color = "black", fill = "gray", bins = 30)
[Figure: histogram of Paid for non-survivors; x-axis Paid (0 to 200), y-axis count (0 to 400)]
The distribution of ticket prices is very right-skewed for both the survivors and those who perished; while most
of the tickets cost less than 100 pounds, some of the survivors paid over 500 pounds for their tickets. The
first bar in the histogram (corresponding to the lowest range of fares) is much taller in the distribution of
non-survivors than survivors, so we see that most of the individuals who bought these low-cost tickets did not
survive the accident.
• Construct a pair of boxplots (in the same figure) to visualize the distribution of ticket prices for survivors
and non-survivors. Write 2-3 sentences comparing the two distributions based on these plots.
titanic %>%
  ggplot(aes(x = Survived, y = Paid)) +
  geom_boxplot()
[Figure: side-by-side boxplots of Paid by Survived (Alive, Dead); y-axis Paid (0 to 500)]
Again, we see that both distributions are highly right skewed. From the boxplots, it is clear that the median
fare paid by surviving passengers was higher than that paid by the non-survivors, and the interquartile range
of ticket prices is much wider among survivors than non-survivors (IQR of approximately 50 pounds for
survivors and less than 25 pounds for non-survivors). The distribution of ticket prices is more right-skewed
among non-survivors than among survivors, as the non-survivors' median appears to be very close to their first quartile.
• Construct a summary table with the minimum, first quartile, median, mean, third quartile, and
maximum ticket price for survivors and non-survivors.
titanic %>%
  group_by(Survived) %>%
  summarise(n = n(),
            min = min(Paid, na.rm = TRUE),
            first_quartile = quantile(Paid, 0.25, na.rm = TRUE),
            median = median(Paid, na.rm = TRUE),
            mean = mean(Paid, na.rm = TRUE),
            third_quartile = quantile(Paid, 0.75, na.rm = TRUE),
            max = max(Paid, na.rm = TRUE))
## # A tibble: 2 x 8
##   Survived     n   min first_quartile median  mean third_quartile   max
##   <chr>    <int> <dbl>          <dbl>  <dbl> <dbl>          <dbl> <dbl>
## 1 Alive      712     0          11.3    26    49.6           57.9  512.
## 2 Dead      1496     0           7.85   10.5  22.9           26    263
• Write 2-3 sentences comparing the two distributions based on this summary table.
The minimum ticket price among both survivors and non-survivors is 0, which is strange; more investigation is
required to determine whether this is an error in the data or if some passengers in fact received complimentary
tickets. From the summary table, we see that the median ticket price among survivors was more than twice
as high as the median ticket price among non-survivors. Among survivors, 75% paid less than 58 pounds
for their tickets, while 75% of the non-survivors paid less than 26 pounds. The mean ticket price is also
much higher among survivors, but this is pulled up by the particularly high prices paid by a small number of
passengers.
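The comparisons in this paragraph can be checked with a few lines of arithmetic. This is a minimal sketch that copies the quartiles and medians from the printed summary table; those values are rounded by the tibble printer, so the results are approximate. The variable names are illustrative.

```r
# Values copied from the printed summary table above (rounded for display)
alive <- c(q1 = 11.3, median = 26,   q3 = 57.9)
dead  <- c(q1 = 7.85, median = 10.5, q3 = 26)

# Interquartile range = third quartile minus first quartile
iqr_alive <- alive[["q3"]] - alive[["q1"]]   # about 46.6 pounds
iqr_dead  <- dead[["q3"]]  - dead[["q1"]]    # about 18.15 pounds

# Ratio of medians: how many times higher the survivors' median fare is
median_ratio <- alive[["median"]] / dead[["median"]]   # about 2.48

c(iqr_alive = iqr_alive, iqr_dead = iqr_dead, median_ratio = median_ratio)
```

A ratio of roughly 2.5 is consistent with the statement that the survivors' median fare was more than twice the non-survivors' median.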
As a side note, we can take a closer look at the passengers with 0-pound tickets:
titanic %>%
  filter(Paid == 0) %>%
  group_by(Class) %>%
  summarise(n())
## # A tibble: 3 x 2
##   Class   n()
##   <chr> <int>
## 1 1         8
## 2 2        14
## 3 3         6
titanic %>%
  filter(Paid == 0) %>%
  group_by(Boarded) %>%
  summarise(n())
## # A tibble: 4 x 2
##   Boarded       n()
##   <chr>       <int>
## 1 Belfast         9
## 2 Cherbourg       1
## 3 Queenstown      1
## 4 Southampton    17
titanic %>%
  filter(Paid == 0) %>%
  group_by(Survived) %>%
  summarise(n())
## # A tibble: 2 x 2
##   Survived   n()
##   <chr>    <int>
## 1 Alive        4
## 2 Dead        24
There is no obvious pattern connecting individuals with recorded 0-pound tickets. It is not clear whether this
is an error or not, but since only 28 out of 2208 observations are affected, these are not expected to have a
large impact on the comparison.
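As a quick consistency check on the figure of 28 out of 2208, the counts from the three tables above can be summed and the affected share computed. The counts below are copied from the printed output; the variable names are illustrative.

```r
# Counts of 0-pound tickets copied from the three tables above
by_class    <- c(`1` = 8, `2` = 14, `3` = 6)
by_boarded  <- c(Belfast = 9, Cherbourg = 1, Queenstown = 1, Southampton = 17)
by_survived <- c(Alive = 4, Dead = 24)

# Each breakdown should add up to the same total number of 0-pound tickets
totals <- c(class = sum(by_class), boarded = sum(by_boarded), survived = sum(by_survived))

# Share of all 2208 observations with a recorded fare of 0
share_zero <- 28 / 2208

totals
round(100 * share_zero, 2)   # percentage of observations affected
```

All three breakdowns total 28, which is about 1.3% of the observations.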
• Comment on the strengths and weaknesses of each of the visualizations and summary table constructed
above.
Histograms: The paired histograms give us a good overall impression of the distribution of ticket prices
among survivors and non-survivors, but it is difficult to extract estimates of the mean, median, and quantiles,
as well as the bounds of each bin.
Boxplots: The boxplots make it easy for us to compare the medians, quartiles, IQR (and outliers) of ticket
prices across the two groups, although we cannot easily extract exact values for these. Also, since boxplots
only display a small number of summary statistics, we lose information about the shape of the distributions.
Summary table: The summary table makes it easy to compare numerical values of key statistics. It is only
from the summary table that we noticed that some passengers were recorded to have paid 0 pounds for their
tickets. However, it is more difficult to get a quick sense of the overall shape of the distributions from these
summary statistics alone, although these could be used to sketch a pair of boxplots.