ProblemSet 5 my answers
STA130H1S – Fall 2022 Problem Set 4
Amogh Shashidhar (1008817666) and STA130 Professors
Instructions
Complete the exercises of Part 2 in this .Rmd file and submit your .Rmd and .pdf output through Quercus on Oct 6 by 5:00 p.m. ET.
Complete the exercises of Part 3 in this .Rmd file and submit your .Rmd and .pdf output through Quercus on Oct 13 by 5:00 p.m. ET.
library(tidyverse)
Part 1: OPTIONAL Warm Up if Needed
Complete these guided questions if you need some additional help getting started with hypothesis testing before moving on to Part 2, or if you want some additional practice with hypothesis testing. You are not required to complete these questions as they ARE NOT included as part of your mark.
Question 1: Warm Up with Biased Coin Flipping
Approximately 23% of the general population use the social media platform Twitter. Suppose that the Department of Statistical Sciences (DoSS) is conducting a study to see if this percentage is the same among their undergraduate students (that is, all students in an undergraduate DoSS statistics program). Suppose $n = 400$ students in statistics programs are randomly selected and asked whether or not they use Twitter. Suppose that 103 of these 400 students respond that they use Twitter.
(a) What is the NULL hypothesis $H_0$ in terms of $p$? What is $H_1$ in terms of $H_0$? In a simple sentence without $H_0$ and $p$ notation, what is the claim of the NULL hypothesis?
REPLACE THIS TEXT WITH YOUR ANSWER
(b) Set set.seed(11) and use the sample() function to simulate the number of students who use Twitter in a random sample of 400 DoSS students under the assumption that the prevalence of Twitter usage is the same among DoSS students as it is in the general population. How many Twitter users did you have in your simulated sample of 400 students?
set.seed(11) # REQUIRED so the random sample is reproducible!
# Code your answer here
Hints
sample(c("Head", "Tail"), size=10, replace=TRUE)
## [1] "Tail" "Tail" "Tail" "Head" "Tail" "Head" "Head" "Tail" "Tail" "Tail"
# will do the same thing as:
sample(c("Head", "Tail"), size=10, prob=c(0.5, 0.5), replace=TRUE)
## [1] "Tail" "Tail" "Head" "Head" "Head" "Head" "Tail" "Tail" "Tail" "Tail"
# Even though the exact counts of "Head" and "Tail" differ each time you
# run this code, if you simulate enough coin flips (by increasing
# the value of 'size'), you'll get approximately the same proportion
# of "Head" and "Tail" outcomes
# To modify the code to make Tails much more likely than Heads,
# we could change the probs:
sample(c("Head", "Tail"), size=10, prob=c(0.1, 0.9), replace=TRUE)
## [1] "Tail" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail"
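As a minimal sketch, the same `sample()` pattern could be applied to the Twitter warm-up. The outcome labels and variable names below are assumptions made for illustration; only the seed, the sample size of 400, and the 23% rate come from the question.

```r
# Sketch only: labels and variable names are illustrative assumptions.
set.seed(11)  # seed given in part (b)
n_students <- 400

# Each simulated student uses Twitter with probability 0.23 (the general-population rate)
sim_sample <- sample(c("Twitter", "No Twitter"), size = n_students,
                     prob = c(0.23, 0.77), replace = TRUE)

n_twitter <- sum(sim_sample == "Twitter")
n_twitter               # simulated number of Twitter users
n_twitter / n_students  # simulated proportion, to compare with 0.23 and 103/400
```

A single simulated sample like this is one draw from the sampling distribution under the NULL hypothesis; repeating it many times gives the whole distribution.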
(c) Use geom_bar() to visualize the number of Twitter users versus non-Twitter users from your simulated sample with a bar plot. How does this simulated proportion compare to the general population rate of 23% and to the 103 of 400 sampled DoSS students?
# Code your answer here
REPLACE THIS TEXT WITH YOUR ANSWER
Hints
# You can make a vector a column of a `tibble` like this
tibble(flips = c("Head", "Tail", "Tail"))
## # A tibble: 3 × 1
##   flips
##   <chr>
## 1 Head
## 2 Tail
## 3 Tail
(d) How is the geom_bar() function different than the geom_col() function below?
REPLACE THIS TEXT WITH YOUR ANSWER
ggplot(data=NULL, aes(x=c("Twitterer", "Non Twitterer"), y=c(103, 400-103))) +
  geom_col() +
  lims(y=c(0, 400)) +
  labs(title="Sample of (n=400) DoSS\nDo you Tweet?", y="count", x="")
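To make the contrast concrete, here is a small sketch that draws the same bars both ways. The data frame and column names are assumptions for illustration; the counts (103 of 400) come from the question.

```r
library(ggplot2)

# One row per respondent (raw data): geom_bar() counts the rows itself.
responses <- data.frame(
  answer = c(rep("Twitterer", 103), rep("Non Twitterer", 400 - 103))
)
p_bar <- ggplot(responses, aes(x = answer)) +
  geom_bar()

# One row per category (already counted): geom_col() plots y exactly as given.
counts <- data.frame(
  answer = c("Twitterer", "Non Twitterer"),
  count = c(103, 400 - 103)
)
p_col <- ggplot(counts, aes(x = answer, y = count)) +
  geom_col()
```

Printing `p_bar` and `p_col` should show bars of the same heights; the difference is only in whether the counting happens inside ggplot or beforehand.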
(e) Simulate the sampling distribution of the test statistic under the assumption that the prevalence of Twitter usage among DoSS students matches that of the general population. Set the seed to the last 2 digits of your student number, use a simulation of size 1000, make a plot of the simulated sampling distribution, and describe the distribution in a few sentences.
• If you don’t set set.seed() the simulation will be different each time it’s run.
• Your knit won’t be reproducible and won’t align with your interpretations and conclusions.
# Clearly label your figure with `labs(x="A primary title\n and a second line")`
# Code your answer here
(e) What is the definition of a p-value?
REPLACE THIS TEXT WITH YOUR ANSWER
(f) What is the p-value of the hypothesis test based on the sampling distribution above?
# Use the `abs()` function to reflect the "as or more extreme" aspect of the p-value
# Code your answer here
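One possible way to put the simulation and the `abs()` hint together is sketched below. The seed, labels, and variable names are assumptions for illustration, not the expected solution. The comparison is done on counts rather than proportions so that "exactly as extreme" ties are not affected by floating-point rounding.

```r
# Sketch only: seed, labels, and variable names are illustrative assumptions.
set.seed(11)
n_students <- 400
null_count <- 92        # 0.23 * 400, the count expected under the NULL hypothesis
observed_count <- 103
repetitions <- 1000

simulated_counts <- rep(NA, repetitions)
for (i in 1:repetitions) {
  new_sim <- sample(c("Twitter", "No Twitter"), size = n_students,
                    prob = c(0.23, 0.77), replace = TRUE)
  simulated_counts[i] <- sum(new_sim == "Twitter")
}

# Two-sided p-value: share of simulations at least as far from the null
# count as the observed count (the "as or more extreme" rule).
p_value <- mean(abs(simulated_counts - null_count) >= abs(observed_count - null_count))
p_value
```

Dividing `simulated_counts` by `n_students` gives the simulated proportions, which is the form to plot for the sampling distribution.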
(g) At the $\alpha = 0.05$ significance level, what is your conclusion about this hypothesis test based on the p-value computed above?
REPLACE THIS TEXT WITH YOUR ANSWER
(h) Which of the following statements is correct regarding the p-value above?
(A) The probability that the proportion of DoSS students who use Twitter matches the general population.
(B) The probability that the proportion of DoSS students who use Twitter does not match the general population.
(C) The probability of obtaining a number of students who use Twitter in a sample of 400 students at least as extreme as the result in this study.
(D) The probability of obtaining a number of students who use Twitter in a sample of 400 students at least as extreme as the result in this study, if the prevalence of Twitter usage among all DoSS students matches the general population.
REPLACE THIS TEXT WITH YOUR ANSWER
(i) What happens to the p-value if you change the seed value in set.seed()? What happens to the p-value if you change the size of the simulation?
REPLACE THIS TEXT WITH YOUR ANSWER
Part 2: One Sample Hypothesis Testing DUE THURSDAY Oct 6 by 5 p.m. ET
Question 2: Scottish Medicine
A Scottish woman noticed that her husband’s scent changed. Six years later he was diagnosed with Parkinson’s disease. His wife joined a Parkinson’s charity and noticed that odour on other people. She mentioned this to researchers, who decided to test her abilities. They recruited 6 people with Parkinson’s disease and 6 people without the disease. Each of the recruits wore a t-shirt for a day, and the woman was asked to smell the t-shirts (in random order) and determine which shirts were worn by someone with Parkinson’s disease. She was correct for 12 of the 12 t-shirts! You can read about this here.
(a) Without conducting a simulation, describe what you would expect the sampling distribution of the proportion of correct guesses about the 12 shirts to look like if someone was just guessing.
Without conducting a simulation, I would expect the sampling distribution of the proportion of correct guesses to be roughly symmetric and centred at 0.5, since someone who is just guessing has a 50% chance of being right about each shirt. By luck a guesser would usually get around half of the shirts correct, and it is highly unlikely that all 12 would be correct.
(b) Carry out a simulation and a hypothesis test of the woman being a lucky guesser as opposed to having some ability to identify Parkinson’s disease by smell given that she correctly classified 12 of 12 t-shirts. Make a conclusion based on your work.
set.seed(66) # Set the random seed to the last two digits of your student number
N <- 1000 # Change this to 10000 if a finer simulation resolution is required
simulated_stats <- rep(NA, N)
n_observations <- 12
test_stat <- 12/12 # observed proportion correct in part (b)
for (i in 1:N) {
  # One simulated guesser: 12 shirts, each correct with probability 0.5
  new_sim <- sample(c("correct", "incorrect"), size=n_observations, prob=c(0.5, 0.5), replace=TRUE)
  sim_p <- sum(new_sim == "correct") / n_observations
  simulated_stats[i] <- sim_p
}
sim <- tibble(p_correct = simulated_stats)
# Red lines mark the observed statistic and its mirror image around 0.5
ggplot(sim, aes(p_correct)) +
  geom_histogram(binwidth=0.01) +
  geom_vline(xintercept=12/12, color="red") +
  geom_vline(xintercept=0.5-(12/12-.5), color="red")
From the histogram above, the simulated proportions of correct guesses are concentrated around 0.5, and the counts fall away quickly on either side. Simulated proportions as extreme as 12/12 (or 0/12) almost never occur, so the p-value is very close to 0 and we have strong evidence against the hypothesis that the woman was just a lucky guesser.
(c) Actually, initially the woman correctly identified all 6 people who had been diagnosed with Parkinson’s but incorrectly identified one of the others as having Parkinson’s. It was only eight months later that the final individual was diagnosed with the disease. What is the p-value when only 11 of 12 were known to be correct?
# Code your answers here
set.seed(66) # Set the random seed to the last two digits of your student number
N <- 1000 # Change this to 10000 if a finer simulation resolution is required
simulated_stats <- rep(NA, N)
n_observations <- 12
test_stat <- 11/12
for (i in 1:N) {
  # One simulated guesser: 12 shirts, each correct with probability 0.5
  new_sim <- sample(c("correct", "incorrect"), size=n_observations, prob=c(0.5, 0.5), replace=TRUE)
  sim_p <- sum(new_sim == "correct") / n_observations
  simulated_stats[i] <- sim_p
}
sim <- tibble(p_correct = simulated_stats)
# Purple line: observed statistic 11/12; orange line: its mirror image around 0.5
ggplot(sim, aes(p_correct)) +
  geom_histogram(binwidth=0.01) +
  geom_vline(xintercept=11/12, color="purple") +
  geom_vline(xintercept=0.5-(11/12-.5), color="orange")
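As a cross-check on the simulations in (b) and (c), the two-sided p-values can also be computed exactly from the binomial distribution, since the number of correct guesses out of 12 under pure guessing is Binomial(12, 0.5). This is a sketch with assumed helper and variable names, not part of the assigned simulation approach.

```r
n_shirts <- 12

# Probability of each possible number of correct guesses (0 to 12) under pure guessing
exact_probs <- dbinom(0:n_shirts, size = n_shirts, prob = 0.5)

# Two-sided p-value: total probability of outcomes at least as far from
# 6 correct (the null expectation) as the observed number correct.
exact_p_value <- function(observed_correct) {
  extreme <- abs(0:n_shirts - n_shirts / 2) >= abs(observed_correct - n_shirts / 2)
  sum(exact_probs[extreme])
}

exact_p_value(12)  # 2 / 4096, about 0.0005
exact_p_value(11)  # 26 / 4096, about 0.0063
```

With only 1000 simulations, the simulated p-values can land on either side of these exact values; a larger `N` should bring them closer.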
(d) Are you able to get the p-value for the test using the initial data (i.e., 11 correct instead of 12 correct) without running a new simulation?
Yes. The simulated sampling distribution under the null hypothesis does not depend on the observed result, so the same simulated values can be reused. Only the observed test statistic changes, from 12/12 to 11/12.
(e) Is the conclusion of the hypothesis test the same for an observed test statistic of 11/12 as for an observed test statistic of 12/12?
Yes. The simulated distribution is the same in both cases: all values lie between 0 and 1 and the peak is at 0.5. Both 11/12 and 12/12 lie far out in its tail, so in both cases the p-value is well below 0.05 and we reject the hypothesis that she was just guessing. The evidence is slightly weaker for 11/12 than for 12/12.
Question 3: Fisher’s Tea Experiment
There is an interesting account of the British statistician Ronald Fisher at a tea party in the 1920s. One of the other guests was algae scientist Dr. Muriel Bristol, who refused a cup of tea from Fisher because he put milk in first BEFORE pouring the tea. Bristol was convinced she could taste the difference, and much preferred the taste of tea where the milk was poured in afterwards. Fisher didn’t think that there could be a difference and proposed a hypothesis test to examine the situation. Fisher made 8 cups of tea, 4 with milk in first and 4 with tea in first, and gave them to Dr. Bristol without her seeing how they were made, and she would say if she thought the tea or the milk was poured first. As it turned out, Dr. Bristol correctly identified if the tea or milk was poured first for all 8 of the cups. Fisher, being a skeptical statistician, wanted to test if this could be happening by chance with Bristol just randomly guessing (or whether it seemed more likely that Bristol was not guessing).
Suppose you run an experiment like this with students in STA130. You get a random sample of 80 STA130 students to each taste one British-style cup of tea and tell you whether they think the milk or tea was poured first. 49 students correctly state which was poured first. Go through the steps to test whether students are just guessing or not.
(a) What is the NULL hypothesis $H_0$ in terms of $p$? What is $H_1$ in terms of $H_0$? In a simple sentence without $H_0$ and $p$ notation, what is the claim of the NULL hypothesis?
The null hypothesis is $H_0: p = 0.5$, where $p$ is the proportion of STA130 students who correctly identify which was poured first. The alternative is $H_1: p \neq 0.5$, that is, $H_0$ is false. In simple terms, the null hypothesis claims that students are just guessing, so they are equally likely to be right or wrong.
(b) Conduct a hypothesis test on the basis of the following simulated sampling distribution of the test statistic assuming the NULL hypothesis is true. For simplicity, this distribution shows the results of only 100 simulations, but in practice this likely wouldn’t provide very good p-value resolution refinement.
## Bin width defaults to 1/30 of the range of the data. Pick better value with `binwidth`.
• What does each single dot in the plot represent?
Each dot represents one simulation. Since the null hypothesis states that students are equally likely to be correct or incorrect in identifying which was poured first, each simulation randomly generates 80 values with equal odds of being correct or incorrect. The position of each dot is the proportion of the 80 simulated values that are correct, that is, one value of the test statistic simulated under the null hypothesis.
• Based on this plot, what is your estimate of the p-value?
53/80 gives a test statistic of 0.6625. The plot contains no dots with values greater than or equal to 0.6625 or less than or equal to 0.3375. (0.3375 is as far below 0.5 as 0.6625 is above it.) Therefore, the estimated p-value is 0.
• At the $\alpha = 0.05$ significance level, what is your conclusion about this hypothesis test based on the p-value computed above?
Since the estimated p-value of 0 is less than 0.05, we reject the null hypothesis at the $\alpha = 0.05$ level: the observed proportion is higher than would be expected if students had no ability to distinguish, so there is evidence that they are not just guessing.
(c) Suppose the analysis described in (b) is repeated but this time 1000 simulations are used to get a better estimate of the p-value, and the resulting p-value is 0.04. Do not conduct this simulation. At the $\alpha = 0.05$ significance level, what is your conclusion about this hypothesis test for a p-value of 0.04?
Since 0.04 is less than 0.05, we reject the null hypothesis. We have evidence that the students tasting the tea are not simply guessing, and that they have some ability to tell whether the tea or the milk was poured first.
Question 4: OPTIONAL primer for potential TUT discussion
• You may complete these questions for practice if you wish. You are not required to complete these questions as they ARE NOT included as part of your mark.
A criminal court considers two opposing claims about a defendant: they are either innocent or guilty. In the Canadian legal system, the role of the prosecutor is to present convincing evidence that the defendant is not innocent. Lawyers for the defendant attempt to argue that the evidence is not convincing enough to rule out that the defendant could be innocent. If there is not enough evidence to convict the defendant and they are set free, the judge generally does not deliver a verdict of “innocent”, but rather of “not guilty”.
(a) If we look at the criminal trial example in the hypothesis test framework, which would be the null hypothesis and which the alternative? REPLACE THIS TEXT WITH YOUR ANSWER
(b) In the context of this problem, describe what rejecting the null hypothesis would mean. REPLACE THIS TEXT WITH YOUR ANSWER
(c) In the context of this problem, describe what failing to reject the null hypothesis would mean. REPLACE THIS TEXT WITH YOUR ANSWER
(d) In the context of this problem, describe what a type II error would be. REPLACE THIS TEXT WITH YOUR ANSWER
(e) In the context of this problem, describe what a type I error would be. REPLACE THIS TEXT WITH YOUR ANSWER
Part 3: Two Sample Hypothesis Testing Required Questions DUE THURSDAY Oct 13 by 5 p.m. ET
Question 5: Social Media and Anxiety
There have been many questions regarding whether or not usage of social media increases anxiety levels. For example, do TikTok and Facebook posts create an unattainable sense of life success and satisfaction? Does procrastinating by watching YouTube videos or reading Twitter posts contribute unnecessary stress from deadline pressure? A study was conducted to examine the relationship between social media usage and student anxiety. Students were asked to categorize their social media usage as “High” if it exceeded 2 hours per day, and then student anxiety levels were scored through a series of questions, with higher scores suggesting higher student anxiety.
# The `rep()` function was introduced above, and you can see what it does here
social_media_usage <- c(rep("Low", 30), rep("High", 16));
anxiety_score <- c(24.64, 39.29, 16.32, 32.83, 28.02, 33.31, 20.60, 21.13, 26.69, 28.90,
                   26.43, 24.23, 7.10, 32.86, 21.06,
                   28.89, 28.71, 31.73, 30.02, 21.96,
                   25.49, 38.81, 27.85, 30.29, 30.72,
                   21.43, 22.24, 11.12, 30.86, 19.92,
                   33.57, 34.09, 27.63, 31.26,
                   35.91, 26.68, 29.49, 35.32,
                   26.24, 32.34, 31.34, 33.53,
                   27.62, 42.91, 30.20, 32.54)
anxiety_data <- tibble(social_media_usage, anxiety_score)
glimpse(anxiety_data)
## Rows: 46
## Columns: 2
## $ social_media_usage <chr> "Low", "Low", "Low", "Low", "Low", "Low", "Low", "L…
## $ anxiety_score <dbl> 24.64, 39.29, 16.32, 32.83, 28.02, 33.31, 20.60, 21…
(a) What is the NULL hypothesis $H_0$ in terms of $Median_{High}$ and $Median_{Low}$? In simple terms, what is the claim of the NULL hypothesis? What is $H_1$ in terms of $H_0$?
The null hypothesis assumes that any difference between the groups seen in the data is due to chance. Formally, $H_0: Median_{High} = Median_{Low}$, that is, the median anxiety score is the same for students with high and low social media usage. This is the claim to be tested against the sampling results. The alternative is $H_1: Median_{High} \neq Median_{Low}$, that is, $H_0$ is false.
Hint
A formal NULL hypothesis that the means of two groups are the same would be $H_0: \mu_{High} = \mu_{Low}$.
(b) Revisit your statements regarding the NULL hypotheses above with confounding in mind; namely, since social media usage is a self selecting process, perhaps social media users are already more anxious people on average regardless of their social media usage. If we make a determination about the NULL hypothesis are we actually addressing the question of “whether or not usage of social media increases anxiety levels”? Or are we just using a hypothesis test to examine if there is an observable difference between the two groups (regardless of its causes)?
If we make a determination about the NULL hypothesis, we are only examining whether there is an observable difference in median anxiety between the two groups. Because students selected their own level of social media usage, a difference could be due to confounding, so the test of $H_0$ against $H_1$ does not by itself address whether usage of social media increases anxiety levels.
(c) Construct boxplots of anxiety_score for the two levels of social media usage, and write 2-3 sentences describing and comparing the distributions of anxiety scores across the social media usage groups.
# Code your answers here
# One boxplot per usage group
ggplot(anxiety_data, aes(x=social_media_usage, y=anxiety_score)) +
  geom_boxplot()
The boxplots above show the anxiety scores for the two levels of social media usage. The median is higher in the High group (31.84) than in the Low group (27.27). The Low group is more spread out, with scores ranging from 7.10 to 39.29, while the High group ranges from 26.24 to 42.91.
(d) What do these data visually suggest regarding the claim that the median anxiety level is different for those who use social media in high frequency compared to those who use social media in lower frequency?
Visually, the median anxiety score is higher for the High usage group than for the Low usage group, which suggests that the median anxiety level differs between the groups and that students who use social media for longer tend to report greater anxiety.
(e) Look at the code below and write a few sentences explaining what the code inside the for loop is doing and why.
In the code below, the for loop runs once for each value of i from 1 to repetitions, which is 1000. In each iteration, ‘simdata’ takes anxiety_data and uses mutate() to replace ‘social_media_usage’ with a random shuffle of itself, produced by sample(). ‘sim_value’ then groups ‘simdata’ by ‘social_media_usage’, summarises each group by its median anxiety score, and takes the difference between the two medians. Shuffling the labels breaks any link between usage and anxiety, so the loop builds up the distribution of the difference in medians that we would expect if the null hypothesis were true.
# Note: including the .groups="drop" option in summarise() will suppress a friendly
# warning R prints otherwise "`summarise()` ungrouping output (override with
# `.groups` argument)".
# Including the .groups="drop" option is optional, but you should include it if you
# don't want to see that warning.
test_stat <- anxiety_data %>%
  group_by(social_media_usage) %>%
  summarise(medians = median(anxiety_score), .groups="drop") %>%
  summarise(value = diff(medians))
test_stat <- as.numeric(test_stat)
test_stat
## [1] -4.57
set.seed(523)
repetitions <- 1000;
simulated_values <- rep(NA, repetitions)
for (i in 1:repetitions) {
  simdata <- anxiety_data %>%
    mutate(social_media_usage = sample(social_media_usage))
  sim_value <- simdata %>%
    group_by(social_media_usage) %>%
    summarise(medians = median(anxiety_score), .groups="drop") %>%
    summarise(value = diff(medians))
  simulated_values[i] <- as.numeric(sim_value)
}
sim <- tibble(median_diff = simulated_values)
sim %>%
  ggplot(aes(x=median_diff)) +
  geom_histogram(binwidth=1, color="black", fill="gray")
num_more_extreme <- sim %>%
  filter(abs(median_diff) >= abs(test_stat)) %>%
  summarise(n())
p_value <- as.numeric(num_more_extreme / repetitions)
p_value
## [1] 0.009
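The same permutation test can be written more compactly in base R, which may make the logic of the loop easier to see. This is a sketch under assumed names (`usage`, `score`, `median_diff`), not the assignment's own code. The statistic is Low minus High, matching the sign of the test statistic above, because `group_by()` orders the groups alphabetically and `diff()` subtracts the first from the second.

```r
# Sketch only: helper and variable names are illustrative assumptions.
usage <- c(rep("Low", 30), rep("High", 16))
score <- c(24.64, 39.29, 16.32, 32.83, 28.02, 33.31, 20.60, 21.13, 26.69, 28.90,
           26.43, 24.23, 7.10, 32.86, 21.06, 28.89, 28.71, 31.73, 30.02, 21.96,
           25.49, 38.81, 27.85, 30.29, 30.72, 21.43, 22.24, 11.12, 30.86, 19.92,
           33.57, 34.09, 27.63, 31.26, 35.91, 26.68, 29.49, 35.32,
           26.24, 32.34, 31.34, 33.53, 27.62, 42.91, 30.20, 32.54)

# Difference in medians, Low minus High
median_diff <- function(groups, values) {
  median(values[groups == "Low"]) - median(values[groups == "High"])
}

observed <- median_diff(usage, score)  # about -4.57

# Shuffle the labels 1000 times and recompute the statistic each time
set.seed(523)
shuffled <- replicate(1000, median_diff(sample(usage), score))

# Two-sided p-value: share of shuffles at least as extreme as the observed value
p_value <- mean(abs(shuffled) >= abs(observed))
p_value
```

Only the labels are shuffled; the scores stay fixed, which is what makes every shuffle a draw from the null hypothesis of no difference between groups.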
(f) Summarize the NULL hypothesis and then, at the $\alpha = 0.05$ significance level, state your conclusion about the hypothesis test of the NULL hypothesis based on the p-value computed above.
The null hypothesis is that the median anxiety score is the same for students with high and low social media usage, so that any difference between the two sets of observed data is due to chance. The p-value of 0.009 is less than 0.05: if the null hypothesis were true, there would be less than a 5% chance of seeing these results (or more extreme outcomes). At the $\alpha = 0.05$ significance level we therefore reject the null hypothesis.
(g) Do these data support the claim that the median anxiety level is different for those who use social media in high frequency compared to those who use social media in lower frequency? How about the claim that “usage of social media increases anxiety levels”?
These data support the claim that the median anxiety level is different for those who use social media in high frequency compared to those who use social media in lower frequency. The observed difference in medians of -4.57 lies in the tail of the simulated distribution shown in the histogram above, giving a p-value of 0.009. They do not, on their own, support the claim that “usage of social media increases anxiety levels”, since the groups were self-selected and the difference could be due to confounding.
Question 6: Airbags
The table below is adapted from “Biostatistics for the Biological and Health Sciences” and presents data from a random sample of passengers sitting in the front seat of cars involved in car crashes. Based on this data we’d like to make a determination as to whether or not death rates differ for passengers in cars with airbags and passengers in cars without airbags.

| | Airbag available | No airbag available |
|---|---|---|
| Passenger Fatalities | 45 | 62 |
| Total number of Passengers | 10,541 | 9,867 |

The code below creates a tidy data frame for this problem using the rep() function.
data <- tibble(group = c(rep("airbag", 10541), rep("no_airbag", 9867)),
               outcome = c(rep("dead", 45), rep("alive", 10541-45),
                           rep("dead", 62), rep("alive", 9867-62)))
(a) What is the NULL hypothesis $H_0$ in terms of $p_{airbag}$ and $p_{no-airbag}$? What is $H_1$ in terms of $H_0$? In a simple sentence without $H_0$ and $p_{airbag}$ and $p_{no-airbag}$ notation, what is the claim of the NULL hypothesis?
The null hypothesis is $H_0: p_{airbag} = p_{no-airbag}$, and $H_1$ is that $H_0$ is false. In simple terms, the null hypothesis claims that the death rate is the same for passengers in cars with airbags and passengers in cars without airbags. A statistically significant test result (p-value at or below the chosen significance level) means that the null hypothesis is rejected. A larger p-value means that we fail to reject it, not that no effect exists.
(b) Simulate the sampling distribution of the test statistic under the assumption that the NULL hypothesis stated above is TRUE.
set.seed(108) # Replace the seed with the 1st, 3rd, and 5th digits of your student number.
# Code your answers here
n <- 108
X <- sample(150:200, n, replace=TRUE)
X
## [1] 191 196 165 188 194 196 192 178 196 194 194 176 188 175 155 188 172 155
## [19] 171 169 163 173 180 200 168 171 170 153 163 184 162 198 200 191 195 196
## [37] 173 185 190 154 200 193 170 152 194 161 174 196 186 161 152 182 175 179
## [55] 190 193 159 181 191 182 154 195 166 194 184 151 193 152 197 177 181 175
## [73] 151 196 186 191 196 191 166 193 178 194 164 161 152 171 162 189 164 175
## [91] 167 182 189 156 184 183 153 194 190 151 164 163 169 173 174 152 193 168
# space for scratch work if needed
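For comparison, a permutation test of the kind used in Question 5 could be sketched for the airbag data as follows. This is an illustration under assumed names (`group`, `dead`, `rate_diff`), not the submitted answer: it shuffles the airbag labels, recomputes the difference in death rates each time, and compares the simulated differences with the observed one.

```r
# Sketch only: helper and variable names are illustrative assumptions.
group <- c(rep("airbag", 10541), rep("no_airbag", 9867))
dead <- c(rep(TRUE, 45), rep(FALSE, 10541 - 45),
          rep(TRUE, 62), rep(FALSE, 9867 - 62))

# Test statistic: death rate with airbags minus death rate without airbags
rate_diff <- function(groups, outcomes) {
  mean(outcomes[groups == "airbag"]) - mean(outcomes[groups == "no_airbag"])
}
observed <- rate_diff(group, dead)

set.seed(108)
repetitions <- 1000
simulated <- rep(NA, repetitions)
for (i in 1:repetitions) {
  # Shuffling the labels simulates a world where airbags make no difference
  simulated[i] <- rate_diff(sample(group), dead)
}

# Two-sided p-value: share of shuffles at least as extreme as the observed difference
p_value <- mean(abs(simulated) >= abs(observed))
p_value
```

The values in `simulated` form the sampling distribution under the NULL hypothesis that part (b) asks for; a histogram of them with a vertical line at `observed` shows where the data fall.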
(c) At the $\alpha = 0.10$ significance level, what is your conclusion about this hypothesis test based on the p-value computed above?
At the 0.10 significance level, the null hypothesis is rejected if the p-value is below 0.10 and is not rejected otherwise. Using the concept we learnt in class, a lower p-value means stronger evidence against the null hypothesis. Here the observed death rate is higher without airbags (62 of 9,867) than with airbags (45 of 10,541), so the data point towards a higher chance of death when no airbag is available.
(d) Based on your conclusion above, what kind of error could you have made?
Apart from a computation error in calculating the p-value, the error I could have made is a type I error: rejecting the null hypothesis when the death rates are in fact the same.
(e) Does your conclusion support the claim that “airbags save lives”? Or does it seem reasonable to believe that there could be some sort of confounding (like in Question 5) by which people who choose to drive in cars without airbags are just more likely on average to die if they’re in a car crash irrespective of any safety benefit of airbags?
My conclusion is consistent with the claim that airbags save lives, since a low p-value indicates a relationship between airbag availability and survival. However, this is observational data, so it does not prove that airbags are the cause. People still choose to drive in cars without airbags, and confounding is plausible. First, old vintage cars do not come with airbags, and car enthusiasts who drive such cars are at greater risk during an accident for reasons beyond the missing airbag. Secondly, some passengers do not wear a seat belt, which puts their lives, and potentially the lives of the driver and rear passengers, at risk regardless of airbags.
Question 7: OPTIONAL practice specifying NULL Hypotheses
• You may complete these questions for practice if you wish. You are not required to complete these questions as they ARE NOT included as part of your mark.
We’ve covered two kinds of hypothesis tests.
• In the first version we hypothesize a proportion in a population (the parameter $p$) and test that value using the proportion observed in the sample (the sample average $\bar{x}$, also sometimes notated as $\hat{p}$). These ONE sample hypothesis tests have a NULL hypothesis of the form $H_0: p = p_0$ (and of course the ALTERNATIVE hypothesis then takes the form $H_1: H_0 \text{ is FALSE}$). Since this one-sample framework works for any average (not just proportions), another form of this test that is often encountered is $H_0: \mu = \mu_0$ where $\mu$ is the mean of the population (corresponding to the sample average $\bar{x}$).
• In the second version we hypothesize a relationship between two populations, such as that both populations have the same mean or median (or proportion or standard deviation, etc.). These TWO sample hypothesis tests have a NULL hypothesis of the form $H_0: \mu_1 = \mu_2$ or $H_0: p_1 = p_2$ (or $H_0: Median_1 = Median_2$ or $H_0: \sigma_1 = \sigma_2$, etc.) and of course the ALTERNATIVE hypothesis is still $H_1: H_0 \text{ is FALSE}$.
For each of the following scenarios, state appropriate hypotheses $H_0$ and $H_1$. For each scenario, also state in simple terms what the claim of the NULL hypothesis is. Be sure to carefully define any parameters you refer to.
(a) A health survey asked individuals to report the number of times they exercised each week. Researchers were interested in determining if the proportion of individuals who exercised at least 100 minutes per week differed between people who live in the condos vs people who do not live in condos.
REPLACE THIS TEXT WITH YOUR ANSWER
(b) A study was conducted to examine whether a baby being born prematurely/early (i.e., before their due date) is related to whether or not the baby’s mother smoked while she was pregnant.
REPLACE THIS TEXT WITH YOUR ANSWER
(c) Nintendo is interested in whether or not their online advertisements are working. They record whether or not a user had seen an ad on a given day and their amount of spending on Nintendo products in the next 48 hours. They are interested in determining if there is an association between whether or not the user saw an ad and their expenditures.
REPLACE THIS TEXT WITH YOUR ANSWER
(d) Based on results from a survey of graduates from the University of Toronto, we would like to compare the median salaries of graduates from the statistics and graduates of mathematics programs. REPLACE THIS TEXT WITH YOUR ANSWER
Part 4: OPTIONAL but STRONGLY Recommended for Practice
You may complete these questions for practice if you wish. You are not required to complete these questions as they ARE NOT included as part of your mark.
Question 8
Complete this One Sample Hypothesis Testing Practice Quiz using this Rmd file
Question 9
Complete this Two Sample Hypothesis Testing Practice Quiz using this Rmd file