ProblemSet 5 my answers


STA130H1S – Fall 2022 Problem Set 4

Amogh Shashidhar (1008817666) and STA130 Professors

Instructions

Complete the exercises of Part 2 in this .Rmd file and submit your .Rmd and .pdf output through Quercus on Oct 6 by 5:00 p.m. ET.

Complete the exercises of Part 3 in this .Rmd file and submit your .Rmd and .pdf output through Quercus on Oct 13 by 5:00 p.m. ET.

    library(tidyverse)

Part 1: OPTIONAL Warm Up if Needed

Complete these guided questions if you need some additional help getting started with hypothesis testing before moving on to Part 2, or if you want some additional practice with hypothesis testing. You are not required to complete these questions, as they ARE NOT included as part of your mark.

Question 1: Warm Up with Biased Coin Flipping

Approximately 23% of the general population use the social media platform Twitter. Suppose that the Department of Statistical Sciences (DoSS) is conducting a study to see if this percentage is the same among their undergraduate students (that is, all students in an undergraduate DoSS statistics program). Suppose $n = 400$ students in statistics programs are randomly selected and asked whether or not they use Twitter. Suppose that 103 of these 400 students respond that they use Twitter.

(a) What is the null hypothesis $H_0$ in terms of $p$? What is $H_1$ in terms of $H_0$? In a simple sentence without $H_0$ and $p$ notation, what is the claim of the null hypothesis?

REPLACE THIS TEXT WITH YOUR ANSWER
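As a sketch of the notation part (a) is asking for, the hypotheses might be written as follows. This assumes that $p$ denotes the proportion of all DoSS undergraduate students who use Twitter, and that the alternative is two-sided because the study asks whether the percentage is "the same"; neither choice is stated explicitly in the question. The symbol $\hat{p}$ for the observed sample proportion is also an assumption introduced here.

```latex
H_0 : p = 0.23
\qquad \text{versus} \qquad
H_1 : p \neq 0.23,
\qquad
\hat{p} = \frac{103}{400} = 0.2575
```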

(b) Call set.seed(11) and use the sample() function to simulate the number of students who use Twitter in a random sample of 400 DoSS students, under the assumption that the prevalence of Twitter usage is the same among DoSS students as it is in the general population. How many Twitter users did you have in your simulated sample of 400 students?

    set.seed(11)  # REQUIRED so the random sample is reproducible!
    # Code your answer here

Hints

    sample(c("Head", "Tail"), size = 10, replace = TRUE)
    ##  [1] "Tail" "Tail" "Tail" "Head" "Tail" "Head" "Head" "Tail" "Tail" "Tail"

    # will do the same thing as:
    sample(c("Head", "Tail"), size = 10, prob = c(0.5, 0.5), replace = TRUE)
    ##  [1] "Tail" "Tail" "Head" "Head" "Head" "Head" "Tail" "Tail" "Tail" "Tail"

    # Even though the exact counts of "Head" and "Tail" differ each time you
    # run this code, if you simulate enough coin flips (by increasing
    # the value of 'size'), you'll get approximately the same proportion
    # of "Head" and "Tail" outcomes.

    # To modify the code to make Tails much more likely than Heads,
    # we could change 'prob':
    sample(c("Head", "Tail"), size = 10, prob = c(0.1, 0.9), replace = TRUE)
    ##  [1] "Tail" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail"

(c) Use geom_bar() to visualize the number of Twitter users versus non-Twitter users from your simulated sample with a bar plot. How does this simulated proportion compare to the general population rate of 23% and to the 103 of 400 sampled DoSS students?

    # Code your answer here

REPLACE THIS TEXT WITH YOUR ANSWER

Hints

    # You can make a vector a column of a `tibble` like this
    tibble(flips = c("Head", "Tail", "Tail"))
    ## # A tibble: 3 × 1
    ##   flips
    ##   <chr>
    ## 1 Head
    ## 2 Tail
    ## 3 Tail
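One possible way to approach parts (b) and (c) is sketched below. It is not the official solution. The labels "Twitter" and "No Twitter" and the variable names (n, p_null, sim_sample, n_twitter, sim_data, bar_plot) are illustrative assumptions; only set.seed(11), the sample size of 400, and the 23% rate come from the question.

```r
library(tidyverse)

set.seed(11)  # seed specified in part (b)

n <- 400        # sample size from the question
p_null <- 0.23  # Twitter prevalence in the general population

# Simulate one sample of 400 students, each a Twitter user with probability 0.23
sim_sample <- sample(c("Twitter", "No Twitter"), size = n,
                     prob = c(p_null, 1 - p_null), replace = TRUE)

# Number and proportion of simulated Twitter users
n_twitter <- sum(sim_sample == "Twitter")
n_twitter
n_twitter / n

# geom_bar() counts the rows in each category itself, so it only needs x
sim_data <- tibble(response = sim_sample)
bar_plot <- ggplot(sim_data, aes(x = response)) +
  geom_bar() +
  labs(title = "One simulated sample (n = 400) assuming p = 0.23",
       x = "", y = "count")
bar_plot
```

The simulated proportion can then be compared with both 0.23 and the observed 103/400 = 0.2575.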

(d) How is the geom_bar() function different from the geom_col() function below?

REPLACE THIS TEXT WITH YOUR ANSWER

    ggplot(data = NULL, aes(x = c("Twitterer", "Non Twitterer"), y = c(103, 400 - 103))) +
      geom_col() +
      lims(y = c(0, 400)) +
      labs(title = "Sample of (n=400) DoSS\nDo you Tweet?", y = "count", x = "")

(e) Simulate the sampling distribution of the test statistic under the assumption that the prevalence of Twitter usage among DoSS students matches that of the general population. Set the seed to the last 2 digits of your student number, use a simulation of size 1000, make a plot of the simulated sampling distribution, and describe the distribution in a few sentences.

• If you don’t call set.seed(), the simulation will be different each time it’s run.
• Your knit won’t be reproducible and won’t align with your interpretations and conclusions.

    # Clearly label your figure with `labs(x="A primary title\n and a second line")`
    # Code your answer here

(f) What is the definition of a p-value?

REPLACE THIS TEXT WITH YOUR ANSWER
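The simulation and p-value asked about in the last two parts above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the official solution: the seed 42 is a placeholder for the last 2 digits of your own student number, the test statistic is taken to be the sample proportion of Twitter users, the p-value is computed as two-sided, and all variable names are hypothetical.

```r
library(tidyverse)

set.seed(42)  # placeholder; the assignment asks for the last 2 digits of your student number

n <- 400               # sample size from the question
p_null <- 0.23         # prevalence assumed under the null hypothesis
n_sims <- 1000         # simulation size requested in the question
observed_count <- 103  # Twitter users in the real sample

# Repeat the null simulation 1000 times, recording the count of Twitter users
sim_counts <- replicate(n_sims, {
  responses <- sample(c("Twitter", "No Twitter"), size = n,
                      prob = c(p_null, 1 - p_null), replace = TRUE)
  sum(responses == "Twitter")
})

# Simulated sampling distribution of the sample proportion
sim <- tibble(p_hat = sim_counts / n)
sampling_plot <- ggplot(sim, aes(x = p_hat)) +
  geom_histogram(bins = 30, colour = "black", fill = "grey70") +
  labs(x = "Simulated proportion of Twitter users\n(n = 400, assuming p = 0.23)",
       y = "count")
sampling_plot

# Two-sided p-value: the share of simulated samples at least as far from the
# null value as the observed sample. Comparing integer counts avoids
# floating-point ties at the boundary.
expected_count <- round(p_null * n)  # 92 students
p_value <- mean(abs(sim_counts - expected_count) >=
                  abs(observed_count - expected_count))
p_value
```

The last computation mirrors the usual definition of a p-value: the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually observed.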
