Homework 5

School

University of Washington

Course

EDDD 8

Subject

Statistics

Date

May 31, 2024

Uploaded by JusticeFlower13326

11/10/22, 1:11 PM Homework 5 Page 1 of 13 file:///Users/tinasong/Downloads/Homework5Template.html

Homework 5
Tina Song
2022-11-10

    library(tidyverse)
    library(openintro)
    library(infer)

Problem 1

Part 1a)

    pnorm(70, 75, 8)
    ## [1] 0.2659855

About 26.6% of students have midterm scores less than or equal to 70.

Part 1b)

    1 - pnorm(90, 75, 8)
    ## [1] 0.03039636

About 3.04% of students have midterm scores greater than 90.

Part 1c)

    pnorm(95, 75, 8) - pnorm(80, 75, 8)
    ## [1] 0.2597759

About 26% of students have midterm scores between 80 and 95.

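The three probabilities in Parts 1a) to 1c) can be cross-checked by standardizing each cutoff first. This is only a sketch, and it assumes the midterm scores follow the Normal(75, 8) model used in the calls above. The helper `z_score` is a name introduced here for illustration; it is not part of the original homework.

```r
# Assumed model from Problem 1: midterm scores ~ Normal(mean = 75, sd = 8)
mu <- 75
sigma <- 8

# Hypothetical helper (not in the original homework): convert a score to a z-score
z_score <- function(x) (x - mu) / sigma

# Part 1a: P(X <= 70) equals P(Z <= z) on the standard normal scale
p_1a <- pnorm(z_score(70))

# Part 1b: P(X > 90), asking for the upper tail directly instead of 1 - pnorm()
p_1b <- pnorm(z_score(90), lower.tail = FALSE)

# Part 1c: P(80 < X < 95) as a difference of two cumulative probabilities
p_1c <- pnorm(z_score(95)) - pnorm(z_score(80))

round(c(p_1a, p_1b, p_1c), 4)
```

Working on the z scale makes the cutoffs easy to read: 70 is 0.625 standard deviations below the mean, and 90 is 1.875 standard deviations above it.
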
Part 1d)

P(X ≤ x) = 0.99

    qnorm(0.99, 75, 8, lower.tail = TRUE)
    ## [1] 93.61078

99% of students have midterm scores below about 94 points.

Part 1e)

P(X > x) = 0.80

    qnorm(0.80, 75, 8, lower.tail = FALSE)
    ## [1] 68.26703

80% of students have midterm scores above about 68 points.

Part 1f)

We do not know if the population (the class) is normally distributed. Since n ≥ 30, the Central Limit Theorem lets us assume that
x-bar ~ approx Normal(μ, σ / sqrt(n))
x-bar ~ approx Normal(75, 8 / sqrt(30))

Part 1g)

    1 - pnorm(77, 75, 8 / sqrt(30))
    ## [1] 0.08545176

If the mean of the population is 75, the probability that the mean from a sample of size 30 exceeds 77 is about 8.55%.

Problem 2

one 30 lb. bag: X ~ N(μ = 30.7, σ = 1.12)
one 10 lb. bag: Y ~ N(μ = 10.5, σ = 0.15)

three 10 lb. bags (independent RVs):
W = Y1 + Y2 + Y3
mean = 10.5 + 10.5 + 10.5 = 31.5
SD^2 = (0.15)^2 + (0.15)^2 + (0.15)^2 = 0.0675
W ~ N(31.5, sqrt(0.0675))

Difference: K = W - X
mean = 31.5 - 30.7 = 0.8
SD = sqrt(0.0675 + (1.12)^2)
K ~ N(0.8, sqrt(0.0675 + (1.12)^2))

P(W - X > 0) = P(K > 0)

    1 - pnorm(0, 0.8, sqrt(0.0675 + (1.12)^2))
    ## [1] 0.7567261

The probability that three 10 lb. bags together weigh more than one 30 lb. bag is about 75.67%.

Problem 3

Part 3a)

X ~ Uniform(10, 22)
a = 10, b = 22
height: 1 / (b - a)
1/(22 - 10) = 0.0833

Part 3b)

E(X) = (b + a) / 2
(22 + 10)/2 = 16

Part 3c)

P(12 < X < 19) = P(12 ≤ X ≤ 19)
(19 - 12)/(22 - 10) = 0.5833

    punif(19, 10, 22) - punif(12, 10, 22)
    ## [1] 0.5833333

Part 3d)

P(X = x) = 0
P(X = 15) = 0

Problem 4

Read in Data

    ZOD <- read_csv("ZODTwoGroups.csv")
    ## Rows: 12 Columns: 2
    ## ── Column specification ────────────────────────────────────────────────────────
    ## Delimiter: ","
    ## chr (1): Pie
    ## dbl (1): ZOD
    ##
    ## Use `spec()` to retrieve the full column specification for this data.
    ## Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ZOD$Pie <- factor(ZOD$Pie)

Part 4a)

    p <- ggplot(ZOD, aes(x=Pie, y=ZOD)) +
      geom_boxplot() +
      xlab("Pie") +
      ylab("ZOD")
    p

Cherry pie’s ZOD data values have more variability. There appears to be a difference between the ZODs for the two groups.

Part 4b)

    set.seed(15)
    PermsOut <- ZOD %>%
      rep_sample_n(size = nrow(ZOD), reps = 1000, replace = FALSE) %>%
      mutate(ZOD_perm = sample(ZOD)) %>%
      group_by(replicate, Pie) %>%
      summarize(mean_ZOD_perm = mean(ZOD_perm), mean_ZOD = mean(ZOD)) %>%
      summarize(diff_mean = diff(mean_ZOD_perm), diff_orig = diff(mean_ZOD))
    ## `summarise()` has grouped output by 'replicate'. You can override using the
    ## `.groups` argument.

    PermsOut
    ## # A tibble: 1,000 × 3
    ##    replicate diff_mean diff_orig
    ##        <int>     <dbl>     <dbl>
    ##  1         1    -1          3.67
    ##  2         2     1.83       3.67
    ##  3         3     0.167      3.67
    ##  4         4     2.33       3.67
    ##  5         5     0          3.67
    ##  6         6     0          3.67
    ##  7         7     1.33       3.67
    ##  8         8     1.67       3.67
    ##  9         9     0.667      3.67
    ## 10        10     0.5        3.67
    ## # … with 990 more rows

The observed difference in sample means is 3.67.

Part 4c)

Difference between two means, independent samples: x-bar1 - x-bar2 (Statistic)
x-bar1: mean ZOD for cherry pie = 4.8333
x-bar2: mean ZOD for apple pie = 1.1667
We observed x-bar1 - x-bar2 = 3.6667

Part 4d)

    origdiff <- PermsOut$diff_orig[1]
    p1 <- ggplot(data = PermsOut, aes(x = diff_mean)) +
      geom_histogram(bins = 13) +
      xlab("Cherry - Apple") +
      geom_vline(xintercept = origdiff, col="Red")
    yheight <- max(table(PermsOut$diff_mean))
    p1

The distribution is unimodal and approximately symmetric. The red line marking the observed sample difference falls far out in the right tail, which tells us that a difference this large is rare among the permuted differences.

Part 4e)

    PermsOut %>%
      summarize(count = sum(diff_orig <= diff_mean),
                proportion = mean(diff_orig <= diff_mean))
    ## # A tibble: 1 × 2
    ##   count proportion
    ##   <int>      <dbl>
    ## 1     3      0.003

About 0.3% of the null statistics (3 of 1,000) are at least as extreme as the difference observed in the original sample.

Part 4f)

Since the p-value is small (p = 0.003 < 0.01, 0.05, 0.1), we declare evidence in favor of the alternative hypothesis: the population mean ZOD for cherry pie is greater than the population mean ZOD for apple pie (μ1 > μ2).

Problem 5

    WL <- read_csv("PopularDietsCombined.csv")
    ## Rows: 93 Columns: 2
    ## ── Column specification ────────────────────────────────────────────────────────
    ## Delimiter: ","
    ## chr (1): Diet
    ## dbl (1): WtLossKG
    ##
    ## Use `spec()` to retrieve the full column specification for this data.
    ## Specify the column types or set `show_col_types = FALSE` to quiet this message.

    WL$Diet <- factor(WL$Diet)

Part 5a)

    p <- ggplot(WL, aes(x=Diet, y=WtLossKG)) +
      geom_boxplot() +
      xlab("Diet") +
      ylab("WTLossKG")
    p

For Ornish and Zone, the boxplots show apparent outliers. Ornish’s boxplot has the lowest variability.

Part 5b)

    (sum.WtLossKG <- WL %>%
      summarize(mean = mean(WtLossKG, na.rm=TRUE)))
    ## # A tibble: 1 × 1
    ##    mean
    ##   <dbl>
    ## 1  4.95

The point estimate x-bar for mean weight loss across all diets is 4.945 kilograms.

Part 5c)

    # code for bootstrap samples given; need to add code for histogram
    set.seed(15)
    # 1000 bootstrap samples so we can display the distribution
    BootSamp1000 <- WL %>%
      specify(response = WtLossKG) %>%
      generate(reps = 1000, type = "bootstrap") %>%
      calculate(stat = "mean")
    B1 <- ggplot(data = BootSamp1000, aes(x = stat)) +
      geom_histogram(bins = 10, colour = 1, fill = "white") +
      xlab("xbar")
    B1 + theme(axis.text.y=element_blank(), axis.ticks.y=element_blank(),
               axis.title.y=element_blank(), text=element_text(size=15),
               axis.text.x = element_text(size=15))

The distribution is unimodal and approximately symmetric.

Part 5d)

    set.seed(15)
    (BootSamp1000 <- WL %>%
      specify(response = WtLossKG) %>%
      generate(reps = 1000, type = "bootstrap") %>%
      calculate(stat = "mean") %>%
      get_ci(level = 0.95))
    ## # A tibble: 1 × 2
    ##   lower_ci upper_ci
    ##      <dbl>    <dbl>
    ## 1     3.53     6.36

About 95% of our bootstrapped x-bars fell between 3.53 and 6.36. We are 95% confident that the (true) mean weight loss across all diets is between 3.53 and 6.36 kilograms.

Part 5e)

    # We rerun the complete code since we are using built-in
    # code to get the desired confidence levels
    # We want you to repeat set.seed for both sets of samples
    set.seed(15)
    (BootSamp1000 <- WL %>%
      specify(response = WtLossKG) %>%
      generate(reps = 1000, type = "bootstrap") %>%
      calculate(stat = "mean") %>%
      get_ci(level = 0.90))
    ## # A tibble: 1 × 2
    ##   lower_ci upper_ci
    ##      <dbl>    <dbl>
    ## 1     3.78     6.16

    set.seed(15)
    (BootSamp1000 <- WL %>%
      specify(response = WtLossKG) %>%
      generate(reps = 1000, type = "bootstrap") %>%
      calculate(stat = "mean") %>%
      get_ci(level = 0.99))
    ## # A tibble: 1 × 2
    ##   lower_ci upper_ci
    ##      <dbl>    <dbl>
    ## 1     3.11     6.89

90%: separate the middle 90% of the bootstrap distribution from the tails; we are less confident (narrower confidence interval)
99%: separate the middle 99% of the bootstrap distribution from the tails; we are more confident (wider confidence interval)

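The intervals in Parts 5d) and 5e) come from cutting off equal tails of the bootstrap distribution, which, to my understanding, is the percentile method that `get_ci()` applies by default. The sketch below reproduces that idea in base R. It runs on simulated stand-in data, because the real `WtLossKG` values are not reproduced in this document; `wt_loss`, its mean and standard deviation, and the helper `percentile_ci` are all assumptions made for illustration, so the endpoints will not match the homework output.

```r
# Hypothetical stand-in data: 93 simulated values, NOT the real WtLossKG column
set.seed(15)
wt_loss <- rnorm(93, mean = 5, sd = 7)

# Draw 1000 bootstrap resamples (with replacement) and keep each resample mean
boot_means <- replicate(1000, mean(sample(wt_loss, replace = TRUE)))

# Hypothetical helper: percentile interval that trims (1 - level) / 2 from each tail
percentile_ci <- function(stats, level) {
  alpha <- 1 - level
  quantile(stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

ci_90 <- percentile_ci(boot_means, 0.90)
ci_95 <- percentile_ci(boot_means, 0.95)
ci_99 <- percentile_ci(boot_means, 0.99)

# Interval widths, ordered by confidence level
widths <- c(`90%` = diff(ci_90), `95%` = diff(ci_95), `99%` = diff(ci_99))
widths
```

Because all three intervals are read off the same set of bootstrap means, a higher confidence level can only move the cutoffs outward, which is why the 99% interval is the widest.
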
Part 5f)

    # no set.seed in this part
    (BootSamp1000 <- WL %>%
      specify(response = WtLossKG) %>%
      generate(reps = 1000, type = "bootstrap") %>%
      calculate(stat = "mean") %>%
      get_ci(level = 0.95))
    ## # A tibble: 1 × 2
    ##   lower_ci upper_ci
    ##      <dbl>    <dbl>
    ## 1     3.58     6.41

    (BootSamp1000 <- WL %>%
      specify(response = WtLossKG) %>%
      generate(reps = 1000, type = "bootstrap") %>%
      calculate(stat = "mean") %>%
      get_ci(level = 0.95))
    ## # A tibble: 1 × 2
    ##   lower_ci upper_ci
    ##      <dbl>    <dbl>
    ## 1     3.52     6.29

If we draw two new sets of 1000 bootstrap samples, each set's distribution of 1000 x-bars produces a slightly different confidence interval.

Part 5g)

    # code provided for n=500; add code for n=100 and n=10
    # 500 bootstrap samples
    set.seed(15)
    WL %>%
      specify(response = WtLossKG) %>%
      generate(reps = 500, type = "bootstrap") %>%
      calculate(stat = "mean") %>%
      get_ci(level = 0.95)
    ## # A tibble: 1 × 2
    ##   lower_ci upper_ci
    ##      <dbl>    <dbl>
    ## 1     3.43     6.35

    set.seed(15)
    WL %>%
      specify(response = WtLossKG) %>%
      generate(reps = 100, type = "bootstrap") %>%
      calculate(stat = "mean") %>%
      get_ci(level = 0.95)
    ## # A tibble: 1 × 2
    ##   lower_ci upper_ci
    ##      <dbl>    <dbl>
    ## 1     3.40     6.34

    set.seed(15)
    WL %>%
      specify(response = WtLossKG) %>%
      generate(reps = 10, type = "bootstrap") %>%
      calculate(stat = "mean") %>%
      get_ci(level = 0.95)
    ## # A tibble: 1 × 2
    ##   lower_ci upper_ci
    ##      <dbl>    <dbl>
    ## 1     4.24     6.19

The distributions of 500 and 100 x-bars produce slightly different confidence intervals. The distribution of 10 x-bars produces a much narrower confidence interval.

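One way to see why the interval from only 10 bootstrap samples behaves so differently in Part 5g) is to repeat the whole procedure many times for each number of bootstrap samples and compare the results. The sketch below does this in base R on simulated stand-in data, since the real `WtLossKG` values are not reproduced here. The data, the helper `boot_ci`, and the choice of 200 repetitions are assumptions made for illustration, not part of the original homework.

```r
# Hypothetical stand-in data: 93 simulated values, NOT the real WtLossKG column
set.seed(15)
wt_loss <- rnorm(93, mean = 5, sd = 7)

# Hypothetical helper: one 95% percentile interval from `reps` bootstrap means
boot_ci <- function(x, reps) {
  boot_means <- replicate(reps, mean(sample(x, replace = TRUE)))
  quantile(boot_means, probs = c(0.025, 0.975), names = FALSE)
}

# Repeat the full procedure 200 times for each number of bootstrap samples
rep_counts <- c(10, 100, 500, 1000)
summary_rows <- lapply(rep_counts, function(reps) {
  cis <- replicate(200, boot_ci(wt_loss, reps))  # 2 x 200 matrix of endpoints
  data.frame(
    reps = reps,
    mean_width = mean(cis[2, ] - cis[1, ]),  # typical interval width
    sd_lower = sd(cis[1, ])                  # run-to-run spread of the lower endpoint
  )
})
ci_summary <- do.call(rbind, summary_rows)
ci_summary
```

In this simulated setting, 10 bootstrap means rarely reach far into either tail, so the 2.5% and 97.5% cutoffs sit close to the smallest and largest of the 10 values. The resulting intervals tend to be too narrow and to shift noticeably from run to run, while 500 or 1000 bootstrap samples give endpoints that are much more stable.
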