Homework 5

School: University of Washington
Course: EDDD 8
Subject: Statistics
Date: May 31, 2024

11/10/22, 1 : 11 PM Homework 5 Page 1 of 13 file:///Users/tinasong/Downloads/Homework5Template.html

Homework 5
Tina Song
2022-11-10

library(tidyverse)
library(openintro)
library(infer)

Problem 1

Midterm scores are modeled as X ~ N(mean = 75, SD = 8).

Part 1a)

pnorm(70, 75, 8)
## [1] 0.2659855

About 26.6% of students have midterm scores less than or equal to 70.

Part 1b)

1 - pnorm(90, 75, 8)
## [1] 0.03039636

About 3.04% of students have midterm scores greater than 90.

Part 1c)

pnorm(95, 75, 8) - pnorm(80, 75, 8)
## [1] 0.2597759

About 26.0% of students have midterm scores between 80 and 95.
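As a check on Parts 1a to 1c, each probability can also be computed by first standardizing with z = (x - 75) / 8 and then using the standard normal. The short R sketch below does that. The helper z() is an illustrative name and is not part of the assignment template.

```r
# Midterm scores modeled as X ~ N(mean = 75, sd = 8), as in Parts 1a-1c
mu <- 75
sigma <- 8

# Hypothetical helper: convert a score to a z-score (SDs from the mean)
z <- function(x) (x - mu) / sigma

# Part 1a: P(X <= 70), z = -0.625
p_a <- pnorm(z(70))

# Part 1b: P(X > 90), z = 1.875
p_b <- 1 - pnorm(z(90))

# Part 1c: P(80 < X < 95), z runs from 0.625 to 2.5
p_c <- pnorm(z(95)) - pnorm(z(80))

# Should match Parts 1a-1c: about 0.266, 0.030, 0.260
round(c(p_a, p_b, p_c), 4)
```

Standardizing is what pnorm(x, mean, sd) does internally, so the two routes agree up to rounding.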

Part 1d)

P(X ≤ x) = 0.99

qnorm(0.99, 75, 8, lower.tail = TRUE)
## [1] 93.61078

99% of students have midterm scores below about 94 points (93.61).

Part 1e)

P(X > x) = 0.80

qnorm(0.80, 75, 8, lower.tail = FALSE)
## [1] 68.26703

80% of students have midterm scores above about 68 points (68.27).

Part 1f)

We do not know whether the population (the class) is normally distributed. Because n = 30 ≥ 30, the Central Limit Theorem says the sampling distribution of x-bar is approximately normal:

x-bar ~ approx Normal(μ, σ / sqrt(n))
x-bar ~ approx Normal(75, 8 / sqrt(30)) ≈ Normal(75, 1.46)

Part 1g)

1 - pnorm(77, 75, 8/sqrt(30))
## [1] 0.08545176

If the population mean is 75, the probability that the mean of a sample of size 30 exceeds 77 is about 8.55%.
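Parts 1f and 1g can be sanity-checked by simulation. For convenience only, the sketch below assumes individual scores are N(75, 8). The CLT approximation itself does not require this assumption. The code draws 10,000 samples of size 30. It then compares the SD of the simulated sample means with 8 / sqrt(30), and the fraction of means above 77 with the Part 1g answer. The seed and the number of simulated samples are arbitrary choices, so the simulated values differ slightly from the formula values.

```r
set.seed(1)                      # arbitrary seed for reproducibility
n <- 30                          # sample size from Part 1f
n_sims <- 10000                  # number of simulated samples (arbitrary)

# Each simulated sample: 30 scores from N(75, 8); keep only its mean
xbar <- replicate(n_sims, mean(rnorm(n, mean = 75, sd = 8)))

se_formula <- 8 / sqrt(n)        # sigma / sqrt(n), about 1.46
se_sim <- sd(xbar)               # spread of the simulated sample means

p_formula <- 1 - pnorm(77, 75, se_formula)   # Part 1g, about 0.0855
p_sim <- mean(xbar > 77)                     # fraction of simulated means above 77

c(se_formula = se_formula, se_sim = se_sim,
  p_formula = p_formula, p_sim = p_sim)
```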

Problem 2

One 30 lb. bag: X ~ N(μ = 30.7, σ = 1.12)
One 10 lb. bag: Yi ~ N(μ = 10.5, σ = 0.15)

Total of three 10 lb. bags (independent RVs): W = Y1 + Y2 + Y3
mean = 10.5 + 10.5 + 10.5 = 31.5
SD^2 = (0.15)^2 + (0.15)^2 + (0.15)^2 = 0.0675
W ~ N(31.5, sqrt(0.0675))

Difference: K = W - X
mean = 31.5 - 30.7 = 0.8
SD = sqrt(0.0675 + (1.12)^2) = sqrt(1.3219) ≈ 1.150
K ~ N(0.8, sqrt(0.0675 + (1.12)^2))

P(W - X > 0) = P(K > 0)

1 - pnorm(0, 0.8, sqrt(0.0675 + (1.12)^2))
## [1] 0.7567261

The probability that three 10 lb. bags together weigh more than one 30 lb. bag is about 75.67%.

Problem 3

Part 3a)

X ~ Uniform(10, 22), so a = 10 and b = 22.
Height of the density: 1 / (b - a) = 1 / (22 - 10) = 0.0833

Part 3b)

E(X) = (b + a) / 2 = (22 + 10) / 2 = 16

Part 3c)

P(12 < X < 19) = P(12 ≤ X ≤ 19) = (19 - 12) / (22 - 10) = 0.5833

punif(19, 10, 22) - punif(12, 10, 22)
## [1] 0.5833333

Part 3d)

For a continuous random variable, P(X = x) = 0 for every x, so P(X = 15) = 0.
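The hand calculations in Problems 2 and 3 can also be checked by Monte Carlo simulation. The sketch below assumes the bag weights are independent normal random variables with the parameters listed above. It uses runif() for the Uniform(10, 22) variable. The seed and the 100,000 draws are arbitrary, so the simulated proportions only approximate the exact answers.

```r
set.seed(2)                      # arbitrary seed
n_sims <- 100000                 # number of simulated draws (arbitrary)

## Problem 2: three 10 lb. bags vs. one 30 lb. bag
x <- rnorm(n_sims, mean = 30.7, sd = 1.12)                # one 30 lb. bag
w <- rnorm(n_sims, 10.5, 0.15) + rnorm(n_sims, 10.5, 0.15) +
     rnorm(n_sims, 10.5, 0.15)                            # three independent 10 lb. bags
p2_sim <- mean(w - x > 0)                                 # estimate of P(W - X > 0)
p2_formula <- 1 - pnorm(0, 0.8, sqrt(0.0675 + 1.12^2))    # exact answer from above

## Problem 3: X ~ Uniform(10, 22)
u <- runif(n_sims, min = 10, max = 22)
mean_sim <- mean(u)                                       # compare with E(X) = 16
p3_sim <- mean(u > 12 & u < 19)                           # compare with 7/12
p3_formula <- (19 - 12) / (22 - 10)
p15_sim <- mean(u == 15)                                  # Part 3d: exact value 15 never drawn

round(c(p2_sim = p2_sim, p2_formula = p2_formula, mean_sim = mean_sim,
        p3_sim = p3_sim, p3_formula = p3_formula, p15_sim = p15_sim), 4)
```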

Problem 4

Read in Data

ZOD <- read_csv("ZODTwoGroups.csv")
## Rows: 12 Columns: 2
## ── Column specification ────────────────────────────────────────
## Delimiter: ","
## chr (1): Pie
## dbl (1): ZOD
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Part 4a)

ZOD$Pie <- factor(ZOD$Pie)

p <- ggplot(ZOD, aes(x=Pie, y=ZOD)) + geom_boxplot() + xlab("Pie") + ylab("ZOD")
p

Cherry pie's ZOD values show more variability. There appears to be a difference between the ZODs for the two groups.

Part 4b)

set.seed(15)
PermsOut <- ZOD %>%
  # 1000 copies of the data set (rows reordered), one per replicate
  rep_sample_n(size = nrow(ZOD), reps = 1000, replace = FALSE) %>%
  # within each replicate, shuffle the ZOD values across pie types
  mutate(ZOD_perm = sample(ZOD)) %>%
  group_by(replicate, Pie) %>%
  # group means of the shuffled and of the original ZOD values
  summarize(mean_ZOD_perm = mean(ZOD_perm), mean_ZOD = mean(ZOD)) %>%
  # Cherry - Apple: diff() takes the second factor level minus the first
  summarize(diff_mean = diff(mean_ZOD_perm), diff_orig = diff(mean_ZOD))
## `summarise()` has grouped output by 'replicate'. You can override using the
## `.groups` argument.

PermsOut

## # A tibble: 1,000 × 3
##    replicate diff_mean diff_orig
##        <int>     <dbl>     <dbl>
##  1         1    -1          3.67
##  2         2     1.83       3.67
##  3         3     0.167      3.67
##  4         4     2.33       3.67
##  5         5     0          3.67
##  6         6     0          3.67
##  7         7     1.33       3.67
##  8         8     1.67       3.67
##  9         9     0.667      3.67
## 10        10     0.5        3.67
## # … with 990 more rows

The observed difference in sample means (Cherry - Apple) is 3.67.

Part 4c)

Statistic: difference between two means, independent samples: x-bar1 - x-bar2
x-bar1: mean ZOD for cherry pie = 4.8333
x-bar2: mean ZOD for apple pie = 1.1667
We observed x-bar1 - x-bar2 = 3.6667.

Part 4d)

origdiff <- PermsOut$diff_orig[1]
p1 <- ggplot(data = PermsOut, aes(x = diff_mean)) +
  geom_histogram(bins = 13) +
  xlab("Cherry - Apple") +
  geom_vline(xintercept = origdiff, col="Red")
yheight <- max(table(PermsOut$diff_mean))
p1

The distribution is unimodal and approximately symmetric. The red line marks the observed difference (3.67). It lies far in the right tail, so a difference this large would be rare if pie type had no effect on ZOD.

Part 4e)

PermsOut %>%
  summarize(count = sum(diff_orig <= diff_mean),
            proportion = mean(diff_orig <= diff_mean))
## # A tibble: 1 × 2
##   count proportion
##   <int>      <dbl>
## 1     3      0.003

Only 3 of the 1000 permuted differences (about 0.3%) are at least as large as the difference observed in the original sample. The estimated one-sided p-value is therefore 0.003.

Part 4f)

Since the p-value (0.003) is small (below 0.10, 0.05, and 0.01), we declare evidence in favor of the alternative hypothesis: the mean ZOD for cherry pie is greater than the mean ZOD for apple pie (μ1 > μ2).
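The permutation test in Problem 4 works in three steps: shuffle, recompute the difference in means, and compare the results with the observed value. These steps can be written in base R without infer. The function perm_test_diff() below is a hypothetical helper, not the template's code. It shuffles the group labels instead of the ZOD values, which gives the same null distribution. The ZOD data file is not reproduced here, so the example runs on small made-up numbers.

```r
# Hypothetical helper: one-sided permutation test for a difference in means.
# y: numeric responses; group: two-level grouping variable.
# Difference is level 2 minus level 1 (same order as diff() in Part 4b).
perm_test_diff <- function(y, group, reps = 1000) {
  group <- factor(group)
  lv <- levels(group)
  stopifnot(length(lv) == 2)
  diff_means <- function(g) mean(y[g == lv[2]]) - mean(y[g == lv[1]])
  observed <- diff_means(group)
  # Shuffling labels mimics "no association" between group and y
  null_diffs <- replicate(reps, diff_means(sample(group)))
  list(observed = observed,
       null_diffs = null_diffs,
       # proportion of shuffled differences at least as large as observed
       p_value = mean(null_diffs >= observed))
}

# Made-up toy data (NOT the ZODTwoGroups.csv values), 6 per group
toy_y <- c(3, 5, 4, 6, 5, 4,   7, 8, 6, 9, 7, 8)
toy_group <- rep(c("A", "B"), each = 6)

set.seed(15)
res <- perm_test_diff(toy_y, toy_group, reps = 1000)
res$observed   # 7.5 - 4.5 = 3
res$p_value    # small: few shuffles give a difference of 3 or more
```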

Problem 5

WL <- read_csv("PopularDietsCombined.csv")
## Rows: 93 Columns: 2
## ── Column specification ────────────────────────────────────────
## Delimiter: ","
## chr (1): Diet
## dbl (1): WtLossKG
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Part 5a)

WL$Diet <- factor(WL$Diet)

p <- ggplot(WL, aes(x=Diet, y=WtLossKG)) + geom_boxplot() + xlab("Diet") + ylab("WTLossKG")
p

For Ornish and Zone, the boxplots show apparent outliers. Ornish's boxplot has the lowest variability.

Part 5b)

(sum.WtLossKG <- WL %>% summarize(mean = mean(WtLossKG, na.rm=TRUE)))
## # A tibble: 1 × 1
##    mean
##   <dbl>
## 1  4.95

The point estimate x-bar for the mean weight loss across all diets is 4.945 kilograms.

Part 5c)

# code for bootstrap samples given; need to add code for histogram
set.seed(15)
# 1000 bootstrap samples so we can display the distribution
BootSamp1000 <- WL %>%
  specify(response = WtLossKG) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

B1 <- ggplot(data = BootSamp1000, aes(x = stat)) +
  geom_histogram(bins = 10, colour = 1, fill = "white") +
  xlab("xbar")
B1 + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(),
           axis.title.y = element_blank(), text = element_text(size = 15),
           axis.text.x = element_text(size = 15))

The distribution of the 1000 bootstrap means is unimodal and approximately symmetric.

Part 5d)

set.seed(15)
(BootSamp1000 <- WL %>%
  specify(response = WtLossKG) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.95))

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     3.53     6.36

About 95% of our bootstrapped x-bar's fell between 3.53 and 6.36. We are 95% confident that the (true) mean weight loss across all diets is between 3.53 and 6.36 kilograms.

Part 5e)

# We rerun the complete code since we are using built-in
# code to get the desired confidence levels
# We want you to repeat set.seed for both sets of samples
set.seed(15)
(BootSamp1000 <- WL %>%
  specify(response = WtLossKG) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.90))
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     3.78     6.16

set.seed(15)
(BootSamp1000 <- WL %>%
  specify(response = WtLossKG) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.99))
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     3.11     6.89

90%: the interval separates the middle 90% of the bootstrap distribution from the tails. We are less confident, and the interval is narrower (3.78 to 6.16).
99%: the interval separates the middle 99% of the bootstrap distribution from the tails. We are more confident, and the interval is wider (3.11 to 6.89).
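The intervals in Parts 5d and 5e use the percentile method, which is infer's default type for get_ci(). The method keeps the middle (level × 100)% of the bootstrap means and cuts the rest equally from both tails. The base-R sketch below shows why a higher level gives a wider interval when the same bootstrap samples are reused. It relies on a hypothetical helper, boot_percentile_ci(), and on simulated stand-in data (not the diet data). It illustrates the method and is not infer's implementation.

```r
# Hypothetical helper: percentile bootstrap confidence interval for a mean
boot_percentile_ci <- function(x, reps = 1000, level = 0.95) {
  x <- x[!is.na(x)]
  # Resample with replacement (same size as the data); keep each mean
  boot_means <- replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
  alpha <- 1 - level
  # Cut alpha/2 from each tail, keeping the middle `level` of the means
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(15)
toy_x <- rnorm(50, mean = 10, sd = 3)   # simulated stand-in, NOT WtLossKG

cis <- sapply(c(0.90, 0.95, 0.99), function(lv) {
  set.seed(15)                          # same bootstrap samples for every level
  boot_percentile_ci(toy_x, reps = 1000, level = lv)
})
colnames(cis) <- c("90%", "95%", "99%")
rownames(cis) <- c("lower", "upper")
cis

widths <- cis["upper", ] - cis["lower", ]   # width grows with the level
widths
```

Resetting the seed inside the loop mirrors the template's advice to repeat set.seed. Every level then cuts the same 1000 bootstrap means, so the three intervals are nested.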

Part 5f)

# no set.seed in this part
(BootSamp1000 <- WL %>%
  specify(response = WtLossKG) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.95))
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     3.58     6.41

(BootSamp1000 <- WL %>%
  specify(response = WtLossKG) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.95))
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     3.52     6.29

Without set.seed, each new set of 1000 bootstrap samples produces a slightly different distribution of 1000 x-bar's. Each run therefore gives a slightly different 95% confidence interval: (3.58, 6.41) and (3.52, 6.29) here, versus (3.53, 6.36) in Part 5d.

Part 5g)

# code provided for n=500; add code for n=100 and n=10
# 500 bootstrap samples
set.seed(15)
WL %>%
  specify(response = WtLossKG) %>%
  generate(reps = 500, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.95)

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     3.43     6.35

set.seed(15)
WL %>%
  specify(response = WtLossKG) %>%
  generate(reps = 100, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.95)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     3.40     6.34

set.seed(15)
WL %>%
  specify(response = WtLossKG) %>%
  generate(reps = 10, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.95)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     4.24     6.19

The distributions of 500 and 100 x-bar's produce slightly different confidence intervals (3.43 to 6.35 and 3.40 to 6.34). The distribution of 10 x-bar's produces a much narrower confidence interval (4.24 to 6.19). …
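Parts 5f and 5g both show Monte Carlo variability. The bootstrap interval itself changes from run to run, and it moves more when there are fewer bootstrap samples. The sketch below repeats a 95% percentile bootstrap 200 times for 10, 100, and 1000 bootstrap samples. It uses simulated stand-in data (not the diet data) and reports the SD of the lower endpoint and the average width; the helper one_ci() is hypothetical. With only 10 bootstrap means, the 2.5% and 97.5% cut points sit near the smallest and largest of 10 values, which rarely reach the true tails. That is one plausible reason the 10-sample interval in Part 5g came out narrower.

```r
set.seed(15)
toy_x <- rnorm(50, mean = 10, sd = 3)   # simulated stand-in, NOT the diet data

# Hypothetical helper: one percentile bootstrap CI built from `reps` means
one_ci <- function(x, reps, level = 0.95) {
  boot_means <- replicate(reps, mean(sample(x, replace = TRUE)))
  quantile(boot_means, c((1 - level) / 2, (1 + level) / 2), names = FALSE)
}

# For each number of bootstrap samples, rebuild the CI 200 times
summ <- sapply(c(10, 100, 1000), function(reps) {
  cis <- replicate(200, one_ci(toy_x, reps))   # 2 x 200 matrix: lower, upper
  c(lower_sd = sd(cis[1, ]),                   # run-to-run spread of lower end
    mean_width = mean(cis[2, ] - cis[1, ]))    # average interval width
})
colnames(summ) <- c("reps = 10", "reps = 100", "reps = 1000")
round(summ, 3)
```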