F23 Data Analysis 5

School

Oregon State University, Corvallis

Course

314

Subject

Statistics

Date

Apr 3, 2024

Uploaded by lucekimb

DA 5 © Kelsi Espinoza

Data Analysis 5

Q1. (8 points) COVID-19 is currently a pressing local and global concern, and the vaccines are becoming more widely available (and more expensive). Working at a university that requires proof of COVID-19 vaccination to enroll, an instructor wonders what proportion of OSU students are vaccinated for COVID-19. To examine this, the researcher takes a random sample of 200 OSU students (Spring 2022), and discovers that 179 of those individuals are vaccinated for COVID-19.

a. (1 point) What proportion of the OSU students sampled are vaccinated for COVID-19?
Sample proportion: p̂ = 179/200 = 0.895, or 89.5%.

b. (2 points) Check the sample size conditions for using z-procedures to construct a confidence interval for p. Explain whether these size conditions are met.
The sample size conditions for using z-procedures are met because n·p̂ = 179 and n(1 − p̂) = 21, both of which are greater than 10.

c. (2 points) Estimate the proportion of all OSU students who have gotten the COVID-19 vaccine. Use a confidence level of 95%. (Calculate the 95% confidence interval.) Show setup/work and give answer in interval notation, rounded to four places past the decimal.
Confidence interval = p̂ ± z* · √(p̂(1 − p̂)/n), where p̂ is the sample proportion, z* is the z-score that corresponds to the desired confidence level (for 95% confidence, z* is approximately 1.96), and n is the sample size.
0.895 ± 1.96 · √(0.895 × 0.105 / 200) = 0.895 ± 0.0425
The 95% confidence interval is approximately (0.8525, 0.9375).

d. (3 points) Interpret your point/interval estimate for p. That is, provide the two-part conclusion for your confidence interval, including context.
Based on the sample, we estimate that 89.5% of all OSU students are vaccinated for COVID-19, and we can say with 95% confidence that the true proportion lies between 85.25% and 93.75%. This means that if we were to take many random samples of 200 OSU students and build a 95% confidence interval from each one, about 95% of those intervals would contain the true proportion.

Q2. (5 points) In December 2022, Benton County (the county that OSU is in) reported that roughly 0.801, or 80.1%, of Benton County residents are vaccinated for COVID-19. The instructor wonders if the proportion of OSU students who have gotten the COVID-19 vaccine is different from 0.801. To examine this, the researcher takes a random sample of 200 OSU students (Spring 2022), and discovers that 179 of those individuals are vaccinated for COVID-19. Is the proportion of all OSU students who are vaccinated for COVID-19 different from 0.801 (the proportion for Benton County)?

a. (2 points) Check the sample size conditions for using a z-test for p. Explain whether these size conditions are met.
The sample size conditions for using a z-test for p are the same as for the confidence interval, and as stated above, these conditions are met.

b. (3 points) Based on your confidence interval in Question 1 part (c), is it reasonable to assume that the actual proportion of OSU students who have gotten the COVID-19 vaccine could be equal to 0.801? That is, suppose you wanted to test the below hypotheses at a significance level of 0.05:
H0: p = 0.801
HA: p ≠ 0.801
Do not perform the test. Instead, use the information from your confidence interval from Q1(c): will you reject or fail to reject the null hypothesis at a significance level of 0.05? Explain.
The confidence interval from Q1 part (c), (0.8525, 0.9375), does not include the Benton County vaccination rate of 0.801, so it is not reasonable to assume that the actual proportion of OSU students who have gotten the COVID-19 vaccine is equal to 0.801.
If we were to test the null hypothesis H0: p = 0.801 against the alternative hypothesis HA: p ≠ 0.801 at a significance level of 0.05, we would reject the null hypothesis because the 95% confidence interval does not contain 0.801. This suggests that the proportion of vaccinated OSU students differs from that of the general Benton County population.
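
The interval in Q1(c) and the decision in Q2(b) can be checked with a few lines of base R. This is a minimal sketch rather than part of the course script; the variable names (x, n, p_hat, and so on) are illustrative assumptions, and the counts come straight from the problem statement.

```r
# Counts taken from the problem statement
x <- 179   # vaccinated students in the sample
n <- 200   # sample size

p_hat <- x / n                            # sample proportion

# Size conditions: both expected counts should be at least 10
size_ok <- (n * p_hat >= 10) && (n * (1 - p_hat) >= 10)

z_star <- qnorm(0.975)                    # critical value for 95% confidence
se     <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
ci     <- p_hat + c(-1, 1) * z_star * se  # 95% confidence interval for p

# Two-sided test at alpha = 0.05 read off the 95% interval:
# reject H0 when the hypothesized value falls outside the interval
p0        <- 0.801                        # Benton County proportion
reject_h0 <- (p0 < ci[1]) || (p0 > ci[2])

print(round(ci, 4))
print(reject_h0)
```

Reading the test decision off the interval works here because a 95% confidence interval corresponds to a two-sided test at a significance level of 0.05.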

Question 3. (22 points) Based here in Corvallis, 2 Towns Ciderhouse is a brewery that produces a number of alcoholic apple-based ciders. The cider2towns.csv dataset is a random sample of 66 ciders offered by 2 Towns (both current and former offerings, from https://2townsciderhouse.com/). The variable abv represents the percent of alcohol by volume (%) for each craft beverage. Before 2018, Oregon law defined such beverages with more than 7% abv to be legally considered a “wine”, rather than a cider (ORS 471.023 and ORS 471.015). This abv cut-off was increased in January 2018. Researchers are interested in whether the average alcohol by volume of all of 2 Towns’ ciders is lower than 7% abv, the former cut-off for calling a cider a “wine”. Does the sample of ciders provide evidence that the average alcohol by volume of all 2 Towns’ ciders is lower than 7% abv? Use this dataset and the R script DA5_t_procedures.R to complete the following:

a) (2 points) What is the parameter of interest in this scenario? Provide the symbol notation and describe the parameter in context.
The parameter of interest is the population mean alcohol by volume (abv) for all ciders produced by 2 Towns Ciderhouse. Symbolically, this is represented as μ, the true mean abv.

b) (2 points) State the null and alternative hypotheses to answer the question of interest.
• Null hypothesis (H0): μ = 7%. The average abv of all ciders is 7%.
• Alternative hypothesis (HA): μ < 7%. The average abv of all ciders is lower than 7%.

c) (2 points) Make a histogram or boxplot to visualize the variable abv. Is there visual evidence that the average alcohol by volume is lower than 7% abv?
In this histogram, we see that the majority of the bars, which represent the frequency of sampled ciders, are located to the left of the 7% abv mark. This suggests that more of the sampled ciders have an abv below 7% than at or above 7%.
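
A histogram like the one described in part (c) could be drawn along the following lines. This is a sketch under stated assumptions: plot_abv is a hypothetical helper (it is not taken from DA5_t_procedures.R), and the code assumes cider2towns.csv sits in the working directory with a column named abv, as the problem statement describes.

```r
# Hypothetical helper: histogram of abv with the 7% cut-off marked
plot_abv <- function(abv, cutoff = 7) {
  hist(abv,
       main = "ABV of sampled 2 Towns ciders",
       xlab = "Alcohol by volume (%)")
  abline(v = cutoff, lty = 2)    # dashed line at the former legal cut-off
  invisible(mean(abv < cutoff))  # share of the supplied values below the cut-off
}

# Assumes the data file is in the working directory and has an abv column
if (file.exists("cider2towns.csv")) {
  cider <- read.csv("cider2towns.csv")
  plot_abv(cider$abv)
}
```

The helper returns the share of values below the cut-off invisibly, which gives a quick numeric companion to the visual impression without cluttering the console.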

d) (2 points) Calculate the sample mean and standard deviation using R. State/label the values, including any units.
x̄ = 6.765% abv
s = 1.650726% abv

e) (2 points) Check the conditions for inference. State each condition as well as whether each condition is met.

f) (2 points) Calculate the test statistic by hand. Show setup/work.

g) (2 points) State the p-value. Is it one-sided or two-sided?
0.15 < p-value < 0.20. The exact p-value can be found using the R function pt() for the t-distribution. Because we are interested in whether the mean is lower than 7%, this is a one-sided (lower-tail) p-value.

h) (2 points) Calculate the 95% Confidence Interval by hand. Show setup/work.

i) (1 point) Use the t.test() command in R to verify the results of the t-test. How do your answers compare?
The p-value from t.test() is 0.1867, which falls within the range I found in part (g), and R reports the confidence interval as (-Inf, 7.204), so the results are similar.

j) (5 points) From the R output, write a four-part conclusion describing the results. Use α = 0.05. The four-part conclusion should include (combine templates from announcement):
• State whether (or not) to reject the null hypothesis, p-value, and significance level.

• Provide the strength of evidence in terms of the alternative hypothesis, in context.
• Give the interval estimate and point estimate in context.
• CONTEXT!
• Include any other information you might feel to be relevant.

• Based on the p-value of 0.1867 from the R output, which is greater than the significance level α = 0.05, we do not reject the null hypothesis. There is not enough evidence to conclude that the average abv of all 2 Towns’ ciders is less than 7%.
• The strength of the evidence is weak: the p-value is well above our significance level, so the sample provides little evidence that the mean abv of all 2 Towns’ ciders is lower than 7%.
• The interval estimate from the R output, a one-sided 95% confidence interval of (-Inf, 7.204), suggests that the true mean abv of all ciders could be as high as 7.204%, which includes the 7% threshold. The point estimate, our sample mean, is 6.765%.
• In context, this analysis suggests that while the sample mean abv is below the old legal threshold for defining a wine versus a cider, we do not have sufficient evidence to say that the company's ciders, on average, fall below that threshold.
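
The by-hand steps in parts (f) through (h) follow the usual one-sample t formulas, t = (x̄ − μ0) / (s/√n) with n − 1 degrees of freedom. The sketch below works from the summary values in part (d) and assumes that all 66 sampled ciders have a recorded abv; if the file has missing values or a different number of usable rows, n changes and the results will not match the t.test() output quoted in part (i). Variable names are illustrative.

```r
# Summary values reported in part (d); n is the stated sample size
x_bar <- 6.765      # sample mean abv (%)
s     <- 1.650726   # sample standard deviation (%)
n     <- 66         # assumption: every sampled cider has a recorded abv
mu0   <- 7          # hypothesized mean under H0

se     <- s / sqrt(n)          # standard error of the sample mean
t_stat <- (x_bar - mu0) / se   # one-sample t statistic
df     <- n - 1                # degrees of freedom
p_less <- pt(t_stat, df)       # lower-tail p-value for HA: mu < 7

# Two-sided 95% confidence interval for mu
ci_95 <- x_bar + c(-1, 1) * qt(0.975, df) * se

# One-sided 95% upper bound, the form t.test() reports
# when alternative = "less"
upper_95 <- x_bar + qt(0.95, df) * se

print(c(t = t_stat, df = df, p = p_less))
print(ci_95)
print(upper_95)
```

With the raw data loaded into a data frame (named cider here as an assumption), the same test can be run directly with t.test(cider$abv, mu = 7, alternative = "less").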
