DA 6 - Hypothesis Testing

School

Oregon State University, Corvallis

Course

314

Subject

Statistics

Date

Jan 9, 2024

Uploaded by ChiefStraw9411

Data Analysis 6: Hypothesis Testing and Comparing Multiple Means

In this data analysis, you will assess the validity of claims using real data and the hypothesis testing procedures we've discussed in class. If you haven't already done so, work through the tutorial provided on the Data Analysis 6 Canvas page. Once you've worked through the tutorial, write up your responses to the questions listed throughout the tutorial. The same questions are included below to help you format your submissions. Submit a PDF copy of your responses to Gradescope by the deadline stated on Canvas.

Part 1: Caffeine

Question 1 (1 point): What is the parameter of interest in this scenario? Provide context.
The parameter of interest is μ, the mean daily caffeine consumption (in mg) of women who are 18 years of age or older.

Question 2 (2 points): State the null and alternative hypothesis to answer the question of interest. Clearly define any notation you use.
Let μ be the mean daily caffeine consumption (in mg) of adult women. The null hypothesis is that adult women consume 200 mg of caffeine per day on average (H0: μ = 200). The alternative hypothesis is that they consume more than 200 mg of caffeine per day on average (HA: μ > 200).

Question 3 (1 point): Make either a histogram or boxplot to visualize the variable consumption. Refer back to Data Analysis #1 for guidance on how to use the ggplot() functions to create a histogram or boxplot.
A. (0.5 points) Include your histogram or boxplot in your document. Be sure to include clear axes labels and a title.
B. (0.5 points) Based on your visualization, is there visual evidence the average daily caffeine consumption for adult women is greater than 200 mg?
There is some visual evidence that the average daily caffeine intake of adult women is higher than 200 mg. Although more of the observations appear to fall to the left of 200, the observations to the right of 200 are large enough that, taken together, they pull the average above 200.

Question 4 (1 point): Check the conditions required to perform the appropriate hypothesis test for this scenario. State the conditions and whether or not they are met.
The sample size is 47, which is larger than the required minimum of 30, so the sample-size condition is met. If the sample size were less than 30, we would examine the histogram from question 3 and decide based on the shape of the distribution.

Question 5 (2 points): Use the sample data to calculate the appropriate test statistic. From the sampled data, you'll need the sample mean and sample standard deviation. You can use the mean() and sd() functions in R to find these values.
A. (1 point) Calculate the test statistic. Report the value of the test statistic and show your work (i.e. demonstrate how you arrived at that value).
Test Statistic = 1.034083
B. (1 point) Determine the null distribution of the test statistic. State the name of the distribution and include any parameter values needed to define the distribution.
The null distribution is the probability distribution of the test statistic when the null hypothesis is true. Because the sample standard deviation is used in place of the population standard deviation, the test statistic follows a t distribution with n − 1 = 46 degrees of freedom. For reference, the lower-tail probability of the test statistic under a standard normal approximation, computed with pnorm(), is 0.8494514.

Question 6 (1 point): Calculate and report the p-value. Include any code used to do this calculation.
P-Value = 0.1532516

Question 7 (1 point): Use the t.test() function to verify the hypothesis test calculations you performed in questions 5 and 6. Set the significance level to α = 0.05. Include the t.test() output in your assignment here.
The t.test() output reports t = 1.0341.

Question 8 (2 points): From the R output, write a four-part conclusion describing the results. Use a significance level of α = 0.05. Provide a statement of evidence in terms of the alternative hypothesis. State whether (or not) to reject the null. Give an interpretation of the point and interval estimate. Be sure to include the context of the problem in your conclusion.
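As a rough sketch of the calculation behind questions 5 and 6: the test statistic has the form t = (x̄ − μ0) / (s / √n), and the one-sided p-value is the upper-tail area of a t distribution with n − 1 degrees of freedom. The sample standard deviation is not reported in this write-up, so the snippet starts from the reported test statistic rather than the raw data. The helper name `one_sample_t` is hypothetical and only illustrates how the statistic would be formed from summary values.

```r
# Sketch: one-sided p-value for H0: mu = 200 vs HA: mu > 200,
# starting from the values reported in the write-up.
t_stat <- 1.034083   # reported test statistic (question 5A)
n      <- 47         # reported sample size (question 4)
df     <- n - 1      # degrees of freedom of the null t distribution

# Upper-tail area of the t distribution with df degrees of freedom.
p_value <- pt(t_stat, df = df, lower.tail = FALSE)
print(p_value)

# Hypothetical helper: forms the t statistic from summary values.
# xbar and s would come from mean() and sd() applied to the sample.
one_sample_t <- function(xbar, s, n, mu0) {
  (xbar - mu0) / (s / sqrt(n))
}
```

The printed value should land close to the p-value reported in question 6. Using `pt()` rather than `pnorm()` matters here because the statistic is built from the sample standard deviation.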
There is little evidence that the mean daily caffeine consumption of adult women is greater than 200 mg. Although the study's mean of 228.5745 mg is higher than the 200 mg stated in the null hypothesis, the p-value (0.1532516) is greater than α = 0.05, so we fail to reject the null hypothesis. The point estimate for the parameter of interest is the sample mean, 229.5754 mg. We are 95% confident (1 − α = 1 − 0.05) that the parameter lies between 173.521 and 285.630 mg.

Part 2: Combined CO2 Emissions

Question 9 (0.5 points): Since we are treating this entire data set as the population, we can calculate the population mean, μ. Using the mean() function and the vector data we're interested in, population$CombCO2, calculate population mean and store it in mu_co2. It's important that you store the mean value so that we can reference it later on in this exercise. Report the calculated population mean.
Population Mean (mu_co2) = 402.2222

Question 10 (1 point): Before performing the hypothesis test, can we anticipate the outcome? Will we most likely fail to reject or reject the null? Why?
Yes, we can anticipate the outcome from what we already know. Because the null hypothesis is true in this setting, a test based on the one sample of 45 we took will most likely fail to reject the null.

Question 11 (3 points): Using the information from your sample (the sample mean, x̄, the sample standard deviation, s, and the sample size, n), perform a t-test for the mean.
A. (1 point) Calculate the t test statistic. Since we're assessing the performance of a t-test, you should use your sample standard deviation, s, in the calculation of the test statistic, despite the fact that we have access to the population standard deviation.
B. (1 point) Determine the p-value.
C. (1 point) Using a significance level of α = 0.05, does your p-value lead you to reject or fail to reject the null hypothesis? Does this conclusion align with or contradict what you expected to happen in question 10?
Based on the data we have, the one sample of 45 we drew does not lead us to reject the null hypothesis, which aligns with the prediction in question 10.

Question 12 (2.5 points): Construct a histogram of the 10,000 sample means stored in the vector sample_means45$mean. Refer back to Data Analysis #1 for guidance on how to use the ggplot() functions to create a histogram.
A. (0.5 points) Include your histogram in your document. Be sure to include clear axes labels and a title.
B. (1 point) Describe the distribution. Include a description of the shape, center, and spread.
The distribution is symmetric about its center and has a fairly tight spread.
C. (1 point) According to the Central Limit Theorem (CLT), what is the distribution of the sample means?
According to the CLT, the sample means are approximately normally distributed.

Question 13 (2.5 points): Construct a histogram of the 10,000 t-test statistics stored in the vector sample_means45$t.
A. (0.5 points) Include your histogram in your document. Be sure to include clear axes labels and a title.
B. (1 point) Describe the distribution. Include a description of the shape, center, and spread.
The distribution is bell shaped and centered around 0, with a slight left skew and a fairly tight spread.
C. (1 point) In our conversations around estimating and testing the population mean when the sample standard deviation is used in the standard calculation, we discussed the
theoretical distribution of $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. What is the name of this distribution? What parameter value(s) do we expect it to have?
This is a t distribution. Its parameter is the degrees of freedom, df = n − 1; with n = 45, df = 44.

Question 14 (3 points): Consider the distribution of p-values displayed in the histogram you just created.
A. (1 point) Describe the distribution. Include a description of the shape, center, and spread.
The p-value histogram appears to be roughly uniform. The orange bin shows the p-values that are less than or equal to the significance level of 0.05. The gray bins show the p-values that are greater than the significance level.
B. (1 point) Remember in this atypical situation, we know that the null hypothesis is true; therefore, we expect that just by chance, we will falsely reject the null hypothesis α×100% of the time. Use the code mean(sample_means45$p_val <= 0.05) * 100 to calculate the percentage of hypothesis tests that rejected the null hypothesis even though the null hypothesis is true. Does the percentage from your simulation align with the percentage we expected?
With a significance level of 0.05, we expect about 5% of the tests to reject the null hypothesis even though it is true. In my simulation, 5.38% of the tests rejected the null, which closely matches the expected 5%.
C. (1 point) The tests that produce a p-value less than the significance level will lead us to falsely reject the null hypothesis. This incorrect conclusion represents one type of error that can occur when performing a hypothesis test. What type of error is this?
This is a Type I error, which occurs when we reject a null hypothesis that is actually true.

Question 15 (2 points): Write the null and alternative hypotheses for evaluating whether the average number of hours worked varies across the five groups.
The null hypothesis states that the average number of hours worked each week is the same for people at all five levels of educational attainment. The alternative hypothesis is that at least one group's average differs.
H0: μ1 = μ2 = μ3 = μ4 = μ5
HA: at least one μ differs from the others

Question 16 (1.5 points): Using the information provided, assess whether the following conditions necessary to accurately perform an ANOVA F test are met:
A. Are the observations in the study independent?
The observations are independent because the individuals in one group are independent of those in the other groups.
B. Are the sample sizes sufficiently large? (Hint: the n row of the table above provides the sample sizes of each group.)
All of the sample sizes are above 30, so they are sufficiently large.
C. Is the variation in the groups about equal from one group to the next? (Hint: use the spread of the boxplots and standard deviation values from the table to assess this condition.)
All of the standard deviations are very close to each other, so the variation is about equal across groups.
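Looking back at the false-rejection rate from question 14, the idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population below is simulated and is not the CO2 emissions data from the assignment, the names `population`, `mu0`, and `p_vals` are hypothetical, and a two-sided one-sample t-test is assumed.

```r
# Sketch: estimate how often a t-test falsely rejects a true null.
# The population is simulated for illustration only.
set.seed(314)
population <- rnorm(5000, mean = 400, sd = 60)  # hypothetical population
mu0 <- mean(population)                         # null value equals the true mean

n_sims <- 10000                                 # number of repeated samples
n      <- 45                                    # size of each sample

p_vals <- replicate(n_sims, {
  x <- sample(population, size = n)             # draw one sample without replacement
  t.test(x, mu = mu0)$p.value                   # two-sided one-sample t-test
})

# Percentage of tests that reject at alpha = 0.05 even though H0 is true.
reject_pct <- mean(p_vals <= 0.05) * 100
print(reject_pct)
```

Because the null hypothesis is true by construction, every rejection counted here is a false rejection, and the percentage should land near α × 100 = 5.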
Question 17 (1 point): To assess whether there is a significant difference in the average number of hours worked between one or more of the groups, we need to determine the mean squares between groups (MSG) and the mean squares within groups (MSE). Each of these values has an associated degrees of freedom.
A. Determine the degrees of freedom associated with the MSG.
k − 1 = 5 − 1 = 4
B. Determine the degrees of freedom associated with the MSE.
n − k = (121 + 546 + 97 + 253 + 155) − 5 = 1167

Question 18 (1 point): An ANOVA was performed in R. The estimate for the mean squares between groups is MSG = 501.54 and the resulting F statistic is equal to 2.189. Determine the average variation within each group. That is, calculate the MSE.
Since F = MSG/MSE = 2.189, we have 2.189 × MSE = MSG, so MSE = 501.54/2.189 = 229.12.

Question 19 (1 point): Using the F statistic from question 18 and the two values for the degrees of freedom in question 17, calculate the p-value for this test.
P-Value = 0.06819242

Question 20 (2 points): Using the p-value calculated in question 19, write a conclusion for this ANOVA F test using a significance level of α = 0.05. (Hint: your conclusion should include a statement of evidence in favor of the alternative and a statement as to whether the null hypothesis is rejected or not.)
We fail to reject the null hypothesis because 0.06819242 > 0.05. Although we do not have enough evidence to reject the null hypothesis, this does not imply that it is true. There is only weak evidence that at least one level of educational attainment has a true mean number of weekly hours worked that differs from the others.

Gradescope Page Matching (2 points)
When you upload your PDF file to Gradescope, you will need to match each question on this assignment to the correct pages. Video instructions for doing this are available in the Start Here module on Canvas on the page “Submitting Assignments in Gradescope”. Failure to follow these instructions will result in a 2-point deduction on your assignment grade. Match this page to outline item “Gradescope Page Matching”.
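Returning to the ANOVA calculations in questions 17 through 19, the degrees of freedom, MSE, and p-value can be checked from the reported summary values alone. This is a minimal sketch: the variable names are illustrative, and the inputs are the group sizes, MSG, and F statistic quoted in the questions rather than the underlying data.

```r
# Sketch: reproduce the ANOVA quantities from questions 17 to 19
# using only the summary values reported in the write-up.
group_n <- c(121, 546, 97, 253, 155)  # group sample sizes (question 17B)
k <- length(group_n)                  # number of groups
n <- sum(group_n)                     # total sample size

df_between <- k - 1                   # degrees of freedom for MSG
df_within  <- n - k                   # degrees of freedom for MSE

msg    <- 501.54                      # reported mean squares between groups
f_stat <- 2.189                       # reported F statistic
mse    <- msg / f_stat                # follows from F = MSG / MSE

# Upper-tail area of the F distribution with (df_between, df_within) df.
p_value <- pf(f_stat, df1 = df_between, df2 = df_within, lower.tail = FALSE)

print(c(df_between = df_between, df_within = df_within))
print(round(mse, 2))
print(p_value)
```

The upper tail is used because larger F statistics indicate more variation between groups relative to variation within groups, which is the direction that counts as evidence against the null.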