Stats 250 Practice Exam 2 W23 SOLUTIONS (1)

School

University of Michigan *

Course

480

Subject

Statistics

Date

Apr 3, 2024

Uploaded by hijim04

Question 3: Numeric Input
This exam contains multiple questions whose answers are labeled as: Numeric input only. For these questions,
• Express your answer in the following format: #.####
• Please do not include units (or additional text of any kind)
• When appropriate, please round your final answer to 4 decimal places
For example, if you had a calculation of 375/1450, your answer should be inputted as 0.2586
( ) I confirm that I have read and understand the instructions for numeric input answers.

Question 4: Better Coffee
Some Stats 250 students were having a discussion about whether Starbucks or RoosRoast has better coffee. After conducting a large study, they concluded that a majority of all UM students prefer RoosRoast coffee over Starbucks.
a. Provide the appropriate statistical notation for the parameter that would be found in the null and alternative hypotheses.
The parameter of interest is p, which is defined as the population proportion of all UM students who prefer RoosRoast coffee over Starbucks.
b. The resulting test statistic is 2.3 and the corresponding p-value is 0.0107. Provide an interpretation of the p-value in context.
Assuming the population proportion of all UM students who prefer RoosRoast coffee over Starbucks is equal to 0.5, the probability of obtaining a z-test statistic of 2.3 or larger is 0.0107.
OR
Assuming the population proportion of all UM students who prefer RoosRoast coffee over Starbucks is equal to 0.5, if we were to repeat the sampling process many times, we would expect to obtain a z-test statistic of 2.3 or larger in about 1.07% of the repetitions.
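The p-value quoted in Question 4b can be checked as the upper-tail area of the standard normal distribution beyond z = 2.3. The sketch below is only an illustration; the function name `upper_tail_p_value` is a hypothetical helper, not something from the exam.

```python
from statistics import NormalDist


def upper_tail_p_value(z: float) -> float:
    """Return P(Z >= z) for a standard normal Z (one-sided, upper tail)."""
    return 1.0 - NormalDist().cdf(z)


# Test statistic given in Question 4b.
p_value = upper_tail_p_value(2.3)
print(round(p_value, 4))
```

The upper tail is used because the alternative hypothesis is that a majority (p > 0.5) prefers RoosRoast.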

Question 5: Window Color Study
A concern in allowing the tinting of car windows is that tasks performed through these windows often require the rapid detection of low-contrast, unilluminated targets. If window tinting interferes with the detection of targets, then road safety may be compromised. A random sample of 130 U.S. drivers was selected to participate in a study. Participants were randomly assigned to drive a car with a predetermined degree of tinting on the windows (tint: "no window tinting" or "some window tinting"). For each participant, the inspection time (in milliseconds) was recorded. Here, inspection time is defined as the time required to perform a simple discrimination task through the car windows. The researcher would like to use these results to investigate the effect of window tinting on inspection time, on average, for all U.S. drivers.
a. There were two variables in this experiment: TINT and INSPECTION TIME. For each of the two variables, provide the variable role and the variable type for the given study.
TINT is the explanatory variable and its type is categorical. INSPECTION TIME is the response variable and its type is quantitative (continuous).
b. The researcher uses R and provides the following R output: Based on the output provided, provide an appropriate conclusion in context.
Since the p-value is between 1% and 5%, we have strong evidence to suggest that the population mean inspection time for all cars with some tint is different from the population mean inspection time for all cars without any tint.
c. The researcher used the study results to construct a 95% confidence interval to estimate the difference in population mean inspection times for all cars with some window tint versus without any window tint. Based on the output provided, what can you say about the resulting 95% confidence interval?
____ the 95% confidence interval for estimating μ_1 - μ_2 would contain the value of 0.
_ X _ the 95% confidence interval for estimating μ_1 - μ_2 would not contain the value of 0.
____ the t* multiplier used to construct the 95% confidence interval for estimating μ_1 - μ_2 is t = 2.5003.
____ the t* multiplier used to construct the 95% confidence interval for estimating μ_1 - μ_2 is t = 2.3849.
Because the p-value is between 1% and 5%, we have strong evidence to support Ha, which suggests that μ_1 - μ_2 ≠ 0; hence the 95% confidence interval constructed using this same data would not contain the value 0.

Question 6: Tasty Cheese
Sarah, a Stats 250 student, is interested in seeing what factors make cheese taste better. After gathering data for a random sample of 30 cheeses, Sarah decides to create a linear regression model that uses lactic acid concentration (%) to predict taste score (a subjective score measured in points). While her results are statistically significant, she remembers she must check her assumptions! Sarah gets carried away and makes too many graphs.
a. Identify the three graphs from these six graphs (labelled A to F) that will best help Sarah verify the linear regression assumptions for her analysis. No partial credit will be given; all three graphs have to be identified correctly in order to receive credit.
[ X ] Graph A [ ] Graph B [ ] Graph C [ X ] Graph D [ X ] Graph E [ ] Graph F
Select one of the three graphs you identified above and state the assumption(s) the graph helps verify.
Graph A: Helps verify that the population relationship (between lactic acid concentration and taste score) is in fact linear.
Graph D: Helps verify that the true errors are normally distributed.
Graph E: Helps verify that the true errors have constant variance and that the population relationship (between lactic acid concentration and taste score) is in fact linear.


Question 7: Pokémon Go players in Ann Arbor
A researcher plans to survey a random sample of Pokémon Go players in Ann Arbor to estimate the population proportion who are young.
a. The researcher takes a large random sample of Ann Arbor Pokémon Go players and provides a 98% confidence interval that ranges between 60% and 68%.
i. Provide an interpretation of the 98% confidence interval in context.
We estimate with 98% confidence that the population proportion of all Ann Arbor Pokémon Go players who are young is between 60% and 68%.
ii. Which of the following intervals must be the corresponding 90% confidence interval based on the same survey results?
( ) (62%, 67%) ( X ) (61.17%, 66.83%) ( ) (59.81%, 68.19%)
The 90% interval will be narrower than the 98% interval (so not the third option) and still centered at the sample proportion, 0.64 (so not the first option).
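The chosen 90% interval can also be recovered numerically: keep the center, and rescale the margin of error by the ratio of the two z* multipliers. The sketch below assumes the usual large-sample z-interval for a proportion; `rescale_interval` is a hypothetical helper name.

```python
from statistics import NormalDist


def rescale_interval(lower, upper, old_level, new_level):
    """Convert a z-based confidence interval from one confidence level to another.

    The center (sample proportion) and the standard error stay the same;
    only the z* multiplier changes.
    """
    center = (lower + upper) / 2
    old_margin = (upper - lower) / 2
    z_old = NormalDist().inv_cdf(1 - (1 - old_level) / 2)
    z_new = NormalDist().inv_cdf(1 - (1 - new_level) / 2)
    new_margin = old_margin * z_new / z_old
    return center - new_margin, center + new_margin


# 98% interval from Question 7a: (0.60, 0.68).
low90, high90 = rescale_interval(0.60, 0.68, old_level=0.98, new_level=0.90)
print(round(low90, 4), round(high90, 4))
```

The result agrees with the selected option (61.17%, 66.83%) to the displayed precision.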

Question 8: Name That Scenario: Family Game Night
One important aspect in statistics is to understand which statistical methods or procedures are appropriate to use to address the research problem or question of interest. A family - Rick, Gail, and daughters Emily, Sarah, and Mary Anne - has played board games regularly for multiple years. Because they are good statisticians, they record many variables about their games for future analysis. Assume that we have an appropriate random sample for each scenario. For the following scenarios, indicate which type of hypothesis test should be performed.

a. Daughter Emily wants to evaluate her overall performance on competitive family games. She believes that she wins more than her fair share of 1/5 of competitive family games. She selects a random sample of 60 competitive family games, finding that she won 14 of them.
( X ) One-sample z-test for one population proportion ( ) Two-sample z-test for the difference in two population proportions ( ) One-sample t-test for one population mean ( ) One-sample t-test for the population mean of differences (paired data) ( ) Two-sample t-test for the difference in two population means ( ) One-sample t-test for the population slope of a regression line ( ) Chi-squared test of goodness of fit ( ) Chi-squared test of independence

b. The family has a system to decide who selects the game to play among the three daughters. Parents Rick and Gail suspect that there is a relationship between the type of game (competitive board games, cooperative board games, competitive card games, cooperative card games, or other) and the daughter who selected the game. They take a random sample of game selections and, for each selection, record the type of game and the daughter who selected it. The data will be used to determine if there is a relationship between the type of game and the daughter who selected the game.
( ) One-sample z-test for one population proportion ( ) Two-sample z-test for the difference in two population proportions ( ) One-sample t-test for one population mean ( ) One-sample t-test for the population mean of differences (paired data) ( ) Two-sample t-test for the difference in two population means ( ) One-sample t-test for the population slope of a regression line ( ) Chi-squared test of goodness of fit ( X ) Chi-squared test of independence

c. The family knows that a game of Monopoly is a time commitment. They believe that the longer a game takes, the more money is taken out of the bank and owned by a player. They wish to assess if there is a relationship between the length of a Monopoly game (in minutes) and the total amount of money possessed by all players (the sum of the money in $).
( ) One-sample z-test for one population proportion ( ) Two-sample z-test for the difference in two population proportions ( ) One-sample t-test for one population mean ( ) One-sample t-test for the population mean of differences (paired data) ( ) Two-sample t-test for the difference in two population means ( X ) One-sample t-test for the population slope of a regression line ( ) Chi-squared test of goodness of fit ( ) Chi-squared test of independence

d. Rick recently saw an article that reported the average cost of a game is $30. He wishes to assess if his family has spent more per game than the article reported, on average. The family has a large collection of games, so Rick will select a random sample of 40 games to complete this hypothesis test.
( ) One-sample z-test for one population proportion ( ) Two-sample z-test for the difference in two population proportions ( X ) One-sample t-test for one population mean ( ) One-sample t-test for the population mean of differences (paired data) ( ) Two-sample t-test for the difference in two population means ( ) One-sample t-test for the population slope of a regression line ( ) Chi-squared test of goodness of fit ( ) Chi-squared test of independence

e. The family enjoys playing the card game Solitaire, a single-player game. They wish to assess if the average number of moves to win a game is different for the adults and the children.
( ) Small sample binomial test ( ) One-sample z-test for one population proportion ( ) Two-sample z-test for the difference in two population proportions ( ) One-sample t-test for one population mean ( ) One-sample t-test for the population mean of differences (paired data) ( X ) Two-sample t-test for the difference in two population means ( ) One-sample t-test for the population slope of a regression line ( ) Chi-squared test of goodness of fit ( ) Chi-squared test of independence


Question 9: Position and Blood Pressure
Does the position of a patient affect the results of a blood pressure measurement? An experiment was designed to study whether measuring systolic blood pressure in a standing position results in lower readings, on average, than measuring systolic blood pressure in a supine position (or lying face up). A random sample of 40 adults was selected and for each, their blood pressure (in mmHg) was measured in both positions.
a. Which type of experimental design was used for this blood pressure study?
( ) Completely Randomized Design ( ) Randomized Block Design ( X ) Matched Pairs Design
b. Define the parameter μ_d in context.
μ_d is the population mean of differences in systolic blood pressure when comparing results across two positions (standing - supine) for all adults represented by the sample.
c. The researcher, Gloria, remembers that she needs to generate a QQ plot to check a normality assumption before conducting the appropriate statistical analysis. State the normality assumption.
The sample of differences in systolic blood pressure was sampled from a population of differences that are Normally distributed.
d. After examining the data, Gloria concludes that the normality assumption is not reasonably met. Explain why the researcher would be able to continue with the statistical analysis even though the normality assumption is not met.
Even though the normality assumption is not reasonably met, Gloria can continue with the statistical analysis because the sample size n = 40 is large enough (it is larger than 25); hence, by the CLT, the sample mean values of differences will have an approximately normal distribution.

e. Gloria checks all assumptions and these are found to be reasonably met. She performs the appropriate t-test using R. These are her results:
data: blood_pressure
t = -1.7037, df = 39, p-value = 0.0482
Provide an interpretation of the test statistic value of -1.7037.
The sample mean of differences in systolic blood pressure is 1.7037 standard errors below the hypothesized population mean of differences of 0.

Question 10: Living Arrangements
A survey was conducted on a random sample of 316 current U.S. college students. Two of the survey questions were: 1) What is your current living arrangement? 2) What is your current geographical location? The results of these two categorical questions are provided in the table below.
a. Which chi-square test would be appropriate for assessing if there is a significant relationship between Living Arrangement and Geographical Location for the population of all U.S. college students?
( ) Chi-square Test of Goodness of Fit ( X ) Chi-square Test of Independence
b. The researcher is interested in using these results to compute a few probabilities:
i. The first one is to find the probability that a randomly selected college student will live by themselves (solo) and reside in Europe.
Out of the 316 students in the study, there were 12 who reported they live by themselves (solo) and are located in Europe. The probability of both of these events happening would be 12/316 = 0.0380
ii. The second one is to find the probability that a college student lives with extended family, given that the student resides in Africa.
Out of the 60 college students who reside in Africa in this study, there were 26 who reported they live with extended family → 26/60 = 0.4333
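The two probabilities in Question 10b differ only in their denominators: the joint probability divides by the whole sample, while the conditional probability divides by the size of the conditioning group. A minimal sketch using the counts quoted in the solution (the variable names are illustrative):

```python
# Counts quoted in the Question 10b solution.
n_total = 316            # all surveyed students
n_solo_and_europe = 12   # live solo AND located in Europe
n_africa = 60            # all students located in Africa
n_africa_extended = 26   # located in Africa AND live with extended family

# Joint probability: both events, out of the whole sample.
p_solo_and_europe = n_solo_and_europe / n_total

# Conditional probability: extended family, restricted to students in Africa.
p_extended_given_africa = n_africa_extended / n_africa

print(round(p_solo_and_europe, 4), round(p_extended_given_africa, 4))
```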

c. Assuming that there is not a significant relationship between Living Arrangement and Geographical Location for the population of all U.S. college students, what is the expected number of college students who would report they live in the Americas and reside in their parents' household? Provide brief numerical justification for your answer.
The expected count is found by (110)(138)/316 = 48.038 U.S. college students. We do not round expected counts.

Question 11: Tree Growth
Trees grow faster due to global warming. Does rapid growth mean that trees will not survive as long? To help address this question with data, the tree rings that are formed as a tree grows were examined for 46 randomly selected dead trees of a particular species of pines. Tree ring width in the first 25 years tells us how fast the tree grows in its younger years. The survival age of trees observed in the data ranged from 47 to 190 years. The variables of interest are
y = TRW = Tree ring width in the first 25 years measured in millimeters (mm)
x = AGE = Survival age of trees in years
a. The output below shows the results for fitting a linear regression model to the data. Provide the estimated regression line that would be used to predict the TRW from the survival age of a tree.
The resulting estimated regression line would be: y-hat = 4.84207 - 0.002632(x) OR predicted TRW = 4.84207 - 0.002632(age)
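Returning to the expected count in Question 10c: under independence, each cell's expected count is the product of its two marginal totals divided by the grand total. The sketch below assumes that 110 and 138 are the two marginal totals for the Americas / parents' household cell; the solution does not say which is the row total and which is the column total, and the formula does not depend on it.

```python
def expected_count(first_margin: float, second_margin: float, grand_total: float) -> float:
    """Expected cell count under independence: (row total)(column total) / n."""
    return first_margin * second_margin / grand_total


# Marginal totals and sample size quoted in the Question 10c solution.
expected = expected_count(110, 138, 316)
print(round(expected, 3))
```

The value is left unrounded (about 48.038 rather than 48) because expected counts are long-run averages, not observed counts.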


b. Four scatterplots are shown below (A, B, C, D). Which scatterplot is consistent with the estimated regression model?
( ) Plot A ( ) Plot B ( X ) Plot C ( ) Plot D
The plot shows a negative linear relationship between TRW and Age. This is consistent with the negative slope observed for the estimated regression model. And only Plot C has values of TRW that are reasonable (based on intercept and slope).
c. One of the important measures of how well a regression model fits the data is the quantity r^2. This value was computed as r^2 = 0.2581. Provide a one-sentence interpretation of r^2 that includes the value and context.
Based on this analysis, 25.81% of the variability in tree ring width in the first 25 years (mm) is accounted for by its linear relationship with survival age of tree (years).
d. Based on the data, among all trees represented by the sample, does there appear to be a significant negative linear relationship between tree ring width in the first 25 years and survival age of tree?
( X ) Yes ( ) No
Explain why you chose this, including appropriate numerical support for your answer.
Since the p-value = 0.00031/2 = 0.000155 is much less than 1%, there is very strong support for a significant negative linear relationship between TRW and Age, among the population of trees represented by the sample.
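The numbers behind parts b to d can be reproduced from the quantities quoted in the solution. The sketch below uses the fitted line from part a, the observed age range (47 to 190 years), r^2 = 0.2581, and the two-sided p-value 0.00031; the function and variable names are illustrative only.

```python
import math

# Estimated regression line from part a: predicted TRW = 4.84207 - 0.002632(age).
INTERCEPT = 4.84207
SLOPE = -0.002632


def predict_trw(age_years: float) -> float:
    """Predicted tree ring width (mm) for a tree with the given survival age."""
    return INTERCEPT + SLOPE * age_years


# Predicted TRW at the ends of the observed age range; a consistent
# scatterplot should show TRW values of roughly this size, trending downward.
trw_at_min_age = predict_trw(47)
trw_at_max_age = predict_trw(190)

# The correlation takes the sign of the slope.
r = math.copysign(math.sqrt(0.2581), SLOPE)

# One-sided p-value for a negative slope: half the two-sided value,
# because the estimated slope lies in the direction of the alternative.
one_sided_p = 0.00031 / 2

print(round(trw_at_min_age, 3), round(trw_at_max_age, 3), round(r, 3), one_sided_p)
```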

e. The four 95% interval estimates shown below were computed to summarize these regression results. These are the confidence and prediction intervals at the mean age in the data, and the confidence and prediction intervals at the maximum age in the data. However, the intervals have lost their labels that explain at what Age values they were computed. The predicted value for TRW at the sample mean of Age is estimated as 4.5445 mm. Which of these four intervals must be the confidence interval for average TRW for all trees whose survival age is at the sample mean age?
( X ) Interval A. (4.493, 4.596)
( ) Interval B. (4.225, 4.458)
( ) Interval C. (3.972, 4.711)
( ) Interval D. (4.190, 4.899)
The midpoints for A and D are at the correct value of 4.5445 mm; for example, (4.493 + 4.596)/2 = 4.5445. Interval A is the narrower of the two, so it must be the confidence interval for average TRW at the sample mean of Age (D, the wider one, is the prediction interval).

Question 12: The Chi-Square Distribution
Our most recent set of chi-square tests allowed us to do more inference for categorical responses. We learned about a new distribution, called the chi-square distribution. Provide a description of the chi-square distribution with 6 degrees of freedom. In your answer, be sure to include the shape, the mean, the median, and the standard deviation.
The chi-square distribution with 6 degrees of freedom is skewed to the right, with a mean of 6, a median of 5.348 (which makes sense, since the median is below the mean when the distribution is skewed right), and a standard deviation of sqrt(12) or 3.4641.
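The summary values for the chi-square distribution with 6 degrees of freedom can be verified without a statistics package. The mean is df and the standard deviation is sqrt(2·df); the median has no simple formula, but for even df the upper-tail probability has a closed form, so the median can be found by bisection. This is a sketch with hypothetical helper names, specific to df = 6.

```python
import math


def chi2_upper_tail_df6(x: float) -> float:
    """P(X >= x) for a chi-square variable with 6 degrees of freedom.

    For even df = 2m the upper tail is exp(-x/2) * sum_{j<m} (x/2)^j / j!;
    here m = 3.
    """
    half = x / 2
    return math.exp(-half) * (1 + half + half ** 2 / 2)


def chi2_median_df6(tolerance: float = 1e-10) -> float:
    """Find the median (upper tail = 0.5) by bisection."""
    low, high = 0.0, 50.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if chi2_upper_tail_df6(mid) > 0.5:
            low = mid   # too much area to the right, so the median is larger
        else:
            high = mid
    return (low + high) / 2


df = 6
mean = df
standard_deviation = math.sqrt(2 * df)
median = chi2_median_df6()
print(mean, round(median, 3), round(standard_deviation, 4))
```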

Question 13: Covid Vaccines
When vaccines were first made available, a number of Michiganders were getting their Covid vaccines in Ohio. Let us investigate if the vaccine doses in four Midwest states (Michigan and three bordering states) were distributed according to population size. The table below shows the location of vaccines administered from a random sample of vaccine doses and the population distribution for these four Midwest states.

                                                      Michigan   Ohio   Illinois   Indiana   Total
Administered vaccines                                    136      166      139        84      525
Population distribution for the four Midwest states      24%      29%      31%       16%     100%

Using the information in the table, we wish to assess whether, for four Midwest states, the distribution of administered vaccine doses matches the distribution of the population.
1. State the appropriate null hypothesis in context.
2. You may assume you have two independent random samples of results. Check the remaining necessary assumption(s) (show all work).
3. Perform the appropriate test by providing the test statistic value, the corresponding distribution, and the corresponding p-value.
4. Evaluate the p-value and provide a conclusion in context.
Please provide a well-organized answer that clearly uses labels for Steps 1, 2, 3, and 4.

1) H0: p_Michigan = 0.24, p_Ohio = 0.29, p_Illinois = 0.31, p_Indiana = 0.16.
2) State the assumptions:
- At least 80% of the expected counts are greater than 5.
- None of the expected counts are less than 1.
Check: We must compute the expected counts using the total sample size multiplied by the probability of each outcome.

                  Michigan           Ohio                 Illinois             Indiana           Total
Expected doses    525*0.24 = 126     525*0.29 = 152.25    525*0.31 = 162.75    525*0.16 = 84     525

All four expected counts are greater than 5 (the smallest is 84), so the conditions are met; proceed with the test.


3) Test statistic value: X^2 = sum of (observed - expected)^2 / expected = (136 - 126)^2/126 + (166 - 152.25)^2/152.25 + (139 - 162.75)^2/162.75 + (84 - 84)^2/84 ≈ 5.5013
Distribution to find the corresponding p-value: χ^2(3) → df = k - 1 = 4 - 1 = 3
p-value: Using the Shiny App we find the p-value is 0.1386
4) Evaluation of the p-value: Since the p-value is larger than 10%, we do not have enough evidence against H0 and in support of Ha.
Conclusion: Based on the data, we do not have enough evidence to say that, for the four Midwestern states, the distribution of administered vaccine doses does not match the distribution of the population from the four Midwestern states.
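Steps 2 and 3 can be reproduced from the observed counts and the population percentages given in the question. The sketch below uses only the standard library; the upper-tail formula in `chi2_upper_tail_df3` is the closed form for exactly 3 degrees of freedom and is a stand-in for the Shiny App lookup mentioned in the solution. All names are illustrative.

```python
import math

# Observed doses and hypothesized population proportions from the question.
observed = {"Michigan": 136, "Ohio": 166, "Illinois": 139, "Indiana": 84}
proportions = {"Michigan": 0.24, "Ohio": 0.29, "Illinois": 0.31, "Indiana": 0.16}

n = sum(observed.values())  # total sample size, 525

# Step 2: expected counts = n * hypothesized proportion.
expected = {state: n * p for state, p in proportions.items()}

# Step 3: chi-square goodness-of-fit statistic.
statistic = sum(
    (observed[state] - expected[state]) ** 2 / expected[state] for state in observed
)
degrees_of_freedom = len(observed) - 1


def chi2_upper_tail_df3(x: float) -> float:
    """P(X >= x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_upper_tail_df3(statistic)
print(round(statistic, 4), degrees_of_freedom, round(p_value, 4))
```

Because Indiana's observed and expected counts are both 84, that state contributes nothing to the statistic; most of the discrepancy comes from Illinois.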

Question 14: Blood Battle: Bleed Maize and Blue to Beat OSU
A group of curious statistics students surveyed a random sample of 200 U of M students (group 1) and a random sample of 150 OSU students (group 2) and recorded whether each participant planned to donate blood. The 90% confidence interval for the difference between the population proportion of all U of M students who are planning to donate blood and the population proportion of all OSU students who are planning to donate blood is given by (-0.175, -0.005).
a. Provide an estimate of the difference in the two population proportions.
The estimate of the parameter is the statistic = midpoint of the CI = (-0.175 + (-0.005))/2 = -0.09
b. Provide the 90% margin of error:
Margin of error is ½ width of the interval = 0.085
c. Determine if the following statement is appropriate or not based on the 90% confidence interval.
Based on the 90% confidence interval, we would suggest that there is no difference between the proportion of all U of M students who are planning on donating and the proportion of all OSU students who are planning on donating blood.
( ) Appropriate ( X ) Not Appropriate
0 is not a reasonable value for p_1 - p_2 based on the 90% confidence interval, therefore we cannot say that p_1 equals p_2.
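All three parts of Question 14 follow from the interval endpoints alone: the point estimate is the midpoint, the margin of error is half the width, and the statement in part c hinges on whether 0 lies inside the interval. A short sketch (the helper name `center_and_margin` is hypothetical):

```python
def center_and_margin(lower: float, upper: float) -> tuple[float, float]:
    """Return (point estimate, margin of error) for a symmetric confidence interval."""
    return (lower + upper) / 2, (upper - lower) / 2


# 90% confidence interval for p1 - p2 given in the question.
lower, upper = -0.175, -0.005

estimate, margin = center_and_margin(lower, upper)
contains_zero = lower <= 0 <= upper

print(round(estimate, 3), round(margin, 3), contains_zero)
```

Since `contains_zero` is false, "no difference" is not a plausible value at the 90% confidence level, which is why the statement in part c is marked Not Appropriate.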
