ST 314D Data Analysis 6


School: Oregon State University, Corvallis
Course: 314
Subject: Statistics
Date: Apr 3, 2024
Uploaded by lucekimb
© Intellectual Property of Kelsi Espinoza

Q1. (23 points) With marijuana becoming legalized throughout parts of the US, an increasingly popular topic is decriminalization of marijuana (now that having up to some specified amount is considered legal). Curious as to what their students believe, Kelsi took a random sample of 219 students and asked them whether they had ever consumed marijuana and whether they oppose or support decriminalization of marijuana (the order of the questions was randomized). From these students, Kelsi randomly sampled 78 students that reported never consuming marijuana, 46 of whom support decriminalization. Then Kelsi sampled 141 students that have consumed marijuana and found that 121 of them support decriminalization.

a. (2 points) Looking at the side-by-side barplot provided, compare the distribution of support/opposition of decriminalization within each group of students (those that have and have not consumed marijuana), as well as the distribution of support/opposition of decriminalization between each group of students (those that have and have not consumed marijuana). (For instance, describe the relative heights of each group and the relationships of the bars to one another.)

Within each group of students, the majority support decriminalization. Among those who have never consumed marijuana, a smaller proportion supports decriminalization (46 of 78, about 59%) than among those who have consumed marijuana (121 of 141, about 86%).
The relative heights of the bars indicate that the group that has consumed marijuana has a much higher support rate for decriminalization than the group that has never consumed it.

b. (1 point) From the side-by-side barplot, do you feel that the proportion of students that support decriminalization differs between statistics students that have and have not consumed marijuana? Explain.

Yes. Based on the barplot, the proportion of students that support decriminalization appears higher among those who have consumed marijuana than among those who have not, as shown by the visibly taller bar representing support in the group of students that have consumed marijuana.

c. (2 points) Check the sample size conditions for performing a z-test to test the difference in the two proportions. Explain whether these size conditions are met.

The sample size conditions for a two-proportion z-test require the samples to be large enough for the sampling distribution of $\hat p_1 - \hat p_2$ to be approximately normal. A common rule is at least 10 successes and at least 10 failures in each sample, i.e. $n\hat p \ge 10$ and $n(1 - \hat p) \ge 10$ for both groups. Never consumed: 46 support and 78 − 46 = 32 oppose. Have consumed: 121 support and 141 − 121 = 20 oppose. All four counts are at least 10, so the conditions are met. (Using the pooled proportion 167/219 ≈ 0.763 instead gives expected counts of about 59.5, 18.5, 107.5, and 33.5, which are also all at least 10.)

d. (2 points) State the null and alternative hypothesis. Be sure to label your subscripts (if you use "1" and "2" to describe the populations, make sure to tell us which population goes with "1" and which population goes with "2").

Null hypothesis (H0): $p_1 - p_2 = 0$ (There is no difference in the proportion of support for decriminalization between students who have and have not consumed marijuana.)
Alternative hypothesis (HA): $p_1 - p_2 \neq 0$ (There is a difference in the proportion of support for decriminalization between students who have and have not consumed marijuana.)
We are labelling $p_1$ as the proportion of all students who have never consumed marijuana that support decriminalization, and $p_2$ as the proportion of all students who have consumed marijuana that support decriminalization.

e. (2 points) Calculate the test statistic. Show setup/work and round answer to two places past the decimal.

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\,(1 - \hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Here $\hat p_1$ and $\hat p_2$ are the sample proportions of the first and second group, $n_1$ and $n_2$ are the sample sizes of the first and second group, and $\hat p = \dfrac{x_1 + x_2}{n_1 + n_2}$ is the pooled proportion, where $x_1$ and $x_2$ are the numbers of successes (students who support decriminalization) in the first and second group.

For students who have never consumed marijuana: $n_1 = 78$, $x_1 = 46$, so $\hat p_1 = 46/78 \approx 0.590$.
For students who have consumed marijuana: $n_2 = 141$, $x_2 = 121$, so $\hat p_2 = 121/141 \approx 0.858$.
Pooled proportion: $\hat p = 167/219 \approx 0.763$.

$$z = \frac{0.590 - 0.858}{\sqrt{0.763\,(1 - 0.763)\left(\frac{1}{78} + \frac{1}{141}\right)}} \approx \frac{-0.268}{0.0600} \approx -4.47$$

So z ≈ −4.47 (|z| = 4.47).

f. (1 point) What is the p-value? Give answer to four places past the decimal.

p ≈ 0.0000 (that is, p < 0.0001). The unrounded value reported, 0.0000167490, appears to come from a test with a continuity correction (R's prop.test applies one by default); without the correction, 2P(Z ≤ −4.47) ≈ 0.0000078. Both round to 0.0000.

g. (1 point) Is the p-value for this test one-sided or two-sided?

Two-sided. The alternative hypothesis is $p_1 - p_2 \neq 0$, so evidence in either direction counts against H0, and the p-value is the probability of a z-statistic at least as extreme as 4.47 in either tail: $2P(Z \le -4.47)$.

h. (3 points) Write your two-part conclusion for the hypothesis test using a significance level of α = 0.05. Recall that this should include: whether you reject or fail to reject the null hypothesis; your p-value and significance level; and the strength of evidence for the alternative in context of the problem.

In words, the hypotheses are H0: there is no association between marijuana consumption and support for decriminalization among students; HA: there is an association between marijuana consumption and support for decriminalization among students.
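The computations in parts (c), (e), and (f) can be double-checked numerically. Below is a minimal Python sketch using only the standard library; the function names are illustrative (not from the assignment), and it applies no continuity correction, so its p-value is the uncorrected two-sided one.

```python
import math
from statistics import NormalDist

# Q1 counts (group 1 = never consumed marijuana, group 2 = have consumed)
X1, N1 = 46, 78    # supporters, sample size
X2, N2 = 121, 141


def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test of H0: p1 - p2 = 0 vs HA: p1 - p2 != 0.

    No continuity correction is applied. Returns (pooled p-hat, z, two-sided p-value).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return p_pool, z, p_value


def success_failure_counts(x1, n1, x2, n2):
    """Observed successes and failures in each sample (for the >= 10 rule)."""
    return [x1, n1 - x1, x2, n2 - x2]


def expected_counts(p_pool, n1, n2):
    """Expected successes and failures in each sample under H0 (pooled p-hat)."""
    return [n1 * p_pool, n1 * (1 - p_pool), n2 * p_pool, n2 * (1 - p_pool)]


p_pool, z, p_value = two_prop_z_test(X1, N1, X2, N2)
observed = success_failure_counts(X1, N1, X2, N2)
expected = expected_counts(p_pool, N1, N2)

print("observed counts:", observed, "all >= 10:", all(c >= 10 for c in observed))
print("expected counts:", [round(c, 1) for c in expected],
      "all >= 10:", all(c >= 10 for c in expected))
print(f"pooled p-hat = {p_pool:.3f}, z = {z:.2f}, two-sided p-value = {p_value:.1e}")
```

Running it should give z ≈ −4.47 and a p-value well below 0.0001, with every observed and expected count above 10.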
Conclusion: Since the p-value (p < 0.0001) is less than the significance level α = 0.05, we reject the null hypothesis. There is very strong evidence to conclude that there is an association between marijuana consumption and support for decriminalization among students: in the samples, about 59.0% of students who had never consumed marijuana supported decriminalization, compared with about 85.8% of students who had.

i. (2 points) Calculate the 95% confidence interval for $p_{\text{NoConsume,Support}} - p_{\text{Consume,Support}}$, the proportion of all non-consumer statistics students that support decriminalization minus the proportion of all statistics students that have consumed marijuana that support decriminalization (notice that we are specifying the direction of subtraction here: NoConsume,Support − Consume,Support). Show work/setup and give answer in interval notation, rounded to four places past the decimal.

Using the unpooled standard error,

$$SE = \sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}} = \sqrt{\frac{0.5897(0.4103)}{78} + \frac{0.8582(0.1418)}{141}} \approx 0.06297$$

$$(\hat p_1 - \hat p_2) \pm 1.96\,SE = \left(\tfrac{46}{78} - \tfrac{121}{141}\right) \pm 1.96(0.06297) \approx -0.2684 \pm 0.1234$$

The 95% confidence interval for $p_{\text{NoConsume,Support}} - p_{\text{Consume,Support}}$ is approximately (−0.3918, −0.1450).

j. (3 points) Write the two-part conclusion for the 95% confidence interval for $p_{\text{NoConsume,Support}} - p_{\text{Consume,Support}}$ in context of the problem. This should include: the confidence level; the confidence interval with any relevant units; an indication of whether each value suggests the proportion is higher or lower for which population; the point estimate from the samples; and context.

We are 95% confident that the true difference in the proportion of students who support decriminalization (non-consumers minus consumers) lies between −0.3918 and −0.1450. Because every value in this interval is negative, the interval suggests that the proportion of support is lower among students who have never consumed marijuana than among students who have, by roughly 14.5 to 39.2 percentage points. The point estimate from the samples is 46/78 − 121/141 ≈ −0.2684; that is, support among the sampled non-consumers was about 26.8 percentage points lower than among the sampled consumers. In context, having consumed marijuana is associated with greater support for its decriminalization among these students, with non-consumers less likely to support decriminalization than consumers.

k. (2 points) The 95% confidence interval has all negative numbers (hint). For the same data, do we know (without calculating the interval) whether the 90% confidence interval also contains only negative values? Explain.

Yes. The 90% interval is centered at the same point estimate (−0.2684) but uses a smaller multiplier, so it is narrower and lies entirely inside the 95% interval (−0.3918, −0.1450). It therefore also contains only negative values.

l. (2 points) The 95% confidence interval has all negative numbers. For the same data, do we know (without calculating the interval) whether the 99% confidence interval also contains only negative values? Explain.
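As a numerical check on part (i), and an illustration of parts (k) and (l), the sketch below recomputes the unpooled Wald interval for $p_{\text{NoConsume,Support}} - p_{\text{Consume,Support}}$ at the 90%, 95%, and 99% levels. It is a minimal standard-library sketch; the function name diff_prop_ci is illustrative, and no continuity correction is applied.

```python
import math
from statistics import NormalDist

# Q1 counts: group 1 = NoConsume, group 2 = Consume (supporters, sample size)
X1, N1 = 46, 78
X2, N2 = 121, 141


def diff_prop_ci(x1, n1, x2, n2, level=0.95):
    """Wald interval for p1 - p2 with the unpooled standard error (no continuity correction)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% level
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


for level in (0.90, 0.95, 0.99):
    lo, hi = diff_prop_ci(X1, N1, X2, N2, level)
    print(f"{level:.0%} CI: ({lo:.4f}, {hi:.4f})")
```

All three intervals share the same center (about −0.2684) and differ only in the multiplier z*, so the 90% interval sits inside the 95% interval, which sits inside the 99% interval. For these data the 99% interval also stays below zero, but that is something only the calculation reveals.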
No, not without calculating it. Unlike the 90% interval, the 99% interval is wider than the 95% interval (same center, larger multiplier), so it could cross zero if the upper bound of the 95% interval were close enough to zero. We can only be sure it contains only negative values by calculating it.

Use the following to answer Question 2: Kelsi thought it would be interesting to explore the amount of time students spend playing video games each week, as well as whether they play mainly on a computer/PC or on a gaming console. To investigate this, Kelsi took a random sample of 45 students that play video games primarily on a computer and 28 students that primarily play video games on a console, and recorded the number of hours each spent gaming in the past week, available in the games.csv dataset. The following software output is an analysis of these data:

    Welch Two Sample t-test

    t = 2.0977, df = 65.921, p-value = 0.03977
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
      0.4108423 16.6415387
    mean in group computer  mean in group gaming console
                 16.133333                      7.607143

    Group       Mean   Std. Dev.    n
    console     7.61       10.66   28
    computer   16.13       23.68   45

Question 2. (12 points) Do these data provide evidence of a difference in average weekly time spent gaming for students that play on a computer versus on a console? Use a significance level of 0.05 and answer the following questions using the software output.

a) (2 points) Describe the side-by-side boxplot. Is there visual evidence that the weekly hours spent gaming differ by gaming preference (console vs computer)? Explain.

The side-by-side boxplot shows that the median weekly hours spent gaming is higher for computer gamers than for console gamers, as indicated by the line in the center of each box. The range and interquartile range for computer gaming also appear larger, and high outliers indicate that some students who game on computers spend substantially more time gaming than the console gamers do. This visual evidence suggests that the weekly hours spent gaming may differ between students with different gaming preferences.

b) (2 points) State the null and alternative hypotheses to answer the question of interest.

Null hypothesis (H0): There is no difference in the average weekly time spent gaming between students who play on a computer and those who play on a console: $\mu_{\text{computer}} = \mu_{\text{console}}$.
Alternative hypothesis (H1): There is a difference in the average weekly time spent gaming between students who play on a computer and those who play on a console: $\mu_{\text{computer}} \neq \mu_{\text{console}}$.

c) (1 point) Is the alternative hypothesis one-sided or two-sided?

The alternative hypothesis is two-sided because it states only that the means are not equal, which includes the possibility that the mean time spent gaming on a computer is either higher or lower than on a console.

d) (2 points) Check the conditions for inference. State each condition as well as whether each condition is met.
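Independence: Met, assuming each sample is less than 10% of its population of gamers. Kelsi took random samples, so the students within each group can be treated as independent of one another, and the two groups were sampled separately, so they are independent of each other.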
Sample size/Skew: The samples should be large enough for the Central Limit Theorem to apply; a common guideline is at least 30 observations in each group. The computer group (n = 45) meets this guideline, but the console group (n = 28) falls slightly below it. Because 28 is close to 30, the condition is borderline rather than clearly met, and given the high outliers in the boxplot, the results should be interpreted with some caution.

Equal variances: The Welch two-sample t-test does not assume equal variances, so this condition is not required.

e) (5 points) From the R output, write a four-part conclusion describing the results (see Week 6 discussion board for write-up templates). The four-part conclusion should include: whether (or not) to reject the null hypothesis, the p-value, and the significance level; the strength of evidence in terms of the alternative hypothesis, in context; the interval estimate and point estimate in context (CONTEXT!); and any other information you feel is relevant.

Based on the R output, we reject the null hypothesis at the 0.05 significance level, since the p-value (0.03977) is less than 0.05. There is moderate evidence that the average weekly time spent gaming differs between students who play primarily on a computer and those who play primarily on a console. The point estimate of the difference in means is 16.1333 − 7.6071 ≈ 8.53 hours per week, and we are 95% confident that students who game primarily on a computer spend, on average, between 0.41 and 16.64 more hours per week gaming than students who game primarily on a console. Because students chose their own gaming device, this is an observational comparison: it shows an association, not that the device causes more gaming, and the difference may reflect other factors such as the types of games typically played on computers versus consoles or their accessibility. The interval is also wide, with a lower bound close to 0, so the size of the difference is estimated imprecisely.
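The Welch statistics in the software output can be approximately reproduced from the summary table. The following is a standard-library Python sketch, not the R code that produced the output; the helper names (t_pdf, t_cdf, t_quantile, welch_from_summary) are illustrative, and the t-distribution probabilities are approximated with Simpson's rule rather than taken from a statistics library.

```python
import math

# Summary statistics from the Q2 software output (SDs are rounded to two decimals)
COMPUTER = (16.133333, 23.68, 45)  # mean, standard deviation, n
CONSOLE = (7.607143, 10.66, 28)


def t_pdf(x, df):
    """Density of Student's t distribution with df (possibly non-integer) degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(prob, df):
    """Inverse of t_cdf by bisection, for 0.5 < prob < 1."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_from_summary(group1, group2, level=0.95):
    """Two-sided Welch t-test of H0: mu1 - mu2 = 0 from (mean, sd, n) summaries."""
    (m1, s1, n1), (m2, s2, n2) = group1, group2
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t = (m1 - m2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_value = 2 * (1 - t_cdf(abs(t), df))
    t_star = t_quantile(1 - (1 - level) / 2, df)
    diff = m1 - m2
    return t, df, p_value, (diff - t_star * se, diff + t_star * se)


t, df, p_value, ci = welch_from_summary(COMPUTER, CONSOLE)
print(f"t = {t:.4f}, df = {df:.3f}, p-value = {p_value:.5f}")
print(f"95% CI for mu_computer - mu_console: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the standard deviations in the summary table are rounded to two decimals, the reproduced t, df, and interval differ slightly from the R output in the later decimal places. With SciPy available, scipy.stats.ttest_ind_from_stats with equal_var=False runs the same Welch test from summary statistics.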