Final Exam Extra Practice - KEY

School: Indiana University, Bloomington
Course: 300
Subject: Statistics
Date: Feb 20, 2024

1) My friend Bethany has a husband, Barry, who is a crazy-smart engineer. In his spare time, he built a time machine. When Bethany found out about this, she was very excited, as she has always suspected that people today like Guinness more than people did in the past. She hopped into the machine, set the dial for 1910, and went back to visit William S. Gosset and buy him a beer. While in the past, she went to a pub and collected data from eight individuals about how much they enjoyed the Guinness they were drinking (she used a scale from 1 to 100). Then she came back to the present and collected data from eight individuals here about how much they enjoyed the Guinness they were drinking. Based on this experimental design, what type of analysis should Bethany perform and why?

This would be a between-participants t test because 1) there are only two levels of the independent variable (1910 and today), and 2) there are separate/independent/random/different people in each of the two conditions.

2) An instructor asks you to prepare a brief report on the yearly rainfall in the Amazon Rainforest over the past 150 years. After digging, you find a resource that lists the total rainfall (in inches per year) for the Amazon each year. After you’ve typed each bit of data into a nice big table, you show it to the instructor, who says, “I don’t have time for that, just summarize the data for me!” What would you do? What terms, statistics, and/or graphs would you use to accurately summarize the rainfall data for this professor?

I’d put the data into a histogram. I’d expect the data to form a unimodal, symmetric distribution; if the histogram confirms that shape, the mean is the most representative measure of central tendency. I’d also report the standard deviation of the distribution to describe how much the rainfall varies from year to year.

3) Three thieves who met in an online chat room are discussing the best way to make a living as a thief.
One believes that breaking into houses when people are on vacation is the most profitable approach. The second believes that breaking into businesses late at night, when they are closed, is superior, and the third believes that pickpocketing is the way to get rich quick. Since they cannot come to an agreement, they devise an experiment: the first thief, who lives in LA, will perform 20 home break-ins and calculate the average profit from each heist; the second thief, who lives in East Milford, Kentucky, will perform 20 business break-ins and record profits as well; and the third thief, who lives in London, will pick 20 pockets on the subway. When the study was complete, the data showed that the average profit was $2,314 from the home break-ins, $1,215 from the business break-ins, and $1,538 from each pickpocketing. What type of statistical analysis (matching, coin-flipping, within, etc.) should the thieves perform to analyze their data? Why?

This should be analyzed as a between-participants ANOVA. The experiment has more than two conditions, so an ANOVA is needed rather than a t test, and each level of the IV is a separate, independent group.

As thieves are not known for their statistical acumen, it’s possible that this experimental design is slightly flawed. Are there any confounds that could undermine their analysis?
Yes. Presumably there are big differences in housing, the cost of goods, and socioeconomic status among LA, East Milford, and London. So even if one style of crime did show a statistically significant difference in profit, the result could be due to the different locations rather than to the style of crime.

4) A researcher suggests that brain activation will go up when viewing images of beautiful people. She selects participants and uses an fMRI study to measure each person’s brain activation both before and while viewing images of good-looking celebrities. What type of statistical analysis should the researcher perform to analyze her data? Why?

This should be analyzed as a within-participants t test. The experiment has two conditions (before viewing and while viewing), and each participant was measured in both conditions.

5) Imagine that you conducted a between-participants experiment where you collected data from 20 participants in each condition. Suppose that the test statistic, the difference between the two group means, was 15, and that the standard deviation for the data you collected was 25. Now imagine that you conducted a second between-participants experiment where you again collected data from 20 participants in each condition. Suppose that, just like in the first experiment, the difference between the two groups’ means was 15, but in this case the standard deviation for the data you collected was 10. In which of the two experiments would you be more likely to reject the null hypothesis? Or would you expect similar outcomes for both experiments? Why?

I would expect a smaller (more convincing) p value in the experiment where the standard deviation was lower, so that is the experiment in which I’d be more likely to reject the null hypothesis. Even though the difference between the two groups’ means (the variability due to the manipulation of the experimental variable) is the same, the higher SD in the first experiment means that the random noise and variability in the population were larger in that experiment.
To get a significant result, we want the difference between groups (variability due to the manipulation of the IV) to be large compared to the everyday noise/variability in the population (as measured, in this case, by the SD).

6) At the university level, the stereotype of the “dumb jock” might be strong and ever-present; however, a fair amount of research shows that athletes maintain decent grades and competitive graduation rates when compared to non-athletes. Suppose that we collected data on the average number of minutes that NCAA basketball players played in games and on those players’ GPAs, and found a significant, moderate positive correlation between playing time and GPA. Describe the three possible causal relationships that could explain a positive correlation between GPA and minutes played per game, and comment on the likelihood of each relationship.

GPA causes minutes played: possible if the coach values academics, but this seems unlikely.
Minutes played causes GPA: highly unlikely.
A third factor causes both: a student with higher motivation or discipline in general could earn better grades and could also practice harder and become a better player, resulting in more playing time. This seems the most likely of the three.
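Returning to question 5, the effect of the SD can be made concrete with an independent-samples t statistic. This is a rough sketch under assumptions the question does not state: it treats the reported SD as the common standard deviation within each group and uses the equal-n pooled formula t = (mean difference) / (SD × √(2/n)). With 20 participants per group (38 degrees of freedom), the two-tailed critical value at α = .05 is roughly 2.02.

```python
import math


def pooled_t(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Independent-samples t statistic for two equal-sized groups.

    Assumes both groups share the same standard deviation `sd`, so the
    standard error of the difference is sd * sqrt(1/n + 1/n).
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    return mean_diff / standard_error


# Same 15-point difference and 20 participants per group in both experiments;
# only the spread (noise) of the data changes.
t_high_sd = pooled_t(mean_diff=15, sd=25, n_per_group=20)
t_low_sd = pooled_t(mean_diff=15, sd=10, n_per_group=20)

print(f"SD = 25: t = {t_high_sd:.2f}")  # prints "SD = 25: t = 1.90"
print(f"SD = 10: t = {t_low_sd:.2f}")   # prints "SD = 10: t = 4.74"
```

Under these assumptions only the low-SD experiment clears the ≈2.02 cutoff, which matches the answer: the same mean difference is much more convincing when the background noise is smaller.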
7) Suppose you are evaluating the following competing hypotheses about what is in a random cardboard box, and you initially believe all of these options to be equally likely:
Hypothesis 1: the box contains two turkeys and one calculator.
Hypothesis 2: the box contains one turkey, one potato, and one calculator.
Hypothesis 3: the box contains a goat, a calculator, and a dry-erase marker.
Hypothesis 4: the box contains a calculator, a coffee cup, and a block of cheese.

What is P(Hypothesis 2 | calculator)? 25%. Every hypothesis includes a calculator, so finding one doesn’t change the equal starting probabilities.
What is P(block of cheese | Hypothesis 1)? 0%. Hypothesis 1 contains no cheese.
What is P(Hypothesis 3 | a coffee cup)? 0%. Only Hypothesis 4 contains a coffee cup.

8) If I told you that I had collected data from 200 people, and the data had a mean value of 36.1, a median value of 34.5, a mode of 32, and a standard deviation of 7.38, but I didn’t show you the actual data, could you classify the distribution as symmetric, positively skewed, or negatively skewed? If so, how? Could you tell me if the data were unimodal, bimodal, or uniform? If so, how?

Yes. The distribution must be positively skewed because the mean and median have both been pulled out in the positive direction; that is, both values are higher than the mode (the standard deviation doesn’t give much information here). The distribution is unimodal because only one mode is listed.

9) Suppose that a dietician collected data to explore the idea that eating organic vegetables decreases the risk of heart disease. After the data collection and analysis, the dietician finds a moderate negative correlation between eating organic vegetables and heart disease. List the three possible causal relationships that could explain these data and comment on the likelihood of each one. When talking about a third factor, be sure to come up with a specific factor that might be involved and explain how the linkages would work.

It may be that eating more organic veggies reduces heart disease. This seems reasonable.
It may be that having less heart disease causes you to eat more organic veggies. This is highly unlikely.
It may be that a third factor, such as high socioeconomic status, lets you afford organic veggies and also lets you afford better medical care, have time to exercise, or do any of a number of other things that may reduce heart disease. Many other third factors could work here; any reasonable third factor is fine, as long as it makes the linkages clear.
10) If possible, estimate the Pearson correlation coefficient for each of the following scatter plots. If it is not possible to estimate the correlation coefficient, state why.

Answers, in the order the plots appear: r = –.60 to –.90; r = .10 to .30; r ≈ 0 (accept –.10 to .10); r = –.35 to –.65.

11) Researchers studying the relationship between banana consumption and chimp happiness published the following results:
A significant positive correlation between bananas eaten and overall happiness, r(110) = .25, p = .02, 95% CI = [0.17, 0.33], r² = 0.063.
A significant linear regression for predicting happiness based on banana consumption, with a regression constant of 2.50 and a regression coefficient of 0.40.
Given that data, what chimp happiness level would you predict for a chimp that consumed 10 bananas? How confident are you in the accuracy of that prediction? Explain your answer.

Given the regression constant and coefficient above:
Chimp happiness = 2.50 + 0.40 × bananas consumed = 2.50 + 0.40 × 10 = 2.50 + 4.0 = 6.50

How confident? Not very. The correlation between happiness and banana consumption is fairly weak (r = .25) and/or the r² value is small (banana consumption accounts for only about 6% of the variance in happiness), so while the correlation is significant, the data are still noisy, and the actual happiness of any given chimp may be well above or below the prediction line.
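The prediction in question 11, and the reason for low confidence in it, can both be written out in a few lines. This is a minimal sketch that uses only the published constant (2.50), coefficient (0.40), and r (.25); `predict_happiness` is a hypothetical helper name, not part of the original problem.

```python
def predict_happiness(bananas: float, constant: float = 2.50, coefficient: float = 0.40) -> float:
    """Regression prediction: constant + coefficient * bananas."""
    return constant + coefficient * bananas


r = 0.25
r_squared = r ** 2           # share of variance in happiness explained by bananas
unexplained = 1 - r_squared  # share left to everything else (noise, other causes)

print(predict_happiness(10))  # prints 6.5
print(f"explained: {r_squared:.2%}, unexplained: {unexplained:.2%}")
```

The reported r² = 0.063 is this 0.0625 rounded; with more than 90% of the variance unexplained, individual chimps can land far from the 6.50 prediction.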