ISCAM-Chapter3 AC

Chance/Rossman, 2018 ISCAM III Investigation 3.7 209

Investigation 3.7: Is Yawning Contagious?

The folks at MythBusters, a popular television program on the Discovery Channel, investigated whether yawning is contagious by recruiting fifty subjects at a local flea market and asking them to sit in one of three small rooms for a short period of time. For some of the subjects, the attendee yawned while leading them to the room (planting a yawn “seed”), whereas for other subjects the attendee did not yawn. As time passed, the researchers watched (via a hidden camera) to see which subjects yawned.

(a) Identify the explanatory variable (EV) and the response variable (RV) in this study.
EV:
RV:

(b) Define the relevant parameter of interest, and state the null and alternative hypotheses for this study. Be sure to clearly define any symbols that you use.
Parameter:
H0:
Ha:

In the study they found that 10 of 34 subjects who had been given a yawn seed actually yawned themselves, compared with 4 of 16 subjects who had not been given a yawn seed.

(c) Create a two-way table summarizing the results, using the explanatory variable as the column variable.

                 Yawn seed   No yawn seed   Total
Yawned               10            4          14
Did not yawn         24           12          36
Total                34           16          50

(d) Explain how you would carry out a simulation analysis to approximate a p-value for this study. [Hint: How many cards? How many of each type? How many would you deal out? What would you record? How would you find the p-value?]

[Handwritten notes: EV = whether or not the subject received a yawn seed; RV = whether or not the subject yawned. Parameter: difference in the long-run probabilities of yawning. Observed difference in conditional proportions = 10/34 − 4/16 = 0.044. p-value = P(X ≥ 10), with M = 14 and n = 34.]
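The card-dealing simulation in (d) can be sketched in Python. This is a minimal illustration, not the applet's implementation. It assumes 50 cards, 14 of them marked "yawner", with 34 cards dealt to the yawn-seed group and the difference in conditional proportions recorded each time. The function name `simulate_yawn_pvalue`, the 10,000 repetitions, and the fixed seed are illustrative choices only.

```python
import random


def simulate_yawn_pvalue(reps=10_000, seed=1):
    """Approximate the one-sided p-value by re-randomizing the 50 subjects."""
    cards = [1] * 14 + [0] * 36      # 14 yawners (1) and 36 non-yawners (0)
    observed = 10 / 34 - 4 / 16      # observed difference, about 0.044
    rng = random.Random(seed)        # seeded so the result is reproducible
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(cards)
        yawn_seed = sum(cards[:34])  # yawners dealt to the yawn-seed group
        yawn_noseed = 14 - yawn_seed # remaining yawners are in the no-seed group
        diff = yawn_seed / 34 - yawn_noseed / 16
        # tiny tolerance so a tie with the observed value is not lost to rounding
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps


print("approximate p-value:", simulate_yawn_pvalue())
```

The row and column totals stay fixed. A larger difference therefore always means more yawners in the yawn-seed group, so counting shuffles with at least 10 such yawners would give the same answer.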
(e) Open the Analyzing Two-way Tables applet.
• Paste in the raw data and press Use Data, or enter the titles and counts of a two-way table and press Use Table. (Or check the 2 × 2 box and enter the cell values.)
• Check the Show Shuffle Options box.
• Set Number of Shuffles to 1000.
• Press Shuffle.
Briefly describe this randomization (null) distribution: What is its shape? What is the mean? What is the standard deviation?

(f) Specify the observed value for the difference in the conditional proportions in the Count Samples box. Then indicate whether the research conjecture expected a larger or smaller proportion of successes in Group A by choosing Greater Than or Less Than from the pull-down menu. Then press the Count button.

Exact p-value

The simulations you have conducted in Investigation 3.6 (Dolphin Therapy) and above approximated the p-value for two-way tables arising from random assignment by assuming the row and column totals are fixed. In this case, the probability of obtaining a specific number of successes in one group can be calculated exactly using the hypergeometric probability distribution. (We used independent binomial distributions with the teen hearing loss study, where we wanted to sample separately from two populations and the overall number of successes was not fixed in advance.)

Keep in mind that under the null hypothesis, we are assuming the group assignments made no difference and that there would be 14 successes (“yawners”) and 36 failures (“non-yawners”) between the two groups regardless. Because the random assignment makes every configuration of the subjects between the two groups equally likely, we determine the probability of any particular outcome for the number of yawners and non-yawners as follows. The denominator is the total number of ways to assign 34 of the subjects to the yawn-seed group (and 16 to the no-yawn-seed group). The numerator is then the number of ways to get a particular configuration for that group, such as one consisting of 10 yawners and 24 non-yawners.
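The counting argument for (g) through (i) can be checked with Python's `math.comb` (Python 3.8 or later). This is only an illustrative sketch, and the variable names are hypothetical.

```python
from math import comb

# Denominator: ways to choose which 34 of the 50 subjects get the yawn seed
ways_total = comb(50, 34)

# Numerator: choose 10 of the 14 yawners AND 24 of the 36 non-yawners for that group
ways_observed = comb(14, 10) * comb(36, 24)

prob_exactly_10 = ways_observed / ways_total
print(ways_total, ways_observed, round(prob_exactly_10, 4))
```

The two counts are multiplied because every choice of yawners can be paired with every choice of non-yawners.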
(g) How many ways altogether are there to randomly assign these 50 subjects into one group of 34 (yawn-seed group) and the remaining group of 16 (no-yawn-seed group)? [Hint: Recall what you saw earlier with the binomial distribution and counting the number of ways to obtain S successes and F failures in n trials. See the Technical Details in Investigation 1.1.]

(h) Now consider the 14 successes and the 36 failures. How many ways are there to randomly select 10 of the successes? How many ways are there to randomly assign 24 of the failures to be in the yawn-seed group? How should you combine these two numbers to calculate the total number of ways to obtain 10 successes and 24 failures in the yawn-seed group, the configuration that we observed in the study?
Successes:
Failures:
Total:

(i) To determine the exact probability that random assignment would place exactly 10 successes and 24 failures into the group of 34 subjects, divide your calculation in (h) by your calculation in (g).

(j) Explain why your answer to (i) is not yet the p-value for this study.

Result: Consider a two-way table with N observations, consisting of M successes and N − M failures. The probability of obtaining k successes in Group A, which has n observations, is

P(X = k) = C(M, k) × C(N − M, n − k) / C(N, n)

where C(N, n) = N!/[n!(N − n)!] is the number of ways to choose n items from a group of N items. X represents the number of successes randomly selected for group A; X is a hypergeometric random variable. Also note that E(X) = n(M/N) and

$$\mathrm{SD}(X) = \sqrt{\frac{nM(N-M)(N-n)}{N^2(N-1)}}$$

In this study, we had N = 50 subjects and we defined yawning to be a success, so M = 14. We also arbitrarily chose to focus on the yawn-seed group, so n = 34. This calculation works out the same if you had defined “not yawning” to be a success and/or if you had focused on the 16 people in the no-yawn-seed group. You just need to make sure you count consistently.

[Handwritten notes: C(50, 34) ≈ 4.92 × 10^12; C(36, 24) = 1,251,677,700; p-value = P(X ≥ 10) = P(X = 10) + P(X = 11) + P(X = 12) + P(X = 13) + P(X = 14).]
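The following sketch is not the textbook's own code. It codes the hypergeometric formula from the Result box directly and checks that E(X) and SD(X) agree with a brute-force sum over every possible value of X. `hyper_pmf` is a hypothetical helper name.

```python
from math import comb, sqrt


def hyper_pmf(k, N, M, n):
    """P(X = k): k successes among n drawn from N items containing M successes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


N, M, n = 50, 14, 34
# X can range from max(0, n - (N - M)) up to min(n, M)
support = range(max(0, n - (N - M)), min(n, M) + 1)

total = sum(hyper_pmf(k, N, M, n) for k in support)
mean = sum(k * hyper_pmf(k, N, M, n) for k in support)
var = sum((k - mean) ** 2 * hyper_pmf(k, N, M, n) for k in support)

formula_mean = n * M / N
formula_sd = sqrt(n * M * (N - M) * (N - n) / (N**2 * (N - 1)))
print(round(total, 6), round(mean, 2), round(sqrt(var), 3), round(formula_sd, 3))
```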
We will continue to define the p-value to be the probability of obtaining results at least as extreme as those observed in the actual study. Because we expected more yawners in the yawn-seed group, the p-value is the probability that random assignment places at least 10 of the yawners in the yawn-seed group. So far you have found P(X = 10) = C(14, 10) × C(36, 24) / C(50, 34) = 0.2545.

(k) Calculate P(X = 11), P(X = 12), P(X = 13), and P(X = 14) using the hypergeometric probability formula.
P(X = 11)
P(X = 12)
P(X = 13)
P(X = 14)
Why do we stop at 14?

(l) Sum all five probabilities together (including P(X = 10)) to determine the exact p-value for the yawning study. How does this p-value compare to the empirical p-value from the applet simulation? Write a one- or two-sentence interpretation of this p-value.
Exact p-value:
Comparison:
Interpretation:

Definition: Using the hypergeometric probabilities to determine a p-value in this fashion for a two-way table is called Fisher’s Exact Test, named after R. A. Fisher.

(m) Calculate this hypergeometric probability using technology (see the Technology Detour on the next page).

(n) Set up and carry out the calculation to determine the exact p-value where you define success to be “not yawning” and the group of interest to be the yawn-seed group.

(o) Set up and carry out the calculation to determine the exact p-value, where you focus on the number who did not yawn in the no-yawn-seed group. Show that you obtain the same exact p-value as before.

[Handwritten notes: P(X = 11) ≈ 0.1708, P(X = 12) ≈ 0.0702, P(X = 13) ≈ 0.0158, P(X = 14) ≈ 0.0015. For (n): iscamhyperprob(k=24, total=50, succ=36, n=34, lower.tail=TRUE). For (o): iscamhyperprob(k=12, total=50, succ=36, n=16, lower.tail=FALSE).]
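For (k) and (l), this Python sketch sums the hypergeometric probabilities for 10 through 14 yawners using exact fractions, so rounding happens only when results are printed. The helper `hyper_pmf_exact` is hypothetical.

```python
from fractions import Fraction
from math import comb


def hyper_pmf_exact(k, N, M, n):
    """Exact hypergeometric probability P(X = k), returned as a Fraction."""
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))


N, M, n = 50, 14, 34
probs = {k: hyper_pmf_exact(k, N, M, n) for k in range(10, 15)}  # X cannot exceed M = 14
for k, p in probs.items():
    print(f"P(X = {k}) = {float(p):.4f}")

p_value = sum(probs.values())  # P(X >= 10)
print(f"exact p-value = {float(p_value):.4f}")
```

Run as written, it should print probabilities of about 0.2545, 0.1708, 0.0702, 0.0158, and 0.0015, followed by an exact p-value of about 0.5128. This is in line with the 0.513 quoted in the Study Conclusions.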
[Handwritten note: the exact p-value can be written as P(X ≥ 10) = 0.5128.]
Technology Detour: Calculating Hypergeometric Probabilities (Fisher’s Exact Test)

In R, the iscamhyperprob function takes the following inputs:
• k, the observed value of interest (or the difference in conditional proportions, assumed if the value is less than one, including negative values)
• total, the total number of observations in the two-way table
• succ, the overall number of successes in the table
• n, the number of observations in “group A”
• lower.tail, a Boolean which is TRUE or FALSE

For example: iscamhyperprob(k=10, total=50, succ=14, n=34, lower.tail=FALSE)

Analyzing Two-way Tables applet
• Check the box for Show Fisher’s Exact Test in the lower left corner.
• A check box will appear for determining the two-sided p-value.

Discussion: You should see that there are several equivalent ways to set up the probability calculation. Make sure it is clear how you define success/failure and which group you are considering “group A.” This will help you determine the numerical values for N, M, and n in the calculation.

Below is a graph of the hypergeometric distribution with N = 50, M = 14, and n = 34.

[Figure: hypergeometric distribution with N = 50, M = 14, n = 34]

Using probability rules, you can show that the expected value of this distribution is n(M/N) = (14/50) × 34 = 9.52 yawners in the yawn-seed group. The standard deviation of the probability distribution is the square root of n × (M/N) × (N − n)/N × (N − M)/(N − 1) = 1.496 yawners.

(p) Compare this graph and the mean and standard deviation values to your simulation results.
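The `hyperprob` function below is a hypothetical Python analogue of the `iscamhyperprob` inputs listed above; it is not the ISCAM implementation. Consistent with the example call, it assumes that `lower_tail=False` returns P(X ≥ k) and `lower_tail=True` returns P(X ≤ k). It does not handle k supplied as a difference in conditional proportions. The three calls mirror the setups in (l), (n), and (o).

```python
from math import comb


def hyperprob(k, total, succ, n, lower_tail):
    """Hypothetical analogue of iscamhyperprob for an integer count k.

    Assumes lower_tail=True gives P(X <= k) and lower_tail=False gives P(X >= k).
    """
    smallest = max(0, n - (total - succ))   # fewest successes group A can receive
    largest = min(n, succ)                  # most successes group A can receive
    ks = range(smallest, k + 1) if lower_tail else range(k, largest + 1)
    ways = sum(comb(succ, j) * comb(total - succ, n - j) for j in ks)
    return ways / comb(total, n)


# Three equivalent setups for the yawning study
print(hyperprob(10, 50, 14, 34, lower_tail=False))  # yawners in yawn-seed group, X >= 10
print(hyperprob(24, 50, 36, 34, lower_tail=True))   # non-yawners in yawn-seed group, X <= 24
print(hyperprob(12, 50, 36, 16, lower_tail=False))  # non-yawners in no-seed group, X >= 12
```

All three calls add up the same configurations, just counted from a different cell of the table. That is why they agree.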
(q) What conclusions will you draw from the p-value for this study?

(r) On the MythBusters program, the hosts concluded that, based on the observed difference in conditional proportions and the large sample size, there is “little doubt, yawning seems to be contagious.” Do you agree?

Study Conclusions
With a large p-value of 0.513 (Fisher’s Exact Test), we do not have convincing evidence that the difference between the two groups (with and without yawn seed) was created by anything other than chance alone from the random assignment process. Suppose there were nothing to the theory that yawning is contagious. Then by “luck of the draw” alone, we would expect 10 or more of the yawners to end up in the yawn-seed group in more than 50% of random assignments. Although the study results were in the conjectured direction, the difference between the yawning proportions was not large enough to convince us that the probability of yawning is truly larger when a yawn seed is planted. The researchers could try the study again with a larger sample size to increase the power of their test. The researchers also may want to be cautious in generalizing these results beyond the population of volunteers at a local flea market. It is also not clear how naturalistic it is to lead individuals to a small room to wait.
Practice Problem 3.7A
(a) For the MythBusters study (p-value > 0.5), is it reasonable to conclude from this study that we have strong evidence that yawning is not contagious? Explain.
(b) Explain, in this context, what is meant in the Study Conclusions box by “the researchers could try the study again with a larger sample size to increase the power of their test” and why that is a reasonable recommendation here.
(c) To calculate the p-value here, why are we using the hypergeometric distribution instead of the binomial distribution?

Practice Problem 3.7B
Reconsider the Dolphin Therapy study (Investigation 3.6).

                                        Dolphin Therapy   Control Group   Total
Showed substantial improvement                 10                3          13
Did not show substantial improvement            5               12          17
Total                                          15               15          30

Continue to focus on the number of improvers randomly assigned to the dolphin group, and represent this value by X.
(a) When the null hypothesis is true, the random variable X has a hypergeometric distribution. Specify the values of N, M, and n.
(b) Calculating the exact p-value involves finding P(X __________). [Hint: Fill in the blank with an inequality symbol and a number.]
(c) Calculate this exact p-value, either by hand or with technology. Comment on whether this p-value is similar to the approximate one from your simulation results. (Be sure it’s clear how you calculated this value.)
(d) Suppose that the dolphin study had involved twice as many subjects, again with half randomly assigned to each group, and with the same proportion of improvers in each group. Determine the exact p-value in this case, and comment on whether/how it changes from the p-value with the real data. Explain why this makes sense.
Investigation 3.8: CPR vs. Chest Compressions

For many years, if a person experienced a heart attack and a bystander called 911, the dispatcher instructed the bystander in how to administer chest compression plus mouth-to-mouth ventilation (a combination known as CPR) until the emergency team arrived. Some researchers believe that giving instruction in chest compression alone (CC) would be a more effective approach. In the 1990s, a randomized comparative experiment was conducted in Seattle involving 518 cases (Hallstrom, Cobb, Johnson, & Copass, New England Journal of Medicine, 2000): In 278 cases, the dispatcher gave instructions in standard CPR to the bystander, and in the remaining 240 cases the dispatcher gave instructions in CC alone. A total of 64 patients survived to discharge from the hospital: 29 in the CPR group and 35 in the CC group.

(a) Identify the observational units, explanatory variable, and response variable. Is this an observational study or an experiment?
Observational units:
Explanatory:
Response:
Type of study: Observational / Experimental

(b) Construct a two-way table to summarize the results of this study. Remember to put the explanatory variable in the columns.

(c) Calculate the difference in the conditional proportions who survived (CC − CPR). Does this seem to be a noteworthy difference to you?

(d) Use technology to carry out Fisher’s Exact Test (by calculating the corresponding hypergeometric probability) to assess the strength of evidence that the probability of survival is higher with CC alone as compared to standard CPR. Write out how to calculate this probability, report the p-value, and interpret what it is the probability of.
p-value = P(X ____) = ____, where X follows a hypergeometric distribution with N = ____, M = ____, and n = ____
Interpretation:
Because the sample sizes are large in this study, you should not be surprised that the probability distribution in (d) is approximately normal. The large sample sizes allow us to approximate the hypergeometric distribution with a normal distribution. Thus, with large sample sizes (e.g., at least 5 successes and at least 5 failures in each group), an alternative to Fisher’s Exact Test is the two-sample z-test that you studied in Section 3.1.

(e) Use technology to obtain the two-sample z-test statistic and p-value for this study. Compare this p-value to the one from Fisher’s Exact Test; are they similar?

(f) Suggest a way of improving the approximation of the p-value.

(g) (Optional) Compare the normal approximation with a continuity correction to the hypergeometric calculation. [Hints: In R, use iscamhypernorm(29, 518, 64, 278, TRUE) or use the Analyzing Two-way Tables applet to compare the normal approximation to Fisher’s Exact Test.]

(h) Do the data from this study provide convincing evidence that CC alone is better than standard CPR at the 10% significance level? Explain. How about at the 5% level of significance?

(i) An advantage to using the z-procedures is being able to easily produce a confidence interval for the parameter. Use technology to determine a 90% confidence interval for the parameter of interest, and then interpret this interval. [Hint: Think carefully about what the relevant parameter is in this study.]
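This Python sketch puts the pooled two-sample z-test and Fisher's Exact Test side by side for the CPR data. It is an illustration, not the ISCAM R functions. It defines X as the number of survivors assigned to the CC group, so the exact p-value is P(X ≥ 35). The upper-tail normal area comes from the standard-library `math.erfc`.

```python
from math import comb, erfc, sqrt

# CPR study counts from the text
surv_cc, n_cc = 35, 240      # chest compression alone
surv_cpr, n_cpr = 29, 278    # standard CPR
N, M = n_cc + n_cpr, surv_cc + surv_cpr   # 518 cases, 64 survivors

# Two-sample z-test for pi_CC - pi_CPR using the pooled proportion
p_cc, p_cpr = surv_cc / n_cc, surv_cpr / n_cpr
p_pool = M / N
se = sqrt(p_pool * (1 - p_pool) * (1 / n_cc + 1 / n_cpr))
z = (p_cc - p_cpr) / se
p_normal = 0.5 * erfc(z / sqrt(2))         # upper-tail area P(Z >= z)

# Fisher's Exact Test: X = survivors in the CC group, p-value = P(X >= 35)
ways = sum(comb(M, k) * comb(N - M, n_cc - k) for k in range(surv_cc, min(M, n_cc) + 1))
p_exact = ways / comb(N, n_cc)

print(f"z = {z:.3f}, normal p-value = {p_normal:.4f}, exact p-value = {p_exact:.4f}")
```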
(j) Suppose you had defined the parameter by subtracting in the other direction (e.g., CPR − CC instead of CC − CPR). How would that change: (i) the observed statistic? (ii) the test statistic? (iii) the alternative hypothesis? (iv) the p-value? (v) the confidence interval?

Practice Problem 3.8
(a) Researchers in the CPR study also examined other response variables. For example, the 911 dispatcher’s instructions were completely delivered in 62% of episodes assigned to chest compression plus mouth-to-mouth, compared to 81% of the episodes assigned to chest compression alone.
(i) Calculate the difference in conditional proportions and compare it to the original study.
(ii) Without calculating, do you suspect the p-value for comparing this new response variable between the two groups will be larger, smaller, or about the same as the p-value you determined above? Explain your reasoning.
(b) The above study was operationally identical to another study, and the results of the two studies were combined. Of the 399 combined patients randomly assigned to standard CPR, 44 survived to discharge from the hospital. Of the 351 combined patients randomly assigned to chest compression alone, 47 survived to discharge.
(i) Calculate the difference in conditional proportions and compare it to the original study.
(ii) Without calculating, do you suspect the p-value for this comparison will be larger, smaller, or the same as the p-value you determined? Explain your reasoning.
SECTION 4: OTHER STATISTICS

Investigation 3.9: Peanut Allergies

Peanut allergies have increased in prevalence in the last decade, but can they be prevented, even among infants with a high risk of allergy? Is it better to avoid the problematic food or to encourage early introduction? Du Toit et al. (New England Journal of Medicine, Feb. 2015) randomly assigned U.K. infants (4-11 months old) with pre-existing sensitivity to peanut extract to either consume 6 g of peanut protein per week or to avoid peanuts until 60 months of age. The table below shows the results for infants who were not initially sensitized to peanuts and whether or not the child had developed a peanut allergy at 60 months.

                  Peanut avoidance   Peanut consumption   Total
Peanut allergy           11                   2             13
No allergy              172                 193            365
Total                   183                 195            378

(a) Calculate the proportion of children developing a peanut allergy in each group. Does this appear to be a large difference to you?

(b) Use Fisher’s Exact Test to investigate whether these data provide convincing evidence that the probability of developing a peanut allergy is larger among children who avoid peanuts for the first 60 months. [Hint: State the hypotheses in symbols and in words. Define the random variable and outcomes of interest in computing your p-value.] Do you consider this strong evidence that peanut consumption effectively deters development of a peanut allergy in this population?

(c) Would you feel any differently about the magnitude of the difference in proportions if the conditional proportions developing a peanut allergy had been 0.500 and 0.55? Explain.

Discussion: When the baseline rate (probability) of success is small, an alternative statistic to consider rather than the difference in the conditional proportions (which will also have to be small by the nature of the data) is the ratio of the conditional proportions.
First used with medical studies where “success” is often defined to be an unpleasant event (e.g., death), this ratio was termed the relative risk.

[Handwritten notes: p̂avoid = 11/183 ≈ 0.0601; p̂consume = 2/195 ≈ 0.0103; X is hypergeometric.]
Definition: The relative risk is the ratio of the conditional proportions, often intentionally set up so that the value is larger than one:

Relative risk = (proportion of successes in group 1, the larger proportion) / (proportion of successes in group 2, the smaller proportion)

The relative risk tells us how many times higher the “risk” or “likelihood” of “success” is in group 1 compared to group 2.

(d) Determine and interpret the ratio of the conditional proportions who developed peanut allergy between the peanut avoiders and the peanut consumers in this study.

(e) Because we are now working with a ratio, we can also interpret this statistic in terms of percentage change. Subtract one from the relative risk value and multiply by 100% to determine what percentage higher the proportion who developed a peanut allergy is in the avoidance group compared to the consumption group.

Of course, now we would also like a confidence interval for the corresponding parameter, the ratio of the underlying probabilities of allergy between these two treatments. When we produced confidence intervals for other parameters, we examined the sampling distribution of the corresponding statistic to see how values of that statistic varied under repeated random sampling. So now let’s examine the behavior of the relative risk of conditional proportions using the Analyzing Two-way Tables applet. The applet simulates the random assignment process (as opposed to simulating random sampling from a binomial process) under the (null) assumption that there’s no difference between the two treatments. [See the Technology Detour below for software instructions.]

(f) Generate a null distribution for relative risks:
• Check the 2 × 2 box.
• Enter the two-way table into the applet and press Use Table.
• Generate 1000 random shuffles.
• Use the Statistic pull-down menu to select Relative Risk.
Describe the behavior of the null distribution of relative risk values.
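A short arithmetic sketch in Python for (d) and (e), using the counts from the peanut table; the variable names are hypothetical.

```python
# Peanut allergy counts from the table above
allergy_avoid, n_avoid = 11, 183
allergy_consume, n_consume = 2, 195

p_avoid = allergy_avoid / n_avoid          # about 0.060
p_consume = allergy_consume / n_consume    # about 0.010

diff = p_avoid - p_consume                 # difference in conditional proportions
rel_risk = p_avoid / p_consume             # ratio, set up to be larger than one
pct_higher = (rel_risk - 1) * 100          # percentage change relative to consumers

print(f"difference = {diff:.4f}, relative risk = {rel_risk:.3f}, {pct_higher:.1f}% higher")
```

The difference of about 0.05 looks small. The ratio shows, however, that the avoidance group's proportion is several times that of the consumption group.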
(g) Where does the observed value of the relative risk from the actual study fall in the null distribution of the relative risks? What proportion of the simulated relative risks are at least this extreme?

(h) What percentage of the simulated relative risks are larger than 2.1 (just so you have a non-zero value to compare to later)?

[Handwritten notes: statistic = (11/183)/(2/195) ≈ 5.861; (5.861 − 1) × 100% ≈ 486.1% higher; empirical p-value ≈ 0.009; 10.8% of simulated relative risks exceed 2.1.]
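The applet's shuffling in (f) through (h) can be imitated in Python as a rough sketch. It assumes each shuffle re-deals the 13 allergy outcomes among the 183 avoidance and 195 consumption children. The zero-cell adjustment follows the applet rule described in this investigation (add 0.5 to each cell). Applying it whenever either group has no allergies is an assumption. `simulate_rel_risks` and its defaults are illustrative.

```python
import random


def simulate_rel_risks(reps=1000, seed=1):
    """Shuffle the allergy outcomes under H0 and return simulated relative risks."""
    n_avoid, n_consume, allergies = 183, 195, 13
    # 1 = developed allergy, 0 = no allergy; the outcomes themselves are fixed under H0
    outcomes = [1] * allergies + [0] * (n_avoid + n_consume - allergies)
    rng = random.Random(seed)
    rrs = []
    for _ in range(reps):
        rng.shuffle(outcomes)
        a = sum(outcomes[:n_avoid])   # allergies landing in the avoidance group
        b = allergies - a             # allergies landing in the consumption group
        c, d = n_avoid - a, n_consume - b
        if a == 0 or b == 0:
            # assumed zero-cell rule: add 0.5 to every cell before computing the ratio
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        rrs.append((a / (a + c)) / (b / (b + d)))
    return rrs


rrs = simulate_rel_risks()
observed = (11 / 183) / (2 / 195)
print("mean of simulated RRs:", sum(rrs) / len(rrs))
print("P(RR >= observed):", sum(r >= observed - 1e-12 for r in rrs) / len(rrs))
print("P(RR > 2.1):", sum(r > 2.1 for r in rrs) / len(rrs))
```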
But can we apply a mathematical model to this distribution?

(i) Why does it make sense that the mean of the simulated relative risks is close to the value 1? [Hint: Remember the assumption behind your simulation analysis.]

(j) You should notice skewness in the distribution of relative risk values. Explain why it is not surprising for the distribution of this statistic to be skewed to the right (especially with smaller sample sizes).

Note: If the number of successes equals zero, the applet adds 0.5 to each cell of the table before calculating the relative risk.

(k) In fact, this distribution is usually well modeled by a log-normal distribution. To verify this, check the ln relative risk box (in the lower left corner). This takes the natural log of each relative risk value and displays a new histogram of these transformed values. Describe the shape of this distribution. Is the distribution of the lnrelrisk values well modeled by a normal distribution?

(l) What is the mean of the simulated lnrelrisk values? Why does this value make sense?

(m) What is the standard deviation of the lnrelrisk values?

(n) Calculate the observed value of ln(p̂1/p̂2) for this study (but don’t round). Where does this value fall (in the middle or in the tail) of this simulated distribution of lnrelrisk values? Has the empirical p-value changed?

(o) If you found the empirical p-value using ln(p̂1/p̂2), it would be identical to the empirical p-value found in (h). Why? What did change about the distribution? [Hint: What percentage of the simulated lnrelrisk values are more extreme than ln(2.1)? How does this compare to (h)?]

[Handwritten notes: H0: π_avoid = π_consume (relative risk = 1); Ha: π_avoid > π_consume (relative risk > 1); ln(5.861) ≈ 1.77; ln(2.1) ≈ 0.742.]
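Instead of simulating, this Python sketch enumerates the exact null (hypergeometric) distribution of the relative risk for the peanut table. It then compares that distribution with the distribution of ln(relative risk), to illustrate (j) through (o). The log transform pulls in the long right tail. Because ln is increasing, it keeps every table in the same order, so tail probabilities do not change. The 0.5 zero-cell adjustment mirrors the applet note above (assumed to apply when either group has no allergies). The helper names are hypothetical.

```python
from math import comb, log, sqrt

N, M, n = 378, 13, 183   # children, total allergies, avoidance-group size


def rel_risk(a):
    """Relative risk (avoid vs. consume) when a of the M allergies are in the avoid group."""
    b = M - a                      # allergies in the consumption group
    c, d = n - a, (N - n) - b      # non-allergic children in each group
    if a == 0 or b == 0:           # assumed zero-cell rule: add 0.5 to every cell
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a / (a + c)) / (b / (b + d))


def moments(dist):
    """Mean, SD, and skewness of a discrete distribution given as (value, prob) pairs."""
    mean = sum(p * v for v, p in dist)
    sd = sqrt(sum(p * (v - mean) ** 2 for v, p in dist))
    skew = sum(p * ((v - mean) / sd) ** 3 for v, p in dist)
    return mean, sd, skew


# Exact null distribution: hypergeometric count of allergies in the avoidance group
probs = {a: comb(M, a) * comb(N - M, n - a) / comb(N, n) for a in range(M + 1)}
rr_dist = [(rel_risk(a), p) for a, p in probs.items()]
ln_dist = [(log(rel_risk(a)), p) for a, p in probs.items()]
print("RR:     mean %.3f, SD %.3f, skewness %.2f" % moments(rr_dist))
print("ln(RR): mean %.3f, SD %.3f, skewness %.2f" % moments(ln_dist))

# ln is increasing, so the same tables are "at least as extreme" on either scale
observed = (11 / 183) / (2 / 195)
tail_rr = sum(p for v, p in rr_dist if v >= observed - 1e-12)
tail_ln = sum(p for v, p in ln_dist if v >= log(observed) - 1e-12)
print(tail_rr, tail_ln)
```

The two tail probabilities agree with each other and with the exact one-sided p-value of about 0.0074 reported in the Study Conclusions. The skewness drops sharply on the log scale.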
Theoretical Result: It can be shown that the standard error of the ln relative risk is approximated by

$$SE\left[\ln\left(\frac{\hat p_1}{\hat p_2}\right)\right] \approx \sqrt{\frac{1}{A} - \frac{1}{A+C} + \frac{1}{B} - \frac{1}{B+D}}$$

where A, B, C, and D are the observed counts in the 2 × 2 table of data, with A and B representing the number of “successes” in the two groups. Having this formula allows us to determine the variability from sample to sample without conducting the simulation first.

(p) Calculate the value of this standard error of the ln(rel risk) for this study. Interpret this value and compare it to the standard deviation from your simulated lnrelrisk values.

(q) You may find this approximation is in the ballpark but not all that close. What assumption is made by the simulation that is not made by this formula? What if you made the same assumption in this formula? [Hint: Think pooled p̂.]

(r) Now that you have a statistic (ln rel risk) that has a sampling distribution that is approximately normal, what general formula can we use to determine a confidence interval for the parameter?

(s) Calculate the midpoint, 95% margin of error, and 95% confidence interval endpoints using the observed value of ln(rel risk) as the statistic and using the standard error calculated in (p).

(t) What parameter does the confidence interval in (s) estimate?

(u) Exponentiate the endpoints of this interval to obtain a confidence interval for the ratio of the probabilities of developing a peanut allergy between these two treatments. Interpret this interval.

(v) Is zero in this interval? Do we care? What value is of interest instead?

[Handwritten notes: table layout
     A      B
     C      D
    A+C    B+D
SE = √(1/11 − 1/183 + 1/2 − 1/195) ≈ 0.762. Statistic ± z* × SE: 1.768 ± 1.96(0.762) gives (0.275, 3.261) for ln(relative risk). 95% CI for relative risk: (e^0.275, e^3.261) = (1.32, 26.1).]
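The standard-error formula and the back-transformed interval from (p) through (u) can be sketched in Python. `rel_risk_ci` is a hypothetical helper, and z* = 1.96 is assumed for 95% confidence. With the peanut counts (A = 11, B = 2, C = 172, D = 193), it should reproduce, up to rounding, the interval of about 1.32 to 26.08 quoted in the Study Conclusions.

```python
from math import exp, log, sqrt


def rel_risk_ci(A, B, C, D, z_star=1.96):
    """Large-sample CI for p1/p2 built on ln(relative risk), then back-transformed.

    A, B = successes in groups 1 and 2; C, D = failures (column totals A+C, B+D).
    """
    p1, p2 = A / (A + C), B / (B + D)
    ln_rr = log(p1 / p2)
    se = sqrt(1 / A - 1 / (A + C) + 1 / B - 1 / (B + D))
    lo, hi = ln_rr - z_star * se, ln_rr + z_star * se
    return ln_rr, se, (lo, hi), (exp(lo), exp(hi))


ln_rr, se, ln_ci, rr_ci = rel_risk_ci(A=11, B=2, C=172, D=193)
print(f"ln(RR) = {ln_rr:.3f}, SE = {se:.3f}")
print(f"95% CI for ln(RR): ({ln_ci[0]:.3f}, {ln_ci[1]:.3f})")
print(f"95% CI for RR:     ({rr_ci[0]:.2f}, {rr_ci[1]:.2f})")
```

The interval is symmetric on the log scale but not after exponentiating. Its midpoint on the original scale is therefore not the observed relative risk, which is the point of (w).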
(w) Is the midpoint of this confidence interval for the population relative risk equal to the observed value of the sample relative risk? Explain why this makes sense.

(x) Compare the confidence interval you just calculated to the one given by the applet if you now check the 95% CI for relative risk box.

(y) Suppose you used this method to construct a confidence interval for each of the 1,000 simulated random samples that you generated in (f). Because our simulation assumes the null hypothesis to be true, do you expect the value 1 to be in these intervals? All of them? Most of them? What percentage of them? Explain.

Study Conclusions
This study provided strong evidence that children with pre-existing sensitivity to peanut extract are more likely to develop a peanut allergy by 5 years of age if they avoid consuming peanuts (exact one-sided p-value = 0.0074, z-score = 2.66). An approximate 95% confidence interval for the difference in the probabilities indicates that the probability of developing a peanut allergy is 0.013 (1.3 percentage points) to 0.087 (8.7 percentage points) higher for those avoiding peanuts. However, focusing on the difference in “success” probabilities has some limitations. In particular, if the probabilities are small, it may be difficult for us to interpret the magnitude of the difference between the values. Also, we have to be very careful with our language, focusing on the difference in the allergy probabilities and not the percentage change. An alternative to examining a confidence interval for the difference in the conditional probabilities is to construct a confidence interval for the relative risk (ratio of conditional probabilities). A large-sample approximation exists for a z-interval for the ln(relative risk), which can then be back-transformed to an interval for the long-run relative risk. Many practitioners prefer focusing on this ratio parameter rather than the difference.
From this study, we are 95% confident that the ratio of the probabilities of developing a peanut allergy (avoidance compared to consumption) is between 1.32 and 26.08. This means that avoiding peanuts rather than consuming some raises the probability of developing a peanut allergy by between 32% and 2508%.

Note: It can be risky to interpret the relative risk in isolation without considering the absolute risks (conditional proportions) as well. For example, doubling a very small probability may not be noteworthy, depending on the context. You should also note that the percentage change calculation and interpretation depend on which group (e.g., treatment or control) is used as the reference group.
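The numbers quoted in the Study Conclusions can be re-derived with a short Python sketch. It assumes the z-score uses the pooled proportion and the interval for the difference uses the unpooled standard error; these choices reproduce the quoted 2.66 and (0.013, 0.087). The last step applies (relative risk − 1) × 100% to each endpoint of the relative-risk interval.

```python
from math import sqrt

a, n1 = 11, 183   # allergies and group size, avoidance
b, n2 = 2, 195    # allergies and group size, consumption
p1, p2 = a / n1, b / n2

# z-score for H0: pi1 = pi2, using the pooled proportion
p_pool = (a + b) / (n1 + n2)
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# 95% z-interval for pi1 - pi2, using the unpooled standard error
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff_ci = (p1 - p2 - 1.96 * se_diff, p1 - p2 + 1.96 * se_diff)

# Percentage-change reading of the relative-risk interval (1.32, 26.08)
pct_change = tuple((rr - 1) * 100 for rr in (1.32, 26.08))

print(f"z = {z:.2f}; CI for difference: ({diff_ci[0]:.3f}, {diff_ci[1]:.3f})")
print(f"relative risk CI as % change: {pct_change[0]:.0f}% to {pct_change[1]:.0f}%")
```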
Practice Problem 3.9A
(a) For the peanut allergy study, find a 95% confidence interval for the probability of not developing a peanut allergy, comparing the consumption treatment to the avoidance treatment.
(b) Provide a one-sentence interpretation of the interval in (a).
(c) Is the interval the same as or closely related to the one you found in Investigation 3.9? Does one interval provide strong evidence of a treatment effect? Which interval would you report to new parents?
(d) The article reports that “the power to detect a difference in risk of 30 percentage points was 80.0%.” Explain what this means in your own words.

Practice Problem 3.9B
A multicenter, randomized, double-blind trial involved patients aged 36-65 years who had knee injuries consistent with a degenerative medial meniscus tear (Sihvonen et al., New England Journal of Medicine, 2013). Patients received either the most common orthopedic procedure (arthroscopic partial meniscectomy, n1 = 70) or sham surgery that simulated the sounds, sensations, and timing of the real surgery (n2 = 76). After 12 months, 54 of those in the treatment group reported satisfaction, compared to 53 in the sham surgery group.
(a) Calculate and interpret a confidence interval for the ratio of the probabilities (relative risk) of satisfaction for these two procedures.
(b) What does your interval in (a) indicate about whether those receiving the orthopedic surgery are significantly more likely than those receiving a sham surgery to report satisfaction after 12 months? Explain your reasoning.
Summary of Inference for “Relative Risk”
Statistic: ratio of conditional proportions (typically set up to be larger than one) = p̂1/p̂2
Hypotheses: H0: π1/π2 = 1; Ha: π1/π2 <, >, or ≠ 1
p-value: Fisher’s Exact Test or normal approximation on ln(p̂1/p̂2)
Confidence interval for π1/π2: exponentiate the endpoints of

$$\ln\left(\frac{\hat p_1}{\hat p_2}\right) \pm z^* \sqrt{\frac{1}{A} - \frac{1}{A+C} + \frac{1}{B} - \frac{1}{B+D}}$$

Note: The confidence interval for the relative risk will not necessarily be symmetric around the statistic.
Technology Detour: Simulating Random Assignment (two-way tables)
We can select observations from a hypergeometric distribution for the cell 1 counts and then compute the cell 2 counts and the number of failures based on the fixed row and column totals. With this information you can compute the difference in conditional proportions, relative risk, etc. We show how to calculate $\hat{p}_{unvac}$ below; the rest is up to you. Also keep in mind that you can use "log" to calculate the natural logs of values, and recall how you created a Boolean expression in Investigation 3.1 to find the p-value from the simulated results.
In R
> VacInfCount = rhyper(10000, 210, 4985, 2584)
  • 210 is the number of successes (M)
  • 4985 is the number of failures (N − M)
  • 2584 is the sample size (n)
> UnvacInfCount = 210 - VacInfCount
> Unvacphat = UnvacInfCount/(210 + 4985 - 2584)   # divide by the other group's size, N - n
Investigation 3.10: Smoking and Lung Cancer
After World War II, evidence began mounting that there was a link between cigarette smoking and pulmonary carcinoma (lung cancer). In the 1950s, three now-classic articles were published on the topic. One of these studies was conducted in the United States by Wynder and Graham ("Tobacco Smoking as a Possible Etiologic Factor in Bronchiogenic Cancer," 1950, Journal of the American Medical Association). They found records from a large number of patients with a specific type of lung cancer in hospitals in California, Colorado, Missouri, New Jersey, New York, Ohio, Pennsylvania, and Utah. Of those in the study, the researchers focused on 605 male patients with this form of lung cancer. Another 780 male hospital patients without this type of lung cancer, with similar age and economic distributions, were interviewed in St. Louis, Boston, Cleveland, and Hines, IL. Subjects (or family members) were interviewed to assess their smoking habits, occupation, education, etc. The table below classifies them as none or light smokers, or as at least moderate smokers.

Wynder and Graham        None or light smoker   Moderate to heavy smoker   Total
                         (0-9 per day)          (10-35+ per day)
Lung cancer patients     22                     583                        605
Controls                 204                    576                        780
Total                    226                    1159                       1385

(a) Calculate and interpret the relative risk of being a lung cancer patient for the moderate to heavy ("regular") smokers compared to the none or light smokers ("non-smokers").
(b) Does this feel like an impressive statistic to you? Do you think it will be statistically significant?
(c) What is the estimate of the baseline rate of lung cancer from this table? Does that seem to be a reasonable estimate to you? How is this related to the design of the study?
(d) Calculate and interpret the relative risk of being a control patient for the non-smokers compared to the regular smokers. How does this compare to (a) and (b)?
Definition: There are three main types of observational studies.
• Cross-classification study. The researchers categorize subjects according to both the explanatory and the response variable simultaneously. For example, they could take a sample of adult males and simultaneously record both their smoking status and whether they have lung cancer. A common design is cross-sectional, where all observations are taken at a fixed point in time.
• Cohort study. The researchers identify individuals according to the explanatory variable and then observe the outcomes of the response variable. These are usually prospective designs and may even follow the subjects (the cohort) for several years.
• Case-control study. The researchers identify observational units in each response variable category (the "cases" and the "controls") and then determine the explanatory variable outcome for each observational unit. How the controls are selected is very important in determining the comparability of the groups. These are often retrospective designs, in that the researchers may need to "look back" at historical data on the observational units.
(e) Would you classify the Wynder & Graham study as cross-classified, cohort, or case-control? Explain.
(f) Explain why using the relative risk (or even the difference in proportions) as the statistic can be problematic with case-control studies.
An advantage of case-control studies is that, when you are studying a "rare event," you can ensure a large enough number of "successes" and fairly balanced group sizes. However, a disadvantage is that it does not make sense to calculate "risk" or likelihood of success from a case-control study, because the distribution of the response variable has been manipulated/determined by the researcher.
Switching the roles of the explanatory and response variables often gives very different results for relative risk (changing our measure of the strength of the relationship), and the switched comparison often isn't the comparison of interest stated by the research question. Consequently, conditional proportions of success and relative risk are not appropriate statistics to use with case-control studies. Instead, we will consider another way to compare the uncertainty of an outcome between two groups.
Definition: The odds of success are defined as the ratio of the proportion of "successes" to the proportion of "failures." This simplifies to the ratio of the number of successes to the number of failures:
$$\text{odds} = \frac{\text{proportion of successes in the group}}{\text{proportion of failures in the group}} = \frac{\text{number of successes in the group}}{\text{number of failures in the group}}$$
For example, if the odds are 2-to-1 in favor of an outcome, we expect a success twice as often as a failure in the long run. This corresponds to a probability of 2/3 of the outcome occurring. Similarly, if the probability of success is 1/10, then the odds equal (1/10)/(9/10) = 1/9, and failure is 9 times more likely than success. It's important to note how the "outcome" is defined. For example, in horse racing, odds are typically presented in terms of "losing the race." So if a horse is given 2-to-1 odds against winning a race, we expect the horse to lose two-thirds of the races in the long run.
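The conversions in these examples can be checked with a short Python sketch. It uses `fractions.Fraction` so the results stay exact. The helper names `odds_from_prob` and `prob_from_odds` are illustrative.

```python
from fractions import Fraction


def odds_from_prob(p):
    """Odds of success: P(success) / P(failure) = p / (1 - p)."""
    return p / (1 - p)


def prob_from_odds(odds):
    """Invert odds = p / (1 - p), giving p = odds / (1 + odds)."""
    return odds / (1 + odds)


# 2-to-1 in favor of an outcome -> probability 2/3
print(prob_from_odds(Fraction(2, 1)))   # 2/3
# Probability 1/10 -> odds (1/10)/(9/10) = 1/9
print(odds_from_prob(Fraction(1, 10)))  # 1/9
# Horse racing: 2-to-1 odds *against* winning means the odds of losing are 2,
# so the long-run probability of losing is 2/3
p_lose = prob_from_odds(Fraction(2, 1))
print(p_lose)                           # 2/3
```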
Definition: The odds ratio is another way to compare conditional proportions in a 2 × 2 table:
$$\text{Odds ratio} = \frac{(\text{number of successes in group 1})/(\text{number of failures in group 1})}{(\text{number of successes in group 2})/(\text{number of failures in group 2})}$$
Like relative risk, if the odds ratio is 3, this is interpreted as "the odds of success in the 'top' group are 3 times (or 200%) higher than the odds of success in the 'bottom' group." However, the relative risk and the odds ratio are not always similar in value.
(g) Calculate and interpret the odds ratio comparing the odds of lung cancer for the smokers to the odds of lung cancer for the non-smokers. Does this match (a)?
(h) Calculate and interpret the odds ratio for being in the control group for the non-smokers compared to the smokers. Does this match (d) or (g)?
Key Results: A major disadvantage of relative risk is that your (descriptive) measure of the strength of evidence that one group is "better" depends on which outcome you define as a success, as well as on which variable you treat as the explanatory and which as the response. A big advantage of the odds ratio is that it is invariant to these definitions. Switching the roles of the explanatory and response variables leaves the odds ratio unchanged, and redefining success simply turns it into its reciprocal. (If your odds are 10 times higher to die from lung cancer if you are a smoker, then your odds of being a smoker are 10 times higher if you died from lung cancer.) The only real disadvantage is that the odds ratio is trickier to interpret ("higher odds" vs. the more natural "more likely"). Thus, for case-control studies in particular, the odds ratio is the preferred statistic. Also, when the success proportions are both small, the odds ratio can be used to approximate the relative risk.
(i) Let τ ("tau") represent the population odds ratio of having lung cancer for those who are regular smokers compared to those who are not regular smokers, so $\tau = [\pi_1/(1 - \pi_1)]/[\pi_2/(1 - \pi_2)]$. State the null and alternative hypotheses in terms of this parameter.
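A small Python sketch can make the invariance result concrete. It assumes the 2 × 2 layout used in this chapter's summaries: groups as columns and successes in the top row. It uses the Wynder and Graham counts, and the helpers `odds_ratio`, `relative_risk`, and `rr_and_or_from_probs` are illustrative. The last two calls use hypothetical probabilities (not from any study) to show when the odds ratio approximates the relative risk.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]] with groups as columns and
    success/failure as rows: (a/c) / (b/d) = (a*d) / (b*c)."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Relative risk for the same layout: [a/(a+c)] / [b/(b+d)]."""
    return (a / (a + c)) / (b / (b + d))


def rr_and_or_from_probs(p1, p2):
    """Relative risk and odds ratio computed directly from two success probabilities."""
    return p1 / p2, (p1 / (1 - p1)) / (p2 / (1 - p2))


# Wynder and Graham counts. Columns: regular smokers, non-regular smokers.
# Rows: lung cancer patients (top), controls (bottom).
a, b, c, d = 583, 22, 576, 204

# Success = lung cancer; regular smokers vs non-regular smokers
rr_cancer = relative_risk(a, b, c, d)
or_cancer = odds_ratio(a, b, c, d)

# Success = control; non-regular smokers vs regular smokers
# (swap both the columns and the rows)
rr_control = relative_risk(d, c, b, a)
or_control = odds_ratio(d, c, b, a)

# Swap explanatory and response: columns = cancer/control, rows = regular/non-regular
or_transposed = odds_ratio(a, c, b, d)

# Redefine only the success (controls on top): the odds ratio becomes its reciprocal
or_swapped_success = odds_ratio(c, d, a, b)

print(f"RR: {rr_cancer:.2f} vs {rr_control:.2f}")
print(f"OR: {or_cancer:.3f}, {or_control:.3f}, {or_transposed:.3f}, 1/{1 / or_swapped_success:.3f}")

# Hypothetical probabilities: RR is 2 in both cases, but only the small pair has OR near 2
small = rr_and_or_from_probs(0.02, 0.01)
large = rr_and_or_from_probs(0.6, 0.3)
print(f"small: RR = {small[0]:.2f}, OR = {small[1]:.2f}; large: RR = {large[0]:.2f}, OR = {large[1]:.2f}")
```

The two relative risks disagree, while every rearrangement of the table leaves the odds ratio unchanged up to taking a reciprocal.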
(j) Use Fisher's Exact Test to calculate the p-value. (Note: We get the same p-value no matter which statistic we use. Why is that?)
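For (j), a one-sided Fisher's Exact Test p-value is a hypergeometric tail probability. The Python sketch below computes it exactly with `math.comb`. It assumes we count X, the number of lung cancer patients among the 226 none-or-light smokers, so that small values of X support the alternative that the odds ratio exceeds one. The helper name `fisher_lower_tail` is illustrative. Because the row and column totals are fixed, X determines every other cell of the table, so the sketch works directly with the count.

```python
from math import comb


def fisher_lower_tail(x, N, M, n):
    """One-sided Fisher's exact p-value P(X <= x), where X is hypergeometric:
    n items drawn without replacement from N items, M of which are successes.
    The numerator is summed with exact integers before one final division."""
    lowest = max(0, n - (N - M))  # smallest count of successes that is possible
    numerator = sum(comb(M, k) * comb(N - M, n - k) for k in range(lowest, x + 1))
    return numerator / comb(N, n)


# Wynder and Graham: N = 1385 patients, M = 605 lung cancer patients,
# n = 226 none/light smokers, x = 22 lung cancer patients among them.
p_value = fisher_lower_tail(22, N=1385, M=605, n=226)
print(f"one-sided p-value = {p_value:.2e}")
```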
But we still need a confidence interval for this new parameter as well.
Theoretical Result: Like the relative risk, the sample odds ratio has an approximately log-normal sampling distribution (for any study design). Thus, we can construct a confidence interval for the population/treatment log-odds ratio using the normal distribution. The standard error of the sample log-odds ratio (using the natural log) is given by the expression:
$$SE(\ln \text{odds ratio}) = \sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}}$$
where A, B, C, and D are the four table counts.
(k) Calculate this standard error and then use it to find an approximate 95% confidence interval for the log-odds ratio.
(l) Back-transform the endpoints of the interval in (k) and interpret your results.
(m) Does your interval contain the value one? Discuss the implications of whether or not the interval contains the value one.
(n) Compare your results to the following JMP output:
[JMP output not reproduced]
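Steps (k) and (l) can be sketched in Python as below. The sketch assumes the same layout as the chapter's summaries: A = 583 and B = 22 lung cancer patients among regular and non-regular smokers, and C = 576 and D = 204 controls. This is the normal-theory interval from the standard-error formula above. Software that uses other methods (for example, conditional exact intervals) can report somewhat different endpoints. The function name `odds_ratio_ci` is illustrative.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, conf=0.95):
    """Sample odds ratio (a*d)/(b*c) with an interval built on the log scale."""
    or_hat = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(odds ratio)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)    # z* (about 1.96 for 95%)
    log_lo = math.log(or_hat) - z * se
    log_hi = math.log(or_hat) + z * se
    # Back-transform the log-scale endpoints for the interval on tau itself
    return or_hat, se, (log_lo, log_hi), (math.exp(log_lo), math.exp(log_hi))


or_hat, se, log_ci, ci = odds_ratio_ci(583, 22, 576, 204)
print(f"OR = {or_hat:.3f}, SE(ln OR) = {se:.4f}")
print(f"95% CI for ln(tau): ({log_ci[0]:.3f}, {log_ci[1]:.3f})")
print(f"95% CI for tau:     ({ci[0]:.2f}, {ci[1]:.2f})")
```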
(o) Summarize (with justification) the conclusions you would draw from this study. Use both the p-value and the confidence interval, and address both the population you are willing to generalize to and whether or not you are drawing a cause-and-effect conclusion.
Study Conclusions
Because the baseline incidence of lung cancer in the population is so small, the researchers conducted a case-control study to ensure they would have both patients with and without lung cancer in their study (matched by age and economic status). In a case-control study, the odds ratio is a more meaningful statistic for comparing the incidence of lung cancer between the two groups. We find that the sample odds of lung cancer are almost ten times larger for the regular smokers than for the non-regular smokers in this study. By the invariance of the odds ratio, this also tells us that the odds of being a regular smoker (rather than not) are almost 10 times higher for those with lung cancer. We are 95% confident that, in the larger populations represented by these samples, the odds of lung cancer are 5.92 to 15.52 times larger for the regular smokers (Fisher's Exact Test p-value << 0.001). If both success proportions had been small, we could say this is approximately equal to the relative risk and use the words "10 times higher" or "10 times more likely." The full data set (which broke down the second category further) also shows that the odds of having lung cancer increase with the amount of smoking. Light smokers have 2 times the odds, heavy smokers have 11 times the odds, and chain smokers have 29 times the odds! This is called a "dose-response" relationship. We see a strong relationship between the size of the "dose" of smoking and the occurrence of lung cancer for these patients. However, this study was criticized for "retrospective bias" in asking subjects to accurately remember, and be willing to tell, details of their lifestyles.
This can also be complicated by asking these questions of patients who know they have been diagnosed with lung cancer, as their recall may be affected by this knowledge. We also have to worry about whether hospitalized males are representative of the male population. Other studies around the same time (e.g., Hammond and Horn, Wynder and Cornfield) found similar increases in "risk" with smoking. However, these were all observational studies. Critics therefore reasonably argued that other variables, such as lifestyle, diet, exercise, and genetics, could be responsible for both the smoking habits and the development of lung cancer. Much (ongoing) research remained to be done, and these studies did not claim to prove that cigarette smoking causes lung cancer. Even so, these landmark studies set the stage. They also led to many efforts in improving study design and in developing statistical tools (such as relative risk and odds ratios) to analyze the results.
Practice Problem 3.10A
A researcher searched court records to find 908 individuals who had been victims of abuse as children (11 years or younger). She then found 667 individuals, with similar demographic characteristics, who had not been abused as children. Based on a search through subsequent years of court records, she determined how many in each of these groups became involved in violent crimes (Widom, 1989). The results are shown below:

                                 Abuse victim   Control
Involved in violent crime        102            53
Not involved in violent crime    806            614

(a) Is this an observational study or an experiment? If observational, which type?
(b) Calculate and interpret the odds ratio of being involved in a violent crime between these two groups.
(c) The one-sided p-value for this result (using Fisher's Exact Test) is 0.018 (confirm). Is it reasonable to conclude that being a victim of abuse as a child causes individuals to be more likely to be violent toward others afterwards? Explain.
(d) Calculate and interpret a 95% confidence interval for the population odds ratio.
(e) Is it reasonable to generalize these results to all abuse and non-abuse victims? Explain.

Practice Problem 3.10B
(a) Suppose that individuals in Group 1 have a 2/3 probability of success, and those in Group 2 have a 1/2 probability of success. Calculate and interpret the relative risk of success, comparing Group 1 to Group 2.
(b) Calculate and interpret the odds of success for Group 1.
(c) Calculate and interpret the odds ratio of success, comparing Group 1 to Group 2.
(d) Suppose Group 3 has a 0.1 probability of success, and Group 4 has a 0.05 probability of success. Repeat questions (a) and (c).
(e) In which case (Groups 1 and 2, or Groups 3 and 4) are the relative risk and odds ratio more similar? Why?
Summary of Inference for Odds Ratio
Statistic: $\hat{\tau} = [\hat{p}_1/(1 - \hat{p}_1)]/[\hat{p}_2/(1 - \hat{p}_2)] = (A \times D)/(B \times C)$ (typically set up to be larger than one)
Hypotheses: $H_0\colon \tau = 1$; $H_a\colon \tau <, >,$ or $\neq 1$
p-value: Fisher's Exact Test or normal approximation on $\ln(\hat{\tau})$
Confidence interval for $\tau$: exponentiate the endpoints of
$$\ln(\hat{\tau}) \pm z^* \sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}}$$
In R, with the 2 × 2 table laid out as
   A  B
   C  D
> fisher.test(matrix(c(A, C, B, D), nrow = 2), alternative = "greater")   # or "less" or "two.sided"
Investigation 3.11: Sleepy Drivers
Connor et al. (British Medical Journal, May 2002) reported on a study that investigated whether sleeplessness is related to car crashes. The researchers identified all drivers or passengers of eligible light vehicles who were admitted to a hospital or died as a result of a car crash on public roads in the Auckland, New Zealand, region between April 1998 and July 1999. Through cluster sampling, they identified a sample of 571 drivers who had been involved in a crash resulting in injury. They also identified a sample of 588 drivers who had not been involved in such a crash, as representative of people driving on the region's roads during the study period. The researchers asked the individuals (or proxy interviewees) whether they had had a full night's sleep (at least seven hours, mostly between 11pm and 7am) on any night during the previous week. The researchers found that 61 of the 535 crash drivers who responded and 44 of the 588 "no crash" drivers had not gotten at least one full night's sleep in the previous week.
(a) Identify the observational units and variables in this study. Which variable would you consider the explanatory variable and which the response variable? Was this an observational study or an experiment? If observational, would it be considered a case-control, cohort, or cross-classified design?
Observational units:
Explanatory variable:
Response variable:
Type of study:
(b) Organize these sample data into a 2 × 2 table:

            "Sleep deprived"           "Not sleep deprived"                Sample sizes
            (no full night's sleep     (at least one full night's sleep
            in past week)              in past week)
Crash                                                                      535
No crash                                                                   588
Total                                                                      1123

(c) Which statistic (odds ratio or relative risk) is most appropriate to calculate from this table, considering how the data were collected? Calculate and interpret this statistic. Does the value of this statistic support the researchers' conjecture? Explain.
Statistical Inference
(d) Outline the steps of a simulation that models the randomness in this study and helps you assess how unusual the statistic you calculated in (c) is when the null hypothesis is true. Include a statement of the null and alternative hypotheses for your choice of parameter.
(e) Use technology to carry out your simulation and draw your conclusions. [Hint: Be careful of rounding issues in finding your p-value; make sure you include observations as extreme as the observed value in your count.]
(f) Calculate and interpret a 95% confidence interval for your choice of parameter.
(g) Summarize (with justification) the conclusions you would draw from this study. Use both the p-value and the confidence interval, and address both the population you are willing to generalize to and whether or not you are drawing a cause-and-effect conclusion.
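One way to carry out (d) and (e) is sketched below in Python, assuming the margins are fixed. Under H0 (τ = 1), the 105 sleep-deprived drivers are treated as spread at random among the 1123 drivers. Each repetition records how many of them land among the 535 crash drivers. The sketch uses `random.sample` rather than a dedicated hypergeometric sampler, and the function names are illustrative. It compares counts (X ≥ 61) rather than rounded odds ratios, which sidesteps the rounding issue mentioned in the hint. With fixed margins, the odds ratio increases with the count, so the two comparisons pick out the same repetitions.

```python
import random


def simulate_crash_sleepy_counts(reps, seed=1):
    """Re-randomize under H0: spread the 105 sleep-deprived drivers at random
    over all 1123 drivers and record how many fall among the 535 crash drivers."""
    rng = random.Random(seed)
    N, M, n = 1123, 105, 535  # total drivers, sleep deprived, crash drivers
    counts = []
    for _ in range(reps):
        # Label drivers 0..N-1; labels below M are the sleep-deprived drivers
        crash_group = rng.sample(range(N), n)
        counts.append(sum(1 for i in crash_group if i < M))
    return counts


def odds_ratio_from_count(x, N=1123, M=105, n=535):
    """Rebuild the whole table from the crash group's sleep-deprived count x."""
    a = x            # crash, sleep deprived
    b = M - x        # no crash, sleep deprived
    c = n - x        # crash, not sleep deprived
    d = (N - n) - b  # no crash, not sleep deprived
    return (a * d) / (b * c)


counts = simulate_crash_sleepy_counts(reps=2000)
observed = 61
# Count repetitions at least as extreme as observed (ties included)
p_hat = sum(1 for x in counts if x >= observed) / len(counts)
print(f"observed OR = {odds_ratio_from_count(observed):.2f}, empirical p-value = {p_hat:.3f}")
```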
Study Conclusions
The proportions of drivers who had not gotten a full night's sleep in the previous week were 0.114 for the case group of drivers who had been involved in a crash, compared to 0.075 for the control group who had not. These proportions are small, and the roles of the explanatory and response variables in this study are awkward (we would much rather make a statement about the proportion of sleepless drivers who are involved in crashes). For both reasons, the odds ratio is a more meaningful statistic to calculate. The sample odds of having missed out on a full night's sleep were 1.59 times higher for the case group than for the control group. By the invariance of the odds ratio, we can also state that the sample odds of having an accident are 1.59 times (almost 60%) higher for those who do not get a full night's sleep than for those who do. The empirical p-value (less than 5%) provides moderately strong evidence against the null hypothesis. Such an extreme sample odds ratio would be unlikely to arise by chance alone if the proportion of drivers with sleepless nights were 0.09 for both the population of "cases" and the population of "controls." (Using a one-sided Fisher's Exact Test, we get p-value = 0.016.) A 95% confidence interval for the population odds ratio extends from 1.06 to 2.39 (1.04 to 2.45 with R). This interval provides statistically significant evidence that the population odds ratio exceeds one. With 95% confidence, the odds of having an accident are about 1 to 2.5 times higher for the sleepy drivers than for well-rested drivers. We cannot attribute this association to a cause-and-effect relationship because this was an observational (case-control) study. We might also want to restrict our conclusions to New Zealand drivers.
Practice Problem 3.11
Another landmark study on smoking began in 1952 (Hammond and Horn, 1958, "Smoking and death rates: Report on forty-four months of follow-up of 187,783 men: II. Death rates by cause," JAMA). They used 22,000 American Cancer Society volunteers as interviewers. Each interviewer was to ask 10 healthy white men between the ages of 50 and 69 to complete a questionnaire on smoking habits. Each year during the 44-month follow-up, the interviewer reported whether or not the man had died, and if so, how. They ended up tracking 187,783 men in nine states (CA, IL, IA, MI, MN, NJ, NY, PA, WI). Almost 188,000 were followed up by the volunteers through October 1955. During that time, about 11,870 of the men had died, 448 of them from lung cancer. The following table classifies the men as having a history of regular cigarette smoking or not, and whether or not they died from lung cancer. In this study, nonsmokers are grouped with occasional smokers, including pipe- and cigar-only smokers.

Hammond and Horn                 Not regular smoker   Regular smoker   Total
Lung cancer death                51                   397              448
Alive or other cause of death    108,778              78,557           187,335
Total                            108,829              78,954           187,783

(a) Is this a case-control, cohort, or cross-classified study?
(b) Calculate and interpret an odds ratio from the two-way table.
(c) Produce and interpret a 95% confidence interval for the population odds ratio.
(d) Are these results consistent with the Wynder and Graham study? Explain.