Chance/Rossman, 2018
ISCAM III
Investigation 3.7
209
Investigation 3.7: Is Yawning Contagious?
The folks at MythBusters, a popular television program on the Discovery Channel, investigated whether yawning is contagious by recruiting fifty subjects at a local flea market and asking them to sit in one of three small rooms for a short period of time. For some of the subjects, the attendee yawned while leading them to the room (planting a yawn "seed"), whereas for other subjects the attendee did not yawn. As time passed, the researchers watched (via a hidden camera) to see which subjects yawned.
(a) Identify the explanatory variable (EV) and the response variable (RV) in this study.
EV:
RV:
(b) Define the relevant parameter of interest, and state the null and alternative hypotheses for this study.
Be sure to clearly define any symbols that you use.
Parameter:
H0:
Ha:
In the study they found that 10 of 34 subjects who had been given a yawn seed actually yawned
themselves, compared with 4 of 16 subjects who had not been given a yawn seed.
(c) Create a two-way table summarizing the results, using the explanatory variable as the column variable.
Totals
Totals
(d) Explain how you would carry out a simulation analysis to approximate a p-value for this study. [Hint: How many cards? How many of each type? How many would you deal out? What would you record? How would you find the p-value?]
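The card-shuffling simulation described in (d) can be sketched in Python. This is a minimal sketch rather than the applet's implementation: the function name `simulate_p_value`, the seed, and the number of shuffles are illustrative assumptions. It records the number of yawner cards dealt to the yawn-seed group; because the group sizes are fixed, "at least 10 yawners in the yawn-seed group" is the same event as "a difference in conditional proportions at least as large as the one observed."

```python
import random

def simulate_p_value(yawners=14, total=50, group_size=34, observed=10,
                     shuffles=10_000, seed=1):
    """Approximate P(X >= observed) by repeatedly shuffling and dealing cards."""
    rng = random.Random(seed)
    # 1 = yawner card, 0 = non-yawner card
    cards = [1] * yawners + [0] * (total - yawners)
    extreme = 0
    for _ in range(shuffles):
        rng.shuffle(cards)
        # deal the first `group_size` cards to the yawn-seed group
        dealt_yawners = sum(cards[:group_size])
        if dealt_yawners >= observed:
            extreme += 1
    return extreme / shuffles

p_hat = simulate_p_value()
print(f"approximate p-value: {p_hat:.3f}")
```

The result varies slightly with the seed and the number of shuffles, just as the applet's empirical p-value does.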
Annotated work:
EV: whether the subject was given a yawn seed or not. RV: whether the subject yawned or not.

            Seed   No seed   Total
Yawn         10       4        14
No yawn      24      12        36
Total        34      16        50

Observed difference in conditional proportions: 10/34 − 4/16 ≈ 0.044
p-value = P(X ≥ 10), with M = 14 and n = 34
(e) Open the Analyzing Two-way Tables applet.
• Paste in the raw data and press Use Data, or enter the titles and counts of a two-way table and press Use Table. (Or check the 2 × 2 box and enter the cell values.)
• Check the Show Shuffle Options box.
• Set Number of Shuffles to 1000.
• Press Shuffle.
Briefly describe this randomization (null) distribution: What is its shape? What is the mean? What is the standard deviation?
(f) Specify the observed value for the difference in the conditional proportions in the Count Samples box. Then indicate whether the research conjecture expected a larger or smaller proportion of successes in Group A by choosing Greater Than or Less Than from the pull-down menu. Then press the Count button.
Exact p-value
The simulations you have conducted in Investigation 3.6 (Dolphin Therapy) and above approximated the p-value for two-way tables arising from random assignment by assuming the row and column totals are fixed. In this case, the probability of obtaining a specific number of successes in one group can be calculated exactly using the hypergeometric probability distribution. (We used the independent binomial distributions with the teen hearing loss study, where we wanted to sample separately from two populations and the overall number of successes was not fixed in advance.)
Keep in mind that, under the null hypothesis, we are assuming the group assignments made no difference and that there would be 14 successes ("yawners") and 36 failures ("non-yawners") between the two groups regardless.
Because the random assignment makes every configuration of the subjects between the two groups equally likely, we determine the probability of any particular outcome for the number of yawners and non-yawners by first counting the total number of ways to assign 34 of the subjects to the yawn-seed group (and 16 to the no-yawn-seed group) in the denominator. The numerator is then the number of ways to get a particular set of configurations for that group, such as those consisting of 10 yawners and 24 non-yawners.
(g) How many ways altogether are there to randomly assign these 50 subjects into one group of 34 (yawn-seed group) and the remaining group of 16 (no-yawn-seed group)? [Hint: Recall what you saw earlier with the binomial distribution and counting the number of ways to obtain S successes and F failures in n trials. See the Technical Details in Investigation 1.1.]
(h) Now consider the 14 successes and the 36 failures. How many ways are there to randomly select 10 of the successes? How many ways are there to randomly assign 24 of the failures to be in the yawn-seed group? How should you combine these two numbers to calculate the total number of ways to obtain 10 successes and 24 failures in the yawn-seed group, the configuration that we observed in the study?
Successes:
Failures:
Total:
(i) To determine the exact probability that random assignment would produce exactly 10 successes and 24 failures in the group of 34 subjects, divide your calculation in (h) by your calculation in (g).
(j) Explain why your answer to (i) is not yet the p-value for this study.
Result: The probability of obtaining k successes in Group A, with n observations, when sampled from a two-way table with N observations, consisting of M successes and N − M failures, is:
P(X = k) = C(M, k) × C(N − M, n − k) / C(N, n)
where C(N, n) = N!/[n!(N − n)!] is the number of ways to choose n items from a group of N items.
X represents the number of successes randomly selected for group A. X is a hypergeometric random variable.
Also note E(X) = n(M/N) and $SD(X) = \sqrt{\dfrac{nM(N-M)(N-n)}{N^2(N-1)}}$.
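The counting argument behind this result can be checked numerically. The following is a minimal Python sketch using the standard library's `math.comb`; the helper name `hypergeom_pmf` and its argument order are assumptions made for illustration, not part of the ISCAM tools.

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(X = k): k successes in a group of n, drawn from N items with M successes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

# Yawning study: N = 50 subjects, M = 14 yawners, n = 34 in the yawn-seed group
total_assignments = comb(50, 34)          # ways to choose the yawn-seed group
favorable = comb(14, 10) * comb(36, 24)   # 10 yawners and 24 non-yawners in that group
p10 = hypergeom_pmf(10, 50, 14, 34)
print(total_assignments, favorable, round(p10, 4))
```

Dividing `favorable` by `total_assignments` gives the same value as `hypergeom_pmf(10, 50, 14, 34)`, which is the probability of one specific outcome and not yet a p-value.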
In this study, we had N = 50 subjects and we defined yawning to be success, so M = 14. We also arbitrarily chose to focus on the yawn-seed group, so n = 34. This calculation works out the same if you had defined "not yawning" to be a success and/or if you had focused on the 16 people in the no-yawn-seed group. You just need to make sure you count consistently.
Annotated work:
(g) C(50, 34) ≈ 4.92 × 10^12
(h) C(14, 10) × C(36, 24) = 1001 × 1,251,677,700
(j) p-value = P(X ≥ 10) = P(X = 10) + P(X = 11) + P(X = 12) + P(X = 13) + P(X = 14)
We will continue to define the p-value to be the probability of obtaining results at least as extreme as those observed in the actual study. Because we expected more yawners in the yawn-seed group, the p-value is the probability of randomly assigning at least 10 of the yawners to the yawn-seed group.
So far you have found P(X = 10) = C(14, 10) × C(36, 24) / C(50, 34) = 0.2545.
(k) Calculate P(X = 11), P(X = 12), P(X = 13), and P(X = 14) using the hypergeometric probability formula.
P(X = 11) =
P(X = 12) =
P(X = 13) =
P(X = 14) =
Why do we stop at 14?
(l) Sum all five probabilities together (including P(X = 10)) to determine the exact p-value for the yawning study. How does this p-value compare to the empirical p-value from the applet simulation? Write a one- or two-sentence interpretation of this p-value.
Exact p-value:
Comparison:
Interpretation:
Definition: Using the hypergeometric probabilities to determine a p-value in this fashion for a two-way table is called Fisher's Exact Test, named after R. A. Fisher.
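Fisher's Exact Test for the yawning study amounts to summing hypergeometric probabilities over the outcomes at least as extreme as the one observed. The sketch below is one way to do that sum in Python; the helper `hypergeom_pmf` is a hypothetical name, and the upper limit of the sum is the largest value X can take.

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(X = k) for a hypergeometric X (N items, M successes, group of size n)."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

N, M, n = 50, 14, 34   # subjects, yawners, yawn-seed group size
observed = 10
# X cannot exceed the number of yawners (M) or the group size (n)
upper = min(M, n)
p_value = sum(hypergeom_pmf(k, N, M, n) for k in range(observed, upper + 1))
print(f"exact one-sided p-value: {p_value:.4f}")
```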
(m) Calculate this hypergeometric probability using technology (see Technology Detour on next page).
(n) Set up and carry out the calculation to determine the exact p-value where you define the success to be "not yawning" and the group of interest to be the yawn-seed group.
(o) Set up and carry out the calculation to determine the exact p-value, where you focus on the number that did not yawn in the no-yawn-seed group. Show that you obtain the same exact p-value as before.
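The equivalence of these set-ups can be illustrated with a small Python sketch. The function `hypergeom_tail` is a hypothetical helper written for this example (its `lower_tail` flag only loosely mirrors the `lower.tail` argument of the R function); each call counts a different cell of the same table, so the tail direction changes with the definition of success and group.

```python
from math import comb

def hypergeom_tail(k, N, M, n, lower_tail):
    """P(X <= k) if lower_tail else P(X >= k) for a hypergeometric X."""
    lo = max(0, n - (N - M))   # smallest possible number of successes in the group
    hi = min(n, M)             # largest possible number of successes in the group
    ks = range(lo, k + 1) if lower_tail else range(k, hi + 1)
    return sum(comb(M, j) * comb(N - M, n - j) for j in ks) / comb(N, n)

# success = yawning, group = yawn seed: at least 10 yawners
p_yawn_seed = hypergeom_tail(10, 50, 14, 34, lower_tail=False)
# success = not yawning, group = yawn seed: at most 24 non-yawners
p_noyawn_seed = hypergeom_tail(24, 50, 36, 34, lower_tail=True)
# success = not yawning, group = no yawn seed: at least 12 non-yawners
p_noyawn_noseed = hypergeom_tail(12, 50, 36, 16, lower_tail=False)
print(p_yawn_seed, p_noyawn_seed, p_noyawn_noseed)
```

All three probabilities describe the same set of random assignments, which is why they agree.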
Annotated work:
(k) P(X = 11) ≈ 0.1708, P(X = 12) ≈ 0.0702, P(X = 13) ≈ 0.0158, P(X = 14) ≈ 0.0015
(l) Exact p-value = P(X ≥ 10) ≈ 0.5128
(n) iscamhyperprob(k=24, total=50, succ=36, n=34, lower.tail=TRUE)
(o) iscamhyperprob(k=12, total=50, succ=36, n=16, lower.tail=FALSE)
Technology Detour - Calculating Hypergeometric Probabilities (Fisher's Exact Test)
In R, the iscamhyperprob function takes the following inputs:
• k, the observed value of interest (or the difference in conditional proportions, assumed if value is less than one, including negative)
• total, the total number of observations in the two-way table
• succ, the overall number of successes in the table
• n, the number of observations in "group A"
• lower.tail, a Boolean which is TRUE or FALSE
For example: iscamhyperprob(k=10, total=50, succ=14, n=34, lower.tail=FALSE)
Analyzing Two-way Tables applet
• Check the box for Show Fisher's Exact Test in the lower left corner.
• A check box will appear for determining the two-sided p-value.
Discussion: You should see that there are several equivalent ways to set up the probability calculation. Make sure it is clear how you define success/failure and which group you are considering "group A." This will help you determine the numerical values for N, M, and n in the calculation.
Below is a graph of the Hypergeometric distribution with N = 50, M = 14, and n = 34. Using probability rules, you can show that the expected value of this distribution is n × (M/N) = 34 × (14/50) = 9.52 yawners in the yawn-seed group, and the standard deviation of the probability distribution is the square root of n × (M/N) × (N − n)/N × (N − M)/(N − 1), which is 1.496 yawners.
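As a check on these two values, the mean and standard deviation can be computed both from the formulas and directly from the probability distribution. This is a minimal Python sketch; the variable names are illustrative assumptions.

```python
from math import comb, sqrt

N, M, n = 50, 14, 34
# possible values of X: number of yawners assigned to the yawn-seed group
support = range(max(0, n - (N - M)), min(n, M) + 1)
pmf = {k: comb(M, k) * comb(N - M, n - k) / comb(N, n) for k in support}

# directly from the definition of expected value and standard deviation
mean_direct = sum(k * p for k, p in pmf.items())
sd_direct = sqrt(sum((k - mean_direct) ** 2 * p for k, p in pmf.items()))

# from the closed-form expressions in the text
mean_formula = n * M / N
sd_formula = sqrt(n * (M / N) * ((N - n) / N) * ((N - M) / (N - 1)))

print(round(mean_direct, 2), round(sd_direct, 3))
print(round(mean_formula, 2), round(sd_formula, 3))
```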
(p) Compare this graph and the mean and standard deviation values to your simulation results.
(q) What conclusions will you draw from the p-value for this study?
(r) On the MythBusters program, the hosts concluded that, based on the observed difference in conditional proportions and the large sample size, there is "little doubt, yawning seems to be contagious." Do you agree?
Study Conclusions
With a large p-value of 0.513 (Fisher's Exact Test), we do not have any evidence that the difference between the two groups (with and without yawn seed) was not created by chance alone from the random assignment process. If there was nothing to the theory that yawning is contagious, by "luck of the draw" alone, we would expect 10 or more of the yawners to end up in the yawn-seed group in more than 50% of random assignments. Although the study results were in the conjectured direction, the difference between the yawning proportions was not large enough to convince us that the probability of yawning is truly larger when a yawn seed is planted. The researchers could try the study again with a larger sample size to increase the power of their test. The researchers also may want to be cautious in generalizing these results beyond the population of volunteers at a local flea market. It's also not clear how naturalistic the setting of leading individuals to a small room to wait is.
Practice Problem 3.7A
(a) For the MythBusters study (p-value > 0.5), is it reasonable to conclude from this study that we have strong evidence that yawning is not contagious? Explain.
(b) Explain, in this context, what is meant in the Study Conclusions box by "the researchers could try the study again with a larger sample size to increase the power of their test" and why that is a reasonable recommendation here.
(c) To calculate the p-value here, why are we using the hypergeometric distribution instead of the
binomial distribution?
Practice Problem 3.7B
Reconsider the Dolphin Therapy study (Investigation 3.6).
                                        Dolphin Therapy   Control Group   Total
Showed substantial improvement                10                3           13
Did not show substantial improvement           5               12           17
Total                                         15               15           30

Continue to focus on the number of improvers randomly assigned to the dolphin group, and represent this value by X.
(a) When the null hypothesis is true, the random variable X has a hypergeometric distribution. Specify the values of N, M, and n.
(b) Calculating the exact p-value involves finding P(X __________). [Hint: Fill in the blank with an inequality symbol and a number.]
(c) Calculate this exact p-value, either by hand or with technology. Comment on whether this p-value is similar to the approximate one from your simulation results. (Be sure it's clear how you calculated this value.)
(d) Suppose that the dolphin study had involved twice as many subjects, again with half randomly assigned to each group, and with the same proportion of improvers in each group. Determine the exact p-value in this case, and comment on whether/how it changes from the p-value with the real data. Explain why this makes sense.
Investigation 3.8: CPR vs. Chest Compressions
For many years, if a person experienced a heart attack and a bystander called 911, the dispatcher
instructed the bystander in how to administer chest compression plus mouth-to-mouth ventilation (a
combination known as CPR) until the emergency team arrived. Some researchers believe that giving
instruction in chest compression alone (CC) would be a more effective approach. In the 1990s, a
randomized comparative experiment was conducted in Seattle involving 518 cases (Hallstrom, Cobb,
Johnson, & Copass, New England Journal of Medicine, 2000): In 278 cases, the dispatcher gave
instructions in standard CPR to the bystander, and in the remaining 240 cases the dispatcher gave
instructions in CC alone.
A total of 64 patients survived to discharge from the hospital: 29 in the CPR
group and 35 in the CC group.
(a) Identify the observational units, explanatory variable, and response variable. Is this an observational
study or an experiment?
Observational units:
Explanatory:
Response:
Type of study: Observational / Experimental
(b) Construct a two-way table to summarize the results of this study. Remember to put the explanatory variable in the columns.
(c) Calculate the difference in the conditional proportions who survived (CC − CPR). Does this seem to be a noteworthy difference to you?
(d) Use technology to carry out Fisher's Exact Test (by calculating the corresponding hypergeometric probability) to assess the strength of evidence that the probability of survival is higher with CC alone as compared to standard CPR. Write out how to calculate this probability, report the p-value, and interpret what it is the probability of.
p-value = P(X __________) = __________ where X follows a hypergeometric distribution with N = ______, M = ______, and n = ______
Interpretation:
Because the sample sizes are large in this study, you should not be surprised that the probability distribution in (d) is approximately normal. The large sample sizes allow us to approximate the hypergeometric distribution with a normal distribution. Thus, with large sample sizes (e.g., at least 5 successes and at least 5 failures in each group), an alternative to Fisher's Exact Test is the two-sample z-test that you studied in Section 3.1.
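To see how the two approaches compare for the CPR study, both p-values can be computed side by side. This is a hedged Python sketch, not the ISCAM R functions: it assumes a one-sided alternative (CC survival probability higher than CPR), a pooled proportion in the z-test's standard error, and no continuity correction. The variable names are illustrative.

```python
from math import comb, erfc, sqrt

# CPR study counts from the investigation
n_cc, surv_cc = 240, 35      # chest compression alone
n_cpr, surv_cpr = 278, 29    # standard CPR
N = n_cc + n_cpr             # 518 cases
M = surv_cc + surv_cpr       # 64 survivors

# Fisher's Exact Test: P(X >= 35), X = number of survivors assigned to the CC group
fisher_p = sum(comb(M, k) * comb(N - M, n_cc - k)
               for k in range(surv_cc, min(M, n_cc) + 1)) / comb(N, n_cc)

# Two-sample z-test with a pooled proportion (one-sided, CC - CPR > 0)
p_pooled = M / N
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_cc + 1 / n_cpr))
z = (surv_cc / n_cc - surv_cpr / n_cpr) / se
z_p = 0.5 * erfc(z / sqrt(2))   # upper-tail area of the standard normal

print(f"Fisher exact p-value: {fisher_p:.4f}")
print(f"z = {z:.3f}, normal-approximation p-value: {z_p:.4f}")
```

Without a continuity correction, the normal approximation tends to give a somewhat smaller p-value than the exact calculation, because it ignores the discreteness of the count.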
(e) Use technology to obtain the two-sample z-test statistic and p-value for this study. Compare this p-value to the one from Fisher's Exact Test; are they similar?
(f) Suggest a way of improving the approximation of the p-value.
(g) (Optional): Compare the normal approximation with a continuity correction to the hypergeometric calculation. [Hints: In R, use iscamhypernorm(29, 518, 64, 278, TRUE) or use the Analyzing Two-way Tables applet to compare the normal approximation to Fisher's Exact Test.]
(h) Do the data from this study provide convincing evidence that CC alone is better than standard CPR at the 10% significance level? Explain. How about the 5% level of significance?
(i) An advantage to using the z-procedures is being able to easily produce a confidence interval for the parameter. Use technology to determine a 90% confidence interval for the parameter of interest, and then interpret this interval. [Hint: Think carefully about what the relevant parameter is in this study.]
(j) Suppose you had defined the parameter by subtracting in the other direction (e.g., CPR − CC instead of CC − CPR). How would that change:
(i) the observed statistic?
(ii) the test statistic?
(iii) the alternative hypothesis?
(iv) the p-value?
(v) the confidence interval?
Practice Problem 3.8
(a) Researchers in the CPR study also examined other response variables. For example, the 911 dispatcher's instructions were completely delivered in 62% of episodes assigned to chest compression plus mouth-to-mouth compared to 81% of the episodes assigned to chest compression alone.
(i) Calculate the difference in conditional proportions and compare it to the original study.
(ii) Without calculating, do you suspect the p-value for comparing this new response variable
between the two groups will be larger or smaller or about the same as the p-value you
determined above? Explain your reasoning.
(b) The above study was operationally identical to that of another study and the results of the two studies
were combined. Of the 399 combined patients randomly assigned to standard CPR, 44 survived to
discharge from the hospital. Of the 351 combined patients randomly assigned to chest compression
alone, 47 survived to discharge.
(i) Calculate the difference in conditional proportions and compare it to the original study.
(ii) Without calculating, do you suspect the p-value for this comparison will be larger, smaller, or
the same as the p-value you determined? Explain your reasoning.
SECTION 4: OTHER STATISTICS
Investigation 3.9: Peanut Allergies
Peanut allergies have increased in prevalence in the last decade, but can they be prevented? Even among infants with a high risk of allergy? Is it better to avoid the problematic food or to encourage early introduction? Du Toit et al. (New England Journal of Medicine, Feb. 2015) randomly assigned U.K. infants (4-11 months old) with pre-existing sensitivity to peanut extract to either consume 6 g of peanut protein per week or to avoid peanuts until 60 months of age. The table below shows the results for infants who were not initially sensitized to peanuts and whether or not the child had developed a peanut allergy at 60 months.

                   Peanut avoidance   Peanut consumption   Total
Peanut allergy            11                   2             13
No allergy               172                 193            365
Total                    183                 195            378
(a) Calculate the proportion of children developing a peanut allergy in each group. Does this appear to be a large difference to you?
(b) Use Fisher's Exact Test to investigate whether these data provide convincing evidence that the probability of developing a peanut allergy is larger among children who avoid peanuts for the first 60 months. [Hint: State the hypotheses in symbols and in words. Define the random variable and outcomes of interest in computing your p-value.] Do you consider this strong evidence that the peanut consumption effectively deters development of a peanut allergy in this population?
(c) Would you feel any differently about the magnitude of the difference in proportions if the conditional proportions developing a peanut allergy had been 0.500 and 0.55? Explain.
Discussion: When the baseline rate (probability) of success is small, an alternative statistic to consider rather than the difference in the conditional proportions (which will also have to be small by the nature of the data) is the ratio of the conditional proportions. First used with medical studies where "success" is often defined to be an unpleasant event (e.g., death), this ratio was termed the relative risk.
Annotated work:
(a) 11/183 ≈ 0.0601 (avoidance); 2/195 ≈ 0.0103 (consumption)
(b) X is hypergeometric.
Definition: The relative risk is the ratio of the conditional proportions, often intentionally set up so that the value is larger than one:
Relative risk = (Proportion of successes in group 1 (the larger proportion)) / (Proportion of successes in group 2 (the smaller proportion))
The relative risk tells us how many times higher the "risk" or "likelihood" of "success" is in group 1 compared to group 2.
(d) Determine and interpret the ratio of the conditional proportions who developed peanut allergy between the peanut avoiders and the peanut consumers in this study.
(e) Because we are now working with a ratio, we can also interpret this statistic in terms of percentage change. Subtract one from the relative risk value and multiply by 100% to determine what percentage higher the proportion who developed a peanut allergy is in the avoidance group compared to the consumption group.
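The relative risk and the percentage-change calculation can be written out in a few lines of Python. This is a minimal sketch using the counts from the table above; the variable names are illustrative.

```python
# counts from the peanut allergy two-way table
allergy_avoid, n_avoid = 11, 183
allergy_consume, n_consume = 2, 195

p_avoid = allergy_avoid / n_avoid        # conditional proportion, avoidance group
p_consume = allergy_consume / n_consume  # conditional proportion, consumption group

relative_risk = p_avoid / p_consume
percent_higher = (relative_risk - 1) * 100

print(f"proportions: {p_avoid:.4f} vs {p_consume:.4f}")
print(f"relative risk: {relative_risk:.3f}")
print(f"percentage higher in avoidance group: {percent_higher:.1f}%")
```

Swapping which group is the reference (the denominator) changes both the ratio and the percentage, so the interpretation should always name the reference group.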
Of course, now we would also like a confidence interval for the corresponding parameter, the ratio of the underlying probabilities of allergy between these two treatments. When we produced confidence intervals for other parameters, we examined the sampling distribution of the corresponding statistic to see how values of that statistic varied under repeated random sampling. So now let's examine the behavior of the relative risk of conditional proportions using the Analyzing Two-Way Tables applet to simulate the random assignment process (as opposed to simulating the random sampling from a binomial process) under the (null) assumption that there's no difference between the two treatments. [See the Technology Detour below for software instructions.]
(f) Generate a null distribution for Relative Risks:
• Check the 2 × 2 box.
• Enter the two-way table into the applet and press Use Table.
• Generate 1000 random shuffles.
• Use the Statistic pull-down menu to select Relative Risk.
Describe the behavior of the null distribution of relative risk values.
(g) Where does the observed value of the relative risk from the actual study fall in the null distribution of the relative risks? What proportion of the simulated relative risks are at least this extreme?
(h) What percentage of the simulated relative risks are larger than 2.1 (just so you have a non-zero value to compare to later)?
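For readers without the applet, the random-assignment simulation of relative risks in (f) through (h) can be sketched in Python. This is an illustration under stated assumptions, not the applet's code: the seed and the 5000 shuffles are arbitrary choices, and the zero-cell adjustment (adding 0.5 to every cell when a success count is zero) is one reading of the adjustment the applet is described as making.

```python
import random
from math import log

def relative_risk(a, n1, b, n2):
    """RR = (a/n1)/(b/n2); adds 0.5 to every cell if either success count is zero."""
    c, d = n1 - a, n2 - b
    if a == 0 or b == 0:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a / (a + c)) / (b / (b + d))

rng = random.Random(2)
n_avoid, n_consume, allergies = 183, 195, 13
# 1 = child who develops an allergy, 0 = child who does not
cards = [1] * allergies + [0] * (n_avoid + n_consume - allergies)

observed_rr = relative_risk(11, n_avoid, 2, n_consume)
sim_rr = []
for _ in range(5000):
    # randomly assign 183 children to the avoidance group
    dealt = sum(rng.sample(cards, n_avoid))
    sim_rr.append(relative_risk(dealt, n_avoid, allergies - dealt, n_consume))

p_hat = sum(rr >= observed_rr for rr in sim_rr) / len(sim_rr)
share_above_2_1 = sum(rr > 2.1 for rr in sim_rr) / len(sim_rr)
mean_ln_rr = sum(log(rr) for rr in sim_rr) / len(sim_rr)

print(f"proportion at least as extreme as observed: {p_hat:.4f}")
print(f"proportion larger than 2.1: {share_above_2_1:.3f}")
print(f"mean of ln(relative risk): {mean_ln_rr:.3f}")
```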
Annotated work:
(d) 0.0601/0.0103 ≈ 5.861
(e) 4.861 × 100% = 486.1% higher
(g) 0.009
(h) 10.8%
But can we apply a mathematical model to this distribution?
(i) Why does it make sense that the mean of the simulated relative risks is close to the value 1? [Hint: Remember the assumption behind your simulation analysis.]
(j) You should notice skewness in the distribution of relative risk values. Explain why it is not surprising for the distribution of this statistic to be skewed to the right (especially with smaller sample sizes). Note: If the number of successes equals zero, the applet adds 0.5 to each cell of the table before calculating the relative risk.
(k) In fact, this distribution is usually well modeled by a log normal distribution. To verify this, check the ln relative risk box (in the lower left corner) to take the natural log of each relative risk value and display a new histogram of these transformed values. Describe the shape of this distribution. Is the distribution of the lnrelrisk well modeled by a normal distribution?
(l) What is the mean of the simulated lnrelrisk values? Why does this value make sense?
(m) What is the standard deviation of the lnrelrisk values?
(n) Calculate the observed value of ln($\hat{p}_1/\hat{p}_2$) for this study (but don't round up). Where does this value fall (near the middle or in the tail) of this simulated distribution of lnrelrisk values? Has the empirical p-value changed?
(o) If you found the empirical p-value using ln($\hat{p}_1/\hat{p}_2$), it would be identical to the empirical p-value found in (h). Why? What did change about the distribution? [Hint: What percentage of the simulated ln rel risk values are more extreme than ln(2.1), and how does this compare to (h)?]
Annotated work:
H0: relative risk = 1 (π_avoid = π_consume); Ha: relative risk > 1 (π_avoid > π_consume)
ln(0.0601/0.0103) = ln(5.861) ≈ 1.77
ln(2.1) ≈ 0.7419
Theoretical Result: It can be shown that the standard error of the ln relative risk is approximated by
$SE(\ln(\hat{p}_1/\hat{p}_2)) \approx \sqrt{\dfrac{1}{A} - \dfrac{1}{A+C} + \dfrac{1}{B} - \dfrac{1}{B+D}}$
where A, B, C, and D are the observed counts in the 2 × 2 table of data, with A and B representing the number of "successes" in the two groups. Having this formula allows us to determine the variability from sample to sample without conducting the simulation first.
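Applied to the peanut allergy table, the standard error formula and the back-transformed interval can be sketched as follows. This is a minimal Python illustration; the multiplier 1.96 for 95% confidence and the variable names are assumptions of the sketch.

```python
from math import exp, log, sqrt

# Peanut study: A, B = allergy counts; C, D = no-allergy counts
A, B = 11, 2      # avoidance, consumption
C, D = 172, 193

rel_risk = (A / (A + C)) / (B / (B + D))
se_ln_rr = sqrt(1 / A - 1 / (A + C) + 1 / B - 1 / (B + D))

z_star = 1.96   # multiplier for approximately 95% confidence
ln_lower = log(rel_risk) - z_star * se_ln_rr
ln_upper = log(rel_risk) + z_star * se_ln_rr
ci_rr = (exp(ln_lower), exp(ln_upper))   # back-transform to the ratio scale

print(f"ln(relative risk) = {log(rel_risk):.3f}, SE = {se_ln_rr:.3f}")
print(f"95% CI for ln(relative risk): ({ln_lower:.3f}, {ln_upper:.3f})")
print(f"95% CI for relative risk: ({ci_rr[0]:.2f}, {ci_rr[1]:.2f})")
```

The interval is symmetric on the log scale but not on the ratio scale, so the observed relative risk is not its midpoint after exponentiating.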
(p) Calculate the value of this standard error of the ln(rel risk) for this study. Interpret this value and compare it to the standard deviation from your simulated lnrelrisk values.
(q) You may find this approximation is in the ballpark but not all that close. What assumption is made by the simulation that is not made by this formula? What if you made the same assumption in this formula? [Hint: Think pooled $\hat{p}$.]
(r) Now that you have a statistic (ln rel risk) that has a sampling distribution that is approximately
normal, what general formula can we use to determine a confidence interval for the parameter?
(s) Calculate the midpoint, 95% margin-of-error, and 95% confidence interval endpoints using the
observed value of ln(rel risk) as the statistic and using the standard error calculated in (p).
(t) What parameter does the confidence interval in (s) estimate?
(u) Exponentiate the endpoints of this interval to obtain a confidence interval for the ratio of the
probabilities of developing a peanut allergy between these two treatments. Interpret this interval.
(v) Is zero in this interval? Do we care? What value is of interest instead?
Annotated work:
            Group 1   Group 2
Success        A         B
Failure        C         D
Total         A+C       B+D
SE = sqrt(1/A − 1/(A+C) + 1/B − 1/(B+D)) ≈ 0.762
statistic ± z* × SE: ln(5.861) ± 1.96(0.762) = 1.768 ± 1.493, giving (0.275, 3.261) for ln(rel risk)
95% CI for rel risk: (e^0.275, e^3.261) ≈ (1.32, 26.1)
(w) Is the midpoint of this confidence interval for the population relative risk equal to the observed
value of the sample relative risk? Explain why this makes sense.
(x) Compare the confidence interval you just calculated to the one given by the applet if you now check the 95% CI for relative risk box.
(y) Suppose you used this method to construct a confidence interval for each of the 1,000 simulated random samples that you generated in (f). Because our simulation assumes the null hypothesis to be true, do you expect the value 1 to be in these intervals? All of them? Most of them? What percentage of them? Explain.
Study Conclusions
This study provided strong evidence that children with pre-existing sensitivity to peanut extract are more likely to develop a peanut allergy by 5 years of age if they avoid consuming peanuts (exact one-sided p-value = 0.0074, z-score = 2.66). An approximate 95% confidence interval for the difference in the probabilities indicates that the probability of developing a peanut allergy is 0.013 (1.3 percentage points) to 0.087 (8.7 percentage points) higher for those avoiding peanuts.
However, focusing on the difference in "success" probabilities has some limitations. In particular, if the probabilities are small it may be difficult for us to interpret the magnitude of the difference between the values. Also, we have to be very careful with our language, focusing on the difference in the allergy probabilities and not the percentage change. An alternative to examining a confidence interval for the difference in the conditional probabilities is to construct a confidence interval for the relative risk (ratio of conditional probabilities). A large sample approximation exists for a z-interval for the ln(relative risk), which can then be back-transformed to an interval for the long-run relative risk. Many practitioners prefer focusing on this ratio parameter rather than the difference. From this study, we are 95% confident that the ratio of the peanut allergy probabilities is between 1.32 and 26.08. This means that avoiding peanuts rather than some consumption raises the probability of developing a peanut allergy by between 32% and 2,508%.
Note: It can be risky to interpret the relative risk in isolation without considering the absolute risks (conditional proportions) as well. For example, doubling a very small probability may not be noteworthy, depending on the context. You should also note that the percentage change calculation and interpretation depend on which group (e.g., treatment or control) is used as the reference group.
Practice Problem 3.9A
(a) For the peanut allergy study, find a 95% confidence interval for the probability of not developing a
peanut allergy comparing the consumption treatment to the avoidance treatment.
(b) Provide a one-sentence interpretation of the interval in (a).
(c) Is the interval the same or closely related to the one you found in Investigation 3.9? Does one interval provide stronger evidence of a treatment effect? Which interval would you report to new parents?
(d) The article reports that "the power to detect a difference in risk of 30 percentage points was 80.0%." Explain what this means in your own words.
Practice Problem 3.9B
A multicenter, randomized, double-blind trial involved patients aged 36-65 years who had knee injuries consistent with a degenerative medial meniscus tear (Sihvonen et al., New England Journal of Medicine, 2013). Patients received either the most common orthopedic procedure (arthroscopic partial meniscectomy, n1 = 70) or sham surgery that simulated the sounds, sensations, and timing of the real surgery (n2 = 76). After 12 months, 54 of those in the treatment group reported satisfaction, compared to 53 in the sham surgery group.
(a) Calculate and interpret a confidence interval for the ratio of the probabilities (relative risk) of satisfaction for these two procedures.
(b) What does your interval in (a) indicate about whether those receiving the orthopedic surgery are significantly more likely than those receiving a sham surgery to report satisfaction after 12 months? Explain your reasoning.
Summary of Inference for "Relative Risk"
Statistic: ratio of conditional proportions (typically set up to be larger than one) = $\hat{p}_1/\hat{p}_2$
Hypotheses: H0: $\pi_1/\pi_2 = 1$; Ha: $\pi_1/\pi_2$ <, >, or ≠ 1
p-value: Fisher's Exact Test or normal approximation on ln($\hat{p}_1/\hat{p}_2$)
Confidence interval for $\pi_1/\pi_2$: exponentiate endpoints of
$\ln(\hat{p}_1/\hat{p}_2) \pm z^* \sqrt{\dfrac{1}{A} - \dfrac{1}{A+C} + \dfrac{1}{B} - \dfrac{1}{B+D}}$
Note: The confidence interval for the relative risk will not necessarily be symmetric around the statistic.
Technology Detour: Simulating Random Assignment (two-way tables)
We can select observations from a hypergeometric distribution for the cell 1 counts and then compute the cell 2 counts and the number of failures based on the fixed row and column totals. With this information you can compute the difference in conditional proportions, relative risk, etc. We show how to calculate $\hat{p}_{unvac}$ below; the rest is up to you. Also keep in mind that you can use "log" to calculate the natural logs of values, and recall how you created a Boolean expression in Investigation 3.1 to find the p-value from the simulated results.
In R
> VacInfCount = rhyper(10000, 210, 4985, 2584)
- 210 is the number of successes (M)
- 4985 is the number of failures (N − M)
- 2584 is the sample size (n)
> UnvacInfCount = 210 - VacInfCount
> Unvacphat = UnvacInfCount/2584
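As a complete illustration of the same idea, here is a minimal sketch applied to the yawning study from Investigation 3.7 (14 yawners and 36 non-yawners in total, 34 subjects assigned to the yawn-seed group). The variable names and the seed value are our own choices, not from the text. The small tolerance in the Boolean expression guards against rounding issues when comparing simulated and observed statistics.

```r
set.seed(1)    # arbitrary seed, used only to make the sketch reproducible
reps <- 10000

# Number of yawners that land in the seed group under random assignment:
# 14 successes (yawners), 36 failures, 34 subjects dealt to the seed group
seed_yawn   <- rhyper(reps, 14, 36, 34)
noseed_yawn <- 14 - seed_yawn            # remaining yawners go to the no-seed group

# Simulated and observed differences in conditional proportions
sim_diff <- seed_yawn / 34 - noseed_yawn / 16
obs_diff <- 10 / 34 - 4 / 16

# Boolean expression: proportion of shuffles at least as extreme as observed
p_sim <- mean(sim_diff >= obs_diff - 1e-8)

# Exact hypergeometric tail probability P(X >= 10) for comparison
p_exact <- phyper(9, 14, 36, 34, lower.tail = FALSE)

print(c(simulated = p_sim, exact = p_exact))
```

The simulated and exact values should agree to within simulation error; replacing `sim_diff` with a simulated relative risk or odds ratio changes the statistic but not the p-value, because each statistic is a monotone function of the cell 1 count.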
Investigation 3.10: Smoking and Lung Cancer
After World War II, evidence began mounting that there was a link between cigarette smoking and
pulmonary carcinoma (lung cancer). In the 1950s, three now classic articles were published on the topic.
One of these studies was conducted in the United States by Wynder and Graham ("Tobacco Smoking as a Possible Etiologic Factor in Bronchiogenic Cancer," 1950, Journal of the American Medical Association). They found records from a large number of patients with a specific type of lung cancer in
hospitals in California, Colorado, Missouri, New Jersey, New York, Ohio, Pennsylvania, and Utah. Of
those in the study, the researchers focused on 605 male patients with this form of lung cancer. Another
780 male hospital patients with similar age and economic distributions without this type of lung cancer
were interviewed in St. Louis, Boston, Cleveland, and Hines, IL. Subjects (or family members) were
interviewed to assess their smoking habits, occupation, education, etc. The table below classifies them as
non-smoker or light smoker, or at least a moderate smoker.
Wynder and Graham | None or Light smoker (0-9 per day) | Moderate to Heavy smoker (10-35+ per day) | Total
Lung cancer patients | 22 | 583 | 605
Controls | 204 | 576 | 780
Total | 226 | 1159 | 1385
(a) Calculate and interpret the relative risk of being a lung cancer patient for the moderate to heavy ("regular") smokers compared to the none or light "non-smokers."
(b) Does this feel like an impressive statistic to you?
Do you think it will be statistically significant?
(c) What is the estimate of the baseline rate of lung cancer from this table? Does that seem to be a
reasonable estimate to you? How is this related to the design of the study?
(d) Calculate and interpret the relative risk of being a control patient for the non-smokers compared to
the regular smokers.
How does this compare to (a) and (b)?
[Handwritten notes: (a) RR = (583/1159)/(22/226) ≈ 5.17. (c) Baseline rate: 605/1385 ≈ 0.437.]
Definition: There are three main types of observational studies.
- Cross-classification study. The researchers categorize subjects according to both the explanatory and the response variable simultaneously. For example, they could take a sample of adult males and simultaneously record both their smoking status and whether they have lung cancer. A common design is cross-sectional, where all observations are taken at a fixed point in time.
- Cohort study. The researchers identify individuals according to the explanatory variable and then observe the outcomes of the response variable. These are usually prospective designs and may even follow the subjects (the cohort) for several years.
- Case-control study. The researchers identify observational units in each response variable category (the "cases" and the "controls") and then determine the explanatory variable outcome for each observational unit. How the controls are selected is very important in determining the comparability of the groups. These are often retrospective designs in that the researchers may need to "look back" at historical data on the observational units.
(e) Would you classify the Wynder & Graham study as cross-classified, cohort, or case-control?
Explain.
(f) Explain why using the relative risk (or even the difference in proportions) as the statistic can be
problematic with case-control studies.
An advantage of case-control studies is that when you are studying a "rare event," you can ensure a large enough number of "successes" and fairly balanced group sizes. However, a disadvantage is that it does not make sense to calculate "risk" or likelihood of success from a case-control study, because the distribution of the response variable has been manipulated/determined by the researcher. Switching the roles of the explanatory and response variables often gives very different results for relative risk (changing our measure of the strength of the relationship) and often really isn't the comparison of interest stated by the research question. Consequently, conditional proportions of success and relative risk are not appropriate statistics to use with case-control studies. Instead, we will consider another way to compare the uncertainty of an outcome between two groups.
Definition: The odds of success are defined as the ratio of the proportion of "successes" to the proportion of "failures," which simplifies to the ratio of the number of successes to the number of failures.
$$\text{odds} = \frac{\text{proportion of successes in the group}}{\text{proportion of failures in the group}} = \frac{\text{number of successes in the group}}{\text{number of failures in the group}}$$
For example, if the odds are 2-to-1 in favor of an outcome, we expect a success twice as often as a failure in the long run, so this corresponds to a probability of 2/3 of the outcome occurring. Similarly, if the probability of success is 1/10, then the odds equal (1/10)/(9/10) = 1/9, and failure is 9 times more likely than success. It's important to note how the "outcome" is defined. For example, in horse racing, odds are typically presented in terms of "losing the race," so if a horse is given 2-to-1 odds against winning a race, we expect the horse to lose two-thirds of the races in the long run.
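The conversions in this example can be checked with two one-line R functions. The names `odds_from_prob` and `prob_from_odds` are hypothetical helpers introduced only for this sketch.

```r
# Convert a probability of success to odds of success, and back
odds_from_prob <- function(p) p / (1 - p)
prob_from_odds <- function(odds) odds / (1 + odds)

odds_small <- odds_from_prob(1/10)   # probability 1/10 gives odds of 1/9
prob_2to1  <- prob_from_odds(2)      # 2-to-1 odds in favor gives probability 2/3

print(c(odds_small = odds_small, prob_2to1 = prob_2to1))
```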
Definition: The odds ratio is another way to compare conditional proportions in a 2 × 2 table.
$$\text{Odds ratio} = \frac{\text{number of successes in group 1}/\text{number of failures in group 1}}{\text{number of successes in group 2}/\text{number of failures in group 2}}$$
Like relative risk, if the odds ratio is 3, this is interpreted as "the odds of success in the 'top' group are 3 times (or 200%) higher than the odds of success in the 'bottom' group." However, the relative risk and the odds ratio are not always similar in value.
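To see when the two statistics agree and when they do not, here is a small sketch using made-up success probabilities (0.02 vs. 0.01, and 0.60 vs. 0.30); these values are hypothetical and chosen only so that the relative risk is 2 in both cases.

```r
odds <- function(p) p / (1 - p)

# Relative risk and odds ratio for two success probabilities
compare <- function(p1, p2) {
  c(rel_risk = p1 / p2, odds_ratio = odds(p1) / odds(p2))
}

small <- compare(0.02, 0.01)   # both proportions small
large <- compare(0.60, 0.30)   # both proportions large

print(rbind(small, large))
```

With small proportions the odds ratio is close to the relative risk of 2, whereas with large proportions it is 3.5, because 1 − p is no longer close to 1 in either group.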
(g) Calculate and interpret the odds ratio comparing the odds of lung cancer for the smokers to the odds
of lung cancer for the control group.
Does this match (a)?
(h) Calculate and interpret the odds ratio for being in the control group for the non-smokers compared to the smokers. Does this match (d) or (g)?
Key Results: A major disadvantage of relative risk is that your (descriptive) measure of the strength of evidence that one group is "better" depends on which outcome you define as a success as well as which variable you treat as the explanatory and which as the response. But a big advantage of the odds ratio is that it is invariant to these definitions. (If your odds of dying from lung cancer are 10 times higher if you are a smoker, then your odds of being a smoker are 10 times higher if you died from lung cancer.) The only real disadvantage is that the odds ratio is trickier to interpret ("higher odds" vs. the more natural "more likely"). Thus, for case-control studies in particular, the odds ratio is the preferred statistic. However, when the success proportions are both small, the odds ratio can be used to approximate the relative risk.
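The invariance can be checked numerically with the Wynder and Graham table. The sketch below is our own illustration (the object names are hypothetical): it computes the relative risk two ways, which disagree, and the odds ratio with the roles of the variables switched, which agrees.

```r
# Wynder and Graham counts: rows are the response, columns the smoking status
wg <- matrix(c(22, 204, 583, 576), nrow = 2,
             dimnames = list(c("cancer", "control"),
                             c("none_light", "regular")))
col_tot <- colSums(wg)

# Relative risk of being a cancer patient, regular vs. none/light smokers
rr_cancer <- unname((wg["cancer", "regular"] / col_tot["regular"]) /
                    (wg["cancer", "none_light"] / col_tot["none_light"]))

# Relative risk of being a control, none/light vs. regular smokers
rr_control <- unname((wg["control", "none_light"] / col_tot["none_light"]) /
                     (wg["control", "regular"] / col_tot["regular"]))

# Odds ratio of cancer, comparing regular to none/light smokers
or_by_smoking <- (wg["cancer", "regular"] / wg["control", "regular"]) /
                 (wg["cancer", "none_light"] / wg["control", "none_light"])

# Odds ratio of being a regular smoker, comparing cancer patients to controls
or_by_disease <- (wg["cancer", "regular"] / wg["cancer", "none_light"]) /
                 (wg["control", "regular"] / wg["control", "none_light"])

print(c(rr_cancer = rr_cancer, rr_control = rr_control,
        or_by_smoking = or_by_smoking, or_by_disease = or_by_disease))
```

The two relative risks are about 5.17 and 1.82, while both odds ratios are about 9.385.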
(i) Let τ ("tau") represent the population odds ratio of having lung cancer for those who are regular smokers compared to those who are not regular smokers, so $\tau = [\pi_1/(1 - \pi_1)]/[\pi_2/(1 - \pi_2)]$. State the null and alternative hypotheses in terms of this parameter.
(j) Use Fisher's Exact Test to calculate the p-value. (Note: We get the same p-value no matter which statistic we use; why is that?)
[Handwritten notes: odds ratio = (583/576)/(22/204) ≈ 9.385. H0: τ = 1; Ha: τ > 1. p-value = P(X ≤ 22), where X is hypergeometric with N = 1385, M = 605, n = 226; the p-value is very, very small.]
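For part (j), the one-sided p-value is a hypergeometric tail probability, which can be sketched in R in two equivalent ways. The variable names are ours. Note the direction: the first cell of the matrix below counts lung cancer patients among the none/light smokers, so an odds ratio above one for regular smokers corresponds to `alternative = "less"` for this arrangement of the table.

```r
# P(X <= 22): 605 cancer patients ("successes"), 780 controls ("failures"),
# and 226 none/light smokers "dealt out" at random
p_hyper <- phyper(22, 605, 780, 226)

# Same calculation through Fisher's Exact Test; columns are
# none/light smokers (22 cancer, 204 control) and regular smokers (583, 576)
tab <- matrix(c(22, 204, 583, 576), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "less")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

Both calls depend only on the count in one cell given the fixed margins, which is why the p-value is the same whether we summarize the table with a difference in proportions, a relative risk, or an odds ratio.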
But we still need a confidence interval for this new parameter as well.
Theoretical Result: The sampling distribution of the sample odds ratio also follows a log-normal distribution, like the relative risk (for any study design). Thus, we can construct a confidence interval for the population/treatment log-odds ratio using the normal distribution. The standard error of the sample log-odds ratio (using the natural log) is given by the expression:
$$SE(\ln \text{odds ratio}) = \sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}}$$
where A, B, C, and D are the four table counts.
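A minimal sketch of this calculation in R, applied to the Wynder and Graham counts, is shown below. The function name `or_ci` is hypothetical, and we assume the counts are arranged so that the odds ratio is (A × D)/(B × C) and exceeds one.

```r
# Sketch: normal-approximation confidence interval for an odds ratio
or_ci <- function(A, B, C, D, conf = 0.95) {
  or     <- (A * D) / (B * C)                 # sample odds ratio
  se     <- sqrt(1/A + 1/B + 1/C + 1/D)       # SE of ln(odds ratio)
  z      <- qnorm(1 - (1 - conf) / 2)         # critical value z*
  log_ci <- log(or) + c(-1, 1) * z * se       # interval for the log-odds ratio
  list(or = or, se_log = se, log_ci = log_ci, ci = exp(log_ci))
}

# A = 583 regular smokers with cancer,  B = 22 none/light smokers with cancer,
# C = 576 regular smokers (controls),   D = 204 none/light smokers (controls)
wg_ci <- or_ci(583, 22, 576, 204)
print(wg_ci)
```

For these counts the standard error is about 0.232, the interval for the log-odds ratio is roughly (1.784, 2.694), and the back-transformed interval is roughly (5.96, 14.79). Software that uses a different method (for example, an exact conditional approach) can report somewhat different endpoints.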
(k) Calculate this standard error and then use it to find an approximate 95% confidence interval for the
log odds ratio.
(l) Back-transform the end-points of the interval in (k) and interpret your results.
(m) Does your interval contain the value one?
Discuss the implications of whether or not the interval
contains the value one.
(n) Compare your results to the following JMP output:
[Handwritten notes: SE(ln odds ratio) = sqrt(1/22 + 1/583 + 1/204 + 1/576). 95% CI for ln odds ratio: ln(9.385) ± z* × SE ⇒ (1.784, 2.694).]
(o) Summarize (with justification) the conclusions you would draw from this study (using both the p-value and the confidence interval, and addressing both the population you are willing to generalize to and whether or not you are drawing a cause-and-effect conclusion).
Study Conclusions
Because the baseline incidence of lung cancer in the population is so small, the researchers conducted a case-control study to ensure they would have both patients with and without lung cancer in their study (matched by age and economic status). In a case-control study, the odds ratio is a more meaningful statistic to compare the incidence of lung cancer between the two groups. We find that the sample odds of lung cancer are almost ten times larger for the regular smokers compared to the non-regulars in this study. By the invariance of the odds ratio, this also tells us that the odds of being a regular smoker (rather than not) are almost 10 times higher for those with lung cancer. We are 95% confident that in the larger populations represented by these samples, the odds of lung cancer are 5.92 to 15.52 times larger for the regular smokers (Fisher's Exact Test p-value << 0.001). If both success proportions had been small, we could say this is approximately equal to the relative risk and use the words "10 times higher" or "10 times more likely." The full data set (which broke down the second category further) also shows that the odds of having lung cancer increase with the amount of smoking (light smokers have 2 times the odds, heavy smokers have 11 times the odds, and chain smokers have 29 times the odds!); this is called a "dose-response." We see a strong relationship between the size of the "dose" of smoking and occurrence of lung cancer for these patients.
However, this study was criticized for "retrospective bias" in asking subjects to accurately remember, and be willing to tell, details of their lifestyles. This can also be complicated by asking these questions of patients who know they have been diagnosed with lung cancer, as their recall may be affected by this knowledge. We also have to worry whether hospitalized males are representative of the male population.
Other studies around the same time (e.g., Hammond and Horn, Wynder and Cornfield) found similar increases in "risk" with smoking. However, these were all observational studies, so critics reasonably argued that other variables such as lifestyle, diet, exercise, and genetics could be responsible for both the smoking habits and the development of lung cancer. Although there was still much (on-going) research to be done, and these studies did not claim to prove that cigarette smoking causes lung cancer, these landmark studies set the stage. They also led to many efforts in improving study design and in developing statistical tools (such as relative risk and odds ratios) to analyze the results.
Practice Problem 3.10A
A researcher searched court records to find 908 individuals who had been victims of abuse as children (11 years or younger). She then found 667 individuals, with similar demographic characteristics, who had not been abused as children. Based on a search through subsequent years of court records, she determined how many in each of these groups became involved in violent crimes (Widom, 1989). The results are shown below:

 | Abuse victim | Control
Involved in violent crime | 102 | 53
Not involved in violent crime | 806 | 614

(a) Is this an observational study or an experiment? If observational, which type?
(b) Calculate and interpret the odds ratio of being involved in a violent crime between these two groups.
(c) The one-sided p-value for this result (using Fisher's Exact Test) is 0.018 (confirm). Is it reasonable to conclude that being a victim of abuse as a child causes individuals to be more likely to be violent toward others afterwards? Explain.
(d) Calculate and interpret a 95% confidence interval for the population odds ratio.
(e) Is it reasonable to generalize these results to all abuse and non-abuse victims? Explain.
Practice Problem 3.10B
(a) Suppose that individuals in Group 1 have a 2/3 probability of success, and those in Group 2 have a
1/2 probability of success. Calculate and interpret the relative risk of success, comparing Group 1 to
Group 2.
(b) Calculate and interpret the odds of success for Group 1.
(c) Calculate and interpret the odds ratio of success, comparing Group 1 to Group 2.
(d) Suppose Group 3 has a 0.1 probability of success, and Group 4 has a 0.05 probability of success. Repeat questions (a) and (c).
(e) In which case (Groups 1 and 2, or Groups 3 and 4) are the relative risk and odds ratio more similar?
Why?
Summary of Inference for Odds Ratio
Statistic: $\hat{\tau} = [\hat{p}_1/(1 - \hat{p}_1)]/[\hat{p}_2/(1 - \hat{p}_2)] = (A \times D)/(B \times C)$ (typically set up to be larger than one)
Hypotheses: H0: $\tau = 1$; Ha: $\tau <$, $>$, or $\neq 1$
p-value: Fisher's Exact Test or normal approximation on $\ln(\hat{\tau})$
Confidence interval for $\tau$: exponential of
$$\ln(\hat{\tau}) \pm z^* \sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}}$$
In R:
> fisher.test(matrix(c(a, c, b, d), nrow=2), alt = )
Table layout:
A | B
C | D
Investigation 3.11: Sleepy Drivers
Connor et al. (British Medical Journal, May 2002) reported on a study that investigated whether sleeplessness is related to car crashes. The researchers identified all drivers or passengers of eligible light vehicles who were admitted to a hospital or died as a result of a car crash on public roads in the Auckland, New Zealand region between April 1998 and July 1999. Through cluster sampling, they identified a sample of 571 drivers who had been involved in a crash resulting in injury and a sample of 588 drivers who had not been involved in such a crash as representative of people driving on the region's roads during the study period. The researchers asked the individuals (or proxy interviewees) whether they had a full night's sleep (at least seven hours mostly between 11pm and 7am) any night during the previous week. The researchers found that 61 of the 535 crash drivers who responded and 44 of the 588 "no crash" drivers had not gotten at least one full night's sleep in the previous week.
(a) Identify the observational units and variables in this study.
Which variable would you consider the
explanatory variable and which the response variable?
Was this an observational study or an
experiment? If observational, would it be considered a case-control, cohort, or cross-classified design?
Observational units:
Explanatory variable:
Response variable:
Type of study:
(b) Organize these sample data into a 2 × 2 table:
 | No full night's sleep in past week ("sleep deprived") | At least one full night's sleep in past week ("not sleep deprived") | Sample sizes
Crash | | | 535
No crash | | | 588
Total | | | 1123
(c) Which statistic (odds ratio or relative risk) is most appropriate to calculate from this table, considering how the data were collected? Calculate and interpret this statistic. Does the value of this statistic support the researchers' conjecture? Explain.
[Handwritten notes: table entries: Crash: 61 and 535 − 61 = 474; No crash: 44 and 588 − 44 = 544; Total: 105 and 1018. Odds ratio = (61/474)/(44/544) ≈ 1.591.]
Statistical Inference
(d) Outline the steps of a simulation that models the randomness in this study and helps you assess how
unusual the statistic is that you calculated in (c) when the null hypothesis is true. Include a statement of
the null and alternative hypotheses for your choice of parameter.
(e) Use technology to carry out your simulation and draw your conclusions. [Hint: Be careful of rounding issues in finding your p-value; make sure you are including observations as extreme as the observed in your count.]
(f) Calculate and interpret a 95% confidence interval for your choice of parameter.
(g) Summarize (with justification) the conclusions you would draw from this study (using both the p-value and the confidence interval, and addressing both the population you are willing to generalize to and whether or not you are drawing a cause-and-effect conclusion).
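One way to carry out the simulation in (d) and (e) is sketched below; the variable names, the seed, and the tolerance are our own choices. Under the null hypothesis we treat the 105 sleep-deprived drivers and 1018 others as fixed and "deal" 535 of the 1123 drivers to the crash group at random, recomputing the odds ratio each time. The small tolerance in the comparison addresses the rounding issue mentioned in the hint.

```r
set.seed(2)    # arbitrary seed, used only to make the sketch reproducible
reps <- 10000

# Sleep-deprived drivers landing in the crash group under the null:
# 105 successes (sleep deprived), 1018 failures, 535 dealt to the crash group
crash_dep   <- rhyper(reps, 105, 1018, 535)
nocrash_dep <- 105 - crash_dep

# Simulated and observed odds ratios (crash group relative to no-crash group)
sim_or <- (crash_dep / (535 - crash_dep)) /
          (nocrash_dep / (588 - nocrash_dep))
obs_or <- (61 / 474) / (44 / 544)

# Empirical one-sided p-value, with a tolerance for floating-point rounding
p_sim <- mean(sim_or >= obs_or - 1e-8)

# Exact hypergeometric tail probability P(X >= 61) for comparison
p_exact <- phyper(60, 105, 1018, 535, lower.tail = FALSE)

print(c(observed_or = obs_or, simulated = p_sim, exact = p_exact))
```

The empirical p-value should be close to the exact one-sided value, apart from simulation error.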
[Handwritten notes: H0: τ = 1; Ha: τ > 1. SE(ln τ̂) = sqrt(1/61 + 1/474 + 1/44 + 1/544) ≈ 0.2075. 95% CI for ln τ: ln(1.59) ± 1.96(0.2075) ⇒ (0.057, 0.871). 95% CI for τ: (e^0.057, e^0.871) = (1.06, 2.39).]
Study Conclusions
The proportions of drivers who had not gotten a full night's sleep in the previous week were 0.107 for the case group of drivers who had been involved in a crash, compared to 0.075 for the control group who had not. Because these proportions are small, and because of the awkward roles of the explanatory and response variables in this study (we would much rather make a statement about the proportion of sleepless drivers who are involved in crashes), the odds ratio is a more meaningful statistic to calculate. The sample odds of having missed out on a full night's sleep were 1.59 times higher for the case group than for the control group. By the invariance of the odds ratio, we can also state that the sample odds of having an accident are 1.59 times (almost 60%) higher for those who do not get a full night's sleep than for those who do. The empirical p-value (less than 5%) provides moderately strong evidence that such an extreme value for the sample odds ratio is unlikely to have arisen by chance alone if the proportion of drivers with sleepless nights was 0.09 for both the population of "cases" and the population of "controls." (Using a one-sided Fisher's Exact Test, we get p-value = 0.016.) A 95% confidence interval for the population odds ratio extends from 1.06 to 2.39 (1.04 to 2.45 with R). This interval provides statistically significant evidence that the population odds ratio exceeds one and that, with 95% confidence, the odds of having an accident are about 1 to 2.5 times higher for the sleepy drivers than for well-rested drivers. We cannot attribute this association to a cause-and-effect relationship because this was an observational (case-control) study. We might also want to restrict our conclusions to New Zealand drivers.
Practice Problem 3.11
Another landmark study on smoking began in 1952 (Hammond and Horn, 1958, "Smoking and death rates - Report on forty-four months of follow-up of 187,783 men: II. Death rates by cause," JAMA). They used 22,000 American Cancer Society volunteers as interviewers. Each interviewer was to ask 10 healthy white men between the ages of 50 and 69 to complete a questionnaire on smoking habits. Each year during the 44-month follow-up, the interviewer reported whether or not the man had died, and if so, how. They ended up tracking 187,783 men in nine states (CA, IL, IA, MI, MN, NJ, NY, PA, WI). Almost 188,000 were followed up by the volunteers through October 1955, during which time about 11,870 of the men had died, 488 from lung cancer. The following table classifies the men as having a history of regular cigarette smoking or not and whether or not they died from lung cancer. In this study, nonsmokers are grouped with occasional smokers, including pipe- and cigar-only smokers.
Hammond and Horn | Not regular smoker | Regular smoker | Total
Lung cancer death | 51 | 397 | 448
Alive or other cause of death | 108,778 | 78,557 | 187,335
Total | 108,829 | 78,954 | 187,783
(a) Is this a case-control, cohort, or cross-classified study?
(b) Calculate and interpret an odds ratio from the two-way table.
(c) Produce and interpret a 95% confidence interval for the population odds ratio.
(d) Are these results consistent with the Wynder and Graham study? Explain.