Disease status and number of cigarettes smoked per day for 2225 males. For a data set with five rows and two columns, the p-value is <0.05 if X² is greater than 9.488. (Data from RR Sokal and FJ Rohlf's Biometry, 3rd ed.)

Disease status   Never smoked   1-10   11-20   21-40   >41
Lung cancer                15     36     133     226   127
Healthy                   822    136     328     311    91
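The data form a 5x2 contingency table: five smoking categories (never smoked, 1-10, 11-20, 21-40, and >41 cigarettes per day) by two disease statuses (lung cancer, healthy). To test the hypothesis that smoking is associated with lung cancer, we can perform a chi-square test of independence with (5 - 1) x (2 - 1) = 4 degrees of freedom, which is where the critical value of 9.488 comes from.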
First, we need to calculate the expected frequency for each cell of the table under the null hypothesis of no association between smoking and disease status. For each cell, we multiply the total for its smoking category by the total for its disease status and divide by the total sample size. For example, the expected frequency for the cell corresponding to "Never smoked" and "Lung cancer" is:
Expected frequency = ("Never smoked" total) x ("Lung cancer" total) / (total sample size) = (15 + 822) x (15 + 36 + 133 + 226 + 127) / 2225 = 837 x 537 / 2225 = 202.01
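The single-cell calculation can be checked with a short Python sketch. The counts come from the table in the question; the function and variable names are illustrative choices, not part of the original problem.

```python
# Totals taken from the observed counts in the question.
never_smoked_total = 15 + 822                   # all men who never smoked
lung_cancer_total = 15 + 36 + 133 + 226 + 127   # all men with lung cancer
grand_total = 2225                              # total sample size


def expected_count(category_total, status_total, n):
    """Expected cell count if smoking and disease status were independent."""
    return category_total * status_total / n


expected = expected_count(never_smoked_total, lung_cancer_total, grand_total)
print(round(expected, 2))  # 202.01
```

The observed count for this cell is 15, far below the roughly 202 expected under independence.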
We can repeat this calculation for all the cells to get the expected frequencies:
Expected frequencies by disease status
Cigarettes per day   Lung cancer   Healthy   Total
Never smoked              202.01    634.99     837
1-10                       41.51    130.49     172
11-20                     111.26    349.74     461
21-40                     129.60    407.40     537
>41                        52.61    165.39     218
Total                     537.00   1688.00    2225
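To carry the test through, the expected frequencies can be computed for every cell and combined into the X² statistic, the sum of (observed - expected)² / expected over all cells. The following is a minimal sketch in plain Python, assuming the observed counts as read from the question; the dictionary layout and variable names are illustrative only.

```python
# Observed counts per smoking category: (lung cancer, healthy).
observed = {
    "Never smoked": (15, 822),
    "1-10": (36, 136),
    "11-20": (133, 328),
    "21-40": (226, 311),
    ">41": (127, 91),
}

# Totals for each disease status and the overall sample size.
status_totals = [sum(counts[i] for counts in observed.values()) for i in range(2)]
n = sum(status_totals)

expected = {}
chi_square = 0.0
for category, counts in observed.items():
    category_total = sum(counts)
    # Expected count = category total x status total / sample size.
    expected[category] = tuple(category_total * s / n for s in status_totals)
    # Add each cell's contribution to the statistic.
    for o, e in zip(counts, expected[category]):
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(status_totals) - 1)
critical_value = 9.488  # 0.05 threshold given in the question

print(df, chi_square > critical_value)  # 4 True
```

Because the statistic exceeds 9.488, the p-value is below 0.05 and the null hypothesis of no association between smoking and lung cancer is rejected.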