The paper “Good for Women, Good for Men, Bad for People: Simpson’s Paradox and the Importance of Sex-Specific Analysis in Observational Studies” (Journal of Women’s Health and Gender-Based Medicine [2001]: 867-872) described the results of a medical study in which one treatment was shown to be better for men and better for women than a competing treatment. However, if the data for men and women are combined, it appears as though the competing treatment is better.
To see how this can happen, consider the accompanying data tables constructed from information in the paper. Subjects in the study were given either Treatment A or Treatment B, and survival was noted. Let S be the
- a. The following table summarizes data for men and women combined:
- i. Find P(S).
- ii. Find P(S|A).
- iii. Find P(S|B).
- iv. Which treatment appears to be better?
- b. Now consider the summary data for the men who participated in the study:
- i. Find P(S).
- ii. Find P(S|A).
- iii. Find P(S|B).
- iv. Which treatment appears to be better?
- c. Now consider the summary data for the women who participated in the study:
- i. Find P(S).
- ii. Find P(S|A).
- iii. Find P(S|B).
- iv. Which treatment appears to be better?
- d. You should have noticed from Parts (b) and (c) that for both men and women, Treatment A appears to be better. But in Part (a), when the data for men and women are combined, it looks like Treatment B is better. This is an example of what is called Simpson’s paradox. Write a brief explanation of why this apparent inconsistency occurs for this data set. (Hint: Do men and women respond similarly to the two treatments?)
a.

i. Compute P(S).
ii. Obtain P(S|A).
iii. Calculate P(S|B).
iv. Find the better treatment.
Answer to Problem 52E
i. The value of P(S) is 0.76.
ii. The value of P(S|A) is 0.717.
iii. The value of P(S|B) is 0.803.
iv. Treatment B is better than Treatment A.
Explanation of Solution
Calculation:
The given information is the summary table for men and women combined. S denotes the event that a randomly selected patient survives, A denotes the event that a randomly selected patient received Treatment A, and B denotes the event that a randomly selected patient received Treatment B.
i.
The probability of any event E is given below:
P(E) = (number of outcomes in E) / (total number of outcomes)
The total number of patients is 600.
The total number of patients who survived is 456.
The probability that a randomly selected patient survives is calculated as follows:
P(S) = 456/600 = 0.76
Thus, the probability that a randomly selected patient survives is 0.76.
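The arithmetic for P(S) can be checked with a few lines of Python. This is only a sketch: the counts are the ones quoted in the solution above, and the variable names are illustrative choices rather than anything from the textbook.

```python
from fractions import Fraction

# Counts quoted in the solution for men and women combined.
total_patients = 600
survivors = 456

# P(S) = (number of survivors) / (total number of patients)
p_survive = Fraction(survivors, total_patients)

print(p_survive)         # exact fraction in lowest terms
print(float(p_survive))  # decimal value
```

Using `Fraction` keeps the ratio exact, so rounding only happens when the decimal value is displayed.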
ii.
Conditional rule:
The formula for the probability of E given F is
P(E|F) = P(E ∩ F) / P(F)
The total number of patients who received Treatment A is 300.
The number of patients who received Treatment A and survived is 215.
The probability that a randomly selected patient survives, given that the patient received Treatment A, is calculated as follows:
P(S|A) = P(S ∩ A) / P(A) = (215/600) / (300/600) = 215/300 ≈ 0.717
Thus, the value of P(S|A) is 0.717.
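As a rough check of the conditional rule, the sketch below computes P(S|A) two ways from the combined counts: through the definition P(S ∩ A) / P(A), and directly as survivors among Treatment A patients divided by all Treatment A patients. The variable names are illustrative assumptions.

```python
from fractions import Fraction

total = 600            # all patients, men and women combined
treated_a = 300        # patients who received Treatment A
survived_and_a = 215   # patients who received Treatment A and survived

# Definition of conditional probability: P(S|A) = P(S and A) / P(A)
p_a = Fraction(treated_a, total)
p_s_and_a = Fraction(survived_and_a, total)
p_s_given_a = p_s_and_a / p_a

# The common denominator (600) cancels, leaving 215/300.
assert p_s_given_a == Fraction(survived_and_a, treated_a)

print(round(float(p_s_given_a), 3))
```

The same pattern gives P(S|B) when the Treatment B counts (241 survivors out of 300) are substituted.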
iii.
The total number of patients who received Treatment B is 300.
The number of patients who received Treatment B and survived is 241.
The probability that a randomly selected patient survives, given that the patient received Treatment B, is calculated as follows:
P(S|B) = P(S ∩ B) / P(B) = (241/600) / (300/600) = 241/300 ≈ 0.803
Thus, the value of P(S|B) is 0.803.
iv.
The survival probability for patients who received Treatment B (0.803) is higher than for patients who received Treatment A (0.717).
Thus, Treatment B is better than Treatment A.
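The comparison in part (a) amounts to computing a survival rate per treatment and picking the larger one. A minimal sketch, assuming the combined counts quoted above; `survival_rate` is a hypothetical helper name:

```python
def survival_rate(survived, treated):
    """Estimate P(S | treatment) as survivors divided by patients treated."""
    return survived / treated

# (survived, treated) for men and women combined, as quoted in part (a).
combined = {"A": (215, 300), "B": (241, 300)}

rates = {name: survival_rate(*counts) for name, counts in combined.items()}
better = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"Treatment {name}: {rate:.3f}")
print("Appears better:", better)
```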
b.

i. Compute P(S).
ii. Obtain P(S|A).
iii. Calculate P(S|B).
iv. Find the better treatment.
Answer to Problem 52E
i. The value of P(S) is 0.583.
ii. The value of P(S|A) is 0.6.
iii. The value of P(S|B) is 0.5.
iv. Treatment A is better than Treatment B.
Explanation of Solution
Calculation:
The given information is the summary table for the men in the study.
i.
The total number of men is 240.
The total number of men who survived is 140.
The probability that a randomly selected man survives is calculated as follows:
P(S) = 140/240 ≈ 0.583
Thus, the probability that a randomly selected man survives is 0.583.
ii.
Conditional rule:
The formula for the probability of E given F is
P(E|F) = P(E ∩ F) / P(F)
The total number of men who received Treatment A is 200.
The number of men who received Treatment A and survived is 120.
The probability that a randomly selected man survives, given that he received Treatment A, is calculated as follows:
P(S|A) = P(S ∩ A) / P(A) = (120/240) / (200/240) = 120/200 = 0.6
Thus, the value of P(S|A) is 0.6.
iii.
The total number of men who received Treatment B is 40.
The number of men who received Treatment B and survived is 20.
The probability that a randomly selected man survives, given that he received Treatment B, is calculated as follows:
P(S|B) = P(S ∩ B) / P(B) = (20/240) / (40/240) = 20/40 = 0.5
Thus, the value of P(S|B) is 0.5.
iv.
Among men, the survival probability for Treatment A (0.6) is higher than for Treatment B (0.5).
Thus, Treatment A is better than Treatment B.
c.

i. Compute P(S).
ii. Obtain P(S|A).
iii. Calculate P(S|B).
iv. Find the better treatment.
Answer to Problem 52E
i. The value of P(S) is 0.878.
ii. The value of P(S|A) is 0.95.
iii. The value of P(S|B) is 0.85.
iv. Treatment A is better than Treatment B.
Explanation of Solution
Calculation:
The given information is the summary table for the women in the study.
i.
The total number of women is 360.
The total number of women who survived is 316.
The probability that a randomly selected woman survives is calculated as follows:
P(S) = 316/360 ≈ 0.878
Thus, the probability that a randomly selected woman survives is 0.878.
ii.
Conditional rule:
The formula for the probability of E given F is
P(E|F) = P(E ∩ F) / P(F)
The total number of women who received Treatment A is 100.
The number of women who received Treatment A and survived is 95.
The probability that a randomly selected woman survives, given that she received Treatment A, is calculated as follows:
P(S|A) = P(S ∩ A) / P(A) = (95/360) / (100/360) = 95/100 = 0.95
Thus, the value of P(S|A) is 0.95.
iii.
The total number of women who received Treatment B is 260.
The number of women who received Treatment B and survived is 221.
The probability that a randomly selected woman survives, given that she received Treatment B, is calculated as follows:
P(S|B) = P(S ∩ B) / P(B) = (221/360) / (260/360) = 221/260 = 0.85
Thus, the value of P(S|B) is 0.85.
iv.
Among women, the survival probability for Treatment A (0.95) is higher than for Treatment B (0.85).
Thus, Treatment A is better than Treatment B.
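With all three tables worked through, a short sketch can confirm that the men's and women's counts add up to the combined counts from part (a), and then repeat the comparison for each table. The dictionary layout and the `better` helper are illustrative assumptions; the numbers are the ones quoted in parts (a) to (c).

```python
# (survived, treated) counts quoted in parts (b) and (c).
counts = {
    "men":   {"A": (120, 200), "B": (20, 40)},
    "women": {"A": (95, 100),  "B": (221, 260)},
}
# Combined counts quoted in part (a).
combined = {"A": (215, 300), "B": (241, 300)}

# Adding the men's and women's counts should reproduce the combined table.
for treatment, expected in combined.items():
    survived = sum(counts[sex][treatment][0] for sex in counts)
    treated = sum(counts[sex][treatment][1] for sex in counts)
    assert (survived, treated) == expected

def better(table):
    """Return the treatment with the higher survival rate in one table."""
    rates = {t: s / n for t, (s, n) in table.items()}
    return max(rates, key=rates.get)

winners = {sex: better(table) for sex, table in counts.items()}
winners["combined"] = better(combined)
print(winners)
```

The result shows Treatment A ahead within each sex and Treatment B ahead in the combined table, which is the reversal that part (d) asks about.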
d.

Explain why the apparent inconsistency occurs in this data set.
Explanation of Solution
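In parts (b) and (c), Treatment A has the higher survival probability for both men (0.6 versus 0.5) and women (0.95 versus 0.85). In part (a), where the data for men and women are combined, Treatment B appears to be better (0.803 versus 0.717). The reversal occurs because men and women do not respond similarly and did not receive the two treatments in similar proportions. Women had much higher survival rates than men under either treatment, and most women received Treatment B (260 of 360), whereas most men received Treatment A (200 of 240). The combined Treatment B group is therefore dominated by women, who survive more often, and the combined Treatment A group is dominated by men, who survive less often. Combining the data mixes the effect of sex with the effect of treatment, so Treatment B looks better overall even though Treatment A is better within each sex.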
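One way to see the reversal numerically is to write each combined survival rate as a weighted average of the sex-specific rates, where the weights are the shares of men and women within each treatment group. The sketch below uses the counts quoted in parts (b) and (c); `combined_rate` is a hypothetical helper name, not something from the textbook.

```python
from fractions import Fraction

# (survived, treated) counts quoted in parts (b) and (c).
counts = {
    "men":   {"A": (120, 200), "B": (20, 40)},
    "women": {"A": (95, 100),  "B": (221, 260)},
}

def combined_rate(treatment):
    """P(S | treatment) as a weighted average of the sex-specific rates."""
    total_treated = sum(counts[sex][treatment][1] for sex in counts)
    rate = Fraction(0)
    for sex in counts:
        survived, treated = counts[sex][treatment]
        within_rate = Fraction(survived, treated)   # P(S | treatment, sex)
        weight = Fraction(treated, total_treated)   # P(sex | treatment)
        rate += within_rate * weight
    return rate

rate_a = combined_rate("A")   # 0.6 * (200/300) + 0.95 * (100/300)
rate_b = combined_rate("B")   # 0.5 * (40/300)  + 0.85 * (260/300)

print(round(float(rate_a), 3), round(float(rate_b), 3))
```

Treatment A's average puts two thirds of its weight on the men's lower rate, while Treatment B's average puts most of its weight on the women's higher rate, which is why B comes out ahead in the combined table.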
Chapter 6 Solutions
INTRODUCTION TO STATISTICS & DATA ANALYS