Practice Problems Module 13 (Solutions)

P6104 Practice Problems (Solutions)
Module 13: Effect Modification in Multiple Linear Regression and Introduction to Logistic Regression

1. An ecological study examined the effect of water fluoridation on tooth decay in 5-year-old children using data collected at the level of the electoral ward (in Canada). The electoral wards included were in three areas where the water supply was either unfluoridated, artificially fluoridated, or naturally fluoridated. A multiple linear regression model was fitted with mean tooth decay in the ward as the outcome and with predictors Jarman underprivileged area score for each ward and fluoridation status (unfluoridated, artificially fluoridated, or naturally fluoridated). A high Jarman score indicates an area with high deprivation. The authors reported that there was a significant interaction between the effects of Jarman score and water fluoridation on tooth decay. A graph similar to this was given (Jones et al., 1997).

[Figure not reproduced.]

(a) What is meant by interaction?

In this example, interaction means that the mean change in tooth decay score for a one-unit increase in ward Jarman score differs depending on the fluoridation status of the ward.

(b) How would you interpret a statistically significant interaction here?

If the interaction were statistically significant in this example, we would conclude that the slopes differ: mean tooth decay rises much more steeply with ward Jarman score in unfluoridated areas than in artificially or naturally fluoridated areas, where it rises more gradually. In other words, the association between deprivation and tooth decay depends on fluoridation status.
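One way to make the idea of interaction concrete is to write out a model with product terms. The parameterization below is only a sketch and is not taken from Jones et al.; the symbols $J$ (ward Jarman score), $A$ and $N$ (hypothetical indicators for artificially and naturally fluoridated wards, with unfluoridated wards as the reference group) are assumptions made for illustration.

```latex
% Hypothetical parameterization: J = Jarman score,
% A = 1 if artificially fluoridated, N = 1 if naturally fluoridated (else 0).
\[
E[\text{decay}] = \beta_0 + \beta_1 J + \beta_2 A + \beta_3 N
                + \beta_4 (J \times A) + \beta_5 (J \times N)
\]
% Slope of mean tooth decay with respect to J in each group:
\[
\text{unfluoridated: } \beta_1, \qquad
\text{artificial: } \beta_1 + \beta_4, \qquad
\text{natural: } \beta_1 + \beta_5
\]
```

Under this sketch, "no interaction" corresponds to $\beta_4 = \beta_5 = 0$, that is, three parallel lines. A significant interaction means at least one group has a different slope.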

2. The data set "lowbwt" contains information for a sample of 100 low birth weight infants born in two teaching hospitals in Boston, Massachusetts. Systolic blood pressure measurements are saved under the variable name sbp, gestational ages under gestage, the five-minute Apgar score under apgar5, and the sex of each infant under sex (0 = female; 1 = male).

    sbpdat <- read.table("lowbwt.txt", header = TRUE)
    sbp <- sbpdat$sbp
    gestage <- sbpdat$gestage
    sex <- sbpdat$sex

(a) Fit a multiple linear regression model with sbp as the response and gestational age, sex, and the product of gestational age and sex as predictors. Write the estimated least squares regression line.

    linreg <- lm(sbp ~ gestage + sex + gestage*sex)
    summary(linreg)

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)   14.9805    15.2419   0.983   0.3282
    gestage        1.0903     0.5254   2.075   0.0406 *
    sex          -15.1570    27.7433  -0.546   0.5861
    gestage:sex    0.5714     0.9569   0.597   0.5518
    ---

$$\widehat{\text{sbp}} = 14.9805 + 1.0903 \cdot \text{gestage} - 15.1570 \cdot \text{sex} + 0.5714 \cdot \text{gestage} \cdot \text{sex}$$

(b) What is the predicted sbp if the gestational age is 29 weeks and the sex is male?

$$\widehat{\text{sbp}} = 14.9805 + 1.0903(29) - 15.1570(1) + 0.5714(29)(1) = 48.0128$$

(c) One male baby had a gestational age of 29 weeks and an sbp of 43. What is this baby's residual?

Residual = Observed - Predicted = 43 - 48.0128 = -5.0128

(d) On the same set of axes, sketch regression lines for the sbps of male and female babies.

If sex = 1 (male):

$$\widehat{\text{sbp}} = 14.9805 + 1.0903 \cdot \text{gestage} - 15.1570(1) + 0.5714 \cdot \text{gestage}(1) = -0.1765 + 1.6617 \cdot \text{gestage}$$

If sex = 0 (female):

$$\widehat{\text{sbp}} = 14.9805 + 1.0903 \cdot \text{gestage} - 15.1570(0) + 0.5714 \cdot \text{gestage}(0) = 14.9805 + 1.0903 \cdot \text{gestage}$$
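The arithmetic in parts (b) through (d) can be checked without the data file by plugging the printed coefficients into a small helper. This is a sketch only: `predict_sbp`, `male_line`, and `female_line` are hypothetical names introduced here, and the coefficients are the rounded values from the output above, so results agree with the fitted model only to that rounding.

```r
# Coefficients copied (rounded) from the summary(linreg) output above
b <- c(intercept = 14.9805, gestage = 1.0903, sex = -15.1570, gestage_sex = 0.5714)

# Hypothetical helper: predicted sbp for a gestational age (weeks)
# and sex (0 = female, 1 = male)
predict_sbp <- function(gestage, sex) {
  unname(b["intercept"] + b["gestage"] * gestage + b["sex"] * sex +
           b["gestage_sex"] * gestage * sex)
}

pred_male_29  <- predict_sbp(29, 1)   # part (b)
resid_male_29 <- 43 - pred_male_29    # part (c): observed minus predicted

# Part (d): sex-specific intercepts and slopes
male_line   <- c(intercept = unname(b["intercept"] + b["sex"]),
                 slope     = unname(b["gestage"] + b["gestage_sex"]))
female_line <- c(intercept = unname(b["intercept"]),
                 slope     = unname(b["gestage"]))

print(c(pred_male_29 = pred_male_29, resid_male_29 = resid_male_29))
print(rbind(male = male_line, female = female_line))
```

With the data loaded, `predict(linreg, newdata = data.frame(gestage = 29, sex = 1))` would give the same prediction using the unrounded coefficients.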

[Plot not reproduced: fitted sbp versus gestational age (about 20 to 40 weeks), with one line for males, -0.1765 + 1.6617*gestage, and one for females, 14.9805 + 1.0903*gestage.]

(e) Is sex an effect modifier for gestational age when considering sbp?

No, it does not appear that sex is an effect modifier for gestational age, since the interaction term has a p-value of 0.5518, which is large (greater than 0.05). Note that even though the plot above might suggest that there is an interaction, the data do not provide evidence that this is a statistically significant interaction.
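A sketch of how the two fitted lines from part (d) could be drawn, and how the interaction p-value follows from the printed estimate and standard error. It assumes all 100 infants were used in the fit, so the residual degrees of freedom are 100 - 4 = 96. The axis ranges are chosen for illustration, and `t_int`, `p_int`, and `cross_age` are hypothetical names.

```r
# Fitted lines from part (d); coefficients are the rounded printed values
plot(NA, xlim = c(20, 40), ylim = c(35, 65),
     xlab = "Gestational age (weeks)", ylab = "Predicted sbp")
abline(a = -0.1765, b = 1.6617, lty = 1)   # males
abline(a = 14.9805, b = 1.0903, lty = 2)   # females
legend("topleft", legend = c("Male", "Female"), lty = c(1, 2))

# Wald t test for the interaction term: estimate / standard error,
# compared with a t distribution on 96 degrees of freedom
# (assumes n = 100 with no missing values and 4 estimated coefficients)
t_int <- 0.5714 / 0.9569
p_int <- 2 * pt(-abs(t_int), df = 96)

# Gestational age at which the two fitted lines cross
cross_age <- (14.9805 - (-0.1765)) / (1.6617 - 1.0903)

print(c(t = t_int, p = p_int, cross_age = cross_age))
```

The lines are not parallel, but the slope difference (0.5714) is small relative to its standard error (0.9569), which is why the test does not reject equal slopes.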

3. The data set "apache.txt" contains information on 30-day mortality in a sample of septic patients as a function of their baseline APACHE II score (an integer score from 0 to 71; higher scores correspond to more severe disease). Patients are coded as 1 or 0 depending on whether they are dead or alive at 30 days, respectively. APACHE II score can be thought of as a continuous measure, but since it consists of integer values we can construct a reasonably informative table to better understand the data. Additionally, a plot and two fitted models are provided.

                death
    APACHEII    0    1
           0    1    0
           2    1    0
           3    3    1
           4   11    0
           5    6    3
           6   11    3
           7    8    4
           8   17    5
           9   30    3
          10   15    5
          11   26    5
          12   12    5
          13   19   13
          14   18    7
          15   11    7
          16   16    8
          17   19    8
          18    6   13
          19    8    7
          20    7    6
          21    8    9
          22    2   12
          23    6    7
          24    3    8
          25    4    7
          26    4    2
          27    2    5
          28    2    1
          29    3    4
          30    1    4
          31    0    3
          32    0    3
          33    0    1
          34    0    1
          35    0    1
          36    0    1
          37    0    1
          41    1    0

Model 1:

    lm(formula = death ~ APACHEII)

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.002446   0.050708  -0.048    0.962
    APACHEII     0.025070   0.003009   8.331 9.66e-16 ***

Model 2:

    glm(formula = death ~ APACHEII, family = "binomial")

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -2.32521    0.27790  -8.367  < 2e-16 ***
    APACHEII     0.11673    0.01608   7.260 3.87e-13 ***
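If the raw file is not at hand, the tabulated counts are enough to rebuild a patient-level data set and refit both models. The sketch below assumes the table is complete and transcribed correctly; the object names (`score`, `alive`, `dead`, `apache`, `model1`, `model2`) are introduced here for illustration. If the assumption holds, the coefficients should come out close to those printed above.

```r
# Counts transcribed from the table above (no patients had a score of 1)
score <- c(0, 2:37, 41)
alive <- c(1, 1, 3, 11, 6, 11, 8, 17, 30, 15, 26, 12, 19, 18, 11, 16, 19, 6, 8,
           7, 8, 2, 6, 3, 4, 4, 2, 2, 3, 1, 0, 0, 0, 0, 0, 0, 0, 1)
dead  <- c(0, 0, 1, 0, 3, 3, 4, 5, 3, 5, 5, 5, 13, 7, 7, 8, 8, 13, 7,
           6, 9, 12, 7, 8, 7, 2, 5, 1, 4, 4, 3, 3, 1, 1, 1, 1, 1, 0)

# Expand to one row per patient: survivors first, then deaths
apache <- data.frame(
  APACHEII = c(rep(score, times = alive), rep(score, times = dead)),
  death    = c(rep(0, sum(alive)), rep(1, sum(dead)))
)

model1 <- lm(death ~ APACHEII, data = apache)                       # linear
model2 <- glm(death ~ APACHEII, family = "binomial", data = apache) # logistic

print(coef(model1))
print(coef(model2))
```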

(a) The outcome of interest is the event of death. Suppose that we want to use a regression model to analyze the relationship between death and APACHE II score. Which of the two models above should we choose? Justify your response. Write down the population and fitted models you choose.

Since the outcome of interest is binary, we should use the logistic regression model, Model 2. Let $Y = 1$ if the patient died within 30 days and $Y = 0$ otherwise, and let $p = P(Y = 1 \mid \text{APACHEII})$.

The population model is:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{APACHEII}$$

The fitted model is:

$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{APACHEII} = -2.32521 + 0.11673 \cdot \text{APACHEII}$$

(b) Interpret the intercept in Model 2. Additionally, provide an interpretation that does not involve logarithms.

$\hat{\beta}_0 = -2.32521$ is the log odds of death for those with an APACHE II score of 0. The odds of death for those with an APACHE II score of 0 are $e^{\hat{\beta}_0} = e^{-2.32521} = 0.098$.

(c) Use part (b) to find the estimated probability of death for a subject with an APACHE II score of 0.

We know that $\text{odds} = \frac{p}{1-p}$, and so we rearrange the formula to obtain $p = \frac{\text{odds}}{1 + \text{odds}}$. From part (b) we have

$$\hat{p} = \frac{\widehat{\text{odds}}}{1 + \widehat{\text{odds}}} = \frac{0.098}{1 + 0.098} = 0.0893$$

(d) Based on Model 1, what is the estimated probability of death within 30 days for a subject with an APACHE II score of 0?

The fitted model is:

$$\hat{p} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{APACHEII} = -0.002446 + 0.02507 \cdot \text{APACHEII}$$

The predicted probability of death for a subject with an APACHE II score of 0 is:

$$\hat{p} = -0.002446 + 0.02507 \cdot 0 = -0.002446$$

which does not make sense, because a probability cannot be negative.
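The conversions in parts (b) through (d) can be reproduced directly from the printed intercepts. This is a minimal sketch and the variable names are introduced here for illustration. `plogis()` is base R's inverse logit, so it performs the odds-to-probability step in one call.

```r
b0_logit <- -2.32521    # Model 2 intercept (log odds at APACHEII = 0)

odds0 <- exp(b0_logit)            # part (b): odds of death at score 0
p0    <- odds0 / (1 + odds0)      # part (c): probability from odds
p0_direct <- plogis(b0_logit)     # same quantity via the inverse logit

# Part (d): the linear probability model evaluated at score 0
p0_linear <- -0.002446 + 0.025070 * 0

print(c(odds0 = odds0, p0 = p0, p0_direct = p0_direct, p0_linear = p0_linear))
```

Carrying the unrounded odds through gives a probability of about 0.089. The 0.0893 in part (c) reflects rounding the odds to 0.098 before dividing.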

(e) Interpret the coefficient for APACHEII in Model 2. Additionally, provide an interpretation that does not involve logarithms.

$\hat{\beta}_1 = 0.11673$ is the log odds ratio of death for a one-unit increase in APACHE II score. The odds ratio of death for a one-unit increase in APACHE II score is $e^{\hat{\beta}_1} = e^{0.11673} = 1.12$. In other words, the odds of death for a subject with APACHE II score $(x + 1)$ are 12% greater than the odds of death for a subject with APACHE II score $x$.

(f) Construct a 95% confidence interval for the population OR of death for a one-unit increase in APACHE II score.

The confidence interval for the log OR is given by:

$$\hat{\beta}_1 \pm 1.96 \cdot \widehat{se}(\hat{\beta}_1) \Rightarrow 0.11673 \pm 1.96 \cdot 0.01608 \Rightarrow (0.0852,\ 0.1482)$$

The confidence interval for the OR is obtained by exponentiating the endpoints:

$$(e^{0.0852},\ e^{0.1482}) \Rightarrow (1.09,\ 1.16)$$

(g) Based on Model 2, what is the estimated probability of death within 30 days for a subject with an APACHE II score of 20?

$$\hat{P}(Y = 1 \mid \text{APACHEII} = 20) = \frac{e^{-2.32521 + 0.11673 \cdot 20}}{1 + e^{-2.32521 + 0.11673 \cdot 20}} = \frac{e^{0.00939}}{1 + e^{0.00939}} = \frac{1.01}{2.01} = 0.502$$
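Parts (e) through (g) can be checked from the printed Model 2 estimates alone. The sketch below uses a Wald interval with the exact normal quantile `qnorm(0.975)` rather than the rounded 1.96. All variable names are introduced here for illustration.

```r
b0 <- -2.32521; b1 <- 0.11673; se_b1 <- 0.01608   # from the Model 2 output

or_hat <- exp(b1)                                   # part (e): odds ratio per unit

# Part (f): 95% Wald interval on the log-odds scale, then exponentiate
ci_log <- b1 + c(-1, 1) * qnorm(0.975) * se_b1
ci_or  <- exp(ci_log)

# Part (g): predicted probability at APACHEII = 20 via the inverse logit
p20 <- plogis(b0 + b1 * 20)

print(round(c(OR = or_hat, lower = ci_or[1], upper = ci_or[2], p20 = p20), 3))
```

With a fitted `glm` object, `exp(confint.default(fit))` would give the same kind of Wald interval from unrounded estimates.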

4. In the early 1990s, studies were conducted to determine the effects of AZT in slowing the development of AIDS symptoms. In one study, 338 veterans whose immune systems were beginning to falter after infection with the AIDS virus were randomly assigned to receive AZT immediately or to wait until their T cells showed severe immune weakness. The data can be found in the file "AZT". Below is a 2x2x2 table that shows the cross-classification of veterans' race, whether AZT was administered immediately, and whether AIDS symptoms developed during the three-year study period. The three variables are Race (1 for whites; 0 for blacks), AZT (1 if taken immediately; 0 if waited until low T cell count), and AIDS (1 if symptoms developed; 0 if not).

                              AIDS Symptoms
    Race        AZT        Yes (1)   No (0)
    White (1)   Yes (1)       14       93
    White (1)   No (0)        32       81
    Black (0)   Yes (1)       11       52
    Black (0)   No (0)        12       43

(a) Run the logistic regression in R for these data (AIDS is the response variable and Race and AZT are the explanatory variables).

The response variable is AIDS, whether the veteran developed AIDS symptoms or not. The explanatory variables are RACE and AZT. The response is a dichotomous (binary) outcome, so fitting a logistic regression model is appropriate.

    aztdata <- read.table("AZT.txt", header = TRUE)
    azt <- aztdata$azt
    aids <- aztdata$aids
    race <- aztdata$race
    logitreg <- glm(aids ~ azt + race, family = binomial)

(b) Write the fitted model.

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -1.07357    0.26294  -4.083 4.45e-05 ***
    azt         -0.71946    0.27898  -2.579  0.00991 **
    race         0.05548    0.28861   0.192  0.84755
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The fitted model is:

$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{azt} + \hat{\beta}_2 \cdot \text{race} = -1.07357 - 0.71946 \cdot \text{azt} + 0.05548 \cdot \text{race}$$
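The same model can also be fitted straight from the 2x2x2 table, without the patient-level file, by giving `glm()` a two-column response of (events, non-events). This is a sketch: the data frame `azt_table` and its column names `yes` and `no` are introduced here for illustration. The estimates should agree with the output above, since grouped and ungrouped binomial fits give the same coefficients.

```r
# Cell counts from the 2x2x2 table above
azt_table <- data.frame(
  race = c(1, 1, 0, 0),        # 1 = white, 0 = black
  azt  = c(1, 0, 1, 0),        # 1 = AZT immediately, 0 = waited
  yes  = c(14, 32, 11, 12),    # developed AIDS symptoms
  no   = c(93, 81, 52, 43)     # did not develop symptoms
)

# Grouped-data logistic regression: response is cbind(successes, failures)
fit <- glm(cbind(yes, no) ~ azt + race, family = binomial, data = azt_table)

print(round(coef(summary(fit)), 5))
```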

(c) Interpret the intercept of -1.0736. Do you think that the intercept is interpretable in this study?

-1.07357 is the log odds of developing AIDS symptoms for black veterans who waited to receive AZT until their T cell count was low (azt = 0, race = 0). So the odds are $e^{-1.07357} = 0.342$. Since this is a prospective study, the intercept is interpretable.

(d) What is the log odds of having AIDS symptoms for veterans who were given AZT immediately and who are black?

$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = -1.07357 - 0.71946(1) + 0.05548(0) = -1.79303$$

(e) Interpret the estimated coefficient for AZT, -0.7195.

-0.7195 is the log odds ratio of developing AIDS symptoms for veterans who received AZT immediately compared to those who did not, adjusting (controlling) for race.

(f) What is the adjusted odds ratio of having AIDS symptoms for patients who received AZT immediately vs. those who waited until their T cell count was low? Interpret.

$e^{-0.7195} = 0.49$ is the adjusted odds ratio.

This means that, adjusting for race, veterans who received AZT immediately have 51% lower odds of developing AIDS symptoms compared to veterans who received AZT when their T cell count was low.

OR

Adjusting for race, the odds of developing AIDS symptoms in veterans who received AZT immediately are 0.49 times those of veterans who received AZT when their T cell count was low.

(g) Adjusted for race, is AZT a significant predictor for developing AIDS symptoms?

Yes, after adjustment for race, AZT administration is a significant predictor of developing AIDS symptoms at the 5% significance level, since the corresponding p-value for the AZT coefficient is 0.00991 < 0.05.

(h) Interpret the coefficient for race, 0.055.

0.055 is the log odds ratio of developing AIDS symptoms for white veterans compared to black veterans, adjusting for AZT administration.

(i) What is the adjusted odds ratio of having AIDS symptoms for patients who are white vs. those who are black? Interpret.

$e^{0.055} = 1.06$ is the adjusted odds ratio.

This means that, adjusting for AZT administration, veterans who were white had 6% greater odds of developing AIDS symptoms compared to veterans who were black.

OR

Adjusting for AZT administration, the odds of developing AIDS symptoms in veterans who were white are 1.06 times those of veterans who were black.

(j) In the model, is RACE a significant predictor for developing AIDS symptoms?

No, after adjusting for AZT administration, race is not a significant predictor for the development of AIDS symptoms at the 0.05 significance level, since the corresponding p-value for the race coefficient is 0.84755 > 0.05.
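The odds ratios, the log odds in part (d), and the Wald p-values in parts (g) and (j) all follow from the printed estimates and standard errors. The sketch below recomputes them. The vector names are introduced here for illustration, and the inputs are the rounded values from the coefficient table.

```r
# Estimates and standard errors copied from the coefficient table in part (b)
est <- c(intercept = -1.07357, azt = -0.71946, race = 0.05548)
se  <- c(intercept =  0.26294, azt =  0.27898, race = 0.28861)

odds_ratio <- exp(est)                 # exp(intercept) is the baseline odds

# Part (d): log odds for a black veteran (race = 0) given AZT immediately (azt = 1)
log_odds_d <- est[["intercept"]] + est[["azt"]] * 1 + est[["race"]] * 0

# Wald z statistics and two-sided p-values
z <- est / se
p <- 2 * pnorm(-abs(z))

print(round(cbind(estimate = est, odds_ratio, z, p), 4))
```

With a fitted model object such as `logitreg`, `exp(coef(logitreg))` would return the same odds ratios from unrounded estimates.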
