Final 2022 Solutions

School: Syracuse University
Course: 687
Subject: Statistics
Date: Apr 3, 2024
Pages: 9

Name:                              unique name:

BIOSTAT 651: An Introduction to Generalized Linear Models
Winter 2022 Mid Term Exam Key
Wednesday, March 23, 2022

If you have any questions about a problem, please ask the instructors. Show all your work for partial points unless indicated otherwise.

Question   Points Possible   Points Received
1          25
2          35
3          40
Total      100

Table of 95th percentiles of various $\chi^2_{df}$ distributions:

df    $\chi^2_{df,\,0.95}$
1     3.84
2     5.99
3     7.81
4     9.49
5     11.07
6     12.59
7     14.07
8     15.51
9     16.92
10    18.31

Note: $P(\chi^2_{df} \le \chi^2_{df,x}) = x$

1. (25 points, total) Consider the problem you solved in Homework #3, in which you analyzed a retrospective study carried out by the University of Adelaide on a random sample of graduate students. Each student was followed for 50 years after graduation and classified as dead or alive. The following model is of interest:

\[ \log\frac{\pi_i}{1-\pi_i} = \beta_0 + \beta_1(YEAR_i - 1900) + \beta_2 ART_i + \beta_3 MED_i + \beta_4 ENG_i, \]

where $YEAR_i$ is the year of graduation; $ART_i = 1$ if the student graduated from the Arts Department and 0 otherwise; $MED_i$ and $ENG_i$ are defined analogously. Note that $\pi_i = \pi(x_i) = P(Y_i = 1 \mid x_i)$, with $Y_i = 1$ if the graduate survives (0 if not). A summary of the R output is provided below.

[R output not reproduced in this copy.]

(a) (5 points) Based on the R output, provide an interpretation for $\exp(\hat\beta_2)$.

$\exp(\hat\beta_2) = \exp(-0.8605) = 0.4230$. Thus, holding all other covariates constant, the odds of surviving for students who graduated from the Arts Department are 0.42 times the odds for students who graduated from the reference (Science) department.

(b) (5 points) Derive a 95% confidence interval for $\exp(\beta_2)$.

The 95% confidence interval for $\beta_2$ is given by

\[ \hat\beta_2 \pm 1.96\,\widehat{se}(\hat\beta_2) = -0.8605 \pm 1.96 \times 0.2336 = (-1.3184,\ -0.4026). \]

The 95% confidence interval for $\exp(\beta_2)$ is therefore

\[ (\exp(-1.3184),\ \exp(-0.4026)) = (0.2676,\ 0.6686). \]

Alternatively, the delta method can be used here.

(c) (5 points) For students who graduated in 1900, compute the Risk Ratio (RR) estimate of survival for students who graduated from the Arts Department versus students who graduated from the Science Department (i.e. ART = 0, MED = 0 and ENG = 0). Interpret your results.

\[ P(Y_i = 1 \mid YEAR_i = 1900, ART_i = 1, MED_i = 0, ENG_i = 0) = \frac{\exp(\hat\beta_0 + \hat\beta_2)}{1 + \exp(\hat\beta_0 + \hat\beta_2)} = \frac{\exp(-0.66 - 0.86)}{1 + \exp(-0.66 - 0.86)} = 0.1795 \]

\[ P(Y_i = 1 \mid YEAR_i = 1900, ART_i = 0, MED_i = 0, ENG_i = 0) = \frac{\exp(\hat\beta_0)}{1 + \exp(\hat\beta_0)} = \frac{\exp(-0.66)}{1 + \exp(-0.66)} = 0.3407 \]

\[ RR = 0.1795 / 0.3407 = 0.5269. \]

Among 1900 graduates, the estimated probability of surviving 50 years for an Arts graduate is about 53% of that for a Science graduate.

(d) (5 points) Construct a contrast matrix for the hypothesis test $H_0: \beta_2 = \beta_4 = 0$. Derive a formula for the corresponding Wald test. What is the asymptotic distribution of this test statistic under the null hypothesis? Note: Explain how you would conduct this test; you do not have sufficient information to perform it.

The contrast matrix is given by

\[ C = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}. \]

Under $H_0: C\beta = 0$, the test statistic is

\[ X^2_W = (C\hat\beta)^T \left\{ \widehat{V}(C\hat\beta) \right\}^{-1} (C\hat\beta) = (C\hat\beta)^T \left\{ C\,\widehat{V}(\hat\beta)\,C^T \right\}^{-1} (C\hat\beta) \sim \chi^2_2 . \]

(e) (5 points) Suppose Student "X" graduated in 1925 and the graduation year for Student "Y" is unknown; they both belong to the same department. The odds of Student X surviving are 53.75% greater than those of Student Y. Determine the graduation year of Student Y.

With $\hat\beta_1 = 0.043$, two students in the same department whose graduation years differ by $z$ have odds ratio $\exp(0.043 z)$. Setting $\exp(0.043 z) = 1.5375$ gives $z = 10$. Thus, Student Y graduated in the year 1925 - 10 = 1915.
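
The arithmetic in parts (a), (b), (c) and (e) can be re-checked with a short script. This is only a sketch: the coefficient values are the ones quoted in the worked solution (the R output itself is not shown), and the variable names and the `expit` helper are illustrative choices, not part of the exam.

```python
import math

# Values quoted in the worked solution (assumed to come from the R output).
beta0 = -0.66      # intercept, as rounded in part (c)
beta1 = 0.043      # coefficient of (YEAR - 1900), as used in part (e)
beta2 = -0.8605    # coefficient of ART
se_beta2 = 0.2336  # standard error of beta2

def expit(eta):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# (a)-(b): odds ratio and Wald 95% CI, built on the log-odds scale
# and then exponentiated.
odds_ratio = math.exp(beta2)
ci_low = math.exp(beta2 - 1.96 * se_beta2)
ci_high = math.exp(beta2 + 1.96 * se_beta2)

# (c): risk ratio for 1900 graduates, Arts versus Science.
# The solution plugs in the rounded coefficient -0.86 here.
p_arts = expit(beta0 - 0.86)
p_science = expit(beta0)
risk_ratio = p_arts / p_science

# (e): year gap z solving exp(beta1 * z) = 1.5375.
z = math.log(1.5375) / beta1

print(round(odds_ratio, 4), round(ci_low, 4), round(ci_high, 4))
print(round(p_arts, 4), round(p_science, 4), round(risk_ratio, 4))
print(round(z, 2))
```

Small differences in the last digit relative to the hand calculation are expected, because the solution rounds intermediate quantities before dividing.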

2. (35 points, total) A retrospective cohort study was carried out to study factors affecting chronic obstructive pulmonary disease (COPD) risk. The observed data consist of $(Y_i; S_i, P_i)$, $i = 1, \ldots, n$, where $Y_i = 1$ for subjects with COPD (0 otherwise); $S_i = 1$ for smokers (0 otherwise); and $P_i = 1$ is an indicator for residence in a zip code considered to be highly polluted. The total sample size was $n = 200$, with the observed data summarized by the following tables:

For non-smokers ($S_i = 0$)

             $Y_i = 0$   $Y_i = 1$   total
$P_i = 0$    35          15          50
$P_i = 1$    30          20          50
total        65          35          100

For smokers ($S_i = 1$)

             $Y_i = 0$   $Y_i = 1$   total
$P_i = 0$    20          30          50
$P_i = 1$    10          40          50
total        30          70          100

(a) (20 points) Using the information above, estimate the following model:

\[ \log\frac{\pi_i}{1-\pi_i} = \beta_0 + \beta_1 S_i + \beta_2 P_i + \beta_3 S_i P_i, \]

where $\pi_i = P(Y_i = 1 \mid S_i, P_i)$.

The model is saturated (four parameters for four cells), so the fitted probabilities equal the observed proportions:

$\hat P(Y_i = 1 \mid S_i = 0, P_i = 0) = 0.3$
$\Rightarrow \hat\beta_0 = \mathrm{logit}(0.3) = -0.8473$

$\hat P(Y_i = 1 \mid S_i = 0, P_i = 1) = 0.4$
$\Rightarrow \hat\beta_0 + \hat\beta_2 = \mathrm{logit}(0.4) = -0.4055 \Rightarrow \hat\beta_2 = 0.4418$

$\hat P(Y_i = 1 \mid S_i = 1, P_i = 0) = 0.6$
$\Rightarrow \hat\beta_0 + \hat\beta_1 = \mathrm{logit}(0.6) = 0.4055 \Rightarrow \hat\beta_1 = 1.2528$

$\hat P(Y_i = 1 \mid S_i = 1, P_i = 1) = 0.8$
$\Rightarrow \hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3 = \mathrm{logit}(0.8) = 1.3863 \Rightarrow \hat\beta_3 = 0.5390$
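
As a check on part (a), the four coefficients can be recovered directly from the cell proportions, because the model with the interaction term has one parameter per cell. The sketch below uses only the counts in the two tables; the names `cells`, `eta` and `logit` are illustrative.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

# (COPD cases, cell total) for each (S, P) combination, read from the tables.
cells = {
    (0, 0): (15, 50),
    (0, 1): (20, 50),
    (1, 0): (30, 50),
    (1, 1): (40, 50),
}

# Observed log-odds in each cell.
eta = {key: logit(cases / total) for key, (cases, total) in cells.items()}

# Solve the four equations eta(S, P) = b0 + b1*S + b2*P + b3*S*P.
beta0 = eta[(0, 0)]
beta1 = eta[(1, 0)] - eta[(0, 0)]
beta2 = eta[(0, 1)] - eta[(0, 0)]
beta3 = eta[(1, 1)] - eta[(1, 0)] - eta[(0, 1)] + eta[(0, 0)]

print([round(b, 4) for b in (beta0, beta1, beta2, beta3)])
```

The interaction estimate is the difference between the pollution log odds ratio among smokers and the one among non-smokers.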

(b) (5 points) The investigator has access to a richer data set, containing $S_i$ (average number of cigarettes smoked per day) and $P_i$ (average pollution level to which the subject was exposed), with both $S_i$ and $P_i$ coded as continuous covariates. The new data file consists of 200 records (one per subject), $(Y_i, S_i, P_i)$, and the following model is considered:

\[ \log\frac{\pi_i}{1-\pi_i} = \beta_0 + \beta_1 S_i + \beta_2 P_i \]

Write out the likelihood function, $L(\beta)$, where $\beta = (\beta_0, \beta_1, \beta_2)^T$.

\[ L_i(\beta) = \pi_i^{Y_i}(1-\pi_i)^{1-Y_i}, \quad \text{where } \pi_i = \frac{\exp(\beta_0 + \beta_1 S_i + \beta_2 P_i)}{1 + \exp(\beta_0 + \beta_1 S_i + \beta_2 P_i)}, \]

\[ L(\beta) = \prod_{i=1}^{n} L_i(\beta). \]

(c) (5 points) The variance of $\hat\beta$ in (b) can be estimated through a matrix of the form $\widehat{\mathrm{Var}}(\hat\beta) = (ABC)^{-1}$. Give expressions for each of the components $A$, $B$ and $C$.

$I(\hat\beta) = X^T V X$, where $V = \mathrm{diag}(v(\hat\pi_1), \ldots, v(\hat\pi_n))$ and $v(\hat\pi_i) = \hat\pi_i(1-\hat\pi_i)$. Since $\widehat{\mathrm{Var}}(\hat\beta) = I(\hat\beta)^{-1}$, we have $A = X^T$, $B = V$ and $C = X$.

(d) (5 points) Describe an Iteratively Re-weighted Least Squares (IRWLS) algorithm for computing $\hat\beta$.

Initial estimate (iteration $j = 0$): $\hat\beta_j = 0$. At iteration $j$, compute

$\hat\eta_{i,j} = x_i^T \hat\beta_j$, where $x_i = (1, S_i, P_i)^T$
$\hat\pi_{i,j} = \dfrac{\exp(\hat\eta_{i,j})}{1 + \exp(\hat\eta_{i,j})}$
$v(\hat\pi_{i,j}) = \hat\pi_{i,j}(1 - \hat\pi_{i,j})$
$\hat V_j = \mathrm{diag}(v(\hat\pi_{1,j}), \ldots, v(\hat\pi_{n,j}))$
$\hat Z_j = \hat\eta_j + \hat V_j^{-1}(Y - \hat\pi_j)$
$\hat\beta_{j+1} = (X^T \hat V_j X)^{-1} X^T \hat V_j \hat Z_j$

Iterate until convergence, i.e. until $\|\hat\beta_{j+1} - \hat\beta_j\| < \epsilon$ (e.g. $\epsilon = 10^{-5}$).
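
The IRWLS steps in part (d) can be sketched in plain Python. The continuous data set from part (b) is not given, so as an assumption this example runs the algorithm on the binary data of part (a), expanded to 200 records with design row (1, S, P, S·P); the result can then be compared with the closed-form estimates from part (a). The function names `solve` and `irwls_logistic` are hypothetical helpers written for this illustration.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def irwls_logistic(X, y, tol=1e-5, max_iter=50):
    """IRWLS for logistic regression, starting from beta = 0."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        eta = [sum(xk * bk for xk, bk in zip(x, beta)) for x in X]
        pi = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        v = [q * (1.0 - q) for q in pi]  # assumes no fitted value hits 0 or 1
        # Working response Z = eta + V^{-1} (Y - pi).
        z = [e + (yi - q) / vi for e, yi, q, vi in zip(eta, y, pi, v)]
        # Weighted normal equations (X^T V X) beta = X^T V Z.
        XtVX = [[sum(vi * x[a] * x[b] for x, vi in zip(X, v))
                 for b in range(p)] for a in range(p)]
        XtVz = [sum(vi * x[a] * zi for x, vi, zi in zip(X, v, z))
                for a in range(p)]
        new_beta = solve(XtVX, XtVz)
        step = math.sqrt(sum((nb - ob) ** 2 for nb, ob in zip(new_beta, beta)))
        beta = new_beta
        if step < tol:
            break
    return beta

# Expand the part (a) tables into one record per subject (assumed demo data).
cells = {(0, 0): (15, 50), (0, 1): (20, 50), (1, 0): (30, 50), (1, 1): (40, 50)}
X, y = [], []
for (s, pol), (cases, total) in cells.items():
    for k in range(total):
        X.append([1.0, float(s), float(pol), float(s * pol)])
        y.append(1.0 if k < cases else 0.0)

beta_hat = irwls_logistic(X, y)
print([round(b, 4) for b in beta_hat])
```

For the three-parameter model in part (b), the same function would be called with design rows (1, S_i, P_i) built from the continuous covariates.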

3. (40 points, total) Children's retinoblastoma is a rare disease. In a case-control study of the effect of a gene (A) and maternal folate intake during pregnancy (X) on children's retinoblastoma, children with the disease (cases) and normal children (controls) were assessed at a similar age. The data collected include the child's genotype A (a factor variable with 3 groups: 0, 1, or 2 specific alleles) and maternal folate intake during pregnancy, with quantities estimated from food questionnaires administered to the children's mothers. The logistic regression models fitted to the data had X as a continuous predictor and dummy variables for the genotype groups. Below is a summary of the deviances and degrees of freedom (DF) obtained from four models:

M0: Intercept only, $D_0 = 294.65$ and DF = 214;
M1: Genotype groups, $D_1 = 285.25$ and DF = 212;
M2: Intercept and log(X), $D_2 = 290.72$ and DF = 213;
M3: X and genotype groups, $D_3 = 281.71$ and DF = 211.

(a) (5 points) What is the sample size of this study?

$N = df + p = 214 + 1 = 215$, using the intercept-only model M0 ($p = 1$).

(b) (5 points) Write out the models and show the nesting structure (i.e. hierarchy) of the models M0, M1, M2 and M3.

M0 is nested within both M1 and M2, and both M1 and M2 are nested within M3: M0 ⊂ M1 ⊂ M3 and M0 ⊂ M2 ⊂ M3. M1 and M2 are not nested in each other, since M1 contains only the genotype terms and M2 only the folate term.

(c) (10 points) From the information provided by the deviances of M0, M1, M2 and M3, is there an association between child's retinoblastoma and maternal folate intake during pregnancy? (Justify your answer.)

Deviance analysis: likelihood ratio test (LRT) for the effect of maternal folate intake during pregnancy.

Ignoring genotype: $TS = D_0 - D_2 = 3.93$ with df = 1. This is greater than the critical value 3.84, hence we reject the null hypothesis of no effect of folate.

Adjusting for genotype: $TS = D_1 - D_3 = 3.54$ with df = 1. Since $2.71 < 3.54 < 3.84$, we have $0.05 < p < 0.10$. The effect is suggestive but not significant at the 5% level.
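
The two deviance comparisons in part (c) can be reproduced numerically. The sketch below uses only the deviances and residual degrees of freedom listed above; the p-value function relies on the fact that a $\chi^2_1$ variable is the square of a standard normal, so its upper tail can be written with the complementary error function. The helper names `lrt` and `chi2_sf_1df` are illustrative.

```python
import math

# Deviances and residual degrees of freedom as listed for the four models.
deviance = {"M0": 294.65, "M1": 285.25, "M2": 290.72, "M3": 281.71}
resid_df = {"M0": 214, "M1": 212, "M2": 213, "M3": 211}

def lrt(small, big):
    """Deviance difference and df for testing `small` nested within `big`."""
    stat = deviance[small] - deviance[big]
    df = resid_df[small] - resid_df[big]
    return stat, df

def chi2_sf_1df(t):
    """P(chi-square with 1 df > t), valid for 1 df only."""
    return math.erfc(math.sqrt(t / 2.0))

# Folate effect ignoring genotype (M0 vs M2) and adjusting for it (M1 vs M3).
stat_unadj, df_unadj = lrt("M0", "M2")
stat_adj, df_adj = lrt("M1", "M3")
p_unadj = chi2_sf_1df(stat_unadj)
p_adj = chi2_sf_1df(stat_adj)

print(round(stat_unadj, 2), df_unadj, round(p_unadj, 3))
print(round(stat_adj, 2), df_adj, round(p_adj, 3))
```

Both comparisons have one degree of freedom, which is why the 1-df tail formula is sufficient here.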

(d) (20 points) Assume we focus on M3:

\[ \log \text{Odds of retinoblastoma} = \beta_0 + \beta_1 X + \beta_2 A_1 + \beta_3 A_2, \]

where $A_1$ is the indicator of a child having one allele (i.e., $A_1 = 1$ if the child had one allele; otherwise $A_1 = 0$) and $A_2$ is the indicator of a child having two alleles. Furthermore, after fitting the model, we have

                                                      Covariance matrix
Variable    Parameter   Estimate (95% CI)             $\beta_0$   $\beta_1$   $\beta_2$   $\beta_3$
Intercept   $\beta_0$   -2.855 (-5.054, -0.655)        1.2594     -0.3414     -0.0923     -0.0820
X           $\beta_1$   -0.578 (-1.194, 0.038)        -0.3414      0.0988      0.0036      0.0006
$A_1$       $\beta_2$    0.687 (0.010, 1.364)         -0.0923      0.0036      0.1194      0.0798
$A_2$       $\beta_3$    1.174 (0.382, 1.967)         -0.0820      0.0006      0.0798      0.1635

Using the above information for M3, what is the odds ratio of retinoblastoma in children with two alleles versus those with one allele when they had $X = 1.8$? Also, provide an interpretation and calculate the 95% CI for the odds ratio given the information above. Is the OR an appropriate estimate of the risk ratio for this study?

\[ OR = \exp(\hat\beta_3 - \hat\beta_2) = \exp(1.174 - 0.687) = \exp(0.487) = 1.627 \]

\[ \widehat{se}(\hat\beta_3 - \hat\beta_2) = \sqrt{0.1194 + 0.1635 - 2 \times 0.0798} = 0.351 \]

\[ 95\%\ \text{CI} = (\exp(0.487 - 1.96 \times 0.351),\ \exp(0.487 + 1.96 \times 0.351)) = (\exp(-0.201),\ \exp(1.175)) = (0.818,\ 3.238) \]

Interpretation: at a fixed folate intake, the estimated odds of retinoblastoma for children with two alleles are 1.63 times the odds for children with one allele. M3 has no interaction between X and genotype, so this odds ratio is the same at $X = 1.8$ as at any other value of X. The 95% CI includes 1, so the difference between the two genotype groups is not significant at the 5% level.

For a rare disease, the risk ratio is approximately equal to the odds ratio (OR), so yes, it is reasonable.
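
The confidence interval in part (d) rests on the variance of a difference of two correlated estimates, Var(b3 - b2) = Var(b3) + Var(b2) - 2 Cov(b2, b3). A minimal numerical check, using only the estimates and covariance entries from the table above (variable names are illustrative):

```python
import math

# Estimates and covariance entries for the A1 and A2 coefficients.
beta2, beta3 = 0.687, 1.174
var2, var3, cov23 = 0.1194, 0.1635, 0.0798

# Log odds ratio for two alleles versus one allele; the folate term cancels.
log_or = beta3 - beta2

# Standard error of the contrast beta3 - beta2.
se = math.sqrt(var2 + var3 - 2.0 * cov23)

or_hat = math.exp(log_or)
ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))

print(round(or_hat, 3), round(se, 3), tuple(round(c, 3) for c in ci))
```

Ignoring the covariance term would overstate the standard error here, because the two genotype coefficients are positively correlated through their shared reference group.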