Homework3_ungrouped

doc

School

University of Florida

Course

4504

Subject

Statistics

Date

Feb 20, 2024

Type

doc

Pages

6

Uploaded by gogoyes1234

Please turn in this assignment by Tuesday, March 23rd at 11:59pm.

1. Below, we have data to examine the effect of age and smoking status on breathing test results among industrial workers in Houston, TX. The data can be found on the website in the file titled Problem1.csv. The age variable is set to age=0 if age is less than 40 and age=1 if age is between 40 and 59. The smoking variable takes values 0, 1, 2, representing never smoked, former smoker, and current smoker, respectively. Lastly, the outcome is coded such that normal breathing is breathing=1, borderline is breathing=2, and abnormal is breathing=3.

                           Breathing Test Results
Age      Smoking Status    Normal   Borderline   Abnormal
< 40     Never Smoked         577           27          7
         Former Smoker        192           20          3
         Current Smoker       682           46         11
40-59    Never Smoked         164            4          0
         Former Smoker        145           15          7
         Current Smoker       245           47         27

Fit a baseline category logit model to this data. Using the third group (abnormal) as the baseline, fit the following model

$$\log\frac{\pi_j}{\pi_3} = \alpha_j + \beta_{j1} I_1 + \beta_{j2} I_2 + \beta_{j3}\,\text{age}, \qquad j = 1, 2,$$

where $I_1$ is an indicator that smoking=1 and $I_2$ is an indicator that smoking=2.

(a) Interpret your estimate of $\beta_{12}$ from the model.

For current smokers, the estimated odds of a normal breathing test vs. an abnormal breathing test are exp(-1.34637)=0.2602 times those for never smokers, adjusting for age. Relative to abnormal results, normal breathing test results are less likely for current smokers (smoking=2) than for those who never smoked.

(b) Perform a likelihood ratio test of whether smoking status is associated with breathing test results. Write down the null and alternative hypothesis. Give the test statistic and what distribution it has under the null. Make a conclusion regarding the hypothesis test.
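The arithmetic behind the interpretation in (a) can be checked with a short Python sketch. It exponentiates the reported estimate of β12 and shows how the two baseline-category logits map back to the three category probabilities. The function name and the example logits are illustrative assumptions, not fitted values from this assignment.

```python
import math

def baseline_category_probs(eta1, eta2):
    """Convert the logits log(pi_1/pi_3) and log(pi_2/pi_3) into
    (pi_1, pi_2, pi_3), with category 3 (abnormal) as the baseline."""
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    return (math.exp(eta1) / denom, math.exp(eta2) / denom, 1.0 / denom)

# Estimate of beta_12 as reported in the answer to (a).
beta_12 = -1.34637
odds_ratio = math.exp(beta_12)
print(round(odds_ratio, 4))  # 0.2602

# Hypothetical logits, for illustration only (not fitted values).
probs = baseline_category_probs(2.0, 0.5)
print(probs, sum(probs))
```

The three probabilities always sum to one, and the log of the ratio of the first to the third recovers the first logit.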
The null hypothesis: $\beta_{j1} = \beta_{j2} = 0$ for j = 1, 2.
The alternative hypothesis: at least one of $\beta_{j1}$, $\beta_{j2}$ is nonzero for some j.
The likelihood-ratio test statistic compares the two models (with and without the smoking indicators) by the difference in their deviances. The test statistic is 30.547 and, under the null, follows approximately a chi-squared distribution with df=4. The p-value is <0.0001. There is enough evidence to reject the null hypothesis at the significance level of 0.05. Therefore, smoking status is associated with breathing test results.

(c) What is the estimated probability of having a normal breathing test result for a person who is 35 years old and never smoked? Explain how you got this probability.

(d) According to your model, what is the expected number of subjects who would have a normal breathing test result among those who are under 40 and are never smokers?

The predicted probability of having a normal breathing test result for those who are under 40 and are never smokers is 0.9594662. There are 577+27+7=611 such subjects, so the expected number is 0.9594662*611≈586.

(e) If we wanted to run a goodness of fit test on this model, how many degrees of freedom would the test statistic have? Why?

The test statistic has df=4, using the grouped data. The saturated model has 12 parameters (6 age-by-smoking settings times 2 logits). This model has 8 parameters, so the goodness-of-fit statistic, which compares this model with the saturated model, has df=4 (=12-8). The null model has 2 parameters and therefore residual df=10; the difference 10-4=6 is the df for comparing this model with the null model, not for the goodness-of-fit test. However, in this contingency table, some cell counts are less than 5, and one cell count is zero, so the chi-squared approximation is questionable and a goodness of fit test would not be reliable.

(f) If you wanted to fit the saturated model to this data set, what terms would you add to the model above? Do you think we have sufficient data to fit the saturated model in this case?

I would add the interaction terms between smoking status and age to obtain the saturated model. I think we do not have sufficient data to fit the saturated model in this case because some cell counts are less than 5 in the grouped data.
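The numbers quoted in (b), (d), and (e) can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library: the chi-squared tail probability uses the closed form that holds for even degrees of freedom, and the variable and function names are illustrative assumptions.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; closed form valid for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

# (b) Likelihood-ratio statistic and df as reported in the answer.
p_value = chi2_sf_even_df(30.547, 4)
print(p_value < 1e-4)  # True, consistent with the reported p-value < 0.0001

# (d) Expected count = fitted probability times the group size.
n_group = 577 + 27 + 7                 # under 40, never smoked
expected_normal = 0.9594662 * n_group
print(round(expected_normal, 1))

# (e) Parameter counts for the grouped data.
n_settings, n_logits = 6, 2            # 2 age groups x 3 smoking levels; J - 1 = 2 logits
saturated_params = n_settings * n_logits   # 12
model_params = n_logits * (1 + 3)          # intercept, I1, I2, age per logit = 8
residual_df = saturated_params - model_params
print(residual_df)
```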
The following questions will be based on the following cumulative logit model fit to the same data:

$$\log\frac{P(Y \le j)}{P(Y > j)} = \alpha_j + \beta_1 I_1 + \beta_2 I_2 + \beta_3\,\text{age}, \qquad j = 1, 2$$

(g) Is there any way to compare whether this model fits the data better than the previous baseline-category logit model? If so, do so. If not, explain why not.

Yes, we can compare them using ROC curves. The area under the ROC curve is 0.6162 for the baseline-category logit model and 0.6207 for the cumulative logit model. It seems the cumulative logit model fits slightly better.

(h) Test whether the proportional odds assumption holds. Write down the null and alternative hypothesis. Give the test statistic and what distribution it has under the null. Make a conclusion regarding the hypothesis test.

We can perform an LRT to compare the two models.
The null hypothesis: $\beta_{11} = \beta_{21}$, $\beta_{12} = \beta_{22}$, $\beta_{13} = \beta_{23}$.
The alternative hypothesis: at least one of $\beta_{11} \ne \beta_{21}$, $\beta_{12} \ne \beta_{22}$, $\beta_{13} \ne \beta_{23}$ holds.
The test statistic is 5.5269. Under the null it follows approximately a chi-squared distribution with df=3, and the corresponding p-value is 0.137. At α=0.05, we fail to reject the null hypothesis. Therefore, the ordinal model fits the data adequately, and the proportional odds assumption is reasonable.
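The p-value quoted in (h) can be checked without a statistics library, since the chi-squared tail probability with 3 degrees of freedom has a closed form in terms of the complementary error function. The sketch below assumes the conventional 0.05 significance level; the function name is an illustrative choice.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 df."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

stat = 5.5269                 # test statistic reported in (h)
p_value = chi2_sf_df3(stat)
alpha = 0.05                  # assumed conventional significance level

print(round(p_value, 3))
print("fail to reject" if p_value > alpha else "reject")
```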
(i) What is the estimated probability of a borderline breathing result for a current smoker between the ages of 40 and 59?

(j) Construct a point estimate and 95% confidence interval for the following quantity:

$$\frac{P(Y \le j \mid \text{Smoking}=2, \text{Age}=1)\,/\,\bigl(1 - P(Y \le j \mid \text{Smoking}=2, \text{Age}=1)\bigr)}{P(Y \le j \mid \text{Smoking}=0, \text{Age}=0)\,/\,\bigl(1 - P(Y \le j \mid \text{Smoking}=0, \text{Age}=0)\bigr)}$$

Explain how you got both your estimate and confidence interval.
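The mechanics behind (i) and (j) can be sketched in Python. Under the cumulative logit model stated earlier, a category probability is a difference of two cumulative probabilities, and the log of the odds ratio in (j) reduces to β2 + β3 because α_j cancels. All coefficient, variance, and covariance values below are hypothetical placeholders, not estimates from this data set, and the function names are assumptions made for illustration.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def category_probs(alpha1, alpha2, eta):
    """P(Y=1), P(Y=2), P(Y=3) when logit P(Y<=j) = alpha_j + eta."""
    c1 = expit(alpha1 + eta)
    c2 = expit(alpha2 + eta)
    return (c1, c2 - c1, 1.0 - c2)

def odds_ratio_ci(b2, b3, var2, var3, cov23, z=1.959964):
    """Point estimate and Wald 95% interval for exp(beta_2 + beta_3)."""
    est = b2 + b3
    se = math.sqrt(var2 + var3 + 2.0 * cov23)
    return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)

# Hypothetical values, for illustration only.
alpha1, alpha2 = 2.0, 3.5
b2, b3 = -0.4, -0.9
var2, var3, cov23 = 0.02, 0.015, 0.001

# (i) Current smoker (I2 = 1), age 40-59 (age = 1): eta = b2 + b3.
probs = category_probs(alpha1, alpha2, b2 + b3)
print("P(borderline) =", probs[1])

# (j) Odds ratio and its Wald interval.
or_est, ci_low, ci_high = odds_ratio_ci(b2, b3, var2, var3, cov23)
print(or_est, ci_low, ci_high)
```

With real output, the variances and covariance would come from the estimated covariance matrix of the fitted coefficients.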