ISyE 4031 Homework 7 Solution

5.1

a.

input = read.table("5-6.txt", header=TRUE)
attach(input)
library(car)

## Loading required package: carData

cor(input[, -6])

##              Xray   BedDays    Length      Load       Pop
## Xray    1.0000000 0.9071493 0.4466496 0.9073795 0.9104688
## BedDays 0.9071493 1.0000000 0.6711095 0.9999040 0.9331680
## Length  0.4466496 0.6711095 1.0000000 0.6711974 0.4628609
## Load    0.9073795 0.9999040 0.6711974 1.0000000 0.9356913
## Pop     0.9104688 0.9331680 0.4628609 0.9356913 1.0000000

model = lm(Hours ~ ., data=input)
vif(model)

##        Xray     BedDays      Length        Load         Pop
##    7.940593 8933.086501    4.279835 9597.570761   23.293856

The 3 largest simple correlation coefficients between predictors are $r_{BedDays,Load} = 0.9999040$, $r_{Load,Pop} = 0.9356913$, and $r_{BedDays,Pop} = 0.9331680$. The 3 largest VIFs are $VIF_{Load} = 9597.570761$, $VIF_{BedDays} = 8933.086501$, and $VIF_{Pop} = 23.293856$.
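Each VIF can also be recovered from first principles as $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ is the $R^2$ from regressing predictor $j$ on the other predictors. A minimal hand check for Load (a sketch, assuming input from 5-6.txt is still loaded):

# Hand check of VIF for Load: regress Load on the remaining predictors
r2_load = summary(lm(Load ~ Xray + BedDays + Length + Pop, data=input))$r.squared
1 / (1 - r2_load)   # should reproduce vif(model)["Load"], about 9597.57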
b. Load, BedDays, and Pop.

c.

model = lm(Hours ~ ., data=input)
summary(model)

##
## Call:
## lm(formula = Hours ~ ., data = input)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -611.93 -431.41  -70.77  332.60 1576.06
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1962.94816 1071.36170   1.832   0.0941 .
## Xray           0.05593    0.02126   2.631   0.0234 *
## BedDays        1.58962    3.09208   0.514   0.6174
## Length      -394.31412  209.63954  -1.881   0.0867 .
## Load         -15.85167   97.65299  -0.162   0.8740
## Pop           -4.21867    7.17656  -0.588   0.5685
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 642.1 on 11 degrees of freedom
## Multiple R-squared:  0.9908, Adjusted R-squared:  0.9867
## F-statistic: 237.8 on 5 and 11 DF,  p-value: 8.068e-11

Yes, $b_4 = b_{Load} = -15.85167$. This says that Labor Hours goes down as Patient Load increases, which contradicts intuition.

d. Yes, from the R summary output in part c we notice that all the p-values for the individual t-tests are greater than 0.02, while the p-value associated with the overall F-test is only $8.068 \times 10^{-11}$.

e.

I.

library(leaps)
all_model <- regsubsets(Hours ~ ., data=input, nbest=2, method="exhaustive")
all_sum = summary(all_model)
Rsq = round(all_sum$rsq * 100, digits=3)
adj_Rsq = round(all_sum$adjr2 * 100, digits=3)
Cp = round(all_sum$cp, digits=3)
SSE = all_sum$rss
# Need to do further calculations to find S for each model.
k = as.numeric(rownames(all_sum$which))
n = all_model$nn
S = round(sqrt(all_sum$rss / (n - (k + 1))), digits=3)
# Combine the measures with models
cbind(Rsq, adj_Rsq, Cp, S, all_sum$outmat)

##         Rsq      adj_Rsq  Cp       S         Xray BedDays Length Load Pop
## 1 ( 1 ) "97.218" "97.033" "20.381" "957.856" " "  "*"     " "    " "  " "
## 1 ( 2 ) "97.15"  "96.96"  "21.2"   "969.531" " "  " "     " "    "*"  " "
## 2 ( 1 ) "98.671" "98.482" "4.942"  "685.169" "*"  "*"     " "    " "  " "
## 2 ( 2 ) "98.612" "98.413" "5.659"  "700.418" "*"  " "     " "    "*"  " "
## 3 ( 1 ) "99.007" "98.778" "2.918"  "614.779" "*"  "*"     "*"    " "  " "
## 3 ( 2 ) "98.94"  "98.696" "3.714"  "634.992" "*"  " "     "*"    "*"  " "
## 4 ( 1 ) "99.081" "98.775" "4.026"  "615.489" "*"  "*"     "*"    " "  "*"
## 4 ( 2 ) "99.061" "98.748" "4.264"  "622.094" "*"  " "     "*"    "*"  "*"
## 5 ( 1 ) "99.083" "98.667" "6"      "642.088" "*"  "*"     "*"    "*"  "*"
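The $C_p$ column can be sanity-checked by hand using $C_p = SSE_k / s^2_{full} - (n - 2(k+1))$, where $s^2_{full}$ is the mean squared error of the full five-predictor model. A quick sketch for the best 3-predictor model, with the SSE values as reported elsewhere in this solution (the full-model SSE of 4535052 also appears in the step() trace in part IV):

# Hand check of Mallows' Cp for Hours ~ Xray + BedDays + Length
n = 17; k = 3
sse_k   = 4913399              # SSE of the 3-predictor model
s2_full = 4535052 / (17 - 6)   # MSE of the full model: SSE_full / (n - (k_full + 1))
sse_k / s2_full - (n - 2 * (k + 1))   # about 2.918, matching the Cp column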
II.

Model 1: $Hours = \beta_0 + \beta_1 \cdot Xray + \beta_2 \cdot BedDays + \beta_3 \cdot Length + \varepsilon$
Model 2: $Hours = \beta_0 + \beta_1 \cdot Xray + \beta_2 \cdot BedDays + \beta_3 \cdot Pop + \beta_4 \cdot Length + \varepsilon$

Model 1 has the smallest $s = 614.779$ and the largest value of $\bar{R}^2 = 98.778\%$. Model 1 also has the smallest $C_p = 2.918$, which is less than $k + 1 = 3 + 1 = 4$.

III.

df_hospital = data.frame(Xray=56194, BedDays=14077.88, Pop=329.7, Length=6.89)
model1 = lm(Hours ~ Xray + BedDays + Length)
model2 = lm(Hours ~ Xray + BedDays + Pop + Length)
predict(model1, newdata=df_hospital, interval="prediction", level=0.99)

##        fit      lwr      upr
## 1 16064.55 13898.32 18230.79

predict(model2, newdata=df_hospital, interval="prediction", level=0.99)

##        fit      lwr      upr
## 1 16030.13 13828.37 18231.88

The 99% PI for Model 1 is [13898.32, 18230.79]. The 99% PI for Model 2 is [13828.37, 18231.88]. Model 1 has the shortest 99% PI.

IV.

# Stepwise
full.lm <- lm(Hours ~ ., data=input)
min.lm <- lm(Hours ~ 1, data=input)
# step() uses AIC to select/drop a variable
step_backward = step(full.lm, direction="backward")

## Start:  AIC=224.4
## Hours ~ Xray + BedDays + Length + Load + Pop
##
##           Df Sum of Sq     RSS    AIC
## - Load     1     10863 4545916 222.44
## - BedDays  1    108962 4644015 222.80
## - Pop      1    142465 4677517 222.93
## <none>                 4535052 224.40
## - Length   1   1458572 5993624 227.14
## - Xray     1   2853834 7388887 230.70
##
## Step:  AIC=222.44
## Hours ~ Xray + BedDays + Length + Pop
##
##           Df Sum of Sq      RSS    AIC
## - Pop      1    367483  4913399 221.76
## <none>                  4545916 222.44
## - Length   1   2008920  6554835 226.66
## - Xray     1   2874412  7420328 228.77
## - BedDays  1  19070222 23616137 248.45
##
## Step:  AIC=221.76
## Hours ~ Xray + BedDays + Length
##
##           Df Sum of Sq      RSS    AIC
## <none>                  4913399 221.76
## - Length   1   1658984  6572383 224.71
## - Xray     1   2628688  7542086 227.05
## - BedDays  1  32726195 37639593 254.38

step_forward = step(min.lm, list(upper=full.lm), direction="forward")

## Start:  AIC=294.17
## Hours ~ 1
##
##           Df Sum of Sq       RSS    AIC
## + BedDays  1 480950232  13762309 235.27
## + Load     1 480612694  14099847 235.68
## + Xray     1 441952483  52760057 258.12
## + Pop      1 437459361  57253180 259.51
## + Length   1 165607159 329105382 289.24
## <none>                 494712540 294.17
##
## Step:  AIC=235.27
## Hours ~ BedDays
##
##          Df Sum of Sq      RSS    AIC
## + Xray    1   7189926  6572383 224.71
## + Length  1   6220223  7542086 227.05
## + Pop     1   1571656 12190652 235.21
## <none>                13762309 235.27
## + Load    1    162727 13599582 237.07
##
## Step:  AIC=224.71
## Hours ~ BedDays + Xray
##
##          Df Sum of Sq     RSS    AIC
## + Length  1   1658984 4913399 221.76
## <none>                6572383 224.71
## + Load    1    302660 6269722 225.91
## + Pop     1     17547 6554835 226.66
##
## Step:  AIC=221.76
## Hours ~ BedDays + Xray + Length
##
##         Df Sum of Sq     RSS    AIC
## <none>               4913399 221.76
## + Pop    1    367483 4545916 222.44
## + Load   1    235881 4677517 222.93

From the above outputs, both Backward and Forward selection yield "Hours ~ BedDays + Xray + Length" as the selected model. Therefore, Model 1 is chosen by both stepwise regression procedures.
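For lm objects, step() ranks models by the AIC variant computed by extractAIC(), namely $n \ln(SSE/n) + 2p$, where $p$ is the number of estimated coefficients. A quick hand check of the final value in the traces above:

# Hand check of the AIC that step() reports for Hours ~ BedDays + Xray + Length
n = 17; p = 4             # intercept + 3 slopes
rss = 4913399             # RSS from the step() trace above
n * log(rss / n) + 2 * p  # about 221.76, matching "Step: AIC=221.76"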
V. Model 1 is the best model. It has the smallest $s$ and $C_p$, the largest $\bar{R}^2$, and is chosen by both Backward and Forward stepwise regression.

summary(model1)

##
## Call:
## lm(formula = Hours ~ Xray + BedDays + Length)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -687.40 -380.60  -25.03  281.91 1630.50
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1523.38924  786.89772   1.936   0.0749 .
## Xray           0.05299    0.02009   2.637   0.0205 *
## BedDays        0.97848    0.10515   9.305 4.12e-07 ***
## Length      -320.95083  153.19222  -2.095   0.0563 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 614.8 on 13 degrees of freedom
## Multiple R-squared:  0.9901, Adjusted R-squared:  0.9878
## F-statistic: 432 on 3 and 13 DF,  p-value: 2.894e-13

6.3

input = read.table("6-8.txt", header=TRUE)
attach(input)
plot.ts(Bikes, type="o", pch=20)
abline(mean(Bikes), 0, col="red")
[Figure: time series plot of Bikes vs. Time, with the series mean drawn as a horizontal red line]

a. The trend appears to be slightly linear.

b. The seasonal variation appears to be constant. No transformation is required.

c.

n = length(Bikes); print(n)

## [1] 16

t = 1:n
# You can create your own dummy variables
Q2 = rep(c(0, 1, 0, 0), 4)
Q3 = rep(c(0, 0, 1, 0), 4)
Q4 = rep(c(0, 0, 0, 1), 4)
model_bike = lm(Bikes ~ t + factor(Q2) + factor(Q3) + factor(Q4))
summary(model_bike)

##
## Call:
## lm(formula = Bikes ~ t + factor(Q2) + factor(Q3) + factor(Q4))
##
## Residuals:
##   Min    1Q Median    3Q   Max
## -0.75 -0.25 -0.25  0.25  1.25
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  8.75000    0.42806  20.441 4.23e-10 ***
## t            0.50000    0.03769  13.266 4.12e-08 ***
## factor(Q2)1 21.00000    0.47822  43.913 1.04e-13 ***
## factor(Q3)1 33.50000    0.48265  69.408 6.89e-16 ***
## factor(Q4)1  4.50000    0.48996   9.184 1.72e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6742 on 11 degrees of freedom
## Multiple R-squared:  0.9983, Adjusted R-squared:  0.9977
## F-statistic: 1645 on 4 and 11 DF,  p-value: 3.439e-15

# Or you can use the following way.
QT = rep(c(1, 2, 3, 4), 4)
model_bike2 = lm(Bikes ~ t + factor(QT))
summary(model_bike2)

##
## Call:
## lm(formula = Bikes ~ t + factor(QT))
##
## Residuals:
##   Min    1Q Median    3Q   Max
## -0.75 -0.25 -0.25  0.25  1.25
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  8.75000    0.42806  20.441 4.23e-10 ***
## t            0.50000    0.03769  13.266 4.12e-08 ***
## factor(QT)2 21.00000    0.47822  43.913 1.04e-13 ***
## factor(QT)3 33.50000    0.48265  69.408 6.89e-16 ***
## factor(QT)4  4.50000    0.48996   9.184 1.72e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6742 on 11 degrees of freedom
## Multiple R-squared:  0.9983, Adjusted R-squared:  0.9977
## F-statistic: 1645 on 4 and 11 DF,  p-value: 3.439e-15

d. Yes, all the variables are significant because their p-values are extremely small (much less than $\alpha = 0.01$).

e.

df_t = data.frame(t=17:20, QT=c(1, 2, 3, 4))
predict(model_bike2, newdata=df_t, interval="prediction", level=0.90)

##     fit      lwr      upr
## 1 17.25 15.73652 18.76348
## 2 38.75 37.23652 40.26348
## 3 51.75 50.23652 53.26348
## 4 23.25 21.73652 24.76348
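As a cross-check before parts f and g, these point forecasts can be reproduced by building the design rows for the four new quarters and multiplying by the coefficient vector (a sketch, assuming df_t and model_bike2 as defined above):

# Rebuild the design matrix for quarters 17-20 and apply the fitted coefficients
X_new = model.matrix(~ t + factor(QT), data=df_t)  # intercept, t, and the three dummies
drop(X_new %*% coef(model_bike2))                  # 17.25 38.75 51.75 23.25

Part f repeats this arithmetic by hand for $\hat{y}_{17}$ and $\hat{y}_{18}$.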
From the above R output, we know: $\hat{y}_{17} = 17.25$, $\hat{y}_{18} = 38.75$, $\hat{y}_{19} = 51.75$, $\hat{y}_{20} = 23.25$.

f.

$\hat{y}_{17} = 8.75 + 0.5(17) + 21(0) + 33.5(0) + 4.5(0) = 17.25$
$\hat{y}_{18} = 8.75 + 0.5(18) + 21(1) + 33.5(0) + 4.5(0) = 38.75$

g. From the R output in part e, we know:

The 90% PI for $y_{17}$ is [15.73652, 18.76348]. We are 90% confident that sales of the TRK-50 Mountain Bike in the first quarter of year 5 will be between 15.73652 and 18.76348.

The 90% PI for $y_{18}$ is [37.23652, 40.26348]. We are 90% confident that sales of the TRK-50 Mountain Bike in the second quarter of year 5 will be between 37.23652 and 40.26348.

The 90% PI for $y_{19}$ is [50.23652, 53.26348]. We are 90% confident that sales of the TRK-50 Mountain Bike in the third quarter of year 5 will be between 50.23652 and 53.26348.

The 90% PI for $y_{20}$ is [21.73652, 24.76348]. We are 90% confident that sales of the TRK-50 Mountain Bike in the fourth quarter of year 5 will be between 21.73652 and 24.76348.

h.

library(lmtest)
# By default, dwtest() tests for positive autocorrelation.
dwtest(model_bike)

##
##  Durbin-Watson test
##
## data:  model_bike
## DW = 2.2, p-value = 0.615
## alternative hypothesis: true autocorrelation is greater than 0

The p-value of the DW test is 0.615, which is much greater than $\alpha = 0.05$. Therefore, we fail to reject $H_0$ and conclude there is no positive autocorrelation. (The DW statistic itself is recomputed by hand in the sketch after the EXTRA plot code below.)

EXTRA

df_t = data.frame(t=17:20, QT=c(1, 2, 3, 4))
yhat = fitted(model_bike2)
pred_y = predict(model_bike2, newdata=df_t)
plot.ts(Bikes, type="o", pch=20, xlim=c(0, 30), ylim=c(0, 60), col="red")
lines(t, yhat, col="blue", pch=20, type="o")
lines(16:17, c(yhat[16], pred_y[1]), col="orange", lty=3, pch=20)
lines(17:20, pred_y, col="orange", type="o", pch=20)
legend(21, 60, legend=c("Original", "Fitted", "Prediction"),
       col=c("red", "blue", "orange"), lty=c(1, 1, 1), lwd=2)
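As noted in part h, the DW statistic is simple to recompute directly from the residuals, $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$; a minimal sketch:

# Hand check of the Durbin-Watson statistic for model_bike
e = resid(model_bike)
sum(diff(e)^2) / sum(e^2)   # about 2.2, matching dwtest(model_bike)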
[Figure: Bikes series (red) with fitted values (blue) and quarter 17-20 predictions (orange); legend: Original, Fitted, Prediction]

6.6

a.

input = read.table("6-10.txt", header=TRUE)
attach(input)
par(mfrow=c(2, 1), mai=c(.8, .8, .2, .8))
plot.ts(Cases, type="o", pch=20)
abline(mean(Cases), 0, col="red")
plot.ts(log(Cases), type="o", pch=20)
abline(mean(log(Cases)), 0, col="red")
[Figure: two stacked panels, Cases vs. Time and log(Cases) vs. Time, each with its mean drawn as a horizontal red line]

The first plot shows an exponential increase. Therefore, the use of a growth curve model seems appropriate, since it appears that the data might be described by the model

$y_t = \beta_0 (\beta_1^t) \varepsilon_t$

b. Yes, log(Cases) looks linear in the second plot of part a.

c.

$y_t = \beta_0 (\beta_1^t) \varepsilon_t$
$\ln(y_t) = \ln(\beta_0) + \ln(\beta_1) t + \ln(\varepsilon_t)$
$\ln(y_t) = \alpha_0 + \alpha_1 t + u_t$

where $\alpha_0 = \ln(\beta_0)$, $\alpha_1 = \ln(\beta_1)$, and $u_t = \ln(\varepsilon_t)$.

d.

n = length(Cases); print(n)

## [1] 11

t = 1:n
model_disease = lm(log(Cases) ~ t)
summary(model_disease)

##
## Call:
## lm(formula = log(Cases) ~ t)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.236598 -0.040363 -0.004716  0.072360  0.153371
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.54334    0.07344  -7.398 4.11e-05 ***
## t            0.38997    0.01083  36.012 4.86e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1136 on 9 degrees of freedom
## Multiple R-squared:  0.9931, Adjusted R-squared:  0.9923
## F-statistic: 1297 on 1 and 9 DF,  p-value: 4.859e-11

exp(-0.54334)

## [1] 0.5808051

exp(0.38997)

## [1] 1.476936

1. $\hat{\alpha}_0 = -0.54334$ and $\hat{\alpha}_1 = 0.38997$, so $\ln(\hat{y}_t) = -0.54334 + 0.38997 t$ and $\hat{y}_t = e^{-0.54334} \cdot e^{0.38997 t} = 0.5808051 \cdot e^{0.38997 t}$.

2. A point estimate of $\beta_1$ is $\hat{\beta}_1 = e^{\hat{\alpha}_1} = 1.476936$. Growth Rate $= 100 \times (\hat{\beta}_1 - 1) = 47.6936\%$. That is, we estimate $y_t$ to be 47.6936% greater than $y_{t-1}$.

3.

ln_y12 = predict(model_disease, newdata=data.frame(t=12)); print(ln_y12)

##       1
## 4.13629

y12 = exp(ln_y12); print(y12)

##        1
## 62.57026

The point prediction of $\ln(y_{12})$ is $\ln(\hat{y}_{12}) = 4.13629$. Therefore, the point prediction of $y_{12}$ is $\hat{y}_{12} = e^{4.13629} = 62.57026$.

4.

ln_PI12 = predict(model_disease, newdata=data.frame(t=12), interval="prediction", level=0.99); print(ln_PI12)
##       fit      lwr      upr
## 1 4.13629 3.696742 4.575838

PI12 = exp(ln_PI12); print(PI12)

##        fit      lwr      upr
## 1 62.57026 40.31575 97.10936

The 99% prediction interval for $\ln(y_{12})$ is [3.696742, 4.575838]. Thus, a 99% prediction interval for $y_{12}$ is $[e^{3.696742}, e^{4.575838}] = [40.31575, 97.10936]$. We can be 99% confident that the number of reported cases of the disease in Month 12 will be at least 40.31575 (or roughly 40) cases and at most 97.10936 (or roughly 97) cases.
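As a final check on the back-transformation, the fitted growth curve $\hat{y}_t = \hat{\beta}_0 \hat{\beta}_1^t$ can be rebuilt directly from the log-scale coefficients (a sketch, assuming model_disease from part d):

# Rebuild y_hat_t = beta0_hat * beta1_hat^t from the log-scale fit
b = coef(model_disease)    # b[1] = alpha0_hat = ln(beta0), b[2] = alpha1_hat = ln(beta1)
exp(b[1]) * exp(b[2])^12   # about 62.57, matching exp(predict(...)) at t = 12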