Hw6_Sol (pdf, 11 pages)
School: Gwinnett Technical College
Course: 4115
Subject: Statistics
Date: Feb 20, 2024
Uploaded by BarristerStrawElk2529

ISyE 4031 Homework 6 Solution
Spring 2021

5.9

I. II. III.

##
## Call:
## lm(formula = ServTime ~ Desktops)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -83.505 -10.953  -3.453  12.043  56.510
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   14.431     17.631   0.818    0.428
## Desktops      22.007      3.114   7.068 8.44e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34.91 on 13 degrees of freedom
## Multiple R-squared:  0.7935, Adjusted R-squared:  0.7776
## F-statistic: 49.96 on 1 and 13 DF,  p-value: 8.439e-06

[Figure: a. Scatter Plot (ServTime vs. Desktops); b. Residual Plot (Residuals vs. Desktops)]

In the scatter plot, the points seem to fan out as the number of desktops increases. The service time appears to vary more when more desktops are being serviced.

In the residual plot, the points also fan out. The variation of the residuals is greater for a greater number of desktops.
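
The two plots discussed above could be produced along the following lines. This is only a minimal sketch: the homework data set is not reproduced in this solution, so `service` below is simulated stand-in data with an assumed fan-out pattern, and its coefficients and noise level are illustrative assumptions. The names `ServTime`, `Desktops`, and `model_Service` mirror those that appear in the R output.

```r
# Simulated stand-in data (NOT the homework data): the error spread grows with Desktops.
set.seed(4031)
service <- data.frame(Desktops = rep(1:10, length.out = 15))
service$ServTime <- 15 + 22 * service$Desktops +
  rnorm(15, sd = 5 * service$Desktops)

# Simple linear regression of service time on number of desktops.
model_Service <- lm(ServTime ~ Desktops, data = service)

# Side-by-side scatter plot and residual plot, as in parts a and b.
op <- par(mfrow = c(1, 2))
plot(service$Desktops, service$ServTime,
     xlab = "Desktops", ylab = "ServTime", main = "a. Scatter Plot")
abline(model_Service)
plot(service$Desktops, resid(model_Service),
     xlab = "Desktops", ylab = "Residuals", main = "b. Residual Plot")
abline(h = 0, lty = 2)
par(op)
```

A funnel shape that widens to the right in the second panel is the pattern described as "fanning out".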
IV.

##
## Call:
## lm(formula = ServTime ~ Desktops)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -83.505 -10.953  -3.453  12.043  56.510
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   14.431     17.631   0.818    0.428
## Desktops      22.007      3.114   7.068 8.44e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34.91 on 13 degrees of freedom
## Multiple R-squared:  0.7935, Adjusted R-squared:  0.7776
## F-statistic: 49.96 on 1 and 13 DF,  p-value: 8.439e-06

##
## Call:
## lm(formula = ST1 ~ Desktops)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.2652 -0.6267  0.1362  0.7809  2.1009
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.8630     0.7169   8.178 1.76e-06 ***
## Desktops      0.9690     0.1266   7.654 3.61e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.42 on 13 degrees of freedom
## Multiple R-squared:  0.8184, Adjusted R-squared:  0.8044
## F-statistic: 58.59 on 1 and 13 DF,  p-value: 3.615e-06

##
## Call:
## lm(formula = ST2 ~ Desktops)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.46930 -0.07589  0.02233  0.11031  0.31801
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.50004    0.11238  22.247 9.87e-12 ***
## Desktops     0.14747    0.01985   7.431 4.97e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2225 on 13 degrees of freedom
## Multiple R-squared:  0.8094, Adjusted R-squared:  0.7948
## F-statistic: 55.22 on 1 and 13 DF,  p-value: 4.967e-06

##
## Call:
## lm(formula = ST3 ~ Desktops)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.62770 -0.07817  0.03676  0.17841  0.39872
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.74070    0.15145  24.699 2.61e-12 ***
## Desktops     0.18284    0.02675   6.836 1.19e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2999 on 13 degrees of freedom
## Multiple R-squared:  0.7824, Adjusted R-squared:  0.7656
## F-statistic: 46.73 on 1 and 13 DF,  p-value: 1.195e-05

IV.1 Normal Probability Plots

[Figure: normal probability plots of the residuals (sample vs. theoretical quantiles) for the responses $y$, $y^{0.5}$, $y^{0.25}$, and $\ln(y)$]

## AD Test for Residuals- y as response
##
##  Anderson-Darling normality test
##
## data:  resid(model_Service)
## A = 0.45165, p-value = 0.2352

## AD Test for Residuals- sqrt(y) as response
##
##  Anderson-Darling normality test
##
## data:  resid(model_Service1)
## A = 0.26974, p-value = 0.6253

## AD Test for Residuals- y^0.25 as response
##
##  Anderson-Darling normality test
##
## data:  resid(model_Service2)
## A = 0.42469, p-value = 0.2758

## AD Test for Residuals- ln(y) as response
##
##  Anderson-Darling normality test
##
## data:  resid(model_Service3)
## A = 0.54869, p-value = 0.1303

In all four normal probability plots the residuals appear approximately normal. However, the plot for the square root transformation handles the points at both ends of the line best, while each of the other three plots has a few points that deviate noticeably from the line.

Results of Anderson-Darling Test on Residuals

Transformation   p-value
$y$              0.2352
$y^{0.5}$        0.6253
$y^{0.25}$       0.2758
$\ln(y)$         0.1303

All of the p-values are greater than 0.1, so we do not reject normality of the residuals for any of the 4 models. Note, however, that the square root transformation has the largest p-value, while the natural log transformation has the smallest p-value, which is only slightly greater than 0.1.
IV.2 Residual Plots

[Figure: residuals vs. Desktops for the responses $y$, $y^{0.5}$, $y^{0.25}$, and $\ln(y)$]

In the above residual plots, the constant variance assumption still appears to be violated to some degree, but the transformations of ServTime do make the residuals look more evenly spread, whereas the original residual plot shows a strong fan-out pattern.

[Note] $R^2$ Comparisons

The $R^2$ values of all four models are listed in the following table. The model with the square root transformation of ServTime has the highest $R^2$. This is another indication that the square root transformation works better than both the original model and the other transformed models.

Transformation   $R^2$     Max
$y$              0.7935
$y^{0.5}$        0.8184    *
$y^{0.25}$       0.8094
$\ln(y)$         0.7824
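
The comparison of the four response scales can be organised as a loop over the transformed responses. The sketch below is illustrative only: the data are simulated (the homework data are not reproduced here), so its numbers will not match the tables above. It uses `nortest::ad.test` for the Anderson-Darling test when the `nortest` package happens to be installed and otherwise falls back to base R's Shapiro-Wilk test (`shapiro.test`), which is a different normality test swapped in purely so the example runs without extra packages.

```r
# Simulated stand-in data (NOT the homework data); linear on the sqrt scale by construction.
set.seed(1)
Desktops <- rep(1:10, length.out = 15)
ServTime <- (6 + 1.0 * Desktops + rnorm(15, sd = 1.4))^2

# The four response scales compared in part IV.
responses <- list(
  "y"      = ServTime,
  "y^0.5"  = sqrt(ServTime),
  "y^0.25" = ServTime^0.25,
  "ln(y)"  = log(ServTime)
)

# Normality p-value: Anderson-Darling if nortest is available,
# otherwise Shapiro-Wilk from base R (a substitute, not the same test).
normality_p <- function(e) {
  if (requireNamespace("nortest", quietly = TRUE)) {
    nortest::ad.test(e)$p.value
  } else {
    shapiro.test(e)$p.value
  }
}

# Fit one simple linear regression per response scale and collect the summaries.
comparison <- do.call(rbind, lapply(names(responses), function(nm) {
  fit <- lm(responses[[nm]] ~ Desktops)
  data.frame(
    transformation = nm,
    r_squared      = summary(fit)$r.squared,
    normality_p    = normality_p(resid(fit)),
    stringsAsFactors = FALSE
  )
}))
print(comparison)
```

Collecting the results in one data frame makes it easy to read off which scale has the largest $R^2$ and the largest normality p-value.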
5.15

a.

[Figure: Normal Q-Q Plot of the residuals (sample vs. theoretical quantiles); residuals vs. Fitted; residuals vs. BedDays; residuals vs. Length]

b.

## Hat Values
##          1          2          3          4          5          6          7
## 0.12075307 0.23505170 0.12968114 0.15876352 0.08688092 0.11440248 0.08606574
##          8          9         10         11         12         13         15
## 0.08353573 0.08762352 0.13502742 0.08333981 0.17799893 0.06633064 0.71443718
##         16         17
## 0.78675138 0.93335684
## 2*(k+1)/n
## [1] 0.5
## Hat Values > 0.5
##        15        16        17
## 0.7144372 0.7867514 0.9333568

$$\frac{2(k+1)}{n} = \frac{8}{16} = 0.5$$

Hospitals 15, 16, and 17 are outliers with respect to their x values.

c.

## Studentized Deleted Residuals
##          1          2          3          4          5          6          7
## -0.3329753  0.4035826  0.1607065  1.2335524  0.4249297 -0.7952567  0.6766342
##          8          9         10         11         12         13         15
##  1.1170641 -1.0782642 -1.3590574  1.4611929 -2.2241117 -0.6851192 -0.1374642
##         16         17
##  1.2537188  0.5966190
## t_{0.025,16-(3+2)}
## [1] 2.200985
## |SDR|>2.200985
##        12
## -2.224112

Hospital 12 is an outlier with respect to its y value, since its studentized deleted residual ($-2.224112$) is less than $-t_{[0.025]}^{(11)} = -2.200985$.

d.

## Cooks Distance
##           1           2           3           4           5           6
## 0.004111350 0.013450584 0.001047070 0.068803012 0.004609869 0.021069992
##           7           8           9          10          11          12
## 0.011288649 0.027859603 0.027541648 0.067330758 0.044335089 0.201515928
##          13          15          16          17
## 0.008722368 0.012871368 1.383805243 1.316994321
## F0.5
## [1] 0.8884783
## F0.8
## [1] 0.4073066
## CooksD>F0.5
##       16       17
## 1.383805 1.316994

Hospitals 16 and 17 are influential, since their Cook's distances are greater than $F_{[0.5]}^{(4,12)} = 0.8884783$.

e.

## Cooks Distance of Hospital 16 for model without Hospital 14
##       16
## 1.383805
## Cooks Distance of Hospital 17 for model with Hospital 14 (Full model)
##      17
## 5.03294

Yes. Cook's D for Hospital 16 when Hospital 14 is removed is 1.383805, which is considerably less than 5.033, the Cook's D for Hospital 17 when Hospital 14 is included. In other words, we compare the most influential point in each of the two models: after removing Hospital 14, the most influential point (Hospital 16) is less influential than the most influential point (Hospital 17) in the original model.
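
The three cutoff rules used in parts b, c, and d of 5.15 can be checked directly from the printed diagnostics. The sketch below copies the hat values, studentized deleted residuals, and Cook's distances from the output above (the model fitted without Hospital 14) rather than refitting the model, since the data are not reproduced here. The variable names are illustrative, not the ones used in the original script.

```r
# Values copied from the output above (model fitted without Hospital 14).
hospital <- c(1:13, 15:17)
hat <- c(0.12075307, 0.23505170, 0.12968114, 0.15876352, 0.08688092,
         0.11440248, 0.08606574, 0.08353573, 0.08762352, 0.13502742,
         0.08333981, 0.17799893, 0.06633064, 0.71443718, 0.78675138,
         0.93335684)
sdr <- c(-0.3329753, 0.4035826, 0.1607065, 1.2335524, 0.4249297,
         -0.7952567, 0.6766342, 1.1170641, -1.0782642, -1.3590574,
         1.4611929, -2.2241117, -0.6851192, -0.1374642, 1.2537188,
         0.5966190)
cooks <- c(0.004111350, 0.013450584, 0.001047070, 0.068803012,
           0.004609869, 0.021069992, 0.011288649, 0.027859603,
           0.027541648, 0.067330758, 0.044335089, 0.201515928,
           0.008722368, 0.012871368, 1.383805243, 1.316994321)

n <- length(hospital)  # 16 hospitals
k <- 3                 # predictors: Xray, BedDays, Length

# Cutoffs used in the solution.
leverage_cut <- 2 * (k + 1) / n                    # 2(k+1)/n
t_cut <- qt(1 - 0.025, df = n - (k + 2))           # t_[0.025] with 11 df
f_cut <- qf(0.5, df1 = k + 1, df2 = n - (k + 1))   # F_[0.5] with (4, 12) df

# Hospitals flagged by each rule.
high_leverage <- hospital[hat > leverage_cut]
y_outliers    <- hospital[abs(sdr) > t_cut]
influential   <- hospital[cooks > f_cut]

print(list(high_leverage = high_leverage,
           y_outliers = y_outliers,
           influential = influential))
```

With a fitted `lm` object at hand, the three input vectors would normally come from base R's `hatvalues()`, `rstudent()`, and `cooks.distance()` instead of being typed in.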
5.16

a.

##
## Call:
## lm(formula = Hours ~ Xray + BedDays + Length + DL)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -485.91 -204.70   68.77  183.74  727.16
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2462.21640  501.98970   4.905 0.000363 ***
## Xray           0.04816    0.01193   4.037 0.001649 **
## BedDays        0.78432    0.07331  10.699 1.72e-07 ***
## Length      -432.40947   93.35426  -4.632 0.000578 ***
## DL          2871.78284  573.06176   5.011 0.000304 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 363.9 on 12 degrees of freedom
## Multiple R-squared:  0.9968, Adjusted R-squared:  0.9957
## F-statistic: 931.2 on 4 and 12 DF,  p-value: 7.656e-15

$$D_L = \begin{cases} 1, & \text{large hospital} \\ 0, & \text{non-large hospital} \end{cases}$$

The mean monthly labor hours for a large hospital exceed those for a small (not large) hospital by 2871.7828 hours when the values of the other variables remain the same. Since the p-value = 0.0003 < 0.001, we have very strong evidence that the value of $\beta_4$, the coefficient of $D_L$, is statistically different from 0.

b.

## SDR for Hospital 14
##       14
## 1.405802
## t_(0.025,11)
## [1] 2.200985

Because $|1.405802| < 2.200985$, we don't have evidence that Hospital 14 is an outlier with respect to its y value.

c.

## Cooks Distance with Dummy variable - DL
##            1            2            3            4            5            6
## 0.0699061232 0.0034984938 0.0223768125 0.0022035895 0.0009130147 0.0525707595
##            7            8            9           10           11           12
## 0.0074281942 0.0184548837 0.0049568588 0.0092883630 0.1152212446 0.0245065598
##           13           14           15           16           17
## 0.0074872707 0.8768968657 1.4216976841 0.8978682961 0.7377364159
## Cooks Distance of Hospital 17 in the original model
##      17
## 5.03294

Hospital 15 now has the largest Cook's D, 1.4216976841. Yes, this is much smaller than 5.03294, the Cook's D of Hospital 17 in the original model, so the most influential hospital in the dummy-variable model is less influential than the most influential hospital in the original model. In addition, after adding the dummy variable, Hospital 17 is no longer influential (its Cook's D is reduced from 5.03294 to 0.7377364159).

d.

h_df = data.frame(Xray = 56194, BedDays = 14077.88, Length = 6.89, DL = 1)
PI1 = predict(model_hospital, newdata = h_df, interval = "prediction")
print(PI1)
##        fit      lwr      upr
## 1 16064.55 14510.96 17618.15
PI1[3] - PI1[2]
## [1] 3107.196
PI2 = predict(model_hospital_14, newdata = h_df, interval = "prediction")
print(PI2)
##        fit      lwr      upr
## 1 15896.25 14906.24 16886.26
PI2[3] - PI2[2]
## [1] 1980.022
PI3 = predict(model_hospital_dummy, newdata = h_df, interval = "prediction")
print(PI3)
##        fit      lwr      upr
## 1 16102.53 15175.04 17030.01
PI3[3] - PI3[2]
## [1] 1854.973

n = 17, No Dummy (original model): 17618.15 - 14510.96 = 3107.19
n = 16, No Dummy (removed Hospital 14): 16886.26 - 14906.24 = 1980.02
n = 17, Dummy: 17030.01 - 15175.04 = 1854.97

The model with the dummy variable, fitted to all 17 hospitals, gives the shortest prediction interval.

f.

summary(model_hospital)
##
## Call:
## lm(formula = Hours ~ Xray + BedDays + Length, data = input)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -687.40 -380.60  -25.03  281.91 1630.50
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1523.38924  786.89772   1.936   0.0749 .
## Xray           0.05299    0.02009   2.637   0.0205 *
## BedDays        0.97848    0.10515   9.305 4.12e-07 ***
## Length      -320.95083  153.19222  -2.095   0.0563 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 614.8 on 13 degrees of freedom
## Multiple R-squared:  0.9901, Adjusted R-squared:  0.9878
## F-statistic: 432 on 3 and 13 DF,  p-value: 2.894e-13

summary(model_hospital_14)
##
## Call:
## lm(formula = Hours ~ Xray + BedDays + Length, data = input_minus_14)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -677.23 -270.19   60.93  228.32  517.70
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1946.80204  504.18193   3.861  0.00226 **
## Xray           0.03858    0.01304   2.958  0.01197 *
## BedDays        1.03939    0.06756  15.386 2.91e-09 ***
## Length      -413.75780   98.59828  -4.196  0.00124 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 387.2 on 12 degrees of freedom
## Multiple R-squared:  0.9961, Adjusted R-squared:  0.9952
## F-statistic: 1028 on 3 and 12 DF,  p-value: 9.919e-15

summary(model_hospital_dummy)
##
## Call:
## lm(formula = Hours ~ Xray + BedDays + Length + DL)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -485.91 -204.70   68.77  183.74  727.16
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2462.21640  501.98970   4.905 0.000363 ***
## Xray           0.04816    0.01193   4.037 0.001649 **
## BedDays        0.78432    0.07331  10.699 1.72e-07 ***
## Length      -432.40947   93.35426  -4.632 0.000578 ***
## DL          2871.78284  573.06176   5.011 0.000304 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 363.9 on 12 degrees of freedom
## Multiple R-squared:  0.9968, Adjusted R-squared:  0.9957
## F-statistic: 931.2 on 4 and 12 DF,  p-value: 7.656e-15

The best model for evaluating hospitals appears to be

$$y = \beta_0 + \beta_1 \cdot \text{Xray} + \beta_2 \cdot \text{BedDays} + \beta_3 \cdot \text{Length} + \beta_4 \cdot D_L + \varepsilon$$

estimated from all 17 hospitals. It has small p-values (< 0.01) for all independent variables. It has the largest adjusted $R^2$ (0.9957). It has the shortest prediction interval and the smallest $s$. The influence of the individual hospitals on the estimates is relatively low.
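
The criteria cited in this conclusion can be tabulated side by side. The sketch below only rearranges figures already reported in 5.16 (residual standard error $s$, adjusted $R^2$, and the prediction interval limits from part d). The data frame and its column names are illustrative and are not part of the original script.

```r
# Figures copied from the summaries and prediction intervals reported above.
models <- data.frame(
  model    = c("original (n = 17)",
               "without Hospital 14 (n = 16)",
               "with dummy DL (n = 17)"),
  s        = c(614.8, 387.2, 363.9),          # residual standard error
  adj_r2   = c(0.9878, 0.9952, 0.9957),       # adjusted R-squared
  pi_lower = c(14510.96, 14906.24, 15175.04), # prediction interval, lower limit
  pi_upper = c(17618.15, 16886.26, 17030.01), # prediction interval, upper limit
  stringsAsFactors = FALSE
)
models$pi_width <- models$pi_upper - models$pi_lower

# Which model wins on each criterion?
best <- c(
  smallest_s  = models$model[which.min(models$s)],
  largest_adj = models$model[which.max(models$adj_r2)],
  shortest_pi = models$model[which.min(models$pi_width)]
)
print(models)
print(best)
```

All three criteria point to the same model, which is why the dummy-variable model is preferred here.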