HW7
Aamir Hafiz
2022-11-10
fish <- read.table("fish.txt", stringsAsFactors = TRUE)
colnames(fish) <- c("age", "length")
#head(fish)
fish.lm <- lm(length ~ ., data = fish)
summary(fish.lm)
## Call:
## lm(formula = length ~ ., data = fish)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.523  -7.586   0.258  10.102  20.414 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   62.649      5.755   10.89   <2e-16 ***
## age           22.312      1.537   14.51   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.51 on 76 degrees of freedom
## Multiple R-squared: 0.7349, Adjusted R-squared: 0.7314
## F-statistic: 210.7 on 1 and 76 DF, p-value: < 2.2e-16
plot(fish$age, fish$length)
abline(fish.lm)
An R-squared value near 0.75 indicates a reasonably good fit; the fitted line appears to predict length fairly well.
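As an additional check (not part of the original output), the standard residual diagnostic plots can hint at whether a straight line is adequate; curvature in the residuals-vs-fitted panel would motivate the polynomial fit in Part 2. A minimal sketch using the model fitted above:

# Base-R diagnostic plots for the simple linear fit (sketch, not original output).
# A curved pattern in the residuals-vs-fitted panel suggests a higher-order term.
par(mfrow = c(2, 2))
plot(fish.lm)
par(mfrow = c(1, 1))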
Part 2
#fit an appropriate polynomial to the data
# I(price^2)
fish.lm_two <- lm(length ~ . + I(age^2), data = fish)
summary(fish.lm_two)
## Call:
## lm(formula = length ~ . + I(age^2), data = fish)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.846  -8.321  -1.137   6.698  22.098 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   13.622     11.016   1.237     0.22    
## age           54.049      6.489   8.330 2.81e-12 ***
## I(age^2)      -4.719      0.944  -4.999 3.67e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.91 on 75 degrees of freedom
## Multiple R-squared: 0.8011, Adjusted R-squared: 0.7958
## F-statistic: 151.1 on 2 and 75 DF, p-value: < 2.2e-16
#fit an appropriate polynomial to the data
fish.lm_three <- lm(length ~ . + I(age^2) + I(age^3), data = fish)
summary(fish.lm_three)
## Call:
## lm(formula = length ~ . + I(age^2) + I(age^3), data = fish)
##
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.0773  -8.1971  -0.6971   6.7217  22.1300 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   9.8101    21.7690   0.451  0.65356   
## age          58.1936    21.3868   2.721  0.00811 **
## I(age^2)     -6.0358     6.5417  -0.923  0.35918   
## I(age^3)      0.1279     0.6284   0.203  0.83930   
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.98 on 74 degrees of freedom
## Multiple R-squared: 0.8012, Adjusted R-squared: 0.7932
## F-statistic: 99.44 on 3 and 74 DF, p-value: < 2.2e-16
We see that in the third-order model the cubic term has a p-value of 0.83930, which is greater than 5%, so we conclude that the optimal order for the polynomial, according to this selection procedure, is d = 2. The fitted model is y_hat_i = B_0_hat + B_1_hat * x_i + B_2_hat * x_i^2, which here comes out to length_hat = 13.622 + 54.049 * age - 4.719 * age^2. The quadratic model has an R-squared of about 0.80, better than the original R-squared of about 0.73.
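A quick way to recover this fitted equation directly from the model object, and to use it for prediction, is sketched below (the age value of 3 is just an illustrative input, not from the assignment):

# Coefficients of the selected quadratic model
coef(fish.lm_two)

# Example: predicted length for a hypothetical fish of age 3 (illustrative value)
predict(fish.lm_two, newdata = data.frame(age = 3))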
Problem 2
prostate <- read.table("prostate.txt", stringsAsFactors = TRUE)
names(prostate) = c("ID", "lpsa", "cancervolume", "weight", "age",
                    "hyperplasia", "svi", "cp", "gleason")
head(prostate)
## ID lpsa cancervolume weight age hyperplasia svi cp gleason
## 1 1 0.651 0.5599 15.959 50 0 0 0 6
## 2 2 0.852 0.3716 27.660 58 0 0 0 7
## 3 3 0.852 0.6005 14.732 74 0 0 0 7
## 4 4 0.852 0.3012 26.576 58 0 0 0 6
## 5 5 1.448 2.1170 30.877 62 0 0 0 6
## 6 6 2.160 0.3499 25.280 50 0 0 0 6
ANCOVA
prostate.lm = lm(lpsa ~ cancervolume + svi + cancervolume:svi, data = prostate)
summary(prostate.lm)
## Call:
## lm(formula = lpsa ~ cancervolume + svi + cancervolume:svi, data = prostate)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.314  -6.675  -1.856   5.945 167.930 
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        6.0917     4.7810   1.274   0.2058  
## cancervolume       1.3832     0.7076   1.955   0.0536 .
## svi                4.3081    13.3074   0.324   0.7469  
## cancervolume:svi   2.0700     0.9735   2.126   0.0361 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.52 on 93 degrees of freedom
## Multiple R-squared: 0.4574, Adjusted R-squared: 0.4399
## F-statistic: 26.13 on 3 and 93 DF, p-value: 2.399e-12
Regression line when svi is 0: y_i = 6.09 + 1.38 * x_i
Regression line when svi is 1: y_i = (6.09 + 4.31) + (1.38 + 2.07) * x_i
Ho: the model without the interaction is adequate
Ha: Ho is false
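These two lines can also be read off the coefficient vector directly; a short sketch using the fitted model above (the names line_svi0 and line_svi1 are introduced here for illustration):

# Intercept and slope for each svi group, built from the fitted coefficients
b <- coef(prostate.lm)
line_svi0 <- c(intercept = unname(b["(Intercept)"]),
               slope = unname(b["cancervolume"]))
line_svi1 <- c(intercept = unname(b["(Intercept)"] + b["svi"]),
               slope = unname(b["cancervolume"] + b["cancervolume:svi"]))
line_svi0
line_svi1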
anova(prostate.lm)
## Analysis of Variance Table
##
## Response: lpsa
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume      1  62202   62202 66.7649 1.487e-12 ***
## svi               1   6613    6613  7.0976  0.009099 ** 
## cancervolume:svi  1   4212    4212  4.5211  0.036128 *  
## Residuals        93  86645     932                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value for the interaction term is less than alpha = 0.05, so we reject the null and conclude that the full model is needed (the interaction term is statistically significant in our model).
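The same conclusion could be reached with an explicit partial F-test comparing the reduced (no-interaction) model against the full model; a sketch, where prostate.lm_reduced is a name introduced here and not part of the original solution:

# Reduced model without the interaction term (name introduced for this sketch)
prostate.lm_reduced <- lm(lpsa ~ cancervolume + svi, data = prostate)

# Partial F-test: reduced vs. full model; the interaction's F and p-value
# should match the last row of the sequential ANOVA table above
anova(prostate.lm_reduced, prostate.lm)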
C)
The response to cancervolume varies with the svi level: when svi is 0, the slope of lpsa on cancervolume is the cancervolume coefficient alone, but when svi is 1 the slope becomes the sum of the estimated coefficients for cancervolume and the interaction term.
Problem 3
prostate[, 'gleason7'] = ifelse(prostate[, "gleason"] == 7, 1, 0)
prostate[, 'gleason8'] = ifelse(prostate[, "gleason"] == 8, 1, 0)
#if gleason is 7, make the gleason7 1, if not 0
#same for gleason being 8
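As an aside (not part of the original solution), R can build these indicator columns automatically if gleason is treated as a factor; assuming gleason takes only the values 6, 7, and 8 in this data, the sketch below should reproduce the manual-dummy fit, with gleason = 6 as the reference group. The name prostate.lm_factor is introduced here for illustration.

# Equivalent approach (sketch): let R create the dummy coding from a factor.
# cancervolume * factor(gleason) expands to main effects plus interactions.
prostate.lm_factor <- lm(lpsa ~ cancervolume * factor(gleason), data = prostate)
summary(prostate.lm_factor)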
prostate.lm_g7_g8 = lm(lpsa ~ cancervolume + gleason7 + gleason8 +
                         cancervolume:gleason7 + cancervolume:gleason8,
                       data = prostate)
summary(prostate.lm_g7_g8)
## Call:
## lm(formula = lpsa ~ cancervolume + gleason7 + gleason8 + cancervolume:gleason7 +
##     cancervolume:gleason8, data = prostate)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -73.556  -6.092  -2.347   3.354 169.779 
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.3051 6.7189 0.641 0.523
## cancervolume 1.4663 0.9338 1.570 0.120
## gleason7 1.1187 10.4426 0.107 0.915
## gleason8 6.5061 13.8974 0.468 0.641
## cancervolume:gleason7 0.4696 1.5775 0.298 0.767
## cancervolume:gleason8 1.8601 1.1390 1.633 0.106
##
## Residual standard error: 31.36 on 91 degrees of freedom
## Multiple R-squared: 0.4395, Adjusted R-squared: 0.4087
## F-statistic: 14.27 on 5 and 91 DF, p-value: 2.656e-10
Part b
The regression lines are:
gleason is 6: y_i = 4.305 + 1.466 * x_i
gleason is 7: y_i = (4.305 + 1.12) + (1.466 + 0.47) * x_i
gleason is 8: y_i = (4.305 + 6.506) + (1.466 + 1.86) * x_i
Part c
Do a backward elimination. We remove the least significant predictor, considering interaction terms first, then categorical terms, then continuous terms. After removing the least significant predictor, we refit and test the remaining predictors, and repeat until only significant predictors remain.
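For comparison (not part of the original solution), base R can automate a similar backward search; the manual elimination carried out below follows the same idea step by step.

# Automated alternatives (sketch): F-tests for dropping each term, and a
# backward search by AIC. These are checks, not the method used below.
drop1(prostate.lm_g7_g8, test = "F")
step(prostate.lm_g7_g8, direction = "backward", trace = 0)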
prostate.lm_elim = prostate.lm_g7_g8
anova(prostate.lm_elim)
## Analysis of Variance Table
##
## Response: lpsa
##                       Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume           1  62202   62202 63.2448 4.799e-12 ***
## gleason7               1    401     401  0.4075   0.52484    
## gleason8               1   4631    4631  4.7083   0.03262 *  
## cancervolume:gleason7  1    315     315  0.3203   0.57285    
## cancervolume:gleason8  1   2623    2623  2.6668   0.10592    
## Residuals             91  89500     984                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Start by removing the least significant interaction term, which is cancervolume:gleason7.
prostate.lm_elim = lm(lpsa ~ cancervolume + gleason7 + gleason8 +
                        cancervolume:gleason8, data = prostate)
anova(prostate.lm_elim)
## Analysis of Variance Table
##
## Response: lpsa
##                       Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume           1  62202   62202 63.8776 3.736e-12 ***
## gleason7               1    401     401  0.4116   0.52276    
## gleason8               1   4631    4631  4.7554   0.03176 *  
## cancervolume:gleason8  1   2851    2851  2.9275   0.09045 .  
## Residuals             92  89587     974                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We still have an insignificant interaction term, so we keep removing: next we drop the cancervolume:gleason8 interaction.
prostate.lm_elim = lm(lpsa ~ cancervolume + gleason7 + gleason8, data = prostate)
anova(prostate.lm_elim)
## Analysis of Variance Table
##
## Response: lpsa
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume  1  62202   62202 62.5806 5.203e-12 ***
## gleason7      1    401     401  0.4032   0.52698    
## gleason8      1   4631    4631  4.6588   0.03347 *  
## Residuals    93  92438     994                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now move on to the categorical variables. gleason7 is not significant because its p-value is 0.53 > 0.05, so we remove it.
prostate.lm_elim = lm(lpsa ~ cancervolume + gleason8, data = prostate)
anova(prostate.lm_elim)
## Analysis of Variance Table
##
## Response: lpsa
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume  1  62202   62202 63.1722 4.102e-12 ***
## gleason8      1   4912    4912  4.9891   0.02788 *  
## Residuals    94  92557     985                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We have removed the non-significant interaction terms and categorical variables, in that order (the continuous term cancervolume remains). This is our final model.
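The final fitted equation and its fit can be read off the final model object; a short sketch using the objects defined above:

# Coefficients of the final model: lpsa ~ cancervolume + gleason8
coef(prostate.lm_elim)

# R-squared of the final model
summary(prostate.lm_elim)$r.squared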