HW7
Aamir Hafiz
2022-11-10
fish <- read.table("fish.txt", stringsAsFactors = TRUE)
colnames(fish) <- c("age", "length")
#head(fish)
fish.lm <- lm(length ~ ., data = fish)
summary(fish.lm)
## Call:
## lm(formula = length ~ ., data = fish)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.523  -7.586   0.258  10.102  20.414 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   62.649      5.755   10.89   <2e-16 ***
## age           22.312      1.537   14.51   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.51 on 76 degrees of freedom
## Multiple R-squared: 0.7349, Adjusted R-squared: 0.7314
## F-statistic: 210.7 on 1 and 76 DF, p-value: < 2.2e-16
plot(fish$age, fish$length)
abline(fish.lm)
An R-squared value near 0.75 indicates a reasonably good fit; the fitted line appears to predict length fairly well.
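As an additional check (not part of the original output), the standard residual diagnostic plots can hint at whether a straight line is adequate; curvature in the residuals-vs-fitted panel would motivate the polynomial fit in Part 2. A minimal sketch using the model fitted above:

# Base-R diagnostic plots for the simple linear fit (sketch, not original output).
# A curved pattern in the residuals-vs-fitted panel suggests a higher-order term.
par(mfrow = c(2, 2))
plot(fish.lm)
par(mfrow = c(1, 1))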
Part 2
#fit an appropriate polynomial to the data
# I(price^2)
fish.lm_two <- lm(length ~ . + I(age^2), data = fish)
summary(fish.lm_two)
## Call:
## lm(formula = length ~ . + I(age^2), data = fish)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.846  -8.321  -1.137   6.698  22.098 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   13.622     11.016   1.237     0.22    
## age           54.049      6.489   8.330 2.81e-12 ***
## I(age^2)      -4.719      0.944  -4.999 3.67e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.91 on 75 degrees of freedom
## Multiple R-squared: 0.8011, Adjusted R-squared: 0.7958
## F-statistic: 151.1 on 2 and 75 DF, p-value: < 2.2e-16
#fit an appropriate polynomial to the data
fish.lm_three <- lm(length ~ . + I(age^2) + I(age^3), data = fish)
summary(fish.lm_three)
## Call:
## lm(formula = length ~ . + I(age^2) + I(age^3), data = fish)
##
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.0773  -8.1971  -0.6971   6.7217  22.1300 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   9.8101    21.7690   0.451  0.65356   
## age          58.1936    21.3868   2.721  0.00811 **
## I(age^2)     -6.0358     6.5417  -0.923  0.35918   
## I(age^3)      0.1279     0.6284   0.203  0.83930   
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.98 on 74 degrees of freedom
## Multiple R-squared: 0.8012, Adjusted R-squared: 0.7932
## F-statistic: 99.44 on 3 and 74 DF, p-value: < 2.2e-16
We see that in the third-order model the cubic term has a p-value of 0.83930, which is greater than 5%, so we conclude that the optimal order for the polynomial, according to this selection procedure, is d = 2. The fitted model is y_hat_i = B_0_hat + B_1_hat * x_i + B_2_hat * x_i^2, which here comes out to length_hat = 13.622 + 54.049 * age - 4.719 * age^2. The quadratic model has an R-squared of about 0.80, better than the original R-squared of about 0.73.
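A quick way to recover this fitted equation directly from the model object, and to use it for prediction, is sketched below (the age value of 3 is just an illustrative input, not from the assignment):

# Coefficients of the selected quadratic model
coef(fish.lm_two)

# Example: predicted length for a hypothetical fish of age 3 (illustrative value)
predict(fish.lm_two, newdata = data.frame(age = 3))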
Problem 2
prostate <- read.table("prostate.txt", stringsAsFactors = TRUE)
names(prostate) = c("ID", "lpsa", "cancervolume", "weight", "age",
                    "hyperplasia", "svi", "cp", "gleason")
head(prostate)
## ID lpsa cancervolume weight age hyperplasia svi cp gleason
## 1 1 0.651 0.5599 15.959 50 0 0 0 6
## 2 2 0.852 0.3716 27.660 58 0 0 0 7
## 3 3 0.852 0.6005 14.732 74 0 0 0 7
## 4 4 0.852 0.3012 26.576 58 0 0 0 6
## 5 5 1.448 2.1170 30.877 62 0 0 0 6
## 6 6 2.160 0.3499 25.280 50 0 0 0 6
ANCOVA
prostate.lm = lm(lpsa ~ cancervolume + svi + cancervolume:svi, data = prostate)
summary(prostate.lm)
## Call:
## lm(formula = lpsa ~ cancervolume + svi + cancervolume:svi, data = prostate)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.314  -6.675  -1.856   5.945 167.930 
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        6.0917     4.7810   1.274   0.2058  
## cancervolume       1.3832     0.7076   1.955   0.0536 .
## svi                4.3081    13.3074   0.324   0.7469  
## cancervolume:svi   2.0700     0.9735   2.126   0.0361 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.52 on 93 degrees of freedom
## Multiple R-squared: 0.4574, Adjusted R-squared: 0.4399
## F-statistic: 26.13 on 3 and 93 DF, p-value: 2.399e-12
Regression line when svi is 0: y_i = 6.09 + 1.38 * x_i
Regression line when svi is 1: y_i = (6.09 + 4.31) + (1.38 + 2.07) * x_i
Ho: the model without the interaction is adequate
Ha: Ho is false
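These two lines can also be read off the coefficient vector directly; a short sketch using the fitted model above (the names line_svi0 and line_svi1 are introduced here for illustration):

# Intercept and slope for each svi group, built from the fitted coefficients
b <- coef(prostate.lm)
line_svi0 <- c(intercept = unname(b["(Intercept)"]),
               slope = unname(b["cancervolume"]))
line_svi1 <- c(intercept = unname(b["(Intercept)"] + b["svi"]),
               slope = unname(b["cancervolume"] + b["cancervolume:svi"]))
line_svi0
line_svi1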
anova(prostate.lm)
## Analysis of Variance Table
##
## Response: lpsa
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume      1  62202   62202 66.7649 1.487e-12 ***
## svi               1   6613    6613  7.0976  0.009099 ** 
## cancervolume:svi  1   4212    4212  4.5211  0.036128 *  
## Residuals        93  86645     932                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value for the interaction term is less than alpha = 0.05, so we reject the null and conclude that the full model is needed (the interaction term is statistically significant in our model).
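The same conclusion could be reached with an explicit partial F-test comparing the reduced (no-interaction) model against the full model; a sketch, where prostate.lm_reduced is a name introduced here and not part of the original solution:

# Reduced model without the interaction term (name introduced for this sketch)
prostate.lm_reduced <- lm(lpsa ~ cancervolume + svi, data = prostate)

# Partial F-test: reduced vs. full model; the interaction's F and p-value
# should match the last row of the sequential ANOVA table above
anova(prostate.lm_reduced, prostate.lm)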
C)
The response to cancervolume varies with the svi level: when svi is 0, the slope of lpsa on cancervolume is the cancervolume coefficient alone, but when svi is 1 the slope becomes the sum of the estimated coefficients for cancervolume and the interaction term.
Problem 3
prostate[, 'gleason7'] = ifelse(prostate[, "gleason"] == 7, 1, 0)
prostate[, 'gleason8'] = ifelse(prostate[, "gleason"] == 8, 1, 0)
#if gleason is 7, make the gleason7 1, if not 0
#same for gleason being 8
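As an aside (not part of the original solution), R can build these indicator columns automatically if gleason is treated as a factor; assuming gleason takes only the values 6, 7, and 8 in this data, the sketch below should reproduce the manual-dummy fit, with gleason = 6 as the reference group. The name prostate.lm_factor is introduced here for illustration.

# Equivalent approach (sketch): let R create the dummy coding from a factor.
# cancervolume * factor(gleason) expands to main effects plus interactions.
prostate.lm_factor <- lm(lpsa ~ cancervolume * factor(gleason), data = prostate)
summary(prostate.lm_factor)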
prostate.lm_g7_g8 = lm(lpsa ~ cancervolume + gleason7 + gleason8 +
                         cancervolume:gleason7 + cancervolume:gleason8,
                       data = prostate)
summary(prostate.lm_g7_g8)
## Call:
## lm(formula = lpsa ~ cancervolume + gleason7 + gleason8 + cancervolume:gleason7 +
##     cancervolume:gleason8, data = prostate)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -73.556  -6.092  -2.347   3.354 169.779 
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.3051 6.7189 0.641 0.523
## cancervolume 1.4663 0.9338 1.570 0.120
## gleason7 1.1187 10.4426 0.107 0.915
## gleason8 6.5061 13.8974 0.468 0.641
## cancervolume:gleason7 0.4696 1.5775 0.298 0.767
## cancervolume:gleason8 1.8601 1.1390 1.633 0.106
##
## Residual standard error: 31.36 on 91 degrees of freedom
## Multiple R-squared: 0.4395, Adjusted R-squared: 0.4087
## F-statistic: 14.27 on 5 and 91 DF, p-value: 2.656e-10
Part b
The regression lines are:
gleason is 6: y_i = 4.305 + 1.466 * x_i
gleason is 7: y_i = (4.305 + 1.12) + (1.466 + 0.47) * x_i
gleason is 8: y_i = (4.305 + 6.506) + (1.466 + 1.86) * x_i
Part c
Do a backward elimination. We remove the least significant predictor, considering interaction terms first, then categorical terms, then continuous terms. After removing the least significant predictor, we refit and test the remaining predictors, and repeat until only significant predictors remain.
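For comparison (not part of the original solution), base R can automate a similar backward search; the manual elimination carried out below follows the same idea step by step.

# Automated alternatives (sketch): F-tests for dropping each term, and a
# backward search by AIC. These are checks, not the method used below.
drop1(prostate.lm_g7_g8, test = "F")
step(prostate.lm_g7_g8, direction = "backward", trace = 0)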
prostate.lm_elim = prostate.lm_g7_g8
anova(prostate.lm_elim)
## Analysis of Variance Table
##
## Response: lpsa
##                       Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume           1  62202   62202 63.2448 4.799e-12 ***
## gleason7               1    401     401  0.4075   0.52484    
## gleason8               1   4631    4631  4.7083   0.03262 *  
## cancervolume:gleason7  1    315     315  0.3203   0.57285    
## cancervolume:gleason8  1   2623    2623  2.6668   0.10592    
## Residuals             91  89500     984                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Start by removing the least significant interaction term, which is cancervolume:gleason7.
prostate.lm_elim = lm(lpsa ~ cancervolume + gleason7 + gleason8 +
                        cancervolume:gleason8, data = prostate)
anova(prostate.lm_elim)
## Analysis of Variance Table
##
## Response: lpsa
##                       Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume           1  62202   62202 63.8776 3.736e-12 ***
## gleason7               1    401     401  0.4116   0.52276    
## gleason8               1   4631    4631  4.7554   0.03176 *  
## cancervolume:gleason8  1   2851    2851  2.9275   0.09045 .  
## Residuals             92  89587     974                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We still have an insignificant interaction term, so we keep removing: next we drop the cancervolume:gleason8 interaction.
prostate.lm_elim = lm(lpsa ~ cancervolume + gleason7 + gleason8, data = prostate)
anova(prostate.lm_elim)
## Analysis of Variance Table
##
## Response: lpsa
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume  1  62202   62202 62.5806 5.203e-12 ***
## gleason7      1    401     401  0.4032   0.52698    
## gleason8      1   4631    4631  4.6588   0.03347 *  
## Residuals    93  92438     994                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now move on to the categorical variables. gleason7 is not significant because its p-value is 0.53 > 0.05, so we remove it.
prostate.lm_elim = lm(lpsa ~ cancervolume + gleason8, data = prostate)
anova(prostate.lm_elim)
## Analysis of Variance Table
##
## Response: lpsa
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## cancervolume  1  62202   62202 63.1722 4.102e-12 ***
## gleason8      1   4912    4912  4.9891   0.02788 *  
## Residuals    94  92557     985                      
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We have removed the non-significant interaction terms and categorical variables, in that order (the continuous term cancervolume remains). This is our final model.
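The final fitted equation and its fit can be read off the final model object; a short sketch using the objects defined above:

# Coefficients of the final model: lpsa ~ cancervolume + gleason8
coef(prostate.lm_elim)

# R-squared of the final model
summary(prostate.lm_elim)$r.squared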