Stats 101A HW 7
Ian Zhang UID: 205702810
2023-05-17
Question 1
b
wwh <- read.table("waistweightheight.txt", header = TRUE)
m1 <- lm(Weight ~ Waist + Height, data = wwh)
anova(m1)
## Analysis of Variance Table
##
## Response: Weight
##            Df Sum Sq Mean Sq F value    Pr(>F)
## Waist       1 358074  358074 3590.77 < 2.2e-16 ***
## Height      1  29843   29843  299.26 < 2.2e-16 ***
## Residuals 504  50259     100
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#SYY = SSReg + RSS = 358074 + 29843 + 50259 = 438176
#SSReg = 387917
#RSS = 50259
summary(m1)
##
## Call:
## lm(formula = Weight ~ Waist + Height, data = wwh)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -32.760  -6.405  -0.420   5.656  45.474
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -165.5332     8.2517  -20.06   <2e-16 ***
## Waist          4.9605     0.1229   40.37   <2e-16 ***
## Height         2.4884     0.1438   17.30   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.986 on 504 degrees of freedom
## Multiple R-squared:  0.8853, Adjusted R-squared:  0.8848
## F-statistic:  1945 on 2 and 504 DF,  p-value: < 2.2e-16
#R-squared = 0.8853
#Adjusted R-squared = 0.8848
#The slope of height means that among people with the same waist size,
#those who are 1 inch taller are an average of 2.4884 pounds heavier
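As a check on the values reported above, R-squared and adjusted R-squared can be recomputed from the sums of squares in the ANOVA table. This is a minimal sketch that uses the rounded values printed by anova(m1); the variable names are illustrative and not part of the original assignment.

```r
# Sums of squares copied from the printed anova(m1) table (rounded values)
ss_waist  <- 358074
ss_height <- 29843
rss       <- 50259
df_resid  <- 504      # residual degrees of freedom
p         <- 2        # number of predictors (Waist, Height)

ssreg <- ss_waist + ss_height   # regression sum of squares
syy   <- ssreg + rss            # total sum of squares
n     <- df_resid + p + 1       # number of observations

r2     <- ssreg / syy
adj_r2 <- 1 - (rss / df_resid) / (syy / (n - 1))

print(round(c(R2 = r2, adjR2 = adj_r2), 4))
```

Because the inputs are rounded, the results should agree with summary(m1) only to about four decimal places.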
b
set.seed(23)
new.df <- transform(wwh, worthless = rnorm(dim(wwh)[1], 0, 5))
m2 <- lm(Weight ~ Waist + Height + worthless, data = new.df)
anova(m2)
## Analysis of Variance Table
##
## Response: Weight
##            Df Sum Sq Mean Sq   F value Pr(>F)
## Waist       1 358074  358074 3584.4800 <2e-16 ***
## Height      1  29843   29843  298.7400 <2e-16 ***
## worthless   1     12      12    0.1176 0.7318
## Residuals 503  50247     100
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#SSReg = 358074 + 29843 + 12 = 387929
#RSS = 50247
#SYY = 387929 + 50247 = 438176
#SSReg increased by 12 and RSS decreased by 12, so SYY stayed the same.
#SSReg can never decrease (and RSS can never increase) when a predictor
#is added, so this shift alone does not show that the model improved.
#Here the change is tiny, and the F-test for worthless is not
#significant (F = 0.1176, p = 0.7318).
summary(m2)
##
## Call:
## lm(formula = Weight ~ Waist + Height + worthless, data = new.df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -32.981  -6.384  -0.350   5.800  45.435
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -165.54777    8.25903 -20.044   <2e-16 ***
## Waist          4.95999    0.12300  40.325   <2e-16 ***
## Height         2.48874    0.14397  17.286   <2e-16 ***
## worthless      0.02992    0.08724   0.343    0.732
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.995 on 503 degrees of freedom
## Multiple R-squared:  0.8853, Adjusted R-squared:  0.8846
## F-statistic:  1294 on 3 and 503 DF,  p-value: < 2.2e-16
#R-squared = 0.8853 (unchanged to four decimal places), Adjusted R-squared = 0.8846
#Adjusted R-squared decreased from 0.8848 to 0.8846
set.seed(23)
m3 <- lm(Weight ~ worthless + Waist + Height, data = new.df)
anova(m3)
## Analysis of Variance Table
##
## Response: Weight
##            Df Sum Sq Mean Sq   F value Pr(>F)
## worthless   1     58      58    0.5828 0.4456
## Waist       1 358020  358020 3583.9463 <2e-16 ***
## Height      1  29850   29850  298.8086 <2e-16 ***
## Residuals 503  50247     100
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
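#SSReg = 58 + 358020 + 29850 = 387928
#RSS = 50247
#SYY = 387928 + 50247 = 438175
#Changing the order of the predictors changes the sequential sum of
#squares for each term, but not the fitted model. SSReg, RSS, and SYY are
#the same as for m2; the apparent difference of 1 comes from rounding
#in the printed table.
#R-squared = 0.8853, Adjusted R-squared = 0.8846
#Adjusted R-squared decreased from 0.8848 to 0.8846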
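The effect of predictor order on a sequential ANOVA table can be illustrated with a small simulation. This is only a sketch on made-up data (the data frame sim and its columns x1, x2, y are hypothetical, not the waist/weight/height data): the per-term sums of squares depend on the order, while their total and the RSS do not.

```r
# Simulated data with two correlated predictors (hypothetical example)
set.seed(1)
n   <- 200
sim <- data.frame(x1 = rnorm(n))
sim$x2 <- 0.6 * sim$x1 + rnorm(n)
sim$y  <- 3 + 2 * sim$x1 + 1.5 * sim$x2 + rnorm(n, sd = 2)

# Same model, predictors entered in a different order
tab_a <- anova(lm(y ~ x1 + x2, data = sim))
tab_b <- anova(lm(y ~ x2 + x1, data = sim))

# Sequential sum of squares for x1 depends on its position
ss_x1_first  <- tab_a["x1", "Sum Sq"]
ss_x1_second <- tab_b["x1", "Sum Sq"]

# Total regression sum of squares and RSS do not depend on the order
ssreg_a <- sum(tab_a[c("x1", "x2"), "Sum Sq"])
ssreg_b <- sum(tab_b[c("x1", "x2"), "Sum Sq"])
rss_a   <- tab_a["Residuals", "Sum Sq"]
rss_b   <- tab_b["Residuals", "Sum Sq"]

print(c(ss_x1_first = ss_x1_first, ss_x1_second = ss_x1_second))
print(c(ssreg_a = ssreg_a, ssreg_b = ssreg_b, rss_a = rss_a, rss_b = rss_b))
```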
d
Adjusted R-squared is a more reliable guide than R-squared for deciding whether a new variable should be added.
R-squared can never decrease when a predictor is added, whereas adjusted R-squared penalizes each additional
predictor and increases only if the new variable reduces the residual mean square (equivalently, if its
t-statistic exceeds 1 in absolute value). This helps distinguish a variable that is just noise from one that
actually improves the model. Here the adjusted R-squared decreased from 0.8848 to 0.8846. This is a very small
difference, but any decrease indicates that the worthless variable does not improve the model, which is
consistent with its non-significant t-test (p = 0.732).
e
SSReg alone doesn't tell us whether we should add a new variable. SSReg is the portion of the total variation
(SYY) explained by the regression, and it can never decrease when new variables are added, regardless of
whether those variables actually add to the model or not.
Partial tests tell us whether the added variables provide significantly more explanatory power, which would
support adding them. A partial F-test compares the reduction in RSS from the added variables to the residual
mean square of the larger model, so it accounts for both the unexplained variability and the number of
predictors in the model. Essentially, it is a test that allows us to determine whether the new variable added
is significant.
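For a single variable added last, the partial F statistic equals the square of that variable's t statistic. The sketch below checks this for worthless, using the rounded values printed by summary(m2) and anova(m2); the variable names are illustrative only.

```r
# Values copied from the printed summary(m2) and anova(m2) output (rounded)
t_worthless <- 0.343    # t value for worthless in summary(m2)
f_worthless <- 0.1176   # F value for worthless in anova(m2), entered last
df_resid    <- 503      # residual degrees of freedom of m2

# For one added predictor, partial F = t^2
t_squared <- t_worthless^2

# p-value of the partial F-test on 1 and 503 degrees of freedom
p_value <- pf(f_worthless, df1 = 1, df2 = df_resid, lower.tail = FALSE)

print(c(t_squared = t_squared, F = f_worthless, p_value = p_value))
```

The p-value should land close to the 0.7318 reported in the ANOVA table, up to rounding of the inputs.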
Question 2
cars <- read.csv("cars04.csv", header = TRUE)
m4 <- lm(SuggestedRetailPrice ~ DealerCost + EngineSize + Cylinders + Horsepower + CityMPG + HighwayMPG +
m4$coefficients
## (Intercept)  DealerCost  EngineSize   Cylinders  Horsepower     CityMPG
## 349.9762772   1.0541853 -32.2471983 228.3295245   2.3621206 -16.7423882
##  HighwayMPG      Weight   WheelBase      Length       Width
##  46.7575373   0.6991997  27.0534500  -7.3201885 -84.7084980
Equation:
SuggestedRetailPrice = 349.9762772 + 1.0541853(DealerCost) - 32.2471983(EngineSize) +
228.3295245(Cylinders) + 2.3621206(Horsepower) - 16.7423882(CityMPG) + 46.7575373(HighwayMPG) +
0.6991997(Weight) + 27.0534500(WheelBase) - 7.3201885(Length) - 84.7084980(Width)
#We exclude Vehicle Name because the names aren't numerical variables
#and would distort the analysis. The names are also not relevant to
#this regression, as they act as labels rather than measured
#characteristics that could affect the suggested retail price.
summary(m4)
##
## Call:
## lm(formula = SuggestedRetailPrice ~ DealerCost + EngineSize +
##     Cylinders + Horsepower + CityMPG + HighwayMPG + Weight +
##     WheelBase + Length + Width, data = cars)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1403.85  -276.86   -55.03   257.55  2584.11
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  349.97628 1461.40052   0.239 0.810953
## DealerCost     1.05418    0.00564 186.923  < 2e-16 ***
## EngineSize   -32.24720  123.05642  -0.262 0.793523
## Cylinders    228.32952   71.99492   3.171 0.001730 **
## Horsepower     2.36212    1.42851   1.654 0.099624 .
## CityMPG      -16.74239   21.46286  -0.780 0.436181
## HighwayMPG    46.75754   24.17910   1.934 0.054403 .
## Weight         0.69920    0.20751   3.370 0.000887 ***
## WheelBase     27.05345   16.36168   1.653 0.099644 .
## Length        -7.32019    7.12296  -1.028 0.305209
## Width        -84.70850   30.21238  -2.804 0.005496 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 532.3 on 223 degrees of freedom
## Multiple R-squared:  0.9989, Adjusted R-squared:  0.9989
## F-statistic: 2.073e+04 on 10 and 223 DF,  p-value: < 2.2e-16
#Estimated slope = 228.32952
#T-statistic = 3.171
#p-value = 0.001730
#Assuming that all necessary model conditions are valid, the p-value
#of 0.00173 is below 0.05, so we reject the null hypothesis that the
#slope of Cylinders is 0. This means there is enough evidence to say
#that Cylinders is a significant predictor of the suggested retail
#price after adjusting for the other predictors in the model.
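The t-statistic and p-value for Cylinders can be reproduced from the estimate and standard error in the coefficient table. This is a minimal sketch using the printed (rounded) values from summary(m4); the variable names are illustrative.

```r
# Values copied from the Cylinders row of the printed summary(m4) output
estimate  <- 228.32952
std_error <- 71.99492
df_resid  <- 223        # residual degrees of freedom of m4

# t statistic for H0: slope = 0
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution with 223 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

print(c(t_stat = t_stat, p_value = p_value))
```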
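am4 <- anova(m4)
#getting t statistic from anova
sqrt(am4$`F value`[3])
## [1] 3.099739
#t-value from anova is 3.099739. This differs from the summary t-value
#of 3.171 because anova() uses sequential sums of squares: the F value
#for Cylinders adjusts only for DealerCost and EngineSize, which come
#before it. Only for the last predictor (Width) does sqrt(F) match |t|:
#sqrt(7.8611) = 2.804.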
sum4 <- summary(m4)
#F-statistic from summary = 20730 on 10 and 223 DF
#This is a very large value with a p-value below 2.2e-16, so there
#is strong evidence against the null hypothesis that all slopes are 0.
#We can conclude that at least one of the predictor variables has a
#significant impact on the suggested retail price.
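The overall F-statistic can also be recovered from an ANOVA table: it is the mean regression sum of squares divided by the residual mean square. The sketch below uses the rounded sequential sums of squares printed by anova(m4) elsewhere in this document, so the result is approximate; the vector name seq_ss is illustrative.

```r
# Sequential sums of squares copied from the printed anova(m4) table (rounded)
seq_ss <- c(DealerCost = 5.8714e10, EngineSize = 7.7453e6,
            Cylinders  = 2.7222e6,  Horsepower = 7.0394e5,
            CityMPG    = 2.1856e5,  HighwayMPG = 2.1052e5,
            Weight     = 1.2563e6,  WheelBase  = 3.9621e4,
            Length     = 1.6483e6,  Width      = 2.2271e6)
rss      <- 6.3178e7   # residual sum of squares
df_resid <- 223        # residual degrees of freedom

ssreg  <- sum(seq_ss)          # regression sum of squares
df_reg <- length(seq_ss)       # 10 predictors

# Overall F = (SSReg / df_reg) / (RSS / df_resid)
f_stat <- (ssreg / df_reg) / (rss / df_resid)
print(f_stat)
```

Up to rounding in the inputs, this should agree with the 2.073e+04 reported by summary(m4).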
f
partialm4 <- lm(SuggestedRetailPrice ~ DealerCost + EngineSize + Cylinders + Horsepower + Weight + Wheel
anova(partialm4)
## Analysis of Variance Table
##
## Response: SuggestedRetailPrice
##             Df     Sum Sq    Mean Sq    F value    Pr(>F)
## DealerCost   1 5.8714e+10 5.8714e+10 2.0204e+05 < 2.2e-16 ***
## EngineSize   1 7.7453e+06 7.7453e+06 2.6651e+01 5.353e-07 ***
## Cylinders    1 2.7222e+06 2.7222e+06 9.3670e+00  0.002478 **
## Horsepower   1 7.0394e+05 7.0394e+05 2.4223e+00  0.121028
## Weight       1 5.3446e+05 5.3446e+05 1.8391e+00  0.176418
## WheelBase    1 7.3600e+02 7.3600e+02 2.5000e-03  0.959900
## Length       1 1.4322e+06 1.4322e+06 4.9281e+00  0.027421 *
## Width        1 1.4236e+06 1.4236e+06 4.8985e+00  0.027885 *
## Residuals  225 6.5388e+07 2.9061e+05
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m4)
## Analysis of Variance Table
##
## Response: SuggestedRetailPrice
##             Df     Sum Sq    Mean Sq    F value    Pr(>F)
## DealerCost   1 5.8714e+10 5.8714e+10 2.0724e+05 < 2.2e-16 ***
## EngineSize   1 7.7453e+06 7.7453e+06 2.7338e+01 3.925e-07 ***
## Cylinders    1 2.7222e+06 2.7222e+06 9.6084e+00  0.002186 **
## Horsepower   1 7.0394e+05 7.0394e+05 2.4847e+00  0.116377
## CityMPG      1 2.1856e+05 2.1856e+05 7.7150e-01  0.380714
## HighwayMPG   1 2.1052e+05 2.1052e+05 7.4310e-01  0.389601
## Weight       1 1.2563e+06 1.2563e+06 4.4344e+00  0.036341 *
## WheelBase    1 3.9621e+04 3.9621e+04 1.3990e-01  0.708785
## Length       1 1.6483e+06 1.6483e+06 5.8179e+00  0.016673 *
## Width        1 2.2271e+06 2.2271e+06 7.8611e+00  0.005496 **
## Residuals  223 6.3178e+07 2.8331e+05
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(partialm4, m4)
## Analysis of Variance Table
##
## Model 1: SuggestedRetailPrice ~ DealerCost + EngineSize + Cylinders +
##     Horsepower + Weight + WheelBase + Length + Width
## Model 2: SuggestedRetailPrice ~ DealerCost + EngineSize + Cylinders +
##     Horsepower + CityMPG + HighwayMPG + Weight + WheelBase +
##     Length + Width
##   Res.Df      RSS Df Sum of Sq      F  Pr(>F)
## 1    225 65387880
## 2    223 63178392  2   2209488 3.8994 0.02165 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The null hypothesis is that the coefficients of CityMPG and HighwayMPG are both 0. The p-value is 0.02165,
which is below 0.05, so we reject the null hypothesis. Thus, we can conclude that the full model fits
significantly better than the partial model, and that the MPG variables jointly have a significant
association with the suggested retail price.
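The partial F statistic reported by anova(partialm4, m4) can be recomputed by hand from the two residual sums of squares. This is a minimal sketch using the values printed in the comparison table; the variable names are illustrative.

```r
# Values copied from the printed anova(partialm4, m4) table
rss_reduced <- 65387880   # RSS of the model without CityMPG and HighwayMPG
df_reduced  <- 225
rss_full    <- 63178392   # RSS of the full model
df_full     <- 223

q <- df_reduced - df_full  # number of coefficients being tested (2)

# Partial F = (drop in RSS per tested coefficient) / (full-model residual mean square)
f_stat  <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_value <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

print(c(F = f_stat, p_value = p_value))
```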