Lichen_Jiang_3
School
Johns Hopkins University
Course
510.650
Subject
Industrial Engineering
Date
Dec 6, 2023
#Q1(a)
set.seed(100)
x = rnorm(100)
x
set.seed(200)
e = rnorm(100)
e
#Q1(b)
y = 3+2*x+x^2+0.5*x^3+e
y
#Q1(c)
x2 = x^2
x3 = x^3
x4 = x^4
x5 = x^5
x6 = x^6
x7 = x^7
x8 = x^8
x9 = x^9
x10 = x^10
data_1 = data.frame(x, x2, x3, x4, x5, x6, x7, x8, x9, x10, y)
library(leaps)  # provides regsubsets()
best_q1 = regsubsets(y ~ ., data = data_1, nvmax = 10)
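# Cp
coef(best_q1, which.min(summary(best_q1)$cp))
Cp has to be read from summary(best_q1)$cp. Indexing by the minimum of $rsq always returns the one-variable model, because R^2 never decreases as variables are added; that one-variable model is y = 3.957865+1.088975*x^3.
plot(summary(best_q1)$cp)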
# BIC
coef(best_q1, which.min(summary(best_q1)$bic))
According to BIC, the best model is:
y = 2.9949297+2.2118204*x+1.0370719*x^2+0.4715355*x^3
plot(summary(best_q1)$bic)
# Adjusted R^2
coef(best_q1, which.max(summary(best_q1)$adjr2))
plot(summary(best_q1)$adjr2)
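The three criteria can be compared side by side. The following is a minimal sketch, assuming the leaps package is installed; the names `dat`, `fit`, `s`, and `chosen` are illustrative and not part of the original answer. It rebuilds the simulated data, then reports the model size each criterion selects, with the minimum of R^2 included only to show that it always lands on the smallest model.

```r
library(leaps)

# Same simulated data as in Q1(a) and Q1(b)
set.seed(100)
x <- rnorm(100)
set.seed(200)
e <- rnorm(100)
y <- 3 + 2 * x + x^2 + 0.5 * x^3 + e

# Columns x, x2, ..., x10 holding the raw powers of x
powers <- as.data.frame(sapply(1:10, function(k) x^k))
names(powers) <- c("x", paste0("x", 2:10))
dat <- cbind(powers, y = y)

fit <- regsubsets(y ~ ., data = dat, nvmax = 10)
s <- summary(fit)

# Model size picked by each criterion
chosen <- c(
  cp      = which.min(s$cp),     # smaller Cp is better
  bic     = which.min(s$bic),    # smaller BIC is better
  adjr2   = which.max(s$adjr2),  # larger adjusted R^2 is better
  rsq_min = which.min(s$rsq)     # shown only for contrast, not a valid criterion
)
print(chosen)

# Coefficients of the BIC-selected model
print(coef(fit, chosen[["bic"]]))
```

Cp and BIC are minimized and adjusted R^2 is maximized; plain R^2 is not a selection criterion because it cannot decrease when a predictor is added.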
#Q1(d) forward
best_q1_f = regsubsets(y ~ ., data = data_1, nvmax = 10, method = "forward")
# Cp
coef(best_q1_f, which.min(summary(best_q1_f)$cp))
The one-variable model is the same as in the original (best subset) method, since the first forward step picks the best single predictor.
plot(summary(best_q1_f)$cp)
# BIC
coef(best_q1_f, which.min(summary(best_q1_f)$bic))
There’s no difference between the forward method and the original method.
plot(summary(best_q1_f)$bic)
# Adjusted R^2
coef(best_q1_f, which.max(summary(best_q1_f)$adjr2))
There are some differences between the forward method and the original method, as the best model has just 8 parameters.
plot(summary(best_q1_f)$adjr2)
#Q1(d) backward
best_q1_b = regsubsets(y ~ ., data = data_1, nvmax = 10, method = "backward")
# Cp
coef(best_q1_b, which.min(summary(best_q1_b)$cp))
plot(summary(best_q1_b)$cp)
# BIC
coef(best_q1_b, which.min(summary(best_q1_b)$bic))
The parameters have changed, and the number of parameters has increased.
plot(summary(best_q1_b)$bic)
# Adjusted R^2
coef(best_q1_b, which.max(summary(best_q1_b)$adjr2))
There’s no difference between the backward method and the original method.
plot(summary(best_q1_b)$adjr2)
#Q2(a)
library("MASS")
head(Boston)
reg.boston = regsubsets(medv~., data = Boston,nvmax = 13)
summary(reg.boston)
#Q2(b)
Based on the output of summary(reg.boston), the best three-variable model uses rm, ptratio and lstat, so we choose those predictors.
lm.boston = lm(medv~rm+ptratio+lstat,data = Boston)
summary(lm.boston)
The model is significant, and about 68% of the variability in medv is explained by this linear model.
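The "proportion of variability explained" is the R^2 reported by summary(). A minimal sketch, assuming the MASS package is available, recomputes it by hand as 1 - RSS/TSS to show where the figure comes from; the names `fit`, `rss`, `tss`, and `r2_manual` are illustrative.

```r
library(MASS)

# Three-predictor model from Q2(b)
fit <- lm(medv ~ rm + ptratio + lstat, data = Boston)

# R^2 as reported by summary()
r2 <- summary(fit)$r.squared

# The same quantity by hand: 1 - RSS / TSS
rss <- sum(residuals(fit)^2)                       # residual sum of squares
tss <- sum((Boston$medv - mean(Boston$medv))^2)    # total sum of squares
r2_manual <- 1 - rss / tss

print(c(summary = r2, manual = r2_manual))
```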
#Q2(c)
reg.boston.b = regsubsets(medv~., data = Boston, nvmax = 13, method = "backward")
reg.boston.f = regsubsets(medv~., data = Boston, nvmax = 13, method = "forward")
coef(reg.boston.b,7)
coef(reg.boston.f,7)
coef(reg.boston,7)
summary(reg.boston)$rsq[7]
summary(reg.boston.f)$rsq[7]
summary(reg.boston.b)$rsq[7]
The best seven-variable subsets from the three methods are the same, while the R^2 of the backward fit is lower than the others, meaning it explains a smaller proportion of the variability.
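Comparing the three outputs by eye is error-prone, so the comparison can also be done programmatically. This is a sketch under the assumption that leaps and MASS are installed; `fits`, `vars7`, `rsq7`, and `same_as_best` are illustrative names. It lists the seven selected predictors per method, checks whether each set matches the best-subset choice, and prints the corresponding R^2 values.

```r
library(MASS)
library(leaps)

# Fit all three search strategies on the same formula
fits <- list(
  best     = regsubsets(medv ~ ., data = Boston, nvmax = 13),
  forward  = regsubsets(medv ~ ., data = Boston, nvmax = 13, method = "forward"),
  backward = regsubsets(medv ~ ., data = Boston, nvmax = 13, method = "backward")
)

# Predictors in each seven-variable model (intercept dropped)
vars7 <- lapply(fits, function(f) sort(setdiff(names(coef(f, 7)), "(Intercept)")))

# R^2 of each seven-variable model
rsq7 <- sapply(fits, function(f) summary(f)$rsq[7])

# TRUE where a method selects the same predictors as best subset
same_as_best <- sapply(vars7, function(v) setequal(v, vars7$best))

print(vars7)
print(same_as_best)
print(rsq7)
```

Best subset searches every model of a given size, so its R^2 at that size can never be lower than the forward or backward value; identical predictor sets imply identical R^2.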