Lichen_Jiang_3

School

Johns Hopkins University

Course

510.650

Subject

Industrial Engineering

Date

Dec 6, 2023

Type

docx

Pages

6

Uploaded by DoctorArt6178

#Q1(a)
library(leaps)  # provides regsubsets()
set.seed(100)
x = rnorm(100)
x
set.seed(200)
e = rnorm(100)
e

#Q1(b)
y = 3 + 2*x + x^2 + 0.5*x^3 + e
y

#Q1(c)
x2 = x^2
x3 = x^3
x4 = x^4
x5 = x^5
x6 = x^6
x7 = x^7
x8 = x^8
x9 = x^9
x10 = x^10
data_1 = data.frame(x, x2, x3, x4, x5, x6, x7, x8, x9, x10, y)
best_q1 = regsubsets(y ~ ., data = data_1, nvmax = 10)

# Cp (read from the $cp column; minimizing $rsq always returns the one-variable model)
coef(best_q1, which.min(summary(best_q1)$cp))
plot(summary(best_q1)$cp)
The fit reported for this step, y = 3.957865 + 1.088975*x^3, is the one-variable model obtained by minimizing $rsq. R^2 cannot decrease as variables are added, so that call does not perform a Cp selection; the Cp choice has to be read from the $cp call above.

# BIC
coef(best_q1, which.min(summary(best_q1)$bic))
plot(summary(best_q1)$bic)
According to BIC, the best model is: y = 2.9949297 + 2.2118204*x + 1.0370719*x^2 + 0.4715355*x^3

# Adjusted R^2
coef(best_q1, which.max(summary(best_q1)$adjr2))
plot(summary(best_q1)$adjr2)

#Q1(d)forward
best_q1_f = regsubsets(y ~ ., data = data_1, nvmax = 10, method = "forward")

# Cp
coef(best_q1_f, which.min(summary(best_q1_f)$cp))
plot(summary(best_q1_f)$cp)
For the one-variable model compared in the original run, there is no difference from best subset selection.

# BIC
coef(best_q1_f, which.min(summary(best_q1_f)$bic))
plot(summary(best_q1_f)$bic)
There is no difference between the forward method and best subset selection.

# Adjusted R^2
coef(best_q1_f, which.max(summary(best_q1_f)$adjr2))
plot(summary(best_q1_f)$adjr2)
There are some differences between the forward method and best subset selection, as the best subset model has only 8 parameters.
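The Cp steps above depend on which column of the summary is minimized. As a rough illustration that is not part of the original answer, the base R sketch below refits the Q1 data with nested polynomial models and shows that R^2 never decreases as terms are added, so which.min() on R^2 always lands on the smallest model, while a penalized criterion such as BIC can have its minimum elsewhere. The use of lm() with poly() in place of regsubsets(), and the names degrees, fits, r2 and bic, are assumptions made only for this sketch.

```r
# Illustration only: nested polynomial fits on the Q1 data, base R only.
set.seed(100)
x <- rnorm(100)
set.seed(200)
e <- rnorm(100)
y <- 3 + 2*x + x^2 + 0.5*x^3 + e

degrees <- 1:10
# Orthogonal polynomials keep the high-degree fits numerically stable.
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d)))

r2  <- sapply(fits, function(f) summary(f)$r.squared)
bic <- sapply(fits, BIC)

# R^2 is non-decreasing across nested models, so its minimum sits at degree 1.
which.min(r2)
# BIC adds a penalty per extra term, so its minimum need not sit at either end.
which.min(bic)
```

Because these models are nested by degree rather than searched over all subsets, the selected sizes need not match the regsubsets() results; the point is only the behavior of the two criteria.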
#Q1(d)backward
best_q1_b = regsubsets(y ~ ., data = data_1, nvmax = 10, method = "backward")

# Cp (read from the $cp column; minimizing $rsq always returns the one-variable model)
coef(best_q1_b, which.min(summary(best_q1_b)$cp))
plot(summary(best_q1_b)$cp)

# BIC
coef(best_q1_b, which.min(summary(best_q1_b)$bic))
plot(summary(best_q1_b)$bic)
The selected parameters have changed, and the number of parameters has increased.

# Adjusted R^2
coef(best_q1_b, which.max(summary(best_q1_b)$adjr2))
plot(summary(best_q1_b)$adjr2)
There is no difference between the backward method and best subset selection.
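The comparisons between methods above are made by reading coefficient printouts one at a time. A minimal sketch of a more direct check, assuming the leaps package is installed, lists the predictors that each method keeps at its BIC minimum. The helper bic_vars() and the object selected are hypothetical names introduced for this illustration; regsubsets(), its method argument, and the $which and $bic components of its summary are the package's own.

```r
library(leaps)

# Rebuild the Q1 data so the sketch runs on its own.
set.seed(100)
x <- rnorm(100)
set.seed(200)
e <- rnorm(100)
y <- 3 + 2*x + x^2 + 0.5*x^3 + e

powers <- sapply(1:10, function(k) x^k)
colnames(powers) <- c("x", paste0("x", 2:10))
data_1 <- data.frame(powers, y)

# Hypothetical helper: predictors in the BIC-minimizing model for one method.
bic_vars <- function(method) {
  fit <- regsubsets(y ~ ., data = data_1, nvmax = 10, method = method)
  s <- summary(fit)
  picked <- s$which[which.min(s$bic), ]
  setdiff(names(picked)[picked], "(Intercept)")
}

methods <- c(exhaustive = "exhaustive", forward = "forward", backward = "backward")
selected <- lapply(methods, bic_vars)
selected
```

Swapping s$bic for s$cp, or which.min(s$bic) for which.max(s$adjr2), gives the same comparison under the other two criteria.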
#Q2(a)
library("MASS")
head(Boston)
reg.boston = regsubsets(medv ~ ., data = Boston, nvmax = 13)
summary(reg.boston)

#Q2(b)
Based on the output of summary(reg.boston), we choose rm, ptratio and lstat.
lm.boston = lm(medv ~ rm + ptratio + lstat, data = Boston)
summary(lm.boston)
The model is significant; about 68% of the variability in medv is explained by this linear model.

#Q2(b)
reg.boston.b = regsubsets(medv ~ ., data = Boston, nvmax = 13, method = "backward")
reg.boston.f = regsubsets(medv ~ ., data = Boston, nvmax = 13, method = "forward")
coef(reg.boston.b, 7)
coef(reg.boston.f, 7)
coef(reg.boston, 7)
summary(reg.boston)$rsq[7]
summary(reg.boston.f)$rsq[7]
summary(reg.boston.b)$rsq[7]
The seven-variable subsets chosen by the three methods are the same, while the R^2 of the backward method is lower than the others, meaning that it explains a smaller proportion of the variability. By construction, best subset selection attains the highest R^2 at each model size, so the forward and backward methods can only match it or fall below it.
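The size-7 comparison above can be extended to every model size. As a sketch only, assuming the MASS and leaps packages are installed, the code below collects R^2 for each method into one matrix so the three columns can be compared row by row. The names methods and rsq are introduced for this illustration.

```r
library(MASS)   # Boston data
library(leaps)  # regsubsets()

methods <- c("exhaustive", "forward", "backward")

# One column of R^2 values per method, one row per model size (1 to 13).
rsq <- sapply(methods, function(m) {
  fit <- regsubsets(medv ~ ., data = Boston, nvmax = 13, method = m)
  summary(fit)$rsq
})

round(rsq, 4)

# Gap between best subset and each stepwise method at every size;
# a zero means the stepwise method matched the best subset R^2 at that size.
round(rsq[, "exhaustive"] - rsq[, c("forward", "backward")], 4)
```

Row 7 of the first table corresponds to the three summary(...)$rsq[7] calls in the answer.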