Assignment 5: Linear Model Selection
SDS293 - Machine Learning
Due: 24 Oct 2017 by 11:59pm

Conceptual Exercises

6.8.1 (p. 259 ISLR) We perform best subset, forward stepwise, and backward stepwise selection on a single data set. For each approach, we obtain p + 1 models, containing 0, 1, 2, ..., p predictors. Explain your answers:

(a) Which of the three models with k predictors has the smallest training RSS?

Solution: Best subset selection has the smallest training RSS, since it examines every possible model with k predictors. Forward and backward selection are greedy: the model they reach at step k depends on which predictors they picked in earlier iterations, and a poor choice early on cannot be undone.

(b) Which of the three models with k predictors has the smallest test RSS?

Solution: Best subset selection may have the smallest test RSS because it considers more models than the other methods. However, the stepwise methods might by luck pick a model that fits the test data better, as their restricted search makes them less subject to overfitting. The outcome will depend more heavily on the choice of test set / validation method than on the selection method.

(c) True or False: the predictors in Model 1 are a subset of the predictors in Model 2:

      Model 1                                Model 2                                  T/F
i.    Forward selection, k variables         Forward selection, k + 1 variables       True
ii.   Backward selection, k variables        Backward selection, k + 1 variables      True
iii.  Backward selection, k variables        Forward selection, k + 1 variables       False
iv.   Forward selection, k variables         Backward selection, k + 1 variables      False
v.    Best subset selection, k variables     Best subset selection, k + 1 variables   False

Explain your reasoning: forward selection only ever adds predictors and backward selection only ever removes them, so consecutive models along the same path are nested (i, ii). Models produced by different methods, or chosen independently by best subset search at different sizes, need not share predictors (iii, iv, v).
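The nesting property behind rows i and ii can be checked directly: forward stepwise selection keeps everything chosen so far and adds, at each step, the single predictor that most reduces training RSS (backward selection is symmetric, removing one predictor per step). A minimal Python sketch on hypothetical simulated data (the assignment itself uses R; all names here are illustrative):

```python
# Sketch of forward stepwise selection, showing that the k-variable model
# is always a subset of the (k+1)-variable model (row i in part (c)).
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=n)  # true predictors: 0 and 2

def rss(cols):
    """Training RSS of the least-squares fit on the given columns (plus intercept)."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

selected = []
path = []  # path[k-1] holds the k-variable model's predictor indices
for _ in range(p):
    remaining = [j for j in range(p) if j not in selected]
    best = min(remaining, key=lambda j: rss(selected + [j]))  # greedy step
    selected.append(best)
    path.append(list(selected))

# Each model on the path contains the previous one, by construction.
for small, big in zip(path, path[1:]):
    assert set(small) <= set(big)
print(path[0], path[1])
```

The strongest predictors enter first, and once a predictor is in, it never leaves; this is exactly why a bad early pick cannot be undone.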
Applied Exercises

6.8.8 parts a-d (p. 262-263 ISLR) In this exercise, we will generate simulated data, and will then use this data to perform best subset selection.

(a) Generate a predictor X of length n = 100, as well as a noise vector ε of length n = 100.

Solution:

> set.seed(1)
> X = rnorm(100)
> eps = rnorm(100)

(b) Generate a response vector Y of length n = 100 according to the model

    Y = β0 + β1 X + β2 X^2 + β3 X^3 + ε,

where β0, β1, β2, and β3 are constants of your choice.

Solution: Selecting β0 = 3, β1 = 2, β2 = -3 and β3 = 0.3:

> beta0 = 3
> beta1 = 2
> beta2 = -3
> beta3 = 0.3
> Y = beta0 + beta1*X + beta2*X^2 + beta3*X^3 + eps

(c) Perform best subset selection in order to choose the best model containing the predictors X, X^2, ..., X^10. What is the best model obtained according to Cp, BIC, and adjusted R^2? Show some plots to provide evidence for your answer, and report the coefficients of the best model obtained.

Solution:

> library(leaps)
> data.full = data.frame(y=Y, x=X)
> mod.full = regsubsets(y ~ poly(x, 10, raw=T), data=data.full, nvmax=10)
> mod.summary = summary(mod.full)
> # Find the model size with the best Cp, BIC and adjusted R^2
> min.cp = which.min(mod.summary$cp)
> min.bic = which.min(mod.summary$bic)
> max.adjr2 = which.max(mod.summary$adjr2)
> # Plot Cp, BIC and adjusted R^2
> plot(mod.summary$cp, xlab="Subset Size", ylab="Cp", pch=20, type="l")
> points(min.cp, mod.summary$cp[min.cp], pch=4, col="red", lwd=7)
> plot(mod.summary$bic, xlab="Subset Size", ylab="BIC", pch=20, type="l")
> points(min.bic, mod.summary$bic[min.bic], pch=4, col="red", lwd=7)
> plot(mod.summary$adjr2, xlab="Subset Size", ylab="adjr2", pch=20, type="l")
> points(max.adjr2, mod.summary$adjr2[max.adjr2], pch=4, col="red", lwd=7)

We find that all three criteria (Cp, BIC, and adjusted R^2) select 3-variable models. The coefficients of the best 3-variable model are:

> coefficients(mod.full, id=3)
        (Intercept) poly(x, 10, raw=T)1 poly(x, 10, raw=T)2 poly(x, 10, raw=T)7
         3.07627412          2.35623596         -3.16514887          0.01046843

(d) Repeat (c), using forward stepwise selection and also using backward stepwise selection. How does your answer compare to the results in (c)?

Solution:

> mod.fwd = regsubsets(y ~ poly(x, 10, raw=T), data=data.full, nvmax=10, method="forward")
> mod.bwd = regsubsets(y ~ poly(x, 10, raw=T), data=data.full, nvmax=10, method="backward")
> fwd.summary = summary(mod.fwd)
> bwd.summary = summary(mod.bwd)
> # Find best forward-selected model size
> min.cp.f = which.min(fwd.summary$cp)
> min.bic.f = which.min(fwd.summary$bic)
> max.adjr2.f = which.max(fwd.summary$adjr2)
> # Find best backward-selected model size
> min.cp.b = which.min(bwd.summary$cp)
> min.bic.b = which.min(bwd.summary$bic)
> max.adjr2.b = which.max(bwd.summary$adjr2)
> # Plot the statistics
> par(mfrow=c(3, 2))
> # Forward Cp
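For intuition on what `regsubsets` is doing in part (c), here is a hedged Python sketch of the same exhaustive search: fit every subset of the raw polynomial terms X, X^2, ..., X^10 and keep the one minimizing BIC, computed by hand as n·log(RSS/n) + log(n)·(k+1). The random draws differ from R's `set.seed(1)`, so the selected degrees need not match the report's (1, 2, 7) exactly; the dominant terms X and X^2 should still appear.

```python
# Exhaustive best subset selection over polynomial terms, scored by BIC.
# Illustrative only: data, names, and the manual BIC formula are assumptions,
# not taken from the R session above.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
eps = rng.normal(size=n)
y = 3 + 2 * x - 3 * x**2 + 0.3 * x**3 + eps   # beta = (3, 2, -3, 0.3)

P = np.column_stack([x**d for d in range(1, 11)])  # raw terms x^1 .. x^10

def bic(cols):
    """BIC of the least-squares fit using the chosen degrees (plus intercept)."""
    A = np.column_stack([np.ones(n), P[:, [c - 1 for c in cols]]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = resid @ resid
    return n * np.log(rss / n) + np.log(n) * (len(cols) + 1)

degrees = range(1, 11)
subsets = [c for k in range(1, 11) for c in combinations(degrees, k)]
best = min(subsets, key=bic)   # 2^10 - 1 = 1023 candidate models
print("degrees chosen by BIC:", best)
```

This is exactly why best subset becomes infeasible for large p: the candidate count doubles with every added predictor, which is what motivates the greedy forward and backward searches in part (d).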