Assignment 6: Linear Model Selection
SDS293 - Machine Learning
Due: 1 November 2017 by 11:59pm

Conceptual Exercises

6.8.2 (p. 259 ISLR)
For each of the following, indicate whether the method is more or less flexible than least squares. Describe how each method's trade-off between bias and variance impacts its prediction accuracy. Justify your answers.

(a) The lasso

Solution: The lasso puts a budget constraint on least squares, so it is less flexible. The lasso will have improved prediction accuracy when its increase in bias is less than its decrease in variance.

(b) Ridge regression

Solution: For the same reason as above, this method is also less flexible. Ridge regression will have improved prediction accuracy when its increase in bias is less than its decrease in variance.

(c) Non-linear methods (PCR and PLS)

Solution: Non-linear methods are more flexible and will give improved prediction accuracy when their increase in variance is less than their decrease in bias.
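As a quick numerical companion to 6.8.2 (an illustration added here, not part of the original solution), the sketch below compares least squares, ridge, and the lasso on synthetic data where the number of predictors is large relative to the training sample, so least squares has high variance. The sample sizes, penalty values, and coefficient pattern are illustrative assumptions.

```python
# Minimal sketch, assuming scikit-learn and synthetic data: when least squares
# has high variance, ridge and the lasso can trade a little bias for a lower
# test MSE, which is the trade-off described in 6.8.2.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_train, n_test, p = 50, 1000, 40            # illustrative sizes
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 0.5, 4.0]        # only a few truly non-zero coefficients

X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = X_train @ beta + rng.normal(scale=2.0, size=n_train)
y_test = X_test @ beta + rng.normal(scale=2.0, size=n_test)

for name, model in [("least squares", LinearRegression()),
                    ("ridge (alpha=1)", Ridge(alpha=1.0)),
                    ("lasso (alpha=0.1)", Lasso(alpha=0.1))]:
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name:18s} test MSE = {mse:.3f}")
```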
6.8.5 (p. 261)
Ridge regression tends to give similar coefficient values to correlated variables, whereas the lasso may give quite different coefficient values to correlated variables. We will now explore this property in a very simple setting.

Suppose that $n = 2$, $p = 2$, $x_{11} = x_{12}$, and $x_{21} = x_{22}$. Furthermore, suppose that $y_1 + y_2 = 0$, $x_{11} + x_{21} = 0$, and $x_{12} + x_{22} = 0$, so that the estimate for the intercept in a least squares, ridge regression, or lasso model is zero: $\hat{\beta}_0 = 0$.

(a) Write out the ridge regression optimization problem in this setting.

Solution: In general, the ridge regression optimization problem looks like:
\[
\min \sum_{i=1}^{n} \Big( y_i - \hat{\beta}_0 - \sum_{j=1}^{p} \hat{\beta}_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \hat{\beta}_j^2
\]
In this case, $\hat{\beta}_0 = 0$ and $n = p = 2$, so the optimization simplifies to:
\[
\min \Big[ (y_1 - \hat{\beta}_1 x_{11} - \hat{\beta}_2 x_{12})^2 + (y_2 - \hat{\beta}_1 x_{21} - \hat{\beta}_2 x_{22})^2 + \lambda (\hat{\beta}_1^2 + \hat{\beta}_2^2) \Big]
\]

(b) Argue that in this setting, the ridge coefficient estimates satisfy $\hat{\beta}_1 = \hat{\beta}_2$.

Solution: We know that $x_{11} = x_{12}$, which we'll call $x_1$, and $x_{21} = x_{22}$, which we'll call $x_2$. Plugging this into the above, we get:
\[
\min \Big[ (y_1 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_1)^2 + (y_2 - \hat{\beta}_1 x_2 - \hat{\beta}_2 x_2)^2 + \lambda (\hat{\beta}_1^2 + \hat{\beta}_2^2) \Big]
\]
Taking the partial derivatives with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$ and setting them equal to 0 gives the point at which the function is minimized. Doing this, we find:
\[
\hat{\beta}_1 (x_1^2 + x_2^2 + \lambda) + \hat{\beta}_2 (x_1^2 + x_2^2) - y_1 x_1 - y_2 x_2 = 0
\]
and
\[
\hat{\beta}_1 (x_1^2 + x_2^2) + \hat{\beta}_2 (x_1^2 + x_2^2 + \lambda) - y_1 x_1 - y_2 x_2 = 0
\]
Since both equations share the same terms $-y_1 x_1 - y_2 x_2$, we can set their left-hand sides equal to one another:
\[
\hat{\beta}_1 (x_1^2 + x_2^2 + \lambda) + \hat{\beta}_2 (x_1^2 + x_2^2) = \hat{\beta}_1 (x_1^2 + x_2^2) + \hat{\beta}_2 (x_1^2 + x_2^2 + \lambda)
\]
Cancelling the common terms $\hat{\beta}_1 (x_1^2 + x_2^2)$ and $\hat{\beta}_2 (x_1^2 + x_2^2)$ from both sides leaves:
\[
\hat{\beta}_1 \lambda = \hat{\beta}_2 \lambda
\]
Thus, $\hat{\beta}_1 = \hat{\beta}_2$.
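As a sanity check on the algebra in (b) (an illustration added here, not part of the original solution), the snippet below builds a tiny data set satisfying the conditions of 6.8.5 and computes the ridge estimate in closed form, $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$. The specific numbers and penalty are arbitrary assumptions.

```python
# Minimal numerical check of part (b): with duplicated columns and the symmetry
# conditions of 6.8.5, the ridge solution has equal coefficients.
import numpy as np

x1, x2 = 1.0, -1.0                 # x11 = x12 = x1, x21 = x22 = x2, with x1 + x2 = 0
y = np.array([2.0, -2.0])          # y1 + y2 = 0, so the intercept estimate is 0
X = np.array([[x1, x1],
              [x2, x2]])
lam = 0.5                          # any positive penalty works here

beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(beta_hat)                    # e.g. [0.8889 0.8889] -> beta1_hat == beta2_hat
```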
(c) Write out the lasso optimization problem in this setting.

Solution:
\[
\min \Big[ (y_1 - \hat{\beta}_1 x_{11} - \hat{\beta}_2 x_{12})^2 + (y_2 - \hat{\beta}_1 x_{21} - \hat{\beta}_2 x_{22})^2 + \lambda (|\hat{\beta}_1| + |\hat{\beta}_2|) \Big]
\]

(d) Argue that in this setting, the lasso coefficients $\hat{\beta}_1$ and $\hat{\beta}_2$ are not unique -- in other words, there are many possible solutions to the optimization problem in (c). Describe these solutions.

Solution: One way to demonstrate that these solutions are not unique is to make a geometric argument. To make things easier, we'll use the alternate (constrained) form of the lasso that we saw in class: $|\hat{\beta}_1| + |\hat{\beta}_2| \le s$. Plotted, this constraint takes the familiar shape of a diamond centered at the origin $(0, 0)$.

Next, consider the squared-error term being minimized:
\[
(y_1 - \hat{\beta}_1 x_{11} - \hat{\beta}_2 x_{12})^2 + (y_2 - \hat{\beta}_1 x_{21} - \hat{\beta}_2 x_{22})^2
\]
Using the given equalities among the variables ($x_{11} = x_{12}$, $x_{21} = x_{22} = -x_{11}$, and $y_2 = -y_1$), this simplifies to the optimization:
\[
\min \Big[ 2 \big( y_1 - (\hat{\beta}_1 + \hat{\beta}_2) x_{11} \big)^2 \Big]
\]
This problem is minimized whenever $\hat{\beta}_1 + \hat{\beta}_2 = y_1 / x_{11}$, which defines a line parallel to one edge of the lasso diamond, $\hat{\beta}_1 + \hat{\beta}_2 = s$. As $\hat{\beta}_1$ and $\hat{\beta}_2$ vary along the line $\hat{\beta}_1 + \hat{\beta}_2 = y_1 / x_{11}$, the contours of the squared-error term touch the lasso-diamond edge $\hat{\beta}_1 + \hat{\beta}_2 = s$ at different points. As a result, the entire edge $\hat{\beta}_1 + \hat{\beta}_2 = s$ is a potential solution to the lasso optimization problem! A similar argument holds for the opposite edge of the diamond, defined by $\hat{\beta}_1 + \hat{\beta}_2 = -s$. Thus, the lasso coefficients are not unique. The general form of the solutions is given by two line segments:
\[
\hat{\beta}_1 + \hat{\beta}_2 = s, \quad \hat{\beta}_1 \ge 0, \ \hat{\beta}_2 \ge 0
\qquad \text{and} \qquad
\hat{\beta}_1 + \hat{\beta}_2 = -s, \quad \hat{\beta}_1 \le 0, \ \hat{\beta}_2 \le 0
\]
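To make the non-uniqueness in (d) concrete (again an illustration added here, not part of the original solution), the snippet below evaluates the penalized lasso objective from (c) on a data set satisfying the conditions of 6.8.5. Because the objective depends only on $\hat{\beta}_1 + \hat{\beta}_2$ when the two coefficients share a sign, every non-negative split of the same sum attains exactly the same value. The data values, penalty, and the chosen sum are illustrative assumptions.

```python
# Minimal numerical illustration of part (d): with duplicated columns, the
# lasso objective is constant along a whole segment of coefficient pairs.
import numpy as np

x1, x2 = 1.0, -1.0                 # x11 = x12 = x1, x21 = x22 = x2
y1, y2 = 2.0, -2.0                 # y1 + y2 = 0
lam = 0.5                          # illustrative penalty

def lasso_objective(b1, b2):
    return ((y1 - b1 * x1 - b2 * x1) ** 2
            + (y2 - b1 * x2 - b2 * x2) ** 2
            + lam * (abs(b1) + abs(b2)))

t = 1.875                          # optimal value of beta1 + beta2 for this lam
for b1 in [0.0, 0.5, 1.0, 1.875]:  # different non-negative splits of the same sum
    b2 = t - b1
    print(b1, b2, lasso_objective(b1, b2))   # identical objective for every split
```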