Assignment 6: Linear Model Selection
SDS293 - Machine Learning
Due: 1 November 2017 by 11:59pm
Conceptual Exercises
6.8.2 (p. 259 ISLR)
For each of the following parts, indicate whether the method is more or less flexible than least squares. Describe how each method's trade-off between bias and variance impacts its prediction accuracy. Justify your answers.
(a) The lasso
Solution:
The lasso puts a budget constraint on the least squares coefficients and is therefore less flexible. The lasso will have improved prediction accuracy when its increase in bias is less than its decrease in variance.
(b) Ridge regression
Solution:
For the same reason as above, this method is also less flexible. Ridge regression
will have improved prediction accuracy when its increase in bias is less than its decrease in
variance.
(c) Non-linear methods (PCR and PLS)
Solution:
Non-linear methods are more flexible and will give improved prediction accuracy when their increase in variance is less than their decrease in bias.
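The bias–variance trade-off described above can be checked empirically. Below is a minimal simulation sketch (not part of the original assignment; the sample size, noise level, penalty, and coefficient values are all arbitrary illustrative choices). It refits ordinary least squares and ridge regression on repeated noisy draws from the same linear model and compares the total variance of the coefficient estimates; the less flexible ridge estimator should show lower variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, lam = 30, 5, 2.0, 5.0          # illustrative choices
beta_true = np.array([1.0, 0.5, 0.0, 0.0, 0.0])

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam*I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

ols_betas, ridge_betas = [], []
for _ in range(500):
    X = rng.normal(size=(n, p))
    y = X @ beta_true + sigma * rng.normal(size=n)
    ols_betas.append(np.linalg.lstsq(X, y, rcond=None)[0])
    ridge_betas.append(ridge_fit(X, y, lam))

# Total variance of the coefficient estimates across resamples
var_ols = np.var(ols_betas, axis=0).sum()
var_ridge = np.var(ridge_betas, axis=0).sum()
print(var_ridge < var_ols)  # shrinkage trades a little bias for lower variance
```

Whether the variance reduction translates into better prediction accuracy depends on how much bias the shrinkage introduces, which is exactly the trade-off the solution describes.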
6.8.5 (p. 261)
Ridge regression tends to give similar coefficient values to correlated variables, whereas the lasso
may give quite different coefficient values to correlated variables. We will now explore this property
in a very simple setting.
Suppose that $n = 2$, $p = 2$, $x_{11} = x_{12}$, and $x_{21} = x_{22}$. Furthermore, suppose that $y_1 + y_2 = 0$, $x_{11} + x_{21} = 0$, and $x_{12} + x_{22} = 0$, so that the estimate for the intercept in a least squares, ridge regression, or lasso model is zero: $\hat{\beta}_0 = 0$.
(a) Write out the ridge regression optimization problem in this setting.
Solution:
In general, the ridge regression optimization problem looks like:
$$\min \sum_{i=1}^{n} \Big( y_i - \hat{\beta}_0 - \sum_{j=1}^{p} \hat{\beta}_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \hat{\beta}_j^2$$
In this case, $\hat{\beta}_0 = 0$ and $n = p = 2$. So, the optimization simplifies to:
$$\min \Big[ (y_1 - \hat{\beta}_1 x_{11} - \hat{\beta}_2 x_{12})^2 + (y_2 - \hat{\beta}_1 x_{21} - \hat{\beta}_2 x_{22})^2 + \lambda (\hat{\beta}_1^2 + \hat{\beta}_2^2) \Big]$$
(b) Argue that in this setting, the ridge coefficient estimates satisfy $\hat{\beta}_1 = \hat{\beta}_2$.
Solution:
We know the following: $x_{11} = x_{12}$, so we'll call that $x_1$; and $x_{21} = x_{22}$, so we'll call that $x_2$. Plugging this into the above, we get:
$$\min \Big[ (y_1 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_1)^2 + (y_2 - \hat{\beta}_1 x_2 - \hat{\beta}_2 x_2)^2 + \lambda (\hat{\beta}_1^2 + \hat{\beta}_2^2) \Big]$$
Taking the partial derivatives of the above with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$ and setting them equal to 0 will give us the point at which the function is minimized. Doing this, we find:
$$\hat{\beta}_1 (x_1^2 + x_2^2 + \lambda) + \hat{\beta}_2 (x_1^2 + x_2^2) - y_1 x_1 - y_2 x_2 = 0$$
and
$$\hat{\beta}_1 (x_1^2 + x_2^2) + \hat{\beta}_2 (x_1^2 + x_2^2 + \lambda) - y_1 x_1 - y_2 x_2 = 0$$
Since the $-y_1 x_1 - y_2 x_2$ terms and the right-hand sides of both equations are identical, we can set the remaining left-hand sides equal to one another:
$$\hat{\beta}_1 (x_1^2 + x_2^2 + \lambda) + \hat{\beta}_2 (x_1^2 + x_2^2) = \hat{\beta}_1 (x_1^2 + x_2^2) + \hat{\beta}_2 (x_1^2 + x_2^2 + \lambda)$$
Expanding the $\lambda$ terms,
$$\hat{\beta}_1 (x_1^2 + x_2^2) + \hat{\beta}_1 \lambda + \hat{\beta}_2 (x_1^2 + x_2^2) = \hat{\beta}_1 (x_1^2 + x_2^2) + \hat{\beta}_2 (x_1^2 + x_2^2) + \hat{\beta}_2 \lambda$$
and canceling the common terms on both sides leaves
$$\hat{\beta}_1 \lambda = \hat{\beta}_2 \lambda$$
Since $\lambda > 0$, it follows that $\hat{\beta}_1 = \hat{\beta}_2$.
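The result in (b) can also be verified numerically. The sketch below (hypothetical numbers, chosen only to satisfy the constraints $x_{11} = x_{12}$, $x_{21} = x_{22}$, and $y_1 + y_2 = 0$) solves the ridge problem in closed form and confirms that the two coefficient estimates coincide:

```python
import numpy as np

# Data satisfying the problem's constraints: x11 = x12 = a, x21 = x22 = -a, y2 = -y1
a, y1, lam = 1.5, 2.0, 0.7   # illustrative values
X = np.array([[a, a], [-a, -a]])
y = np.array([y1, -y1])

# Closed-form ridge solution: beta = (X^T X + lam*I)^{-1} X^T y
beta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(np.isclose(beta[0], beta[1]))  # True: ridge treats the identical columns symmetrically
```

Because the penalty matrix $X^\top X + \lambda I$ and the vector $X^\top y$ are symmetric in the two (identical) predictors, the unique ridge solution must assign them equal coefficients, matching the algebraic argument above.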
(c) Write out the lasso optimization problem in this setting.
Solution:
$$\min \Big[ (y_1 - \hat{\beta}_1 x_{11} - \hat{\beta}_2 x_{12})^2 + (y_2 - \hat{\beta}_1 x_{21} - \hat{\beta}_2 x_{22})^2 + \lambda (|\hat{\beta}_1| + |\hat{\beta}_2|) \Big]$$
(d) Argue that in this setting, the lasso coefficients $\hat{\beta}_1$ and $\hat{\beta}_2$ are not unique; in other words, there are many possible solutions to the optimization problem in (c). Describe these solutions.
Solution:
One way to demonstrate that these solutions are not unique is to make a geometric argument. To make things easier, we'll use the alternate form of the lasso constraint that we saw in class, namely: $|\hat{\beta}_1| + |\hat{\beta}_2| \le s$. If we were to plot this constraint, it takes the familiar shape of a diamond centered at the origin $(0, 0)$.
Next we'll consider the squared-error part of the objective, namely:
$$(y_1 - \hat{\beta}_1 x_{11} - \hat{\beta}_2 x_{12})^2 + (y_2 - \hat{\beta}_1 x_{21} - \hat{\beta}_2 x_{22})^2$$
Using the facts we were given regarding the equivalence of the variables (since $y_2 = -y_1$, $x_{21} = -x_{11}$, and $x_{22} = -x_{12} = -x_{11}$, the second squared term equals the first), we can simplify down to the following optimization:
$$\min \Big[ 2\big(y_1 - (\hat{\beta}_1 + \hat{\beta}_2) x_{11}\big)^2 \Big]$$
This optimization problem has a minimum at $\hat{\beta}_1 + \hat{\beta}_2 = \frac{y_1}{x_{11}}$, which defines a line parallel to one edge of the lasso diamond, $\hat{\beta}_1 + \hat{\beta}_2 = s$.
As $\hat{\beta}_1$ and $\hat{\beta}_2$ vary along the line $\hat{\beta}_1 + \hat{\beta}_2 = \frac{y_1}{x_{11}}$, these contours touch the lasso-diamond edge $\hat{\beta}_1 + \hat{\beta}_2 = s$ at different points. As a result, the entire edge $\hat{\beta}_1 + \hat{\beta}_2 = s$ is a potential solution to the lasso optimization problem!
A similar argument holds for the opposite lasso-diamond edge, defined by $\hat{\beta}_1 + \hat{\beta}_2 = -s$.
Thus, the lasso coefficients are not unique. The general form of the solution is given by two line segments:
$$\hat{\beta}_1 + \hat{\beta}_2 = s, \quad \hat{\beta}_1 \ge 0, \quad \hat{\beta}_2 \ge 0 \qquad \text{and} \qquad \hat{\beta}_1 + \hat{\beta}_2 = -s, \quad \hat{\beta}_1 \le 0, \quad \hat{\beta}_2 \le 0$$
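The non-uniqueness can be illustrated numerically as well. The sketch below (not part of the original solution; the data values are arbitrary choices consistent with the problem's constraints) evaluates the penalized lasso objective at several points along a segment of the form $\hat{\beta}_1 + \hat{\beta}_2 = \text{const}$ with both coefficients non-negative. For such points the objective depends only on the sum of the coefficients, so every point on the segment through the minimizing sum ties for the minimum:

```python
import numpy as np

# Hypothetical data satisfying the constraints: x11 = x12 = a, x21 = x22 = -a, y2 = -y1
a, y1, lam = 1.0, 2.0, 0.3
X = np.array([[a, a], [-a, -a]])
y = np.array([y1, -y1])

def lasso_obj(b1, b2):
    # Penalized lasso objective: RSS + lam * (|b1| + |b2|)
    resid = y - X @ np.array([b1, b2])
    return np.sum(resid**2) + lam * (abs(b1) + abs(b2))

# For b1, b2 >= 0 the objective depends only on u = b1 + b2:
#   g(u) = 2*(y1 - u*a)^2 + lam*u, minimized at u_star = y1/a - lam/(4*a^2)
u_star = y1 / a - lam / (4 * a**2)   # positive for these illustrative values
vals = [lasso_obj(t * u_star, (1 - t) * u_star) for t in np.linspace(0, 1, 5)]

print(np.allclose(vals, vals[0]))            # True: the whole segment ties for the minimum
print(vals[0] < lasso_obj(u_star + 0.5, 0))  # True: moving off the segment increases it
```

This matches the geometric argument: only the sum $\hat{\beta}_1 + \hat{\beta}_2$ is pinned down, so any split of that sum between the two coefficients (with matching signs) is an equally good solution.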