Assignment 6: Linear Model Selection
SDS293 - Machine Learning
Due: 1 November 2017 by 11:59pm
Conceptual Exercises
6.8.2 (p. 259 ISLR)
For each of the following parts, indicate whether the method is more or less flexible than least squares. Describe how each method's trade-off between bias and variance impacts its prediction accuracy. Justify your answers.
(a) The lasso
Solution:
The lasso puts a budget constraint on least squares, so it is less flexible. The lasso will have improved prediction accuracy when its increase in bias is less than its decrease in variance.
(b) Ridge regression
Solution:
For the same reason as above, this method is also less flexible. Ridge regression
will have improved prediction accuracy when its increase in bias is less than its decrease in
variance.
(c) Non-linear methods (PCR and PLS)
Solution:
Non-linear methods are more flexible and will give improved prediction accuracy when their increase in variance is less than their decrease in bias.
6.8.5 (p. 261)
Ridge regression tends to give similar coefficient values to correlated variables, whereas the lasso
may give quite different coefficient values to correlated variables. We will now explore this property
in a very simple setting.
Suppose that n = 2, p = 2, x_11 = x_12, and x_21 = x_22. Furthermore, suppose that y_1 + y_2 = 0, x_11 + x_21 = 0, and x_12 + x_22 = 0, so that the estimate for the intercept in a least squares, ridge regression, or lasso model is zero: β̂_0 = 0.
(a) Write out the ridge regression optimization problem in this setting.
Solution:
In general, the ridge regression optimization problem looks like:

    min  Σ_{i=1..n} ( y_i − β̂_0 − Σ_{j=1..p} β̂_j x_ij )²  +  λ Σ_{j=1..p} β̂_j²

In this case, β̂_0 = 0 and n = p = 2, so the optimization simplifies to:
    min [ (y_1 − β̂_1 x_11 − β̂_2 x_12)² + (y_2 − β̂_1 x_21 − β̂_2 x_22)² + λ(β̂_1² + β̂_2²) ]
(b) Argue that in this setting, the ridge coefficient estimates satisfy β̂_1 = β̂_2.
Solution:
We know the following: x_11 = x_12, so we'll call that x_1, and x_21 = x_22, so we'll call that x_2. Plugging this into the above, we get:
    min [ (y_1 − β̂_1 x_1 − β̂_2 x_1)² + (y_2 − β̂_1 x_2 − β̂_2 x_2)² + λ(β̂_1² + β̂_2²) ]
Taking the partial derivatives of the above with respect to β̂_1 and β̂_2 and setting them equal to 0 will give us the point at which the function is minimized. Doing this, we find:
    β̂_1 (x_1² + x_2² + λ) + β̂_2 (x_1² + x_2²) − y_1 x_1 − y_2 x_2 = 0

and

    β̂_1 (x_1² + x_2²) + β̂_2 (x_1² + x_2² + λ) − y_1 x_1 − y_2 x_2 = 0
Since the right-hand side of both equations is identical, we can set the two left-hand sides
equal to one another:
    β̂_1 (x_1² + x_2² + λ) + β̂_2 (x_1² + x_2²) − y_1 x_1 − y_2 x_2 = β̂_1 (x_1² + x_2²) + β̂_2 (x_1² + x_2² + λ) − y_1 x_1 − y_2 x_2
and then cancel out common terms:
    β̂_1 (x_1² + x_2²) + β̂_1 λ + β̂_2 (x_1² + x_2²) = β̂_1 (x_1² + x_2²) + β̂_2 (x_1² + x_2²) + β̂_2 λ

    β̂_1 λ + β̂_2 (x_1² + x_2²) = β̂_2 (x_1² + x_2²) + β̂_2 λ

    β̂_1 λ = β̂_2 λ

Thus, since λ > 0, β̂_1 = β̂_2.
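As a sanity check, the stationarity equations above can also be verified numerically. Below is a minimal Python sketch (Python rather than the assignment's R, purely for illustration) that runs plain gradient descent on the toy ridge objective with made-up data satisfying the problem's constraints; starting from an asymmetric point, the two coefficients converge to the same value, as the derivation predicts.

```python
# Made-up data satisfying the problem's constraints:
# x11 = x12 = x1, x21 = x22 = x2, y1 + y2 = 0, x11 + x21 = 0.
y1, y2 = 2.0, -2.0
x1, x2 = 1.0, -1.0
lam = 1.0

def ridge_objective(b1, b2):
    # (y1 - b1*x1 - b2*x1)^2 + (y2 - b1*x2 - b2*x2)^2 + lam*(b1^2 + b2^2)
    return ((y1 - (b1 + b2) * x1) ** 2
            + (y2 - (b1 + b2) * x2) ** 2
            + lam * (b1 ** 2 + b2 ** 2))

# Plain gradient descent from a deliberately asymmetric start.
b1, b2 = 0.0, 0.5
for _ in range(5000):
    shared = -2 * x1 * (y1 - (b1 + b2) * x1) - 2 * x2 * (y2 - (b1 + b2) * x2)
    g1 = shared + 2 * lam * b1          # d/db1 of the objective
    g2 = shared + 2 * lam * b2          # d/db2 of the objective
    b1, b2 = b1 - 0.01 * g1, b2 - 0.01 * g2

print(round(b1, 4), round(b2, 4))       # both converge to the same value
```

With these numbers the common value is (y_1 x_1 + y_2 x_2) / (2(x_1² + x_2²) + λ) = 4/5 = 0.8, matching the closed-form solution of the two stationarity equations.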
(c) Write out the lasso optimization problem in this setting.
Solution:
    min [ (y_1 − β̂_1 x_11 − β̂_2 x_12)² + (y_2 − β̂_1 x_21 − β̂_2 x_22)² + λ(|β̂_1| + |β̂_2|) ]
(d) Argue that in this setting, the lasso coefficients β̂_1 and β̂_2 are not unique; in other words, there are many possible solutions to the optimization problem in (c). Describe these solutions.
Solution:
One way to demonstrate that these solutions are not unique is to make a geometric argument. To make things easier, we'll use the alternate form of the lasso constraint that we saw in class, namely |β̂_1| + |β̂_2| ≤ s. If we were to plot this constraint, it takes the familiar shape of a diamond centered at the origin (0, 0).
Next we'll consider the squared-error part of the objective, namely:

    (y_1 − β̂_1 x_11 − β̂_2 x_12)² + (y_2 − β̂_1 x_21 − β̂_2 x_22)²
Using the facts we were given regarding the equivalence of many of the variables (x_12 = x_11, x_21 = x_22 = −x_11, and y_2 = −y_1), we can simplify down to the following optimization:

    min [ 2 (y_1 − (β̂_1 + β̂_2) x_11)² ]
This optimization problem has a minimum at β̂_1 + β̂_2 = y_1 / x_11, which defines a line parallel to one edge of the lasso diamond, β̂_1 + β̂_2 = s.
As β̂_1 and β̂_2 vary along the line β̂_1 + β̂_2 = y_1 / x_11, these contours touch the lasso-diamond edge β̂_1 + β̂_2 = s at different points. As a result, the entire edge β̂_1 + β̂_2 = s is a potential solution to the lasso optimization problem!
A similar argument holds for the opposite lasso-diamond edge, defined by β̂_1 + β̂_2 = −s.
Thus, the lasso coefficients are not unique. The general form of the solution is given by two line segments:

    β̂_1 + β̂_2 = s,  β̂_1 ≥ 0,  β̂_2 ≥ 0    and    β̂_1 + β̂_2 = −s,  β̂_1 ≤ 0,  β̂_2 ≤ 0
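The flatness of the objective along a diamond edge is easy to check numerically. The following Python sketch (with made-up numbers chosen to satisfy the constraints x_11 = x_12 = 1, x_21 = x_22 = −1, y_2 = −y_1) evaluates the simplified lasso objective at several different splits of the same non-negative sum s between the two coefficients:

```python
y1, x11 = 2.0, 1.0   # made-up data; y2 = -y1 and x21 = x22 = -x11
lam = 0.5

def lasso_objective(b1, b2):
    # Simplified objective: 2*(y1 - (b1 + b2)*x11)^2 + lam*(|b1| + |b2|)
    return 2 * (y1 - (b1 + b2) * x11) ** 2 + lam * (abs(b1) + abs(b2))

# Split the same non-negative sum s between b1 and b2 in different ways.
s = 1.2
vals = [lasso_objective(t * s, (1 - t) * s) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(vals)  # every split gives (essentially) the same objective value
```

Because both the squared-error term and the penalty depend only on b1 + b2 when both coefficients are non-negative, every point on the edge achieves the same objective value.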
Applied Exercises
6.8.9 (p. 263 ISLR)
In this exercise, we will predict the number of applications received using the other variables in the College data set. For consistency, please use set.seed(11) before beginning.

(a) Split the data set into a training set and a test set.

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

(c) Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

(d) Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

(e) Fit a PCR model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

(f) Fit a PLS model on the training set, with M chosen by cross-validation. Report the test error obtained, along with the value of M selected by cross-validation.

(g) Comment on the results you obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?
A6 Applied Solutions
6.8.9 (a)
library(ISLR)
library(dplyr)

Check to make sure we don't have any null values:

sum(is.na(College))
## [1] 0

Split the data set into a training set and a test set.

set.seed(1)
train = College %>% sample_frac(0.5)
test  = College %>% setdiff(train)
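For readers not using dplyr, the 50/50 split can be sketched with nothing but the standard library (Python here purely for illustration; the College data set has 777 rows):

```python
import random

random.seed(1)
n_rows = 777                              # rows in ISLR's College data
rows = list(range(n_rows))
train_idx = set(random.sample(rows, n_rows // 2))   # half the rows, at random
test_idx = [i for i in rows if i not in train_idx]  # everything else
print(len(train_idx), len(test_idx))      # 388 389
```

As with sample_frac followed by setdiff, the two index sets are disjoint and together cover the whole data set.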
6.8.9 (b)
Fit a linear model using least squares on the training set, and report the test error obtained.

lm_fit = lm(Apps ~ ., data = train)
lm_pred = predict(lm_fit, test)
mean((test[, "Apps"] - lm_pred)^2)
## [1] 1108531
6.8.9 (c)
Fit a ridge regression model on the training set, with λ chosen by cross-validation. Report the test error obtained.

library(glmnet)

# Build model matrices for test and training data
train_mat = model.matrix(Apps ~ ., data = train)
test_mat  = model.matrix(Apps ~ ., data = test)

# Find best lambda using cross-validation;
# alpha = 0 --> use ridge regression
grid = 10^seq(4, -2, length = 100)
mod_ridge = cv.glmnet(train_mat, train[, "Apps"],
                      alpha = 0, lambda = grid, thresh = 1e-12)
lambda_best_ridge = mod_ridge$lambda.min

# Predict on test data, report error
ridge_pred = predict(mod_ridge, newx = test_mat, s = lambda_best_ridge)
mean((test[, "Apps"] - ridge_pred)^2)
## [1] 1108512
6.8.9 (d)
Fit a lasso model on the training set, with λ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

# Find best lambda using cross-validation;
# alpha = 1 --> use lasso
mod_lasso = cv.glmnet(train_mat, train[, "Apps"],
                      alpha = 1, lambda = grid, thresh = 1e-12)
lambda_best_lasso = mod_lasso$lambda.min

# Predict on test data, report error
lasso_pred = predict(mod_lasso, newx = test_mat, s = lambda_best_lasso)
mean((test[, "Apps"] - lasso_pred)^2)
## [1] 1028718

predict(mod_lasso, newx = test_mat, s = lambda_best_lasso, type = "coefficients")
## 19 x 1 sparse Matrix of class "dgCMatrix"
##                         1
## (Intercept) -4.248125e+02
## (Intercept)  .
## PrivateYes  -4.955003e+02
## Accept       1.540306e+00
## Enroll      -3.900157e-01
## Top10perc    4.779689e+01
## Top25perc   -7.926581e+00
## F.Undergrad -9.846932e-03
## P.Undergrad  .
## Outstate    -5.231286e-02
## Room.Board   1.880308e-01
## Books        1.265938e-03
## Personal     .
## PhD         -4.137294e+00
## Terminal    -3.184316e+00
## S.F.Ratio    .
## perc.alumni -2.181304e+00
## Expend       3.193679e-02
## Grad.Rate    2.877667e+00
6.8.9 (e)
Results for OLS, ridge, and lasso are comparable. Lasso reduces the P.Undergrad, Personal, and S.F.Ratio variables to zero and shrinks the coefficients of the other variables. Below are the test R² values for all models.
test_avg = mean(test[, "Apps"])

lm_test_r2    = 1 - mean((test[, "Apps"] - lm_pred)^2)    / mean((test[, "Apps"] - test_avg)^2)
ridge_test_r2 = 1 - mean((test[, "Apps"] - ridge_pred)^2) / mean((test[, "Apps"] - test_avg)^2)
lasso_test_r2 = 1 - mean((test[, "Apps"] - lasso_pred)^2) / mean((test[, "Apps"] - test_avg)^2)

barplot(c(lm_test_r2, ridge_test_r2, lasso_test_r2),
        ylim = c(0, 1),
        names.arg = c("OLS", "Ridge", "Lasso"),
        main = "Test R-squared")
abline(h = 0.9, col = "red")
[Figure: bar plot of test R-squared for OLS, Ridge, and Lasso; all three bars exceed the red reference line at 0.9.]
Since the test R² values for all three models are above 0.90, they all predict the number of college applications with high accuracy.