Assignment 5: Linear Model Selection
SDS293 - Machine Learning
Due: 24 Oct 2017 by 11:59pm
Conceptual Exercises
6.8.1 (p. 259 ISLR)
We perform best subset, forward stepwise, and backward stepwise selection on a single data set.
For each approach, we obtain $p + 1$ models, containing $0, 1, 2, \ldots, p$ predictors. Explain your answers:
(a) Which of the three models with $k$ predictors has the smallest training RSS?
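Solution:
Best subset selection has the smallest training RSS. It fits every model with $k$ predictors and keeps the one with the lowest RSS, so no other $k$-predictor model can do better on the training data. Forward and backward selection build their $k$th model one predictor at a time, so the result depends on which predictors were added (or dropped) earlier, and a poor choice early on cannot be undone.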
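The argument in (a) can be written as a single inequality. This is a sketch in notation introduced here for illustration: $\mathcal{M}_k$ is the set of all $\binom{p}{k}$ models with exactly $k$ predictors, and $M_k^{\text{fwd}}$, $M_k^{\text{bwd}}$ are the $k$-predictor models found by forward and backward stepwise selection.

```latex
\mathrm{RSS}\bigl(M_k^{\text{best}}\bigr)
  = \min_{M \in \mathcal{M}_k} \mathrm{RSS}(M)
  \;\le\; \min\Bigl\{ \mathrm{RSS}\bigl(M_k^{\text{fwd}}\bigr),\ \mathrm{RSS}\bigl(M_k^{\text{bwd}}\bigr) \Bigr\},
  \qquad \text{since } M_k^{\text{fwd}},\, M_k^{\text{bwd}} \in \mathcal{M}_k .
```

Equality holds whenever a stepwise path happens to land on the same model that best subset selection finds.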
(b) Which of the three models with $k$ predictors has the smallest test RSS?
Solution:
Best subset selection may have the smallest test RSS because it considers more models than the other methods. However, forward or backward selection might happen to pick a model that fits the test data better, since searching fewer models makes them less prone to overfitting. The outcome will depend more heavily on the choice of test set / validation method than on the selection method.
(c) True or False: the predictors in Model 1 are a subset of the predictors in Model 2:

      Model 1                               Model 2                                   T/F
i.    Forward selection, k variables        Forward selection, k + 1 variables        True
ii.   Backward selection, k variables       Backward selection, k + 1 variables       True
iii.  Backward selection, k variables       Forward selection, k + 1 variables        False
iv.   Forward selection, k variables        Backward selection, k + 1 variables       False
v.    Best subset selection, k variables    Best subset selection, k + 1 variables    False

Explain your reasoning.
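One way to see the pattern in the table: forward selection builds its (k + 1)-variable model by adding one predictor to its k-variable model, and backward selection obtains its k-variable model by dropping one predictor from its (k + 1)-variable model, so each path is nested in itself. The forward and backward paths are built independently of each other, and best subset selection searches each model size from scratch, so none of those pairs is guaranteed to be nested. The sketch below illustrates rows i and v on toy data. The data and the helper names `rss`, `best_subset`, and `forward` are made up for illustration and are not part of the assignment; only base R is used.

```r
# Toy data (illustrative only): y is an exact linear function of x2 and x3,
# while x1 on its own tracks y very closely.
set.seed(1)
n  <- 100
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- x2 + x3
x1 <- y + rnorm(n, sd = 0.1)
dat   <- data.frame(y, x1, x2, x3)
preds <- c("x1", "x2", "x3")

# Training RSS of the least-squares fit on a given set of predictors.
rss <- function(vars) {
  f <- reformulate(vars, response = "y")
  sum(resid(lm(f, data = dat))^2)
}

# Best subset: try every subset of size k and keep the one with lowest RSS.
best_subset <- function(k) {
  cands <- combn(preds, k, simplify = FALSE)
  cands[[which.min(sapply(cands, rss))]]
}

# Forward stepwise: greedily add the predictor that lowers RSS the most.
forward <- function(k) {
  chosen <- character(0)
  for (i in seq_len(k)) {
    rest   <- setdiff(preds, chosen)
    scores <- sapply(rest, function(v) rss(c(chosen, v)))
    chosen <- c(chosen, rest[which.min(scores)])
  }
  chosen
}

best1 <- best_subset(1); best2 <- best_subset(2)
fwd1  <- forward(1);     fwd2  <- forward(2)

print(list(best1 = best1, best2 = best2, fwd1 = fwd1, fwd2 = fwd2))
print(all(fwd1 %in% fwd2))      # row i: forward models are nested
print(all(best1 %in% best2))    # row v: best subset models need not be nested
print(rss(best2) <= rss(fwd2))  # best subset never has higher training RSS
```

In this toy setup the best single predictor is `x1`, but the best pair is `x2` and `x3`, so the best subset models are not nested, while the forward path keeps `x1` once it has been chosen.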
Applied Exercises
6.8.8 parts a-d (p. 262-263 ISLR)
In this exercise, we will generate simulated data, and will then use this data to perform best subset
selection.
(a) Generate a predictor $X$ of length $n = 100$, as well as a noise vector $\epsilon$ of length $n = 100$.
Solution:
> set.seed(1)
> X=rnorm(100)
> eps=rnorm(100)
(b) Generate a response vector $Y$ of length $n = 100$ according to the model
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon,$$
where $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ are constants of your choice.
Solution:
Selecting $\beta_0 = 3$, $\beta_1 = 2$, $\beta_2 = -3$ and $\beta_3 = 0.3$:
> beta0=3
> beta1=2
> beta2=-3
> beta3=0.3
> Y=beta0 + beta1 * X + beta2 * X^2 + beta3 * X^3 + eps
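As an optional sanity check on the simulated data, one could fit the true cubic form directly and compare the estimates with the chosen constants. This is a minimal sketch and not part of the assigned solution; the names `beta`, `fit`, and `est` are introduced here for illustration.

```r
# Regenerate the simulated data from parts (a) and (b).
set.seed(1)
X   <- rnorm(100)
eps <- rnorm(100)
beta <- c(3, 2, -3, 0.3)  # beta0, beta1, beta2, beta3 as chosen in (b)
Y <- beta[1] + beta[2] * X + beta[3] * X^2 + beta[4] * X^3 + eps

# Fit the true cubic form; I() makes ^ mean arithmetic power inside a formula.
fit <- lm(Y ~ X + I(X^2) + I(X^3))
est <- unname(coef(fit))

# Show the generating constants next to the least-squares estimates.
print(round(cbind(true = beta, estimate = est), 3))
```

With 100 observations and unit-variance noise, the estimates should land near the generating constants, which gives a baseline for judging the models chosen in parts (c) and (d).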
(c) Perform best subset selection in order to choose the best model containing the predictors $X, X^2, \ldots, X^{10}$. What is the best model obtained according to $C_p$, BIC, and adjusted $R^2$? Show some plots to provide evidence for your answer, and report the coefficients of the best model obtained.
Solution:
> library(leaps)
> data.full=data.frame(y=Y, x=X)
> mod.full=regsubsets(y ~ poly(x, 10, raw=T), data=data.full, nvmax=10)
> mod.summary=summary(mod.full)
# Find the model size for best cp, BIC and adjr2
> min.cp=which.min(mod.summary$cp)
> min.bic=which.min(mod.summary$bic)
> max.adjr2=which.max(mod.summary$adjr2)
# Plot cp, BIC and adjr2
> plot(mod.summary$cp, xlab="Subset Size", ylab="Cp", pch=20, type="l")
> points(min.cp, mod.summary$cp[min.cp], pch=4, col="red", lwd=7)
> plot(mod.summary$bic, xlab="Subset Size", ylab="BIC", pch=20, type="l")
> points(min.bic, mod.summary$bic[min.bic], pch=4, col="red", lwd=7)
> plot(mod.summary$adjr2, xlab="Subset Size", ylab="adjr2", pch=20, type="l")
> points(max.adjr2, mod.summary$adjr2[max.adjr2], pch=4, col="red", lwd=7)
We find that all three criteria (Cp, BIC and adjusted R2) select 3-variable models.
The coefficients of the best 3-variable model are:
> coefficients(mod.full, id=3)
        (Intercept) poly(x, 10, raw=T)1 poly(x, 10, raw=T)2 poly(x, 10, raw=T)7
         3.07627412          2.35623596         -3.16514887          0.01046843
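For reference, the three criteria compared above are defined in ISLR (Section 6.1.3) as follows for a least-squares model with $d$ predictors, where $\hat{\sigma}^2$ is an estimate of the error variance and TSS is the total sum of squares. The values reported by `regsubsets` may be computed on a different scale, so treat this as a reminder of what each criterion penalizes rather than a description of the package internals.

```latex
C_p = \frac{1}{n}\left(\mathrm{RSS} + 2 d \hat{\sigma}^2\right), \qquad
\mathrm{BIC} = \frac{1}{n}\left(\mathrm{RSS} + \log(n)\, d\, \hat{\sigma}^2\right), \qquad
\text{Adjusted } R^2 = 1 - \frac{\mathrm{RSS}/(n - d - 1)}{\mathrm{TSS}/(n - 1)} .
```

Smaller $C_p$ and BIC indicate a better model, while a larger adjusted $R^2$ does. Since $\log(n) > 2$ for $n > 7$, BIC penalizes extra predictors more heavily than $C_p$ here ($n = 100$).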
(d) Repeat (c), using forward stepwise selection and also using backward stepwise selection. How
does your answer compare to the results in (c)?
Solution:
> mod.fwd=regsubsets(y ~ poly(x, 10, raw=T), data=data.full, nvmax=10, method="forward")
> mod.bwd=regsubsets(y ~ poly(x, 10, raw=T), data=data.full, nvmax=10, method="backward")
> fwd.summary=summary(mod.fwd)
> bwd.summary=summary(mod.bwd)
# Find best forward-selected model size
> min.cp.f=which.min(fwd.summary$cp)
> min.bic.f=which.min(fwd.summary$bic)
> max.adjr2.f=which.max(fwd.summary$adjr2)
# Find best backward-selected model size
> min.cp.b=which.min(bwd.summary$cp)
> min.bic.b=which.min(bwd.summary$bic)
> max.adjr2.b=which.max(bwd.summary$adjr2)
# Plot the statistics
> par(mfrow=c(3, 2))
# Forward Cp
> plot(fwd.summary$cp, xlab="Subset Size", ylab="Fwd Cp", pch=20, type="l")
> points(min.cp.f, fwd.summary$cp[min.cp.f], pch=4, col="red", lwd=7)
# Backward Cp
> plot(bwd.summary$cp, xlab="Subset Size", ylab="Bwd Cp", pch=20, type="l")
> points(min.cp.b, bwd.summary$cp[min.cp.b], pch=4, col="red", lwd=7)
# Forward BIC
> plot(fwd.summary$bic, xlab="Subset Size", ylab="Fwd BIC", pch=20, type="l")
> points(min.bic.f, fwd.summary$bic[min.bic.f], pch=4, col="red", lwd=7)
# Backward BIC
> plot(bwd.summary$bic, xlab="Subset Size", ylab="Bwd BIC", pch=20, type="l")
> points(min.bic.b, bwd.summary$bic[min.bic.b], pch=4, col="red", lwd=7)
# Forward Adj R^2
> plot(fwd.summary$adjr2, xlab="Subset Size", ylab="Fwd adjr2", pch=20, type="l")
> points(max.adjr2.f, fwd.summary$adjr2[max.adjr2.f], pch=4, col="red", lwd=7)
# Backward Adj R^2
> plot(bwd.summary$adjr2, xlab="Subset Size", ylab="Bwd adjr2", pch=20, type="l")
> points(max.adjr2.b, bwd.summary$adjr2[max.adjr2.b], pch=4, col="red", lwd=7)
We see that all statistics pick 3-variable models except backward selection with adjusted R2.
Here are the coefficients:
> coefficients(mod.fwd, id = 3)
 (Intercept) poly(x, 10)1 poly(x, 10)2 poly(x, 10)7
  3.07627412   2.35623596  -3.16514887   0.01046843
> coefficients(mod.bwd, id = 3)
 (Intercept) poly(x, 10)1 poly(x, 10)2 poly(x, 10)9
 3.078881355  2.419817953 -3.177235617  0.001870457
> coefficients(mod.bwd, id = 4)
 (Intercept) poly(x, 10)1 poly(x, 10)2 poly(x, 10)4 poly(x, 10)5
  3.12902640   2.27105667  -3.32284363   0.04320229   0.05388957
Here forward stepwise picks $X^7$ over $X^3$. Backward stepwise with 3 variables picks $X^9$, while backward stepwise with 4 variables picks $X^4$ and $X^5$.