Assignment 3: Linear/Quadratic Discriminant Analysis and Comparing Classification Methods
SDS293 - Machine Learning
Due: 11 Oct 2017 by 11:59pm
Conceptual Exercises
4.5 (p. 169 ISLR)
This question examines the differences between LDA and QDA.
(a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?
Solution: We would expect QDA to perform better on the training set because its increased flexibility will result in a closer fit. If the Bayes decision boundary is linear, we expect LDA to perform better than QDA on the test set, as QDA could be subject to overfitting.
(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?
Solution: If the Bayes decision boundary is non-linear, we expect QDA to perform better on both the training and test sets.
(c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?
Solution: We expect the test prediction accuracy of QDA relative to LDA to improve as n gets bigger. In general, as the sample size increases, a more flexible method will yield a better fit because its higher variance is offset by the larger sample size.
(d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer.
Solution: False. When the true boundary is linear, the extra flexibility of QDA adds variance without reducing bias. With fewer sample points in particular, this would likely result in overfitting, yielding a higher test error rate than LDA.
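For reference, the answers to (a) through (d) follow from the form of the two discriminant functions. The expressions below use the notation of ISLR Chapter 4 and are not part of the original solution: \pi_k is the prior probability of class k, \mu_k its mean vector, \Sigma a covariance matrix shared by all K classes (LDA), and \Sigma_k a separate covariance matrix for each class (QDA).

```latex
\begin{align*}
\delta_k^{\mathrm{LDA}}(x) &= x^\top \Sigma^{-1} \mu_k
  - \tfrac{1}{2}\, \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k \\
\delta_k^{\mathrm{QDA}}(x) &= -\tfrac{1}{2}\, (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)
  - \tfrac{1}{2} \log \lvert \Sigma_k \rvert + \log \pi_k
\end{align*}
```

The LDA function is linear in x, while the QDA function is quadratic in x. With p predictors, LDA estimates p(p+1)/2 covariance parameters and QDA estimates K p(p+1)/2, which is the source of QDA's higher variance at small n.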
Applied Exercises
4.10 (p. 171 ISLR)
This question should be answered using the Weekly data set, which is part of the ISLR package. This data is similar in nature to the Smarket data from this chapter’s lab, except that it contains 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.
(a) Produce some numerical and graphical summaries of the Weekly data. Do there appear to be any patterns?
Solution: Year and Volume appear to have a relationship. No other patterns are discernible.
(b) Use the full data set to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors, and use the summary() function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones?
Solution: Lag2 appears to have some statistical significance with Pr(>|z|) = 3%.
(c) Compute the confusion matrix and overall fraction of correct predictions. What is the confusion matrix telling you about the types of mistakes made by your logistic model?
Solution: Percentage of correct predictions:
(54 + 557) / (54 + 557 + 48 + 430) = 56.1%
On weeks where the market goes up, the logistic regression is right most of the time:
557 / (557 + 48) = 92.1%
However, on weeks the market goes down, the logistic regression is wrong most of the time, predicting correctly in only:
54 / (430 + 54) = 11.2%
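The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the counts are taken from the solution, but the assignment of each count to a (predicted, actual) cell is inferred from the fractions shown, and the helper names `accuracy` and `class_recall` are illustrative.

```python
# Counts from the solution to (c), keyed as (predicted, actual).
# The cell layout is an assumption inferred from the fractions in the solution.
confusion = {
    ("Down", "Down"): 54,
    ("Down", "Up"): 48,
    ("Up", "Down"): 430,
    ("Up", "Up"): 557,
}


def accuracy(cm):
    """Overall fraction of correct predictions (diagonal over total)."""
    correct = sum(n for (pred, actual), n in cm.items() if pred == actual)
    return correct / sum(cm.values())


def class_recall(cm, label):
    """Fraction of weeks whose actual direction is `label` that were predicted correctly."""
    total = sum(n for (pred, actual), n in cm.items() if actual == label)
    return cm[(label, label)] / total


overall = accuracy(confusion)
up_rate = class_recall(confusion, "Up")
down_rate = class_recall(confusion, "Down")
print(f"overall: {overall:.3f}, Up weeks: {up_rate:.3f}, Down weeks: {down_rate:.3f}")
```

The gap between the two per-class rates shows that the model predicts Up for the large majority of weeks, so nearly all of its errors fall on Down weeks.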
(d) Now fit the logistic regression model using a training data period from 1990 to 2008, with Lag2 as the only predictor. Report the confusion matrix and the overall fraction of correct predictions for the test data (that is, the data from 2009 and 2010).
Solution:
glm.pred  Down  Up
Down         9   5
Up          34  56
mean: 0.625
(e) Repeat (d) using LDA.
Solution:
Same as logistic regression.
(f) Repeat (d) using QDA.
Solution:
glm.pred  Down  Up
Down         0   0
Up          43  61
mean: 0.587
An accuracy of 58.7% even though it picked Up the whole time!
(g) Repeat (d) using KNN with K = 1.
Solution:
glm.pred  Down  Up
Down        21  30
Up          22  31
mean: 0.5
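To make the K = 1 rule concrete, here is a minimal pure-Python sketch of KNN on a single predictor. It is not the code used to produce the table above (the solution presumably used R), and the Lag2 values and labels below are made-up toy data, not rows from Weekly.

```python
def knn_predict(train_x, train_y, x, k=1):
    """Predict the label of x by majority vote among its k nearest training points."""
    # With one predictor, distance is just the absolute difference.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    # Majority vote; on a tie, the label of the closest neighbour among the tied wins.
    return max(votes, key=votes.count)


# Hypothetical toy training data (NOT taken from the Weekly data set).
train_lag2 = [-2.0, -0.5, 0.3, 1.1, 2.4]
train_dir = ["Down", "Up", "Down", "Up", "Up"]
test_lag2 = [0.2, -1.8, 1.0]

preds = [knn_predict(train_lag2, train_dir, x, k=1) for x in test_lag2]
print(preds)
```

With K = 1 each prediction copies the label of a single training week, so the classifier is very flexible and has high variance. That is one plausible reason its test accuracy here is no better than a coin flip.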
(h) Which of these methods appears to provide the best results on this data?
Solution: Logistic regression and LDA provide the lowest test error rate, 37.5% for both (1 - 0.625).
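The comparison can be reproduced from the confusion matrices reported in (d) through (g). The sketch below assumes the rows of each table are predictions and the columns are actual directions, as in the usual R `table()` layout; the dictionary names are illustrative.

```python
# Test-set confusion matrices from (d)-(g), keyed as (predicted, actual).
results = {
    "logistic / LDA": {("Down", "Down"): 9, ("Down", "Up"): 5,
                       ("Up", "Down"): 34, ("Up", "Up"): 56},
    "QDA": {("Down", "Down"): 0, ("Down", "Up"): 0,
            ("Up", "Down"): 43, ("Up", "Up"): 61},
    "KNN (K=1)": {("Down", "Down"): 21, ("Down", "Up"): 30,
                  ("Up", "Down"): 22, ("Up", "Up"): 31},
}


def accuracy(cm):
    """Overall fraction of correct predictions (diagonal over total)."""
    correct = sum(n for (pred, actual), n in cm.items() if pred == actual)
    return correct / sum(cm.values())


accuracies = {name: accuracy(cm) for name, cm in results.items()}

# Baseline: always predict Up. Its accuracy is the share of Up weeks in the test set.
any_cm = results["QDA"]
baseline = sum(n for (_, actual), n in any_cm.items() if actual == "Up") / sum(any_cm.values())

best = max(accuracies, key=accuracies.get)
for name, acc in accuracies.items():
    print(f"{name}: accuracy {acc:.3f}, test error {1 - acc:.3f}")
print(f"always-Up baseline: {baseline:.3f}; best method: {best}")
```

Comparing each method against the always-Up baseline is a useful sanity check: QDA matches it exactly, KNN with K = 1 falls below it, and only logistic regression and LDA beat it.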
(i) Experiment with different combinations of predictors, including possible transformations and interactions, for each of the methods. You should also experiment with values for K in the KNN classifier. Report the predictors, method, and associated confusion matrix that appears to provide the best results on the held out data. Why do you think this one performed best?
Solution: This problem will have different solutions depending on which combinations you tried.
Variation of 4.13 (p. 173 ISLR)
Using the Boston data set from ISLR, fit a classification model in order to predict whether a given suburb has a crime rate above or below the median. You may want to explore logistic regression, LDA, and KNN models using various subsets of the predictors.
Once you’re satisfied with your results, describe your model and findings:
• Why did you choose that type of model?
• How did you choose your predictors?
• What does your model tell you about the data?
• Where does it break down?
• Is there additional information that you would need to know to be able to make a better model?
Solution:
This problem will have different solutions depending on which combinations you tried.
Interesting solutions will be anonymized and made available after grading is complete.