Assignment 3: Linear/Quadratic Discriminant Analysis and Comparing Classification Methods
SDS293 - Machine Learning
Due: 11 Oct 2017 by 11:59pm
Conceptual Exercises
4.5 (p. 169 ISLR)
This question examines the differences between LDA and QDA.
(a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on
the training set? On the test set?
Solution:
We would expect QDA to perform better on the training set because its increased
flexibility will result in a closer fit. If the Bayes decision boundary is linear, we expect LDA
to perform better than QDA on the test set, as QDA could be subject to overfitting.
(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better
on the training set? On the test set?
Solution:
If the Bayes decision boundary is non-linear, we expect QDA to perform better on
both the training and test sets.
(c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA
relative to LDA to improve, decline, or be unchanged? Why?
Solution:
We expect the test prediction accuracy of QDA relative to LDA to improve as n
gets bigger. In general, as the sample size increases, a more flexible method will yield a
better fit, because its higher variance is offset by the larger sample size.
(d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will
probably achieve a superior test error rate using QDA rather than LDA because QDA is
flexible enough to model a linear decision boundary. Justify your answer.
Solution:
False. With fewer sample points, the variance from using a more flexible method,
such as QDA, would likely result in overfitting, yielding a higher test error rate than LDA.
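One way to make the flexibility argument in (a) through (d) concrete is to count how many parameters each method has to estimate. The sketch below is an illustration rather than part of the original solution: it assumes p predictors and K classes, counts the class means and covariance entries only (class priors are left out), and the function names are hypothetical.

```python
def covariance_params(p: int) -> int:
    """Free parameters in one symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def lda_param_count(p: int, k: int) -> int:
    """LDA: k class mean vectors plus ONE covariance matrix shared by all classes."""
    return k * p + covariance_params(p)


def qda_param_count(p: int, k: int) -> int:
    """QDA: k class mean vectors plus a SEPARATE covariance matrix per class."""
    return k * p + k * covariance_params(p)


if __name__ == "__main__":
    # Two classes, as in the Up/Down problem from the applied exercise.
    for p in (2, 10, 50):
        print(f"p = {p:2d}  LDA: {lda_param_count(p, 2):5d}  QDA: {qda_param_count(p, 2):5d}")
```

The gap between the two counts grows with p, which is why QDA needs a larger n before its extra variance stops hurting test accuracy.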
Applied Exercises
4.10 (p. 171 ISLR)
This question should be answered using the Weekly data set, which is part of the ISLR package.
This data is similar in nature to the Smarket data from this chapter’s lab, except that it contains
1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.
(a) Produce some numerical and graphical summaries of the Weekly data. Do there appear to
be any patterns?
Solution:
Year and Volume appear to have a relationship. No other patterns are discernible.
(b) Use the full data set to perform a logistic regression with Direction as the response and the
five lag variables plus Volume as predictors, and use the summary() function to print the
results. Do any of the predictors appear to be statistically significant? If so, which ones?
Solution:
Lag2 appears to have some statistical significance, with Pr(>|z|) = 3%.
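For readers unsure what the Pr(>|z|) column reports, the sketch below shows how a two-sided p-value follows from a z-statistic under a standard normal reference distribution. It is an illustration only: the z values in the loop are arbitrary examples, not the fitted values from the Weekly model, and the function name is hypothetical.

```python
import math


def two_sided_p_value(z: float) -> float:
    """P(|Z| > |z|) for a standard normal Z, i.e. 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


ALPHA = 0.05  # conventional significance level, assumed here

if __name__ == "__main__":
    # Arbitrary illustrative z-statistics, not taken from the assignment.
    for z in (1.0, 1.96, 3.0):
        p = two_sided_p_value(z)
        print(f"z = {z:4.2f}  p = {p:.4f}  below {ALPHA}: {p < ALPHA}")
```

A reported value of about 3% therefore sits below the usual 5% threshold, which matches the solution's reading that Lag2 shows some significance.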
(c) Compute the confusion matrix and overall fraction of correct predictions. What is the
confusion matrix telling you about the types of mistakes made by your logistic model?
Solution:
Percentage of correct predictions: (54 + 557) / (54 + 557 + 48 + 430) = 56.1%
On weeks when the market goes up, the logistic regression is right most of the time:
557 / (557 + 48) = 92.1%
However, on weeks when the market goes down, the logistic regression is wrong most of the time.
It is right only 54 / (430 + 54) = 11.2% of the time.
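The three percentages in the solution to (c) can be rechecked from the four confusion-matrix counts. The sketch below is a plain-Python recomputation, not the R code the assignment used. The (predicted, actual) labelling of each count is inferred from the fractions in the solution, and the variable and function names are hypothetical.

```python
# Counts keyed by (predicted, actual); the layout is inferred from the
# fractions quoted in the solution, not from printed R output.
counts = {
    ("Down", "Down"): 54,
    ("Down", "Up"): 48,
    ("Up", "Down"): 430,
    ("Up", "Up"): 557,
}


def overall_accuracy(c: dict) -> float:
    """Fraction of all weeks where the prediction matches the actual direction."""
    correct = sum(n for (pred, actual), n in c.items() if pred == actual)
    return correct / sum(c.values())


def accuracy_given_actual(c: dict, actual: str) -> float:
    """Fraction of weeks with the given actual direction that were predicted correctly."""
    total = sum(n for (_, a), n in c.items() if a == actual)
    return c[(actual, actual)] / total


if __name__ == "__main__":
    print(f"overall:    {overall_accuracy(counts):.3f}")
    print(f"up weeks:   {accuracy_given_actual(counts, 'Up'):.3f}")
    print(f"down weeks: {accuracy_given_actual(counts, 'Down'):.3f}")
```

Splitting accuracy by the actual direction is what exposes the model's habit of predicting Up almost every week.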
(d) Now fit the logistic regression model using a training data period from 1990 to 2008, with
Lag2 as the only predictor. Report the confusion matrix and the overall fraction of correct
predictions for the test data (that is, the data from 2009 and 2010).
Solution:
glm.pred   Down   Up
Down          9    5
Up           34   56
mean: 0.625
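The reported mean of 0.625 can be reproduced from the table, and it helps to compare it with a naive rule that always predicts Up. The sketch below assumes that rows are predictions (glm.pred) and columns are the actual Direction, which is the usual layout of R's table() output but is not stated in the solution. The names are hypothetical.

```python
# Test-period (2009-2010) confusion matrix, keyed by (predicted, actual).
# Assumption: rows of the printed table are predictions, columns are actuals.
test_counts = {
    ("Down", "Down"): 9,
    ("Down", "Up"): 5,
    ("Up", "Down"): 34,
    ("Up", "Up"): 56,
}

total_weeks = sum(test_counts.values())

# Fraction of test weeks predicted correctly (diagonal of the table).
model_accuracy = (test_counts[("Down", "Down")] + test_counts[("Up", "Up")]) / total_weeks

# Baseline: always predict "Up", which is correct exactly on the actual Up weeks.
actual_up_weeks = test_counts[("Down", "Up")] + test_counts[("Up", "Up")]
always_up_accuracy = actual_up_weeks / total_weeks

if __name__ == "__main__":
    print(f"test weeks:         {total_weeks}")
    print(f"model accuracy:     {model_accuracy:.3f}")
    print(f"always-Up baseline: {always_up_accuracy:.3f}")
```

Under that layout assumption, the Lag2-only model beats the always-Up baseline by a modest margin on the held-out weeks.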
(e) Repeat (d) using LDA.
Solution:
Same as logistic regression.