Statistical Learning: MATH 4322 Homework 2 Guide

MATH 4322 - Homework 2 Instructions • Due September 14, 2023 at 11:59 pm • Answer all questions fully • Submit the answers in one file, preferably PDF, then upload in Canvas. • These questions are from Introduction to Statistical Learning, 2nd edition , chap- ters 3 and 6. Problem 1 The following output is based on predicting sales based on three media budgets, TV , radio , and newspaper . Call: lm(formula = sales ~ TV + radio + newspaper, data = Advertising) Residuals: Min 1Q Median 3Q Max -8.8277 -0.8908 0.2418 1.1893 2.8292 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.938889 0.311908 9.422 <2e-16 *** TV 0.045765 0.001395 32.809 <2e-16 *** radio 0.188530 0.008611 21.893 <2e-16 *** newspaper -0.001037 0.005871 -0.177 0.86 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.686 on 196 degrees of freedom 1

Multiple R-squared: 0.8972, Adjusted R-squared: 0.8956 F-statistic: 570.3 on 3 and 196 DF, p-value: < 2.2e-16 a. Give the estimated model to predict sales. Sales = 2.938889 + TV*0.045765 + radio*0.188530 – newspaper*0.001037 b. Describe the null hypothesis to which the p-values given in the Coefficients table correspond. Explain this in terms of the sales , TV , radio , and newspaper , rather than in terms of the coe icients of the linear model. ff Null hypotheses are accepted when the p-value of the coefficient or p-value of overall model is less than or equal to 0.05 (5%). If the p-value associated with a coefficient is less than 0.05, it is considered significant. In this data, all coefficients except “newspaper” have p-values less than 0.05, which means only “newspaper” does not significantly contribute to sales prediction. The overall model’s p-value is also less than 0.05, indicating that the entire model is suitable for predicting sales. The multiple R-squared value of 0.8972 means that roughly 89.72% of the variation in sales can be explained by the predictors in the model. Which means that the model is good and suitable. c. Are there any variables that may not be significant in predicting sales ? Only the p-value of “newspaper” is greater than 0.05, hence “newspaper is not significant in predicting “sales”. Problem 2 Based on the previous problem, the following is the output from the full model:

a) Determine the AIC for all three models. Model 1: 2*(3+1) + 200*ln(556.8/200) = 212.778 Model 2: 2*(2+1) + 200*ln(556.9/200) = 210.814 Model 3: 2*(1+1) + 200*ln(2102.5/200) = 474.513 b) Determine the 𝐶𝑝 for all three models. Model 1: 556.8/2.8 + 2*(3+1) – 200 = 6.857 Model 2: 556.9/2.8 + 2*(2+1) – 200 = 4.893 Model 3: 2102.5/2.8 + 2*(1+1) – 200 = 554.893 c) Determine the adjusted 𝑅^ 2 for all three models. SST = 3314.6 + 1545.6 + 0.1 + 556.8 = 5417.1 Model 1: 1 – (556.8/(200-3-1))/(5417.1/199) = 0.8956 Model 2: 1- (556.9/(200-2-1))/(5417.1/199) = 0.8962 Model 3: 1- (2102.5/(200-1-1))/(5417.1/199) = 0.6099 d) Determine the RSE for all three models. Model 1: 1.686 Model 2: 1.681 Model 3: 3.259 e) Which model best fits to predict sales based on these statistics? Model 2 is best fits to predict “sales” based on these statistics.

Your preview ends here