MATH 4322 - Homework 2

Instructions
Due September 14, 2023 at 11:59 pm. Answer all questions fully. Submit the answers in one file, preferably PDF, then upload in Canvas. These questions are from Introduction to Statistical Learning, 2nd edition, chapters 3 and 6.

Problem 1
The following output is based on predicting sales based on three media budgets, TV, radio, and newspaper.

Call:
lm(formula = sales ~ TV + radio + newspaper, data = Advertising)

Residuals:
    Min      1Q  Median      3Q     Max
-8.8277 -0.8908  0.2418  1.1893  2.8292

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.938889   0.311908   9.422   <2e-16 ***
TV           0.045765   0.001395  32.809   <2e-16 ***
radio        0.188530   0.008611  21.893   <2e-16 ***
newspaper   -0.001037   0.005871  -0.177     0.86
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.686 on 196 degrees of freedom
Multiple R-squared: 0.8972, Adjusted R-squared: 0.8956
F-statistic: 570.3 on 3 and 196 DF, p-value: < 2.2e-16

a. Give the estimated model to predict sales.

sales = 2.938889 + 0.045765*TV + 0.188530*radio - 0.001037*newspaper

b. Describe the null hypothesis to which the p-values given in the Coefficients table correspond. Explain this in terms of the sales, TV, radio, and newspaper, rather than in terms of the coefficients of the linear model.

Each p-value corresponds to the null hypothesis that spending on that medium has no effect on sales once the other two budgets are held fixed: H0: TV advertising has no effect on sales; H0: radio advertising has no effect on sales; and H0: newspaper advertising has no effect on sales. We reject a null hypothesis when its p-value is below 0.05. Here TV and radio have p-values far below 0.05, so we reject their null hypotheses and conclude that both budgets contribute significantly to sales. Newspaper's p-value of 0.86 gives no evidence that newspaper spending affects sales once TV and radio are in the model. The p-value of the overall F-test is also below 0.05, indicating that the model as a whole is useful for predicting sales, and the multiple R-squared of 0.8972 means that roughly 89.72% of the variation in sales is explained by the three budgets, so the model fits well.

c. Are there any variables that may not be significant in predicting sales?

Only the p-value of newspaper is greater than 0.05, hence newspaper is not significant in predicting sales.
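To reproduce this output, a minimal sketch assuming the Advertising data (available as a CSV from the ISLR book website; the file name here is an assumption) has been read in:

> Advertising <- read.csv("Advertising.csv")  # assumed file name; data from the ISLR website
> ad.lm <- lm(sales ~ TV + radio + newspaper, data = Advertising)
> summary(ad.lm)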
Problem 2
Based on the previous problem, the following is the output from the full model:

a) Determine the AIC for all three models.

Model 1: 2*(3+1) + 200*ln(556.8/200) = 212.778
Model 2: 2*(2+1) + 200*ln(556.9/200) = 210.814
Model 3: 2*(1+1) + 200*ln(2102.5/200) = 474.513

b) Determine the Cp for all three models.

Model 1: 556.8/2.8 + 2*(3+1) - 200 = 6.857
Model 2: 556.9/2.8 + 2*(2+1) - 200 = 4.893
Model 3: 2102.5/2.8 + 2*(1+1) - 200 = 554.893

c) Determine the adjusted R^2 for all three models.

SST = 3314.6 + 1545.6 + 0.1 + 556.8 = 5417.1
Model 1: 1 - (556.8/(200-3-1))/(5417.1/199) = 0.8956
Model 2: 1 - (556.9/(200-2-1))/(5417.1/199) = 0.8962
Model 3: 1 - (2102.5/(200-1-1))/(5417.1/199) = 0.6099

d) Determine the RSE for all three models.

Model 1: sqrt(556.8/(200-3-1)) = 1.686
Model 2: sqrt(556.9/(200-2-1)) = 1.681
Model 3: sqrt(2102.5/(200-1-1)) = 3.259

e) Which model best fits to predict sales based on these statistics?

Model 2 best fits to predict sales based on these statistics: it has the lowest AIC, the lowest Cp, the highest adjusted R^2, and the lowest RSE.
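The same numbers can be computed in R directly from the RSS values; a minimal sketch, with n = 200, the estimate sigma^2 = 2.8, and SST = 5417.1 taken from the output above:

> n <- 200
> rss <- c(556.8, 556.9, 2102.5)              # RSS for models 1-3
> d <- c(3, 2, 1)                             # number of predictors in each model
> 2*(d + 1) + n*log(rss/n)                    # AIC, as computed in (a)
> rss/2.8 + 2*(d + 1) - n                     # Cp, using the full-model sigma^2 estimate
> 1 - (rss/(n - d - 1))/(5417.1/(n - 1))      # adjusted R^2
> sqrt(rss/(n - d - 1))                       # RSE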
Problem 3
(a) Which answer is correct, and why?
i. For a fixed value of IQ and GPA, males earn more on average than females.
ii. For a fixed value of IQ and GPA, females earn more on average than males.
iii. For a fixed value of IQ and GPA, males earn more on average than females provided that the GPA is high enough.
iv. For a fixed value of IQ and GPA, females earn more on average than males provided that the GPA is high enough.

iii is correct.
Least squares model (with Gender = 1 for female):
y = 50 + 20*GPA + 0.07*IQ + 35*Gender + 0.01*GPA*IQ - 10*GPA*Gender
For males: y = 50 + 20*GPA + 0.07*IQ + 0.01*GPA*IQ
For females: y = 85 + 10*GPA + 0.07*IQ + 0.01*GPA*IQ
Males earn more when 50 + 20*GPA >= 85 + 10*GPA, which is equivalent to GPA >= 3.5. So for a fixed value of IQ, males earn more on average than females provided the GPA is at least 3.5.

(b) Predict the salary of a female with IQ of 110 and a GPA of 4.0.

For females: y = 85 + 10*GPA + 0.07*IQ + 0.01*GPA*IQ
y = 85 + 10*4 + 0.07*110 + 0.01*4*110 = 137.1
Starting salary of $137,100.

(c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.

False. The size of the interaction coefficient alone provides no evidence for or against an interaction effect, because it depends on the scales of GPA and IQ. To draw a meaningful conclusion about the interaction, we need the standard error and p-value (or t-statistic) associated with the GPA/IQ coefficient.
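The arithmetic in (b) can be checked in R; a minimal sketch with the model coefficients hard-coded (the function name salary is illustrative):

> salary <- function(gpa, iq, gender) {
+   50 + 20*gpa + 0.07*iq + 35*gender + 0.01*gpa*iq - 10*gpa*gender
+ }
> salary(gpa = 4.0, iq = 110, gender = 1)   # gender = 1 for female; salary in thousands
[1] 137.1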
Problem 4
We perform best subset, forward stepwise, and backward stepwise selection on a single data set. For each approach, we obtain p + 1 models, containing 0, 1, 2, …, p predictors. Answer true or false to the following statements.

(a) The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k + 1)-variable model identified by forward stepwise selection.
True
(b) The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k + 1)-variable model identified by backward stepwise selection.
True
(c) The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k + 1)-variable model identified by forward stepwise selection.
False
(d) The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k + 1)-variable model identified by backward stepwise selection.
False
(e) The predictors in the k-variable model identified by best subset are a subset of the predictors in the (k + 1)-variable model identified by best subset selection.
False

Problem 5
This question involves the use of simple linear regression on the Auto data set. This can be found in the ISLR2 package in R.

(a) Use the lm() function to perform a simple linear regression with mpg as the response and horsepower (hp) as the predictor. Use the summary() function to print the results. Comment on the output. For example:
i. Is there a relationship between the predictor and the response?
Yes; the p-value for horsepower is < 2.2e-16, so there is a relationship between the predictor and the response.
ii. How strong is the relationship between the predictor and the response?
The R-squared is 0.6049, so about 60.49% of the variation in mpg (the response) is explained by horsepower (the predictor).
iii. Is the relationship between the predictor and the response positive or negative?
Negative; the coefficient of horsepower is negative, so mpg decreases as horsepower increases.
iv. What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals? Give an interpretation of these intervals.

> predict(auto.lm, data.frame(horsepower = 98), interval = "prediction")
        fit        lwr      upr
1 24.151388 14.4938885 33.80889
> predict(auto.lm, data.frame(horsepower = 98), interval = "confidence")
        fit       lwr       upr
1 24.151388 23.660958 24.641817

The predicted mpg associated with a horsepower of 98 is about 24.15, with a 95% confidence interval of (23.660958, 24.641817) and a 95% prediction interval of (14.4938885, 33.80889). The confidence interval bounds the average mpg of all cars with a horsepower of 98, while the wider prediction interval bounds the mpg of an individual car with a horsepower of 98.

(b) Plot the response and the predictor. Use the abline() function to display the least squares regression line.

> plot(Auto$horsepower, Auto$mpg)
> abline(auto.lm)

(c) Use the plot() function to produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.
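A minimal sketch of the diagnostic-plot call, assuming auto.lm is the fit from (a):

> par(mfrow = c(2, 2))   # arrange the four standard diagnostic plots in a 2x2 grid
> plot(auto.lm)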
Problem 6
This question involves the use of multiple linear regression on the Auto data set.

(a) Produce a scatterplot matrix which includes all of the variables in the data set.

> pairs(Auto)

(b) Compute the matrix of correlations between the variables using the cor() function. You will need to exclude the name variable, which is qualitative.
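A minimal sketch of the call, dropping the qualitative name column:

> cor(subset(Auto, select = -name))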
(c) Use the lm() function to perform a multiple linear regression with mpg as the response and all other variables except name as the predictors. Use the summary() function to print the results. Comment on the output. For instance:
i. Is there a relationship between the predictors and the response?
Yes; the p-value of the overall F-test is very small, which shows that at least some of the variables in the model are related to mpg.
ii. Which predictors appear to have a statistically significant relationship to the response?
displacement, weight, year, and origin.
iii. What does the coefficient for the year variable suggest?
The coefficient for the year variable suggests that, holding the other predictors fixed, mpg increases by that amount for each additional model year; in other words, newer cars tend to be more fuel efficient.
(d) Use the plot() function to produce diagnostic plots of the linear regression fit based on the predictors that appear to have a statistically significant relationship to the response. Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?
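A minimal sketch of the fits behind (c) and (d) (the object names are illustrative):

> auto.mlm <- lm(mpg ~ . - name, data = Auto)   # (c): all predictors except name
> summary(auto.mlm)
> sig.lm <- lm(mpg ~ displacement + weight + year + origin, data = Auto)   # (d)
> par(mfrow = c(2, 2))
> plot(sig.lm)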
The first diagnostic plot (residuals vs. fitted values) shows a non-linear relationship between the response and the predictors, and there do appear to be some outliers. In addition, one observation stands out as a potential high-leverage point in the leverage plot.

(e) Use the * and/or : symbols to fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?

The interaction between displacement and weight appears to be statistically significant.

(f) Try a few different transformations of the variables, such as log(X), √X, X^2. Comment on your findings.
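Minimal sketches of the fits behind (e) and (f); the particular interaction and transformation shown are the ones discussed in the answers:

> summary(lm(mpg ~ displacement * weight, data = Auto))   # (e): interaction term
> summary(lm(mpg ~ acceleration, data = Auto))            # (f): untransformed, for comparison
> summary(lm(mpg ~ log(acceleration), data = Auto))       # (f): log transformation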
log(acceleration) is less significant than acceleration.

Problem 7
This problem involves the Boston data set, from the ISLR2 package. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.

(a) For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.
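A minimal sketch of the per-predictor fits in (a); the loop prints the slope p-value of each simple regression:

> library(ISLR2)
> for (p in setdiff(names(Boston), "crim")) {
+   fit <- lm(reformulate(p, response = "crim"), data = Boston)
+   cat(p, "p-value:", summary(fit)$coefficients[2, 4], "\n")   # slope p-value
+ }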
Except for crim vs. chas, all of the other models show a statistically significant association between the predictor and the response.

(b) Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis H0: βj = 0?
Only zn, dis, rad, and medv have a significant association with crim; all of them have p-values below 0.05, which means we can reject the null hypothesis for those predictors.

(c) How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.

The univariate and multiple regression coefficients differ clearly. In a simple regression model, the slope is the average effect of an increase in the predictor, ignoring the other predictors in the data set. Conversely, in multiple regression, the slope is the average effect of an increase in the predictor while holding all other predictors constant.

(d) Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form
Y = β0 + β1*X + β2*X^2 + β3*X^3 + ε
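Minimal sketches for the plot in (c) and the cubic fits in (d); uni.coef and multi.coef are assumed to hold the slope estimates collected from (a) and (b) (these names are illustrative):

> plot(uni.coef, multi.coef,
+      xlab = "Simple regression coefficient",
+      ylab = "Multiple regression coefficient")
> for (p in setdiff(names(Boston), c("crim", "chas"))) {   # chas is binary, so skip it
+   fit <- lm(crim ~ poly(Boston[[p]], 3), data = Boston)
+   print(summary(fit)$coefficients)   # inspect the quadratic and cubic terms
+ }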
There is no evidence of a non-linear association between any of the predictors and the response.

Problem 8
This problem focuses on the collinearity problem.

(a) Perform the following commands in R:

> set.seed(1)
> x1 = runif(100)
> x2 = 0.5*x1 + rnorm(100)/10
> y = 2 + 2*x1 + 0.3*x2 + rnorm(100)

The last line corresponds to creating a linear model in which y is a function of x1 and x2. Write out the form of the linear model. What are the regression coefficients?

Linear model: y = β0 + β1*x1 + β2*x2 + ε
Regression coefficients: β0 = 2, β1 = 2, β2 = 0.3

(b) What is the correlation between x1 and x2? Create a scatterplot displaying the relationship between the variables.

> cor(x1, x2)
[1] 0.8351212
(c) Using this data, fit a least squares regression to predict y using x1 and x2. Describe the results obtained. What are β̂0, β̂1, and β̂2? How do these relate to the true β0, β1, and β2? Can you reject the null hypothesis H0: β1 = 0? How about the null hypothesis H0: β2 = 0?

β̂0 = 2.1305, β̂1 = 1.4396, and β̂2 = 1.0097; only β̂0 is close to its true value. The p-value for β1 is 0.0487 < 0.05, so we can (barely) reject H0: β1 = 0. The p-value for β2 is 0.3754 > 0.05, so we cannot reject H0: β2 = 0.

(d) Now fit a least squares regression to predict y using only x1. Comment on your results. Can you reject the null hypothesis H0: β1 = 0?

We can reject the null hypothesis because the p-value is very low.

(e) Now fit a least squares regression to predict y using only x2. Comment on your results. Can you reject the null hypothesis H0: β1 = 0?

We can reject the null hypothesis because the p-value is very low.
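Minimal sketches of the scatterplot in (b) and the three fits in (c)-(e):

> plot(x1, x2)               # scatterplot for (b)
> summary(lm(y ~ x1 + x2))   # (c): with both predictors, x2 is not significant
> summary(lm(y ~ x1))        # (d): x1 alone is highly significant
> summary(lm(y ~ x2))        # (e): x2 alone is highly significant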
(f) Do the results obtained in (c)-(e) contradict each other? Explain your answer.

No, the results obtained in (c)-(e) do not contradict each other. Because x1 and x2 are highly correlated (collinear), when both variables are included in the model the coefficient of x2 does not show statistical significance. However, when either x1 or x2 is individually used to fit the model, its coefficient is statistically significant.

(g) Now suppose we obtain one additional observation, which was unfortunately mis-measured.

> x1 = c(x1, 0.1)
> x2 = c(x2, 0.8)
> y = c(y, 6)

Re-fit the linear models from (c) to (e) using this new data. What effect does this new observation have on each of the models? In each model, is this observation an outlier? A high-leverage point? Both? Explain your answers.
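A minimal sketch of the re-fits; the outlier and leverage claims below can be checked with the standard diagnostic plots:

> summary(lm(y ~ x1 + x2))                      # model (c) re-fit
> summary(lm(y ~ x1))                           # model (d) re-fit
> summary(lm(y ~ x2))                           # model (e) re-fit
> par(mfrow = c(2, 2)); plot(lm(y ~ x1 + x2))   # residual and leverage diagnostics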
In the re-fit model from (c), we can no longer reject the null hypothesis H0: β1 = 0, but we can now reject H0: β2 = 0. In this model, the new point is a high-leverage point but not an outlier.
In the re-fit model from (d), we still reject the null hypothesis H0: β1 = 0. Its R^2 is smaller and its estimate of σ^2 is larger than in (d), which means the new observation is an outlier in this model, but it does not show high leverage in the diagnostic plots. In the re-fit model from (e), we likewise still reject the null hypothesis for the slope. Its R^2 is not smaller and its estimate of σ^2 is not larger than in (e), which means the new observation is not an outlier in this model, but it does have high leverage in the diagnostic plots.

Problem 9
In this exercise, we will generate simulated data, and will then use this data to perform best subset selection.

(a) Use the rnorm() function to generate a predictor X of length n = 100, as well as a noise vector of length n = 100.

> set.seed(1)
> x = rnorm(100)
> noise = rnorm(100)

(b) Generate a response vector Y of length n = 100 according to the model
Y = β0 + β1*X + β2*X^2 + β3*X^3 + ε,
where β0, β1, β2, and β3 are constants of your choice.

> b0 = 14
> b1 = 2
> b2 = -3
> b3 = 5
> Y = b0 + b1*x + b2*x^2 + b3*x^3 + noise

(c) Perform best subset selection to choose the best model containing the predictors X, X^2, …, X^10. What is the best model obtained according to Cp, BIC, and adjusted R^2? The summary object mod.s below comes from a best-subset regsubsets() call analogous to those in (d):

> library(leaps)
> data.f = data.frame(y = Y, x = x)
> mod.full = regsubsets(y ~ poly(x, 10, raw = TRUE), data = data.f, nvmax = 10)
> mod.s = summary(mod.full)
> which.min(mod.s$cp)
[1] 4
The best model according to Cp has 4 variables.
> which.min(mod.s$bic)
[1] 3
The best model according to BIC has 3 variables.
> which.max(mod.s$adjr2)
[1] 4
The best model according to adjusted R^2 has 4 variables.

(d) Repeat (c), using forward stepwise selection and also using backward stepwise selection. How does your answer compare to the results in (c)?

> mod.fss = regsubsets(y ~ poly(x, 10, raw = TRUE), data = data.f, nvmax = 10, method = "forward")
> mod.bss = regsubsets(y ~ poly(x, 10, raw = TRUE), data = data.f, nvmax = 10, method = "backward")
> fss.s = summary(mod.fss)
> bss.s = summary(mod.bss)
> min.cp.fss = which.min(fss.s$cp)
> min.bic.fss = which.min(fss.s$bic)
> max.adjr2.fss = which.max(fss.s$adjr2)
> min.cp.bss = which.min(bss.s$cp)
> min.bic.bss = which.min(bss.s$bic)
> max.adjr2.bss = which.max(bss.s$adjr2)
> plot(fss.s$cp, main = "Forward Stepwise Selection Of Cp", pch = 20)
> points(min.cp.fss, fss.s$cp[min.cp.fss], pch = 4, col = "blue", lwd = 7)
> plot(bss.s$cp, main = "Backward Stepwise Selection Of Cp", pch = 20)
> points(min.cp.bss, bss.s$cp[min.cp.bss], pch = 4, col = "blue", lwd = 7)
> plot(fss.s$bic, main = "Forward Stepwise Selection Of BIC", pch = 20)
> points(min.bic.fss, fss.s$bic[min.bic.fss], pch = 4, col = "blue", lwd = 7)
> plot(bss.s$bic, main = "Backward Stepwise Selection Of BIC", pch = 20)
> points(min.bic.bss, bss.s$bic[min.bic.bss], pch = 4, col = "blue", lwd = 7)
> plot(fss.s$adjr2, main = "Forward Stepwise Selection Of Adjr2", pch = 20)
> points(max.adjr2.fss, fss.s$adjr2[max.adjr2.fss], pch = 4, col = "blue", lwd = 7)
> plot(bss.s$adjr2, main = "Backward Stepwise Selection Of Adjr2", pch = 20)
> points(max.adjr2.bss, bss.s$adjr2[max.adjr2.bss], pch = 4, col = "blue", lwd = 7)
> which.min(fss.s$cp)
[1] 4
> which.max(fss.s$adjr2)
[1] 4
> which.min(bss.s$cp)
[1] 4
> which.max(bss.s$adjr2)
[1] 4
> which.min(fss.s$bic)
[1] 3
> which.min(bss.s$bic)
[1] 3

The best models according to Cp, BIC, and adjusted R^2 are the same as in (c).
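To report the coefficients of the selected model, a minimal sketch (here the 3-variable model picked by BIC; mod.full is the best-subset fit from (c)):

> coef(mod.full, which.min(mod.s$bic))   # coefficients of the 3-variable model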