Homework-5

School: University of South Florida
Course: 4606
Subject: Industrial Engineering
Date: Feb 20, 2024
Type: docx
Pages: 16
Uploaded by Heynsider
Stephanie Padin
ESI 4606 Analytics I - Foundations of Data Science
Homework 5
Due: Nov. 8 (11:00 AM), 2023

Problem 1 (1 point)

Explain how to utilize forward stepwise selection and 5-fold cross-validation to select the best combination of predictors for multiple linear regression, y_i = β0 + β1 x_i1 + ... + β8 x_i8 + ε_i, i = 1, ..., N, where N is the sample size. (Use words, figures and mathematical notations to provide a clear description.)

Step 1: Initialize the model
Begin with a model that contains no predictors, only the intercept term β0:
y_i = β0 + ε_i,
where y_i is the response variable, β0 is the intercept, and ε_i is the error term.

Step 2: Iterative selection
Grow the model by one predictor at each step. At every step, fit the current model augmented with each remaining predictor individually, and add the predictor whose inclusion improves performance the most according to a metric such as mean squared error or R². Repeat until a stopping criterion is met: a predetermined number of predictors, or no further significant improvement in the model's performance. After the first step, the model is
y = β0 + β1 x1 + ε.
Step 3: 5-fold cross-validation
5-fold cross-validation estimates how well a candidate model performs on unseen data. Divide the dataset into 5 equal-sized folds. In each of 5 iterations, train the model on 4 folds and evaluate it on the remaining fold, recording a performance metric such as the test mean squared error; the fold used for testing rotates each iteration. The average of the 5 test errors estimates the model's out-of-sample performance. Apply this to each candidate model produced in Step 2, e.g.
y = β0 + β1 x1 + β2 x2 + ε.

Step 4: Record the performance
Record the average 5-fold cross-validation error for each combination of predictors considered, so the candidates can be compared on how well they generalize to unseen data.

Step 5: Select the best combination
Choose the combination of predictors with the lowest average cross-validation error as the best model under the chosen evaluation metric.

Step 6: Fit the final model
Train the final model using the chosen predictors on the complete dataset.

Step 7: Evaluate the model performance
If an independent dataset is available, assess the final model on it to validate its ability to generalize:
y = β0 + β1 x1 + β2 x2 + ... + βn xn + ε.

Problem 2 (3 points)
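Steps 1-5 can be sketched in code. The following is a minimal, self-contained Python illustration (the assignment itself uses R); the synthetic data, the `ols_fit` helper, and the greedy loop are stand-ins for the procedure above, scoring each candidate predictor set by its average 5-fold cross-validation MSE:

```python
import random

def ols_fit(X, y):
    """Least-squares coefficients via the normal equations (X'X)b = X'y.
    Each row of X is assumed to already carry a leading 1 for the intercept."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [arv - f * acv for arv, acv in zip(A[r], A[c])]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for c in reversed(range(p)):
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, p))) / A[c][c]
    return beta

def cv_mse(cols, X, y, k=5, seed=0):
    """Average held-out MSE of the model using predictor columns `cols`,
    estimated by k-fold cross-validation (Step 3)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [set(idx[i::k]) for i in range(k)]
    errs = []
    for fold in folds:
        te = sorted(fold)
        tr = [i for i in idx if i not in fold]
        design = lambda rows: [[1.0] + [X[i][c] for c in cols] for i in rows]
        beta = ols_fit(design(tr), [y[i] for i in tr])
        preds = [sum(b * v for b, v in zip(beta, row)) for row in design(te)]
        errs.append(sum((y[i] - yh) ** 2 for i, yh in zip(te, preds)) / len(te))
    return sum(errs) / k

def forward_stepwise(X, y, k=5):
    """Greedy forward selection (Step 2) scored by k-fold CV error;
    returns the best subset found anywhere along the path (Steps 4-5)."""
    p = len(X[0])
    chosen, remaining = [], list(range(p))
    best_cols, best_err = [], cv_mse([], X, y, k)  # intercept-only baseline
    while remaining:
        cand = min(remaining, key=lambda c: cv_mse(chosen + [c], X, y, k))
        chosen = chosen + [cand]
        remaining.remove(cand)
        err = cv_mse(chosen, X, y, k)
        if err < best_err:
            best_cols, best_err = list(chosen), err
    return best_cols, best_err

# Demo on synthetic data: y depends on columns 0 and 1 only; 2 and 3 are noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(80)]
y = [2.0 * row[0] - row[1] + rng.gauss(0, 0.3) for row in X]
best_cols, best_err = forward_stepwise(X, y)
print("selected predictors:", sorted(best_cols))
```

On this synthetic example the procedure should recover the two informative predictors, since adding a pure-noise column typically increases the cross-validation error.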
Questions in this problem should be answered using the data set file "HM5.txt". x1, x2, ..., x10 are predictors and y is the response variable.

(a) Perform the best subset selection to choose the best model from the possible predictors, x1, x2, ..., x10. What is the best model obtained according to Cp, BIC and adjusted R², respectively? Show some plots to provide evidence for your answer, and report the coefficients of the corresponding best model obtained.
To determine the best model according to Cp, BIC, and adjusted R² for the "HM5.txt" data, I used the regsubsets output shown in the appendix. The best model according to Cp includes the intercept, x2, x3, x7, x8, and x10. The best model according to BIC includes the intercept, x1, and x2. The best model according to adjusted R² includes the intercept, x1, x2, x3, x5, x7, and x9. Plots of the Cp, BIC, and adjusted R² values against the number of variables provide evidence for these choices, and the coefficients of each best model come from fitting the corresponding linear regression to the data.

(b) Perform the forward stepwise selection to choose the best model from the possible predictors, x1, x2, ..., x10. What is the best model obtained according to Cp, BIC and adjusted R², respectively? Show some plots to provide evidence for your answer and report the coefficients of the corresponding best model obtained.
Model selected by BIC (y ~ x1 + x2):

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.96281    0.09251  10.407  < 2e-16 ***
x1           0.56270    0.09369   6.006 3.31e-08 ***
x2          -2.18096    0.09369 -23.280  < 2e-16 ***

Model selected by Cp and adjusted R² (y ~ x1 + x2 + x5 + x6 + x7 + x10):

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.96281    0.08914  10.801  < 2e-16 ***
x1           0.35756    0.17906   1.997  0.04876 *
x2          -1.70398    0.26440  -6.445 5.08e-09 ***
x5           2.49725    1.04584   2.388  0.01897 *
x6          -2.22214    0.79423  -2.798  0.00625 **
x7          -3.07218    1.13934  -2.696  0.00832 **
x10          2.57529    0.78834   3.267  0.00152 **

The forward stepwise selection results show that the best model depends on the criterion. According to BIC, the best model includes variables x1 and x2. According to Cp, the best model includes variables x1, x2, x5, x6, x7, and x10, and the adjusted R² criterion agrees with Cp: this model has the lowest Cp and the highest adjusted R².

(c) Perform the backward stepwise selection to choose the best model from the possible predictors, x1, x2, ..., x10. What is the best model obtained according to Cp, BIC and adjusted R², respectively? Show some plots to provide evidence for your answer, and report the coefficients of the corresponding best model obtained.
The backward stepwise selection method was used to choose the best model under three criteria: Cp, BIC, and adjusted R². The best model by Cp and BIC includes predictors x1, x2, x3, x5, x7, and x9. The best model by adjusted R² includes predictors x1, x2, x3, x4, x5, x7, and x9. The coefficients and model statistics for both models are reported in the appendix. Overall, the model with predictors x1, x2, x3, x5, x7, and x9 strikes the best balance between model complexity and goodness of fit.

Note: To get full points, include R code in the appendix sections.

Problem 3 (1 point)

Let p = 9 and n = 50, where p is the number of predictors and n is the sample size. When linear regression is considered for data fitting, answer the following questions.
(a) To determine the best model by selecting the best subset of relevant predictors, how many models in total need to be compared if best subset selection method is used?

In best subset selection for linear regression, every possible combination of predictors is considered. Counting only the non-empty subsets, the number of models to compare is 2^p - 1, where p is the number of predictors:

2^9 - 1 = 512 - 1 = 511 models.

(If the intercept-only null model is also counted, the total is 2^9 = 512.)

(b) To determine M4, i.e., the best model with 4 predictors, how many models with 4 predictors need to be compared if best subset selection method is used?

To determine the best model with 4 predictors, all possible combinations of 4 predictors chosen from the 9 available must be compared. This is a combinatorial problem: the count is the binomial coefficient, or "n choose k",

C(n, k) = n! / (k!(n - k)!).

With n = 9 and k = 4:

C(9, 4) = 9! / (4! * 5!) = (9 * 8 * 7 * 6) / (4 * 3 * 2 * 1) = 126.

So 126 different models with 4 predictors need to be compared to determine
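The counts in parts (a) and (b) can be verified with a short calculation, sketched here in Python:

```python
from math import comb

p = 9  # number of predictors

# (a) Best subset selection: one model per non-empty subset of the predictors.
total_subsets = 2 ** p - 1          # 511 (2^9 = 512 if the null model is counted)

# (b) Models with exactly 4 predictors: all 4-element subsets of the 9.
four_predictor_models = comb(p, 4)  # C(9, 4)

print(total_subsets, four_predictor_models)
```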
the best model with 4 predictors using the best subset selection method.

(c) To determine M4, i.e., the best model with 4 predictors, how many models with 4 predictors need to be compared if forward stepwise selection method is used?

Forward stepwise selection iteratively adds predictors to a model, starting with zero predictors and stopping when the desired number of predictors is reached. At each step, every remaining predictor is evaluated, and the one that improves the model fit the most, by a statistical criterion such as AIC or BIC, is added. With 9 predictors, the first step fits 9 one-predictor models, the second fits 8, then 7, then 6, so

9 + 8 + 7 + 6 = 30

models are fit in total to reach M4; of these, the 6 models fit in the final step are the candidates with exactly 4 predictors.

(d) To determine M4, i.e., the best model with 4 predictors, how many models with 4 predictors need to be compared if backward stepwise selection method is used?

Backward stepwise selection starts with the model containing all 9 predictors and removes one predictor at a time, based on a chosen criterion, until the desired size is reached. Dropping from 9 predictors to 8 requires fitting 9 candidate models, then 8 models to go from 8 to 7, then 7, then 6, and finally 5 candidate models with exactly 4 predictors:

9 + 8 + 7 + 6 + 5 = 35

models are fit in total to reach M4, of which the 5 fit in the final step have exactly 4 predictors. (By contrast, best subset selection would compare all C(9, 4) = 126 four-predictor models; the stepwise path examines only a small fraction of them.)
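The stepwise path totals can be checked with a quick sketch. This counts every model fit along each path, the same convention used for the forward count of 30; note how much smaller the backward total is than the C(9, 4) = 126 subsets best subset selection would examine:

```python
# Models fit along each stepwise path to the 4-predictor model M4, p = 9.
p, m = 9, 4

# Forward: the step growing the model from size j to j+1 fits p - j candidates.
forward_fits = sum(p - j for j in range(m))             # 9 + 8 + 7 + 6

# Backward: dropping from size s to s-1 fits s candidates, for s = 9 down to 5.
backward_fits = sum(s for s in range(m + 1, p + 1))     # 5 + 6 + 7 + 8 + 9

print(forward_fits, backward_fits)
```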
Appendix:

2a)
> # Import "HM5.txt" and call it "data"
> data <- read.table("HM5.txt", header = TRUE)  # Load your data from a file
> library(leaps)
> # Best subset selection
> m.exhaustive <- regsubsets(y ~ ., data = data, nvmax = 10)
> summary(m.exhaustive)
Subset selection object
Call: regsubsets.formula(y ~ ., data = data, nvmax = 10)
10 Variables  (and intercept)
    Forced in Forced out
x1      FALSE      FALSE
x2      FALSE      FALSE
x3      FALSE      FALSE
x4      FALSE      FALSE
x5      FALSE      FALSE
x6      FALSE      FALSE
x7      FALSE      FALSE
x8      FALSE      FALSE
x9      FALSE      FALSE
x10     FALSE      FALSE
1 subsets of each size up to 10
Selection Algorithm: exhaustive
          x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
1  ( 1 )  " " "*" " " " " " " " " " " " " " " " "
2  ( 1 )  "*" "*" " " " " " " " " " " " " " " " "
3  ( 1 )  "*" "*" " " " " " " " " "*" " " " " " "
4  ( 1 )  " " "*" " " " " "*" " " "*" " " "*" " "
5  ( 1 )  " " "*" "*" " " " " " " "*" "*" " " "*"
6  ( 1 )  "*" "*" "*" " " "*" " " "*" " " "*" " "
7  ( 1 )  "*" "*" "*" " " "*" " " "*" "*" " " "*"
8  ( 1 )  "*" "*" "*" " " "*" "*" "*" "*" "*" " "
9  ( 1 )  "*" "*" "*" " " "*" "*" "*" "*" "*" "*"
10 ( 1 )  "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
> # Plot BIC values
> plot(summary(m.exhaustive)$bic, xlab = "# of variables", ylab = "BIC", pch = 19)
> # Find the model with the lowest BIC
> best_model <- which.min(summary(m.exhaustive)$bic)
> # Summary of the best model
> summary(m.exhaustive)$which[best_model, ]
(Intercept)          x1          x2          x3          x4          x5          x6
       TRUE        TRUE        TRUE       FALSE       FALSE       FALSE       FALSE
         x7          x8          x9         x10
      FALSE       FALSE       FALSE       FALSE
> # Plot Cp values
> plot(summary(m.exhaustive)$cp, xlab = "# of variables", ylab = "cp", pch = 19)
> # Find the model with the lowest cp
> best_model_cp <- which.min(summary(m.exhaustive)$cp)
> # Summary of the best model by Cp
> summary(m.exhaustive)$which[best_model_cp, ]
(Intercept)          x1          x2          x3          x4          x5          x6
       TRUE       FALSE        TRUE        TRUE       FALSE       FALSE       FALSE
         x7          x8          x9         x10
       TRUE        TRUE       FALSE        TRUE
> # Plot Adjusted R-squared values
> plot(summary(m.exhaustive)$adjr2, xlab = "# of variables", ylab = "Adjusted R2", pch = 19)
> # Find the model with the highest Adjusted R-squared
> best_model_adjr2 <- which.max(summary(m.exhaustive)$adjr2)
> # Summary of the best model by Adjusted R-squared
> summary(m.exhaustive)$which[best_model_adjr2, ]
(Intercept)          x1          x2          x3          x4          x5          x6
       TRUE        TRUE        TRUE        TRUE       FALSE        TRUE       FALSE
         x7          x8          x9         x10
       TRUE       FALSE        TRUE       FALSE

2b)
> # Forward stepwise selection
> lm.forward <- regsubsets(y ~ ., data = data, nvmax = 10, method = "forward")
> summary(lm.forward)
Subset selection object
Call: regsubsets.formula(y ~ ., data = data, nvmax = 10, method = "forward")
10 Variables  (and intercept)
    Forced in Forced out
x1      FALSE      FALSE
x2      FALSE      FALSE
x3      FALSE      FALSE
x4      FALSE      FALSE
x5      FALSE      FALSE
x6      FALSE      FALSE
x7      FALSE      FALSE
x8      FALSE      FALSE
x9      FALSE      FALSE
x10     FALSE      FALSE
1 subsets of each size up to 10
Selection Algorithm: forward
          x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
1  ( 1 )  " " "*" " " " " " " " " " " " " " " " "
2  ( 1 )  "*" "*" " " " " " " " " " " " " " " " "
3  ( 1 )  "*" "*" " " " " " " " " "*" " " " " " "
4  ( 1 )  "*" "*" " " " " " " " " "*" " " " " "*"
5  ( 1 )  "*" "*" " " " " " " "*" "*" " " " " "*"
6  ( 1 )  "*" "*" " " " " "*" "*" "*" " " " " "*"
7  ( 1 )  "*" "*" " " "*" "*" "*" "*" " " " " "*"
8  ( 1 )  "*" "*" "*" "*" "*" "*" "*" " " " " "*"
9  ( 1 )  "*" "*" "*" "*" "*" "*" "*" " " "*" "*"
10 ( 1 )  "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
> plot(summary(lm.forward)$bic, xlab = "# of variables", ylab = "BIC", pch = 19)
> which.min(summary(lm.forward)$bic)
[1] 2
> summary(lm(y ~ x1 + x2, data = data))

Call:
lm(formula = y ~ x1 + x2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.15547 -0.66177  0.05938  0.62141  1.80622

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.96281    0.09251  10.407  < 2e-16 ***
x1           0.56270    0.09369   6.006 3.31e-08 ***
x2          -2.18096    0.09369 -23.280  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9251 on 97 degrees of freedom
Multiple R-squared:  0.8506,    Adjusted R-squared:  0.8475
F-statistic:   276 on 2 and 97 DF,  p-value: < 2.2e-16

> plot(summary(lm.forward)$cp, xlab = "# of variables", ylab = "cp", pch = 19)
> which.min(summary(lm.forward)$cp)
[1] 6
> summary(lm(y ~ x1 + x2 + x5 + x6 + x7 + x10, data = data))

Call:
lm(formula = y ~ x1 + x2 + x5 + x6 + x7 + x10, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.07607 -0.63658  0.06472  0.57064  1.79178

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.96281    0.08914  10.801  < 2e-16 ***
x1           0.35756    0.17906   1.997  0.04876 *
x2          -1.70398    0.26440  -6.445 5.08e-09 ***
x5           2.49725    1.04584   2.388  0.01897 *
x6          -2.22214    0.79423  -2.798  0.00625 **
x7          -3.07218    1.13934  -2.696  0.00832 **
x10          2.57529    0.78834   3.267  0.00152 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8914 on 93 degrees of freedom
Multiple R-squared:  0.867,     Adjusted R-squared:  0.8584
F-statistic:   101 on 6 and 93 DF,  p-value: < 2.2e-16

> plot(summary(lm.forward)$adjr2, xlab = "# of variables", ylab = "Adjusted R2", pch = 19)
> which.max(summary(lm.forward)$adjr2)
[1] 6
> summary(lm(y ~ x1 + x2 + x5 + x6 + x7 + x10, data = data))

Call:
lm(formula = y ~ x1 + x2 + x5 + x6 + x7 + x10, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.07607 -0.63658  0.06472  0.57064  1.79178

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.96281    0.08914  10.801  < 2e-16 ***
x1           0.35756    0.17906   1.997  0.04876 *
x2          -1.70398    0.26440  -6.445 5.08e-09 ***
x5           2.49725    1.04584   2.388  0.01897 *
x6          -2.22214    0.79423  -2.798  0.00625 **
x7          -3.07218    1.13934  -2.696  0.00832 **
x10          2.57529    0.78834   3.267  0.00152 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8914 on 93 degrees of freedom
Multiple R-squared:  0.867,     Adjusted R-squared:  0.8584
F-statistic:   101 on 6 and 93 DF,  p-value: < 2.2e-16

2c)
> # Perform backward stepwise selection
> lm.backward <- regsubsets(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
+     data = data, nvmax = 10, method = "backward")
> # Print the summary of backward selection
> summary(lm.backward)
Subset selection object
Call: regsubsets.formula(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
    data = data, nvmax = 10, method = "backward")
10 Variables  (and intercept)
    Forced in Forced out
x1      FALSE      FALSE
x2      FALSE      FALSE
x3      FALSE      FALSE
x4      FALSE      FALSE
x5      FALSE      FALSE
x6      FALSE      FALSE
x7      FALSE      FALSE
x8      FALSE      FALSE
x9      FALSE      FALSE
x10     FALSE      FALSE
1 subsets of each size up to 10
Selection Algorithm: backward
          x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
1  ( 1 )  " " "*" " " " " " " " " " " " " " " " "
2  ( 1 )  " " "*" " " " " "*" " " " " " " " " " "
3  ( 1 )  " " "*" " " " " "*" " " "*" " " " " " "
4  ( 1 )  " " "*" " " " " "*" " " "*" " " "*" " "
5  ( 1 )  "*" "*" " " " " "*" " " "*" " " "*" " "
6  ( 1 )  "*" "*" "*" " " "*" " " "*" " " "*" " "
7  ( 1 )  "*" "*" "*" " " "*" " " "*" "*" "*" " "
8  ( 1 )  "*" "*" "*" " " "*" "*" "*" "*" "*" " "
9  ( 1 )  "*" "*" "*" " " "*" "*" "*" "*" "*" "*"
10 ( 1 )  "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
> # Plot BIC
> plot(summary(lm.backward)$bic, xlab = "# of variables", ylab = "BIC", pch = 19)
> # Find the model with the minimum BIC
> best_model_bic <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x7 + x9, data = data)
> # Print the coefficients of the best model (BIC)
> summary(best_model_bic)

Call:
lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x7 + x9, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.0799 -0.5823 -0.0333  0.6160  2.0196

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.96281    0.08932  10.780  < 2e-16 ***
x1            1.03611    0.49157   2.108  0.03777 *
x2           -2.10106    0.31359  -6.700 1.63e-09 ***
x3           -6.50914    3.43169  -1.897  0.06100 .
x4           -0.10439    0.37925  -0.275  0.78374
x5           25.57737   10.11095   2.530  0.01312 *
x7          -36.69365   12.60449  -2.911  0.00452 **
x9           17.26926    5.54261   3.116  0.00245 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8932 on 92 degrees of freedom
Multiple R-squared:  0.8679,    Adjusted R-squared:  0.8578
F-statistic: 86.34 on 7 and 92 DF,  p-value: < 2.2e-16

> # Plot Cp
> plot(summary(lm.backward)$cp, xlab = "# of variables", ylab = "Cp", pch = 19)
> # Find the model with the minimum Cp
> best_model_cp <- lm(y ~ x1 + x2 + x3 + x5 + x7 + x9, data = data)
> # Print the coefficients of the best model (Cp)
> summary(best_model_cp)

Call:
lm(formula = y ~ x1 + x2 + x3 + x5 + x7 + x9, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.09653 -0.57411 -0.04178  0.61981  2.03994

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.96281    0.08887  10.834  < 2e-16 ***
x1            1.04303    0.48848   2.135  0.03537 *
x2           -2.18226    0.10585 -20.617  < 2e-16 ***
x3           -6.58106    3.40468  -1.933  0.05629 .
x5           25.77460   10.03529   2.568  0.01181 *
x7          -36.81267   12.53432  -2.937  0.00418 **
x9           17.22748    5.51293   3.125  0.00237 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8887 on 93 degrees of freedom
Multiple R-squared:  0.8678,    Adjusted R-squared:  0.8592
F-statistic: 101.7 on 6 and 93 DF,  p-value: < 2.2e-16

> # Plot Adjusted R-squared
> plot(summary(lm.backward)$adjr2, xlab = "# of variables", ylab = "Adjusted R2", pch = 19)
> # Find the model with the maximum Adjusted R-squared
> best_model_adjr2 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x7 + x9, data = data)
> # Print the coefficients of the best model (Adjusted R2)
> summary(best_model_adjr2)

Call:
lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x7 + x9, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.0799 -0.5823 -0.0333  0.6160  2.0196

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.96281    0.08932  10.780  < 2e-16 ***
x1            1.03611    0.49157   2.108  0.03777 *
x2           -2.10106    0.31359  -6.700 1.63e-09 ***
x3           -6.50914    3.43169  -1.897  0.06100 .
x4           -0.10439    0.37925  -0.275  0.78374
x5           25.57737   10.11095   2.530  0.01312 *
x7          -36.69365   12.60449  -2.911  0.00452 **
x9           17.26926    5.54261   3.116  0.00245 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8932 on 92 degrees of freedom
Multiple R-squared:  0.8679,    Adjusted R-squared:  0.8578
F-statistic: 86.34 on 7 and 92 DF,  p-value: < 2.2e-16