Regularization

Module 4 Assignment — Regularization
Muhammad U. Mirza
College of Professional Studies, Northeastern University Toronto
ALY6015 - Intermediate Analytics
Dr. Matthew Goodwin
February 4, 2024
Introduction

In this statistical analysis report, I explore the application of Ridge and LASSO regression, alongside stepwise selection, to predict graduation rates using the College dataset from the ISLR package. Regularization methods such as Ridge and LASSO help prevent overfitting by penalizing the magnitude of the coefficients, while stepwise selection iteratively refines a model according to a criterion such as the Akaike Information Criterion (AIC). Together, these approaches aim to produce more precise and insightful predictions of graduation rates.

Analysis

Split the data into a train and test set

The College dataset contains 777 observations and 18 variables. To predict graduation rates, I divided it into a training set containing 70% of the data (543 observations) and a test set containing the remaining 30% (234 observations). This split, guided by the Feature_Selection_R.pdf document, is needed to evaluate each model on data it was not trained on. I set a random seed so that the split, and therefore the results, are reproducible across runs.

Because glmnet requires a numeric predictor matrix rather than a data frame, I converted the predictors with the model.matrix function, which expands the factor Private into a 0/1 dummy column (PrivateYes); the intercept column that model.matrix adds is dropped because glmnet fits its own intercept. This produced the predictor matrices train_x and test_x, while the response, Grad.Rate, was extracted separately into train_y and test_y. With the data in this form, the analysis was ready for the modeling phase.
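A minimal base-R sketch of the split-and-matrix step follows. To stay self-contained it uses a synthetic data frame, toy_college (a hypothetical stand-in with one factor and two numeric predictors), instead of ISLR's College, and split_train_test is an illustrative helper rather than part of any package. floor() makes the 543-row training size explicit (777 × 0.7 = 543.9).

```r
# Hypothetical helper: random split of a data frame's rows into train and test.
split_train_test <- function(df, prop = 0.7, seed = 123) {
  set.seed(seed)                              # reproducible row sampling
  n_train <- floor(nrow(df) * prop)           # 777 * 0.7 = 543.9 -> 543 rows
  idx <- sample(nrow(df), size = n_train)     # row indices, without replacement
  list(train = df[idx, , drop = FALSE],
       test  = df[-idx, , drop = FALSE])
}

# Synthetic stand-in with the same number of rows as College (777)
set.seed(1)
toy_college <- data.frame(
  Private   = factor(sample(c("No", "Yes"), 777, replace = TRUE)),
  Outstate  = rnorm(777, mean = 10000, sd = 4000),
  Top10perc = runif(777, min = 0, max = 100)
)
toy_college$Grad.Rate <- 40 + 0.002 * toy_college$Outstate +
  0.2 * toy_college$Top10perc + rnorm(777, sd = 10)

parts <- split_train_test(toy_college)

# model.matrix turns the factor into a 0/1 dummy (PrivateYes) and adds an
# intercept column; [, -1] drops it because glmnet fits its own intercept.
train_x <- model.matrix(Grad.Rate ~ ., parts$train)[, -1]
test_x  <- model.matrix(Grad.Rate ~ ., parts$test)[, -1]
train_y <- parts$train$Grad.Rate
test_y  <- parts$test$Grad.Rate

dim(train_x)  # 543 rows, 3 columns: PrivateYes, Outstate, Top10perc
```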
Ridge Regression

Ridge regression addresses multicollinearity, which arises when predictors are highly correlated, by adding an L2 penalty (proportional to the sum of the squared coefficients and scaled by the tuning parameter lambda) to the least-squares loss function. Shrinking the coefficients toward zero reduces their variance and limits the influence of any single predictor, which helps mitigate overfitting and stabilizes the predictions, at the cost of introducing some bias.

Use the cv.glmnet function to estimate the lambda.min and lambda.1se values. Compare and discuss the values.

To determine the regularization strength for the Ridge model, I used the cv.glmnet function with 10-fold cross-validation. This procedure divides the training set into ten folds, fits the model on nine of them, and measures the prediction error on the held-out fold, rotating until every fold has served once as the validation set. The analysis yielded two key lambda values. lambda.min is the value that minimizes the mean cross-validated error. lambda.1se is the largest lambda whose cross-validated error is within one standard error of that minimum; it applies stronger shrinkage and yields a simpler model whose error is comparable to that of the best one. In short, lambda.min favors predictive accuracy, while lambda.1se favors simplicity and robustness.

Figure 1: Lambda min and 1se (Ridge Regression)
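The way the two values are picked from a cross-validation curve can be sketched in a few lines of base R. pick_lambdas is a hypothetical helper and the CV numbers are invented; it follows the rule described above and in the glmnet documentation, but it is not glmnet's own implementation. The grid is listed from largest to smallest lambda, the order glmnet uses.

```r
# lambda: penalty grid (decreasing); cvm: mean CV error per lambda;
# cvsd: standard error of that mean.
pick_lambdas <- function(lambda, cvm, cvsd) {
  i_min <- which.min(cvm)                       # position of the lowest mean CV error
  lambda_min <- lambda[i_min]
  threshold <- cvm[i_min] + cvsd[i_min]         # minimum error plus one standard error
  lambda_1se <- max(lambda[cvm <= threshold])   # largest (most penalized) lambda under it
  c(lambda.min = lambda_min, lambda.1se = lambda_1se)
}

# Hypothetical CV curve
lambda <- c(8, 4, 2, 1, 0.5, 0.25)
cvm    <- c(260, 200, 175, 168, 166, 167)
cvsd   <- c(12, 10, 9, 8, 8, 8)

pick_lambdas(lambda, cvm, cvsd)
# lambda.min = 0.5 (lowest cvm, 166); threshold = 166 + 8 = 174,
# so lambda.1se = 1, the largest lambda with cvm <= 174
```

Because lambda.1se is the largest lambda still under the threshold, and lambda.min itself is under it, lambda.1se can never be smaller than lambda.min.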
The log values of these lambdas, -2.612328 for lambda.min and 0.2717177 for lambda.1se, correspond to lambda values of about 0.073 and 1.31. lambda.min therefore applies the lighter penalty and lambda.1se the heavier one (about 18 times larger), spanning the trade-off between model complexity and generalization.

Plot the results from the cv.glmnet function and provide an interpretation. What does this plot tell us?

Figure 2: Ridge Regression Plot

Figure 2 plots the cross-validated mean squared error against log(lambda). Each red dot is the mean error across the folds, and its error bar extends one standard error above and below it. The left dotted line marks lambda.min, the lowest-error model, where the plot shows 16 non-zero predictors; the right dotted line marks lambda.1se, where it shows 9 predictors, giving a simpler model at the cost of slightly higher error. These counts need a caveat: a ridge penalty shrinks coefficients toward zero but never sets them exactly to zero, so a ridge (alpha = 0) cross-validation plot would show all 17 predictors along the top at every lambda. Counts that fall from 16 to 9 indicate that this cross-validation was not run with the ridge penalty; cv.glmnet uses alpha = 1 (LASSO) unless alpha = 0 is specified. The plot still illustrates how to choose a penalty that generalizes well to new data, but its predictor counts should not be read as a property of ridge regression.
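The red dots and error bars in a cv.glmnet plot summarize a folds-by-lambda table of held-out errors. The sketch below computes both from a hypothetical matrix of fold MSEs: the dot is the mean across folds and the bar half-width is the standard error of that mean. As far as I understand glmnet's implementation, this is what cv.glmnet reports when the folds are equal in size; it weights folds by their number of observations, so its figures can differ slightly when fold sizes differ.

```r
# Hypothetical held-out MSEs: rows are the 10 CV folds, columns are 3 lambda values
set.seed(42)
fold_mse <- matrix(rnorm(30, mean = 170, sd = 15), nrow = 10, ncol = 3)
n_folds <- nrow(fold_mse)

# Red dot: mean held-out MSE across folds, one value per lambda
cvm <- colMeans(fold_mse)

# Error bar half-width: standard error of that mean, sd / sqrt(number of folds)
cvsd <- apply(fold_mse, 2, sd) / sqrt(n_folds)

# The bar for each lambda runs from cvlo to cvup
cvup <- cvm + cvsd
cvlo <- cvm - cvsd
round(rbind(cvlo, cvm, cvup), 1)
```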
Fit a Ridge regression model against the training set and report on the coefficients. Is there anything interesting?

Figure 3: Ridge regression model with lambda.min
Figure 4: Ridge regression with lambda.1se

After fitting the Ridge regression model to the training data, I compared the coefficient magnitudes obtained with lambda.min and with lambda.1se from the cross-validation step. With lambda.min, the lighter of the two penalties and the one with the lowest cross-validated error, all predictors were retained; as expected for ridge regression, the coefficients were shrunk but none was set exactly to zero. This is the more flexible of the two fits, staying close to the training data while still applying some penalization.
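Why ridge shrinks coefficients without zeroing them can be seen from its textbook closed form on centered data, beta(lambda) = (X'X + lambda * I)^(-1) X'y. The sketch below applies it to synthetic data with two deliberately correlated predictors. ridge_closed_form is a hypothetical helper, and its lambda is not on glmnet's scale (glmnet divides the squared-error loss by the sample size and standardizes the predictors), so only the qualitative pattern carries over.

```r
# Ridge coefficients from the closed form on centered data (illustrative sketch)
ridge_closed_form <- function(X, y, lambda) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # center predictors so no intercept is needed
  yc <- y - mean(y)                             # center the response
  p <- ncol(Xc)
  drop(solve(crossprod(Xc) + lambda * diag(p), crossprod(Xc, yc)))
}

set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)                   # deliberately correlated with x1
x3 <- rnorm(n)
X  <- cbind(x1, x2, x3)
y  <- 2 * x1 + 1 * x2 - 1.5 * x3 + rnorm(n)

b_ols   <- ridge_closed_form(X, y, lambda = 0)  # lambda = 0 reproduces OLS
b_small <- ridge_closed_form(X, y, lambda = 10)
b_large <- ridge_closed_form(X, y, lambda = 500)

# The length (L2 norm) of the coefficient vector shrinks as lambda grows,
# yet no coefficient becomes exactly zero.
sapply(list(ols = b_ols, small = b_small, large = b_large),
       function(b) sqrt(sum(b^2)))
```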
Figure 5: Ridge regression lambda.min model coefficients
Figure 6: Ridge regression lambda.1se model coefficients

Conversely, the model using lambda.1se, the largest lambda whose cross-validated error lies within one standard error of the minimum, produced somewhat smaller coefficients, reflecting its stronger penalty.
This model is simpler and potentially more generalizable, though it may not capture the variability of the training data as closely as the lambda.min model. For example, the coefficients for predictors such as 'PrivateYes', 'Top10perc', and 'perc.alumni' remained influential in both models, but their magnitudes were smaller in the lambda.1se model, reflecting the regularization effect.

Figure 7: OLS model coefficients
Figure 8: RMSE for OLS model

In contrast, the ordinary least squares (OLS) model, with no regularization, had the largest coefficients overall, making it the most flexible model possible with the given predictors. On the test set, however, its RMSE (13.302) was slightly higher than that of the lambda.1se Ridge model (13.058), which is consistent with mild overfitting. This comparison illustrates the bias-variance trade-off: Ridge regression, especially with lambda.1se, accepts a small amount of bias in exchange for lower variance, which can make it more reliable on unseen data.

Determine the performance of the fit model against the training set by calculating the root mean square error (RMSE).

Figure 9: RMSE value for train model
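The appendix computes RMSE with Metrics::rmse(actual, predicted). A base-R equivalent with a toy example is sketched below; rmse_base and mae_base are hypothetical helper names. The mean absolute error (MAE) is included only to show that RMSE, because it squares errors before averaging, is at least as large as MAE and weights large misses more heavily. as.vector() flattens the one-column matrix that predict() returns for a glmnet model.

```r
# Root mean square error: square root of the mean squared prediction error
rmse_base <- function(actual, predicted) {
  sqrt(mean((actual - as.vector(predicted))^2))  # as.vector() handles glmnet's matrix output
}

# Mean absolute error, for comparison only
mae_base <- function(actual, predicted) {
  mean(abs(actual - as.vector(predicted)))
}

actual    <- c(60, 75, 90, 50)
predicted <- c(62, 70, 95, 50)

rmse_base(actual, predicted)  # sqrt((4 + 25 + 25 + 0) / 4) = sqrt(13.5), about 3.67
mae_base(actual, predicted)   # (2 + 5 + 5 + 0) / 4 = 3
```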
The root mean square error (RMSE) is a standard measure of the accuracy of a model's predictions. It is the square root of the average of the squared differences between the predicted and actual values, so it is expressed in the same units as the response (here, percentage points of graduation rate) and weights large errors more heavily than small ones. A lower RMSE indicates a closer fit. For the Ridge regression model using lambda.1se, the RMSE on the training set was approximately 12.524, which quantifies the typical error the model makes when predicting graduation rates for the colleges it was trained on. For model selection, however, the RMSE that matters most is the one computed on held-out data.

Determine the performance of the fit model against the test set by calculating the root mean square error (RMSE). Is your model overfit?

Figure 10: RMSE value for test model

The RMSE on the test set is approximately 13.058, which measures the predictive accuracy of the Ridge regression model on unseen data. Comparing it with the training RMSE (approximately 12.524) shows whether the model is overfitting: overfitting is suggested when a model performs much better on the training set than on the test set. Here, the test RMSE is slightly higher than the training RMSE, which is normal because the model was fitted to the training data. However, the fact that these
numbers are close (a gap of about 0.53 points) suggests that there is no meaningful overfitting: the model generalizes well, with similar performance on the training and test sets.

LASSO

LASSO regression (Least Absolute Shrinkage and Selection Operator) is a form of linear regression that adds an L1 penalty, proportional to the sum of the absolute values of the coefficients, to the least-squares loss. Shrinkage here means that the coefficient estimates are pulled toward zero. Unlike the L2 penalty, the L1 penalty can set some coefficients exactly to zero, so LASSO performs automatic feature selection and encourages simple, sparse models with fewer parameters. This is particularly useful for datasets with many features, as it helps avoid overfitting and makes the model easier to interpret.

Use the cv.glmnet function to estimate the lambda.min and lambda.1se values. Compare and discuss the values.

Figure 11: Lambda min and 1se for LASSO

In the LASSO cross-validation with cv.glmnet, log(lambda.min) is -2.426261 and log(lambda.1se) is 0.6438526, corresponding to lambda values of about 0.088 and 1.90. The sign of a log value only indicates whether lambda is below or above 1. What matters is that lambda.min is the smaller penalty, chosen for the lowest cross-validated error, while lambda.1se is more than 20 times larger, giving stronger regularization while keeping the error within one standard error of the minimum. Balancing model complexity against generalization in this way is key to preventing overfitting.
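The log values reported for both models can be back-transformed to make the comparison concrete. The snippet below only exponentiates the numbers quoted in this report for the ridge and LASSO cross-validations; it does not rerun any model.

```r
# log(lambda) values as reported by cv.glmnet in this report
log_lambda <- c(ridge_min = -2.612328, ridge_1se = 0.2717177,
                lasso_min = -2.426261, lasso_1se = 0.6438526)

lambda <- exp(log_lambda)   # back-transform to the penalty scale
round(lambda, 3)
# ridge_min 0.073, ridge_1se 1.312, lasso_min 0.088, lasso_1se 1.904

# How many times stronger the lambda.1se penalty is than lambda.min
lambda["ridge_1se"] / lambda["ridge_min"]   # about 17.9
lambda["lasso_1se"] / lambda["lasso_min"]   # about 21.5
```

In both cases lambda.min is the smaller number, so it is the lighter penalty, regardless of the sign of its log.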
Plot the results from the cv.glmnet function and provide an interpretation. What does this plot tell us?

Figure 12: LASSO plot

Figure 12 shows the cross-validated mean squared error (MSE) against log(lambda). The red dots are the mean MSE for each lambda, and the error bars extend one standard error above and below each mean, showing how much the error varies across the folds. Two vertical dashed lines mark the key values: the lambda that minimizes the MSE (lambda.min) and the largest lambda whose MSE is within one standard error of that minimum (lambda.1se). The numbers along the top give the count of non-zero coefficients at each lambda, a direct measure of model complexity. At lambda.min the model retains 15 predictors; at lambda.1se it retains 7. The plot therefore helps select a lambda that balances model complexity against prediction error, preventing overfitting while maintaining predictive power.
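Why the count along the top of the plot falls as lambda grows can be illustrated in the simplest case: orthonormal predictors (X'X = I) with the objective 0.5 * RSS + lambda * sum(|b|), where each LASSO coefficient is the OLS estimate passed through a soft-thresholding function. The OLS values below are hypothetical. With real, correlated predictors such as College's, the order in which variables drop out also depends on their correlations, so this shows the mechanism rather than the College result.

```r
# Soft-thresholding: shrink z toward zero by lambda, and set it to zero if |z| <= lambda
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical OLS coefficients for 6 orthonormal predictors
b_ols <- c(4.0, -2.5, 1.2, 0.8, -0.4, 0.1)

# Number of non-zero LASSO coefficients over a decreasing lambda grid,
# the quantity printed along the top axis of plot(cv.glmnet(...))
lambda_grid <- c(5, 3, 2, 1, 0.5, 0.2, 0.05)
nonzero <- sapply(lambda_grid, function(l) sum(soft_threshold(b_ols, l) != 0))
data.frame(lambda = lambda_grid, nonzero = nonzero)
# nonzero rises from 0 at lambda = 5 to all 6 at lambda = 0.05
```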
Fit a LASSO regression model against the training set and report on the coefficients. Do any coefficients reduce to zero? If so, which ones?

After fitting the LASSO regression model with the glmnet function, I observed distinct sets of coefficients for lambda.min and lambda.1se.

Figure 13: LASSO model with lambda.min
Figure 14: LASSO lambda.min coefficients

Consistent with its feature-selection property, LASSO reduced several coefficients exactly to zero. With lambda.min, the 'F.Undergrad' and 'Books' coefficients were shrunk to zero, excluding them from the model for predicting graduation rates.
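For general, non-orthonormal predictors, LASSO is usually solved by coordinate descent, the approach glmnet is built on: each coefficient in turn is updated by soft-thresholding its correlation with the partial residual. The sketch below is a bare-bones version on synthetic data. lasso_cd is a hypothetical helper; it drops the intercept by centering y, standardizes X up front, and skips the warm starts and active-set strategies a production solver uses. It reproduces the pattern described in this section: a light penalty keeps most coefficients, while a heavier one sets the weaker effects exactly to zero.

```r
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Coordinate descent for (1 / (2n)) * RSS + lambda * sum(|b|), no intercept
lasso_cd <- function(X, y, lambda, max_iter = 1000, tol = 1e-8) {
  n <- nrow(X)
  b <- rep(0, ncol(X))
  r <- y                                   # residuals when all coefficients are 0
  for (iter in seq_len(max_iter)) {
    b_old <- b
    for (j in seq_len(ncol(X))) {
      r <- r + X[, j] * b[j]               # partial residual without predictor j
      z <- sum(X[, j] * r) / n
      b[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
      r <- r - X[, j] * b[j]               # full residual with the updated b[j]
    }
    if (max(abs(b - b_old)) < tol) break   # stop once coefficients settle
  }
  setNames(b, colnames(X))
}

set.seed(2024)
n <- 300
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
X <- scale(X)                              # standardize columns
y <- 3 * X[, 1] - 2 * X[, 2] + 0.5 * X[, 3] + rnorm(n)
y <- y - mean(y)                           # center the response

b_light <- lasso_cd(X, y, lambda = 0.05)   # light penalty: most coefficients survive
b_heavy <- lasso_cd(X, y, lambda = 1)      # heavy penalty: weaker effects become exactly 0
round(cbind(light = b_light, heavy = b_heavy), 3)
```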
Figure 15: LASSO model with lambda.1se
Figure 16: LASSO lambda.1se coefficients

With lambda.1se, which applies more regularization, even more coefficients reduced to zero, yielding a simpler model. The excluded predictors were 'PrivateYes', 'Apps', 'Accept', 'Enroll', 'F.Undergrad', 'Books', 'PhD', 'Terminal', 'S.F.Ratio', and 'Expend', leaving 7 non-zero coefficients, which matches the count at lambda.1se in the cross-validation plot. This reflects LASSO's built-in feature reduction, which helps limit overfitting.

Figure 17: OLS model coefficients
Figure 18: RMSE value for OLS model

This simplification is evident when comparing with the ordinary least squares (OLS) model, which includes all predictors. The OLS test-set RMSE, 13.30244, serves as a benchmark for the LASSO model's predictive performance; shrinking the coefficients and simplifying the model can lead to better generalization on new data.

Determine the performance of the fit model against the training set by calculating the root mean square error (RMSE).

Figure 19: RMSE for training model

To gauge the accuracy of the LASSO model (fitted with lambda.1se) on the training data, I calculated its RMSE; a lower value indicates a closer fit between the predicted and actual values. An RMSE of 13.258 means that the typical prediction error for the graduation rate is about 13.3 percentage points (because errors are squared before averaging, large misses count more than they would in a simple average of absolute errors). On its own, the training RMSE says little about generalization; it becomes informative when compared with the test RMSE.

Determine the performance of the fit model against the test set by calculating the root mean square error (RMSE). Is your model overfit?

Figure 20: RMSE for test model
To assess whether the model is overfitting, I calculated the RMSE of the LASSO model on the test set. The value, 13.01437, is marginally lower than the training RMSE. A test error slightly below the training error can occur by chance in a particular split, and it is the opposite of the pattern that signals overfitting, which typically shows up as a much higher error on the test set than on the training set. The model therefore appears to capture the underlying pattern in the data without being overly complex.

Which model performed better and why? Is that what you expected?

To compare Ridge and LASSO regression, the key metrics are the RMSE values on the training and test sets. Ridge regression, with an RMSE of 12.52386 on the training set and 13.05759 on the test set, generalizes well without overfitting. The LASSO model has a higher RMSE on the training set (13.258) but a marginally lower RMSE on the test set (13.01437), suggesting a good balance between bias and variance. The two models are therefore very close: Ridge fits the training data better, while LASSO predicts the test set better by only about 0.04 points. With a difference this small, the choice between them depends less on accuracy than on the dataset's characteristics and the goals of the analysis. Ridge regression is preferable when there are many correlated predictors, because it reduces the effects of multicollinearity without excluding variables. LASSO, on the other hand, is useful for feature selection, producing a more parsimonious model by reducing some coefficients to zero. Given these considerations, the better model depends on whether the
priority is to manage multicollinearity (Ridge) or to emphasize feature selection for a sparse model (LASSO). For predicting graduation rates, I favor Ridge regression: its training RMSE is the lower of the two and its test RMSE is essentially tied with LASSO's, so it gives up almost nothing in accuracy while keeping every predictor, which is an advantage when predictors are correlated. This aligns with my expectation that Ridge would suit a dataset whose predictors are highly correlated. LASSO's strength in feature selection would be more valuable where reducing model complexity directly improves interpretability or where the dataset includes irrelevant predictors.

Refer to the Intermediate_Analytics_Feature_Selection_R.pdf document for how to perform stepwise selection and then fit a model. Did this model perform better or as well as Ridge regression or LASSO? Which method do you prefer and why?
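The report does not show the exact stepwise call, so the sketch below uses base R's step() with direction = "both" on the built-in mtcars data as a stand-in for the College training set; MASS::stepAIC, which the appendix script loads, offers a similar interface. At each step the search adds or drops the single term that most lowers AIC, and it stops when no such change improves AIC.

```r
# Bidirectional stepwise selection by AIC, base R only.
# mtcars stands in for the College training set.
full_model <- lm(mpg ~ ., data = mtcars)   # start from all predictors

# With no scope given, the starting model is the upper bound, and "both" lets
# the search drop a term and later re-add it. trace = 0 hides the step log.
step_model <- step(full_model, direction = "both", trace = 0)

formula(step_model)                        # predictors retained by the search
c(full = AIC(full_model), stepwise = AIC(step_model))
# step() ranks models with extractAIC(), which for lm differs from AIC() only
# by a constant, so the selected model's AIC is never above the starting one.
```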
Figure 21: Step-wise selection model

Stepwise selection refines the model by iteratively adding or removing one predictor at a time whenever the change lowers the Akaike Information Criterion (AIC). This narrowed the predictors to those most useful for predicting graduation rates, producing a final model whose predictors included Private, Apps, Accept, Top25perc, P.Undergrad, Outstate, Room.Board, Personal, PhD, Terminal, perc.alumni, and Expend.

Figure 22: RMSE value for Step-wise selection model

The stepwise model achieved an RMSE of 12.49192 on the training set and 13.14759 on the test set. Compared with the Ridge regression RMSEs (12.52386 on training and 13.05759 on test) and the LASSO RMSEs (13.03631 on training and 12.99371 on test; these differ from the lambda.1se values of 13.258 and 13.01437 reported above), the stepwise model performed competitively, with slightly better training performance but slightly worse test performance than Ridge regression. Each method has its merits, but Ridge regression may be preferred in this context for its ability to handle multicollinearity among predictors. It keeps all variables in the model and shrinks them appropriately, preserving the model's integrity without omitting potentially important predictors. Unlike LASSO, which can eliminate variables entirely, Ridge adjusts the coefficients to limit overfitting while still using all available information. Ridge regression is also preferred over stepwise selection because it systematically addresses multicollinearity by penalizing the size of
coefficients, so every predictor contributes to the model rather than being excluded outright. This preserves a holistic view of the dataset's structure, unlike stepwise selection, which can oversimplify the model by removing variables whose inclusion does not lower the AIC. Because it includes all variables and applies shrinkage where needed, Ridge offers a more stable prediction model when predictors are correlated, making it a robust choice for predicting outcomes such as graduation rates.

Conclusion

In this analysis of graduation rates in the College dataset from the ISLR package, I combined Ridge and LASSO regression with stepwise selection. The goal was to use these methods to refine the predictions and gain insight into the factors that influence graduation rates. I began by dividing the dataset into training and test sets to validate each model's performance on unseen data. This foundational step set the stage for exploring regularization techniques, which help manage multicollinearity and prevent overfitting. Ridge regression, with its L2 penalty, handled correlated predictors without excluding any variables, preserving a holistic view of the dataset's structure while mitigating overfitting. LASSO regression, known for its feature-selection capability, introduced sparsity by reducing the coefficients of less important features to zero, focusing the model on the most influential predictors. This
aspect of LASSO was particularly enlightening, as it highlighted the importance of distinguishing between essential and superfluous variables in predictive modeling. Stepwise selection refined the model further by adding and removing predictors based on the Akaike Information Criterion (AIC), underscoring the value of a targeted approach in which each variable's contribution to predictive power is evaluated. Comparing Ridge, LASSO, and stepwise selection offered valuable lessons in model selection and optimization. The three models performed very similarly on the test set, and Ridge regression emerged as my preferred choice because it essentially matched the others' accuracy while handling multicollinearity and keeping all predictors in the model. This finding aligned with my initial hypothesis and underscored the importance of choosing a modeling technique based on the dataset's characteristics and the goals of the analysis. From a statistical perspective, this assignment illuminated the balance between model complexity, predictive accuracy, and generalizability, and showed that there is no one-size-fits-all solution: effective predictive modeling requires a nuanced approach that considers the dataset's specifics and the relationships among predictors. In summary, applying Ridge and LASSO regression alongside stepwise selection deepened my appreciation for both the power and the complexity of statistical methods in predictive modeling. This experience has not only enhanced my analytical skills but also
reinforced the critical importance of methodical model selection and optimization in uncovering meaningful insights from data.
References

1. Bluman, A. (2018). Elementary Statistics: A Step by Step Approach (10th ed.). McGraw Hill. ISBN 13: 978-1-259-755330.
2. Kabacoff, R. (2011). R in Action (2nd ed.). Manning Publications Co. ISBN 978-1-935-182399.
3. Goodwin, M. (2024). Module 4. Canvas. https://northeastern.instructure.com/courses/164840/modules
4. Goodwin, M. (2024). Module 3 Pre-Assignment Lab. Canvas. https://northeastern.instructure.com/courses/164840/assignments/2098519
5. Jain, A. (2024). A Complete Tutorial on Ridge and Lasso Regression in Python. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2016/01/ridge-lasso-regression-python-complete-tutorial/
6. Regularization Tutorial: Ridge, Lasso & Elastic Net Regression. (2019). DataCamp. https://www.datacamp.com/tutorial/tutorial-ridge-lasso-elastic-net
7. Bhattacharyya, S. (2018). Ridge and Lasso Regression: A Complete Guide with Python Scikit-Learn. Medium; Towards Data Science. https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b
8. Shah, R. (2021). Comparision of Regularized and Unregularized Models. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/08/performance-comparision-of-regularized-and-unregularized-regression-models/
Appendix

1. R Script

# Muhammad Umer Mirza, 29/01/2024, ALY 6015
# Module 4 Assignment — Regularization

##############################################################################
# Clear the Console
##############################################################################

cat("\014") # Clears the console
rm(list = ls()) # Clears the global environment
try(dev.off(dev.list()["RStudioGD"]), silent = TRUE) # Clears plots
try(p_unload(p_loaded(), character.only = TRUE), silent = TRUE) # Clears packages
options(scipen = 100) # Disables scientific notation for the entire R session

##############################################################################
# Load Necessary Packages
##############################################################################

library(pacman)
p_load(dplyr, tidyverse, janitor, lubridate, ggplot2, ggthemes, ggeasy, psych, knitr, kableExtra, corrplot, RColorBrewer, car, MASS, leaps, caret, gridExtra, pROC, ISLR, glmnet, Metrics)

##############################################################################
# Load Data from ISLR package
##############################################################################

college <- College # save data in specified variable
##############################################################################
# 1) Split the data into a train and test set
##############################################################################

# Split data based on feature selection methods
set.seed(123) # set seed for reproducibility; the number is arbitrary
train_index <- sample(x = nrow(college), size = nrow(college) * 0.7) # use sample function, split by 70/30
train <- college[train_index,] # only give values that are in train index
test <- college[-train_index,] # remove train index values to create test set

# Convert predictors to numeric matrices for glmnet (drops the response and the intercept column)
train_x <- model.matrix(Grad.Rate ~., train)[,-1]
test_x <- model.matrix(Grad.Rate ~., test)[,-1]

# Assign Grad.Rate to train and test variables
train_y <- train$Grad.Rate
test_y <- test$Grad.Rate

##############################################################################
# Ridge Regression
# 2) Use the cv.glmnet function to estimate the lambda.min and lambda.1se values.
# Compare and discuss the values.
##############################################################################
# Find the best lambda using cross-validation
set.seed(123) # because cv.glmnet uses randomization
# alpha = 0 selects the ridge (L2) penalty; if alpha is omitted, cv.glmnet defaults to alpha = 1 (LASSO)
# 10 folds: fit on 9, validate on the held-out fold, and rotate through all 10
cv_ridge <- cv.glmnet(train_x, train_y, alpha = 0, nfolds = 10)

# Optimal value of lambda, minimizes the prediction error
log(cv_ridge$lambda.min) # minimizes out-of-sample loss
log(cv_ridge$lambda.1se) # largest lambda whose CV error is within 1 SE of the minimum

##############################################################################
# 3) Plot the results from the cv.glmnet function and provide an interpretation.
# What does this plot tell us?
##############################################################################

plot(cv_ridge)
# y axis ~ mean squared error
# x axis ~ log of lambda
# top ~ number of non-zero coefficients at each lambda (all predictors for ridge, which does not zero coefficients)
# red dots ~ mean CV error; error bars ~ plus/minus 1 standard error
# two vertical dotted lines:
# left line ~ lambda.min, the lambda with the lowest mean CV error (the lighter penalty of the two)
# right line ~ lambda.1se, the largest lambda within 1 SE of the minimum: the simplest model that performs about as well as the best model
# plot verifies lambda.min and lambda.1se
##############################################################################
# 4) Fit a Ridge regression model against the training set and report on the coefficients.
# Is there anything interesting?
##############################################################################

# Fit the final model on the training data using lambda.min
# alpha = 0 for Ridge (L2 penalty)
ridge_min_model <- glmnet(train_x, train_y, alpha = 0, lambda = cv_ridge$lambda.min)
ridge_min_model

# Display coefficients
coef(ridge_min_model)

# Fit the final model on the training data using lambda.1se
ridge_1se_model <- glmnet(train_x, train_y, alpha = 0, lambda = cv_ridge$lambda.1se)
ridge_1se_model

# Display coefficients
coef(ridge_1se_model)

# Display coefficients of OLS model with no regularization
ols_model <- lm(Grad.Rate ~., data = train)
coef(ols_model)
# coefficients in 1se tend to be smaller
# but coefficients in full model close to lambda.min

# Predict the test set with the full (OLS) model
preds_ols_model <- predict(ols_model, newdata = test)
preds_ols_model

# Compute RMSE to see how well the model is performing
rmse(test$Grad.Rate, preds_ols_model)

##############################################################################
# 5) Determine the performance of the fit model against the training set by calculating the root mean square error (RMSE).
##############################################################################

preds_train <- predict(ridge_1se_model, newx = train_x)
train_rmse <- rmse(train_y, preds_train)
train_rmse

##############################################################################
# 6) Determine the performance of the fit model against the test set by calculating the root mean square error (RMSE).
# Is your model overfit?
##############################################################################

preds_test <- predict(ridge_1se_model, newx = test_x)
test_rmse <- rmse(test_y, preds_test)
test_rmse

##############################################################################
# LASSO
# 7) Use the cv.glmnet function to estimate the lambda.min and lambda.1se values.
# Compare and discuss the values.
##############################################################################

set.seed(123) # because cv.glmnet uses randomization
cv_lasso <- cv.glmnet(train_x, train_y, nfolds = 10, alpha = 1) # 10 folds: fit on 9, validate on the held-out fold, and rotate

# Optimal value of lambda, minimizes the prediction error
log(cv_lasso$lambda.min) # minimizes out-of-sample loss
log(cv_lasso$lambda.1se) # largest lambda whose CV error is within 1 SE of the minimum

##############################################################################
# 8) Plot the results from the cv.glmnet function and provide an interpretation.
# What does this plot tell us?
##############################################################################

plot(cv_lasso)

##############################################################################
# 9) Fit a LASSO regression model against the training set and report on the coefficients.
# Do any coefficients reduce to zero? If so, which ones?
##############################################################################

# Fit the final model on the training data using lambda.min
# alpha = 1 for LASSO
lasso_min_model <- glmnet(train_x, train_y, alpha = 1, lambda = cv_lasso$lambda.min)
lasso_min_model

# Display coefficients
coef(lasso_min_model)

# Fit the final model on the training data using lambda.1se
lasso_1se_model <- glmnet(train_x, train_y, alpha = 1, lambda = cv_lasso$lambda.1se)
lasso_1se_model

# Display coefficients
coef(lasso_1se_model)

# Display coefficients of the OLS model with no regularization
ols_model_2 <- lm(Grad.Rate ~ ., data = train)
coef(ols_model_2)
# Coefficients at lambda.1se tend to be smaller,
# while the full (OLS) model's coefficients are close to those at lambda.min

# Predict on the test set with the full OLS model
preds_ols_model_2 <- predict(ols_model_2, newdata = test)
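Step 9 asks whether any LASSO coefficients reduce to zero. The textbook case of an orthonormal design matrix (X'X = I) shows why LASSO can set coefficients exactly to zero while ridge only shrinks them. In that case both penalized fits have closed-form solutions in terms of the OLS estimates. The sketch below assumes that special case with the objectives stated in the comments. glmnet makes no such assumption and scales its penalty differently. The function names and coefficient values are hypothetical.

```r
# Orthonormal-design special case (X'X = I), used only for illustration.
# LASSO, objective 0.5 * ||y - X b||^2 + lambda * sum(abs(b)):
# each OLS coefficient is soft-thresholded toward zero.
soft_threshold <- function(b_ols, lambda) {
  sign(b_ols) * pmax(abs(b_ols) - lambda, 0)
}

# Ridge, objective 0.5 * ||y - X b||^2 + 0.5 * lambda * sum(b^2):
# every OLS coefficient is scaled by the same factor, so none becomes exactly 0.
ridge_shrink <- function(b_ols, lambda) {
  b_ols / (1 + lambda)
}

# Hypothetical OLS estimates (not from the College data)
b_ols  <- c(x1 = 3.0, x2 = -1.5, x3 = 0.4, x4 = -0.2)
lambda <- 0.5

soft_threshold(b_ols, lambda)  # x1 = 2.5, x2 = -1, x3 = 0, x4 = 0
ridge_shrink(b_ols, lambda)    # x1 = 2, x2 = -1, x3 ~ 0.267, x4 ~ -0.133
```

Any coefficient whose OLS magnitude is below the threshold is set exactly to zero. This is why LASSO also works as a feature-selection method and ridge does not, as the comparison comments in step 12 note.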
preds_ols_model_2

# Compute RMSE to see how well the OLS model performs
rmse(test$Grad.Rate, preds_ols_model_2)

##############################################################################
# 10) Determine the performance of the fit model against the training set by calculating the root mean square error (RMSE).
##############################################################################
preds_train_2 <- predict(lasso_1se_model, newx = train_x)
train_rmse_2 <- rmse(train_y, preds_train_2)
train_rmse_2

##############################################################################
# 11) Determine the performance of the fit model against the test set by calculating the root mean square error (RMSE).
# Is your model overfit?
##############################################################################
preds_test_2 <- predict(lasso_1se_model, newx = test_x)
test_rmse_2 <- rmse(test_y, preds_test_2)
test_rmse_2

##############################################################################
# Comparison
# 12) Which model performed better and why? Is that what you expected?
##############################################################################
# Ridge is good when the goal is to reduce overfitting, but it is not the best choice for feature selection
# LASSO is better suited to feature selection (it can set coefficients exactly to zero)
# Ridge RMSE < LASSO RMSE, so ridge is chosen

##############################################################################
# 13) Refer to the Intermediate_Analytics_Feature_Selection_R.pdf document for how to perform stepwise selection and then fit a model.
# Did this model perform better or as well as Ridge regression or LASSO?
# Which method do you prefer and why?
##############################################################################
# Apply stepwise selection on the training data only
fit <- step(lm(Grad.Rate ~ ., data = train), direction = 'both')
summary(fit)

# Evaluate model performance
# Predictions on training data
prediction_train <- predict(fit, newdata = train)

# Calculate RMSE for training
rmse_train <- rmse(actual = train$Grad.Rate, predicted = prediction_train)
rmse_train

# Predictions on test data
prediction_test <- predict(fit, newdata = test)

# Calculate RMSE for test
rmse_test <- rmse(actual = test$Grad.Rate, predicted = prediction_test)
rmse_train
rmse_test

##############################################################################
# END OF ANALYSIS
##############################################################################
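Step 13 relies on step(), which adds or drops terms according to AIC. The base-R sketch below computes the AIC that step() compares for a linear model and checks it against extractAIC(). It then scores one round of single-term deletions by hand. It uses the built-in mtcars data, not the College data, and the model formula is an arbitrary choice for illustration. extractAIC() omits additive constants, so its values differ from AIC(fit). The difference is the same for every candidate fitted to the same data, so it does not change which model step() prefers.

```r
# Sketch of the AIC that step() compares for linear models, using the
# built-in mtcars data purely for illustration (not the College data).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
edf <- length(coef(fit))               # estimated coefficients, incl. intercept
aic_manual <- n * log(rss / n) + 2 * edf

extractAIC(fit)                        # c(edf, AIC) as used by step()
aic_manual                             # equals extractAIC(fit)[2]

# One backward step by hand: refit without each term in turn
terms_now <- c("wt", "hp", "qsec")
cand_aic <- sapply(terms_now, function(v) {
  reduced <- lm(reformulate(setdiff(terms_now, v), response = "mpg"), data = mtcars)
  extractAIC(reduced)[2]
})
cand_aic                               # AIC after dropping each term
# step() drops the lowest-AIC candidate only if its value is below extractAIC(fit)[2]
```

With direction = 'both', step() also considers adding previously dropped terms back. It stops when no single addition or deletion lowers the AIC.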