Regularization
1
Module 4 Assignment — Regularization
Muhammad U. Mirza
College of Professional Studies, Northeastern University Toronto
ALY6015 - Intermediate Analytics
Dr. Matthew Goodwin
February 4, 2024
Introduction
In this statistical analysis report, I explore the application of Ridge and LASSO
regression techniques alongside stepwise selection to predict graduation rates using the College
dataset from the ISLR package. Regularization methods like Ridge and LASSO help prevent
overfitting by penalizing the magnitude of coefficients, while stepwise selection iteratively
refines models by criteria such as the Akaike Information Criterion (AIC). Overall, this
comprehensive approach aims to generate more precise and insightful predictions for graduation
rates in the College dataset.
Analysis
Split the data into a train and test set
The College dataset contains 777 observations and 18 variables. To predict graduation
rates accurately, I divided it into a training set, which constitutes 70% (543 observations)
of the data, and a test set, which makes up the remaining 30% (234 observations).
This split, guided by the Feature_Selection_R.pdf document, is crucial for evaluating the model's
performance on data it has not been trained on. I set a random seed for reproducibility, so the
same rows land in each set on every run.
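The arithmetic behind the 543/234 split can be checked with a short sketch. It works on row indices only, so it needs no data; the use of floor() is an assumption that makes explicit the truncation sample() applies to a fractional size.

```r
# Sketch of a reproducible 70/30 split by row index (sizes only, no data needed).
n <- 777                                        # rows in the College data, per the text
set.seed(123)                                   # seed value is arbitrary
train_index <- sample(n, size = floor(0.7 * n)) # 0.7 * 777 = 543.9, truncated to 543
n_train <- length(train_index)
n_test  <- n - n_train
c(train = n_train, test = n_test)               # train = 543, test = 234
```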
For regression analysis in glmnet, the datasets were converted to matrix format using the
model.matrix function. This step separated the predictor variables into train_x and test_x, and the
response variable, Grad.Rate, into train_y and test_y. This transformation is essential, as glmnet
requires numerical inputs and a clear delineation between predictors and response. This
methodical preparation of the data ensures that the analysis is structured and poised for the
modeling phase.
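As a small illustration of what model.matrix does, the sketch below uses a toy data frame with hypothetical values (not rows from the College data). It shows a factor being expanded into a dummy column and the intercept column being dropped with [, -1].

```r
# Toy data frame (hypothetical values) with one factor and one numeric predictor.
toy <- data.frame(
  Grad.Rate = c(60, 75, 82, 55),
  Private   = factor(c("Yes", "No", "Yes", "No")),
  Outstate  = c(12000, 7000, 15000, 6500)
)

# model.matrix() builds a numeric design matrix; column 1 is the intercept,
# which glmnet adds on its own, so it is dropped here.
x <- model.matrix(Grad.Rate ~ ., data = toy)[, -1]
y <- toy$Grad.Rate

colnames(x)   # "PrivateYes" "Outstate"
```

The factor Private becomes the 0/1 column PrivateYes, which is why that name appears among the coefficients later in the report.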
Ridge Regression
Ridge Regression combats multicollinearity in datasets with highly correlated predictors
by incorporating an L2 regularization penalty into the loss function. This shrinkage of coefficient
magnitudes helps mitigate overfitting, enhancing model interpretability and reducing the undue
impact of any single predictor. Additionally, Ridge Regression stabilizes the model by improving
its generalization capability, thereby decreasing the variability in the predictions it generates.
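The shrinkage effect can be sketched with the textbook closed-form ridge solution on standardized predictors. This is an illustration only: it uses the built-in mtcars data as a stand-in, the helper ridge_coef is hypothetical, and glmnet scales lambda differently and solves the problem by coordinate descent, so the numbers are not comparable to the report's.

```r
# Closed-form ridge estimate: (X'X + lambda * I)^(-1) X'y on standardized X.
# ridge_coef is an illustrative helper, not part of glmnet.
ridge_coef <- function(x, y, lambda) {
  x <- scale(x)              # standardize predictors
  y <- y - mean(y)           # center the response so no intercept is needed
  p <- ncol(x)
  solve(crossprod(x) + lambda * diag(p), crossprod(x, y))
}

x <- as.matrix(mtcars[, c("disp", "hp", "wt")])  # correlated predictors
y <- mtcars$mpg

lambdas <- c(0, 1, 10, 100)
norms <- sapply(lambdas, function(l) sqrt(sum(ridge_coef(x, y, l)^2)))
round(norms, 3)   # the coefficient norm falls as lambda grows
```

With lambda = 0 the estimate coincides with ordinary least squares; larger values pull every coefficient toward zero without removing any predictor.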
Use the cv.glmnet function to estimate the lambda.min and lambda.1se values. Compare
and discuss the values.
To accurately determine the optimal regularization strength for our Ridge regression
model, I employed the cv.glmnet function, utilizing a 10-fold cross-validation method. This
technique involves dividing the dataset into ten parts, training the model on nine, and testing it
on the tenth, repeatedly, to ensure robust estimation.
The analysis yielded two critical lambda values: lambda.min and lambda.1se. The
lambda.min value is the one that minimizes the mean cross-validated prediction error. The
lambda.1se value is the largest lambda whose error lies within one standard error of that
minimum; it applies a stronger penalty and therefore gives a more conservative, more heavily
regularized model. The logged values of these lambdas show the balance we seek between model
complexity and predictive accuracy, with lambda.min focusing on precision and lambda.1se on
simplicity and robustness.
Figure 1: Lambda min and 1se (Ridge Regression)
The calculated logged values, -2.612328 for lambda.min and 0.2717177 for lambda.1se,
correspond to lambda values of roughly 0.073 and 1.31. The penalty therefore ranges from light
at lambda.min to considerably heavier at lambda.1se, balancing complexity against generalization.
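The one-standard-error rule can be sketched in a few lines of base R. The error values below are hypothetical and chosen only to make the selection visible; in a cv.glmnet object the corresponding vectors are lambda, cvm and cvsd.

```r
# Hypothetical cross-validation summary over a small lambda grid.
lambda <- c(0.01, 0.1, 1, 10)
cvm    <- c(170, 168, 171, 190)   # mean CV error at each lambda (made up)
cvsd   <- c(4, 4, 4, 5)           # standard error of that mean (made up)

i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]                  # lambda with the lowest mean error
threshold  <- cvm[i_min] + cvsd[i_min]       # minimum error plus one standard error
lambda_1se <- max(lambda[cvm <= threshold])  # largest lambda still under the threshold

c(lambda_min = lambda_min, lambda_1se = lambda_1se)   # 0.1 and 1
```

Because lambda.1se is by construction at least as large as lambda.min, it always corresponds to an equal or heavier penalty.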
Plot the results from the cv.glmnet function and provide an interpretation. What does this
plot tell us?
Figure 2: Ridge Regression Plot
Figure 2 plots the cross-validated mean squared error against log(lambda). The two vertical
dotted lines mark lambda.min (left) and lambda.1se (right), and the numbers along the top give
the count of non-zero coefficients at each lambda. At lambda.min the plot shows 16 predictors,
the most complex model with the lowest error; at lambda.1se it shows 9 predictors, a more
generalizable model at the cost of slightly higher error. Because a pure ridge penalty
(alpha = 0) shrinks coefficients without setting any exactly to zero, counts below the full
predictor set suggest that the cross-validation was run with glmnet's default alpha = 1 rather
than alpha = 0. This graphical analysis aids in selecting a regularization parameter that
generalizes well to new data.
Fit a Ridge regression model against the training set and report on the coefficients. Is there
anything interesting?
Figure 3: Ridge regression model with lambda.min
Figure 4: Ridge regression with lambda.1se
Upon fitting the Ridge regression model to the training data, I examined the coefficients'
magnitudes using both lambda.min and lambda.1se from the cross-validation process.
Lambda.min, which is associated with the least prediction error, resulted in a model where all
predictors were retained with non-zero coefficients, as expected under an L2 penalty. This
suggests a model complexity that tries to fit the training data as closely as possible without
undue penalization.
Figure 5: Ridge regression lambda.min model coefficients
Figure 6: Ridge regression lambda.1se model coefficients
Conversely, the model using lambda.1se, which is within one standard error of the
minimum error, showed somewhat smaller coefficients, indicating a more regularized approach.
This model is simpler and potentially more generalizable, though it may not capture the training
data's variability as closely as the lambda.min model. For example, coefficients for predictors
like 'PrivateYes', 'Top10perc', and 'perc.alumni' remained prominent in both models, but their
magnitudes were reduced in the lambda.1se model, reflecting the regularization effect.
Figure 7: OLS model coefficients
Figure 8: RMSE for OLS model
In contrast, the ordinary least squares (OLS) model, with no regularization, presented the
largest coefficients, reflecting the most complex model possible with the given predictors.
However, when predicting on the test set, the RMSE value was slightly higher than the RMSE
values for both Ridge models, suggesting potential overfitting. This comparison between the
models reveals the balance between bias and variance, where Ridge regression (especially with
lambda.1se) helps to mitigate overfitting by introducing bias, which can lead to a more reliable
model on unseen data.
Determine the performance of the fit model against the training set by calculating the root
mean square error (RMSE).
Figure 9: RMSE value for train model
The Root Mean Square Error (RMSE) is a standard way to measure the accuracy of a
model's predictions. It represents the square root of the average of the squared differences
between the predicted values and the actual values. In essence, RMSE quantifies how much, on
average, the predictions deviate from the observed actual outcomes. A lower RMSE value
indicates a better fit of the model to the data, as it suggests smaller discrepancies between the
predicted and actual values.
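The definition above translates directly into code. The sketch below uses a hypothetical helper, rmse_manual, and made-up values; the report itself relies on rmse() from the Metrics package.

```r
# RMSE written out from its definition: square root of the mean squared difference.
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

actual    <- c(60, 70, 80)   # hypothetical graduation rates
predicted <- c(63, 66, 80)   # hypothetical predictions

# Errors are -3, 4 and 0, so the squared errors are 9, 16 and 0.
rmse_manual(actual, predicted)   # sqrt(25 / 3), about 2.887
```

RMSE is expressed in the units of the response, so here it reads as percentage points of graduation rate.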
In the case of the Ridge regression model using lambda.1se, the RMSE calculated on the
training set was approximately 12.524. This value provides a quantitative measure of the average
prediction error made by the model when forecasting the graduation rates based on the training
data. The goal in model selection and evaluation is often to minimize this value, which would
indicate a model with a higher predictive accuracy.
Determine the performance of the fit model against the test set by calculating the root
mean square error (RMSE). Is your model overfit?
Figure 10: RMSE value for test model
The RMSE on the test set is approximately 13.058. This metric serves as an indicator of
the predictive accuracy of the Ridge regression model when applied to unseen data. By
comparing the training RMSE (approximately 12.524) and the test RMSE, we can assess if the
model is overfitting. Overfitting is often suggested if the model performs significantly better on
the training set than on the test set.
Here, the RMSE on the test set is slightly higher than on the training set, which is a
normal occurrence as the model was trained on the training set. However, the fact that these
numbers are quite close suggests that there is no significant overfitting happening. This implies
that the model generalizes well to new data, maintaining similar performance across both
training and testing datasets.
LASSO
LASSO, an acronym for Least Absolute Shrinkage and Selection Operator, is a type of linear
regression that uses shrinkage. Shrinkage here means that the coefficient estimates are pulled
toward zero by an L1 penalty on their absolute values. The LASSO technique encourages simple,
sparse models (i.e., models with fewer parameters). This is particularly useful when you have
datasets with a large number of features, as it automatically performs feature selection by
setting coefficients of less important features exactly to zero. This helps in avoiding
overfitting and makes the model easier to interpret.
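Why an L1 penalty produces exact zeros can be seen from the soft-thresholding operator, which gives the LASSO solution in the idealized case of orthonormal predictors. The sketch below assumes that simplified setting and uses made-up coefficient values; soft_threshold is an illustrative helper, and glmnet applies this same update one coordinate at a time on real data.

```r
# Soft-thresholding: move each value toward zero by lambda, and set it to
# exactly zero if its absolute value is at most lambda.
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

ols_coefs <- c(3.0, -0.4, 0.9, -2.5)   # hypothetical unpenalized coefficients

soft_threshold(ols_coefs, lambda = 1)  # 2.0  0.0  0.0 -1.5

# For contrast, ridge in the same setting only rescales and never reaches zero.
ols_coefs / (1 + 1)                    # 1.50 -0.20  0.45 -1.25
```

The two small coefficients vanish under the L1 penalty but merely shrink under the L2 penalty, which is the feature-selection behaviour described above.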
Use the cv.glmnet function to estimate the lambda.min and lambda.1se values. Compare
and discuss the values.
Figure 11: Lambda min and 1se for LASSO
In the LASSO regression analysis using cross-validation with cv.glmnet, the logged lambda
values obtained are -2.426261 for lambda.min and 0.6438526 for lambda.1se (lambda of roughly
0.088 and 1.90). The negative log value means lambda.min is below 1, a comparatively light
penalty chosen to minimize the cross-validated prediction error, while the positive log value
for lambda.1se reflects a heavier, more conservative penalty whose error is still within one
standard error of the minimum. This balance between model complexity and generalization is key
in predictive modeling to prevent overfitting.
Plot the results from the cv.glmnet function and provide an interpretation. What does this
plot tell us?
Figure 12: LASSO plot
In Figure 12, the plot from the cv.glmnet function displays the relationship between
the log(lambda) values and the mean squared error (MSE) from the cross-validation process. The
red dots represent the average MSE for each lambda value, with vertical lines showing the range
of errors indicating variability across the folds. Two key points are marked by vertical dashed
lines: the lambda value that minimizes the MSE (lambda.min) and the most regularized model
within one standard error of the minimum MSE (lambda.1se). The number at the top indicates
the count of non-zero coefficients at various lambda values, suggesting model complexity. At
lambda.min, the model retains 15 predictors. At lambda.1se, the model retains 7 predictors. This
plot helps in selecting a lambda that balances model complexity with prediction error, aiming to
prevent overfitting while maintaining predictive power.
Fit a LASSO regression model against the training set and report on the coefficients. Do
any coefficients reduce to zero? If so, which ones?
Upon fitting the LASSO regression model using the glmnet function, I observed distinct
outcomes for coefficients when employing lambda.min and lambda.1se.
Figure 13: LASSO model with lambda.min
Figure 14: LASSO lambda.min coefficients
The LASSO method, known for its feature selection capabilities, reduced several
coefficients to zero. Specifically, with lambda.min, the 'F.Undergrad' and 'Books' coefficients
were shrunk to zero, indicating their exclusion from the model in predicting graduation rates.
Figure 15: LASSO model with lambda.1se
Figure 16: LASSO lambda.1se coefficients
When lambda.1se was applied, which introduces more regularization, even more
coefficients reduced to zero, suggesting a simpler model. These were 'PrivateYes', 'Apps',
'Accept', 'Enroll', 'F.Undergrad', 'Books', 'PhD', 'Terminal', 'S.F.Ratio', and 'Expend',
reflecting the LASSO's inherent property of feature reduction to minimize overfitting.
Figure 17: OLS model coefficients
Figure 18: RMSE value for OLS model
This simplification is evident when comparing to the Ordinary Least Squares (OLS)
model which includes all predictors. The RMSE from the OLS predictions, 13.30244, serves as a
benchmark to evaluate the LASSO model's predictive performance. The shrinkage of coefficients
and the simplification of the model could potentially lead to better generalization on new data.
Determine the performance of the fit model against the training set by calculating the root
mean square error (RMSE).
Figure 19: RMSE for training model
To gauge the predictive accuracy of the LASSO model on the training data, I calculated
the Root Mean Square Error (RMSE). This statistic measures the model's prediction error, with a
lower RMSE indicating a better fit between the predicted and actual values. An RMSE of 13.258
suggests that, on average, the model's predictions of the graduation rate deviate from the actual
values by approximately 13.3 percentage points. This metric is crucial in assessing the model's
performance and ensuring that it generalizes well to new, unseen data.
Determine the performance of the fit model against the test set by calculating the root
mean square error (RMSE). Is your model overfit?
Figure 20: RMSE for test model
To assess if the model is overfitting, I calculated the RMSE for the LASSO model's
performance on the test set. An RMSE value of 13.01437 was obtained, which is marginally
lower than the training set's RMSE. This slight decrease in error when moving from the training
to the test set suggests that the model is generalizing well and is not overfit. Overfitting typically
manifests as a much higher error on the test set compared to the training set, which is not
observed here. Hence, the model appears to be well-tuned to the underlying pattern in the data
without being overly complex.
Which model performed better and why? Is that what you expected?
In determining the best-performing model between Ridge and LASSO regression, the key
metrics are the RMSE values on both the training and test datasets. Ridge regression, with its
RMSE of 12.52386 on the training set and 13.05759 on the test set, indicates a model that
generalizes well without overfitting. On the other hand, the LASSO regression model has a
slightly higher RMSE on the training set (13.258) but a marginally better RMSE on the test set
(13.01437), suggesting a good balance between bias and variance.
The performance of both models is relatively close: Ridge has the lower training RMSE, while
LASSO has a marginally lower test RMSE (a difference of about 0.04). Such a small gap does not
clearly separate the two on predictive accuracy, but Ridge handles multicollinearity among
predictors while maintaining all variables in the model.
In evaluating Ridge and LASSO regression models, the choice hinges on the dataset's
characteristics and the analysis goals. Ridge regression is preferable when dealing with many
correlated predictors to minimize multicollinearity without excluding variables. LASSO, on the
other hand, is beneficial for feature selection, creating a more parsimonious model by reducing
some coefficients to zero. Given these considerations, the better model depends on whether the
priority is to manage multicollinearity (Ridge) or to emphasize feature selection for a sparse
model (LASSO).
Given the goal to predict graduation rates, Ridge regression's lower training RMSE and nearly
identical test RMSE indicate its effectiveness in this context, particularly given its
capability to handle multicollinearity among predictors. This aligns with the expectation that
Ridge could be more suitable for datasets where predictors are highly correlated. LASSO's
strength in feature selection, while valuable, is more advantageous in scenarios
where reducing model complexity directly translates to better interpretability or when the dataset
includes irrelevant predictors.
Refer to the Intermediate_Analytics_Feature_Selection_R.pdf document for how to
perform stepwise selection and then fit a model. Did this model perform better or as well as
Ridge regression or LASSO? Which method do you prefer and why?
Figure 21: Step-wise selection model
In this analysis, stepwise selection was employed to refine the model by iteratively
adding and removing predictors, keeping each change only when it lowered the
Akaike Information Criterion (AIC). This method narrowed down the predictors to those most
relevant for predicting graduation rates, resulting in a final model that included
Private, Apps, Accept, Top25perc, P.Undergrad, Outstate, Room.Board, Personal, PhD,
Terminal, perc.alumni, and Expend.
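A minimal sketch of AIC-driven stepwise selection follows. It uses base R's step() on the built-in mtcars data as a stand-in, since the report's own stepwise call is not shown in this excerpt; the starting model and direction are assumptions made for illustration.

```r
# Start from the full model and let step() add or drop one term at a time,
# accepting a move only when it lowers the AIC.
full_model <- lm(mpg ~ ., data = mtcars)
step_model <- step(full_model, direction = "both", trace = 0)

formula(step_model)                        # predictors that survived selection
c(full = AIC(full_model), stepwise = AIC(step_model))
```

The selected model can then be passed to predict() with newdata set to a held-out set, and its RMSE computed in the same way as for the regularized models.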
Figure 22: RMSE value for Step-wise selection model
The stepwise model achieved an RMSE of 12.49192 on the training set and 13.14759 on
the test set. When compared to the Ridge regression RMSEs (12.52386 on training and 13.05759
on test) and the LASSO regression RMSEs reported above (13.258 on training and 13.01437 on
test), the stepwise selection model performed competitively, showing slightly better training
performance but slightly worse testing performance than Ridge regression.
Given the results, each method has its merits, but Ridge regression might be preferred in
this context for its ability to handle multicollinearity among predictors. This model ensures that
all variables are included and appropriately shrunk, thus maintaining the integrity and
interpretability of the model without omitting important predictors. Unlike LASSO, which might
eliminate variables entirely, Ridge adjusts the coefficients to minimize overfitting and maintain a
comprehensive model that leverages all available information.
Ridge regression is preferred over stepwise selection because it systematically addresses
multicollinearity by penalizing the size of
coefficients, thus ensuring all predictors contribute to the model without being entirely excluded.
This approach maintains a holistic view of the dataset's structure, unlike stepwise selection,
which can result in model oversimplification by removing variables deemed non-significant.
Ridge's ability to include all variables, applying shrinkage where necessary, offers a more stable
and reliable prediction model, especially when predictors are correlated, making it a robust
choice for predicting outcomes like graduation rates.
Conclusion
In concluding my analysis on predicting graduation rates using the College dataset from
the ISLR package, I embarked on a comprehensive journey that integrated Ridge and LASSO
regression techniques along with stepwise selection. The goal was to utilize these advanced
statistical methods to refine our predictions and gain deeper insights into the factors influencing
graduation rates.
I began by dividing the dataset into training and test sets to validate the models'
performances on unseen data. This foundational step set the stage for a thorough exploration of
regularization techniques, which are crucial for managing multicollinearity and preventing
overfitting. Ridge Regression, with its L2 penalty, proved effective in handling correlated
predictors without excluding any variables, thereby maintaining a holistic view of the dataset's
structure. This approach not only mitigated overfitting but also ensured that the model's
interpretability was preserved.
The LASSO regression, renowned for its feature selection capability, presented a
different angle by introducing sparsity to the model. This method automatically reduced the
coefficients of less critical features to zero, thus focusing on the most influential predictors. This
aspect of LASSO was particularly enlightening, as it highlighted the importance of
distinguishing between essential and superfluous variables in predictive modeling.
Through the application of stepwise selection, I further refined the model by iteratively
selecting significant predictors based on the Akaike Information Criterion (AIC). This process
underscored the value of a targeted approach in model building, where each variable's
contribution to the predictive power is meticulously evaluated.
The comparison of Ridge and LASSO regression models, along with stepwise selection,
offered valuable lessons in model selection and optimization. While each method has its merits
and their test errors were close, Ridge regression was the preferred choice in this context
because of its ability to handle multicollinearity and maintain all predictors in the model.
This finding aligned with my initial hypothesis and underscored the importance of choosing the
right modeling technique based on the dataset's characteristics and the analysis goals.
From a statistical and data analysis perspective, this assignment illuminated the intricate
balance between model complexity, predictive accuracy, and generalizability. The exploration of
regularization techniques and stepwise selection enriched my understanding of statistical
modeling, demonstrating that there is no one-size-fits-all solution. Instead, effective predictive
modeling requires a nuanced approach, considering the dataset's specifics and the underlying
relationships among predictors.
In summary, this analysis has been a profound learning experience, deepening my
appreciation for statistical methods' power and complexity in predictive modeling. By applying
Ridge and LASSO regression alongside stepwise selection, I have gained insights into the art and
science of statistical analysis, emerging with a more nuanced understanding of how to approach
predictive modeling challenges. This journey has not only enhanced my analytical skills but also
reinforced the critical importance of methodical model selection and optimization in uncovering
meaningful insights from data.
References
1. Bluman, A. (2018). Elementary Statistics: A Step by Step Approach (10th ed.). McGraw
Hill. ISBN 13: 978-1-259-755330.
2. Kabacoff, R. (2011). R in Action (2nd ed.). Manning Publications Co. ISBN
978-1-935-182399.
3. Goodwin, M. (2024). Module 4. Canvas.
https://northeastern.instructure.com/courses/164840/modules
4. Goodwin, M. (2024). Module 3 Pre-Assignment Lab. Canvas.
https://northeastern.instructure.com/courses/164840/assignments/2098519
5. Jain, A. (2024). A Complete Tutorial on Ridge and Lasso Regression in Python.
Analytics Vidhya.
https://www.analyticsvidhya.com/blog/2016/01/ridge-lasso-regression-python-complete-tutorial/
6. Regularization Tutorial: Ridge, Lasso & Elastic Net Regression. (2019). DataCamp.
https://www.datacamp.com/tutorial/tutorial-ridge-lasso-elastic-net
7. Bhattacharyya, S. (2018). Ridge and Lasso Regression: A Complete Guide with Python
Scikit-Learn. Medium; Towards Data Science.
https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b
8. Shah, R. (2021). Comparision of Regularized and Unregularized Models. Analytics
Vidhya.
https://www.analyticsvidhya.com/blog/2021/08/performance-comparision-of-regularized-and-unregularized-regression-models/
Appendix
1. R Script
# Muhammad Umer Mirza, 29/01/2024, ALY 6015
# Module 4 Assignment — Regularization
##############################################################################
# Clear the Console
##############################################################################
cat("\014") # Clears the console
rm(list = ls()) # Clears the global environment
try(dev.off(dev.list()["RStudioGD"]), silent = TRUE) # Clears plots
try(p_unload(p_loaded(), character.only = TRUE), silent = TRUE) # Clears packages
options(scipen = 100) # Disables scientific notation for the entire R session
##############################################################################
# Load Necessary Packages
##############################################################################
library(pacman)
p_load(dplyr, tidyverse, janitor, lubridate, ggplot2, ggthemes, ggeasy, psych,
knitr, kableExtra, corrplot, RColorBrewer, car, MASS, leaps, caret,
gridExtra, pROC, ISLR, glmnet, Metrics)
##############################################################################
# Load Data from ISLR package
##############################################################################
college <- College # save data in specified variable
##############################################################################
# 1) Split the data into a train and test set
##############################################################################
# Split data based on feature selection methods
set.seed(123) # set seed for duplication, number is arbitrary
# use sample function to draw 70% of the row indices (70/30 split)
train_index <- sample(x = nrow(college), size = nrow(college) * 0.7)
train <- college[train_index,] # only give values that are in train index
test <- college[-train_index,] # remove train index values to create test set
# Build numeric predictor matrices for glmnet: the response is left out and [,-1] drops the intercept column
train_x <- model.matrix(Grad.Rate ~., train)[,-1]
test_x <- model.matrix(Grad.Rate ~., test)[,-1]
# Assign Grad.Rate to train and test variables
train_y <- train$Grad.Rate
test_y <- test$Grad.Rate
##############################################################################
# Ridge Regression
# 2) Use the cv.glmnet function to estimate the lambda.min and lambda.1se values.
# Compare and discuss the values.
##############################################################################
# Find the best lambda using cross-validation
set.seed(123) # because cv.glmnet uses randomization
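# 10-fold cross-validation: train on 9 folds, hold out 1, rotating through all folds
# alpha = 0 selects the ridge (L2) penalty; cv.glmnet otherwise defaults to alpha = 1 (LASSO)
cv_ridge <- cv.glmnet(train_x, train_y, alpha = 0, nfolds = 10)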
# Optimal value of lambda, minimizes the prediction error
log(cv_ridge$lambda.min) # minimizes out of sample loss
log(cv_ridge$lambda.1se) # largest value of lambda within 1 SE of lambda min.
##############################################################################
# 3) Plot the results from the cv.glmnet function provide an interpretation.
# What does this plot tell us?
##############################################################################
plot(cv_ridge)
# y axis ~ mean of squared error
# x axis ~ log of lambda
# top ~ non zero coefficients in the model for particular value of lambda
# red dots are the mean cross-validated error (the loss metric); bars show its spread across folds
# two vertical dotted lines:
# line to left marks lambda.min, the lambda with the lowest mean CV error
# line to right marks lambda.1se, the largest lambda within 1 SE of the minimum:
# the simplest model that performs about as well as the best model
# plot verifies lambda.min and lambda.1se
##############################################################################
# 4) Fit a Ridge regression model against the training set and report on the coefficients.
# Is there anything interesting?
##############################################################################
# Fit the final model on the training data using lambda.min
# alpha = 0 for Ridge (L2)
ridge_min_model <- glmnet(train_x, train_y, alpha = 0, lambda = cv_ridge$lambda.min)
ridge_min_model
# Display coefficients
coef(ridge_min_model)
# Fit the final model on the training data using lambda.1se
ridge_1se_model <- glmnet(train_x, train_y, alpha = 0, lambda = cv_ridge$lambda.1se)
ridge_1se_model
# Display coefficients
coef(ridge_1se_model)
# Display coefficients of OLS model with no regularization
ols_model <- lm(Grad.Rate ~., data = train)
coef(ols_model)
# coefficients in 1se tend to be smaller
# but coefficients in full model close to lambda.min
# View RMSE of full model
preds_ols_model <- predict(ols_model, newdata = test)
preds_ols_model
# Compute RMSE to see how well the model is performing
rmse(test$Grad.Rate, preds_ols_model)
##############################################################################
# 5) Determine the performance of the fit model against the training set by calculating the root
# mean square error (RMSE).
##############################################################################
preds_train <- predict(ridge_1se_model, newx = train_x)
train_rmse <- rmse(train_y, preds_train)
train_rmse
##############################################################################
# 6) Determine the performance of the fit model against the test set by calculating the root mean
# square error (RMSE).
# Is your model overfit?
##############################################################################
preds_test <- predict(ridge_1se_model, newx = test_x)
test_rmse <- rmse(test_y, preds_test)
test_rmse
##############################################################################
# LASSO
# 7) Use the cv.glmnet function to estimate the lambda.min and lambda.1se values.
# Compare and discuss the values.
##############################################################################
set.seed(123) # because cv.glmnet uses randomization
# 10-fold cross-validation: train on 9 folds, hold out 1, rotating through all folds
cv_lasso <- cv.glmnet(train_x, train_y, nfolds = 10, alpha = 1)
# Optimal value of lambda, minimizes the prediction error
log(cv_lasso$lambda.min) # minimizes out of sample loss
log(cv_lasso$lambda.1se) # largest value of lambda within 1 SE of lambda min.
##############################################################################
# 8) Plot the results from the cv.glmnet function provide an interpretation.
# What does this plot tell us?
##############################################################################
plot(cv_lasso)
##############################################################################
# 9) Fit a LASSO regression model against the training set and report on the coefficients.
# Do any coefficients reduce to zero? If so, which ones?
##############################################################################
# Fit the final model on the training data using lambda.min
# alpha = 1 for LASSO
lasso_min_model <- glmnet(train_x, train_y, alpha = 1, lambda = cv_lasso$lambda.min)
lasso_min_model
# Display coefficients
coef(lasso_min_model)
# Fit the final model on the training data using lambda.1se
lasso_1se_model <- glmnet(train_x, train_y, alpha = 1, lambda = cv_lasso$lambda.1se)
lasso_1se_model
# Display coefficients
coef(lasso_1se_model)
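# A minimal sketch (not part of the original script) for answering question 9:
# list the terms whose LASSO coefficient is exactly zero. The helper name
# zeroed_terms is hypothetical; it assumes coef() returns a one-column
# (possibly sparse) matrix with term names as row names, as glmnet does.
zeroed_terms <- function(coefs) {
  m <- as.matrix(coefs)       # densify a sparse coefficient matrix
  rownames(m)[m[, 1] == 0]    # names of the terms shrunk exactly to zero
}
# Usage with the models fitted above (skipped if those objects do not exist)
if (exists("lasso_min_model")) print(zeroed_terms(coef(lasso_min_model)))
if (exists("lasso_1se_model")) print(zeroed_terms(coef(lasso_1se_model)))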
# Display coefficients of OLS model with no regularization
ols_model_2 <- lm(Grad.Rate ~., data = train)
coef(ols_model_2)
# Coefficients under lambda.1se tend to be smaller (more shrinkage),
# while the full OLS coefficients are close to those under lambda.min
# View RMSE of full model
preds_ols_model_2 <- predict(ols_model_2, newdata = test)
preds_ols_model_2
# Compute RMSE to see how well the model performs on the test set
rmse(test$Grad.Rate, preds_ols_model_2)
##############################################################################
# 10) Determine the performance of the fit model against the training set by calculating the root
# mean square error (RMSE).
##############################################################################
preds_train_2 <- predict(lasso_1se_model, newx = train_x)
train_rmse_2 <- rmse(train_y, preds_train_2)
train_rmse_2
##############################################################################
# 11) Determine the performance of the fit model against the test set by calculating the root mean
# square error (RMSE).
# Is your model overfit?
##############################################################################
preds_test_2 <- predict(lasso_1se_model, newx = test_x)
test_rmse_2 <- rmse(test_y, preds_test_2)
test_rmse_2
##############################################################################
# Comparison
# 12) Which model performed better and why? Is that what you expected?
##############################################################################
# Ridge shrinks coefficients to reduce overfitting but keeps every predictor, so it is not suited to feature selection
# LASSO can shrink coefficients exactly to zero, which makes it useful for feature selection
# Ridge RMSE < LASSO RMSE here, so Ridge is the preferred model
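# A minimal sketch (not part of the original script) that gathers the RMSE
# values computed above into one table for the comparison. The helper name
# rmse_table and its gap column are illustrative assumptions; a test RMSE far
# above the training RMSE is only an informal sign of overfitting.
rmse_table <- function(train, test) {
  data.frame(model = names(train),
             train_rmse = unname(train),
             test_rmse = unname(test),
             gap = unname(test - train))   # test minus training RMSE
}
# Usage with the values computed above (skipped if any of them is missing)
if (all(sapply(c("train_rmse", "test_rmse", "train_rmse_2", "test_rmse_2"), exists))) {
  print(rmse_table(c(ridge_1se = train_rmse, lasso_1se = train_rmse_2),
                   c(ridge_1se = test_rmse, lasso_1se = test_rmse_2)))
}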
##############################################################################
# 13) Refer to the Intermediate_Analytics_Feature_Selection_R.pdf document for how to
# perform stepwise selection and then fit a model.
# Did this model perform better or as well as Ridge regression or LASSO?
# Which method do you prefer and why?
##############################################################################
# Apply stepwise selection on training data only
fit <- step(lm(Grad.Rate ~ ., data = train), direction = 'both')
summary(fit)
# Evaluate model performance
# Predictions on Training Data
prediction_train <- predict(fit, newdata = train)
# Calculate RMSE for Training
rmse_train <- rmse(actual = train$Grad.Rate, predicted = prediction_train)
rmse_train
# Predictions on Test Data
prediction_test <- predict(fit, newdata = test)
# Calculate RMSE for Test
rmse_test <- rmse(actual = test$Grad.Rate, predicted = prediction_test)
rmse_train
rmse_test
##############################################################################
# END OF ANALYSIS
##############################################################################