Regularization 2

Cross-Validation (CV)

How can we evaluate the model on new data? We can mimic an out-of-sample (OOS) experiment to select the best model using cross-validation. Split the dataset into K evenly sized folds and, for each fold, repeat the following steps:

a. Use the other K-1 folds as the training dataset and fit the model.
b. Hold out the remaining fold as the out-of-sample (OOS) set for evaluation.

The model is refit K times in total, once per fold.

[Figure: a dataset split into 5 equal bins, with each bin serving in turn as the OOS set]
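To make the fold split concrete, here is a tiny base-R sketch (the sample size n = 100 is hypothetical):

n <- 100                                   # hypothetical sample size
fold <- sample(rep(1:5, length.out = n))   # shuffled fold labels, K = 5
table(fold)                                # five roughly equal bins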
Cross-Validation Algorithm

ALGORITHM: K-fold Cross-Validation. Given a dataset of n observations and M candidate models (or algorithms):

1. Split the data into K roughly evenly sized, nonoverlapping random subsets (folds).
2. For k = 1 ... K:
   a. Fit the parameters for each candidate model/algorithm using all but the kth fold of data.
   b. Record the deviance (or, equivalently, R²) on the left-out kth fold based on predictions from each model.

This yields a set of K OOS deviances for each of your candidate models. This sample is an estimate of the distribution of each model's predictive performance on new data, and you can select the model with the best OOS performance.
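As a minimal sketch of this algorithm for a single candidate model, assuming a data frame df with response column y (both hypothetical names) and using squared error as the Gaussian deviance:

kfold_cv_dev <- function(df, K = 5) {
  fold <- sample(rep(1:K, length.out = nrow(df)))  # random fold labels
  sapply(1:K, function(k) {
    fit  <- lm(y ~ ., data = df[fold != k, ])      # train on all but fold k
    pred <- predict(fit, newdata = df[fold == k, ])
    sum((df$y[fold == k] - pred)^2)                # OOS deviance on fold k
  })
}

# Example on simulated data: returns K = 5 OOS deviances for lm()
df <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
df$y <- 1 + 2*df$x1 - df$x2 + rnorm(200)
kfold_cv_dev(df)

Running this for each of the M candidate models gives the K-vector of OOS deviances that the selection step compares.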
CV for Lasso

Rather than relying on information criteria (IC), we can run an actual OOS experiment. For Lasso paths, you want to design a CV experiment that evaluates the OOS predictive performance of the different λt penalty values. To run the CV algorithm:

1. Fit the Lasso path on the full dataset.
2. Run a CV experiment: split your data into K folds and apply the λt penalties in Lasso estimation on the training data excluding each fold. Record OOS deviances for prediction on each left-out fold.
3. Select the λt with the "best" OOS performance. Your selected model is defined by the corresponding coefficients obtained through Lasso estimation on the full dataset with penalty λt.
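The cv.gamlr function used later in this deck automates exactly this loop. For intuition, here is a hand-rolled sketch of the same experiment; it uses glmnet rather than gamlr only because glmnet accepts an explicit lambda grid, and x and y stand in for a model matrix and response:

library(glmnet)
full <- glmnet(x, y)                      # 1. Lasso path on the full dataset
grid <- full$lambda                       # the lambda_t values to evaluate
K <- 5
fold <- sample(rep(1:K, length.out = length(y)))
oos <- matrix(NA, K, length(grid))        # K OOS deviances per lambda_t
for (k in 1:K) {
  fitk <- glmnet(x[fold != k, ], y[fold != k], lambda = grid)  # 2. fit without fold k
  pred <- predict(fitk, x[fold == k, ])   # predictions at every lambda_t
  oos[k, ] <- colSums((y[fold == k] - pred)^2)  # SSE = Gaussian deviance
}
best <- grid[which.min(colMeans(oos))]    # 3. lambda_t with best average OOS
coef(full, s = best)                      # coefficients from the full-data fit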
CV for Lasso: How many folds?

A common question around CV is "How do I choose K?" More folds reduce Monte Carlo variation, which we want. However, using too many folds:

• gets computationally very expensive, and
• gives bad results (for anything approaching K = n) if there is even a tiny amount of dependence between your observations.

Smaller values of K lead to CV that is more robust to this type of mis-specification.
CV for Lasso: How many folds?

If you run your CV experiment and the uncertainty around the average OOS deviance is larger than you want, you can re-run the experiment with more folds, as sketched below. However, if adding a small number of folds doesn't significantly reduce the uncertainty, then you are probably better off using the AICc for model selection.
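A hedged example of such a re-run: cv.gamlr's nfold argument (default 5) controls K, and x and y again stand in for your model matrix and response:

cvfit10 <- cv.gamlr(x, y, nfold = 10)  # same experiment with K = 10 folds
plot(cvfit10)                          # check whether the error bars tightened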
Example: Ames Housing

We apply naref to the ames data, with impute=TRUE, to obtain amesImputed, a data frame that contains no missing values.

> ames <- read.csv("https://raw.githubusercontent.com/leslieahendrix/MBAbook/main/3regularization/AmesHousing.csv", strings=TRUE)
> library(gamlr)
Loading required package: Matrix
> amesImputed <- naref(ames, impute=TRUE)
> sum(is.na(amesImputed))
Example: Ames Housing

Now we're ready to build the model matrix. First, we want to model log sale price.

> yAmes <- log(ames$SalePrice)

The next step is to create the sparse numeric model matrix, leaving out the column for the intercept (since gamlr creates its own).

> ycol <- which(names(amesImputed)=="SalePrice")
> xAmes <- sparse.model.matrix( ~ ., data=amesImputed[,-ycol])[,-1]
> dim(xAmes)
[1] 2930  339
CV for Lasso

Once again, this is all easiest to understand visually. The gamlr library provides the cv.gamlr function to run CV experiments for Lasso paths, using the same syntax as gamlr. Setting verb=TRUE prints additional information about the progress of the cross-validation.

> set.seed(0)
> cvfitAmes <- cv.gamlr(xAmes, yAmes, verb=TRUE, lmr=1e-4)
fold 1,2,3,4,5,done.
> plot(cvfitAmes)
CV for Lasso: Plot

Just like the path plot, the CV plot has λ on the x-axis and the degrees of freedom (number of nonzero coefficients) along the top.

• The average OOS deviances are marked with blue dots.
• The error bars extend one standard error on each side of these estimates of the expected OOS deviance.
• If the error bounds are too large, use more folds (the default for cv.gamlr is 5).

Our error bars appear small for this example.
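If you want to mark the two selected penalties on the plot yourself, one possibility is the sketch below (assuming, as in gamlr's other plots, that the x-axis is log λ; lambda.min and lambda.1se are fields of the cv.gamlr object):

plot(cvfitAmes)
abline(v = log(cvfitAmes$lambda.min), lty = 2)  # CV-min selection
abline(v = log(cvfitAmes$lambda.1se), lty = 3)  # CV-1se selection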
CV for Lasso: How to select the optimal λt?

There are two common options for selecting the optimal λt: the CV-min rule and the CV-1se rule.

• The CV-min rule selects the λt corresponding to the smallest average OOS deviance. It is the best choice if you are focused on OOS predictive performance, and for most applications we recommend using it.
• The CV-1se rule selects the biggest λt with average OOS deviance no more than one standard error away from the minimum. The 1se rule is more conservative: it hedges toward a simpler model. Use it if you have a heightened worry about accidentally including useless coefficients in your model.
• The CV-1se rule is the default in cv.gamlr (and in cv.glmnet), but we will often specify that we want CV-min selection instead.
CV for Lasso

> cvfitAmes$cvm   # means: blue dots on plot
  [1] 0.16532834 0.14703130 0.13104589 0.11777469
...
 [97] 0.02118427 0.02121335 0.02122546 0.02123655
> cvfitAmes$cvs   # standard errors: grey bars on plot
  [1] 0.004943823 0.004566041 0.003893101 0.003353137
...
 [97] 0.004669086 0.004688207 0.004690687 0.004694711
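As a sanity check, the two selection rules can be recomputed by hand from these vectors. This sketch assumes the λ grid is stored, in decreasing order, in cvfitAmes$gamlr$lambda (the full-data gamlr fit kept inside the CV object):

segMin <- which.min(cvfitAmes$cvm)                  # CV-min segment
thresh <- cvfitAmes$cvm[segMin] + cvfitAmes$cvs[segMin]
seg1se <- min(which(cvfitAmes$cvm <= thresh))       # biggest lambda within 1 SE
c(segMin, seg1se)                                   # should match seg.min, seg.1se
log(cvfitAmes$gamlr$lambda[c(segMin, seg1se)])      # the corresponding log penalties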
CV for Lasso: How to select the optimal λt?

The λ penalty selected by the CV-min rule:

> cvfitAmes$seg.min
[1] 52
> log(cvfitAmes$lambda.min)  # log lambda under the CV-min rule
[1] -5.833983
CV for Lasso: How to select the optimal λt?

The λ penalty selected by the CV-1se rule:

> cvfitAmes$seg.1se
[1] 33
> log(cvfitAmes$lambda.1se)  # log lambda under the CV-1se rule
[1] -4.066342
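To compare the two fitted models directly, coef for cv.gamlr objects accepts the same select argument as the predict calls shown on the next slide:

bMin <- coef(cvfitAmes, select = "min")          # CV-min coefficients
b1se <- coef(cvfitAmes, select = "1se")          # CV-1se coefficients
c(min = sum(bMin != 0), se1 = sum(b1se != 0))    # compare model sizes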
CV for Lasso: Predictions

You can pass select="min" or select="1se" to predict functions to access predictions corresponding to the models selected under each rule. Use the drop function to remove the sparse formatting. First, a prediction using the CV-min rule.

> drop(predict(cvfitAmes, xAmes[c(1,100),], select="min"))
       1      100
12.22848 12.36305
> exp(drop(predict(cvfitAmes, xAmes[c(1,100),], select="min")))
       1      100
204531.5 233994.2
CV for Lasso: Predictions

Prediction using the CV-1se rule.

> drop(predict(cvfitAmes, xAmes[c(1,100),], select="1se"))
       1      100
12.14025 12.26284
> exp(drop(predict(cvfitAmes, xAmes[c(1,100),], select="1se")))
       1      100
187259.7 211681.7