Regularization 2

Cross-Validation (CV)

How can we evaluate the model on new data? We can mimic an out-of-sample (OOS) experiment to select the best model using cross-validation. Split the dataset into K evenly sized folds and, for each fold, repeat the following steps:

a. Use the other K-1 folds as the training dataset and fit the model.
b. Hold out the remaining fold as the out-of-sample (OOS) set for evaluation.

The model is refit K times in total, once per fold.

[Figure: a dataset split into 5 equal bins, with each bin serving in turn as the OOS set]
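To make the fold split concrete, here is a tiny base-R sketch (the sample size n = 100 is hypothetical):

n <- 100                                   # hypothetical sample size
fold <- sample(rep(1:5, length.out = n))   # shuffled fold labels, K = 5
table(fold)                                # five roughly equal bins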
Cross-Validation Algorithm

ALGORITHM: K-fold Cross-Validation. Given a dataset of n observations and M candidate models (or algorithms):

1. Split the data into K roughly evenly sized, nonoverlapping random subsets (folds).
2. For k = 1 ... K:
   a. Fit the parameters for each candidate model/algorithm using all but the kth fold of data.
   b. Record the deviance (or, equivalently, R²) on the left-out kth fold based on predictions from each model.

This yields a set of K OOS deviances for each of your candidate models. This sample is an estimate of the distribution of each model's predictive performance on new data, and you can select the model with the best OOS performance.
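As a minimal sketch of this algorithm for a single candidate model, assuming a data frame df with response column y (both hypothetical names) and using squared error as the Gaussian deviance:

kfold_cv_dev <- function(df, K = 5) {
  fold <- sample(rep(1:K, length.out = nrow(df)))  # random fold labels
  sapply(1:K, function(k) {
    fit  <- lm(y ~ ., data = df[fold != k, ])      # train on all but fold k
    pred <- predict(fit, newdata = df[fold == k, ])
    sum((df$y[fold == k] - pred)^2)                # OOS deviance on fold k
  })
}

# Example on simulated data: returns K = 5 OOS deviances for lm()
df <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
df$y <- 1 + 2*df$x1 - df$x2 + rnorm(200)
kfold_cv_dev(df)

Running this for each of the M candidate models gives the K-vector of OOS deviances that the selection step compares.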
CV for Lasso

Rather than relying on information criteria (IC), we can run an actual OOS experiment. For Lasso paths, you want to design a CV experiment that evaluates the OOS predictive performance of the different λt penalty values. To run the CV algorithm:

1. Fit the Lasso path on the full dataset.
2. Run a CV experiment: split your data into K folds and apply the λt penalties in Lasso estimation on the training data excluding each fold. Record OOS deviances for prediction on each left-out fold.
3. Select the λt with the "best" OOS performance. Your selected model is defined by the corresponding coefficients obtained through Lasso estimation on the full dataset with penalty λt.
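The cv.gamlr function used later in this deck automates exactly this loop. For intuition, here is a hand-rolled sketch of the same experiment; it uses glmnet rather than gamlr only because glmnet accepts an explicit lambda grid, and x and y stand in for a model matrix and response:

library(glmnet)
full <- glmnet(x, y)                      # 1. Lasso path on the full dataset
grid <- full$lambda                       # the lambda_t values to evaluate
K <- 5
fold <- sample(rep(1:K, length.out = length(y)))
oos <- matrix(NA, K, length(grid))        # K OOS deviances per lambda_t
for (k in 1:K) {
  fitk <- glmnet(x[fold != k, ], y[fold != k], lambda = grid)  # 2. fit without fold k
  pred <- predict(fitk, x[fold == k, ])   # predictions at every lambda_t
  oos[k, ] <- colSums((y[fold == k] - pred)^2)  # SSE = Gaussian deviance
}
best <- grid[which.min(colMeans(oos))]    # 3. lambda_t with best average OOS
coef(full, s = best)                      # coefficients from the full-data fit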
CV for Lasso: How many folds?

A common question around CV is "How do I choose K?" More folds reduce Monte Carlo variation, which we want. However, using too many folds:

• gets computationally very expensive, and
• gives bad results (for anything approaching K = n) if there is even a tiny amount of dependence between your observations.

Smaller values of K lead to CV that is more robust to this type of mis-specification.
CV for Lasso: How many folds?

If you run your CV experiment and the uncertainty around the average OOS deviance is larger than you want, you can re-run the experiment with more folds, as sketched below. However, if adding a small number of folds doesn't significantly reduce the uncertainty, then you are probably better off using the AICc for model selection.
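A hedged example of such a re-run: cv.gamlr's nfold argument (default 5) controls K, and x and y again stand in for your model matrix and response:

cvfit10 <- cv.gamlr(x, y, nfold = 10)  # same experiment with K = 10 folds
plot(cvfit10)                          # check whether the error bars tightened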
Example: Ames Housing

We apply naref to the ames data, with impute=TRUE, to obtain amesImputed, a data frame that contains no missing values.

> ames <- read.csv("https://raw.githubusercontent.com/leslieahendrix/MBAbook/main/3regularization/AmesHousing.csv", strings=TRUE)
> library(gamlr)
Loading required package: Matrix
> amesImputed <- naref(ames, impute=TRUE)
> sum(is.na(amesImputed))
Example: Ames Housing

Now we're ready to build the model matrix. First, we want to model log sale price.

> yAmes <- log(ames$SalePrice)

The next step is to create the sparse numeric model matrix, leaving out the column for the intercept (since gamlr creates its own).

> ycol <- which(names(amesImputed)=="SalePrice")
> xAmes <- sparse.model.matrix( ~ ., data=amesImputed[,-ycol])[,-1]
> dim(xAmes)
[1] 2930  339
CV for Lasso

Once again, this is all easiest to understand visually. The gamlr library provides the cv.gamlr function to run CV experiments for Lasso paths, using the same syntax as gamlr. Setting verb=TRUE prints additional information about the progress of the cross-validation.

> set.seed(0)
> cvfitAmes <- cv.gamlr(xAmes, yAmes, verb=TRUE, lmr=1e-4)
fold 1,2,3,4,5,done.
> plot(cvfitAmes)
CV for Lasso: Plot

Just like the path plot, the CV plot has λ on the x-axis and the degrees of freedom (number of nonzero coefficients) along the top.

• The average OOS deviances are marked with blue dots.
• The error bars extend one standard error on each side of these estimates of the expected OOS deviance.
• If the error bounds are too large, use more folds (the default for cv.gamlr is 5).

Our error bars appear small for this example.
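If you want to mark the two selected penalties on the plot yourself, one possibility is the sketch below (assuming, as in gamlr's other plots, that the x-axis is log λ; lambda.min and lambda.1se are fields of the cv.gamlr object):

plot(cvfitAmes)
abline(v = log(cvfitAmes$lambda.min), lty = 2)  # CV-min selection
abline(v = log(cvfitAmes$lambda.1se), lty = 3)  # CV-1se selection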
CV for Lasso: How to select the optimal λt?

There are two common options for selecting the optimal λt: the CV-min rule and the CV-1se rule.

• The CV-min rule selects the λt corresponding to the smallest average OOS deviance. It is the best choice if you are focused on OOS predictive performance, and for most applications we recommend using it.
• The CV-1se rule selects the biggest λt with average OOS deviance no more than one standard error away from the minimum. The 1se rule is more conservative: it hedges toward a simpler model. Use it if you have a heightened worry about accidentally including useless coefficients in your model.
• The CV-1se rule is the default in cv.gamlr (and in cv.glmnet), but we will often specify that we want CV-min selection instead.
CV for Lasso

> cvfitAmes$cvm   # means: blue dots on plot
  [1] 0.16532834 0.14703130 0.13104589 0.11777469
...
 [97] 0.02118427 0.02121335 0.02122546 0.02123655
> cvfitAmes$cvs   # standard errors: grey bars on plot
  [1] 0.004943823 0.004566041 0.003893101 0.003353137
...
 [97] 0.004669086 0.004688207 0.004690687 0.004694711
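As a sanity check, the two selection rules can be recomputed by hand from these vectors. This sketch assumes the λ grid is stored, in decreasing order, in cvfitAmes$gamlr$lambda (the full-data gamlr fit kept inside the CV object):

segMin <- which.min(cvfitAmes$cvm)                  # CV-min segment
thresh <- cvfitAmes$cvm[segMin] + cvfitAmes$cvs[segMin]
seg1se <- min(which(cvfitAmes$cvm <= thresh))       # biggest lambda within 1 SE
c(segMin, seg1se)                                   # should match seg.min, seg.1se
log(cvfitAmes$gamlr$lambda[c(segMin, seg1se)])      # the corresponding log penalties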
CV for Lasso: How to select the optimal λt?

The λ penalty selected by the CV-min rule:

> cvfitAmes$seg.min
[1] 52
> log(cvfitAmes$lambda.min)  # log lambda under the CV-min rule
[1] -5.833983
CV for Lasso: How to select the optimal λt?

The λ penalty selected by the CV-1se rule:

> cvfitAmes$seg.1se
[1] 33
> log(cvfitAmes$lambda.1se)  # log lambda under the CV-1se rule
[1] -4.066342
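To compare the two fitted models directly, coef for cv.gamlr objects accepts the same select argument as the predict calls shown on the next slide:

bMin <- coef(cvfitAmes, select = "min")          # CV-min coefficients
b1se <- coef(cvfitAmes, select = "1se")          # CV-1se coefficients
c(min = sum(bMin != 0), se1 = sum(b1se != 0))    # compare model sizes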
CV for Lasso: Predictions

You can pass select="min" or select="1se" to predict functions to access predictions corresponding to the models selected under each rule. Use the drop function to remove the sparse formatting. First, a prediction using the CV-min rule.

> drop(predict(cvfitAmes, xAmes[c(1,100),], select="min"))
       1      100
12.22848 12.36305
> exp(drop(predict(cvfitAmes, xAmes[c(1,100),], select="min")))
       1      100
204531.5 233994.2
CV for Lasso: Predictions

Prediction using the CV-1se rule.

> drop(predict(cvfitAmes, xAmes[c(1,100),], select="1se"))
       1      100
12.14025 12.26284
> exp(drop(predict(cvfitAmes, xAmes[c(1,100),], select="1se")))
       1      100
187259.7 211681.7