424_F2023_Prediction_Competition_2_v1 (2)

pdf

School

University of Waterloo *

*We aren’t endorsed by this school

Course

424

Subject

Economics

Date

Jan 9, 2024

Type

pdf

Pages

Uploaded by nafisreza7

Econ 626. ML for Economists Prediction Competition 2: Cross-Validation, Regression Trees, and Variable Importance September 20, 2023 Answers are due on Thursday Sept 28, 6pm. • Your submission must consist of two parts: CSV file and PDF file. • The top of first page of PDF must include (1) Anonymized name; (2) the R 2 and MSE for the training set, (3) the answer for Q2 as calculated from the training data, (4) Screenshot of an example from ChatGPT/GPT4 interaction. The rest of Submission 1 must include code for Q1 and Q2 answers. • The CSV file must include one prediction for every observation in the test set, and nothing else (e.g. no variable names or other headers). Hence, given that the test set has 10,000 observations, the CSV file must have 10,000 lines (not a line less, not a line more). Best PDF answers will be distributed to class. Students whose answer is selected will receive 10 percentage point bonus. You can use any programming language/statistical software package. Collaboration is encouraged but everyone must run their own code and write up their own answers. Q1. [8 points] This question is a prediction competition. Use the regression tree algorithms to build a prediction model for predicting housing values. Please do not use bagging, boosting or random forests (we will study those later). Your algorithm will be evaluated based on the its predictive performance in a test set that will only be revealed after predictions have been submitted. Accuracy is measured by MSE. (Please also report R 2 ) Timeline: • The training data set is posted on Learn. This data set has 20,000 observations on housing units (a subset of the Mullanaithan and Spiess sample). In addition to the logarithm of housing values, there are 13 other variables that you can use as model features or to build model features. . The size of the data set may make your estimation very slow. Machine learning is al- ways about making compromises because of limited computing resources. So please be prepared to potentially drop variables or observations when you estimate the model. 1

• Data on the test set is also posted on Learn,, but without observations on the response variable. Model accuracy is evaluated based on MSE between your predictions for these test set observations and the actual values of the response variable in the test set. The TA will calculate this MSE based on the predictions in your CSV file and the actual values of the response variable in the test set (which have not been distributed to students). Grade will be an increasing function of the algorithm’s performance in the test data set. To receive credit for a submission, the code must be reported clearly enough to enable replication. Please include a brief explanation of each step so that the reader can follow the logic of code easily. Notes on the data sets: The training and test data sets are comma separated. In the training set, the response variable is the first variable (logarithm of housing value). You can include any of the other variables as features in your model. Final note: you have to produce a prediction for all 10,000 observations in the test set. Hence, you cannot skip observations for which values of some features are missing. This means that when estimating the model using training data, you want to make sure the model gives a prediction for all 20,000 observations in the training data – including those observations for which values of some features are missing. Q2. [1 point, challenging] Construct a graphical illustration of the relative importance of different features in your final model. The importance is measured based on each variable’s importance in predicting housing values in the training data . Hint: You are trying to do replicate ISLR Figure 8.9 for this data set. Include this FIGURE in your submission, immediately after your report on the accuracy of your model in training dataa Q3. [1 point] Ask ChatGPT/GPT4 for help on how to best answer Q1 and Q2. Report a screenshot of the most useful/interesting interaction. 2

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

Related Documents

Learning Journal Unit 1.pdf

Written Assignment Unit 3.docx

Chapter 1&2 Activity Questions.docx

Week Two Homework.docx

Problems_for_NBXS.docx

Policy Brief- Surnish Johal (169024497) - PO101.docx

424_F2023_ProblemSet1_v1 (1) (1).pdf

ECO 201 Project Template.docx

lab10soln.pdf

Discussion 2.docx

eco364april2021examsolutions.pdf

REC 101 Position Paper.docx

Recommended textbooks for you

Managerial Economics: Applications, Strategies an...

Economics

ISBN:9781305506381

Author:James R. McGuigan, R. Charles Moyer, Frederick H.deB. Harris

Publisher:Cengage Learning

Managerial Economics: A Problem Solving Approach

Economics

ISBN:9781337106665

Author:Luke M. Froeb, Brian T. McCann, Michael R. Ward, Mike Shor

Publisher:Cengage Learning

Economics Today and Tomorrow, Student Edition

Economics

ISBN:9780078747663

Author:McGraw-Hill

Publisher:Glencoe/McGraw-Hill School Pub Co

Microeconomic Theory

Economics

ISBN:9781337517942

Author:NICHOLSON

Publisher:Cengage

EBK HEALTH ECONOMICS AND POLICY

Economics

ISBN:9781337668279

Author:Henderson

Publisher:YUZU

MACROECONOMICS FOR TODAY

Economics

ISBN:9781337613057

Author:Tucker

Publisher:CENGAGE L

SEE MORE TEXTBOOKS

Recommended textbooks for you

Managerial Economics: Applications, Strategies an...
Economics
ISBN:9781305506381
Author:James R. McGuigan, R. Charles Moyer, Frederick H.deB. Harris
Publisher:Cengage Learning
Managerial Economics: A Problem Solving Approach
Economics
ISBN:9781337106665
Author:Luke M. Froeb, Brian T. McCann, Michael R. Ward, Mike Shor
Publisher:Cengage Learning
Economics Today and Tomorrow, Student Edition
Economics
ISBN:9780078747663
Author:McGraw-Hill
Publisher:Glencoe/McGraw-Hill School Pub Co
Microeconomic Theory
Economics
ISBN:9781337517942
Author:NICHOLSON
Publisher:Cengage
EBK HEALTH ECONOMICS AND POLICY
Economics
ISBN:9781337668279
Author:Henderson
Publisher:YUZU
MACROECONOMICS FOR TODAY
Economics
ISBN:9781337613057
Author:Tucker
Publisher:CENGAGE L