HW1

School

McGill University

Course

416

Subject

Industrial Engineering

Date

Feb 20, 2024

MGSC 416 Winter 2024
Data-driven Models for Operations Analytics
Problem Set 1 - Individual Assignment

Handout date: January 12, 2024
Due date: January 25, 2024

Please submit your Python code with comments. Please paste the necessary supporting graphs/tables from Python into your document. You could also submit a Word/PDF document in which you summarize your findings and answer each question, with the appropriate code and graphs (for example, your Jupyter notebook in PDF form).

Problem 1. (13 pts) We replicate and analyze a regression equation developed by Orley Ashenfelter to predict the quality of Bordeaux wines. The data and original model can be found at http://www.liquidasset.com/orley.htm. “WineData-HW1.csv” is a modified, tidied-up version; it is available on myCourses. The variables are:

LPRICE2 - logged price index with base year 1961
WRAIN - rainfall in mm over preceding winter
DEGREES - average temperature (Apr-Sep)
HRAIN - rainfall in mm over harvest period
TIME SV - age of wine

1. In the original model, Ashenfelter modelled the logged price index as a linear regression model with the other four variables being the independent predictor variables. Reconstruct this model and report its summary statistics. What do you think of this model? (3 pts)

2. We want to see if Ashenfelter’s model can be simplified, and we want to assess its in-sample and out-of-sample performance. Propose a simplified model for predicting wine quality (explain your reasoning for how you chose it). Construct this model and contrast it with Ashenfelter’s model. Which model do you think is better and why? In making your assessment, make sure to consider an array of metrics and criteria. (7 pts)
3. When checking Ashenfelter’s model residuals, are you able to validate the linear model? (3 pts)

Problem 2. (10 pts) Let’s consider the linear model titled “linreg”, which was estimated on the Boston housing dataset in the code for Lecture 2 (Lecture 2 - Regression.ipynb). This model considers the 11 features shown in the table below:

Figure 1: Dataset Variable Descriptions

Figure 2 shows the summary statistics of this model on the training set generated in the code:

Figure 2: Model “linreg” output summary

In the following, we want to contrast two regression models with “linreg”. The first regression model, “Modelreg1”, selects its coefficients by minimizing the following metric:

$$\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + 0.15 \cdot \sum_{j=0}^{11}\beta_j^2 \tag{1}$$

The second model, “Modelreg2”, selects its coefficients by minimizing the following metric:

$$\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + 0.51 \cdot \sum_{j=0}^{11}|\beta_j| \tag{2}$$

where $n$ is the number of houses in the training set, $(y_i)_{i=1,\dots,n}$ is the outcome vector (MEDV), $(\hat{y}_i)_{i=1,\dots,n}$ are the predicted values and $(\beta_j)_{j=1,\dots,11}$ are the coefficients of the regression associated with the features above ($\beta_0$ is the intercept). Output from both models (fitted values, predicted values) is provided in the files “Modelreg1.pkl” and “Modelreg2.pkl”, which you can access using the code in “HW1-template.ipynb” and which are also available on myCourses.

1. Calculate the in-sample $R^2$ for both models and contrast with “linreg”. Are you surprised by the result? Why? (3 pts)

2. Table 3 shows the coefficients obtained from both models in contrast to those in linreg. What do you observe? Do you have an explanation for it? (3 pts)

3. Evaluate the out-of-sample performance of the three models together. What is your conclusion? Which model would you choose and for what reason? (4 pts)

Problem 3. (22 pts) We want to construct a model that both depends on causal predictor variables and considers time series dynamics. For this question, we use the dataset called “ApolloData-HW1.csv”, which is available on myCourses, posted with HW1.

1. Dr. Rao has suggested that the food demand should be related to the occupancy, and so we investigate this belief. Using simple linear regression, decide which food item we would most want to build a model of that depends on breakfast occupancy. (4 pts)
2. For the demand of the food item you have chosen, fit these models and compare their MAD, MSE and MAPE: (6 pts)
(a) Simple moving average with N = 10.
(b) Exponential smoothing with α = 0.3.
(c) AR(1).

3. From the first part, you should have a regression model for which breakfast occupancy is the only independent variable, and the selected food item is the dependent variable. Show that the regression errors of this model exhibit autocorrelation. (3 pts)

4. Because breakfast occupancy is a significant predictor for the food item, we want to incorporate it into our model. To do this, we first consider the regression errors from the simple regression model. Now we treat these residuals as a new time series. Is this a stationary time series? Using what we learned from the Time Series lectures, which time series model fits best? Explain why. (6 pts)

5. By adding up the predicted value from the simple regression model and the predicted value from the time series model you chose, we can now forecast demand as a linear regression on occupancy with autocorrelated errors. What are the MAD, MSE and MAPE of this new model on the dataset? (3 pts)
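
As an illustration of the forecast comparison asked for in Problem 3 (simple moving average, exponential smoothing and AR(1), scored by MAD, MSE and MAPE), the following is a minimal sketch and not a reference solution. All function names are hypothetical, the demand series is synthetic rather than taken from “ApolloData-HW1.csv”, the exponential smoothing is initialised with the first observation, and the AR(1) model is fitted by ordinary least squares on the lagged series; each of these is an assumption that the course material may handle differently.

```python
import numpy as np


def forecast_errors(actual, forecast):
    """Return MAD, MSE and MAPE (in percent) over periods that have a forecast."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = ~np.isnan(forecast)          # skip warm-up periods without a forecast
    err = actual[mask] - forecast[mask]
    return {
        "MAD": np.mean(np.abs(err)),
        "MSE": np.mean(err ** 2),
        # Assumes no zero values in `actual`; MAPE is undefined otherwise.
        "MAPE": 100.0 * np.mean(np.abs(err / actual[mask])),
    }


def moving_average_forecast(y, window=10):
    """One-step-ahead forecast: mean of the previous `window` observations."""
    y = np.asarray(y, dtype=float)
    f = np.full(len(y), np.nan)
    for t in range(window, len(y)):
        f[t] = y[t - window:t].mean()
    return f


def exp_smoothing_forecast(y, alpha=0.3):
    """One-step-ahead forecast: F[t] = alpha * y[t-1] + (1 - alpha) * F[t-1]."""
    y = np.asarray(y, dtype=float)
    f = np.full(len(y), np.nan)
    f[1] = y[0]  # assumed initialisation: first forecast equals first observation
    for t in range(2, len(y)):
        f[t] = alpha * y[t - 1] + (1 - alpha) * f[t - 1]
    return f


def ar1_forecast(y):
    """One-step-ahead AR(1) forecast, y[t] = c + phi * y[t-1], fitted by least squares."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    f = np.full(len(y), np.nan)
    f[1:] = c + phi * y[:-1]
    return f


# Synthetic demand series, used only to show the calls (not the assignment data).
rng = np.random.default_rng(0)
demand = 100 + 10 * np.sin(np.arange(120) / 7) + rng.normal(0, 3, 120)

for name, f in [
    ("SMA(10)", moving_average_forecast(demand, window=10)),
    ("ES(0.3)", exp_smoothing_forecast(demand, alpha=0.3)),
    ("AR(1)", ar1_forecast(demand)),
]:
    print(name, forecast_errors(demand, f))
```

Note that the three methods produce their first forecast at different periods (period 10 for the moving average, period 1 for the other two), so in this sketch the metrics are averaged over slightly different windows. For a like-for-like comparison, one option is to score all three on the periods where every method has a forecast.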