

School

Carnegie Mellon University

Course

10601

Subject

Information Systems

Date

Feb 20, 2024

Uploaded by SuperHumanNightingalePerson177

Week 1 Assignment
2022-04-03

library(ggplot2)
library(stringr)
setwd("C:/Users/13019/OneDrive/Desktop/School/Data Mining")
Airline1 <- read.csv("Airline Data.csv")
# FARE is read as text with a leading "$"; strip "$" (and any thousands commas) before converting to numeric
Airline1$FARE <- as.numeric(str_remove_all(Airline1$FARE, "[$,]"))
Airline1$VACATION <- as.factor(Airline1$VACATION)
# Scatter plot of fare against distance, coloured by whether the route is a vacation route
ggplot(Airline1, aes(x = DISTANCE, y = FARE, col = VACATION)) + geom_point()

1. Data Exploration & Visualization
a) Using the graphical capabilities of R (or the software of your choice), provide a single plot that captures some aspects of the data. Include the plot as a clearly marked Exhibit.

Exhibit: Fare versus Distance

b) What do you observe from the plot? How could your observation influence your regression model (or why would it not)?

2. Fitting a linear regression model
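The workflow that the questions in this section describe (70/30 split with seed 1, a regression on all appropriate variables, training and validation RMSE, and one new prediction) can be sketched in R as follows. This is a minimal sketch and not the course's reference script. The data frame `airline` is a hypothetical synthetic stand-in for "Airline Data.csv". The column names (`S_CODE`, `E_CODE`, `S_INCOME`, `E_INCOME`, ...) and the factor levels (`No`/`Yes`, `Free`/`Controlled`, `Free`/`Constrained`) are assumptions that may not match the real file. `rmse()` and `new_leg` are illustrative names.

```r
# HYPOTHETICAL stand-in for read.csv("Airline Data.csv"): synthetic legs
# using the column names from the assignment text (an assumption).
set.seed(123)  # seed for the fake data only
n <- 300
yn <- function(k) factor(sample(c("No", "Yes"), k, replace = TRUE))
airline <- data.frame(
  S_CODE   = sample(c("ORD", "MDW", "EWR"), n, replace = TRUE),  # start airport code
  E_CODE   = sample(c("DCA", "IAD", "BWI"), n, replace = TRUE),  # end airport code
  COUPON   = runif(n, 1, 1.9),
  NEW      = sample(0:3, n, replace = TRUE),
  VACATION = yn(n),
  SW       = yn(n),
  HI       = runif(n, 1000, 10000),
  S_INCOME = runif(n, 15000, 40000),
  E_INCOME = runif(n, 15000, 40000),
  S_POP    = runif(n, 1e5, 1e7),
  E_POP    = runif(n, 1e5, 1e7),
  SLOT     = factor(sample(c("Free", "Controlled"), n, replace = TRUE)),
  GATE     = factor(sample(c("Free", "Constrained"), n, replace = TRUE)),
  DISTANCE = runif(n, 100, 2800),
  PAX      = runif(n, 1000, 30000)
)
airline$FARE <- 50 + 0.07 * airline$DISTANCE + 0.005 * airline$HI -
  40 * (airline$SW == "Yes") - 30 * (airline$VACATION == "Yes") +
  15 * (airline$GATE == "Constrained") + rnorm(n, sd = 20)

# Drop the start/end airport identifiers using "-" inside subset(select = ...).
model_df <- subset(airline, select = -c(S_CODE, E_CODE))

# 70% training / 30% validation split, seed 1 as the assignment asks.
set.seed(1)
train_idx <- sample(seq_len(nrow(model_df)), size = round(0.7 * nrow(model_df)))
train <- model_df[train_idx, ]
valid <- model_df[-train_idx, ]

# Multiple regression of FARE on every remaining column.
fit <- lm(FARE ~ ., data = train)
summary(fit)

# Root mean squared error of predictions.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse_train <- rmse(train$FARE, predict(fit, train))
rmse_valid <- rmse(valid$FARE, predict(fit, valid))

# Coefficient for GATE: estimated FARE difference, Free vs. Constrained
# gates (Constrained is the baseline level), other predictors held fixed.
coef(fit)["GATEFree"]

# One new leg; factor columns reuse the training levels so predict() can match them.
new_leg <- data.frame(
  COUPON = 1, NEW = 3,
  VACATION = factor("No", levels = levels(train$VACATION)),
  SW = factor("No", levels = levels(train$SW)),
  HI = 6000, S_INCOME = 25000, E_INCOME = 30000,
  S_POP = 4000000, E_POP = 7150000,
  SLOT = factor("Free", levels = levels(train$SLOT)),
  GATE = factor("Constrained", levels = levels(train$GATE)),
  DISTANCE = 1000, PAX = 6000
)
predicted_fare <- predict(fit, newdata = new_leg)
predicted_fare
```

Under R's default treatment coding, and assuming the real GATE levels are also `Constrained` and `Free`, the `GATEFree` coefficient reads as "holding all other predictors fixed, a leg with free gates is predicted to cost this many dollars more (or less) than one with constrained gates".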
a) Following the scripts available online (Data Mining with R), adjust them to:
   a. Randomly partition the data into 70% training and 30% validation, setting the seed to 1.
   b. Run a multivariable regression with all appropriate variables (note that the starting and ending airport indicators are probably not very useful variables). HINT: To remove variables from a data frame in R, one can use the - sign (please refer to the "R Tips and Tricks from Week 1" script).
   Provide a summary of the model (including the values of the regression coefficients), or otherwise include it as a clearly marked and well-formatted Exhibit (please refer to the "R Tips and Tricks from Week 1" script on how to export the needed information).
b) What is the resulting RMSE on the training data?
c) On the validation data?
d) From your model, how would you quantify the effect of GATE on the predicted FARE? Please be precise in your interpretation, thinking back to your earlier data analysis class.
e) What is the predicted fare of a leg that has COUPON = 1, NEW = 3, VACATION = No, SW = No, HI = 6000, S_income = $25,000, E_income = $30,000, S_POP = 4,000,000, E_POP = 7,150,000, SLOT = Free, GATE = Constrained, DISTANCE = 1000, and PAX = 6000?

3. Variable Selection
Experiment with variable selection methods (please refer to Step 8 in the Data Mining with R). At a minimum, implement forward, backward and stepwise regression models.
a) From your experiments, pick a model as your final regression model. Provide a summary of the model or otherwise include it as a clearly marked Exhibit.
b) Why did you select this particular model? Please provide quantitative reasoning.
c) What is the resulting RMSE on the training data?
d) On the validation data?

4. The Impact of SW
A senior consultant in the airline industry has indicated that the presence of Southwest on vacation routes has been significantly driving prices down on these legs, beyond what it does on other routes.
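Before any modelling, the consultant's claim can be checked descriptively with a 2×2 table of mean fares by SW and VACATION. The sketch below uses a hypothetical synthetic data frame `legs`. With the real data you would use its FARE, SW and VACATION columns instead, and the `No`/`Yes` levels are an assumption. These raw cell means ignore distance, market concentration and every other predictor, so at most they suggest whether an SW-by-VACATION interaction is worth modelling.

```r
# HYPOTHETICAL synthetic legs standing in for the airline data; only the
# three columns needed for this check are simulated.
set.seed(7)
n <- 400
legs <- data.frame(
  SW       = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  VACATION = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
legs$FARE <- 200 - 50 * (legs$SW == "Yes") - 30 * (legs$VACATION == "Yes") +
  rnorm(n, sd = 25)

# Mean fare in each SW x VACATION cell (rows = SW, columns = VACATION).
cell_means <- tapply(legs$FARE, list(SW = legs$SW, VACATION = legs$VACATION), mean)
cell_means

# Raw SW gap (SW minus no SW) on non-vacation and on vacation routes.
sw_gap <- cell_means["Yes", ] - cell_means["No", ]
sw_gap

# Difference between the two gaps: an uncontrolled analogue of an
# interaction coefficient (negative = larger SW discount on vacation routes).
unname(sw_gap["Yes"] - sw_gap["No"])
```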
Add this domain knowledge to your regression model from Section 3 by creating an interaction variable. HINT: To create an interaction term in R, refer to the "Good to know Module 1". As an example, here is how I created an interaction term indicating a male applicant asking for credit for Type 1 in the Credit data (you need to implement the same, but in your case you want an interaction term for SW = 1 and VACATION = 1):
Either add this term to both your training and validation data, or add it to the airline data and then repartition.
a) What is the resulting RMSE on the training and validation data?
b) How would you quantify the effect of SW on the fare on vacation routes vs. non-vacation routes (using your model)? Does the data support the consultant's claim? HINT: Carefully think about which variables determine the fare on each type of route (you have both vacation and non-vacation routes, and for each type, some routes are served by SW and some are not; this will require thinking back to your Data Analysis course).
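Sections 3 and 4 can be sketched together in R. The sketch runs forward, backward and stepwise searches with base R's AIC-based `step()`, keeps the candidate with the lowest validation RMSE, and then adds a hand-made 0/1 indicator `SW_VAC` that equals 1 only when SW = Yes and VACATION = Yes. It is a minimal sketch under assumptions. The data are a hypothetical synthetic stand-in with a reduced set of columns. Choosing by validation RMSE is only one defensible criterion; AIC or adjusted R² are others. `add_sw_vac()`, `candidates`, `SW_VAC` and `sw_effect` are illustrative names, not part of any course script.

```r
# HYPOTHETICAL synthetic legs with a reduced, assumed set of columns.
set.seed(2024)  # seed for the fake data only
n <- 400
yn <- function(k) factor(sample(c("No", "Yes"), k, replace = TRUE))
airline <- data.frame(
  COUPON   = runif(n, 1, 1.9),
  NEW      = sample(0:3, n, replace = TRUE),
  VACATION = yn(n),
  SW       = yn(n),
  HI       = runif(n, 1000, 10000),
  GATE     = factor(sample(c("Free", "Constrained"), n, replace = TRUE)),
  DISTANCE = runif(n, 100, 2800),
  PAX      = runif(n, 1000, 30000)
)
sw  <- airline$SW == "Yes"
vac <- airline$VACATION == "Yes"
airline$FARE <- 60 + 0.07 * airline$DISTANCE + 0.005 * airline$HI -
  30 * sw - 20 * vac - 25 * (sw & vac) + rnorm(n, sd = 20)

# 70/30 split with seed 1.
set.seed(1)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- airline[train_idx, ]
valid <- airline[-train_idx, ]
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Section 3: forward, backward and stepwise searches (AIC-based).
full_fit <- lm(FARE ~ ., data = train)
null_fit <- lm(FARE ~ 1, data = train)
search_scope <- list(lower = formula(null_fit), upper = formula(full_fit))
candidates <- list(
  forward  = step(null_fit, scope = search_scope, direction = "forward", trace = 0),
  backward = step(full_fit, direction = "backward", trace = 0),
  stepwise = step(null_fit, scope = search_scope, direction = "both", trace = 0)
)
valid_rmse <- sapply(candidates, function(m) rmse(valid$FARE, predict(m, valid)))
valid_rmse
chosen <- candidates[[which.min(valid_rmse)]]

# Section 4: interaction indicator added to both partitions.
add_sw_vac <- function(d) {
  d$SW_VAC <- as.integer(d$SW == "Yes" & d$VACATION == "Yes")
  d
}
train <- add_sw_vac(train)
valid <- add_sw_vac(valid)

# Keep both main effects so the interaction is interpretable; update()
# drops any duplicate terms already in the chosen formula.
sw_vac_fit <- lm(update(formula(chosen), . ~ . + SW + VACATION + SW_VAC), data = train)
rmse_train <- rmse(train$FARE, predict(sw_vac_fit, train))
rmse_valid <- rmse(valid$FARE, predict(sw_vac_fit, valid))

# SW effect by route type, other predictors held fixed:
#   non-vacation routes: b_SWYes;  vacation routes: b_SWYes + b_SW_VAC
b <- coef(sw_vac_fit)
sw_effect <- c(non_vacation = unname(b["SWYes"]),
               vacation     = unname(b["SWYes"] + b["SW_VAC"]))
sw_effect

# The t-test on SW_VAC tests whether the SW effect differs between route types.
summary(sw_vac_fit)$coefficients["SW_VAC", ]
```

A negative and statistically significant `SW_VAC` coefficient is the pattern the consultant's claim predicts. Writing `SW * VACATION` in the model formula would make R build the same interaction column (`SWYes:VACATIONYes`) automatically.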