IE 7280 Lab3-4

School: Northeastern University
Course: 7280
Subject: Statistics
Date: Feb 20, 2024

Lab #3 & Lab #4, Conducted Weeks 10/20 and 10/27
Regression and Prediction

For this lab, work in groups of two students. Each group will turn in a single group assignment.

This lab involves predicting a person's weight (the response variable y) based on a number of factors (x1 through x8 below).

y: weight (pounds)
x1: height (inches)
x2: gender (male = 1; female = 0)
x3: average number of meat servings (4 oz.) consumed each day
x4: average number of fruit or vegetable servings consumed each day
x5: age (years)
x6: carries cell phone regularly (yes = 1; no = 0)
x7: last digit of home phone number
x8: will Cubs win the World Series? (yes = 1; no = 0)

This lab involves two separate sets of data. The training set (in Lab 3 Training.csv) contains the preceding variables collected for 22 students in the Fall 2006 IEMS 304 class (each row = one person; each column = one variable). The test set (in Lab 3 Test.csv) contains the same variables collected for 136 students in other years. You will fit a regression model to the training data. Then you will use the model fitted to the training data to see how well you can predict the responses in the test data. Do NOT include the test data when fitting the regression model.

Lab 3 (week, 10/20)

1) For the training data, construct and interpret a scatter-plot matrix of all variables (response and predictors). Are any relationships between y and the predictor variables apparent?

2) Using the training data, fit a regression model that includes all eight predictor variables. Use this fitted model to predict the weights for everyone in the test set. Construct a scatter plot of y versus ŷ (y on the vertical axis) for all observations in the training set and in the test set together (i.e., a single plot), but distinguish the training group and the test group using two different symbols. Discuss the significance of what you see.

3) Calculate 95% prediction intervals for the weights of everyone in the test set, and also calculate the prediction errors. The "prediction errors" are defined as the residual errors (y − ŷ) for predicting the test data, using the coefficients estimated from the training data. Hence, you have one prediction error for each person in the test set. Note that you are NOT supposed to fit a new model to the test data. Use the same model that you fit to the training data in Problem 2, but apply it to predicting the test data. Do the actual response values for the test data seem consistent with the prediction intervals? Explain. Calculate the fraction (empirical probability) of test-set observations that fall inside their prediction intervals.

4) Calculate the standardized residuals for each of the training observations. Interpret the results and explain why any observations are highly influential or considered outliers (i.e., with a standardized residual greater than 2 in absolute value).

5) Plot the standardized residuals versus the fitted values for the training data. Interpret the results.

Lab 4 (week, 10/27)

6) Repeat Problems 2 through 5, but this time include only the two predictor variables x1 and x2 in the regression model. Construct side-by-side box plots of the two sets of prediction errors for the test data (one set using the eight-predictor model and the second set using the two-predictor model). Discuss what you see.

7) Using only the training data, conduct an "Extra Sum of Squares" F-test of whether x3 through x8, together, have an effect on y.

8) Based on Problems 2 through 7, would you recommend the 8-predictor model or the 2-predictor model for predicting the weight of some new person (say, from a different class)? Provide a detailed justification for your answer, incorporating all relevant findings from Problems 2 through 7 into your discussion. Use quantitative arguments, as well as qualitative arguments based on the plots (e.g., from Problems 2 and 6) that you constructed. If evidence from different parts of Problems 1 through 7 is contradictory, you will have to determine which to weigh more heavily. Finally, give some concluding remarks summarizing and generalizing what you found in this lab.
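
The train-then-predict workflow of Problems 2 and 3 might be sketched as follows. The lab does not prescribe software, so Python with NumPy is an assumption here, the helper names `fit_ols` and `predict` are hypothetical, and the arrays are synthetic stand-ins rather than the contents of the lab's CSV files.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) for y regressed on X."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Fitted or predicted values for the rows of X."""
    return beta[0] + X @ beta[1:]

# Synthetic stand-in data (NOT the lab data): 22 training rows,
# 136 test rows, 8 predictors, only the first two of which matter.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(22, 8))
X_test = rng.normal(size=(136, 8))
true_beta = np.array([150.0, 5.0, 20.0, 0, 0, 0, 0, 0, 0])
y_train = predict(true_beta, X_train) + rng.normal(scale=10, size=22)
y_test = predict(true_beta, X_test) + rng.normal(scale=10, size=136)

beta_hat = fit_ols(X_train, y_train)     # fit on the training data only
yhat_train = predict(beta_hat, X_train)  # fitted values for the training set
yhat_test = predict(beta_hat, X_test)    # same coefficients applied to the test set
pred_err = y_test - yhat_test            # one prediction error per test row
```

Plotting `y_train` against `yhat_train` and `y_test` against `yhat_test` on the same axes, with two different markers, would give the single scatter plot that Problem 2 asks for.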
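
For the 95% prediction intervals in Problem 3, a common textbook form is ŷ0 ± t · s · sqrt(1 + x0'(X'X)^(-1) x0), where s² = SSE/(n − p) and p counts the intercept. The sketch below assumes that form and takes the t critical value as an argument read from a t table (for example, about 2.160 for 13 degrees of freedom and about 2.093 for 19). The function names and the synthetic data are hypothetical.

```python
import numpy as np

def prediction_intervals(X_train, y_train, X_new, t_crit):
    """Prediction intervals for new rows from an OLS fit on the training data.

    t_crit is the t quantile with n - p degrees of freedom, supplied by the caller.
    """
    A = np.column_stack([np.ones(len(X_train)), X_train])
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    resid = y_train - A @ beta
    s2 = resid @ resid / (n - p)              # residual variance estimate
    XtX_inv = np.linalg.inv(A.T @ A)
    # x0' (X'X)^-1 x0 for every new row
    lev_new = np.einsum("ij,jk,ik->i", A_new, XtX_inv, A_new)
    y_hat = A_new @ beta
    half = t_crit * np.sqrt(s2 * (1.0 + lev_new))
    return y_hat - half, y_hat + half

def coverage(y, lower, upper):
    """Fraction of observations that fall inside their intervals."""
    return float(np.mean((y >= lower) & (y <= upper)))

# Synthetic stand-in data (NOT the lab data) with two predictors.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(22, 2))
X_test = rng.normal(size=(136, 2))
coef = np.array([5.0, 20.0])
y_train = 150 + X_train @ coef + rng.normal(scale=10, size=22)
y_test = 150 + X_test @ coef + rng.normal(scale=10, size=136)

# 22 rows minus 3 coefficients leaves 19 degrees of freedom.
lower, upper = prediction_intervals(X_train, y_train, X_test, t_crit=2.093)
cov = coverage(y_test, lower, upper)
```

A coverage far below 0.95 on the test set would suggest that the intervals are too optimistic for people outside the training class.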
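
Definitions of "standardized residual" differ between textbooks and software packages. As one possibility for Problems 4 and 5, the sketch below assumes the leverage-adjusted form e_i / (s · sqrt(1 − h_ii)), where h_ii is the i-th diagonal entry of the hat matrix. The function name and the synthetic data, including the planted outlier, are hypothetical.

```python
import numpy as np

def standardized_residuals(X, y):
    """Leverage-adjusted standardized residuals and leverages for an OLS fit."""
    A = np.column_stack([np.ones(len(X)), X])
    n, p = A.shape
    H = A @ np.linalg.pinv(A)        # hat matrix (A has full column rank here)
    h = np.diag(H)                   # leverages h_ii
    e = y - H @ y                    # ordinary residuals
    s = np.sqrt(e @ e / (n - p))     # residual standard error
    return e / (s * np.sqrt(1.0 - h)), h

# Synthetic stand-in data (NOT the lab data) with one planted outlier.
rng = np.random.default_rng(2)
x = rng.normal(size=(22, 1))
y = 150 + 5 * x[:, 0] + rng.normal(scale=1.0, size=22)
y[3] += 50                           # make row 3 an obvious outlier

r, h = standardized_residuals(x, y)
flagged = np.flatnonzero(np.abs(r) > 2)  # rows exceeding the lab's cutoff of 2
```

Rows with large leverage `h` are the candidates for high influence, while rows in `flagged` are the candidates for outliers under the cutoff of 2.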
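
The extra-sum-of-squares test in Problem 7 compares a reduced model with the full model through F = [(SSE_R − SSE_F) / (df_R − df_F)] / (SSE_F / df_F), where each df is the residual degrees of freedom. A minimal sketch under that formula follows, again on synthetic stand-in data and with hypothetical helper names.

```python
import numpy as np

def sse_and_df(X, y):
    """Residual sum of squares and residual degrees of freedom for an OLS fit."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r), A.shape[0] - A.shape[1]

def extra_ss_f(X_full, X_reduced, y):
    """F statistic and its (numerator, denominator) degrees of freedom."""
    sse_f, df_f = sse_and_df(X_full, y)
    sse_r, df_r = sse_and_df(X_reduced, y)
    f_stat = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    return f_stat, df_r - df_f, df_f

# Synthetic stand-in data (NOT the lab data): only the first two
# of the eight predictors actually affect y.
rng = np.random.default_rng(3)
X_full = rng.normal(size=(22, 8))
y = 150 + 5 * X_full[:, 0] + 20 * X_full[:, 1] + rng.normal(scale=10, size=22)

# The reduced model keeps columns 0 and 1, playing the role of x1 and x2.
f_stat, df_num, df_den = extra_ss_f(X_full, X_full[:, :2], y)
```

With 22 training rows the test has 6 and 13 degrees of freedom. The statistic is compared against the upper 5% point of the F distribution with those degrees of freedom, which is approximately 2.92 in a standard F table.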