Linear Regression Analysis: Simple & Multiple Models, Data

School

James Madison University

Course

511

Subject

Statistics

Date

Jun 12, 2024


Chapter 3 Lab

Resources:
Lab notebook: https://github.com/intro-stat-learning/ISLP_labs/blob/stable/Ch03-linreg-lab.ipynb
ISLP package: https://intro-stat-learning.github.io/ISLP/

Instructions:
In this lab, you will perform a simple and a multiple linear regression on a dataset of your choosing. For the following steps, you can use the tools and libraries that the textbook recommends, but you are welcome to use other libraries if you prefer. Complete the following analysis. Record a video of your screen (using Panopto) while you run the notebook, walking through each step of the process. Answer all of the questions at the end of this document. Please include audio commentary as you run the notebook and answer the questions. Attach your notebook file, along with an embed of your Panopto recording, to your submission.

Setup:
1. Before you start working on your own analysis, open the starter .ipynb file from the textbook and run it all the way through. No need to record this, but make sure that it works on your machine. You will need to install the ISLP package in order to do this. Follow the installation instructions here. Take a look at the analysis performed in the notebook and get an understanding of what is going on.
2. Select a dataset of your choosing to use for this analysis. This could be a dataset that you are interested in, or just another dataset that you find online. Because you will be performing a simple and a multiple linear regression, you will need at least three fields that can be encoded numerically (two “predictor” fields and one “target” field). You are not limited to three fields, but that is the minimum needed to perform a multiple linear regression.
3. Create a new Jupyter notebook. You can copy the original notebook and delete the unneeded cells, or you can create a new one and copy over the imports that you need.
4. In your new notebook, read in your chosen dataset from step 2. If you need a refresher on how to read a file into a DataFrame, refer to the Chapter 2 lab.

Simple Linear Regression:
5. Perform a simple linear regression using the ordinary least squares method on your dataset.
   a. Fit and transform your model on the dataset.
   b. Produce the summary output of the fitted model.
   c. Produce predictions from your input data (X).
   d. Display a confidence interval for the predictions.
6. Plot both the target and predictor variable, along with your newly created regression line. You can use matplotlib like the textbook does, or you can use another library if you prefer (seaborn is nice).
7. Plot at least one of the following:
   a. The residuals of the model
   b. The “leverage statistics”, or influence, of the predictor in your model

Multiple Linear Regression:
8. Perform a multiple linear regression using the ordinary least squares method on your dataset.
   a. Fit and transform your model on the dataset.
   b. Summarize the model.
9. Compute the variance inflation factor for each variable in the model.
10. Create an additional model to measure the “interaction term” of two of your predictor variables.

Questions:
1. If your dataset contains nominal variables (e.g. [“blue”, “red”, “green”]), you can encode these using “one-hot encoding” (this would create multiple binary variables, e.g. [“is_blue”, “is_red”, “is_green”]). If the dataset contains ordinal variables (e.g. [“small”, “medium”, “large”]), you can encode these using ordinal encoding ([“small” = 1, “medium” = 2, “large” = 3]) or one-hot encoding. You can encode these manually, or refer to the end of the lab for a guide on how to handle these categorical variables automatically.
   a. Can you think of any trade-offs of each of these encoding methods?
   b. Why would you choose one or the other?
   c. If you choose to encode any variables, indicate which method you use and provide a brief rationale for your choice. If you choose not to encode any variables, indicate why (some possible reasons: variables are not important, simplicity, concerns about multicollinearity, etc.).
2. In selecting and preparing your dataset, you will need to select a target variable: something to predict with your linear models.
   a. What is your target variable (y)?
   b. Why did you select this target variable?
3. For the simple linear regression, you need to select one predictor variable (X) to regress against the target variable.
   a. What is your predictor variable?
   b. Why did you select this predictor variable?
   c. Are there other variables in your dataset that could be good predictors? How would you approach measuring the effectiveness of different predictor variables for a simple linear regression model?
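The two encoding methods described in Question 1 can be illustrated with a small pandas sketch. The tiny DataFrame and its column names (color, size) are made up for the example, and the ordinal mapping is written out by hand; the lab's own guide to categorical variables may take a different route.

```python
import pandas as pd

# Hypothetical data: one nominal column and one ordinal column.
df = pd.DataFrame({
    "color": ["blue", "red", "green", "red"],
    "size": ["small", "large", "medium", "small"],
})

# One-hot encoding: one binary column per category (is_blue, is_green, is_red).
one_hot = pd.get_dummies(df["color"], prefix="is", dtype=int)

# Dropping the first category leaves a reference level, which avoids perfect
# collinearity with the intercept in a regression model.
one_hot_ref = pd.get_dummies(df["color"], prefix="is", drop_first=True, dtype=int)

# Ordinal encoding: an explicit mapping that preserves the category order.
size_order = {"small": 1, "medium": 2, "large": 3}
size_code = df["size"].map(size_order).rename("size_code")

encoded = pd.concat([one_hot_ref, size_code], axis=1)
print(encoded)
```

One trade-off this makes visible: one-hot encoding adds a column per category and assumes no order, while ordinal encoding stays compact but treats the gaps between levels as equal.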