Linear Regression Analysis: Simple & Multiple Models, Data
School: James Madison University
Course: 511
Subject: Statistics
Date: Jun 12, 2024
Pages: 3
Uploaded by BrigadierOctopusMaster942
Chapter 3 Lab
Resources:
● Lab notebook: https://github.com/intro-stat-learning/ISLP_labs/blob/stable/Ch03-linreg-lab.ipynb
● ISLP package: https://intro-stat-learning.github.io/ISLP/
Instructions:
In this lab, you will perform a simple and a multiple linear regression on a dataset of your
choosing. For the following steps, you can use the tools and libraries that the textbook
recommends, but you are welcome to use other libraries if you prefer.
● Complete the following analysis.
● Record a video of your screen (using Panopto) while you run the notebook, walking
through each step of the process. Answer all of the questions at the end of this
document. Please include audio commentary as you run the notebook and answer the
questions.
● Attach your notebook file, along with an embed of your Panopto recording, to your
submission.
Setup:
1. Before you start working on your own analysis, open the starter .ipynb file from the
textbook and run it all the way through. No need to record this, but make sure that it
works on your machine. You will need to install the ISLP package in order to do this.
Follow the installation instructions here. Take a look at the analysis performed in the
notebook and get an understanding of what is going on.
2. Select a dataset of your choosing to use for this analysis. This could be a dataset that
you are interested in, or just another dataset that you find online. As you will be
performing a simple and multiple linear regression, you will need to have at least three
fields that can be encoded numerically (two “predictor” fields, and one “target” field). You
are certainly not limited to three fields, but that is a minimum in order to be able to
perform a multiple linear regression.
3. Create a new Jupyter notebook. You can copy the original notebook and just delete the
unneeded cells, or you can create a new one and copy over the imports that you need.
4. In your new notebook, read in your chosen dataset from step 2. If you need a refresher
on how to read a file into a DataFrame, refer to the Chapter 2 lab.
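As a rough illustration of step 4, the sketch below reads a small table into a pandas DataFrame and checks that at least three numeric fields are available. The column names (sqft, age, price) and the inline CSV text are made up for this example; with a real dataset you would pass your own file path to pd.read_csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for a file on disk. In practice, pass a path
# such as "my_data.csv" (an assumed name) to pd.read_csv instead.
csv_text = io.StringIO(
    "sqft,age,price\n"
    "1000,10,200\n"
    "1500,5,290\n"
    "2000,20,\n"
    "2500,2,480\n"
)

df = pd.read_csv(csv_text)

# Keep only rows with complete data in the fields used for modelling.
clean = df.dropna(subset=["sqft", "age", "price"])

# Confirm that the predictor and target fields are numeric.
numeric_cols = clean.select_dtypes(include="number").columns.tolist()
print(clean.shape, numeric_cols)
```

Dropping incomplete rows is only one possible way to handle missing values; it is used here to keep the sketch short.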
Simple Linear Regression:
5. Perform a simple linear regression using the ordinary least squares method on your
dataset.
   a. Fit and transform your model on the dataset.
   b. Produce the summary output of the fitted model.
   c. Produce predictions from your input data (X).
   d. Display a confidence interval for the predictions.
6. Plot both the target and predictor variable, along with your newly created regression line.
You can use matplotlib like the textbook does, or you can use another library if you
prefer (seaborn is nice).
7. Plot at least one of the following:
   a. The residuals of the model
   b. The “leverage statistics” (influence) for your model
Multiple Linear Regression:
8. Perform a multiple linear regression using the ordinary least squares method on your
dataset.
   a. Fit and transform your model on the dataset.
   b. Summarize the model.
9. Compute the variance inflation factor for each variable in the model.
10. Create an additional model that includes an “interaction term” between two of your
predictor variables.
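The calculations behind steps 8 through 10 can be sketched from scratch with NumPy, as below. The data are synthetic and the helper names ols and vif are hypothetical, chosen for this example only. The variance inflation factor for predictor j is computed as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients and R^2; X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / tss
    return beta, r2

def vif(predictors):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    n, p = predictors.shape
    out = []
    for j in range(p):
        others = np.delete(predictors, j, axis=1)
        Xj = np.column_stack([np.ones(n), others])
        _, r2 = ols(Xj, predictors[:, j])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic data (an assumption): two correlated predictors and a target
# that depends on both of them and on their product.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2 + rng.normal(scale=0.1, size=n)

# Step 8: multiple regression on the main effects only.
P = np.column_stack([x1, x2])
X_main = np.column_stack([np.ones(n), P])
beta_main, r2_main = ols(X_main, y)

# Step 9: one VIF per predictor.
vifs = vif(P)

# Step 10: add the product x1 * x2 as an interaction column and refit.
X_int = np.column_stack([X_main, x1 * x2])
beta_int, r2_int = ols(X_int, y)
print(vifs, beta_int, r2_main, r2_int)
```

If you prefer a library routine, statsmodels provides variance_inflation_factor in statsmodels.stats.outliers_influence; it takes the full design matrix and a column index. Comparing r2_main with r2_int gives a first indication of whether the interaction term improves the fit.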
Questions:
1. If your dataset contains nominal variables (e.g. [“blue”, “red”, “green”]), you can encode
these using “one-hot encoding” (this would create multiple binary variables, e.g.
[“is_blue”, “is_red”, “is_green”]). If the dataset contains ordinal variables (e.g.
[“small”, “medium”, “large”]), you can encode these using ordinal encoding ([“small” = 1,
“medium” = 2, “large” = 3]) or one-hot encoding. You can encode these manually, or refer
to the end of the lab for a guide on how to automatically handle these categorical
variables.
   a. Can you think of any trade-offs of each of these encoding methods?
   b. Why would you choose one or the other?
   c. If you choose to encode any variables, indicate which method you use and
   provide a brief rationale for your choice. If you choose not to encode any
   variables, indicate why (some possible reasons: variables are not important,
   simplicity, concerns about multicollinearity, etc.).
2. In selecting and preparing your dataset, you will need to select a target variable,
something to predict with your linear models.
   a. What is your target variable (y)?
   b. Why did you select this target variable?
3. For the simple linear regression, you need to select one predictor variable (X) to regress
against the target variable.
   a. What is your predictor variable?
   b. Why did you select this predictor variable?
   c. Are there other variables in your dataset that could be good predictors? How
   would you approach measuring the effectiveness of different predictor variables
   for a simple linear regression model?
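For the manual encoding mentioned in Question 1, a minimal pandas sketch is shown below. The column names (color, size) and the category order are assumptions taken from the examples in that question, not from any particular dataset.

```python
import pandas as pd

# Toy data (an assumption) with one nominal and one ordinal variable.
df = pd.DataFrame({
    "color": ["blue", "red", "green", "red"],
    "size": ["small", "large", "medium", "small"],
})

# One-hot encoding of the nominal variable. drop_first=True omits one
# level as the baseline so that the remaining dummy columns are not
# perfectly collinear with an intercept term.
color_dummies = pd.get_dummies(df["color"], prefix="is", drop_first=True)

# Ordinal encoding: map each level to an integer that reflects its order.
size_order = {"small": 1, "medium": 2, "large": 3}
df["size_code"] = df["size"].map(size_order)

encoded = pd.concat([df[["size_code"]], color_dummies], axis=1)
print(encoded)
```

Note that the ordinal codes treat the step from small to medium as the same size as the step from medium to large, while one-hot encoding makes no such assumption but adds more columns. Which trade-off matters more depends on your data.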