School: James Madison University
Course: 511
Subject: Statistics
Date: Jun 12, 2024
Pages: 3
Chapter 3 Lab
Resources:
● Lab notebook: https://github.com/intro-stat-learning/ISLP_labs/blob/stable/Ch03-linreg-lab.ipynb
● ISLP package: https://intro-stat-learning.github.io/ISLP/
Instructions:
In this lab, you will perform a simple and multiple linear regression on a dataset of your
choosing. For the following steps, you can use the same tools and libraries as the textbook
recommends, but you are certainly welcome to use other libraries if you choose to.
● Complete the following analysis.
● Record a video of your screen (using Panopto) while you run the notebook, walking
through each step of the process. Answer all of the questions at the end of this
document. Please include audio commentary as you run the notebook and answer the
questions.
● Attach your notebook file, along with an embed of your Panopto recording, to your
submission.
Setup:
1. Before you start working on your own analysis, open the starter .ipynb file from the
textbook and run it all the way through. No need to record this, but make sure that it
works on your machine. You will need to install the ISLP package in order to do this.
Follow the installation instructions here. Take a look at the analysis performed in the
notebook and get an understanding of what is going on.
2. Select a dataset of your choosing to use for this analysis. This could be a dataset that
you are interested in, or just another dataset that you find online. As you will be
performing a simple and a multiple linear regression, you will need at least three
fields that can be encoded numerically (two “predictor” fields and one “target” field). You
are certainly not limited to three fields, but that is the minimum needed to
perform a multiple linear regression.
3. Create a new Jupyter notebook. You can copy the original notebook and just delete the
unneeded cells, or you can create a new one and copy over the imports that you need.
4. In your new notebook, read in your chosen dataset from step 2. If you need a refresher
on how to read a file into a DataFrame, refer to the Chapter 2 lab.
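
As a rough illustration of step 4, the sketch below reads a small CSV into a pandas DataFrame and checks that it has at least three numeric fields. The inline data and the column names (`sqft`, `age`, `price`) are made up for this example; with your own dataset you would pass the file path to `pd.read_csv` instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for your own file. With a real file,
# pass its path to pd.read_csv instead of the StringIO buffer.
csv_text = """sqft,age,price
1500,10,300
2000,5,410
1200,30,220
1800,15,350
"""

df = pd.read_csv(io.StringIO(csv_text))

# The lab needs at least three numeric fields: two predictors and one target.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(df.head())
print("Numeric columns:", numeric_cols)
```

If fewer than three columns show up in `numeric_cols`, the dataset needs either more fields or some categorical fields encoded numerically before the multiple regression step.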
Simple Linear Regression:
5. Perform a simple linear regression using the ordinary least squares method on your
dataset.
   a. Fit and transform your model on the dataset.
   b. Produce the summary output of the fitted model.
   c. Produce predictions from your input data (X).
   d. Display a confidence interval for the predictions.
6. Plot both the target and predictor variable, along with your newly created regression line.
You can use matplotlib like the textbook does, or you can use another library if you
prefer (seaborn is nice).
7. Plot at least one of the following:
   a. The residuals of the model
   b. The “leverage statistics” (influence) of the observations in your model
Multiple Linear Regression:
8. Perform a multiple linear regression using the ordinary least squares method on your
dataset.
   a. Fit and transform your model on the dataset.
   b. Summarize the model.
9. Compute the variance inflation factor for each variable in the model.
10. Create an additional model that includes an “interaction term” between two of your
predictor variables.
Questions:
1. If your dataset contains nominal variables (e.g. [“blue”, “red”, “green”]), you can encode
these using “one-hot encoding” (this would create multiple binary variables, e.g.
[“is_blue”, “is_red”, “is_green”]). If the dataset contains ordinal variables (e.g.
[“small”, “medium”, “large”]), you can encode these using ordinal encoding ([“small” = 1,
“medium” = 2, “large” = 3]) or one-hot encoding. You can encode these manually, or refer
to the end of the lab for a guide on how to automatically handle these categorical
variables.
   a. Can you think of any trade-offs of each of these encoding methods?
   b. Why would you choose one or the other?
   c. If you choose to encode any variables, indicate which method you use and
   provide a brief rationale for your choice. If you choose not to encode any
   variables, indicate why (some possible reasons: variables are not important,
   simplicity, concerns about multicollinearity, etc.).
2. In selecting and preparing your dataset, you will need to select a target variable,
something to predict with your linear models.
   a. What is your target variable (y)?
   b. Why did you select this target variable?
3. For the simple linear regression, you need to select one predictor variable (X) to regress
against the target variable.
   a. What is your predictor variable?
   b. Why did you select this predictor variable?
   c. Are there other variables in your dataset that could be good predictors? How
   would you approach measuring the effectiveness of different predictor variables
   for a simple linear regression model?
4. For the multiple linear regression, you need to select two or more predictor variables (X)
to regress against the target variable.
   a. What are your predictor variables?
   b. Why did you select these predictor variables?
   c. Are there other variables in your dataset that could be good predictors? How
   would you approach measuring the effectiveness of different predictor variables
   for a multiple linear regression model?
5. With multiple linear regression models, multicollinearity is a concern.
   a. Explain what multicollinearity is, and why it can be a problem.
   b. Do any variables in your dataset exhibit multicollinearity?
   c. How can you combat this?
6. You do not need to perform any non-linear transformations on your predictors, although
feel free to do so if you wish. Do you feel that any of the predictors in your dataset
may benefit from a non-linear transformation? Provide your reasoning.
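
For question 1, the two encodings can be sketched manually with pandas as shown below. The tiny DataFrame and its column names (`color`, `size`) are invented for this example and mirror the values used in the question text.

```python
import pandas as pd

# Hypothetical data with one nominal column and one ordinal column.
df = pd.DataFrame({
    "color": ["blue", "red", "green", "red"],
    "size": ["small", "large", "medium", "small"],
})

# One-hot encoding: one binary column per category, with no implied order.
one_hot = pd.get_dummies(df["color"], prefix="is", dtype=int)

# Ordinal encoding: a single integer column that preserves the order but
# also assumes equal spacing between adjacent levels.
size_order = {"small": 1, "medium": 2, "large": 3}
df["size_encoded"] = df["size"].map(size_order)

encoded = pd.concat([df, one_hot], axis=1)
print(encoded)
```

When the model includes an intercept, a full set of one-hot columns is perfectly collinear with it, so one category is usually dropped (for example with `drop_first=True` in `pd.get_dummies`). That extra column count versus the equal-spacing assumption of ordinal encoding is the main trade-off the question is asking about.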