School: James Madison University
Course: 511
Subject: Statistics
Date: Jun 12, 2024
Chapter 3 Lab
Resources:
● Lab notebook: https://github.com/intro-stat-learning/ISLP_labs/blob/stable/Ch03-linreg-lab.ipynb
● ISLP package: https://intro-stat-learning.github.io/ISLP/
Instructions:
In this lab, you will perform a simple and multiple linear regression on a dataset of your
choosing. For the following steps, you can use the same tools and libraries as the textbook
recommends, but you are certainly welcome to use other libraries if you choose to.
● Complete the following analysis.
● Record a video of your screen (using Panopto) while you run the notebook, walking
through each step of the process. Answer all of the questions at the end of this
document. Please include audio commentary as you run the notebook and answer the
questions.
● Attach your notebook file, along with an embed of your Panopto recording, to your
submission.
Setup:
1. Before you start working on your own analysis, open the starter .ipynb file from the
textbook and run it all the way through. No need to record this, but make sure that it
works on your machine. You will need to install the ISLP package in order to do this.
Follow the installation instructions here. Take a look at the analysis performed in the
notebook and get an understanding of what is going on.
2. Select a dataset of your choosing to use for this analysis. This could be a dataset that
you are interested in, or just another dataset that you find online. As you will be
performing a simple and multiple linear regression, you will need to have at least three
fields that can be encoded numerically (two “predictor” fields, and one “target” field). You
are certainly not limited to three fields, but that is the minimum needed to
perform a multiple linear regression.
3. Create a new Jupyter notebook. You can copy the original notebook and just delete the
unneeded cells, or you can create a new one and copy over the imports that you need.
4. In your new notebook, read in your chosen dataset from step 2. If you need a refresher
on how to read a file into a DataFrame, refer to the Chapter 2 lab.
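Step 4 can be sketched as follows. This is a minimal illustration, not the Chapter 2 lab's code: the inline CSV text, its column names (`size`, `age`, `color`, `price`), and its values are made up so that the snippet runs on its own. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for a real file; with your own data, use
# pd.read_csv("your_file.csv") instead of the StringIO wrapper.
csv_text = """size,age,color,price
1200,10,blue,200
1500,5,red,260
900,30,green,140
2000,2,blue,340
"""

df = pd.read_csv(io.StringIO(csv_text))

# Step 2 asks for at least three numerically encodable fields:
# two predictors and one target. List the columns that are already numeric.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(df.dtypes)
print("Numeric columns:", numeric_cols)
```

Columns that are not numeric (here `color`) would need to be encoded before they can enter a regression.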
Simple Linear Regression:
5. Perform a simple linear regression using the ordinary least squares method on your
dataset.
   a. Fit and transform your model on the dataset.
   b. Produce the summary output of the fitted model.
   c. Produce predictions from your input data (X).
   d. Display a confidence interval for the predictions.
6. Plot both the target and predictor variable, along with your newly created regression line.
You can use matplotlib like the textbook does, or you can use another library if you
prefer (seaborn is nice).
7. Plot at least one of the following:
   a. The residuals of the model
   b. The “leverage statistics” (influence) for your model
Multiple Linear Regression:
8. Perform a multiple linear regression using the ordinary least squares method on your
dataset.
   a. Fit and transform your model on the dataset.
   b. Summarize the model.
9. Compute the variance inflation factor for each variable in the model.
10. Create an additional model to measure the “interaction term” of two of your predictor
variables.
Questions:
1. If your dataset contains nominal variables (e.g. [“blue”, “red”, “green”]), you can encode
these using “one-hot encoding” (this would create multiple binary variables, e.g.
[“is_blue”, “is_red”, “is_green”]). If the dataset contains ordinal variables (e.g.
[“small”, “medium”, “large”]), you can encode these using ordinal encoding (“small” = 1,
“medium” = 2, “large” = 3), or one-hot encoding. You can encode these manually, or refer
to the end of the lab for a guide on how to automatically handle these categorical
variables.
   a. Can you think of any trade-offs of each of these encoding methods?
   b. Why would you choose one or the other?
   c. If you choose to encode any variables, indicate which method you use and
   provide a brief rationale for your choice. If you choose not to encode any
   variables, indicate why (some possible reasons: variables are not important,
   simplicity, concerns about multicollinearity, etc.).
2. In selecting and preparing your dataset, you will need to select a target variable –
something to predict with your linear models.
   a. What is your target variable (y)?
   b. Why did you select this target variable?
3. For the simple linear regression, you need to select one predictor variable (X) to regress
against the target variable.
   a. What is your predictor variable?
   b. Why did you select this predictor variable?
   c. Are there other variables in your dataset that could be good predictors? How
   would you approach measuring the effectiveness of different predictor variables
   for a simple linear regression model?
4. For the multiple linear regression, you need to select two or more predictor variables (X)
to regress against the target variable.
   a. What are your predictor variables?
   b. Why did you select these predictor variables?
   c. Are there other variables in your dataset that could be good predictors? How
   would you approach measuring the effectiveness of different predictor variables
   for a multiple linear regression model?
5. With multiple linear regression models, multicollinearity is a concern.
   a. Explain what multicollinearity is, and why it can be a problem.
   b. Do any variables in your dataset exhibit multicollinearity?
   c. How can you combat this?
6. You do not need to perform any non-linear transformations on your predictors, although
feel free to do so if you wish to. Do you feel that any of the predictors in your dataset
may benefit from non-linear transformation? Provide your reasoning.
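
The two encodings described in question 1 can be sketched with pandas as below. This is a minimal illustration under assumptions: the toy DataFrame is made up, reusing the category labels from the question, and the `is_` prefix and `size_code` column name are chosen here for illustration only.

```python
import pandas as pd

# Hypothetical toy data, reusing the category labels from question 1.
df = pd.DataFrame({
    "color": ["blue", "red", "green", "red"],
    "size": ["small", "large", "medium", "small"],
})

# One-hot encoding of a nominal variable: one binary column per level.
# In a regression with an intercept, pass drop_first=True so the dummy
# columns are not perfectly collinear with the constant.
one_hot = pd.get_dummies(df["color"], prefix="is", dtype=int)

# Ordinal encoding: map each level to its rank. This treats the steps
# between adjacent levels as equally spaced.
size_order = {"small": 1, "medium": 2, "large": 3}
df["size_code"] = df["size"].map(size_order)

print(one_hot)
print(df)
```

One-hot encoding adds a column per level but makes no assumption about ordering. Ordinal encoding keeps a single column but imposes equal spacing between levels.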