HW11_solution
CEE 373
Homework #11
Due date: December 6 at 11:59pm (NOTE: Wednesday due date)
Total: 100 points (NOTE: This HW will replace your lowest HW score)
1. (25 points) Data for the per capita energy consumption and per capita Gross National
Product (GNP) for eight different countries have been compiled by Meadows et al. (1972)
and tabulated below.
Table 1: Country-wide energy consumption and GNP for eight countries

Country   Per Capita Gross National Product, X   Per Capita Energy Consumption, Y
1         600                                    1000
2         2700                                   700
3         2900                                   1400
4         4200                                   2000
5         3100                                   2500
6         5400                                   2700
7         8600                                   2500
8         10300                                  4000

Please answer the following questions using this data.
(a) Plot a scatterplot of $Y$ versus $X$.
(b) Determine the linear regression equation for predicting the per capita energy
consumption ($Y$) on the basis of a country’s per capita GNP ($X$) and plot
the regression line with your scatterplot from part (a). Please calculate the
regression coefficients by hand (you may check your answers with Python or
other computing software).
(c) Determine the $R^2$ value. What does the $R^2$ value tell you about the
appropriateness of the fit of this linear regression equation?
(d) Estimate the Per Capita Energy Consumption for a new country whose Per
Capita GNP is 200.
Solution:
Refer to the solution code:
https://colab.research.google.com/drive/1HGiiJzofZhQWvip5SwAAz5jhDBXRVrdz?authuser=0#scrollTo=g2acxo7tulJU
(a) Scatter plot
(b) From the data, the sample means of $X$ and $Y$ are:
$$\bar{X} = 4725.0, \qquad \bar{Y} = 2100.0$$
The regression coefficient (slope, $\hat{\beta}_1$) and the intercept ($\hat{\beta}_0$) are then calculated as:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 0.279$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 783.15$$
Thus, the fitted regression line is:
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 783.15 + 0.279 X$$
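The hand calculation in part (b) can be checked with a few lines of plain Python. This is only a sketch for verification, not the linked solution code; the variable names (`gnp`, `energy`, `s_xy`, `s_xx`) are chosen here for illustration.

```python
# Data from Table 1: per capita GNP (X) and per capita energy consumption (Y).
gnp = [600, 2700, 2900, 4200, 3100, 5400, 8600, 10300]
energy = [1000, 700, 1400, 2000, 2500, 2700, 2500, 4000]

n = len(gnp)
x_bar = sum(gnp) / n
y_bar = sum(energy) / n

# Sum of cross-products and sum of squares about the sample means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gnp, energy))
s_xx = sum((x - x_bar) ** 2 for x in gnp)

beta1 = s_xy / s_xx            # slope
beta0 = y_bar - beta1 * x_bar  # intercept

print(f"x_bar = {x_bar:.1f}, y_bar = {y_bar:.1f}")
print(f"S_xy = {s_xy:.1f}, S_xx = {s_xx:.1f}")
print(f"beta1 = {beta1:.3f}, beta0 = {beta0:.2f}")
```

Printing the two intermediate sums makes it easier to compare against a by-hand table of deviations, which is where arithmetic slips usually happen.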
The regression line drawn with the scatterplot: (figure not reproduced here)
(c) To calculate $R^2$:
$$R^2 = 1 - \frac{SSE}{SST}$$
$$SSE = \sum (Y - \hat{Y})^2 = 2218810.80$$
$$SST = \sum (Y - \bar{Y})^2 = 7960000.0$$
Thus,
$$R^2 = 1 - \frac{2218810.80}{7960000.0} \approx 0.72$$
$R^2$ ranges from 0 to 1. A value closer to 1 indicates that a higher proportion
of the variance in the dependent variable is explained by the independent
variable(s). Here, an $R^2$ of 0.72 means that about 72% of the variability in
per capita energy consumption ($Y$) is explained by per capita GNP ($X$).
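The $R^2$ computation in part (c) can be reproduced as follows. This is a minimal sketch that refits the line from Table 1 so it runs on its own; the names `sse`, `sst`, and `r_squared` are illustrative choices, not taken from the linked solution code.

```python
# Data from Table 1: per capita GNP (X) and per capita energy consumption (Y).
gnp = [600, 2700, 2900, 4200, 3100, 5400, 8600, 10300]
energy = [1000, 700, 1400, 2000, 2500, 2700, 2500, 4000]

n = len(gnp)
x_bar = sum(gnp) / n
y_bar = sum(energy) / n

# Least-squares slope and intercept (same formulas as part (b)).
beta1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(gnp, energy))
         / sum((x - x_bar) ** 2 for x in gnp))
beta0 = y_bar - beta1 * x_bar

# Fitted values, residual sum of squares, and total sum of squares.
fitted = [beta0 + beta1 * x for x in gnp]
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(energy, fitted))
sst = sum((y - y_bar) ** 2 for y in energy)

r_squared = 1 - sse / sst
print(f"SSE = {sse:.2f}, SST = {sst:.1f}, R^2 = {r_squared:.2f}")
```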
(d) When GNP = 200,
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 783.15 + 0.279 \times 200 \approx 838.89$$
(the result uses the unrounded slope, 0.2787).
Thus, the expected energy consumption is 838.89.
2. (25 points) Suppose a survey of the effect of a fare increase on the loss of ridership
for mass transit systems in the United States reveals the data tabulated below.
Please answer the following questions using this data.

% Fare Increase, X   % Loss in Ridership, Y
5                    1.5
35                   12
20                   7.5
15                   6.3
4                    1.2
6                    1.7
18                   7.2
23                   8
38                   11.1
8                    3.6
12                   3.7
17                   6.6
17                   4.4
13                   4.5
7                    2.8
23                   8

(a) Plot a scatterplot of the above data for the percentage loss in ridership ($Y$)
versus the percentage fare increase ($X$).
(b) Perform a linear regression analysis for predicting the expected percentage
loss in ridership as a function of the percentage fare increase for a mass
transit system in the United States and plot the regression line with your
scatterplot from part (a). You may fit this model using Python, Excel, or
any other software.
(c) Evaluate the standard deviation of the estimated slope parameter $\hat{\beta}_1$.
(d) Determine the 90% confidence interval of the estimated slope parameter $\hat{\beta}_1$.
Solution:
Refer to the solution code:
https://colab.research.google.com/drive/1HGiiJzofZhQWvip5SwAAz5jhDBXRVrdz?authuser=0#scrollTo=g2acxo7tulJU
(a) Scatter plot
(b) From the data, the sample means of $X$ and $Y$ are:
$$\bar{X} = 16.3125, \qquad \bar{Y} = 5.63125$$
The regression coefficient (slope, $\hat{\beta}_1$) and the intercept ($\hat{\beta}_0$) are then calculated as:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 0.317$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 0.464$$
Thus, the fitted regression line is:
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 0.464 + 0.317 X$$
The regression line drawn with the scatterplot: (figure not reproduced here)
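(c) To calculate the standard deviation of the estimated slope parameter, $\hat{s}_{\hat{\beta}_1}$, first estimate the error variance:
$$SSE = \sum (Y - \hat{Y})^2 = 9.417$$
$$\hat{\sigma}^2 = \frac{SSE}{n - 2} = \frac{9.417}{14} = 0.673$$
where $n = 16$ is the total number of data points. The variance of the slope estimate is then:
$$\hat{s}^2_{\hat{\beta}_1} = \frac{\hat{\sigma}^2}{\sum (X - \bar{X})^2} = \frac{0.673}{1499.4375} = 0.00045$$
Thus,
$$\hat{s}_{\hat{\beta}_1} = \sqrt{0.00045} \approx 0.0212$$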
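The steps in part (c) can be sketched in plain Python as a check. The block refits the line from the fare and ridership table so that it is self-contained; the variable names are illustrative and this is not the linked solution code.

```python
import math

# Data from the Problem 2 table: % fare increase (X) and % loss in ridership (Y).
fare_increase = [5, 35, 20, 15, 4, 6, 18, 23, 38, 8, 12, 17, 17, 13, 7, 23]
ridership_loss = [1.5, 12, 7.5, 6.3, 1.2, 1.7, 7.2, 8,
                  11.1, 3.6, 3.7, 6.6, 4.4, 4.5, 2.8, 8]

n = len(fare_increase)
x_bar = sum(fare_increase) / n
y_bar = sum(ridership_loss) / n

s_xx = sum((x - x_bar) ** 2 for x in fare_increase)
s_xy = sum((x - x_bar) * (y - y_bar)
           for x, y in zip(fare_increase, ridership_loss))

beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar

# Residual sum of squares and the error-variance estimate with n - 2 dof.
sse = sum((y - (beta0 + beta1 * x)) ** 2
          for x, y in zip(fare_increase, ridership_loss))
sigma2_hat = sse / (n - 2)

# Variance and standard deviation (standard error) of the slope estimate.
var_beta1 = sigma2_hat / s_xx
se_beta1 = math.sqrt(var_beta1)

print(f"SSE = {sse:.3f}, sigma^2 = {sigma2_hat:.3f}")
print(f"var(beta1) = {var_beta1:.5f}, se(beta1) = {se_beta1:.4f}")
```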
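(d) The 90% confidence interval of $\hat{\beta}_1$ is calculated using the t-distribution with $n - 2 = 14$ degrees of freedom and $\alpha = 0.10$:
$$\hat{\beta}_{1,\text{Lower}} = \hat{\beta}_1 - t_{1-\frac{\alpha}{2},\,n-2} \times \hat{s}_{\hat{\beta}_1} = 0.3167 - 1.7613 \times 0.0212 \approx 0.279$$
$$\hat{\beta}_{1,\text{Upper}} = \hat{\beta}_1 + t_{1-\frac{\alpha}{2},\,n-2} \times \hat{s}_{\hat{\beta}_1} = 0.3167 + 1.7613 \times 0.0212 \approx 0.354$$
Thus, the 90% confidence interval of $\hat{\beta}_1$ is [0.279, 0.354].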
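The interval in part (d) can be verified with the short sketch below. To stay within the standard library, the critical value `t_crit = 1.7613` is hard-coded from the solution text rather than computed; with SciPy available it could presumably be obtained from a t-distribution quantile function instead. Variable names are illustrative.

```python
import math

# Data from the Problem 2 table: % fare increase (X) and % loss in ridership (Y).
fare_increase = [5, 35, 20, 15, 4, 6, 18, 23, 38, 8, 12, 17, 17, 13, 7, 23]
ridership_loss = [1.5, 12, 7.5, 6.3, 1.2, 1.7, 7.2, 8,
                  11.1, 3.6, 3.7, 6.6, 4.4, 4.5, 2.8, 8]

n = len(fare_increase)
x_bar = sum(fare_increase) / n
y_bar = sum(ridership_loss) / n

s_xx = sum((x - x_bar) ** 2 for x in fare_increase)
s_xy = sum((x - x_bar) * (y - y_bar)
           for x, y in zip(fare_increase, ridership_loss))
beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar

# Standard error of the slope, as in part (c).
sse = sum((y - (beta0 + beta1 * x)) ** 2
          for x, y in zip(fare_increase, ridership_loss))
se_beta1 = math.sqrt(sse / (n - 2) / s_xx)

# Assumption: t_{0.95, 14} copied from the solution text, not computed here.
t_crit = 1.7613

lower = beta1 - t_crit * se_beta1
upper = beta1 + t_crit * se_beta1
print(f"90% CI for beta1: [{lower:.3f}, {upper:.3f}]")
```

Carrying the unrounded standard error through the calculation matters here: rounding it to 0.02 before multiplying would give a noticeably narrower interval than [0.279, 0.354].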
3. (50 points) In class, we discussed several considerations when working with data.
You now have the chance to find and explore a dataset of your own, and answer the
following questions.
Please read all parts of this question before selecting
a dataset, as you will want to make sure you can answer the following
questions with your selected dataset.
To work with the data, you can build on starter code from previous homeworks
(the solution code for Homework 1 will be helpful:
https://colab.research.google.com/drive/1uZB0_MpcakkIkEEJCQtlfPbkL9mADHTy?usp=sharing).
(a) Find a dataset online related to your interests that you can download and
analyze. You will want this dataset to have numerical data and ideally be
available in csv format, so that you can read it into Excel, Python, or R.
If you don’t know where to start, you can visit the following websites for
inspiration and open datasets:
• Several cities have open data:
  – City of Ann Arbor data (see the CSV column): https://www.a2gov.org/services/data/Pages/default.aspx
  – City of New York data: https://data.cityofnewyork.us/browse?limitTo=datasets
  – City of San Francisco data: https://data.sfgov.org/browse?limitTo=datasets
• The federal government and humanitarian organizations also share data:
  – Data.gov (already filtered to CSV formats): https://catalog.data.gov/dataset/?res_format=CSV
  – Climate.gov data (already filtered to CSV formats): https://www.climate.gov/maps-data/all?query=*&csv=1
  – Humanitarian Data Exchange: https://data.humdata.org/dataset
• There are also several online resources:
  – Pudding.cool is a site that I reference in lecture a lot: https://pudding.cool/. You can explore their data stories, which often have publicly available data, here: https://github.com/the-pudding/data
  – Data is Plural, this is the online database of open datasets covered in lecture:
https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFg
edit?usp=sharing
Once you have found a suitable dataset, please write a 1-2 paragraph
description of the dataset that includes:
1. Why is the data interesting and/or important?
2. The source of the dataset (e.g., City of Ann Arbor, NOAA, UM, etc.),
including its url.
3. What is included in the dataset (e.g., rainfall, walkability, movie rankings, etc.)
4. How the data was collected
(b) Create several exploratory plots or statistics of your dataset. This can
include scatterplots, histograms, sample means, sample standard deviations,
etc. You will want to explore your dataset to find something interesting to
visualize in the next part. At minimum, we would like to see:
1. A histogram
2. A scatterplot
3. Sample statistics
(c) Identify an interesting story or finding about your dataset that you would
want to share with the class and create 1-2 “finished” plots about it. Make
sure that your final plot includes all the information necessary for someone
to read and understand it (as discussed in lecture). Feel free to make this
directly in Python or Excel, or to export your chart to another software (like
Powerpoint or Adobe Illustrator) to edit it.
Submit the following:
1. Your 1-2 “finished” plots
2. A 1 paragraph description of the interesting finding/results shown in the
plot(s).
(d) Comment on any issues you found with the dataset while analyzing it. This
could be regarding any issues with collection, data representation, missingness,
or other issues you might have encountered. If no issues exist, comment
on additional information you would like to be included in the dataset in the
future.
Solution:
(a) As long as the three requirements are met, then full points were received.
(b) As long as there were (1) a histogram, (2) a scatterplot, and (3) sample statistics,
full points were received.
(c) The final plot should contain:
• an interesting finding
• a title
• axis labels (if applicable)
• legend (if applicable)
In addition, the finding/results should correspond to what is shown in the
plot.
(d) If an issue or additional information was included, full points were received.
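For Problem 3(b), the minimum exploratory statistics can be sketched with Python's standard library alone. The block below is only an illustration: it stands in for a downloaded CSV file by embedding the Problem 2 table as text, and the column names (`fare_increase_pct`, `ridership_loss_pct`) and the bin width of 3 are assumptions made for this example. A real submission would read the chosen dataset from its own file and would normally draw the histogram with a plotting library.

```python
import csv
import io
import statistics

# Stand-in for a downloaded CSV file: the Problem 2 table, used for illustration only.
csv_text = """fare_increase_pct,ridership_loss_pct
5,1.5
35,12
20,7.5
15,6.3
4,1.2
6,1.7
18,7.2
23,8
38,11.1
8,3.6
12,3.7
17,6.6
17,4.4
13,4.5
7,2.8
23,8
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
loss = [float(row["ridership_loss_pct"]) for row in rows]

# Sample statistics (stdev uses the n - 1 denominator).
mean_loss = statistics.mean(loss)
std_loss = statistics.stdev(loss)
print(f"n = {len(loss)}, mean = {mean_loss:.3f}, sample std = {std_loss:.3f}")

# Text histogram: count values in left-closed bins of width 3.
bin_width = 3.0
counts = {}
for value in loss:
    left_edge = bin_width * int(value // bin_width)
    counts[left_edge] = counts.get(left_edge, 0) + 1

for left_edge in sorted(counts):
    bar = "#" * counts[left_edge]
    print(f"[{left_edge:4.1f}, {left_edge + bin_width:4.1f}) {bar}")
```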