HW11_solution

CEE 373 Homework #11
Due date: December 6 at 11:59pm (NOTE: Wednesday due date)
Total: 100 points (NOTE: This HW will replace your lowest HW score)

1. (25 points) Data for the per capita energy consumption and per capita Gross National Product (GNP) for eight different countries have been compiled by Meadows et al. (1972) and tabulated below.

Table 1: Country-wide energy consumption and GNP for eight countries

Country   Per Capita Gross National Product, X   Per Capita Energy Consumption, Y
1           600                                  1000
2          2700                                   700
3          2900                                  1400
4          4200                                  2000
5          3100                                  2500
6          5400                                  2700
7          8600                                  2500
8         10300                                  4000

Please answer the following questions using this data.

(a) Plot a scatterplot of Y versus X.
(b) Determine the linear regression equation for predicting the per capita energy consumption (Y) on the basis of a country's per capita GNP (X) and plot the regression line with your scatterplot from part (a). Please calculate the regression coefficients by hand (you may check your answers with Python or other computing software).
(c) Determine the $R^2$ value. What does the $R^2$ value tell you about the appropriateness of the fit of this linear regression equation?
(d) Estimate the Per Capita Energy Consumption for a new country whose Per Capita GNP is 200.

Solution: Refer to the solution code https://colab.research.google.com/drive/1HGiiJzofZhQWvip5SwAAz5jhDBXRVrdz?authuser=0#scrollTo=g2acxo7tulJU .
(a) Scatter plot

(b) From the data, the mean values of X ($\bar{X}$) and Y ($\bar{Y}$) are:

$\bar{X} = 4725.0$
$\bar{Y} = 2100.0$

Calculating the regression coefficient ($\hat{\beta}_1$) and intercept ($\hat{\beta}_0$):

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 0.279$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 783.15$

Thus, the fitted regression line is:

$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 783.15 + 0.279 X$

[Figure: scatter plot with the fitted regression line]
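The problem allows checking the hand calculation with Python. The following is a minimal sketch of such a check using only the standard library, not the referenced Colab solution code. The function names `fit_line` and `r_squared` are illustrative choices; the data come from Table 1. It also evaluates the $R^2$ value and the prediction at X = 200 that parts (c) and (d) ask for.

```python
# Cross-check of the Problem 1 hand calculation (standard library only).
gnp = [600, 2700, 2900, 4200, 3100, 5400, 8600, 10300]      # X, from Table 1
energy = [1000, 700, 1400, 2000, 2500, 2700, 2500, 4000]    # Y, from Table 1


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Return R^2 = 1 - SSE / SST for the fitted line."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


b0, b1 = fit_line(gnp, energy)
r2 = r_squared(gnp, energy, b0, b1)
y_hat_200 = b0 + b1 * 200  # prediction at a per capita GNP of 200

print(f"slope = {b1:.3f}, intercept = {b0:.2f}")
print(f"R^2 = {r2:.2f}, prediction at X = 200: {y_hat_200:.2f}")
```

Because the slope is kept unrounded inside the script, the prediction can differ in the second decimal from one computed with the rounded value 0.279.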
(c) To calculate $R^2$:

$R^2 = 1 - \frac{SSE}{SST}$
$SSE = \sum (Y - \hat{Y})^2 = 2218810.80$
$SST = \sum (Y - \bar{Y})^2 = 7960000.0$

Thus,

$R^2 = 1 - \frac{2218810.80}{7960000.0} \approx 0.72$

$R^2$ ranges from 0 to 1. A value closer to 1 indicates that a higher proportion of the variance in the dependent variable is explained by the independent variable(s). Here, an $R^2$ of 0.72 means that about 72% of the variability in per capita energy consumption is explained by per capita GNP, so the linear model captures most, but not all, of the variation.

(d) When GNP = 200,

$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 783.15 + 0.279 \times 200 \approx 838.89$ (evaluated with the unrounded slope)

Thus, the expected per capita energy consumption is 838.89.

2. (25 points) Suppose a survey of the effect of a fare increase on the loss of ridership for mass transit systems in the United States reveals the data tabulated below. Please answer the following questions using this data.

(a) Plot a scatterplot of the above data for the percentage loss in ridership (Y)
versus the percentage fare increase (X).

% Fare Increase, X   % Loss in Ridership, Y
 5                    1.5
35                   12
20                    7.5
15                    6.3
 4                    1.2
 6                    1.7
18                    7.2
23                    8
38                   11.1
 8                    3.6
12                    3.7
17                    6.6
17                    4.4
13                    4.5
 7                    2.8
23                    8

(b) Perform a linear regression analysis for predicting the expected percentage loss in ridership as a function of the percentage fare increase for a mass transit system in the United States and plot the regression line with your scatterplot from part (a). You may fit this model using Python, Excel, or any other software.
(c) Evaluate the standard deviation of the estimated slope parameter $\hat{\beta}_1$.
(d) Determine the 90% confidence interval of the estimated slope parameter $\hat{\beta}_1$.

Solution: Refer to the solution code https://colab.research.google.com/drive/1HGiiJzofZhQWvip5SwAAz5jhDBXRVrdz?authuser=0#scrollTo=g2acxo7tulJU .

(a) Scatter plot
(b) From the data, the mean values of X ($\bar{X}$) and Y ($\bar{Y}$) are:

$\bar{X} = 16.3125$
$\bar{Y} = 5.63125$

Calculating the regression coefficient ($\hat{\beta}_1$) and intercept ($\hat{\beta}_0$):

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 0.317$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 0.464$

Thus, the fitted regression line is:

$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 0.464 + 0.317 X$

[Figure: scatter plot with the fitted regression line]
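The fit above, together with the slope's standard deviation and 90% confidence interval requested in parts (c) and (d), can be cross-checked with a short script. This is a sketch under stated assumptions rather than the referenced Colab solution code: the function name `slope_interval` is illustrative, and the critical value 1.7613 for $t_{0.95, 14}$ is hard-coded from the solution instead of being looked up with a statistics library.

```python
import math

# Data from the Problem 2 table.
fare = [5, 35, 20, 15, 4, 6, 18, 23, 38, 8, 12, 17, 17, 13, 7, 23]      # X
loss = [1.5, 12, 7.5, 6.3, 1.2, 1.7, 7.2, 8, 11.1, 3.6, 3.7, 6.6,
        4.4, 4.5, 2.8, 8]                                                # Y

T_CRIT = 1.7613  # t_{0.95, 14}; taken from the solution, not computed here


def slope_interval(x, y, t_crit):
    """Return (slope, standard error of slope, (lower, upper)) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and the unbiased variance estimate SSE / (n - 2).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)
    se_slope = math.sqrt(sigma2_hat / s_xx)
    half_width = t_crit * se_slope
    return slope, se_slope, (slope - half_width, slope + half_width)


b1, se_b1, (lower, upper) = slope_interval(fare, loss, T_CRIT)
print(f"slope = {b1:.4f}, standard error = {se_b1:.4f}")
print(f"90% CI for the slope: [{lower:.3f}, {upper:.3f}]")
```

Hard-coding the critical value keeps the sketch dependency-free; for a different confidence level or sample size, `T_CRIT` would need to be replaced with the matching table value.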
(c) To calculate the standard deviation of the estimated slope parameter, first compute its variance $\hat{s}^2_{\hat{\beta}_1}$:

$SSE = \sum (Y - \hat{Y})^2 = 9.417$
$\hat{\sigma}^2 = \frac{SSE}{n - 2} = \frac{9.417}{14} = 0.673$

where n = 16 is the total number of data points.

$\hat{s}^2_{\hat{\beta}_1} = \frac{\hat{\sigma}^2}{\sum (X - \bar{X})^2} = \frac{0.673}{1499.4375} = 0.00045$

Thus, $\hat{s}_{\hat{\beta}_1} = \sqrt{0.00045} \approx 0.0212 \approx 0.02$.

(d) The 90% confidence interval of $\hat{\beta}_1$ is calculated using the t-distribution with $\alpha = 0.10$ and $n - 2 = 14$ degrees of freedom, so $t_{1 - \alpha/2, n-2} = t_{0.95, 14} = 1.7613$:

$\hat{\beta}_{1,Lower} = \hat{\beta}_1 - t_{1 - \alpha/2, n-2} \times \hat{s}_{\hat{\beta}_1} = 0.3167 - 1.7613 \times 0.0212 \approx 0.279$
$\hat{\beta}_{1,Upper} = \hat{\beta}_1 + t_{1 - \alpha/2, n-2} \times \hat{s}_{\hat{\beta}_1} = 0.3167 + 1.7613 \times 0.0212 \approx 0.354$

Thus, the 90% confidence interval of $\hat{\beta}_1$ is [0.279, 0.354].

3. (50 points) In class, we discussed several considerations when working with data. You now have the chance to find and explore a dataset of your own, and answer the
following questions. Please read all parts of this question before selecting a dataset, as you will want to make sure you can answer the following questions with your selected dataset. To work with the data, you can build on starter code from previous homeworks (the solution code for Homework 1 will be helpful: https://colab.research.google.com/drive/1uZB0_MpcakkIkEEJCQtlfPbkL9mADHTy?usp=sharing ).

(a) Find a dataset online related to your interests that you can download and analyze. You will want this dataset to have numerical data and ideally be available in csv format, so that you can read it into Excel, Python, or R. If you don't know where to start, you can visit the following websites for inspiration and open datasets:

Several cities have open data:
  - City of Ann Arbor data (see the CSV column): https://www.a2gov.org/services/data/Pages/default.aspx
  - City of New York data: https://data.cityofnewyork.us/browse?limitTo=datasets
  - City of San Francisco data: https://data.sfgov.org/browse?limitTo=datasets

The federal government and humanitarian organizations also share data:
  - Data.gov (already filtered to CSV formats): https://catalog.data.gov/dataset/?res_format=CSV
  - Climate.gov data (already filtered to CSV formats): https://www.climate.gov/maps-data/all?query=*&csv=1
  - Humanitarian Data Exchange: https://data.humdata.org/dataset

There are also several online resources:
  - Pudding.cool is a site that I reference in lecture a lot: https://pudding.cool/ . You can explore their data stories, which often have publicly available data here: https://github.com/the-pudding/data
  - Data is Plural, the online database of open datasets covered in lecture: https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFg edit?usp=sharing

Once you have found a suitable dataset, please write a 1-2 paragraph description of the dataset that includes:
  1. Why the data is interesting and/or important
  2. The source of the dataset (e.g., City of Ann Arbor, NOAA, UM, etc.), including its url
  3. What is included in the dataset (e.g., rainfall, walkability, movie rankings, etc.)
  4. How the data was collected

(b) Create several exploratory plots or statistics of your dataset. This can include scatterplots, histograms, sample means, sample standard deviations, etc. You will want to explore your dataset to find something interesting to visualize in the next part. At minimum, we would like to see:
  1. A histogram
  2. A scatterplot
  3. Sample statistics

(c) Identify an interesting story or finding about your dataset that you would want to share with the class and create 1-2 "finished" plots about it. Make sure that your final plot includes all the information necessary for someone to read and understand it (as discussed in lecture). Feel free to make this directly in Python or Excel, or to export your chart to other software (like PowerPoint or Adobe Illustrator) to edit it. Submit the following:
  1. Your 1-2 "finished" plots
  2. A 1 paragraph description of the interesting finding/results shown in the plot(s)

(d) Comment on any issues you found with the dataset while analyzing it. This could be regarding any issues with collection, data representation, missingness, or other issues you might have encountered. If no issues exist, comment on additional information you would like to be included in the dataset in the future.

Solution:

(a) As long as the three requirements were met, full points were received.

(b) As long as there were (1) a histogram, (2) a scatterplot, and (3) sample statistics, full points were received.

(c) The final plot should contain:
  - an interesting finding
  - a title
  - axis labels (if applicable)
  - a legend (if applicable)
In addition, the finding/results should correspond to what is shown in the plot.

(d) If an issue or additional information was included, full points were received.
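For the sample statistics and histogram listed as minimum requirements in Problem 3(b), a starting point might look like the sketch below. It uses only the standard library. The `values` list is a hypothetical placeholder, not data from any of the sources above, and `histogram_counts` is an illustrative helper name. The scatterplot and the finished plots for part (c) would still need a plotting tool such as Python, Excel, or similar.

```python
import statistics
from collections import Counter

# Hypothetical placeholder values; replace with a numeric column from your own CSV.
values = [2.1, 3.4, 3.9, 4.0, 4.4, 5.2, 5.8, 6.1, 7.3, 9.0]


def histogram_counts(data, bin_width):
    """Count values per bin; each bin is keyed by its left edge."""
    return Counter(bin_width * (v // bin_width) for v in data)


mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
counts = histogram_counts(values, bin_width=2.0)

print(f"n = {len(values)}, sample mean = {mean:.2f}, sample std = {std:.2f}")
for left_edge in sorted(counts):
    # Text histogram: one '#' per observation in the bin [left_edge, left_edge + 2).
    print(f"[{left_edge:4.1f}, {left_edge + 2.0:4.1f}): {'#' * counts[left_edge]}")
```

The bin width of 2.0 is arbitrary; trying a few widths is a quick way to see whether a feature of the distribution is real or an artifact of the binning.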