hw 10

docx

School

University of Pittsburgh

Course

STAT 1000

Subject

Industrial Engineering

Date

Feb 20, 2024

Type

docx

Pages

10

Uploaded by ConstableScorpionMaster6886

You must use software to solve these problems. Copy/paste the output to include with your answer. Answers without these results will receive zero credit. On the other hand, copy/pasted output without any interpretation will receive zero credit. Whenever possible, include a graph of the data in addition to numeric summaries and results.

Ch 10 homework problems (5 points each)

10.14 Fuel-efficiency (MPGDIFF) part a – c

10.14 Are the two fuel-efficiency measurements similar? Refer to Exercise 7.18 (page 407). In addition to the computer calculating miles per gallon (mpg), the driver also measured mpg by dividing the miles driven by the number of gallons at fill-up. The driver wants to determine if these calculations are similar.

a. Consider the driver's mpg calculations as the explanatory variable. Plot the data and describe the relationship. Are there any outliers or unusual values? Does a linear relationship seem reasonable?
- There is a fairly strong positive linear relationship. The Q-Q plot shows slight skewness at the bottom and top, but not enough to affect the results. A linear relationship seems reasonable: r-squared is about 65% (Multiple R-squared: 0.652), which means that about 65% of the variation in the computer's mpg is explained by the least-squares line y-hat = 11.8 + 0.775x, where x is the driver's mpg.
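b. Run the simple linear regression and state the least-squares regression line.
- y-hat = 11.8 + 0.775x
Residual standard error: 2.676 on 18 degrees of freedom
Multiple R-squared: 0.652, Adjusted R-squared: 0.6327

c. Summarize the results. Does it appear that the computer and driver calculations are the same? Explain your answer.
- The results show F-statistic: 33.73 on 1 and 18 DF, p-value: 1.675e-05. Since the p-value is approximately 0, we reject Ho: beta1 = 0; there is strong evidence of a linear relationship between the driver and computer calculations. This test does not by itself show that the two calculations differ. If they were the same, the points would follow the line y = x (intercept 0, slope 1). The fitted line y-hat = 11.8 + 0.775x has a nonzero intercept and a slope below 1, so the computer and driver values do not appear to be identical, although they are clearly related.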
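The reported output for 10.14 can be cross-checked, and the question "are the calculations the same?" can be looked at more directly, using only the numbers printed above. The sketch below is an illustration, not part of the original submission: it assumes the rounded slope 0.775 and uses the fact that in simple linear regression F equals the square of the slope's t statistic, so the slope's standard error can be backed out from F. The variable names are hypothetical.

```r
# Values copied from the reported regression output for 10.14.
r2    <- 0.652    # Multiple R-squared
df    <- 18       # residual degrees of freedom (n - 2)
f_rep <- 33.73    # reported F-statistic
b1    <- 0.775    # reported slope (rounded, so results are approximate)

n      <- df + 2
adj_r2 <- 1 - (1 - r2) * (n - 1) / df   # adjusted R-squared
f_stat <- (r2 / (1 - r2)) * df          # F recovered from R-squared

# In simple linear regression F = t^2 for the slope, so the slope's
# standard error is approximately |b1| / sqrt(F).
se_b1 <- abs(b1) / sqrt(f_rep)

# Test H0: beta1 = 1 (perfect agreement would have slope 1).
t_one <- (b1 - 1) / se_b1
p_one <- 2 * pt(-abs(t_one), df)

print(round(c(adj_r2 = adj_r2, f_stat = f_stat, se_b1 = se_b1,
              t_one = t_one, p_one = p_one), 4))
```

With these rounded inputs the t statistic for a slope of 1 comes out near -1.69 on 18 degrees of freedom, so the slope alone is not clearly different from 1 at the 5% level. The intercept would need its own standard error, which is not shown in the output above, so this check is only partial.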
10.32 Temperature and performance (TEMPMATH) part a – e

10.32 Temperature and academic performance, continued. Refer to the previous exercise. Repeat parts (a)–(e) using the female average score, Fave, as the response variable.

a. Make a scatterplot of Mave versus Temp. Describe the relationship. (Here the response is Fave, so the plot is Fave versus Temp.)
- There is a weak positive relationship.

b. Find the equation of the least-squares regression line for predicting Mave based on the room temperature and add this line to your scatterplot. (Here the line predicts Fave.)
- y-hat = 5.85261 + 0.15355x
c. What is r² for these data? Briefly explain what this tells you about the overall fit of the model to these data.
- Adjusted R-squared: 0.2806 (the unadjusted r², reported as Multiple R-squared, is slightly larger). This tells us that about 28% of the variation in the female average performance score is explained by the linear relationship with temperature. r-squared measures how well the data fit the regression line; a larger percentage indicates a better fit.

d. Check the conditions that must be approximately met for inference. Provide a set of plots and any concerns you have.
1. The relationship is linear: yes, the scatterplot shows a weak linear relationship.
2. The response varies Normally about the population regression line: yes, the Q-Q plot of the residuals is roughly straight, with all values on or close to the line except one small apparent outlier, which is also visible in the histogram but should not significantly affect the results. The histogram also shows a roughly Normal shape.
3. Observations are independent: yes, repeated responses of y do not affect each other.
4. The standard deviation of the responses is the same for all values of x: yes, the residual plot shows random scatter with roughly equal spread about the line.
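The four condition checks in part (d) correspond to a standard set of plots. The sketch below shows one way to produce them in R. It is a minimal illustration on simulated stand-in data, not the TEMPMATH data set: the variable names temp and fave, the sample size, and the simulation parameters are all assumptions made only so the example runs on its own.

```r
# Simulated stand-in data: NOT the TEMPMATH data set.
set.seed(1)
n    <- 24
temp <- runif(n, min = 15, max = 35)               # hypothetical temperatures
fave <- 5.85 + 0.154 * temp + rnorm(n, sd = 1.5)   # hypothetical scores

fit <- lm(fave ~ temp)

if (!interactive()) pdf(NULL)   # avoid writing a plot file when run as a script
op <- par(mfrow = c(2, 2))

plot(temp, fave, main = "Scatterplot with fitted line")   # condition 1: linearity
abline(fit)

plot(temp, resid(fit), main = "Residuals vs. x")          # condition 4: constant spread
abline(h = 0, lty = 2)

qqnorm(resid(fit))                                         # condition 2: Normality
qqline(resid(fit))

hist(resid(fit), main = "Histogram of residuals")         # condition 2: shape, outliers

par(op)
```

Independence (condition 3) cannot be read off a plot like these; it comes from how the data were collected.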
e. Assuming that inference is appropriate, is there significant evidence that temperature is associated with performance? State the hypotheses, give a test statistic and P-value, and summarize your conclusion.
- Ho: β1 = 0 (temperature has no linear effect on female average score)
- Ha: β1 does not equal 0 (temperature has a linear effect on female average score)
- Test statistic: F-statistic: 9.97, p-value: 0.004567
- The p-value is less than alpha, so we reject Ho; there is evidence of a linear relationship between temperature and female average academic performance.
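The test statistic in part (e) can be tied back to the other reported numbers. The residual degrees of freedom are not shown for this fit, so the sketch below infers them: it searches for the value that reproduces the reported Adjusted R-squared of 0.2806 from the reported F of 9.97, then recomputes the p-value. It also uses the t value 3.157 from the Temp coefficient row that appears in the output pasted later in this document. This is a consistency check under those assumptions, not output from the original analysis.

```r
f_stat  <- 9.97     # reported F-statistic
t_stat  <- 3.157    # reported t value for the Temp slope
adj_rep <- 0.2806   # reported Adjusted R-squared

# For each candidate residual df, R^2 = F / (F + df), and the adjusted
# value is 1 - (1 - R^2) * (n - 1) / (n - 2) with n = df + 2.
cand   <- 5:60
r2     <- f_stat / (f_stat + cand)
adj_r2 <- 1 - (1 - r2) * (cand + 1) / cand

# ASSUMPTION: the df that best matches the reported adjusted R-squared
# is taken to be the residual df of the fit.
best_df <- cand[which.min(abs(adj_r2 - adj_rep))]
r2_best <- f_stat / (f_stat + best_df)
p_value <- pf(f_stat, 1, best_df, lower.tail = FALSE)

print(c(best_df = best_df, r2 = r2_best,
        t_squared = t_stat^2, p_value = p_value))
```

Under these formulas the matching value is 22 residual degrees of freedom (n = 24), which implies an unadjusted r² of about 0.31, and the squared t value agrees with F up to rounding.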
10.44 Predicting water quality (IBI) part a – h

10.44 Predicting water quality. The index of biotic integrity (IBI) is a measure of the water quality in streams. IBI and land use measures for a collection of streams in the Ozark Highland ecoregion of Arkansas were collected as part of a study.22 TABLE 10.3 gives the data for IBI, the percent of the watershed that was forest, and the area of the watershed, in square kilometers, for streams in the original sample with watershed area less than or equal to 70 km².

a. Use numerical and graphical methods to describe the variable IBI. Do the same for area. Summarize your results.
- In the boxplots, Area is box 1 and IBI is box 2. The two are easy to tell apart by their medians, which are well separated.
- > summary(data3$Area)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.00   16.00   26.00   28.29   34.00   70.00
  Area has a minimum of 2.00, 1st quartile of 16.00, median of 26.00, mean of 28.29, 3rd quartile of 34.00, and maximum of 70.00.
- > summary(data3$IBI)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  29.00   55.00   71.00   65.94   82.00   91.00
  IBI has a minimum of 29.00, 1st quartile of 55.00, median of 71.00, mean of 65.94, 3rd quartile of 82.00, and maximum of 91.00.
- IBI has the wider middle spread (interquartile range 27 versus 18 for area), while the overall ranges are similar (62 versus 68). IBI's minimum lies far below the bulk of its values, while area's maximum lies far above the bulk of its values. Based on the histograms, area is right-skewed and IBI is left-skewed, which agrees with the mean being above the median for area and below the median for IBI.

b. Plot the data and describe the relationship between IBI and area. Are there any outliers or unusual patterns?
- IBI and area have a weak to moderate positive linear relationship; there appear to be no clear outliers in the scatterplot.

c. Give the statistical model for simple linear regression for this problem.
- y_i = β0 + β1 x_i + ε_i, where y_i is the IBI and x_i is the area for stream i
  β0: the intercept of the population regression line
  β1: the slope of the population regression line
  ε_i: error term with mean zero and constant standard deviation σ
  The ε_i are independent and Normally distributed, N(0, σ).
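The spread comparison for IBI and area in part (a) can be made concrete from the two five-number summaries alone. The sketch below computes the range, the interquartile range, and the usual 1.5 × IQR fences for each variable. The helper name spread is hypothetical, and the 1.5 × IQR rule is a rule of thumb for flagging possible outliers, not a formal test.

```r
# Five-number summaries copied from the summary() output above.
area <- c(min = 2,  q1 = 16, median = 26, q3 = 34, max = 70)
ibi  <- c(min = 29, q1 = 55, median = 71, q3 = 82, max = 91)

# Hypothetical helper: range, IQR, and 1.5 * IQR fences from a summary.
spread <- function(s) {
  iqr <- unname(s["q3"] - s["q1"])
  c(range       = unname(s["max"] - s["min"]),
    iqr         = iqr,
    lower_fence = unname(s["q1"]) - 1.5 * iqr,
    upper_fence = unname(s["q3"]) + 1.5 * iqr)
}

res <- rbind(Area = spread(area), IBI = spread(ibi))
print(res)
```

By this rule the largest area (70) lies above the upper fence of 61 and would be flagged as a possible high value, while the smallest IBI (29) stays above its lower fence of 14.5.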
d. State the null and alternative hypotheses for examining the relationship between IBI and area.
- Ho: β1 = 0 (area has no linear effect on IBI)
- Ha: β1 does not equal 0 (area has a linear effect on IBI)

e. Run the simple linear regression and summarize the results.
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.85261    1.19858   4.883    7e-05 ***
Temp         0.15355    0.04863   3.157  0.00457 **
y-hat = 5.85261 + 0.15355x
This coefficient table lists Temp as the predictor and matches the line from 10.32, so it appears to be the TEMPMATH output rather than the coefficients for IBI on area. The summary lines below, with 47 residual degrees of freedom, appear to be from the IBI fit.
Multiple R-squared: 0.1988, Adjusted R-squared: 0.1818
About 20% of the variation in IBI is explained by the linear relationship with area (Multiple R-squared: 0.1988).
F-statistic: 11.67 on 1 and 47 DF, p-value: 0.001322
The p-value is less than alpha, so we reject Ho; there is evidence that area has a linear effect on IBI.

f. Obtain the residuals and plot them versus area. Is there anything unusual in the plot?
- There is nothing unusual about the plot. The residuals show roughly constant variance and are scattered randomly about zero.

g. Do the residuals appear to be approximately Normal? Give reasons for your answer.
- Yes, the residuals appear approximately Normal: the points in the Q-Q plot fall roughly along a straight line, and the histogram of the residuals has a roughly Normal shape.
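The reported R-squared for the IBI fit also gives the sample correlation between IBI and area, and the test of zero correlation is equivalent to the slope test in simple linear regression. The sketch below works only from the reported Multiple R-squared (0.1988) and residual degrees of freedom (47); taking the positive square root is an assumption based on the positive relationship described in the scatterplot.

```r
r2     <- 0.1988   # reported Multiple R-squared
df_res <- 47       # reported residual degrees of freedom (n - 2)

r <- sqrt(r2)      # positive root: the association is described as positive

# Test of H0: population correlation = 0.
# t = r * sqrt(n - 2) / sqrt(1 - r^2), on n - 2 degrees of freedom.
t_r <- r * sqrt(df_res) / sqrt(1 - r2)
p_r <- 2 * pt(-abs(t_r), df_res)

print(round(c(r = r, t = t_r, t_squared = t_r^2, p_value = p_r), 4))
```

The squared t statistic should reproduce the reported F-statistic of 11.67 up to rounding, and the p-value should agree with the reported 0.001322.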
h. Do the assumptions for the analysis of these data using the model you gave in part (c) appear to be reasonable? Explain your answer.
- The assumptions seem reasonable. There is a positive linear relationship, even though it is weak. The response varies Normally about the population regression line, as seen in the Q-Q plot and histogram of the residuals. The observations are independent, meaning the y values do not affect one another. The standard deviation of the response is the same for all values of x, as seen in the residual plot in part (f), which shows random scatter with roughly constant spread.