Stat 371 Homework #10 SOLUTIONS

* Submit your homework to Canvas by the due date and time. Email your instructor if you have extenuating circumstances and need to request an extension.
* If an exercise asks you to use R, include a copy of the code and output. Please edit your code and output to be only the relevant portions.
* If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or manual calculations on your exams, so practice accordingly.
* You must include an explanation and/or intermediate calculations for an exercise to be complete.
* Be sure to submit the HWK 10 Auto grade Quiz, which will give you ~20 of your 40 accuracy points.
* 50 points total: 40 points accuracy and 10 points completion.

Least Squares Linear Regression

Exercise 1. Suppose we are interested in exploring the relationship between city air particulate and rates of childhood asthma. We sample 15 cities for particulate (X), measured in parts-per-million (ppm) of large particulate matter, and for the rate of childhood asthma (Y), measured in percent. The data are given in the summary table and R vectors below.

    variable  size  mean    variance
    X         15    11.42   13.05
    Y         15    14.513  2.636

a. Plot the data as you see fit and summarize the pattern's shape, direction, and strength in the context of the problem.

There appears to be a strong, positive, linear pattern between level of particulate and percent asthma for the range of values observed in this data set.

    particulate <- c(11.6, 15.9, 15.7, 7.9, 6.3, 13.7, 13.1, 10.8, 6.0, 7.6, 14.8, 7.4, 16.2, 13.1, 11.2)
    asthma <- c(14.5, 16.6, 16.5, 12.6, 12.0, 15.8, 15.1, 14.2, 12.2, 13.1, 16.0, 12.9, 16.4, 15.4, 14.4)
    plot(particulate, asthma)
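The summary table in the exercise can be reproduced from the two R vectors. The following is a minimal sketch of that check; the name `summary_tab` is an illustrative assumption and not part of the assignment.

```r
# Data from Exercise 1
particulate <- c(11.6, 15.9, 15.7, 7.9, 6.3, 13.7, 13.1, 10.8,
                 6.0, 7.6, 14.8, 7.4, 16.2, 13.1, 11.2)
asthma <- c(14.5, 16.6, 16.5, 12.6, 12.0, 15.8, 15.1, 14.2,
            12.2, 13.1, 16.0, 12.9, 16.4, 15.4, 14.4)

# summary_tab is a hypothetical name used only for this check
summary_tab <- data.frame(
  variable = c("X", "Y"),
  size     = c(length(particulate), length(asthma)),
  mean     = c(mean(particulate), mean(asthma)),
  variance = c(var(particulate), var(asthma))   # sample variance (n - 1)
)
print(summary_tab, digits = 4)
```

Note that `var()` uses the n - 1 denominator, which is the convention the summary table appears to follow.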
[Figure: scatterplot of asthma (y axis) against particulate (x axis)]

b. Calculate the correlation coefficient (you can use an R function) and explain how the value corresponds to what you observed in the graph in part (a).

r = 0.993, calculated in R. This value is very close to 1, which is not surprising given how tightly the (x, y) data points follow a line in the scatterplot.

    cor(particulate, asthma)
    ## [1] 0.9931873

c. Build a linear regression model with least squares estimators for slope and y intercept for the data.

(i) First, build a regression model by hand using the correlation computed in (b) and the summary statistics given above.

Our estimated slope is given by

$\hat{\beta}_1 = r \frac{s_y}{s_x} = 0.993 \cdot \frac{\sqrt{2.636}}{\sqrt{13.05}} = 0.446$

We can find the intercept as $\bar{y} - \hat{\beta}_1 \bar{x}$, which in this case is $14.513 - 0.446(11.42) = 9.42$.

Our linear model is $\hat{y} = 0.446x + 9.42$.

(ii) Check your computations using lm in R.

    asthma_mod <- lm(asthma ~ particulate)
    asthma_mod
    ##
    ## Call:
    ## lm(formula = asthma ~ particulate)
    ##
    ## Coefficients:
    ## (Intercept)  particulate
    ##      9.4163       0.4463

(iii) Interpret the estimated intercept and slope in the context of the question.

The estimated intercept is 9.42. This is the estimated average rate of childhood asthma in cities with 0 ppm particulate; since x = 0 is outside the range of our data, this value is an extrapolation. The estimated slope is 0.446. This suggests that for each 1 ppm increase in particulate, the rate of childhood asthma increases by 0.446 percentage points on average.

d. Construct a residual plot with fitted y values on the x axis and residuals on the y axis. Graphically assess whether the correct model and constant variance assumptions are reasonably met.

There are no clear deviations from constant variance and no clear curvature in the residuals.

    # We pull the predicted values and residuals from the
    # R linear model object
    plot(asthma_mod$fitted, asthma_mod$residuals)

[Figure: residual plot of asthma_mod$residuals (y axis) against asthma_mod$fitted (x axis)]

e. Identify which (particulate, asthma) data point results in the residual with the largest magnitude. Is that point above or below the fitted regression line? Show how the residual is calculated. (Make sure that you can also identify that point on the residual plot.)

Going by the vector of residuals calculated by R, the 4th point has the largest residual in magnitude (-0.3423). This point is below the regression line, because the residual is negative. The observed y is 12.6, which comes from x = 7.9, and the predicted value is $9.42 + 0.446(7.9) = 12.943$. The residual is observed minus predicted: $12.6 - 12.943 = -0.343$, which matches the R value up to rounding of the coefficients.

    asthma_mod$residuals
    ##           1           2           3           4           5           6
    ## -0.09367246  0.08711504  0.07638074 -0.34225706 -0.22813148  0.26903771
    ##           7           8           9          10          11          12
    ## -0.16316519 -0.03660967  0.10576707  0.29164149 -0.02192362  0.18090719
    ##          13          14          15
    ## -0.24678350  0.13683481 -0.01514107
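The hand calculation in part c(i) and the residual check in part e can be reproduced directly from the data without calling lm. This is a minimal sketch; the names `b0`, `b1`, `fitted_y`, `res`, and `worst` are illustrative assumptions.

```r
# Data from Exercise 1
particulate <- c(11.6, 15.9, 15.7, 7.9, 6.3, 13.7, 13.1, 10.8,
                 6.0, 7.6, 14.8, 7.4, 16.2, 13.1, 11.2)
asthma <- c(14.5, 16.6, 16.5, 12.6, 12.0, 15.8, 15.1, 14.2,
            12.2, 13.1, 16.0, 12.9, 16.4, 15.4, 14.4)

# Least squares estimates from the summary-statistic formulas:
# slope = r * s_y / s_x, intercept = ybar - slope * xbar
r  <- cor(particulate, asthma)
b1 <- r * sd(asthma) / sd(particulate)
b0 <- mean(asthma) - b1 * mean(particulate)

# Residual = observed y minus fitted y
fitted_y <- b0 + b1 * particulate
res      <- asthma - fitted_y

# Index of the residual with the largest magnitude
worst <- which.max(abs(res))

c(intercept = b0, slope = b1)
c(index = worst, x = particulate[worst], y = asthma[worst],
  fitted = fitted_y[worst], residual = res[worst])
```

Because these use unrounded values of r and the standard deviations, the results agree with the lm output rather than with the rounded hand calculation.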
Exercise 2. In the paper "Artificial Trees as a Cavity Substrate for Woodpeckers", scientists provided polystyrene cylinders as an alternative roost. The paper related values of x = ambient temperature (C) and y = cavity depth (cm). A scatterplot in the paper showed a strong linear relationship between x and y. The summary values for x and y are given below:

    Variable   Size  Mean   Variance
    Temp (x)   12    10.92  137.17
    Depth (y)  12    16.36  21.28

A least-squares linear model for (Depth ~ Temp) was fit, and the intercept was estimated to be 20.12506 with standard error 0.94023. The slope was estimated to be -0.34504 with standard error 0.06008. The MSE for the model is $2.334^2$.

a. Determine the sample correlation (r) from the summaries given.

$\hat{\beta}_1 = r \frac{s_y}{s_x}$, so $r = \hat{\beta}_1 \frac{s_x}{s_y} = -0.34504 \cdot \frac{\sqrt{137.17}}{\sqrt{21.28}} = -0.8760183$

b. Using the slope and intercept values given in the problem, write the linear regression model with least squares estimates for $\beta_0$ and $\beta_1$ relating ambient temperature (x) and hole depth (y).

$\widehat{\text{depth}} = 20.12506 - 0.34504\,(\text{temp})$

Determine test statistics and p-values for the tests in parts c-f:

c. $H_0: \beta_1 = 0$ vs $H_A: \beta_1 \neq 0$

TS: $t_{obs} = \frac{-0.34504 - 0}{0.06008} = -5.743$; p-value: 0.000187.

    2 * pt(-5.743, df = 10)
    ## [1] 0.0001869496

d. $H_0: \beta_1 \geq 0$ vs $H_A: \beta_1 < 0$

TS: $t_{obs} = \frac{-0.34504 - 0}{0.06008} = -5.743$; p-value: 0.000187 / 2 = 9.35e-05.

    pt(-5.743, df = 10)
    ## [1] 9.34748e-05

e. $H_0: \beta_1 \leq 0$ vs $H_A: \beta_1 > 0$

TS: $t_{obs} = \frac{-0.34504 - 0}{0.06008} = -5.743$; p-value: 1 - 0.000187 / 2 = 0.9999065.

    pt(-5.743, df = 10, lower.tail = FALSE)
    ## [1] 0.9999065

f. $H_0: \beta_1 = -0.5$ vs $H_A: \beta_1 \neq -0.5$
TS: $t_{obs} = \frac{-0.34504 - (-0.5)}{0.06008} = \frac{0.15496}{0.06008} = 2.579$; p-value: 0.02746.

    2 * pt(2.579, df = 10, lower.tail = FALSE)
    ## [1] 0.02746338

g. Compute and interpret a 98% confidence interval for the slope of the regression line $\beta_1$.

$-0.34504 \pm t_{0.01, 10} \times 0.06008 = -0.34504 \pm 2.764 \times 0.06008 = (-0.511, -0.179)$

We're 98% confident the true slope relating temperature (x) and cavity depth (y) is between -0.511 and -0.179. A one degree increase in temperature is associated with a decrease in average cavity depth of between 0.179 and 0.511 cm.

    # Find t critical value for CI
    qt(0.99, df = 10)
    ## [1] 2.763769

h. Construct a 95% prediction interval for the cavity depth of the next hole when ambient temperature is 1 degree Celsius (this temperature value is within the range of those in the original study).

The predicted depth at 1 degree Celsius is

$\widehat{\text{depth}} = 20.12506 - 0.34504(1) = 19.78$

with standard error

$2.334 \sqrt{1 + \frac{1}{12} + \frac{(1 - 10.92)^2}{11(137.17)}} = 2.501$

and the t critical value is $t_{10, 0.025} = 2.228$, so our 95% PI is

$19.78 \pm 2.228 \times 2.501 = 19.78 \pm 5.5734 = (14.21, 25.35)$

    # DATA FOR QUESTION 2 (not included in HWK prompt)
    temp <- c(-6, -3, -2, 1, 6, 10, 11, 19, 21, 23, 25, 26)
    depth <- c(21.1, 26, 18, 19.2, 16.9, 18.1, 16.8, 11.8, 11, 12.1, 14.8, 10.5)
    plot(temp, depth)
[Figure: scatterplot of depth (y axis) against temp (x axis)]

    WPmod <- lm(depth ~ temp)
    summary(WPmod); anova(WPmod)
    ##
    ## Call:
    ## lm(formula = depth ~ temp)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -2.8151 -1.3084 -0.6170  0.7092  4.8398
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) 20.12506    0.94023  21.404  1.1e-09 ***
    ## temp        -0.34504    0.06008  -5.743 0.000187 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 2.334 on 10 degrees of freedom
    ## Multiple R-squared:  0.7674, Adjusted R-squared:  0.7441
    ## F-statistic: 32.98 on 1 and 10 DF,  p-value: 0.0001869
    ## Analysis of Variance Table
    ##
    ## Response: depth
    ##           Df  Sum Sq Mean Sq F value    Pr(>F)
    ## temp       1 179.644 179.644  32.983 0.0001869 ***
    ## Residuals 10  54.465   5.447
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    cor(temp, depth)
    ## [1] -0.8759858
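The tests in parts c through f and the interval in part g need only the slope estimate, its standard error, and the residual degrees of freedom from the coefficient table. The following is a minimal sketch of those calculations; `slope_test` is a hypothetical helper written for this illustration, not a base R function.

```r
b1 <- -0.34504   # estimated slope from the coefficient table
se <- 0.06008    # standard error of the slope
df <- 10         # residual degrees of freedom, n - 2 = 12 - 2

# slope_test: hypothetical helper returning the t statistic and p-value
# for H0: beta_1 = null_value against the chosen alternative
slope_test <- function(null_value,
                       alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  t_obs <- (b1 - null_value) / se
  p <- switch(alternative,
    two.sided = 2 * pt(-abs(t_obs), df),
    less      = pt(t_obs, df),
    greater   = pt(t_obs, df, lower.tail = FALSE)
  )
  c(t = t_obs, p = p)
}

slope_test(0)              # part c
slope_test(0, "less")      # part d
slope_test(0, "greater")   # part e
slope_test(-0.5)           # part f

# Part g: 98% confidence interval for the slope
ci98 <- b1 + c(-1, 1) * qt(0.99, df) * se
ci98
```

The p-values can differ from the worked answers in the last displayed digit because the worked answers plug in the rounded t statistic.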
    mean(temp); var(temp)
    ## [1] 10.91667
    ## [1] 137.1742

    mean(depth); var(depth)
    ## [1] 16.35833
    ## [1] 21.28265
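With the raw data available, the correlation from part a and the prediction interval from part h can be cross-checked. This sketch computes the interval from the formula and compares it with `predict()`; the names `r_from_slope`, `se_pred`, `pi_hand`, and `pi_r` are illustrative assumptions.

```r
# Data for Exercise 2
temp  <- c(-6, -3, -2, 1, 6, 10, 11, 19, 21, 23, 25, 26)
depth <- c(21.1, 26, 18, 19.2, 16.9, 18.1, 16.8, 11.8, 11, 12.1, 14.8, 10.5)

WPmod <- lm(depth ~ temp)
n <- length(temp)
b <- coef(WPmod)
s <- sigma(WPmod)   # residual standard error, sqrt(MSE)

# Part a: correlation recovered from the slope and the sample SDs
r_from_slope <- b[["temp"]] * sd(temp) / sd(depth)

# Part h: 95% prediction interval at temp = 1 from the formula
x_new   <- 1
y_hat   <- b[["(Intercept)"]] + b[["temp"]] * x_new
se_pred <- s * sqrt(1 + 1 / n +
                    (x_new - mean(temp))^2 / ((n - 1) * var(temp)))
pi_hand <- y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_pred

# The same interval from predict()
pi_r <- predict(WPmod, newdata = data.frame(temp = x_new),
                interval = "prediction", level = 0.95)

r_from_slope
pi_hand
pi_r
```

Setting `interval = "confidence"` instead would give the narrower interval for the mean depth at that temperature, which is a different quantity from the one asked for in part h.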