hw2sol

School

University of Illinois, Urbana Champaign

Course

508

Subject

Statistics

Date

Feb 20, 2024

STAT 425 Assignment 2

Due Wednesday, September 6, 11:59 pm. Submit through Canvas.

Name: SOLUTIONS
Netid: (insert)

Submit your computational work both as an R markdown (*.Rmd) document and as a pdf knitted from the Rmd file, along with any files needed to run the code. Embed your answers to each problem in the document below after the question statement. If you have hand-written work, please scan or take pictures of it and include it in a pdf file, ideally combined with your pdf output file from R Markdown, but you can also upload it as a separate file. Be sure to show your work.

Problem 1 (10 pts) Least squares predictions

This problem uses the cheddar data from the ‘faraway’ library in R. Make sure you have the faraway package installed.

(a) (2 pts) Using the cheddar data, fit a least squares linear regression model for predicting taste from Lactic. Show the analysis of variance (anova) table for the model. State the null hypothesis being tested by the “F value” and p-value “Pr(>F)” in the table. What do you conclude, at significance level α = 0.05?

Answer:

    library(faraway)
    mod1 <- lm(taste ~ Lactic, data = cheddar)
    anova(mod1)

    ## Analysis of Variance Table
    ##
    ## Response: taste
    ##           Df Sum Sq Mean Sq F value    Pr(>F)
    ## Lactic     1 3800.4  3800.4   27.55 1.405e-05 ***
    ## Residuals 28 3862.5   137.9
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F value tests $H_0$: the coefficient of Lactic equals zero. The p-value (1.405e-05) is less than 0.05, so we reject $H_0$ and conclude that taste is linearly associated with Lactic.

(b) (2 pts) Compute a 95% confidence interval for the coefficient of Lactic based on the fitted model, assuming normal errors with mean zero and constant error variance.

Answer:

Method 1: Here is the model summary:

    summary(mod1)

    ##
    ## Call:
    ## lm(formula = taste ~ Lactic, data = cheddar)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -19.9439  -8.6839  -0.1095   8.9998  27.4245
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  -29.859     10.582  -2.822  0.00869 **
    ## Lactic        37.720      7.186   5.249 1.41e-05 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 11.75 on 28 degrees of freedom
    ## Multiple R-squared:  0.4959, Adjusted R-squared:  0.4779
    ## F-statistic: 27.55 on 1 and 28 DF,  p-value: 1.405e-05

Using the Lactic coefficient estimate and standard error, and the residual degrees of freedom from the summary, we compute:

    tmult <- qt(0.975, 28)
    cat("CI:", 37.720 + c(-1, 1) * tmult * 7.186)

    ## CI: 23.00015 52.43985

Method 2:

    confint(mod1)["Lactic", ]

    ##    2.5 %   97.5 %
    ## 22.99928 52.44061

The two methods differ slightly in the later decimal places because Method 1 uses the rounded estimate and standard error printed in the summary.
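As a cross-check on parts (a) and (b): in simple linear regression the anova F statistic equals the square of the slope's t statistic, so the two tables test the same hypothesis. The following is a minimal sketch in R that uses only the rounded values printed in the summary output; the variable names are illustrative and are not part of the original solution.

```r
# Rounded values copied from the summary(mod1) output (assumed inputs).
t_value  <- 5.249  # t value reported for the Lactic coefficient
df_resid <- 28     # residual degrees of freedom

# With a single predictor, F = t^2 on (1, df_resid) degrees of freedom.
f_from_t <- t_value^2

# The two-sided t-test p-value and the F-test p-value should agree.
p_from_t <- 2 * pt(abs(t_value), df_resid, lower.tail = FALSE)
p_from_f <- pf(f_from_t, 1, df_resid, lower.tail = FALSE)

print(c(F = f_from_t, p_t = p_from_t, p_F = p_from_f))
```

Because the inputs are rounded, the squared t value matches the reported F value of 27.55 only to about two decimal places.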
(c) (2 pts) Compute an estimate of the expected taste score for cheese with a Lactic score of 1.8.

Answer:

Fitted model coefficients:

    mod1 <- lm(taste ~ Lactic, data = cheddar)
    coef(mod1)

    ## (Intercept)      Lactic
    ##   -29.85883    37.71995

Expected response estimate - Method 1:

    -29.86 + 37.72 * 1.8

    ## [1] 38.036

Expected response estimate - Method 2:

    predict(mod1, newdata = data.frame(Lactic = 1.8))

    ##        1
    ## 38.03707

(d) (2 pts) Assuming independent normal errors with mean zero and constant error variance, compute a 95% confidence interval for the expected taste score for cheese with a Lactic score of 1.8.

Answer:

    predict(mod1, newdata = data.frame(Lactic = 1.8), se = TRUE,
            interval = "confidence", level = 0.95)$fit

    ##        fit      lwr     upr
    ## 1 38.03707 31.17655 44.8976

The 95% confidence interval for expected taste score is (31.2, 44.9).

(e) (2 pts) A new cheese sample is sent to the taste testers. It has Lactic = 1.8. Under the same assumptions as in Part (d), compute a 90% prediction interval for the taste score for the new cheese sample.

Answer:

    predict(mod1, newdata = data.frame(Lactic = 1.8), se = TRUE,
            interval = "prediction", level = 0.90)$fit

    ##        fit      lwr      upr
    ## 1 38.03707 17.26076 58.81339
The 90% prediction interval for the taste score is (17.3, 58.8).

Problem 2 (10 pts) Matrix results for multiple linear regression

This problem can be hand written or you can format your work in LaTeX. If handwritten, please scan to pdf files and include with this document. Clearly label the solutions (e.g. Problem 2(a) etc), and indicate in this document where those solutions are located in the attached files. Alternatively, you can save each part’s solution as an image file and insert those in this document using the method described under the assignment link.

All the problems below refer to a designed experiment with 6 observations and the following conditions. We fit a linear model of the form $y = X\beta + e$ with

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & 1 \\ 1 & 0 & -2 \\ 1 & 0 & -2 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}, \quad
e = \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \end{bmatrix}.
$$

We make the working assumption that the errors $e_i$ are uncorrelated random variables with $E(e_i) = 0$ and $\mathrm{Var}(e_i) = \sigma^2$.

(a) (2 pts) Calculate matrix $X^T X$. (You can do this by hand or using R matrix calculations).

Answer:

Here is $X$:

    X = cbind(c(1, 1, 1, 1, 1, 1), c(-1, -1, 0, 0, 1, 1), c(1, 1, -2, -2, 1, 1))
    X

    ##      [,1] [,2] [,3]
    ## [1,]    1   -1    1
    ## [2,]    1   -1    1
    ## [3,]    1    0   -2
    ## [4,]    1    0   -2
    ## [5,]    1    1    1
    ## [6,]    1    1    1

Here is $X^T$:

    t(X)

    ##      [,1] [,2] [,3] [,4] [,5] [,6]
    ## [1,]    1    1    1    1    1    1
    ## [2,]   -1   -1    0    0    1    1
    ## [3,]    1    1   -2   -2    1    1
Now we calculate $X^T X$ using matrix multiplication:

    t(X) %*% X

    ##      [,1] [,2] [,3]
    ## [1,]    6    0    0
    ## [2,]    0    4    0
    ## [3,]    0    0   12

(b) (2 pts) Simplify $X^T y$, displaying the result as a 3 × 1 matrix whose elements depend only on $y_1, y_2, \ldots, y_6$. (Hand written or LaTeX)

Answer:

$$
X^T y =
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ -1 & -1 & 0 & 0 & 1 & 1 \\ 1 & 1 & -2 & -2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix}
=
\begin{bmatrix} y_1 + y_2 + y_3 + y_4 + y_5 + y_6 \\ -y_1 - y_2 + y_5 + y_6 \\ y_1 + y_2 - 2y_3 - 2y_4 + y_5 + y_6 \end{bmatrix}
$$

(c) (2 pts) Solve the normal equation, $X^T X \hat\beta = X^T y$, to obtain the components of $\hat\beta$ in terms of $y_1, y_2, \ldots, y_6$.

Answer:

Using the results above we have

$$
\begin{bmatrix} 6 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 12 \end{bmatrix}
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
=
\begin{bmatrix} y_1 + y_2 + y_3 + y_4 + y_5 + y_6 \\ -y_1 - y_2 + y_5 + y_6 \\ y_1 + y_2 - 2y_3 - 2y_4 + y_5 + y_6 \end{bmatrix}
$$

Because $X^T X$ is diagonal, each equation can be solved separately by dividing by the corresponding diagonal element:

$$
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
=
\begin{bmatrix} \tfrac{1}{6}(y_1 + y_2 + y_3 + y_4 + y_5 + y_6) \\ \tfrac{1}{4}(-y_1 - y_2 + y_5 + y_6) \\ \tfrac{1}{12}(y_1 + y_2 - 2y_3 - 2y_4 + y_5 + y_6) \end{bmatrix}
$$

(d) (2 pts) Show that the ordinary least squares estimators $\hat\beta_1$ and $\hat\beta_2$ are uncorrelated with each other for this design, in other words, show $\mathrm{Cov}(\hat\beta_1, \hat\beta_2) = 0$. (Hint: use our known matrix results for least squares regression estimators)

Answer:

Method 1:
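From the notes we know that $\mathrm{Cov}(\hat\beta) = \Sigma = \sigma^2 (X^T X)^{-1}$. Because $X^T X$ is diagonal, so is its inverse, and therefore $\mathrm{Cov}(\hat\beta_1, \hat\beta_2) = \Sigma_{23} = 0$.

Method 2: The covariance can also be found by direct calculation using the explicit formulas in Part (c) and the fact that all of the $y_i$ are uncorrelated with each other. Every cross term $\mathrm{Cov}(y_i, y_j)$ with $i \neq j$ vanishes, and $y_3$ and $y_4$ do not appear in $\hat\beta_1$, so only four terms remain:

$$
\begin{aligned}
\mathrm{Cov}(\hat\beta_1, \hat\beta_2)
&= \mathrm{Cov}\left( \tfrac{1}{4}(-y_1 - y_2 + y_5 + y_6),\; \tfrac{1}{12}(y_1 + y_2 - 2y_3 - 2y_4 + y_5 + y_6) \right) \\
&= \tfrac{1}{48}\left[ -\mathrm{Cov}(y_1, y_1) - \mathrm{Cov}(y_2, y_2) + \mathrm{Cov}(y_5, y_5) + \mathrm{Cov}(y_6, y_6) \right] \\
&= \tfrac{1}{48}\left( -\sigma^2 - \sigma^2 + \sigma^2 + \sigma^2 \right) = 0
\end{aligned}
$$

(e) (2 pts) The hat matrix is given by $H = X (X^T X)^{-1} X^T$. (i) What are the dimensions of this matrix in our problem? (How many rows and columns?) (ii) Calculate the first diagonal element in the upper left corner, i.e., calculate $H_{11}$, where $H_{ij}$ refers to the element in row $i$ and column $j$ of the matrix $H$.

Answer:

(i) Dimensions: $H$ is a 6 × 6 matrix.

(ii) First diagonal element: $H_{11}$ is the first row of $X$ times $(X^T X)^{-1}$ times the transpose of that row:

$$
H_{11}
= \begin{bmatrix} 1 & -1 & 1 \end{bmatrix}
\begin{bmatrix} 1/6 & 0 & 0 \\ 0 & 1/4 & 0 \\ 0 & 0 & 1/12 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1/6 & -1/4 & 1/12 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}
= \frac{1}{6} + \frac{1}{4} + \frac{1}{12} = \frac{1}{2}
$$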
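The hand calculations in Problem 2 can be checked numerically. The sketch below rebuilds the design matrix from part (a) and recomputes the quantities used in parts (d) and (e); the names `XtX_inv` and `H` are illustrative choices, not names from the original solution.

```r
# Design matrix from Problem 2, entered as in part (a).
X <- cbind(c(1, 1, 1, 1, 1, 1),
           c(-1, -1, 0, 0, 1, 1),
           c(1, 1, -2, -2, 1, 1))

# Inverse of X^T X; it is diagonal because the columns of X are orthogonal.
XtX_inv <- solve(t(X) %*% X)

# Hat matrix H = X (X^T X)^{-1} X^T.
H <- X %*% XtX_inv %*% t(X)

print(dim(H))          # rows and columns of H, part (e)(i)
print(H[1, 1])         # first diagonal element, part (e)(ii)
print(XtX_inv[2, 3])   # Cov(beta1_hat, beta2_hat) divided by sigma^2, part (d)
print(sum(diag(H)))    # trace of H, which equals the number of columns of X
```

The off-diagonal entry `XtX_inv[2, 3]` is the covariance from part (d) up to the factor $\sigma^2$, so a value of zero there confirms Method 1 without needing to know $\sigma^2$.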