HW5 Solutions

School

University of Houston

Course

3364

Subject

Statistics

Date

Feb 20, 2024

Pages

9

Uploaded by yabreu

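Ch12 Solutions

(a) Use the matrix least squares approach to fit the multiple linear regression model to the data. The model matrix $\mathbf{X}$ and vector $\mathbf{y}$ for this model can be written as follows:

$$\mathbf{X} = \begin{bmatrix} 1 & 13 & 11 \\ 1 & 15 & 11 \\ 1 & 13 & 13 \\ 1 & 15 & 13 \\ 1 & 14 & 12 \\ 1 & 14 & 12 \\ 1 & 14 & 12 \\ 1 & 14 & 12 \\ 1 & 14 & 12 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} 62.8739 \\ 76.1328 \\ 87.4667 \\ 102.3236 \\ 76.1872 \\ 77.5287 \\ 76.7824 \\ 77.4381 \\ 78.7417 \end{bmatrix}$$

Then

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 9 & 126 & 108 \\ 126 & 1768 & 1512 \\ 108 & 1512 & 1300 \end{bmatrix}$$

(R code: t(X)%*%X) and

$$\mathbf{X}'\mathbf{y} = \begin{bmatrix} 715.4751 \\ 10044.7672 \\ 8636.4848 \end{bmatrix}$$

So the least squares estimates are found from Equation 12-13 in the textbook as

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \begin{bmatrix} -171.2589 \\ 7.0290 \\ 12.6959 \end{bmatrix}$$

(R code: solve(t(X)%*%X)%*%t(X)%*%y). It is okay if you choose to compute the inverse of $\mathbf{X}'\mathbf{X}$ by hand. Therefore, the fitted multiple linear regression model is

$$\hat{y} = -171.2589 + 7.0290x_1 + 12.6959x_2$$

(b) The estimate of $\sigma^2$ is

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{56.8546}{6} = 9.4758$$

where $n = 9$, $p = k + 1 = 2 + 1 = 3$, and

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}) = 56.8546$$

(R code for computing $SS_E$: sum((y-X%*%betaHat)^2))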
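The normal equations in parts (a) and (b) can also be solved outside R. The Python sketch below is an illustration and not part of the original solution. It uses only the standard library and solves $\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{y}$ by Gaussian elimination with partial pivoting, rather than forming the inverse explicitly. It then computes $SS_E$ and $\hat\sigma^2 = SS_E/(n-p)$. The helper names `transpose`, `matmul`, and `solve` are hypothetical and defined here.

```python
# Ch12 data: each row of X is [1, x1, x2]; y is the observed nisin recovery
X = [[1, 13, 11], [1, 15, 11], [1, 13, 13], [1, 15, 13],
     [1, 14, 12], [1, 14, 12], [1, 14, 12], [1, 14, 12], [1, 14, 12]]
y = [62.8739, 76.1328, 87.4667, 102.3236,
     76.1872, 77.5287, 76.7824, 77.4381, 78.7417]


def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


Xt = transpose(X)
XtX = matmul(Xt, X)                                        # X'X
Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]   # X'y
beta_hat = solve(XtX, Xty)                                 # normal equations

fitted = [sum(b * x for b, x in zip(beta_hat, row)) for row in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))     # SS_E
n, p = len(y), len(beta_hat)                               # n = 9, p = k + 1 = 3
sigma2_hat = sse / (n - p)

print("beta_hat:", [round(b, 4) for b in beta_hat])
print("SS_E:", round(sse, 4), " sigma^2 hat:", round(sigma2_hat, 4))
```

Solving the linear system directly is generally preferred numerically to inverting $\mathbf{X}'\mathbf{X}$. For a small 3 × 3 system like this one, both routes reproduce the estimates above.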
The variances of the least squares estimates $\hat\beta_j$ are expressed in terms of the elements of the inverse of the $\mathbf{X}'\mathbf{X}$ matrix. The inverse of $\mathbf{X}'\mathbf{X}$ times the constant $\sigma^2$ is the covariance matrix of $\hat{\boldsymbol\beta}$. The diagonal elements of $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ are the variances of $\hat\beta_0, \hat\beta_1, \hat\beta_2, \dots, \hat\beta_k$. The off-diagonal elements are the covariances between the respective pairs $\hat\beta_i, \hat\beta_j$. Let

$$\mathbf{C} = (\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 85.11 & -3.50 & -3.00 \\ -3.50 & 0.25 & 0.00 \\ -3.00 & 0.00 & 0.25 \end{bmatrix}$$

Therefore, the estimated standard errors are

$$se(\hat\beta_0) = \sqrt{\hat\sigma^2 C_{00}} = \sqrt{9.4758 \times 85.11} = 28.3987$$
$$se(\hat\beta_1) = \sqrt{\hat\sigma^2 C_{11}} = \sqrt{9.4758 \times 0.25} = 1.5391$$
$$se(\hat\beta_2) = \sqrt{\hat\sigma^2 C_{22}} = \sqrt{9.4758 \times 0.25} = 1.5391$$

(c) The predicted nisin recovery when $x_1 = 14.5$ and $x_2 = 12.5$ is

$$\hat{y} = \begin{bmatrix} 1 & 14.5 & 12.5 \end{bmatrix} \begin{bmatrix} -171.2589 \\ 7.0290 \\ 12.6959 \end{bmatrix} = 89.3597$$

Ch13 Solutions

1. (a) How many replicates did the experimenter use?

Because the factor was tested at 4 levels, there are a total of 4 treatments. Since the total degrees of freedom is 31, the total number of observations is 31 + 1 = 32. Therefore, each treatment has 32/4 = 8 replicates.

(b) Fill in the missing information in the ANOVA table. Show all the steps of how you get each missing entry in the ANOVA table. Note that you can either find the range of P-values using the F distribution tables in the Appendix or compute the exact P-value using R and write the R code you used to find the P-value.

Because the factor was tested at 4 levels, there are 3 degrees of freedom for the factor. Because there are 31 total degrees of freedom, df(Error) = 31 − 3 = 28. Because $MS_{Factor} = 330.4716$, $SS_{Factor} = 3(330.4716) = 991.4148$. The F statistic equals $MS_{Factor}/MS_{Error} = 4.42 = 330.4716/MS_{Error}$. Therefore, $MS_{Error} = 330.4716/4.42 = 74.76733$.
Since $MS_{Error} = SS_{Error}/df(Error)$, $SS_{Error} = 28(74.76733) = 2093.485$. Therefore, $SS_T = SS_{Factor} + SS_{Error} = 991.4148 + 2093.485 = 3084.900$.

The P-value corresponding to $f_0 = 4.42$ with 3 numerator and 28 denominator degrees of freedom is 0.012 (R code: p_value <- pf(4.42, 3, 28, lower.tail = FALSE)). Appendix Table IV gives the upper percentage points of the F distribution. Using it for $\alpha = 0.025$ and $\alpha = 0.01$, we find $f_{0.025,3,28} = 3.63 < f_0 < f_{0.01,3,28} = 4.57$. Because the P-value is defined as $P(F_{3,28} > f_0)$, it lies between 0.01 and 0.025.

(c) What conclusions can you draw about differences in the factor-level means?

Because the P-value = 0.012 < 0.05 (alternatively, the P-value's upper bound of 0.025 < 0.05), there are significant differences among the factor-level means at significance level 0.05.

2. (a) Using α = 0.05, test the hypothesis that the three circuit types have the same response time.

Circuit Type  Response Time                      y_i.   ȳ_i.
1             19     22     20     18     25     104    20.8
2             20     21     33     27     40     141    28.2
3             16     15     18     26     17     92     18.4
y_.j          55     58     71     71     82
ȳ_.j          18.33  19.33  23.67  23.67  27.33

Set up the null and alternative hypotheses:

$$H_0: \mu_1 = \mu_2 = \mu_3 \qquad H_1: \mu_i \neq \mu_j \text{ for at least one pair } (i, j)$$

or, equivalently,

$$H_0: \tau_1 = \tau_2 = \tau_3 = 0 \qquad H_1: \tau_i \neq 0 \text{ for at least one } i$$

Here $a = 3$, $n = 5$, and $N = an = 15$.

$$y_{..} = \sum_{i=1}^{3}\sum_{j=1}^{5} y_{ij} = 337$$
$$y_{i.} = \sum_{j=1}^{5} y_{ij}, \quad i = 1, \dots, 3$$
$$y_{.j} = \sum_{i=1}^{3} y_{ij}, \quad j = 1, \dots, 5$$

$$SS_T = \sum_{i=1}^{3}\sum_{j=1}^{5} y_{ij}^2 - \frac{y_{..}^2}{N} = [(19)^2 + (22)^2 + \cdots + (26)^2 + (17)^2] - \frac{337^2}{15} = 651.7$$

$$SS_{Treatments} = \sum_{i=1}^{3}\frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N} = \frac{1}{5}(104^2 + 141^2 + 92^2) - \frac{337^2}{15} = 260.9$$

$$SS_E = SS_T - SS_{Treatments} = 651.73 - 260.9 = 390.8$$

ANOVA Table

Source       DF   SS      MS      F      P
Treatments    2   260.9   130.5   4.01   0.046
Error        12   390.8    32.6
Total        14   651.7

Since $f_0 = 4.01 > f_{0.05,2,12} = 3.89$, we reject the null hypothesis. There is sufficient evidence that the circuit type affects the mean response time of an electronic calculator.

(b) Find the bounds for the P-value.

From the table of upper percentage points of the F distribution, $f_{0.05,2,12} = 3.89 < 4.01 < f_{0.025,2,12} = 5.10$. Because the P-value is $P(F_{2,12} > f_0)$, it lies between 0.025 and 0.05. If α = 0.01 is used instead, the lower bound of the P-value (0.025) is greater than α = 0.01. In that case we fail to reject the null hypothesis, and there is no significant evidence that the circuit type affects the mean response time of an electronic calculator. You can find the exact P-value, $P(F_{2,12} > 4.01) = 0.046$, with the R code 1-pf(4.01, 2, 12).

(c) Find a 95% two-sided confidence interval on the response time for circuit 3.

$$\bar{y}_{3.} - t_{0.025,12}\sqrt{\frac{MS_E}{n}} \le \mu_3 \le \bar{y}_{3.} + t_{0.025,12}\sqrt{\frac{MS_E}{n}}$$

$$18.4 - 2.179\sqrt{\frac{32.6}{5}} \le \mu_3 \le 18.4 + 2.179\sqrt{\frac{32.6}{5}}$$
$$12.84 \le \mu_3 \le 23.96$$

3. In Question 2 we rejected the null hypothesis and concluded that there is significant evidence that the circuit type affects the response time of an electronic calculator. We can check which circuit types differ by applying Fisher's LSD method as follows.

Apply Fisher's LSD method with α = 0.05 to determine which levels of the factor differ.

The three treatment means are $\bar{y}_{1.} = 20.8$, $\bar{y}_{2.} = 28.2$, and $\bar{y}_{3.} = 18.4$. Also, $n = 5$, $MS_E = 32.6$, and $t_{0.025,12} = 2.179$.

$$LSD = t_{0.025,12}\sqrt{\frac{2MS_E}{n}} = 2.179\sqrt{\frac{2 \times 32.6}{5}} = 7.87$$

Therefore, if two treatment averages differ by more than 7.87, the corresponding treatment means are different. The comparisons among the observed treatment averages are as follows:

1 vs 2: $|\bar{y}_{1.} - \bar{y}_{2.}| = |20.8 - 28.2| = 7.4 < 7.87$
1 vs 3: $|\bar{y}_{1.} - \bar{y}_{3.}| = |20.8 - 18.4| = 2.4 < 7.87$
2 vs 3: $|\bar{y}_{2.} - \bar{y}_{3.}| = |28.2 - 18.4| = 9.8 > 7.87$

This analysis verifies that there is a significant difference between the means of treatment 2 (circuit type 2) and treatment 3 (circuit type 3) at significance level 0.05.

If you are going to construct a 95% two-sided confidence interval on the treatment mean difference of circuit type 2 and circuit type 3, would you expect that zero will be included in the confidence interval? Why or why not? Compute the confidence interval to confirm your answer.

4. (a) How many levels of the factor were used in this experiment?

Since $MS_{Factor} = SS_{Factor}/DF_{Factor}$,

$$DF_{Factor} = \frac{SS_{Factor}}{MS_{Factor}} = \frac{193.8}{64.6} = 3$$

The number of levels of the factor = DF of factor + 1 = 3 + 1 = 4. Therefore, 4 levels of the factor are used in this experiment.
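The one-way ANOVA of Problem 2, the Fisher LSD comparisons of Problem 3, and the interval asked for at the end of Problem 3 can be reproduced with the short Python sketch below. It is an illustration, not part of the original solution, and uses only the standard library. The critical value $t_{0.025,12} = 2.179$ is hard-coded from the t table, since the standard library has no t quantile function. Because every group has $n = 5$, the half-width of the 95% interval on $\mu_2 - \mu_3$ equals the LSD. Zero is therefore excluded from that interval exactly when $|\bar{y}_{2.} - \bar{y}_{3.}|$ exceeds the LSD. The code keeps $MS_E = 390.8/12$ unrounded, so its LSD (about 7.86) can differ from the hand value 7.87 in the second decimal.

```python
from itertools import combinations
from math import sqrt

# Response times by circuit type (Problem 2); balanced design with n = 5
data = {
    1: [19, 22, 20, 18, 25],
    2: [20, 21, 33, 27, 40],
    3: [16, 15, 18, 26, 17],
}
a = len(data)
n = len(data[1])
N = a * n
grand_total = sum(sum(obs) for obs in data.values())  # y.. = 337

# One-way ANOVA via the computational formulas
ss_total = sum(v * v for obs in data.values() for v in obs) - grand_total ** 2 / N
ss_treat = sum(sum(obs) ** 2 for obs in data.values()) / n - grand_total ** 2 / N
ss_error = ss_total - ss_treat
ms_treat = ss_treat / (a - 1)
ms_error = ss_error / (N - a)
f0 = ms_treat / ms_error

# Fisher's LSD; t_{0.025,12} = 2.179 is taken from a t table
t_crit = 2.179
means = {i: sum(obs) / n for i, obs in data.items()}
lsd = t_crit * sqrt(2 * ms_error / n)

for i, j in combinations(sorted(means), 2):
    diff = abs(means[i] - means[j])
    verdict = "differ" if diff > lsd else "no significant difference"
    print(f"{i} vs {j}: |diff| = {diff:.2f}, LSD = {lsd:.2f} -> {verdict}")

# 95% two-sided CI on mu_2 - mu_3: with equal n its half-width equals the LSD
d23 = means[2] - means[3]
ci_23 = (d23 - lsd, d23 + lsd)
print(f"f0 = {f0:.2f}; 95% CI for mu2 - mu3: ({ci_23[0]:.2f}, {ci_23[1]:.2f})")
```

With these data the interval is roughly (1.9, 17.7). It lies entirely above zero, which agrees with the LSD conclusion that circuit types 2 and 3 differ.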
(b) How many blocks were used in this experiment?

Because the number of blocks = $DF_B + 1 = 3 + 1 = 4$, there are 4 blocks used in this experiment.

(c) Fill in the missing information. Find the bounds for the P-value.

Let $DF_{Factor}$, $DF_B$, and $DF_E$ denote the degrees of freedom for the factor, blocks, and error, respectively. From part (a), $DF_{Factor} = 3$, so $DF_E = DF_T - DF_{Factor} - DF_B = 15 - 3 - 3 = 9$.

$$f_0 = \frac{MS_{Factor}}{MS_E} = \frac{64.6}{4.464} = 14.4713$$

$$SS_E = SS_{Total} - SS_{Factor} - SS_{Block} = 698.19 - 193.8 - 464.218 = 40.172$$

Alternatively, $SS_E = MS_E \times DF_E = 4.464(9) = 40.176$ (the difference is due to rounding error).

Exact P-value = $P(F_{3,9} > f_0) = 0.00086$, using the R code 1-pf(14.4713, 3, 9). The procedure from Problem 2(b) can be repeated with the Appendix table of upper percentage points of the F distribution to find a rough bound for the P-value.

(d) What conclusions would you draw if α = 0.05? What would you conclude if α = 0.01?

Because the P-value is considerably smaller than 0.01, we reject the null hypothesis at both α = 0.05 and α = 0.01. There are significant differences in the factor-level means at both α = 0.05 and α = 0.01.

5.

Mean Weight   Housing Air Temperature
(lbs)         50     60     70     80     90     100    y_i.   ȳ_i.
1             1.37   1.58   2.00   1.97   1.40   0.39   8.71   1.45
2             1.47   1.75   2.16   1.82   1.14   -0.19  8.15   1.36
3             1.19   1.91   2.22   1.67   0.88   -0.77  7.10   1.18
y_.j          4.03   5.24   6.38   5.46   3.42   -0.57  y.. = 23.96
ȳ_.j          1.34   1.75   2.13   1.82   1.14   -0.19

This is a randomized complete block design (RCBD) for a single factor (housing air temperature) with 6 levels and 3 blocks (mean weight). So $a = 6$, $b = 3$, and $ab = 18$. Note that the data above are arranged differently: the table is a transposed version of what we have in the textbook and lecture notes. Here, we use index $i$ for blocks and index $j$ for treatments, i.e., the different housing air temperatures.

$$y_{..} = \sum_{i=1}^{3}\sum_{j=1}^{6} y_{ij} = 23.96$$
$$y_{i.} = \sum_{j=1}^{6} y_{ij}, \quad i = 1, \dots, 3 \quad (\text{block } i)$$
$$y_{.j} = \sum_{i=1}^{3} y_{ij}, \quad j = 1, \dots, 6 \quad (\text{treatment } j)$$

Now we compute the sums of squares using the computational formulas:

$$SS_T = \sum_{i=1}^{3}\sum_{j=1}^{6} y_{ij}^2 - \frac{y_{..}^2}{ab} = [(1.37)^2 + (1.58)^2 + \cdots + (0.88)^2 + (-0.77)^2] - \frac{23.96^2}{18} = 11.1588$$

$$SS_{Treatments} = \frac{1}{b}\sum_{j=1}^{6} y_{.j}^2 - \frac{y_{..}^2}{ab} = \frac{1}{3}\left(4.03^2 + 5.24^2 + \cdots + (-0.57)^2\right) - \frac{23.96^2}{18} = 10.1852$$

$$SS_{Blocks} = \frac{1}{a}\sum_{i=1}^{3} y_{i.}^2 - \frac{y_{..}^2}{ab} = \frac{1}{6}(8.71^2 + 8.15^2 + 7.10^2) - \frac{23.96^2}{18} = 0.2227$$

$$SS_E = SS_T - SS_{Treatments} - SS_{Blocks} = 0.7509$$

$$MS_{Treatments} = \frac{SS_{Treatments}}{5} = 2.0370, \qquad MS_{Blocks} = \frac{SS_{Blocks}}{2} = 0.1113, \qquad MS_E = \frac{SS_E}{10} = 0.0751$$

F test statistics:

$$f_{0,Treatments} = \frac{MS_{Treatments}}{MS_E} = 27.13, \qquad f_{0,Blocks} = \frac{MS_{Blocks}}{MS_E} = 1.48$$

The ANOVA table is summarized below:

Source       DF   SS        MS       F       P
Treatments    5   10.1852   2.0370   27.13   0.000
Block         2    0.2227   0.1113    1.48   0.2735
Error        10    0.7509   0.0751
Total        17   11.1588

Fixed-α approach: Because $f_{0,Treatments} = 27.13 > f_{0.05,5,10} = 3.33$, we reject the null hypothesis. The air temperature (treatment effect) affects the mean daily weight gain at significance level 0.05. We can also test whether the block effect is significant. Because $f_{0,Blocks} = 1.48 < f_{0.05,2,10} = 4.10$, we fail to reject the null hypothesis for the block effect. The mean weight of the swine (the block effect) does not have a significant effect on the mean daily weight gain at significance level 0.05.

P-value approach:
The P-value for testing the treatment effect is $P(F_{5,10} > 27.13) = 1.638322 \times 10^{-5}$ (R code: pf(27.13, 5, 10, lower.tail=FALSE) or 1-pf(27.13, 5, 10)). The P-value for the block effect can be found similarly: $P(F_{2,10} > 1.48) = 0.2735$.

When statistical software is not available, we can use the F table to find a rough P-value range. For the treatment effect, $f_{0,Treatments} = 27.13 > f_{0.01,5,10} = 5.64$, so the P-value is far less than 0.01 and therefore far less than α = 0.05. For the block effect, $f_{0,Blocks} = 1.48 < f_{0.25,2,10} = 1.60$, so the P-value is greater than 0.25 and therefore greater than α = 0.05.

6. (a) This is a completely randomized design (CRD) because the different trainings (treatments) are allocated to different technicians in random order.

(b)

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 \qquad H_1: \mu_i \neq \mu_j \text{ for at least one pair } (i, j)$$

where $\mu_i$ is the mean performance score obtained as a result of taking training $i$. Or, equivalently,

$$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = \tau_5 = 0 \qquad H_1: \tau_i \neq 0 \text{ for at least one } i$$

$$SS_T = \sum_{i=1}^{5}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{..}^2}{N} = [8^2 + 7^2 + \cdots + 9^2 + 5^2] - \frac{180^2}{23} = 55.3$$

$$SS_{Treatments} = \sum_{i=1}^{5}\frac{y_{i.}^2}{n_i} - \frac{y_{..}^2}{N} = \left(\frac{26^2}{4} + \frac{49^2}{6} + \frac{45^2}{5} + \frac{36^2}{5} + \frac{24^2}{3}\right) - \frac{180^2}{23} = 16.67$$

where the $y_{i.}$ are 26, 49, 45, 36, and 24 for training types 1 to 5, respectively.

$$SS_E = SS_T - SS_{Treatments} = 55.3 - 16.67 = 38.63$$

The ANOVA table is summarized below:

Source       DF   SS      MS     F
Treatments    4   16.67   4.17   1.94
Error        18   38.63   2.15
Total        22   55.3

Because $f_0 = 1.94 < f_{0.05,4,18} = 2.93$, we fail to reject the null hypothesis. There is no significant evidence that the training type affects technician performance at significance level 0.05.
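The part (b) arithmetic for unequal sample sizes can be sketched in Python as below. This is an illustration using only the standard library, not part of the original solution. The individual performance scores are not reproduced in the solution, so the sketch takes $SS_T = 55.3$ as given. It rebuilds the rest of the ANOVA table from the treatment totals $y_{i.}$ and the sample sizes $n_i$ that appear in the $SS_{Treatments}$ formula. The critical value $f_{0.05,4,18} = 2.93$ is hard-coded from the F table.

```python
# Problem 6(b): treatment totals y_i. and sample sizes n_i for training types 1-5
totals = [26, 49, 45, 36, 24]
sizes = [4, 6, 5, 5, 3]
ss_total = 55.3  # taken from the solution; the raw scores are not listed here

a, N = len(totals), sum(sizes)        # a = 5, N = 23
grand_total = sum(totals)             # y.. = 180
correction = grand_total ** 2 / N     # y..^2 / N

ss_treat = sum(t * t / n for t, n in zip(totals, sizes)) - correction
ss_error = ss_total - ss_treat
ms_treat = ss_treat / (a - 1)
ms_error = ss_error / (N - a)
f0 = ms_treat / ms_error

f_crit = 2.93  # f_{0.05,4,18}, read from the F table
print(f"SS_Tr = {ss_treat:.2f}, SS_E = {ss_error:.2f}, MS_Tr = {ms_treat:.2f}, "
      f"MS_E = {ms_error:.2f}, f0 = {f0:.2f}")
print("reject H0" if f0 > f_crit else "fail to reject H0")
```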
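(c)

$$t_{\alpha/2, N-a}\sqrt{\frac{MS_E}{n_i}} = t_{0.025,18}\sqrt{\frac{2.15}{4}} = 2.101(0.73) = 1.54$$

$$\bar{y}_{1.} - 1.54 \le \mu_1 \le \bar{y}_{1.} + 1.54$$
$$6.5 - 1.54 \le \mu_1 \le 6.5 + 1.54$$
$$4.96 \le \mu_1 \le 8.04$$

(d)

$$LSD_{ij} = t_{\alpha/2, N-a}\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} = 2.101\sqrt{2.15\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

1 vs 2: $LSD_{12} = 1.99$, and $|\bar{y}_{1.} - \bar{y}_{2.}| = 1.67 < 1.99$
1 vs 4: $LSD_{14} = 2.06$, and $|\bar{y}_{1.} - \bar{y}_{4.}| = 0.7 < 2.06$
2 vs 4: $LSD_{24} = 1.86$, and $|\bar{y}_{2.} - \bar{y}_{4.}| = 0.97 < 1.86$

None of these differences exceeds its LSD, so there is no significant difference among $\mu_1$, $\mu_2$, and $\mu_4$ at α = 0.05. These three means are approximately the same.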
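Parts (c) and (d) can be checked the same way. This sketch is illustrative and uses only the standard library. It keeps $MS_E$ unrounded at $38.63/18 \approx 2.146$. With that value the three LSDs come out as 1.99, 2.06, and 1.86, matching the solution. Plugging in the rounded 2.15 instead would give about 2.07 and 1.87 for the 1 vs 4 and 2 vs 4 comparisons. The quantile $t_{0.025,18} = 2.101$ is hard-coded from the t table. The treatment totals and sample sizes are those listed in part (b).

```python
from math import sqrt

# Problem 6(c)-(d): totals y_i. and sample sizes n_i for the training types compared
totals = {1: 26, 2: 49, 4: 36}
sizes = {1: 4, 2: 6, 4: 5}
ms_error = 38.63 / 18   # SS_E / (N - a) from part (b), kept unrounded
t_crit = 2.101          # t_{0.025,18}, read from the t table

means = {i: totals[i] / sizes[i] for i in totals}

# (c) 95% two-sided CI on mu_1
half_width = t_crit * sqrt(ms_error / sizes[1])
ci_mu1 = (means[1] - half_width, means[1] + half_width)
print(f"95% CI for mu_1: ({ci_mu1[0]:.2f}, {ci_mu1[1]:.2f})")

# (d) Fisher's LSD with unequal sample sizes
results = {}
for i, j in [(1, 2), (1, 4), (2, 4)]:
    lsd = t_crit * sqrt(ms_error * (1 / sizes[i] + 1 / sizes[j]))
    diff = abs(means[i] - means[j])
    results[(i, j)] = (diff, lsd)
    print(f"{i} vs {j}: |diff| = {diff:.2f}, LSD = {lsd:.2f}, "
          f"{'significant' if diff > lsd else 'not significant'}")
```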