Assign3.2023.sol (1)


School

University of British Columbia

Course

374

Subject

Statistics

Date

Feb 20, 2024

Type

pdf

Pages

13

Uploaded by CountBoar3716

STAT 404 - Assignment 3 Solutions

Total marks: 60
Due date: Saturday, Nov. 4 at 11:59pm

Reminder: Follow the Assignment Guidelines. Select pages when submitting to Gradescope. One-line R commands should only be used to verify answers. When performing a test, state the hypotheses, test statistic and distribution, p-value or critical value, and conclusion. If unspecified, use the default 5% significance level.

1. [20] We are interested in whether students do better on the midterm or on the final exam. The marks of 20 students from one course are given below. We wish to answer this question by testing $H_0: \delta = 0$ against the alternative $H_1: \delta \neq 0$ based on the paired observations. Assume standard notation and model assumptions.

    midterm = c(82.54,79.37,77.78,60.32,65.08,69.84,61.90,61.90,44.44,
                63.49,80.95,84.13,76.19,88.89,73.02,85.71,71.43,84.13,84.13,69.84)
    final = c(85.67,84.44,80.83,69.06,67.73,81.75,73.62,62.61,66.56,
              65.52,82.38,94.13,65.58,96.68,86.40,87.95,74.92,84.56,85.99,57.34)

Remark: this group of students may not be representative of the general student population, nor may this course be representative of all courses at UBC or elsewhere. The data set is used for illustration only.

(a) [4] Perform the paired t-test for the above data.

Answer. The hypotheses are stated in the question. We first find the differences for each pair, $d_i = \text{final}_i - \text{midterm}_i$:

$$(d_i) = (3.13, 5.07, 3.05, 8.74, 2.65, 11.91, 11.72, 0.71, 22.12, 2.03, 1.43, 10.00, -10.61, 7.79, 13.38, 2.24, 3.49, 0.43, 1.86, -12.50).$$
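The arithmetic of the paired t-test in (a) can be reproduced with a short R sketch. The variable names (`d`, `s2`, `T.obs`, `p.val`) are our own illustrative choices; `t.test(..., paired = TRUE)` appears only as a one-line cross-check, in the spirit of the assignment guidelines.

```r
# Marks of the same 20 students (Question 1)
midterm <- c(82.54, 79.37, 77.78, 60.32, 65.08, 69.84, 61.90, 61.90, 44.44, 63.49,
             80.95, 84.13, 76.19, 88.89, 73.02, 85.71, 71.43, 84.13, 84.13, 69.84)
final   <- c(85.67, 84.44, 80.83, 69.06, 67.73, 81.75, 73.62, 62.61, 66.56, 65.52,
             82.38, 94.13, 65.58, 96.68, 86.40, 87.95, 74.92, 84.56, 85.99, 57.34)

d     <- final - midterm               # paired differences d_i
n     <- length(d)                     # number of pairs (20)
d.bar <- mean(d)                       # mean difference
s2    <- var(d)                        # sample variance with divisor n - 1
T.obs <- d.bar / sqrt(s2 / n)          # paired t-statistic
p.val <- 2 * pt(abs(T.obs), df = n - 1, lower.tail = FALSE)  # two-sided p-value

print(round(c(d.bar = d.bar, s2 = s2, T.obs = T.obs, p = p.val), 4))

# Cross-check only: the built-in paired t-test gives the same p-value
print(t.test(final, midterm, paired = TRUE)$p.value)
```

Note that `d` keeps the two negative differences (students 13 and 20); dropping their signs would inflate the mean difference.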
The mean of the $d_i$ is $\bar d = 4.432$. The sample variance of the $d_i$ is

$$s^2 = \frac{1}{20-1}\sum_{i=1}^{20}(d_i - \bar d)^2 = 60.00.$$

The paired t-statistic has observed value

$$T_{obs} = \frac{\bar d}{(s^2/20)^{0.5}} = 2.559.$$

The reference distribution is t with 19 degrees of freedom. For the two-sided alternative, the p-value is $p = 2 \times (1 - \text{pt}(2.559, 19)) = 0.019$. Because $0.019 < 0.05$, we reject the null hypothesis that students perform similarly on the midterm and the final ($\delta = 0$) in favour of the claim that their performances differ, at the conventional level 0.05.

(b) [4] If one mistakes the problem for a two-sample problem, what would be the conclusion based on a two-sample t-test? (Perform the standard t-test.)

Answer. The hypotheses are $H_0: \mu_m = \mu_f$ versus $H_1: \mu_m \neq \mu_f$. Because the two groups have equal sizes, the pooled variance estimator is the average of the two sample variances:

$$s^2_{pool} = \frac{\text{var(midterm)} + \text{var(final)}}{2} = 125.12.$$

The test statistic has observed value (in conventional notation)

$$T_{obs} = \frac{\bar y_2 - \bar y_1}{\sqrt{(1/20 + 1/20)\, s^2_{pool}}} = \frac{77.686 - 73.254}{\sqrt{125.12/10}} = 1.253.$$
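For comparison, here is a sketch of the (mistaken) equal-variance two-sample t-test from (b). It uses the general pooled-variance formula, which reduces to the simple average above when both groups have 20 observations; the names `s2.pool`, `T.obs`, and `p.val` are illustrative, and `t.test(..., var.equal = TRUE)` serves only as a check.

```r
midterm <- c(82.54, 79.37, 77.78, 60.32, 65.08, 69.84, 61.90, 61.90, 44.44, 63.49,
             80.95, 84.13, 76.19, 88.89, 73.02, 85.71, 71.43, 84.13, 84.13, 69.84)
final   <- c(85.67, 84.44, 80.83, 69.06, 67.73, 81.75, 73.62, 62.61, 66.56, 65.52,
             82.38, 94.13, 65.58, 96.68, 86.40, 87.95, 74.92, 84.56, 85.99, 57.34)

n1 <- length(midterm)
n2 <- length(final)
# Pooled variance; with n1 == n2 this is the plain average of the two variances
s2.pool <- ((n1 - 1) * var(midterm) + (n2 - 1) * var(final)) / (n1 + n2 - 2)
T.obs   <- (mean(final) - mean(midterm)) / sqrt((1 / n1 + 1 / n2) * s2.pool)
p.val   <- 2 * pt(abs(T.obs), df = n1 + n2 - 2, lower.tail = FALSE)

print(round(c(s2.pool = s2.pool, T.obs = T.obs, p = p.val), 4))
# Cross-check: standard two-sample t-test that ignores the pairing
print(t.test(final, midterm, var.equal = TRUE)$p.value)
```

The numerator is the same 4.432 as in (a); the test loses power because the between-student variation stays in the denominator.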
The reference distribution is t with $n_1 + n_2 - 2 = 38$ degrees of freedom, so the p-value would be $p = 2(1 - \text{pt}(1.253, 38)) = 0.218$. We do not reject the null hypothesis that students perform similarly on the midterm and the final ($\mu_m = \mu_f$) at the conventional level 0.05.

(c) [4] Conduct a randomization test for the above data.

Answer. The hypotheses are stated in the question. Recall $\bar y_2 - \bar y_1 = 4.432$. For the two-sided alternative, we calculate the proportion of times that $|\bar y_2^* - \bar y_1^*| \ge |\bar y_2 - \bar y_1|$, where $*$ indicates a hypothetical sample obtained by randomly swapping the two marks within each pair, independently across pairs (ties are counted with weight 1/2). Because this randomization distribution is symmetric about 0, the code below doubles the upper-tail proportion. The following code computes this proportion.

    D.obs = mean(final - midterm)
    NN = 50000                          # number of random re-assignments
    DD = rep(0, NN)
    for (i in 1:NN) {
      ind = 2*rbinom(20, 1, 0.5) - 1    # random sign (+1 or -1) for each pair
      DD[i] = sum(midterm*ind - final*ind) / 20
    }
    # two-sided p-value: double the upper tail, counting ties with weight 1/2
    pp = 2*(sum(DD > D.obs) + 0.5*sum(DD == D.obs)) / NN

The p-value is found to be 0.01764. We reject the null hypothesis that students perform similarly on the midterm and the final ($\delta = 0$) in favour of the claim that their performances differ, at the conventional level 0.05. Notice that this p-value is close to the one obtained from the paired t-test.

(d) [4] Suppose one wishes to conduct an independent study on whether students tend to perform better in the final exam by an average of 3 marks or more. The target is to obtain a significant outcome at the 5% level with probability 0.8. Based on the above data
(assumed to be useful), how many pairs of observations should she collect for this separate study? Remark: independent thinking is needed as nothing discussed in class can be directly applied.

Answer. We do a sample size calculation based on the paired t-test. Other approaches are acceptable as long as they are reasonable given the context and carried out correctly. Let $\delta = \mu_{final} - \mu_{midterm}$. The null hypothesis is $H_0: \delta \le 3$ and the alternative is $H_1: \delta > 3$. Let $n$ be the number of pairs in the new study. Taking $\bar d_{new} = \bar y_{new:final} - \bar y_{new:midterm}$ and $s^2_{new} = \sum_{i=1}^{n}(d_i - \bar d_{new})^2/(n-1)$ as defined in (a), the test statistic

$$t_0 = \frac{\bar d_{new} - 3}{(s^2_{new}/n)^{0.5}}$$

has a t-distribution with $n-1$ degrees of freedom under the assumption $\delta = 3$. The upper $\alpha = 0.05$ quantile of this distribution is $q_n = \text{qt}(1-\alpha, n-1)$.

To do a power calculation, we use the current data to obtain an effect size $\delta = \bar d = 4.432$. Under the alternative $\delta = 4.432$, the statistic

$$t_1 = \frac{\bar d_{new} - 4.432}{(s^2_{new}/n)^{0.5}} = t_0 - \frac{1.432}{(s^2_{new}/n)^{0.5}}$$

has a t-distribution with $n-1$ degrees of freedom, where $1.432 = 4.432 - 3$. This implies that we can compute the power for a given $n$ as

$$1 - \beta = P(t_0 > q_n \mid \delta = 4.432) = P\left(t_0 - \frac{1.432}{(s^2_{new}/n)^{0.5}} > q_n - \frac{1.432}{(s^2_{new}/n)^{0.5}} \,\middle|\, \delta = 4.432\right) = P\left(t_1 > q_n - \frac{1.432}{(s^2_{new}/n)^{0.5}} \,\middle|\, \delta = 4.432\right).$$

Note that $s^2_{new}$ is also a random variable, which makes computing this probability inconvenient, so for simplicity we estimate it using $s^2 = 60$ from the current data. To get $1 - \beta \ge 0.8$, we find that we need at least $n = 183$ pairs in the new study. R code:

    nn = 10:300
    alpha = 0.05
    svar = var(final - midterm)
    # shift of the statistic under the alternative: (4.432 - 3) / sqrt(svar/nn)
    pows = pt(qt(1 - alpha, nn - 1) - 1.432/sqrt(svar/nn), nn - 1, lower.tail = FALSE)
    nn[which(pows > 0.8)[1]]   # smallest n with power above 0.8

(e) [4] Does a student have a higher or lower probability of doing better on the final compared to the midterm? Perform a test on this dataset to answer this question. Hint: the test you should use is not covered in STAT 404 (though likely in previous statistics courses). Think of a common distribution that has probability as a parameter. The test is directly based on this distribution.

Answer. The number of students who did better on the final is a good metric of whether students perform similarly on the final and midterm. If we denote this random variable by $X$, then under the null hypothesis it has a binomial distribution with $n = 20$ and probability of success $\theta = 0.5$; that is, we may use Binom(20, 0.5) as our reference distribution. The alternative hypothesis is $\theta \neq 0.5$. Here $X_{obs} = 18$. The values 0, 1, 19, 20 are more extreme than the observation, and 2 and 18 are equally extreme. The p-value could be computed as

$$p = P(X \in \{0, 1, 19, 20\}) + 0.5\,P(X \in \{2, 18\}) = 0.00022.$$

If one does not apply this continuity correction and instead counts the equally extreme outcomes with full weight, one gets

$$p = P(X \in \{0, 1, 19, 20\}) + P(X \in \{2, 18\}) = 0.00040.$$
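Both binomial p-values can be checked with a short R sketch. Because Binom(20, 0.5) is symmetric, we measure extremeness by the distance $|x - n/2|$ from the null mean; that choice and the names `p.half` and `p.full` are ours. Base R's `binom.test` reproduces the version without the half weighting.

```r
midterm <- c(82.54, 79.37, 77.78, 60.32, 65.08, 69.84, 61.90, 61.90, 44.44, 63.49,
             80.95, 84.13, 76.19, 88.89, 73.02, 85.71, 71.43, 84.13, 84.13, 69.84)
final   <- c(85.67, 84.44, 80.83, 69.06, 67.73, 81.75, 73.62, 62.61, 66.56, 65.52,
             82.38, 94.13, 65.58, 96.68, 86.40, 87.95, 74.92, 84.56, 85.99, 57.34)

n       <- length(final)
x.obs   <- sum(final > midterm)      # students who did better on the final
k       <- 0:n                       # support of Binom(n, 0.5)
dev     <- abs(k - n / 2)            # distance of each outcome from the null mean
dev.obs <- abs(x.obs - n / 2)        # distance of the observed count

# Two-sided p-value with equally extreme outcomes weighted by 1/2
p.half <- sum(dbinom(k[dev > dev.obs], n, 0.5)) +
  0.5 * sum(dbinom(k[dev == dev.obs], n, 0.5))
# Two-sided p-value with equally extreme outcomes given full weight
p.full <- sum(dbinom(k[dev >= dev.obs], n, 0.5))

print(c(x.obs = x.obs, p.half = p.half, p.full = p.full))
print(binom.test(x.obs, n, p = 0.5)$p.value)   # exact test; equals p.full here
```

In exact terms the two p-values are $232/2^{20}$ and $422/2^{20}$, which round to 0.00022 and 0.00040.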
Both are acceptable. The outcome is highly significant, and we reject the null hypothesis that students perform similarly on the midterm and the final ($\theta = 0.5$) at the conventional level 0.05. Since $X_{obs} = 18$ lies far above the null mean of 10, a student appears more likely to do better on the final than on the midterm. Note: other tests are acceptable as long as they are reasonable given the context and carried out correctly.

2. [20] Four students conducted an experiment on paper helicopters. The response is the time it takes for a helicopter to touch the ground after being dropped from 2 meters above ground. Four helicopter designs are implemented. We use the data as if they were collected via a randomized complete block design. The data for these four students are given as yy1, yy2, yy3, yy4 below. Each position in a vector (a column of the data table) corresponds to a helicopter design (a treatment).

    yy1 = c(1.56, 1.62, 2.14, 1.30)
    yy2 = c(1.53, 1.75, 2.02, 1.41)
    yy3 = c(1.58, 1.80, 1.97, 1.36)
    yy4 = c(1.60, 1.81, 1.93, 1.45)

This problem asks you to go over all routine data analysis for the randomized complete block design and a bit more.

(a) [6] Construct the analysis of variance table.

Answer. We compute the sums of squares of the various effects. First, the grand mean is $\hat\eta = \bar y_{\cdot\cdot} = 1.676875$. The blocking effects of the student volunteers are estimated to be

$$\bar y_{i\cdot} - \bar y_{\cdot\cdot} = (-0.021875, 0.000625, 0.000625, 0.020625).$$

Their SS is

$$SS_b = 4\sum_{i=1}^{4}(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = 0.00361875.$$

The treatment effects are estimated to be

$$\bar y_{\cdot j} - \bar y_{\cdot\cdot} = (-0.109375, 0.068125, 0.338125, -0.296875)$$
and the treatment SS is

$$SS_{trt} = 4\sum_{j=1}^{4}(\bar y_{\cdot j} - \bar y_{\cdot\cdot})^2 = 0.8762688.$$

The residuals have SS

$$SS_{err} = \sum_{i=1}^{4}\sum_{j=1}^{4}(y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot})^2 = 0.05945625.$$

The total SS is

$$SS_{tot} = \sum_{i=1}^{4}\sum_{j=1}^{4}(y_{ij} - \bar y_{\cdot\cdot})^2 = 0.9393438.$$

The MSS are obtained by dividing each SS by its DF, and the F-statistic is the treatment MSS divided by the error MSS. Hence, the ANOVA table is

    Source      DF   SS       MSS      F
    Volunteer    3   0.0036   0.0012
    Treatment    3   0.8763   0.2921   44.21
    Error        9   0.0595   0.0066
    Total       15   0.9393

(b) [2] Test the hypothesis $H_0$: treatment effects are the same.

Answer. The F-statistic is found in the ANOVA table. The p-value is $p = 1 - \text{pf}(44.21, 3, 9) = 1.035 \times 10^{-5}$. The null hypothesis of no difference between treatment effects is rejected at the 5% level.
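The ANOVA computations in (a) and the F-test in (b) can be sketched in R as follows. The layout (rows = students yy1 to yy4, columns = designs) and the names `y`, `b`, `k`, `SS.blk` are our own; `ybar.trt` and `MSS.err` are chosen to match the names used in the Tukey code of part (c). The `aov()` call is only a cross-check.

```r
# Rows = students (blocks yy1..yy4), columns = helicopter designs (treatments)
y <- rbind(c(1.56, 1.62, 2.14, 1.30),
           c(1.53, 1.75, 2.02, 1.41),
           c(1.58, 1.80, 1.97, 1.36),
           c(1.60, 1.81, 1.93, 1.45))
b <- nrow(y)                      # number of blocks
k <- ncol(y)                      # number of treatments
ybar     <- mean(y)               # grand mean
ybar.blk <- rowMeans(y)           # block (student) means
ybar.trt <- colMeans(y)           # treatment (design) means

SS.blk  <- k * sum((ybar.blk - ybar)^2)
SS.trt  <- b * sum((ybar.trt - ybar)^2)
SS.tot  <- sum((y - ybar)^2)
SS.err  <- SS.tot - SS.blk - SS.trt
df.err  <- (b - 1) * (k - 1)
MSS.err <- SS.err / df.err
F.trt   <- (SS.trt / (k - 1)) / MSS.err
p.trt   <- pf(F.trt, k - 1, df.err, lower.tail = FALSE)

print(round(c(SS.blk = SS.blk, SS.trt = SS.trt, SS.err = SS.err,
              SS.tot = SS.tot, F.trt = F.trt), 5))
print(p.trt)

# Cross-check with aov(); as.vector() stacks y column by column
dat <- data.frame(time    = as.vector(y),
                  student = factor(rep(1:b, times = k)),
                  design  = factor(rep(1:k, each = b)))
print(summary(aov(time ~ student + design, data = dat)))
```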
(c) [8] Regardless of the outcome of (b), use Tukey's method to construct simultaneous confidence intervals for differences in treatment means.

Answer. The pairwise treatment differences for (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4) are estimated as

$$\hat\tau_i - \hat\tau_j = (-0.1775, -0.4475, 0.1875, -0.2700, 0.3650, 0.6350).$$

The 95% Tukey quantile is qtukey(0.95, 4, 9) = 4.41489. The half-width of each interval is

$$\frac{\text{qtukey}(0.95, 4, 9)}{\sqrt{2}}\sqrt{\left(\frac{1}{4} + \frac{1}{4}\right)\text{MSS(error)}} = 0.1794186.$$

Hence, the simultaneous Tukey 95% CIs for the differences between treatments are

    pair    (1,2)   (1,3)   (1,4)   (2,3)   (2,4)   (3,4)
    low    -0.357  -0.627   0.008  -0.449   0.186   0.456
    upp     0.002  -0.268   0.367  -0.091   0.544   0.814

All the differences except the one between treatments 1 and 2 are significant, since (1, 2) is the only interval that contains 0. R code (ybar.trt holds the four treatment means and MSS.err the error mean square from (a)):

    # pairwise differences in the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    eff.diff = c(ybar.trt[1]-ybar.trt[-1], ybar.trt[2]-ybar.trt[c(3,4)], ybar.trt[3]-ybar.trt[4])
    ci.wdth = qtukey(0.95, 4, 9)*((1/4+1/4)*MSS.err/2)^.5   # Tukey half-width
    low = eff.diff - ci.wdth
    upp = eff.diff + ci.wdth

(d) [4] Construct a 95% two-sided CI for $\tau_1 + \tau_3 - 2\tau_2$ using the recommended universal recipe.
Answer. We estimate $\theta = \tau_1 + \tau_3 - 2\tau_2$ by

$$\hat\theta = (\bar y_{\cdot 1} - \bar y_{\cdot\cdot}) + (\bar y_{\cdot 3} - \bar y_{\cdot\cdot}) - 2(\bar y_{\cdot 2} - \bar y_{\cdot\cdot}) = \bar y_{\cdot 1} + \bar y_{\cdot 3} - 2\bar y_{\cdot 2} = 0.0925.$$

When regarded as a random variable (rather than an observed value), $\hat\theta$ is a linear combination of 3 independent means, each a mean of 4 observations (the block effects cancel because the coefficients 1, 1, -2 sum to zero). Hence

$$\text{Var}(\hat\theta) = \left(\frac{1}{4} + \frac{1}{4} + \frac{2^2}{4}\right)\sigma^2 = \frac{3}{2}\sigma^2.$$

We naturally estimate it by $\widehat{\text{Var}}(\hat\theta) = \frac{3}{2}\,\text{MSS(error)} = 0.0099$, which has 9 degrees of freedom. Note qt(0.975, 9) = 2.262157. Hence, a 95% CI for $\theta = \tau_1 + \tau_3 - 2\tau_2$ is

$$0.0925 \pm 2.262157\sqrt{0.0099} = (-0.133, 0.318).$$

3. [20] Animals were randomly allocated to 12 groups of 4. Each group was given one of 3 poisons and one of 4 treatments. The survival times of the animals are given in the following table (each row lists one animal per treatment).

                  treatment
    poison     A      B      C      D
    I        0.31   0.82   0.43   0.45
             0.45   1.10   0.45   0.71
             0.46   0.88   0.63   0.66
             0.43   0.72   0.76   0.62
    II       0.36   0.92   0.44   0.56
             0.29   0.61   0.35   1.02
             0.40   0.49   0.31   0.71
             0.23   1.24   0.40   0.38
    III      0.22   0.30   0.23   0.30
             0.21   0.37   0.25   0.36
             0.18   0.38   0.24   0.31
             0.23   0.29   0.22   0.33
(Source: Box, Hunter, and Hunter, altered slightly.) The following code might be useful.

    n = 4    # animals per cell
    k1 = 4   # number of treatments
    k2 = 3   # number of poisons
    dat = data.frame(times = c(0.31,0.82,0.43,0.45, 0.45,1.10,0.45,0.71,
                               0.46,0.88,0.63,0.66, 0.43,0.72,0.76,0.62,
                               0.36,0.92,0.44,0.56, 0.29,0.61,0.35,1.02,
                               0.40,0.49,0.31,0.71, 0.23,1.24,0.40,0.38,
                               0.22,0.30,0.23,0.30, 0.21,0.37,0.25,0.36,
                               0.18,0.38,0.24,0.31, 0.23,0.29,0.22,0.33),
                     group = rep(c("A","B","C","D"), n*k2),
                     poison = rep(c("I","II","III"), each=n*k1))

(a) [8] Construct the analysis of variance table for this two-way layout design.

Answer. The calculations for the SS corresponding to the main effects are similar to those described in Q2(a). The interaction SS is given by

$$SS_{int} = 4\sum_{i=1}^{3}\sum_{j=1}^{4}(\bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y_{...})^2.$$

The error SS is found by subtracting all of the other SS from the total SS. The ANOVA table is given by
    Source        DF   SS       MSS      F
    Treatment      3   0.9212   0.3071   13.81
    Poison         2   1.0330   0.5165   23.22
    Interaction    6   0.2501   0.0417   1.874
    Error         36   0.8007   0.0222
    Total         47   3.0051

R code:

    I = 4   # number of treatments
    J = 3   # number of poisons
    n = 4   # replicates per cell
    # one 4 x 4 matrix per poison: rows = replicates, columns = treatments A-D
    yy1 = matrix(c(0.31,0.82,0.43,0.45, 0.45,1.10,0.45,0.71,
                   0.46,0.88,0.63,0.66, 0.43,0.72,0.76,0.62), 4, 4, byrow=TRUE)
    yy2 = matrix(c(0.36,0.92,0.44,0.56, 0.29,0.61,0.35,1.02,
                   0.40,0.49,0.31,0.71, 0.23,1.24,0.40,0.38), 4, 4, byrow=TRUE)
    yy3 = matrix(c(0.22,0.30,0.23,0.30, 0.21,0.37,0.25,0.36,
                   0.18,0.38,0.24,0.31, 0.23,0.29,0.22,0.33), 4, 4, byrow=TRUE)
    mean.trt = colMeans(rbind(yy1,yy2,yy3))
    mean.poison = c(mean(yy1), mean(yy2), mean(yy3))
    mean.int = rbind(colMeans(yy1),colMeans(yy2),colMeans(yy3))   # J x I cell means
    ybar = mean(mean.trt)
    # Compute SS
    SS.trt = n*J*sum((mean.trt-ybar)^2)
    SS.poison = n*I*sum((mean.poison-ybar)^2)
    SS.int = n*sum((mean.int
                    -matrix(rep(mean.poison,I),J,I,byrow=FALSE)
                    -matrix(rep(mean.trt,J),J,I,byrow=TRUE)
                    +ybar)^2)
    SS.total = sum((yy1-ybar)^2) + sum((yy2-ybar)^2) + sum((yy3-ybar)^2)
    SS.err = SS.total - SS.poison - SS.trt - SS.int
    # Compute MSS
    MS.trt = SS.trt / (I-1)
    MS.poison = SS.poison / (J-1)
    MS.int = SS.int / ((I-1)*(J-1))
    MS.err = SS.err / (I*J*(n-1))
    # Compute F-statistics
    F.trt = MS.trt / MS.err
    F.poison = MS.poison / MS.err
    F.int = MS.int / MS.err

(b) [4] Test the hypotheses of whether the effects of poison and treatment are significant.

Answer. For each factor, we test the hypothesis that its main effects are all equal ($H_0$) against the alternative that at least two of the levels have different effects. The 95% quantiles of $F_{3,36}$ and $F_{2,36}$ are 2.866266 and 3.259446, respectively. The observed F-statistics from (a), 13.81 for treatment and 23.22 for poison, exceed these quantiles, so we reject $H_0$ for both treatment and poison and conclude that both have significant effects. R code:

    qf(0.95, 3, 36)
    qf(0.95, 2, 36)

(c) [4] Test the hypothesis of whether the interaction effect is significant.

Answer. For the interaction, we test the hypothesis that all interaction effects are zero ($H_0$) against the alternative that they are not all zero.
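The 95% quantile of $F_{6,36}$ is 2.363751. Since the observed F-statistic from (a), 1.874, is below this quantile, we do not reject $H_0$ and conclude that the interaction between treatment and poison is not significantly different from 0. R code:

    qf(0.95, 6, 36)

(d) [4] Predict the survival time of an animal administered poison II and treatment C. Estimate the variance of the prediction.

Answer. In (c), we concluded that the interaction between poison and treatment is insignificant. Hence, using the additive model, our predicted survival time of an animal administered poison II ($\alpha_2$) and treatment C ($\beta_3$) is

$$\hat y = \hat\eta + \hat\alpha_2 + \hat\beta_3 = 0.4575 \text{ units}.$$

Under the standard model assumptions (independent units with common variance $\sigma^2$), the variance is

$$\begin{aligned}
\text{Var}(\hat y) &= \text{Var}(\hat\eta + \hat\alpha_2 + \hat\beta_3) = \text{Var}\big(\bar y_{...} + (\bar y_{2..} - \bar y_{...}) + (\bar y_{.3.} - \bar y_{...})\big) = \text{Var}(\bar y_{2..} + \bar y_{.3.} - \bar y_{...}) \\
&= \text{Var}(\bar y_{2..}) + \text{Var}(\bar y_{.3.}) + \text{Var}(\bar y_{...}) + 2\,\text{Cov}(\bar y_{2..}, \bar y_{.3.}) - 2\,\text{Cov}(\bar y_{2..}, \bar y_{...}) - 2\,\text{Cov}(\bar y_{.3.}, \bar y_{...}) \\
&= \text{Var}(\bar y_{2..}) + \text{Var}(\bar y_{.3.}) + \text{Var}(\bar y_{...}) + 2 \times \frac{4}{12 \times 16}\text{Var}(y_{231}) - 2 \times \frac{16}{16 \times 48}\text{Var}(y_{211}) - 2 \times \frac{12}{12 \times 48}\text{Var}(y_{131}) \\
&= \frac{\sigma^2}{16} + \frac{\sigma^2}{12} + \frac{\sigma^2}{48} + \frac{\sigma^2}{24} - \frac{\sigma^2}{24} - \frac{\sigma^2}{24} = \frac{\sigma^2}{8}.
\end{aligned}$$

Each covariance equals the number of observations shared by the two means, divided by the product of their sample sizes, times the variance $\sigma^2$ of a single observation. Estimating $\sigma^2$ by $\text{MSS}_{err}$, we get $\widehat{\text{Var}}(\hat y) = 0.0028$.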
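The prediction in (d) and its variance can be checked with the R sketch below. Instead of expanding covariances, it writes $\hat y$ as a weighted sum $\sum_i w_i y_i$ with weights $1/16$ for each poison II animal, $1/12$ for each treatment C animal, and $-1/48$ for every animal, so that $\text{Var}(\hat y) = \sigma^2 \sum_i w_i^2$; this weight-based route and the names `w`, `var.factor`, `var.hat` are our own choices, not the original solution's. $\sigma^2$ is estimated by the residual mean square of the full model with interaction, as in (a).

```r
# Survival times entered row by row from the table (each row lists treatments A-D)
times <- c(0.31,0.82,0.43,0.45, 0.45,1.10,0.45,0.71, 0.46,0.88,0.63,0.66, 0.43,0.72,0.76,0.62,
           0.36,0.92,0.44,0.56, 0.29,0.61,0.35,1.02, 0.40,0.49,0.31,0.71, 0.23,1.24,0.40,0.38,
           0.22,0.30,0.23,0.30, 0.21,0.37,0.25,0.36, 0.18,0.38,0.24,0.31, 0.23,0.29,0.22,0.33)
dat <- data.frame(times  = times,
                  group  = rep(c("A", "B", "C", "D"), 12),
                  poison = rep(c("I", "II", "III"), each = 16))

# Full two-way model with interaction; its residual mean square estimates sigma^2
tab    <- summary(aov(times ~ group * poison, data = dat))[[1]]
MS.err <- tab[["Mean Sq"]][4]    # rows: group, poison, group:poison, Residuals

# Additive prediction for poison II and treatment C
y.hat <- mean(dat$times[dat$poison == "II"]) +
  mean(dat$times[dat$group == "C"]) - mean(dat$times)

# y.hat = sum(w * times), so Var(y.hat) = sigma^2 * sum(w^2)
w <- (dat$poison == "II") / 16 + (dat$group == "C") / 12 - 1 / 48
var.factor <- sum(w^2)           # should equal 1/8
var.hat    <- var.factor * MS.err

print(c(y.hat = y.hat, var.factor = var.factor, var.hat = var.hat))
```

The weights take only four distinct values (1/8 for the 4 animals in the poison II, treatment C cell; 1/24, 1/16, and -1/48 elsewhere), which is why their squares sum exactly to 1/8.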