DO NOT OPEN THIS EXAM UNTIL INSTRUCTED TO

Name:
Student #:

THE UNIVERSITY OF BRITISH COLUMBIA
FINAL EXAM, 2022, Dec. 13, 3:30–6:00 pm
STAT 404: Design and Analysis of Experiments

Number of questions: 9
Total marks: 100 + 1 bonus

Preamble:

1. Write your name and student number on the upper-right corner of every page.
2. All questions require explanations unless specified otherwise.
3. Written questions only require a few complete sentences. Do NOT write essays to answer these questions.
4. Answering questions in incomplete sentences will result in only partial marks.
5. Save the R code you used in a .doc, .docx, .rtf, or .txt file. Include comments describing which question each code block is used for. Leave sufficient space between code for different questions. Submit your code to Canvas.
6. Write on the back of the previous page if you need more space for your solution.
7. Your written solution is what is being marked and should include your answer and any supporting work (explanations, formulas, calculations, etc.). Your code supports your answers but is not an answer in itself.
8. Pay attention to the number of marks each question is worth. Plan your time wisely.
9. Unless otherwise specified:
   (a) assume common notations and model assumptions;
   (b) use the conventional 5% level for tests, two-sided alternative hypotheses, and the 95% confidence level.
1. [6+1] We emphasize three design of experiment dogmas/principles in STAT 404: randomization, blocking and replication.

(a) [2 + 1] Name a design which employs blocking but does not have the word “blocking” in its name. A bonus mark if you can name two of them.

Answer: One example is the “paired experiment”, in which each pair is a block. Another example is the “Latin square”, where one of the factors is used as a block, although this one is a bit of a stretch.

(b) [2] Randomization involves randomly assigning treatments to units. Describe another way to use randomization for analyzing a two-sample design/problem.

Answer: We use it to justify the randomization test, for instance for the null hypothesis of equal means.

(c) [2] Name one thing that we cannot do in a full-factorial design without replicates. Describe what our suggested remedy to this is for data analysis.

Answer: Without replicates we do not have a “proper” estimate of the error variance, so we can no longer use the F-test under the usual model assumptions to test the significance of the various effects. Instead, we use a half-normal plot to visually identify effects that appear significant.

2. [4] Name two differences between the 2-level fractional factorial design and the 2-level full factorial design based on our discussions in this course. We accept any sensible suggestions, but be sure to use complete sentences. Beware that incorrect statements will be penalized even if the general ideas are correct.

Answer: Under a 2-level fractional factorial design, the experiment is not carried out on all level combinations, unlike under the 2-level full factorial design. In a 2-level fractional factorial design, factors and interactions are aliased into groups, forcing us to rely on conventions to identify the factors with significant effects.

3. [6] Under the standard linear model, the relationship between the response variable $y$ and the predictors/covariates $x_1, \ldots, x_p$ is as follows:

$y = \beta_0 + x_1 \beta_1 + x_2 \beta_2 + \cdots + x_p \beta_p + \epsilon.$
The collected data are denoted as $\{y_i, x_i\}$, $i = 1, \ldots, n$. We omit other details but highlight that (1) the predictors are not random and (2) the error terms $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$. The least squares estimator of the regression coefficient vector (in matrix notation) is given by

$\hat{\beta} = (X_n^\top X_n)^{-1} X_n^\top y_n.$

If it helps, you may consider a concrete example with $p = 2$ and $n = 4$.

(a) [3] Suppose instead that the error distribution has mean 0 and variance $\sigma^2$ but is not necessarily normal. Name a well-known property of $\hat{\beta}$ under the standard model that is no longer valid. Provide a brief explanation (not a proof).

Answer: The distribution of $\hat{\beta}$ is no longer guaranteed to be normal. Normality under the usual model assumptions rests on the fact that “any linear combination of jointly normally distributed random variables is still normally distributed.”
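To see where the normality comes from, one can substitute the model into the estimator. The following is a sketch in the matrix notation of the question, writing $X$ for $X_n$, $y$ for $y_n$, and $\epsilon$ for the vector of errors (the symbol $\epsilon$ is assumed here for the error term):

```latex
\begin{align*}
\hat{\beta} &= (X^\top X)^{-1} X^\top y
             = (X^\top X)^{-1} X^\top (X\beta + \epsilon)
             = \beta + (X^\top X)^{-1} X^\top \epsilon, \\
E(\hat{\beta}) &= \beta, \qquad
\mathrm{Var}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}.
\end{align*}
```

The mean and variance only use the facts that the errors have mean 0, variance $\sigma^2$ and are uncorrelated, so they remain valid. Exact normality of $\hat{\beta}$, a linear combination of the $\epsilon_i$, is the property that relies on the errors being normal.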
(b) [3] Suppose that the values of the predictors $x_1, \ldots, x_p$ are scaled by a factor of 2. Describe the effect of this scaling on $\hat{\beta}$ in the standard model. Provide a brief explanation.

Answer: Note that $x\beta = (2x)(\beta/2)$. Hence, if each $x$ value is scaled by a factor of 2, the corresponding coefficient estimate is scaled by a factor of 1/2 (the intercept estimate is unchanged).

4. [17] Three poisons (I, II, III) are randomly allocated to animals in four groups (A, B, C, D). Three animals in each group receive the same poison. The survival times of the animals are given in the following table.

                  group
poison     A      B      C      D
           0.31   0.82   0.43   0.45
I          0.45   1.10   0.45   0.71
           0.46   0.88   0.63   0.66
           0.36   0.92   0.44   0.56
II         0.29   0.61   0.35   1.02
           0.40   0.49   0.31   0.71
           0.22   0.30   0.23   0.30
III        0.21   0.37   0.25   0.36
           0.18   0.38   0.24   0.31

The code to load the data is provided in the file Rcode2022final.txt on Canvas.

(a) [6] A sloppy professor regarded the design as a one-way layout with three treatments being the three poisons. Complete his ANOVA table. Not every cell needs to be filled.

Source       DF    SS      MSS     F
Treatment     2    0.735   0.368   9.579
Error        33    1.266   0.038
Total        35    2.001

(b) [2] Determine whether he finds the treatment effect significant at the 5% level (under the wrong model).

Answer: One can compute the p-value of the test for $H_0: \tau_1 = \tau_2 = \tau_3 = 0$ as $P(F_{2,33} > 9.579) = 0.0005$, which warrants rejection at the 5% level.

(c) [6] Compute his simultaneous 95% CIs for the three differences in mean treatment effects using Tukey’s method (under the wrong model).
Answer: We first obtain the Tukey quantile qtukey(0.95, 3, 33) = 3.470. The half-length of each interval is then given by

$\frac{3.470}{\sqrt{2}} \times \sqrt{0.038} \times \sqrt{1/12 + 1/12} = 0.195.$

The intervals for the differences I - II, I - III and II - III are then found to be (-0.122, 0.27); (0.137, 0.530); (0.063, 0.4554).

(d) [3] State if the MSS(err) value that he obtained is larger or smaller than the one he would obtain in the correct two-way layout. Provide a brief explanation. Do not compute the actual values.

Answer: In general, his MSS(err) would be larger than the one based on the correct two-way layout. By ignoring the effect of group, the SS contribution of the group is absorbed into SS(err).

5. [15] A 2-level fractional factorial design with 8 factors can be formed by either of the following two sets of defining relations:

(A) 6 = 124; 7 = 135; 8 = 245.
(B) 6 = 125; 7 = 1235; 8 = 1245.

(a) [6] Derive the defining contrasts subgroup of both designs.

Answer: The defining contrasts subgroup of design (A) is given by

I = 1246 = 1357 = 2458 = 234567 = 1568 = 123478 = 3678.

The defining contrasts subgroup of design (B) is given by

I = 1256 = 12357 = 12458 = 367 = 468 = 3478 = 12345678.

(b) [2] Determine the resolutions of these two designs.

Answer: The resolutions are 4 and 3 for designs (A) and (B) respectively, the lengths of the shortest words in their defining contrasts subgroups.

(c) [4] Determine the effects (main or interaction) confounded with the main effect of factor 2 in Design (B).

Answer: Multiplying the defining contrasts subgroup of design (B) by factor 2, we get the aliasing set

2 = 156 = 1357 = 1458 = 2367 = 2468 = 23478 = 1345678.

(d) [3] Suppose there are 4 replicates for each run. Calculate the degrees of freedom for SS(err).

Answer: The sum of squares formed by the 4 replicates in each run has 3 degrees of freedom. The current design has $2^{8-3} = 32$ runs. The degrees of freedom for SS(err) is therefore $32 \times 3 = 96$.
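The word arithmetic in Question 5 can be checked mechanically. The following is a minimal Python sketch, not part of the course code: it treats each word as a set of factor labels, so multiplying two words is a symmetric difference (squared factors cancel). The function names `multiply`, `defining_subgroup` and `fmt` are made up for this illustration.

```python
from itertools import combinations


def multiply(word_a, word_b):
    """Multiply two effect words; any factor appearing twice cancels."""
    return frozenset(word_a) ^ frozenset(word_b)


def defining_subgroup(generators):
    """Return every non-identity word generated by the defining words."""
    words = set()
    for r in range(1, len(generators) + 1):
        for subset in combinations(generators, r):
            product = frozenset()
            for word in subset:
                product = multiply(product, word)
            words.add(product)
    return words


def fmt(word):
    """Write a word with its factor labels in increasing order."""
    return "".join(sorted(word))


def by_length(words):
    """Sort formatted words by length, then alphabetically."""
    return sorted((fmt(w) for w in words), key=lambda s: (len(s), s))


# Defining words: 6 = 124 gives I = 1246, and so on.
design_a = ["1246", "1357", "2458"]
design_b = ["1256", "12357", "12458"]

sub_a = defining_subgroup(design_a)
sub_b = defining_subgroup(design_b)

# The resolution is the length of the shortest word in the subgroup.
resolution_a = min(len(w) for w in sub_a)
resolution_b = min(len(w) for w in sub_b)

# Effects aliased with main effect 2 in design (B).
aliases_of_2 = by_length(multiply("2", w) for w in sub_b)

print("Design (A):", by_length(sub_a), "resolution", resolution_a)
print("Design (B):", by_length(sub_b), "resolution", resolution_b)
print("Aliases of 2 in (B):", aliases_of_2)
```

The same helper can be reused for any set of defining relations by changing the generator lists.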
6. [6] Consider a 2-level fractional factorial design with 6 treatment factors and 2 blocking factors. The defining relations are given by

6 = 124; B1 = 135; B2 = 245.

(a) [3] State how many blocks this design has. Provide a brief explanation.

Answer: This design has two blocking factors, both at 2 levels. They therefore form 4 level combinations, which lead to 4 blocks.

(b) [3] State how many runs there are in each block. Provide a brief explanation.

Answer: The design before blocking has $2^{6-1} = 32$ runs. Hence, there are 32/4 = 8 runs in each block.

7. [28] In a door panel stamping experiment, 6 factors (each at 2 levels) were chosen and studied for their effects on the formability of a panel. One measure of formability is the thinning percentage of the stamped panel at a critical position. The six factors are (A) concentration of lubricant, (B) panel thickness, (C) force on the outer portion of the panel, (D) force on the inner portion of the panel, (E) punch speed, and (F) thickness of lubrication. The experiment was done over two days. “Day” was considered to be a blocking factor (G) to reduce the influence of the day-to-day variation, with “-” representing day 1 and “+” day 2. The experiment used a $2^{7-2}$ resolution IV design with defining contrasts subgroup

I = ABCF = CDEG = ABDEFG.

We have k = 6, p = 1 and b = 1 in our notation. Yet, it is called $2^{7-2}$ because G is regarded as a factor. The code to load the design matrix and response y is provided in the file Rcode2022final.txt on Canvas. Note that the design is different from the one used in the assignment.

(a) [10] Derive the alias groups that contain a main factor.

Answer: There are six main factors in this experiment. The alias groups are

A = BCF = ACDEG = BDEFG
B = ACF = BCDEG = ADEFG
C = ABF = DEG = ABCDEFG
D = ABCDF = CEG = ABEFG
E = ABCEF = CDG = ABDFG
F = ABC = CDEFG = ABDEG

(b) [4] Name all two-factor interactions that are not confounded with any other two-factor interactions.
Answer: There are 6 factors in this design, so the total number of two-factor interactions (2fi’s) is 15. Any 2fi contained in the word ABCF is confounded with another 2fi (AB = CF, AC = BF, AF = BC). There are six of them. The remaining nine are not, since otherwise two of them would form a word of length 4. So the two-factor interactions that are not confounded with any other two-factor interactions are

AD, AE, BD, BE, CD, CE, DE, DF, EF.

(c) [6] The file Rcode2022final.txt provides logit(y) values and some useful code. Compute effect estimates for A, B, C, AB, AC, and BC (6 effects). Do not consider other factors that are aliased with them, if any.

Answer: The estimates of these effects are

$\hat{\mu}_A = 3.4545$; $\hat{\mu}_B = 0.01275$; $\hat{\mu}_C = -6.02$;
$\hat{\mu}_{AB} = 3.4545$; $\hat{\mu}_{AC} = 1.7776$; $\hat{\mu}_{BC} = 0.4576$.

(d) [4] Effect estimates for all alias groups are given in the file Rcode2022final.txt. Identify the significant effects using a half-normal plot (based on your discretion). Write the fitted model.

Answer: Factors A, C and AC are judged to have significant effects. The fitted model would be

$\hat{y} = \hat{\eta} \pm 0.5\,\hat{\mu}_A \pm 0.5\,\hat{\mu}_C \pm 0.5\,\hat{\mu}_{AC},$

where each sign is the level (+ or -) of the corresponding factor or interaction column.

(e) [4] Describe the recommended factor settings for reducing/minimizing percentage thinning.

Answer: The best choice is to have A = - and C = +.

8. [12] Jewelry appraisers recorded the clarity, the carat (a measure of mass), and the suggested prices (in hundreds of dollars) of several diamonds. The clarity grades range from 1 to 6, where higher-grade diamonds are more desirable. Regard carat as a covariate, clarity as the treatments, and price as the response. Note that the dataset differs from the one in the lab. The code to load the data is provided in the file Rcode2022final.txt on Canvas.

(a) [8] Complete the ANCOVA table. Not every cell needs to be filled.

Source        DF    SS      MSS     F
Treatment      5    619.1   123.8   4.086
Regression     1    6402    6402    211.3
Error         23    697.0   30.30
Total         29    14265
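The derived columns of the ANCOVA table in Question 8(a) follow from the DF and SS columns: each MSS is SS/DF, and each F is the row's MSS divided by MSS(err). A small Python sketch of that arithmetic, using only the numbers printed in the table (the dictionary layout and names are illustrative, not from the course code):

```python
# Degrees of freedom and sums of squares copied from the ANCOVA table.
table = {
    "Treatment": {"df": 5, "ss": 619.1},
    "Regression": {"df": 1, "ss": 6402.0},
    "Error": {"df": 23, "ss": 697.0},
}


def mean_square(row):
    """Mean sum of squares: SS divided by its degrees of freedom."""
    return row["ss"] / row["df"]


mse = mean_square(table["Error"])
f_treatment = mean_square(table["Treatment"]) / mse
f_regression = mean_square(table["Regression"]) / mse

print(f"MSS(trt) = {mean_square(table['Treatment']):.1f}")
print(f"MSS(err) = {mse:.2f}")
print(f"F(trt)   = {f_treatment:.3f}")
print(f"F(reg)   = {f_regression:.1f}")
```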
(b) [4] Construct a 95% two-sided confidence interval for the error variance $\sigma^2$. Remark: you have the knowledge to work on this problem though this was not directly discussed in this course.

Answer: Use the classical chi-square result: the distribution of $SS(\mathrm{err})/\sigma^2$ is chi-square with df = 23. The 2.5% and 97.5% quantiles of this distribution are 11.69 and 38.08. These lead to the CI $[SS(\mathrm{err})/q_{0.975},\ SS(\mathrm{err})/q_{0.025}] = [18.31, 59.63]$.

9. [6] The model for analysis of covariance is postulated as

$y_{ij} = \eta + \tau_i + \beta (x_{ij} - \bar{x}_{\cdot\cdot}) + \epsilon_{ij}$

with $\epsilon_{ij}$ being i.i.d. $N(0, \sigma^2)$. The covariate $x$ is regarded as non-random. We consider a scalar $x$ and omit other model details here. We estimate the $i$-th treatment effect by

$\hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot} - \hat{\beta} (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})$

with estimated regression coefficient

$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_i \sum_j (x_{ij} - \bar{x}_{i\cdot})\, y_{ij}}{\sum_i \sum_j (x_{ij} - \bar{x}_{i\cdot})^2}.$

(a) [3] Prove that $\mathrm{Cov}(\bar{y}_{i\cdot}, \hat{\beta}) = 0$.

Proof: Note that $\hat{\beta}$ is a linear combination of the $y_{ij}$. It is seen that $\mathrm{Cov}(\bar{y}_{i\cdot}, y_{ij}) = (1/n_i)\mathrm{Var}(y_{ij}) = \sigma^2/n_i$. In addition, for $i' \neq i$, we have $\mathrm{Cov}(\bar{y}_{i\cdot}, y_{i'j}) = 0$. Hence,

$\mathrm{Cov}(\bar{y}_{i\cdot}, \hat{\beta}) = \frac{\sum_j (x_{ij} - \bar{x}_{i\cdot})\, \mathrm{Cov}(\bar{y}_{i\cdot}, y_{ij})}{S_{xx}} = \frac{\sum_j (x_{ij} - \bar{x}_{i\cdot})\, \{\sigma^2/n_i\}}{S_{xx}} = \frac{0 \times \{\sigma^2/n_i\}}{S_{xx}} = 0.$

Note that the summation over the treatment index in the numerator reduces to the single term for treatment $i$, because the covariances with observations from the other treatments are 0.

(b) [3] Show that in general, $\mathrm{Cov}(\hat{\tau}_i, \hat{\beta}) \neq 0$.

Proof: Since $\bar{y}_{\cdot\cdot}$ is a weighted average of the $\bar{y}_{i\cdot}$, the conclusion in part (a) also gives $\mathrm{Cov}(\bar{y}_{\cdot\cdot}, \hat{\beta}) = 0$. It therefore implies

$\mathrm{Cov}(\hat{\tau}_i, \hat{\beta}) = \mathrm{Cov}(\bar{y}_{i\cdot}, \hat{\beta}) - \mathrm{Cov}(\bar{y}_{\cdot\cdot}, \hat{\beta}) - (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})\, \mathrm{Var}(\hat{\beta}) = 0 - 0 - (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})\, \mathrm{Var}(\hat{\beta}),$

which is not zero unless $\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot} = 0$ or $\mathrm{Var}(\hat{\beta}) = 0$. Both occur only in exceptional situations.
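The two covariance claims in Question 9 can also be checked numerically. Every statistic involved is a linear combination of the independent $y_{ij}$, so the covariance of two such statistics is $\sigma^2$ times the sum of the products of their coefficients. The Python sketch below uses hypothetical covariate values (not exam data) and an assumed $\sigma^2 = 1$. It checks that $\mathrm{Cov}(\bar{y}_{i\cdot}, \hat{\beta})$ is zero and that $\mathrm{Cov}(\hat{\tau}_i, \hat{\beta})$ equals $-(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})\mathrm{Var}(\hat{\beta})$, which is non-zero whenever $\bar{x}_{i\cdot} \neq \bar{x}_{\cdot\cdot}$.

```python
# Hypothetical covariate values x[i][j] for three treatments (not exam data).
x = [[1.0, 2.0, 4.0], [2.0, 3.0, 5.0, 6.0], [0.5, 1.5, 2.5]]
sigma2 = 1.0  # assumed error variance

n_total = sum(len(g) for g in x)
group_means = [sum(g) / len(g) for g in x]
grand_mean = sum(sum(g) for g in x) / n_total
s_xx = sum((v - m) ** 2 for g, m in zip(x, group_means) for v in g)

cells = [(i, j) for i, g in enumerate(x) for j in range(len(g))]

# Coefficient of y_ij in beta_hat = S_xy / S_xx.
beta_coef = {(i, j): (x[i][j] - group_means[i]) / s_xx for (i, j) in cells}

# Coefficient of y_ij in the grand mean of y.
ybar_grand_coef = {c: 1.0 / n_total for c in cells}


def ybar_group_coef(k):
    """Coefficient of y_ij in the mean of treatment k."""
    return {(i, j): (1.0 / len(x[k]) if i == k else 0.0) for (i, j) in cells}


def tau_coef(k):
    """Coefficient of y_ij in tau_hat_k = ybar_k - ybar - beta_hat * (xbar_k - xbar)."""
    yk = ybar_group_coef(k)
    shift = group_means[k] - grand_mean
    return {c: yk[c] - ybar_grand_coef[c] - shift * beta_coef[c] for c in cells}


def cov(a, b):
    """Covariance of two linear statistics of independent y_ij with variance sigma2."""
    return sigma2 * sum(a[c] * b[c] for c in cells)


var_beta = cov(beta_coef, beta_coef)  # equals sigma2 / S_xx

for k in range(len(x)):
    print(
        f"treatment {k + 1}:",
        f"Cov(ybar_i, beta_hat) = {cov(ybar_group_coef(k), beta_coef):.3g},",
        f"Cov(tau_hat_i, beta_hat) = {cov(tau_coef(k), beta_coef):.4f},",
        f"-(xbar_i - xbar) Var(beta_hat) = {-(group_means[k] - grand_mean) * var_beta:.4f}",
    )
```

Working with coefficient vectors avoids simulation noise: the identities hold up to floating-point rounding for any choice of covariate values.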