STAT 404 - Assignment 3 Solutions
Total marks: 60
Due date: Saturday, Nov. 4 at 11:59pm
Reminder:
• Follow the Assignment Guidelines.
• Select pages when submitting to Gradescope.
• One-line R commands should only be used to verify answers.
• When performing a test, state the hypotheses, test statistic and distribution, p-value or critical value, and conclusion. If unspecified, use the default 5% significance level.
1. [20] We are interested in whether students do better on the midterm
or on the final exam. The marks of 20 students from some course are
given below.
We wish to answer this question by performing a test for $H_0: \delta = 0$ against the alternative $H_1: \delta \neq 0$ based on the paired observations. Assume common notation and model assumptions.
midterm = c(82.54,79.37,77.78,60.32,65.08,69.84,61.90,61.90,44.44,
63.49,80.95,84.13,76.19,88.89,73.02,85.71,71.43,84.13,84.13,69.84)
final = c(85.67,84.44,80.83,69.06,67.73,81.75,73.62,62.61,66.56,
65.52,82.38,94.13,65.58,96.68,86.40,87.95,74.92,84.56,85.99,57.34)
Remark: this group of students may not be representative of the general
student population, nor may this course be representative of all courses
at UBC or other places. The data set is used for illustration only.
(a) [4] Perform the paired t-test for the above data.
Answer. The hypotheses are stated in the question. We first find the differences for each pair:
$$(d_i) = \text{final} - \text{midterm} = (3.13, 5.07, 3.05, 8.74, 2.65, 11.91, 11.72, 0.71, 22.12, 2.03, 1.43, 10.00, -10.61, 7.79, 13.38, 2.24, 3.49, 0.43, 1.86, -12.50).$$
The mean of $d_i$ is $\bar d = 4.432$. The sample variance based on $d_i$ is given by
$$s^2 = \frac{1}{20 - 1} \sum (d_i - \bar d)^2 = 60.00.$$
The paired t-statistic has value
$$T_{obs} = \frac{\bar d}{(s^2/20)^{0.5}} = 2.559.$$
The reference distribution is t with 19 degrees of freedom. We compute the p-value as
$$p = 2 \times (1 - \mathrm{pt}(2.559, 19)) = 0.019$$
for the two-sided alternative. Because $0.019 < 0.05$, we reject the null hypothesis that students perform similarly on the midterm and the final ($\delta = 0$) in favour of the claim that performances are different at the conventional level 0.05.
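The hand calculation in (a) can be checked with a short R sketch. It uses only the `midterm` and `final` vectors from the question; the names `d`, `dbar`, `s2`, `T.obs`, `p.val` and `chk` are illustrative choices, and `t.test(..., paired = TRUE)` appears purely as a cross-check, in keeping with the reminder about one-line commands.

```r
midterm = c(82.54,79.37,77.78,60.32,65.08,69.84,61.90,61.90,44.44,
            63.49,80.95,84.13,76.19,88.89,73.02,85.71,71.43,84.13,84.13,69.84)
final = c(85.67,84.44,80.83,69.06,67.73,81.75,73.62,62.61,66.56,
          65.52,82.38,94.13,65.58,96.68,86.40,87.95,74.92,84.56,85.99,57.34)

d = final - midterm                  # paired differences
n = length(d)
dbar = mean(d)                       # mean difference
s2 = sum((d - dbar)^2) / (n - 1)     # sample variance of the differences
T.obs = dbar / sqrt(s2 / n)          # paired t-statistic
p.val = 2 * pt(abs(T.obs), df = n - 1, lower.tail = FALSE)

# Cross-check against the built-in paired test
chk = t.test(final, midterm, paired = TRUE)
c(T.obs = T.obs, p.val = p.val, builtin.T = unname(chk$statistic))
```

Writing the p-value with `lower.tail = FALSE` avoids the subtraction `1 - pt(...)`, which can lose precision for very small tail probabilities.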
(b) [4] If one mistakes the problem for a two-sample problem, what
would be the conclusion based on a two-sample t-test? (Perform
the standard t-test.)
Answer. The hypotheses are
$$H_0: \mu_m = \mu_f, \qquad H_1: \mu_m \neq \mu_f.$$
In this case, the pooled variance estimator is
$$s^2_{pool} = \frac{\mathrm{var(midterm)} + \mathrm{var(final)}}{2} = 125.12.$$
The test statistic has observed value (in conventional notation)
$$T_{obs} = \frac{\bar y_2 - \bar y_1}{\sqrt{(1/20 + 1/20)\, s^2_{pool}}} = \frac{77.686 - 73.254}{\sqrt{125.12/10}} = 1.253.$$
The reference distribution is t with $n_1 + n_2 - 2 = 38$ degrees of freedom, and so the p-value would be
$$p = 2 \times (1 - \mathrm{pt}(1.253, 38)) = 0.218.$$
We do not reject the null hypothesis that students perform similarly on the midterm and the final ($\mu_m = \mu_f$) at the conventional level 0.05.
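For comparison with (a), the two-sample calculation can be sketched in R as follows. The variable names are illustrative, and the general pooled-variance formula is used (it reduces to the simple average of the two variances when the sample sizes are equal). `t.test(..., var.equal = TRUE)` is included only as a cross-check.

```r
midterm = c(82.54,79.37,77.78,60.32,65.08,69.84,61.90,61.90,44.44,
            63.49,80.95,84.13,76.19,88.89,73.02,85.71,71.43,84.13,84.13,69.84)
final = c(85.67,84.44,80.83,69.06,67.73,81.75,73.62,62.61,66.56,
          65.52,82.38,94.13,65.58,96.68,86.40,87.95,74.92,84.56,85.99,57.34)

n1 = length(midterm)
n2 = length(final)
# pooled variance: weighted average of the two sample variances
s2.pool = ((n1 - 1) * var(midterm) + (n2 - 1) * var(final)) / (n1 + n2 - 2)
T.obs = (mean(final) - mean(midterm)) / sqrt((1/n1 + 1/n2) * s2.pool)
p.val = 2 * pt(abs(T.obs), df = n1 + n2 - 2, lower.tail = FALSE)

# Cross-check against the built-in equal-variance two-sample test
chk = t.test(final, midterm, var.equal = TRUE)
c(T.obs = T.obs, p.val = p.val, builtin.T = unname(chk$statistic))
```

The numerator is the same 4.432 as in the paired test; only the standard error changes, because the two-sample analysis ignores the strong within-student correlation.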
(c) [4] Conduct a randomization test for the above data.
Answer. The hypotheses are stated in the question. Recall $\bar y_2 - \bar y_1 = 4.432$. For the two-sided alternative, we calculate the proportion of times when $|\bar y_2 - \bar y_1| < |\bar y_2^* - \bar y_1^*|$, where $*$ indicates a hypothetical sample obtained by doing random flips for each pair independently. The following code computes this proportion.
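D.obs = mean(final - midterm)                  # observed mean difference
NN = 50000                                     # number of random flips of the labels
DD = rep(0, NN)
for (i in 1:NN) {
  ind = 2*rbinom(20, 1, 0.5) - 1               # random sign (+1 or -1) for each pair
  DD[i] = sum(midterm*ind - final*ind) / 20    # mean difference after flipping
}
# two-sided p-value: twice the upper-tail proportion, counting ties as one half
pp = 2*(sum(DD>D.obs) + 0.5*sum(DD==D.obs)) / NN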
The p-value is found to be 0.01764. We reject the null hypothesis that students perform similarly on the midterm and the final ($\delta = 0$) in favour of the claim that performances are different at the conventional level 0.05.
Notice that this p-value is close to the one obtained from the paired t-test.
(d) [4] Suppose one wishes to conduct an independent study on whether
students tend to perform better in the final exam by an average of
3 marks or higher. The target is to obtain a significant outcome
at the 5% level with probability 0.8.
Based on the above data
(assumed to be useful), how many pairs of observations should
she collect for this separate study?
Remark: independent thinking is needed as nothing discussed in
class can be directly applied.
Answer. We do a sample size calculation based on the paired t-test. Note that other approaches are acceptable as long as they are reasonable given the context and carried out correctly.
Let $\delta = \mu_{final} - \mu_{midterm}$. The null hypothesis is $H_0: \delta \le 3$ and the alternative is $H_1: \delta > 3$. Let $n$ be the number of pairs in the new study. Taking $\bar d_{new} = \bar y_{new:final} - \bar y_{new:midterm}$ and
$$s^2_{new} = \frac{\sum_{i=1}^{n} (d_i - \bar d_{new})^2}{n - 1}$$
as defined in (a), the test statistic
$$t_0 = \frac{\bar d_{new} - 3}{(s^2_{new}/n)^{0.5}}$$
has a t-distribution with $n - 1$ degrees of freedom under the assumption $\delta = 3$. The upper $\alpha = 0.05$ quantile of this distribution is given by $q_n = \mathrm{qt}(1 - \alpha, n - 1)$.
To do a power calculation, we use the current data to obtain an effect size $\delta = \bar d = 4.432$. Under the alternative $\delta = 4.432$, and since $4.432 - 3 = 1.432$, the statistic
$$t_1 = \frac{\bar d_{new} - 4.432}{(s^2_{new}/n)^{0.5}} = t_0 - \frac{1.432}{(s^2_{new}/n)^{0.5}}$$
has a t-distribution with $n - 1$ degrees of freedom. This implies that we can compute the power for a given $n$ as
$$\begin{aligned}
1 - \beta &= P(t_0 > q_n \mid \delta = 4.432) \\
&= P\left(t_0 - \frac{1.432}{(s^2_{new}/n)^{0.5}} > q_n - \frac{1.432}{(s^2_{new}/n)^{0.5}} \,\Big|\, \delta = 4.432\right) \\
&= P\left(t_1 > q_n - \frac{1.432}{(s^2_{new}/n)^{0.5}} \,\Big|\, \delta = 4.432\right).
\end{aligned}$$
Note that $s^2_{new}$ is also a random variable, which makes computing the probability inconvenient, and so for simplicity we estimate it using $s^2 = 60$ from the current data. To get $1 - \beta \ge 0.8$, we find that we need at least $n = 183$ pairs in the new study.
R code:
nn = 10:300
alpha = 0.05
svar = var(final-midterm)
pows = pt(qt(1-alpha,nn-1)-1.432/sqrt(svar/nn), nn-1, lower.tail=F)
nn[which(pows > 0.8)[1]]
(e) [4] Does a student have a higher or lower probability of doing better on the final compared to the midterm? Perform a test on this dataset to answer this question.
Hint: the test you should use is not covered in STAT 404 (though likely in previous statistics courses). Think of a common distribution that has probability as a parameter. The test is directly based on this distribution.
Answer. The number of students who did better in the final is a good metric of whether or not students perform similarly on the final and midterm. If we denote this random variable as $X$, then it has a binomial distribution with parameter $n = 20$ and probability of success $\theta = 0.5$ under the null assumption. That is, we may use Binom(20, 0.5) as our reference distribution. The alternative hypothesis is that $\theta \neq 0.5$.
In this context, we have $X_{obs} = 18$. The values 0, 1, 19, 20 would be considered as more extreme observations in addition to the equally extreme observations 2, 18. The p-value could be computed as
$$p = P(X \in \{0, 1, 19, 20\}) + 0.5\, P(X \in \{2, 18\}) = 0.00022.$$
If one does not apply continuity correction, they would get
$$p = P(X \in \{0, 1, 19, 20\}) + P(X \in \{2, 18\}) = 0.00040.$$
Both are acceptable. The outcome is highly significant, and we reject the null hypothesis that students perform similarly on the midterm and the final at the conventional level 0.05.
Note: other tests are acceptable as long as they are reasonable given the context and carried out correctly.
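Both versions of the sign-test p-value in (e) can be reproduced with a few lines of R. This is a minimal sketch: the names `x.obs`, `probs`, `p.half` and `p.full` are illustrative, and `binom.test` is used only to cross-check the version that counts the equally extreme outcomes in full.

```r
midterm = c(82.54,79.37,77.78,60.32,65.08,69.84,61.90,61.90,44.44,
            63.49,80.95,84.13,76.19,88.89,73.02,85.71,71.43,84.13,84.13,69.84)
final = c(85.67,84.44,80.83,69.06,67.73,81.75,73.62,62.61,66.56,
          65.52,82.38,94.13,65.58,96.68,86.40,87.95,74.92,84.56,85.99,57.34)

n = length(final)
x.obs = sum(final > midterm)           # number of students who did better on the final
k = 0:n
probs = dbinom(k, n, 0.5)              # Binom(20, 0.5) reference distribution

# distance from the centre n/2 measures how extreme an outcome is
dist = abs(k - n/2)
obs.dist = abs(x.obs - n/2)
p.more = sum(probs[dist > obs.dist])   # outcomes more extreme than observed
p.equal = sum(probs[dist == obs.dist]) # outcomes as extreme as observed

p.half = p.more + 0.5 * p.equal        # equally extreme outcomes counted as one half
p.full = p.more + p.equal              # equally extreme outcomes counted in full

chk = binom.test(x.obs, n, p = 0.5)$p.value
c(p.half = p.half, p.full = p.full, builtin = chk)
```

Because the reference distribution is symmetric about 10, ranking outcomes by their distance from 10 gives the same "more extreme" set as the one listed in the answer.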
2. [20] Four students conducted an experiment on paper helicopters. The response is the time it takes for a helicopter to touch the ground after being dropped from 2 meters above ground. Four helicopter designs are implemented. We use the data as if it were collected via a complete randomized block design. The data for these four students are given as yy1, yy2, yy3, yy4 below. Each column corresponds to a helicopter design (a treatment).
yy1 = c(1.56, 1.62, 2.14, 1.30)
yy2 = c(1.53, 1.75, 2.02, 1.41)
yy3 = c(1.58, 1.80, 1.97, 1.36)
yy4 = c(1.60, 1.81, 1.93, 1.45)
This problem asks you to go over all routine data analysis for the
complete randomized block design and a bit more.
(a) [6] Construct the analysis of variance table.
Answer. We compute the sum of squares of various effects. First, the grand mean is found to be $\hat\eta = \bar y_{\cdot\cdot} = 1.676875$. The blocking effects of student volunteers are estimated to be
$$\bar y_{i\cdot} - \bar y_{\cdot\cdot} = (-0.021875, 0.000625, 0.000625, 0.020625).$$
Their SS is computed as
$$SS_b = 4 \sum_{i=1}^{4} (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = 0.00361875.$$
The treatment effects are estimated to be
$$\bar y_{\cdot j} - \bar y_{\cdot\cdot} = (-0.109375, 0.068125, 0.338125, -0.296875)$$
and the treatment SS is computed as
$$SS_{trt} = 4 \sum_{j=1}^{4} (\bar y_{\cdot j} - \bar y_{\cdot\cdot})^2 = 0.8762688.$$
The residuals have SS given by
$$SS_{err} = \sum_{i=1}^{4} \sum_{j=1}^{4} (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot})^2 = 0.05945625.$$
The total SS is given by
$$SS_{tot} = \sum_{i=1}^{4} \sum_{j=1}^{4} (y_{ij} - \bar y_{\cdot\cdot})^2 = 0.9393438.$$
The MSS are obtained by dividing the SS by their corresponding DF. The F-statistic is obtained by dividing the treatment MSS by the error MSS. Hence, the ANOVA table is given by
Source      DF   SS       MSS      F
Volunteer    3   0.0036   0.0012
Treatment    3   0.8763   0.2921   44.21
Error        9   0.0595   0.0066
Total       15   0.9393
(b) [2] Test the hypothesis for $H_0$: treatment effects are the same.
Answer. The F-statistic is found in the ANOVA table. The p-value is computed as
$$p = 1 - \mathrm{pf}(44.21, 3, 9) = 1.035 \times 10^{-5}.$$
The null hypothesis of no difference between treatment effects is rejected at the 5% level.
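The sums of squares in (a) and the F-test in (b) can be reproduced with the following R sketch. It stacks `yy1` to `yy4` into a matrix `yy` (rows are students, columns are designs); the names `b`, `k`, `SS.blk` and the data frame `dat` are illustrative, and `aov` is used only as a cross-check of the hand calculation.

```r
yy = rbind(c(1.56, 1.62, 2.14, 1.30),   # rows: students (blocks)
           c(1.53, 1.75, 2.02, 1.41),   # columns: helicopter designs (treatments)
           c(1.58, 1.80, 1.97, 1.36),
           c(1.60, 1.81, 1.93, 1.45))
b = nrow(yy)                             # number of blocks
k = ncol(yy)                             # number of treatments
ybar = mean(yy)                          # grand mean

SS.blk = k * sum((rowMeans(yy) - ybar)^2)
SS.trt = b * sum((colMeans(yy) - ybar)^2)
SS.tot = sum((yy - ybar)^2)
SS.err = SS.tot - SS.blk - SS.trt        # residual SS by subtraction

MS.trt = SS.trt / (k - 1)
MS.err = SS.err / ((b - 1) * (k - 1))
F.trt = MS.trt / MS.err
p.trt = pf(F.trt, k - 1, (b - 1) * (k - 1), lower.tail = FALSE)

# Cross-check with aov(); as.vector() stacks the matrix column by column
dat = data.frame(y = as.vector(yy),
                 block = factor(rep(1:b, times = k)),
                 trt = factor(rep(1:k, each = b)))
summary(aov(y ~ block + trt, data = dat))
```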
(c) [8] Regardless of the outcome of (b), use Tukey’s method to construct simultaneous confidence intervals for differences in treatment means.
Answer. The pairwise treatment differences for (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) are estimated as
$$\hat\tau_i - \hat\tau_j = (-0.1775, -0.4475, 0.1875, -0.2700, 0.3650, 0.6350).$$
The 95% Tukey quantile is given by
$$\mathrm{qtukey}(0.95, 4, 9) = 4.41489.$$
The half-width of each interval is given by
$$\frac{\mathrm{qtukey}(0.95, 4, 9)}{\sqrt{2}} \sqrt{\left(\frac{1}{4} + \frac{1}{4}\right) \mathrm{MSS(error)}} = 0.1794186.$$
Hence, the simultaneous Tukey 95% CIs for the differences between treatments (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) are given by
pair   (1,2)   (1,3)   (1,4)   (2,3)   (2,4)   (3,4)
low   -0.357  -0.627   0.008  -0.449   0.186   0.456
upp    0.002  -0.268   0.367  -0.091   0.544   0.814
All the differences except that between treatments (1,2) are found to be significant.
R code:
eff.diff = c(ybar.trt[1]-ybar.trt[-1],
ybar.trt[2]-ybar.trt[c(3,4)],
ybar.trt[3]-ybar.trt[4])
ci.wdth = qtukey(0.95, 4, 9)*((1/4+1/4)*MSS.err/2)^.5
low = eff.diff - ci.wdth
upp = eff.diff + ci.wdth
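The snippet above relies on `ybar.trt` and `MSS.err` having been computed in part (a). A self-contained sketch of the same calculation is given below, under the assumption that the data are stacked into a matrix `yy` as before; `half.width`, `pairs`, `ci` and `dat` are illustrative names. `TukeyHSD` is included as a cross-check; note that it reports differences in the opposite order (for example "2-1"), so the signs flip while the half-width is unchanged.

```r
yy = rbind(c(1.56, 1.62, 2.14, 1.30),
           c(1.53, 1.75, 2.02, 1.41),
           c(1.58, 1.80, 1.97, 1.36),
           c(1.60, 1.81, 1.93, 1.45))
b = nrow(yy)
k = ncol(yy)
df.err = (b - 1) * (k - 1)

ybar.trt = colMeans(yy)
# residuals: remove row means, then column means, then add back the grand mean
res = sweep(yy - rowMeans(yy), 2, ybar.trt) + mean(yy)
MS.err = sum(res^2) / df.err

half.width = qtukey(0.95, k, df.err) / sqrt(2) * sqrt((1/b + 1/b) * MS.err)

pairs = combn(k, 2)     # columns: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
eff.diff = ybar.trt[pairs[1, ]] - ybar.trt[pairs[2, ]]
ci = rbind(low = eff.diff - half.width, upp = eff.diff + half.width)
colnames(ci) = paste(pairs[1, ], pairs[2, ], sep = "-")
round(ci, 3)

# Cross-check of the half-width with the built-in Tukey procedure
dat = data.frame(y = as.vector(yy),
                 block = factor(rep(1:b, times = k)),
                 trt = factor(rep(1:k, each = b)))
tk = TukeyHSD(aov(y ~ block + trt, data = dat), which = "trt")$trt
```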
(d) [4] Construct a 95% two-sided CI for $\tau_1 + \tau_3 - 2\tau_2$ using the recommended universal recipe.
Answer. We estimate $\theta = \tau_1 + \tau_3 - 2\tau_2$ by
$$\begin{aligned}
\hat\theta &= (\bar y_{\cdot 1} - \bar y_{\cdot\cdot}) + (\bar y_{\cdot 3} - \bar y_{\cdot\cdot}) - 2(\bar y_{\cdot 2} - \bar y_{\cdot\cdot}) \\
&= \bar y_{\cdot 1} + \bar y_{\cdot 3} - 2\bar y_{\cdot 2} = 0.0925.
\end{aligned}$$
When regarded as a random variable (rather than an observed value), $\hat\theta$ is the linear combination of 3 independent means where each is a mean of 4 observations. Hence, we have
$$\mathrm{Var}(\hat\theta) = (1/4 + 1/4 + 1)\,\sigma^2 = \frac{3}{2}\sigma^2.$$
We naturally estimate it by
$$\widehat{\mathrm{Var}}(\hat\theta) = \frac{3}{2}\,\mathrm{MSS(error)} = 0.0099,$$
which has 9 degrees of freedom. Note $\mathrm{qt}(0.975, 9) = 2.262157$. Hence, a 95% CI for $\theta = \tau_1 + \tau_3 - 2\tau_2$ is given by
$$0.0925 \pm 2.262157 \sqrt{0.0099} = (-0.133, 0.318).$$
3. [20] Animals were randomly allocated to 12 groups of 4. Each group
was given one of 3 poisons and one of 4 treatments. The survival times
of the animals are given in the following table.
              treatment
poison     A      B      C      D
I        0.31   0.82   0.43   0.45
         0.45   1.10   0.45   0.71
         0.46   0.88   0.63   0.66
         0.43   0.72   0.76   0.62
II       0.36   0.92   0.44   0.56
         0.29   0.61   0.35   1.02
         0.40   0.49   0.31   0.71
         0.23   1.24   0.40   0.38
III      0.22   0.30   0.23   0.30
         0.21   0.37   0.25   0.36
         0.18   0.38   0.24   0.31
         0.23   0.29   0.22   0.33
(Source: Box, Hunter, and Hunter, altered slightly).
The following code might be useful.
n = 4
k1 = 4
k2 = 3
dat = data.frame(times = c(0.31,0.82,0.43,0.45,
0.45,1.10,0.45,0.71,
0.46,0.88,0.63,0.66,
0.43,0.72,0.76,0.62,
0.36,0.92,0.44,0.56,
0.29,0.61,0.35,1.02,
0.40,0.49,0.31,0.71,
0.23,1.24,0.40,0.38,
0.22,0.30,0.23,0.30,
0.21,0.37,0.25,0.36,
0.18,0.38,0.24,0.31,
0.23,0.29,0.22,0.33),
group = rep(c("A","B","C","D"), n*k2),
poison = rep(c("I","II","III"), each=n*k1))
(a) [8] Construct the analysis of variance table for this two-way layout
design.
Answer. The calculations for the SS corresponding to the main effects are similar to those described in Question 2(a). The interaction SS is given by
$$SS_{int} = 4 \sum_{i=1}^{3} \sum_{j=1}^{4} (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})^2.$$
The error SS is found by subtracting all of the other SS from the total SS.
The ANOVA table is given by
Source        DF   SS       MSS      F
Treatment      3   0.9212   0.3071   13.81
Poison         2   1.0330   0.5165   23.22
Interaction    6   0.2501   0.0417   1.874
Error         36   0.8007   0.0222
Total         47   3.0051
R code:
I = 4
J = 3
n = 4
yy1 = matrix(c(0.31,0.82,0.43,0.45,
0.45,1.10,0.45,0.71,
0.46,0.88,0.63,0.66,
0.43,0.72,0.76,0.62), 4, 4, byrow=T)
yy2 = matrix(c(0.36,0.92,0.44,0.56,
0.29,0.61,0.35,1.02,
0.40,0.49,0.31,0.71,
0.23,1.24,0.40,0.38), 4, 4, byrow=T)
yy3 = matrix(c(0.22,0.30,0.23,0.30,
0.21,0.37,0.25,0.36,
0.18,0.38,0.24,0.31,
0.23,0.29,0.22,0.33), 4, 4, byrow=T)
mean.trt = colMeans(rbind(yy1,yy2,yy3))
mean.poison = c(mean(yy1), mean(yy2), mean(yy3))
mean.int = rbind(colMeans(yy1),colMeans(yy2),colMeans(yy3))
ybar = mean(mean.trt)
# Compute SS
SS.trt = n*J*sum((mean.trt-ybar)^2)
SS.poison = n*I*sum((mean.poison-ybar)^2)
SS.int = n*sum((mean.int
-matrix(rep(mean.poison,I),J,I,byrow=F)
-matrix(rep(mean.trt,J),J,I,byrow=T)
+ybar)^2)
SS.total = sum((yy1-ybar)^2) + sum((yy2-ybar)^2) + sum((yy3-ybar)^2)
SS.err = SS.total - SS.poison - SS.trt - SS.int
# Compute MSS
MS.trt = SS.trt / (I-1)
MS.poison = SS.poison / (J-1)
MS.int = SS.int / ((I-1)*(J-1))
MS.err = SS.err / (I*J*(n-1))
# Compute F-statistic
F.trt = MS.trt / MS.err
F.poison = MS.poison / MS.err
F.int = MS.int / MS.err
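As a cross-check of the hand computation, the same two-way table can be obtained from `aov`. This is a sketch that rebuilds the data frame `dat` from the question; `fit`, `tab` and `SS.total` are illustrative names. The rows of `tab` come out in the order poison, group (treatment), interaction, residuals.

```r
dat = data.frame(times = c(0.31,0.82,0.43,0.45,
                           0.45,1.10,0.45,0.71,
                           0.46,0.88,0.63,0.66,
                           0.43,0.72,0.76,0.62,
                           0.36,0.92,0.44,0.56,
                           0.29,0.61,0.35,1.02,
                           0.40,0.49,0.31,0.71,
                           0.23,1.24,0.40,0.38,
                           0.22,0.30,0.23,0.30,
                           0.21,0.37,0.25,0.36,
                           0.18,0.38,0.24,0.31,
                           0.23,0.29,0.22,0.33),
                 group = rep(c("A","B","C","D"), 12),
                 poison = rep(c("I","II","III"), each = 16))

# two-way layout with interaction; the design is balanced,
# so the order of the terms does not affect the sums of squares
fit = aov(times ~ poison * group, data = dat)
tab = summary(fit)[[1]]
tab

# the SS column adds up to the total SS about the grand mean
SS.total = sum(tab[["Sum Sq"]])
```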
(b) [4] Test the hypotheses of whether the effects of poison and treatment are significant.
Answer. For each of the factors, we test the hypotheses that the main effects are equal ($H_0$) against the alternative that at least two of the levels have different effects.
The 95% quantiles of $F_{3,36}$ and $F_{2,36}$ are 2.866266 and 3.259446, respectively.
Using these reference distributions and the observed F-statistics obtained in (a), we reject $H_0$ for both treatment and poison and conclude that both have significant effects.
R code:
qf(0.95, 3, 36)
qf(0.95, 2, 36)
(c) [4] Test the hypothesis of whether the interaction effect is significant.
Answer. For the interaction, we test the hypothesis that the interaction effects are all zero ($H_0$) against the alternative that at least one of them is nonzero.
The 95% quantile of $F_{6,36}$ is 2.363751.
Using this reference distribution and the observed F-statistic obtained in (a), we do not reject $H_0$ and conclude that the interaction between treatment and poison is not significantly different from 0.
R code:
qf(0.95, 6, 36)
(d) [4] Predict the survival time of an animal administered poison II
and treatment C. Estimate the variance of the prediction.
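Answer. In (c), we concluded that the interaction between poison and treatment is insignificant. Hence, our predicted survival time of an animal administered poison II ($\alpha_2$) and treatment C ($\beta_3$) is given by
$$\hat y = \hat\eta + \hat\alpha_2 + \hat\beta_3 = 0.4575 \text{ units}.$$
Under standard assumptions of the model (independent units), the variance is given by
$$\begin{aligned}
\mathrm{Var}(\hat y) &= \mathrm{Var}(\hat\eta + \hat\alpha_2 + \hat\beta_3) \\
&= \mathrm{Var}(\bar y_{\cdot\cdot\cdot} + (\bar y_{2\cdot\cdot} - \bar y_{\cdot\cdot\cdot}) + (\bar y_{\cdot 3\cdot} - \bar y_{\cdot\cdot\cdot})) \\
&= \mathrm{Var}(\bar y_{2\cdot\cdot} + \bar y_{\cdot 3\cdot} - \bar y_{\cdot\cdot\cdot}) \\
&= \mathrm{Var}(\bar y_{2\cdot\cdot}) + \mathrm{Var}(\bar y_{\cdot 3\cdot}) + \mathrm{Var}(\bar y_{\cdot\cdot\cdot}) \\
&\quad + 2\,\mathrm{Cov}(\bar y_{2\cdot\cdot}, \bar y_{\cdot 3\cdot}) - 2\,\mathrm{Cov}(\bar y_{2\cdot\cdot}, \bar y_{\cdot\cdot\cdot}) - 2\,\mathrm{Cov}(\bar y_{\cdot 3\cdot}, \bar y_{\cdot\cdot\cdot}) \\
&= \mathrm{Var}(\bar y_{2\cdot\cdot}) + \mathrm{Var}(\bar y_{\cdot 3\cdot}) + \mathrm{Var}(\bar y_{\cdot\cdot\cdot}) \\
&\quad + \frac{2 \times 4}{12 \times 16}\,\mathrm{Var}(y_{231}) - \frac{2 \times 16}{16 \times 48}\,\mathrm{Var}(y_{211}) - \frac{2 \times 12}{12 \times 48}\,\mathrm{Var}(y_{131}) \\
&= \frac{\sigma^2}{16} + \frac{\sigma^2}{12} + \frac{\sigma^2}{48} + \frac{\sigma^2}{24} - \frac{\sigma^2}{24} - \frac{\sigma^2}{24} \\
&= \frac{\sigma^2}{8}.
\end{aligned}$$
Here each covariance counts the observations shared by the two means: 4 for poison II with treatment C, 16 for poison II with the full sample, and 12 for treatment C with the full sample.
Estimating $\sigma^2$ by $\mathrm{MSS}_{err}$, we get $\widehat{\mathrm{Var}}(\hat y) = 0.0028$.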
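The prediction and the factor 1/8 can be checked numerically. The sketch below rebuilds `dat` from the question; `eta`, `alpha2`, `beta3`, `var.factor` and `var.hat` are illustrative names. The variance factor is obtained from the design matrix of the additive model as $x_0^\top (X^\top X)^{-1} x_0$, which is an alternative route to the covariance expansion rather than the method used in the solution, and the error variance is taken from the full two-way fit as in part (a).

```r
dat = data.frame(times = c(0.31,0.82,0.43,0.45, 0.45,1.10,0.45,0.71,
                           0.46,0.88,0.63,0.66, 0.43,0.72,0.76,0.62,
                           0.36,0.92,0.44,0.56, 0.29,0.61,0.35,1.02,
                           0.40,0.49,0.31,0.71, 0.23,1.24,0.40,0.38,
                           0.22,0.30,0.23,0.30, 0.21,0.37,0.25,0.36,
                           0.18,0.38,0.24,0.31, 0.23,0.29,0.22,0.33),
                 group = factor(rep(c("A","B","C","D"), 12)),
                 poison = factor(rep(c("I","II","III"), each = 16)))

eta = mean(dat$times)                                  # grand mean
alpha2 = mean(dat$times[dat$poison == "II"]) - eta     # poison II effect
beta3 = mean(dat$times[dat$group == "C"]) - eta        # treatment C effect
y.hat = eta + alpha2 + beta3

# variance factor of the fitted value under the additive model
X = model.matrix(~ poison + group, data = dat)
x0 = X[which(dat$poison == "II" & dat$group == "C")[1], ]
var.factor = drop(t(x0) %*% solve(crossprod(X)) %*% x0)

# error mean square from the full two-way model with interaction
fit = aov(times ~ poison * group, data = dat)
MS.err = deviance(fit) / df.residual(fit)
var.hat = var.factor * MS.err
c(y.hat = y.hat, var.factor = var.factor, var.hat = var.hat)
```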