School: University of Waterloo; Course: 331; Subject: Statistics; Date: Apr 3, 2024
Tree heights
26 marks
A very short, time-limited online quiz was taken by students in a statistics course at the University of
Waterloo in 2020. Students were asked two questions and given very little time to answer them. Moreover,
they had no idea beforehand what the questions would be about.
The information presented in the quiz was as follows:
“The coast redwood is perhaps the tallest species of tree growing today.
• Do you think the tallest tree of this species alive today is
  A. less than XXX metres tall? B. more than XXX metres tall?
  Answer A or B.
• Write down your best guess (in metres) of how tall you think the tallest tree might be.”
In place of XXX above, about half of the students (randomly selected) had the number 50 appear and the
others had the number 100 appear. The value of XXX presented to the students is called the anchor for that
question.
For the record, and presumably unknown to the students taking the quiz, the tallest coast redwood tree so
far found was discovered in 2006. It was named Hyperion after the Titan of Greek mythology of that name
(meaning “the high one”) and was measured to be 116.07 metres tall in 2019.
The student quiz results are given in the R data file trees.Rda. This may be loaded into R using load()
(assuming you have the Rda file in a directory/folder given by dataDirectory) as
# Assuming the file is located in the folder/directory given by dataDirectory
# For example, a directory/folder called "data" in the current working directory (".")
# dataDirectory <- "./data"
load(file.path(dataDirectory, "trees.Rda"))
# The data are the value of the R data frame called trees
head(trees, n = 4)
##   anchor guess
## 1    100   150
## 2    100   150
## 3    100   222
## 4    100   128
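As a minimal sketch of how load() behaves, the round trip below saves a small stand-in data frame and restores it. The object name demo_trees, its values, and the file name demo_trees.Rda are hypothetical and are not the quiz data; the point is only that load() recreates objects under the names they were saved with, which is why the quiz data appear as trees.

```r
# Hypothetical stand-in data frame; the real quiz data live in trees.Rda.
demo_trees <- data.frame(anchor = c(50, 50, 100, 100),
                         guess  = c(60, 80, 120, 150))

# Save to a temporary directory, then remove the object from the workspace.
dataDirectory <- tempdir()
save(demo_trees, file = file.path(dataDirectory, "demo_trees.Rda"))
rm(demo_trees)

# load() recreates the object under its original name and
# (invisibly) returns the names of the objects it restored.
restored <- load(file.path(dataDirectory, "demo_trees.Rda"))
print(restored)
str(demo_trees)
```

Capturing the return value of load() is a convenient way to find out what an unfamiliar .Rda file contains.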
Only the anchor value presented to the student and their guess are recorded (both in metres).
The tallest tree is Hyperion
# The tallest tree
Hyperion <- 116.07
IMPORTANT
• In all of your answers, show all the R code you used in your calculations and analyses.
• In this assignment, you must write the code using basic R functions like mean(), sd(), var(), sqrt(), length(), pt(), etc.
• You may not use functions like t.test(), though these could be used to check your answers.
Questions
a. First, consider modelling the student guesses according to the mean response model
$$y_i = \mu + r_i \quad \text{for } i = 1, \ldots, n$$
where $y_i$ is the $i$th student’s guess of the tallest height.
Recall from STAT 231 that to test the hypothesis $H_0: \mu = c$ for some constant $c$, we form the statistic
$$d = \frac{|\hat{\mu} - c|}{\hat{\sigma}/\sqrt{n}}$$
where $\hat{\mu} = \bar{y}$ is the arithmetic average (in R, mean()) and
$$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} \hat{r}_i^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{\mu})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$$
is the residual standard deviation (in this case, could use sd() in R).
Large values of $d$ indicate evidence against $H_0$ and to assess the strength of this evidence, we compute
the observed significance level, or $p$-value, as
$$p = Pr(|t_{n-1}| \ge d) = 2\,Pr(t_{n-1} \ge d)$$
where $t_{n-1}$ is a Student’s $t$ random variate on $n - 1$ degrees of freedom.
The smaller $p$ is, the greater the evidence against $H_0$. (See help(pt) in R.)
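The calculation just described can be sketched with basic R functions. The data below are simulated, not the quiz data, and the seed, sample size, and generating mean and standard deviation are arbitrary assumptions made for illustration. The sketch uses $n - 1$ degrees of freedom, the usual choice when the standard deviation is estimated with an $n - 1$ divisor as in sd(). The final t.test() call serves only as a check.

```r
set.seed(331)                        # arbitrary seed, for reproducibility only
y  <- rnorm(25, mean = 100, sd = 40) # simulated guesses, NOT the quiz data
c0 <- 116.07                         # hypothesized value c (Hyperion's height)

n         <- length(y)
mu_hat    <- mean(y)
sigma_hat <- sqrt(sum((y - mu_hat)^2) / (n - 1))  # agrees with sd(y)

# Discrepancy measure and two-sided p-value
d <- abs(mu_hat - c0) / (sigma_hat / sqrt(n))
p <- 2 * pt(d, df = n - 1, lower.tail = FALSE)    # 2 * Pr(t >= d)
print(c(d = d, p = p))

# Check only: t.test() should report the same p-value.
check <- t.test(y, mu = c0)
abs(p - check$p.value) < 1e-10
```

Using lower.tail = FALSE gives the upper-tail probability directly, which avoids computing 1 - pt(d, ...) when the tail is very small.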
i. (2 marks) Plot a histogram of the guesses (see help(hist)). Add a “red” vertical dashed line of
width 3 at the height obtained by Hyperion.
Based only on this display, comment on whether the height of Hyperion might be a reasonable
value for $\mu$.
Answer
# YOUR CODE HERE
ii. (1 mark) In R, construct the value of the discrepancy measure $d$ for testing whether the mean
guess is the height of Hyperion. Show your code and print the value of $d$.
Answer
# YOUR CODE HERE
iii. (1 mark) Determine and print the $p$-value in R for this test. Show your code.
Answer
# YOUR CODE HERE
iv. (1 mark) Based on the above $p$-value, what do you conclude about the evidence against the
hypothesis that the mean of the guesses is the height of Hyperion?
Answer
b. We now repeat the modelling of part (a), but this time only for guesses from those students who were
given the “low” anchor as reference (i.e., anchor == 50).
i. (2 marks) Select only those students whose anchor == 50. Using xlim = c(0,400), produce the
histogram of the guesses for these students and mark Hyperion with a red dashed line. Comment
on whether Hyperion’s height is a plausible value for $\mu$ for these students.
Answer
# YOUR CODE HERE
ii. (2 marks) For these student guesses, calculate and print the value of the discrepancy measure $d$
for testing $H_0: \mu = \text{Hyperion}$.
Determine the $p$-value, print it, and comment on the evidence this gives against the hypothesis
when students were given a low anchor.
Is the evidence against the hypothesis stronger or weaker than it was in part (a)?
Answer
# YOUR CODE HERE
c. We again repeat the modelling of parts (a) and (b), but this time only for guesses from those students
who were given the “high” anchor as reference (i.e., anchor == 100).
i. (2 marks) Select only those students whose anchor == 100. Using xlim = c(0,400), produce the
histogram of the guesses for these students and mark Hyperion with a red dashed line. Comment
on whether Hyperion’s height is a plausible value for $\mu$ for these students.
Answer
# YOUR CODE HERE
ii. (2 marks) For these student guesses, calculate and print the value of the discrepancy measure $d$
for testing $H_0: \mu = \text{Hyperion}$.
Determine the $p$-value, print it, and comment on the evidence this gives against the hypothesis
when students were given a high anchor.
Is the evidence against the hypothesis stronger or weaker than it was in part (a)?
Answer
# YOUR CODE HERE
d. Another hypothesis of interest is whether the two groups (from low and high anchor values) have the
same mean guess values. This is an example of the two sample problem from STAT 231. Each group is
modelled as a mean response model:
$$y_i = \mu_1 + r_i$$
for guesses from the low anchor group, and
$$y_i = \mu_2 + r_i$$
for guesses from the high anchor group.
Assuming that the variability of the guesses does not depend on the group, the discrepancy measure for
assessing evidence against the hypothesis $H_0: \mu_1 - \mu_2 = c$ is
$$d = \frac{|(\hat{\mu}_1 - \hat{\mu}_2) - c|}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$\hat{\sigma}^2 = \frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}.$$
Large values of $d$ indicate evidence against $H_0$ and a hypothesis of no difference requires $c = 0$
(i.e., $H_0: \mu_1 - \mu_2 = 0$).
The $p$-value is
$$p = Pr(|t_{(n_1-1)+(n_2-1)}| \ge d) = 2\,Pr(t_{n-2} \ge d)$$
where $t_{n-2}$ is a Student’s $t$ random variate on $n - 2 = n_1 + n_2 - 2$ degrees of freedom.
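A minimal sketch of the pooled two-sample calculation, again on simulated data rather than the quiz data. The group sizes, generating means, common standard deviation, and seed are arbitrary assumptions. The t.test() call with var.equal = TRUE is included only as a check on the hand computation.

```r
set.seed(231)                          # arbitrary seed, for reproducibility only
y1 <- rnorm(20, mean = 90,  sd = 40)   # simulated "low anchor" guesses
y2 <- rnorm(22, mean = 130, sd = 40)   # simulated "high anchor" guesses
c0 <- 0                                # hypothesized difference c

n1 <- length(y1)
n2 <- length(y2)

# Pooled estimate of the common standard deviation
sigma_hat <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2))

# Discrepancy measure and two-sided p-value on n1 + n2 - 2 degrees of freedom
d <- abs((mean(y1) - mean(y2)) - c0) / (sigma_hat * sqrt(1 / n1 + 1 / n2))
p <- 2 * pt(d, df = n1 + n2 - 2, lower.tail = FALSE)
print(c(d = d, p = p))

# Check only: the equal-variance t-test should give the same p-value.
check <- t.test(y1, y2, var.equal = TRUE)
abs(p - check$p.value) < 1e-10
```

Note that t.test() defaults to the Welch test (var.equal = FALSE), which does not pool the variances, so the check needs var.equal = TRUE to match the formula above.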
i. (2 marks) Construct and print the value of $d$ to assess $H_0: \mu_1 - \mu_2 = 0$.
Answer
# YOUR CODE HERE
ii. (1 mark) Determine and print the $p$-value for $d$ to assess $H_0: \mu_1 - \mu_2 = 0$. Comment on the
strength of the evidence against $H_0$.
Answer
# YOUR CODE HERE
e. (1 mark) What do you conclude about the effect of “anchoring” on the answers given by the students?
Answer
f. Another way to look at the data is to imagine that a student’s guess depends on which value that
student was given as anchor. A mean response model for this would be
$$y_i = \mu(x_i) + r_i \quad \text{for } i = 1, \ldots, n$$
where $x_i$ is the value of the anchor. A simple model is the straight line model where
$$\mu(x_i) = \beta_0 + \beta_1 x_i.$$
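As a sketch of what fitting the straight line model involves, the least-squares estimates of $\beta_0$ and $\beta_1$ can be computed from their closed-form expressions and compared with coef() from lm(). The data are simulated, not the quiz data; the generating intercept (40), slope (0.9), noise level, group sizes, and seed are arbitrary assumptions chosen only for illustration.

```r
set.seed(1)                                     # arbitrary seed
x <- rep(c(50, 100), times = c(15, 15))         # two anchor values, as in the quiz
y <- 40 + 0.9 * x + rnorm(length(x), sd = 30)   # simulated guesses, NOT the quiz data

# Closed-form least-squares estimates of the slope and intercept
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same estimates from lm()
fit <- lm(y ~ x)
print(rbind(by_hand = c(b0, b1), lm = coef(fit)))
```

The two rows of the printed matrix should agree up to rounding error.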
i. (2 marks) For this context of tree heights, how would you interpret $\beta_0$? Does $\beta_0 = 0$ make sense?
Answer
ii. (2 marks) For this context of tree heights, how would you interpret $\beta_1$? Does $\beta_1 = 0$ make sense?
Answer
iii. (3 marks) Using suitable values of xlab, ylab, and main:
• plot() each guess on the vertical axis versus its anchor on the horizontal axis
• mark Hyperion as a horizontal red dashed line of width 3
• get the coefficients of a least-squares line to the data as
    fit <- lm(guess ~ anchor, data = trees)
    coefs <- coef(fit)
• add the least-squares fitted line to the plot (as a blue solid line of width 3)
• show the plot
• print the estimated coefficients
Answer
# YOUR CODE HERE
iv. (1 mark) Does the interpretation of the fitted line on the plot support your conclusions in part
(e)? If so, how so? If not, why not?
Answer
v. (1 mark) Comment on whether it would be possible to fit a more complicated model for $\mu(x)$ to
this data, for example a quadratic in $x$.
Answer