trees

School: University of Waterloo
Course: 331
Subject: Statistics
Date: Apr 3, 2024

Tree heights (26 marks)

A very short, time-limited online quiz was taken by students in a statistics course at the University of Waterloo in 2020. Students were asked two questions and given very little time to answer them. Moreover, they had no idea in advance what the questions would be about.

The information presented in the quiz was as follows:

“The coast redwood is perhaps the tallest species of tree growing today. Do you think the tallest tree of this species alive today is
A. less than XXX metres tall?
B. more than XXX metres tall?
Answer A or B.
Write down your best guess (in metres) of how tall you think the tallest tree might be.”

In place of XXX above, about half of the students (randomly selected) had the number 50 appear and the others had the number 100 appear. The value of XXX presented to a student is called the anchor for that question.

For the record, and presumably unknown to the students taking the quiz, the tallest coast redwood tree found so far was discovered in 2006. It was named Hyperion after the Titan of that name in Greek mythology (meaning “the high one”) and was measured to be 116.07 metres tall in 2019.

The student quiz results are given in the R data file trees.Rda. This may be loaded into R using load() (assuming you have the file in a directory/folder given by dataDirectory) as

    # Assuming the file is located in the folder/directory given by dataDirectory
    # For example, a directory/folder called "data" in the current working directory (".")
    # dataDirectory <- "./data"
    load(file.path(dataDirectory, "trees.Rda"))
    # The data are the value of the R data frame called trees
    head(trees, n = 4)
    ##   anchor guess
    ## 1    100   150
    ## 2    100   150
    ## 3    100   222
    ## 4    100   128

Only the anchor value presented to the student and their guess are recorded (both in metres). The tallest tree is Hyperion:

    # The tallest tree
    Hyperion <- 116.07

IMPORTANT: In all of your answers, show all the R code you used in your calculations and analyses. In this assignment, you must write the code using basic R functions like mean(), sd(), var(), sqrt(), length(), pt(), etc. You may not use functions like t.test(), though these could be used to check your answers.
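The rule above allows t.test() only as a check on a hand calculation. A minimal sketch of what such a check could look like is given here, using simulated numbers rather than the quiz data. The names y and c0 and the simulated values are assumptions made for illustration only. The hand calculation uses the one-sample statistic described in part (a) below with n - 1 degrees of freedom, which matches the n - 1 divisor in sd() and is what t.test() itself reports for a single sample.

```r
# Simulated stand-in data (NOT the quiz guesses); y and c0 are hypothetical names.
set.seed(331)
y  <- rnorm(40, mean = 120, sd = 30)
c0 <- 116.07                      # hypothesised value of the mean

# Hand calculation with basic functions only
n         <- length(y)
mu_hat    <- mean(y)
sigma_hat <- sd(y)                # divides the sum of squares by n - 1
d <- abs(mu_hat - c0) / (sigma_hat / sqrt(n))
p <- 2 * pt(d, df = n - 1, lower.tail = FALSE)

# Check only: t.test() should reproduce both numbers
check <- t.test(y, mu = c0)
print(all.equal(d, abs(unname(check$statistic))))
print(all.equal(p, check$p.value))
```

If the two comparisons do not both print TRUE, the likely culprits are the degrees of freedom passed to pt() or a one-sided rather than two-sided tail probability.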
Questions

a. First, consider modelling the student guesses according to the mean response model

   $$y_i = \mu + r_i \quad \text{for } i = 1, \ldots, n$$

   where $y_i$ is the $i$th student’s guess of the tallest height. Recall from STAT 231 that to test the hypothesis $H_0: \mu = c$ for some constant $c$, we form the statistic

   $$d = \frac{|\hat{\mu} - c|}{\hat{\sigma}/\sqrt{n}}$$

   where $\hat{\mu} = \bar{y}$ is the arithmetic average (in R, mean()) and

   $$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} \hat{r}_i^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{\mu})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$$

   is the residual standard deviation (in this case, sd() in R could be used). Large values of $d$ indicate evidence against $H_0$. To assess the strength of this evidence, we compute the observed significance level, or $p$-value, as

   $$p = Pr(|t_{n-1}| \geq d) = 2\,Pr(t_{n-1} \geq d)$$

   where $t_{n-1}$ is a Student’s $t$ random variate on $n - 1$ degrees of freedom (one parameter, $\mu$, is estimated, matching the $n - 1$ divisor in $\hat{\sigma}$). The smaller $p$ is, the greater the evidence against $H_0$. (See help(pt) in R.)

   i. (2 marks) Plot a histogram of the guesses (see help(hist)). Add a red vertical dashed line of width 3 at the height of Hyperion. Based only on this display, comment on whether the height of Hyperion might be a reasonable value for $\mu$.
      Answer
          # YOUR CODE HERE

   ii. (1 mark) In R, construct the value of the discrepancy measure $d$ for testing whether the mean guess is the height of Hyperion. Show your code and print the value of $d$.
      Answer
          # YOUR CODE HERE

   iii. (1 mark) Determine and print the $p$-value in R for this test. Show your code.
      Answer
          # YOUR CODE HERE

   iv. (1 mark) Based on the above $p$-value, what do you conclude about the evidence against the hypothesis that the mean of the guesses is the height of Hyperion?
      Answer

b. We now repeat the modelling of part (a), but this time only for guesses from those students who were given the “low” anchor as reference (i.e., anchor == 50).

   i. (2 marks) Select only those students whose anchor == 50. Using xlim = c(0, 400), produce the histogram of the guesses for these students and mark Hyperion with a red dashed line. Comment on whether Hyperion’s height is a plausible value for $\mu$ for these students.
      Answer
          # YOUR CODE HERE

   ii. (2 marks) For these student guesses, calculate and print the value of the discrepancy measure $d$ for testing $H_0: \mu = \texttt{Hyperion}$. Determine the $p$-value, print it, and comment on the evidence this gives against the hypothesis when students were given a low anchor. Is the evidence against the hypothesis stronger or weaker than it was in part (a)?
      Answer
          # YOUR CODE HERE

c. We again repeat the modelling of parts (a) and (b), but this time only for guesses from those students who were given the “high” anchor as reference (i.e., anchor == 100).

   i. (2 marks) Select only those students whose anchor == 100. Using xlim = c(0, 400), produce the histogram of the guesses for these students and mark Hyperion with a red dashed line. Comment on whether Hyperion’s height is a plausible value for $\mu$ for these students.
      Answer
          # YOUR CODE HERE

   ii. (2 marks) For these student guesses, calculate and print the value of the discrepancy measure $d$ for testing $H_0: \mu = \texttt{Hyperion}$. Determine the $p$-value, print it, and comment on the evidence this gives against the hypothesis when students were given a high anchor. Is the evidence against the hypothesis stronger or weaker than it was in part (a)?
      Answer
          # YOUR CODE HERE

d. Another hypothesis of interest is whether the two groups (from low and high anchor values) have the same mean guess values. This is an example of the two sample problem from STAT 231. Each group is modelled as a mean response model:

   $$y_i = \mu_1 + r_i$$

   for guesses from the low anchor group, and

   $$y_i = \mu_2 + r_i$$

   for guesses from the high anchor group. Assuming that the variability of the guesses does not depend on the group, the discrepancy measure for assessing evidence against the hypothesis $H_0: \mu_1 - \mu_2 = c$ is

   $$d = \frac{|(\hat{\mu}_1 - \hat{\mu}_2) - c|}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

   where

   $$\hat{\sigma}^2 = \frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}.$$

   Large values of $d$ indicate evidence against $H_0$, and a hypothesis of no difference requires $c = 0$ (i.e., $H_0: \mu_1 - \mu_2 = 0$). The $p$-value is

   $$p = Pr(|t_{(n_1 - 1) + (n_2 - 1)}| \geq d) = 2\,Pr(t_{n-2} \geq d)$$

   where $t_{n-2}$ is a Student’s $t$ random variate on $n - 2 = n_1 + n_2 - 2$ degrees of freedom.
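The pooled two-sample calculation described above can be sketched in R with basic functions only. This is an illustration on simulated numbers, not the quiz data: the vectors y1 and y2, their sizes, and the constant c0 are all assumptions made for the example. The final lines use t.test() with var.equal = TRUE purely as a check, since that option corresponds to the equal-variability assumption stated above.

```r
# Simulated stand-in groups (NOT the quiz guesses); all names are hypothetical.
set.seed(2020)
y1 <- rnorm(25, mean = 90,  sd = 40)   # stand-in for one group
y2 <- rnorm(30, mean = 130, sd = 40)   # stand-in for the other group
c0 <- 0                                # hypothesised difference in means

n1 <- length(y1)
n2 <- length(y2)

# Pooled variance: weighted average of the two group variances
sigma2_hat <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)
sigma_hat  <- sqrt(sigma2_hat)

# Discrepancy measure and two-sided p-value on n1 + n2 - 2 degrees of freedom
d <- abs((mean(y1) - mean(y2)) - c0) / (sigma_hat * sqrt(1 / n1 + 1 / n2))
p <- 2 * pt(d, df = n1 + n2 - 2, lower.tail = FALSE)

# Check only: the equal-variance t-test should agree with the hand calculation
check <- t.test(y1, y2, var.equal = TRUE)
print(all.equal(d, abs(unname(check$statistic))))
print(all.equal(p, check$p.value))
```

Note that var() is used for each group rather than sd(), which avoids squaring a square root when forming the pooled variance.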
   i. (2 marks) Construct and print the value of $d$ to assess $H_0: \mu_1 - \mu_2 = 0$.
      Answer
          # YOUR CODE HERE

   ii. (1 mark) Determine and print the $p$-value for $d$ to assess $H_0: \mu_1 - \mu_2 = 0$. Comment on the strength of the evidence against $H_0$.
      Answer
          # YOUR CODE HERE

e. (1 mark) What do you conclude about the effect of “anchoring” on the answers given by the students?
   Answer

f. Another way to look at the data is to imagine that a student’s guess depends on which value that student was given as anchor. A mean response model for this would be

   $$y_i = \mu(x_i) + r_i \quad \text{for } i = 1, \ldots, n$$

   where $x_i$ is the value of the anchor. A simple model is the straight line model where $\mu(x_i) = \beta_0 + \beta_1 x_i$.

   i. (2 marks) For this context of tree heights, how would you interpret $\beta_0$? Does $\beta_0 = 0$ make sense?
      Answer

   ii. (2 marks) For this context of tree heights, how would you interpret $\beta_1$? Does $\beta_1 = 0$ make sense?
      Answer

   iii. (3 marks) Using suitable values of xlab, ylab, and main:
      • plot() each guess on the vertical axis versus its anchor on the horizontal axis
      • mark Hyperion as a horizontal red dashed line of width 3
      • get the coefficients of a least-squares line to the data as
            fit <- lm(guess ~ anchor, data = trees)
            coefs <- coef(fit)
      • add the least-squares fitted line to the plot (as a blue solid line of width 3)
      • show the plot
      • print the estimated coefficients
      Answer
          # YOUR CODE HERE

   iv. (1 mark) Does the interpretation of the fitted line on the plot support your conclusions in part (e)? If so, how so? If not, why not?
      Answer

   v. (1 mark) Comment on whether it would be possible to fit a more complicated model for $\mu(x)$ to this data, for example a quadratic in $x$.
      Answer
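Because the anchor takes only two distinct values (50 and 100), the straight line model in part (f) has a special structure that may be easier to see on a toy example. The sketch below uses simulated numbers, not the quiz data. The data frame toy and every value in it are assumptions made for illustration. It shows two things that hold for any data with exactly two distinct x values: the least-squares line passes through the two group averages, and lm() cannot estimate an additional quadratic term.

```r
# Simulated stand-in data (NOT the quiz guesses); toy is a hypothetical data frame
# with the same two column names as trees.
set.seed(1)
toy <- data.frame(
  anchor = rep(c(50, 100), times = c(20, 20)),
  guess  = c(rnorm(20, mean = 80, sd = 25), rnorm(20, mean = 120, sd = 25))
)

# Straight line fit, as in part (f) iii
fit   <- lm(guess ~ anchor, data = toy)
coefs <- coef(fit)

# With two distinct x values the fitted line goes through the two group means
group_means       <- tapply(toy$guess, toy$anchor, mean)
fitted_at_anchors <- coefs[1] + coefs[2] * c(50, 100)
print(all.equal(unname(fitted_at_anchors), as.numeric(group_means)))

# Adding a quadratic term: anchor^2 is an exact linear function of anchor
# when anchor has only two values, so its coefficient cannot be estimated
fit2 <- lm(guess ~ anchor + I(anchor^2), data = toy)
print(coef(fit2))
```

In this toy setting the slope is simply the difference between the two group averages divided by 50, the distance between the anchors, so the line carries the same information as the two-sample comparison.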