Test1_solutions_F23

School: University of Waterloo
Course: 202
Subject: Statistics
Date: Feb 20, 2024
Pages: 10

Special Instructions

Your final numerical answers should be given to 3 significant digits (e.g., 0.0329). However, between steps/parts you should carry more decimal places to avoid rounding errors. For each question, make sure you show your work. An incorrect answer with no work shown will receive zero marks; however, an incorrect answer could still receive credit if you have shown your work. Answer the questions in the spaces provided. You may use the last pages of the test for any additional work you would like to be graded. If you use this space, make it as clear as possible which question(s) your work relates to. The exam consists of 18 questions for a total of 35 marks. The number of marks available per question is indicated in [square brackets]. Only non-graphing, non-programmable calculators are permitted.

GOOD LUCK!
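Selected Formulae

$$Q_2 = \frac{x_{(n/2)} + x_{(n/2+1)}}{2} \quad (n \text{ even}) \qquad \text{or} \qquad Q_2 = x_{((n+1)/2)} \quad (n \text{ odd})$$

$$LL = Q_1 - 1.5\,\mathrm{IQR}, \qquad UL = Q_3 + 1.5\,\mathrm{IQR}$$

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$$

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{n-1}$$

$$b = r_{xy}\frac{s_y}{s_x} = \frac{s_{xy}}{s_x^2}, \qquad a = \bar{y} - b\bar{x}$$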
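Because the formula sheet gives two equivalent forms for both the sample variance and the sample covariance, a small R sketch can confirm that the forms agree with each other and with base R's var(), cov(), cor() and lm(). The two vectors are borrowed from Questions 1-5 purely for illustration (here y = 2x - 2 exactly, so r_xy = 1); the variable names are hypothetical.

```r
# Illustrative vectors (the data from Questions 1-5); y = 2x - 2 exactly
x <- c(6, 7, 8, 9)
y <- c(10, 12, 14, 16)
n <- length(x)

# Sample variance: definitional vs computational ("shortcut") form
s2_def  <- sum((x - mean(x))^2) / (n - 1)
s2_comp <- (sum(x^2) - n * mean(x)^2) / (n - 1)

# Sample covariance: definitional vs computational form
sxy_def  <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
sxy_comp <- (sum(x * y) - n * mean(x) * mean(y)) / (n - 1)

# Correlation and least-squares coefficients, as on the formula sheet
r_xy <- sxy_def / (sqrt(s2_def) * sd(y))
b <- sxy_def / s2_def
a <- mean(y) - b * mean(x)

# Each should be TRUE: base R also divides by n - 1
all.equal(c(s2_def, s2_comp), rep(var(x), 2))
all.equal(c(sxy_def, sxy_comp), rep(cov(x, y), 2))
all.equal(r_xy, cor(x, y))
all.equal(c(a, b), unname(coef(lm(y ~ x))))
```

The computational forms are the ones used by hand in Question 18, since they only need the running sums of x, y, x^2, y^2 and xy.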
Part 1: Multiple Choice Questions

Please select the most correct answer. Please answer in the bubble sheet found on the last page by filling in the most correct answer (A, B, ..., E) next to the appropriate question.

The following code is for Questions 1-5.

    dataset1 = c(6, 7, 8, 9)
    dataset2 = c(10, 12, 14, 16)

Question 1 [1 mark] Based on the code above:
(A) The mean of dataset1 is greater than the mean of dataset2.
(B) The median of dataset1 is greater than the median of dataset2.
(C) The median of dataset1 is greater than the median of dataset2.
(D) The mean is equal to the median for each dataset.

Question 2 [1 mark] The standard deviation for dataset2 is:
(A) 1.29 (B) 1.67 (C) 2.58 (D) 6.67

Question 3 [1 mark] The variances are the same for both datasets.
(A) TRUE (B) FALSE

Question 4 [1 mark] What is the R code that will remove the first observation from dataset1? Select all that apply.
(A) dataset1[-1]
(B) dataset1[1]
(C) dataset1[c(1,2,3)]
(D) dataset1[c(2,3,4)]

Question 5 [1 mark] Suppose the first observation is removed from dataset1. The sample variance for dataset1 would:
(A) Increase. (B) Decrease. (C) Remain the same.
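The quantities behind Questions 1-5 can be checked directly in R. This is an illustrative sketch rather than part of the official solutions; it uses only base R (mean(), median(), sd(), var()) and R's indexing rule that a negative index drops the element at that position, while a vector of positive indices keeps only the listed positions.

```r
# Data from the code given for Questions 1-5
dataset1 <- c(6, 7, 8, 9)
dataset2 <- c(10, 12, 14, 16)

# Question 1: centres of each dataset
centres <- rbind(
  dataset1 = c(mean = mean(dataset1), median = median(dataset1)),
  dataset2 = c(mean = mean(dataset2), median = median(dataset2))
)
centres

# Question 2: sd() uses the n - 1 denominator, like s_x on the formula sheet
sd(dataset2)

# Question 3: compare the two sample variances
c(var(dataset1), var(dataset2))

# Question 4: negative index drops position 1; positive indices keep 2, 3, 4
dataset1[-1]
dataset1[c(2, 3, 4)]

# Question 5: sample variance after removing the first observation
var(dataset1[-1])
```

For these data the means and medians are 7.5 and 13, the standard deviation of dataset2 is about 2.58, the variances are about 1.67 and 6.67, and the variance of dataset1 drops to 1 once its first observation is removed.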
A random sample of faculty members from the Department of Statistics and Actuarial Science was taken and their ages were recorded in years. The data are provided in the following plot. Use this plot to answer Questions 6-10.

    The decimal point is one digit to the right of the |

      2 | 9
      3 | 15799
      4 |
      5 | 0001222348
      6 | 57

Question 6 [1 mark] How many faculty members were sampled?
(A) 18 (B) 19 (C) 20 (D) 21

Question 7 [1 mark] What is the interquartile range of this dataset?
(A) 8 (B) 14 (C) 22 (D) 38

Question 8 [1 mark] How many outliers are in the dataset?
(A) 0 (B) 1 (C) 2 (D) 3

Question 9 [1 mark] How would you describe the shape of this distribution?
(A) Symmetric. (B) Skewed right. (C) Skewed left. (D) Bimodal.

Question 10 [1 mark] What is the plot above called?
(A) Histogram. (B) Bar chart. (C) Stem and leaf plot. (D) Number chart.
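The ages can be read off the stem-and-leaf plot (stem = tens digit, leaf = units digit) and checked in R. This sketch assumes the course computes Q1 and Q3 as the medians of the lower and upper halves of the ordered data; for these 18 values that matches the hinges returned by base R's fivenum(). Note that R's default quantile() (type 7) uses a different rule and gives Q3 = 52.75, so IQR(ages) would be 13.75 rather than 14.

```r
# Ages read from the stem-and-leaf plot (stem = tens, leaf = units)
ages <- c(29,
          31, 35, 37, 39, 39,
          50, 50, 50, 51, 52, 52, 52, 53, 54, 58,
          65, 67)

n <- length(ages)     # Question 6: number of faculty sampled

# fivenum() returns min, lower hinge, median, upper hinge, max
h   <- fivenum(ages)
q1  <- h[2]
q3  <- h[4]
iqr <- q3 - q1        # Question 7: interquartile range

# Question 8: 1.5 x IQR fences from the formula sheet
ll <- q1 - 1.5 * iqr
ul <- q3 + 1.5 * iqr
n_outliers <- sum(ages < ll | ages > ul)

c(n = n, Q1 = q1, Q3 = q3, IQR = iqr, LL = ll, UL = ul, outliers = n_outliers)
```

With Q1 = 39 and Q3 = 53, the fences are 18 and 74, and every recorded age lies between them.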
Part 2: Fill in the Blanks

Question 11 [1 mark] A bar chart can be used to visualize ______ data.
Answer: qualitative

Question 12 [1 mark] We often use a ______ regression line (or the line of “best fit”) to predict the value of y for a given value of x.
Answer: least squares

Question 13 [1 mark] Squaring the ______ provides the amount of variability in the data that is explained by the model.
Answer: correlation (or correlation coefficient)

Question 14 [1 mark] Eye colour is an example of ______ data. Be specific in your answer!
Answer: nominal

Question 15 [1 mark] If removing a data point from the dataset causes the line of best fit to change markedly, then it is called a(n) ______.
Answer: influential observation

Question 16 [1 mark] The sample mean is not a ______ statistic for small sample sizes.
Answer: robust

Question 17 [1 mark] A(n) ______ is a variable that is not one of the explanatory or response variables in a study that may influence the interpretation of relationships among those variables.
Answer: lurking variable
Part 3: Short Answer Questions

Question 18 [18 marks] Oysters are categorized for retail as small, medium, or large based on their volume. The grading process (i.e., determining the category for an oyster) is slow and expensive when done by hand. A computer program estimates oyster volume based on two-dimensional (2D) images of the oysters. A total of 30 oysters underwent this imaging process: 15 from the Atlantic provinces (ATL) and 15 from British Columbia (BC). The data are provided in the following side-by-side boxplots.

(a) [1 mark] Compare the shapes of the two boxplots.
The ATL group is symmetric, whereas the BC group is left-skewed.

(b) [1 mark] Compare the centers of the two boxplots.
The median of the ATL group is larger than the median of the BC group.

(c) [1 mark] Compare the spreads of the two boxplots.
The IQR (range) of the ATL group is approximately the same as the IQR (range) of the BC group.

(d) [1 mark] Compare anything else between the two boxplots.
The ATL group has an outlier, while the BC group does not. Note: other appropriate answers may receive full marks.
(e) [3 marks] The scientists decided to focus on the 15 oysters from the Atlantic provinces and graded them by hand to determine the actual volume (in cm³). Use the summary data provided below to show that the covariance between y = actual volume (in cm³) and x = 2D reconstruction (in thousands of pixels) is 10.968.

$$\sum_{i=1}^{15} y_i = 201.84, \quad \sum_{i=1}^{15} y_i^2 = 2772.56, \quad \sum_{i=1}^{15} x_i = 734.285, \quad \sum_{i=1}^{15} x_i^2 = 36652.46, \quad \sum_{i=1}^{15} x_i y_i = 10034.09$$

$$\bar{x} = \frac{734.285}{15} = 48.95233333, \qquad \bar{y} = \frac{201.84}{15} = 13.456$$

$$s_{xy} = \frac{\sum_{i=1}^{15} x_i y_i - n\bar{x}\bar{y}}{n-1} = \frac{10034.09 - 15(48.95233333)(13.456)}{14} \approx 10.968$$

(f) [4 marks] Calculate the correlation between the actual volume and the 2D reconstruction for the Atlantic oysters. Interpret the correlation value in the context of the question. Round your answer to two decimal places and use this value for the remainder of the question.

$$s_x^2 = \frac{36652.46 - 15(48.95233333)^2}{14} = 50.53542309$$

$$s_y^2 = \frac{2772.56 - 15(13.456)^2}{14} = 4.042925714$$

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{10.968}{\sqrt{50.53542309 \times 4.042925714}} \approx 0.77$$

Thus, there is a fairly strong, positive linear relationship between the actual volume and the 2D reconstruction (i.e., as the number of pixels increases, the actual volume tends to increase).

(g) [2 marks] Show that the line of best fit is $\hat{y} = 2.795 + 0.218x$. Use these values for the remainder of the question.

$$b = r_{xy}\frac{s_y}{s_x} = (0.77)\frac{\sqrt{4.042925714}}{\sqrt{50.53542309}} = 0.217791363 \approx 0.218$$

$$a = \bar{y} - b\bar{x} = 13.456 - 0.217791363(48.95233333) = 2.794604602 \approx 2.795$$

The line of best fit is $\hat{y} = 2.795 + 0.218x$.

(h) [1 mark] Interpret the estimate of the slope parameter from part (g) in the context of the question.
For every increase of one unit in the 2D reconstruction (i.e., one thousand pixels), the predicted (average) actual volume of an oyster increases by approximately 0.218 cm³.
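Parts (e)-(g) can be reproduced from the summary sums alone with a few lines of R. This is a sketch of the hand calculation, not a substitute for showing work, and names such as sum_xy are hypothetical. Because only the sums are given (not the 15 raw pairs), lm() cannot be used; instead the code applies the computational formulas from the formula sheet and, as part (f) instructs, rounds r to two decimals before computing the slope.

```r
# Summary statistics for the 15 Atlantic (ATL) oysters, from part (e)
n      <- 15
sum_y  <- 201.84     # actual volume (cm^3)
sum_y2 <- 2772.56
sum_x  <- 734.285    # 2D reconstruction (thousands of pixels)
sum_x2 <- 36652.46
sum_xy <- 10034.09

x_bar <- sum_x / n
y_bar <- sum_y / n

# Computational forms with the n - 1 denominator
s_xy <- (sum_xy - n * x_bar * y_bar) / (n - 1)   # part (e)
s2_x <- (sum_x2 - n * x_bar^2) / (n - 1)
s2_y <- (sum_y2 - n * y_bar^2) / (n - 1)

# Part (f): correlation, rounded to two decimals as the question instructs
r_xy <- round(s_xy / sqrt(s2_x * s2_y), 2)

# Part (g): least-squares slope and intercept using the rounded r
b <- r_xy * sqrt(s2_y) / sqrt(s2_x)
a <- y_bar - b * x_bar

round(c(s_xy = s_xy, r_xy = r_xy, a = a, b = b), 3)
```

The results agree with the hand calculations to the reported precision: s_xy ≈ 10.968, r ≈ 0.77, b ≈ 0.218 and a ≈ 2.795.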
(i) [1 mark] The first observation in the recorded dataset is $(x_1, y_1) = (41.458, 11.71)$. Calculate the residual for this observation.

$$r_1 = y_1 - \hat{y}_1 = 11.71 - [2.795 + 0.218(41.458)] = -0.122844 \approx -0.123$$

(j) [2 marks] The scientists hired engineers who were given the task of improving on the 2D reconstruction program. They designed a new program that estimates oyster volume using three-dimensional (3D) digital image processing. This processing was also used on the oysters from the Atlantic provinces, and the correlation between the actual volume and the 3D reconstruction is 0.87. Provide a written explanation for whether the 3D reconstruction program is an improvement over the 2D version.
The correlation between the actual volume and the 2D reconstruction is approximately 0.77, whereas the correlation between the actual volume and the 3D reconstruction is 0.87. Therefore, the 3D program is an improvement: its correlation with the actual volume is stronger (i.e., it has a stronger linear relationship), so it should assess oyster volume more accurately than the 2D program.

(k) [1 mark] Determine the amount of variability that will not be explained by the line of best fit from using the 3D reconstruction.

$$1 - r^2 = 1 - r_{xy}^2 = 1 - (0.87)^2 = 0.2431$$

That is, about 24.3% of the variability in actual volume is not explained by the 3D line of best fit.

END OF EXAMINATION
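Parts (i) and (k) of Question 18 reduce to one-line computations. A minimal R sketch, using the rounded coefficients from part (g) and the 3D correlation of 0.87 given in part (j); the variable names are illustrative only.

```r
# Fitted 2D line from part (g) and the first observation from part (i)
a  <- 2.795
b  <- 0.218
x1 <- 41.458
y1 <- 11.71

y1_hat <- a + b * x1     # predicted volume (cm^3)
resid1 <- y1 - y1_hat    # residual = observed minus predicted
signif(resid1, 3)        # three significant digits, per the instructions

# Part (k): proportion of variability NOT explained by the 3D line
r_3d <- 0.87
unexplained <- 1 - r_3d^2
unexplained
```

The residual is negative because the observed volume (11.71 cm³) lies below the value predicted by the line (about 11.833 cm³).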
Use this page for any additional work you would like to be graded. If you use this space, make it as clear as possible which question(s) your work relates to. If there is any ambiguity only your work on the previous question pages will be graded.