McGill University
Faculty of Science
Final examination

Principles of Statistics II
Math 204

INSTRUCTIONS

1. The seven questions have to be answered in the exam booklets provided.
2. The total possible number of points for the exam is 210.
3. This is a closed book exam. One 8 1/2” × 11” double-sided crib sheet is allowed.
4. Calculators (both programmable and non-programmable) are permitted.
5. Use of a regular dictionary is permitted.
6. Use of a translation dictionary is permitted.

This exam comprises the cover page, 10 pages of questions and output, with questions numbered 1 to 7, and four pages of statistical tables.
Math 204 Final Exam Page 2

1. (50 pts) The presence of harmful insects can be detected by placing boards covered with sticky material in the field and examining the insects trapped on the board. To investigate which colors are most attractive to cereal leaf beetles, researchers placed six boards of each of four colors in a field of oats in July and measured the number of cereal leaf beetles stuck to the board after one week.

(a) (5 pts) Name the kind of design that was used for this study.

(b) (10 pts) Using part (b), test the researchers’ hypothesis that there is an association between the color of the board and the number of cereal leaf beetles attracted to the board at α = 0.05.

(c) (8 pts) List the assumptions necessary for your test in part (c) to be valid.

(d) (7 pts) If you had conducted a one-way ANOVA of this data ignoring the Board factor, would you have come to the same conclusion? Explain your answer.

(e) (10 pts) The R output on page 5 contains the results of two non-parametric tests applied to the same data set used above. Explain which test is the more appropriate test for this data and interpret the results of that test in 3 sentences or fewer.

2. (20 pts) The following table contains the results of a survey of a sample of right-handed men and women. The table records, by gender, the number of individuals whose feet were (i) the same size, (ii) had a bigger left than right foot (a difference of half a shoe size or more), or (iii) had a bigger right foot than left foot.

                      Size of Feet
Gender   Left > Right   Left = Right   Left < Right   Total
Men                 2             10             28      40
Women              55             18             14      87
Total              57             28             42     127

(a) (12 pts) Would you conclude that gender has an association with the development of foot asymmetry? State appropriate hypotheses and use an appropriate test to determine your conclusion at α = 0.01.

(b) (8 pts) State clearly the assumptions of your test and whether you believe they are satisfied for this particular dataset.
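The arithmetic behind a Pearson chi-square test of independence on the foot-size table can be sketched as follows. This is an illustration only, written in Python with the standard library rather than in R; the variable names and the choice of this particular test are assumptions, not part of the exam. The sketch computes the expected counts under independence, the test statistic, and a p-value, relying on the fact that for exactly 2 degrees of freedom the chi-square survival function is exp(-x/2).

```python
import math

# Observed counts from the Question 2 table (rows: Men, Women;
# columns: Left > Right, Left = Right, Left < Right).
observed = [[2, 10, 28],
            [55, 18, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1) * (3 - 1) = 2

# Valid for df = 2 only: the chi-square survival function is exp(-x / 2).
p_value = math.exp(-chi_sq / 2)
critical_value = -2 * math.log(0.01)  # upper 1% point for df = 2

smallest_expected = min(min(row) for row in expected)
print(f"smallest expected count = {smallest_expected:.2f}")
print(f"chi-square = {chi_sq:.2f} on {df} df, critical value = {critical_value:.2f}")
print(f"p-value = {p_value:.2e}")
```

The closed-form survival function is used only to keep the sketch free of third-party libraries; for other degrees of freedom the critical value would come from the chi-square table at the end of the exam.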
Solution: We need to assume that the data are a random sample from the population (given in the question) and that the expected cell counts are not small (e.g. not smaller than 5). We see that the expected counts are all larger than 5, so the χ² approximation is likely to be valid.

3. (10 pts) A manufacturer of candy must monitor the temperature at which the candies are baked. Too much variation will cause inconsistency in the taste of the candy. Past records show that the standard deviation of the temperature has been 1.2 F. A random sample of 30 batches of candy is selected, and the sample standard deviation of the temperature is 2.1 F.

(a) (7 pts) At the 0.05 level of significance, is there evidence that the population standard deviation has increased above 1.2 F?

(b) (3 pts) What assumptions do you need to make in order for your test to be valid?

4. (25 pts) A paper in the Forest Products Journal reported the following data on maximum crushing strength (psi) for a random sample (n_E = 6) of epoxy-impregnated bark board and for an independently drawn random sample (n_O = 5) of bark board impregnated with another polymer. Here are the raw data on the crushing strengths for the 11 observations:
Math 204 Final Exam Page 3

Epoxy   10,860   11,120   11,340   12,130   13,070   14,380
Other    4,590    4,850    5,640    6,390    6,510

Using an appropriate non-parametric test (at α = 0.05), evaluate the evidence in favor of the hypothesis that the median maximum crushing strength for the epoxy-impregnated bark board is greater than the median maximum crushing strength for the other polymer bark board. State clearly the hypotheses, the test you are using, the critical value for the test, and your conclusion.

5. (40 pts) The stats package in R contains a dataset, collected in the 1920s, relating the speeds of cars (in miles per hour) to the distance required to stop the car upon braking (as measured in feet). The R output on page 5 shows a plot of the raw data.

(a) (10 pts) Using the R output on page 6, test for a linear association of the stopping distance with the speed of the car. Clearly state your hypotheses and which parts of the output you are referring to.

(b) (6 pts) State the assumptions necessary for your inference in part (a) to hold. Using the output on page 7, assess whether you believe your assumptions hold for this model.

(c) (4 pts) Compute the sample Pearson correlation coefficient between speed and distance using the R output on page 6.

(d) (10 pts) Explain the difference between the Pearson correlation coefficient and the Spearman rank correlation coefficient in about 4 sentences or fewer. Do you expect the sample Spearman rank correlation coefficient to be larger, smaller or about the same as the Pearson correlation coefficient for this dataset? Explain your answer.

(e) (3 pts) What is the objective of using the model carsmodel2 in the R output on page 6? Explain in three sentences or fewer.

(f) (7 pts) State the hypothesis for the objective in part (e) in terms of a population parameter (or parameters). Test this hypothesis (or hypotheses) using the output. What would you conclude about the association between the speed of the car and the stopping distance?
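Returning to the crushing-strength data of Question 4, the mechanics of a rank-sum comparison can be sketched as below. This is a hedged illustration in Python (standard library only), and it departs from the exam in one respect: the question asks for a critical value from the Wilcoxon table, whereas this sketch enumerates the exact null distribution of the rank sum instead. The variable names and the choice to sum the ranks of the smaller sample are assumptions made for the example.

```python
from itertools import combinations
from math import comb

epoxy = [10860, 11120, 11340, 12130, 13070, 14380]
other = [4590, 4850, 5640, 6390, 6510]

# Rank all 11 observations together. There are no ties in this data set,
# so plain ranks 1..11 are sufficient (no mid-ranks needed).
pooled = sorted(epoxy + other)
rank = {value: i + 1 for i, value in enumerate(pooled)}

# Rank sum of the smaller sample (the "Other" polymer).
t_other = sum(rank[v] for v in other)

# Under the null hypothesis every choice of 5 of the 11 ranks is equally
# likely, so count the choices whose rank sum is at most the observed one.
n_total, n_other = len(pooled), len(other)
n_arrangements = comb(n_total, n_other)
as_extreme = sum(1 for ranks in combinations(range(1, n_total + 1), n_other)
                 if sum(ranks) <= t_other)
p_one_sided = as_extreme / n_arrangements

print(f"rank sum for Other = {t_other}")
print(f"one-sided exact p-value = {as_extreme}/{n_arrangements} = {p_one_sided:.4f}")
```

A small rank sum for the second sample is the direction that supports the one-sided alternative stated in the question, which is why the lower tail is counted.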
6. (25 pts) Suppose you want to determine whether the brand of laundry detergent used and the washing temperature affect the amount of dirt removed from your laundry. To this end, you buy two different brands of detergent (‘Best for Stains’ and ‘Extra Strength’) and choose three different temperature levels (‘Cold’, ‘Warm’, and ‘Hot’). You then run 4 loads of laundry at each combination of detergent and temperature and measure the amount of dirt removed, yielding a total of 24 observations.

(a) (10 pts) Conduct a complete analysis of variance for this experiment using the R output on page 8, using Type I error rate α = 0.05. State your conclusions clearly and interpret the results in the context of the problem.

(b) (5 pts) Write down the regression model that will yield equivalent inference to the inference based on the ANOVA model in part (a). Define your regression covariates clearly.

(c) (10 pts) The means of the 6 groups determined by levels of detergent and temperature are also given in the R output on page 8. Using these means, calculate the least squares estimates for the parameters of the regression model you wrote down in part (b).

7. (30 pts) In a paper published in Economics of Education Review, Hamermesh and Parker (2005) reported on a study that examined the relationship of perceived beauty of the instructor to teaching evaluation scores. For 94 instructors, the researchers recorded the average teaching evaluation score, the average standardized score from a questionnaire regarding the “beauty” of the instructor, and the age of the instructor.
Math 204 Final Exam Page 4

Four different regression models were fit to the data. The regression output for these four models is contained on pages 9 and 10. age indicates the age variable, beauty is the standardized beauty score, and eval is the average course rating.

(a) (4 points) Using the output for model3, predict the average evaluation for an instructor with a beauty score of 0.7 and an age of 40. You do not need to provide a prediction interval.

(b) (4 points) Using the output for model4, predict the average evaluation for an instructor with a beauty score of 0.7 and an age of 40. You do not need to provide a prediction interval.

(c) (4 points) Explain precisely why the value of the estimated coefficient for age is different in model1 compared to model3. Use 2 sentences or fewer.

(d) (5 points) Interpret the value of the interaction coefficient in model4.

(e) (8 points) Using the output for model4, test the hypothesis that the association between beauty and the evaluation score depends on the age of the instructor with Type I error α = 0.05. Evaluate the fit of model4 and state your conclusion.

(f) (8 points) Using forward step-wise regression and the outputs for all four models, choose an appropriate model for the data using F-tests and α = 0.05.

(g) (7 points) Using backward step-wise regression and the outputs for all four models, choose an appropriate model for the data using F-tests and α = 0.05. Do you choose the same model as in part (f)?
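The F-tests used in step-wise selection compare a smaller model with a larger one that contains it. A minimal sketch of that comparison, written in Python (standard library only) and using the R-squared values and residual degrees of freedom printed in the Question 7 output on pages 9 and 10, is given below. The function name `partial_f` and the dictionary layout are hypothetical, chosen for this illustration; the check against the squared t value relies on the standard result that a partial F for one added term equals that term's t statistic squared.

```python
# R-squared values and residual degrees of freedom copied from the
# summary() output for Question 7 (pages 9 and 10).
r_squared = {"model2": 0.05364, "model3": 0.05759, "model4": 0.1136}
resid_df = {"model2": 92, "model3": 91, "model4": 90}


def partial_f(reduced, full):
    """F statistic for the extra terms in `full` relative to nested `reduced`."""
    extra_terms = resid_df[reduced] - resid_df[full]
    numerator = (r_squared[full] - r_squared[reduced]) / extra_terms
    denominator = (1 - r_squared[full]) / resid_df[full]
    return numerator / denominator


f_add_age = partial_f("model2", "model3")          # adds age to beauty
f_add_interaction = partial_f("model3", "model4")  # adds beauty:age

# With a single added term, the partial F should match the square of the
# t value reported for that term, up to rounding in the printed output.
print(f"add age:        F = {f_add_age:.3f}, t^2 = {0.618 ** 2:.3f}")
print(f"add beauty:age: F = {f_add_interaction:.3f}, t^2 = {2.385 ** 2:.3f}")
```

Each F value would then be compared with the critical value from the F table on 1 and the full model's residual degrees of freedom.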
Math 204 Final Exam Page 5

R output for Question #1

> kruskal.test(Counts,factor(Color))

        Kruskal-Wallis rank sum test

data:  Counts and factor(Color)
Kruskal-Wallis chi-squared = 16.9755, df = 3, p-value = 0.000715

> friedman.test(Counts,Color,Board)

        Friedman rank sum test

data:  Counts, Color and Board
Friedman chi-squared = 13.4, df = 3, p-value = 0.003847

Plot of car stopping data for Question #5

[Figure 204-Russ/stopplot.pdf: plot of Distance (in ft) against Speed (in mph)]
Math 204 Final Exam Page 6

R output for Question #5

> carsmodel1 = lm(dist~speed)
> summary(carsmodel1)

Call:
lm(formula = dist ~ speed)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -77.2419     9.4350  -8.187 1.15e-10 ***
speed        12.9997     0.5801  22.411  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.47 on 48 degrees of freedom
Multiple R-squared:  0.9128,  Adjusted R-squared:  0.9109
F-statistic: 502.2 on 1 and 48 DF,  p-value: < 2.2e-16

> summary(carsmodel2)

Call:
lm(formula = dist ~ speed + I(speed^2))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.29299   17.17191  -0.075    0.940
speed        1.56295    2.35750   0.663    0.511
I(speed^2)   0.37866    0.07645   4.953 9.86e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.59 on 47 degrees of freedom
Multiple R-squared:  0.9427,  Adjusted R-squared:  0.9402
F-statistic: 386.5 on 2 and 47 DF,  p-value: < 2.2e-16
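In a simple linear regression with one predictor, the sample Pearson correlation can be recovered from the summary output: its square equals the multiple R-squared and its sign matches the sign of the slope. The short Python sketch below (standard library only; variable names are assumptions for this illustration) applies that relationship to the carsmodel1 output above and also checks that the squared t value for the slope agrees with the reported F statistic.

```python
import math

# Values copied from summary(carsmodel1) on page 6.
r_squared = 0.9128
slope = 12.9997
t_slope = 22.411
f_statistic = 502.2

# With a single predictor, r carries the sign of the slope and r^2 = R^2.
pearson_r = math.copysign(math.sqrt(r_squared), slope)

print(f"r = {pearson_r:.4f}")
print(f"t^2 = {t_slope ** 2:.1f} vs F = {f_statistic}")
```

This shortcut does not carry over to carsmodel2, which has two predictors.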
Math 204 Final Exam Page 7

Plot of carsmodel1 diagnostics for Question #5

[Figure 204-Russ/stopdiags.pdf: diagnostic plots for carsmodel1. Panel titles: Residuals vs Fitted; Normal Q-Q; Histogram of stdres(carsmodel1). Axis labels: Fitted values, Residuals, Theoretical Quantiles, Standardized residuals, Frequency. Observations 49, 2 and 39 are labelled in the Residuals vs Fitted and Normal Q-Q panels.]
Math 204 Final Exam Page 8

R output for Question #6

> laundry = aov(dirt~detergent*temp)
> summary(laundry)
               Df Sum Sq Mean Sq F value   Pr(>F)
detergent       1  20.17   20.17   9.811  0.00576 **
temp            2 200.33  100.17  48.730 5.44e-08 ***
detergent:temp  2  16.33    8.17   3.973  0.03722 *
Residuals      18  37.00    2.06
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> model.tables(laundry,type="means")
Tables of means
Grand mean

9.083333

 detergent
detergent
Best for stains  Extra strength
          8.167          10.000

 temp
temp
 Cold   Hot  Warm
 5.00 11.25 11.00

 detergent:temp
                 temp
detergent         Cold  Hot Warm
  Best for stains  5.0 10.5  9.0
  Extra strength   5.0 12.0 13.0
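Because a regression with indicator variables for detergent, temperature, and their products has one parameter per cell, its fitted values are the six cell means, so the coefficients can be read off as differences of those means. The Python sketch below (standard library only) illustrates this under one assumed coding that the exam does not specify: the baseline cell is detergent "Best for stains" at temperature "Cold", with 0/1 indicators for "Extra strength", "Hot", and "Warm". A different choice of baseline would give different but equivalent coefficients. The coefficient names are hypothetical labels.

```python
# Cell means from model.tables(laundry, type="means") on page 8.
cell_mean = {
    ("Best for stains", "Cold"): 5.0,
    ("Best for stains", "Hot"): 10.5,
    ("Best for stains", "Warm"): 9.0,
    ("Extra strength", "Cold"): 5.0,
    ("Extra strength", "Hot"): 12.0,
    ("Extra strength", "Warm"): 13.0,
}

# Assumed coding (not given in the exam): the baseline cell is
# detergent = "Best for stains", temp = "Cold".
base_det, base_temp = "Best for stains", "Cold"
other_det = "Extra strength"

coef = {"intercept": cell_mean[(base_det, base_temp)]}
# Detergent effect at the baseline temperature.
coef["extra"] = cell_mean[(other_det, base_temp)] - coef["intercept"]
for temp in ("Hot", "Warm"):
    key = temp.lower()
    # Temperature effect within the baseline detergent.
    coef[key] = cell_mean[(base_det, temp)] - coef["intercept"]
    # Interaction: what remains after the intercept and both main effects.
    coef[f"extra:{key}"] = (cell_mean[(other_det, temp)]
                            - coef["intercept"]
                            - coef["extra"]
                            - coef[key])

for name, value in coef.items():
    print(f"{name:>12}: {value:+.1f}")
```

Adding the relevant coefficients back together reproduces each cell mean, which is a quick way to check any hand calculation.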
Math 204 Final Exam Page 9

R output for Question #7

> model1 = lm(eval~age,data=TeachingRatings.subsample)
> summary(model1)

Call:
lm(formula = eval ~ age, data = TeachingRatings.subsample)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.019895   0.294929  13.630   <2e-16 ***
age         -0.001826   0.005992  -0.305    0.761
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5867 on 92 degrees of freedom
Multiple R-squared:  0.001009,  Adjusted R-squared:  -0.00985
F-statistic: 0.0929 on 1 and 92 DF,  p-value: 0.7612

>
> model2 = lm(eval~beauty,data=TeachingRatings.subsample)
> summary(model2)

Call:
lm(formula = eval ~ beauty, data = TeachingRatings.subsample)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.91722    0.05925  66.111   <2e-16 ***
beauty       0.16415    0.07189   2.284   0.0247 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5711 on 92 degrees of freedom
Multiple R-squared:  0.05364,  Adjusted R-squared:  0.04335
F-statistic: 5.214 on 1 and 92 DF,  p-value: 0.0247
Math 204 Final Exam Page 10

R output for Question #7 (cont.)

> model3 = lm(eval~beauty+age,data=TeachingRatings.subsample)
> summary(model3)

Call:
lm(formula = eval ~ beauty + age, data = TeachingRatings.subsample)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.726730   0.314150  11.863   <2e-16 ***
beauty      0.182864   0.078235   2.337   0.0216 *
age         0.003920   0.006348   0.618   0.5384
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.573 on 91 degrees of freedom
Multiple R-squared:  0.05759,  Adjusted R-squared:  0.03687
F-statistic: 2.78 on 2 and 91 DF,  p-value: 0.06729

> model4 = lm(eval~beauty*age,data=TeachingRatings.subsample)
> summary(model4)

Call:
lm(formula = eval ~ beauty * age, data = TeachingRatings.subsample)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.756911   0.306620  12.253   <2e-16 ***
beauty      -0.603821   0.338580  -1.783   0.0779 .
age          0.004364   0.006193   0.705   0.4828
beauty:age   0.017020   0.007137   2.385   0.0192 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5588 on 90 degrees of freedom
Multiple R-squared:  0.1136,  Adjusted R-squared:  0.08405
F-statistic: 3.845 on 3 and 90 DF,  p-value: 0.01221
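A point prediction from a fitted regression is the intercept plus each coefficient times its covariate value; with an interaction, the product of the two covariates enters as an extra term. The Python sketch below (standard library only; variable names are assumptions for this illustration) plugs a beauty score of 0.7 and an age of 40 into the model3 and model4 coefficients printed above, and also shows how the slope for beauty in model4 changes with age.

```python
beauty, age = 0.7, 40

# Coefficients copied from summary(model3): eval ~ beauty + age.
pred_model3 = 3.726730 + 0.182864 * beauty + 0.003920 * age

# Coefficients copied from summary(model4): eval ~ beauty * age,
# which adds the product term beauty:age.
pred_model4 = (3.756911 - 0.603821 * beauty + 0.004364 * age
               + 0.017020 * beauty * age)

# In model4 the slope for beauty is not constant: it depends on age.
beauty_slope_at_age = -0.603821 + 0.017020 * age

print(f"model3 prediction: {pred_model3:.3f}")
print(f"model4 prediction: {pred_model4:.3f}")
print(f"model4 beauty slope at age {age}: {beauty_slope_at_age:.3f}")
```

The negative main-effect coefficient for beauty in model4 is the slope at age 0, which lies far outside the range of instructor ages, so it should not be read on its own.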
Math 204 Final Exam Page 11

[Statistical table: uss/TableChi1.pdf]

Math 204 Final Exam Page 12

[Statistical table: uss/TableChi2.pdf]
Math 204 Final Exam Page 13

[Statistical table: Russ/TableF.pdf]
Math 204 Final Exam Page 14

[Statistical table: /TableWilcoxIND.pdf]