McGill University
Faculty of Science
Final examination
Principles of Statistics II
Math 204
INSTRUCTIONS
1. The seven questions have to be answered in the exam booklets provided.
2. The total possible number of points for the exam is 210.
3. This is a closed book exam. One 8 1/2” × 11” double-sided crib sheet is allowed.
4. Calculators (both programmable and non-programmable) are permitted.
5. Use of a regular dictionary is permitted.
6. Use of a translation dictionary is permitted.
This exam comprises the cover page, 10 pages of questions and output, with questions numbered 1 to 7,
and four pages of statistical tables.
Math 204 Final Exam
Page 2
1. (50 pts) The presence of harmful insects can be detected by placing boards covered with sticky
material in the field and examining the insects trapped to the board. To investigate which colors are
most attractive to cereal leaf beetles, researchers placed six boards of each of four colors in a field of
oats in July and measured the number of cereal leaf beetles stuck to the board after one week.
(a) (5 pts) Name the kind of design that was used for this study.
(b) (10 pts) Using part (b), test the researchers’ hypothesis that there is an association between the
color of the board and the number of cereal leaf beetles attracted to the board at α = 0.05.
(c) (8 pts) List the assumptions necessary for your test in part (b) to be valid.
(d) (7 pts) If you had conducted a one-way ANOVA of this data ignoring the Board factor, would
you have come to the same conclusion? Explain your answer.
(e) (10 pts) The R output on page 5 contains the results of two non-parametric tests applied to
the same data set used above. Explain which test is the more appropriate test for this data and
interpret the results of that test in 3 sentences or fewer.
2. (20 points) The following table contains the results of a survey of a sample of right-handed men and
women. The table records, by gender, the number of individuals who (i) had feet of the same size,
(ii) had a bigger left than right foot (a difference of half a shoe size or more), or (iii) had a bigger
right foot than left foot.
                     Size of Feet
Gender    Left > Right    Left = Right    Left < Right    Total
Men              2              10              28           40
Women           55              18              14           87
Total           57              28              42          127
(a) (12 pts) Would you conclude that gender has an association with the development of foot asymmetry?
State appropriate hypotheses and use an appropriate test to determine your conclusion at α = 0.01.
(b) (8 pts) State clearly the assumptions of your test and whether you believe they are satisfied for
this particular dataset.
Solution: We need to assume that the data are a random sample from the population (given
in the question) and that the expected cell counts are not small (e.g. not smaller than 5). We
see that the expected counts are all larger than 5, so the χ² approximation is likely to be valid.
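The expected-count check in the solution can be reproduced with a short calculation. The following is a minimal Python sketch, not part of the original exam: it assumes the usual independence formula (row total × column total / grand total) and takes the observed counts from the foot-size table in Question 2. The variable names are illustrative only.

```python
# Observed counts from the foot-size table in Question 2
# (rows: Men, Women; columns: Left > Right, Left = Right, Left < Right).
observed = [
    [2, 10, 28],
    [55, 18, 14],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence (assumed formula):
# (row total * column total) / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

smallest_expected = min(min(row) for row in expected)

for label, row in zip(["Men", "Women"], expected):
    print(label, [round(e, 2) for e in row])
print("Smallest expected count:", round(smallest_expected, 2))
print("All expected counts at least 5:", smallest_expected >= 5)
```

The smallest expected count falls in the Men, Left = Right cell, which is the cell to inspect when judging whether the χ² approximation is reasonable.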
3. (10 pts) A manufacturer of candy must monitor the temperature at which the candies are baked.
Too much variation will cause inconsistency in the taste of the candy. Past records show that the
standard deviation of the temperature has been 1.2°F. A random sample of 30 batches of candy is
selected, and the sample standard deviation of the temperature is 2.1°F.
(a) (7 pts) At the 0.05 level of significance, is there evidence that the population standard deviation
has increased above 1.2°F?
(b) (3 pts) What assumptions do you need to make in order for your test to be valid?
4. (25 pts) A paper in the Forest Products Journal reported the following data on maximum crushing
strength (psi) for a random sample (n_E = 6) of epoxy-impregnated bark board and for an independently
drawn random sample (n_O = 5) of bark board impregnated with another polymer. Here are
the raw data on the crushing strengths for the 11 observations:
Math 204 Final Exam
Page 3
Epoxy    10,860    11,120    11,340    12,130    13,070    14,380
Other     4,590     4,850     5,640     6,390     6,510     - - -
Using an appropriate non-parametric test (at α = 0.05), evaluate the evidence in favor of the hypothesis
that the median maximum crushing strength for the epoxy-impregnated bark board is greater
than the median maximum crushing strength for the other polymer bark board. State clearly the
hypotheses, the test you are using, the critical value for the test, and your conclusion.
5. (40 pts) The stats package in R contains a dataset, collected in the 1920s, relating the speeds of
cars (in miles per hour) to the distance required to stop the car upon braking (as measured in feet).
The R output on page 5 shows a plot of the raw data.
(a) (10 pts) Using the R output on page 6, test for a linear association of the stopping distance
with the speed of the car. Clearly state your hypotheses and which parts of the output you are
referring to.
(b) (6 pts) State the assumptions necessary for your inference in part (a) to hold. Using the output
on page 7, assess whether you believe your assumptions hold for this model.
(c) (4 pts) Compute the sample Pearson correlation coefficient between speed and distance using
the R output on page 6.
(d) (10 pts) Explain the difference between the Pearson correlation coefficient and the Spearman
rank correlation coefficient in about 4 sentences or fewer. Do you expect the sample Spearman
rank correlation coefficient to be larger, smaller or about the same as the Pearson correlation
coefficient for this dataset? Explain your answer.
(e) (3 pts) What is the objective of using the model carsmodel2 in the R output on page 6?
Explain in three sentences or fewer.
(f) (7 pts) State the hypothesis for the objective in part (e) in terms of a population parameter (or
parameters). Test this hypothesis (or hypotheses) using the output. What would you conclude
about the association between the speed of the car and the stopping distance?
6. (25 pts) Suppose you want to determine whether the brand of laundry detergent used and the washing
temperature affect the amount of dirt removed from your laundry. To this end, you buy two different
brands of detergent (‘Best for Stains’ and ‘Extra Strength’) and choose three different temperature
levels (‘Cold’, ‘Warm’, and ‘Hot’). You then run 4 loads of laundry at each combination of detergent
and temperature and measure the amount of dirt removed, yielding a total of 24 observations.
(a) (10 pts) Conduct a complete analysis of variance for this experiment using the R output on page
8, using Type I error rate α = 0.05. State your conclusions clearly and interpret the results in
the context of the problem.
(b) (5 pts) Write down the regression model that will yield equivalent inference to the inference
based on the ANOVA model in part (a). Define your regression covariates clearly.
(c) (10 pts) The means of the 6 groups determined by levels of detergent and temperature are also
given in the R output on page 8. Using these means, calculate the least squares estimates for
the parameters of the regression model you wrote down in part (b).
7. (40 pts) In a paper published in Economics of Education Review, Hamermesh and Parker (2005)
reported on a study that examined the relationship of perceived beauty of the instructor to teaching
evaluation scores. For 94 instructors, the researchers recorded the average teaching evaluation score,
the average standardized score from a questionnaire regarding the “beauty” of the instructor, and
the age of the instructor.
Math 204 Final Exam
Page 4
Four different regression models were fit to the data. The regression output for these four models
is contained on pages 9 and 10. age indicates the age variable, beauty is the standardized beauty
score, and eval is the average course rating.
(a) (4 points) Using the output for model3, predict the average evaluation for an instructor with a
beauty score of 0.7 and an age of 40. You do not need to provide a prediction interval.
(b) (4 points) Using the output for model4, predict the average evaluation for an instructor with a
beauty score of 0.7 and an age of 40. You do not need to provide a prediction interval.
(c) (4 points) Explain precisely why the value of the estimated coefficient for age is different in
model1 compared to model3. Use 2 sentences or fewer.
(d) (5 points) Interpret the value of the interaction coefficient in model4.
(e) (8 points) Using the output for model4, test the hypothesis that the association between beauty
and the evaluation score depends on the age of the instructor with Type I error α = 0.05.
Evaluate the fit of model4 and state your conclusion.
(f) (8 points) Using forward step-wise regression and the outputs for all four models, choose an
appropriate model for the data using F-tests and α = 0.05.
(g) (7 points) Using backward step-wise regression and the outputs for all four models, choose an
appropriate model for the data using F-tests and α = 0.05. Do you choose the same model as
in part (f)?
Math 204 Final Exam
Page 5
R output for Question #1
> kruskal.test(Counts,factor(Color))
Kruskal-Wallis rank sum test
data:  Counts and factor(Color)
Kruskal-Wallis chi-squared = 16.9755, df = 3, p-value = 0.000715
> friedman.test(Counts,Color,Board)
Friedman rank sum test
data:  Counts, Color and Board
Friedman chi-squared = 13.4, df = 3, p-value = 0.003847
Plot of car stopping data for Question #5
[Figure: scatterplot of Distance (in ft) against Speed (in mph); the speed axis runs from 5 to 25 and the distance axis from 0 to 200.]
Math 204 Final Exam
Page 6
R output for Question #5
> carsmodel1 = lm(dist~speed)
> summary(carsmodel1)
Call:
lm(formula = dist ~ speed)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -77.2419     9.4350  -8.187 1.15e-10 ***
speed        12.9997     0.5801  22.411  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 21.47 on 48 degrees of freedom
Multiple R-squared:  0.9128, Adjusted R-squared:  0.9109
F-statistic: 502.2 on 1 and 48 DF,  p-value: < 2.2e-16
> summary(carsmodel2)
Call:
lm(formula = dist ~ speed + I(speed^2))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.29299   17.17191  -0.075    0.940
speed        1.56295    2.35750   0.663    0.511
I(speed^2)   0.37866    0.07645   4.953 9.86e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 17.59 on 47 degrees of freedom
Multiple R-squared:  0.9427, Adjusted R-squared:  0.9402
F-statistic: 386.5 on 2 and 47 DF,  p-value: < 2.2e-16
Math 204 Final Exam
Page 7
Plot of carsmodel1 diagnostics for Question #5
[Figure: diagnostic plots for carsmodel1. Panel "Residuals vs Fitted": residuals against fitted values, with observations 2, 39 and 49 labelled. Panel "Normal Q-Q": standardized residuals against theoretical quantiles, with observations 2, 39 and 49 labelled. An untitled panel showing the standardized residuals on a single axis. Panel "Histogram of stdres(carsmodel1)": frequency of standardized residuals.]
Math 204 Final Exam
Page 8
R output for Question #6
> laundry = aov(dirt~detergent*temp)
> summary(laundry)
               Df Sum Sq Mean Sq F value   Pr(>F)
detergent       1  20.17   20.17   9.811  0.00576 **
temp            2 200.33  100.17  48.730 5.44e-08 ***
detergent:temp  2  16.33    8.17   3.973  0.03722 *
Residuals      18  37.00    2.06
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> model.tables(laundry,type="means")
Tables of means
Grand mean
9.083333

 detergent
detergent
Best for stains  Extra strength
          8.167          10.000

 temp
temp
 Cold   Hot  Warm
 5.00 11.25 11.00

 detergent:temp
                 temp
detergent         Cold  Hot Warm
  Best for stains  5.0 10.5  9.0
  Extra strength   5.0 12.0 13.0
Math 204 Final Exam
Page 9
R output for Question #7
> model1 = lm(eval~age,data=TeachingRatings.subsample)
> summary(model1)
Call:
lm(formula = eval ~ age, data = TeachingRatings.subsample)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.019895   0.294929  13.630   <2e-16 ***
age         -0.001826   0.005992  -0.305    0.761
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5867 on 92 degrees of freedom
Multiple R-squared:  0.001009, Adjusted R-squared:  -0.00985
F-statistic: 0.0929 on 1 and 92 DF,  p-value: 0.7612
>
> model2 = lm(eval~beauty,data=TeachingRatings.subsample)
> summary(model2)
Call:
lm(formula = eval ~ beauty, data = TeachingRatings.subsample)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.91722    0.05925  66.111   <2e-16 ***
beauty       0.16415    0.07189   2.284   0.0247 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5711 on 92 degrees of freedom
Multiple R-squared:  0.05364, Adjusted R-squared:  0.04335
F-statistic: 5.214 on 1 and 92 DF,  p-value: 0.0247
Math 204 Final Exam
Page 10
R output for Question #7 (cont.)
> model3 = lm(eval~beauty+age,data=TeachingRatings.subsample)
> summary(model3)
Call:
lm(formula = eval ~ beauty + age, data = TeachingRatings.subsample)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.726730   0.314150  11.863   <2e-16 ***
beauty      0.182864   0.078235   2.337   0.0216 *
age         0.003920   0.006348   0.618   0.5384
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.573 on 91 degrees of freedom
Multiple R-squared:  0.05759, Adjusted R-squared:  0.03687
F-statistic:  2.78 on 2 and 91 DF,  p-value: 0.06729
> model4 = lm(eval~beauty*age,data=TeachingRatings.subsample)
> summary(model4)
Call:
lm(formula = eval ~ beauty * age, data = TeachingRatings.subsample)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.756911   0.306620  12.253   <2e-16 ***
beauty      -0.603821   0.338580  -1.783   0.0779 .
age          0.004364   0.006193   0.705   0.4828
beauty:age   0.017020   0.007137   2.385   0.0192 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5588 on 90 degrees of freedom
Multiple R-squared:  0.1136, Adjusted R-squared:  0.08405
F-statistic: 3.845 on 3 and 90 DF,  p-value: 0.01221
Math 204 Final Exam
Page 11
uss/TableChi1.pdf
Math 204 Final Exam
Page 12
uss/TableChi2.pdf
Math 204 Final Exam
Page 13
Russ/TableF.pdf
Math 204 Final Exam
Page 14
/TableWilcoxIND.pdf