STAT 2080
Practice Final Exam
August, 2023
DALHOUSIE UNIVERSITY FACULTY OF SCIENCE
Department of Mathematics and Statistics
STAT 2080 / MATH 2080 Practice Final Examination
Date and Time:
August 2023
NAME (PRINT CLEARLY):
BANNER ID:
SIGNATURE:
You may use a 2-sided formula sheet and a calculator. There are 13 pages in this exam.
The number of points allocated to each problem is indicated. To get maximum credit, SHOW ALL YOUR WORK.
Question:   1   2   3   4   5   6   7   8   9  10  11  Total
Points:     6   5   4   2  10   8   6  12   7   6   5     71
Score:
1. Answer the following multiple choice questions (circle the correct one).
(a) (1 point) If two explanatory variables have a correlation coefficient of 1, this can
cause a problem in a linear model because of:
A. non-independence
B. non-normality
C. multicollinearity
D. all of the above
(b) (1 point) If $R^2 = 0$ in a linear model with one explanatory variable, what does that tell us?
A. $SSR = 0$
B. The proportion of variation explained by the linear model is zero.
C. There is no linear relationship between $x$ and $y$
D. all of the above
(c) (1 point) In hypothesis testing as the test statistic gets larger
A. The p value gets larger
B. The p value gets smaller
C. The p value may get smaller or larger
D. The linear relationship grows stronger
(d) (1 point) Non-parametric statistical tests are used
A. because they are fast
B. because they do not assume a specific distribution for the data
C. because they have a lower probability of Type I error
D. because they have a lower probability of Type II error
(e) (1 point) In one-way ANOVA, $MSTr$ represents:
A. The variation between groups
B. The variation within groups
C. The test statistic
D. The pooled sample standard deviation
(f) (1 point) A 90% confidence interval
A. is wider than a 95% confidence interval
B. contains the true parameter estimate with 90% probability
C. contains 90% of the data
D. is narrower than a 95% confidence interval
2. A study on reaction times was conducted. The reaction times of 21 professional athletes and 21 members of the general public were tested and recorded. A 90% confidence interval for the true mean difference in reaction time in milliseconds ($\mu_p - \mu_g$, professional - general) was found to be:
$$(-30, -10)$$
(a) (1 point) Is this a matched pairs or pooled standard deviation confidence interval?
Solution:
pooled standard deviation
(b) (1 point) What is the mean difference of the reaction times between professional
athletes and the general public?
Solution:
$$(\bar{x} - \text{critical value} \times SE) + (\bar{x} + \text{critical value} \times SE) = 2\bar{x}$$
$$-30 + (-10) = -40$$
$$\bar{x} = -40/2 = -20$$
(c) (1 point) What is the standard error that was used to build the confidence interval?
Solution:
$$t_{0.1/2,\,40} = 1.684$$
$$-20 - 1.684 \times SE = -30$$
$$SE = \frac{-10}{-1.684} = 5.94$$
(d) (2 points) Test the hypothesis that professional athletes have a lower reaction time than the general public using $\alpha = 0.05$.
Solution:
$H_0: \mu_p - \mu_g = 0$ vs. $H_a: \mu_p - \mu_g < 0$
$$t = \frac{-20}{5.94} = -3.37$$
This is a one-sided test, so the critical value is $-t_{0.05,\,40} = -1.684$. Since $-3.37 < -1.684$ we reject $H_0$, suggesting there is evidence that professional athletes have lower reaction times.
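The arithmetic in parts (b) to (d) can be checked with a short Python sketch. It assumes the table value $t_{0.05,\,40} = 1.684$ quoted in the solution; the variable names are illustrative only. The statistic is computed from the unrounded standard error, so it may differ from the hand calculation in the last digit.

```python
# Recover the point estimate and standard error from the reported 90% CI.
lower, upper = -30.0, -10.0   # 90% CI for mu_p - mu_g (from the question)
t_crit_90 = 1.684             # t_{0.05, 40}, table value quoted in the solution

mean_diff = (lower + upper) / 2    # midpoint of the interval
margin = (upper - lower) / 2       # half-width = critical value * SE
se = margin / t_crit_90            # standard error used to build the CI
t_obs = mean_diff / se             # test statistic for H0: mu_p - mu_g = 0

# One-sided test at alpha = 0.05: reject H0 when t_obs < -t_{0.05, 40}.
reject_h0 = t_obs < -t_crit_90

print(mean_diff, round(se, 2), round(t_obs, 2), reject_h0)
```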
3. Use the following two plots to answer the questions below:
[Left plot: "x vs. y" (x-axis: x, y-axis: y). Right plot: "Fitted Values vs. Residuals" (x-axis: Fitted Values, y-axis: Residuals).]
(a) (1 point) The point marked by the * in the left-hand plot is an outlier removed from the linear model. Would removing the outlier increase, decrease or have no effect on Pearson's correlation coefficient for $x$ and $y$?
Solution:
Increase
(b) (1 point) How would including the outlier impact the slope of the linear model?
Solution:
It would increase the slope
(c) (2 points) Based on the two plots does a linear model appear to fit the data well?
Why or why not?
Solution:
Based on the two plots a linear model seems like a poor fit. The residuals and the plot of the data suggest a non-linear trend between $x$ and $y$.
4. (2 points) Sketch an example of a fitted values vs. residual plot in the empty plot below
that breaks the assumption of constant variance.
[Empty plot: "Fitted Values vs. Residuals" (x-axis: Fitted Values, y-axis: Residuals)]
Solution:
Something like:
[Plot: "Fitted Values vs. Residuals" (x-axis: Fitted Values, y-axis: Residuals)]
5. A biologist spent a week collecting and categorizing various wild flower samples. They have noticed that all the samples are either red, blue or yellow in color and have four, eight or ten petals. The biologist is interested in determining whether or not there is a relationship between color and the number of petals. The table of the biologist's counts for each combination of factors is given below:
               petals
colors     eight   four   ten
blue         143     23    39
red           66     10    23
yellow        78      7    11
(a) (6 points) Test the biologist's question of interest using $\alpha = 0.1$:
Solution:
$H_0$: There is no relationship between number of petals and color vs. $H_a$: There is a relationship between number of petals and color.
Use the two-way contingency table test. First find the row and column sums as well as the total count. Row Sums:
blue   red   yellow
 205    99       96
Column Sums:
eight   four   ten
  287     40    73
And the total is 400.
Then find the expected counts for each cell:
$$e_{ij} = \frac{(i\text{th row count})(j\text{th column count})}{\text{total count}}$$
           eight   four     ten
blue      147.09   20.5    37.4
red        71.03    9.9   18.07
yellow     68.88    9.6   17.52
Then find the test statistic:
$$\chi^2_{obs} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} = 6.5273$$
Compare to the $\chi^2_{(r-1)(c-1)} = \chi^2_4$ distribution; the p-value is > 0.1 from the table.
We fail to reject the null hypothesis, so there is insufficient evidence of a relationship between color and number of petals.
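The contingency-table calculation above can be reproduced with a small Python sketch. The list layout and variable names are illustrative assumptions. Because the expected counts are not rounded here, the statistic comes out near 6.53, very slightly different from the hand-computed value.

```python
# Observed counts: rows are colors (blue, red, yellow),
# columns are petal counts (eight, four, ten).
observed = [
    [143, 23, 39],
    [66, 10, 23],
    [78, 7, 11],
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)

# Expected count for cell (i, j) under independence.
expected = [[r * c / total for c in col_sums] for r in row_sums]

# Pearson chi-square statistic summed over all cells.
chi2_obs = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(row_sums, col_sums, total, round(chi2_obs, 3), df)
```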
(b) (4 points) The biologist now wants to test the idea that blue flowers can be found 50% of the time and red and yellow flowers each 25% of the time. Perform the hypothesis test to help the biologist. Use $\alpha = 0.1$.
Solution:
$H_0: p_b = 0.5,\ p_r = 0.25,\ p_y = 0.25$ vs. $H_a$: the probabilities are not as stated.
First find expected counts:
$$n = 400$$
$$e_b = 400(0.5) = 200$$
$$e_y = e_r = 400(0.25) = 100$$
Then calculate the test statistic:
$$\chi^2_{obs} = \frac{(205 - 200)^2}{200} + \frac{(99 - 100)^2}{100} + \frac{(96 - 100)^2}{100} = 0.295$$
We compare this to the $\chi^2_2$ distribution; the p-value is greater than 0.1 from the table, so we fail to reject the null hypothesis.
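As a check on the goodness-of-fit arithmetic, here is a minimal Python sketch using the row sums from part (a). The names are illustrative, not part of the original solution.

```python
# Observed color counts (blue, red, yellow) and hypothesized proportions.
observed = [205, 99, 96]
probs = [0.5, 0.25, 0.25]

n = sum(observed)
expected = [n * p for p in probs]   # 200, 100, 100

# Chi-square goodness-of-fit statistic with (categories - 1) degrees of freedom.
chi2_obs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(expected, round(chi2_obs, 3), df)
```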
6. Use the following data to answer the questions below:
x    y
4   12
2    4
9   10
1    3
5    9
(a) (3 points) Calculate Pearson's correlation coefficient $r$. Say whether the data has a positive or negative linear relationship or none at all.
Solution:
$$\sum x = 21, \quad \sum x^2 = 127, \quad \sum y = 38, \quad \sum y^2 = 350, \quad \sum xy = 194$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 127 - \frac{(21)^2}{5} = 38.8$$
$$SS_{yy} = 350 - \frac{(38)^2}{5} = 61.2$$
$$SS_{xy} = \sum xy - \frac{1}{n} \sum x \sum y = 194 - \frac{1}{5}(21)(38) = 34.4$$
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{34.4}{\sqrt{38.8 \times 61.2}} = 0.706$$
There is a positive linear relationship.
(b) (3 points) Calculate the estimates of $\beta_1$ and $\beta_0$ for a linear model line of best fit and write down the equation of the line.
Solution:
$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{34.4}{38.8} = 0.887$$
$$\hat{\beta}_0 = \frac{\sum y}{n} - \hat{\beta}_1 \frac{\sum x}{n} = \frac{38}{5} - 0.887 \cdot \frac{21}{5} = 3.87$$
$$\hat{y} = 3.87 + 0.887x$$
(c) (1 point) Predict $y$ for when $x = 3$.
Solution:
$$\hat{y} = 3.87 + 0.887(3) = 6.531$$
(d) (1 point) Calculate the residual for $x = 4$.
Solution:
$$\hat{y} = 3.87 + 0.887(4) = 7.418$$
$$\hat{\varepsilon} = 12 - 7.418 = 4.582$$
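The whole of question 6 can be reproduced with a short Python sketch built on the same sums-of-squares formulas. The function name `predict` is a hypothetical helper. Since the coefficients are not rounded before use, the prediction and residual differ from the hand-computed values in the second or third decimal place.

```python
from math import sqrt

x = [4, 2, 9, 1, 5]
y = [12, 4, 10, 3, 9]
n = len(x)

# Corrected sums of squares and cross products.
ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n
ss_yy = sum(v * v for v in y) - sum(y) ** 2 / n
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

r = ss_xy / sqrt(ss_xx * ss_yy)            # Pearson correlation
b1 = ss_xy / ss_xx                         # slope estimate
b0 = sum(y) / n - b1 * sum(x) / n          # intercept estimate


def predict(x_new):
    """Fitted value from the least-squares line (hypothetical helper)."""
    return b0 + b1 * x_new


y_hat_3 = predict(3)
residual_at_4 = 12 - predict(4)            # observed y at x = 4 is 12

print(round(r, 3), round(b1, 3), round(b0, 2))
```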
7. (6 points) The biologist from question 5 is back and needs your help. For the last 8 years they have counted the number of offspring born to 3 different gorilla troops. They are interested in determining whether the average number of offspring each year among troops is different or not. Below is the number of offspring born in each troop for the past eight years:
troop1   troop2   troop3
     3        2        3
     1        2        2
     1        2        3
     1        4        1
     3        3        3
     2        1        1
     1        6        2
     2        3        0
Using $\alpha = 0.05$, determine whether the average number of offspring in each troop every year is different or not.
Solution:
Since this is count data, use the Kruskal-Wallis test.
$H_0: \mu_1 = \mu_2 = \mu_3$ vs. $H_a: \mu_i \neq \mu_j$ for some $i \neq j$
First rank the data from smallest to largest (tied values share the average of their ranks):
troop1   troop2   troop3
    19       12       19
     5       12       12
     5       12       19
     5       23        5
    19       19       19
    12        5        5
     5       24       12
    12       19        1
Get the average rank of each group:
troop1   10.25
troop2   15.75
troop3   11.5
Then calculate the test statistic:
$$K = \frac{12}{n(n+1)} \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{n+1}{2} \right)^2$$
$$K = \frac{12}{600} \left[ 8(10.25 - 12.5)^2 + 8(15.75 - 12.5)^2 + 8(11.5 - 12.5)^2 \right] = 2.66$$
Then we compare to $\chi^2_{k-1} = \chi^2_2$; the p-value is greater than 0.1, so we fail to reject the null hypothesis.
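The ranking step is the easiest place to slip, so here is a rough Python sketch of it. It assigns tied values the average of their positions and then evaluates the same $K$ formula as above, with no tie correction, to match the hand calculation. The dictionary layout is an illustrative assumption.

```python
troops = {
    "troop1": [3, 1, 1, 1, 3, 2, 1, 2],
    "troop2": [2, 2, 2, 4, 3, 1, 6, 3],
    "troop3": [3, 2, 3, 1, 3, 1, 2, 0],
}

# Pool and sort all observations, then give tied values their average rank.
pooled = sorted(v for group in troops.values() for v in group)
n = len(pooled)
rank_of = {}
for value in set(pooled):
    positions = [i + 1 for i, v in enumerate(pooled) if v == value]
    rank_of[value] = sum(positions) / len(positions)

mean_ranks = {
    name: sum(rank_of[v] for v in group) / len(group)
    for name, group in troops.items()
}

# Kruskal-Wallis statistic (no tie correction, as in the solution).
K = 12 / (n * (n + 1)) * sum(
    len(group) * (mean_ranks[name] - (n + 1) / 2) ** 2
    for name, group in troops.items()
)

print(mean_ranks, round(K, 2))
```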
8. Data was collected on the final selling price of 50 homes recently sold in the Halifax area. In addition, the number of bedrooms, the size of the house in square feet, and whether or not the house has a garage were recorded.
(a) (2 points) What kind of variable is required to represent whether or not the house has a garage in a linear model? Show how you would represent it in the model.
Solution:
It needs to be a dummy variable or an indicator variable.
It would be represented as something like (or the opposite)
$$x_{garage} = \begin{cases} 1 & \text{if it has a garage} \\ 0 & \text{if it does not have a garage} \end{cases}$$
(b) (2 points) The following linear models were fit to the data and the SSEs recorded. Let $y$ be the final sale price in tens of thousands, $x_1$ be the number of bedrooms, $x_2$ be the size of the house in square feet and $x_3$ be whether or not the house has a garage. Are all the models here nested within each other? Why or why not?
Model                                                                                      SSE
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_2 x_1 + \varepsilon$    14052.29
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$                      37210.43
$y = \beta_0 + \beta_2 x_2 + \varepsilon$                                                  1399015
$y = \beta_0 + \beta_1 x_1 + \varepsilon$                                                  497424.3
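Solution:
Yes, every model is nested within the first (largest) model. Setting $\beta_4 = 0$ gives the second model, setting $\beta_1$, $\beta_3$ and $\beta_4$ to zero gives the third model, and setting $\beta_2$, $\beta_3$ and $\beta_4$ to zero gives the fourth model. (The third and fourth models are not nested within one another, since neither can be obtained from the other by setting coefficients to zero.)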
(c) (4 points) Using the table in part (b), test whether the interaction between number of bedrooms and the square footage of the house is significant or not at $\alpha = 0.1$.
Solution:
Use the Partial F test.
$H_0: \beta_4 = 0$ vs. $H_a: \beta_4 \neq 0$
$$F_{obs} = \frac{(SSE_2 - SSE_1)/(k - m)}{MSE_1}$$
Here $k = 4$, $m = 3$
$$MSE_1 = \frac{SSE_1}{n - k - 1} = \frac{14052.29}{50 - 4 - 1} = 312.27$$
$$F_{obs} = \frac{(37210.43 - 14052.29)/(4 - 3)}{312.27} = 74.16$$
Comparing to $F_{k-m,\,n-k-1} = F_{1,45}$ shows that the p-value is less than 0.001 and the interaction is significant.
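The partial F statistic can be verified with a few lines of Python. This is only a sketch of the arithmetic: the variable names are illustrative, and the p-value lookup is still left to the F table.

```python
n = 50                              # number of homes

sse_full, k = 14052.29, 4           # model 1: four predictors incl. interaction
sse_reduced, m = 37210.43, 3        # model 2: interaction term dropped

mse_full = sse_full / (n - k - 1)   # error mean square of the full model
f_obs = ((sse_reduced - sse_full) / (k - m)) / mse_full

df_num, df_den = k - m, n - k - 1   # compare against F_{1, 45}

print(round(mse_full, 2), round(f_obs, 2), (df_num, df_den))
```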
(d) (4 points) For these linear models, $SST = 2022185$. Which of these models has the highest $R^2$ and the highest adjusted $R^2$?
Solution:
$$R^2_1 = 1 - \frac{14052.29}{2022185} = 0.9931$$
$$R^2_2 = 1 - \frac{37210.43}{2022185} = 0.9816$$
$$R^2_3 = 1 - \frac{1399015}{2022185} = 0.3081$$
$$R^2_4 = 1 - \frac{497424.3}{2022185} = 0.7540$$
$$R^2_{adj,1} = 1 - (1 - R^2_1) \frac{n - 1}{n - k - 1} = 1 - (1 - 0.9931) \frac{49}{45} = 0.9925$$
$$R^2_{adj,2} = 1 - (1 - R^2_2) \frac{n - 1}{n - k - 1} = 1 - (1 - 0.9816) \frac{49}{46} = 0.9804$$
$$R^2_{adj,3} = 1 - (1 - R^2_3) \frac{n - 1}{n - k - 1} = 1 - (1 - 0.3081) \frac{49}{48} = 0.2937$$
$$R^2_{adj,4} = 1 - (1 - R^2_4) \frac{n - 1}{n - k - 1} = 1 - (1 - 0.7540) \frac{49}{48} = 0.7489$$
Model 1 has the highest $R^2$ and adjusted $R^2$.
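A short Python sketch can tabulate $R^2$ and adjusted $R^2$ for the four models, using the SSE values from part (b) and the adjustment factor $(n-1)/(n-k-1)$, where $k$ is the number of predictors. The dictionary layout is an illustrative assumption.

```python
n = 50
sst = 2022185

# model number -> (SSE, number of predictors k), taken from the table in (b)
models = {
    1: (14052.29, 4),
    2: (37210.43, 3),
    3: (1399015, 1),
    4: (497424.3, 1),
}

results = {}
for label, (sse, k) in models.items():
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    results[label] = (r2, r2_adj)

best_r2 = max(results, key=lambda label: results[label][0])
best_adj = max(results, key=lambda label: results[label][1])

for label, (r2, r2_adj) in results.items():
    print(label, round(r2, 4), round(r2_adj, 4))
```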
9. An experiment was run testing the leavening power of 5 different brands of baking powder. The same cupcake recipe (making one dozen) was baked with each of the powders and the height of each cupcake measured (in mm). A linear model was fit to the data using R and the output is shown below.

Call:
lm(formula = y ~ powders)

Residuals:
    Min      1Q  Median      3Q     Max
-6.0810 -2.1314  0.3569  1.8447  6.4687

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     29.5154     0.8333  35.420  < 2e-16 ***
powderspowder2  20.1413     1.1785  17.091  < 2e-16 ***
powderspowder3  24.0130     1.1785  20.376  < 2e-16 ***
powderspowder4  15.6148     1.1785  13.250  < 2e-16 ***
powderspowder5   6.0756     1.1785   5.155 3.55e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.887 on 55 degrees of freedom
Multiple R-squared:  0.912, Adjusted R-squared:  0.9056
F-statistic: 142.5 on 4 and 55 DF,  p-value: < 2.2e-16

(a) (1 point) What is the mean height of cupcakes baked with baking powder 4?
Solution:
$$\bar{x}_4 = 29.5154 + 15.6148 = 45.1302 \text{ mm}$$
(b) (1 point) What is the mean height of cupcakes baked with baking powder 1?
Solution:
$$\bar{x}_1 = 29.5154 \text{ mm}$$
(c) (1 point) What is the SSE of the regression?
Solution:
$$SSE = (\text{Residual Std. Error})^2 \times df_{Err} = 2.887^2 \times 55 = 458.4123$$
(d) (3 points) Test whether baking powder number 5 increases average cupcake height by 5 mm or not using $\alpha = 0.05$.
Solution:
$H_0: \beta_4 = 5$ vs. $H_a: \beta_4 \neq 5$
$$t = \frac{\hat{\beta}_4 - \beta_{4,0}}{SE(\hat{\beta}_4)} = \frac{6.0756 - 5}{1.1785} = 0.913$$
Then compare to $t_{n-k-1} = t_{55}$. The one-tail area beyond 0.913 is between 0.1 and 0.25, so the two-sided p-value is between 0.2 and 0.5 and we fail to reject the null hypothesis.
(e) (1 point) Construct the 95% confidence interval for the slope of baking powder 3
Solution:
Since there is no $t_{55}$ on the table, round down and use $t_{0.05/2,\,40} = 2.021$:
$$24.0130 \pm 2.021 \times 1.1785 = (21.63, 26.39)$$
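The quantities in question 9 all come straight from the R output, so they can be recomputed with a brief Python sketch. The values are copied from the output above; the critical value 2.021 is the table value $t_{0.025,\,40}$ used in the solution, and the variable names are illustrative.

```python
# Values read from the R summary output.
intercept = 29.5154                       # mean height for powder 1 (baseline)
coef = {"powder2": 20.1413, "powder3": 24.0130,
        "powder4": 15.6148, "powder5": 6.0756}
se_coef = 1.1785                          # std. error shared by the powder terms
rse, df_err = 2.887, 55                   # residual standard error and its df

mean_powder4 = intercept + coef["powder4"]            # part (a)
sse = rse ** 2 * df_err                               # part (c)
t_obs = (coef["powder5"] - 5) / se_coef               # part (d), H0: beta_4 = 5

t_crit = 2.021                            # t_{0.025, 40} from the table, part (e)
ci_powder3 = (coef["powder3"] - t_crit * se_coef,
              coef["powder3"] + t_crit * se_coef)

print(round(mean_powder4, 4), round(sse, 4), round(t_obs, 3))
print(tuple(round(v, 2) for v in ci_powder3))
```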
10. A study was conducted with two factors, A and B, with 2 and 3 levels respectively. 15 replications of each combination of factors were used. The sums of squares for a two-way ANOVA with interaction are given below:
A             12.99
B             147.21
Interaction   2.02
Errors        358.75
(a) (2 points) Is the interaction significant using $\alpha = 0.05$?
Solution:
$H_0: \gamma_{ij} = 0$ for all $i, j$ vs. $H_a: \gamma_{ij} \neq 0$ for some $i, j$
The test for the interaction is
$$F_{obs} = \frac{MS_{Interaction}}{MSE}$$
The degrees of freedom of the interaction is
$$df_{inter} = (A - 1)(B - 1) = (1)(2) = 2$$
So
$$MS_{Interaction} = 2.02/2 = 1.01$$
The degrees of freedom for the errors is
$$df_{Err} = AB(K - 1) = (2)(3)(14) = 84$$
So
$$MSE = \frac{358.75}{84} = 4.27$$
and
$$F_{obs} = 1.01/4.27 = 0.237$$
This comes from an $F_{2,84}$ distribution, and using the more conservative 60 denominator degrees of freedom from the table we see that the p-value is greater than 0.25, so we fail to reject $H_0$ and the interaction is not significant.
(b) (4 points) Construct the two-way ANOVA table without the interaction term. Are any of the factor effects significant at $\alpha = 0.05$?
Solution:
Source   df            SS                      MS                 F
A        1             12.99                   12.99              12.99/4.2 = 3.09
B        2             147.21                  73.61              73.61/4.2 = 17.52
Error    84 + 2 = 86   358.75 + 2.02 = 360.8   360.8/86 = 4.20
From the F table, Factor A has a p-value > 0.05 and so is not significant (compare to $F_{1,86}$ but use $F_{1,60}$ from the table). Factor B has a p-value < 0.01 from the table (compare to $F_{2,86}$), which is significant.
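Both ANOVA tables in question 10 follow from the four sums of squares, which the Python sketch below recomputes. The names are illustrative. The F ratios use the unrounded mean squares, so they differ slightly from the table values, which divide by the rounded 4.2.

```python
a_levels, b_levels, reps = 2, 3, 15
ss_a, ss_b, ss_inter, ss_err = 12.99, 147.21, 2.02, 358.75

df_a = a_levels - 1
df_b = b_levels - 1
df_inter = df_a * df_b
df_err = a_levels * b_levels * (reps - 1)

# Part (a): F test for the interaction in the full model.
mse = ss_err / df_err
f_inter = (ss_inter / df_inter) / mse

# Part (b): drop the interaction, pooling its SS and df into the error term.
ss_err_add = ss_err + ss_inter
df_err_add = df_err + df_inter
mse_add = ss_err_add / df_err_add
f_a = (ss_a / df_a) / mse_add
f_b = (ss_b / df_b) / mse_add

print(df_inter, df_err, round(f_inter, 3))
print(df_err_add, round(mse_add, 2), round(f_a, 2), round(f_b, 2))
```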
11. A running coach has developed two new training plans. To assess the effectiveness of
her new plans she recorded the time it took for her students to run 10 km after a few
months of training. One group of 15 students were given the old training plan and used
as a control. The control students averaged 43 minutes with a standard deviation of 5
minutes. 16 students used the first new plan and they averaged 10 km in 40 minutes
with a standard deviation of 6 minutes. Her remaining 12 students used the other new
plan and averaged 10 km in 39 minutes with a standard deviation of 7 minutes.
(a) (4 points) Construct the 90% simultaneous confidence intervals for the average 10 km running time for students using the new plans vs. the old plan.
Solution:
90% CI, so $\alpha = 0.1$. Find the pooled variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2}{n_1 + n_2 + n_3 - k} = \frac{(14)5^2 + (15)6^2 + (11)7^2}{15 + 16 + 12 - 3} = \frac{1429}{40} = 35.725$$
We need to correct $\alpha$ to account for the fact that we are doing two comparisons:
$$\alpha^* = \frac{0.10}{2} = 0.05$$
Then $t_{\alpha^*/2,\,n-k} = t_{0.05/2,\,40} = 2.021$ from the t table.
Then the confidence intervals are:
$$(43 - 40) \pm 2.021 \times \sqrt{35.725} \sqrt{1/15 + 1/16} = (-1.341, 7.341)$$
and
$$(43 - 39) \pm 2.021 \times \sqrt{35.725} \sqrt{1/15 + 1/12} = (-0.678, 8.678)$$
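The pooled variance and the two Bonferroni-adjusted intervals can be checked with the Python sketch below. It assumes the table value $t_{0.025,\,40} = 2.021$ from the solution; the helper `ci_vs_control` and the group labels are hypothetical names introduced for illustration.

```python
from math import sqrt

# group -> (sample size, mean 10 km time in minutes, standard deviation)
groups = {
    "control": (15, 43, 5),
    "plan1": (16, 40, 6),
    "plan2": (12, 39, 7),
}
k = len(groups)
n_total = sum(n for n, _, _ in groups.values())

# Pooled variance across all three groups.
pooled_var = sum((n - 1) * s ** 2 for n, _, s in groups.values()) / (n_total - k)

alpha_star = 0.10 / 2     # Bonferroni correction for two comparisons
t_crit = 2.021            # t_{alpha_star / 2, 40}, table value from the solution


def ci_vs_control(name):
    """Simultaneous CI for (control mean - plan mean); hypothetical helper."""
    n_c, mean_c, _ = groups["control"]
    n_p, mean_p, _ = groups[name]
    margin = t_crit * sqrt(pooled_var) * sqrt(1 / n_c + 1 / n_p)
    diff = mean_c - mean_p
    return diff - margin, diff + margin


ci_plan1 = ci_vs_control("plan1")
ci_plan2 = ci_vs_control("plan2")
print(pooled_var, ci_plan1, ci_plan2)
```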
(b) (1 point) At $\alpha = 0.1$, do either of the training plans result in a statistically significant average running time different from the control plan?
Solution:
No, since both simultaneous 90% confidence intervals contain zero.