Final Exam Spring 2023
School: Columbia University
Course: 1101
Subject: Statistics
Date: Apr 3, 2024
Pages: 9
Uploaded by ChiefInternetGoldfish50
Name & UNI:
Stats 1101, Spring 2023
Final Exam
This exam will not completely reflect the Fall 2023 semester course because of the change in textbook and course structure. Vocabulary like “lurking variable” and concepts like QQ-plots will not be tested.
$z$-table (cumulative probabilities $P(Z \le z)$ for the standard normal distribution)
z      +0.00  +0.01  +0.02  +0.03  +0.04  +0.05  +0.06  +0.07  +0.08  +0.09
0.0   0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1   0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2   0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3   0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4   0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5   0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6   0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7   0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8   0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9   0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0   0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1   0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2   0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3   0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4   0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5   0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6   0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7   0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8   0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9   0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0   0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
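Each entry in the $z$-table is a value of the standard normal cumulative distribution function $\Phi(z) = P(Z \le z)$. A minimal sketch, assuming only the Python standard library, computes $\Phi$ from the error function using $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The helper name `phi` is illustrative and not part of the exam. Upper tails follow from $1 - \Phi(z)$, and negative $z$ values follow from symmetry, $\Phi(-z) = 1 - \Phi(z)$.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, P(Z <= z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Spot-check a few table entries, rounded to 4 decimals like the table.
for z in (0.00, 0.50, 1.00, 1.96, 2.00):
    print(f"z = {z:.2f}  Phi(z) = {phi(z):.4f}")
```

Rounded to four decimals, these values should match the corresponding table cells (for example, row 1.9, column +0.06 gives 0.9750).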
$\chi^2$ distribution critical values (upper tail; columns give the tail area $\alpha$)
d.f.     0.010     0.025      0.05      0.10
1       6.6349    5.0239    3.8415    2.7055
2       9.2103    7.3778    5.9915    4.6052
3      11.3449    9.3484    7.8147    6.2514
4      13.2767   11.1433    9.4877    7.7794
5      15.0863   12.8325   11.0705    9.2364
6      16.8119   14.4494   12.5916   10.6446
$t$ distribution critical values (two-tailed; columns give the total tail area $\alpha$)
d.f.      0.01      0.05      0.10
10      3.1693    2.2281    1.8125
20      2.8453    2.0860    1.7247
30      2.7500    2.0423    1.6973
40      2.7045    2.0211    1.6839
50      2.6778    2.0086    1.6759
Following the instructions on the previous page counts for three points.
Multiple Choice
Mark your answer in your blue book. Binary choice questions are each worth 2 points.
The other questions are worth 5 points each.
Figure 1: (figure not reproduced in this copy)
1. Figure 1 shows a symmetric distribution.
A. True
B. False
2. Based on Figure 1, the mean will be greater than the median.
A. True
B. False
3. In a randomized controlled experiment, the researcher knows and controls how the
assignment of factor levels (or treatment and control) is done.
A. True
B. False
4. A response variable is any variable positively correlated with an explanatory variable.
A. True
B. False
5. A researcher compares the rate of cycling accidents among children who wore a bicycle helmet at some point in a 24-hour period to children who never wore a helmet in that period. The researcher reports that wearing a helmet is associated with a 10% increase in bicycle accidents. A plausible lurking variable is
A. the baseline rate of bicycle accidents.
B. bicycle use.
Use the following information for the next five questions.
According to new education research, learning outcomes improve if there is a disco ball in the classroom. This is supported by observational data from different sections of the same class at a large university, and learning outcomes are measured by final-exam performance. All classes take the same final, but they have different instructors and meet in different classrooms. There are 100 students in disco-ball-having classes and 200 students in the unadorned classrooms.
6. Which of these belong to the control group?
A. Students who enrolled in a section with no disco ball in the classroom.
B. Students who don’t attend class.
7. Which of these least threatens the validity of treating this as a natural experiment?
A. Instructors did not choose if they wanted a disco ball in their classroom.
B. The sections with disco-ball-having classrooms were offered at the least desirable times.
8. Which of these two tests would be best to determine if the observed difference in
learning outcomes is statistically significant?
A. A two-sample $t$ test.
B. A paired $t$ test.
9. A two-sided hypothesis test is performed using $\alpha = 0.04$. Which of these is the most plausible critical value?
A. 2.03
B. 2.07
10. Suppose the test was instead performed using $\alpha = 0.10$. Relative to $\alpha = 0.04$, this increases the chance of what kind of mistake?
A. Mistakenly rejecting the null hypothesis.
B. Mistakenly failing to reject the null hypothesis.
11. What is the sample standard deviation for the values -1, 0, 1?
A. $\frac{2}{3}$
B. 1
12. What is the sample standard deviation for the values -1, -1, 0, 0, 1, 1?
A. $s < 1$
B. $s \geq 1$
13. What is the sample standard deviation for the values -1, 0, 4?
A. $s < 1$
B. $s \geq 1$
14. What is the sample variance for the values -2, 0, 2?
A. 2
B. 4
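Questions 11 through 14 depend on the sample variance and sample standard deviation. Both divide by $n - 1$ rather than $n$: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and $s = \sqrt{s^2}$. The following minimal Python sketch runs on made-up values rather than the exam's. It compares a hand-written version (the helper name `sample_variance` is illustrative) against the standard library's `statistics.variance` and `statistics.stdev`, which use the same $n - 1$ denominator.

```python
import math
import statistics

def sample_variance(values: list[float]) -> float:
    """Sample variance with the n - 1 (Bessel) denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [1.0, 2.0, 3.0, 4.0]  # illustrative values, not from the exam
s2 = sample_variance(data)
s = math.sqrt(s2)
print(s2, s)

# The standard library uses the same n - 1 convention.
print(statistics.variance(data), statistics.stdev(data))
```

Dividing by $n$ instead gives the population variance (`statistics.pvariance`), which is smaller for the same data.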
15. We conduct a two-sided $z$ test and find a test statistic of 0.04. For $\alpha = 0.05$, do we reject the null hypothesis?
A. Yes, reject the null.
B. No, fail to reject the null.
16. We conduct a two-sided $z$ test and find a P-value of 0.04. For $\alpha = 0.05$, do we reject the null hypothesis?
A. Yes, reject the null.
B. No, fail to reject the null.
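For a two-sided $z$ test, the P-value is $2\,(1 - \Phi(|z|))$. The null hypothesis is rejected when the P-value is at most $\alpha$, which is equivalent to $|z|$ reaching the critical value $z_{1-\alpha/2}$ (about 1.96 for $\alpha = 0.05$). This minimal sketch uses `statistics.NormalDist` from the Python standard library (3.8+). The helper names and the test statistics are hypothetical and are not taken from the questions above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def two_sided_p_value(z: float) -> float:
    """P-value for a two-sided z test: P(|Z| >= |z|) under the null."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))

def reject_null(z: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the two-sided P-value is at most alpha."""
    return two_sided_p_value(z) <= alpha

# Hypothetical test statistics for illustration.
for z in (0.5, 1.5, 2.5):
    print(z, round(two_sided_p_value(z), 4), reject_null(z))

# Two-sided critical value for alpha = 0.05: the 0.975 quantile of N(0, 1).
print(round(STD_NORMAL.inv_cdf(0.975), 4))
```

Comparing the P-value with $\alpha$ and comparing $|z|$ with the critical value always lead to the same decision.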
17. Hospital records show that 15% of all patients are admitted for surgical treatment,
8% are admitted for obstetrics, and 2% receive both obstetrics and surgical treatment.
What percentage are admitted for surgery or obstetrics?
A. 21%
B. 25%
18. Hospital records show that 15% of all patients are admitted for surgical treatment,
8% are admitted for obstetrics, and 2% receive both obstetrics and surgical treatment.
Are these events (admission for surgical treatment and admission for obstetrics) independent?
A. Yes, independent.
B. No, not independent.
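Questions 17 and 18 rest on two identities for events $A$ and $B$. The first is the addition rule, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. The second is the definition of independence, $P(A \cap B) = P(A)\,P(B)$. The following minimal Python sketch uses hypothetical probabilities rather than the hospital figures. The function names are illustrative.

```python
import math

def prob_union(p_a: float, p_b: float, p_both: float) -> float:
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both

def are_independent(p_a: float, p_b: float, p_both: float,
                    tol: float = 1e-9) -> bool:
    """A and B are independent exactly when P(A and B) = P(A) * P(B)."""
    return math.isclose(p_both, p_a * p_b, abs_tol=tol)

# Hypothetical probabilities for illustration only.
print(prob_union(0.30, 0.20, 0.06))       # union probability
print(are_independent(0.30, 0.20, 0.06))  # compares 0.06 with 0.30 * 0.20
print(are_independent(0.30, 0.20, 0.10))  # overlap larger than the product
```

The tolerance in `math.isclose` guards against floating-point rounding in the product.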
19. Which of these is not an assumption for simple linear regression?
A. $r_{xy} \neq 0$
B. There is a linear relationship between $x$ and $y$.
20. Which of these is not an assumption for inference for linear regression?
A. $r_{xy} > 0.5$
B. The residuals are approximately normally distributed.
21. Suppose that $\hat{y} = 1 + x_1$, but that in alternative models, $\hat{y} = 1 + 0x_1 + 2x_2$ and $\hat{y} = 1 + 2x_2$.
A. The residuals from $\hat{x}_1 = b_0 + b_1 x_2$ have no linear relationship with the residuals from $\hat{y} = 1 + 2x_2$.
B. The residuals from $\hat{x}_2 = b_0 + b_1 x_1$ have no linear relationship with the residuals from $\hat{y} = 1 + x_1$.
22. QQ-plots are only valid when the data is normally distributed.
A. True
B. False
23. After conducting a one-sided hypothesis test with a null hypothesis of $\mu = 2$, you find a P-value of 0.99. Suppose the sample mean was 1. Which of these could have been the alternative hypothesis?
A. $\mu > 2$
B. $\mu < 2$
24. Which of these is most indicative of an overfit model?
A. The correlation coefficient between $x$ and $y$ is $r = 1$.
B. The $R^2$ falls substantially when it is calculated on test data rather than on training data.
Use the following information for the next five questions.
$X$ is drawn from a normal distribution with mean zero and standard deviation one. $Y$ is equal to $X$ if $X$ is positive. If $X$ is negative, then $Y = 1 - X$.
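Probabilities involving $Y$ can be sanity-checked by simulation. This minimal Python sketch draws $X$ from $N(0, 1)$ and builds $Y$ as described ($Y = X$ for positive $X$, $Y = 1 - X$ otherwise). It then estimates a probability as the fraction of draws in which the event occurs. A conditional probability is estimated the same way, keeping only the draws where the condition holds.

The events used here are illustrative and are not the exam's questions:
* $P(Y > 1) = P(X > 1) + P(X < 0) \approx 0.6587$
* $P(Y > 1 \mid X > 0) = P(X > 1)/P(X > 0) \approx 0.3173$

The function names, seed, and sample size are arbitrary choices.

```python
import random
from typing import Callable

Event = Callable[[float, float], bool]

def draw_xy(rng: random.Random) -> tuple[float, float]:
    """Draw X ~ N(0, 1) and build Y from X as in the setup."""
    x = rng.gauss(0.0, 1.0)
    # Y = X when X is positive, Y = 1 - X otherwise.
    # (X == 0 has probability zero; it falls in the second branch here.)
    y = x if x > 0 else 1.0 - x
    return x, y

def estimate(event: Event, n: int = 200_000, seed: int = 2023) -> float:
    """Monte Carlo estimate of P(event(X, Y))."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if event(*draw_xy(rng)))
    return hits / n

def estimate_conditional(event: Event, given: Event,
                         n: int = 200_000, seed: int = 2023) -> float:
    """Monte Carlo estimate of P(event | given) using only draws where `given` holds."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        x, y = draw_xy(rng)
        if given(x, y):
            kept += 1
            hits += event(x, y)
    if kept == 0:
        raise ValueError("conditioning event never occurred")
    return hits / kept

# Illustrative events only.
print(estimate(lambda x, y: y > 1.0))
print(estimate_conditional(lambda x, y: y > 1.0, lambda x, y: x > 0))
```

With 200,000 draws, the standard error of each estimate is roughly 0.001 to 0.0015, so the printed values should sit close to the exact ones.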
25. $X$ and $Y$ are independent.
A. True
B. False
26. What is $P(X > 1)$?
A. 0.16
B. 0.84
27. What is $P\left(Y > \frac{1}{2} \mid |X| > \frac{1}{2}\right)$?
A. $\frac{1}{2}$
B. 1
28. What is $P(X < 2)$?
A. .0250
B. .0500
C. .9500
D. .9772
E. .9902
29. What is $P(Y > 2 \mid |X| > 0.675)$?
A. Below .05
B. Between .05 and 0.32
C. Between 0.32 and 0.4
D. Between 0.4 and 0.5
E. Above 0.5
30. Which of these tests would you expect to have the highest power? Alpha ($\alpha$) refers to the significance level for the test.
A. A test with many observations and high alpha.
B. A test with few observations and low alpha.
C. A test with many observations and low alpha.
D. A test with few observations and high alpha.
E. Any test with $\alpha = 0.05$.
31. A null hypothesis is rejected with a P-value of 0.01. Which of these statements is true?
A. There is just a 1% chance that the null hypothesis is true.
B. If the null hypothesis is true, there is a 99% chance the sample data would
be observed.
C. If the null hypothesis is true, there is a 1% chance of observing a sample
statistic as extreme.
D. If the alternative hypothesis is true, we would fail to reject the null 1% of the
time.
E. A and C
32. Which of these would not change the $R^2$ in a linear regression, relative to a baseline model with ten observations, one quantitative predictor, and one indicator variable? You are comparing the baseline $R^2$ to that in a newly fitted model. Assume the slope on the predictor variable is not zero in the baseline.
A. Adding seven additional predictor variables that are pure noise, being unrelated to the dependent variable.
B. Dropping the point with the greatest residual.
C. Dropping the point with the smallest residual.
D. Reversing the indicator variable so that 1s are now coded as 0s and 0s are now coded as 1s.
E. B and C
33. A logistic regression is preferred to a simple linear regression when
A. The equal variance condition is not satisfied.
B. The straight-enough condition is satisfied.
C. The dependent variable only takes on two values.
D. The independent variable only takes on two values.
E. All of the above.
Use the following information for the next four questions.
An experiment is run where some participants receive therapy, some receive a puppy, some receive both therapy and a puppy, and a control group receives nothing. Researchers are interested in the effect on $y$, a measure of antisocial behavior, where a higher $y$ means more antisocial behavior. Consider three models fit to the data. Each controls for participant income, because one treatment group may have a higher average income.
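$$\hat{y} = a_0 + a_1\,\mathrm{Income} + a_2\,\mathrm{TherapyOnly} + a_3\,\mathrm{PuppyOnly} + a_4\,\mathrm{TherapyAndPuppy} \tag{1}$$
$$\hat{y} = b_0 + b_1\,\mathrm{Income} + b_2\,\mathrm{Therapy} + b_3\,\mathrm{Puppy} \tag{2}$$
$$\hat{y} = c_0 + c_1\,\mathrm{Income} + c_2\,\mathrm{TherapyAndOrPuppy} \tag{3}$$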
34. Using model 1, what is the predicted value of $y$ for a participant with $\mathrm{Income} = 1$ who was in the therapy-only group?
A. $a_0$
B. $a_0 + a_2$
C. $a_0 + a_1 + a_2$
D. $a_1 + a_2$
E. $a_2$
35. Which of these is necessarily true?
A. $a_1 = b_1$
B. $a_4 = a_2 + a_3$
C. $b_2 + b_3 = c_2$
D. $b_2 = a_2 - a_3$
E. None of the above
Open Response
36. (5 points) Refer again to the setup for question 34. Suppose $a_4 < 0$. What is the interpretation? You do not need to address statistical significance.
37. (5 points) Refer again to the setup for question 34. Make an argument for why the $R^2$ from model 1 should be at least as high as the $R^2$ from model 3.
38. (5 points) Randomized experiments are generally considered to be more trustworthy than observational studies as causal evidence. Give one reason with a brief explanation.
39. (4 points) Describe a practical reason why researchers might rely on observational
data instead of an experiment.
40. (5 points) Jonathan argues that social media is bad for mental health. Noah argues that social media isn’t inherently dangerous, but that having it accessible on a smartphone is a problem. An experiment had some participants deactivate their social media accounts and others continued with normal social media use. The assignment of social media vs. no social media was done randomly. At the end, those who deactivated their accounts were shown to have better mental health. Does the experiment provide strong evidence for either’s claim? Provide a brief explanation.
41. (20 points) Students can fail a course because of deficient understanding or because
of difficult circumstances. A student of deficient understanding will fail 100% of the
time. Students who understand the material will encounter bad luck that causes them
to fail 10% of the time. Think of this as a hypothesis test. Let the null hypothesis be
that a student understands the material. When a student fails, we can say that we’ve
rejected the null hypothesis.
a.) A student of deficient understanding will fail 100% of the time. What statistical
concept does this correspond to? You do not have to provide an explanation.
b.) A student who understands the material will fail 10% of the time. What statistical concept does this correspond to? You do not have to provide an explanation.
c.) A professor concludes that, because a student failed, a student can be judged
as having deficient understanding with 100% certainty. What kind of mistake is
this? Provide a brief explanation that describes the mistake.
d.) Calculate the probability that a failing student has a deficient understanding if
we start with a belief that 10% of students will have a deficient understanding.
Show your work.
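Part d.) is an application of Bayes' rule:

$$P(\text{deficient} \mid \text{fail}) = \frac{P(\text{fail} \mid \text{deficient})\,P(\text{deficient})}{P(\text{fail} \mid \text{deficient})\,P(\text{deficient}) + P(\text{fail} \mid \text{understands})\,P(\text{understands})}$$

The following is a minimal Python sketch. The 100% and 10% failure rates come from the question. The 20% prior is a hypothetical value for illustration, and the function name is illustrative.

```python
def posterior_deficient(prior_deficient: float,
                        p_fail_if_deficient: float,
                        p_fail_if_understands: float) -> float:
    """Bayes' rule: P(deficient | fail)."""
    p_understands = 1.0 - prior_deficient
    # Total probability of failing, across both kinds of student.
    p_fail = (p_fail_if_deficient * prior_deficient
              + p_fail_if_understands * p_understands)
    return p_fail_if_deficient * prior_deficient / p_fail

# Failure rates from the question; the 20% prior is hypothetical.
print(round(posterior_deficient(0.20, 1.00, 0.10), 4))
```

The denominator is the law of total probability. Every failing student is either deficient or understands the material.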
42. (30 points) Researchers want to know if a preference for dogs or cats is related to the
preference for coffee or tea. Researchers ask people leaving a convention center in Las
Vegas. The researchers’ population of interest is all residents of Las Vegas.
Sixteen out of thirty-two respondents prefer coffee to tea. Eighteen out of thirty-two
respondents prefer cats to dogs.
a.) Construct two different $2 \times 2$ contingency tables that are consistent with the above.
b.) Assuming independence between the two variables, find the expected cell
counts.
c.) Explain whether each of the assumptions for a $\chi^2$ test is met, including whether the sample is representative of the population of interest.
d.) Suppose that twelve of the thirty-two subjects prefer cats and prefer coffee. Calculate the $\chi^2$ statistic.
e.) State the relevant critical value if $\alpha = 0.05$.
f.) Report the conclusion of the $\chi^2$ test.
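Parts b.) and d.) use two formulas:
* Expected counts under independence: $E_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{N}$
* Pearson statistic: $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

The statistic is compared with the critical value for $(r - 1)(c - 1)$ degrees of freedom, which is 1 for a $2 \times 2$ table. The following minimal Python sketch runs on a hypothetical table, not the survey data. It computes the plain Pearson statistic with no continuity correction, and the function names are illustrative.

```python
def expected_counts(table: list[list[float]]) -> list[list[float]]:
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(table: list[list[float]]) -> float:
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical 2 x 2 table of observed counts (two yes/no traits).
observed = [[20, 10],
            [15, 25]]
print(expected_counts(observed))

stat = chi_square_statistic(observed)
critical = 3.8415  # 1 d.f., alpha = 0.05, from the chi-square table above
print(round(stat, 4), "reject" if stat > critical else "fail to reject")
```

The usual rule of thumb for this approximation is that every expected count should be at least 5.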