PSCI 200 Final Exam
Fall 2023, Last Name: M-Z
There are four sections to this exam: T/F, MC, Problems, and Data Analysis.
T/F and MC: Write your answers on this exam. Turn in the exam hardcopy.
Problems: Write your answers on this exam or submit them as part of an R script.
Data Analysis: Submit both (a) your R script and (b) the compiled pdf version of it.
For the Problems and Data Analysis sections, it is important that you show as much of your work as possible. Unless otherwise noted, if a question involves multiple steps and you simply write down a number for the answer, you will not receive full credit. In the Data Analysis section, you should submit your R code and the results it produced. Do not simply handwrite the R code and output and submit a picture of that.
Unless other arrangements have been made, you have 3 hours to complete this exam, including submitting it via Blackboard. Be sure to leave sufficient time to combine your answers and upload them to Blackboard. Everyone should turn in a hardcopy of this exam with their printed name and signature at the bottom.
For this exam, you may refer to the course texts, lecture slides, your lecture notes, workshops,
HW’s and their answers, practice sessions, and R help pages (in R, not online). You may
refer to any course material on Blackboard — except for the datadatabobata website. In
general, the exam is closed internet. You are not allowed to search the internet for answers
or refer to information on any websites, except as mentioned above. You are not allowed to
use any type of AI website or app (e.g., ChatGPT). You may not communicate with any
other individual (except for the TAs or me) about the questions and their answers.
Pace yourself appropriately. For your reference, the approximate time to spend on each section is:
True/False         25 min
Multiple Choice    15 min
Problems           60 min
Data Analysis      45 min
If you are having difficulty with a question, skip it and finish the other questions. Then come back to it. If you cannot finish a problem, show as much work as possible. If you have a question, please raise your hand or come to the front of the room.
Honor Pledge
Before beginning the exam, you are required to print your name and sign the honor pledge below.
“I affirm that I will not give or receive any unauthorized help on this exam, and that all work will be my own.”
Name:
Signature:
FAQ
Q: What do you mean by the term ____?
A: If it’s a technical term or abbreviation (like “pdf” or “RV”) I can’t answer the question. It’s in
the lecture notes.
Q: Does my answer look correct?
Q: For this question, do you want us to use ____ method/equation/command?
A: I can’t answer questions like these during the exam.
Q: Can I use the restroom?
A: You may use the restroom whenever you’d like. No need to ask me.
Q: What does calculate “by hand” mean?
A: There may be a single R command that will perform the entire calculation for you. However,
in order to receive full credit, you need to show your work using “simpler” R commands and/or
operators corresponding to the parts of an equation shown in lecture.
Q: When you ask us to “plot variable1 versus variable2” or “plot variable1 as a function of variable2,” which variable should I use for the x and y axes?
A: I can’t answer that. There are many examples in lecture notes, practice sessions, HW’s, and
workshops.
Q: Can I check my answer using a canned R command?
A: Yes. However, I don’t need to see that part, unless I specifically asked.
Q: Can I use a cool R command I learned in another class to answer a question?
A: It depends. If the command does something relatively simple — e.g., sum the rows in a table —
and you use it as one step in a multi-step calculation, it is likely fine to use on the exam. However,
if the command performs many calculations and you don’t show your work for the multiple steps,
then you will not get credit for using such a command.
Q: Do you only have two shirts (one black, one white) that you alternate between?
A: That’s a completely irrelevant, but really good question. I actually have 4-5 of each color.
Not very creative, I know, but it works for me. Still, thanks for noticing...
True/False
(1 point each. 19 points total)
Write the correct answer (T or F) for each statement.
1. All categorical variables are discrete variables.
2. Consider a specific value $y$ of a random variable $Y$. The $Z$-score for $y$ represents how many standard deviations $y$ is from the mean of $Y$.
3. In most election polls, the margin of error for an estimated proportion is typically the
width of its 95% confidence interval.
4. For small samples ($n < 50$), we use the Normal distribution when conducting an hypothesis test concerning the population mean.
5. The central limit theorem (CLT) states that as the sample size $n \to \infty$, the sample mean $\bar{Y}_n$ becomes distributed Normal[$E(Y)$, $V(Y)/n$].
6. The fundamental problem of causal inference is that we usually only observe one of the
potential outcomes in an experiment or for observational data.
7. The correlation between $X$ and $Y$ is standardized to values between 0 and 1.
8. The median is sensitive to outliers.
9. Standard deviation is a measure of central tendency.
10. In an experiment, random assignment to treatment and control groups mitigates threats
to internal validity.
11. If our data is a random sample $X = \{x_1, x_2, ..., x_n\}$, then the sample mean $\bar{X}$ is a random variable.
12. A p-value is the probability of observing a value at least as large in magnitude as the test
statistic, under the assumption that the null hypothesis is true.
13. A correlation between $X$ and $Y$ higher than .95 is a strong indicator of causation.
14. In constructing a confidence interval, we calculate the test statistic assuming the null
hypothesis is true.
15. A cumulative probability value must be between 0 and 1.
16. In an hypothesis test, Type II error occurs when the null hypothesis is true but we reject
it.
17. In a regression, a residual $\hat{e}_i$ is the difference between the observed value $y_i$ and the predicted value $\hat{y}_i$.
18. A RV $Y$ that is distributed Bernoulli(.3) has variance $V(Y) = .21$.
19. A study’s research design influences whether we can make causal vs associational claims about the relationship between an outcome variable $Y$ and an independent (or treatment) variable $X$.
Multiple Choice
(1 point each. 9 points total)
Write or circle the letter for the correct answer for each problem.
1. Suppose a RV $X$ has mean $E(X) = 2$ and variance $V(X) = 200$. In repeated random sampling of $n = 400$ observations, the sample mean $\bar{X}$ will be approximately distributed ____.
(a) Uniform with $E(\bar{X}) = 2$, $V(\bar{X}) = 10$
(b) Normal with $E(\bar{X}) = 2$, $V(\bar{X}) = 10$
(c) Normal with $E(\bar{X}) = 2$, $V(\bar{X}) = .5$
(d) Bernoulli with $E(\bar{X}) = .2$, $V(\bar{X}) = .16$
2. Assume we estimate the bivariate regression $y = \beta_0 + \beta_1 x + \epsilon$. Which of the following statements are true about the coefficient of determination, $r^2$?
(a) $0 \le r^2 \le 1$
(b) $r^2 = [Cor(x, y)]^2$
(c) $r^2 = [Cor(y, \hat{y})]^2$
(d) All of the above
3. Which of the following is not a common threat to the internal validity of a study?
(a) Nonrandom sample selection from the population.
(b) Poor measurement of outcomes.
(c) Not having a control group.
(d) Nonrandom assignment of subjects to treatment and control groups.
4. In a classical hypothesis test, when our test statistic does not fall in the rejection region, we ____.
(a) Accept the null hypothesis
(b) Reject the null hypothesis
(c) Fail to reject the alternative hypothesis
(d) Fail to reject the null hypothesis
5. Which of the following is often a companion to a dataset and provides a description of
the variables in the data?
(a) Hypertext
(b) Field guide
(c) Codebook
(d) None of the above
6. The Ordinary Least Squares (OLS) coefficient estimates are those that
(a) Maximize $R^2$
(b) Minimize the total sum of squares
(c) Minimize the sum of squared errors
(d) All of the above
7. Suppose RV $Z$ has a Standardized Normal distribution. Denote the pdf as $f(z)$ and the cdf as $F(z)$. Which of the following is not true:
(a) $f(z) = f(-z)$
(b) $\Pr(Z = 2.576) = .01$
(c) $\sqrt{V(Z)} = 1$
(d) $F(0) = .5$
8. Which of the following can affect how people respond to surveys?
(a) The ecological fallacy
(b) Question wording
(c) Heteroskedasticity
(d) All of the above
9. Suppose a survey dataset contains a variable relig that codes the religion or belief system the respondent most closely identifies with: Atheist, Buddhist, Christian, Hindu, Jew, Muslim, Other. Which of the following describes the variable relig?
(a) Discrete
(b) Categorical
(c) Nominal
(d) All of the above
Problems
(26 points total)
You may write your answers on the exam or submit them as part of a compiled R script. In
either case, you must show your work in order to receive full credit.
1. The RV $Y$ can take values $\{1, 2, 3, 4\}$. Suppose you’re presented with the following (incomplete) probability mass function (pmf):
y            1    2    3    4
Pr(Y = y)    ?    .4   .1   .1
(a) (1pt) What must $\Pr(Y = 1)$ be in order for the above to be a proper pmf?
(b) (2pt) Calculate $E(Y)$.
(c) (2pt) Calculate $V(Y)$.
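A minimal R sketch of one way to work Problem 1, not an official answer key. It relies only on the facts that a pmf must sum to 1, that $E(Y) = \sum_y y \Pr(Y = y)$, and that $V(Y) = \sum_y (y - E(Y))^2 \Pr(Y = y)$. The variable names (`known`, `p1`, `EY`, `VY`) are illustrative choices, not names used in the course.

```r
# Support of Y and the three probabilities given in the table
y = c(1, 2, 3, 4)
known = c(0.4, 0.1, 0.1)   # Pr(Y = 2), Pr(Y = 3), Pr(Y = 4)

# (a) A proper pmf sums to 1, so the missing entry is the remainder
p1 = 1 - sum(known)
p = c(p1, known)

# (b) Expected value: each value weighted by its probability
EY = sum(y * p)

# (c) Variance: squared deviations from E(Y), weighted by probability
VY = sum((y - EY)^2 * p)

print(c(p1 = p1, EY = EY, VY = VY))
```

Computing the variance as $E(Y^2) - [E(Y)]^2$ should give the same number and makes a convenient check.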
2. Consider the following sample of data for the variable $X$: $\{3, 2, 1, 3, 20, 4, 3, 2, 1, 3\}$
Find/calculate “by hand” the following descriptive statistics for this sample. You may use only the following operators and commands for your answer: =, +, -, /, *, ^, sum(), sort(), and table(). For each descriptive statistic, show how you calculated it, whether as an equation, R code, or a short description (no more than 1-2 sentences).
(a) (2pt) mean
(b) (2pt) median
(c) (1pt) mode
(d) (2pt) variance
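The "by hand" calculations in Problem 2 might be sketched in R as follows. This is an illustration under stated assumptions, not the graded solution: `c()`, `print()`, and `[ ]` indexing go slightly beyond the listed operators and are used only to enter the data and read off values, the sample size is counted by eye rather than computed, and the variance uses the sample ($n - 1$) denominator.

```r
x = c(3, 2, 1, 3, 20, 4, 3, 2, 1, 3)
n = 10                      # counted directly from the sample

# (a) Mean: sum of the observations divided by n
xbar = sum(x) / n

# (b) Median: n is even, so average the two middle values after sorting
xs = sort(x)
med = (xs[5] + xs[6]) / 2

# (c) Mode: the value with the largest count in the frequency table
counts = table(x)
print(counts)

# (d) Sample variance: sum of squared deviations divided by n - 1
s2 = sum((x - xbar)^2) / (n - 1)

print(c(mean = xbar, median = med, variance = s2))
```

The single large value (20) pulls the mean well above the median, which is why the two measures of central tendency differ for this sample.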
3. In a recent poll conducted by PEW, respondents were asked if Dua Lipa’s song “Houdini”
would sound better if it was instead sung by Lady Gaga. Of the $n = 700$ people randomly
sampled for the survey, 57% said that a Lady Gaga version would sound better.
(a) (3pt) Construct a 95% confidence interval for the proportion supporting a Lady Gaga
version of “Houdini.”
(b) (1pt) What is the margin of error for the PEW poll?
(c) (2pt) Suppose PEW wanted to guarantee a 1% margin of error for a 97% CI.
How many people would they need to survey?
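One possible way to set up Problem 3 in R is sketched below. It assumes the usual large-sample Normal approximation for a proportion, takes the margin of error to be the half-width of the 95% interval, and reads "guarantee" in part (c) as the worst case $p = 0.5$. Those readings are assumptions, as are the variable names.

```r
n = 700
phat = 0.57

# Standard error of the sample proportion
se = sqrt(phat * (1 - phat) / n)

# (a) 95% CI: phat plus or minus the critical value times the SE
z95 = qnorm(0.975)
ci = c(lower = phat - z95 * se, upper = phat + z95 * se)

# (b) Margin of error, taken here as the half-width of the 95% CI
moe = z95 * se

# (c) Sample size for a 1% margin of error at 97% confidence,
#     using p = 0.5 (assumed worst case, which maximizes p * (1 - p))
z97 = qnorm(1 - 0.03 / 2)
n_needed = ceiling((z97 / 0.01)^2 * 0.5 * 0.5)

print(ci)
print(c(moe = moe, n_needed = n_needed))
```

Plugging in $\hat{p} = .57$ instead of $.5$ in part (c) would give a somewhat smaller required sample, but the margin of error would then no longer hold for every possible true proportion.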
4. In a Nov 21 survey by Bright Line Watch, randomly sampled respondents were asked if
they supported a ban on abortion. Below is a cross-tabulation of whether the respondent
self-identified as ideologically Conservative (No, Yes) and their support for an abortion
ban (No, Yes).
                      Abortion Ban?
                      No      Yes
Conservative?  No     1831    263
               Yes    425     477
(a) (1pt) What proportion of the respondents supported an abortion ban?
(b) (1pt) Among nonconservatives, what proportion supported an abortion ban?
(c) (1pt) Among conservatives, what proportion supported an abortion ban?
(d) (1pt) What is the difference in proportions supporting an abortion ban for these two groups (conservatives vs nonconservatives)? Which group supports an abortion ban in higher proportion?
(e) (4pts) Test the hypothesis that there is no difference between the two groups in the proportion supporting an abortion ban. Formally state (write down) the null and alternative hypotheses. Calculate an appropriate test statistic. Calculate the p-value for the test statistic. Would you reject the null hypothesis (no difference) at the $\alpha = .05$ level of significance?
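The calculations in Problem 4 could be sketched in R as shown here. The sketch assumes a two-sided two-sample z-test for proportions with a pooled standard error, which is one common choice under $H_0: p_1 = p_0$. The course may have taught an unpooled version, so treat the test statistic as illustrative. Object names such as `tab` and `ppool` are hypothetical.

```r
# Cross-tabulation from the exam: rows = Conservative?, columns = Abortion Ban?
tab = matrix(c(1831, 263,
                425, 477),
             nrow = 2, byrow = TRUE,
             dimnames = list(Conservative = c("No", "Yes"),
                             Ban = c("No", "Yes")))

n0 = sum(tab["No", ])           # number of nonconservatives
n1 = sum(tab["Yes", ])          # number of conservatives

# (a) Overall proportion supporting a ban
ppool = sum(tab[, "Yes"]) / sum(tab)

# (b), (c) Proportion supporting a ban within each group
p0 = tab["No", "Yes"] / n0
p1 = tab["Yes", "Yes"] / n1

# (d) Difference in proportions (conservatives minus nonconservatives)
diff = p1 - p0

# (e) H0: p1 - p0 = 0 versus H1: p1 - p0 != 0, pooled SE under H0
se = sqrt(ppool * (1 - ppool) * (1 / n0 + 1 / n1))
z = diff / se
pval = 2 * pnorm(-abs(z))

print(c(overall = ppool, noncons = p0, cons = p1, diff = diff))
print(c(z = z, pval = pval))
```

The null is rejected at the $\alpha = .05$ level when `pval` is below .05, or equivalently when `abs(z)` exceeds `qnorm(0.975)`.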
Data Analysis
(16 points total)
In the final exam folder on Blackboard, you will find the dataset south60 2023.rdata. Each row in the data represents a county in a southern US state in 1960. The dependent variable blackregis is the county-level percentage (0-100) of eligible Black voters that were registered to vote in 1960.
There are three explanatory variables:
whiteschoolyrs       Median years of education among White residents of the county.
nonwhiteschoolyrs    Median years of education among non-White residents of the county.
polltax              Whether the county required residents to pay a fee to vote: yes=1, no=0.
Use this dataset for the following problems. Submit your answers to these questions as both (a) an R script and (b) as a compiled pdf of the R script. Unless otherwise stated in a question, you must show the R code and its output in order to receive full credit.
1. Descriptive statistics
(a) (1pt) Create a stargazer table of descriptive statistics for the variables in the dataset.
(You can ignore the state and county identifiers.)
Refer to the stargazer table to
answer the next two questions.
(b) (1pt) What is the average Black voter registration in southern counties in 1960?
(c) (1pt) What proportion of counties have a poll tax?
2. (2pt) Calculate “by hand” the correlation between nonwhiteschoolyrs and whiteschoolyrs. Interpret the correlation. Is it a strong, moderate, or weak correlation? Positive or negative?
3. Suppose you’re interested in whether a county’s Black voter registration is related to the level of education among the county’s White residents.
(a) (2pt) Consider the bivariate regression
$\text{blackregis} = \beta_0 + \beta_1\,\text{whiteschoolyrs} + \epsilon$
Calculate the OLS estimates for $\beta_0$ and $\beta_1$ “by hand” (i.e., in R, but without using a command like lm()). Confirm your results using lm().
(b) (2pt) Create a scatterplot of blackregis versus whiteschoolyrs. Add the lm() regression line (in red) to the plot. Is a county’s Black voter registration positively or negatively associated with median years of education among White residents?
(c) (1pt) Calculate the predicted Black voter registration for a county where whiteschoolyrs is 10 years.
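The "by hand" OLS calculation asked for in question 3 might look roughly like the sketch below. The exam dataset is not reproduced here, so the code runs on simulated stand-in data: `x` and `y` are hypothetical placeholders for whiteschoolyrs and blackregis, and the numbers it prints say nothing about the real counties. It uses the textbook bivariate formulas $\hat\beta_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$.

```r
# Simulated stand-in data (NOT the exam dataset; values are arbitrary)
set.seed(1)
x = runif(50, 6, 12)                 # placeholder for whiteschoolyrs
y = 5 + 3 * x + rnorm(50, sd = 4)    # placeholder for blackregis

# Sample means
xbar = sum(x) / length(x)
ybar = sum(y) / length(y)

# OLS slope and intercept from the bivariate formulas
b1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
b0 = ybar - b1 * xbar

# Confirm against lm()
fit = lm(y ~ x)
print(rbind(by_hand = c(b0, b1), lm = coef(fit)))

# Predicted outcome when the regressor equals 10
yhat10 = b0 + b1 * 10
print(yhat10)
```

With the real data, the same steps would apply after replacing `x` and `y` with the corresponding columns of the loaded data frame.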
4. Now consider the multiple regression
$\text{blackregis} = \beta_0 + \beta_1\,\text{whiteschoolyrs} + \beta_2\,\text{nonwhiteschoolyrs} + \beta_3\,\text{polltax} + \epsilon$
(a) (2pt) Use lm() to estimate the regression. Print the OLS estimates.
(b) (2pt) For each of the regressors, interpret (in words) the expected change in Black voter registration given a 1-unit increase in the regressor, holding the other regressors constant.
(c) (2pt) Calculate “by hand” the coefficient of multiple determination for this regression and interpret it.
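A hedged sketch of the "by hand" coefficient of multiple determination in question 4(c), again on simulated stand-in data rather than the exam dataset. The names `x1`, `x2`, and `d` are hypothetical placeholders for the three regressors. The calculation uses $R^2 = 1 - SSE/SST$, where $SSE$ is the sum of squared residuals and $SST$ is the total sum of squares around the mean of the outcome.

```r
# Simulated stand-in data (NOT the exam dataset; values are arbitrary)
set.seed(2)
n = 60
x1 = runif(n, 6, 12)        # placeholder for whiteschoolyrs
x2 = runif(n, 2, 9)         # placeholder for nonwhiteschoolyrs
d = rbinom(n, 1, 0.3)       # placeholder for polltax (0/1)
y = 10 + 2 * x1 + 1.5 * x2 - 8 * d + rnorm(n, sd = 5)

fit = lm(y ~ x1 + x2 + d)

# Residuals: observed minus fitted values
e = y - fitted(fit)

# Sum of squared errors and total sum of squares
sse = sum(e^2)
sst = sum((y - sum(y) / n)^2)

# Share of the variation in y accounted for by the regressors
r2 = 1 - sse / sst

print(c(by_hand = r2, lm = summary(fit)$r.squared))
```

The result is read as the proportion of the variation in the outcome that the regressors jointly account for.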