PSCI 200 Final Exam
Fall 2023, Last Name: A-L

There are four sections to this exam: T/F, MC, Problems, and Data Analysis.

T/F and MC: Write your answers on this exam. Turn in the exam hardcopy.
Problems: Write your answers on this exam or submit them as part of an R script.
Data Analysis: Submit both (a) your R script and (b) the compiled pdf version of it.

For the Problems and Data Analysis sections, it is important that you show as much of your work as possible. Unless otherwise noted, if a question involves multiple steps and you simply write down a number for the answer, you will not receive full credit. In the Data Analysis section, you should submit your R code and the results it produced. Do not simply handwrite the R code and output and submit a picture of that.

Unless other arrangements have been made, you have 3 hours to complete this exam, including submitting it via Blackboard. Be sure to leave sufficient time to combine your answers and upload them to Blackboard. Everyone should turn in a hardcopy of this exam with their printed name and signature at the bottom.

For this exam, you may refer to the course texts, lecture slides, your lecture notes, workshops, HW’s and their answers, practice sessions, and R help pages (in R, not online). You may refer to any course material on Blackboard, except for the datadatabobata website.

In general, the exam is closed internet. You are not allowed to search the internet for answers or refer to information on any websites, except as mentioned above. You are not allowed to use any type of AI website or app (e.g., ChatGPT). You may not communicate with any other individual (except for the TAs or me) about the questions and their answers.

Pace yourself appropriately. For your reference, the approximate time to spend on each section is:

True/False        25 min
Multiple Choice   15 min
Problems          60 min
Data Analysis     45 min

If you are having difficulty with a question, skip it and finish the other questions. Then come back to it. If you cannot finish a problem, show as much work as possible. If you have a question, please raise your hand or come to the front of the room.

Honor Pledge

Before beginning the exam, you are required to print your name and sign the honor pledge below.

“I affirm that I will not give or receive any unauthorized help on this exam, and that all work will be my own.”

Name:
Signature:
FAQ

Q: What do you mean by the term ____?
A: If it’s a technical term or abbreviation (like “pdf” or “RV”) I can’t answer the question. It’s in the lecture notes.

Q: Does my answer look correct?
Q: For this question, do you want us to use method/equation/command?
A: I can’t answer questions like these during the exam.

Q: Can I use the restroom?
A: You may use the restroom whenever you’d like. No need to ask me.

Q: What does calculate “by hand” mean?
A: There may be a single R command that will perform the entire calculation for you. However, in order to receive full credit, you need to show your work using “simpler” R commands and/or operators corresponding to the parts of an equation shown in lecture.

Q: When you ask us to “plot variable1 versus variable2” or “plot variable1 as a function of variable2,” which variable should I use for the x and y axes?
A: I can’t answer that. There are many examples in lecture notes, practice sessions, HW’s, and workshops.

Q: Can I check my answer using a canned R command?
A: Yes. However, I don’t need to see that part, unless I specifically asked.

Q: Can I use a cool R command I learned in another class to answer a question?
A: It depends. If the command does something relatively simple (e.g., sum the rows in a table) and you use it as one step in a multi-step calculation, it is likely fine to use on the exam. However, if the command performs many calculations and you don’t show your work for the multiple steps, then you will not get credit for using such a command.

Q: Do you only have two shirts (one black, one white) that you alternate between?
A: That’s a completely irrelevant, but really good question. I actually have 4-5 of each color. Not very creative, I know, but it works for me. Still, thanks for noticing...
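As an illustration of the “by hand” answer in the FAQ, here is a minimal sketch in R. The vector `x` is made up for this example and is not taken from the exam. The sketch assumes the sample-variance formula with an n - 1 denominator (the one R's `var()` uses) and assumes `length()` counts as a “simple” command. Individual questions may restrict the allowed commands further.

```r
# Hypothetical data, chosen only for illustration (not from the exam).
x <- c(2, 4, 4, 5, 10)

# "By hand": each line mirrors one piece of the sample-variance formula
#   s^2 = sum((x_i - xbar)^2) / (n - 1)
n    <- length(x)             # sample size (assumed to be an allowed command)
xbar <- sum(x) / n            # sample mean
dev2 <- (x - xbar)^2          # squared deviations from the mean
s2   <- sum(dev2) / (n - 1)   # sample variance

# Canned command, used only to check the by-hand result
s2_check <- var(x)

print(c(by_hand = s2, canned = s2_check))
```

The by-hand lines are what show the work; the `var()` call is only the optional check mentioned in the FAQ.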
True/False (1 point each. 19 points total)

Write the correct answer (T or F) for each statement.

1. If our data is a random sample $X = \{x_1, x_2, \ldots, x_n\}$, then the sample mean $\bar{X}$ is a random variable.
2. A p-value is the probability of observing a value at least as large in magnitude as the test statistic, under the assumption that the null hypothesis is true.
3. A correlation between $X$ and $Y$ higher than .95 is a strong indicator of causation.
4. In constructing a confidence interval, we calculate the test statistic assuming the null hypothesis is true.
5. A cumulative probability value must be between 0 and 1.
6. In an hypothesis test, Type II error occurs when the null hypothesis is true but we reject it.
7. In a regression, a residual $\hat{e}_i$ is the difference between the observed value $y_i$ and the predicted value $\hat{y}_i$.
8. A RV $Y$ that is distributed Bernoulli(.3) has variance $V(Y) = .21$.
9. A study’s research design influences whether we can make causal vs associational claims about the relationship between an outcome variable $Y$ and an independent (or treatment) variable $X$.
10. All categorical variables are discrete variables.
11. Consider a specific value $y$ of a random variable $Y$. The $Z$-score for $y$ represents how many standard deviations $y$ is from the mean of $Y$.
12. In most election polls, the margin of error for an estimated proportion is typically the width of its 95% confidence interval.
13. For small samples ($n < 50$), we use the Normal distribution when conducting an hypothesis test concerning the population mean.
14. The central limit theorem (CLT) states that as the sample size $n \to \infty$, the sample mean $\bar{Y}_n$ becomes distributed $\text{Normal}[E(Y), V(Y)/n]$.
15. The fundamental problem of causal inference is that we usually only observe one of the potential outcomes in an experiment or for observational data.
16. The correlation between $X$ and $Y$ is standardized to values between 0 and 1.
17. The median is sensitive to outliers.
18. Standard deviation is a measure of central tendency.
19. In an experiment, random assignment to treatment and control groups mitigates threats to internal validity.

Multiple Choice (1 point each. 9 points total)

Write or circle the letter for the correct answer for each problem.

1. Suppose RV $Z$ has a Standardized Normal distribution. Denote the pdf as $f(z)$ and the cdf as $F(z)$. Which of the following is not true:
   (a) $f(z) = f(-z)$
   (b) $\Pr(Z = 2.576) = .01$
   (c) $\sqrt{V(Z)} = 1$
   (d) $F(0) = .5$

2. Which of the following can affect how people respond to surveys?
   (a) The ecological fallacy
   (b) Question wording
   (c) Heteroskedasticity
   (d) All of the above

3. Suppose a survey dataset contains a variable relig that codes the religion or belief system the respondent most closely identifies with: Atheist, Buddhist, Christian, Hindu, Jew, Muslim, Other. Which of the following describes the variable relig?
   (a) Discrete
   (b) Categorical
   (c) Nominal
   (d) All of the above
4. In a classical hypothesis test, when our test statistic does not fall in the rejection region, we ____.
   (a) Accept the null hypothesis
   (b) Reject the null hypothesis
   (c) Fail to reject the alternative hypothesis
   (d) Fail to reject the null hypothesis

5. Which of the following is often a companion to a dataset and provides a description of the variables in the data?
   (a) Hypertext
   (b) Field guide
   (c) Codebook
   (d) None of the above

6. The Ordinary Least Squares (OLS) coefficient estimates are those that
   (a) Maximize $R^2$
   (b) Minimize the total sum of squares
   (c) Minimize the sum of squared errors
   (d) All of the above

7. Suppose a RV $X$ has mean $E(X) = 2$ and variance $V(X) = 200$. In repeated random sampling of $n = 400$ observations, the sample mean $\bar{X}$ will be approximately distributed ____.
   (a) Uniform with $E(\bar{X}) = 2$, $V(\bar{X}) = 10$
   (b) Normal with $E(\bar{X}) = 2$, $V(\bar{X}) = 10$
   (c) Normal with $E(\bar{X}) = 2$, $V(\bar{X}) = .5$
   (d) Bernoulli with $E(\bar{X}) = .2$, $V(\bar{X}) = .16$

8. Assume we estimate the bivariate regression $y = \beta_0 + \beta_1 x + \varepsilon$. Which of the following statements are true about the coefficient of determination, $r^2$?
   (a) $0 \le r^2 \le 1$
   (b) $r^2 = [\mathrm{Cor}(x, y)]^2$
   (c) $r^2 = [\mathrm{Cor}(y, \hat{y})]^2$
   (d) All of the above

9. Which of the following is not a common threat to the internal validity of a study?
   (a) Nonrandom sample selection from the population.
   (b) Poor measurement of outcomes.
   (c) Not having a control group.
   (d) Nonrandom assignment of subjects to treatment and control groups.
Problems (26 points total)

You may write your answers on the exam or submit them as part of a compiled R script. In either case, you must show your work in order to receive full credit.

1. Consider the following sample of data for the variable $X$:

   $\{3, 2, 1, 3, 20, 4, 3, 2, 1, 3\}$

   Find/calculate “by hand” the following descriptive statistics for this sample. You may use only the following operators and commands for your answer: =, +, -, /, *, ^, sum(), sort(), and table(). For each descriptive statistic, show how you calculated it, whether as an equation, R code, or a short description (no more than 1-2 sentences).
   (a) (2pt) mean
   (b) (2pt) median
   (c) (1pt) mode
   (d) (2pt) variance

2. The RV $Y$ can take values $\{1, 2, 3, 4\}$. Suppose you’re presented with the following (incomplete) probability mass function (pmf):

   Y           1    2    3    4
   Pr(Y = y)   ?    .4   .1   .1

   (a) (1pt) What must $\Pr(Y = 1)$ be in order for the above to be a proper pmf?
   (b) (2pt) Calculate $E(Y)$.
   (c) (2pt) Calculate $V(Y)$.
3. In a recent poll conducted by PEW, respondents were asked if Dua Lipa’s song “Houdini” would sound better if it was instead sung by Lady Gaga. Of the $n = 700$ people randomly sampled for the survey, 57% said that a Lady Gaga version would sound better.
   (a) (3pt) Construct a 95% confidence interval for the proportion supporting a Lady Gaga version of “Houdini.”
   (b) (1pt) What is the margin of error for the PEW poll?
   (c) (2pt) Suppose PEW wanted to guarantee a 1% margin of error for a 97% CI. How many people would they need to survey?

4. In a Nov 21 survey by Bright Line Watch, randomly sampled respondents were asked if they supported a ban on abortion. Below is a cross-tabulation of whether the respondent self-identified as ideologically Conservative (No, Yes) and their support for an abortion ban (No, Yes).

                         Abortion Ban?
                         No      Yes
   Conservative?   No    1831    263
                   Yes    425    477

   (a) (1pt) What proportion of the respondents supported an abortion ban?
   (b) (1pt) Among nonconservatives, what proportion supported an abortion ban?
   (c) (1pt) Among conservatives, what proportion supported an abortion ban?
   (d) (1pt) What is the difference in proportions supporting an abortion ban for these two groups (conservatives vs nonconservatives)? Which group supports an abortion ban in higher proportion?
   (e) (4pts) Test the hypothesis that there is no difference between the two groups in the proportion supporting an abortion ban. Formally state (write down) the null and alternative hypotheses. Calculate an appropriate test statistic. Calculate the p-value for the test statistic. Would you reject the null hypothesis (no difference) at the $\alpha = .05$ level of significance?
Additional page if needed
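The steps of a difference-in-proportions test like the one in Problem 4(e) can be sketched in R as follows. The counts are hypothetical and are not the exam's cross-tabulation. The sketch assumes a two-sided test with the pooled standard error under the null hypothesis; a course may instead use the unpooled standard error, so treat this as one possible approach rather than the required one.

```r
# Hypothetical counts, for illustration only (not the exam's table).
x1 <- 120; n1 <- 400   # group 1: number answering "yes", group size
x2 <-  90; n2 <- 500   # group 2: number answering "yes", group size

p1 <- x1 / n1          # sample proportion, group 1
p2 <- x2 / n2          # sample proportion, group 2

# Under H0: p1 = p2, pool both groups to estimate the common proportion
p_pool <- (x1 + x2) / (n1 + n2)
se     <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z       <- (p1 - p2) / se        # test statistic
p_value <- 2 * pnorm(-abs(z))    # two-sided p-value from the standard Normal

print(c(diff = p1 - p2, z = z, p_value = p_value))
```

Comparing `p_value` with the chosen significance level then gives the reject / fail-to-reject decision.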
Data Analysis (16 points total)

In the final exam folder on Blackboard, you will find the dataset south60 2023.rdata. Each row in the data represents a county in a southern US state in 1960. The dependent variable blackregis is the county-level percentage (0-100) of eligible Black voters that were registered to vote in 1960. There are three explanatory variables:

whiteschoolyrs      Median years of education among White residents of the county.
nonwhiteschoolyrs   Median years of education among non-White residents of the county.
polltax             Whether the county required residents to pay a fee to vote: yes=1, no=0.

Use this dataset for the following problems. Submit your answers to these questions as both (a) an R script and (b) as a compiled pdf of the R script. Unless otherwise stated in a question, you must show the R code and its output in order to receive full credit.

1. Descriptive statistics
   (a) (1pt) Create a stargazer table of descriptive statistics for the variables in the dataset. (You can ignore the state and county identifiers.) Refer to the stargazer table to answer the next two questions.
   (b) (1pt) What is the average Black voter registration in southern counties in 1960?
   (c) (1pt) What proportion of counties have a poll tax?

2. (2pt) Calculate “by hand” the correlation between nonwhiteschoolyrs and whiteschoolyrs. Interpret the correlation. Is it a strong, moderate, or weak correlation? Positive or negative?
3. Suppose you’re interested in whether a county’s Black voter registration is related to the level of education among the county’s White residents.
   (a) (2pt) Consider the bivariate regression

       $\text{blackregis} = \beta_0 + \beta_1\,\text{whiteschoolyrs} + \varepsilon$

       Calculate the OLS estimates for $\beta_0$ and $\beta_1$ “by hand” (i.e., in R, but without using a command like lm()). Confirm your results using lm().
   (b) (2pt) Create a scatterplot of blackregis versus whiteschoolyrs. Add the lm() regression line (in red) to the plot. Is a county’s Black voter registration positively or negatively associated with median years of education among White residents?
   (c) (1pt) Calculate the predicted Black voter registration for a county where whiteschoolyrs is 10 years.

4. Now consider the multiple regression

       $\text{blackregis} = \beta_0 + \beta_1\,\text{whiteschoolyrs} + \beta_2\,\text{nonwhiteschoolyrs} + \beta_3\,\text{polltax} + \varepsilon$

   (a) (2pt) Use lm() to estimate the regression. Print the OLS estimates.
   (b) (2pt) For each of the regressors, interpret (in words) the expected change in Black voter registration given a 1-unit increase in the regressor, holding the other regressors constant.
   (c) (2pt) Calculate “by hand” the coefficient of multiple determination for this regression and interpret it.
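The “by hand, then confirm with lm()” workflow asked for above can be sketched on simulated data. Everything here is an assumption made for illustration: the variables `x` and `y`, the seed, and the simulated relationship do not come from the exam's dataset. The sketch uses the standard bivariate OLS formulas (slope as the sum of cross-deviations over the sum of squared deviations of `x`, intercept as ybar minus slope times xbar) and computes R-squared as 1 - SSE/SST.

```r
# Simulated data, for illustration only; the exam's dataset is not used here.
set.seed(1)
x <- runif(50, min = 6, max = 12)
y <- 5 + 3 * x + rnorm(50, sd = 4)

# Bivariate OLS "by hand"
xbar <- sum(x) / length(x)
ybar <- sum(y) / length(y)
b1 <- sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)   # slope
b0 <- ybar - b1 * xbar                                   # intercept

# Fitted values and R^2 = 1 - SSE / SST
yhat <- b0 + b1 * x
sse  <- sum((y - yhat)^2)   # sum of squared residuals
sst  <- sum((y - ybar)^2)   # total sum of squares
r2   <- 1 - sse / sst

# Confirm against the canned command
fit <- lm(y ~ x)
print(rbind(by_hand = c(b0, b1), lm = unname(coef(fit))))
print(c(r2_by_hand = r2, r2_lm = summary(fit)$r.squared))

# Predicted value at a chosen x (here x = 10)
pred10 <- b0 + b1 * 10
print(pred10)
```

The same 1 - SSE/SST calculation carries over to a multiple regression if `yhat` is taken from the fitted values of the larger model, for example `fitted(fit)`.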