Midterm exam Fall 2020 solutions


MIDTERM EXAM: SOLUTIONS
Econ 409
Due 8:00am CDT (Lincoln Time), October 24, 2020

NOTE ABOUT THESE SOLUTIONS: More than one version of the exam was distributed, and the datasets were not identical. Therefore, there is no single set of correct answers. However, all versions contained similar questions, so these answers should be informative about your performance.

READ THESE DIRECTIONS CAREFULLY: This is an open-book, take-home exam. You may consult your textbook, the slides, the RStudio notes, etc. You may not work with your classmates or anyone else; the exam is to be completed on your own. As outlined in the syllabus, evidence of cheating or other academic dishonesty will result in a grade of 0 at a minimum.

You will submit your answers via Canvas. Your submission must include:
1.) A typed sheet of your answers;
2.) Your R code. The R code may be copied and pasted into a Word document or other text document. If you did this correctly, I should be able to load the data and then run your code on my own machine to replicate your answers. Producing replicable code (meaning I can run it) is worth 10 out of the 100 total points.

The exam will be posted to Canvas at 8:00am CDT, Thursday, October 22. It is due by 8:00am CDT on Saturday, October 24. Late submissions will automatically lose 25% of available points at 8:01am CDT on October 24, 50% at 9:01am, 75% at 10:01am, and all available points at 11:01am or after.

The exam is written to take no more than one class period (75 minutes). However, you have 48 hours to complete it. Therefore, please keep in mind that entirely avoidable situations (for example, Internet outages that occur at 10:00pm on October 23) are not considered valid excuses for late submissions.

Good luck!

Part A: Data Analysis

You have been provided a dataset. A description of the variables is below. (This data has been generated using statistical software; it does not represent real measurements.)
Import it into R. Then briefly answer each question below. Be sure to save a script file with any R code you need to answer these questions.

Variables:
ability = score on an IQ test
race = reported race (white or nonwhite)
earnings = annual earnings
birthplace = state of birth

1.) How many observations are there? How many variables? Be sure to specify which number is the number of observations and which is the number of variables.
    nrow(data)   # number of observations (rows)
    ncol(data)   # number of variables (columns)

2.) What kind of dataset is this: time series, cross-sectional, or panel? Briefly defend your answer in one or two sentences.
Cross-sectional. Cross-sectional datasets observe multiple units (such as individuals or countries) at a single snapshot in time. Here each row is a different individual and there is no time variable, so the data are neither a time series nor a panel.

3.) Produce a five-number summary of each quantitative variable.
ABILITY and EARNINGS are the quantitative variables (RACE and BIRTHPLACE are categorical):
    fivenum(data$ability)
    fivenum(data$earnings)

4.) For each quantitative variable, tell me whether it is symmetric, skewed left, or skewed right. Write a sentence or two to defend your answer for each variable.
There is no single right way to answer this question, so points were given for defensible answers. You could use a histogram, calculate the skewness, or compare other sample statistics (for example, a mean well above the median suggests right skew).

5.) The proper choice of data visualization technique depends in part on whether a variable is categorical or quantitative. Using an appropriate visualization technique, produce a graph showing the distribution of EARNINGS. Write a sentence or two explaining why this is a proper choice of visualization technique.
    hist(data$earnings)
EARNINGS is a quantitative variable, so a histogram, which groups values into bins, shows the shape, center, and spread of its distribution. (Other answers are possible, such as a dotplot or a boxplot.)

6.) The proper choice of data visualization technique depends in part on whether a variable is categorical or quantitative. Using an appropriate visualization technique, produce a graph showing the distribution of BIRTHPLACE. Write a sentence or two explaining why this is a proper choice of visualization technique.
    barplot(table(data$birthplace))
BIRTHPLACE is a categorical variable, so a bar chart of the number of observations in each state is appropriate. Note that barplot() needs the counts produced by table(); passing the raw vector of state names produces an error. (Other answers are possible, such as a pie chart: pie(table(data$birthplace)).)

7.) Assume your dataset is representative of a broader population. What is the 95% confidence interval for the population mean of ABILITY?
Some simple code to evaluate this:
    lower.bound <- mean(data$ability) - 1.96*sd(data$ability)/sqrt(length(data$ability))
    upper.bound <- mean(data$ability) + 1.96*sd(data$ability)/sqrt(length(data$ability))
Alternatively, t.test(data$ability)$conf.int gives a slightly wider interval based on the t distribution rather than 1.96.

8.) What is the correlation coefficient for EARNINGS and ABILITY? Then, in words, provide an intuitive explanation of what this value of the correlation coefficient tells us about the relationship between these two variables.
    cor(data$earnings, data$ability)
The correlation coefficient measures the strength and direction of the linear relationship between two quantitative variables. It ranges from -1 to 1: its sign gives the direction (a positive value means individuals with higher ABILITY tend to have higher EARNINGS), and its magnitude gives the strength (values near 0 indicate a weak linear relationship; values near -1 or 1, a strong one).

9.) Estimate the simple regression with EARNINGS as the dependent variable and ABILITY as the independent variable. Report the slope coefficient (b1) from this regression. Then, in words, provide an intuitive explanation of what this value of the slope coefficient tells us about the relationship between these two variables.
    model <- lm(earnings ~ ability, data = data)
    coef(model)   # "(Intercept)" is b0; "ability" is b1
The slope coefficient tells us how much predicted EARNINGS changes, on average, when ABILITY increases by 1 unit (one point on the IQ test). It describes the statistical association between these variables, not what causes what.

10.) What is the predicted value of EARNINGS for an observation with ABILITY = 150? Provide at least one reason you might want to be cautious about trusting this estimate of the EARNINGS of an individual with this level of ABILITY.
There are several ways to do this, but every approach calculates b0 + b1*150, where b0 is the estimated intercept from your regression and b1 is the estimated slope coefficient:
    b0 <- coef(model)[1]
    b1 <- coef(model)[2]
    b0 + b1*150   # same value as predict(model, newdata = data.frame(ability = 150))
The best reason not to trust this predicted value is that it is an extrapolation: 150 is far outside the range of ABILITY in your dataset (check with range(data$ability)).

11.) Construct a scatterplot of EARNINGS and ABILITY. Then describe the relationship between these variables in a sentence or two. Does the relationship you see in the scatterplot match the relationship implied by the regression and correlation coefficient?
    plot(data$ability, data$earnings)   # ABILITY on the x-axis, EARNINGS on the y-axis
The scatterplot showed a curvilinear relationship. This suggests that the correlation coefficient and regression coefficient do not fully characterize the relationship between the two variables. They are still valid estimates of the linear relationship between these variables, however.

12.) In the population, for every person born in South Dakota, there are 4 people born in Kansas, 4 born in Iowa, 2 born in Nebraska, and 5 born in Colorado. Is this dataset representative of the population? How, if at all, does the answer affect your analysis of this data? Explain briefly.
These ratios imply population shares of 1/16 South Dakota, 4/16 Kansas, 4/16 Iowa, 2/16 Nebraska, and 5/16 Colorado; the sample shares are given by prop.table(table(data$birthplace)). This population distribution of birthplace is very different from the distribution you observed in your sample. This suggests you should be cautious about extrapolating the results to the broader population of these states.
The reason is that we can only use data to make inferences about a broader population if the data are representative of that population.

13.) What is the interquartile range of ABILITY? If two people's ABILITY differs by an amount equal to the interquartile range, what is the expected difference in their EARNINGS? Use your regression results to find an answer to the second question.
There are a few ways to calculate this, but here is one:
    iqr <- quantile(data$ability, 0.75) - quantile(data$ability, 0.25)   # same as IQR(data$ability)
    b1 <- coef(lm(earnings ~ ability, data = data))[2]                 # slope from question 9
    b1 * iqr                                                             # expected difference in EARNINGS

14.) Conduct a formal hypothesis test of whether EARNINGS is different for the two groups defined by RACE. Be sure to lay out each of the 5 steps in our hypothesis testing procedure.
Step 1: The null hypothesis is μ1 = μ0. The alternative is μ1 ≠ μ0, where μ1 and μ0 denote the population mean EARNINGS of the two RACE groups.
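The hypotheses in Step 1 can be tested in R with a two-sample t-test. What follows is a minimal sketch, assuming the Welch version (which does not require equal variances in the two groups); the course may have used a pooled-variance version instead. Because the exam data are not reproduced here, the sketch builds a simulated stand-in: the data frame exam, the seed, the group sizes, and the earnings values are all hypothetical. It computes the t statistic, the degrees of freedom, and the two-sided p-value by hand and compares them with R's built-in t.test().

```r
# Minimal sketch of a Welch two-sample t-test for H0: mu1 = mu0 vs HA: mu1 != mu0.
# The data frame `exam` is a HYPOTHETICAL simulated stand-in, not the exam dataset.
set.seed(409)
n <- 200
exam <- data.frame(
  race = sample(c("white", "nonwhite"), n, replace = TRUE),
  earnings = rnorm(n, mean = 50000, sd = 12000)
)

# Split EARNINGS by RACE group (which group is labeled "1" is an arbitrary choice).
e1 <- exam$earnings[exam$race == "white"]
e0 <- exam$earnings[exam$race == "nonwhite"]
n1 <- length(e1)
n0 <- length(e0)

# Squared standard errors of each group mean.
v1 <- var(e1) / n1
v0 <- var(e0) / n0

# Test statistic: difference in sample means divided by its standard error.
t_manual <- (mean(e1) - mean(e0)) / sqrt(v1 + v0)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual <- (v1 + v0)^2 / (v1^2 / (n1 - 1) + v0^2 / (n0 - 1))

# Two-sided p-value from the t distribution.
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Built-in equivalent: t.test() defaults to Welch (var.equal = FALSE) and a two-sided test.
result <- t.test(e1, e0)

print(c(t = t_manual, df = df_manual, p = p_manual))
print(result)
```

With the real data, the same test can be run as t.test(earnings ~ race, data = data). The formula form compares the groups in the order of their factor levels (alphabetical for character labels, so "nonwhite" first), which means its t statistic has the opposite sign to t.test(e1, e0) in this sketch; the p-value is the same.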