2023F_322_Problem Set 2

School: Rowan University
Course: 322
Subject: Economics
Date: Jan 9, 2024
Uploaded by bb055fisher

Econ 322 - Fall 2023
Problem Set 2
Due October 6th, 2023, 8 pm

Please answer each of the questions below. Note that writing only a numeric answer to a question is not enough to receive full credit unless otherwise stated. Please upload your R (or other language) code along with your answers. Total: 60 points.

Question 1: Sampling Distributions in R (20 Points)

Suppose that we are interested in studying commuting patterns at Rutgers University-New Brunswick. As a first step, we want to better understand the distribution of distances that Rutgers students need to travel to come to class. For this purpose, we gathered a dataset containing the distances (in miles, which we denote by D) traveled by every Rutgers student, which you can find in the dataset “rutgers_distances.csv”. Although unrealistic, assume that the total population of Rutgers students is 1,000 (that is, the dataset contains the full population of interest).

In this question, we will get started using R by computing basic summary statistics (means and variances) and sampling from the population distribution (i.e., all 1,000 Rutgers students). This exercise will be useful to revisit some of the key concepts seen in lectures and, hopefully, provide some practical intuition on the consequences of using smaller vs. larger samples.

Answer the following questions:

1. (6 points) Compute the population mean of D. Compute the variance of D. Plot a histogram of D. Round your answers to three decimal places. No need to explain how you did it.

2. (5 points) Compute the sample mean for the sample containing the first 10 observations in the dataset. Note that, since the order of the students in the dataset is random, this is equivalent to drawing a random sample of 10 observations. Do the same for the first 25 and the first 300 observations. Which sample mean is closer to the population mean in part 1? Why? Round your answers to three decimal places.

3. (9 points) In this part, we will provide some evidence in favor of the Central Limit Theorem. Recall that, in lecture, we said that if we drew s random samples from the population and computed the sample mean for each of these s samples, the probability distribution of this sample mean could be approximated by a normal distribution with mean $\mu_D$ and variance $\sigma_D^2 / n$ for a large enough sample size n (sample size refers to the number of observations in each random sample). Answer the questions below:

(a) Draw 500 random samples of 2 observations from the dataset and compute the sample mean for each of these 500 samples (no need to report each of these separately!). Report the average sample mean across the 500 samples and plot the histogram of the sample means. Round your numeric answer to three decimal places.

(b) Draw 500 random samples of 250 observations from the dataset and compute the sample mean for each of these 500 samples (no need to report each of these separately!). Report the average sample mean across the 500 samples and plot the histogram of the sample means. Round your numeric answer to three decimal places.

(c) Does the histogram in (a) look like a normal distribution? Does the histogram in (b) look like a normal distribution? Why?

Hint: You can draw random samples using the sample function in R. E.g., sample(mydata$d, size = 2, replace = TRUE) draws a random sample of 2 observations from column d in the data frame mydata. To do this 500 times and automatically report the mean for each of the 500 draws, as part (a) asks, you can use the function replicate as follows: replicate(500, mean(sample(mydata$d, size = 2, replace = TRUE))).

Question 2: Testing for Housing Affordability (40 Points)

Housing affordability is one of the most pressing issues in the United States. Recently, an article in the New York Times stated that the average US renter household spent 28.5 percent of their income on rent in 2021; in 2022, this figure jumped up to 30 percent. For reference, the US Department of Housing and Urban Development (HUD) considers a household rent-burdened if it pays 30 percent or more of its income towards housing.

In this question, we will study the issue of housing affordability by applying our statistical skills to the context of Jersey City, NJ. According to a recent study, Jersey City is the second most expensive city in the United States for renting a unit. Thus, this city is also facing a major challenge in keeping its housing stock affordable.

We will examine housing affordability in Jersey City using the dataset attached to this problem set, “acs_jersey_city.csv”. This dataset contains a sample of Jersey City households participating in the American Community Survey (ACS) conducted by the Census Bureau in 2021. Every year, the Census Bureau randomly surveys 3.5 million US households (about 3% of the US population) and publishes a de-identified version of their data so researchers can work with them without revealing personally identifiable information (known as PII). I restricted this dataset to include only the heads of household of families living in Jersey City in the 3% random sample and included some of their demographic and economic characteristics. Below is a list with variable descriptions:

• pid: Person identifier (anonymized)
• female: Dummy variable indicating whether the head of the household is female (1 = female, 0 = male)
• age: Age of the head of the household
• us_citizen: Dummy variable indicating whether the head of the household is a US citizen (1 = US citizen, 0 = non-US citizen)

• high_school: Head of the household has a high-school diploma (1 = high-school diploma, 0 = no high-school diploma)
• bachelor: Head of the household has a bachelor’s degree (1 = bachelor’s degree, 0 = no bachelor’s degree)
• employed: Head of the household was employed last year
• wage_hourly: Hourly wage of the head of the household if employed last year (in dollars)
• income: Household income during the past year (in dollars)
• renter: Dummy variable indicating whether the household rents their unit (1 = renter, 0 = homeowner)
• rent: Monthly rent paid by the household (in dollars)

Using the dataset, please answer the questions below:

1. (4 points) What is the percentage of households who rent their unit? Conditional on having positive income (income > 0), are renter households higher-income on average than homeowners? No need to do any hypothesis testing and no need to explain the numbers here. Round your answers to three decimal places.

For the rest of the questions, restrict the sample to only renter households that have positive income (i.e., renter == 1 and income > 0).

2. (6 points) Create a new variable, named rent_to_income_ratio, that is equal to the percentage of annual income that a renter household spends on rent during a year. Plot a histogram of this variable. What is the sample mean of rent_to_income_ratio? And the sample variance? Round your answers to three decimal places. Hint: How do you need to modify your variables to get a number for the whole year? Don’t forget to express the ratio in percentage terms!

3. (6 points) We are interested in comparing the rent-to-income ratio that you computed in the previous question with the NY Times nationwide number for 2021 (28.5 percent). Was your estimate for Jersey City larger or smaller than the national average? Can you reject that Jersey City’s mean rent-to-income ratio is equal to 28.5 percent at the 5% level of statistical significance? And at the 10% level? Make sure to define your null and alternative hypotheses.

4. (6 points) Can you reject the null hypothesis that the average Jersey City renter household is rent-burdened (rent_to_income_ratio = 30) at the 5% significance level? Test this null hypothesis against the alternative hypothesis that rent_to_income_ratio < 30.

5. (6 points) Only for this part, suppose that we only had the data for individuals with a person identifier satisfying the condition pid < 200 (assume that the pid number is randomly assigned, so this is a random subsample). Can you still reject the null hypothesis in the previous part (against the same alternative hypothesis) at the 5% significance level? Explain why.

6. (6 points) What is the difference in rent-to-income ratio between individuals who have a high school diploma and individuals who do not have a high school diploma? Is the difference significantly different from zero at the 5% significance level? Round your answer to two decimal places.

7. (6 points) It is usually the case that lower-income families spend a higher share of their income on rent. For individuals who are employed, generate a scatter plot of their rent-to-income ratio (y-axis) against their hourly wages (x-axis). Compute their correlation. Round your answer to three decimal places. Are they positively correlated, negatively correlated, or not correlated at all? Is the result consistent with the first statement in this part?
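
The correlation asked for in part 7 can be sketched in Python, one of the "other languages" the instructions allow. This is a minimal illustration under stated assumptions, not the course's intended solution: the helper name pearson_correlation is hypothetical, and the data are simulated toy values with a negative slope built in by construction, not the ACS sample. The scatter plot is left out because it would need a plotting library outside the standard library.

```python
import math
import random


def pearson_correlation(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated toy data (NOT the ACS sample): hourly wages between 10 and 60
# dollars, and a rent-to-income ratio that falls with the wage plus noise.
rng = random.Random(322)
wage_hourly = [rng.uniform(10, 60) for _ in range(200)]
rent_to_income_ratio = [60 - 0.5 * w + rng.gauss(0, 5) for w in wage_hourly]

r = pearson_correlation(wage_hourly, rent_to_income_ratio)
print(round(r, 3))  # rounded to three decimal places, as the question asks
```

With the real dataset, the two lists would instead hold the wage_hourly and rent_to_income_ratio values of employed renter households. The sign of r, not its toy value here, is what answers the question.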
