Section Assignment - Econ 140 Spring 2024

WEEK 3: Causality and introduction to OLS

Exercises

Question 1: Potential outcomes framework

a) Define the Average Treatment Effect (ATE) and the average treatment effect on the treated (ATT) using the potential outcomes notation.

Solution: Let's first define some notation. $T_i$ is an indicator variable that takes value one if individual $i$ was assigned the treatment and value zero otherwise. $Y_i(1)$ is the potential outcome of individual $i$ if she is treated, i.e., what her outcome would be if she received the treatment. $Y_i(0)$ is the potential outcome of individual $i$ if she is not treated, i.e., what her outcome would be if she did not receive the treatment. Note that for any given individual we will only ever observe one of the two states (treated or untreated), so we will never be able to directly observe both potential outcomes. This is the origin of most difficulties in causal inference.

The individual treatment effect for agent $i$ is then defined as $Y_i(1) - Y_i(0)$, and the average treatment effect is the expected value of the individual treatment effects. That is,

$$\text{ATE} = E[Y_i(1) - Y_i(0)].$$

The ATT is the same object, but conditioning on being treated:

$$\text{ATT} = E[Y_i(1) - Y_i(0) \mid T_i = 1].$$

Note that the ATT can differ from the ATE because the treated individuals need not be a random sample of the whole population.

b) You want to estimate the effect of owning an iPad on grades in Econ 140. You compare the average grades of students with iPads to the average of students without iPads. Does this comparison (a difference-in-means comparison) allow you to find the ATE or the ATT? Why or why not?

Solution: Denote the grade by $Y$ and iPad ownership status by $T$, and keep the rest of the notation from above. Let's expand the difference in means to see what we get. We start with:

$$E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0].$$

For individuals with an iPad, the expected grade is the potential outcome of having an iPad, and for those without an iPad it is the potential outcome of not having an iPad. So we can rewrite the above as:

$$E[Y_i(1) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0].$$

Adding and subtracting $E[Y_i(0) \mid T_i = 1]$ gives:

$$E[Y_i(1) \mid T_i = 1] - E[Y_i(0) \mid T_i = 1] + E[Y_i(0) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0].$$

Using properties of expectations, this equals:

$$\underbrace{E[Y_i(1) - Y_i(0) \mid T_i = 1]}_{\text{ATT}} + \underbrace{E[Y_i(0) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0]}_{\text{selection bias}}.$$

The first term is the ATT, but we did not get just that. The remaining two terms are what we call selection bias, which arises because we are comparing individuals who might be inherently different (we should ask ourselves why some own an iPad and some don't to begin with). The difference between the second and the third term (i.e., the selection bias) quantifies the difference in the expected untreated outcome between the individuals who were treated and those who were not.
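The algebra above can be illustrated with a quick simulation. The following is a minimal R sketch (entirely hypothetical data, not part of the assignment; `ability`, `y0`, `y1`, and the effect size of 2 grade points are made-up assumptions) showing that the difference in means equals the ATT plus the selection bias:

```r
# Hypothetical simulation: iPad ownership correlates with the untreated
# potential outcome, so the difference in means overstates the ATT.
set.seed(140)
n <- 100000
ability <- rnorm(n)                       # unobserved ability
y0 <- 70 + 5 * ability + rnorm(n)         # potential grade without an iPad
y1 <- y0 + 2                              # true individual effect: 2 points
t  <- as.numeric(ability + rnorm(n) > 0)  # higher-ability students buy iPads
y  <- ifelse(t == 1, y1, y0)              # observed grade

diff_means <- mean(y[t == 1]) - mean(y[t == 0])
att        <- mean(y1[t == 1] - y0[t == 1])
sel_bias   <- mean(y0[t == 1]) - mean(y0[t == 0])
c(diff_means = diff_means, att = att, selection_bias = sel_bias)
# diff_means = att + selection_bias, with selection_bias > 0 here
```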
c) Can we introduce additional assumptions that allow us to get the ATE/ATT from a difference-in-means comparison? Can you think of a case in which these assumptions hold in our example?

Solution: If we assume that the selection bias is zero, then we clearly recover the ATT. Formally, if the treatment status is independent of the potential outcomes of the individuals, then the conditional expectations equal the unconditional ones. The selection bias is zero, so the difference in means gives us the ATT. In this case, the ATT would also be equal to the ATE because of the independence assumption:

$$E[Y_i(1) - Y_i(0) \mid T_i = 1] = E[Y_i(1) - Y_i(0)].$$

In our example, this assumption would hold if iPad ownership were randomly assigned, e.g., if the university handed out iPads to a random subset of students.
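To see the independence assumption at work, one can re-run the hypothetical simulation from Question 1 b) above with randomly assigned treatment (this sketch reuses `n`, `y0`, and `y1` from that block):

```r
# Re-assign the treatment at random: independence of t from (y0, y1)
# drives the selection bias to (about) zero, so the difference in means
# now recovers the ATE (= ATT = 2).
t_rand <- rbinom(n, 1, 0.5)
y_rand <- ifelse(t_rand == 1, y1, y0)
mean(y_rand[t_rand == 1]) - mean(y_rand[t_rand == 0])  # approximately 2
```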
Question 2: Deriving the OLS estimator

Consider the standard linear regression model with two regressors $x_1$ and $x_2$:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$$

a) Precisely interpret $\beta_0$ and $\beta_1$.

Solution: $\beta_0$ is the expected value of $y$ given that $x_1$ and $x_2$ are equal to zero. An increase in $x_1$ by one unit, keeping $x_2$ fixed, is associated with an increase in $y$ of $\beta_1$ units.

b) What is the difference between $\hat{\beta}_1$ and $\beta_1$?

Solution: $\beta_1$ denotes the association between $x_1$ and $y$ in the whole population of interest. It is a fixed number that does not vary. $\hat{\beta}_1$ is an estimator for this parameter. It is a random variable with the typical properties of random variables (i.e., it has an expectation, a variance, a distribution, etc.).

c) Consider now the case of a regression model with a single regressor:

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

The OLS estimator $\{\hat{\beta}_0, \hat{\beta}_1\}$ minimizes the sum of the squared residuals $\sum_{i=1}^{N} \hat{u}_i^2$. Derive it.

Hints:

1. Your optimization problem is: $\min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{N} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$.

2. First solve for $\hat{\beta}_0$ and then plug in your solution to solve for $\hat{\beta}_1$.

3. In the last section, we have seen that $V(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2$. The same is true for the sample variance (where we replace expectations with averages), i.e., $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2$. The same is also true for covariances: $\text{cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$, and therefore also for the sample covariance: $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{N}\sum_{i=1}^{N} x_i y_i - \bar{x}\bar{y}$.

Solution: We start from the function we need to minimize:

$$\min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{N} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2.$$

Finding the minimum of this function just means taking the derivatives and setting them equal to zero. Since we have two unknowns in this problem, we have two first-order conditions (FOCs):

$$\frac{\partial}{\partial \hat{\beta}_0}: \quad \sum_{i=1}^{N} -2\,(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

and

$$\frac{\partial}{\partial \hat{\beta}_1}: \quad \sum_{i=1}^{N} -2\, x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0.$$

Let us start with the first FOC, which looks easier to work with. We can drop the $-2$. Then we distribute the sum (it is a linear operator!) and put the term involving $\hat{\beta}_0$ on one side, keeping everything else on the other side. This gives us:

$$\sum_{i=1}^{N} \hat{\beta}_0 = \sum_{i=1}^{N} y_i - \sum_{i=1}^{N} \hat{\beta}_1 x_i.$$

Summing $N$ times over the same number $c$ is simply $N \cdot c$, and summing up $N$ values $y_i$ equals $N \cdot \bar{y}$. Using this, we get

$$N \hat{\beta}_0 = N \bar{y} - N \hat{\beta}_1 \bar{x}.$$
Dividing both sides by $N$ gives us an intermediate solution:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

For the next step, we solve for $\hat{\beta}_1$. We again get rid of the $-2$ and then multiply the $x_i$ through, giving us

$$\sum_{i=1}^{N} \left( x_i y_i - \hat{\beta}_0 x_i - \hat{\beta}_1 x_i^2 \right) = 0.$$

The next step is important. We substitute in our intermediate result for $\hat{\beta}_0$ and get:

$$\sum_{i=1}^{N} \left( x_i y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) x_i - \hat{\beta}_1 x_i^2 \right) = 0.$$

We now distribute the sum and get

$$\sum_{i=1}^{N} x_i y_i - \bar{y} \sum_{i=1}^{N} x_i + \hat{\beta}_1 \bar{x} \sum_{i=1}^{N} x_i - \hat{\beta}_1 \sum_{i=1}^{N} x_i^2 = 0.$$

To simplify, we again use that $\sum_{i=1}^{N} y_i = N \bar{y}$ and $\sum_{i=1}^{N} x_i = N \bar{x}$. Solving for $\hat{\beta}_1$, we get:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} x_i y_i - N \bar{x}\bar{y}}{\sum_{i=1}^{N} x_i^2 - N \bar{x}^2}.$$

Almost there. We now use the third hint to rewrite this expression. As seen, $\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{N} x_i y_i - N \bar{x}\bar{y}$, and in addition $\sum_{i=1}^{N}(x_i - \bar{x})^2 = \sum_{i=1}^{N} x_i^2 - N \bar{x}^2$. Substituting these two properties in, we get the solution for the OLS estimator:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2} = \frac{(1/N)\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{(1/N)\sum_{i=1}^{N}(x_i - \bar{x})^2} = \frac{\text{sCov}(x, y)}{\text{sVar}(x)}.$$

In this expression, $\text{sCov}(x, y)$ is the sample covariance between $x$ and $y$, and $\text{sVar}(x)$ is the sample variance of $x$.

d) Interpret the expression you obtained: What will happen to $\hat{\beta}_1$ if (1) the sample variance of $x$ increases, (2) the sample variance of $x$ gets close to zero, (3) the sample covariance between $x$ and $y$ increases?

Solution: Note that in the numerator of the estimator we have the sample covariance between $x$ and $y$, and in the denominator we have the sample variance of $x$. (1) If the sample variance of $x$ (which is necessarily non-negative) increases, holding the covariance fixed, we are dividing by a larger number, so $\hat{\beta}_1$ gets closer to zero. (2) If the sample variance of $x$ gets close to zero, the opposite happens: we divide by a number close to zero, so $\hat{\beta}_1$ becomes very large in absolute value. (3) It is more useful to think about the absolute value of the sample covariance between $x$ and $y$. If that absolute value increases, then the absolute value of the estimator goes up: the estimator increases if the covariance is positive, and becomes even more negative if the covariance is negative.
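To connect the derivation with code, here is a small R check (on hypothetical simulated data) that $\hat{\beta}_1 = \text{sCov}(x, y)/\text{sVar}(x)$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ reproduce exactly what lm() computes:

```r
# Verify the derived OLS formulas against lm() on simulated data
set.seed(1)
x <- rnorm(500)
y <- 2 + 0.5 * x + rnorm(500)

# cov() and var() use 1/(N-1) factors, but these cancel in the ratio
beta1_hat <- cov(x, y) / var(x)
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(beta0_hat, beta1_hat)
coef(lm(y ~ x))  # identical coefficients
```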
Question 3: Dummy variables regression

(Also refer to the R markdown document in the solutions folder.)

a) Load the dataset mexico.csv into RStudio.

```r
library("dplyr")

# Set the working directory and load the dataset
getwd()
setwd("/your_directory")
mexico_data <- read.csv("mexico.csv")
```

b) What is the average monthly income (inc_m) for people who speak an indigenous language (ind_lang=1)? What is the average income for those who don't?

```r
# Get means and difference in means
# Using dplyr syntax
mexico_data %>%
  group_by(ind_lang) %>%
  summarize(mean(inc_m))

# Using base R syntax
mean1 <- mean(mexico_data$inc_m[mexico_data$ind_lang == 1])
mean0 <- mean(mexico_data$inc_m[mexico_data$ind_lang == 0])
mean_diff <- mean1 - mean0
mean_diff
```

c) Run and interpret the regression of monthly income inc_m on ind_lang using the lm() command.

```r
ols_results <- summary(lm(inc_m ~ ind_lang, data = mexico_data))
ols_results
```

d) How do your answers in b) and c) relate to each other?

```r
# Compare the mean difference to the OLS coefficient
ols_results$coefficients[2, 1]
mean_diff
```

We see that the OLS regression is very useful for summarizing the data. The constant/intercept gives the average monthly income in the group where ind_lang is zero, and the coefficient on ind_lang gives the mean difference between ind_lang==1 and ind_lang==0. This is always true when we run a regression with one dummy variable on the right-hand side. It is also true when we run a regression with multiple dummy variables, as long as we include enough regressors to describe all the categories present in the data.
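As a complementary check (not in the original handout), the intercept from the regression in c) should likewise match the base-group mean `mean0` computed in b):

```r
# The intercept equals the average income in the ind_lang == 0 group
ols_results$coefficients[1, 1]
mean0
```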
Question 4: Quadratic terms in regressions: wages over the life-cycle

You are interested in the relation between age and yearly wages. What do you think this relationship may look like? Will wages increase with age or become smaller as people become older?

a) Open the dataset wages.csv in RStudio. This dataset contains the following variables: male (an indicator for whether an individual has identified as male or not), education (the years of education of that individual), videogames (the number of hours spent playing video games per week during childhood), wage_monthly (monthly income), wage_hourly (hourly income), age (age in years), and wage_yearly (annual income). The dataset is entirely fictional.

Solution Code:

```r
dataset <- read.csv(file = "wages.csv")
```

b) Plot yearly wages (wage_yearly) against age. What do you observe? Would it make sense to run a simple linear regression of wages against age?

Solution Code:

```r
plot(dataset$age, dataset$wage_yearly)
```

The plot clearly shows a non-linear relationship between age and wages. Wages seem to increase with age, but this relationship becomes flatter and even seems to turn negative after around 55 years.

Note: There are several ways to plot a relationship between two variables in R. The easiest is to use the plot() function. If you are interested in more advanced data visualization methods, you can look into the ggplot2 package or ask your GSI.

c) Use the lm() command to implement the linear regression model:

$$wage_i = \alpha + \beta_0 \, age_i$$

and interpret the coefficients you get from the regression. Also interpret the significance of the coefficients using the standard error, the t-statistic, and the p-value.

Solution Code:

```r
linear_regression <- summary(lm(wage_yearly ~ age, data = dataset))
print(linear_regression)
```

The constant term (or intercept) is around 46,200: for a hypothetical individual with an age of zero, we would expect an annual income of about 46,200 USD based on this regression. In this case, the intercept alone does not make much sense. We can do inference on the constant by comparing the coefficient to its standard error to get the value of the t-statistic. The t-statistic is calculated as:

$$t = \frac{\hat{\beta} - \beta_0}{se(\hat{\beta})}$$

Let us test the null hypothesis that $\alpha = 0$. We divide 46,193 by 223.32 and get $t \approx 207$. The t-statistic exceeds 2 in absolute value, so the intercept is significantly different from zero at the 95% confidence level (in fact, the t-statistic is so large that the intercept is also significant at the 99.999% confidence level). We therefore reject the null hypothesis. We can also do this much more quickly by looking at the p-value (Pr(>|t|)): in this case, it is less than $2 \times 10^{-16}$, so we can reject the null hypothesis. We would fail to reject it if $p > 0.05$.

What about the coefficient on age? In this dataset, one additional year of life is associated with 143.6 USD higher annual income, on average. Dividing this estimate by its standard error (4.52), we get a t-statistic of 31.73, so the coefficient on age is also significantly different from zero at the 95% confidence level.
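If helpful, the t-statistics and p-values reported by summary() can also be reproduced by hand. This sketch assumes the standard layout of the summary.lm coefficient table:

```r
# Manually reproduce the t-statistics and p-values from summary()
coefs <- linear_regression$coefficients  # Estimate, Std. Error, t value, Pr(>|t|)
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]
p_manual <- 2 * pt(abs(t_manual), df = linear_regression$df[2], lower.tail = FALSE)
cbind(t_manual, p_manual)
```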
Say your professor claims that "across hundreds of studies, we have seen that an additional year of age is associated with 140 USD of additional earnings". Can you reject this claim? To see this, let us calculate the t-statistic for the null hypothesis $H_0: \beta_0 = 140$:

$$t = \frac{\hat{\beta}_0 - \beta_0}{se(\hat{\beta}_0)} = \frac{143.6 - 140}{4.5} \approx 0.79$$

Since $|0.79| < 2$, we fail to reject the null hypothesis.

d) Use the lm() command to implement the linear regression model:

$$wage_i = \alpha + \beta_0 \, age_i + \beta_1 \, age_i^2$$

For an increase in $age_i$ by one year, what is the associated change in wages? How is this different than in the simple linear regression model? Hint: Take the derivative of wages with respect to age.

Solution Code:

```r
# We use the function I() to create quadratic terms and interactions
quadratic_regression <- lm(wage_yearly ~ age + I(age^2), data = dataset)
summary(quadratic_regression)
```

In a simple linear regression, the "effect" of age on wages is constant: no matter how old a person is, getting one year older is on average associated with 143.6 USD higher earnings. This is different in a quadratic regression (and in many other models, such as interaction models and logarithmic specifications). To see this, we can just take the derivative of the regression function:

$$\frac{d\, wage_i}{d\, age_i} = \beta_0 + 2 \beta_1 \, age_i$$

The derivative changes with age! Plugging in our coefficients, we get:

$$\frac{d\, wage_i}{d\, age_i} = 962 + 2 \cdot (-8.6) \cdot age_i = 962 - 17.2 \cdot age_i$$

One very neat thing is that we can also calculate the "tipping point": the age after which an additional year has a negative "effect" on earnings. Setting $\frac{d\, wage_i}{d\, age_i} = 0$:

$$0 = \beta_0 + 2 \beta_1 \, age_i \quad \Rightarrow \quad age_i = -\frac{\beta_0}{2 \beta_1} = \frac{962}{17.2} \approx 56$$

We see that after an age of 56 years, people's earnings decrease with every additional year. We can also see this graphically.

e) Plot the marginal "effect" of age on wages using the cplot command.

Solution Code:

```r
library(margins)  # cplot() comes from the margins package

# We can use cplot to create two types of plots:
# 1. Predicted wages by age (the estimated regression function)
cplot(quadratic_regression, "age", what = "prediction",
      main = "Predicted yearly wages, by age")

# 2. The marginal "effect" of age (the derivative of the estimated
#    regression function)
cplot(quadratic_regression, "age", what = "effect",
      main = "Average Marginal Effect of age")
```
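As a small complementary sketch (not in the original handout), the tipping point can also be computed directly from the estimated coefficients rather than from rounded numbers:

```r
# Compute the tipping point -beta0 / (2 * beta1) from the fitted model;
# coefficient names follow the formula wage_yearly ~ age + I(age^2)
b <- coef(quadratic_regression)
tipping_point <- -b["age"] / (2 * b["I(age^2)"])
tipping_point  # approximately 56
```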