Section Assignment - Econ 140 Spring 2023
WEEK 4: OLS Regression, Functional Forms, Bias in Regression

Exercises

Question 1: Omitted variable bias

Consider the following long regression:

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + u_i$$

the following short regression:

$$y_i = \alpha_0 + \alpha_1 x_i + \epsilon_i$$

and the auxiliary regression:

$$z_i = \gamma_0 + \gamma_1 x_i + e_i$$

Prove that $\alpha_1 = \beta_1 + \beta_2 \gamma_1$ and interpret.

Solution: Let's step back for a second and think about why we might be interested in estimating $\beta_1$. (To understand what comes next, it might help to imagine you are working with observational data on wages and education.) Many times we are trying to identify causal relations. We think of causality using ceteris paribus: what would happen to $y$ if we altered $x$ but kept all the other things constant? When we include the other variables as controls in the regression, we can interpret the coefficient on $x$ as the relation between $y$ and $x$ keeping the controls fixed, which is what we wanted. But if we don't include them in the regression, we can't make that interpretation in general. What would the coefficient capture in that case?

Substitute the expression for $z_i$ from the auxiliary regression into the long regression:

$$y_i = \beta_0 + \beta_1 x_i + \beta_2(\gamma_0 + \gamma_1 x_i + e_i) + u_i = \beta_0 + \beta_1 x_i + \beta_2\gamma_0 + \beta_2\gamma_1 x_i + \beta_2 e_i + u_i = (\beta_0 + \beta_2\gamma_0) + (\beta_1 + \beta_2\gamma_1) x_i + (\beta_2 e_i + u_i)$$

So the coefficient on $x_i$ in the short regression ($\alpha_1$) is $\beta_1 + \beta_2\gamma_1$. We call $\beta_2\gamma_1$ omitted variable bias (OVB). $\alpha_1$ captures the "effect" (to be precise: this is not really a causal effect unless we make additional assumptions; rather, it is the statistical association) of $x$ on $y$, but also the "effect" of $z$ on $y$, scaled by the association between $z$ and $x$. This comes from the fact that when we observe higher values of $x$, the values of $z$ might be moving too (e.g., education usually correlates positively with parental wealth, so if we grab someone with more education it is likely that their parents were also wealthier). If we do not control for $z$, we will not be able to interpret the coefficient on $x$ in a causal way (e.g., do the higher wages for the person with more education come from the additional education or from the fact that they had richer parents?). A way of thinking about this is that when we are trying to identify causality, we are thinking about the coefficient on $x$ from a regression that includes all the relevant predetermined variables (the "longest" regression, if I may).

When would it be non-problematic to omit variables, that is, when is there no OVB? There are two cases:

- $\beta_2 = 0$: the omitted variable was not really relevant to begin with.
- $\gamma_1 = 0$: $x$ is uncorrelated with the omitted variable.

(There's also a third case: when the omitted variable is not predetermined and may itself be an outcome of $x$. These are usually called bad controls, mediators or mechanisms, but more on this later. Maybe.)

This is why, when trying to identify causality, we always have to think about whether there's anything relevant in the error term (and remember, anything not explicitly included in the regression is in the error term) that's correlated with one of the included variables. This also shows why RCTs are great: if I assign $x$ randomly, then it is expected to be uncorrelated with all the predetermined variables!
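The algebra above can also be checked numerically. The following R sketch is not part of the original assignment (the parameter values and variable names are made up for illustration): it simulates data where z is correlated with x, runs the long, short and auxiliary regressions, and confirms that the short-regression coefficient equals beta_1 + beta_2 * gamma_1.

## Illustrative simulation check of the OVB formula (not from the assignment)
set.seed(140)
n <- 10000
x <- runif(n, 0, 10)
z <- 2 + 0.7 * x + rnorm(n)             # z correlated with x (gamma_1 = 0.7)
y <- 1 + 0.5 * x + 1.5 * z + rnorm(n)   # true beta_1 = 0.5, beta_2 = 1.5

long  <- lm(y ~ x + z)   # long regression: coefficient on x estimates beta_1
short <- lm(y ~ x)       # short regression: coefficient on x is alpha_1
aux   <- lm(z ~ x)       # auxiliary regression: coefficient on x is gamma_1

coef(short)["x"]                           # alpha_1, roughly 0.5 + 1.5 * 0.7 = 1.55
coef(long)["x"] + coef(long)["z"] * coef(aux)["x"]  # beta_1 + beta_2 * gamma_1: identical (the decomposition is an algebraic identity in the sample)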
Question 2: OLS and measurement error

Recall that the OLS estimator in a bivariate regression of $Y$ on $X$ is equal to $\frac{Cov(x_i, y_i)}{Var(x_i)}$. Also note that $Cov(a + bX, Y) = Cov(a, Y) + Cov(bX, Y) = b\,Cov(X, Y)$, and recall that $Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y)$.

You want to estimate the effect of life satisfaction $L$ on life expectancy $Y$, and you believe that the two variables are related as follows:

$$Y_i = \beta_0 + \beta_1 L_i + e_i \quad (1)$$

You manage to find out the life expectancy of a sample of 1,000 individuals. Unfortunately, you cannot observe their true life satisfaction $L$, and so you run a survey, ask them how satisfied they are with their life, and record their answer as $\tilde{L}$. As it turns out, people are present-biased: when asked about their life satisfaction, they are influenced by random events that happened on that day – maybe they just learned something cool in their econometrics class (making them report higher life satisfaction), or their favorite sports team just lost (making them report lower life satisfaction). Therefore, you think that the reported life satisfaction $\tilde{L}$ is equal to:

$$\tilde{L}_i = L_i + v_i \quad (2)$$

where $v_i$ is a random error term that is fully independent of $L_i$ and $Y_i$. You think of running the following regression specification to estimate your model:

$$Y_i = \alpha_0 + \alpha_1 \tilde{L}_i + u_i \quad (3)$$

a) Can you think of other reasons why a variable may be mismeasured in the data?

Solution: There are many potential reasons for measurement error. We generally classify them into random measurement error and non-random (or systematic) measurement error.

Examples of random measurement error: physical constraints (e.g., a thermometer will never be 100% accurate), rounding (people do not report their precise salaries, but a round number), random noise (for some census data, the Census Bureau has started adding random numbers to preserve anonymity), and random errors (when I ask people about their SAT scores, some will just get it wrong, but on average people will report the correct number).

Non-random measurement error is also very common and causes bigger problems. Examples: people systematically misreporting (for example, rich people are less likely to truthfully report their wealth, and autocratic countries systematically over-report their growth estimates), measurement difficulties (GDP in poorer countries is less precisely estimated than in richer countries), and many more.

b) Will you (on average) get the effect you want, $\beta_1$, if you run this regression? Hints: use the covariance-over-variance formula for the OLS estimator. Plug in what you know about $\tilde{L}_i$ and $L_i$ from equation (2). Your final expression should be related to the OLS estimator for equation (1). You can use the fact that covariance is a linear operator and that $Var(A + B) = Var(A) + Var(B) + 2\,Cov(A, B)$.

Solution: No, you will not (on average) get the effect you want, $\beta_1$, if you run this regression. Using linearity of the covariance, the OLS estimator for $\alpha_1$ will be equal to:

$$\hat{\alpha}_1 = \frac{Cov(y_i, \tilde{L}_i)}{Var(\tilde{L}_i)} = \frac{Cov(y_i, L_i + v_i)}{Var(L_i + v_i)} = \frac{Cov(y_i, L_i) + \overbrace{Cov(y_i, v_i)}^{=0}}{Var(L_i) + Var(v_i) + \underbrace{2\,Cov(L_i, v_i)}_{=0}} = \frac{Cov(y_i, L_i)}{Var(L_i) + Var(v_i)}$$

Hence, $|\hat{\alpha}_1| \leq |\hat{\beta}_1| = \left|\frac{Cov(y_i, L_i)}{Var(L_i)}\right|$.

c) What does this tell you about the effect of measurement error on your regression?
Solution: We see that measurement error leads to a systematic problem in the regression. Whenever we have measurement error ($Var(v_i) > 0$), the estimated coefficient from the regression is closer to zero than the true coefficient. We call this attenuation bias. When we have a regression with this type of measurement error (random measurement error in the independent variable), we know that the true coefficient will be at least as large, in absolute value, as the one we estimated.

d) Creative question: Can you think of ways to reduce measurement error in this example?

Solution: One could ask people a different question, for example asking them to disregard the last week. It is also possible to ask many questions related to life satisfaction and average over those questions to get a more precise estimate of their actual life satisfaction.
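A quick way to see attenuation bias in action is to simulate it. The R sketch below is illustrative only (the parameter values and variable names are invented, not from the assignment): it generates true life satisfaction, adds independent reporting noise, and compares the regression on the true regressor with the regression on the noisy one.

## Illustrative sketch of attenuation bias (not part of the assignment)
set.seed(140)
n       <- 1000
L       <- rnorm(n, mean = 5, sd = 2)        # true life satisfaction
v       <- rnorm(n, mean = 0, sd = 2)        # reporting noise, independent of L and Y
L_tilde <- L + v                             # reported life satisfaction
Y       <- 70 + 1.5 * L + rnorm(n, sd = 5)   # true beta_1 = 1.5

coef(lm(Y ~ L))["L"]              # close to 1.5
coef(lm(Y ~ L_tilde))["L_tilde"]  # attenuated: roughly 1.5 * Var(L) / (Var(L) + Var(v)), about 0.75 here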
Question 3: Logs

Figure 1: Cheat sheet for interpreting coefficients under different functional forms (source: Wooldridge, 2016).

a) You are interested in estimating the relationship between campaign spending and election results. You collect data and run a regression of voteA, the share (from 0 to 100) of total votes that candidate A receives, on shareA, the share of total campaign spending (from 0 to 100) corresponding to candidate A. The estimated equation is:

$$\widehat{voteA} = 26.81 + 0.464 \cdot shareA$$

Interpret the coefficient on shareA. Be accurate about the difference between "percent" and "percentage points".

Solution: First, to clarify a bit, think of candidate A as being the candidate for party A, so that your dataset has data on many elections in which party A participated (e.g., your data could be the percentage of votes received by the Democratic candidate in all presidential elections). The estimated coefficient tells us that, in the data, when the share of candidate A's spending increases by 1 percentage point, candidate A receives on average 0.46 percentage points more of the total vote.

b) You want to know how much wages change with higher education and run a regression of log(wage), the natural log of monthly wages in US$, on educ, the years of education, on a sample of workers in the US. The estimated equation is:

$$\widehat{\log(wage)} = 0.584 + 0.083 \cdot educ$$

Interpret the coefficient on educ.

Solution: The coefficient on educ has a percentage interpretation when it is multiplied by 100. The predicted wage increases by about 8.3% for every additional year of education.

c) A consulting firm hired you to study how CEOs' wages are associated with the sales of the company. You collect a dataset of different firms in Argentina and run a regression of log(salary), the natural log of the CEO's salary, on log(sales), the natural log of the sales of the firm. You are asked to discuss the relationship between sales and CEO salaries in front of your boss.

$$\widehat{\log(salary)} = 4.822 + 0.257 \cdot \log(sales)$$

Solution: Now we have an elasticity. The coefficient tells us that when sales go up by 1%, the average salary of CEOs increases by 0.257% (this is not necessarily a causal relation; it is just capturing the relation seen in the data).

d) Unlike you, your econometrics professor is obsessed with the effect of class sizes on math test scores. She asks you to run a regression of math10 (percentage, from 0 to 100, of total points attained in the math exam in class 10) on log(enroll), the natural logarithm of the class size. There are also two control variables included. You get the following result:

$$\widehat{math10} = -207.66 + 21.16 \cdot \log(totcomp) + 3.98 \cdot \log(staff) - 1.29 \cdot \log(enroll)$$

Interpret the relationship between enrollment and the math scores.
Solution: Holding staff and totcomp fixed, if enrollment increases by 1%, math10 is predicted to decrease by about 0.013 percentage points. Equivalently, if enrollment increases by 10%, math10 is predicted to decrease by about 0.13 percentage points (again, holding staff and totcomp fixed).
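As a quick reference, the short sketch below is not part of the assignment; it simply reuses the estimated coefficients from parts a) through d) to spell out the interpretation arithmetic for each functional form.

## Interpretation arithmetic for the four functional forms (illustrative sketch)
b_level_level <- 0.464   # voteA on shareA (level-level)
b_log_level   <- 0.083   # log(wage) on educ (log-level)
b_log_log     <- 0.257   # log(salary) on log(sales) (log-log)
b_level_log   <- -1.29   # math10 on log(enroll) (level-log)

b_level_level * 1         # +1 pp of spending share -> +0.464 pp of the vote share
b_log_level * 100         # one more year of education -> wages roughly 8.3% higher
b_log_log * 1             # sales 1% higher -> CEO salary about 0.257% higher (elasticity)
(b_level_log / 100) * 10  # enrollment 10% higher -> math10 about 0.13 pp lower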
Question 4: Regressions with Interaction Terms

You are studying returns to education in the labor market. You have a dataset that contains wages, years of education and an indicator variable that takes the value zero if the individual identifies as male, and one otherwise (let's call this variable non_male).

a) As a first pass you consider the following model for wages:

$$wage_i = \alpha_0 + \alpha_1 educ_i + \alpha_2 non\_male_i + e_i$$

Differentiate with respect to educ and interpret.

Solution:

$$\frac{d\,wage_i}{d\,educ_i} = \alpha_1$$

One additional year of education is associated with $\alpha_1$ higher wages, ceteris paribus (keeping non_male constant). This effect is constant: no matter what gender a respondent identifies with or what level of education they have, this is what we model the effect of one additional year of education to be.

b) A friend suggests that you should instead consider the following model:

$$wage_i = \beta_0 + \beta_1 educ_i + \beta_2 non\_male_i + \beta_3 educ_i \cdot non\_male_i + u_i$$

Differentiate with respect to educ.

Solution:

$$\frac{d\,wage_i}{d\,educ_i} = \beta_1 + \beta_3 non\_male_i$$

c) Interpret $\beta_1$ and $\beta_3$. By how much do wages increase with an extra year of education for men? How about for the rest of the population?

Solution: For male-identifying individuals, wages increase by $\beta_1$ units for every additional year of education. For individuals who do not identify as male, wages increase by $\beta_1 + \beta_3$ units for an additional year of education.

This is an extremely important and deep concept in regression analysis. We can model situations where the effect of one variable on the outcome depends on other variables. For example, giving free bednets to families in Kenya may have different effects depending on whether a family already owns a bednet or not. A new Covid drug may have different efficacies for people of different ages. Or, in this case, the returns to education may differ by gender.

d) What do you think are the signs of $\beta_1$, $\beta_2$ and $\beta_3$ in the US labor market?

Solution: Typically, we observe that:

- $\beta_1 > 0$: individuals with higher education on average have higher wages.
- $\beta_2 < 0$: there is a significant gender wage gap in the US (as in all other countries in the world); individuals who do not identify as male earn less on average, conditional on education.
- $\beta_3 \gtrless 0$: this is an open empirical question. There are some studies showing that returns to education have been higher for women in the past (see here), but some newer studies found that this has changed recently (see here).

Question 5: Inference

Throughout this question we'll be working with the following model in mind:

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

a) Use RStudio to do the following: build a loop (you can use the for or lapply commands for this) that will do the following 1000 times: (i) create a database of x and y using simulations. The database should have 100 observations. Generate x using a uniform distribution between 0 and 100 (you can use
the command runif for this). Generate y using the model above and assuming $\beta_0 = 5$, $\beta_1 = 0$ and that $u_i \sim N(0, 100)$ (you can use the command rnorm for this). (ii) Run a regression of y on x. (iii) Add the estimate of $\beta_1$ that came out of step (ii) to a vector containing the estimates of $\beta_1$ from all the previous iterations of the loop. The end result of the loop should then be a vector that contains 1000 estimates of $\beta_1$. Now plot the distribution of that vector using the command ggplot() + geom_density().

Solution: Code:

## Clear all
rm(list = ls())

## Libraries
library(ggplot2)
library(estimatr)
library(gridExtra)
library(jtools)
library(ggstance)
library(huxtable)

########### Inference
## Distribution of the beta estimates under different true parameters, error variances and sample sizes

# Varying noise, beta = 0
vector_b1 <- vector()
vector_b2 <- vector()
T <- 1000
N <- 100

for (i in 1:T) {
  x1 <- runif(N, 0, 100)
  y  <- 5 + 0 * x1 + rnorm(N, 0, 10)   # u_i ~ N(0, 100), i.e. sd = 10
  dataset <- data.frame(x1, y)
  reg <- lm_robust(y ~ x1, data = dataset)
  vector_b1 <- c(vector_b1, summary(reg)$coefficients[2, 1])
  print(i)
}

# Plot the distribution of the estimates (done once, after the loop)
df <- data.frame(vector_b1)
ggplot() +
  geom_density(data = df, aes(x = vector_b1, fill = "low noise"), alpha = 0.3) +
  geom_vline(xintercept = 0, linetype = "dotted")

b) Repeat a) but instead assuming that $u_i \sim N(0, 10000)$. Put the two plots together. Interpret.

Solution: Code:

for (i in 1:T) {
  x1 <- runif(N, 0, 100)
  y  <- 5 + 0 * x1 + rnorm(N, 0, 100)  # u_i ~ N(0, 10000), i.e. sd = 100
  dataset <- data.frame(x1, y)
  reg <- lm_robust(y ~ x1, data = dataset)
  vector_b2 <- c(vector_b2, summary(reg)$coefficients[2, 1])
  print(i)
}

df <- data.frame(vector_b1, vector_b2)

# Overlaid histograms
ggplot() +
  geom_histogram(data = df, aes(x = vector_b1, fill = "low noise"), alpha = 0.3, binwidth = 0.05) +
  geom_histogram(data = df, aes(x = vector_b2, fill = "high noise"), alpha = 0.3, binwidth = 0.05) +
  geom_vline(xintercept = 0, linetype = "dotted")

# Overlaid densities
ggplot() +
  geom_density(data = df, aes(x = vector_b1, fill = "low noise"), alpha = 0.3) +
  geom_density(data = df, aes(x = vector_b2, fill = "high noise"), alpha = 0.3) +
  geom_vline(xintercept = 0, linetype = "dotted")

We see that we don't always get the true value of $\beta_1$ from the estimations. Here this comes from sampling variability: my sample contains 100 individuals drawn from an (infinite) population. The individuals in the population are not all identical, so when I draw random individuals from that population I get a distribution of observations that doesn't exactly mimic the population.

This is the reason why inference is so important. Although the distributions show that the estimates are centered around the true value, they usually don't coincide with it exactly. We commonly take a conservative approach to this: we will believe the coefficient is different from 0 only if there's enough evidence of that. In this context, enough evidence means that if the coefficient were actually zero, it would be very unlikely to observe an estimate like the one we got. Significance tests formalize that intuition.

We also see that the dispersion of the estimates is higher when the variance of the error term is higher. When there is more noise, it is more likely that we get estimates that are far away from the true value of the coefficient.

c) Repeat a) and b) but now assuming that $\beta_1 = 0.5$. Interpret.

Solution: Code:

# Varying noise, beta != 0
vector_b1 <- vector()
vector_b2 <- vector()
T <- 1000
N <- 100

for (i in 1:T) {
  x1 <- runif(N, 0, 100)
  y  <- 5 + 0.5 * x1 + rnorm(N, 0, 10)
  dataset <- data.frame(x1, y)
  reg <- lm_robust(y ~ x1, data = dataset)
  vector_b1 <- c(vector_b1, summary(reg)$coefficients[2, 1])
  print(i)
}

for (i in 1:T) {
  x1 <- runif(N, 0, 100)
  y  <- 5 + 0.5 * x1 + rnorm(N, 0, 100)
  dataset <- data.frame(x1, y)
  reg <- lm_robust(y ~ x1, data = dataset)
  vector_b2 <- c(vector_b2, summary(reg)$coefficients[2, 1])
  print(i)
}

df <- data.frame(vector_b1, vector_b2)

# Overlaid histograms
ggplot() +
  geom_histogram(data = df, aes(x = vector_b1, fill = "low noise"), alpha = 0.3, binwidth = 0.05) +
  geom_histogram(data = df, aes(x = vector_b2, fill = "high noise"), alpha = 0.3, binwidth = 0.05) +
  geom_vline(xintercept = 0.5, linetype = "dotted")

# Overlaid densities
ggplot() +
  geom_density(data = df, aes(x = vector_b1, fill = "low noise"), alpha = 0.3) +
  geom_density(data = df, aes(x = vector_b2, fill = "high noise"), alpha = 0.3) +
  geom_vline(xintercept = 0.5, linetype = "dotted")
The interpretation is similar to the one covered in the previous question. Note that there are cases in which the estimate is close to zero. In those cases we definitely won't be able to reject the null hypothesis ($\beta_1 = 0$), so sometimes we won't reject it even though it's false.

d) Repeat a) with $\beta_1 = 0.5$. Now do the same but with sample size 10. Plot the two distributions of the $\beta_1$ estimates together and interpret.

Solution: Code:

# Varying N
vector_b1 <- vector()
vector_b2 <- vector()
T <- 1000
N <- 100

for (i in 1:T) {
  x1 <- runif(N, 0, 100)
  y  <- 5 + 0.5 * x1 + rnorm(N, 0, 10)
  dataset <- data.frame(x1, y)
  reg <- lm_robust(y ~ x1, data = dataset)
  vector_b1 <- c(vector_b1, summary(reg)$coefficients[2, 1])
  print(i)
}

for (i in 1:T) {
  x1 <- runif(N / 10, 0, 100)
  y  <- 5 + 0.5 * x1 + rnorm(N / 10, 0, 10)
  dataset <- data.frame(x1, y)
  reg <- lm_robust(y ~ x1, data = dataset)
  vector_b2 <- c(vector_b2, summary(reg)$coefficients[2, 1])
  print(i)
}

df <- data.frame(vector_b1, vector_b2)

# Overlaid histograms
ggplot() +
  geom_histogram(data = df, aes(x = vector_b1, fill = "large N"), alpha = 0.3, binwidth = 0.05) +
  geom_histogram(data = df, aes(x = vector_b2, fill = "small N"), alpha = 0.3, binwidth = 0.05) +
  geom_vline(xintercept = 0.5, linetype = "dotted")

# Overlaid densities
ggplot() +
  geom_density(data = df, aes(x = vector_b1, fill = "large N"), alpha = 0.3) +
  geom_density(data = df, aes(x = vector_b2, fill = "small N"), alpha = 0.3) +
  geom_vline(xintercept = 0.5, linetype = "dotted")

The intuition is similar to the one above, but we now see that the variability of the estimator decreases when our sample size is larger. This is the main reason why large sample sizes are important: they make us more confident in our estimates. All of this is captured in the variance of the OLS estimator, which in a univariate regression is:

$$\mathrm{var}(\hat{\beta}_1^{OLS}) = \frac{\sigma_\epsilon^2}{N \cdot \mathrm{var}(x_i)}$$

(this expression is exact if the error is normally distributed, and approximate otherwise). We see that the variance decreases with $N$ and increases with the variance of the error term. In practical terms it is common to conduct t-tests on the coefficients of the regression. All of the points covered in this question carry over to the t-tests through our estimate of $\mathrm{var}(\hat{\beta}^{OLS})$.
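To connect the simulations to the variance formula above, here is a short illustrative sketch (not part of the assignment; it assumes the vectors vector_b1 and vector_b2 from part d) are still in memory) that compares the simulated standard deviation of the estimates with the theoretical standard error $\sigma_\epsilon / \sqrt{N \cdot \mathrm{var}(x_i)}$.

## Compare simulated and theoretical standard errors (illustrative sketch)
## Assumes vector_b1 (N = 100) and vector_b2 (N = 10) from part d) exist
sigma_e <- 10                  # sd of the error term used in part d)
var_x   <- (100 - 0)^2 / 12    # variance of a Uniform(0, 100) regressor

sd(vector_b1)                  # simulated sd of beta-hat with N = 100
sigma_e / sqrt(100 * var_x)    # theoretical value, about 0.035

sd(vector_b2)                  # simulated sd of beta-hat with N = 10
sigma_e / sqrt(10 * var_x)     # theoretical value, about 0.11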