EC2C3 MT Lecture 7

EC2C3 Econometrics I
Michael Gmeiner, m.w.gmeiner@lse.ac.uk
Lecture 7: Omitted Variable Bias
Michaelmas Term, 11 October 2022

Outline

Previous lectures
  • Matching and regression deal with confounders
  • We use observed control variables to capture the effects of confounders
  • Dale and Krueger’s study of the private school wage gap

This lecture
  • Comparing regressions with different sets of regressors
  • Omitted variable bias (OVB) and how it affects regression estimates
  • Re-evaluating Dale and Krueger’s results

Effect of Adding Controls

Regression is automated matching. We estimate the effect of interest while holding the controls fixed.
(1) Is our estimate with all controls causal?
(2) Can we quantify how estimates change when adding another control variable?

The Nature of Matching and Regression

We estimate the effect of treatment holding the controls fixed. What if something isn’t included as a control? Could there be other confounders?

[Diagram: Treatment (Private School) → Outcome of Interest (Earnings). Ambition and Ability (Confounder) is controlled for with the Application & Acceptance Set.]

Could There Be Other Confounders?

[Diagram: Treatment (Private School) → Outcome of Interest (Earnings). Ambition and Ability (Confounder) is controlled for with the Application & Acceptance Set. A further, unknown confounder is marked “???????”.]

We will consider the effect of a confounder that is omitted from the regression.

Setting

Consider the data from the example in Lecture 6.
$$ Y_i = \alpha + \beta P_i + \gamma A_i + e_i \quad \text{(long regression)} $$
Does the inclusion of the dummy $A_i$ matter for the estimate of $\beta$?
$$ Y_i = \alpha^s + \beta^s P_i + e_i^s \quad \text{(short regression)} $$
We have two regression estimates, $\beta = 10{,}000$ and $\beta^s = 20{,}000$. Where does the difference of 10,000 between these estimates come from?
(“Long” and “short” are relative to each other, with “long” meaning at least one additional variable is added.)

Auxiliary Regression

$$ Y_i = \alpha + \beta P_i + \gamma A_i + e_i $$
Consider the following auxiliary regression of $A_i$ (the variable omitted in the short regression) on $P_i$ (the regressor of interest):
$$ A_i = \pi_0 + \pi_1 P_i + u_i $$
$\pi_1$ is interpreted as “if a student attends a private rather than public school, how much more likely are they to be in group A?”
If $\pi_1 \neq 0$, then $A_i$ and $P_i$ are correlated.
We will plug the auxiliary regression into the long regression.

Relationship Between Long and Short Regressions

The auxiliary regression of $A_i$ is
$$ A_i = \pi_0 + \pi_1 P_i + u_i $$
Substituting the auxiliary regression into the long regression gives
$$ Y_i = \alpha + \beta P_i + \gamma A_i + e_i $$
$$ Y_i = \alpha + \beta P_i + \gamma(\pi_0 + \pi_1 P_i + u_i) + e_i $$
Distributing:
$$ Y_i = \alpha + \beta P_i + \gamma\pi_0 + \gamma\pi_1 P_i + \gamma u_i + e_i $$
$$ Y_i = (\alpha + \gamma\pi_0) + (\beta + \gamma\pi_1) P_i + (\gamma u_i + e_i) $$
$$ Y_i = \alpha^s + \beta^s P_i + e_i^s $$
The key result is:
$$ \beta^s = \beta + \gamma\pi_1 $$

Omitted Variable Bias (OVB)

The difference between the short and long regression coefficients, $\beta^s$ and $\beta$, is called omitted variable bias (OVB):
Omitted Variable Bias = Coef in Short − Coef in Long
From the previous derivation,
$$ \text{OVB} = \beta^s - \beta = \gamma\pi_1 $$
OVB = {Coefficient on the omitted in the Long Regression} × {Coefficient on the variable of interest in the Auxiliary}

$$ Y_i = \alpha + \beta P_i + \gamma A_i + e_i \quad \text{(Long Regression)} $$
$$ Y_i = \alpha^s + \beta^s P_i + e_i^s \quad \text{(Short Regression)} $$
$$ A_i = \pi_0 + \pi_1 P_i + u_i \quad \text{(Auxiliary Regression)} $$

Numerical Example

Person   1996 Earnings   P   A
1        110,000         1   1
2        100,000         1   1
3        110,000         0   1
4         60,000         1   0
5         30,000         0   0

Short Regression: $Y_i = \alpha^s + \beta^s P_i + e_i^s$
$$ \hat{Y}_i = 70{,}000 + 20{,}000\, P_i $$
Long Regression: $Y_i = \alpha + \beta P_i + \gamma A_i + e_i$
$$ \hat{Y}_i = 40{,}000 + 10{,}000\, P_i + 60{,}000\, A_i $$
Auxiliary Regression: $A_i = \pi_0 + \pi_1 P_i + u_i$
$$ \hat{A}_i = 0.5 + 0.1667\, P_i $$
OVB:
$$ \hat{\beta}^s - \hat{\beta} = 20{,}000 - 10{,}000 = 10{,}000 $$
$$ \hat{\gamma}\hat{\pi}_1 = 60{,}000 \times 0.1667 \approx 10{,}000 $$
Without controlling for group selectivity, we overestimate the private-public pay gap by \$10,000.

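The numerical example can be reproduced directly from the five-person table. The following is a minimal sketch, assuming NumPy is available; the helper `ols` is a hypothetical name introduced here for illustration and is not part of the lecture.

```python
import numpy as np

# Data from the five-person table: 1996 earnings, private-school dummy P,
# and group-A dummy A.
earnings = np.array([110_000, 100_000, 110_000, 60_000, 30_000], dtype=float)
P = np.array([1, 1, 0, 1, 0], dtype=float)
A = np.array([1, 1, 1, 0, 0], dtype=float)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


alpha_s, beta_s = ols(earnings, P)        # short regression
alpha, beta, gamma = ols(earnings, P, A)  # long regression
pi0, pi1 = ols(A, P)                      # auxiliary regression

ovb_direct = beta_s - beta   # coefficient in short minus coefficient in long
ovb_formula = gamma * pi1    # coefficient on omitted in long x auxiliary coefficient

print(round(beta_s), round(beta), round(gamma), round(pi1, 4))
print(round(ovb_direct), round(ovb_formula))
```

Both routes to the OVB give 10,000: the short coefficient is 20,000, the long coefficient is 10,000, and 60,000 times one sixth is 10,000.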
Numerical Examples: OVB of Own SAT

We’ll apply the OVB formula to compare the long and short regressions in columns (3) and (4). The ideas are the same; there are just other regressors present in the algebra.
We’ll also apply the formula to the regressions in columns (1) and (2).

Auxiliary Regression with Other $X$

Here $X_{3i}, \dots, X_{6i}$ denote the four other regressors (the selection controls) that appear in both the long and the short regression.

Long:
$$ \ln(\text{Earnings}_i) = \beta_0 + \beta_1 \text{Private}_i + \beta_2 \frac{SAT_i}{100} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + \beta_6 X_{6i} + e_i $$
Short:
$$ \ln(\text{Earnings}_i) = \beta_0^s + \beta_1^s \text{Private}_i + \beta_3^s X_{3i} + \beta_4^s X_{4i} + \beta_5^s X_{5i} + \beta_6^s X_{6i} + e_i^s $$
The auxiliary regression has the variable omitted from the short regression as the outcome and the $X$ in the short regression as regressors:
$$ \frac{SAT_i}{100} = \pi_0 + \pi_1 \text{Private}_i + \pi_2 X_{3i} + \pi_3 X_{4i} + \pi_4 X_{5i} + \pi_5 X_{6i} + u_i $$
Plugging the auxiliary regression into the long regression and simplifying results in the short regression:
$$ \ln(\text{Earnings}_i) = \beta_0 + \beta_1 \text{Private}_i + \beta_2(\pi_0 + \pi_1 \text{Private}_i + \cdots) + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + \beta_6 X_{6i} + e_i $$
$$ \ln(\text{Earnings}_i) = (\beta_0 + \beta_2\pi_0) + (\beta_1 + \beta_2\pi_1)\text{Private}_i + \cdots + e_i^s $$
$$ \beta_1^s = \beta_1 + \beta_2\pi_1 $$
The principle of the OVB formula is the same:
Short = Long + {Coefficient on omitted in Long} × {Coefficient on variable of interest in Auxiliary}

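That the same identity holds when other regressors are present can be checked numerically. The sketch below uses synthetic data generated purely for illustration (it is not the Dale and Krueger sample), and all variable names and data-generating coefficients are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Synthetic data, illustrative only.
x_other = rng.normal(size=(n, 2))                   # "other X", kept in both regressions
omitted = 0.5 * x_other[:, 0] + rng.normal(size=n)  # variable dropped from the short regression
treat = (0.8 * omitted + x_other[:, 1] + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 0.3 * treat + 0.7 * omitted + x_other @ np.array([0.2, -0.4]) + rng.normal(size=n)


def ols(y, X):
    """OLS coefficients of y on X, with an intercept prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


long_coef = ols(y, np.column_stack([treat, omitted, x_other]))
short_coef = ols(y, np.column_stack([treat, x_other]))
aux_coef = ols(omitted, np.column_stack([treat, x_other]))  # omitted on treat and other X

beta1_long, beta2_long = long_coef[1], long_coef[2]
beta1_short = short_coef[1]
pi1 = aux_coef[1]

# Short = Long + (coef on omitted in long) x (coef on treatment in auxiliary)
gap = beta1_short - (beta1_long + beta2_long * pi1)
print(abs(gap) < 1e-10)
```

The identity is an algebraic property of least squares, so it holds in the sample up to floating-point error whatever the data-generating process; the key requirement is that the auxiliary regression includes the same other regressors as the short regression.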
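Auxiliary Regressions

Here $X_{3i}, \dots, X_{6i}$ denote the four other regressors (the selection controls).

(1) For the case with no other $X$:
$$ \frac{SAT_i}{100} = \pi_0 + \pi_1 \text{Private School}_i + u_i $$
(2) For the case with other $X$:
$$ \frac{SAT_i}{100} = \pi_0 + \pi_1 \text{Private School}_i + \pi_2 X_{3i} + \pi_3 X_{4i} + \pi_4 X_{5i} + \pi_5 X_{6i} + u_i $$
[Table: outcome variable is Own SAT Score/100; columns (1) and (2).]
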
Calculating the OVB of SAT without using selection controls

OVB = Short − Long = 0.212 − 0.152 = 0.06
OVB = {Coefficient on variable of interest in Auxiliary} × {Coefficient on omitted in Long} = 1.165 × 0.051 ≈ 0.06

Practice

Without looking at results from the auxiliary regression, do students who attended private colleges, on average:
A. Score higher on the SAT than their public college counterparts
B. Score lower on the SAT than their public college counterparts
C. Score similarly on the SAT as their public college counterparts
D. We don’t know

Go to ttpoll.eu and use session id “EC2C3”

OVB = Short − Long > 0
OVB = {Coefficient on variable of interest in Auxiliary} × {Coefficient on omitted in Long} > 0
Because the coefficient on the omitted variable in the long regression is positive, the coefficient on the variable of interest in the auxiliary regression must also be positive. Thus, the omitted variable and the variable of interest are positively correlated. That is, SAT score is positively correlated with attending a private college.

Calculating the OVB of SAT using selection controls

OVB = Short − Long = 0.034 − 0.031 = 0.003
OVB = {Coefficient on variable of interest in Auxiliary} × {Coefficient on omitted in Long} = 0.066 × 0.036 ≈ 0.003
(There’s some rounding error here: the product of the rounded coefficients is 0.0024.)

University Selectivity Controls Reduce OVB

Without University Selectivity Controls:
OVB = Short − Long = 0.212 − 0.152 = 0.06
OVB = {Coefficient on variable of interest in Auxiliary} × {Coefficient on omitted in Long} = 1.165 × 0.051 ≈ 0.06

When we include selection controls, we compare students with the same values of the selection controls. Students are already quite similar even before controlling for SAT scores, so the change in coefficients when controlling for SAT is much smaller.

With University Selectivity Controls:
OVB = Short − Long = 0.034 − 0.031 = 0.003
OVB = {Coefficient on variable of interest in Auxiliary} × {Coefficient on omitted in Long} = 0.066 × 0.036 ≈ 0.003

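The two comparisons can be put side by side with a few lines of arithmetic. This is only a sketch using the rounded coefficients quoted on the slides; the dictionary keys are labels chosen here for illustration.

```python
# Coefficients as quoted on the slides (rounded to three decimals).
cases = {
    "without selectivity controls": {
        "short": 0.212, "long": 0.152, "aux": 1.165, "omitted_in_long": 0.051,
    },
    "with selectivity controls": {
        "short": 0.034, "long": 0.031, "aux": 0.066, "omitted_in_long": 0.036,
    },
}

results = {}
for label, c in cases.items():
    direct = c["short"] - c["long"]            # OVB as short minus long
    formula = c["aux"] * c["omitted_in_long"]  # OVB from the product formula
    results[label] = (direct, formula)
    print(f"{label}: short - long = {direct:.3f}, aux x long = {formula:.4f}")
```

Because the inputs are rounded, the two routes agree only approximately; the point of the comparison is the order-of-magnitude drop in OVB once the selectivity controls are included.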
Relationship Between Selection Bias and OVB

Selection bias: the difference between our estimate of the causal effect and the true causal effect.
  • It is the bias that occurs when we compare the treated and control groups without controlling for all confounders that affect both treatment and the outcome (selection into treatment is not fully controlled for).
  • E.g., private school applicants may be more motivated or born into richer families on average. These factors affect income.
  • Issue: we may not observe all factors that influence selection into treatment, so we can’t include them all in the regression as controls to remove their OVB.

OVB: the mathematical relationship between coefficients for the same variable in any two regressions that differ only in that one regression contains at least one additional regressor.
  • Adding variables to remove OVB does not, per se, imply that a long regression has a causal interpretation.
  • OVB is only a mathematical relationship. The OVB formula holds for any paired long and short regressions.
  • Removing OVB will reduce selection bias if we control for good confounders.

Is Selection Bias Eliminated Within Groups?

Consider the regression with dummy variables for each group, which attempts to control for selection. (The superscript “s” is used because this will be the short regression on the next slide.)
$$ \ln Y_i = \alpha^s + \beta^s P_i + \sum_j \gamma_j^s \text{GROUP}_{ij} + \delta_1^s SAT_i + \delta_2^s \ln PI_i + e_i^s $$
Could there be selection bias? Consider why a person accepted by Harvard and U-Mass might choose U-Mass.
  • There might be factors unrelated to earnings potential that affect the decision.
  • There might be factors that are related to earnings potential, e.g. family size. Larger households might be less able to fund the high tuition at Harvard, and kids from large households are more likely to earn less.

What if We Include Family Size?

$$ \ln Y_i = \alpha^s + \beta^s P_i + \sum_j \gamma_j^s \text{GROUP}_{ij} + \delta_1^s SAT_i + \delta_2^s \ln PI_i + e_i^s $$
If we observe family size and include it in the regression,
$$ \ln Y_i = \alpha + \beta P_i + \sum_j \gamma_j \text{GROUP}_{ij} + \delta_1 SAT_i + \delta_2 \ln PI_i + \delta_3 FS_i + e_i $$
The auxiliary regression is
$$ FS_i = \pi_0 + \pi_1 P_i + \sum_j \theta_j \text{GROUP}_{ij} + \pi_2 SAT_i + \pi_3 \ln PI_i + u_i $$

Practice

OVB = {Relationship between $FS_i$ and $P_i$} × {Coefficient on $FS_i$ in Long Regression}
Which coefficient captures the relationship between $FS_i$ and $P_i$?
A. $\beta$
B. $\delta_3$
C. $\pi_1$
D. $\pi_3$

Go to ttpoll.eu and use session id “EC2C3”

OVB Due to Omitting Family Size

OVB = {Relationship between $FS_i$ and $P_i$} × {Coefficient on $FS_i$ in Long Regression}
$$ \text{OVB} = \beta^s - \beta = \pi_1 \times \delta_3 $$
The short regression is
$$ \ln Y_i = \alpha^s + \beta^s P_i + \sum_j \gamma_j^s \text{GROUP}_{ij} + \delta_1^s SAT_i + \delta_2^s \ln PI_i + e_i^s $$
If we observe family size and include it in the regression,
$$ \ln Y_i = \alpha + \beta P_i + \sum_j \gamma_j \text{GROUP}_{ij} + \delta_1 SAT_i + \delta_2 \ln PI_i + \delta_3 FS_i + e_i $$
The auxiliary regression is
$$ FS_i = \pi_0 + \pi_1 P_i + \sum_j \theta_j \text{GROUP}_{ij} + \pi_2 SAT_i + \pi_3 \ln PI_i + u_i $$

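The substitution step behind this result mirrors the two-variable derivation earlier in the lecture. The sketch below assumes the symbol names used for the family-size regressions: $\delta_3$ is the coefficient on $FS_i$ in the long regression, and $\pi_1$ and $\theta_j$ are the auxiliary coefficients on $P_i$ and the group dummies.

```latex
\begin{align*}
\ln Y_i &= \alpha + \beta P_i + \sum_j \gamma_j \text{GROUP}_{ij}
           + \delta_1 SAT_i + \delta_2 \ln PI_i \\
        &\quad + \delta_3\Big(\pi_0 + \pi_1 P_i + \sum_j \theta_j \text{GROUP}_{ij}
           + \pi_2 SAT_i + \pi_3 \ln PI_i + u_i\Big) + e_i \\
        &= (\alpha + \delta_3\pi_0) + (\beta + \delta_3\pi_1) P_i
           + \sum_j (\gamma_j + \delta_3\theta_j)\,\text{GROUP}_{ij} \\
        &\quad + (\delta_1 + \delta_3\pi_2)\, SAT_i
           + (\delta_2 + \delta_3\pi_3) \ln PI_i + (e_i + \delta_3 u_i)
\end{align*}
```

Matching the coefficient on $P_i$ with the short regression gives $\beta^s = \beta + \delta_3\pi_1$, so the OVB is $\pi_1 \times \delta_3$. The same matching shows that every other short-regression coefficient also absorbs a term proportional to $\delta_3$.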
How is this Useful?

If we don’t observe $FS$, we can’t run either the long or the auxiliary regression. How does the OVB formula help?
  • The coefficient on $FS_i$ in the long regression, namely $\delta_3$, is likely negative. Kids from larger families often earn less. (They receive less parental attention as children.)
  • The relationship between $FS_i$ and $P_i$, namely $\pi_1$, is likely negative. Kids from larger families are less likely to attend private universities.

Using the OVB formula:
$$ \text{OVB} = \beta^s - \beta = \pi_1 \times \delta_3 = \text{negative} \times \text{negative} = \text{positive} $$
Even without knowing the estimates, we can predict the sign of the OVB. This allows us to interpret our results as likely greater than the true effect, $\beta^s > \beta$.

“Signing” the Bias

OVB = {Coefficient on variable of interest in Auxiliary} × {Coefficient on omitted in Long Regression}
$$ \text{OVB} = \beta^s - \beta, \qquad \beta^s = \beta + \text{OVB} $$
If OVB > 0, then $\beta^s > \beta$. $\beta^s$ is “biased up” or has “upward bias” and is an overestimate.
If OVB < 0, then $\beta^s < \beta$. $\beta^s$ is “biased down” or has “downward bias” and is an underestimate.

Sign of OVB:
                                              Coefficient on variable of interest in Auxiliary
                                              +            −
Coefficient on omitted in Long Regression  +  OVB > 0      OVB < 0
                                           −  OVB < 0      OVB > 0

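The sign rule is just the sign of a product, which can be written as a small helper. This is an illustrative sketch; the function name `ovb_sign` and its return strings are hypothetical and introduced only for this example.

```python
def ovb_sign(aux_coef_sign: int, omitted_coef_sign: int) -> str:
    """Sign of OVB = (auxiliary coefficient) x (coefficient on omitted in long).

    Each argument is +1 or -1; a 0 means no relationship and hence no bias.
    """
    product = aux_coef_sign * omitted_coef_sign
    if product > 0:
        return "OVB > 0: short coefficient is biased up"
    if product < 0:
        return "OVB < 0: short coefficient is biased down"
    return "OVB = 0: short and long coefficients coincide"


# Family-size story: larger families attend private schools less often (-1)
# and earn less (-1), so the short regression overstates the effect.
print(ovb_sign(-1, -1))
```

Calling the helper with opposite signs, for example `ovb_sign(+1, -1)`, returns the downward-bias case.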
Context of Harvard and U-Mass

Many omitted variables would likely resemble family size, affecting earnings and the probability of attending a private school in the same direction. OVB would then be positive, and results would be biased up. Including variables such as family size would lower the estimate of the private school treatment effect, which is already basically zero, even further.

The story can also go the opposite way. Individuals who went to U-Mass despite being admitted to Harvard may be very smart and may have received scholarships to attend U-Mass. The OVB would then be negative, and results would be biased down.

Application Essay by Hugh Gallagher, age 19

Hugh went to NYU.

Topic: Describe significant experiences or accomplishments that have helped define you as a person.

Essay: I am a dynamic figure. I ride the Tour de France and cook Thirty-Minute Brownies in twenty minutes. I am an expert in stucco, a veteran in love, and an outlaw in Peru. On Wednesdays after school, I repair appliances for the homeless. I am an abstract artist, a concrete analyst, and a ruthless bookie. I weave, I dodge, I frolic, and my bills are all paid. I have won spelling bees at the Kremlin, bullfights in San Juan, and cliff-diving competitions in Sri Lanka. I’ve played Hamlet, performed open-heart surgery, and spoken with Elvis. But I have not yet gone to college.

As his surreal achievements show, it’s almost impossible to “keep factors fixed (equal)” if we have people like Hugh in our dataset.

Terminology Recap

Outcome: the variable on the left side of a regression. Also called the “dependent variable”.
Regressor: any variable on the right side of a regression. Also called a “covariate”, “right-side variable”, or “independent variable”.
Regressor of interest: the variable whose causal effect on the outcome is what we want to study. Also called “treatment”.
Confounder: a variable that is causally related to both the treatment and the outcome. It is not necessarily a variable in our regression.
Control: a regressor (or matching variable) we use to hold confounders constant.

For a causal interpretation of the coefficient on the regressor of interest, controls need to capture all confounders.

Omitted Variable Bias: Summary

  • The interpretation of regression estimates as causal effects hinges on our ability to control for all confounders.
  • We can only control for observables.
  • The OVB formula helps us reason about the effect of potential unobserved confounders:
    OVB = {Coefficient on variable of interest in Auxiliary} × {Coefficient on omitted in Long}
  • The formula is a mathematical relationship between coefficients that holds irrespective of the causal interpretation of the estimates.