ECON 104 W24 - Midterm Review

School: University of California, Los Angeles
Course: ECON 104 (Economics)
Uploaded: Jun 6, 2024
Type: pdf, 53 pages
Heteroskedasticity Time Series Endogeneity Simultaneous Equations ECON104 Midterm Review February 19, 2024

Logistics
  • The exam is Thursday in class, 12:30 - 1:40 PM.
  • Topics are heteroskedasticity, time series, and endogeneity.
  • Bring your student ID and a calculator.

What to study
  • The "style" of questions follows the practice/previous midterm on the website; however, the way "coding" questions are implemented will be different. In particular, questions 5, 9, 12, and 13 in the practice midterm.
  • There will still be problems that require you to interpret code, and not just code output.
  • The labs (including the Econ 103 lab PDF) are the best tools for studying the necessary code. Reading and understanding code output is still required!
  • The discussion problems are also useful to study.

Other Code Tips
When studying the lab code, think about the main takeaways of each lab. There is more to keep in mind, but start with the following questions:
  • Econ 103: How do the different selection methods compare, and how do I choose the "best" subset according to each method?
  • Lab 1: How do I change standard errors from OLS to HC standard errors? How do I implement WLS? How do I (manually) conduct a GQ or BP type test?
  • Lab 2: What is the difference between setting up a time series regression and a standard OLS model? How do I know when there is autocorrelation and when there is not?
  • Lab 3: What is the difference between the lm and ivreg commands? How do I identify the instruments, endogenous variables, and exogenous variables by reading a command? How do I interpret the diagnostic tests in the ivreg output?

Heteroskedasticity Overview
Given data $(x_i, y_i)$, we may estimate a linear model:
$$y_i = \beta_1 + \beta_2 x_i + e_i \tag{1}$$

Heteroskedasticity Overview
How do we test for it?
1. Breusch-Pagan (BP) Test
  • White Test
2. Goldfeld-Quandt (GQ) Test
How do we fix it?
1. White Standard Errors (HC)
2. Weighted Least Squares

Breusch-Pagan (BP) Test
After estimating the original model, retrieve the estimated residuals $\hat{e}_i$. The BP test uses the following linear model:
$$\hat{e}_i^2 = \theta_0 + \theta_1 z_1 + \cdots + \theta_J z_J + v_i \tag{2}$$
where $(z_1, \ldots, z_J)$ is the set of $J$ variables that you believe explain the heteroskedasticity in your model. The test statistic, using the $R^2$ from regression (2), is then given by:
$$\widehat{BP} = N \cdot R^2 \sim \chi^2(J) \tag{3}$$
If $\widehat{BP} > \chi^2_{(1-\alpha,\, J)}$, we reject the null of homoskedasticity and conclude that the model is heteroskedastic.
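
The BP steps above can be sketched numerically. This is a minimal illustration in plain Python rather than the R used in the labs; the data are made up, the single $z$ variable is assumed to be $x$ itself (so $J = 1$), and the helper names `ols_simple` and `r_squared` are hypothetical.

```python
# Manual Breusch-Pagan test for a one-regressor model, using z = x (so J = 1).
# All data below are made up for illustration.

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def r_squared(x, y):
    """R^2 from the simple regression of y on x."""
    _, _, resid = ols_simple(x, y)
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sum(e ** 2 for e in resid) / sst

x = list(range(1, 11))
# Errors alternate in sign and grow with x, so the error variance rises with x.
errors = [(-1) ** i * 0.1 * i for i in x]
y = [1 + 2 * xi + ei for xi, ei in zip(x, errors)]

# Step 1: estimate the original model and keep the squared residuals.
_, _, resid = ols_simple(x, y)
resid_sq = [e ** 2 for e in resid]

# Step 2: auxiliary regression of squared residuals on z = x, then BP = N * R^2.
bp_stat = len(x) * r_squared(x, resid_sq)

# Step 3: compare with the chi-square critical value (3.841 is the 5% value for 1 df).
CRIT_5PCT_1DF = 3.841
print(f"BP = {bp_stat:.3f}, reject homoskedasticity: {bp_stat > CRIT_5PCT_1DF}")
```

With more than one $z$ variable the auxiliary regression becomes a multiple regression, and the critical value comes from $\chi^2(J)$ instead.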

White Test
The White Test is a special version of the BP test where you use the full set of first and second order terms of the $X$ variables from your original equation. For example, if you have the original model
$$Y_i = \beta_1 + \beta_2 X_1 + \beta_3 X_2 + e_i \tag{4}$$
the White Test becomes
$$\hat{e}_i^2 = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \theta_3 X_1^2 + \theta_4 X_2^2 + \theta_5 X_1 X_2 + v_i \tag{5}$$

Goldfeld-Quandt (GQ) Test
The GQ test is a test on GROUPS; we want to see if $\sigma_1^2 = \sigma_2^2$.
1. Arrange the observations in ascending order of the variable you believe is the source of heteroskedasticity (skip if it is discrete).
2. Split the sample into two groups.
3. Run two separate regressions and compute their variances:
$$\hat{\sigma}_g^2 = \frac{SSE_g}{N_g - K_g}$$
4. Compute the test statistic (with $\sigma_1 > \sigma_2$):
$$\widehat{GQ} = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} \sim F(N_1 - K_1,\, N_2 - K_2)$$
5. If $\widehat{GQ} > F_{\alpha}(N_1 - K_1,\, N_2 - K_2)$, then heteroskedasticity is present.
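
The five GQ steps can be sketched as follows. This is a plain-Python illustration with made-up data (the labs use R); the helper `sigma2_hat` is a hypothetical name, and the critical value is left to an $F$ table rather than computed.

```python
# Manual Goldfeld-Quandt test: sort, split, fit each group, compare variances.
# All data below are made up for illustration.

def sigma2_hat(x, y, k=2):
    """Fit y = b1 + b2*x by OLS and return SSE / (N - K)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return sse / (n - k)

x = list(range(1, 13))
errors = [(-1) ** i * 0.1 * i for i in x]  # spread grows with x
y = [1 + 2 * xi + ei for xi, ei in zip(x, errors)]

# Step 1: sort by the suspected source of heteroskedasticity (here x).
pairs = sorted(zip(x, y))
# Step 2: split into two groups.
half = len(pairs) // 2
low, high = pairs[:half], pairs[half:]
# Step 3: estimate the error variance within each group.
s2_low = sigma2_hat([p[0] for p in low], [p[1] for p in low])
s2_high = sigma2_hat([p[0] for p in high], [p[1] for p in high])
# Step 4: the larger variance goes in the numerator.
gq_stat = max(s2_low, s2_high) / min(s2_low, s2_high)
# Both groups have the same size here, so the two degrees of freedom coincide.
df = (half - 2, len(pairs) - half - 2)
# Step 5: compare gq_stat with the F critical value for these degrees of freedom.
print(f"GQ = {gq_stat:.3f} with degrees of freedom {df}")
```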

White's HC Standard Errors
Recall the variance formula behind the original OLS standard errors:
$$\mathrm{Var}(b_2 \mid X) = \frac{N}{N-K} \cdot \frac{\sum_{i=1}^{N} \left[(x_i - \bar{x})^2\, \hat{\sigma}^2\right]}{\left[\sum_{i=1}^{N} (x_i - \bar{x})^2\right]^2} \tag{6}$$
One solution to heteroskedasticity is computing different standard errors. One example is White's HC1 standard error:
$$\mathrm{Var}(b_2 \mid X) = \frac{N}{N-K} \cdot \frac{\sum_{i=1}^{N} \left[(x_i - \bar{x})^2\, \hat{e}_i^2\right]}{\left[\sum_{i=1}^{N} (x_i - \bar{x})^2\right]^2} \tag{7}$$
The only difference is that we substitute the squared residuals ($\hat{e}_i^2$) for the error variance ($\hat{\sigma}^2$) in the numerator!
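
Equation (7) can be evaluated directly once the residuals are in hand. The sketch below is plain Python with made-up data; `hc1_variance` is a hypothetical helper name, and the comparison value is the usual textbook OLS variance $SSE/(N-K)$ divided by $\sum (x_i - \bar{x})^2$.

```python
import math

def hc1_variance(x, resid, k=2):
    """Equation (7): N/(N-K) * sum((x_i - xbar)^2 * e_i^2) / (sum((x_i - xbar)^2))^2."""
    n = len(x)
    xbar = sum(x) / n
    dev_sq = [(xi - xbar) ** 2 for xi in x]
    numerator = sum(d * e ** 2 for d, e in zip(dev_sq, resid))
    return n / (n - k) * numerator / sum(dev_sq) ** 2

# Made-up data whose error spread grows with x.
x = list(range(1, 11))
errors = [(-1) ** i * 0.1 * i for i in x]
y = [1 + 2 * xi + ei for xi, ei in zip(x, errors)]

# Fit y = b1 + b2*x by OLS and collect the residuals.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

# Textbook OLS variance of b2 versus the HC1 version from equation (7).
var_ols = sum(e ** 2 for e in resid) / (n - 2) / sxx
var_hc1 = hc1_variance(x, resid)
print(f"OLS SE = {math.sqrt(var_ols):.4f}, HC1 SE = {math.sqrt(var_hc1):.4f}")
```

The point estimate `b2` is the same either way; only the standard error attached to it changes.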

Weighted Least Squares
Suppose heteroskedasticity is present and we believe that it takes on the following form:
$$\mathrm{Var}(e_i \mid x_i) = \sigma^2 h(x_i) \tag{8}$$
If $h(x_i)$ is known, then we can "undo" the heteroskedasticity by dividing every term in the regression by $\sqrt{h(x_i)}$:
$$\frac{y_i}{\sqrt{h(x_i)}} = \beta_1 \frac{1}{\sqrt{h(x_i)}} + \beta_2 \frac{x_i}{\sqrt{h(x_i)}} + \frac{e_i}{\sqrt{h(x_i)}}$$
$$y_i^{*} = \beta_1 x_{0i}^{*} + \beta_2 x_{1i}^{*} + e_i^{*} \tag{WLS}$$

Weighted Least Squares
What effect does this have on the variance of our errors?
$$\mathrm{Var}(e_i^{*} \mid x_i) = \mathrm{Var}\!\left(\frac{e_i}{\sqrt{h(x_i)}} \,\Big|\, x_i\right) = \frac{1}{h(x_i)} \mathrm{Var}(e_i \mid x_i) = \frac{1}{h(x_i)} \sigma^2 h(x_i) = \sigma^2$$
  • After our weighting, we have a homoskedastic model!
  • Our parameters $\beta_1$ and $\beta_2$ are unchanged.
  • The standard errors of the model are changed, because they are now computed from the transformed data:
$$\mathrm{Var}(b_2 \mid X)_{WLS} = \frac{N}{N-K} \cdot \frac{\sum_{i=1}^{N} \left[(x_i^{*} - \bar{x}^{*})^2\, \hat{\sigma}^{*2}\right]}{\left[\sum_{i=1}^{N} (x_i^{*} - \bar{x}^{*})^2\right]^2} \tag{9}$$
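
The WLS transformation can be carried out by hand. The sketch below is plain Python with made-up data and assumes, purely for illustration, that $h(x_i) = x_i$; the helper `ols_two_regressors` is a hypothetical name that solves the 2x2 normal equations for a regression without a separate intercept.

```python
import math

def ols_two_regressors(a, b, y):
    """OLS of y on regressors a and b with no separate intercept (2x2 normal equations)."""
    saa = sum(ai * ai for ai in a)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    sbb = sum(bi * bi for bi in b)
    say = sum(ai * yi for ai, yi in zip(a, y))
    sby = sum(bi * yi for bi, yi in zip(b, y))
    det = saa * sbb - sab ** 2
    return (say * sbb - sab * sby) / det, (saa * sby - sab * say) / det

# Made-up data with Var(e_i | x_i) proportional to x_i, i.e. assuming h(x) = x.
x = [float(i) for i in range(1, 11)]
errors = [(-1) ** i * 0.1 * math.sqrt(i) for i in range(1, 11)]
y = [1 + 2 * xi + ei for xi, ei in zip(x, errors)]

# Divide every term by sqrt(h(x_i)) to build the starred variables.
root_h = [math.sqrt(xi) for xi in x]
y_star = [yi / r for yi, r in zip(y, root_h)]
x0_star = [1 / r for r in root_h]                   # transformed constant
x1_star = [xi / r for xi, r in zip(x, root_h)]      # transformed x

# OLS on the transformed model estimates the same beta_1 and beta_2.
b1_wls, b2_wls = ols_two_regressors(x0_star, x1_star, y_star)
print(f"WLS estimates: b1 = {b1_wls:.3f}, b2 = {b2_wls:.3f}")
```

Note that the transformed model has no ordinary intercept: the column of ones becomes $x_{0i}^{*} = 1/\sqrt{h(x_i)}$.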

Generalized Least Squares
In WLS, we assumed that $h(x)$ followed a certain function, but what if we don't know what it is? We need to get an estimate of $h(x)$!
(1) Suppose $J$ variables are in your skedastic function and assume it takes the form:
$$h(x) = \prod_{j=1}^{J} x_{ij}^{\theta_j} \tag{10}$$
(2) As $E[e_i] = 0$ by assumption, we create the model
$$\mathrm{Var}(e_i \mid x_i) = E[e_i^2] = \sigma^2 \prod_{j=1}^{J} x_{ij}^{\theta_j} \tag{11}$$
(3) Taking logs of both sides yields
$$\log(E[e_i^2]) = \theta_0 + \theta_1 \log(x_{i1}) + \cdots + \theta_J \log(x_{iJ}) \tag{12}$$
where $\theta_0 = \log \sigma^2$.

Generalized Least Squares
(4) Using the sample residuals and adding an error term gives us a linear model!
$$\log(\hat{e}_i^2) = \theta_0 + \theta_1 \log(x_{i1}) + \cdots + \theta_J \log(x_{iJ}) + v_i \tag{13}$$
(5) Run OLS on this model to obtain parameter estimates for $\theta_0, \theta_1, \ldots, \theta_J$ and compute the weights:
$$\hat{w}_i = e^{\hat{\theta}_0} \prod_{j=1}^{J} x_{ij}^{\hat{\theta}_j}$$
(6) Run GLS on your original model with weights $\hat{w}_i$:
$$\frac{y_i}{\sqrt{\hat{w}_i}} = \beta_1 \frac{1}{\sqrt{\hat{w}_i}} + \beta_2 \frac{x_i}{\sqrt{\hat{w}_i}} + \frac{e_i}{\sqrt{\hat{w}_i}} \tag{14}$$

WLS vs GLS
WLS assumes you know the function $h(x)$; GLS estimates this function:
  • WLS might have specification error
  • GLS might have estimation error
With either error, some heteroskedasticity may remain. → Add White standard errors and you are good!
The residual variances for the two methods differ:
  • WLS has $\mathrm{Var}(\hat{e}_i^{*}) = \sigma^2$
  • GLS has $\mathrm{Var}(\hat{e}_i^{*}) = 1$
This happens because the intercept term $\theta_0$ in the GLS skedastic regression corresponds to $\sigma^2$, so the estimated weights absorb it:
$$\hat{w}_i = \hat{\sigma}^2\, \hat{h}(x_i)$$

Time Series
Suppose that instead of observing a cross-section of units in a single period, we observe a single individual over several time periods. In particular, we now have data $\{(X_t, Y_t)\}_{t=1}^{T}$ where the observations are indexed across time periods. In these settings, we often assume that there is some relationship between past observations and future outcomes. The idea of time series econometrics is to build models that estimate those relationships.

Stationarity
A time series is said to be stationary if its distribution does not change over time.

3 Types of Time Series Models
  • Autoregressive: relates previous values of $Y$ to current outcomes of $Y$:
$$Y_t = \beta + \beta_1 Y_{t-1} + \cdots + \beta_p Y_{t-p} + e_t \tag{AR(p)}$$
  • Distributed Lag: relates current and previous values of $X$ to current outcomes of $Y$:
$$Y_t = \delta + \delta_0 X_t + \delta_1 X_{t-1} + \cdots + \delta_q X_{t-q} + e_t \tag{DL(q)}$$
  • ARDL: combines the previous two models into one:
$$Y_t = \theta + \beta_1 Y_{t-1} + \cdots + \beta_p Y_{t-p} + \delta_0 X_t + \delta_1 X_{t-1} + \cdots + \delta_q X_{t-q} + e_t \tag{ARDL(p,q)}$$

Autocorrelation
When observing a variable $W$ across multiple time periods, we can compute its correlation with itself (autocorrelation):
$$\rho_s = \frac{\mathrm{Cov}(W_t, W_{t-s})}{\mathrm{Var}(W_t)} \tag{15}$$
In our time series model, we want to make sure that our error term does not have autocorrelation:
$$\rho_s = \frac{\mathrm{Cov}(e_t, e_{t-s})}{\mathrm{Var}(e_t)} = 0 \quad \text{for all lags } s$$
This is analogous to our independence of errors assumption from OLS.

Testing Autocorrelations
How can we tell whether the autocorrelation is significantly different from zero? Our test statistic is
$$Z = \sqrt{T} \cdot \hat{\rho}_s \sim N(0, 1) \tag{16}$$
Our test hypotheses are:
  • $H_0: \rho_s = 0$ (No autocorrelation)
  • $H_1: \rho_s \neq 0$ (Autocorrelation)
If we reject the null, then there is autocorrelation of order $s$.

Correlograms
The correlogram is a plot of the sample autocorrelations at various lag lengths. A horizontal line is drawn at $\pm 2/\sqrt{T}$, which is approximately the critical value.
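
The test in equation (16) is easy to compute by hand. The sketch below is plain Python with a made-up series; `autocorr` is a hypothetical helper that divides the lag-$s$ sum of cross products by the full-sample sum of squares, which is one common way to estimate $\rho_s$.

```python
import math

def autocorr(w, s):
    """Sample autocorrelation of w at lag s."""
    t = len(w)
    wbar = sum(w) / t
    total_ss = sum((wi - wbar) ** 2 for wi in w)
    cross = sum((w[i] - wbar) * (w[i - s] - wbar) for i in range(s, t))
    return cross / total_ss

# Made-up series for illustration.
series = [1.0, 1.8, 2.9, 3.1, 2.2, 1.4, 0.9, 1.6, 2.7, 3.3, 2.5, 1.2]
T = len(series)
bound = 2 / math.sqrt(T)  # approximate 5% critical value for rho-hat

for s in (1, 2, 3):
    rho = autocorr(series, s)
    z = math.sqrt(T) * rho  # equation (16)
    print(f"lag {s}: rho = {rho:.3f}, Z = {z:.3f}, outside bound: {abs(rho) > bound}")
```

Checking $|\hat{\rho}_s| > 2/\sqrt{T}$ is the same comparison as checking $|Z| > 2$, which is why the correlogram bands are drawn at that height.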

Partial Autocorrelations
The idea is to "partial out" the intermediate lags of the ACF to isolate the lag of interest.

Breusch-Godfrey (BG) Test
A Breusch-Godfrey test is a regression of today's errors on past errors. Suppose you had the following ARDL model:
$$y_t = \delta + \theta_1 y_{t-1} + \delta_1 x_{t-1} + e_t \tag{17}$$
and wanted to know whether the errors had autocorrelation up to some lag $s$.
Solution: Estimate the equation, get the residuals $\hat{e}_t$, and run the following regression:
$$\hat{e}_t = \delta + \rho_1 \hat{e}_{t-1} + \cdots + \rho_s \hat{e}_{t-s} + v_t \tag{18}$$
Perform an $LM = T \times R^2$ test for autocorrelation, comparing the test statistic with the $\chi^2(s)$ distribution. (Rejection = autocorrelation)
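
A minimal sketch of regression (18) for $s = 1$, in plain Python with made-up data. Two simplifying assumptions are made here: a static model $y_t = a + b x_t$ stands in for the ARDL model (17), and $T$ is taken to be the number of observations available in the auxiliary regression. The helper names are hypothetical.

```python
# Breusch-Godfrey-style check at lag s = 1, following regression (18).

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def r_squared(x, y):
    """R^2 from the simple regression of y on x."""
    _, _, resid = ols_simple(x, y)
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sum(e ** 2 for e in resid) / sst

# Made-up series; the shocks come in runs of the same sign.
x = [float(i) for i in range(1, 13)]
shocks = [0.5, 0.6, 0.4, 0.5, 0.3, -0.2, -0.5, -0.6, -0.4, -0.5, -0.3, 0.1]
y = [2 + 0.5 * xi + ui for xi, ui in zip(x, shocks)]

# Estimate the model and keep the residuals.
_, _, resid = ols_simple(x, y)

# Regress e_t on e_{t-1}; one observation is lost to the lag.
resid_lag, resid_now = resid[:-1], resid[1:]
t_aux = len(resid_now)
lm_stat = t_aux * r_squared(resid_lag, resid_now)

# Compare with the chi-square(1) critical value, 3.841 at the 5% level.
print(f"LM = {lm_stat:.3f}, reject no-autocorrelation: {lm_stat > 3.841}")
```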

Model Selection
Choosing the order $p$ (lags of $Y$):
1. Check the correlogram of the errors after estimating an AR(1) model, and include the significant lags as new $Y$ variables. Repeat with the new correlogram.
2. Compute a PACF correlogram of the $Y$ variable and select the significant lags.
Choosing the order $q$ (lags of $X$):
1. Begin by selecting the maximum lag length you are willing to consider.
2. Create alternative models with different lag lengths.
3. Using some criterion (AIC, BIC/SC), pick the model with the lowest value.

HAC Standard Errors
If you can't fully eliminate autocorrelation in the error terms, then you'll need to compute standard errors that are robust to it. These are called HAC standard errors. The most commonly used one is the Newey-West standard error:
$$\widehat{SE}_{NW} = \widehat{SE}_{HC1}\,(1 + g) \tag{19}$$
where $\widehat{SE}_{HC1}$ are the White standard errors from heteroskedasticity, and $g$ is some function of the autocorrelations. In general, using HAC standard errors will lead to wider confidence intervals.

AR(1) Error to ARDL Model
Suppose we write the following model
$$y_t = \alpha + \beta_0 x_t + e_t \tag{20}$$
but we believe the error terms follow this process:
$$e_t = \rho e_{t-1} + v_t \tag{21}$$
How would we correctly write the model to get rid of autocorrelation in the errors?

AR(1) Error to ARDL Model
1. Plug the error formula into your original equation:
$$y_t = \alpha + \beta_0 x_t + \rho e_{t-1} + v_t$$
2. Rewrite the error term for $t-1$ using the original model:
$$e_{t-1} = y_{t-1} - \alpha - \beta_0 x_{t-1}$$
3. Plug into the model and rearrange terms:
$$y_t = \alpha + \beta_0 x_t + \rho\,(y_{t-1} - \alpha - \beta_0 x_{t-1}) + v_t$$
$$y_t = \delta + \theta y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + v_t$$
This is now an ARDL(1,1) model where the parameters have some relation to each other.
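
The relation between the parameters can be read off by expanding step 3 and matching coefficients. The block below only spells out that algebra, using the symbols already defined above.

```latex
\begin{aligned}
y_t &= \alpha + \beta_0 x_t + \rho y_{t-1} - \rho\alpha - \rho\beta_0 x_{t-1} + v_t \\
    &= \underbrace{\alpha(1-\rho)}_{\delta}
     + \underbrace{\rho}_{\theta}\, y_{t-1}
     + \beta_0 x_t
     + \underbrace{(-\rho\beta_0)}_{\beta_1}\, x_{t-1} + v_t
\end{aligned}
```

So $\delta = \alpha(1-\rho)$, $\theta = \rho$, and $\beta_1 = -\rho\beta_0$, which implies the restriction $\beta_1 = -\theta\beta_0$ among the ARDL(1,1) coefficients.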

ARDL(1,1) to IDL
Suppose we had an ARDL(1,1) model:
$$y_t = \delta + \theta_1 y_{t-1} + \delta_0 x_t + \delta_1 x_{t-1} + e_t \tag{22}$$
and wanted to translate it to an Infinite Distributed Lag (IDL) model of the following form:
$$y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \cdots + v_t \tag{23}$$
How would we do it?

ARDL(1,1) to IDL
1. Write the IDL model for $y_{t-1}$:
$$y_{t-1} = \alpha + \beta_0 x_{t-1} + \beta_1 x_{t-2} + \beta_2 x_{t-3} + \cdots + v_{t-1}$$
2. Plug it into your original ARDL model:
$$y_t = \delta + \theta_1\,(\alpha + \beta_0 x_{t-1} + \beta_1 x_{t-2} + \beta_2 x_{t-3} + \cdots + v_{t-1}) + \delta_0 x_t + \delta_1 x_{t-1} + e_t$$
3. Match up terms according to the lags of the $x$ variables:
$$y_t = \underbrace{(\delta + \theta_1 \alpha)}_{\alpha} + \underbrace{\delta_0}_{\beta_0} x_t + \underbrace{(\delta_1 + \theta_1 \beta_0)}_{\beta_1} x_{t-1} + \underbrace{\theta_1 \beta_1}_{\beta_2} x_{t-2} + \cdots + u_t$$
This process generalizes to different orders of the ARDL model.
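
The matching in step 3 defines a recursion: $\beta_0 = \delta_0$, $\beta_1 = \delta_1 + \theta_1\beta_0$, and $\beta_k = \theta_1\beta_{k-1}$ for $k \geq 2$, while $\alpha = \delta + \theta_1\alpha$ gives $\alpha = \delta/(1-\theta_1)$. A short plain-Python sketch, with made-up coefficient values and a hypothetical function name `idl_weights`:

```python
def idl_weights(delta0, delta1, theta1, n_lags):
    """IDL coefficients beta_0, ..., beta_{n_lags} implied by an ARDL(1,1) model."""
    betas = [delta0, delta1 + theta1 * delta0]
    while len(betas) <= n_lags:
        betas.append(theta1 * betas[-1])  # beta_k = theta_1 * beta_{k-1} for k >= 2
    return betas[: n_lags + 1]

# Made-up ARDL(1,1) coefficients for illustration.
delta, theta1, delta0, delta1 = 1.0, 0.5, 2.0, 1.0

alpha = delta / (1 - theta1)  # from alpha = delta + theta_1 * alpha
betas = idl_weights(delta0, delta1, theta1, n_lags=4)
print(f"alpha = {alpha}, betas = {betas}")
```

Because $|\theta_1| < 1$ here, the lag weights shrink geometrically after the first lag.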

Total Effects in IDL Models
If we are interested in the total effect that a one-unit change in $x$ has on current and future outcomes of $y$, we would compute the sum:
$$TE = \sum_{k=0}^{\infty} \beta_k$$
Assuming $\beta_k = \beta_0 r^k$ where $|r| < 1$, this infinite sum is given by the formula:
$$TE = \sum_{k=0}^{\infty} \beta_0 r^k = \frac{\beta_0}{1 - r}$$
This is known as a geometric series!
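
A quick numerical check of the geometric series formula, in plain Python with made-up values for $\beta_0$ and $r$:

```python
# Compare a long partial sum of beta_0 * r^k with the closed form beta_0 / (1 - r).
beta0, r = 2.0, 0.5  # made-up values with |r| < 1

closed_form = beta0 / (1 - r)
partial_sum = sum(beta0 * r ** k for k in range(60))

print(f"closed form = {closed_form}, partial sum of 60 terms = {partial_sum}")
```

The partial sums approach the closed form quickly because each extra lag adds a term that is $r$ times smaller than the previous one.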

Forecasting
Suppose we have an AR(1) model and we want to make a forecast, or prediction, for the next three periods. Assuming our model is correct, what would the next three periods' true outcomes be?
$$y_{t+1} = \theta_0 + \theta_1 y_t + e_{t+1} \tag{24}$$
$$y_{t+2} = \theta_0 + \theta_1 y_{t+1} + e_{t+2} \tag{25}$$
$$y_{t+3} = \theta_0 + \theta_1 y_{t+2} + e_{t+3} \tag{26}$$
But what would our forecasts be?

Forecasting
In general, our forecasts will be of the form:
$$\hat{y}_{t+s} = E[y_{t+s} \mid I_t] \tag{27}$$
where $I_t$ is all the data you have today. In our example, that creates the forecasts
$$\hat{y}_{t+1} = \hat{\theta}_0 + \hat{\theta}_1 y_t \tag{28}$$
$$\hat{y}_{t+2} = \hat{\theta}_0 + \hat{\theta}_1 \hat{y}_{t+1} \tag{29}$$
$$\hat{y}_{t+3} = \hat{\theta}_0 + \hat{\theta}_1 \hat{y}_{t+2} \tag{30}$$
While creating point forecasts might be straightforward, how would we compute the variance of our forecast errors $f_s = y_{t+s} - \hat{y}_{t+s}$?

Forecast Errors
Given the definition of forecast errors, we can begin computing them for our current model:
$$f_1 = y_{t+1} - \hat{y}_{t+1} = (\theta_0 - \hat{\theta}_0) + (\theta_1 - \hat{\theta}_1)\, y_t + e_{t+1}$$
One simplifying assumption that we use is $\hat{\theta} = \theta$. This assumption is reasonable because in large samples we have $\hat{\theta} \approx \theta$, and it is useful because it gets rid of a lot of terms. The forecast errors can then be written as:
$$f_1 = e_{t+1}$$
$$f_2 = \theta_1 (y_{t+1} - \hat{y}_{t+1}) + e_{t+2} = \theta_1 f_1 + e_{t+2}$$
$$f_3 = \theta_1 f_2 + e_{t+3}$$

Forecast Errors
The forecast error variances can then be derived as follows:
$$\mathrm{Var}(f_1) = \mathrm{Var}(e_{t+1}) = \sigma^2$$
$$\mathrm{Var}(f_2) = \mathrm{Var}(\theta_1 f_1 + e_{t+2}) = (\theta_1^2 + 1)\,\sigma^2$$
$$\mathrm{Var}(f_3) = \mathrm{Var}(\theta_1 f_2 + e_{t+3}) = (\theta_1^4 + \theta_1^2 + 1)\,\sigma^2$$
We can then construct confidence intervals for our forecasts with:
$$CI_s = \hat{y}_{t+s} \pm t_c \cdot \sqrt{\mathrm{Var}(f_s)} \tag{31}$$
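
The forecast and variance recursions can be run together. The sketch below is plain Python with made-up parameter values; `ar1_forecasts` is a hypothetical helper, the estimates are treated as if they equal the true parameters (the simplifying assumption above), and 1.96 is used as a stand-in for the critical value $t_c$.

```python
import math

def ar1_forecasts(theta0, theta1, y_last, sigma2, horizon):
    """Return [(point forecast, forecast-error variance)] for s = 1..horizon."""
    out, y_hat, var = [], y_last, 0.0
    for _ in range(horizon):
        y_hat = theta0 + theta1 * y_hat     # equations (28)-(30)
        var = theta1 ** 2 * var + sigma2    # Var(f_s) = theta_1^2 Var(f_{s-1}) + sigma^2
        out.append((y_hat, var))
    return out

# Made-up values: theta_0 = 1, theta_1 = 0.5, last observation y_t = 4, sigma^2 = 1.
results = ar1_forecasts(1.0, 0.5, 4.0, 1.0, horizon=3)

T_CRIT = 1.96  # assumed critical value, for illustration only
for s, (y_hat, var) in enumerate(results, start=1):
    half_width = T_CRIT * math.sqrt(var)
    print(f"s = {s}: forecast = {y_hat}, interval = "
          f"[{y_hat - half_width:.3f}, {y_hat + half_width:.3f}]")
```

The intervals widen with the horizon because each step adds another $\sigma^2$ on top of the discounted earlier uncertainty.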

Granger Causality
Suppose we had an ARDL(p,q) model
$$y_t = \delta + \sum_{s=1}^{p} \theta_s y_{t-s} + \sum_{k=0}^{q} \delta_k x_{t-k} + e_t \tag{32}$$
and we wanted to know whether the lags of $X$ impact the outcome of $Y$. A Granger Causality Test is an $F$ test of the joint significance of the $\delta_k$ terms:
  • $H_0: \delta_0 = \delta_1 = \cdots = \delta_q = 0$
  • $H_1$: At least one is not equal to 0

Exogeneity
Given data $(x_i, y_i)$, we may estimate a linear model:
$$y_i = \beta_1 + \beta_2 x_i + e_i \tag{33}$$
Originally, we assumed that the error was uncorrelated with our $x$ variable, i.e. our $x$ variable is exogenous:
$$E[Xe] = 0 \quad \text{or} \quad E[e \mid X] = 0 \text{ for all } X$$
While this works for many models, this assumption is not always satisfied.

When Can Exogeneity Fail?
1. Suppose we wanted to estimate the effect of an extra year of education on wages:
$$WAGE_i = \beta_0 + \beta_1 EDUC_i + e_i \tag{34}$$
Unobserved "ability" might increase both education and earnings!
  • More accomplished students are accepted to universities.
  • More productive people earn a higher wage.
2. Suppose we wanted to estimate the effect of price on the demand for bottled water:
$$Q_i^D = \beta_0 + \beta_1 PRICE_i + e_i \tag{35}$$
A natural disaster might hit both!
  • If tap water is shut off, people will demand more bottled water.
  • If supply chains are disrupted, then firms might charge more.

Why does it matter?
In these cases, we have $\mathrm{Cov}(X, e) \neq 0$, but what effect does this have on our OLS estimates?
$$E[\hat{b}_1] = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \frac{\mathrm{Cov}(X, \beta_0 + \beta_1 X + e)}{\mathrm{Var}(X)}$$
$$= \frac{\mathrm{Cov}(X, \beta_0)}{\mathrm{Var}(X)} + \frac{\mathrm{Cov}(X, \beta_1 X)}{\mathrm{Var}(X)} + \frac{\mathrm{Cov}(X, e)}{\mathrm{Var}(X)}$$
$$= \beta_1 + \frac{\mathrm{Cov}(X, e)}{\mathrm{Var}(X)}$$
The first term drops out because $\beta_0$ is a constant, and the second equals $\beta_1$. When $X$ is endogenous, our estimator is biased!

How do we solve it?
One solution to the problem of endogeneity is to use instruments, denoted as $Z$. They must satisfy the following conditions:
  • (Exclusion) $Z$ is uncorrelated with the regression error $e$, i.e. $\mathrm{Cov}(Z, e) = 0$
  • (Relevance) $Z$ is (strongly) correlated with $X$

How do we use an IV?
Suppose the model is given by
$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_B X_{iB} + \theta_1 W_{i1} + \cdots + \theta_G W_{iG} + e_i \tag{36}$$
where the $X$ variables are endogenous and the $W$ variables are exogenous. Suppose we have a good instrument for $X$. How do we use it?

Solution: 2SLS
One solution is Two Stage Least Squares (2SLS):
1. Estimate the regression of $X$ on $Z$ and $W$:
$$X_i = \theta_0 + \theta_1 Z_i + \theta_2 W_i + v_i \tag{37}$$
  • Store the fitted values $\hat{X}$
  • This results in $\mathrm{Cov}(\hat{X}, e) = 0$
2. Estimate the regression of $Y$ on $\hat{X}$ and $W$:
$$Y_i = \beta_0 + \beta_1 \hat{X}_i + \beta_2 W_i + e_i \tag{38}$$
Your estimates from the second stage will be consistent for $\beta_1$!

Solution: 2SLS
If there are no additional exogenous variables ($W$), the estimate for $\beta_1$ is
$$\tilde{b}_1 = \frac{\sum_{i=1}^{N} (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{N} (z_i - \bar{z})(x_i - \bar{x})} \tag{39}$$
The variance of $\tilde{b}_1$ is given by
$$\mathrm{Var}(\tilde{b}_1) = \frac{\sum (z_i - \bar{z})^2\, \hat{\sigma}_{IV}^2}{\left[\sum (z_i - \bar{z})(x_i - \bar{x})\right]^2} \approx \frac{\mathrm{Var}(\hat{b}_1^{OLS})}{(\mathrm{Corr}(X, Z))^2} \tag{40}$$
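
With a single instrument and no $W$, the slope from running the two stages by hand equals the ratio in equation (39). The plain-Python sketch below checks this on made-up numbers; `mean` and `cross` are hypothetical helper names.

```python
def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross products of deviations from the mean."""
    abar, bbar = mean(a), mean(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))

# Made-up data: instrument z, endogenous regressor x, outcome y.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
y = [5.2, 6.9, 9.1, 10.8, 13.2, 14.7, 17.1, 18.9]

# Equation (39): the IV estimator of the slope.
b_iv = cross(z, y) / cross(z, x)

# Stage 1: regress x on z and store the fitted values.
pi1 = cross(z, x) / cross(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]

# Stage 2: regress y on the fitted values.
b_2sls = cross(x_hat, y) / cross(x_hat, x_hat)

print(f"IV slope = {b_iv:.4f}, manual 2SLS slope = {b_2sls:.4f}")
```

The two slopes agree, but a second-stage regression run by hand reports standard errors based on the wrong residuals, so a dedicated IV routine is still needed for inference.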

Solution: 2SLS
Some things to keep in mind:
1. You always have to include the exogenous variables $W$ in the first stage.
2. Manually running both models and regressing the second stage on $\hat{X}$ is wrong.
   → Parameters are still correct (unbiased/consistent),
   → but the standard errors are wrong, because they are computed from the wrong residuals.
3. You want to use a command like
   lm <- ivreg(Y ~ X + W | Z + W, data = ...)

Instrument Selection
Earlier, we said that
$$\mathrm{Var}(\tilde{b}_1^{IV}) \approx \frac{\mathrm{Var}(\hat{b}_1^{OLS})}{(\mathrm{Corr}(X, Z))^2}$$
so we want to select instruments that are highly correlated with $X$. This keeps our standard errors as small as possible!
  • With only 1 instrument, a first-stage $F$-stat greater than 10 is a good rule of thumb.
  • With more instruments, you want an even higher $F$-stat, but no rule of thumb exists.

Checking for Endogeneity (Hausman Test)
To check whether $X$ and $e$ are correlated, conduct a Hausman Test:
1. Estimate your first stage regression and store the residuals:
$$\hat{v}_i = X_i - (\hat{\theta}_0 + \hat{\theta}_1 Z_i + \hat{\theta}_2 W_i)$$
2. Include the residuals in the main regression:
$$Y_i = \beta_0 + \beta_1 X_i + \delta \hat{v}_i + e_i$$
3. Conduct the following test:
  • $H_0: \delta = 0$ (No endogeneity)
  • $H_1: \delta \neq 0$ (Endogeneity)

How many IVs do we need?
Denote $L$ as the number of instruments we have, and $B$ as the number of endogenous variables:
  • $L < B$: The model is underidentified and can't be estimated.
  • $L = B$: The model is just-identified and the parameter estimates are unique.
  • $L > B$: The model is overidentified, and problems may arise as your first-stage estimates $\hat{X}$ approach the original $X$ values in the sample (generally OK).

Overidentification (Sargan) Tests
Only when $L > B$ can you conduct a Sargan test. This tells you whether your additional instruments should be included.
1. Conduct the 2SLS estimation, and store the residuals of the second stage:
$$\hat{e} = Y - (\hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 W) \tag{41}$$
2. Regress $\hat{e}$ on all available instruments:
$$\hat{e} = \theta_0 + \theta_1 Z_1 + \cdots + \theta_L Z_L \tag{42}$$
3. If the instruments are all valid, then $N \times R^2 \sim \chi^2(L - B)$.
4. If the test statistic is larger than the critical value, then at least one surplus moment condition is not valid.

Simultaneous Equations
Simultaneous equations are used in cases where we believe equilibria occur. Models of supply and demand are a popular case. In general, these models describe systems where multiple endogenous variables are determined simultaneously by a set of equations.
Big Topics:
1. Identification
2. Reduced Form
3. Estimation by 2SLS

Identification
Whenever we have a simultaneous equation system, we break the variables down into endogenous variables and exogenous variables:
  • $M$ endogenous variables and equations
  • $K$ exogenous variables
For identification (i.e. good estimation), we need each of the $M$ equations to exclude at least $M - 1$ exogenous variables.

Reduced Form Equations
One way to describe simultaneous equation systems is by solving for the reduced form equations. The reduced form is a system of $M$ equations, each of which contains one endogenous variable on the left-hand side and all the exogenous variables on the right-hand side. When solving for these, treat it as a system of equations where the endogenous variables are the "unknown" variables that you are solving for.

Reduced Form Equations
Structural Equations:
$$y_{i1} = \alpha_1 + \alpha_2 y_{i2} + \alpha_3 x_{i1} + e_{i1}$$
$$y_{i2} = \beta_1 + \beta_2 y_{i1} + \beta_3 x_{i2} + e_{i2}$$

Reduced Form Equations:
$$y_{i1} = \pi_{11} + \pi_{12} x_{i1} + \pi_{13} x_{i2} + v_{i1}$$
$$y_{i2} = \pi_{21} + \pi_{22} x_{i1} + \pi_{23} x_{i2} + v_{i2}$$
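
For the two structural equations $y_{i1} = \alpha_1 + \alpha_2 y_{i2} + \alpha_3 x_{i1} + e_{i1}$ and $y_{i2} = \beta_1 + \beta_2 y_{i1} + \beta_3 x_{i2} + e_{i2}$, the reduced form coefficients can be worked out by substitution. The derivation below is a sketch that assumes $\alpha_2 \beta_2 \neq 1$, so that the system can be solved.

```latex
\begin{aligned}
y_{i1} &= \alpha_1 + \alpha_2\,(\beta_1 + \beta_2 y_{i1} + \beta_3 x_{i2} + e_{i2}) + \alpha_3 x_{i1} + e_{i1} \\
(1 - \alpha_2\beta_2)\, y_{i1} &= (\alpha_1 + \alpha_2\beta_1) + \alpha_3 x_{i1} + \alpha_2\beta_3 x_{i2} + (e_{i1} + \alpha_2 e_{i2}) \\[4pt]
\pi_{11} &= \frac{\alpha_1 + \alpha_2\beta_1}{1 - \alpha_2\beta_2}, \quad
\pi_{12} = \frac{\alpha_3}{1 - \alpha_2\beta_2}, \quad
\pi_{13} = \frac{\alpha_2\beta_3}{1 - \alpha_2\beta_2}, \quad
v_{i1} = \frac{e_{i1} + \alpha_2 e_{i2}}{1 - \alpha_2\beta_2} \\[4pt]
\pi_{21} &= \frac{\beta_1 + \beta_2\alpha_1}{1 - \alpha_2\beta_2}, \quad
\pi_{22} = \frac{\beta_2\alpha_3}{1 - \alpha_2\beta_2}, \quad
\pi_{23} = \frac{\beta_3}{1 - \alpha_2\beta_2}, \quad
v_{i2} = \frac{\beta_2 e_{i1} + e_{i2}}{1 - \alpha_2\beta_2}
\end{aligned}
```

The second row of coefficients comes from the same substitution applied to the $y_{i2}$ equation. Each reduced form error mixes both structural errors, which is why $y_{i2}$ is correlated with $e_{i1}$ in the first structural equation.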

Estimation by 2SLS
Estimation of a simultaneous equation system relies on the reduced form equations:
1. Estimate the reduced form equation for the endogenous variable(s) on the right-hand side of the equation. Treat these as the first stage.
2. Run a regression of the original equation, replacing the true values of the RHS endogenous variables with their reduced form fitted values.
If you don't do this, then the estimates will be biased and inconsistent!