ps3_sol_Fall 2023



Columbia University






Jan 9, 2024






Department of Economics                                              UN3412
Columbia University                                                  Fall 2023

SOLUTIONS to Problem Set 3
Introduction to Econometrics (Erden_ Section 1)
______________________________________________________________________________

Please make sure to select the page number for each question while you are uploading your solutions to Gradescope. Otherwise, it is tough to grade your answers, and you may lose points.

Part I. True, False, Uncertain with Explanation:

(a) (3p) "Dummy" variables are variables added to the regression that have no explanatory power but serve only to increase the number of degrees of freedom.

FALSE. Dummy variables are binary explanatory variables. They have explanatory power in the same way as regular regressors.

(b) (3p) F tests and t tests on coefficients in a regression are equivalent in the sense that dropping all variables with small (insignificant) t statistics always results in the same final equation as performing the appropriate F tests.

FALSE. Dropping multiple variables based on their individual t-statistics does not work properly. Intuitively, this is because an individual t-statistic contains no information about the correlation between the coefficient estimators. It is possible for the F-statistic on two coefficients to be very significant while the individual t-statistics for those two coefficients are close to zero. This happens when the two covariates are nearly multicollinear. For example, if you regress people's heights on their weights today and their weights yesterday, you will get very large standard errors for the two coefficients, and the t-statistics will be close to zero. However, a joint test that both coefficients are zero will produce a very large F-statistic.

(c) (3p) A high $R^2$ gives assurance that the estimated coefficient is highly significant.

FALSE. It is possible to have a high $R^2$ while the estimated coefficient is insignificant.
For example, with a small sample size the coefficients can be insignificant even though $R^2$ is high.

(d) (3p) A low $R^2$ means that there is omitted variable bias.

FALSE. It is possible to have a low $R^2$ without any omitted variable bias. For example, a randomized controlled experiment avoids omitted variable bias, yet $R^2$ may still be low with experimental data.

Part II.

1. (24p) Let $R$ be the expected return on a risky investment and $R_f$ be the return on a risk-free investment. The fundamental idea of modern finance is that an investor needs a financial incentive to take a risk. Hence, $R$ must exceed $R_f$. According to the capital asset pricing model (CAPM), the expected excess return on an asset is proportional to the expected excess return on a portfolio of all available assets (the "market portfolio"). That is, the CAPM says that
$$R - R_f = \beta (R_m - R_f) + u$$

where $R_m$ is the expected return on the market portfolio and $\beta$ is the coefficient in the population regression of $R - R_f$ on $R_m - R_f$. In the following STATA output, the variable freturn is the excess return for two firms in the computer chip industry and mreturn is the excess return for the market.

Linear regression                               Number of obs =     384
                                                F(  1,   382) =  104.52
                                                Prob > F      =  0.0000
                                                R-squared     =  0.2175
                                                Root MSE      =  .13447
------------------------------------------------------------------------------
             |               Robust
     freturn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     mreturn |   1.608313   .1573139    10.22   0.000     1.299004    1.917623
       _cons |   .0031122   .0071605     0.43   0.664    -.0109666    .0171911
------------------------------------------------------------------------------

(a) (8p) According to CAPM, the true intercept must be zero and the true slope must be one. Using hypothesis testing at the 10% significance level, test whether CAPM is correct according to the results above.

Answer: The t-statistic for the null hypothesis that the intercept equals 0 is 0.43, so we cannot reject that null hypothesis. However, the t-statistic for the null hypothesis that the slope equals one is

$$t = \frac{1.608313 - 1}{0.1573139} \approx 3.87 > 1.64 \quad \text{(two-sided test)},$$

so we reject the null hypothesis that the slope is one at the 10% significance level.

(b) (8p) What is the meaning of the F test in this regression? What is it testing, and how is that statistic related to the t test statistic in the same output?

Answer: The F test in this regression tests the null hypothesis $H_0: \beta_{mreturn} = 0$ against $H_1: \beta_{mreturn} \neq 0$, i.e., does the excess return of our assets vary with the excess market return? The null hypothesis is therefore the same as for the t-statistic reported in the output. With a single restriction the two tests are equivalent because $F = t^2$: here $t^2 = 10.22^2 \approx 104.45$, which matches the reported $F = 104.52$ up to rounding of $t$.
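The two calculations in (a) and (b) can be checked directly from the reported coefficient and robust standard error. The following is a minimal Python sketch, not part of the original solution; the variable names are illustrative, and 1.645 is assumed as the two-sided 10% critical value from the standard normal distribution.

```python
# Values copied from the Stata output above (robust standard errors).
coef_mreturn = 1.608313
se_mreturn = 0.1573139

# (a) t-statistic for H0: slope = 1.
t_slope_one = (coef_mreturn - 1.0) / se_mreturn

# (b) t-statistic for H0: slope = 0; its square should match the reported F.
t_slope_zero = coef_mreturn / se_mreturn
f_from_t = t_slope_zero ** 2

critical_10pct = 1.645  # assumed two-sided 10% normal critical value

print(f"t (slope = 1): {t_slope_one:.3f}, reject at 10%: {abs(t_slope_one) > critical_10pct}")
print(f"t (slope = 0): {t_slope_zero:.2f}, t squared: {f_from_t:.2f}")
```

Using the unrounded coefficient and standard error, the squared t-statistic lines up with the F-statistic Stata reports for the single-regressor model.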

(c) (8p) Each year, the rate of return on 3-month Treasury bills is 2.1% and the rate of return on a large, diversified portfolio of stocks (the S&P 500) is 6.2%. For each company listed below, use the estimated value of $\beta$ to estimate the stock's expected rate of return.

Company                             Estimated β    Expected rate of return
Kellogg (breakfast cereal)             -0.03
Amazon (online retailer)                2.65
Barnes and Noble (book retailer)        1.02

Answer:

Company             Estimated β    Expected rate of return (%)
Kellogg                -0.03           1.977
Amazon                  2.65          12.965
Barnes & Noble          1.02           6.282

Recall that $R - R_f = \beta (R_m - R_f)$; here $R_m - R_f = 6.2 - 2.1 = 4.1\%$. Hence the expected return is

$$\hat{R} = R_f + \hat{\beta} (R_m - R_f) = 2.1 + \hat{\beta} \times 4.1$$

2. (54p) We will use the data file GPA4.dta to answer this question. Variables are defined in Table 1. Table 2 presents the results of four regressions, one in each column. Please use Table 2 to answer the following questions. Estimate the indicated regressions and fill in the values (you may either handwrite or type the entries in; if you choose to type up the table, an electronic copy of Table 2 in .doc format is available on the course Web site). For example, to fill in column (1), estimate the regression with colGPA as the dependent variable and hsGPA and skipped as the independent variables, using the "robust" option, and fill in the estimated coefficients.

(a) (20p) Fill out the table with the necessary numbers; some will be in the Stata output, and some you will need to calculate yourself.

(b) (8p) Common sense predicts that your high school GPA (hsGPA) and the number of classes you skipped (skipped) are determinants of your college GPA (colGPA). Use regression (2) to test the hypothesis (at the 5% significance level) that the coefficients on these two economic variables are all zero, against the alternative that at least one coefficient is nonzero.

$H_0: \beta_{hsGPA} = \beta_{skipped} = 0$
$H_1$: at least one coefficient is nonzero

The p-value for the F-statistic is .00 < .05, so we reject $H_0$ at the 5% significance level and conclude that at least one coefficient is nonzero.

(c) (8p) Find the F-statistic for regression (3) and explain what it is testing jointly.

The F-statistic for regression (3) is 12.07; it jointly tests whether all of the slope coefficients are equal to 0, that is, whether all the regressors jointly have no explanatory power.

(d) (8p) Find the F-statistic for regression (4) and explain what it is testing.

The F-statistic for regression (4) is 11.14 (this is not from the table); it jointly tests whether all of the slope coefficients are equal to 0, that is, whether all the regressors jointly have no explanatory power.

(e) (10p) Are bgfriend (whether you have a boy/girlfriend) and campus (whether you live on campus) jointly significant determinants of college GPA? Use regressions (2) and (4) to test your hypothesis (i.e., use the homoskedasticity-only F-statistic formula, eq. 7.14 in the book, instead of testing directly with STATA).

$H_0: \beta_{bgfriend} = \beta_{campus} = 0$
$H_1$: at least one coefficient is nonzero

$$F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$$

where $R^2_{unrestricted}$ and $R^2_{restricted}$ are obtained from regressions (4) and (2) respectively; the number of restrictions is $q = 2$; $k_{unrestricted}$ is the number of regressors in the unrestricted regression; and $q$ and $n - k_{unrestricted} - 1$ are the degrees of freedom of the F distribution (here we use the non-robust regressions).

$$F = \frac{(0.2784 - 0.2504)/2}{(1 - 0.2784)/(141 - 5 - 1)} \approx 2.62 \sim F(2, 135)$$

The 5% critical value of $F(2, 135)$ is 3.06 > 2.62, so we cannot reject $H_0$ at the 5% significance level. We find no evidence that bgfriend and campus are jointly significant determinants of college GPA.

Alternatively, we can use the Stata command di fprob(2, 135, 2.62) to find the associated p-value, .077 > .05; again we cannot reject $H_0$ at the 5% significance level.
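
The arithmetic in (e) can be reproduced in a few lines. The following Python sketch is illustrative only and is not part of the original solution; the function names are hypothetical. The p-value uses a closed form that is valid only when the numerator degrees of freedom equal 2, namely $P(F_{2,d} > f) = (1 + 2f/d)^{-d/2}$, so it should not be reused for other values of $q$.

```python
def homoskedasticity_only_f(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic built from the two R-squared values."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator


def f_upper_tail_q2(f_value, df2):
    """Upper-tail probability of F(2, df2); closed form valid only for 2 numerator df."""
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# Regression (4) is unrestricted, regression (2) is restricted.
f_stat = homoskedasticity_only_f(0.2784, 0.2504, q=2, n=141, k_unrestricted=5)
p_value = f_upper_tail_q2(f_stat, df2=141 - 5 - 1)

print(f"F = {f_stat:.2f}, p-value = {p_value:.3f}")
```

Because the p-value is above .05, the sketch leads to the same decision as the critical-value comparison: do not reject the joint null at the 5% level.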

Table 1 Definitions of Variables in GPA4.dta (data is from the Wooldridge textbook)

Variable    Definition
colGPA      Cumulative college grade point average of a sample of 141 students at Michigan State University in 1994.
hsGPA       High school GPA of students.
skipped     Average number of classes skipped per week.
PC          = 1 if the student owns a personal computer; = 0 otherwise.
bgfriend    = 1 if the student answered "yes" to the question about having a boy/girlfriend; = 0 otherwise.
campus      = 1 if the student lives on campus; = 0 otherwise.

Table 2 College GPA Results
Dependent variable: colGPA

Regressor                        (1)      (2)      (3)      (4)
hsGPA                            ( )      ( )      ( )      ( )
skipped                          ( )      ( )      ( )      ( )
PC                               __       ( )      ( )      ( )
bgfriend                         __       __       ( )      ( )
campus                           __       __       __       ( )
Intercept                        ( )      ( )      ( )      ( )

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are all zero:
hsGPA, skipped                   ( )      ( )      ( )      ( )
hsGPA, skipped, PC               __       ( )      ( )      ( )
hsGPA, skipped, PC, bgfriend     __       __       ( )      ( )
bgfriend, campus                 __       __       __       ( )

Regression summary statistics
$\bar{R}^2$
$R^2$
Regression RMSE
n

Notes: Heteroskedasticity-robust standard errors are given in parentheses under the estimated coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are heteroskedasticity-robust.

Table 2 College GPA Results (completed)
Dependent variable: colGPA

Regressor                        (1)             (2)             (3)             (4)
hsGPA                            .458 (.094)     .455 (.092)     .460 (.093)     .461 (.090)
skipped                          -.077 (.025)    -.065 (.025)    -.065 (.025)    -.071 (.026)
PC                               __              .128 (.059)     .130 (.059)     .136 (.058)
bgfriend                         __              __              .084 (.055)     .085 (.054)
campus                           __              __              __              -.124 (.078)
Intercept                        1.579 (.325)    1.526 (.321)    1.469 (.325)    1.490 (.317)

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are all zero:
hsGPA, skipped                   20.90 (.00)     19.34 (.00)     19.42 (.00)     21.19 (.00)
hsGPA, skipped, PC               __              15.47 (.00)     15.56 (.00)     17.46 (.00)
hsGPA, skipped, PC, bgfriend     __              __              12.07 (.00)     13.62 (.00)
bgfriend, campus                 __              __              __              2.55 (.082)

Regression summary statistics
$\bar{R}^2$                      .211            .234            .241            .252
$R^2$                            .223            .250            .263            .278
Regression RMSE                  .331            .326            .324            .322
n                                141             141             141             141

Notes: Heteroskedasticity-robust standard errors are given in parentheses next to the estimated coefficients, and p-values are given in parentheses next to F-statistics. The F-statistics are heteroskedasticity-robust.

3. (10p) Suppose that you are interested in testing a joint null hypothesis consisting of three restrictions, say $\beta_1 = \beta_2 = \beta_3 = 0$, in a multiple regression. Assume that you have three individual t-statistics for $\beta_i = 0$, where $i = 1, 2, 3$. Consider the following testing procedure: reject the joint null hypothesis if at least one of the t-statistics exceeds 1.96 in absolute value. If the t-statistics are independent of each other, what is the probability of rejecting the joint null hypothesis when it is true?

Solution:
$$\begin{aligned}
\Pr(\text{rejecting the joint null hypothesis}) &= \Pr(\text{at least one of the } t\text{-statistics is greater than 1.96 in absolute value}) \\
&= 1 - \Pr(\text{all 3 } t\text{-statistics are at most 1.96 in absolute value}) \\
&= 1 - P(|t_1| \le 1.96)\, P(|t_2| \le 1.96)\, P(|t_3| \le 1.96) \\
&= 1 - 0.95^3 \approx 0.143
\end{aligned}$$

The factorization into a product of three probabilities holds because the 3 t-statistics are independent of each other.

The following questions will not be graded; they are for you to practice and will be discussed at the recitation:

1. SW Empirical Exercise 5.2

(a) The estimated regression is

$$\widehat{Growth} = \underset{(0.54)}{0.96} + \underset{(0.87)}{1.68} \times TradeShare$$

The t-statistic for the slope coefficient is t = 1.68/0.87 = 1.94. The t-statistic is larger in absolute value than the 10% critical value (1.64), but less than the 5% and 1% critical values (1.96 and 2.58). Therefore, the null hypothesis is rejected at the 10% significance level, but not at the 5% or 1% levels.

(b) The p-value is 0.057.

(c) The 90% confidence interval is 1.68 ± 1.64 × 0.87, or $0.25 \le \beta_1 \le 3.11$.

2. SW Empirical Exercise 5.3

(a) Average birthweights, along with standard errors, are shown in the table below. (Birthweight is measured in grams.)

                     All Mothers    Non-smokers    Smokers
$\bar{X}$            3383           3432.1         3178.8
SE($\bar{X}$)        10.8           11.9           24.0
n                    3000           2418           582

(b) The estimated difference is $\bar{X}_{Smokers} - \bar{X}_{NonSmokers} = -253.2$. The standard error of the difference is

$$SE(\bar{X}_{Smokers} - \bar{X}_{NonSmokers}) = \sqrt{SE(\bar{X}_{Smokers})^2 + SE(\bar{X}_{NonSmokers})^2} = 26.8$$

The 95% confidence interval for the difference is −253.2 ± 1.96 × 26.8 = (−305.9, −200.6).

(c) The estimated regression is

$$\widehat{Birthweight} = \underset{(11.9)}{3432.1} - \underset{(26.8)}{253.2}\, Smoker$$

(i) The intercept is the average birthweight for non-smokers (Smoker = 0). The slope is the difference between average birthweights for smokers (Smoker = 1) and non-smokers (Smoker = 0).
(ii) They are the same.
(iii) This is the same as the confidence interval in (b).

(d) Yes, and we'll investigate this more in future empirical exercises.
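
The difference-in-means calculation in Exercise 5.3(b) can be sketched in Python as follows. This is an illustrative check, not part of the original solution, and the variable names are hypothetical. It starts from the rounded group means and standard errors in the table, so the last digit of the difference and of the interval endpoints can differ slightly from the reported values, which presumably come from the unrounded data.

```python
import math

# Rounded group statistics from the table in 5.3(a), in grams.
mean_nonsmokers, se_nonsmokers = 3432.1, 11.9
mean_smokers, se_smokers = 3178.8, 24.0

# Difference in means and its standard error (independent samples).
diff = mean_smokers - mean_nonsmokers
se_diff = math.sqrt(se_smokers ** 2 + se_nonsmokers ** 2)

# 95% confidence interval using the normal critical value 1.96.
ci_low = diff - 1.96 * se_diff
ci_high = diff + 1.96 * se_diff

print(f"difference = {diff:.1f}, SE = {se_diff:.1f}")
print(f"95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

The interval lies entirely below zero, which is the same conclusion as a t-test on the Smoker coefficient in the regression of part (c).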

3. SW Empirical Exercise 6.1

(a) The estimated regression is

$$\widehat{Birthweight} = 3432.1 - 253.2\, Smoker$$

The estimated effect of smoking on birthweight is −253.2 grams.

(b) The estimated regression is

$$\widehat{Birthweight} = 3051.2 - 217.6\, Smoker - 30.5\, Alcohol + 34.1\, Nprevist$$

(i) Smoking may be correlated with both alcohol and the number of pre-natal doctor visits, thus satisfying (1) in Key Concept 6.1. Moreover, both alcohol consumption and the number of doctor visits may have their own independent effects on birthweight, thus satisfying (2) in Key Concept 6.1.
(ii) The estimated effect is somewhat smaller in magnitude: it has fallen to 217 grams from 253 grams, so the regression in (a) may suffer from omitted variable bias.
(iii) $\widehat{Birthweight} = 3051.2 - 217.6 \times 1 - 30.5 \times 0 + 34.1 \times 8 = 3106.4$
(iv) $R^2 = 0.0729$ and $\bar{R}^2 = 0.0719$. They are nearly identical because the sample size is very large (n = 3000).
(v) Nprevist is a control variable. It captures, for example, the mother's access to healthcare and her health. Because Nprevist is a control variable, its coefficient does not have a causal interpretation.

(c) The results from STATA are

. ** FW calculations;
. regress birthweight alcohol nprevist;

      Source |       SS       df       MS              Number of obs =    3000
-------------+------------------------------           F(  2,  2997) =   82.64
       Model |    54966381     2  27483190.5           Prob > F      =  0.0000
    Residual |   996653623  2997  332550.425           R-squared     =  0.0523
-------------+------------------------------           Adj R-squared =  0.0516
       Total |  1.0516e+09  2999  350656.887           Root MSE      =  576.67
------------------------------------------------------------------------------
 birthweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     alcohol |  -103.2781   76.53276    -1.35   0.177    -253.3402    46.78392
    nprevist |   36.49956   2.870272    12.72   0.000     30.87166    42.12746
       _cons |   2983.739   33.35198    89.46   0.000     2918.344    3049.134
------------------------------------------------------------------------------

. predict bw_res, r;
. regress smoker alcohol nprevist;

      Source |       SS       df       MS              Number of obs =    3000
-------------+------------------------------           F(  2,  2997) =   38.97
       Model |  11.8897961     2  5.94489803           Prob > F      =  0.0000
    Residual |  457.202204  2997  .152553288           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0247
       Total |     469.092  2999  .156416139           Root MSE      =  .39058
------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     alcohol |    .334529   .0518358     6.45   0.000     .2328917    .4361662
    nprevist |  -.0111667    .001944    -5.74   0.000    -.0149785   -.0073549
       _cons |   .3102729   .0225893    13.74   0.000     .2659807    .3545651
------------------------------------------------------------------------------

. predict smoker_res, r;
. regress bw_res smoker_res;

      Source |       SS       df       MS              Number of obs =    3000
-------------+------------------------------           F(  1,  2998) =   66.55
       Model |  21644450.1     1  21644450.1           Prob > F      =  0.0000
    Residual |   975009170  2998   325219.87           R-squared     =  0.0217
-------------+------------------------------           Adj R-squared =  0.0214
       Total |   996653621  2999   332328.65           Root MSE      =  570.28
------------------------------------------------------------------------------
      bw_res |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  smoker_res |  -217.5801    26.6707    -8.16   0.000    -269.8748   -165.2854
       _cons |  -2.75e-07   10.41185    -0.00   1.000    -20.41509    20.41509
------------------------------------------------------------------------------

The coefficient on smoker_res (−217.58) equals the coefficient on Smoker in the multiple regression in (b) (−217.6), as the Frisch-Waugh theorem implies.

(d) The estimated regression is

$$\widehat{Birthweight} = 3454.5 - 228.8\, Smoker - 15.1\, Alcohol - 698.0\, Tripre0 - 100.8\, Tripre2 - 137.0\, Tripre3$$

(i) Tripre1 is omitted to avoid perfect multicollinearity (Tripre0 + Tripre1 + Tripre2 + Tripre3 = 1, the value of the "constant" regressor that determines the intercept). The regression would not run, or the software would report results from an arbitrary normalization, if Tripre0, Tripre1, Tripre2, Tripre3, and the constant term were all included in the regression.
(ii) Babies born to women who had no prenatal doctor visits (Tripre0 = 1) had birthweights that on average were 698.0 grams (≈ 1.5 lbs) lower than babies born to mothers who saw a doctor during the first trimester (Tripre1 = 1).
(iii) Babies born to women whose first doctor visit was during the second trimester (Tripre2 = 1) had birthweights that on average were 100.8 grams (≈ 0.2 lbs) lower than babies born to mothers who saw a doctor during the first trimester (Tripre1 = 1). Babies born to women whose first doctor visit was during the third trimester (Tripre3 = 1) had birthweights that on average were 137 grams (≈ 0.3 lbs) lower than babies born to mothers who saw a doctor during the first trimester (Tripre1 = 1).
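
The Frisch-Waugh calculation in part (c) of Exercise 6.1 follows a three-step recipe: residualize the outcome on the controls, residualize the regressor of interest on the controls, then regress one set of residuals on the other. The Python sketch below illustrates that recipe on synthetic data (the birthweight dataset is not reproduced here, and every variable name and coefficient in the simulation is an assumption made for illustration). The small `ols` helper solves the normal equations with the standard library only; it is a teaching device, not a substitute for Stata's regress.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def residuals(X, y):
    """Residuals from regressing y on the columns of X."""
    b = ols(X, y)
    return [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]


# Synthetic data (assumed): x plays the role of smoker, w1 and w2 the controls.
random.seed(0)
n = 500
w1 = [random.gauss(0, 1) for _ in range(n)]
w2 = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * a - 0.3 * b + random.gauss(0, 1) for a, b in zip(w1, w2)]
y = [1.0 + 2.0 * xi - 1.0 * a + 0.5 * b + random.gauss(0, 1)
     for xi, a, b in zip(x, w1, w2)]

# Multiple regression of y on a constant, x, and the controls.
full = ols([[1.0, xi, a, b] for xi, a, b in zip(x, w1, w2)], y)

# Frisch-Waugh steps: residualize y and x on the controls, then regress.
controls = [[1.0, a, b] for a, b in zip(w1, w2)]
y_res = residuals(controls, y)
x_res = residuals(controls, x)
fw = ols([[1.0, xr] for xr in x_res], y_res)

print(f"coefficient on x, multiple regression: {full[1]:.6f}")
print(f"coefficient on x residuals, FW steps:  {fw[1]:.6f}")
```

The two printed coefficients agree up to floating-point error, and the intercept of the residual regression is numerically zero, mirroring the _cons estimate of -2.75e-07 in the Stata output.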

4. SW Empirical Exercise 6.2

(a)

Variable          Mean      Standard Deviation    Minimum    Maximum    Units
Growth            1.87      1.82                  -2.81      7.16       Percentage points
Rgdp60            3131.0    2523.0                367.0      9895.0     $1960
Tradeshare        0.542     0.229                 0.141      1.128      Unit free
yearsschool       3.95      2.55                  0.20       10.07      Years
Rev_coups         0.170     0.225                 0.000      0.970      Coups per year
Assassinations    0.281     0.494                 0.000      2.467      Assassinations per year
Oil               0.00      0.00                  0.00       0.00       0-1 indicator variable

(b) Estimated regression (in table format):

Regressor        Coefficient
tradeshare        1.34
yearsschool       0.56
rev_coups        -2.15
assasinations     0.32
rgdp60           -0.00046
intercept         0.63

SER               1.59
$R^2$             0.29
$\bar{R}^2$       0.23

The coefficient on Rev_Coups is −2.15. An additional coup in a five-year period reduces the average annual growth rate by (2.15/5) = 0.43% over this 25-year period. This means that GDP in 1995 is expected to be approximately 0.43 × 25 = 10.75% lower. This is a large effect.

(c) The predicted growth rate at the mean values for all regressors is 1.87.

(d) The resulting predicted value is 2.18.

(e) The variable "oil" takes on the value of 0 for all 64 countries in the sample. This would generate perfect multicollinearity, since $Oil_i = 1 - 1 \times X_{0i}$, and hence the variable is a linear combination of one of the regressors, namely the constant.

Do file:

#delimit ;
use /Users/mwatson/Dropbox/TB/4E/EE_Datasets/birthweight_smoking.dta;
describe;
*************************************************************;
summarize;
***********************************************;
**** Some Regressions ************************;
**********************************************;
regress birthweight smoker, robust;
regress birthweight smoker alcohol nprevist, robust;
dis "Adjusted Rsquared = " _result(8);
** FW calculations;
regress birthweight alcohol nprevist;
predict bw_res, r;
regress smoker alcohol nprevist;
predict smoker_res, r;
regress bw_res smoker_res;
***********************************;
regress birthweight smoker alcohol tripre0 tripre2 tripre3, robust;
dis "Adjusted Rsquared = " _result(8);