Department of Economics
UN3412
Columbia University
Fall 2023
SOLUTIONS to
Problem Set 3
Introduction to Econometrics
(Erden, Section 1)
______________________________________________________________________________
Please make sure to select the page number for each question while you are uploading your
solutions to Gradescope. Otherwise, it is tough to grade your answers, and you may lose points.
Part I.
True, False, Uncertain with Explanation:
(a) (3p) "Dummy" variables are variables added to the regression that have no explanatory power but serve only to increase the number of degrees of freedom.
FALSE. Dummy variables are binary explanatory variables. They have explanatory power in the
same way as regular regressors.
(b) (3p) F tests and t tests on coefficients in a regression are equivalent in the sense that dropping all variables with small (insignificant) t-statistics always results in the same final equation as performing the appropriate F tests.
FALSE. Dropping multiple variables based on their individual t-statistics does not work properly. Intuitively, this is because an individual t-statistic contains no information about the correlation between the coefficient estimators. It is possible for the F-statistic on two coefficients to be very significant while the individual t-statistics for those two coefficients are close to zero. This happens when the two covariates are nearly multicollinear. For example, if you regress people's heights on their weights today and their weights yesterday, you will get very large standard errors for the two coefficients, and the t-statistics will be close to zero. However, a joint test that both coefficients are zero will yield a very large F-statistic.
(c) (3p) A high R² gives assurance that the estimated coefficient is highly significant.
FALSE. It is possible to have a high R² while the estimated coefficient is insignificant. For example, a small sample size can lead to insignificant coefficients even though R² is high.
(d) (3p) A low R² means that there is omitted variable bias.
FALSE. It is possible to have a low R² without omitted variable bias. For example, a randomized controlled experiment avoids omitted variable bias, yet R² may be low with experimental data.
Part II.
1. (24p) Let R be the expected return on a risky investment and R_f be the return on a risk-free investment. The fundamental idea of modern finance is that an investor needs a financial incentive to take a risk; hence, R must exceed R_f. According to the capital asset pricing model (CAPM), the expected excess return on an asset is proportional to the expected excess return on a portfolio of all available assets (the "market portfolio"). That is, the CAPM says that

R − R_f = β(R_m − R_f) + u,

where R_m is the expected return on the market portfolio and β is the coefficient in the population regression of R − R_f on R_m − R_f.
In the following STATA output, the variable freturn is the excess return for two firms in the computer chip industry and mreturn is the excess return for the market.
Linear regression                               Number of obs =     384
                                                F(1, 382)     =  104.52
                                                Prob > F      =  0.0000
                                                R-squared     =  0.2175
                                                Root MSE      =  .13447

------------------------------------------------------------------------------
             |               Robust
     freturn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     mreturn |   1.608313   .1573139    10.22   0.000     1.299004    1.917623
       _cons |   .0031122   .0071605     0.43   0.664    -.0109666    .0171911
------------------------------------------------------------------------------
(a) (8p) According to the CAPM, the true intercept must be zero and the true slope must be one. Using hypothesis tests at the 10% significance level, test whether the CAPM is correct according to the above results.

Answer: The t-statistic for the null hypothesis that the intercept equals 0 is 0.43, so we cannot reject that null hypothesis. However, the t-statistic for the null hypothesis that the slope equals one is

t = (1.608313 − 1)/0.1573139 ≈ 3.87 > 1.64

(two-sided test), so we reject the null hypothesis that the slope is one at the 10% significance level.
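A quick arithmetic check of that t-statistic (Python is used here purely for illustration; it is not part of the Stata workflow above):

```python
# Check of the slope test using the estimates from the Stata output above.
beta_hat = 1.608313   # estimated slope on mreturn
se = 0.1573139        # robust standard error
t_stat = (beta_hat - 1) / se   # H0: slope = 1
print(round(t_stat, 2))   # 3.87, above the 10% two-sided critical value 1.64
```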
(b) (8p) What is the meaning of the F test in this regression? What is it testing? And how is that statistic related to the t-test statistic in the same output?

Answer: The F test in this regression tests the null hypothesis H_0: β_mreturn = 0 against H_1: β_mreturn ≠ 0, i.e., does the return on our assets vary with the excess market return? Hence the null hypothesis is the same as for the t-statistic provided in the output, and the two tests are roughly equivalent, since t² = 10.22² ≈ 104.4 ≈ F.
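The t²-to-F relationship can be verified directly from the printed output (a Python illustration, not part of the original Stata work):

```python
t_stat = 10.22    # t on mreturn from the output
F_stat = 104.52   # F(1, 382) from the output
# With one restriction, F = t^2; the small gap here is rounding in the display.
print(round(t_stat ** 2, 2))   # 104.45
```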
(c) (8p) Each year, the rate of return on 3-month Treasury bills is 2.1% and the rate of return on a large, diversified portfolio of stocks (the S&P 500) is 6.2%. For each company listed below, use the estimated value of β to estimate the stock's expected rate of return.

Company                            Estimated β    Expected rate of return
Kellogg (breakfast cereal)            -0.03
Amazon (online retailer)               2.65
Barnes and Noble (book retailer)       1.02
Answer:

Company            Estimated β    Expected rate of return
Kellogg               -0.03              1.977
Amazon                 2.65             12.965
Barnes & Noble         1.02              6.282

Recall that R − R_f = β(R_m − R_f); here R_m − R_f = 6.2 − 2.1 = 4.1%. Hence the expected return is:

R̂ = R_f + β̂(R_m − R_f) = 2.1 + β̂ × 4.1
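The three table entries follow mechanically from that formula; a short Python check (illustrative only):

```python
# Expected return for each firm: r_f + beta * (r_m - r_f), all in percent.
r_f, r_m = 2.1, 6.2
betas = {"Kellogg": -0.03, "Amazon": 2.65, "Barnes & Noble": 1.02}
for firm, b in betas.items():
    print(firm, round(r_f + b * (r_m - r_f), 3))
# Kellogg 1.977, Amazon 12.965, Barnes & Noble 6.282
```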
2. (54p) We will use the data file GPA4.dta to answer this question. Variables are defined in Table 1. Table 2 presents the results of four regressions, one in each column. Please use Table 2 to answer the following questions. Estimate the indicated regressions and fill in the values (you may either handwrite or type the entries in; if you choose to type up the table, an electronic copy of Table 2 in .doc format is available on the course Web site). For example, to fill in column (1), estimate the regression with colGPA as the dependent variable and hsGPA and skipped as the independent variables, using the "robust" option, and fill in the estimated coefficients.
(a) (20p) Fill out the table with the necessary numbers; some will be on the Stata output, and some you will need to calculate yourself.
(b) (8p) Common sense predicts that your high school GPA (hsGPA) and the number of classes you skipped (skipped) are determinants of your college GPA (colGPA). Use regression (2) to test the hypothesis (at the 5% significance level) that the coefficients on these two variables are both zero, against the alternative that at least one coefficient is nonzero.

H_0: β_hsGPA = β_skipped = 0
H_1: at least one coefficient is nonzero

The p-value for the F-statistic is .00 < .05, so we reject H_0 at the 5% significance level. We conclude that at least one coefficient is nonzero.
(c) (8p) Find the F-statistic for regression (3) and explain what it is testing jointly.

The F-statistic for regression (3) is 12.07; it jointly tests whether all of the coefficients are equal to 0, that is, whether the regressors jointly have no explanatory power.
(d) (8p) Find the F-statistic for regression (4) and explain what it is testing.

The F-statistic for regression (4) is 11.14 (this is not from the table); it jointly tests whether all of the coefficients are equal to 0, that is, whether the regressors jointly have no explanatory power.
(e) (10p) Are bgfriend (whether you have a boy/girlfriend) and campus (whether you live on campus) jointly significant determinants of college GPA? Use regressions (2) and (4) to test your hypothesis (i.e., use the homoskedasticity-only F-statistic formula, eq. 7.14 in the book, instead of testing directly with STATA).

H_0: β_bgfriend = β_campus = 0
H_1: at least one coefficient is nonzero

F = [(R²_unr − R²_r)/q] / [(1 − R²_unr)/(n − k_unr − 1)]

where R²_unr and R²_r are obtained from regressions (4) and (2), respectively; the number of restrictions is q = 2; k_unr is the number of regressors in the unrestricted regression; q and n − k_unr − 1 are the degrees of freedom of the F distribution (here we use the non-robust R² values).

F = [(0.2784 − 0.2504)/2] / [(1 − 0.2784)/(141 − 5 − 1)] ≈ 2.62 ~ F(2, 135)

The 5% critical value F(2, 135, α = 0.05) = 3.06 > 2.62, so we cannot reject H_0 at the 5% significance level. We conclude that bgfriend and campus jointly have no explanatory power. Alternatively, we can use the Stata command di fprob(2, 135, 2.62) to find the associated p-value = .077 > .05; again we cannot reject H_0 at the 5% significance level.
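The homoskedasticity-only F-statistic above is simple enough to verify by hand; here is an illustrative Python check using the R² values quoted in the solution:

```python
# Homoskedasticity-only F-statistic (eq. 7.14) from the quoted R-squared values.
r2_unr, r2_r = 0.2784, 0.2504   # regressions (4) and (2)
n, k_unr, q = 141, 5, 2
F = ((r2_unr - r2_r) / q) / ((1 - r2_unr) / (n - k_unr - 1))
print(round(F, 2))   # 2.62, below the 5% critical value F(2, 135) = 3.06
```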
Table 1
Definitions of Variables in GPA4.dta (data is from the Wooldridge textbook)

Variable   Definition
colGPA     Cumulative college grade point average of a sample of 141 students
           at Michigan State University in 1994.
hsGPA      High school GPA of students.
skipped    Average number of classes skipped per week.
PC         = 1 if the student owns a personal computer; = 0 otherwise.
bgfriend   = 1 if the student answered "yes" to having a boy/girlfriend;
           = 0 otherwise.
campus     = 1 if the student lives on campus; = 0 otherwise.
Table 2
College GPA Results
Dependent variable: colGPA

Regressor                       (1)       (2)       (3)       (4)
hsGPA                          (   )     (   )     (   )     (   )
skipped                        (   )     (   )     (   )     (   )
PC                              __       (   )     (   )     (   )
bgfriend                        __        __       (   )     (   )
campus                          __        __        __       (   )
Intercept                      (   )     (   )     (   )     (   )

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are all zero:

hsGPA, skipped                 (   )     (   )     (   )     (   )
hsGPA, skipped, PC              __       (   )     (   )     (   )
hsGPA, skipped, PC, bgfriend    __        __       (   )     (   )
bgfriend, campus                __        __        __       (   )

Regression summary statistics
R̄²
R²
Regression RMSE
n

Notes: Heteroskedasticity-robust standard errors are given in parentheses under the estimated coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are heteroskedasticity-robust.
Table 2
College GPA Results
Dependent variable: colGPA

Regressor                       (1)       (2)       (3)       (4)
hsGPA                          .458      .455      .460      .461
                              (.094)    (.092)    (.093)    (.090)
skipped                       -.077     -.065     -.065     -.071
                              (.025)    (.025)    (.025)    (.026)
PC                              __       .128      .130      .136
                                        (.059)    (.059)    (.058)
bgfriend                        __        __       .084      .085
                                                  (.055)    (.054)
campus                          __        __        __      -.124
                                                            (.078)
Intercept                     1.579     1.526     1.469     1.490
                              (.325)    (.321)    (.325)    (.317)

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are all zero:

hsGPA, skipped                20.90     19.34     19.42     21.19
                              (.00)     (.00)     (.00)     (.00)
hsGPA, skipped, PC              __      15.47     15.56     17.46
                                        (.00)     (.00)     (.00)
hsGPA, skipped, PC, bgfriend    __        __      12.07     13.62
                                                  (.00)     (.00)
bgfriend, campus                __        __        __       2.55
                                                            (.082)

Regression summary statistics
R̄²                             .211      .234      .241      .252
R²                             .223      .250      .263      .278
Regression RMSE                .331      .326      .324      .322
n                               141       141       141       141

Notes: Heteroskedasticity-robust standard errors are given in parentheses under the estimated coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are heteroskedasticity-robust.
3. (10p) Suppose that you are interested in testing a joint null hypothesis consisting of three restrictions, say β_1 = β_2 = β_3 = 0, in a multiple regression. Assume that you have three individual t-statistics for β_j = 0, where j = 1, 2, 3. Consider the following testing procedure: reject the joint null hypothesis if at least one of the t-statistics exceeds 1.96 in absolute value. If the t-statistics are independent of each other, what is the probability of rejecting the joint null hypothesis when it is true?

Solution:

Pr(reject joint null hypothesis)
= Pr(at least one of the t-statistics is greater than 1.96 in absolute value)
= 1 − Pr(all 3 t-statistics are at most 1.96 in absolute value)
= 1 − P(|t_1| ≤ 1.96) P(|t_2| ≤ 1.96) P(|t_3| ≤ 1.96)
= 1 − 0.95³ ≈ 0.143

The last equality holds because all 3 t-statistics are independent of each other.
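The final number is a one-line computation; an illustrative Python check:

```python
# Size of the "reject if any |t| > 1.96" procedure with 3 independent t-statistics.
p_reject = 1 - 0.95 ** 3
print(round(p_reject, 3))   # 0.143, well above the nominal 5%
```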
The following questions will not be graded; they are for you to practice and will be discussed at the recitation:

1. SW Empirical Exercise 5.2

(a) The estimated regression is

Growth-hat = 0.96 + 1.68 × TradeShare
            (0.54)  (0.87)

The t-statistic for the slope coefficient is t = 1.68/0.87 = 1.94. The t-statistic is larger in absolute value than the 10% critical value (1.64), but less than the 5% and 1% critical values (1.96 and 2.58). Therefore, the null hypothesis is rejected at the 10% significance level, but not at the 5% or 1% levels.

(b) The p-value is 0.057.

(c) The 90% confidence interval is 1.68 ± 1.64 × 0.87, or 0.25 ≤ β_1 ≤ 3.11.
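The 90% confidence interval in (c) is just the point estimate plus or minus 1.64 standard errors; an illustrative Python check:

```python
# 90% CI for the slope: estimate +/- 1.64 * SE.
beta_hat, se = 1.68, 0.87
lo, hi = beta_hat - 1.64 * se, beta_hat + 1.64 * se
print(round(lo, 2), round(hi, 2))   # 0.25 3.11
```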
2. SW Empirical Exercise 5.3

(a) Average birthweights, along with standard errors, are shown in the table below. (Birthweight is measured in grams.)

          All Mothers   Non-smokers   Smokers
X̄            3383         3432.1       3178.8
SE(X̄)         10.8          11.9         24.0
n             3000          2418          582
(b) The estimated difference is X̄_Smokers − X̄_NonSmokers = −253.2. The standard error of the difference is

SE(X̄_Smokers − X̄_NonSmokers) = √[SE(X̄_Smokers)² + SE(X̄_NonSmokers)²] = √(24.0² + 11.9²) = 26.8.

The 95% confidence interval for the difference is −253.2 ± 1.96 × 26.8 = (−305.9, −200.6).
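The standard error of the difference can be checked directly from the two group standard errors (illustrative Python; the endpoints quoted above reflect the solution's own rounding of the intermediate numbers):

```python
import math

# SE of a difference in means for two independent samples.
se_smokers, se_nonsmokers = 24.0, 11.9
se_diff = math.sqrt(se_smokers ** 2 + se_nonsmokers ** 2)
print(round(se_diff, 1))   # 26.8

diff = -253.2
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
print(round(ci[0], 1), round(ci[1], 1))   # close to the quoted (-305.9, -200.6)
```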
(c) The estimated regression is

Birthweight-hat = 3432.1 − 253.2 Smoker
                  (11.9)   (26.8)

(i) The intercept is the average birthweight for non-smokers (Smoker = 0). The slope is the difference between the average birthweights for smokers (Smoker = 1) and non-smokers (Smoker = 0).
(ii) They are the same.
(iii) This is the same as the confidence interval in (b).

(d) Yes, and we'll investigate this more in future empirical exercises.
3. SW Empirical Exercise 6.1

(a) The estimated regression is

Birthweight-hat = 3432.1 − 253.2 Smoker

The estimated effect of smoking on birthweight is −253.2 grams.

(b) The estimated regression is

Birthweight-hat = 3051.2 − 217.6 Smoker − 30.5 Alcohol + 34.1 Nprevist

(i) Smoking may be correlated with both alcohol use and the number of prenatal doctor visits, thus satisfying condition (1) in Key Concept 6.1. Moreover, both alcohol consumption and the number of doctor visits may have their own independent effects on birthweight, thus satisfying condition (2) in Key Concept 6.1.
(ii) The estimated effect is somewhat smaller: it has fallen to 217.6 grams from 253.2 grams, so the regression in (a) may suffer from omitted variable bias.
(iii) Birthweight-hat = 3051.2 − 217.6×1 − 30.5×0 + 34.1×8 = 3106.4
(iv) R² = 0.0729 and R̄² = 0.0719. They are nearly identical because the sample size is very large (n = 3000).
(v) Nprevist is a control variable. It captures, for example, the mother's access to healthcare and her health. Because Nprevist is a control variable, its coefficient does not have a causal interpretation.
(c) The results from STATA are

. ** FW calculations;
. regress birthweight alcohol nprevist;

      Source |       SS       df       MS              Number of obs =    3000
-------------+------------------------------           F(  2,  2997) =   82.64
       Model |    54966381     2  27483190.5           Prob > F      =  0.0000
    Residual |   996653623  2997  332550.425           R-squared     =  0.0523
-------------+------------------------------           Adj R-squared =  0.0516
       Total |  1.0516e+09  2999  350656.887           Root MSE      =  576.67

------------------------------------------------------------------------------
 birthweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     alcohol |  -103.2781   76.53276    -1.35   0.177    -253.3402    46.78392
    nprevist |   36.49956   2.870272    12.72   0.000     30.87166    42.12746
       _cons |   2983.739   33.35198    89.46   0.000     2918.344    3049.134
------------------------------------------------------------------------------

. predict bw_res, r;
. regress smoker alcohol nprevist;

      Source |       SS       df       MS              Number of obs =    3000
-------------+------------------------------           F(  2,  2997) =   38.97
       Model |  11.8897961     2  5.94489803           Prob > F      =  0.0000
    Residual |  457.202204  2997  .152553288           R-squared     =  0.0253
-------------+------------------------------           Adj R-squared =  0.0247
       Total |     469.092  2999  .156416139           Root MSE      =  .39058

------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     alcohol |    .334529   .0518358     6.45   0.000     .2328917    .4361662
    nprevist |  -.0111667    .001944    -5.74   0.000    -.0149785   -.0073549
       _cons |   .3102729   .0225893    13.74   0.000     .2659807    .3545651
------------------------------------------------------------------------------

. predict smoker_res, r;
. regress bw_res smoker_res;

      Source |       SS       df       MS              Number of obs =    3000
-------------+------------------------------           F(  1,  2998) =   66.55
       Model |  21644450.1     1  21644450.1           Prob > F      =  0.0000
    Residual |   975009170  2998   325219.87           R-squared     =  0.0217
-------------+------------------------------           Adj R-squared =  0.0214
       Total |   996653621  2999   332328.65           Root MSE      =  570.28

------------------------------------------------------------------------------
      bw_res |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  smoker_res |  -217.5801    26.6707    -8.16   0.000    -269.8748   -165.2854
       _cons |  -2.75e-07   10.41185    -0.00   1.000    -20.41509    20.41509
------------------------------------------------------------------------------
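The Frisch–Waugh logic behind these "FW calculations" can be sketched on synthetic data: regressing the y-residuals on the x-residuals (both purged of the controls) reproduces the coefficient on x from the full multiple regression. This Python sketch uses made-up data, not the birthweight dataset:

```python
# Frisch-Waugh demonstration on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 500
controls = rng.normal(size=(n, 2))              # stand-ins for alcohol, nprevist
x = 0.5 * controls[:, 0] + rng.normal(size=n)   # regressor of interest ("smoker")
y = 2.0 * x + controls @ np.array([1.0, -1.0]) + rng.normal(size=n)

def ols(y, X):
    """OLS coefficients with a constant prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

def resid(y, X):
    """Residuals from regressing y on X (with a constant)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]

b_full = ols(y, np.column_stack([x, controls]))[1]          # coef on x, full regression
b_fw = ols(resid(y, controls), resid(x, controls).reshape(-1, 1))[1]
print(abs(b_full - b_fw) < 1e-8)   # True: the two estimates coincide
```

This mirrors the Stata sequence above: the final `regress bw_res smoker_res` recovers the −217.58 coefficient on smoker from the multiple regression.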
(d) The estimated regression is

Birthweight-hat = 3454.5 − 228.8 Smoker − 15.1 Alcohol − 698.0 Tripre0 − 100.8 Tripre2 − 137.0 Tripre3

(i) Tripre1 is omitted to avoid perfect multicollinearity (Tripre0 + Tripre1 + Tripre2 + Tripre3 = 1, the value of the "constant" regressor that determines the intercept). The regression would not run, or the software would report results from an arbitrary normalization, if Tripre0, Tripre1, Tripre2, Tripre3, and the constant term were all included in the regression.
(ii) Babies born to women who had no prenatal doctor visits (Tripre0 = 1) had birthweights that on average were 698.0 grams (≈ 1.5 lbs) lower than babies of women who first saw a doctor during the first trimester (Tripre1 = 1).
(iii) Babies born to women whose first doctor visit was during the second trimester (Tripre2 = 1) had birthweights that on average were 100.8 grams (≈ 0.2 lbs) lower than babies of women who first saw a doctor during the first trimester (Tripre1 = 1). Babies born to women whose first doctor visit was during the third trimester (Tripre3 = 1) had birthweights that on average were 137.0 grams (≈ 0.3 lbs) lower than babies of women who first saw a doctor during the first trimester (Tripre1 = 1).
4. SW Empirical Exercise 6.2

(a)

Variable         Mean     Std. Dev.   Minimum   Maximum   Units
Growth            1.87       1.82      −2.81      7.16     Percentage points
Rgdp60          3131.0     2523.0     367.0     9895.0     1960 dollars
Tradeshare        0.542      0.229     0.141     1.128     Unit free
Yearsschool       3.95       2.55      0.20     10.07      Years
Rev_coups         0.170      0.225     0.000     0.970     Coups per year
Assassinations    0.281      0.494     0.000     2.467     Assassinations per year
Oil               0.00       0.00      0.00      0.00      0–1 indicator variable
(b) Estimated regression (in table format):

Regressor       Coefficient
tradeshare        1.34
yearsschool       0.56
rev_coups        −2.15
assasinations     0.32
rgdp60           −0.00046
intercept         0.63
SER               1.59
R²                0.29
R̄²                0.23

The coefficient on Rev_Coups is −2.15. An additional coup in a five-year period reduces the average annual growth rate by 2.15/5 = 0.43 percentage points over this 25-year period. This means GDP in 1995 is expected to be approximately 0.43 × 25 = 10.75% lower. This is a large effect.
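The back-of-the-envelope arithmetic in that paragraph, checked in Python (illustrative only):

```python
# Effect of one additional coup per five-year period on growth and on 1995 GDP.
coef = -2.15
per_year = coef / 5              # change in average annual growth rate
print(round(per_year, 2))        # -0.43 percentage points per year
print(round(per_year * 25, 2))   # -10.75: approximate % change in GDP after 25 years
```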
(c) The predicted growth rate at the mean values of all regressors is 1.87.

(d) The resulting predicted value is 2.18.

(e) The variable Oil takes on the value of 0 for all 64 countries in the sample. This would generate perfect multicollinearity: since Oil_i = 0 = 0 × X_{0,i}, where X_{0,i} = 1 is the constant regressor, Oil is a linear combination of one of the regressors, namely the constant.
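The rank failure caused by an all-zero regressor like Oil can be seen directly in a small synthetic design matrix (illustrative Python, not part of the original exercise):

```python
# An all-zero column is 0 times the constant column, so the design matrix
# loses full column rank and the OLS normal equations become singular.
import numpy as np

rng = np.random.default_rng(0)
n = 64
X = np.column_stack([np.ones(n), rng.normal(size=n), np.zeros(n)])
print(np.linalg.matrix_rank(X))   # 2, not 3
```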
Do file:
use /Users/mwatson/Dropbox/TB/4E/EE_Datasets/birthweight_smoking.dta;
describe;
*************************************************************;
summarize;
***********************************************;
**** Some Regressions ************************;
**********************************************;
regress birthweight smoker, robust;
regress birthweight smoker alcohol nprevist, robust;
dis "Adjusted Rsquared = " _result(8);
** FW calculations;
regress birthweight alcohol nprevist;
predict bw_res, r;
regress smoker alcohol nprevist;
predict smoker_res, r;
regress bw_res smoker_res;
***********************************;
regress birthweight smoker alcohol tripre0 tripre2 tripre3, robust;
dis "Adjusted Rsquared = " _result(8);