SA3_Solution_Econ140
University of California, Berkeley
Section Assignment - Econ 140 Spring 2024
WEEK 3: Causality and introduction to OLS
Exercises
Question 1: Potential outcomes framework
a) Define the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT) using the potential outcomes notation.

Solution:
Let's first define some notation.

- T_i is an indicator variable that takes value one if individual i was assigned the treatment and value zero otherwise.
- Y_i(1) is the potential outcome of individual i if she is treated: what her outcome would be if she received the treatment. Y_i(0) is the potential outcome of individual i if she is not treated: what her outcome would be if she did not receive the treatment. Note that for any given individual we can observe at most one of the two states (treated or untreated), so we will never be able to directly observe both of their potential outcomes. This is the origin of most difficulties in causal inference.
Then, the individual treatment effect for agent i is defined as Y_i(1) − Y_i(0), and the average treatment effect is the expected value of the individual treatment effects. That is,

ATE = E[Y_i(1) − Y_i(0)].

The ATT is the same quantity, but conditioning on being treated:

ATT = E[Y_i(1) − Y_i(0) | T_i = 1].

Note that the ATT can be different from the ATE because the treated individuals need not be a random sample of the whole population.
b) You want to estimate the effect of owning an iPad on grades in Econ 140. You compare the average grades of
students with iPads to the average of students without iPads. Does this comparison (a difference in means
comparison) allow you to find the ATE or the ATT? Why or why not?
Solution:
Denote with Y the grade and with T the iPad ownership status, and keep the rest of the notation from above. Then, let's expand that difference in means to see what we get. We start with:

E[Y_i | T_i = 1] − E[Y_i | T_i = 0]

For the individuals with an iPad, the expected grade is the expected potential outcome of having an iPad, and for those without an iPad it is the expected potential outcome of not having an iPad. So we can rewrite the above as:

E[Y_i(1) | T_i = 1] − E[Y_i(0) | T_i = 0]
Add and subtract E[Y_i(0) | T_i = 1]:

E[Y_i(1) | T_i = 1] − E[Y_i(0) | T_i = 1] + E[Y_i(0) | T_i = 1] − E[Y_i(0) | T_i = 0]
Using linearity of expectations, the above is equal to:

E[Y_i(1) − Y_i(0) | T_i = 1] + E[Y_i(0) | T_i = 1] − E[Y_i(0) | T_i = 0]

The first term is the ATT, but we did not get just that. The remaining two terms are what we call selection bias: it arises because we are comparing individuals who might be inherently different (we should ask ourselves why some own an iPad and some don't to begin with). The difference between the second and the third term (i.e., the selection bias) quantifies the difference in the expected untreated outcome between the individuals who were treated and those who were not.
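The solutions below use R for data work; as a quick language-neutral check, here is a short Python simulation of this decomposition. All numbers are hypothetical: a constant treatment effect of 5 grade points, and take-up that is more likely for students with high untreated potential outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes: Y(1) = Y(0) + 5, and treatment take-up
# depends on Y(0), so the treated are not a random sample (selection).
y0 = rng.normal(70.0, 10.0, n)                      # grade without an iPad
y1 = y0 + 5.0                                       # grade with an iPad
t = (y0 + rng.normal(0.0, 10.0, n) > 75).astype(int)

observed = np.where(t == 1, y1, y0)                 # only one state is ever seen

diff_in_means = observed[t == 1].mean() - observed[t == 0].mean()
att = (y1 - y0)[t == 1].mean()                      # ATT (5 by construction)
selection_bias = y0[t == 1].mean() - y0[t == 0].mean()

# The decomposition above holds exactly: diff-in-means = ATT + selection bias
print(diff_in_means, att + selection_bias)
```

Here the difference in means overstates the ATT because the treated group would have had higher grades even without the iPad.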
c) Can we introduce additional assumptions that allow us to get the ATE/ATT from a difference-in-means
comparison? Can you think of a case in which these assumptions hold in our example?
Solution:
If we assume that the selection bias is zero, then we obviously recover the ATT. Formally, if the treatment status is independent of the potential outcomes of the individuals, then the conditional expected values are the same as the unconditional ones. The selection bias is zero, so we get the ATT from that difference in means. In this particular case, the ATT would also be equal to the ATE because of the independence assumption:

E[Y_i(1) − Y_i(0) | T_i = 1] = E[Y_i(1) − Y_i(0)].

In our example, this assumption would hold if iPad ownership were assigned at random, for instance through a lottery that hands out iPads independently of students' characteristics.
Question 2: Deriving the OLS estimator
Consider the standard linear regression model with two regressors x_1 and x_2:

y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + u_i

a) Precisely interpret β_0 and β_1.
Solution:
β_0 is the expected value of y given that x_1 and x_2 are equal to zero. An increase in x_1 by one unit, keeping x_2 fixed, is associated with an increase in y of β_1 units.
b) What is the difference between β̂_1 and β_1?

Solution:
β_1 denotes the association between x_1 and y in the whole population of interest. It is a fixed number that does not vary. β̂_1 is an estimator for this parameter. It is a random variable with the typical properties of random variables (i.e., it has an expectation, a variance, a distribution, etc.).
c) Consider now the case of a regression model with a single regressor:

y_i = β_0 + β_1 x_i + u_i

The OLS estimator {β̂_0, β̂_1} minimizes the sum of the squared residuals ∑_{i=1}^N û_i². Derive it.
Hints:
1. Your optimization problem is: min_{β̂_0, β̂_1} ∑_{i=1}^N (y_i − β̂_0 − β̂_1 x_i)².
2. First solve for β̂_0 and then plug in your solution to solve for β̂_1.
3. In the last section, we have seen that V(X) = E[(X − E[X])²] = E[X²] − E[X]². The same is true for the sample variance (where we replace expectations with averages), i.e.,

(1/N) ∑_{i=1}^N (x_i − x̄)² = (1/N) ∑_{i=1}^N x_i² − x̄².

The same is also true for covariances: cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y], and therefore also for the sample covariance:

(1/N) ∑_{i=1}^N (x_i − x̄)(y_i − ȳ) = (1/N) ∑_{i=1}^N x_i y_i − x̄ ȳ.
Solution:
We start from the function we need to minimize:

min_{β̂_0, β̂_1} ∑_{i=1}^N (y_i − β̂_0 − β̂_1 x_i)².

Finding the minimum of this function just means taking the derivatives and setting them equal to zero. Since we have two unknowns in this problem, we have two first-order conditions (FOCs):

∂/∂β̂_0: ∑_{i=1}^N −2 (y_i − β̂_0 − β̂_1 x_i) = 0

and

∂/∂β̂_1: ∑_{i=1}^N −2 x_i (y_i − β̂_0 − β̂_1 x_i) = 0.
Let us start with the first FOC, which looks easier to work with. We can drop the −2. Then we distribute the sum (it is a linear operator!) and put the term involving β̂_0 on one side, keeping everything else on the other side. This gives us:

∑_{i=1}^N β̂_0 = ∑_{i=1}^N y_i − ∑_{i=1}^N β̂_1 x_i.

Summing N times over the same number c is simply N · c, and summing up N values y_i equals N · ȳ. Using this, we get

N β̂_0 = N ȳ − N β̂_1 x̄.
Dividing both sides by N gives us an intermediate solution:

β̂_0 = ȳ − β̂_1 x̄.
For the next step, we solve for β̂_1. We again get rid of the −2 and then multiply the x_i through, giving us

∑_{i=1}^N (x_i y_i − β̂_0 x_i − β̂_1 x_i²) = 0.
The next step is important. We substitute in our intermediate result for β̂_0 and get:

∑_{i=1}^N (x_i y_i − (ȳ − β̂_1 x̄) x_i − β̂_1 x_i²) = 0.
We now distribute the sum and get

∑_{i=1}^N x_i y_i − ȳ ∑_{i=1}^N x_i + β̂_1 x̄ ∑_{i=1}^N x_i − β̂_1 ∑_{i=1}^N x_i² = 0.
To simplify, we again use that ∑_{i=1}^N y_i = N ȳ and ∑_{i=1}^N x_i = N x̄. We also put β̂_1 on the left side. We then get:

β̂_1 = (∑_{i=1}^N x_i y_i − N x̄ ȳ) / (∑_{i=1}^N x_i² − N x̄²).
Almost there. We now use the third hint to rewrite this expression. As seen, ∑_{i=1}^N (x_i − x̄)(y_i − ȳ) = ∑_{i=1}^N x_i y_i − N x̄ ȳ. In addition, ∑_{i=1}^N (x_i − x̄)² = ∑_{i=1}^N x_i² − N x̄². We substitute these two properties in to get the solution for the OLS estimator:
β̂_1 = ∑_{i=1}^N (x_i − x̄)(y_i − ȳ) / ∑_{i=1}^N (x_i − x̄)²
    = [(1/N) ∑_{i=1}^N (x_i − x̄)(y_i − ȳ)] / [(1/N) ∑_{i=1}^N (x_i − x̄)²]
    = sCov(x, y) / sVar(x).

In this expression, sCov(x, y) is the sample covariance between x and y, and sVar(x) is the sample variance of x.
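As a sanity check on the derivation, the sketch below (in Python, on hypothetical simulated data) computes β̂_1 and β̂_0 from the derived formulas and compares them with a standard library least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, 500)                       # hypothetical regressor
y = 1.0 + 3.0 * x + rng.normal(0.0, 1.0, 500)       # hypothetical outcome

# The derived formulas: slope = sCov(x, y) / sVar(x), intercept = ybar - slope * xbar
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

# Cross-check against a library least-squares fit (slope first, then intercept)
b1_ref, b0_ref = np.polyfit(x, y, deg=1)
print(b0, b1)
```

Both approaches agree to numerical precision, which is exactly what the derivation promises.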
d) Interpret the expression you obtained: what will happen to β̂_1 if (1) the sample variance of x increases, (2) the sample variance of x gets close to zero, (3) the sample covariance between x and y increases?
Solution:
Note that the numerator of the estimator is the sample covariance between x and y, and the denominator is the sample variance of x.

(1) If the sample variance of x (which is necessarily non-negative) increases, we are dividing by a larger number, and β̂_1 gets closer to zero (holding the covariance fixed).

(2) If the sample variance of x gets close to zero, the opposite happens: we divide by a number close to zero, and the absolute value of β̂_1 becomes very large.

(3) Here it is more useful to think about the absolute value of the sample covariance between x and y. If that absolute value increases, the absolute value of the estimator goes up: the estimator increases if the covariance was positive, and becomes even more negative if the covariance was negative.
Question 3: Dummy variables regression
(Also refer to the R markdown document in the solutions folder).
a) Load the dataset mexico.csv into RStudio.

library("dplyr")

# Check and set the working directory
getwd()
setwd("/your_directory")

# Load dataset
mexico_data <- read.csv("Mexico.csv")
b) What is the average monthly income (inc_m) for people who speak an indigenous language (ind_lang=1)? What is the average income for those who don't?
# Get means and difference in means
# Using dplyr syntax
mexico_data %>% group_by(ind_lang) %>% summarize(mean(inc_m))
# Using base R syntax
mean1 = mean(mexico_data$inc_m[mexico_data$ind_lang==1])
mean0 = mean(mexico_data$inc_m[mexico_data$ind_lang==0])
mean_diff = mean1 - mean0
mean_diff
c) Run and interpret the regression of monthly income inc_m on ind_lang using the lm() command.

ols_results = summary(lm(inc_m ~ ind_lang, data=mexico_data))
ols_results
d) How do your answers in b) and c) relate to each other?
# Compare mean difference to OLS coefficient
ols_results$coefficients[2,1]
mean_diff
We see that OLS regression is very useful for summarizing the data. The constant/intercept gives the average monthly income in the group where ind_lang is zero, and the coefficient on ind_lang gives the mean difference between ind_lang==1 and ind_lang==0. This is always true when we run a regression with one dummy variable on the right-hand side. It is also true when we run a regression with multiple dummy variables, as long as we include enough regressors to describe all the categories present in the data.
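The dummy-variable result can be checked numerically. The sketch below (in Python, with hypothetical simulated incomes rather than the Mexico data) regresses an outcome on a constant and a dummy and confirms that the intercept equals the mean of the zero group and the slope equals the difference in group means.

```python
import numpy as np

rng = np.random.default_rng(3)
d = rng.integers(0, 2, 1000)                            # the dummy regressor
# Hypothetical incomes: different means in the two groups, plus noise
inc = np.where(d == 1, 4000.0, 7000.0) + rng.normal(0.0, 500.0, 1000)

# OLS of inc on a constant and the dummy
X = np.column_stack([np.ones(d.size), d])
intercept, slope = np.linalg.lstsq(X, inc, rcond=None)[0]

print(intercept, inc[d == 0].mean())                    # intercept = mean of d==0 group
print(slope, inc[d == 1].mean() - inc[d == 0].mean())   # slope = difference in means
```

This equality is exact (not approximate), because a regression on a constant and one dummy is saturated: it can fit both group means perfectly.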
Question 4: Quadratic terms in regressions: wages over the life-cycle
You are interested in the relation between age and yearly wages. What do you think this relationship may look
like? Will wages increase in age or become smaller as people become older?
a) Open the dataset wages.csv in RStudio. This dataset contains the following variables: male (an indicator whether an individual has identified as male or not), education (the years of education of that individual), videogames (the number of hours spent playing video games per week during childhood), wage_monthly (monthly income), wage_hourly (hourly income), age (age in years), and wage_yearly (annual income). The dataset is entirely fictional.
Solution Code:

dataset <- read.csv(file = "wages.csv")
b) Plot yearly wages (wage_yearly) against age. What do you observe? Would it make sense to run a simple linear regression of wages against age?
Solution Code:

plot(dataset$age, dataset$wage_yearly)
The plot clearly shows a non-linear relationship between age and wages. Wages seem to increase with age,
but this relationship becomes flatter and seems to become even negative after around 55 years.
Note: There are several ways to plot a relationship between two variables in R. The easiest is to use the plot() function. If you are interested in more advanced data visualization methods, you can look into the ggplot2 package or ask your GSI.
c) Use the lm() command to implement the linear regression model:

wage_i = α + β_0 age_i

and interpret the coefficients you get from the regression. Also interpret the significance of the coefficients using the standard error, the t-statistic, and the p-value.

Solution Code:

linear_regression = summary(lm(wage_yearly ~ age, data=dataset))
print(linear_regression)
The constant term (or intercept) is around 46,200: for a hypothetical individual with an age of zero, we would expect an annual income of about 46,200 USD based on this regression. In this case, the intercept alone does not make much sense. We can do inference on the constant by comparing the coefficient to the standard error to get the value of the t-statistic. The t-statistic is calculated as:

t = (β̂ − β_0) / se(β̂)

Let us test the null hypothesis that α = 0. We divide 46,193 by 223.32 and get t ≈ 207. The t-statistic exceeds 2 in absolute value, so the intercept is significantly different from zero at the 95% confidence level (in fact, the t-statistic is so large that the intercept is also significant at the 99.999% confidence level). We therefore reject the null hypothesis. We can also do this much quicker by looking at the p-value (Pr(>|t|)): in this case, it is less than 2 × 10⁻¹⁶, so we can reject the null hypothesis. We would fail to reject it if p > 0.05.

What about the coefficient on age? In this dataset, one additional life-year is associated with 143.6 USD higher annual income, on average. Dividing this estimate by its standard error (4.52), we get a t-statistic of 31.73, and so the coefficient on age is also significantly different from zero at the 95% confidence level.
Say your professor claims that "across hundreds of studies, we have seen that an additional year of age is associated with 140 USD of additional earnings". Can you reject this claim? To see this, let us calculate the t-statistic for the null hypothesis H_0: β_0 = 140:

t = (β̂_0 − β_0) / se(β̂_0) = (143.6 − 140) / 4.52 ≈ 0.80

Since |0.80| < 2, we fail to reject the null hypothesis.
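The same arithmetic, written out as a quick check (using the estimate and standard error reported above):

```python
# t-statistic for H0: beta = 140, using the reported slope and standard error
beta_hat, se = 143.6, 4.52
t = (beta_hat - 140) / se
print(t)   # well below 2 in absolute value, so we fail to reject H0
```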
d) Use the lm() command to implement the linear regression model:

wage_i = α + β_0 age_i + β_1 age_i²

For an increase in age_i by one year, what is the associated change in wages? How is this different from the simple linear regression model?

Hint: Take the derivative of wages with respect to age.
Solution Code:

# We use the function I() to create quadratic terms and interactions.
quadratic_regression = lm(wage_yearly ~ age + I(age^2), data=dataset)
summary(quadratic_regression)
In a simple linear regression, the "effect" of age on wages is constant: no matter how old a person is, getting one year older is on average associated with roughly 143 USD higher earnings. This is different in a quadratic regression (and in many other models, such as interaction models and logarithmic specifications). To see this, we can just take the derivative of the regression function:

d wage_i / d age_i = β_0 + 2 β_1 age_i

The derivative changes with age! Plugging in our coefficients, we get:

d wage_i / d age_i = 962 + 2 · (−8.6) · age_i = 962 − 17.2 · age_i
One thing that is very neat is that we can also calculate the "tipping point": the point after which an additional year of age has a negative "effect" on earnings. Setting the derivative to zero:

d wage_i / d age_i = 0 ⇔ 0 = β_0 + 2 β_1 age_i ⇔ age_i = −β_0 / (2 β_1) = −962 / (−17.2) ≈ 56

We see that after an age of about 56 years, people's earnings decrease with every additional year. We can also see this graphically.
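The tipping-point formula is easy to verify with the reported coefficients:

```python
# Tipping point of the quadratic wage-age profile: age* = -b_age / (2 * b_age2),
# using the coefficients reported from the quadratic regression above
b_age, b_age2 = 962.0, -8.6
tipping_age = -b_age / (2 * b_age2)
print(tipping_age)   # just under 56 years
```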
e) Plot the marginal "effect" of age on wages using the cplot command.

Solution Code:

library(margins)  # cplot() comes from the margins package

# We can use cplot to create two types of plots:

# 1. Predicted wages by age (the estimated regression function)
cplot(quadratic_regression, "age", what = "pred",
      main = "Predicted yearly wages, by age")

# 2. The marginal "effect" of age (the derivative of the estimated regression function):
cplot(quadratic_regression, "age", what = "effect",
      main = "Average Marginal Effect of age")