ps1_Fall 2023
School: Columbia University
Course: UN3412
Subject: Economics
Date: Jan 9, 2024
Pages: 7
Uploaded by JudgeMaskYak32
Department of Economics
UN3412
Columbia University
Fall 2023
Problem Set 1
Introduction to Econometrics
(Erden - Section 1)
______________________________________________________________________________
Please make sure to select the page number for each question while you are uploading your
solutions to Gradescope. Otherwise, it is tough to grade your answers, and you may lose points.
“Calculator” was once a job description.
This problem set gives you an opportunity to do some calculations on the relation between smoking and lung cancer, using a (very) small sample of five countries. The purpose of this exercise is to illustrate the mechanics of ordinary least squares (OLS) regression. You will calculate the regression “by hand” using formulas from class and the textbook. For these calculations, you may relive history and use long multiplication, long division, and tables of square roots and logarithms; or you may use an electronic calculator or a spreadsheet.
The data are summarized in the following table. The variables are per capita cigarette consumption in 1930 (the independent variable, “X”) and the death rate from lung cancer in 1950 (the dependent variable, “Y”). The cancer rates are shown for a later time period because it takes time for lung cancer to develop and be diagnosed.
Observation #   Country          Cigarettes consumed per       Lung cancer deaths per
                                 capita in 1930 (X)            million people in 1950 (Y)
1               Switzerland        530                           250
2               Finland           1115                           350
3               Great Britain     1145                           465
4               Canada             510                           150
5               Denmark            380                           165
Source: Edward R. Tufte, Data Analysis for Politics and Management, Table 3.3.
1.
(21p) Use a calculator, a spreadsheet, or “by hand” methods to compute the following; refer to the textbook for the necessary formulas. (Note: if you use a spreadsheet, attach a printout.)
(a)
(3p) The sample means of X and Y, $\bar{X}$ and $\bar{Y}$.
(b)
(3p) The standard deviations of X and Y, $s_X$ and $s_Y$.
(c)
(3p) The correlation coefficient, r, between X and Y.
(d)
(3p) $\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.
(e)
(3p) $\hat{\beta}_0$, the OLS estimated intercept term from the same regression.
(f)
(3p) $\hat{Y}_i$, $i = 1, \ldots, n$, the predicted values for each country from the regression.
(g)
(3p) $\hat{u}_i$, the OLS residual for each country.
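A short script can cross-check the computations in parts (a) through (g). It is one possible way to do this, not a required method, and it makes two assumptions. First, the sample standard deviations use the n - 1 divisor, as in Stock and Watson. Second, the textbook OLS formulas apply: $\hat{\beta}_1 = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$. The helper name `ols_by_hand` is hypothetical.

```python
import math

# Data from the table above: cigarettes per capita in 1930 (X)
# and lung cancer deaths per million people in 1950 (Y).
countries = ["Switzerland", "Finland", "Great Britain", "Canada", "Denmark"]
x = [530, 1115, 1145, 510, 380]
y = [250, 350, 465, 150, 165]


def ols_by_hand(x, y):
    """Hypothetical helper: summary statistics and simple OLS of Y on X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Deviations from the sample means
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    s_xx = sum(d * d for d in dx)
    s_yy = sum(d * d for d in dy)
    s_xy = sum(a * b for a, b in zip(dx, dy))
    # Sample standard deviations with the n - 1 divisor (assumed convention)
    s_x = math.sqrt(s_xx / (n - 1))
    s_y = math.sqrt(s_yy / (n - 1))
    r = s_xy / math.sqrt(s_xx * s_yy)
    beta1 = s_xy / s_xx                 # OLS slope
    beta0 = y_bar - beta1 * x_bar       # OLS intercept
    y_hat = [beta0 + beta1 * xi for xi in x]          # predicted values
    u_hat = [yi - yh for yi, yh in zip(y, y_hat)]     # residuals
    return {
        "x_bar": x_bar, "y_bar": y_bar, "s_x": s_x, "s_y": s_y, "r": r,
        "beta0": beta0, "beta1": beta1, "y_hat": y_hat, "u_hat": u_hat,
    }


results = ols_by_hand(x, y)
for key in ("x_bar", "y_bar", "s_x", "s_y", "r", "beta0", "beta1"):
    print(f"{key}: {results[key]:.4f}")
for name, yh, uh in zip(countries, results["y_hat"], results["u_hat"]):
    print(f"{name:14s} Y_hat = {yh:8.2f}  u_hat = {uh:8.2f}")
```

With an intercept in the regression, the OLS residuals sum to zero up to rounding. This gives a quick check on hand calculations.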
2.
(4p) On graph paper or using a spreadsheet, graph the scatterplot of the five data points and
the regression line.
Be sure to label the axes and clearly show the data points.
3.
(15p) You are hired by the governor to study whether a tax on liquor has decreased average
liquor consumption in New York. From a random sample of n individuals in New York, you
obtain each person’s liquor consumption
both for the year before and for the year after the
introduction of the tax. From these data, you compute $Y_i$ = "change in liquor consumption" for individual $i = 1, \ldots, n$. $Y_i$ is measured in ounces, so if, for example, $Y_i = 10$, then individual i increased his liquor consumption by 10 ounces. Let the parameters $\mu_Y$ and $\sigma_Y^2$ denote the population mean and variance of Y.
(a)
(3p) You are interested in testing the hypothesis $H_0$ that there was no change in liquor consumption due to the tax. State this formally in terms of the population parameters.
(b)
(3p) The alternative, $H_1$, is that there was a decline in liquor consumption; state the alternative in terms of the population parameters.
(c)
(3p) Suppose that your sample size is n = 900 and you obtain estimates $\bar{Y} = -32.8$ and $s_Y = 466.4$. Report the t-statistic for testing $H_0$ against $H_1$. Obtain the p-value for the test [use Table 1 in Stock and Watson, pp. 749-750]. Do you reject at the 5% level? At the 1% level?
(d)
(3p) Would you say that the estimated fall in consumption is large in magnitude?
Comment on the practical versus statistical significance of this estimate.
(e)
(3p) In your analysis, what has been implicitly assumed about other determinants of
liquor consumption over the two-year period in order to infer causality from the tax
change to liquor consumption?
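The arithmetic for part (c) can be sketched as follows. The sketch makes two assumptions:

- With n = 900, the t-statistic is treated as approximately standard normal.
- Because $H_1$ is a decline, the test is one-sided (left tail).

Python's `statistics.NormalDist` stands in for the printed normal table. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 900
y_bar = -32.8   # sample mean change in consumption (ounces)
s_y = 466.4     # sample standard deviation (ounces)

# Standard error of the sample mean and t-statistic for a null mean of zero
se = s_y / math.sqrt(n)
t_stat = (y_bar - 0) / se

# One-sided p-value for a decline (left tail of the standard normal; assumption)
p_value = NormalDist().cdf(t_stat)

print(f"SE = {se:.4f}, t = {t_stat:.4f}, one-sided p-value = {p_value:.4f}")
print("Reject at 5%:", p_value < 0.05)
print("Reject at 1%:", p_value < 0.01)
```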
4.
(6p) Let Y be a Bernoulli random variable with success probability Pr(Y=1) = p, and let $Y_1, \ldots, Y_n$ be i.i.d. draws from this distribution. Let $\hat{p}$ be the fraction of successes (1s) in this sample.
(a)
(2p) Show that $\hat{p} = \bar{Y}$.
(b)
(2p) Show that $\hat{p}$ is an unbiased estimator of p.
(c)
(2p) Show that $\mathrm{var}(\hat{p}) = p(1-p)/n$.
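Parts (b) and (c) ask for proofs, but a Monte Carlo check can make the results concrete. The sketch assumes illustrative values p = 0.3 and n = 50, which are not given in the problem. It draws many Bernoulli samples and compares the mean and variance of $\hat{p}$ across replications with p and p(1 - p)/n.

```python
import random
import statistics

p, n, reps = 0.3, 50, 20_000   # illustrative values (assumed, not from the problem)
rng = random.Random(12345)     # fixed seed for reproducibility

p_hats = []
for _ in range(reps):
    # One sample of n Bernoulli(p) draws coded as 0/1
    sample = [1 if rng.random() < p else 0 for _ in range(n)]
    # The fraction of successes equals the sample mean of the 0/1 draws
    p_hats.append(sum(sample) / n)

print("mean of p_hat:", statistics.fmean(p_hats), "vs p =", p)
print("var of p_hat: ", statistics.pvariance(p_hats), "vs p(1-p)/n =", p * (1 - p) / n)
```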
5.
(8p) Let $Y_1, Y_2, Y_3, Y_4$ be independent, identically distributed random variables from a population with mean $\mu$ and variance $\sigma^2$. Let $\bar{Y} = \frac{1}{4}(Y_1 + Y_2 + Y_3 + Y_4)$ denote the average of these four random variables.
(a)
(2p) What are the expected value and variance of $\bar{Y}$ in terms of $\mu$ and $\sigma^2$?
(b)
(2p) Now, consider a different estimator of $\mu$: $\tilde{Y} = \frac{1}{8}Y_1 + \frac{1}{8}Y_2 + \frac{1}{4}Y_3 + \frac{1}{2}Y_4$. This is an example of a weighted average of the $Y_i$'s. Show that $\tilde{Y}$ is also an unbiased estimator of $\mu$. Find the variance of $\tilde{Y}$.
(c)
(2p) Based on your answers to parts (a) and (b), which estimator of $\mu$ do you prefer, $\bar{Y}$ or $\tilde{Y}$?
(d)
(2p) Suppose $Y_1, Y_2, Y_3, Y_4$ follow a normal distribution with mean $\mu = 5$ and variance $\sigma^2 = 3$. What are the distributions of $\bar{Y}$ and $\tilde{Y}$?
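For i.i.d. $Y_i$ with variance $\sigma^2$, a weighted average $\sum_i a_i Y_i$ has variance $\sigma^2 \sum_i a_i^2$. It is unbiased for $\mu$ when $\sum_i a_i = 1$. A small exact-arithmetic sketch with Python's `fractions` module checks both conditions for the two sets of weights. The value $\sigma^2 = 3$ is taken from part (d).

```python
from fractions import Fraction as F

weights = {
    "Y_bar": [F(1, 4)] * 4,
    "Y_tilde": [F(1, 8), F(1, 8), F(1, 4), F(1, 2)],
}
sigma2 = 3  # population variance from part (d)

for name, a in weights.items():
    total = sum(a)                       # must equal 1 for unbiasedness
    var_factor = sum(w * w for w in a)   # variance = var_factor * sigma^2
    print(f"{name}: sum of weights = {total}, "
          f"var = {var_factor} * sigma^2 = {var_factor * sigma2}")
```

Part (c) asks you to compare the two printed variance factors.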
6.
(6p) Suppose at Columbia University, grade point average (GPA) and SAT scores are related
by the conditional expectation E(GPA|SAT) = .90 + .001 SAT.
(a)
(2p) Find the expected GPA when SAT = 1600.
(b)
(2p) Find E(GPA|SAT=2200)
(c)
(2p) If the average SAT in the university is 2000, what is the average GPA?
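Parts (a) and (b) are direct substitutions into the conditional expectation. Part (c) uses the law of iterated expectations. Because $E(\text{GPA}\mid\text{SAT})$ is linear in SAT, the average GPA depends on SAT only through its average. A worked sketch of the arithmetic:

```latex
\begin{aligned}
E(\text{GPA}\mid \text{SAT}=1600) &= 0.90 + 0.001\times 1600 = 2.50\\
E(\text{GPA}\mid \text{SAT}=2200) &= 0.90 + 0.001\times 2200 = 3.10\\
E(\text{GPA}) = E\big[E(\text{GPA}\mid \text{SAT})\big] &= 0.90 + 0.001\,E(\text{SAT}) = 0.90 + 0.001\times 2000 = 2.90
\end{aligned}
```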
7.
(12p) Suppose that X is randomly drawn from a uniform distribution on the interval [0, 3].
Also, suppose that after the value X = x has been observed (0 < x < 3), Y is randomly drawn
from a uniform distribution on the interval [x, 3].
(a)
(3p) For any given value of x (0 < x < 3), obtain E[Y |X = x].
(b)
(3p) In view of part (a), obtain E[Y|X].
(c)
(3p) What is the difference between E[Y|X = x] and E[Y |X]?
(d)
(3p) Obtain E[Y].
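A simulation can be used to check parts (a) and (d). The seed and number of draws below are arbitrary choices. The sketch draws X ~ U[0, 3] and then Y ~ U[X, 3]. It then compares the simulated mean of Y with the value implied by $E[Y|X] = (X + 3)/2$ and the law of iterated expectations.

```python
import random
import statistics

rng = random.Random(2023)   # arbitrary seed
draws = 200_000             # arbitrary number of draws

ys = []
for _ in range(draws):
    x = rng.uniform(0, 3)         # X ~ U[0, 3]
    ys.append(rng.uniform(x, 3))  # Y | X = x ~ U[x, 3]

# Law of iterated expectations: E[Y] = E[(X + 3) / 2] = (E[X] + 3) / 2 with E[X] = 1.5
analytic = (1.5 + 3) / 2
print("simulated E[Y]:", statistics.fmean(ys))
print("analytic  E[Y]:", analytic)
```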
8.
(18p) Adult males are taller, on average, than adult females. Visiting two recent American
Youth Soccer Organization (AYSO) under-12-years-old (U12) soccer matches on a Saturday,
you do not observe an obvious difference in the height of boys and girls of that age. You
suggest to your little sister that she collect data on height and gender of children in 4th to 6th
grades as part of her science project. The accompanying table shows her findings.
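Height of Young Boys and Girls, Grades 4-6, in inches
Boys                                              Girls
$\bar{Y}_{Boys}$   $s_{Boys}$   $n_{Boys}$        $\bar{Y}_{Girls}$   $s_{Girls}$   $n_{Girls}$
57.8               3.9          55                58.4                4.2           57
where $\bar{Y}_{Boys}$ is the sample average height for boys, $n_{Boys}$ is the number of boys in the sample, and $s^2_{Boys}$ is the sample variance of height of boys (so $s_{Boys}$ is the sample standard deviation). The quantities for girls are defined analogously.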
(a)
(3p) Let your null hypothesis be that there is no difference in the height of females and
males at this age level. Specify the alternative hypothesis.
(b)
(3p) What is the unbiased estimate of the difference in height between boys and girls?
Provide a formula and check the unbiasedness. Calculate the value of this estimate for the
given sample.
(c)
(3p) Derive the formula for the variance of the estimate from (b). Calculate the estimate
of the variance for the given sample.
(d)
(3p) Create a statistic for testing the hypothesis in (a) using the Central Limit Theorem
and the Law of Large Numbers.
(e)
(3p) Calculate the t-statistic for comparing the two means. Is the difference statistically
significant at the 1% level? Which critical value did you use? Why would this number be
smaller if you had assumed a one-sided alternative hypothesis? What is the intuition
behind this?
(f)
(3p) Generate a 95% confidence interval for the difference in height.
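The calculations in parts (b), (c), (e) and (f) can be sketched in a few lines. The sketch assumes:

- the large-sample (CLT-based) normal approximation;
- independent samples with possibly unequal population variances;
- the difference defined as boys minus girls;
- 3.9 and 4.2 are sample standard deviations.

`NormalDist` from Python's standard library replaces the printed normal table.

```python
import math
from statistics import NormalDist

# Sample statistics from the table (heights in inches); s is treated as a std. dev.
ybar_b, s_b, n_b = 57.8, 3.9, 55   # boys
ybar_g, s_g, n_g = 58.4, 4.2, 57   # girls

diff = ybar_b - ybar_g                        # boys minus girls (assumed convention)
var_diff = s_b**2 / n_b + s_g**2 / n_g        # estimated variance of the difference
se = math.sqrt(var_diff)
t_stat = diff / se

z = NormalDist()
crit_1pct_two_sided = z.inv_cdf(1 - 0.01 / 2)   # about 2.58
crit_95 = z.inv_cdf(1 - 0.05 / 2)               # about 1.96
ci = (diff - crit_95 * se, diff + crit_95 * se)

print(f"difference = {diff:.2f}, estimated variance = {var_diff:.4f}, SE = {se:.4f}")
print(f"t = {t_stat:.3f}; significant at 1% (two-sided)? {abs(t_stat) > crit_1pct_two_sided}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```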
9.
(10p) Use the following data to show the Law of Iterated Expectations (i.e., show that $E(M) = E[E(M|A)]$).
The following questions will not be graded; they are for you to practice and will be discussed at the recitation:
10.
[Practice question, not graded] SW 2.3
                      Rain (X=0)   No Rain (X=1)   Total
Long Commute (Y=0)    0.15         0.07            0.22
Short Commute (Y=1)   0.15         0.63            0.78
Total                 0.30         0.70            1.00
Using the random variables X and Y from Table 2.2 (given above), consider two new random variables W = 3 + 6X and V = 20 - 7Y.
Compute:
(a)
E(W) and E(V).
(b)
$\sigma^2_W$ and $\sigma^2_V$.
(c)
$\sigma_{WV}$ and Corr(W,V).
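For this practice question, a short sketch can compute the requested moments directly from the joint distribution in Table 2.2. It relies on the linear-transformation rules E(a + bX) = a + bE(X), var(a + bX) = b²var(X) and cov(a + bX, c + dY) = bd·cov(X, Y). The helper `expect` is a hypothetical name.

```python
import math

# Joint distribution Pr(X = x, Y = y) from Table 2.2, keyed by (x, y)
joint = {(0, 0): 0.15, (1, 0): 0.07, (0, 1): 0.15, (1, 1): 0.63}


def expect(f):
    """Hypothetical helper: expected value of f(X, Y) under the joint distribution."""
    return sum(prob * f(x, y) for (x, y), prob in joint.items())


ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - ex) ** 2)
var_y = expect(lambda x, y: (y - ey) ** 2)
cov_xy = expect(lambda x, y: (x - ex) * (y - ey))

# W = 3 + 6X and V = 20 - 7Y
e_w, e_v = 3 + 6 * ex, 20 - 7 * ey
var_w, var_v = 6**2 * var_x, (-7) ** 2 * var_y
cov_wv = 6 * (-7) * cov_xy
corr_wv = cov_wv / math.sqrt(var_w * var_v)

print(f"E(W) = {e_w:.2f}, E(V) = {e_v:.2f}")
print(f"var(W) = {var_w:.4f}, var(V) = {var_v:.4f}")
print(f"cov(W,V) = {cov_wv:.4f}, corr(W,V) = {corr_wv:.4f}")
```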
11.
[Practice question, not graded] SW 2.6
The following table gives the joint probability distribution between employment status and
college graduation among those either employed or looking for work (unemployed) in the
working age US population, based on the 1990 US Census.
                          Unemployed (Y=0)   Employed (Y=1)   Total
Non-college grads (X=0)   0.045              0.709            0.754
College grads (X=1)       0.005              0.241            0.246
Total                     0.050              0.950            1.000
(a)
Compute E(Y).
(b)
The unemployment rate is the fraction of the labor force that is unemployed.
Show that
the unemployment rate is given by 1-E(Y).
(c)
Calculate E(Y|X=1) and E(Y|X=0).
(d)
Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.
(e)
A randomly selected member of this population reports being unemployed.
What is the
probability that this worker is a college graduate? A non-college graduate?
(f)
Are educational achievement and employment status independent? Explain.
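The quantities in this practice question all come from the joint table, its marginals and Bayes' rule. A minimal sketch, with X = 1 for college graduates and Y = 1 for employed, as in the table:

```python
# Joint distribution Pr(X = x, Y = y): X = 1 for college grads, Y = 1 if employed
joint = {(0, 0): 0.045, (0, 1): 0.709, (1, 0): 0.005, (1, 1): 0.241}

pr_x = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}   # marginal of X
pr_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}   # marginal of Y

e_y = pr_y[1]                         # E(Y) = Pr(Y = 1) for a 0/1 variable
unemployment_rate = 1 - e_y           # fraction of the labor force with Y = 0
e_y_given_x = {x: joint[(x, 1)] / pr_x[x] for x in (0, 1)}
unemp_rate_by_x = {x: 1 - e_y_given_x[x] for x in (0, 1)}
# Bayes' rule: Pr(X = x | Y = 0) = Pr(X = x, Y = 0) / Pr(Y = 0)
pr_x_given_unemp = {x: joint[(x, 0)] / pr_y[0] for x in (0, 1)}
# Independence requires Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y) in every cell
independent = all(abs(joint[(x, y)] - pr_x[x] * pr_y[y]) < 1e-9
                  for x in (0, 1) for y in (0, 1))

print(f"E(Y) = {e_y:.3f}, unemployment rate = {unemployment_rate:.3f}")
print(f"E(Y|X=1) = {e_y_given_x[1]:.4f}, E(Y|X=0) = {e_y_given_x[0]:.4f}")
print(f"unemployment rate: college {unemp_rate_by_x[1]:.4f}, non-college {unemp_rate_by_x[0]:.4f}")
print(f"Pr(college | unemployed) = {pr_x_given_unemp[1]:.2f}, "
      f"Pr(non-college | unemployed) = {pr_x_given_unemp[0]:.2f}")
print("independent:", independent)
```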
12.
[Practice question, not graded] SW 2.14 [Hint: Use SW Appendix Table 1.]
In a population E[Y] = 100 and Var(Y) = 43. Use the central limit theorem to answer the
following questions:
(a)
In a random sample of size n = 100, find $\Pr(\bar{Y} \le 101)$.
(b)
In a random sample of size n = 165, find $\Pr(\bar{Y} > 98)$.
(c)
In a random sample of size n = 64, find $\Pr(101 \le \bar{Y} \le 103)$.
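Under the central limit theorem, $\bar{Y}$ is approximately $N(100, 43/n)$. The sketch below builds that approximating distribution for each sample size, with `statistics.NormalDist` in place of Appendix Table 1. The helper `ybar_dist` is a hypothetical name.

```python
import math
from statistics import NormalDist

mu, var_y = 100, 43   # population mean and variance from the question


def ybar_dist(n):
    """Hypothetical helper: CLT approximation, Y_bar ~ N(mu, var_y / n)."""
    return NormalDist(mu, math.sqrt(var_y / n))


p_a = ybar_dist(100).cdf(101)                 # Pr(Y_bar <= 101), n = 100
p_b = 1 - ybar_dist(165).cdf(98)              # Pr(Y_bar > 98), n = 165
d64 = ybar_dist(64)
p_c = d64.cdf(103) - d64.cdf(101)             # Pr(101 <= Y_bar <= 103), n = 64

print(f"(a) {p_a:.4f}  (b) {p_b:.5f}  (c) {p_c:.4f}")
```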
13.
[Practice question, not graded] SW 3.12
To investigate possible gender discrimination in a firm, a sample of 100 men and 64 women
with similar job descriptions are selected at random.
A summary of the resulting monthly salaries follows:
         Avg. Salary (Ȳ)   Stand Dev (of Y)   n
Men      $3100             $200               100
Women    $2900             $320               64
(a)
What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that wages of men and women are different? (To answer this question, first state the null and alternative hypotheses; second, compute the relevant t-statistic; and finally, use the p-value to answer the question.)
(b)
Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.
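The t-statistic for part (a) can be sketched as a two-sample comparison of means. The sketch assumes independent samples, unequal population variances and a large-sample normal approximation. It reports a two-sided p-value because the alternative is that the mean wages differ.

```python
import math
from statistics import NormalDist

# Summary statistics (monthly salary in dollars)
ybar_m, s_m, n_m = 3100, 200, 100   # men
ybar_w, s_w, n_w = 2900, 320, 64    # women

diff = ybar_m - ybar_w
se = math.sqrt(s_m**2 / n_m + s_w**2 / n_w)   # SE of the difference in means
t_stat = diff / se
# Two-sided p-value; using the left tail avoids cancellation in 1 - cdf
p_value = 2 * NormalDist().cdf(-abs(t_stat))

print(f"difference = {diff}, SE = {se:.2f}, t = {t_stat:.3f}, p-value = {p_value:.2e}")
```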
14.
[Practice question, not graded] SW 2.10 [Hint: Use SW Appendix Table 1.]
Compute the following probabilities:
(a)
If Y is distributed N(1, 4), find Pr(Y ≤ 3).
(b)
If Y is distributed N(3, 9), find Pr(Y > 0).
(c)
If Y is distributed N(50, 25), find Pr(40 ≤ Y ≤ 52).
(d)
If Y is distributed N(5, 2), find Pr(6 ≤ Y ≤ 8).
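In the N(μ, σ²) notation above, the second argument is the variance. Python's `statistics.NormalDist`, by contrast, takes a mean and a standard deviation. The sketch converts with a hypothetical helper, `normal`, before evaluating the CDF.

```python
import math
from statistics import NormalDist


def normal(mean, variance):
    """Hypothetical helper: build a NormalDist from N(mean, variance) notation."""
    return NormalDist(mean, math.sqrt(variance))


p_a = normal(1, 4).cdf(3)                                 # Pr(Y <= 3)
p_b = 1 - normal(3, 9).cdf(0)                             # Pr(Y > 0)
p_c = normal(50, 25).cdf(52) - normal(50, 25).cdf(40)     # Pr(40 <= Y <= 52)
p_d = normal(5, 2).cdf(8) - normal(5, 2).cdf(6)           # Pr(6 <= Y <= 8)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {p_c:.4f}  (d) {p_d:.4f}")
```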
15.
[Practice question, not graded]
SW 3.3
In a survey of 400 likely voters, 215 responded that they would vote for the incumbent and
185 responded that they would vote for the challenger.
Let p denote the fraction of all likely voters that preferred the incumbent at the time of the survey, and let $\hat{p}$ be the fraction of survey respondents that preferred the incumbent.
(a)
Use the survey results to estimate p.
(b)
Use the estimator of the variance of $\hat{p}$, $\hat{p}(1 - \hat{p})/n$, to calculate the standard error of your estimator.
(c)
What is the p-value for the test $H_0: p = 0.5$ vs. $H_1: p \ne 0.5$?
(d)
What is the p-value for the test $H_0: p = 0.5$ vs. $H_1: p > 0.5$?
(e)
Why do the results from (c) and (d) differ?
(f)
Did the survey contain statistically significant evidence that the incumbent was ahead of
the challenger at the time of the survey? Explain.
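Parts (a) through (d) can be sketched in a few lines. The standard error uses the estimator $\hat{p}(1 - \hat{p})/n$ given in part (b). The test statistic is treated as approximately standard normal (a large-sample assumption). The two p-values differ only in whether one or both tails count as evidence against $H_0$.

```python
import math
from statistics import NormalDist

n, votes_incumbent = 400, 215
p_hat = votes_incumbent / n
se = math.sqrt(p_hat * (1 - p_hat) / n)   # SE from the estimator in part (b)
t_stat = (p_hat - 0.5) / se

z = NormalDist()
p_two_sided = 2 * z.cdf(-abs(t_stat))   # H1: p != 0.5
p_one_sided = 1 - z.cdf(t_stat)         # H1: p > 0.5

print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}, t = {t_stat:.3f}")
print(f"two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```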
16.
[Practice question, not graded] Consider two events A and B with Pr(A) = 0.5 and Pr(B) = 0.9. Determine the maximum and minimum values of Pr(A ∪ B).
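One way to bound the union is to write it as Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B) and then bound the intersection. The intersection can be no larger than the smaller of the two probabilities. It can be no smaller than max{0, Pr(A) + Pr(B) - 1}. A sketch of the resulting bounds:

```latex
\begin{aligned}
\Pr(A \cup B) &= \Pr(A) + \Pr(B) - \Pr(A \cap B) = 1.4 - \Pr(A \cap B),\\
\max\{0,\ 0.5 + 0.9 - 1\} = 0.4 &\le \Pr(A \cap B) \le \min\{0.5,\ 0.9\} = 0.5,\\
0.9 &\le \Pr(A \cup B) \le 1.
\end{aligned}
```

The minimum corresponds to A being contained in B; the maximum corresponds to A ∪ B covering the whole sample space.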
17.
[Practice question, not graded] Assume that events A and $B^c$ are independent. That is, $\Pr(A \cap B^c) = \Pr(A)\Pr(B^c)$. Are events A and B also independent?
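A short derivation sketch starts from the fact that A is the disjoint union of $A \cap B$ and $A \cap B^c$. As a result, $\Pr(A) = \Pr(A \cap B) + \Pr(A \cap B^c)$, and substituting the assumed independence gives:

```latex
\begin{aligned}
\Pr(A \cap B) &= \Pr(A) - \Pr(A \cap B^c)\\
&= \Pr(A) - \Pr(A)\Pr(B^c)\\
&= \Pr(A)\,[1 - \Pr(B^c)] = \Pr(A)\Pr(B).
\end{aligned}
```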
18.
[Practice question, not graded] Let X and Y denote two random variables.
(i) Show that if at least one of X or Y has expectation equal to zero, then
cov(X, Y) = E[XY].
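A one-line sketch expands the definition of covariance and uses linearity of expectation:

```latex
\operatorname{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y] = E[XY]
\quad \text{if } E[X] = 0 \text{ or } E[Y] = 0.
```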
19.
[Practice question, not graded]
The following admission data are for the graduate program in
the six largest majors at the University of California at Berkeley for the fall 1973 quarter.
(a)
What is the overall probability of being admitted for males? For females? What is the
standard deviation for males and for females?
(b)
How would you write down the null and alternative hypotheses in order to test that the
overall probability of admission is higher for men than for women?
(c)
Conduct a t-test of the hypothesis from part (b) and report the p-value.
(d)
Is the result significant at the 5% level? Does it provide evidence of discrimination?
(e)
Committee chairpersons claim they are more likely to admit women than men. Is this
claim true? Compute acceptance rates for men and women by graduate program.
(f)
Do these data suggest that the university is guilty of gender discrimination in its
admission policy? Explain briefly.