Assignment 6
School: New York University
Course: MISC
Subject: Economics
Date: May 31, 2024
Pages: 2
Uploaded by jeffjeff12345
Assignment 6: ECON-UA 266 - Intro to Econometrics
Sahar Parsa
Fall 2022
The solution to this assignment will be released on Friday, October 28, at 5pm. For the Data questions and any other questions that rely on R, report the output of your analysis in a readable “report style” and include the code you used to generate your results.
Question 1
Frisch-Waugh Theorem: In the least squares regression of $Y$ on a constant and one covariate $X$, we can compute the regression coefficient on $X$ in two steps. First, transform $Y$ into deviations from its mean, $Y - \bar{Y}$, and likewise transform $X$ into deviations from its column mean, $X - \bar{X}$. Second, regress the transformed $Y$ on the transformed $X$ without a constant.
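A minimal numerical check of the two-step procedure above, written in Python with made-up data (the assignment itself asks for R; the data and function names here are illustrative assumptions, not part of the assignment). It computes the slope from the regression with a constant by solving the normal equations from raw sums, then the slope from the demeaned regression without a constant, and shows that the two coincide.

```python
def slope_with_constant(x, y):
    """Slope from OLS of y on a constant and x, solving the 2x2 normal
    equations from raw (not demeaned) sums with Cramer's rule."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


def slope_demeaned_no_constant(x, y):
    """Slope from OLS of (y - mean of y) on (x - mean of x) with no constant."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    x_dev = [xi - x_bar for xi in x]
    y_dev = [yi - y_bar for yi in y]
    return sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)


# Illustrative, made-up data (not from the assignment).
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 4.5, 8.0, 9.0, 15.5]

b_const = slope_with_constant(x, y)
b_demeaned = slope_demeaned_no_constant(x, y)
print(b_const, b_demeaned)  # both are 53/30 (about 1.7667), up to rounding
```

The check only covers the case in the theorem, where both variables are demeaned; parts a and b ask what changes when only one of them is.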
a. What is the OLS estimator if we only transform $X$? What if we only transform $Y$?
b. Do we get the same result? Compare your answers and explain.
Question 2
Suppose you estimate the model
$$Y = X\beta + \varepsilon$$
using OLS.
a. Write down the projection matrix $P$ in terms of $X$, and explain what $PY$ is.
b. Show that the projection matrix $P$ is symmetric and idempotent.
To prove that $P$ is symmetric, show that $P' = P$.
To prove that $P$ is idempotent, show that $PP = P$.
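A standard derivation sketch, assuming $X$ has full column rank so that $X'X$ is invertible (an assumption the question does not state). It uses the rules $(AB)' = B'A'$ and $(A^{-1})' = (A')^{-1}$, together with the fact that $X'X$ is symmetric.

```latex
P = X(X'X)^{-1}X', \qquad PY = X(X'X)^{-1}X'Y = X\hat{\beta} = \hat{Y}

P' = \left(X(X'X)^{-1}X'\right)' = X\left((X'X)^{-1}\right)'X' = X\left((X'X)'\right)^{-1}X' = X(X'X)^{-1}X' = P

PP = X(X'X)^{-1}\underbrace{X'X(X'X)^{-1}}_{=\,I}X' = X(X'X)^{-1}X' = P
```

In words, $PY$ is the vector of OLS fitted values: the projection of $Y$ onto the column space of $X$.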
Data Question I
From E7.1 (Stock and Watson), plus extra questions:
Use the Birthweight_Smoking data set (download it from the Week 8 tab on NYU Brightspace).
a. To begin, run three regressions:
(1) Birthweight on Smoker
(2) Birthweight on Smoker, Alcohol, and Nprevist
(3) Birthweight on Smoker, Alcohol, Nprevist, and Unmarried
b. Application of the Frisch-Waugh Theorem
Consider regression (2).
(1) Run a regression of Smoker on Alcohol and Nprevist, and save the residuals.
(2) Run a regression of Birthweight on Alcohol and Nprevist, and save the residuals.
(3) Regress the residuals from step (2) on the residuals from step (1).
(4) Compare the coefficient on Smoker in regression (2) of part a with the coefficient in step (3) of part b. Are they the same?
(5) Explain the result in step (4). Why? [This is related to the Frisch-Waugh Theorem.]
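A minimal sketch of steps (1) to (4) in Python on simulated data. The variable names mirror the Birthweight_Smoking columns, but the data, the coefficients used to simulate them, and the helper functions `ols` and `residuals` are hypothetical, written only to illustrate the algebra; the assignment itself asks for R and the real data set. The sketch checks numerically that the coefficient on Smoker from the full regression equals the slope from regressing one set of residuals on the other.

```python
import random


def ols(columns, y):
    """OLS coefficients of y on the regressor columns (pass a column of ones to
    include an intercept). Solves the normal equations X'X b = X'y by Gaussian
    elimination with partial pivoting."""
    k = len(columns)
    # Augmented matrix [X'X | X'y], built row by row.
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(u * v for u, v in zip(columns[i], y))]
         for i in range(k)]
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(p + 1, k):
            factor = a[r][p] / a[p][p]
            for c in range(p, k + 1):
                a[r][c] -= factor * a[p][c]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (a[p][k] - sum(a[p][c] * b[c] for c in range(p + 1, k))) / a[p][p]
    return b


def residuals(columns, y):
    """Residuals y - Xb from the OLS fit of y on the given columns."""
    b = ols(columns, y)
    return [yi - sum(bj * col[i] for bj, col in zip(b, columns))
            for i, yi in enumerate(y)]


# Simulated data (hypothetical numbers, NOT the Birthweight_Smoking file).
random.seed(0)
n = 500
alcohol = [1.0 if random.random() < 0.1 else 0.0 for _ in range(n)]
nprevist = [float(random.randint(0, 20)) for _ in range(n)]
# Smoking is made more likely among drinkers, so Smoker and Alcohol are correlated.
smoker = [1.0 if random.random() < 0.15 + 0.3 * a else 0.0 for a in alcohol]
birthweight = [3300 - 200 * s - 50 * a + 10 * v + random.gauss(0, 500)
               for s, a, v in zip(smoker, alcohol, nprevist)]
ones = [1.0] * n

# Regression (2): Birthweight on a constant, Smoker, Alcohol, and Nprevist.
b_full = ols([ones, smoker, alcohol, nprevist], birthweight)[1]

# Steps (1)-(3): partial the constant, Alcohol, and Nprevist out of both variables,
# then regress one set of residuals on the other without a constant.
e_smoker = residuals([ones, alcohol, nprevist], smoker)
e_birthweight = residuals([ones, alcohol, nprevist], birthweight)
b_fw = ols([e_smoker], e_birthweight)[0]

# Step (4): the two coefficients agree up to floating-point rounding.
print(b_full, b_fw)
```

Because the first-stage regressions include a constant, both residual series have mean zero, so adding a constant to the final residual regression would not change its slope.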
c. What is the value of the estimated effect of smoking on birth weight in each of the regressions?
d. Does the coefficient on Smoker in regression (1) suffer from omitted variable bias? Explain.
e. Does the coefficient on Smoker in regression (2) suffer from omitted variable bias? Explain.
f. Consider the coefficient on Unmarried in regression (3).
1. Is the magnitude of the coefficient large? Explain.
2. A family advocacy group notes that the large coefficient suggests that public policies that encourage marriage will lead, on average, to healthier babies. Do you agree? (Hint: Review the discussion of control variables in Section 7.5. Discuss some of the various factors that Unmarried may be controlling for and how this affects the interpretation of its coefficient.)
Data Question II
Download the earnings and height data from the Week 8 tab on NYU Classes.
a. Run a regression: earnings as dependent variable and height as independent variable.
b. From part (a), if your answer is correct, you estimated a relatively large and statistically significant effect of a worker’s height on his or her earnings. One explanation for this result is omitted variable bias: height is correlated with an omitted factor that affects earnings. For example, Case and Paxson (2008) suggest that cognitive ability (or intelligence) is the omitted factor. The mechanism they describe is straightforward: poor nutrition and other harmful environmental factors in utero and in early childhood have, on average, deleterious effects on both cognitive and physical development. Cognitive ability affects earnings later in life and thus is an omitted variable in the regression.
Suppose that the mechanism described above is correct. Explain how this leads to omitted variable bias in
the OLS regression of Earnings on Height.
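For reference, a sketch of the omitted variable bias algebra, assuming (for illustration only) a linear model in which cognitive ability, written $A$, enters earnings additively, the error $u$ is uncorrelated with both height and ability, and observations are i.i.d. The short regression omits $A$:

```latex
\text{Earnings}_i = \beta_0 + \beta_1 \,\text{Height}_i + \beta_2 A_i + u_i

\hat{\beta}_1^{\,\text{short}} \xrightarrow{\;p\;} \beta_1 + \beta_2 \,\frac{\operatorname{Cov}(\text{Height}_i, A_i)}{\operatorname{Var}(\text{Height}_i)}
```

The bias term is nonzero only when ability both affects earnings ($\beta_2 \neq 0$) and is correlated with height; its sign is the sign of $\beta_2$ times the sign of that covariance.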
c. Does the bias lead the estimated slope to be too large or too small?
d. Use the years of education variable (educ) to construct four indicator variables for whether a worker has less than a high school diploma ($LT\_HS = 1$ if $educ < 12$, 0 otherwise), a high school diploma ($HS = 1$ if $educ = 12$, 0 otherwise), some college ($Some\_Col = 1$ if $12 < educ < 16$, 0 otherwise), or a bachelor’s degree or higher ($College = 1$ if $educ \geq 16$, 0 otherwise). Focusing first on women only, run a regression of (1) Earnings on Height and (2) Earnings on Height, including $LT\_HS$, $HS$, and $Some\_Col$ as control variables.
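A minimal Python sketch of the indicator definitions in part d (the function name `education_indicators` is hypothetical; the assignment itself asks for R). Because every worker falls into exactly one category, the four indicators always sum to 1, so including all four alongside an intercept would cause perfect multicollinearity (the dummy variable trap). This is why regression (2) leaves out $College$, which serves as the base category.

```python
def education_indicators(educ):
    """Map years of education to (LT_HS, HS, Some_Col, College) using part d's cutoffs."""
    lt_hs = 1 if educ < 12 else 0
    hs = 1 if educ == 12 else 0
    some_col = 1 if 12 < educ < 16 else 0
    college = 1 if educ >= 16 else 0
    return lt_hs, hs, some_col, college


for years in [10, 12, 14, 16, 18]:
    dummies = education_indicators(years)
    # Exactly one indicator is 1 for each worker, so the four always sum to 1.
    assert sum(dummies) == 1
    print(years, dummies)
# prints:
# 10 (1, 0, 0, 0)
# 12 (0, 1, 0, 0)
# 14 (0, 0, 1, 0)
# 16 (0, 0, 0, 1)
# 18 (0, 0, 0, 1)
```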