ps6_Fall 2023

Department of Economics UN3412
Columbia University Fall 2023

Problem Set 6
Introduction to Econometrics (Erden - Section 1)
______________________________________________________________________________

Please make sure to select the page number for each question while you are uploading your solutions to Gradescope. Otherwise, it is tough to grade your answers, and you may lose points.

Part I (12p) True, False, Uncertain with Explanation:

(a) (3p) In a probit/logit model, the marginal effect of regressor j is defined as $\Delta P(Y=1 \mid X) / \Delta X_j$. The marginal effect depends on X, and therefore it is difficult to place a useful interpretation on $\hat{\beta}$ alone.

(b) (3p) As a simple diagnostic, one can estimate OLS in data with a binary response variable, construct a histogram of the residuals, and compare their distribution to a normal or logistic distribution to decide whether to use probit or logit.

(c) (3p) The major flaw of the linear probability model is that the actual values can only be 0 and 1, but the predicted values are almost always different from that.

(d) (3p) The problem of whether being female has an effect on earnings could be analyzed using probit and logit estimation (i.e., in a wage regression that includes a gender dummy, we use a probit model to analyze whether being female has an effect on earnings).

Part II

1. (15p) In recent years, public concern about "second-hand" smoke has led to smoking bans in many US workplaces. In some cases, smoking bans are determined by local ordinances that cover indoor workplaces over a certain size, sometimes with exemptions such as for bars or restaurants. In other cases, smoking bans are voluntarily adopted by individual businesses (these voluntary bans were the main type of ban during the time period of the data for this problem set). It has been conjectured that workplace smoking bans induce smokers to quit by reducing their opportunities to smoke. In this assignment you will estimate the effect of workplace smoking bans on smoking.
To do this you will use data on a sample of 10,000 US indoor workers in 1991-1993. The data set contains information on whether individuals were, or were not, subject to a workplace smoking ban, whether or not the individuals smoked, and other individual characteristics. The data are in a STATA dataset called smoking.dta (described below).

(a) (3p) Estimate the probability of smoking for
    (i) all workers (the full sample)
    (ii) workers affected by workplace smoking bans
    (iii) workers not affected by workplace smoking bans

(b) (3p) What is the difference in the probability of smoking between workers affected by a workplace smoking ban and workers not affected by a workplace smoking ban? Use a linear probability model to determine whether this difference is statistically significant.

(c) (3p) Estimate a linear probability model with smoker as the dependent variable and the following regressors: smkban, female, age, age², hsdrop, hsgrad, colsome, colgrad, black, and hispanic. Compare the estimated effect of a smoking ban from this regression with your answer from 1(b). Suggest a reason, based on the substance of this regression, explaining the change in the estimated effect of a smoking ban between 1(b) and 1(c).

(d) (3p) Test the hypothesis that the coefficient on smkban is zero in the population version of the regression in 1(c), against the alternative that it is nonzero, at the 5% significance level.

(e) (3p) Test the hypothesis that the probability of smoking does not depend on the level of education in the regression in 1(c). In words, describe the estimated relationship between education and smoking (holding the other regressors constant).

2. (12p) Estimate a probit model using the same regressors as in 1(c).

(a) (3p) Consider a hypothetical individual, Mr. A, who is white, non-hispanic, 20 years old, and a high school dropout. Suppose Mr. A is subject to a workplace smoking ban. Calculate the probability that Mr. A smokes. Do the calculation "by hand," that is, calculate the "z-value" of the probit model using the estimated coefficients, then look up the probability in a standard normal table. Show your work.

(b) (3p) Repeat the calculation in 2(a) but have STATA do the calculation for you (see the STATA hints).

(c) (3p) Test the hypothesis that the coefficient on smkban is zero in the population version of this probit regression, against the alternative that it is nonzero, at the 5% significance level. Compare your t-statistic and your conclusion with those of question 1(d) based on the linear probability model.

(d) (3p) Test the hypothesis that the probability of smoking does not depend on the level of education in this probit model. Compare your results with those in question 1(e) using the linear probability model.

3. (6p) Read Table 1 and its notes, then fill in the missing entries (an electronic copy is available on the Web site).

4. (10p) Write a short (300 words max) essay:

    Summarize the findings of Table 1 about the estimated effect on smoking of workplace smoking bans. Do the probit and linear probability model results differ (qualitatively or quantitatively) and, if they do, which results make more sense? Are the estimated effects large in a real-world sense? (This part should be brief.)

    Carefully discuss remaining threats to the internal validity of these estimates, focusing on what you consider to be the most important such threats (this discussion should constitute the bulk of your essay).
Table 1
Estimated Effect on the Probability of Smoking of a Workplace Smoking Ban on Two Hypothetical Workers

Mr. A: male, white, non-hispanic, 20 years old, high school dropout
Ms. B: female, black, 40 years old, college graduate

                                                  Probit Model    Linear Prob. Model
                                                      (1)                (2)
Estimated coefficient on smkban
  (standard error in parentheses)                   ______             ______

Predicted probabilities of smoking for Mr. A:
  (i) with workplace ban                            ______             ______
  (ii) without workplace smoking ban                ______             ______
  Difference, (i) - (ii)                            ______             ______

Predicted probabilities of smoking for Ms. B:
  (iii) with workplace ban                          ______             ______
  (iv) without workplace smoking ban                ______             ______
  Difference, (iii) - (iv)                          ______             ______

Notes: The entry in the first row is the estimated coefficient on smkban in the probit model (column (1)) and the linear probability model (column (2)), with standard errors in parentheses; both regressions include the following control variables: female, age, age², hsdrop, hsgrad, colsome, colgrad, black, and hispanic. The entries in the remaining rows are predicted probabilities of smoking for the indicated hypothetical individuals, and differences in those predicted probabilities.
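The predicted probabilities requested in Table 1 follow mechanically from the estimated coefficients. The sketch below, written in Python rather than STATA, shows the arithmetic for a worker like Mr. A under each model. Every coefficient value in it is a made-up placeholder, not an estimate from smoking.dta, and the helper names (`index_value`, `probit_probability`, `lpm_probability`, `worker`) are illustrative assumptions.

```python
from statistics import NormalDist

# HYPOTHETICAL placeholder coefficients, NOT estimates from smoking.dta.
# Replace them with the values from your own probit and LPM output.
probit_coefs = {"const": -0.5, "smkban": -0.15, "age": 0.03,
                "age2": -0.0004, "hsdrop": 1.0}
lpm_coefs = {"const": 0.1, "smkban": -0.05, "age": 0.01,
             "age2": -0.0001, "hsdrop": 0.3}


def index_value(coefs, x):
    """Linear index: sum of coefficient times regressor value."""
    return sum(coefs[name] * x.get(name, 0.0) for name in coefs)


def probit_probability(coefs, x):
    """Probit: standard normal CDF evaluated at the linear index (the z-value)."""
    return NormalDist().cdf(index_value(coefs, x))


def lpm_probability(coefs, x):
    """Linear probability model: the linear index is itself the prediction."""
    return index_value(coefs, x)


def worker(smkban):
    # A 20-year-old male high school dropout. The remaining dummies
    # (female, hsgrad, colsome, colgrad, black, hispanic) are zero for him,
    # so their coefficients drop out of the index.
    return {"const": 1.0, "smkban": smkban, "age": 20.0,
            "age2": 400.0, "hsdrop": 1.0}


for label, coefs, predict in [("probit", probit_coefs, probit_probability),
                              ("LPM", lpm_coefs, lpm_probability)]:
    with_ban = predict(coefs, worker(1.0))
    without_ban = predict(coefs, worker(0.0))
    print(label, round(with_ban, 4), round(without_ban, 4),
          round(with_ban - without_ban, 4))
```

In the linear probability model the difference between the two rows is always the smkban coefficient itself, whereas in the probit model it depends on where the worker's index sits on the normal CDF.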
5. (24p) What is the effect of children on the labor force participation of mothers? In the data set MROZ.dta, 428 of the 753 women in the sample report being in the labor force at some point during the year. Variables in the data file MROZ.dta are:

    inlf        "in the labor force": =1 if the woman reports working for a wage outside the home at some point during the year, and zero otherwise
    educ        Years of education
    exper       Past years of labor market experience
    expersq     Experience squared
    age         Age in years
    kidslt6     Number of kids less than 6 years old
    kidsge6     Number of kids between 6 and 18 years of age
    nwifeinc    Other sources of income including husband's earnings (in $1000s)

(a) (3p) Estimate a linear probability model by regressing inlf on nwifeinc, educ, exper, expersq, age, kidslt6 and kidsge6.

(b) (3p) Estimate a probit model using the same control variables as in part (a).

(c) (3p) Estimate a logit model using the same control variables as in part (a).

(d) (3p) Report your results from 5(a), 5(b) and 5(c) in Table 2.

(e) (3p) Test the hypothesis that the coefficient on kidslt6 is zero in the population version of the regression in 5(a), against the alternative that it is nonzero, at the 5% significance level.

(f) (3p) Test the hypothesis that the probability of being in the labor force does not depend on the amount of work experience in the regression in 5(a). In words, describe the estimated relationship between working and experience (holding the other regressors constant).

(g) (3p) Test the hypothesis that the coefficient on kidslt6 is zero in the population version of the regression in 5(b), against the alternative that it is nonzero, at the 5% significance level.

(h) (3p) Test the hypothesis that the probability of being in the labor force does not depend on the amount of work experience in the regression in 5(b). In words, describe the estimated relationship between working and experience (holding the other regressors constant).

6. (5p) Read Table 3 and its notes, then fill in the missing entries (an electronic copy is available on Courseworks).

(10p) Write a short (300 words max) essay:

    Summarize the findings of Table 3 about the estimated effect of having a newborn on being in the labor force. Do the probit, logit and linear probability model results differ (qualitatively or quantitatively) and, if they do, which results make more sense? Are the estimated effects large in a real-world sense? (This part should be brief.)

    Carefully discuss remaining threats to the internal validity of these estimates, focusing on what you consider to be the most important of such threats (this discussion should constitute the bulk of your essay).

Table 2
LPM, Probit and Logit results of PS7 question 5

Dependent Variable: inlf

Independent Variables:          LPM (OLS)    Probit (MLE)    Logit (MLE)
nwifeinc
                                 (      )      (      )       (      )
educ
                                 (      )      (      )       (      )
exper
                                 (      )      (      )       (      )
exper²
                                 (      )      (      )       (      )
age
                                 (      )      (      )       (      )
kidslt6
                                 (      )      (      )       (      )
kidsge6
                                 (      )      (      )       (      )
constant
                                 (      )      (      )       (      )
Percent Correctly Predicted
Log-Likelihood Value
Pseudo R-Squared
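The last three rows of Table 2 are summary measures rather than coefficients. The sketch below illustrates how they are commonly computed, assuming a 0.5 cutoff for "percent correctly predicted" and McFadden's definition of the pseudo R-squared (one minus the ratio of the fitted log-likelihood to the intercept-only log-likelihood). Both definitions and all function names are assumptions for illustration; check them against the course's own conventions. The only numbers taken from the problem are the counts 428 and 753.

```python
from math import log


def null_log_likelihood(n_ones, n_total):
    """Log-likelihood of an intercept-only binary model.

    With no regressors, the fitted probability is the sample share of ones,
    so the value depends only on the two counts.
    """
    p = n_ones / n_total
    return n_ones * log(p) + (n_total - n_ones) * log(1.0 - p)


def mcfadden_pseudo_r2(loglik_model, loglik_null):
    """McFadden pseudo R-squared: 1 - lnL(model) / lnL(intercept only)."""
    return 1.0 - loglik_model / loglik_null


def percent_correctly_predicted(y, p_hat, cutoff=0.5):
    """Share of observations (in percent) whose predicted class matches y."""
    hits = sum(1 for yi, pi in zip(y, p_hat) if (pi >= cutoff) == (yi == 1))
    return 100.0 * hits / len(y)


# Intercept-only benchmark for the MROZ sample: 428 of 753 women in the
# labor force, as stated in question 5.
ll_null = null_log_likelihood(428, 753)
print("intercept-only log-likelihood:", round(ll_null, 2))
```

To complete the pseudo R-squared row, pass the log-likelihood reported by your own probit or logit output as `loglik_model`.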
Table 3
Estimated Effect of a Newborn on the Probability of Being in the Labor Force for Two Hypothetical Women

Ms. A: 25 years old, high school graduate with 3 years of work experience in the past; she is the mother of a 3-year-old child and has no other income.
Ms. B: 40 years old, college graduate with 10 years of work experience in the past; she has 2 kids, ages 5 and 8, and has no other income.

                                                  LPM Model    Probit Model    Logit Model
                                                     (1)           (2)            (3)
Estimated coefficient on kidslt6
  (standard error in parentheses)                  ______        ______         ______

Predicted probabilities of Ms. A being in the labor force:
  (i) with a newborn baby                          ______        ______         ______
  (ii) without a newborn baby                      ______        ______         ______
  Difference, (i) - (ii)                           ______        ______         ______

Predicted probabilities of Ms. B being in the labor force:
  (iii) with a newborn baby                        ______        ______         ______
  (iv) without a newborn baby                      ______        ______         ______
  Difference, (iii) - (iv)                         ______        ______         ______

Notes: The entry in the first row is the estimated coefficient on kidslt6 in the logit model (column (3)), the probit model (column (2)) and the linear probability model (column (1)), where the dependent variable is inlf, with standard errors in parentheses; each regression model includes the following control variables: nwifeinc, educ, exper, expersq, age, kidsge6. The entries in the remaining rows are predicted probabilities of being in the labor force for the indicated hypothetical individuals, and differences in those predicted probabilities.

7. (6p) Consider the regression of stock return ($Y_i$) on a constant and market share ($X_i$):

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

Assume that $(u_i, X_i)$ are i.i.d. and $u_i \sim N(0,1)$. We do not observe $Y_i$; instead, we only observe whether it is positive or negative. Construct the variable $Z_i$, which takes the value 1 if $Y_i \ge 0$ and the value 0 if $Y_i < 0$.

(a) (3p) Find $E[Z_i]$.

(b) (3p) Find the joint likelihood function of $Z_1, \dots, Z_n$ (conditional on $X_1, \dots, X_n$).
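The setup in question 7 can be explored by simulation before deriving anything. The sketch below writes Y for the unobserved stock return, X for market share, and Z for the observed 0/1 indicator, then generates data and reports the sample share of ones, which can be compared with whatever expression you derive in part (a). The parameter values, the standard normal distribution chosen for X, and the function name `simulate_indicator` are all hypothetical choices, not part of the problem.

```python
import random


def simulate_indicator(beta0, beta1, n, seed=0):
    """Simulate the latent regression and return (xs, zs).

    Y = beta0 + beta1 * X + u with u ~ N(0, 1), as in the problem.
    Only the sign of Y is kept: Z = 1 if Y >= 0, else 0.
    """
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # ASSUMPTION: the problem does not specify X's distribution
        u = rng.gauss(0.0, 1.0)  # error term, standard normal as stated
        y_latent = beta0 + beta1 * x + u
        xs.append(x)
        zs.append(1 if y_latent >= 0 else 0)
    return xs, zs


# Hypothetical parameter values chosen only for illustration.
xs, zs = simulate_indicator(beta0=0.2, beta1=0.5, n=10_000)
print("sample share with Z = 1:", sum(zs) / len(zs))
```

Changing `beta0` and `beta1` and watching how the share moves is a quick sanity check on a derived formula, not a substitute for the derivation.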
The following questions will not be graded; they are for you to practice and will be discussed at recitation:

8. SW Exercise 11.1

9. SW Exercise 11.6

10. SW Exercise 11.7

11. SW Exercise 11.9

12. SW Empirical Exercise 11.3
In this exercise you will study health insurance, health status, and employment using a random sample of more than 8000 workers in the United States surveyed in 1996. The data are available on the textbook website in the file Insurance. A detailed description is given in Insurance_Description, available on the website.

a. Are the self-employed less likely to have health insurance than wage earners? If so, is the difference large in a real-world sense? Is the difference statistically significant?

b. The self-employed might systematically differ from wage earners in their age, education and so forth. After you control for these other factors, are the self-employed less likely to have health insurance?

c. How does health insurance status vary with age? Are older workers more likely to have health insurance? Less likely?

d. Is the effect of self-employment on insurance status different for older workers than it is for younger workers?

e. It has been argued that the self-employed are less likely to be insured, but despite this, they are just as healthy as wage earners. Is this right? Does the argument hold for young workers? For older workers? Are there potential two-way causality problems that might undermine the internal validity of this kind of statistical analysis?
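Comparisons such as part a of exercise 12 ask for a difference in the probability of a binary outcome between two groups. The sketch below illustrates that the OLS slope from regressing a 0/1 outcome on a single 0/1 regressor equals the difference in group proportions, and computes the unequal-variance standard error for that difference, which matches the heteroskedasticity-robust standard error of the slope up to a degrees-of-freedom adjustment. The data are a made-up toy sample, and the function names are illustrative assumptions.

```python
from math import sqrt


def diff_in_proportions(y, d):
    """Difference in P(y = 1) between the d = 1 and d = 0 groups, with its SE."""
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    p1 = sum(y1) / len(y1)
    p0 = sum(y0) / len(y0)
    # Unequal-variance standard error for a difference in two proportions.
    se = sqrt(p1 * (1 - p1) / len(y1) + p0 * (1 - p0) / len(y0))
    return p1 - p0, se


def ols_slope(y, d):
    """OLS slope from regressing y on a constant and d."""
    n = len(y)
    y_bar = sum(y) / n
    d_bar = sum(d) / n
    s_dy = sum((di - d_bar) * (yi - y_bar) for yi, di in zip(y, d))
    s_dd = sum((di - d_bar) ** 2 for di in d)
    return s_dy / s_dd


# Made-up toy data for illustration only: y is the binary outcome,
# d is the binary group indicator.
y = [1, 0, 0, 0, 1, 1, 1, 0]
d = [1, 1, 1, 1, 0, 0, 0, 0]

diff, se = diff_in_proportions(y, d)
print("difference in proportions:", diff)
print("OLS slope on the dummy:   ", ols_slope(y, d))
print("t-statistic:", diff / se)
```

With real data the same comparison is what a linear probability model with a single binary regressor reports as its slope and t-statistic.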