Department of Economics
Columbia University
UN3412 Introduction to Econometrics (Erden - Section 1)
Fall 2023

Problem Set 1
______________________________________________________________________________

Please make sure to select the page number for each question while you are uploading your solutions to Gradescope. Otherwise, it is tough to grade your answers, and you may lose points.

“Calculator” was once a job description. This problem set gives you an opportunity to do some calculations on the relation between smoking and lung cancer, using a (very) small sample of five countries. The purpose of this exercise is to illustrate the mechanics of ordinary least squares (OLS) regression. You will calculate the regression “by hand” using formulas from class and the textbook. For these calculations, you may relive history and use long multiplication, long division, and tables of square roots and logarithms; or you may use an electronic calculator or a spreadsheet.

The data are summarized in the following table. The variables are per capita cigarette consumption in 1930 (the independent variable, “X”) and the death rate from lung cancer in 1950 (the dependent variable, “Y”). The cancer rates are shown for a later time period because it takes time for lung cancer to develop and be diagnosed.

Observation #   Country         Cigarettes consumed per     Lung cancer deaths per
                                capita in 1930 (X)          million people in 1950 (Y)
1               Switzerland     530                         250
2               Finland         1115                        350
3               Great Britain   1145                        465
4               Canada          510                         150
5               Denmark         380                         165

Source: Edward R. Tufte, Data Analysis for Politics and Management, Table 3.3.

1. (21p) Use a calculator, a spreadsheet, or “by hand” methods to compute the following; refer to the textbook for the necessary formulas. (Note: if you use a spreadsheet, attach a printout.)

(a) (3p) The sample means of X and Y, $\bar{X}$ and $\bar{Y}$.
(b) (3p) The standard deviations of X and Y, $s_X$ and $s_Y$.
(c) (3p) The correlation coefficient, r, between X and Y.
(d) (3p) $\hat{\beta}_1$, the OLS estimated slope coefficient from the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$.
(e) (3p) $\hat{\beta}_0$, the OLS estimated intercept term from the same regression.
(f) (3p) $\hat{Y}_i$, i = 1, …, n, the predicted values for each country from the regression.
(g) (3p) $\hat{u}_i$, the OLS residual for each country.
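
The mechanics asked for in question 1 can be sketched in a few lines of Python. This is only an illustration of the standard textbook formulas (sample standard deviations with an n - 1 divisor, slope as sample covariance over sample variance of X); the function name `ols_by_hand` and the dictionary keys are hypothetical names chosen here, not part of the problem set.

```python
import math

# Data from the table in this problem set.
cigarettes = [530, 1115, 1145, 510, 380]   # X: cigarettes per capita, 1930
deaths = [250, 350, 465, 150, 165]         # Y: lung cancer deaths per million, 1950


def ols_by_hand(x, y):
    """Compute the quantities in question 1 from the textbook formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and cross products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sample standard deviations use the n - 1 divisor.
    s_x = math.sqrt(sxx / (n - 1))
    s_y = math.sqrt(syy / (n - 1))
    r = sxy / math.sqrt(sxx * syy)
    beta1 = sxy / sxx                 # slope
    beta0 = y_bar - beta1 * x_bar     # intercept
    fitted = [beta0 + beta1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return {
        "x_bar": x_bar, "y_bar": y_bar, "s_x": s_x, "s_y": s_y, "r": r,
        "beta1": beta1, "beta0": beta0,
        "fitted": fitted, "residuals": residuals,
    }


result = ols_by_hand(cigarettes, deaths)
for name, value in result.items():
    print(name, value)
```

A useful check on any by-hand calculation is that the OLS residuals sum to zero (up to rounding) whenever the regression includes an intercept.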
2. (4p) On graph paper or using a spreadsheet, graph the scatterplot of the five data points and the regression line. Be sure to label the axes and clearly show the data points.

3. (15p) You are hired by the governor to study whether a tax on liquor has decreased average liquor consumption in New York. From a random sample of n individuals in New York, you obtain each person’s liquor consumption both for the year before and for the year after the introduction of the tax. From this data, you compute $Y_i$ = "change in liquor consumption" for individual i = 1, …, n. $Y_i$ is measured in ounces so if, for example, $Y_i = 10$, then individual i increased his liquor consumption by 10 ounces. Let the parameters $\mu_Y$ and $\sigma_Y^2$ denote the population mean and variance of Y.

(a) (3p) You are interested in testing the hypothesis $H_0$ that there was no change in liquor consumption due to the tax. State this formally in terms of the population parameters.
(b) (3p) The alternative, $H_1$, is that there was a decline in liquor consumption; state the alternative in terms of the population parameters.
(c) (3p) Suppose that your sample size is n = 900 and you obtain estimates $\bar{Y} = -32.8$ and $s_Y = 466.4$. Report the t-statistic for testing $H_0$ against $H_1$. Obtain the p-value for the test [use Table 1 in Stock and Watson, p. 749-750]. Do you reject at the 5% level? At the 1% level?
(d) (3p) Would you say that the estimated fall in consumption is large in magnitude? Comment on the practical versus statistical significance of this estimate.
(e) (3p) In your analysis, what has been implicitly assumed about other determinants of liquor consumption over the two-year period in order to infer causality from the tax change to liquor consumption?

4. (6p) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let $Y_1, \ldots, Y_n$ be i.i.d. draws from this distribution. Let $\hat{p}$ be the fraction of successes (1s) in this sample.
(a) (2p) Show that $\hat{p} = \bar{Y}$.
(b) (2p) Show that $\hat{p}$ is an unbiased estimator of p.
(c) (2p) Show that $\mathrm{var}(\hat{p}) = p(1-p)/n$.

5. (8p) Let $Y_1, Y_2, Y_3, Y_4$ be independently, identically distributed random variables from a population with mean $\mu$ and variance $\sigma^2$. Let $\bar{Y} = (1/4)(Y_1 + Y_2 + Y_3 + Y_4)$ denote the average of these four random variables.

(a) (2p) What are the expected value and variance of $\bar{Y}$ in terms of $\mu$ and $\sigma^2$?
(b) (2p) Now, consider a different estimator of $\mu$: $\tilde{Y} = (1/8)Y_1 + (1/8)Y_2 + (1/4)Y_3 + (1/2)Y_4$. This is an example of a weighted average of the $Y_i$'s. Show that $\tilde{Y}$ is also an unbiased estimator of $\mu$. Find the variance of $\tilde{Y}$.
(c) (2p) Based on your answer to parts (a) and (b), which estimator of $\mu$ do you prefer, $\bar{Y}$ or $\tilde{Y}$?
(d) (2p) Suppose $Y_1, Y_2, Y_3, Y_4$ follow a Normal distribution with mean $\mu = 5$ and variance $\sigma^2 = 3$. What is the distribution of $\bar{Y}$ and $\tilde{Y}$?
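
For question 5, the comparison of the two estimators comes down to two sums over the weights: for i.i.d. draws, a weighted average with weights $w_i$ has mean $\mu \sum w_i$ and variance $\sigma^2 \sum w_i^2$. The sketch below computes those two sums exactly with Python's `fractions` module. The helper name `weight_sums` is hypothetical and is only meant as a way to check the algebra, not as a substitute for the derivation.

```python
from fractions import Fraction


def weight_sums(weights):
    """Return (sum of weights, sum of squared weights).

    For i.i.d. Y_i with mean mu and variance sigma^2, the estimator
    sum(w_i * Y_i) has mean mu * sum(w_i) and variance sigma^2 * sum(w_i^2),
    so it is unbiased for mu exactly when the weights sum to one.
    """
    return sum(weights), sum(w * w for w in weights)


# Equal weights: the ordinary sample average of four observations.
equal_weights = [Fraction(1, 4)] * 4
# Unequal weights from part (b).
unequal_weights = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]

equal_total, equal_var_factor = weight_sums(equal_weights)
unequal_total, unequal_var_factor = weight_sums(unequal_weights)

print("equal weights:   sum =", equal_total, " variance factor =", equal_var_factor)
print("unequal weights: sum =", unequal_total, " variance factor =", unequal_var_factor)
```

The "variance factor" is what multiplies $\sigma^2$; the estimator with the smaller factor is the more efficient of the two.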
6. (6p) Suppose at Columbia University, grade point average (GPA) and SAT scores are related by the conditional expectation E(GPA|SAT) = 0.90 + 0.001 SAT.

(a) (2p) Find the expected GPA when SAT = 1600.
(b) (2p) Find E(GPA|SAT = 2200).
(c) (2p) If the average SAT in the university is 2000, what is the average GPA?

7. (12p) Suppose that X is randomly drawn from a uniform distribution on the interval [0, 3]. Also, suppose that after the value X = x has been observed (0 < x < 3), Y is randomly drawn from a uniform distribution on the interval [x, 3].

(a) (3p) For any given value of x (0 < x < 3), obtain E[Y|X = x].
(b) (3p) In view of part (a), obtain E[Y|X].
(c) (3p) What is the difference between E[Y|X = x] and E[Y|X]?
(d) (3p) Obtain E[Y].

8. (18p) Adult males are taller, on average, than adult females. Visiting two recent American Youth Soccer Organization (AYSO) under-12-years-old (U12) soccer matches on a Saturday, you do not observe an obvious difference in the height of boys and girls of that age. You suggest to your little sister that she collect data on height and gender of children in 4th to 6th grades as part of her science project. The accompanying table shows her findings.

Height of Young Boys and Girls, Grades 4-6, in inches

Boys                                               Girls
$\bar{Y}_{Boys}$   $s_{Boys}$   $n_{Boys}$         $\bar{Y}_{Girls}$   $s_{Girls}$   $n_{Girls}$
57.8               3.9          55                 58.4                4.2           57

where $\bar{Y}_{Boys}$ is the sample average height for boys, $n_{Boys}$ is the number of boys in the sample, and $s^2_{Boys}$ is the sample variance of height of boys.

(a) (3p) Let your null hypothesis be that there is no difference in the height of females and males at this age level. Specify the alternative hypothesis.
(b) (3p) What is the unbiased estimate of the difference in height between boys and girls? Provide a formula and check the unbiasedness. Calculate the value of this estimate for the given sample.
(c) (3p) Derive the formula for the variance of the estimate from (b). Calculate the estimate of the variance for the given sample.
(d) (3p) Create a statistic for testing the hypothesis in (a) using the Central Limit Theorem and the Law of Large Numbers.
(e) (3p) Calculate the t-statistic for comparing the two means. Is the difference statistically significant at the 1% level? Which critical value did you use? Why would this number be smaller if you had assumed a one-sided alternative hypothesis? What is the intuition behind this?
(f) (3p) Generate a 95% confidence interval for the difference in height.
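
The arithmetic behind question 8 can be sketched as a small difference-in-means helper. Two assumptions are made here and should be checked against the original table: the 3.9 and 4.2 entries are treated as sample standard deviations (if they are variances, the squaring below should be dropped), and the large-sample normal critical value 1.96 is used for the 95% interval. The function name `difference_in_means` is hypothetical.

```python
import math


def difference_in_means(mean_a, s_a, n_a, mean_b, s_b, n_b, critical=1.96):
    """Large-sample comparison of two independent sample means.

    s_a and s_b are ASSUMED to be sample standard deviations.
    Returns the estimated difference, its standard error, the t-statistic
    for H0: no difference, and a confidence interval based on `critical`.
    """
    diff = mean_a - mean_b
    # Variance of a difference of independent sample means.
    variance = s_a ** 2 / n_a + s_b ** 2 / n_b
    se = math.sqrt(variance)
    t_stat = diff / se
    interval = (diff - critical * se, diff + critical * se)
    return diff, se, t_stat, interval


# Boys minus girls, using the numbers in the table above.
diff, se, t_stat, ci95 = difference_in_means(57.8, 3.9, 55, 58.4, 4.2, 57)
print("difference:", diff)
print("standard error:", se)
print("t-statistic:", t_stat)
print("95% confidence interval:", ci95)
```

The same helper applies to any two-sample comparison of means in the practice questions, again under the assumption that the reported spread is a standard deviation.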
9. (10p) Use the following data to show the Law of Iterated Expectations (i.e., show that $E(M) = E[E(M|A)]$).

The following questions will not be graded; they are for you to practice and will be discussed at the recitation.

10. [Practice question, not graded] SW 2.3

                        Rain (X=0)    No Rain (X=1)    Total
Long Commute (Y=0)      0.15          0.07             0.22
Short Commute (Y=1)     0.15          0.63             0.78
Total                   0.30          0.70             1.00

Using the random variables X and Y from Table 2.2 (given above), consider two new random variables W = 3 + 6X and V = 20 − 7Y. Compute:

(a) E(W) and E(V).
(b) $\sigma^2_W$ and $\sigma^2_V$.
(c) $\sigma_{WV}$ and Corr(W, V).
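
Moments of functions of two discrete random variables, as in practice question 10, can be computed directly from the joint probability table by weighting each outcome by its probability. The sketch below does this for the rain/commute table above. It assumes V = 20 − 7Y (the minus sign is read from context), and the helper name `expectation` is hypothetical.

```python
import math

# Joint distribution from the table above, keyed by (x, y).
joint = {
    (0, 0): 0.15,  # rain, long commute
    (1, 0): 0.07,  # no rain, long commute
    (0, 1): 0.15,  # rain, short commute
    (1, 1): 0.63,  # no rain, short commute
}


def expectation(g):
    """E[g(X, Y)]: probability-weighted sum over all (x, y) outcomes."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def w(x, y):
    return 3 + 6 * x


def v(x, y):
    return 20 - 7 * y   # ASSUMES the sign between 20 and 7Y is a minus


mean_w = expectation(w)
mean_v = expectation(v)
var_w = expectation(lambda x, y: (w(x, y) - mean_w) ** 2)
var_v = expectation(lambda x, y: (v(x, y) - mean_v) ** 2)
cov_wv = expectation(lambda x, y: (w(x, y) - mean_w) * (v(x, y) - mean_v))
corr_wv = cov_wv / math.sqrt(var_w * var_v)

print("E(W), E(V):", mean_w, mean_v)
print("var(W), var(V):", var_w, var_v)
print("cov(W, V), corr(W, V):", cov_wv, corr_wv)
```

The results can be cross-checked against the shortcut rules for linear transformations, for example var(a + bX) = b² var(X) and cov(a + bX, c + dY) = bd cov(X, Y).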
11. [Practice question, not graded] SW 2.6
The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working age US population, based on the 1990 US Census.

                            Unemployed (Y=0)    Employed (Y=1)    Total
Non-college grads (X=0)     0.045               0.709             0.754
College grads (X=1)         0.005               0.241             0.246
Total                       0.050               0.950             1.000

(a) Compute E(Y).
(b) The unemployment rate is the fraction of the labor force that is unemployed. Show that the unemployment rate is given by 1 − E(Y).
(c) Calculate E(Y|X=1) and E(Y|X=0).
(d) Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.
(e) A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?
(f) Are educational achievement and employment status independent? Explain.

12. [Practice question, not graded] SW 2.14 [Hint: Use SW Appendix Table 1.]
In a population E[Y] = 100 and Var(Y) = 43. Use the central limit theorem to answer the following questions:

(a) In a random sample of size n = 100, find $\Pr(\bar{Y} \le 101)$.
(b) In a random sample of size n = 165, find $\Pr(\bar{Y} > 98)$.
(c) In a random sample of size n = 64, find $\Pr(101 \le \bar{Y} \le 103)$.

13. [Practice question, not graded] SW 3.12
To investigate possible gender discrimination in a firm, a sample of 100 men and 64 women with similar job descriptions are selected at random. A summary of the resulting monthly salaries is:

          Avg. Salary ($\bar{Y}$)    Stand Dev (of Y)    n
Men       $3100                      $200                100
Women     $2900                      $320                64

(a) What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that wages of men and women are different? (To answer this question, first state the null and alternative hypothesis; second, compute the relevant t-statistic; and finally, use the p-value to answer the question.)
(b) Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.

14. [Practice question, not graded] SW 2.10 [Hint: Use SW Appendix Table 1.]
Compute the following probabilities:

(a) If Y is distributed N(1, 4), find Pr(Y ≤ 3).
(b) If Y is distributed N(3, 9), find Pr(Y > 0).
(c) If Y is distributed N(50, 25), find Pr(40 ≤ Y ≤ 52).
(d) If Y is distributed N(5, 2), find Pr(6 ≤ Y ≤ 8).

15. [Practice question, not graded] SW 3.3
In a survey of 400 likely voters, 215 responded that they would vote for the incumbent and 185 responded that they would vote for the challenger. Let p denote the fraction of all likely voters that preferred the incumbent at the time of the survey, and let $\hat{p}$ be the fraction of survey respondents that preferred the incumbent.

(a) Use the survey results to estimate p.
(b) Use the estimator of the variance of $\hat{p}$, $\hat{p}(1 - \hat{p})/n$, to calculate the standard error of your estimator.
(c) What is the p-value for the test $H_0$: p = 0.5 vs. $H_1$: p ≠ 0.5?
(d) What is the p-value for the test $H_0$: p = 0.5 vs. $H_1$: p > 0.5?
(e) Why do the results from (c) and (d) differ?
(f) Did the survey contain statistically significant evidence that the incumbent was ahead of the challenger at the time of the survey? Explain.
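
The calculation pattern in practice question 15 (estimate, standard error, t-statistic, then one- and two-sided p-values from the standard normal distribution) can be sketched as follows. The standard normal CDF is built from `math.erf` rather than read from a table, so small differences from table-based answers are expected. The names `standard_normal_cdf` and `proportion_test` are hypothetical.

```python
import math


def standard_normal_cdf(z):
    """Phi(z), computed from the error function in the standard library."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_test(successes, n, p_null=0.5):
    """Large-sample test of H0: p = p_null for a sample proportion.

    The standard error uses the variance estimator p_hat * (1 - p_hat) / n
    given in part (b) of the question.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    t_stat = (p_hat - p_null) / se
    p_two_sided = 2.0 * (1.0 - standard_normal_cdf(abs(t_stat)))
    p_one_sided_greater = 1.0 - standard_normal_cdf(t_stat)
    return p_hat, se, t_stat, p_two_sided, p_one_sided_greater


p_hat, se, t_stat, p_two, p_one = proportion_test(215, 400)
print("p_hat:", p_hat)
print("standard error:", se)
print("t-statistic:", t_stat)
print("two-sided p-value:", p_two)
print("one-sided p-value (H1: p > 0.5):", p_one)
```

The two-sided p-value is twice the one-sided one here because the t-statistic is positive, which is the comparison part (e) asks about.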
16. [Practice question, not graded] Consider two events A and B with Pr(A) = 0.5 and Pr(B) = 0.9. Determine the maximum and minimum values of Pr(A B).

17. [Practice question, not graded] Assume that events A and $B^c$ are independent. That is, $\Pr(A \cap B^c) = \Pr(A)\Pr(B^c)$. Are events A and B also independent?

18. [Practice question, not graded] Let X and Y denote two random variables.
(i) Show that if at least one of X or Y has expectation equal to zero, then cov(X, Y) = E[XY].

19. [Practice question, not graded] The following admission data are for the graduate program in the six largest majors at the University of California at Berkeley for the fall 1973 quarter.

(a) What is the overall probability of being admitted for males? For females? What is the standard deviation for males and for females?
(b) How would you write down the null and alternative hypotheses in order to test that the overall probability of admission is higher for men than for women?
(c) Conduct a t-test of the hypothesis from part (b) and report the p-value.
(d) Is the result significant at the 5% level? Does it provide evidence of discrimination?
(e) Committee chairpersons claim they are more likely to admit women than men. Is this claim true? Compute acceptance rates for men and women by graduate program.
(f) Do these data suggest that the university is guilty of gender discrimination in its admission policy? Explain briefly.