2023F_322_Midterm2_Solutions

Rutgers University
Department of Economics
Econometrics: 01:220:322:01
Midterm 2, Fall 2023
Instructor: Hector Blanco
Exam Version: A

Instructions - PLEASE READ ALL THESE INSTRUCTIONS FIRST

  • Write your name on the front page. Do it now!
  • This is a closed book exam; you may have one 3x5 notecard “cheatsheet” (both sides ok)
  • Calculators may be used (no cell phones as calculators!)
  • You have the full class period to complete the exam (approximately 1 hour and 20 minutes)
  • There are a total of 80 points
  • Use a 5% significance level for tests unless otherwise stated (critical value: 1.96)
  • Unless stated otherwise, please round all answers to 2 decimal places
  • Please fill all questions on this exam sheet. Do not unstaple it. If you need more paper there are extra sheets up front. Please clearly label all short/long answer responses with the number and/or letter of the question you are answering. You can write on the front and back of the sheet.

Name:
Section 1. Warm-up. 3 Points

Same as last time, we start with some warm-up questions, no wrong answers in this section (but do answer!)

1. [2 points] What is a TV show that I should be watching right now and why?

2. [1 point] Choose one of the options below:
   A) Taylor ham
   B) Pork roll
   C) I don’t eat meat
   D) What?
Section 2. Multiple Choice Questions. 20 points in total, 2 points each

1. Assume that Y is distributed like a standard normal, $N(0, 1)$. Then, the probability that Y is between -1.96 and 1.96 is:
   (a) 0.90
   (b) 0.925
   (c) 0.95
   (d) 0.975
   (e) None of the above

2. New Brunswick’s daily temperature has an expected value of 52F and a standard deviation of 11F. The formula to convert degrees Fahrenheit (F) to degrees Celsius (C) is:

   $C = \frac{5}{9}(F - 32)$

   What is the expected value of New Brunswick’s daily temperature in Celsius (rounded to the nearest integer)?
   (a) 9C
   (b) 10C
   (c) 11C
   (d) 12C
   (e) None of the above

3. New Brunswick’s daily temperature has an expected value of 52F and a standard deviation of 11F. The formula to convert degrees Fahrenheit (F) to degrees Celsius (C) is:

   $C = \frac{5}{9}(F - 32)$

   What is the variance of New Brunswick’s daily temperature in Celsius (rounded to the nearest integer)?
   (a) 37C
   (b) 42C
   (c) 47C
   (d) 49C
   (e) None of the above
4. What is the difference between an estimator and an estimate?
   (a) Both an estimator and an estimate are functions of a sample of data to be drawn randomly from a population.
   (b) An estimator is a function of a sample of data to be drawn randomly from a population whereas an estimate is the numerical value of the estimator when it is actually computed using data from a specific sample.
   (c) An estimate is a function of a sample of data to be drawn randomly from a population whereas an estimator is the numerical value of the estimator when it is actually computed using data from a specific sample.
   (d) Both an estimator and an estimate are numerical values computed using data from a specific sample.
   (e) None of the above

5. Consider the multivariate regression model

   $Y_i = \theta_0 + \theta_1 X_{1i} + \theta_2 X_{2i} + \dots + \theta_k X_{ki} + \varepsilon_i$

   Ordinary Least Squares (OLS) estimates the parameters $\{\theta_0, \theta_1, \dots, \theta_k\}$ by minimizing the following function:
   (a) $\sum_{i=1}^{n} (Y_i - \theta_0 - \theta_1 X_{1i} - \theta_2 X_{2i} - \dots - \theta_k X_{ki} - \varepsilon_i)^2$
   (b) $\sum_{i=1}^{n} (Y_i + \theta_0 + \theta_1 X_{1i} + \theta_2 X_{2i} + \dots + \theta_k X_{ki} + \varepsilon_i)^2$
   (c) $\sum_{i=1}^{n} (Y_i - \theta_0 - \theta_1 X_{1i} - \theta_2 X_{2i} - \dots - \theta_k X_{ki})^2$
   (d) $\sum_{i=1}^{n} (Y_i + \theta_0 + \theta_1 X_{1i} + \theta_2 X_{2i} + \dots + \theta_k X_{ki})^2$
   (e) $\sum_{i=1}^{n} (Y_i^2 - \theta_0 - \theta_1 X_{1i}^2 - \theta_2 X_{2i}^2 - \dots - \theta_k X_{ki}^2)$

6. Imagine that you were told that the t-statistic for the slope coefficient of the regression line

   $\widehat{TestScore} = 698.8 + 2.28 \times StudentTeacherRatio$

   was 4.38. What are the units of measurement for the t-statistic?
   (a) Points of the test score
   (b) Number of students per teacher
   (c) Points of the test score / Number of students per teacher
   (d) Standard deviations (estimated by the corresponding standard error)
   (e) Dollars
7. You estimate $Y_i = \alpha + \beta_1 X_i + \beta_2 X_i^2 + \epsilon_i$. How can you test for whether the relationship between $Y_i$ and $X_i$ is linear or quadratic?
   (a) Compute the t-statistic for $\beta_2$ to test the null hypothesis that $\beta_2 = 0$
   (b) Compute the t-statistic for $\beta_1$ to test the null hypothesis that $\beta_1 = 0$
   (c) Compute the F-statistic to test the null hypothesis that both $\beta_1 = 0$ and $\beta_2 = 0$
   (d) Compare the t-statistic for $\beta_1$ when including $X^2$ in the regression to the t-statistic for $\beta_1$ when not including $X^2$ in the regression
   (e) None of the above

8. Which of the following statements is true?
   (a) The $R^2$ will always be greater than the adjusted $R^2$ as long as there is at least one independent variable in the regression
   (b) The $R^2$ will always be greater than the adjusted $R^2$ even if there is no independent variable in the regression
   (c) The adjusted $R^2$ cannot be negative
   (d) The adjusted $R^2$ accounts for omitted variable bias
   (e) None of the above

9. I am interested in estimating the relationship between X and Y. I know there is a non-linear relationship between them. I am considering using a log-log specification, a log-linear specification, or a linear-log specification. Which of the following should help me pick among the three specifications?
   (a) I can compare the $R^2$ in the three specifications and pick the specification with the highest $R^2$
   (b) I can compare the adjusted $R^2$ in the three specifications and pick the specification with the highest adjusted $R^2$
   (c) I can compare the t-statistic for $\beta$ in the three specifications and pick the specification with the highest t-statistic
   (d) I can compare the standard error of $\beta$ in the three specifications and pick the specification with the lowest standard error
   (e) None of the above

10. If you had a two-regressor regression model, then omitting one of the regressors:
   (a) Will bias the coefficient of the included regressor upward
   (b) Will bias the coefficient of the included regressor downward
   (c) May not bias the coefficient of the included regressor
   (d) Will have no effect on the coefficient of the included regressor if the correlation between the excluded and the included regressor is negative
   (e) None of the above
Section 3. Short Answer Questions. 12 points in total

11. [6 points] The company Fashion Icon LLC is planning to increase its spending on the advertisement of their clothing items with the objective of increasing the visibility of the brand and, ultimately, increase their profits. Fashion Icon LLC has a bunch of data from previous years that contain information on two variables: $revenue_t$, which denotes the total revenues from their sales in a given year t (in thousand $), and $ad\_costs_t$, which denotes the total costs incurred by the company in the advertising campaign for year t (in thousand $).

   (a) [4 points] Using the two variables, write down a univariate regression equation that the company could estimate where the slope coefficient $\beta$ can be interpreted as the percentage change in total revenues that is associated with a 1 percent increase in advertising spending.

   (b) [2 points] If you estimate your regression equation in (a) using OLS, what would the distribution of $\hat{\beta}$ be if your sample is large enough and i.i.d.? Why?

12. [6 points] Answer the following questions (round to two decimal places):

   (a) [3 points] In Metrics’ High School, the probability that a student takes a Statistics course is 0.27 and the probability that a student takes a Literature course is 0.12. Given that the joint probability that a student takes Literature but does not take Statistics is 0.62, what is the probability that a student takes Literature conditional on not taking a Statistics course?

   (b) [3 points] What is the probability that, when I roll a die twice, I get a two in the first roll and a five on the second roll? What is the probability that, when I roll a die twice, I get a five in each of the two rolls? Assume that the die is not rigged. Hint: does the second roll depend on what the realization of the first roll is?
Section 4. Long Problem. 45 points in total

13. [21 points] Climate change is one of the biggest challenges of this century. Given this, suppose that we are interested in understanding to what extent economic progress has contributed to climate change. To examine this question, we collected data in 2023 for all counties in the United States. Counties are geographical areas similar to the size of an average metropolitan area. We have data for three variables:

  • AQI: Annual average of the Air Quality Index (AQI), which goes from 0 to 500 degrees (the lower, the better air quality). Air quality is affected by pollution, which is an important driver of climate change.
  • income_capita: Income per capita in the county, in thousands of dollars, which is a proxy for economic progress.
  • urban: Dummy variable that takes value 1 if the county is mostly urban and takes value 0 if the county is mostly rural.

The table below shows the results of estimating several regression equations by OLS, using AQI as the dependent variable:

Variables                  (1)        (2)        (3)
income_capita              2.281      1.537      1.343
                          (0.493)    (0.354)    (0.295)
urban                                 14.199     12.304
                                     (3.769)    (3.514)
income_capita × urban                            0.189
                                                (0.059)
constant                   39.5       28.1       26.1
                          (4.26)     (4.09)     (3.78)
Obs                        3,143      3,143      3,143

Note: Standard error in parentheses. This table is made up.

Answer the questions below (round your answers to two decimal places):

   (a) [4 points] Column (1) regresses AQI on income per capita. Interpret the slope coefficient. Interpret the intercept (does it make sense in this case? why?).

   (b) [6 points] The regression in Column (1) omits variables that may bias the estimate of the effect of income per capita on AQI. Argue why the variable urban may introduce omitted variable bias (OVB). Clearly state the two necessary conditions for OVB and how they apply to this case. Column (2) adds urban to the regression. Was the coefficient in Column (1) upward or downward biased?

   (c) [6 points] Interpret the estimated coefficient of urban in Column (2). Is this coefficient different from zero at the 5% significance level?

   (d) [2 points] Column (3) adds an interaction term between the two main regressors. Interpret the coefficient on income_capita × urban.

   (e) [3 points] Using the estimates from Column (3) compute the expected value of the AQI for an urban county with an income per capita of $25,000.
14. [24 points] Rutgers-New Brunswick is interested in studying the differences in academic achievement across its five campuses: Busch, College Ave, Cook, Douglass, and Livingston. We collected data for a random sample of 1,500 Rutgers students. The data contains information about their GPA ($gpa_i$) and the campus where they take classes. To simplify the problem, assume that each student can only take classes in one campus. For example, if student i takes classes in College Ave, student i cannot take classes in other campuses.

Consider the following regression equation (Equation (1)):

   $gpa_i = \alpha + \beta \, campus_i + u_i$   (1)

where $campus_i$ is a variable that can take 5 values depending on where the student is taking classes: 1 for Busch, 2 for College Ave, 3 for Cook, 4 for Douglass, and 5 for Livingston.

Now consider the alternative regression equation below (Equation (2)):

   $gpa_i = \alpha + \beta_1 busch_i + \beta_2 cook_i + \beta_3 douglass_i + \beta_4 livingston_i + \varepsilon_i$   (2)

where each of the independent variables is a dummy variable that takes value 1 if student i lives in that campus and takes value 0 otherwise.

The table below shows the results of estimating Equation (2) by OLS:

Variables        (1)
busch            0.221
                (0.054)
cook            -0.017
                (0.021)
douglass        -0.189
                (0.045)
livingston      -0.010
                (0.030)
constant         3.730
                (0.034)
Obs              1,500
R^2              0.34

Note: Standard error in parentheses. This table is made up.

Answer the questions below (round your answers to two decimal places):

   (a) [4 points] Can we estimate Equation (1)? If your answer is no, why? If your answer is yes, does the interpretation of $\beta$ make sense?

All the remaining questions refer to Equation (2):

   (b) [4 points] In Equation (2), what is the omitted group? Why is it omitted?

   (c) [6 points] How do we interpret $\hat{\alpha}$? How do we interpret $\hat{\beta}_4$?

   (d) [6 points] We want to test the null hypothesis that the campus where students take classes does not matter. Write down the joint null hypothesis and the alternative hypothesis. What statistic do we need to compute to test this hypothesis (no need to compute it)? Explain how you would reject or fail to reject the null hypothesis.

   (e) [4 points] Suppose that instead of $busch_i$, we included $college\_ave_i$ in the regression. In this hypothetical regression, can we know what would be the estimated $\hat{\beta}$ associated with $college\_ave_i$? If your answer is no, explain why. If your answer is yes, provide a number.
Answer Key for Exam A

Section 1. Warm-up. 3 Points

1. [2 points] Answer: Any answer is sufficient.

2. [1 point] Answer: I do not dare give my opinion on the Taylor ham/Pork roll NJ debate, so I accepted any answer.
Section 3. Short Answer Questions. 12 points in total

11. [6 points]

Answer:

   (a) $\beta$ can be interpreted as the percentage change in total revenues associated with a 1 percent increase in advertising costs in a log-log regression:

      $\ln(revenue_t) = \alpha + \beta \ln(ad\_costs_t) + \epsilon_t$

   (b) $\hat{\beta}$ will be approximately normally distributed when n is large enough because of the Central Limit Theorem. More specifically, $\hat{\beta} \sim N(\beta, Var(\hat{\beta}))$.

12. [6 points]

Answer:

   (a) Using the conditional probability formula:

      $P(Lit \mid No\ Stats) = \frac{P(Lit, No\ Stats)}{P(No\ Stats)} = \frac{P(Lit, No\ Stats)}{1 - P(Stats)} = \frac{0.62}{1 - 0.27} = 0.85$

   (b) The two rolls are independent events: the second roll does not depend on the first roll. Thus, the joint probability is equal to the product of the probabilities of the two outcomes. Let $X_1$ and $X_2$ be the first and second rolls, respectively:

      $P(X_1 = 2, X_2 = 5) = P(X_1 = 2) \times P(X_2 = 5) = \frac{1}{6} \times \frac{1}{6} = 0.0278 \approx 0.03$

   The same applies to the probability of getting two fives: $P(X_1 = 5, X_2 = 5) = \frac{1}{6} \times \frac{1}{6} \approx 0.03$.
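The answers to questions 11 and 12 can be checked numerically. The following is a minimal Python sketch and is not part of the original answer key. The first part recovers the elasticity from a log-log regression on made-up data: the values of `alpha_true`, `beta_true`, and `ad_costs` are hypothetical, and the data are noise-free so that the slope is recovered exactly. The second part reproduces the probabilities in question 12, using the numbers given in the exam.

```python
import math
from fractions import Fraction

# --- Question 11: log-log regression on hypothetical, noise-free data ---
alpha_true, beta_true = 2.0, 0.5            # assumed values, illustration only
ad_costs = [10.0, 20.0, 40.0, 80.0, 160.0]  # hypothetical, in thousand $
revenue = [math.exp(alpha_true) * a ** beta_true for a in ad_costs]

x = [math.log(a) for a in ad_costs]   # ln(ad_costs_t)
y = [math.log(r) for r in revenue]    # ln(revenue_t)
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Univariate OLS slope: sample covariance of (x, y) over sample variance of x
beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
print(f"estimated elasticity: {beta_hat:.2f}")

# --- Question 12(a): conditional probability ---
p_stats = 0.27
p_lit_and_no_stats = 0.62
p_lit_given_no_stats = p_lit_and_no_stats / (1 - p_stats)
print(f"P(Lit | No Stats) = {p_lit_given_no_stats:.2f}")

# --- Question 12(b): enumerate the 36 equally likely outcomes of two rolls ---
outcomes = [(first, second) for first in range(1, 7) for second in range(1, 7)]
p_two_then_five = Fraction(outcomes.count((2, 5)), len(outcomes))
p_five_then_five = Fraction(outcomes.count((5, 5)), len(outcomes))
print(p_two_then_five, f"{float(p_two_then_five):.2f}")
print(p_five_then_five, f"{float(p_five_then_five):.2f}")
```

Enumerating the outcomes, rather than multiplying 1/6 by 1/6 directly, makes the independence argument concrete: each ordered pair appears exactly once among the 36 outcomes.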
Section 4. Long Problem. 45 points in total

13. [21 points]
Answer:

   (a) Slope: a $1,000 increase in income per capita is associated with an increase in AQI of 2.281 degrees. That is, the greater the economic progress, the more polluted the air is. Intercept: the AQI would be 39.5 degrees if income per capita was 0. In this case, it does not make sense to interpret the coefficient because income per capita will never be zero for any county.

   (b) There are two conditions that should be met in order for urban to cause OVB: (1) urban areas have a different income per capita than rural areas ($cov(income\_capita, urban)$ is not zero), (2) urban areas should have a different impact on air quality than rural areas ($\beta_2$, the coefficient on urban, is not zero). In this exercise, we may think that urban areas have (1) higher income per capita and also have (2) higher pollution levels. Thus, the coefficient in Column (1) should be upward biased. Since the coefficient went down after we included the variable urban (from 2.281 to 1.537), we can confirm that the previous estimated coefficient of income per capita was upward biased.

   (c) The coefficient on urban can be interpreted as the difference in AQI between urban and rural counties holding income per capita constant: urban counties have an AQI that is 14.199 degrees higher on average. We can test the hypothesis using two alternative approaches (one is enough):
      (1) $2 \cdot SE(\hat{\beta}) < |\hat{\beta} - 0|$. In this case, 7.538 < 14.199.
      (2) $t = (14.199 - 0) / 3.769 = 3.77 > 1.96$.
   Using both methods, we can reject the null hypothesis that the difference in AQI between urban and rural areas is zero.

   (d) The coefficient on income_capita × urban can be interpreted as the additional impact that income per capita has on AQI in urban counties relative to the impact of income per capita on AQI in rural counties. To be very specific: it is the difference in the change in AQI that is associated with a one thousand dollar increase in income per capita in urban counties compared to the change in AQI that is associated with a one thousand dollar increase in income per capita in rural counties.
   (e) The expected value can be expressed as:

      $E[AQI \mid income\_capita = 25, urban = 1]$
      $= E[\alpha + \beta_1 income\_capita + \beta_2 urban + \beta_3 \, income\_capita \times urban \mid income\_capita = 25, urban = 1]$
      $= \hat{\alpha} + \hat{\beta}_1 \cdot 25 + \hat{\beta}_2 + \hat{\beta}_3 \cdot 25$
      $= 26.1 + 1.343 \cdot 25 + 12.304 + 0.189 \cdot 25 = 76.70$
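The arithmetic in parts (c) and (e) can be reproduced with a short script. This is a minimal Python sketch, not part of the original answer key; the function name `predict_aqi` and the variable names are hypothetical, while the coefficients and standard errors are copied from Columns (2) and (3) of the table in question 13.

```python
# Part (c): t-test on the urban coefficient from Column (2)
beta_urban, se_urban = 14.199, 3.769
critical_value = 1.96  # 5% significance level, as stated in the exam instructions

t_stat = (beta_urban - 0) / se_urban
reject_null = abs(t_stat) > critical_value
print(f"t = {t_stat:.2f}, reject H0: {reject_null}")

# Parts (d) and (e): estimates from Column (3)
alpha_hat = 26.1
b_income = 1.343        # income_capita
b_urban = 12.304        # urban
b_interaction = 0.189   # income_capita x urban


def predict_aqi(income_capita, urban):
    """Predicted AQI; income_capita is in thousands of dollars, urban is 0 or 1."""
    return (alpha_hat
            + b_income * income_capita
            + b_urban * urban
            + b_interaction * income_capita * urban)


# Slope of AQI with respect to income in each type of county
slope_rural = b_income
slope_urban = b_income + b_interaction
print(f"rural slope = {slope_rural:.3f}, urban slope = {slope_urban:.3f}")

# Urban county with income per capita of $25,000, i.e. income_capita = 25
prediction = predict_aqi(25, 1)
print(f"predicted AQI = {prediction:.2f}")
```

The interaction coefficient is the gap between the two slopes, which is the interpretation given in part (d).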
14. [24 points]
Answer:

   (a) We can estimate Equation (1). However, the interpretation of $\beta$ does not make sense. It could mean the impact on GPA of changing classes from Busch to College Ave, or from Cook to Douglass. It is not readily interpretable.

   (b) The omitted group is students taking classes in College Ave. It is omitted because adding a dummy variable for each of the five campuses would result in perfect multicollinearity. That is, we would be able to write one of the dummies as a perfect linear combination of the others because they are mutually exclusive, which violates the fourth assumption of multivariate regression.

   (c) $\hat{\alpha}$ is the average GPA of students taking classes in College Ave. $\hat{\beta}_4$ is the difference in average GPAs between students taking classes in Livingston and students taking classes in College Ave.

   (d) The joint null hypothesis and the alternative hypothesis are:

      $H_0: \beta_1 = 0$ and $\beta_2 = 0$ and $\beta_3 = 0$ and $\beta_4 = 0$
      $H_A: \beta_1 \neq 0$ and/or $\beta_2 \neq 0$ and/or $\beta_3 \neq 0$ and/or $\beta_4 \neq 0$

   You can also express $H_A$ as: at least one of the restrictions in the joint null hypothesis does not hold. To test this, we would need to compute the F-statistic with 4 restrictions (degrees of freedom). To reject/fail to reject the null hypothesis, there are two alternatives: (1) compare the F-statistic to the critical value in the F-distribution table at the end of the textbook (if the F-statistic is higher than the critical value, we can reject the null), or (2) the statistical software will spit out a p-value associated with the F-statistic: if it is lower than 0.05, we can reject the null at the 5% significance level.

   (e) Yes. Since the $\beta$ coefficients indicate the difference in the average GPA between the included groups and the omitted group, if we omit Busch and include College Ave, the coefficient for College Ave should be the negative of the coefficient for Busch, -0.221.
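The reasoning in parts (b), (c), and (e) can be illustrated on a toy dataset. This is a minimal Python sketch under stated assumptions, not the instructor's method: the GPA values in `DATA` are invented, and the helper functions `solve`, `ols`, and `fit` are hypothetical names written for this example. It fits the dummy-variable regression twice, once omitting College Ave and once omitting Busch, using the normal equations.

```python
CAMPUSES = ["busch", "college_ave", "cook", "douglass", "livingston"]

# Invented (campus, gpa) observations, two students per campus
DATA = [
    ("busch", 3.9), ("busch", 4.0),
    ("college_ave", 3.6), ("college_ave", 3.8),
    ("cook", 3.6), ("cook", 3.7),
    ("douglass", 3.4), ("douglass", 3.6),
    ("livingston", 3.7), ("livingston", 3.7),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fit(data, base):
    """Regress gpa on a constant and one dummy per campus except the omitted base."""
    included = [c for c in CAMPUSES if c != base]
    rows = [[1.0] + [1.0 if campus == c else 0.0 for c in included]
            for campus, _ in data]
    y = [gpa for _, gpa in data]
    return dict(zip(["constant"] + included, ols(rows, y)))


base_college_ave = fit(DATA, "college_ave")
base_busch = fit(DATA, "busch")
print({name: round(value, 3) for name, value in base_college_ave.items()})
print({name: round(value, 3) for name, value in base_busch.items()})
```

With College Ave omitted, the constant equals the College Ave mean GPA and each coefficient is that campus's mean minus the College Ave mean. With Busch omitted instead, the College Ave coefficient is the negative of the earlier Busch coefficient, which mirrors the -0.221 in part (e). Including all five dummies together with the constant would make X'X singular, which is the perfect multicollinearity described in part (b).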