Econ 140 Fall 2023 Midterm Review Solutions - Tagged (1)

School

University of California, Berkeley

Course

140

Subject

Economics

Date

Feb 20, 2024

Type

pdf

Pages

7

Uploaded by LieutenantReindeer2533

Econ 140 Fall 2023 Midterm Review

1. RAND HIE Experiment

a) Summarize the RAND HIE Experiment

The RAND Corporation ran its Health Insurance Experiment (HIE) between 1974 and 1982. This large experiment randomly assigned 2,000 US families across 6 cities to different health insurance plans with different rates of coinsurance. The experiment grouped the health plans into 5 categories: catastrophic, deductible-catastrophic, coinsurance-catastrophic, free-catastrophic, and any insurance-catastrophic. The control group is the people with catastrophic health insurance only. The authors of the experiment primarily looked at the deductible-catastrophic, coinsurance-catastrophic, and free-catastrophic health insurance plans.

Using publicly available data on health expenditure and health outcomes from the RAND HIE Experiment, we run an OLS regression on face-to-face visits. The authors of the RAND HIE experiment excluded data from participants who were not enrolled in the health insurance plans of interest. Below is the summary of our statistical analysis. (Don't be too concerned about classical standard error types.)

plan_deduc is the deductible-catastrophic health insurance plan; plan_coins is the coinsurance-catastrophic health insurance plan; plan_free is the free-catastrophic health insurance plan. Using the regression summary, answer the following questions.

b) Explain what our intercept represents and what our estimate of our intercept means.

Our intercept is the predicted outcome when all other variables are zero. In other words, it represents the baseline number of face-to-face visits for participants enrolled in the catastrophic health insurance plan. The estimate is 2.7841 face-to-face visits.

c) What is the estimated effect of plan_deduc on our outcome variable? How about plan_coins? How about plan_free? For each of them, explain what that means for our outcome variable.

plan_deduc = 0.1927
plan_coins = 0.4811
plan_free = 1.6637

Each coefficient is the estimated difference in face-to-face visits relative to the catastrophic plan. We did not perform a logarithmic transformation on our outcome variable, face-to-face visits, so these estimates are in units of face-to-face visits and not percentages. If we had log-transformed the outcome, we could read each coefficient as an approximate percentage change in visits for a unit change in the treatment variable.

d) Are they all / none / some statistically significant? Explain how you arrived at your answers. Feel free to use standard errors, t-tests, p-values, or confidence intervals.

plan_coins and plan_free are statistically significant while plan_deduc is not.

plan_deduc: The coefficient (0.1927) is less than 2 times its standard error (0.1031). The t-test leads to the same conclusion. The p-value of 0.1745 is greater than 0.05. The confidence interval contains 0, meaning the coefficient is not statistically different from 0.

plan_coins: The coefficient (0.4811) is more than 2 times its standard error (0.1338). The t-test leads to the same conclusion. The p-value of 0.0003228 is smaller than 0.05. The confidence interval does not contain 0.

The same reasoning applies to plan_free.

e) Based on our analysis of the coefficients so far, explain whether you can interpret the coefficients causally.

The coefficient on the deductible-catastrophic health insurance plan is not statistically significant, leaving little room for a causal claim about that plan. Since the RAND HIE experiment randomly assigned the participants into treatment and control groups, we can assume that the groups are balanced and comparable on average. This indicates that selection bias has been minimized.
This allows us to infer that the coinsurance-catastrophic and free-catastrophic health insurance plans are likely to cause an increase in face-to-face visits. However, we must be aware that the experiment was run in the 1970s, when the US still faced massive inequalities across socioeconomic and ethnic groups. These could matter if random assignment left the groups unbalanced, and they may limit how well the results generalize.
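The significance checks described in part d) can be sketched in a few lines of Python. This is only an illustration: it uses the coefficients and standard errors quoted in the answer, assumes a large-sample critical value of 1.96 for a 5% two-sided test, and leaves out plan_free because its standard error is not quoted in the text. The names `estimates` and `summarize` are hypothetical.

```python
# Coefficients and standard errors as quoted in the answer to part d).
# plan_free is omitted because its standard error is not quoted in the text.
estimates = {
    "plan_deduc": (0.1927, 0.1031),
    "plan_coins": (0.4811, 0.1338),
}

Z_95 = 1.96  # assumption: large-sample critical value for a 5% two-sided test


def summarize(coef, se, z=Z_95):
    """Return the t-ratio, the 95% confidence interval, and a significance flag."""
    t_ratio = coef / se
    ci = (coef - z * se, coef + z * se)
    significant = abs(t_ratio) > z
    return t_ratio, ci, significant


results = {name: summarize(coef, se) for name, (coef, se) in estimates.items()}
for name, (t_ratio, (lo, hi), significant) in results.items():
    print(f"{name}: t = {t_ratio:.2f}, 95% CI = [{lo:.4f}, {hi:.4f}], "
          f"significant = {significant}")
```

The t-ratio rule and the confidence-interval rule always agree here because both compare the coefficient with the same multiple of its standard error.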
2. Conceptual

a) Describe what a randomized controlled trial is and how it is used to eliminate selection bias. Could omitted variable bias still occur?

A randomized controlled trial (RCT) is a type of experiment where subjects are randomly assigned into at least two groups: a treatment group and a control group. Assuming random assignment was done well, the groups are balanced on average in everything apart from the treatment (ceteris paribus), which eliminates selection bias. OVB can still occur in an RCT if our regression model is misspecified in a way that random assignment cannot solve and we did not account for it in our study.

b) Describe what omitted variable bias is.

Omitted variable bias occurs when a relevant independent or explanatory variable has been left out of the regression model, which can cause the coefficient estimate of one or more of the explanatory variables to be biased.

c) Let's start with a simple observational model with an arbitrary outcome variable Y and some arbitrary treatment variable X. This model has been misspecified and we have omitted variable A. Write out our short regression. Write out our long regression. What is our formula for omitted variable bias using our short and long regression? Write out our auxiliary regression. Using our auxiliary regression and long regression, derive OVB in terms of $\gamma$ and $\pi_1$.

We start off with the simple observational model that relates one explanatory variable to our explained variable.

Short Regression: $Y_i = \alpha^{short} + \beta^{short} X_i + e_i^{short}$

Long Regression: $Y_i = \alpha^{long} + \beta^{long} X_i + \gamma A_i + e_i^{long}$

Let's assume we have omitted a variable $A_i$. This means that our short regression coefficient $\beta^{short}$ on $X_i$ is biased. We include the omitted variable in our long regression, which contains the true coefficient on $X_i$, $\beta^{long}$. If this is the case, then our omitted variable bias is:

$OVB = \beta^{short} - \beta^{long}$
Auxiliary Regression: $A_i = \pi_0 + \pi_1 X_i + u_i$

For the omitted variable to be relevant in our study, the following must be true:
1. Our long regression model is true, which means $\gamma \neq 0$ and thus $Y_i$ and $A_i$ are correlated.
2. Our auxiliary regression model is true and $\pi_1 \neq 0$, and thus $X_i$ and $A_i$ are correlated.

If both hold, then $\beta^{short}$ picks up part of the effect of the omitted variable. We can then substitute the auxiliary regression for the omitted variable $A_i$ in the long regression:

$Y_i = \alpha^{long} + \beta^{long} X_i + \gamma(\pi_0 + \pi_1 X_i + u_i) + e_i^{long}$

After distributing $\gamma$ over the substituted auxiliary regression:

$Y_i = \alpha^{long} + \beta^{long} X_i + \gamma\pi_0 + \gamma\pi_1 X_i + \gamma u_i + e_i^{long}$

Collect like terms so that it follows the same structure as the short regression:

$Y_i = (\alpha^{long} + \gamma\pi_0) + (\beta^{long} + \gamma\pi_1) X_i + (\gamma u_i + e_i^{long})$

We can now compare the coefficients on $X_i$ and end up with the following:

$\beta^{short} = \beta^{long} + \gamma\pi_1$

$\beta^{short} - \beta^{long} = \gamma\pi_1$

d) If our calculated omitted variable bias is positive, what does that mean for our biased estimate? Was it biased upwards? Downwards? Unbiased? How about when our calculated omitted variable bias is negative?

Having defined omitted variable bias as:

$OVB = \beta^{short} - \beta^{long}$

We have the following two scenarios:
1. $\beta^{short} > \beta^{long}$: OVB is positive and $\beta^{short}$ was too large (biased upwards).
2. $\beta^{short} < \beta^{long}$: OVB is negative and $\beta^{short}$ was too small (biased downwards).

If $\beta^{short} = \beta^{long}$, OVB is zero and $\beta^{short}$ is unbiased.
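The identity $\beta^{short} - \beta^{long} = \gamma\pi_1$ can be checked numerically. The sketch below is not part of the original solutions: it simulates a hypothetical data-generating process (all parameter values are made up), fits the short, long, and auxiliary regressions with a small hand-written OLS routine, and compares the two sides of the identity. The helper names `solve` and `ols` are assumptions introduced for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(y, *regressors):
    """OLS with an intercept via the normal equations; returns [intercept, slopes...]."""
    cols = [[1.0] * len(y)] + [list(r) for r in regressors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


# Hypothetical data-generating process; every parameter value here is made up.
random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
a = [1.0 + 0.8 * xi + random.gauss(0, 1) for xi in x]  # auxiliary: pi_1 = 0.8
y = [2.0 + 1.5 * xi - 2.0 * ai + random.gauss(0, 1)    # long: gamma = -2.0
     for xi, ai in zip(x, a)]

_, beta_short = ols(y, x)           # short regression: Y on X
_, beta_long, gamma = ols(y, x, a)  # long regression: Y on X and A
_, pi_1 = ols(a, x)                 # auxiliary regression: A on X

ovb = beta_short - beta_long
print(f"beta_short - beta_long = {ovb:.6f}")
print(f"gamma * pi_1           = {gamma * pi_1:.6f}")
```

With OLS estimates the two printed numbers agree up to floating-point error, because the identity holds exactly in sample and not only on average.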
e) Describe what a control variable is. What makes a control variable bad? What makes a control variable good?

A control variable is any variable that is held constant for all observations in an experiment. A control variable can affect the outcome variable, and if it is omitted, our results would be distorted and biased.

A control variable becomes bad when it is itself an outcome of our X variable of interest. For example, we saw occupation being a bad control when analyzing the relationship between earnings and education. This is because the type of occupation depends heavily on the education received.

A good control variable has a clear relationship with our independent variables and is not itself an outcome of them. Controlling for good controls should make our regression analysis more reliable.

f) Define p-value and critical value. How do we use them?

p-value: the probability of obtaining a test statistic at least as extreme as the one observed in a statistical hypothesis test, assuming the null hypothesis is true. We compare the p-value with the chosen significance level (for example 0.05) and reject the null hypothesis when the p-value is smaller.

critical value: the cut-off value that marks the start of the region where the test statistic is unlikely to fall if the null hypothesis is true. We compare the test statistic with the critical value to determine whether we reject the null hypothesis.

3. Omitted Variable Bias

a) Suppose you are interested in second-hand cars and you want to find out what determines the prices of used cars. To answer this, you collect data on cars, including factors you think may affect the price of the car. You end up with a sample size of 1,000. For each car in the sample, you observe the price of the car, the brand of the car (car brands were ranked on an ordinal scale with 1 being lowest and 5 being highest), the number of seats the car has, whether the car had an accident, the size of the car's engine, the number of miles it has already been driven, and the age of the car.
We specify the regression model:

$\text{price} = \alpha + \beta_1\,\text{brand} + \beta_2\,\text{seats} + \beta_3\,\text{accident} + \beta_4\,\text{engine} + \beta_5\,\text{mileage} + e$

We run an OLS regression on our model and come up with the following results:

Dependent variable: price

Variable    Coefficient    (Standard error)
Constant    -633.799       (814.939)
Brand       2,090.848      (89.591)
Seats       376.811        (124.930)
Accident    -2,588.826     (691.917)
Engine      4.865          (0.165)
Mileage     -0.025         (0.003)

Observations = 1,000
Using the regression results above, describe the effect each of our independent variables has on the dependent variable, price. Explain whether any or all of them are statistically significant.

Brand: For every one-unit change in brand ranking, there is an associated change of 2,090.848 in price.
Seats: For every one-unit change in seats, there is an associated change of 376.811 in price.
Accident: If the car has been in an accident, the price of the car is predicted to go down by 2,588.826.
Engine: For every one-unit change in engine size, there is an associated change of 4.865 in price.
Mileage: For every one-unit change in mileage, there is an associated change of -0.025 in price.

The standard error for Brand is 89.591. Our coefficient estimate of 2,090.848 is more than twice its standard error, hence it is statistically significant. Applying the same process to the rest of the variables, we determine that they are all statistically significant.

b) Comment on the magnitude of the coefficient estimates of our independent variables. Are they too large, too small, or reasonable?

Since prices of second-hand cars are in the thousands to tens of thousands of dollars, the coefficient estimates of brand, seats, and accident are reasonable since they are within the same order of magnitude. The coefficient estimates for engine and mileage, however, are very small relative to price. This may simply reflect their units, since a one-unit change such as one additional mile is a very small change.

c) You realize you have omitted the variable age of the car and, as a result, have biased estimates. You know mileage and age are positively correlated and that age has a negative impact on price. From this understanding, in what direction is $\beta_5$ biased? Explain how you arrive at that answer.

We know the age of the car is an omitted variable. Our omitted variable is negatively correlated with our dependent variable, price of the car, while it is positively correlated with our independent variable, mileage.
From our initial regression, we can see there is a negative relationship between mileage and price. This makes sense in real life. Using our short regression, long regression, and auxiliary regression, we can deduce that $\gamma$ is less than 0 (negative) and $\pi_1$ is greater than 0 (positive). We know OVB is the product $\gamma\pi_1$, so OVB is negative. Since OVB is negative, our $\beta_5$ is biased downwards: the short-regression estimate is more negative than the true coefficient.
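The sign argument above can be written as a small helper. This is a minimal sketch: the function name `ovb_direction` is hypothetical, and the magnitudes passed in are placeholders, since only the signs of $\gamma$ and $\pi_1$ matter for the direction of the bias.

```python
def ovb_direction(gamma, pi_1):
    """Classify the bias in the short-regression coefficient from gamma * pi_1."""
    ovb = gamma * pi_1
    if ovb > 0:
        return "upward"
    if ovb < 0:
        return "downward"
    return "none"


# Age lowers price (gamma < 0) and rises with mileage (pi_1 > 0).
# The magnitudes are placeholders; only the signs are taken from the text.
direction = ovb_direction(gamma=-1.0, pi_1=1.0)
print(direction)
```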
d) You decide to run an OLS regression on a more fully specified model that now includes the age of the car.

$\text{price} = \alpha + \beta_1\,\text{brand} + \beta_2\,\text{seats} + \beta_3\,\text{accident} + \beta_4\,\text{engine} + \beta_5\,\text{mileage} + \beta_6\,\text{age} + e$

You then come up with the following results:

Dependent variable: price

Variable    Coefficient    (Standard error)
Constant    -342.778       (821.856)
Brand       2,075.163      (89.610)
Seats       384.272        (124.333)
Accident    -2,628.229     (690.435)
Engine      4.880          (0.156)
Mileage     -0.014         (0.006)
Age         -160.562       (66.507)

Observations = 1,000

Using our results, compute the omitted variable bias and $\pi_1$.

Given our understanding of the relationship between mileage and age, we will use their coefficient estimates for the calculation. Our OVB formula is as follows:

$OVB = \beta^{short} - \beta^{long} = \gamma\pi_1$

Taking our biased estimate for mileage from the short regression, which is -0.025, and our true estimate for mileage from the long regression, which is -0.014, we determine the difference to be -0.025 - (-0.014) = -0.011. This is our calculated omitted variable bias.

Now to calculate $\pi_1$. We take the calculated omitted variable bias of -0.011 and divide it by $\gamma$, which is the coefficient estimate of age on price (-160.562). Our $\pi_1$ is then estimated to be 0.0000685.
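The arithmetic in this answer can be reproduced with a short script. It is only a sketch that plugs in the three coefficient estimates quoted above; the variable names are chosen for illustration.

```python
beta_short = -0.025  # mileage coefficient in the regression without age
beta_long = -0.014   # mileage coefficient in the regression with age
gamma = -160.562     # age coefficient in the long regression

ovb = beta_short - beta_long  # omitted variable bias
pi_1 = ovb / gamma            # implied slope of age on mileage

print(f"OVB  = {ovb:.3f}")
print(f"pi_1 = {pi_1:.7f}")
```

The positive value of `pi_1` matches the assumption that mileage and age are positively correlated.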