project#2sp23-1

docx

School

Indiana University, Bloomington

Course

MISC

Subject

Economics

Date

Feb 20, 2024

Type

docx

Pages

6

Uploaded by EarlNightingale4120

G350: Business Econometrics
Mini-Project #2 (50 points)

QUESTION #1

You have been hired as a consultant by the American Automobile Association (AAA) to conduct an empirical study of the relationship between the price of a used car and the age and miles on the car. Specifically, you want to address the following:

1) Is there a relationship between used car price and age of car?
2) Is there a relationship between used car price and number of miles on car?

To conduct this study, you will use the dataset USEDAUTOS to estimate two linear regression models.

DATA: USEDAUTOS

The data file USEDAUTOS consists of cross-sectional data on price, age, and miles for a random sample of 182,020 observations. The variables are defined as: PRICE = measured in dollars, AGE = measured in years, and MILES = measured in miles.

Descriptive Statistics

1. Report the sample mean, standard deviation, CV, minimum, and maximum for the variables PRICE, AGE, and MILES.
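The five statistics requested in question 1 can be computed with a short helper such as the one below. This is only a sketch: the function name `describe` and the toy values are hypothetical and are not taken from USEDAUTOS, whose file format the assignment does not state.

```python
import statistics


def describe(values):
    """Return the mean, sample standard deviation, CV, minimum, and maximum."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation: unit-free, so comparable across variables
        "min": min(values),
        "max": max(values),
    }


# Hypothetical toy values for illustration only, NOT from USEDAUTOS.
toy_age = [2, 4, 6, 8, 10]
summary = describe(toy_age)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because the CV divides the standard deviation by the mean, it allows dispersion to be compared across variables measured in different units, such as years and miles.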
a. Evaluate the sampling distribution for each of the three variables. Which variable appears to have the least dispersion of data (smallest distribution)? Most dispersion of data (largest distribution)? Explain.

In absolute terms, the variable with the least dispersion is AGE, with a standard deviation of 4.24, and the variable with the most dispersion is MILES, with a standard deviation of 25,377.71. This ordering partly reflects units, since miles take far larger values than years. Relative to their means, the comparison changes: the CV is about 0.43 for AGE (4.24 / 9.86) and about 0.33 for MILES (25,377.71 / 76,123.88).

b. Is there any reason to believe that the data exhibits extreme outliers in any of the three variables: price, age, and miles? Explain.

There is reason to believe that PRICE contains extreme outliers. Its minimum is $500 and its maximum is $1,250,000, and the maximum is far above the sample mean of $8,466.94.

c. Use the sample means to describe a “typical car” in the sample.

A typical car in the sample is 9.86 years old, costs $8,466.94, and has 76,123.88 miles.

Statistical Model

2. Specify the two population regression models that describe the assumed relationship between the dependent and the independent variables in the population: 1) between price and age and 2) between price and miles.

Both models take the form y = B0 + B1x + u.

In the population regression model for price and age: y = price; x = age; B1 captures the effect of one more year of age on price, holding all other factors affecting price constant; u captures effects NOT in the model, such as color, size, and geographic location.

In the population regression model for price and miles: y = price; x = miles; B1 captures the effect of one more mile on price, holding all other factors affecting price constant; u captures effects NOT in the model, such as color, size, and geographic location.

a. State and explain the deterministic component for each model.
The deterministic component of the price and age model is B0 + B1(AGE), the average price for a given value of age. The deterministic component of the price and miles model is B0 + B1(MILES), the average price for a given value of miles.

b. State and explain the random component for each model.

The random component of the price and age model is u, which captures unmeasured or unobservable effects and the purely random behavior of people buying cars. The random component of the price and miles model is also u, with the same interpretation.

Estimation, Interpretation, and Prediction

3. Use the sample data and OLS to estimate the regression coefficients for the two equations you stated in (#2). Report the coefficients, the standard errors of these estimates, and the t-values of the estimates.

a. Interpret the point estimates for the AGE and MILES coefficients.

AGE: for each additional year of age, the predicted price falls by about $1,170.12, on average.
MILES: for each additional mile driven, the predicted price falls by about $0.1748, on average.

b. What does the AGE estimate suggest about the relationship between the price of a used car and age of the car? What does the MILES estimate suggest about the relationship between the price of a used car and the number of miles on the car?

The AGE estimate suggests a negative relationship between price and age: older cars sell for less. The MILES estimate likewise suggests a negative relationship between a used car's price and its mileage.

c. On average, if the number of miles on a car increases by 1000 miles, what is the change in the car’s price?

With reference to part (a): 0.1748 × 1,000 = 174.80. If a car is driven an additional 1,000 miles, its predicted price falls by about $175.

d. The estimates of the regression coefficients you obtained in (#3) are point estimates. Calculate a 95% interval estimate for the “AGE” regression coefficient. Show your work.

At a 95% confidence level, alpha is 0.05, so the critical t-value is 1.96. The regression of price on age gives a coefficient of -1170.118 and a standard error of 6.394057.

Upper bound: -1170.118 + (1.96 × 6.394057) = -1157.585648
Lower bound: -1170.118 - (1.96 × 6.394057) = -1182.650352

e. Interpret the 95% interval estimate for the AGE coefficient. Explain.

We are 95% confident that the true AGE coefficient lies in the interval [-1182.65, -1157.59]. In repeated samples, 95% of the intervals constructed this way would contain the true value.

f. On average, if the age of a car increases by 5 years, what is the change in the car’s price?

With reference to part (a): 1,170.12 × 5 = 5,850.60. If a car ages by 5 years, its predicted price falls by about $5,850.60.

g. What is the predicted price for a used car that is 8 years old?

Price = 19,998.35 - 1170.118(8) = $10,637.41

h. What is the predicted price for a used car that has 125,000 miles?

Price = 21,773.55 - 0.1748021(125,000) ≈ -$76.71. A negative predicted price is not economically meaningful; it indicates that the linear model predicts poorly at high mileage.

i. Interpret the standard error and the t-value for the AGE coefficient. What do they tell you about the reliability/precision of the estimate?
Does the estimate seem to be relatively precise or imprecise? Justify your answer.
Standard error: 6.394057. The standard error measures the sampling variability of the AGE coefficient: across repeated samples, the estimate would typically differ from the true coefficient by about $6.39. This is small relative to the coefficient of -1170.118, so the sampling distribution is narrow and the estimate is relatively precise.

T-value: -183.00 (that is, -1170.118 / 6.394057). The t-value is unitless and measures how many standard errors the estimate lies from the hypothesized value, here zero. Since |-183.00| is far larger than the critical value of 1.96, the null hypothesis that the coefficient is zero can be rejected. Statistical significance by itself does not establish causality.

Goodness-of-Fit

4. Report the coefficient of determination and the standard error for both regressions.

Coefficient of determination for AGE: 0.1554
Standard error for AGE: 6.394057
Coefficient of determination for MILES: 0.1242
Standard error for MILES: 0.00109

a. Do you believe that your two models fit the sample data well? What measure did you use to gauge the goodness-of-fit? Explain.

I believe that the two models (price on age and price on miles) do not fit the sample data well. The measure used is the R-squared (coefficient of determination). The values of 0.1554 and 0.1242 are close to 0: age explains about 15.5% of the variation in price, and miles explains about 12.4%. The standard errors reported above are in different units (dollars per year and dollars per mile), so they describe the precision of each slope rather than the fit. Overall, neither model is a great fit for the sample data.

b. Do you believe that there is any reason to assume that your model violates any of the five Gauss-Markov Assumptions? If so, which ones and why?

The models may violate assumption 1, “Conditional mean of Y is linear in parameters”. I believe this assumption may be violated because the coefficients of determination are low, which suggests that a straight line captures little of the relationship between price and each variable.

c. Do you believe that the estimate for AGE is an “unbiased” estimate of the true unknown population relationship between age and price? Explain.
I believe there is a slight bias due to the possible violation of the linearity assumption. The coefficient of determination is low, which suggests that the relationship is not well described by a straight line. However, there do not seem to be other violations, so the bias is probably not large.

e. Do you believe that the estimate for MILES is an “unbiased” estimate of the true unknown population relationship between miles and price? Explain.

I believe there is a slight bias here as well, for the same reason: the low coefficient of determination suggests that the relationship between price and miles is not well described by a straight line. However, there do not seem to be other violations, so the bias is probably not large.
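The interval, predictions, and fit statistics reported in questions 3 and 4 can be cross-checked from the reported coefficients alone. The sketch below uses only figures stated above; the variable names are illustrative. It also relies on a standard identity for a regression with one regressor, t² = (n - 2) R² / (1 - R²), to check that the reported R-squared values agree with the reported t-value.

```python
import math

# Figures as reported in questions 3 and 4 of this document.
n = 182_020                               # sample size
b0_age, b1_age = 19_998.35, -1170.118     # intercept and slope, price on age
b0_miles, b1_miles = 21_773.55, -0.1748021  # intercept and slope, price on miles
se_age = 6.394057                         # standard error of the AGE slope
r2_age, r2_miles = 0.1554, 0.1242         # coefficients of determination
crit = 1.96                               # 95% critical value (large sample)

# 3(d): 95% interval estimate for the AGE coefficient.
ci_age = (b1_age - crit * se_age, b1_age + crit * se_age)

# 3(f), 3(g), 3(h): change over 5 years and the two predictions.
change_5yr = 5 * b1_age
price_8yr = b0_age + b1_age * 8
price_125k = b0_miles + b1_miles * 125_000

# 3(i): t-value for the AGE coefficient under H0: slope = 0.
t_age = b1_age / se_age


def abs_t_from_r2(r2, n_obs):
    """|t| of the slope implied by R-squared in a one-regressor model."""
    return math.sqrt((n_obs - 2) * r2 / (1 - r2))


print(f"95% CI for AGE: [{ci_age[0]:.2f}, {ci_age[1]:.2f}]")
print(f"5-year change: {change_5yr:.2f}")
print(f"Predicted price at 8 years: {price_8yr:.2f}")
print(f"Predicted price at 125,000 miles: {price_125k:.2f}")
print(f"t (AGE): {t_age:.2f}; |t| implied by R^2: {abs_t_from_r2(r2_age, n):.2f}")
print(f"|t| (MILES) implied by R^2: {abs_t_from_r2(r2_miles, n):.2f}")
```

The |t| implied by the AGE R-squared is close to 183, which matches the t-value computed from the coefficient and its standard error of 6.394057.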
Hypothesis Testing

Administrators at the AAA proposed the following two questions/statements. Please answer each question below – be sure to provide statistical evidence by using the appropriate statistical test. You must show your work to receive full credit.

5. Report the t-statistics for the AGE and MILES coefficients.

a. One staff member argues that your estimate of the effect of age on car price may be due to random error (chance), and therefore car age has no real effect on car price. Conduct an appropriate statistical test and respond to this argument.

To respond to this argument, we use the p-value approach.

H0: The coefficient for age is equal to zero, so age has no effect on car price.
H1: The coefficient for age is not equal to zero, so age has an effect on car price.

With a t-value of -183.00 for the AGE coefficient, the p-value is less than our significance level of 0.05. We therefore reject the null hypothesis and conclude that age has a statistically significant effect on car price; the estimate is very unlikely to be due to chance.

b. A second staff member suggests that an increase of 5 years will decrease car price by $10,000. Use the appropriate statistical test and respond to this suggestion. HINT: First use the 5 years and $10,000 to find the beta for the population.
For this suggestion we use a one-sample t-test. Following the hint, a $10,000 decrease over 5 years implies a population slope of -10,000 / 5 = -2,000 per year.

H0: The effect of a 5-year increase in age is equal to -10,000 (B1 = -2,000).
H1: The effect of a 5-year increase in age is not equal to -10,000 (B1 ≠ -2,000).

The estimated effect of a 5-year increase is -1170.12 × 5 = -5,850.60.

c. The same staff member suggests that an increase of 10,000 miles will decrease car price by $2,000. Use the appropriate statistical test and respond to this statement. HINT: First use the 10,000 miles and $2,000 to find the beta for the population.

Conclusions

6. Please summarize the results of your study. Be sure to discuss which variables, if any, have a statistically significant effect on car price, the direction of the effect, and the size of the effect.

a. Summarize whether AGE and MILES have an independent causal effect on price. If so, what is the direction and size of the effect?

Both AGE and MILES have a statistically significant negative effect on price: each additional year of age lowers the predicted price by about $1,170.12, and each additional mile lowers it by about $0.17. This direction makes sense, since cars with fewer miles and fewer years are considered newer and therefore cost more. Because each model contains only one of the two variables, and age and miles are likely related, these estimates should be read with caution as independent causal effects.

b. Are the effects of AGE and MILES on price economically significant? Explain.

The R-squared values are low, so AGE and MILES each explain only a small share of the variation in price and neither has a strong linear relationship with PRICE. The estimated effects are nevertheless sizable in dollar terms: about $1,170 per year of age is roughly 14% of the mean price of $8,466.94, and about $175 per 1,000 miles accumulates quickly over a car's life. In that sense the effects are economically significant, even though the models leave most of the variation in price unexplained.

c. Is there any reason to believe that the results are biased and invalid, or do a poor job at estimating the two relationships? Explain.

There is not much of a reason to believe that the results are biased or invalid.
There seems to be only one possible violation of a Gauss-Markov assumption, which is the linearity assumption: the linear trend in each of the two relationships is weak. Even so, the results do not do a poor job of estimating the relationships. The only other potential issue is the relationship between age and miles themselves. Because the two variables are likely related and each model includes only one of them, it is harder to isolate their separate effects on price. Overall, the sample data seems to have been collected randomly, and the data does not seem invalid.
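The tests asked for in questions 5(b) and 5(c) can be sketched as follows. This is an illustration under stated assumptions, not a result reported in the answers above: it uses the coefficients and standard errors given in questions 3 and 4, a two-sided test at the 5% level with a critical value of 1.96, and a hypothetical helper named `t_stat`.

```python
# Estimates and standard errors as reported in questions 3 and 4.
b1_age, se_age = -1170.118, 6.394057
b1_miles, se_miles = -0.1748021, 0.00109
crit = 1.96  # two-sided 5% critical value for a large sample


def t_stat(estimate, hypothesized, std_error):
    """t statistic for H0: slope = hypothesized."""
    return (estimate - hypothesized) / std_error


# Hypothesized slopes, following the hints in 5(b) and 5(c).
beta0_age = -10_000 / 5        # -2,000 dollars per year
beta0_miles = -2_000 / 10_000  # -0.20 dollars per mile

t_age = t_stat(b1_age, beta0_age, se_age)
t_miles = t_stat(b1_miles, beta0_miles, se_miles)

for label, t in (("AGE", t_age), ("MILES", t_miles)):
    decision = "reject H0" if abs(t) > crit else "fail to reject H0"
    print(f"{label}: t = {t:.2f} -> {decision}")
```

Under these assumptions both statistics are far above 1.96 in absolute value, so both staff suggestions would be rejected: the estimated declines in price are smaller than the $2,000 per year and $0.20 per mile the suggestions imply.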