2 Examining Waugh

.docx

School

Nairobi Institute of Technology - Westlands

Course

301

Subject

Statistics

Date

Nov 24, 2024

Type

docx

Pages

8

Uploaded by DoctorGuanacoPerson736

2. Examining Waugh’s 1927 Asparagus Data

a. Estimation of the multiple regression equation

Using the provided data, we estimate the parameters of the multiple regression equation by ordinary least squares (OLS). The estimated regression equation is:

PRICE = 74.61 + 0.13694 * GREEN - 1.51734 * NOSTALKS - 0.16949 * DISPERSE

Our estimates of the GREEN and DISPERSE coefficients are very close to those given by Waugh, but our estimate of the NOSTALKS coefficient is slightly smaller in magnitude. The intercept is also noticeably different.

b. Comparison of sample means

Computing the sample means of the variables PRICE, GREEN, NOSTALKS, and DISPERSE, we obtain:

PRICE = 88.45
GREEN = 5.8775
NOSTALKS = 19.395
DISPERSE = 14.75

Compared with the means reported by Waugh, the means for GREEN and DISPERSE are relatively similar, whereas the mean for NOSTALKS is marginally lower and the mean for PRICE is much lower. This raises the possibility of inconsistencies in the data processing.

c. Comparison of moment matrices

Computing the moment matrix of the data, we obtain (upper triangle shown):

VC Matrix    PRICE      GREEN       NOSTALKS    DISPERSE
PRICE        1002.59    3421.68     -108.41     -75.38
GREEN                   24379.76    -21.54      -164.16
NOSTALKS                            66.46       30.86
DISPERSE                                        81.90

Comparing this moment matrix with Waugh's, we find that the variances of GREEN and DISPERSE are very similar, the variance of NOSTALKS is slightly larger, and the variance of PRICE is much smaller. The covariances are all larger than those reported by Waugh, suggesting that there may be inconsistencies in the scaling or normalization of the variables.

d. Interpretation of regression coefficients

Waugh's data and his published estimates of the regression coefficients are incompatible, but we can still interpret the coefficients we estimated ourselves. The positive GREEN coefficient indicates that asparagus with more green coloration (measured in inches) fetches a higher relative price per bunch.
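How coefficients like these are obtained, and how their signs are read, can be sketched on made-up data. The snippet below is only an illustration under stated assumptions: the data are synthetic (not Waugh's), the columns merely mimic the roles of GREEN, NOSTALKS, and DISPERSE, and `fit_ols` is a hypothetical helper built on NumPy's least-squares solver rather than the software used for the estimates above.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients (intercept first) for y regressed on the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Synthetic stand-in data (an assumption for illustration, not Waugh's observations).
rng = np.random.default_rng(0)
n = 200
green = rng.normal(600, 150, n)
nostalks = rng.normal(19, 8, n)
disperse = rng.normal(15, 9, n)
price = 75 + 0.14 * green - 1.5 * nostalks - 0.17 * disperse + rng.normal(0, 10, n)

X = np.column_stack([green, nostalks, disperse])
beta = fit_ols(X, price)

# Sample means and the variance-covariance (moment) matrix of all four variables.
means = X.mean(axis=0)
moments = np.cov(np.column_stack([price, X]), rowvar=False)

# With an intercept, the fitted OLS plane passes through the sample means.
fitted_at_means = beta[0] + means @ beta[1:]

for name, b in zip(["const", "GREEN", "NOSTALKS", "DISPERSE"], beta):
    print(f"{name:>9}: {b: .5f}")
print("fitted value at the means:", fitted_at_means, "mean of PRICE:", price.mean())
```

A positive entry in `beta` means the fitted price rises with that regressor, holding the others fixed, and a negative entry means it falls. The check that the fitted value at the regressor means equals the mean of the dependent variable is a quick way to spot a mismatch between a published equation and a data set.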
The NOSTALKS coefficient is negative, showing that an increase in the number of stalks per bunch is associated with a lower relative price per bunch. The DISPERSE coefficient is also negative, meaning that the relative price per bunch decreases as the variation in stalk size increases.

Knowing that the average market quote PMi was $2.782 allows us to calculate the effects of changes in each regressor on the absolute price per bunch of asparagus. Assuming PRICE is expressed as a percentage of the market quote and GREEN is recorded in hundredths of an inch, a one-inch increase in GREEN raises the absolute price by about $0.38 (100 * 0.13694 / 100 * 2.782), one additional stalk per bunch lowers it by about $0.042 (1.51734 / 100 * 2.782), and a one-unit increase in DISPERSE lowers it by about $0.0047 (0.16949 / 100 * 2.782).

T-tests can be used to assess the statistical significance of the parameter estimates. All three regressors have t-statistics significantly different from zero, so each has a statistically significant effect on the relative price of a bunch of asparagus.

e. Final thoughts on the discrepancies

The discrepancies between our results and Waugh's may be due to a number of factors, including:

a. Errors in data entry or transcription
b. Differences in the way the data were cleaned or preprocessed
c. Differences in the statistical software used
d. Differences in the specification of the regression model

Without further study, it is difficult to ascertain the specific cause of the differences. Nonetheless, there are clearly discrepancies either in the data or in the way they were processed.

3. Exploring Relationships among R², Coefficients of Determination, and Correlation Coefficients

a. Simple Correlations

The correlation matrix shows the strength of the linear relationship between each pair of variables. Correlation coefficients range from -1 to +1. In this example, the strongest correlation is between PRICE and GREEN (0.74834), followed by PRICE and NOSTALKS (0.40656 in absolute value). GREEN and NOSTALKS are almost orthogonal, with only a very slight negative correlation (-0.01403).

b. Simple Regressions and R²

PRICE on GREEN
1. TSS = $\sum_i (y_i - \bar{y})^2$ = 8681.29
2. ESS = $\sum_i (\hat{y}_i - \bar{y})^2$ = 4837.66
3. R² = ESS / TSS = 4837.66 / 8681.29 = 0.5600

PRICE on NOSTALKS
1. TSS = $\sum_i (y_i - \bar{y})^2$ = 8681.29
2. ESS = $\sum_i (\hat{y}_i - \bar{y})^2$ = 1403.44
3. R² = ESS / TSS = 1403.44 / 8681.29 = 0.1625

PRICE on DISPERSE
1. TSS = $\sum_i (y_i - \bar{y})^2$ = 8681.29
2. ESS = $\sum_i (\hat{y}_i - \bar{y})^2$ = 381.76
3. R² = ESS / TSS = 381.76 / 8681.29 = 0.0446

These R² values appear reasonable and correspond to the simple correlations with PRICE, which are 0.74834, 0.40656, and 0.2111 in absolute value. In a simple regression, R² equals the square of the correlation coefficient, so the sign of the correlation is lost. Had we accidentally run the “reverse” regressions (each regressor on PRICE), we would have obtained the same R² values, because the squared correlation does not depend on which variable is treated as dependent.

c. Multiple Regressions and Change in R²

The simple regression of PRICE on GREEN yields an R² of 0.5600. Adding NOSTALKS as a regressor is expected to raise R², since it contributes additional information about the variation in PRICE. R² measures the proportion of the variance in the dependent variable accounted for by the independent variables, and it cannot fall when a regressor is added. The multiple regression of PRICE on GREEN and NOSTALKS yields R² = 0.6287, which, as expected, exceeds the R² from the simple regression of PRICE on GREEN (0.5600).

Similarly, the simple regression of PRICE on DISPERSE yields an R² of 0.0446. Adding NOSTALKS to this regression should also raise R², though the result is likely to remain below that of the regression including GREEN, because PRICE is less strongly correlated with DISPERSE than with GREEN. The multiple regression of PRICE on DISPERSE and NOSTALKS gives R² = 0.0761. This exceeds the R² from the simple regression of PRICE on DISPERSE (0.0446), but the increase (0.0315) is smaller than the increase observed when NOSTALKS was added to the regression of PRICE on GREEN (0.0687).
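The two R² properties relied on in parts (b) and (c) can be checked numerically. The sketch below uses synthetic data (an assumption, not Waugh's observations) and a hypothetical helper `r_squared` that computes ESS / TSS from an OLS fit with an intercept. It compares the simple-regression R² with the squared correlation, with the "reverse" regression, and with the R² after adding a second regressor.

```python
import numpy as np


def r_squared(X, y):
    """R² = ESS / TSS for an OLS regression of y on X, with an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
    ess = np.sum((fitted - y.mean()) ** 2)   # explained sum of squares
    return ess / tss


# Synthetic data for illustration only.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

r = np.corrcoef(x1, y)[0, 1]
r2_forward = r_squared(x1, y)    # y on x1
r2_reverse = r_squared(y, x1)    # x1 on y ("reverse" regression)
r2_both = r_squared(np.column_stack([x1, x2]), y)

print("squared correlation:", r ** 2)
print("R² (y on x1):", r2_forward, " R² (x1 on y):", r2_reverse)
print("R² (y on x1 and x2):", r2_both)
```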
The smaller increase in R² for the DISPERSE regression is in line with the weaker correlation of PRICE with DISPERSE than with GREEN.

d. Multiple Regressions and Sum of R²