HW#3

School: Texas A&M University
Course: 652
Subject: Statistics
Date: Apr 3, 2024
Farzaneh Hashemabadi HW#3 STAT 652-Spring 2024 UIN: 233005386 Email: Farzaneh.hashemabadi@tamu.edu

Problem 1.

(1) Are there any collinearity problems based on the above data? Please look at the scatter plot matrix to answer this question.

- AGE and COLLEGE: There appears to be a relationship, but it is not linear.
- AGE and INCOME: No strong linear relationship is visible.
- AGE and GENDER: GENDER is a categorical variable, and no linear relationship is evident.
- COLLEGE and INCOME: There is a positive trend (income tends to increase with college level), but the linear relationship is not significant.
- COLLEGE and GENDER: As with AGE and GENDER, there is no linear relationship because GENDER is categorical.
- INCOME and GENDER: No clear linear relationship is visible.
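Whether a pairwise correlation is statistically significant can also be checked numerically. The following is a minimal sketch, assuming the usual t-test for a Pearson correlation; the function names and the data values are hypothetical and are not taken from the homework data set.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0; it has n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical values for illustration only (not the homework data).
age = [1, 2, 3, 4, 5]
college = [2, 4, 5, 4, 5]

r = pearson_r(age, college)
t = correlation_t_stat(r, len(age))
print(round(r, 4), round(t, 4))  # prints 0.7746 2.1213
```

The correlation is judged significant at the 95% level when |t| exceeds the two-sided critical value of the t distribution with n - 2 degrees of freedom (3.182 for the 3 degrees of freedom in this toy example, so this made-up pair would not be significant).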
Which pairs have statistically significant correlations? Be 95% confident.

• none of the pairs
• college and income
• all of the pairs
• college and age (correct)
• income and gender
• age and gender
• college and gender
• age and income

(2) Use the output from the individual regressions (fit the explanatory variables one at a time with the response) to determine which explanatory variables should be included in the model.
A p-value below 0.05 indicates that a predictor is significant at the 95% confidence level. From the effect summary tables, the p-value for Income is 0.0000 < 0.05 and the p-value for Gender is 0.00248 < 0.05, so these two variables should be included in the model:

• gender
• income

(3) Use the output from a stepwise regression to determine which explanatory variables should be included in the model. Compare the results of your conclusions from the stepwise to your results from the complete regression. Are they different?
Based on the stepwise regression output, a good model should include age, income, and gender. This differs slightly from the individual regressions, where age was not a significant predictor.

(4) Use the Cp value to determine which explanatory variables should be included in the model. Does it agree with the previous models? What is the corresponding Cp value in decision making?
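For reference, Mallows' Cp for a candidate model with p parameters (including the intercept) is commonly defined as Cp = SSE_p / MSE_full - (n - 2p). A minimal sketch of that formula follows; the function name and all numbers are hypothetical and do not come from the homework output.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' Cp for a candidate model.

    sse_p    -- error sum of squares of the candidate model
    mse_full -- mean squared error of the full model (all predictors)
    n        -- number of observations
    p        -- number of parameters in the candidate model, intercept included
    """
    return sse_p / mse_full - (n - 2 * p)

# Hypothetical numbers for illustration only.
n = 50
mse_full = 2.0
candidate_cp = mallows_cp(sse_p=100.0, mse_full=mse_full, n=n, p=4)
print(candidate_cp)  # prints 8.0
```

A candidate whose Cp is close to (or below) its own p is usually considered adequate. By construction, the full model always has Cp equal to its number of parameters.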
A model is a good candidate when its Cp is close to or below p, the number of parameters including the intercept. At the step where Gender enters, Cp = 3.02 < p = 4, so the best model has p - 1 = 3 explanatory variables: income, age, and gender. This agrees with the stepwise result; the individual regressions differed only in leaving out age.

Problem 2.

(1) Use scatter plots and VIF to determine if there is evidence of collinearity in the explanatory variables.

(a) Is there evidence of collinearity according to the scatter plots?

Based on the scatter plots, correlation coefficients, and p-values, Y is correlated with X1, but there is no evidence of collinearity among the independent variables.

Correlation coefficient, Y & X1 = 0.764
P-value < 0.0001

(b) Is there evidence of collinearity according to the VIF?

According to the VIF, there is no evidence of collinearity among the explanatory variables, because all of the VIF values are close to 1.

(2) Use a variable selection procedure with maximum R square or any other criterion to formulate a model. Which variables are in the model?

- Model for Y and X1:
- Model for Y and X2:
- Model for Y and X3:
- Model for Y and X4:

Comparing Adj RSquare across the individual regression models, the maximum belongs to the model for Y and X1, so it is the best of the single-predictor models. Running the stepwise regression gives a different model. See below:
According to the stepwise regression, the model with predictors x1, x2, and x4 fits best. Its Adj RSquare is 0.7096, which is higher than that of the model with x1 alone. In conclusion, x1, x2, and x4 are the significant predictors.

(3) Fit the model with all the given independent variables.

(a) What are the coefficients in the regression model y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4?

β0 = 80.42, β1 = 6.18, β2 = -2.90, β3 = 0.11, β4 = 0.017

(b) Is there evidence in the residual plots of a violation of the constant variance condition?
In the "Residual by Predicted Plot," the residuals show no clear pattern; they appear randomly dispersed around the horizontal line at a residual value of zero. There is no funnel shape or curve that would suggest the variance of the residuals increases or decreases as the predicted values change. The residuals are slightly concentrated near the center of the predicted value range, but their spread does not clearly grow or shrink across that range, so this concentration does not by itself indicate a violation of the constant variance condition. Based on this plot alone, there is no strong evidence of a violation of the constant variance condition (no clear signs of heteroscedasticity).

Problem 3

(1) Fit the linear model with both independent variables.

(a) What are the coefficients of the regression model y = β0 + β1 x1 + β2 x2?

y = 20.00 - 0.29 x1 + 1.40 x2

(b) Is the model significant? Be 95% confident.

H0: all of the slope coefficients are zero (β1 = β2 = 0).
Ha: at least one of the βi is not equal to zero (model is significant).
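The overall F statistic for hypotheses of this form can be written in terms of R square as F = (R²/k) / ((1 - R²)/(n - k - 1)), where k is the number of predictors. A minimal sketch follows; the function name and the inputs are hypothetical, since the sample size and R square of the aphid fit are not restated here.

```python
def overall_f(r2, n, k):
    """Overall F statistic for H0: all k slope coefficients are zero.

    r2 -- R square of the fitted model
    n  -- number of observations
    k  -- number of predictors (intercept not counted)
    The statistic has k and n - k - 1 degrees of freedom.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical inputs for illustration only.
print(overall_f(r2=0.5, n=33, k=2))  # prints 15.0
```

The p-value reported by the software is the upper-tail probability of this statistic under an F distribution with k and n - k - 1 degrees of freedom.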
Since the p-value is < 0.0001, we reject the null hypothesis that all of the model coefficients are zero (that is, that the model has no predictive capability). We conclude that the regression model is statistically significant at the 95% confidence level.

(c) Now look at the individual parameters.

i. Is x1 a significant predictor? p-value = 0.8378 > 0.05, so x1 is not a significant predictor.
ii. Is x2 a significant predictor? p-value = 0.0019 < 0.05, so x2 is a significant predictor.
iii. Is there a collinearity problem according to VIF? Since VIF < 10, there is no collinearity problem.

(2) Fit the following new model, y = β0 + β1 x1 + β2 x2 + β3 x1^2 + β4 x2^2 + β5 x1 x2 + ε, to the aphid data. Compare the linear model (reduced model) with the new model (complete model).

(a) What is the value of the test statistic and p-value?

F = 9.2219, p-value < 0.0001
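One common way to compare a reduced model with a complete model is the partial (extra sum of squares) F-test, which tests whether the added coefficients (here β3, β4, and β5) are all zero. The sketch below shows the statistic only; the function name and the sums of squares are made up for illustration and are not the aphid results.

```python
def partial_f(sse_reduced, sse_full, df_reduced, df_full):
    """Partial F statistic for nested models.

    sse_reduced, sse_full -- error sums of squares of the two models
    df_reduced, df_full   -- their error degrees of freedom (n minus parameters)
    The statistic has (df_reduced - df_full) and df_full degrees of freedom.
    """
    extra_ss_per_term = (sse_reduced - sse_full) / (df_reduced - df_full)
    mse_full = sse_full / df_full
    return extra_ss_per_term / mse_full

# Hypothetical values: 30 observations, 3 parameters in the reduced model
# and 6 in the complete model, so error df are 27 and 24.
f_stat = partial_f(sse_reduced=120.0, sse_full=100.0, df_reduced=27, df_full=24)
print(round(f_stat, 4))  # prints 1.6
```

A large value relative to the F distribution with (df_reduced - df_full, df_full) degrees of freedom favors the complete model.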
(b) Did you reject H0? Be 95% confident.

H0: all of the model coefficients are zero.
Ha: at least one of the βi is not equal to zero (model is significant).

p-value < 0.0001, so we reject the null hypothesis.

(c) Which model has a higher adjusted R square?

- Model with x1 and x2: adjusted R square = 0.5200
- Model with 5 predictors: adjusted R square = 0.5473, which is higher than the previous model.

(3) Calculate a new response variable ty = log(y), the natural logarithm of the aphid count. Fit the new model, ty = β0 + β1 x1 + β2 x2 + ε, to the aphid data.
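Adjusted R square, used to compare the models in this problem, penalizes R square for the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). A minimal sketch follows, with a hypothetical function name and made-up inputs rather than the aphid output.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R square for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical inputs for illustration only.
print(round(adjusted_r2(r2=0.6, n=31, k=2), 4))  # prints 0.5714
print(round(adjusted_r2(r2=0.6, n=31, k=5), 4))  # prints 0.52
```

With the same R square, the model with more predictors gets the lower adjusted value, which is why a larger model must improve the fit enough to offset the penalty.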
(a) Which has the highest Adjusted R square?

• y = β0 + β1 x1 + β2 x2 + ε
• y = β0 + β1 x1 + β2 x2 + β3 x1^2 + β4 x2^2 + β5 x1 x2 + ε
• ty = β0 + β1 x1 + β2 x2 + ε

(b) Which has the lowest RMSE?

• y = β0 + β1 x1 + β2 x2 + ε : RMSE = 23.11
• y = β0 + β1 x1 + β2 x2 + β3 x1^2 + β4 x2^2 + β5 x1 x2 + ε : RMSE = 22.44
• ty = β0 + β1 x1 + β2 x2 + ε : RMSE = 0.52

(c) Which has the lowest PRESS? (You get PRESS by pressing the red triangle button at the top, selecting Row Diagnostics and clicking on PRESS)

• y = β0 + β1 x1 + β2 x2 + ε
• y = β0 + β1 x1 + β2 x2 + β3 x1^2 + β4 x2^2 + β5 x1 x2 + ε
• ty = β0 + β1 x1 + β2 x2 + ε
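PRESS (the prediction sum of squares) is the sum of squared leave-one-out prediction errors: each observation is held out, the model is refit on the rest, and the held-out response is predicted. The sketch below illustrates the idea for a single predictor to keep it short; the models in this problem have more terms, and the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a one-predictor model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def press(x, y):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict it.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total

# Hypothetical data for illustration only (lists, so slices can be joined).
print(press([0, 1, 2], [0, 1, 0]))  # prints 9.0
```

Like RMSE, PRESS is in squared units of the response, so a value computed for ty = log(y) is on a different scale from one computed for y.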