Assignment 11

School

California State University, Chico

Course

105

Subject

Economics

Date

Jan 9, 2024

Pages

7

Uploaded by ProfessorUniverseTarsier7

Example Assignment Eleven: Using multiple regression to analyze the gender pay gap

Part I:

Use APA style and formatting for all assignments, references, and citations. Yes, have a cover page, too, as well as a running head. Try Purdue OWL for an example APA style paper: https://owl.english.purdue.edu/owl/resource/560/18/

For this final analysis you get to bring together many of the variables we have been using this term to better understand differences in income. In particular, we want to explain the gender pay gap between women and men. For Ordinary Least Squares (OLS) regression analyses, which we are using for this assignment, you want to have at least one interval/ratio independent variable and an interval/ratio dependent variable. Your dependent variable is pincp. Your independent variables will be sex, agep, and schl. But there are other variables that might explain variation in income, so for this analysis we will add race to our independent variable list. However, as your book tells you, nominal variables need a little recoding into "dummy" variables so we can use OLS regression more effectively. We will recode sex into a dummy variable called "Male," and we will recode rac1p into a dummy variable called "White."

Also, know that more tests need to be done to come to firmer conclusions from an OLS analysis. For example, two independent variables might have a strong association with each other, where one predicts the other to a large degree. Might this be the case for sex and schl? When this happens it is known as multicollinearity, or just collinearity, and it can affect OLS regression results. There are ways to test for it and correct the problem, but we are not going to do that in this course. Just know that there is more to OLS regression than what you practice here. You are practicing running and interpreting the analysis.

1. What is the measure (nominal, ordinal, or interval/ratio) of each of your independent variables and your dependent variable?
Dependent variable, pincp: I/R
Independent variable, rac1p: Nominal
Independent variable, sex: Nominal
Independent variable, agep: I/R
Independent variable, schl: I/R
I answered this one for you because I want you to treat schl as an I/R variable for years of schooling, even though its codes do not match years of schooling exactly, year for year. You can check the data dictionary for schl to see how the answers are coded. They are coded from 1 to 16, where each number means progressively more education.

2. Using your 2014-2018 ACS data file, recode your nominal independent variables as instructed in the text under 17.2 Recoding to Create Dummy Variables and from past assignments to transform each nominal variable, sex and rac1p, into a new variable.

3. For sex, code Male = 1 and Female = 0 in a new variable, male. Male is already coded 1, but you need to make 1 = 1 in the new variable anyway. Female is coded as 2, so you have to change the 2 to a 0. The new variable, male, should be numeric when you are done. Here is a screen shot to help you:
4. Next, assign labels to the values for your new variable, male: 1 = Male and 0 = Female. We have assigned labels before. See the screen shot below to help guide you.

5. Save your file with the new variable, male.

6. For rac1p, code white = 1 and nonwhite = 0 in a new variable, white. Recoding rac1p is a little more complicated than recoding sex because it has many values: white, black, native, etc. If you want to see the coding for rac1p, it is in the data dictionary starting on page 101. White is already coded 1, but you need to make 1 = 1 in the new variable. All other race categories are coded 2-9, so you have to change them all together to 0. Race categories are much more complicated than white/nonwhite. We are coding them this way for ease of practice. The new variable, white, should be numeric when you are done. Here is a screen shot to help you:

7. Next, assign labels to the values for your new variable, white: 1 = White and 0 = Nonwhite. We have assigned labels before. See the screen shot below to help guide you.
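The two recodes described above can also be sketched outside SPSS. The following is a minimal Python illustration, not the SPSS procedure from the text: the function names, the label dictionaries, and the sample rows are hypothetical, and it assumes sex is coded 1/2 and rac1p is coded 1-9 as stated in the instructions.

```python
# Hypothetical helpers illustrating the dummy-variable recodes; not SPSS syntax.

def recode_male(sex):
    """Map sex (1 = male, 2 = female) to the dummy variable male (1/0)."""
    if sex == 1:
        return 1
    if sex == 2:
        return 0
    return None  # keep missing or unexpected codes as missing


def recode_white(rac1p):
    """Map rac1p (1 = white, 2-9 = all other categories) to the dummy white (1/0)."""
    if rac1p == 1:
        return 1
    if rac1p in range(2, 10):
        return 0
    return None  # keep missing or unexpected codes as missing


# Value labels for the new variables, mirroring steps 4 and 7.
MALE_LABELS = {1: "Male", 0: "Female"}
WHITE_LABELS = {1: "White", 0: "Nonwhite"}

# Made-up example rows, only to show the recode in action.
rows = [{"sex": 1, "rac1p": 1}, {"sex": 2, "rac1p": 6}, {"sex": 2, "rac1p": 1}]
for row in rows:
    row["male"] = recode_male(row["sex"])
    row["white"] = recode_white(row["rac1p"])
    print(MALE_LABELS[row["male"]], WHITE_LABELS[row["white"]])
```

Returning None for any code outside the documented ranges keeps missing data from being silently counted as Female or Nonwhite.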
8. Save your file with the new variable, white.

9. Congratulations, you have just created your first dummy variables. You can now use these recoded variables, known as dummy variables, in a regression analysis and treat them as interval/ratio variables.

10. Follow the directions in your text for demonstration 17.3, which begins on page 324, entitled "Multiple Regression." Use your new variables male and white in addition to agep and schl, for four independent variables in total. You should know your dependent variable.

11. Copy and paste your output table that looks like the "Model Summary" table in your text on page 324. Here is where it gets fun to see the numbers.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .416a   .173       .173                72246.727
a. Predictors: (Constant), SCHL, AGEP, White, Male

12. Remember R? What is R for our model?
R = 0.416

13. Using your text in past chapters to guide you, how strong (or not) is the correlation (R) between the independent variables and the dependent variable? Hint: This answer is not a number but is based on a number.
Correlation strength = moderate
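As a quick cross-check of the Model Summary, the reported R Square should equal R multiplied by itself, and whatever the model does not explain is the remainder up to 100 percent. A minimal Python sketch, using only the values in the table above (the variable names are illustrative):

```python
# Values copied from the Model Summary table; names are illustrative only.
r = 0.416
r_square_reported = 0.173

# R Square is the square of R (up to rounding in the SPSS output).
r_square_from_r = r ** 2

# Share of variation in income explained and left unexplained, in percent.
explained_pct = r_square_reported * 100
unexplained_pct = 100 - explained_pct

print(round(r_square_from_r, 3))
print(f"{explained_pct:.1f}% explained, {unexplained_pct:.1f}% unexplained")
```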
14. Remember our friend R-square, which tells us how much of the variation in, in this case, income is explained by the independent variables, in this case male, agep, schl, and white? What is R-square for this full model?
R-square = 0.173

15. How much of the variation in income is accounted for using male, agep, schl, and white? Hint: this number is a percent.
Explained variation in income = 17.3%

16. In a perfect model that predicts 100 percent of the variation in income, R-square = 100 percent or 1.00. So, now think about this: if this model predicts income at the percent you gave in number 15, how much of the variation in income is still unaccounted for? Hint: this number is a percent and you have to use subtraction.
100% - R-square = 100% - 17.3% = 82.7%
Unaccounted for variation = 82.7%

17. Copy and paste your table that looks like the one on page 325 entitled "Coefficients." Now you get to use the equation you practiced in a past assignment for one independent variable, but this time with many independent variables used to predict income. This is the equation for the model.

Coefficients (a)
Model        Unstandardized B   Std. Error   Standardized Beta   t           Sig.
(Constant)   -127344.606        124.980                          -1018.922   .000
Male         21983.308          41.675       .136                527.493     .000
White        9245.157           41.821       .057                221.063     .000
AGEP         1348.528           1.604        .216                840.529     .000
SCHL         6742.781           5.170        .337                1304.091    .000
a. Dependent Variable: PINCP

18. Using this equation from your text, \(\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4\), fill in the value of the constant (a) and the unstandardized B value for each independent variable. Do you notice how there is a "b" in the equation for each independent variable? If you had five independent variables, you would have one more bX term in your equation.

19. Write out and solve the equation for a 25-year-old, white, male with 12 years of education.
With Male = 1, White = 1, AGEP = 25, and SCHL = 12:
\(\hat{Y} = -127344.606 + 21983.308(1) + 9245.157(1) + 1348.528(25) + 6742.781(12)\)
\(= -127344.606 + 21983.308 + 9245.157 + 33713.200 + 80913.372 = 18510.431\)

20. Write a concluding statement about the income of your results in number 19. Use a full sentence.
The predicted income for a 25-year-old, white, male with 12 years of education is approximately $18,510.43. This value comes from combining the constant with the model's coefficients for gender, race, age, and education level.

21. Write out and solve the equation for a 40-year-old, nonwhite, female with 16 years of education.
With Male = 0, White = 0, AGEP = 40, and SCHL = 16:
\(\hat{Y} = -127344.606 + 21983.308(0) + 9245.157(0) + 1348.528(40) + 6742.781(16)\)
\(= -127344.606 + 0 + 0 + 53941.120 + 107884.496 = 34481.010\)

22. Write a concluding statement about the income of your results in number 21. Use a full sentence.
The predicted income for a 40-year-old, nonwhite, female with 16 years of education is approximately $34,481.01.

23. Any thoughts about what you just found from using your model to compare two people who occupy different social categories? Use complete sentences and numbers from the analyses to support your thoughts.
The model predicts about $18,510.43 for a 25-year-old white male with 12 years of education and about $34,481.01 for a 40-year-old nonwhite female with 16 years of education, a difference of about $15,970.58. Her higher predicted income comes from her additional 15 years of age (15 x 1348.528 = $20,227.92) and 4 more units of education (4 x 6742.781 = $26,971.12), which together outweigh the model's premiums for being male ($21,983.31) and white ($9,245.16). If the two people had the same age and education, the model would predict about $31,228.47 more for the white male. These results show that predicted income depends on social categories as well as on individual characteristics such as age and education, which is why social categories need to be taken into account when studying income disparities.

24. Challenging question: Might the unexplained variance also be important in predicting how much income a full-time, year-round Californian worker earns?
Yes, the unexplained variance is important in predicting how much income a full-time, year-round Californian worker earns. Even though the model considers a number of variables, including age, race, gender, and education level, it leaves 82.7% of the variation in income unexplained, so there must be other factors affecting income that the model does not account for. Investigating the unexplained variance can reveal factors that the current analysis did not include or measure.

25. What is the standardized coefficient used for according to your text?
When independent variables are measured on different scales, the standardized coefficient puts them on a common scale so that the relative strength of each independent variable's impact in a regression model can be compared.

26. Given the standardized coefficients, which independent variable appears to have the most impact on income levels?
The variable SCHL, which has the largest standardized coefficient (Beta = 0.337), has the most impact on predicted income levels in this model.

27. Which variable do you think we could add to our model, that we haven't, that might explain some variation in incomes?
OCCP (occupation). The type of profession individuals work in can significantly affect income levels.

28. Would you have to recode the variable you think is missing to include it in a multivariate regression analysis? Why or why not?
Yes. OCCP is a nominal variable with many categories representing different occupations, and its numeric codes have no meaningful order, so I would need to recode it into dummy variables before including it in a regression analysis.

29. Report the significance levels for each independent variable in the model.
Male = 0.000
Age = 0.000
Education = 0.000
White = 0.000

30. Are your independent variables significant to the p<.05 level?
Yes, all the independent variables in the model (Male, White, Age, and Education) are statistically significant at the p < .05 level.

31. What does significance mean? Offer sentences that conclude the significance levels and what they mean in interpreting your results.
In the model, all the independent variables (Male, White, Age, and Education) have p-values reported as .000, which is well below 0.05. This means the probability that the observed relationships between these variables and income occurred by random chance is very low. Based on this evidence, we can say that being male, being white, increasing age, and higher levels of education are each associated with statistically significant differences in income.

32. We know there is a gender pay gap in the U.S. and in California. Given what you know from your analyses here, what would you say about income, gender, and what people earn in California for full-time, year-round work? Use as much information from this analysis and past analyses as you think relevant to discuss the gender pay gap in California for full-time, year-round workers.
According to the analysis, there is a statistically significant difference between the expected incomes of men and women for full-time, year-round workers in California, with men generally earning higher incomes. The coefficient for the variable "Male" is positive (21983.308), indicating that males are predicted to earn about $21,983 more than females on average when race, age, and education are held constant. It is also important to consider the social and practical factors outside the model that contribute to the gender pay gap.
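The regression equation from number 18 can be evaluated in a few lines of code. The sketch below is a hedged Python illustration, not part of the SPSS assignment: the dictionary and function names are hypothetical, the coefficients are the unstandardized B values from the Coefficients table, and it assumes Male and White take their 0/1 dummy codes, AGEP takes age in years, and SCHL takes the education value used in the questions.

```python
# Unstandardized B values copied from the Coefficients table.
# The dictionary and function names are illustrative assumptions.
COEF = {
    "constant": -127344.606,
    "male": 21983.308,
    "white": 9245.157,
    "agep": 1348.528,
    "schl": 6742.781,
}


def predict_pincp(male, white, agep, schl):
    """Predicted income: a + b1*Male + b2*White + b3*AGEP + b4*SCHL."""
    return (
        COEF["constant"]
        + COEF["male"] * male
        + COEF["white"] * white
        + COEF["agep"] * agep
        + COEF["schl"] * schl
    )


# Profile from number 19: 25-year-old white male, 12 years of education.
profile_19 = predict_pincp(male=1, white=1, agep=25, schl=12)

# Profile from number 21: 40-year-old nonwhite female, 16 years of education.
profile_21 = predict_pincp(male=0, white=0, agep=40, schl=16)

# Gender gap with everything else held constant: it equals the Male coefficient.
gender_gap = predict_pincp(1, 1, 40, 16) - predict_pincp(0, 1, 40, 16)

print(round(profile_19, 2))  # 18510.43
print(round(profile_21, 2))  # 34481.01
print(round(gender_gap, 2))  # 21983.31
```

Comparing two predictions that differ only in the male dummy isolates the coefficient discussed in number 32, which is what "holding other variables constant" means in practice.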