Example Assignment Eleven: Using multiple regression to analyze the gender pay gap
Part I: Use APA style and formatting for all assignments, references, and citations. Yes, include a cover
page and a running head, too. See the Purdue OWL for an example APA-style paper:
https://owl.english.purdue.edu/owl/resource/560/18/
For this final analysis you get to bring together many of the variables we have been using this term to
better understand differences in income. In particular, we want to explain the gender pay gap between
women and men. For Ordinary Least Squares (OLS) regression analyses, which we are using for this
assignment, you want to have at least one interval/ratio independent variable and an interval/ratio
dependent variable. Your dependent variable is pincp. Your independent variables will be sex, agep, and
schl. But there are other variables that might explain variation in income, so for this analysis we will add
race (rac1p) to our independent variable list. However, as your book tells you, for the nominal variables we
need to do a little recoding into “dummy” variables so we can use OLS regression more effectively. We
will recode sex into a dummy variable called “Male,” and we will recode rac1p into a dummy variable
called “White.”
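If it helps to see the recoding logic outside of your statistics package, here is a minimal sketch in Python. It is only an illustration, not the method the text uses. The toy records are made up, and the sketch assumes that sex is coded 1 = male and 2 = female and that rac1p code 1 means White alone; check your data dictionary to confirm. The function names recode_male and recode_white are hypothetical.

```python
# Made-up toy records for illustration only; these are not ACS results.
# Assumed codes: sex 1 = male, 2 = female; rac1p 1 = White alone.
records = [
    {"sex": 1, "rac1p": 1},
    {"sex": 2, "rac1p": 1},
    {"sex": 2, "rac1p": 2},
    {"sex": 1, "rac1p": 6},
]


def recode_male(sex):
    """Return 1 for male (sex == 1) and 0 for female (sex == 2)."""
    mapping = {1: 1, 2: 0}
    # Any other value (missing or unexpected) becomes None rather than 0.
    return mapping.get(sex)


def recode_white(rac1p):
    """Return 1 when rac1p == 1 (assumed White alone), 0 for other codes."""
    if rac1p is None:
        return None  # keep missing values missing
    return 1 if rac1p == 1 else 0


# Add the two new numeric dummy variables to every record.
for row in records:
    row["Male"] = recode_male(row["sex"])
    row["White"] = recode_white(row["rac1p"])

print([(row["Male"], row["White"]) for row in records])
# prints [(1, 1), (0, 1), (0, 0), (1, 0)]
```

Each dummy variable is numeric and takes only the values 0 and 1, which is what lets a nominal variable enter an OLS regression.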
Also, know that more tests are needed to reach firmer conclusions from an OLS analysis. For example,
two independent variables might have a strong association, where one predicts the other to a large
degree. Might this be the case for sex and schl? When this happens it is known as multicollinearity, or
just collinearity, and it can distort OLS regression results, mainly by inflating the standard errors of
the affected coefficients. There are ways to test for it and correct the problem, but we are not going
to do that in this course. Just know that there is more to OLS regression than what you practice here.
You are practicing running and interpreting the analysis.
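You do not need to test for collinearity in this assignment, but if you are curious what a rough first look might involve, here is a minimal Python sketch that computes the Pearson correlation between two independent variables. The values are made up for illustration, the helper name pearson_r is hypothetical, and a simple correlation is only a crude screen; fuller diagnostics such as the variance inflation factor are beyond this course.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only; these are not ACS results.
male = [1, 0, 1, 0, 1, 0]        # dummy variable: 1 = male, 0 = female
schl = [12, 14, 16, 13, 13, 15]  # education codes (hypothetical cases)

r = pearson_r(male, schl)
print(round(r, 3))
```

A correlation close to +1 or -1 between two independent variables would be a warning sign; a value near 0 suggests the two are not strongly related.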
1. What is the measure (nominal, ordinal, or interval/ratio) of each of your independent variables
and your dependent variable?
Dependent variable, pincp: I/R
Independent variable, rac1p: Nominal
Independent variable, sex: Nominal
Independent variable, agep: I/R
Independent variable, schl: I/R. I answer for you because I want you to treat schl as an I/R
variable measuring years of schooling, even though its codes do not correspond exactly, year for
year, to years of schooling. You can check the data dictionary for schl to see how the answers are
coded. They are coded from 1 to 16, where each higher number means progressively more education.
2. Using your 2014-2018 ACS data file, recode your nominal independent variables as instructed in
the text under 17.2 Recoding to Create Dummy Variables and in past assignments, transforming
each nominal variable, sex and rac1p, into a new variable.
3. For sex, code Male = 1 and Female = 0 in a new variable, Male. Male is already coded 1 in sex, but
you still need to set 1 = 1 in the new variable. Female is coded as 2, so you have to change the
2 to a 0. The new variable, Male, should be numeric when you are done. Here is a screenshot to
help you: