DAT 430 Project Two Part One

DAT 430: Leverage Data for Org Results
Tiffany Rudman Quinn
Southern New Hampshire University
Project Two Part One
December 10, 2023
Establishing a baseline on the HR Attrition data set
To begin pre-processing the HR Attrition data set, I first imported the libraries I would need. I used pandas and numpy for data manipulation and numerical calculations. From sklearn I imported the tools for preprocessing (including the ordinal encoder), the train/test split, the classification report, the confusion matrix, and more. I also imported matplotlib and seaborn for data visualization. I then read the data set and used the .head function to view its first 7 rows. I used the .info function to determine the number of rows and columns, the data type of each column, and whether any null values are present.
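A minimal sketch of this loading and inspection step. The three-row sample and its column names are assumptions standing in for the real HR Attrition file, which is not reproduced in this report.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the HR Attrition file (assumed columns).
csv_text = """Age,Attrition,JobRole,EducationField
41,Yes,Sales Executive,Life Sciences
49,No,Research Scientist,Life Sciences
37,Yes,Laboratory Technician,Other
"""

# Read the data set into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Show up to the first 7 rows (the sample has only 3).
print(df.head(7))

# Report row and column counts, column data types, and non-null counts.
df.info()

# Count missing values per column as an explicit null check.
null_counts = df.isnull().sum()
print(null_counts)
```

With the real file, the only change would be passing its path to pd.read_csv in place of the in-memory sample.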
Feature Engineering
Next, I set the y variable to the Attrition attribute and the x variables to all other attributes in the data set. I then used the .replace function to set the Attrition values of Yes and No to 1 and 0, respectively. Then I used the ordinal encoder to convert the categorical variables to numeric values. Finally, I converted the y variable Attrition to a string.
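The feature engineering steps above can be sketched roughly as follows. The four-row frame and its column names are assumptions for illustration, and the sketch uses Series.map for the Yes/No substitution, which gives the same result here as the .replace call described.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample with assumed column names.
df = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "JobRole": ["Sales Executive", "Research Scientist",
                "Laboratory Technician", "Research Scientist"],
    "EducationField": ["Life Sciences", "Life Sciences", "Other", "Medical"],
    "Attrition": ["Yes", "No", "Yes", "No"],
})

# Target: Attrition mapped to 1/0, then stored as a string label.
y = df["Attrition"].map({"Yes": 1, "No": 0}).astype(str)

# Features: every column except the target.
X = df.drop(columns=["Attrition"])

# Encode the categorical columns as numeric codes.
# OrdinalEncoder assigns codes in sorted order of the category values.
cat_cols = ["JobRole", "EducationField"]
encoder = OrdinalEncoder()
X[cat_cols] = encoder.fit_transform(X[cat_cols])

print(X)
print(y.tolist())
```

Ordinal codes imply an ordering between categories, so this encoding suits tree-based models better than it suits linear ones.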
ML Modeling selections
The machine learning models that I selected are Random Forest Classification and Logistic Regression. I imported each machine learning algorithm, trained the model with the training data set, and confirmed its accuracy. I also ran a few logistic regression models and plotted bar charts with Attrition on the y-axis and Job Role and Education Field on the x-axis to check that the variables were reporting correctly. I also have a histogram of Age to show the range of ages in our data set.
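A rough sketch of training the two selected models. The data here is synthetic (generated with make_classification) rather than the HR Attrition set, and the 70/30 split and hyperparameters are assumptions, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the encoded HR data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hold out 30% of the rows for testing (assumed split size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training data and record accuracy on both splits.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = {
        "train": model.score(X_train, y_train),
        "test": model.score(X_test, y_test),
    }
    print(name, accuracy[name])
```

Comparing training and test accuracy side by side gives an early hint of overfitting, which is common with random forests on small data sets.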
Next, I used the test data set to make predictions and evaluated them with a confusion matrix and classification report. I visualized the results of the confusion matrix with a heatmap to compare the actual values against the predicted values.
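The evaluation step can be sketched as follows, again on synthetic data with an assumed logistic regression model. The helper plot_confusion_heatmap is a hypothetical name introduced for illustration; it draws the heatmap with seaborn and is defined but not called here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the encoded HR Attrition set.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predict on the held-out test set.
y_pred = model.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
report = classification_report(y_test, y_pred)
print(cm)
print(report)


def plot_confusion_heatmap(matrix):
    """Draw actual vs. predicted counts as an annotated heatmap."""
    # Imported here so the metrics above run without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots()
    sns.heatmap(matrix, annot=True, fmt="d", cmap="Blues", ax=ax)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    return fig
```

Calling plot_confusion_heatmap(cm) and then matplotlib's show or savefig would produce the figure.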
Using Recursive Feature Elimination for feature selection, I determined fifteen features that are relevant to the data set. These features include Performance Rating, Job Role, Work Life Balance, and more.
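A minimal sketch of Recursive Feature Elimination keeping fifteen features. The base estimator (logistic regression), the synthetic data, and the feature_0 through feature_29 column names are all assumptions, since the report does not state which estimator was wrapped.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data with 30 candidate features (hypothetical names).
X_arr, y = make_classification(
    n_samples=300, n_features=30, n_informative=10, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(30)])

# RFE repeatedly fits the estimator and drops the weakest feature
# until only the requested number remains.
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=15,
)
selector.fit(X, y)

# support_ is a boolean mask marking the retained features.
selected = X.columns[selector.support_].tolist()
print(selected)
```

The ranking_ attribute of the fitted selector gives the elimination order, where every retained feature has rank 1.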