IE6400_Day6_Lab6

html

School

Northeastern University *

*We aren’t endorsed by this school

Course

6400

Subject

Industrial Engineering

Date

Feb 20, 2024

Type

html

Pages

20

Uploaded by ColonelStraw13148

Report
IE6400 Foundations for Data Analytics Engineering Fall 2023 Module 1: Pandas Library - Part 2 - Lab 6 Handling Missing Data Again, real-world datasets are rarely perfect. They may contain missing values, wrong data types, unreadable characters, and other unexpected values. The first step to any proper data analysis is cleansing and organizing the data. We'll be working with small employee dataset. In [1]: import pandas as pd dataframe = pd.read_csv('employee.csv') In [2]: # Displaying using Jupyter Notebook dataframe # For non-Jupyter environments # print(dataframe) Out[2]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis n.a. 115163.0 10.125 False Legal 4 NaN Female 0.0 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False na 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development 10 NaN NaN NaN NaN NaN NaN
First Name Gender Salary Bonus % Senior Management Team Taking a closer look at the dataset, we note that Pandas assigns NaN if the value for a particular column is empty string. However, there are cases where missing values are represented by a custom value, for example, the string 'na' or 0 for a numeric column. These technically aren't missing values, as there's something there, but they're functionally missing values. If we try using utility methods such as dropna( ), it will not work properly. We'll want to first clean this up and categorize them, before we try to handle them as missing values. Exercise 6.1 Customizing Missing Data Values In our dataset, we want to consider these as missing values: A 0 value in the Salary column An na value in the Team column The easiest way to deal with missing values is to handle them at import-time. When loading our dataset in, let's set 0 and na as missing values for the Salary and Team columns: In [3]: import pandas as pd df = pd.read_csv('employee.csv', na_values = {"Salary" : [0], "Team" : ['na']}) In [4]: # Display using Jupyter Notebook df Out[4]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis n.a. 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services
First Name Gender Salary Bonus % Senior Management Team 9 Stephanie Female 36844.0 5.574 True Business Development 10 NaN NaN NaN NaN NaN NaN Now, when we load our data in, all instances of 0 and na will be turned into NaN, which is the correct data type of missing values. Great! Now we don't have hidden missing values anymore. Or do we? There's a n.a. cell in the Gender column, on index 3. We'll want to add more matchers for missing values to solve this. We will create a list of values which will be treated as missing globally, in all columns. Let's make a list of various "hidden" missing values and pass that list to the na_values argument: In [5]: missing_values = ["n.a.", "NA", "n/a", "na", 0] In [6]: df = pd.read_csv('employee.csv', na_values = missing_values) In [7]: # Display using Jupyter Notebook df Out[7]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
First Name Gender Salary Bonus % Senior Management Team 9 Stephanie Female 36844.0 5.574 True Business Development 10 NaN NaN NaN NaN NaN NaN Now, the dataset is finally cleaned from the garbage values, and we can work on handling missing values now. Exercise 6.2 Removing Rows With Missing Values One approach would be removing all the rows which contain missing values. This can easily be done with dropna( ) function. In [8]: df Out[8]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development 10 NaN NaN NaN NaN NaN NaN In [9]: # drops all rows with NaN values clean_df = df.dropna(axis = 0, inplace = False) inplace = True makes all the changes in the existing DataFrame without returning a new one.
The axis argument specifies if you're working with rows or columns - 0 being rows and 1 being columns. In [10]: clean_df Out[10]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development You can control whether you want to remove the rows containing at least 1 NaN or all NaN values by setting the how parameter in the dropna method. how: any : if any NaN values are present, drop the row all : if all values are NaN, drop the row In [11]: df.dropna(axis = 0, inplace = True, how = 'all') In [12]: df Out[12]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development
First Name Gender Salary Bonus % Senior Management Team 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development This would only remove the last row from the dataset since how = 'all' would only drop a row if all of the values are missing from it. Filling out Missing Values It might not be the best approach to remove the rows containing missing values. They might contain valuable data in other columns and we don't want to skew the data towards an inaccurate state. There are several options for dealing with this type of issue, and the most common ones are: Fill NaNs with Mean, Median, or Mode of the data Fill NaNs with a constant value Forward Fill or Backward Fill NaNs Interpolate Data and Fill NaNs Exercise 6.3 Filling Missing Values with Column Mean Let's start out with the fillna( ) method. It fills the NaN-marked values with values you supply the method with. For example, you can use .mean( ) function on a column and supply those as the fill value: In [13]: import pandas as pd # Cleansing the missing values missing_values = ["n.a.", "NA", "n/a", "na", 0] df = pd.read_csv('employee.csv', na_values = missing_values) # Removing all NaN df.dropna(axis = 0, inplace = True, how = 'all') # Show df df Out[13]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
First Name Gender Salary Bonus % Senior Management Team 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development In [14]: # Using mean df['Salary'].fillna(int(df['Salary'].mean()), inplace = True) In [15]: df Out[15]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male 88920.0 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female 88920.0 11.598 NaN Finance 5 Angela NaN 88920.0 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development Exercise 6.4 Filling Missing Values with Column Median In [16]:
import pandas as pd # Cleansing the missing values missing_values = ["n.a.", "NA", "n/a", "na", 0] df = pd.read_csv('employee.csv', na_values = missing_values) # Removing all NaN df.dropna(axis = 0, inplace = True, how = 'all') # Show df df Out[16]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development In [17]: # Using median df['Salary'].fillna(int(df['Salary'].median()), inplace = True) In [18]: df Out[18]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN
First Name Gender Salary Bonus % Senior Management Team 2 Jerry Male 97308.0 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female 97308.0 11.598 NaN Finance 5 Angela NaN 97308.0 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development Exercise 6.5 Filling Missing Values with Column Mode In [19]: import pandas as pd # Cleansing the missing values missing_values = ["n.a.", "NA", "n/a", "na", 0] df = pd.read_csv('employee.csv', na_values = missing_values) # Removing all NaN df.dropna(axis = 0, inplace = True, how = 'all') # Show df df Out[19]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
First Name Gender Salary Bonus % Senior Management Team 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development In [20]: # Using mode df['Salary'].fillna(df['Salary'].mode(), inplace = True) In [21]: df Out[21]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male 61933.0 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female 111737.0 11.598 NaN Finance 5 Angela NaN 115163.0 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development
Now, if a salary is missing from a person's row, a mean, mode, or median are used to fill that value. Exercise 6.6 Filling Missing Values with a Constant You could also decide to fill the NaN-marked values with a constant value. For example, you can put in a special string or numerical value: In [22]: import pandas as pd # Cleansing the missing values missing_values = ["n.a.", "NA", "n/a", "na", 0] df = pd.read_csv('employee.csv', na_values = missing_values) # Removing all NaN df.dropna(axis = 0, inplace = True, how = 'all') # Show df df Out[22]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development In [23]: df['Salary'].fillna(0, inplace = True) In [24]: df Out[24]:
First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male 0.0 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female 0.0 11.598 NaN Finance 5 Angela NaN 0.0 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development Notice that all NaN values in Salary column are changed to "0" now. Exercise 6.7 Forward Fill Missing DataFrame Values This method would fill the missing values with first non-missing value that shows before it. In [25]: import pandas as pd # Cleansing the missing values missing_values = ["n.a.", "NA", "n/a", "na", 0] df = pd.read_csv('employee.csv', na_values = missing_values) # Removing all NaN df.dropna(axis = 0, inplace = True, how = 'all') # Show df df Out[25]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
First Name Gender Salary Bonus % Senior Management Team 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development In [26]: df['Salary'].fillna(method = 'ffill', inplace = True) In [27]: df Out[27]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male 61933.0 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female 115163.0 11.598 NaN Finance 5 Angela NaN 115163.0 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services
First Name Gender Salary Bonus % Senior Management Team 9 Stephanie Female 36844.0 5.574 True Business Development For example, Jerry would inherit the Salary from Thomas. Exercise 6.8 Backward Fill Missing DataFrame Values This method would fill the missing values with first non-missing value that shows after it. In [28]: import pandas as pd # Cleansing the missing values missing_values = ["n.a.", "NA", "n/a", "na", 0] df = pd.read_csv('employee.csv', na_values = missing_values) # Removing all NaN df.dropna(axis = 0, inplace = True, how = 'all') # Show df df Out[28]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development In [29]:
df['Team'].fillna(method = 'bfill', inplace = True) In [30]: df Out[30]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True Finance 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False Business Development 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development This work exactly the other way around from the previous approach, and Thomas will inherit Finance as his Team from Jerry. Exercise 6.9 Fill Missing DataFrame Values with Interpolation - polynomial Finally, this method uses mathematical interpolation to determine what value would have been in the place of a missing value. In [31]: import pandas as pd # Cleansing the missing values missing_values = ["n.a.", "NA", "n/a", "na", 0] df = pd.read_csv('employee.csv', na_values = missing_values) # Removing all NaN df.dropna(axis = 0, inplace = True, how = 'all') # Show df df Out[31]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development In [32]: df['Salary'].interpolate(method = 'polynomial', order = 5, inplace = True) In [33]: df Out[33]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.000000 6.945 True Marketing 1 Thomas Male 61933.000000 NaN True NaN 2 Jerry Male 108558.907129 9.340 True Finance 3 Dennis NaN 115163.000000 10.125 False Legal 4 NaN Female 85046.355671 11.598 NaN Finance 5 Angela NaN 73044.323666 18.523 True Engineering 6 Shawn Male 111737.000000 6.414 False NaN
First Name Gender Salary Bonus % Senior Management Team 7 Rachel Female 142032.000000 12.599 False Business Development 8 Linda Female 57427.000000 9.557 True Client Services 9 Stephanie Female 36844.000000 5.574 True Business Development Exercise 6.10 Fill Missing DataFrame Values with Interpolation - spline In [34]: import pandas as pd # Cleansing the missing values missing_values = ["n.a.", "NA", "n/a", "na", 0] df = pd.read_csv('employee.csv', na_values = missing_values) # Removing all NaN df.dropna(axis = 0, inplace = True, how = 'all') # Show df df Out[34]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business
First Name Gender Salary Bonus % Senior Management Team Development In [35]: df['Salary'].interpolate(method = 'spline', order = 5, inplace = True) In [36]: df Out[36]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.000000 6.945 True Marketing 1 Thomas Male 61933.000000 NaN True NaN 2 Jerry Male 108557.233181 9.340 True Finance 3 Dennis NaN 115163.000000 10.125 False Legal 4 NaN Female 85049.266288 11.598 NaN Finance 5 Angela NaN 73048.433135 18.523 True Engineering 6 Shawn Male 111737.000000 6.414 False NaN 7 Rachel Female 142032.000000 12.599 False Business Development 8 Linda Female 57427.000000 9.557 True Client Services 9 Stephanie Female 36844.000000 5.574 True Business Development Exercise 6.11 Fill Missing DataFrame Values with Interpolation - linear In [37]: import pandas as pd # Cleansing the missing values missing_values = ["n.a.", "NA", "n/a", "na", 0] df = pd.read_csv('employee.csv', na_values = missing_values) # Removing all NaN df.dropna(axis = 0, inplace = True, how = 'all') # Show df df Out[37]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male NaN 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female NaN 11.598 NaN Finance 5 Angela NaN NaN 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development In [38]: df['Salary'].interpolate(method = 'linear', inplace = True) In [39]: df Out[39]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male 88548.0 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female 114021.0 11.598 NaN Finance 5 Angela NaN 112879.0 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN
First Name Gender Salary Bonus % Senior Management Team 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development Exercise 6.12 Writing CSV File with to_csv( ) Here, we've cleaned the employee DataFrame. Then, we've gone ahead and saved that data into a CSV file using to_csv( ) and providing the filename. In [40]: df Out[40]: First Name Gender Salary Bonus % Senior Management Team 0 Douglas Male 97308.0 6.945 True Marketing 1 Thomas Male 61933.0 NaN True NaN 2 Jerry Male 88548.0 9.340 True Finance 3 Dennis NaN 115163.0 10.125 False Legal 4 NaN Female 114021.0 11.598 NaN Finance 5 Angela NaN 112879.0 18.523 True Engineering 6 Shawn Male 111737.0 6.414 False NaN 7 Rachel Female 142032.0 12.599 False Business Development 8 Linda Female 57427.0 9.557 True Client Services 9 Stephanie Female 36844.0 5.574 True Business Development In [41]: df.to_csv('cleaned_employee.csv') Revised Date: September 7th, 2023 In [ ]: