Question 2: Feature Engineering
Having considered missing data, we can now further prepare our dataset for use in a model by performing feature engineering.
Question 2.1: What's in a name?
When you first received the Titanic dataset, you were excited to see the Name feature included, as you believed it might be another source of information for inferring a person's social status on the ship. To use this feature, however, we need to extract each individual's title from their name: calling train_df['Name'].unique().shape shows that every Name entry in our training dataset is currently unique, so the raw names cannot be used directly as a categorical feature.
Go ahead and perform this transformation by writing a function called extract_title, which adds a Title column to the DataFrame containing the title found in each person's Name entry.
Examples of title extraction:
- "Braund, Mr. Owen Harris" maps to the title 'Mr.'
- "Heikkinen, Miss. Laina" maps to the title 'Miss.'
Function arguments:
- input_df -> input Pandas DataFrame.
Function specifications:
- Name the function extract_title.
- It must take any Pandas DataFrame as input and return a DataFrame with an additional Title column.
- Assume that input_df has a 'Name' column in which each entry is a string containing exactly one title.
- Assume that a title is a word of two or more characters ending in a period (.).
- Hint: you can import the re module (import re) and use a regular expression search (re.search()) to find the title in each name.
### START FUNCTION
def extract_title(input_df):
    # your code here
    return
### END FUNCTION
Expected output:
extract_title(train_df)['Title'].unique() == ['Mr.', 'Mrs.', 'Miss.', 'Master.',
    'Don.', 'Rev.', 'Dr.', 'Mme.', 'Ms.',
    'Major.', 'Lady.', 'Sir.', 'Mlle.',
    'Col.', 'Capt.', 'Countess.', 'Jonkheer.']
extract_title(test_df)['Title'].unique() == ['Mr.', 'Mrs.', 'Miss.', 'Master.',
    'Ms.', 'Col.', 'Rev.', 'Dr.', 'Dona.']
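One possible sketch of extract_title, following the hint to use re.search. It assumes that a title is two or more letters immediately followed by a period, and that the title is the leftmost such word in each name, which holds for the "Surname, Title. Given names" layout in the examples. The pattern name TITLE_PATTERN and the helper find_title are illustrative choices, not part of the assignment.

```python
import re

import pandas as pd

# Assumed reading of the spec: a title is two or more letters followed by a period.
TITLE_PATTERN = re.compile(r"\b([A-Za-z]{2,}\.)")


def extract_title(input_df):
    """Return a copy of input_df with an added 'Title' column."""
    output_df = input_df.copy()  # avoid mutating the caller's DataFrame

    def find_title(name):
        # search() returns the leftmost match; for names laid out as
        # "Surname, Title. Given names" that match is the title.
        match = TITLE_PATTERN.search(name)
        return match.group(1) if match else None

    output_df["Title"] = output_df["Name"].apply(find_title)
    return output_df


demo_df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]})
print(extract_title(demo_df)["Title"].tolist())  # ['Mr.', 'Miss.']
```

Returning a copy keeps train_df unchanged between calls; if your grader expects the column to be added to the original DataFrame in place, assign to input_df["Title"] directly instead.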
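As a vectorized alternative to calling re.search row by row, pandas' Series.str.extract applies a regular expression to every entry in a column and returns the first match of its capture group. The sketch below uses the pattern r"\b([A-Za-z]{2,}\.)", an assumed reading of the title rule; the function name extract_title_vectorized is hypothetical, and the third sample name is invented for illustration. It also shows how titles collapse unique names into a few shared categories.

```python
import pandas as pd

# Assumed title rule: two or more letters immediately followed by a period.
TITLE_REGEX = r"\b([A-Za-z]{2,}\.)"


def extract_title_vectorized(input_df):
    """Hypothetical vectorized variant: add a 'Title' column using str.extract."""
    output_df = input_df.copy()
    # With a single capture group and expand=False, str.extract returns a Series;
    # rows with no match would get NaN.
    output_df["Title"] = output_df["Name"].str.extract(TITLE_REGEX, expand=False)
    return output_df


# Hypothetical sample: the first two names come from the question's examples,
# the third is invented for illustration.
demo_df = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Smith, Mr. John",
]})
titled = extract_title_vectorized(demo_df)

# Three unique names collapse into two shared titles.
print(demo_df["Name"].nunique(), titled["Title"].nunique())  # 3 2
print(titled["Title"].unique().tolist())  # ['Mr.', 'Miss.']
```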