Question 2: Feature Engineering

Having considered missing data, we can now further prepare our dataset for use within a model by performing feature engineering.

Question 2.1: What's in a name?

When you originally received the Titanic dataset, you were excited to see the Name feature included, as you believed it might be another source of information to help infer a person's social status on the ship. To use this feature, however, we need to extract each individual's title from their name, because calling train_df['Name'].unique().shape tells us that every Name entry within our train dataset is currently unique.

Go ahead and perform this transformation by writing a function called extract_title, which adds an extra Title column to our DataFrame containing each person's title as found within the Name column.

Examples of title extraction:

  • Braund, Mr. Owen Harris maps to a title of Mr.
  • Heikkinen, Miss. Laina maps to a title of Miss.
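
The two mappings above can be reproduced with a short regular-expression sketch. The pattern and the helper name find_title are illustrative assumptions, not part of the exercise: the pattern treats a title as a run of two or more letters followed by a period.

```python
import re

# Assumed pattern: a run of two or more letters followed by a literal period.
TITLE_PATTERN = re.compile(r"[A-Za-z]{2,}\.")


def find_title(name):
    """Return the first title-like token in name, or None if there is none."""
    match = TITLE_PATTERN.search(name)
    return match.group(0) if match else None


print(find_title("Braund, Mr. Owen Harris"))  # Mr.
print(find_title("Heikkinen, Miss. Laina"))   # Miss.
```

The surname is skipped because it is followed by a comma rather than a period, so the first match is the title itself.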

Function arguments:

  • input_df -> input Pandas DataFrame.

Function specifications:

  • Name the function extract_title
  • Must take any Pandas DataFrame as input and return a DataFrame as output with an additional Title column.
  • Assume that input_df represents a DataFrame possessing a 'Name' column, with each corresponding row entry being a string-based name containing exactly one title.
  • Assume that a title is represented by a word of two or more characters ending in a period (.).
  • Hint: you can import the re module (import re) and use a regular expression search (re.search()) to find the title in each name.
### START FUNCTION
def extract_title(input_df):
    # your code here
    return

### END FUNCTION
 
Expected output:
extract_title(train_df)['Title'].unique() == ['Mr.', 'Mrs.', 'Miss.', 'Master.',
                                              'Don.', 'Rev.', 'Dr.', 'Mme.', 'Ms.',
                                              'Major.', 'Lady.', 'Sir.', 'Mlle.',
                                              'Col.', 'Capt.', 'Countess.', 'Jonkheer.']

extract_title(test_df)['Title'].unique() == ['Mr.', 'Mrs.', 'Miss.', 'Master.',
                                             'Ms.', 'Col.', 'Rev.', 'Dr.', 'Dona.']
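
One possible way to meet the specification is sketched below. It is a minimal sketch, not a reference solution: it assumes a title is two or more letters followed by a period, relies on the stated guarantee that every name contains a title, and returns a copy rather than modifying the input. The sample_df frame is a hypothetical stand-in for train_df built from the two example names in the question.

```python
import re

import pandas as pd


def extract_title(input_df):
    """Return a copy of input_df with an added 'Title' column."""
    output_df = input_df.copy()  # leave the caller's DataFrame untouched
    # Assumed pattern: two or more letters followed by a literal period.
    pattern = re.compile(r"[A-Za-z]{2,}\.")
    # Relies on the exercise's assumption that every name contains a title;
    # a name without one would raise AttributeError here.
    output_df["Title"] = output_df["Name"].apply(
        lambda name: pattern.search(name).group(0)
    )
    return output_df


# Hypothetical stand-in for train_df, using the example names from the question.
sample_df = pd.DataFrame(
    {"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]}
)
print(extract_title(sample_df)["Title"].unique())
```

Because unique() preserves order of first appearance, running the same function on the real train and test sets should list titles in the order they first occur in the Name column.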