Case Write-Up
Praveen Reddy
To build a model that predicts whether a customer will book a trip with Scholastic Travel
Company (STC), we can follow the CRISP-DM framework. Here is a step-by-step guide on how to
approach this case:
1. Business Understanding:
   - Clearly understand the objective of the prediction model, which is to predict
     whether a customer will book a trip with STC.
   - Identify the key stakeholders and their requirements.
   - Determine the success criteria for the prediction model.
2. Data Understanding:
   - Gather the necessary data for analysis, including historical customer data, trip
     booking data, and any other relevant data sources.
   - Explore the data to identify the variables/features available.
   - Assess the quality, completeness, and relevance of the available data.
3. Data Preparation:
   - Cleanse and preprocess the data, handling missing values, outliers, and any data
     quality issues.
   - Perform feature engineering, creating new features or transforming existing ones.
   - Split the available data into a training set and a test/validation set for model
     evaluation.
4. Modeling:
   - Select an appropriate algorithm for prediction based on the nature of the problem
     and the available data.
   - Train the prediction model using the training set.
   - Optimize the model by tuning the hyperparameters or using feature selection
     techniques.
   - Validate the model's performance using the test/validation set.
5. Evaluation:
   - Assess the performance of the prediction model using appropriate evaluation
     metrics such as accuracy, precision, recall, and F1-score.
   - Compare the model's performance against the success criteria set in the Business
     Understanding phase.
   - Interpret the results and provide insights on the factors influencing the booking
     decisions.
6. Deployment:
   - Create a deployment plan for implementing the prediction model into production.
   - Document the details of the analysis, including the methodology, data
     preprocessing steps, model selection, and evaluation results.
   - Prepare the two-page case write-up and submit it along with the SAS EG project
     file, which contains the analysis.
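The train/test split in step 3 and the metrics named in step 5 can be sketched in a few lines. This is only an illustration in plain Python, not the SAS EG project the case asks for. The outcome column name `booked`, the 30% test fraction, and the fixed seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.3, seed=42):
    """Split rows into train and test lists with a similar label mix in each."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary outcome."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Splitting within each outcome class keeps the share of booked and non-booked customers roughly equal in both sets, so the test metrics are not distorted by an unlucky draw. The metrics should be computed on the test rows only and then compared against the success criteria from step 1.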
In the case write-up, make sure to include a summary of the case situation, an explanation of the
analysis approach using the CRISP-DM framework, and a detailed discussion of the findings and
insights from the prediction model. Cite any references used appropriately.
Note that carrying out this analysis requires access to the case data and the SAS EG
software. Make sure both are available before proceeding.
Recommendations:
The table below lists the columns from Exhibit 1 that could potentially be deleted, with the
reasoning for each:
Column Name      Delete   Reason
ID               Yes      Just a row identifier, not a useful predictor
From.Grade       Yes      Redundant with GroupGradeTypeLow
To.Grade         Yes      Redundant with GroupGradeTypeHigh
Departure.Date   Yes      Temporal info captured in Days and DepartureMonth
Return.Date      Yes      Temporal info captured in Days and DepartureMonth
Deposit.Date     Yes      Unclear predictive value for bookings
Special.Pay      Yes      Much missing data, meaning unclear
Tuition          Yes      Better to use total revenue data
Reasons:
- Redundant information: columns like grade ranges captured elsewhere
- Missing data: columns with many missing values
- Irrelevant to prediction: columns like IDs that are just identifiers
- Insufficient meaning: columns where predictive value is unclear
- Better alternatives: columns like Tuition where total revenue is better
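As a rough sketch, the deletions above amount to filtering a fixed list of column names out of every record. The snippet assumes each row has been loaded as a Python dictionary keyed by column name, which is an assumption about the data layout and not something Exhibit 1 specifies.

```python
# Columns marked "Yes" in the table above.
COLUMNS_TO_DROP = [
    "ID", "From.Grade", "To.Grade", "Departure.Date",
    "Return.Date", "Deposit.Date", "Special.Pay", "Tuition",
]


def drop_columns(rows, columns):
    """Return copies of the rows without the listed columns."""
    unwanted = set(columns)
    return [
        {name: value for name, value in row.items() if name not in unwanted}
        for row in rows
    ]
```

The date columns should be dropped only once Days and DepartureMonth are confirmed to be present, since the table's reasoning relies on those fields carrying the temporal information.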
Here are some potential problems I see in Exhibit 1 and possible solutions:
Problem 1: Missing values
Issue: Many cells contain ".", which indicates missing data.
Solution: Impute missing numeric values with the mean or median. Impute missing categorical
values with the mode.
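A minimal sketch of the imputation in Problem 1, assuming a column has been read in as a list of strings with "." marking the missing cells. The median is used for numeric columns here; swapping in the mean would be an equally valid reading of the solution.

```python
from statistics import median, mode

MISSING = "."  # missing-value marker described in Problem 1


def impute(values, numeric):
    """Fill missing cells with the median (numeric) or the mode (categorical)."""
    present = [v for v in values if v != MISSING]
    if numeric:
        fill = median(float(v) for v in present)
        return [fill if v == MISSING else float(v) for v in values]
    fill = mode(present)  # most frequent observed category
    return [fill if v == MISSING else v for v in values]
```

For example, `impute(["10", ".", "30", "20"], numeric=True)` fills the gap with 20.0, the median of the three observed values. A column with no observed values at all would raise an error and is better dropped than imputed.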
Problem 2: Sparse categories
Issue: Columns like "Group.State" and "Travel.Type" have many low-frequency categories.
Solution: Combine categories with few observations into an "Other" level.
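The "Other" level from Problem 2 can be sketched as a frequency count followed by a relabel. The threshold `min_count` is a hypothetical parameter; the right cutoff depends on the size of the full dataset.

```python
from collections import Counter


def lump_rare(values, min_count=2, other_label="Other"):
    """Replace categories seen fewer than min_count times with one shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]
```

The counts should come from the training data only, with the same mapping then applied to the test set, so that the test rows do not influence which levels are kept.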
Problem 3: Temporal data
Issue: Date columns are not in an ideal format for modeling.
Solution: Extract date parts like year, month, and day into separate columns.
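Here is one way the date handling in Problem 3 might look. The month/day/year text format is an assumption, since the write-up does not say how the dates in Exhibit 1 are stored; `fmt` should be adjusted to match the actual data.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumed layout, e.g. "1/14/2011"


def date_parts(text, fmt=DATE_FORMAT):
    """Split a date string into numeric year, month, and day fields."""
    parsed = datetime.strptime(text, fmt)
    return {"year": parsed.year, "month": parsed.month, "day": parsed.day}


def trip_days(departure, returned, fmt=DATE_FORMAT):
    """Number of days between the departure and return dates."""
    start = datetime.strptime(departure, fmt)
    end = datetime.strptime(returned, fmt)
    return (end - start).days
```

A trip length computed this way can also be used to cross-check an existing Days column before the raw date columns are removed.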
Problem 4: Ambiguous categories
Issue: Values like "OTHER" in "Group.State" are ambiguous.
Solution: Confirm meaning and recode into more descriptive categories.
Problem 5: Mixed data types
Issue: Columns contain a mix of numeric, categorical, date, and text data.
Solution: Set the appropriate type (number, nominal, ordinal, date) for each column.
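One way to make the typing in Problem 5 explicit is a small schema that maps each column to a converter. The four columns and the types chosen for them below are illustrative guesses, as is the date format; ordinal columns would need an extra mapping from level to rank that is not shown here.

```python
from datetime import datetime


def parse_date(text):
    """Convert an assumed month/day/year string into a date object."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# Hypothetical schema: column name -> converter for its intended type.
SCHEMA = {
    "Days": int,
    "Tuition": float,
    "Travel.Type": str,
    "Departure.Date": parse_date,
}


def coerce_row(row, schema):
    """Apply the schema to one row; "." stays missing (None)."""
    typed = {}
    for column, value in row.items():
        convert = schema.get(column, str)  # unlisted columns stay as text
        typed[column] = None if value == "." else convert(value)
    return typed
```

Keeping the schema in one place makes it easy to review the type decisions with stakeholders and to reuse them when new data arrives.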
Problem 6: Non-standard column names
Issue: Column names are abbreviated, inconsistent, and hard to interpret.
Solution: Rename columns to be more descriptive.
Problem 7: Small sample size
Issue: Only 5 rows are shown, which makes it hard to assess distributions.
Solution: Profile the full dataset to understand all variables.
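A first pass at the profiling in Problem 7 could simply count missing and distinct values per column. The sketch assumes rows are dictionaries sharing the same keys and that "." marks a missing cell, as noted in Problem 1.

```python
def profile(rows):
    """Summarise each column: row count, missing cells, distinct observed values."""
    report = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        present = [v for v in values if v != "."]
        report[column] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report
```

Columns with a high missing count or a very large number of distinct values are the natural candidates for the imputation and category-combining steps described in Problems 1 and 2.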