Case Write-Up
Praveen Reddy

School

University of Massachusetts, Boston

Course

835

Subject

Industrial Engineering

Date

Jan 9, 2024

Pages

3

Case Write-Up

To build a model that predicts whether a customer will book a trip with Scholastic Travel Company (STC), we can use the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework. Here is a step-by-step guide to approaching this case:

1. Business Understanding:
   o Clarify the objective of the prediction model: to predict whether a customer will book a trip with STC (a yes/no outcome).
   o Identify the key stakeholders and their requirements.
   o Define the success criteria for the prediction model.
2. Data Understanding:
   o Gather the data needed for the analysis, including historical customer data, trip booking data, and any other relevant sources.
   o Explore the data to identify the available variables (features).
   o Assess the quality, completeness, and relevance of the data.
3. Data Preparation:
   o Clean and preprocess the data, handling missing values, outliers, and other data quality issues.
   o Perform feature engineering by creating new features or transforming existing ones.
   o Split the data into a training set and a test/validation set for model evaluation.
4. Modeling:
   o Select an algorithm suited to the problem and the available data. Because the outcome is book or not book, this is a binary classification task.
   o Train the model on the training set.
   o Optimize the model by tuning its hyperparameters or applying feature selection.
   o Validate the model's performance on the test/validation set.
5. Evaluation:
   o Assess the model with appropriate metrics such as accuracy, precision, recall, and F1-score.
   o Compare the model's performance against the success criteria defined in the Business Understanding phase.
   o Interpret the results and explain which factors influence booking decisions.
6. Deployment:
   o Create a plan for deploying the prediction model into production.
   o Document the analysis, including the methodology, data preprocessing steps, model selection, and evaluation results.
   o Prepare the two-page case write-up and submit it along with the SAS Enterprise Guide (EG) project file that contains the analysis.

In the case write-up, include a summary of the case situation, an explanation of the analysis approach using the CRISP-DM framework, and a detailed discussion of the findings and insights from the prediction model. Cite any references appropriately.
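The Data Preparation step ends with splitting the data into training and test sets. The following is a minimal sketch of that split using the Python standard library. It assumes each customer record is a dictionary and that the booking outcome is stored under a hypothetical column named `booked`; the real target column in the STC data may have a different name. The split is stratified by outcome so that both sets keep roughly the same share of bookings. The case does not specify this; it is a choice made here.

```python
import random
from collections import defaultdict

def stratified_split(rows, target, test_frac=0.3, seed=42):
    """Split rows (list of dicts) into train/test, preserving the class mix of `target`."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    train, test = [], []
    for group in by_class.values():
        group = group[:]  # copy so the caller's list is not shuffled in place
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Hypothetical toy data: 10 customers, 4 who booked (1) and 6 who did not (0).
rows = [{"id": i, "booked": 1 if i < 4 else 0} for i in range(10)]
train, test = stratified_split(rows, "booked", test_frac=0.3)
print(len(train), len(test))
```

The 70/30 ratio and the fixed seed are illustrative. Whatever ratio is chosen, the test rows should be set aside before any tuning so that the final evaluation stays honest.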
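The Evaluation step lists accuracy, precision, recall, and F1-score. The sketch below computes all four from actual and predicted outcomes. It treats a booking, coded as 1 (an assumption), as the positive class. The example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary outcome."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # booked, predicted booked
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # not booked, predicted booked
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # booked, predicted not booked
    tn = len(pairs) - tp - fp - fn                                    # not booked, predicted not booked
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical test-set labels: 1 = booked, 0 = did not book.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 1, 0, 0, 1]
print(classification_metrics(actual, predicted))
```

In this example the model gets 5 of 8 cases right (accuracy 0.625), but it finds only half of the actual bookings (recall 0.5). This is why the metrics are worth reporting together. When the model never predicts a booking, the zero-division guards return 0.0; that convention is a choice made in this sketch.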
Note that carrying out this analysis requires access to the case data and to the SAS EG software. Make sure you have the necessary resources before proceeding.

Recommendations:

The following columns from Exhibit 1 could potentially be deleted. The reason for each is given below:

Column Name      Delete  Reason
ID               Yes     Row identifier only; not a useful predictor
From.Grade       Yes     Redundant with GroupGradeTypeLow
To.Grade         Yes     Redundant with GroupGradeTypeHigh
Departure.Date   Yes     Temporal information already captured in Days and DepartureMonth
Return.Date      Yes     Temporal information already captured in Days and DepartureMonth
Deposit.Date     Yes     Unclear predictive value for bookings
Special.Pay      Yes     Many missing values; meaning unclear
Tuition          Yes     Total revenue data is a better alternative

Reasons:
   o Redundant information: columns, such as the grade ranges, whose information is captured elsewhere.
   o Missing data: columns with many missing values.
   o Irrelevant to prediction: columns, such as IDs, that are only identifiers.
   o Insufficient meaning: columns whose predictive value is unclear.
   o Better alternatives: columns such as Tuition, for which total revenue is a better measure.

Here are some potential problems in Exhibit 1 and possible solutions:

Problem 1: Missing values
Issue: Many cells contain ".", which indicates missing data.
Solution: Impute missing numeric values with the mean or median, and impute missing categorical values with the mode.

Problem 2: Sparse categories
Issue: Columns such as "Group.State" and "Travel.Type" have many unique, low-frequency categories.
Solution: Combine categories with few observations into an "Other" level.

Problem 3: Temporal data
Issue: Date columns are not in a format suited to modeling.
Solution: Extract date parts such as year, month, and day into separate columns.

Problem 4: Ambiguous categories
Issue: Values such as "OTHER" in "Group.State" are ambiguous.
Solution: Confirm their meaning and recode them into more descriptive categories.

Problem 5: Mixed data types
Issue: The columns contain a mix of numeric, categorical, date, and text data.
Solution: Set the appropriate type (numeric, nominal, ordinal, or date) for each column.

Problem 6: Non-standard column names
Issue: Column names are abbreviated, inconsistent, and hard to interpret.
Solution: Rename the columns to be more descriptive.

Problem 7: Small sample size
Issue: Exhibit 1 shows only 5 rows, which makes the distributions hard to assess.
Solution: Profile the full dataset to understand all variables.
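The column deletions in the Recommendations table and the imputation proposed for Problem 1 can be sketched together. This sketch assumes the Exhibit 1 data has been read into a list of dictionaries, with values stored as strings and "." marking a missing value. The sample rows are hypothetical. The median and mode choices follow the solution given for Problem 1.

```python
from collections import Counter
from statistics import median

# Columns marked "Yes" for deletion in the Recommendations table.
DROP_COLUMNS = frozenset({
    "ID", "From.Grade", "To.Grade", "Departure.Date", "Return.Date",
    "Deposit.Date", "Special.Pay", "Tuition",
})
MISSING = "."  # missing-value marker seen in Exhibit 1

def drop_columns(rows, columns=DROP_COLUMNS):
    """Return copies of the rows without the listed columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

def impute(rows, numeric_cols, categorical_cols):
    """Fill '.' with the column median (numeric) or mode (categorical)."""
    filled = [dict(row) for row in rows]  # copy so the input is left unchanged
    for col in numeric_cols:
        observed = [float(r[col]) for r in rows if r[col] != MISSING]
        fill = median(observed)
        for r in filled:
            r[col] = fill if r[col] == MISSING else float(r[col])
    for col in categorical_cols:
        observed = [r[col] for r in rows if r[col] != MISSING]
        fill = Counter(observed).most_common(1)[0][0]  # most frequent level
        for r in filled:
            if r[col] == MISSING:
                r[col] = fill
    return filled

# Hypothetical rows in the shape of Exhibit 1 (values invented for illustration).
rows = [
    {"ID": "1", "Days": "5", "Group.State": "CA", "Tuition": "1200"},
    {"ID": "2", "Days": ".", "Group.State": "TX", "Tuition": "900"},
    {"ID": "3", "Days": "3", "Group.State": ".", "Tuition": "."},
    {"ID": "4", "Days": "8", "Group.State": "CA", "Tuition": "1500"},
]
clean = impute(drop_columns(rows), numeric_cols=["Days"], categorical_cols=["Group.State"])
print(clean)
```

In a real workflow, compute the median and mode on the training set only and reuse them for the test set. This keeps test data from influencing the preparation step.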
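For Problem 2, one way to combine sparse categories is to keep only the levels that appear at least a minimum number of times in the training data and map everything else to "Other". The threshold of 2 and the state values below are hypothetical. A real cutoff would come from profiling the full dataset.

```python
from collections import Counter

def frequent_levels(values, min_count):
    """Levels that occur at least `min_count` times in the training values."""
    counts = Counter(values)
    return {level for level, n in counts.items() if n >= min_count}

def collapse_rare(values, keep, other="Other"):
    """Map any level not in `keep` (including unseen levels) to `other`."""
    return [v if v in keep else other for v in values]

# Hypothetical Group.State values from a training split.
train_states = ["CA", "CA", "TX", "NY", "CA", "TX", "VT"]
keep = frequent_levels(train_states, min_count=2)
print(sorted(keep))                                   # ['CA', 'TX']
print(collapse_rare(["CA", "NY", "TX", "WA"], keep))  # ['CA', 'Other', 'TX', 'Other']
```

Fitting the kept levels on the training split also covers categories that appear only in new data, such as "WA" above.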
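For Problem 3, Python's `datetime` module can split date strings into numeric parts. This sketch assumes dates are stored as month/day/year strings (`%m/%d/%Y`); Exhibit 1 does not confirm the actual format, so check it first. The sketch also derives a trip length from the departure and return dates. This is one way a duration feature could be computed, not necessarily how the case data defines Days.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumed format; adjust to the real data

def date_parts(value, fmt=DATE_FORMAT):
    """Split a date string into year, month, day, and weekday (Monday = 0)."""
    if value in (".", ""):  # keep missing dates missing
        return {"year": None, "month": None, "day": None, "weekday": None}
    d = datetime.strptime(value, fmt).date()
    return {"year": d.year, "month": d.month, "day": d.day, "weekday": d.weekday()}

def trip_length(departure, ret, fmt=DATE_FORMAT):
    """Number of days between the departure and return dates."""
    start = datetime.strptime(departure, fmt).date()
    end = datetime.strptime(ret, fmt).date()
    return (end - start).days

print(date_parts("06/15/2011"))                 # {'year': 2011, 'month': 6, 'day': 15, 'weekday': 2}
print(trip_length("06/15/2011", "06/20/2011"))  # 5
```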
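For Problem 7, a reasonable first look at the full dataset is a quick profile of every column: its missing count, number of distinct values, and most frequent value. This minimal standard-library sketch assumes the same list-of-dictionaries layout, with "." as the missing marker. The rows and the "Travel.Type" codes are hypothetical.

```python
from collections import Counter

def profile(rows, missing="."):
    """Summarize each column: missing count, distinct values, most common value."""
    columns = list(rows[0]) if rows else []
    report = {}
    for col in columns:
        values = [row.get(col, missing) for row in rows]
        present = [v for v in values if v != missing]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "top": Counter(present).most_common(1)[0][0] if present else None,
        }
    return report

# Hypothetical rows; the real profile would run over the full dataset.
rows = [
    {"Travel.Type": "A", "Days": "5"},
    {"Travel.Type": "B", "Days": "."},
    {"Travel.Type": "A", "Days": "5"},
]
for col, stats in profile(rows).items():
    print(col, stats)
```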