Homework_3_group_15

School

Northeastern University

Course

7275

Subject

Marketing

Date

Feb 20, 2024

Uploaded by EarlFang5804

Problem 3

The data science team in a genetic testing company has developed a predictive model to identify Type 1 Gaucher disease. From domain knowledge, the prevalence of Type 1 Gaucher disease in the US population is 4%. The model was built on a dataset of 4000 samples, of which 1800 samples were diagnosed as positive. The team partitioned the dataset into 70% training and 30% validation with a stratified sampling technique. The sensitivity and specificity achieved on the validation set are 70% and 90%, respectively.

a. Calculate the adjusted misclassification rate, precision, and recall on the validation set. Comment on the model performance.

Records in data set = 4000
Training data = (70/100) * 4000 = 2800
Validation data = (30/100) * 4000 = 1200

Because the partition is stratified, the validation set keeps the 1800/4000 = 45% positive rate of the full dataset: of its 1200 records, 540 are positive (disease) and 660 are negative (no disease). Applying the sensitivity of 70% and the specificity of 90% gives the validation confusion matrix:

                   Predicted Positive   Predicted Negative
Actual Positive    378                  162
Actual Negative    66                   594

Only 4% of the population has the disease, so positives are oversampled in this dataset. Rescaling the 1200 validation records to 4% prevalence gives 48 actual positives and 1152 actual negatives. Applying the same sensitivity and specificity gives the corrected confusion matrix (rounded to whole records):

                   Predicted Positive   Predicted Negative
Actual Positive    34                   14
Actual Negative    115                  1037

Misclassification rate = (FP + FN) / (TP + TN + FP + FN) = (115 + 14) / (34 + 1037 + 115 + 14) = 129/1200 = 0.108
Recall = TP / (TP + FN) = 34 / (34 + 14) = 34/48 = 0.71 (0.70 before the cell counts are rounded)
Precision = TP / (TP + FP) = 34 / (34 + 115) = 34/149 = 0.23

At the true prevalence the model misclassifies about 11% of records and finds about 70% of actual cases, but only about 23% of its positive predictions are correct, so most flagged individuals are false positives.

b. Recommend another scheme to deal with the unbalanced data for this data science team.

Oversample only the training partition, for example with SMOTE, and leave the validation set at the population's class distribution. The validation metrics then need no prevalence adjustment.
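The prevalence adjustment in part a can be checked with a short Python sketch. The function name `adjusted_metrics` and its parameters are illustrative assumptions rather than part of the original solution, and the sketch works with unrounded cell counts, so its recall and precision differ slightly from figures computed on whole records.

```python
def adjusted_metrics(sensitivity, specificity, prevalence, n):
    """Rebuild the confusion matrix at a given prevalence and derive metrics.

    Illustrative helper (not from the original solution). Cell counts are
    kept as floats, so no rounding to whole records takes place.
    """
    positives = prevalence * n          # actual positives at true prevalence
    negatives = n - positives           # actual negatives
    tp = sensitivity * positives        # true positives
    fn = positives - tp                 # false negatives
    tn = specificity * negatives        # true negatives
    fp = negatives - tn                 # false positives
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "misclassification": (fp + fn) / n,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


result = adjusted_metrics(sensitivity=0.70, specificity=0.90,
                          prevalence=0.04, n=1200)
for name in ("misclassification", "recall", "precision"):
    print(f"{name}: {result[name]:.3f}")
```

Recall equals the sensitivity regardless of prevalence, so only precision and the misclassification rate change when the matrix is rescaled.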

Problem 4

Coursera released a new data mining online seminar. To increase the purchase rate, the marketing team decided to feed the users customized advertisements. The customized ad will cost the company $2 per user. The registration fee for the seminar is $30. For example, the profit for Coursera will be $28 if a user sees the customized ad and purchases the seminar. If a user sees the customized ad but doesn’t purchase it, Coursera will lose $2. The marketing team then sought help from the data science team to maximize the profit. The data science team built a predictive model to classify a user as a potential purchaser or non-potential purchaser based on the user's enrollment history and profile information. The file coursera.xlsx contains the model output on the validation set. Note that 1 indicates that the user registered for the seminar.

a. Build a lift chart of net profit using the validation result.

[Figure: Lift Chart. Series: net profit using predictive model, net profit using base model.]

b. Assume eight seats are remaining in the seminar. Based on the validation result, which eight users should the marketing team feed the advertisement to? How much profit will Coursera gain from these targeted users? And what is the lift of profit compared to the baseline?

[Figure: Decile-Wise Lift Chart. Lift plotted for deciles 1 to 10.]
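The two series in a net-profit lift chart can be computed as in the following Python sketch. The layout of coursera.xlsx is not shown in this document, so the `scores` and `purchased` lists are hypothetical toy data, and the function names are assumptions made for illustration. The $2 ad cost and $30 fee come from the problem statement.

```python
AD_COST = 2.0   # cost of showing the ad to one user (from the problem)
FEE = 30.0      # seminar registration fee (from the problem)


def cumulative_net_profit(scores, purchased):
    """Cumulative net profit when users are targeted in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve, total = [], 0.0
    for i in order:
        # Each targeted user costs AD_COST and brings in FEE only on purchase.
        total += (FEE if purchased[i] else 0.0) - AD_COST
        curve.append(total)
    return curve


def baseline_net_profit(purchased):
    """Expected cumulative net profit when users are targeted at random."""
    n = len(purchased)
    per_user = (FEE * sum(purchased) - AD_COST * n) / n
    return [per_user * (k + 1) for k in range(n)]


# Hypothetical toy data, NOT the contents of coursera.xlsx.
scores = [0.9, 0.2, 0.7, 0.4, 0.8]      # assumed model propensities
purchased = [1, 0, 1, 0, 0]             # 1 = registered for the seminar

model_curve = cumulative_net_profit(scores, purchased)
base_curve = baseline_net_profit(purchased)
print(model_curve)  # [28.0, 26.0, 54.0, 52.0, 50.0]
print(base_curve)   # [10.0, 20.0, 30.0, 40.0, 50.0]
```

Both curves end at the same value because targeting every user yields the same total profit whatever the order; the model's advantage shows in how far its curve sits above the baseline before that point.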

If the Coursera marketing team shows the advertisement to the top 8 users ranked by the predictive model, the profit is $176, compared with a baseline profit of $106, which is a lift of 1.67. If the team instead skips the top 12 users (30%) and shows the advertisement to the next 8 users, the profit is $240, a lift of 2.27.
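The lift figures above are ratios of model profit to baseline profit for the same number of targeted users. A minimal Python sketch of that calculation follows. The function name `top_k_profit_and_lift` and the sample data are hypothetical and do not reproduce coursera.xlsx, and the baseline is assumed to be the expected profit from targeting k users at random.

```python
def top_k_profit_and_lift(scores, purchased, k, fee=30.0, ad_cost=2.0):
    """Profit from targeting the k highest-scored users, and lift vs. random.

    Illustrative helper. The baseline is assumed to be the expected profit
    of showing the ad to k users picked at random from the validation set.
    """
    n = len(scores)
    ranked = sorted(zip(scores, purchased), key=lambda pair: pair[0], reverse=True)
    buyers = sum(p for _, p in ranked[:k])
    model_profit = fee * buyers - ad_cost * k
    baseline_profit = k * (fee * sum(purchased) - ad_cost * n) / n
    # Lift is undefined if the baseline profit is zero; this sketch does not guard it.
    return model_profit, baseline_profit, model_profit / baseline_profit


# Hypothetical toy data, NOT the contents of coursera.xlsx.
scores = [0.95, 0.9, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
purchased = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

profit, baseline, lift = top_k_profit_and_lift(scores, purchased, k=4)
print(profit, baseline, round(lift, 2))
```

With k users targeted, the profit is 30 times the number of buyers minus 2k, so it cannot exceed 28k; that bound is a quick sanity check on any reported figure.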
