PROJECT DOC

School: University of Nebraska, Lincoln
Course: 451
Subject: Statistics
Date: Feb 20, 2024

TERM PROJECT - SCMA 451 PREDICTIVE ANALYTICS (Due on May 12th, 2023)

Aims: Data analysis, transformation, model development, assessment, and prediction.

The following deliverables will be submitted as part of this project:

1. Written report in a Word document: This report will include your answers to the questions with the appropriate data analysis and model output. (Organization of the report: 10 points)
2. 8-10 slide PowerPoint presentation: This presentation should present and summarize the project steps and the questions answered in the report. Discuss with your group how to organize the presentation and what to include to make your point. (Organization of the presentation: 10 points)
3. R script file with its compiled report.

Be sure to submit the project report (PDF file) and R code to Canvas by the project deadline. You must write up your course project results in a professional report, which should be no more than 15 double-spaced pages long. The report should include substantive details of your analysis, and it should have several sections (e.g., Introduction, Analysis, Results, Conclusions). The report should provide sufficient detail so that anyone with a reasonable statistical background can understand exactly what you have done. You should consider using tables and figures to enhance your report. Grading will consider the quality of your report, including adherence to the report guidelines stated; clarity of writing; organization and layout; appropriate use of tables and figures; and careful proofreading to minimize typos, spelling mistakes, and grammatical errors.

TERM PROJECT DESCRIPTION

MidWest University Foundation (MWUF) wishes to improve the cost-effectiveness of its direct marketing campaigns to previous donors. According to its recent mailing records, the typical overall response rate is 10%. Among those who respond (donate) to the mailing, the average donation is $14.50.
Each mailing costs $2.00 to produce and send; the mailing includes a gift of personalized address labels and an assortment of cards and envelopes. It is not cost-effective to mail everyone because the expected profit from each mailing is $14.50 × 0.10 − $2.00 = −$0.55. We would like to develop a classification model, using data from the most recent campaign, that can effectively capture likely donors so that the expected net profit is maximized. We would also like to build a prediction model to predict expected gift amounts from donors; the data for this will consist of the records for donors only. The data are available in the file “MWUF.csv” in Canvas.
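
The expected-profit arithmetic in the description can be checked with a short sketch. It is written in Python purely for illustration (the project itself calls for R), and the function names are hypothetical. The break-even response rate is not stated in the description; it follows from setting the expected profit of a mailing to zero.

```python
# Figures taken from the project description.
AVG_DONATION = 14.50   # average donation among responders, in dollars
MAILING_COST = 2.00    # cost to produce and send one mailing, in dollars
RESPONSE_RATE = 0.10   # typical overall response rate


def expected_profit_per_mailing(response_rate, avg_donation=AVG_DONATION, cost=MAILING_COST):
    """Expected net profit of one mailing: P(donate) * average gift - mailing cost."""
    return response_rate * avg_donation - cost


def break_even_response_rate(avg_donation=AVG_DONATION, cost=MAILING_COST):
    """Response rate at which the expected profit of one mailing is exactly zero."""
    return cost / avg_donation


if __name__ == "__main__":
    print(f"Expected profit when mailing everyone: {expected_profit_per_mailing(RESPONSE_RATE):.2f}")
    print(f"Break-even response rate: {break_even_response_rate():.3f}")
```

Under these figures, a mailing pays for itself only when the recipient's probability of donating exceeds roughly 13.8%, which is why the classification model needs to concentrate mailings on likely donors.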
COURSE PROJECT REQUIREMENTS

1. Discuss with your group how the CRISP-DM process would apply to this project. Explain the project goals and, in 2-3 sentences each, how every step applies to this project.
2. Check whether there are any missing values in the dataset provided. If there are, discuss with your group how you would like to process the data and move forward with the analysis.
3. Conduct exploratory data analysis on the dataset prior to building classification and prediction models.
   a. Look at the correlations between donation amount (DAMT) and the potential input variables for predicting DAMT, and present these correlations in a table.
   b. Use appropriate data visualization tools to explore the relationships between DONR and the potential input variables for predicting it (do not include more than 5 visualizations).
4. For predictive modeling purposes, form a data frame in RStudio and make sure all categorical variables are coded as factors. Discuss whether you need to make any other data transformations for this project.
5. Develop the following classification models for predicting the DONR variable using any of the variables as predictors (do not include the DAMT, REG1, REG2, REG3, and REG4 variables). Use seed 123 and a 70-30% data partitioning ratio for all models to train and test the models' predictive performance.
   a. Logistic regression model (LogR1). Which variables are statistically significant? State the threshold value you use. Plot the ROC curve and state the AUC statistic.
   b. Run feature selection on LogR1 and call the resulting model LogR2. Explain the method you used and which variables are in the final model. Plot the ROC curve and state the AUC statistic.
   c. A decision tree classification model (DT). Explain which variables are used in the model. Extract the rules from DT and state them. Plot the ROC curve and state the AUC statistic.
   d. A neural network model with 20 hidden nodes (ANN1). Plot the ROC curve and state the AUC statistic.
   e. A neural network model with 100 hidden nodes (ANN2). Plot the ROC curve and state the AUC statistic.
6. Fill in the following table for the models developed. (10)

   MODEL    ACCURACY    PPV    AUC    F1-Score
   LogR1
   LogR2
   DT
   ANN1
   ANN2

7. Which model would you suggest MWUF use for targeting donors, based on the statistics in the table above? Discuss which model would maximize the profit.
8. Develop a prediction model for the DAMT variable using any of the variables as predictors (this time excluding the DONR, REG1, REG2, REG3, and REG4 variables). Fit the following models:
   a. Ordinary least squares regression (LR1)
   b. Stepwise variable selection using the training data. Evaluate the fitted models using the test data. Use “mean percent error” as the evaluation criterion, and use your final selected prediction model to predict DAMT responses in the test dataset.
9. Use the model you identified in part 7 to make predictions for the records in the MWUF_new.csv file and identify who would make donations.
10. Using the best model from part 8 and the predictions from part 9, predict how much each predicted donor would donate. Present the process you used to arrive at these predictions; you can use a flowchart to explain it.
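
The statistics named in requirements 6 to 8 can be sketched as follows. This is an illustration in Python (the project itself calls for R) with hypothetical function names, and it rests on two assumptions that are not spelled out in the requirements: that donors (DONR = 1) are the positive class, and that “mean percent error” means the signed average of (actual − predicted) / actual, expressed in percent. If the absolute version is intended, take the absolute value of each term instead.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, PPV (precision), recall and F1 from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives, with donor as the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0
    return {"accuracy": accuracy, "ppv": ppv, "recall": recall, "f1": f1}


def profit_per_targeted_mailing(ppv, avg_donation=14.50, cost=2.00):
    """Expected profit per mailing when only predicted donors are mailed.

    Assumes the share of true donors among those mailed equals the PPV and
    that targeted donors give the overall average donation.
    """
    return ppv * avg_donation - cost


def mean_percent_error(actual, predicted):
    """Signed mean percent error, in percent (assumed definition).

    Every actual value must be non-zero, which holds for donor-only DAMT data.
    """
    errors = [(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up counts and amounts, for illustration only.
    print(classification_metrics(tp=40, fp=60, tn=80, fn=20))
    print(profit_per_targeted_mailing(0.40))
    print(mean_percent_error([10.0, 20.0], [9.0, 21.0]))
```

The link between PPV and profit holds only if the donor share in the test set matches the response rate of the population that will actually be mailed; if it does not, the PPV should be adjusted before it is plugged into the profit formula.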
DATA DICTIONARY

NAME | DATA TYPE | DESCRIPTION | INDICATION
ID NUMBER | NUMERICAL | Unique ID Number |
REG1, REG2, REG3, REG4 | INDICATOR | Region | A “1” indicates the potential donor belongs to this region
HOME | INDICATOR | Homeowner Status | 1 = homeowner, 0 = not a homeowner
CHLD | NUMERICAL | Number of Children | Number (1, 2, 3, …)
HINC | CATEGORICAL | Household Income | 7 categories
GENF | INDICATOR | Gender | 0 = Male, 1 = Female
WRAT | INDICATOR | Wealth Rating (uses median family income and population statistics from each area to index relative wealth within each state) | Segments denoted 0-9 (9 being the highest wealth group and 0 being the lowest)
AVHV | NUMERICAL | Average Home Value in potential donor's neighborhood | in $ thousands
INCM | NUMERICAL | Median Family Income in potential donor's neighborhood | in $ thousands
INCA | NUMERICAL | Average Family Income in potential donor's neighborhood | in $ thousands
PLOW | NUMERICAL | % categorized “low income” in potential donor's neighborhood | Percentage (%)
NPRO | NUMERICAL | Lifetime number of promotions received to date | Number (1, 2, 3, …)
TGIF | NUMERICAL | Dollar amount of lifetime gifts to date | in dollars ($)
LGIF | NUMERICAL | Dollar amount of largest gift to date | in dollars ($)
RGIF | NUMERICAL | Dollar amount of most recent gift | in dollars ($)
TDON | NUMERICAL | Number of months since last donation | Number of months
TLAG | NUMERICAL | Number of months between first and second gift | Number of months
AGIF | NUMERICAL | Average dollar amount of gifts to date | in dollars ($)
DONR | INDICATOR | Classification Response Variable | 1 = donor, 0 = non-donor
DAMT | NUMERICAL | Prediction Response Variable | Donation amount in $
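
Combining the data dictionary with the exclusions in requirements 5 and 8 gives the candidate predictor list and the columns to code as factors. The sketch below is in Python for illustration only (the project itself calls for R); the helper names are hypothetical, and the identifier column is left out because it is not a predictor. Treating every INDICATOR or CATEGORICAL variable as a factor is an assumption: WRAT, for example, has ten ordered segments and could reasonably be handled as numeric instead.

```python
# Variable name -> data type, transcribed from the data dictionary
# (identifier column omitted).
DATA_TYPES = {
    "REG1": "INDICATOR", "REG2": "INDICATOR", "REG3": "INDICATOR", "REG4": "INDICATOR",
    "HOME": "INDICATOR", "CHLD": "NUMERICAL", "HINC": "CATEGORICAL", "GENF": "INDICATOR",
    "WRAT": "INDICATOR", "AVHV": "NUMERICAL", "INCM": "NUMERICAL", "INCA": "NUMERICAL",
    "PLOW": "NUMERICAL", "NPRO": "NUMERICAL", "TGIF": "NUMERICAL", "LGIF": "NUMERICAL",
    "RGIF": "NUMERICAL", "TDON": "NUMERICAL", "TLAG": "NUMERICAL", "AGIF": "NUMERICAL",
    "DONR": "INDICATOR", "DAMT": "NUMERICAL",
}

REGION_COLUMNS = {"REG1", "REG2", "REG3", "REG4"}  # excluded by requirements 5 and 8
RESPONSE_COLUMNS = {"DONR", "DAMT"}                # each is excluded when modeling the other


def factor_columns(data_types=DATA_TYPES):
    """Columns to code as factors: indicator and categorical variables (assumed rule)."""
    return [name for name, kind in data_types.items()
            if kind in ("INDICATOR", "CATEGORICAL")]


def predictor_columns(data_types=DATA_TYPES):
    """Candidate predictors after dropping the region and response columns."""
    excluded = REGION_COLUMNS | RESPONSE_COLUMNS
    return [name for name in data_types if name not in excluded]


if __name__ == "__main__":
    print(factor_columns())
    print(predictor_columns())
```

Because the classification task excludes DAMT and the prediction task excludes DONR, both tasks end up with the same 16 candidate predictors.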