TERM PROJECT - SCMA 451 PREDICTIVE ANALYTICS
(Due on May 12th, 2023)
Aims: Data analysis, transformation, model development, assessment, and prediction. The following deliverables will be submitted as part of this project:
1. Written report in a Word document: This report will include your answers to the questions with the appropriate data analysis and model output. (Organization of the report: 10 points)
2. PowerPoint presentation of 8-10 slides: This presentation should present and summarize the project steps and the questions answered in the report. Discuss with your group how to organize it and what to include to make your point. (Organization of the presentation: 10 points)
3. R script file with its compiled report.
Be sure to submit the project report (PDF file) and R code to Canvas by the project deadline.
You must write up your course project results in a professional report, which should be no more than 15 double-spaced pages long. The report should include substantive details of your analysis, and it should have several sections (e.g., Introduction, Analysis, Results, Conclusions). The report should provide sufficient detail so that anyone with a reasonable statistical background can understand exactly what you have done. You should consider using tables and figures to enhance your report. Grading will consider the quality of your report, including adherence to the stated report guidelines; clarity of writing; organization and layout; appropriate use of tables and figures; and careful proofreading to minimize typos, misspellings, and grammatical errors.
TERM PROJECT DESCRIPTION
MidWest University Foundation (MWUF) wishes to improve the cost-effectiveness of its direct marketing campaigns to previous donors. According to its recent mailing records, the typical overall response rate is 10%. Among those who respond (donate) to the mailing, the average donation is $14.50. Each mailing costs $2.00 to produce and send; the mailing includes a gift of personalized address labels and an assortment of cards and envelopes. It is not cost-effective to mail everyone because the expected profit from each mailing is $14.50 × 0.10 − $2.00 = −$0.55.
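The expected-profit arithmetic above can be checked with a few lines of code. This is a minimal sketch in Python for illustration only (the project itself is to be done in R); the function names are hypothetical, and the break-even response rate is a derived quantity that the description does not state explicitly.

```python
def expected_profit_per_mailing(response_rate, avg_donation, cost_per_mailing):
    """Expected net profit of one mailing: P(respond) * average gift - cost."""
    return response_rate * avg_donation - cost_per_mailing


def breakeven_response_rate(avg_donation, cost_per_mailing):
    """Response rate at which the expected profit of a mailing is exactly zero."""
    return cost_per_mailing / avg_donation


# Figures taken from the project description.
profit_all = expected_profit_per_mailing(0.10, 14.50, 2.00)
breakeven = breakeven_response_rate(14.50, 2.00)

print(f"Expected profit when mailing everyone: {profit_all:.2f}")
print(f"Break-even response rate: {breakeven:.4f}")
```

Under these figures, a mailing only pays for itself on average when the recipient's chance of donating exceeds $2.00 / $14.50, which is why a model that concentrates mailings on likely donors can turn the campaign profitable.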
We would like to develop a classification model using data from the most recent campaign that can effectively capture likely donors so that the expected net profit is maximized. We would also like to build a prediction model to predict expected gift amounts from donors; the data for this will consist of the records for donors only. The data are available in the file “MWUF.csv” (available in Canvas).
COURSE PROJECT REQUIREMENTS
1. Discuss with your group how the CRISP-DM process would apply to this project. Explain the project goals and how each step applies to this project in 2-3 sentences.
2. Check whether there are any missing values in the dataset provided. If there are, discuss with your group how you would like to process the data and move forward with the data analysis.
3. Conduct exploratory data analysis on the data set prior to building classification and prediction models.
   a. Look at the correlations between donation amount (DAMT) and the potential input variables for predicting DAMT, and present these correlations in a table.
   b. Use appropriate data visualization tools to explore the relationships between DONR and the potential input variables for predicting it (do not include more than 5 visualizations).
4. For predictive modeling purposes, form a data frame in RStudio and make sure all categorical variables are coded as factors. Discuss whether you need to make any other data transformations for this project.
5. Develop the following classification models for predicting the DONR variable using any of the variables as predictors (do not include the DAMT, REG1, REG2, REG3, and REG4 variables). Use seed 123 and a 70-30% data partitioning ratio for all models to train and test the models’ predictive performance.
   a. Logistic regression model (LogR1). Which variables are statistically significant? State the threshold value you use. Plot the ROC curve and state the AUC statistic.
   b. Run feature selection on LogR1 and call the result LogR2. Explain the method you used and which variables are in the final model. Plot the ROC curve and state the AUC statistic.
   c. A decision tree classification model (DT). Explain which variables are used in the model. Extract the rules from DT and state them. Plot the ROC curve and state the AUC statistic.
   d. A neural network model with 20 hidden nodes (ANN1). Plot the ROC curve and state the AUC statistic.
   e. A neural network model with 100 hidden nodes (ANN2). Plot the ROC curve and state the AUC statistic.
6. Fill in the following table for the models developed. (10)
   MODEL | ACCURACY | PPV | AUC | F1-Score
   LogR1 |          |     |     |
   LogR2 |          |     |     |
   DT    |          |     |     |
   ANN1  |          |     |     |
   ANN2  |          |     |     |
7. Which model would you suggest MWUF use for targeting donors, based on the statistics in the table above? Discuss which model would maximize the profit.
8. Develop a prediction model for the DAMT variable using any of the variables as predictors (this time excluding the DONR, REG1, REG2, REG3, and REG4 variables). Fit the following models:
   a. Ordinary least squares regression (LR1)
   b. Stepwise variable selection using the training data. Evaluate the fitted models using the test data. Use “mean percent error” as the evaluation criterion, and use your final selected prediction model to predict DAMT responses in the test dataset.
9. Use the model you identified in part 7 to make predictions for the records in the MWUF_new.csv file and identify who would make donations.
10. Using the best model from part 8 and the predictions from part 9, predict how much each predicted donor would donate. Present the process you use to come up with these predictions. You can use a flowchart to explain it.
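The two-stage process described in parts 9 and 10 (first classify, then predict an amount only for predicted donors) can be outlined as follows. This is a rough sketch in Python for illustration, not the required R workflow. The function names, the 0.5 cutoff, the toy records, and the two stand-in "models" are all hypothetical placeholders rather than fitted results, and the net-profit formula (probability × predicted amount − mailing cost) is one possible way to summarize the outcome, not one prescribed by the assignment.

```python
MAILING_COST = 2.00  # cost per mailing, from the project description


def score_prospects(records, donor_probability, predict_amount, cutoff=0.5):
    """Stage 1 flags likely donors; stage 2 predicts DAMT only for flagged records."""
    scored = []
    for rec in records:
        prob = donor_probability(rec)          # stage 1: classification model
        flagged = prob >= cutoff               # cutoff is an assumption, tune as needed
        amount = predict_amount(rec) if flagged else 0.0  # stage 2: regression model
        scored.append({
            "ID": rec["ID"],
            "prob": prob,
            "pred_DONR": int(flagged),
            "pred_DAMT": amount,
        })
    return scored


# Hypothetical stand-ins for the fitted models (NOT real coefficients).
def toy_donor_probability(rec):
    return 0.8 if rec["HOME"] == 1 else 0.1


def toy_predict_amount(rec):
    return 10.0 + 0.5 * rec["RGIF"]


# Made-up records using column names from the data dictionary.
new_records = [
    {"ID": 1, "HOME": 1, "RGIF": 20.0},
    {"ID": 2, "HOME": 0, "RGIF": 15.0},
]

scored = score_prospects(new_records, toy_donor_probability, toy_predict_amount)
mailed = [r for r in scored if r["pred_DONR"] == 1]
expected_net = sum(r["prob"] * r["pred_DAMT"] - MAILING_COST for r in mailed)

for row in scored:
    print(row)
print("Expected net profit over mailed records:", expected_net)
```

In the actual project, the two stand-in functions would be replaced by predictions from the classification model chosen in part 7 and the regression model chosen in part 8.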
DATA DICTIONARY
NAME | DATA TYPE | DESCRIPTION | INDICATION
ID NUMBER | NUMERICAL | Unique ID | Number
REG1, REG2, REG3, REG4 | INDICATOR | Region | A “1” indicates the potential donor belongs to this region
HOME | INDICATOR | Homeowner Status | 1 = homeowner, 0 = not a homeowner
CHLD | NUMERICAL | Number of Children | Number (1, 2, 3, …)
HINC | CATEGORICAL | Household Income | 7 categories
GENF | INDICATOR | Gender | 0 = Male, 1 = Female
WRAT | INDICATOR | Wealth Rating (uses median family income and population statistics from each area to index relative wealth within each state) | Segments denoted 0-9 (9 being the highest wealth group and 0 being the lowest)
AVHV | NUMERICAL | Average Home Value in potential donor's neighborhood | in $ thousands
INCM | NUMERICAL | Median Family Income in potential donor's neighborhood | in $ thousands
INCA | NUMERICAL | Average Family Income in potential donor's neighborhood | in $ thousands
PLOW | NUMERICAL | % categorized “low income” in potential donor's neighborhood | Percentage (%)
NPRO | NUMERICAL | Lifetime number of promotions received to date | Number (1, 2, 3, …)
TGIF | NUMERICAL | Dollar amount of lifetime gifts to date | in dollars ($)
LGIF | NUMERICAL | Dollar amount of largest gift to date | in dollars ($)
RGIF | NUMERICAL | Dollar amount of most recent gift | in dollars ($)
TDON | NUMERICAL | Number of months since last donation | Number of months
TLAG | NUMERICAL | Number of months between first and second gift | Number of months
AGIF | NUMERICAL | Average dollar amount of gifts to date | in dollars ($)
DONR | INDICATOR | Classification Response Variable | 1 = donor, 0 = non-donor
DAMT | NUMERICAL | Prediction Response Variable | Donation amount in dollars ($)
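The two response variables at the end of the dictionary map onto the evaluation measures named in the requirements: accuracy, PPV, and F1-score for DONR, and "mean percent error" for DAMT. The sketch below, written in Python purely for illustration, shows how these could be computed from actual and predicted values. The function names and the toy vectors are made up. The signed definition of mean percent error used here, 100 × mean((actual − predicted) / actual), is an assumption; check the definition used in the course. AUC is left out because it requires predicted probabilities rather than class labels.

```python
def classification_metrics(actual, predicted):
    """Accuracy, PPV (precision), and F1 for 0/1 labels, with 1 = donor."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan,
    }


def mean_percent_error(actual, predicted):
    """Signed mean percent error (assumed definition); actual values must be nonzero."""
    errors = [(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Made-up toy values, only to show the calls.
donr_actual = [1, 0, 1, 1, 0, 0]
donr_pred = [1, 0, 0, 1, 1, 0]
metrics = classification_metrics(donr_actual, donr_pred)

damt_actual = [10.0, 20.0, 40.0]
damt_pred = [9.0, 22.0, 30.0]
mpe = mean_percent_error(damt_actual, damt_pred)

print(metrics)
print(f"Mean percent error: {mpe:.2f}%")
```

Because the DAMT model is built on donor records only, the actual amounts are positive and the division in the percent error is well defined.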