DecisionTree RandomForest_Medhavi
School: University of Illinois, Urbana Champaign
Course: 557
Subject: Industrial Engineering
Date: Apr 3, 2024
Decision Tree Exercise
1.
Use a graph (ggplot) to show the distribution of LIFETIME_GIFT_COUNT
Code:
library(ggplot2)
# Histogram of lifetime gift counts, one bar per count value
ggplot(X15_donor_exercise, aes(x = LIFETIME_GIFT_COUNT)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black") +
  theme_minimal() +
  labs(title = "Distribution of Lifetime Gift Count",
       x = "Lifetime Gift Count",
       y = "Frequency")
Output:
2.
Use a graph (ggplot) to show relationship between MEDIAN_HOME_VALUE and LIFETIME_GIFT_COUNT
Code:
# Strip currency symbols and thousands separators, keeping digits and any decimal point
X15_donor_exercise$MEDIAN_HOME_VALUE <- gsub("[^0-9.]", "", X15_donor_exercise$MEDIAN_HOME_VALUE)
X15_donor_exercise$MEDIAN_HOME_VALUE <- as.numeric(X15_donor_exercise$MEDIAN_HOME_VALUE)
# Scatter plot; alpha = 0.5 makes overlapping points visible
ggplot(X15_donor_exercise, aes(x = MEDIAN_HOME_VALUE, y = LIFETIME_GIFT_COUNT)) +
  geom_point(alpha = 0.5) +
  theme_minimal() +
  labs(title = "Relationship between Median Home Value and Lifetime Gift Count",
       x = "Median Home Value",
       y = "Lifetime Gift Count")
Output:
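The cleaning step for MEDIAN_HOME_VALUE depends on which characters the gsub() pattern keeps. A small check on made-up strings (hypothetical values, not taken from the donor file) shows how a pattern that drops the decimal point changes the magnitude of a value, while a pattern that keeps it does not:

```r
# Hypothetical raw values; the real column format is an assumption here.
raw <- c("$1,234", "567", "$89.50")

# Keep digits only: the decimal point is removed as well.
digits_only <- as.numeric(gsub("[^0-9]", "", raw))
print(digits_only)   # 1234  567 8950

# Keep digits and the decimal point.
with_decimal <- as.numeric(gsub("[^0-9.]", "", raw))
print(with_decimal)  # 1234.0  567.0   89.5
```

If the column only contains whole-dollar amounts, both patterns give the same result.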
3.
Build a decision tree to predict if an individual should be targeted as a donor
Code:
library(rpart)
library(rpart.plot)
library(caret)
X15_donor_exercise$TARGET_B <- as.factor(X15_donor_exercise$TARGET_B)
set.seed(50)
index <- createDataPartition(X15_donor_exercise$TARGET_B, p = 0.8, list = FALSE)
trainData <- X15_donor_exercise[index,]
testData <- X15_donor_exercise[-index,]
model <- rpart(TARGET_B ~ ., data = trainData, method = "class")
rpart.plot(model, type = 4, extra = 102)
Output:
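For a factor outcome, createDataPartition() draws the training rows within each class, so the class proportions of TARGET_B are roughly the same in the training and testing sets. The idea can be sketched in base R on toy labels (hypothetical data; this illustrates stratified sampling, not caret's actual implementation):

```r
# Toy labels standing in for TARGET_B (hypothetical, not the donor file).
set.seed(50)
y <- factor(c(rep("NO", 60), rep("YES", 40)))

# Sample 80% of the row indices within each class, then combine them.
idx_by_class <- split(seq_along(y), y)
train_idx <- sort(unlist(lapply(idx_by_class, function(i) {
  i[sample.int(length(i), size = round(0.8 * length(i)))]
}), use.names = FALSE))

train_y <- y[train_idx]
test_y  <- y[-train_idx]

# Class proportions are preserved in both pieces (60% NO, 40% YES).
print(prop.table(table(train_y)))
print(prop.table(table(test_y)))
```

A plain sample() over all rows would also give an 80/20 split, but the class mix in the smaller test set could drift from the full data.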
4.
(predict variable: TARGET_B; set.seed(50)). Use 80% of the data in the training set.
Code:
X15_donor_exercise$TARGET_B <- as.factor(X15_donor_exercise$TARGET_B)
set.seed(50)
index <- createDataPartition(X15_donor_exercise$TARGET_B, p = 0.8, list = FALSE)
trainData <- X15_donor_exercise[index,]
testData <- X15_donor_exercise[-index,]
model <- rpart(TARGET_B ~ ., data = trainData, method = "class")
rpart.plot(model, type = 4, extra = 102)
predictions <- predict(model, testData, type = "class")
confusionMatrix(predictions, testData$TARGET_B)
Output:
Confusion Matrix and Statistics

          Reference
Prediction   NO  YES
       NO  1157  723
       YES  114  245

               Accuracy : 0.6262
                 95% CI : (0.6058, 0.6463)
    No Information Rate : 0.5677
    P-Value [Acc > NIR] : 1.055e-08

                  Kappa : 0.1767

 Mcnemar's Test P-Value : < 2.2e-16

            Sensitivity : 0.9103
            Specificity : 0.2531
         Pos Pred Value : 0.6154
         Neg Pred Value : 0.6825
             Prevalence : 0.5677
         Detection Rate : 0.5167
   Detection Prevalence : 0.8397
      Balanced Accuracy : 0.5817

       'Positive' Class : NO
5.
Discuss the accuracy (sensitivity and specificity) of the prediction (with testing set)
Accuracy is the share of test-set predictions that are correct overall. An accuracy of 62.62% means the model classified about 63% of individuals correctly, which is only modestly better than the no-information rate of 56.77% (always predicting NO).
Because the positive class is NO, sensitivity measures how well the model identifies people who should not be targeted for donations. A sensitivity of 91.03% means 1157 of the 1271 actual NO cases were correctly predicted as NO.
Specificity, on the other hand, measures how well the model identifies people who should be targeted. A specificity of 25.31% means only 245 of the 968 actual YES cases were predicted as YES, so the model misses roughly three out of four true potential donors.
So, the model is good at avoiding wasted resources on people who won't donate (NO class) because its sensitivity is high. However, it is poor at spotting potential donors (YES class) because its specificity is low. To make the model better, we need to improve its ability to find true potential donors while keeping sensitivity high. This would help the charity use its resources more effectively.
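The reported statistics can be recomputed by hand from the four cells of the confusion matrix. The sketch below uses base R only and the counts from the test-set output, with NO treated as the positive class as in that output:

```r
# Confusion matrix from the test-set output (rows = prediction, columns = reference).
cm <- matrix(c(1157, 114, 723, 245), nrow = 2,
             dimnames = list(Prediction = c("NO", "YES"),
                             Reference  = c("NO", "YES")))

accuracy    <- sum(diag(cm)) / sum(cm)               # correct predictions / all cases
sensitivity <- cm["NO", "NO"] / sum(cm[, "NO"])      # actual NO predicted as NO
specificity <- cm["YES", "YES"] / sum(cm[, "YES"])   # actual YES predicted as YES
balanced    <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced = balanced), 4))
# accuracy 0.6262, sensitivity 0.9103, specificity 0.2531, balanced 0.5817
```

The gap between accuracy and balanced accuracy reflects how unevenly the two classes are predicted.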
6.
Recommend promotion strategies (to promote donations) for the company.
Targeted Outreach Based on Predictive Model Insights:
Use decision trees to find traits of past or potential donors. Customize outreach to match these traits, increasing appeal and likelihood of donation.
Enhance Personalization:
Personalize communication based on donor behavior and preferences. Use data analytics to understand past donation patterns, communication preferences, and interests to tailor your messages.
Leverage Social Proof and Community Building:
Share stories of how donations have made an impact, including testimonials from beneficiaries and donors. This not only provides social proof but also helps potential donors feel connected to the cause.
Optimize Digital Presence:
Use social media and email marketing to reach potential donors, incorporating A/B testing to refine your messaging and improve engagement rates.
Offer Recognition and Incentives:
Recognize donations through donor walls, personalized thank-you notes, or public acknowledgments (with permission). Recognition can motivate further donations.
7.
Build a random forest using the dataset, and report the accuracy. Use set.seed(50), ntree=500, mtry=5. (Please check whether there are any character variables; if there are, change them into factors.)
Code:
library(randomForest)
library(caret)   # createDataPartition()
library(dplyr)
# Convert character columns to factors
char_columns <- sapply(X15_donor_exercise, is.character)
X15_donor_exercise[char_columns] <- lapply(X15_donor_exercise[char_columns], as.factor)
numeric_columns <- sapply(X15_donor_exercise, is.numeric)
factor_columns <- sapply(X15_donor_exercise, is.factor)
# Impute missing numeric values with the column median
X15_donor_exercise[numeric_columns] <- lapply(X15_donor_exercise[numeric_columns], function(x) ifelse(is.na(x), median(x, na.rm = TRUE), x))
# Impute missing factor values with the most frequent level.
# Indexed assignment keeps the column a factor; ifelse() would return integer codes.
X15_donor_exercise[factor_columns] <- lapply(X15_donor_exercise[factor_columns], function(x) {
  x[is.na(x)] <- names(which.max(table(x)))
  x
})
X15_donor_exercise$TARGET_B <- as.factor(X15_donor_exercise$TARGET_B)
# Stratified 80/20 split
set.seed(50)
index <- createDataPartition(X15_donor_exercise$TARGET_B, p = 0.8, list = FALSE)
trainData <- X15_donor_exercise[index,]
testData <- X15_donor_exercise[-index,]
# Random forest with 500 trees and 5 candidate variables per split
rf_model <- randomForest(TARGET_B ~ ., data = trainData, ntree = 500, mtry = 5)
predictions <- predict(rf_model, testData)
# Test-set accuracy: share of predictions that match the true label
accuracy <- sum(predictions == testData$TARGET_B) / nrow(testData)
print(accuracy)
Output:
> # Calculate accuracy
> accuracy <- sum(predictions == testData$TARGET_B) / nrow(testData)
> print(accuracy)
[1] 0.6552032
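One point worth checking in the preprocessing for the random forest is that factor columns are still factors after missing values are filled in, since ifelse() drops attributes and returns a factor's integer codes. A minimal sketch on a toy data frame (hypothetical values; the helper names impute_median and impute_mode are illustrative, not from the exercise):

```r
# Toy data frame (hypothetical values, not the donor file).
df <- data.frame(
  group = c("a", "b", NA, "b"),
  value = c(1, NA, 3, 5),
  stringsAsFactors = FALSE
)

# Convert character columns to factors.
is_chr <- sapply(df, is.character)
df[is_chr] <- lapply(df[is_chr], as.factor)

# Replace missing numeric values with the column median.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Replace missing factor values with the most frequent level.
# Indexed assignment keeps the factor class and its levels.
impute_mode <- function(x) {
  x[is.na(x)] <- names(which.max(table(x)))
  x
}

is_num <- sapply(df, is.numeric)
is_fac <- sapply(df, is.factor)
df[is_num] <- lapply(df[is_num], impute_median)
df[is_fac] <- lapply(df[is_fac], impute_mode)

str(df)  # group stays a factor with levels "a" and "b"; no NA values remain
```

Running str() on the real data frame after imputation gives the same kind of check before the model is fitted.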
Variable: Description
CARD_PROM_12: number of card promotions sent to the individual by the charitable organization in the past 12 months
DONOR_AGE: age as of last year's mail solicitation
INCOME_GROUP: one of 7 possible income level groups based on a number of demographic characteristics
LIFETIME_CARD_PROM: total number of card promotions sent to the individual by the charitable organization
LIFETIME_GIFT_COUNT: total number of donations from the individual to the charitable organization
MEDIAN_HOME_VALUE: median home value (in $100) as determined by other input variables
MEDIAN_HOUSEHOLD_INCOME: median household income (in $100) as determined by other input variables
MONTHS_SINCE_FIRST_GIFT: number of months since the first donation from the individual to the charitable organization
MONTHS_SINCE_LAST_GIFT: number of months since the most recent donation from the individual to the charitable organization
MONTHS_SINCE_LAST_PROM_RESP: number of months since the individual has responded to a promotion by the charitable organization
MONTHS_SINCE_ORIGIN: number of months that the individual has been in the charitable organization's database
NUMBER_PROM_12: number of promotions (card or other) sent to the individual by the charitable organization in the past 12 months
PER_CAPITA_INCOME: per capita income (in $) of the neighborhood in which the individual lives
RECENT_CARD_RESPONSE_COUNT: number of times the individual has responded to a card solicitation from the charitable organization since four years ago
RECENT_RESPONSE_COUNT: number of times the individual has responded to a promotion (card or other) from the charitable organization since four years ago
WEALTH_RATING: one of 10 possible wealth rating groups based on a number of demographic characteristics