DecisionTree RandomForest_Medhavi

School

University of Illinois, Urbana Champaign

Course

557

Subject

Industrial Engineering

Date

Apr 3, 2024


Decision Tree Exercise

1. Use a graph (ggplot) to show the distribution of LIFETIME_GIFT_COUNT

Code:
    library(ggplot2)

    # Histogram of lifetime gift counts, one bar per count value
    ggplot(X15_donor_exercise, aes(x = LIFETIME_GIFT_COUNT)) +
      geom_histogram(binwidth = 1, fill = "blue", color = "black") +
      theme_minimal() +
      labs(title = "Distribution of Lifetime Gift Count",
           x = "Lifetime Gift Count",
           y = "Frequency")

Output: [histogram of LIFETIME_GIFT_COUNT]

2. Use a graph (ggplot) to show the relationship between MEDIAN_HOME_VALUE and LIFETIME_GIFT_COUNT

Code:
    # Strip currency symbols and thousands separators, keeping any decimal point,
    # then convert the cleaned strings to numbers
    X15_donor_exercise$MEDIAN_HOME_VALUE <- gsub("[^0-9.]", "", X15_donor_exercise$MEDIAN_HOME_VALUE)
    X15_donor_exercise$MEDIAN_HOME_VALUE <- as.numeric(X15_donor_exercise$MEDIAN_HOME_VALUE)

    # Scatter plot; alpha = 0.5 makes overlapping points visible
    ggplot(X15_donor_exercise, aes(x = MEDIAN_HOME_VALUE, y = LIFETIME_GIFT_COUNT)) +
      geom_point(alpha = 0.5) +
      theme_minimal() +
      labs(title = "Relationship between Median Home Value and Lifetime Gift Count",
           x = "Median Home Value",
           y = "Lifetime Gift Count")

Output: [scatter plot of MEDIAN_HOME_VALUE against LIFETIME_GIFT_COUNT]

3. Build a decision tree to predict if an individual should be targeted as a donor

Code:
    library(rpart)
    library(rpart.plot)
    library(caret)

    # Treat the target as a categorical outcome
    X15_donor_exercise$TARGET_B <- as.factor(X15_donor_exercise$TARGET_B)

    # Stratified 80/20 train/test split
    set.seed(50)
    index <- createDataPartition(X15_donor_exercise$TARGET_B, p = 0.8, list = FALSE)
    trainData <- X15_donor_exercise[index, ]
    testData <- X15_donor_exercise[-index, ]

    # Classification tree on all predictors
    model <- rpart(TARGET_B ~ ., data = trainData, method = "class")
    rpart.plot(model, type = 4, extra = 102)

Output: [decision tree plot]

4. (predict variable: TARGET_B). (set.seed(50)). Use 80% in the training set.

Code:
    X15_donor_exercise$TARGET_B <- as.factor(X15_donor_exercise$TARGET_B)

    # Stratified 80/20 train/test split
    set.seed(50)
    index <- createDataPartition(X15_donor_exercise$TARGET_B, p = 0.8, list = FALSE)
    trainData <- X15_donor_exercise[index, ]
    testData <- X15_donor_exercise[-index, ]

    model <- rpart(TARGET_B ~ ., data = trainData, method = "class")
    rpart.plot(model, type = 4, extra = 102)

    # Evaluate on the held-out 20%
    predictions <- predict(model, testData, type = "class")
    confusionMatrix(predictions, testData$TARGET_B)

Output:
    Confusion Matrix and Statistics

              Reference
    Prediction   NO  YES
           NO  1157  723
           YES  114  245

                   Accuracy : 0.6262
                     95% CI : (0.6058, 0.6463)
        No Information Rate : 0.5677
        P-Value [Acc > NIR] : 1.055e-08


                      Kappa : 0.1767
     Mcnemar's Test P-Value : < 2.2e-16
                Sensitivity : 0.9103
                Specificity : 0.2531
             Pos Pred Value : 0.6154
             Neg Pred Value : 0.6825
                 Prevalence : 0.5677
             Detection Rate : 0.5167
       Detection Prevalence : 0.8397
          Balanced Accuracy : 0.5817
           'Positive' Class : NO

5. Discuss the accuracy (sensitivity and specificity) of the prediction (with testing set)

Accuracy is the share of all test-set predictions that are correct. The model's accuracy is 62.62%, which is only modestly above the no-information rate of 56.77% (the accuracy obtained by always predicting NO). The Kappa of 0.1767 likewise indicates weak agreement beyond chance.

Because NO is the positive class in this output, sensitivity is the share of actual non-donors that the model labels NO. At 91.03% (1157 of 1271), the model is good at recognizing people who should not be targeted for donations.

Specificity is the share of actual donors that the model labels YES. At 25.31% (245 of 968), the model misses roughly three out of four true potential donors.

So the model is good at avoiding wasted resources on people who won't donate (NO class) because its sensitivity is high, but it is poor at spotting potential donors (YES class) because its specificity is low. To make the model better, we need to improve its ability to find true potential donors while keeping sensitivity high. This would help the charity use its resources more effectively.

6. Recommend promotion strategies (to promote donations) for the company.

Targeted Outreach Based on Predictive Model Insights: Use decision trees to find traits of past or potential donors. Customize outreach to match these traits, increasing appeal and likelihood of donation.

Enhance Personalization: Personalize communication based on donor behavior and preferences. Use data analytics to understand past donation patterns, communication preferences, and interests to tailor your messages.

Leverage Social Proof and Community Building: Share stories of how donations have made an impact, including testimonials from beneficiaries and donors. This not only provides social proof but also helps potential donors feel connected to the cause.

Optimize Digital Presence: Use social media and email marketing to reach potential donors, incorporating A/B testing to refine your messaging and improve engagement rates.

Offer Recognition and Incentives: Recognize donations through donor walls, personalized thank-you notes, or public acknowledgments (with permission). Recognition can motivate further donations.

7. Build a random forest using the dataset, and report the accuracy. set.seed(50), ntree=500, mtry=5. (Please check if there is any character variable; if there is, change it into a factor.)

Code:
    library(randomForest)
    library(dplyr)
    library(caret)  # provides createDataPartition()

    # Convert character columns to factors
    X15_donor_exercise[sapply(X15_donor_exercise, is.character)] <-
      lapply(X15_donor_exercise[sapply(X15_donor_exercise, is.character)], as.factor)

    numeric_columns <- sapply(X15_donor_exercise, is.numeric)
    factor_columns <- sapply(X15_donor_exercise, is.factor)

    # Impute missing numeric values with the column median
    X15_donor_exercise[numeric_columns] <- lapply(X15_donor_exercise[numeric_columns],
      function(x) ifelse(is.na(x), median(x, na.rm = TRUE), x))

    # Impute missing factor values with the most frequent level.
    # Assignment is used instead of ifelse(), which would drop the factor
    # class and return the underlying integer codes.
    X15_donor_exercise[factor_columns] <- lapply(X15_donor_exercise[factor_columns],
      function(x) {
        x[is.na(x)] <- names(which.max(table(x)))
        x
      })

    X15_donor_exercise$TARGET_B <- as.factor(X15_donor_exercise$TARGET_B)

    # Stratified 80/20 train/test split
    set.seed(50)
    index <- createDataPartition(X15_donor_exercise$TARGET_B, p = 0.8, list = FALSE)
    trainData <- X15_donor_exercise[index, ]
    testData <- X15_donor_exercise[-index, ]

    rf_model <- randomForest(TARGET_B ~ ., data = trainData, ntree = 500, mtry = 5)

    # Test-set accuracy: share of predictions that match the true label
    predictions <- predict(rf_model, testData)
    accuracy <- sum(predictions == testData$TARGET_B) / nrow(testData)
    print(accuracy)

Output:

    > # Calculate accuracy
    > accuracy <- sum(predictions == testData$TARGET_B) / nrow(testData)
    > print(accuracy)
    [1] 0.6552032

Variable                     Description
CARD_PROM_12                 number of card promotions sent to the individual by the charitable organization in the past 12 months
DONOR_AGE                    age as of last year's mail solicitation
INCOME_GROUP                 one of 7 possible income level groups based on a number of demographic characteristics
LIFETIME_CARD_PROM           total number of card promotions sent to the individual by the charitable organization
LIFETIME_GIFT_COUNT          total number of donations from the individual to the charitable organization
MEDIAN_HOME_VALUE            median home value (in $100) as determined by other input variables
MEDIAN_HOUSEHOLD_INCOME      median household income (in $100) as determined by other input variables
MONTHS_SINCE_FIRST_GIFT      number of months since the first donation from the individual to the charitable organization
MONTHS_SINCE_LAST_GIFT       number of months since the most recent donation from the individual to the charitable organization
MONTHS_SINCE_LAST_PROM_RESP  number of months since the individual has responded to a promotion by the charitable organization
MONTHS_SINCE_ORIGIN          number of months that the individual has been in the charitable organization's database
NUMBER_PROM_12               number of promotions (card or other) sent to the individual by the charitable organization in the past 12 months
PER_CAPITA_INCOME            per capita income (in $) of the neighborhood in which the individual lives
RECENT_CARD_RESPONSE_COUNT   number of times the individual has responded to a card solicitation from the charitable organization since four years ago
RECENT_RESPONSE_COUNT        number of times the individual has responded to a promotion (card or other) from the charitable organization since four years ago
WEALTH_RATING                one of 10 possible wealth rating groups based on a number of demographic characteristics
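
The decision tree metrics discussed in question 5 can be checked by hand from the four counts in the confusion matrix. The following is a minimal base-R sketch of that arithmetic. The object name `cm` and the other variable names are illustrative and not part of the original submission, and NO is taken as the positive class, as in the reported output.

```r
# Test-set confusion matrix from question 4 (rows = prediction, columns = reference).
# matrix() fills column by column: first the Reference NO column, then Reference YES.
cm <- matrix(c(1157, 114, 723, 245), nrow = 2,
             dimnames = list(Prediction = c("NO", "YES"),
                             Reference  = c("NO", "YES")))

total <- sum(cm)

# Accuracy: correct predictions (the diagonal) over all cases
accuracy <- sum(diag(cm)) / total

# Sensitivity: actual NO cases that were predicted NO (NO is the positive class)
sensitivity <- cm["NO", "NO"] / sum(cm[, "NO"])

# Specificity: actual YES cases that were predicted YES
specificity <- cm["YES", "YES"] / sum(cm[, "YES"])

# Balanced accuracy: mean of sensitivity and specificity
balanced_accuracy <- (sensitivity + specificity) / 2

# No-information rate: accuracy of always predicting the majority class
no_information_rate <- max(colSums(cm)) / total

print(round(c(accuracy = accuracy,
              sensitivity = sensitivity,
              specificity = specificity,
              balanced_accuracy = balanced_accuracy,
              no_information_rate = no_information_rate), 4))
```

Rounded to four decimals, these values match the Accuracy, Sensitivity, Specificity, Balanced Accuracy, and No Information Rate lines of the reported output.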
