HOMEWORK 3

School

Saint Paul College

Course

2420

Subject

Computer Science

Date

Apr 3, 2024

Type

docx

Pages

9

Uploaded by Aishaomar922

HOMEWORK 3 -- Classification
Business Context: Banks/Financial Companies
1) Load the data
Load the data. Then, use table() to obtain information on how many customers defaulted and did not default in this month, respectively.
2) Use ggplot2 to create a boxplot of Age grouped by DEFAULT. Your plot should have a proper title and axis labels. Based on the plot, do the defaulters and non-defaulters differ by age?
o Based on the median values shown in the boxplot, defaulters and non-defaulters appear to differ in age: the median age of defaulters is lower than that of non-defaulters.
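Steps 1 and 2 might be sketched as follows. This is only an illustration on a small made-up data frame: the real data file is not shown here, and the column names `AGE` and `DEFAULT` and the "YES"/"NO" coding are assumptions. The `tapply()` call is an added numeric check of the medians that the boxplot displays, and the plot is built only when ggplot2 is installed.

```r
# Toy stand-in for the credit data (hypothetical values and column names).
credit <- data.frame(
  AGE     = c(24, 29, 31, 35, 41, 46, 52, 27, 33, 38),
  DEFAULT = factor(c("YES", "YES", "NO", "NO", "NO", "NO", "NO", "YES", "NO", "NO"),
                   levels = c("NO", "YES"))
)

# Step 1: count customers who did and did not default.
default_counts <- table(credit$DEFAULT)
print(default_counts)

# Numeric companion to the boxplot: median age within each group.
median_age <- tapply(credit$AGE, credit$DEFAULT, median)
print(median_age)

# Step 2: boxplot of age by default status, with a title and axis labels.
# Built only if ggplot2 is available; call print(age_plot) to draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  age_plot <- ggplot2::ggplot(credit, ggplot2::aes(x = DEFAULT, y = AGE)) +
    ggplot2::geom_boxplot() +
    ggplot2::labs(title = "Age by default status",
                  x = "Default", y = "Age (years)")
}
```

For the credit-limit plot, the same pattern would apply with `y = LIMIT_BAL / 1000` and a y-axis label that states the unit (thousands).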
3) Use ggplot2 to create a boxplot of LIMIT_BAL by DEFAULT. Your plot should have a proper title and axis labels. Please set y = LIMIT_BAL/1000 to make the numbers on the Y axis more readable. Based on the plot, do the defaulters and non-defaulters differ in credit limit?
o The credit limit does not differ by a lot: the median values are close to each other, the boxes are about the same size, and both groups show long whiskers and outlying points.
4) Split the data into 80% training data and 20% test data. To increase the reproducibility of the results, set the random seed to 123456 using set.seed(123456) before random partitioning.
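A minimal sketch of the 80/20 split in step 4, using base R `sample()`. The assignment may have intended a different helper (for example a stratified partition function), so treat the approach, the toy data frame, and the names `train_data` and `test_data` as assumptions.

```r
# Toy stand-in with 100 rows (hypothetical values).
credit <- data.frame(
  ID        = 1:100,
  LIMIT_BAL = seq(10000, 505000, by = 5000)
)

# Fix the seed before partitioning so the split is reproducible.
set.seed(123456)

n <- nrow(credit)
train_idx <- sample(n, size = floor(0.8 * n))  # row numbers for the training set

train_data <- credit[train_idx, ]   # 80% of the rows
test_data  <- credit[-train_idx, ]  # the remaining 20%

cat("training rows:", nrow(train_data), " test rows:", nrow(test_data), "\n")
```

Running the script again with the same seed selects the same rows, which is what makes later results comparable between runs.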
B. k-NN
1) Next, you decide to train a k-NN model
First, assess whether you need to standardize the data and explain why (or why not). If yes, apply the standardization to the chosen fields, making sure not to override the original dataset.
2) Train a k-NN model
Train the model for 5 different values of k: 5, 10, 20, 30, 40. Which k is the best according to the model summary?
o If you prioritize accuracy, k=40 is the best. If you prioritize Kappa, k=5 is the best.
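The standardization question in step 1 can be illustrated with a short sketch. k-NN relies on distances, so a field measured in tens of thousands (such as a credit limit) would swamp a field measured in tens (such as age) unless both are rescaled. The data values and the names `train_std` and `test_std` below are hypothetical, and computing the means and standard deviations on the training rows only is one common convention, not necessarily what the assignment requires.

```r
# Toy partitions (hypothetical values).
train_data <- data.frame(AGE = c(25, 35, 45, 55),
                         LIMIT_BAL = c(20000, 50000, 200000, 500000))
test_data  <- data.frame(AGE = c(30, 50),
                         LIMIT_BAL = c(100000, 300000))

num_cols <- c("AGE", "LIMIT_BAL")

# Raw Euclidean distances are driven almost entirely by LIMIT_BAL.
raw_dist <- dist(train_data[num_cols])

# Means and standard deviations come from the training rows only.
train_means <- colMeans(train_data[num_cols])
train_sds   <- apply(train_data[num_cols], 2, sd)

# Work on copies so the original data frames are not overridden.
train_std <- train_data
test_std  <- test_data
train_std[num_cols] <- scale(train_data[num_cols],
                             center = train_means, scale = train_sds)
test_std[num_cols]  <- scale(test_data[num_cols],
                             center = train_means, scale = train_sds)

# After standardization both fields contribute on a comparable scale.
std_dist <- dist(train_std[num_cols])
print(round(colMeans(train_std[num_cols]), 10))
```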
3) Plot the accuracy of kNN as a function of k
Explain the concept of accuracy. Explain how the model accuracy changes with the number of neighbors k. Note: since the model is trained on a random subset of data, we may get slightly different results.
o Accuracy increases as the number of neighbors increases. In other words, over the values of k tried, the more neighboring customers that vote on each prediction, the more accurate the model becomes.
4) Make (class) predictions on the test set using the k-NN model
5) Create the confusion matrix
In doing so, please use "YES" as the positive class. The chosen mode of confusionMatrix should allow you to answer the questions below. Use the results to answer:
o Among 100 customers predicted to default on their debt, how many are expected to truly default?
o For every 100 default customers, how many are expected to be caught by this algorithm (i.e. classified as "YES")?
o What does the accuracy you obtained in the result tell us?
Report the accuracy, recall, precision, and F1-Score of the positive class (DEFAULT=YES).
o Accuracy indicates the overall correctness of the model's predictions. In this case, the model is correct approximately 78.46% of the time.
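The confusion-matrix questions map directly onto precision and recall, which the sketch below computes by hand. The four counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the results of this homework.

```r
# Hypothetical confusion-matrix counts with "YES" as the positive class.
tp <- 30   # predicted YES, actually YES
fp <- 20   # predicted YES, actually NO
fn <- 70   # predicted NO,  actually YES
tn <- 380  # predicted NO,  actually NO

# Precision: of the customers predicted to default, the share that truly default.
precision <- tp / (tp + fp)

# Recall: of the customers who truly default, the share the model catches.
recall <- tp / (tp + fn)

# F1: harmonic mean of precision and recall.
f1 <- 2 * precision * recall / (precision + recall)

# Accuracy: share of all predictions that are correct.
accuracy <- (tp + tn) / (tp + fp + fn + tn)

cat("Per 100 predicted defaulters, true defaulters:", 100 * precision, "\n")
cat("Per 100 actual defaulters, caught by the model:", 100 * recall, "\n")
cat("F1:", f1, " Accuracy:", accuracy, "\n")
```

With these toy counts the accuracy is high while recall is low, which shows why accuracy alone can be misleading when most customers do not default.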
C. Decision Trees
1) Next, you decide to train a Decision Tree model
Decide whether to use the standardized data or not and explain your choice. Use the data partitions you have created earlier, if possible. Otherwise, create the partitions.
o The standardized dataset is used, mainly to reuse the earlier partitions and to keep the inputs consistent across models. A decision tree splits on one feature at a time using a threshold, so rescaling a feature does not change which customers fall on each side of a split; the tree does not strictly need standardization, but using the standardized data does no harm.
2) Train the decision tree
It is ok if your tree is small or very large. If the tree obtained does not seem meaningful (for example, it only shows one node and no branches), you may run it again.
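How a threshold split reacts to standardization can be checked with a small sketch. The helper `best_split()` below is a hypothetical, hand-written one-level split chosen by misclassification count; it is not the algorithm of any particular tree package, and the data are made up. It suggests that the raw and standardized versions of a feature send the same customers to each side of the split.

```r
# Hypothetical one-level split: pick the threshold with the fewest
# misclassified rows when each side predicts its majority class.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2  # midpoints between neighbors
  errors <- sapply(cuts, function(cut) {
    left  <- y[x < cut]
    right <- y[x >= cut]
    (length(left) - max(table(left))) + (length(right) - max(table(right)))
  })
  cuts[which.min(errors)]
}

# Toy data (hypothetical values).
limit_bal <- c(20000, 30000, 50000, 80000, 120000, 200000, 300000, 500000)
default   <- factor(c("YES", "YES", "YES", "NO", "YES", "NO", "NO", "NO"))

# Standardized copy of the same feature.
limit_std <- as.numeric(scale(limit_bal))

raw_cut <- best_split(limit_bal, default)
std_cut <- best_split(limit_std, default)

# The thresholds differ in units, but the resulting groups are the same.
same_partition <- identical(limit_bal < raw_cut, limit_std < std_cut)
cat("raw threshold:", raw_cut, " same partition:", same_partition, "\n")
```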
3) Based on the tree you obtain, answer the following questions
What attributes are used in your decision tree? Did it use all the attributes available? Interpret the label on one of the nodes. Pick the first two leaf nodes, identify the conditions leading to these nodes and the prediction made for these nodes.
First Leaf Node:
Conditions:
Marriage: No
Age: Less than 27
Limit balance: Less than 125,000
Pay amount: Less than 855
Prediction: No default (0.31)
Second Leaf Node:
Conditions:
Marriage: No
Age: Less than 27
Limit balance: Less than 125,000
Pay amount: Greater than or equal to 855
Gender: 2 (unclear representation)
Prediction: Default (0.74)
These leaf nodes represent specific conditions under which predictions are made regarding loan default. For example, the first leaf node predicts a lower probability of default (0.31) for borrowers who are not married, are younger than 27, have a limit balance less than 125,000, and have made a payment of less than 855. The second leaf node, under the same demographic conditions, predicts a higher probability of default (0.74) when the payment amount is greater than or equal to 855, with an additional condition on gender whose coding (2) is unclear.
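The two leaf nodes described above can be read as plain if/else rules. The sketch below encodes only those two paths; the function name `leaf_prediction()` and its argument names are hypothetical, and any customer who matches neither path falls in a part of the tree that is not described here.

```r
# Hypothetical encoding of the two described leaf nodes as rules.
leaf_prediction <- function(married, age, limit_bal, pay_amt, gender) {
  # Conditions shared by both leaves.
  shared <- !married && age < 27 && limit_bal < 125000

  if (shared && pay_amt < 855) {
    return(list(leaf = 1, label = "No default", value = 0.31))
  }
  if (shared && pay_amt >= 855 && gender == 2) {
    return(list(leaf = 2, label = "Default", value = 0.74))
  }
  NULL  # not covered by the two leaves described in the text
}

# Usage: an unmarried 25-year-old with a 100,000 limit and a payment of 900.
result <- leaf_prediction(married = FALSE, age = 25, limit_bal = 100000,
                          pay_amt = 900, gender = 2)
cat("leaf:", result$leaf, " prediction:", result$label, "\n")
```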
4) Make the class predictions on the test set and produce the confusion matrix
Save the predictions in DT_predictions. Produce a confusion matrix. Use YES as the positive class. Use the same mode as specified in kNN. Identify accuracy, recall, precision, and F1-Score of the positive class ("YES"). Compare the performance of the kNN and DT models: which one would you prefer and why?
o Based on the metrics obtained, the Decision Tree model generally outperforms the kNN model across the evaluation criteria. Therefore, if you prioritize overall performance, precision, recall, and balanced accuracy, the Decision Tree model is preferable.