shurenb2_HW 8_Classifiers_student.ipynb - Colaboratory

Overview

In this assignment, you will calculate classification metrics for a dataset. You are given a dataset with two columns:

1. purchase, which represents the true classes (0 for not purchased and 1 for purchased), and
2. purchase_prob, which represents the predicted probability of purchase for each observation.

You will calculate the following classification metrics using different thresholds:

- Accuracy
- Precision
- Recall
- Specificity

In particular, you will take the following steps:

- Load the dataset
- Calculate the classification metrics based on different thresholds manually or by using the dmba package function classificationSummary
  - Create three columns for predicted class, one for each of the following threshold values: 0.35, 0.5, and 0.65
  - Report the classification metrics for the three different thresholds
- Calculate the following metrics: Accuracy, Precision, Recall, Specificity

Once you have taken these steps, answer the quiz questions on Canvas.

Install and Import Packages

!pip install dmba

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dmba
  Downloading dmba-0.1.0-py3-none-any.whl (11.8 MB)
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 19.0 MB/s eta 0:00:00
Installing collected packages: dmba
Successfully installed dmba-0.1.0

import pandas as pd
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from dmba import classificationSummary

no display found. Using non-interactive Agg backend

The dataset is saved at https://raw.githubusercontent.com/irenebao2020/badm211/main/HW8.csv

Import and inspect the dataset (1 point)

# import the dataset
df = pd.read_csv("https://raw.githubusercontent.com/irenebao2020/badm211/main/HW8.csv")

# show summary statistics of the dataset
df.describe()
           purchase  purchase_prob
count   1000.000000    1000.000000
mean       0.099000       0.502628
std        0.298811       0.291268
min        0.000000       0.000804
25%        0.000000       0.259324
50%        0.000000       0.497751
75%        0.000000       0.757124
max        1.000000       0.998261

In the following section, you will create predictions using a threshold value of 0.35. Then, generate the confusion matrix and calculate various prediction accuracy measures.

Threshold = 0.35

Create a column of predicted class values using the threshold 0.35 (1 point)

# Create a new column with predicted labels based on the threshold
df["purchase_class"] = np.where(df['purchase_prob'] >= 0.35, 1, 0)

Show the confusion matrix using the function "classificationSummary" (1 point)

# Calculate the confusion matrix
print(classificationSummary(df["purchase"], df["purchase_class"]))

Confusion Matrix (Accuracy 0.3690)

       Prediction
Actual   0   1
     0 304 597
     1  34  65
None

Show the precision and recall of the model (1 point)

accuracy_score(df["purchase"], df["purchase_class"])
0.369

precision_score(df["purchase"], df["purchase_class"])
0.09818731117824774

recall_score(df["purchase"], df["purchase_class"])
0.6565656565656566

Q1 What is the total number of actual purchases (i.e., observations where purchase is 1) in the dataset?
A) 99  B) 891  C) 340  D) 660

Q2 What is the accuracy of the model?
A) 0.369  B) 0.443  C) 0.531  D) 0.670

Q3 What is the precision of the model?
A) 0.10  B) 0.33  C) 0.67  D) 0.79
Q4 What is the recall of the model?
A) 0.19  B) 0.33  C) 0.66  D) 0.83

Go through the same exercise as above, this time using a threshold of 0.5.

Threshold = 0.5

Create a column of predicted class values using the threshold 0.50 (1 point)

# Create a new column with predicted labels based on the threshold
df["purchase_class"] = np.where(df['purchase_prob'] >= 0.5, 1, 0)

Show the confusion matrix using the function "classificationSummary" (1 point)

# Calculate the confusion matrix
print(classificationSummary(df["purchase"], df["purchase_class"]))

Confusion Matrix (Accuracy 0.4970)

       Prediction
Actual   0   1
     0 451 450
     1  53  46
None

Show the precision and recall of the model (1 point)

# write your code here
accuracy_score(df["purchase"], df["purchase_class"])
0.497

precision_score(df["purchase"], df["purchase_class"])
0.09274193548387097

recall_score(df["purchase"], df["purchase_class"])
0.46464646464646464

Q5 What is the accuracy at threshold = 0.5?
A) 0.50  B) 0.90  C) 0.46  D) 0.85

Q6 What is the precision of the model?
A) 0.11  B) 0.09  C) 0.46  D) 0.49

Q7 What is the recall of the model?
A) 0.11  B) 0.09  C) 0.46  D) 0.49
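The assignment also asks for specificity, which the cells above never print directly. Below is a minimal sketch, not part of the graded cells, that derives all four metrics from sklearn's confusion_matrix; the helper name threshold_metrics is made up for illustration, and the code assumes df still holds the purchase and purchase_prob columns loaded earlier. As a check against the output above, specificity at threshold 0.35 would be 304 / (304 + 597) ≈ 0.337. Note also that dmba's classificationSummary prints its report and returns None, which is why a trailing "None" appears when the call is wrapped in print(); calling it without print() avoids that.

from sklearn.metrics import confusion_matrix

def threshold_metrics(df, threshold):
    """Accuracy, precision, recall, and specificity at a given probability threshold."""
    pred = (df["purchase_prob"] >= threshold).astype(int)
    # With labels=[0, 1], rows are actual classes and columns are predicted classes,
    # so ravel() returns tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(df["purchase"], pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # sensitivity / true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

for t in [0.35, 0.5, 0.65]:
    print(t, threshold_metrics(df, t))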
Go through the same exercise as above, this time using a threshold of 0.65.

Threshold = 0.65

Create a column of predicted class values using the threshold 0.65 (1 point)

# Create a new column with predicted labels based on the threshold
df["purchase_class"] = np.where(df['purchase_prob'] >= 0.65, 1, 0)

Show the confusion matrix using the function "classificationSummary" (1 point)

# Calculate the confusion matrix
print(classificationSummary(df["purchase"], df["purchase_class"]))

Confusion Matrix (Accuracy 0.6130)

       Prediction
Actual   0   1
     0 582 319
     1  68  31
None

Show the precision and recall of the model (1 point)

# write your code here
accuracy_score(df["purchase"], df["purchase_class"])
0.613

precision_score(df["purchase"], df["purchase_class"])
0.08857142857142856

recall_score(df["purchase"], df["purchase_class"])
0.31313131313131315

Q8 What is the accuracy at threshold = 0.65?
A) 0.61  B) 0.08  C) 0.31  D) 0.43

Q9 What is the recall of the model?
A) 0.61  B) 0.08  C) 0.31  D) 0.43

Q10 What happens to recall when we increase the threshold and why?
A) Recall increases at a higher threshold because more observations are classified as positive at a higher threshold.
B) Recall decreases at a higher threshold because more observations are classified as positive at a higher threshold.
C) Recall decreases at a higher threshold because fewer observations are classified as positive at a higher threshold.
D) Recall increases at a higher threshold because fewer observations are classified as positive at a higher threshold.

# Answer: Recall decreases at a higher threshold because fewer observations are classified as positive at a higher threshold.
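Q10's answer can also be checked directly against the data. The short sketch below, which is not part of the graded cells and assumes df still holds the purchase and purchase_prob columns loaded earlier, sweeps a range of thresholds and prints how many observations are classified as positive along with the resulting recall; both fall as the threshold rises.

import numpy as np
from sklearn.metrics import recall_score

# Sweep thresholds from 0.1 to 0.9: raising the cutoff shrinks the set of
# predicted positives, so fewer of the 99 actual purchases are caught and recall drops.
for t in np.arange(0.1, 1.0, 0.1):
    pred = (df["purchase_prob"] >= t).astype(int)
    print(f"threshold={t:.1f}  predicted positives={int(pred.sum()):4d}  "
          f"recall={recall_score(df['purchase'], pred):.3f}")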