Define accuracy for binary and multi-class classifications.
Define accuracy for binary and multi-class classifications.
Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions that our model has fulfilled. Formally, precision has the following definition:
Accuracy = Number of correct predictions / Total number of predicitons
For binary classification, accuracy can also be calculated in terms of positive and negative values as follows:
Accuracy = (TP + TN) / (TP + TN + FP +FN)
Where TP = True Positives, TN = True Negatives, FP = False Positives and FN = False Negatives.
Let's try to calculate the accuracy for the following model that classified 100 tumors as malignant (positive class) or benign (negative class):
True Positive (TP):
Reality: Malignant
Predicted ML model: Malignant
Number of TP results: 1
False positive (FP):
Reality: Benign
Predicted ML model: Malignant
Number of FP results: 1
False negative (FN):
Reality: Malignant
Predicted ML model: Benign
Number of FN results: 8
True Negative (TN):
Reality: Benign
Predicted ML model: Benign
Number of TN results: 90
The accuracy comes out to 0.91 or 91% (91 correct predictions out of a total of 100 examples). That means our tumor classifier is doing a great job of identifying malignancies, right?
In fact, let's do a closer analysis of the positives and negatives to get a better insight into the performance of our model.
Of the 100 tumor examples, 91 are benign (90 TN and 1 FP) and 9 are malignant (1 TP and 8 FN).
Of the 91 benign tumors, the model correctly identifies 90 as benign. It is well. However, out of 9 cancers, the model correctly identifies only 1 as malignant - a terrible result, since 8 out of 9 cancers go undiagnosed!
While 91% accuracy may seem good at first glance, another tumor classifier model that always predicts benign would achieve exactly the same accuracy (91/100 correct predictions) on our examples. In other words, our model is no better than a model that has zero predictive ability to distinguish malignant tumors from benign tumors.
Accuracy alone doesn't tell the whole story when you're working with a class-imbalanced dataset like this, where there's a significant disparity between the number of positive and negative labels.
Step by step
Solved in 2 steps