A large number of insurance records are to be examined to develop a model for predicting fraudulent claims. Of the claims in the historical database, 1% were judged to be fraudulent.
A sample database is taken to develop a model, and oversampling is used to provide a balanced sample in light of the very low response rate. When applied to this sample database (total number of records, N = 800), the model correctly classifies 310 frauds and 270 non-frauds. It misses 90 frauds and incorrectly classifies 130 non-fraudulent records as frauds.
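The counts in this paragraph can be tallied into a classification matrix, with the actual class as rows and the predicted class as columns. The following is a minimal Python sketch; the dictionary layout and the helper names `row_totals`, `col_totals` and `raw_error_rate` are illustrative choices, not part of the original problem.

```python
# Sample counts (N = 800), keyed by (actual class, predicted class).
COUNTS = {
    ("fraud", "fraud"): 310,        # frauds correctly flagged
    ("fraud", "nonfraud"): 90,      # frauds missed
    ("nonfraud", "fraud"): 130,     # non-frauds wrongly flagged as fraud
    ("nonfraud", "nonfraud"): 270,  # non-frauds correctly cleared
}
CLASSES = ("fraud", "nonfraud")


def row_totals(counts):
    """Number of records in each actual class."""
    return {a: sum(counts[(a, p)] for p in CLASSES) for a in CLASSES}


def col_totals(counts):
    """Number of records assigned to each predicted class."""
    return {p: sum(counts[(a, p)] for a in CLASSES) for p in CLASSES}


def raw_error_rate(counts):
    """Misclassification rate on the (balanced) sample, with no adjustment."""
    wrong = sum(n for (a, p), n in counts.items() if a != p)
    return wrong / sum(counts.values())


if __name__ == "__main__":
    rows, cols = row_totals(COUNTS), col_totals(COUNTS)
    print(f"{'actual/predicted':<18}{'fraud':>8}{'nonfraud':>10}{'total':>8}")
    for a in CLASSES:
        print(f"{a:<18}{COUNTS[(a, 'fraud')]:>8}{COUNTS[(a, 'nonfraud')]:>10}{rows[a]:>8}")
    print(f"{'total':<18}{cols['fraud']:>8}{cols['nonfraud']:>10}{sum(COUNTS.values()):>8}")
    print(f"unadjusted misclassification rate: {raw_error_rate(COUNTS):.3f}")
```

The row totals show that the oversampled sample is balanced, with 310 + 90 = 400 frauds and 270 + 130 = 400 non-frauds. The unadjusted error rate on this sample is (90 + 130) / 800 = 27.5%.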
a. Produce the classification matrix for the sample as it stands.
b. Find the adjusted misclassification rate (adjusting for the oversampling).
c. What percentage of new records would you expect to be classified as fraudulent?
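Below is a minimal sketch for parts b and c. It assumes the common approach of reweighting the balanced sample back to the historical 1% fraud rate. The per-class rates observed in the sample are kept: 310/400 of frauds are caught, and 130/400 of non-frauds are wrongly flagged. Each class is then weighted by its share in the historical database. This also assumes the model behaves the same way on new records of each class as it did in the sample. The function names `adjusted_rates` and `nonfraud_weight` are hypothetical.

```python
# Confusion-matrix counts from the balanced (oversampled) sample, N = 800.
TP = 310  # actual fraud, predicted fraud
FN = 90   # actual fraud, predicted non-fraud (missed frauds)
FP = 130  # actual non-fraud, predicted fraud
TN = 270  # actual non-fraud, predicted non-fraud

P_FRAUD = 0.01  # fraud share in the historical database


def adjusted_rates(tp, fn, fp, tn, p_fraud):
    """Reweight per-class rates by the original class proportions.

    Assumes per-class model behaviour on new data matches the sample.
    Returns (adjusted misclassification rate, expected share flagged as fraud).
    """
    sensitivity = tp / (tp + fn)     # share of frauds caught: 310/400
    false_pos_rate = fp / (fp + tn)  # share of non-frauds flagged: 130/400
    p_nonfraud = 1.0 - p_fraud
    misclass = p_fraud * (1.0 - sensitivity) + p_nonfraud * false_pos_rate
    flagged = p_fraud * sensitivity + p_nonfraud * false_pos_rate
    return misclass, flagged


def nonfraud_weight(tp, fn, fp, tn, p_fraud):
    """Weight per sampled non-fraud so frauds make up p_fraud of the weighted data."""
    n_fraud, n_nonfraud = tp + fn, fp + tn
    return (1.0 - p_fraud) / p_fraud * n_fraud / n_nonfraud


if __name__ == "__main__":
    misclass, flagged = adjusted_rates(TP, FN, FP, TN, P_FRAUD)
    w = nonfraud_weight(TP, FN, FP, TN, P_FRAUD)
    # Count-based cross-check: scale the non-fraud row by w and recompute.
    total = (TP + FN) + (FP + TN) * w
    print(f"non-fraud weight: {w:.1f}")
    print(f"adjusted misclassification rate: {misclass:.2%}")
    print(f"  via weighted counts: {(FN + FP * w) / total:.2%}")
    print(f"expected share classified as fraud: {flagged:.2%}")
    print(f"  via weighted counts: {(TP + FP * w) / total:.2%}")
```

With these inputs each sampled non-fraud stands for 99 non-frauds, so the reweighted data has 400 + 400 × 99 = 40,000 records. Of these, 90 + 130 × 99 = 12,960 are misclassified, an adjusted rate of 32.4%. A total of 310 + 130 × 99 = 13,180 records are classified as fraudulent, which is about 32.95% of new records. The rate-based and count-based calculations are two views of the same reweighting.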