MIS 655 Topic 7 DQ 1

A banking company wants to build a neural network to predict who will default on a 30-year fixed-rate home mortgage loan. Historically, approximately 2.5% of individuals default. Given the small percentage of defaulters, what are some of the problems that may be encountered when fitting a neural network model? Is this a problem specific to neural networks, or is this a problem other modeling techniques have as well? What are some of the solutions that can be implemented to overcome the insufficient minority class problem? Provide two or three examples.

The class imbalance problem typically occurs when there are many more instances of some classes than others. In such cases, standard classifiers tend to be overwhelmed by the large classes and ignore the small ones, so this is not a problem specific to neural networks; other modeling techniques trained to minimize overall error face it as well. Various strategies have been proposed to deal with class imbalance, such as increasing the penalty associated with misclassifying the positive class relative to the negative class, oversampling the minority class, and undersampling the majority class. Among these approaches, undersampling has been very popular. Undersampling means sampling the over-represented class less often, while oversampling means sampling the under-represented class more often. Thresholding means that after the neural network has learned its weights, the output class probabilities are multiplied at inference time by per-class weights (the class priors, which differ for each class).

One of the most widely adopted techniques for dealing with highly unbalanced datasets is resampling. It consists of removing samples from the majority class (undersampling) and/or adding more examples from the minority class (oversampling). For extreme imbalance ratios where a large portion of the classes are minority classes, undersampling performs on a par with oversampling; if training time is an issue, undersampling is the better choice in such a scenario, since it dramatically reduces the size of the training set.

Undersampling can be defined as removing some observations of the majority class, and it is done until the majority and minority classes are balanced. It can be a good choice when you have a great deal of data (think millions of rows), but a drawback is that we are removing information that may be valuable. Oversampling can be defined as adding more copies of the minority class, and it can be a good choice when you do not have much data to work with. A con to consider with oversampling is that it can cause overfitting and poor generalization to your test set.

To achieve the best accuracy, one should apply thresholding to compensate for prior class probabilities. A combination of thresholding with the baseline model or with oversampling is the most preferable, whereas thresholding should not be combined with undersampling. Oversampling should be applied to the level that eliminates the imbalance, whereas the optimal undersampling ratio depends on the extent of the imbalance: the higher the fraction of minority classes in the imbalanced training set, the more the imbalance ratio should be reduced. Unlike some classical machine learning models, convolutional neural networks do not overfit as a result of oversampling. Brief Python sketches of these three approaches (cost weighting, resampling, and thresholding) follow below.
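First, a minimal sketch of the cost-weighting idea, assuming a PyTorch setup: the pos_weight argument of BCEWithLogitsLoss makes each misclassified defaulter cost roughly 39 times as much as a misclassified non-defaulter (the 97.5/2.5 ratio implied by a 2.5% default rate). The network architecture and the 10 input features are hypothetical stand-ins for real loan attributes.

```python
import torch
import torch.nn as nn

# Historically ~2.5% of borrowers default, so non-defaulters outnumber
# defaulters by roughly 97.5 / 2.5 ≈ 39 to 1.
pos_weight = torch.tensor([97.5 / 2.5])

# A small feed-forward network for tabular loan features (hypothetical sizes).
model = nn.Sequential(
    nn.Linear(10, 16),  # 10 input features
    nn.ReLU(),
    nn.Linear(16, 1),   # one logit: likelihood of default
)

# pos_weight scales the loss on positive (defaulter) examples, so the
# network is penalized ~39x more for missing a defaulter.
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Toy batch: 64 loans, 10 features each, ~2.5% positive labels.
x = torch.randn(64, 10)
y = (torch.rand(64, 1) < 0.025).float()
loss = criterion(model(x), y)
loss.backward()
```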
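Second, a resampling sketch, assuming the imbalanced-learn library is installed: RandomOverSampler duplicates minority-class rows and RandomUnderSampler discards majority-class rows, the two random approaches covered in the Brownlee reference below. The synthetic dataset simply mimics the 2.5% default rate from the prompt.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

# Synthetic loan data with a ~2.5% default rate.
X, y = make_classification(n_samples=10_000, n_features=10,
                           weights=[0.975, 0.025], random_state=42)
print("original:", Counter(y))

# Oversampling: duplicate minority (defaulter) rows until classes balance.
X_over, y_over = RandomOverSampler(random_state=42).fit_resample(X, y)
print("oversampled:", Counter(y_over))

# Undersampling: discard majority (non-defaulter) rows until classes
# balance; note how much smaller the resulting training set is.
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)
print("undersampled:", Counter(y_under))
```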
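Finally, a small NumPy sketch of the thresholding idea: after training, the predicted class probabilities are rescaled by per-class prior weights and renormalized before taking the argmax. The priors and network outputs here are made-up illustrations, and the sketch assumes the network was trained on a rebalanced 50/50 sample while the true population is 97.5/2.5.

```python
import numpy as np

# True population priors vs. the priors the network actually saw in training.
true_priors = np.array([0.975, 0.025])   # non-default, default
train_priors = np.array([0.5, 0.5])      # rebalanced training sample

def threshold(probs, true_priors, train_priors):
    """Multiply predicted class probabilities by per-class prior weights,
    renormalize, and predict the highest-weighted class."""
    weights = true_priors / train_priors
    adjusted = probs * weights
    adjusted /= adjusted.sum(axis=1, keepdims=True)
    return adjusted.argmax(axis=1), adjusted

# Toy network outputs for 3 loans: P(non-default), P(default).
probs = np.array([[0.70, 0.30],
                  [0.40, 0.60],
                  [0.01, 0.99]])
labels, adjusted = threshold(probs, true_priors, train_priors)
print(labels)     # prior correction raises the bar for predicting default
print(adjusted)   # recalibrated probabilities
```

Note how the correction flips the second loan to "non-default": because defaults are rare in the population, only very confident raw predictions survive the prior adjustment.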
References

Analytics Vidhya. (2024). 10 techniques to solve imbalanced classes in machine learning. https://www.analyticsvidhya.com/blog/2020/07/10-techniques-to-deal-with-class-imbalance-in-machine-learning/#Resampling_Techniques_to_Solve_Class_Imbalance

Brownlee, J. (2021). Random oversampling and undersampling for imbalanced classification. Machine Learning Mastery. https://machinelearningmastery.com/random-oversampling-and-undersampling-for-imbalanced-classification/