LOAN REPAYMENT PREDICTION
USING MACHINE LEARNING
IDS 400 - Programming for Data Science
University of Illinois at Chicago
Spring 2021 Group Project Report
Date: May 6, 2021
Table of Contents
1. Introduction
2. Project Description and Objectives
3. Business Value
4. Project Analysis
5. Packages and Functions Used
6. Data Preparation
7. Exploratory Data Analysis
8. Data Transformation & Processing for Machine Learning
9. Machine Learning Models
   Chosen Models
   LogisticRegression from sklearn.linear_model
   DecisionTreeClassifier from sklearn.tree
   KNeighborsClassifier from sklearn.neighbors
   XGBClassifier from xgboost
   Tuning the Parameters for our xGBoost Model
   scale_pos_weight
   max_depth
   learning_rate
10. Conclusion
1. Introduction
According to marutitech.com, until recently, hedge funds were the primary users of Artificial Intelligence and Machine Learning in finance, but the last few years have seen the applications of Machine Learning spread to various other areas, including banks, fintech companies, regulators, and insurance firms, to name a few. Loans are the core business of many financial institutions and financial service providers. In the past, companies had to rely on a limited amount of data and a set of policies and processes to assess a customer's financial position and their intention to repay before issuing a loan, which was very time consuming. However, tremendous improvements in computational power and increased research and development of machine learning algorithms have helped make predictions quicker and more accurate, and it is not surprising that financial institutions and financial service providers are currently among the top spenders in Big Data and Data Analytics. According to Soma Metrics, between 2014 and 2017, mortgage industry spending on big data increased from $2.6 billion to $3.2 billion. With the growing interest in using Machine Learning to predict the outcome and return from loans in the financial industry, we thought it would be interesting to see whether we could use machine learning algorithms to accurately predict whether a loan would default.
2. Project Description and Objectives
The dataset we have decided to work with comes from Lending Club, a digital marketplace for peer-to-peer lending that connects borrowers looking for loans with lenders interested in making an investment. Lending Club replaces the high cost and complexity of bank lending with a faster, smarter way to borrow and invest, offering borrowers lower interest rates, better terms, and larger amounts by utilizing data, analytics, and technology-enabled models. The data in this dataset was originally scraped from the thousands of loans made through the Lending Club platform between 9/1/14 and 1/1/15. The raw dataset includes 81023 rows and 146 columns of information, pertaining either to the loan itself or to the financial history of the borrower, captured at the time of the loan application.
The main goal of the project is to use Python to help visualize the data, to get a better
understanding of the dataset, and to create models using Machine Learning algorithms in order
to predict whether a loan will be Fully Paid or Charged Off.
3. Business Value
Financial service providers can use machine learning to reduce risk by predicting whether a loan will be paid or defaulted on. This will also enhance revenues and lower operational costs, saving lenders both time and money by automating the end-to-end loan application process, from loan approval to loan monitoring. Productivity will increase, since a manual review will only be needed after a loan application passes through the initial screening process. Thus, financial institutions and financial service providers will be able to process more loans more quickly while also reducing their risk, allowing them to invest their money and time in other areas of their business.
It will also improve the user experience when applying for loans, giving applicants a smoother and quicker application process in which they can be denied or approved a loan in a matter of minutes, rather than having to go through a lengthy manual process.
4. Project Analysis
To complete this project, we did the following:
1. Prepared and cleaned the data by removing any unnecessary variables and dropping any variables that could potentially cause data leakage.
2. Performed Exploratory Data Analysis on the data.
3. Pre-processed the data again to prepare it for our Machine Learning algorithms.
4. Built models to help predict which loans would default and which would not.
5. Packages and Functions Used:
In this project, we used a variety of packages to perform data manipulation, data visualization, and data modelling using machine learning.
For data manipulation and the initial data exploration we used the following:
● Pandas - to manipulate the data, e.g. creating dummy variable columns
● Numpy
For data visualization, to create the graphs and tables used to better understand the dataset, we used the following:
● Seaborn
● Matplotlib.pyplot
To create models using machine learning and to evaluate their performance we used the following packages:
● Sklearn - for machine learning algorithms such as LogisticRegression, KNeighborsClassifier, and DecisionTreeClassifier to create and train our models, train_test_split to create training and test datasets, and metrics such as roc_curve, auc, confusion_matrix, classification_report, and accuracy_score to help determine which models performed best.
● XGBoost - to create a model using Extreme Gradient Boosting.
● Matplotlib.pyplot - to plot ROC/AUC curves to help compare and evaluate the models' performance.
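For reference, a minimal set of imports covering the packages listed above might look like the following sketch (the aliases and grouping are conventional choices, not taken verbatim from our code):

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (roc_curve, auc, confusion_matrix,
                             classification_report, accuracy_score)
from xgboost import XGBClassifier
```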
6. Data Preparation
First, to prepare the data for analysis and modeling, we make sure that the “loan_status” column only contains the values “Fully Paid” or “Charged Off”, since in this analysis we are only looking at these two types of loans; this removes 1 row of data.
We also decided to keep only selected columns/variables (shown below) from the dataset, as there are 146 columns in total and we want to minimize the time it takes to run our code and models. The variables we decided to keep are ones that are also used in assessing a person's credit score, and the ones we think are most likely to affect whether or not a loan will be defaulted on. We also removed any variables that might lead to data leakage, which occurs when information from outside the training dataset is used to create the model. Leakage can allow the model to learn or know something that it otherwise would not know, which invalidates the estimated performance of the model being constructed and may also lead to overfitting.
Variable - Description
loan_amnt - The listed amount of the loan applied for by the borrower. If at some point in time the credit department reduces the loan amount, it will be reflected in this value.
int_rate - Interest rate on the loan.
grade - LC assigned loan grade.
sub_grade - LC assigned loan subgrade.
emp_length - Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years.
home_ownership - The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.
annual_inc - The self-reported annual income provided by the borrower during registration.
verification_status - Indicates whether the borrower's income was verified by LC, not verified, or whether the income source was verified.
loan_status - Current status of the loan.
purpose - A category provided by the borrower for the loan request.
dti - A ratio calculated using the borrower's total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower's self-reported monthly income.
total_rev_hi_lim - Total revolving high credit/credit limit.
acc_open_past_24mths - Number of trades opened in the past 24 months.
bc_util - Ratio of total current balance to high credit/credit limit for all bankcard accounts.
chargeoff_within_12_mths - Number of charge-offs within 12 months.
mort_acc - Number of mortgage accounts.
mths_since_recent_inq - Months since most recent inquiry.
mths_since_last_delinq - The number of months since the borrower's last delinquency.
percent_bc_gt_75 - Percentage of all bankcard accounts > 75% of limit.
pub_rec_bankruptcies - Number of public record bankruptcies.
tot_hi_cred_lim - Total high credit/credit limit.
total_bc_limit - Total bankcard high credit/credit limit.
Once the unwanted columns were filtered out, we checked the dataset for any missing values, as those values need to be replaced.
We found that there were 5 columns with missing values. For emp_length (employment length) we replaced missing/NA values with 0. For bc_util (bankcard utilization rate) and percent_bc_gt_75 (percentage of bankcard accounts > 75% of limit), we replaced missing/NA values with the column medians so they would not skew the columns' average values. For mths_since_recent_inq (months since most recent inquiry) and mths_since_last_delinq (months since last delinquency), we replaced missing/NA values with a very large number, since filling them with the median or with 0 would wrongly imply that an inquiry or delinquency had recently occurred.
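A sketch of these preparation steps is shown below. The file name and the sentinel value 999 are illustrative choices rather than values taken from our code, and emp_length is assumed to already be numeric.

```python
import pandas as pd

# Illustrative file name; the actual source file is not named in this report.
df = pd.read_csv("lending_club_loans.csv", low_memory=False)

# Keep only Fully Paid and Charged Off loans (this removes 1 row in our data).
df = df[df["loan_status"].isin(["Fully Paid", "Charged Off"])]

# Keep only the selected columns listed in the table above.
keep_cols = [
    "loan_amnt", "int_rate", "grade", "sub_grade", "emp_length", "home_ownership",
    "annual_inc", "verification_status", "loan_status", "purpose", "dti",
    "total_rev_hi_lim", "acc_open_past_24mths", "bc_util", "chargeoff_within_12_mths",
    "mort_acc", "mths_since_recent_inq", "mths_since_last_delinq",
    "percent_bc_gt_75", "pub_rec_bankruptcies", "tot_hi_cred_lim", "total_bc_limit",
]
df = df[keep_cols]

# Impute missing values as described above: 0 for employment length, the column
# median for the two bankcard utilization columns, and a large sentinel value for
# the "months since" columns (999 is an illustrative choice).
df["emp_length"] = df["emp_length"].fillna(0)
for col in ["bc_util", "percent_bc_gt_75"]:
    df[col] = df[col].fillna(df[col].median())
for col in ["mths_since_recent_inq", "mths_since_last_delinq"]:
    df[col] = df[col].fillna(999)
```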
7. Exploratory Data Analysis
1. What Is the Proportion of Loan Defaults (‘Charged Off’ vs ‘Fully Paid’ Loans)?
There are 11827 Charged Off loans and 69195 Fully Paid loans. Percentage-wise, 85.4% of the loans in this dataset have been fully paid, while 14.6% have been charged off.
2. How Many Loans Are There in Each Grade?
There are 20402 loans in grade A, 23399 loans in grade B, 22577 loans in grade C, 10802 loans in grade D, 3191 loans in grade F, and 91 loans in grade G. From the bar chart plotted above, we can see that, in general, the higher the loan grade, the higher the number of loans issued, with the highest number of loans issued being in grade B and the lowest in grade G. This is what we would expect: the better a borrower's credit history, the better their loan grade, and those with better credit histories are generally more likely to qualify for loans than those with poorer credit histories, who can only qualify for loans of lower grades. This does not mean that lower grades do not receive loans at all, but rather that they appear to receive them less often.
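The counts and percentages above can be reproduced along the following lines, assuming the prepared DataFrame is named df:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Counts and percentages of Fully Paid vs Charged Off loans.
print(df["loan_status"].value_counts())
print(df["loan_status"].value_counts(normalize=True) * 100)

# Bar chart of the number of loans per grade.
sns.countplot(x="grade", data=df, order=sorted(df["grade"].unique()))
plt.title("Number of Loans per Grade")
plt.show()
```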
3. How Does the Default Rate Vary with Loan Grade?
The default rate is lowest for loans graded “A”: the higher the grade of the loan, the higher the rate of the loan being “Fully Paid”. Conversely, the “Charged Off” rate is highest for loans graded “G”; as the grade of the loan decreases, so does the rate of it being “Fully Paid”. Charged-off loans make up almost half of the grade G loans. This is what we would expect, as the lower the loan grade, the riskier the loan and the more likely it is to be defaulted on.
4. What Purpose Are People Borrowing Money For?
Most borrowers are taking out loans primarily for debt consolidation, followed by credit card, other purposes, home improvements, and major purchases.
5. Do Default Rates Vary by Purpose?
If we exclude the three least common purposes (house, renewable_energy, and wedding), small_business appears to be the purpose with the highest percentage of defaulted loans. This does not come as a surprise, as many small businesses do not succeed and go bankrupt, which is probably why the default rate is higher than for other purposes.
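A sketch of how the default rate per purpose can be computed, again assuming the prepared DataFrame is named df:

```python
# Fraction of Charged Off loans within each purpose category, highest first.
charged_off = (df["loan_status"] == "Charged Off")
default_rate_by_purpose = charged_off.groupby(df["purpose"]).mean().sort_values(ascending=False)
print(default_rate_by_purpose)
```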
6. Heatmap of Correlations Between All Variables
We plot a heatmap of the correlations between all the variables to check whether any variables are highly correlated with each other, as too many highly correlated variables can result in multicollinearity issues. Multicollinearity makes it hard to interpret the coefficients, reduces the power of a model to identify independent variables that are statistically significant, and can also create redundant information, skewing the results in a regression model. However, since most of the variables appear to have a low correlation with each other, we leave the dataset as it is.
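The heatmap itself is a short call to seaborn; the figure size and colour map below are illustrative choices:

```python
import seaborn as sns
import matplotlib.pyplot as plt

corr = df.corr(numeric_only=True)  # older pandas versions drop non-numeric columns automatically
plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap="coolwarm", center=0)
plt.title("Correlation Between Variables")
plt.show()
```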
8. Data Transformation & Processing for Machine Learning
Converting the Categorical Variables into Dummy Variables
We start by converting all the columns with categorical variables into dummy/indicator variables using the get_dummies function from the pandas library, as many machine learning algorithms cannot process categorical variables directly. This adds a column for each level of each categorical variable and assigns 0s and 1s to indicate whether that level applies to a given loan.
We then move the columns “loan_status_Fully Paid” and “loan_status_Charged Off” to the front of the dataset before we split it, to make these columns easier to select when we create the training and test datasets.
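A sketch of this step, assuming the prepared DataFrame is named df and the dummified copy is stored as df_ml:

```python
import pandas as pd

# get_dummies turns every categorical (object-dtype) column into 0/1 indicator columns.
df_ml = pd.get_dummies(df)

# Move the two target columns to the front so they are easy to select later.
target_cols = ["loan_status_Fully Paid", "loan_status_Charged Off"]
df_ml = df_ml[target_cols + [c for c in df_ml.columns if c not in target_cols]]
```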
Splitting the Data into Training and Testing Sets
We create an array from the values in the dataset, select all columns excluding “loan_status_Fully Paid” and “loan_status_Charged Off”, and store the result in the variable “X”. We select the values in the column “loan_status_Fully Paid” and store them in the variable “Y”, so that a value of 1 represents “Fully Paid” cases and a value of 0 represents “Charged Off” cases in our models.
We use the train_test_split function from sklearn to split the dataset into 70% training data and 30% testing/validation data, as we feel this split ratio helps keep our models from becoming overfitted or underfitted on the training data.
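A sketch of the split described above; the random_state is an illustrative choice for reproducibility, not a value taken from our code:

```python
from sklearn.model_selection import train_test_split

# X: every column except the two loan_status dummy columns.
X = df_ml.drop(columns=["loan_status_Fully Paid", "loan_status_Charged Off"]).values
# Y: 1 = Fully Paid, 0 = Charged Off.
Y = df_ml["loan_status_Fully Paid"].values

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
```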
9. Machine Learning Models
Chosen Models:
- LogisticRegression from sklearn.linear_model
- DecisionTreeClassifier from sklearn.tree
- KNeighborsClassifier from sklearn.neighbors
- XGBClassifier from xgboost
Since our goal is to correctly predict both Fully Paid and Charged Off loans, we use Accuracy and Area Under the Curve (AUC) as our main evaluation metrics to determine which model is the best. Accuracy can be computed using the following formula, where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
We chose AUC as our second evaluation metric because it is well suited to classification with two classes. AUC measures the entire two-dimensional area underneath the ROC curve (receiver operating characteristic curve). An ROC curve plots the True Positive Rate against the False Positive Rate at different classification thresholds, and AUC provides an aggregate measure of performance across all possible thresholds. It represents the degree of separability: the AUC value tells us how well our models can distinguish between classes, where the higher the AUC, the better the model is at classifying loans that will be Fully Paid and those that will be Charged Off. Hence, the best model will have a higher AUC (at least above the 0.5 threshold) and a higher Accuracy.
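For illustration, the four candidate models can be fitted and scored along the following lines. The hyperparameters are library defaults (plus max_iter for convergence), and computing AUC from predicted probabilities is one common approach rather than a transcript of our exact code:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, roc_curve, auc, classification_report

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "XGBClassifier": XGBClassifier(),
}

for name, model in models.items():
    model.fit(X_train, Y_train)
    Y_pred = model.predict(X_test)
    Y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive (Fully Paid) class
    fpr, tpr, _ = roc_curve(Y_test, Y_prob)
    print(name)
    print("  Accuracy:", round(accuracy_score(Y_test, Y_pred), 3))
    print("  AUC:", round(auc(fpr, tpr), 5))
    print(classification_report(Y_test, Y_pred))
```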
1. LogisticRegression from sklearn.linear_model
We chose LogisticRegression as one of our models because it is a commonly used statistical method for predicting binary classes.
Confusion Table and Metrics:
ROC/AUC Curve:
Accuracy: 0.85
AUC: 0.57932
2. DecisionTreeClassifier from sklearn.tree
Decision Trees are a non-parametric supervised learning method used for classification and regression. A tree structure is constructed that breaks the dataset down into smaller subsets and predicts the value of a target variable by learning simple decision rules inferred from the data features: decision nodes partition the data, leaf nodes give the prediction, and a prediction is made by traversing simple IF-AND-THEN logic down the nodes of the tree.
Confusion Table and Metrics:
ROC/AUC Curve:
Accuracy: 0.75
AUC: 0.52590
3. KNeighborsClassifier from sklearn.neighbors
Nearest neighbors is also a non-parametric supervised learning method. We chose it because it performs well in classification problems where the decision boundary is very irregular.
Confusion Table and Metrics:
ROC/AUC Curve:
Accuracy: 0.83
AUC: 0.53547
4. XGBClassifier from xgboost
XGBoost is an implementation of gradient boosted decision trees designed for speed and performance, and it is a very popular algorithm choice in machine learning. It makes use of regularization parameters that help prevent overfitting. According to towardsdatascience.com, since its introduction this algorithm has been credited not only with winning numerous Kaggle competitions but also with being the driving force under the hood of several cutting-edge industry applications.
Confusion Table and Metrics:
ROC/AUC Curve:
Accuracy: 0.85
AUC: 0.66676
Overall, we can see that our xGBoost model performs the best, with not only the best Accuracy value but also the best AUC value. Hence, we decide to continue with this model and experiment with different parameter values to help improve and fine-tune it.
Tuning the Parameters for our xGBoost Model
1. scale_pos_weight
Adjusting scale_pos_weight helps us balance the dataset, as our dataset is very unevenly balanced between “Fully Paid” and “Charged Off” loans. The scale_pos_weight parameter represents the ratio of the number of negative-class examples to the number of positive-class examples. To get this ratio, we count the number of negative-class examples (“Charged Off” loans) and the number of positive-class examples (“Fully Paid” loans), and then divide the former by the latter:
scale_pos_weight = count(negative examples) / count(positive examples)
which gives us a value of 0.1709.
Since our dataset is extremely unbalanced, we also try a more conservative number by taking the square root of the weight (using sqrt from the math package), which gives us a value of 0.4134.
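A sketch of how these weights can be computed and passed to the model:

```python
import math
from xgboost import XGBClassifier

# Ratio of negative-class ("Charged Off", Y == 0) to positive-class ("Fully Paid", Y == 1) examples.
weight = (Y == 0).sum() / (Y == 1).sum()   # roughly 0.1709 on this dataset
conservative_weight = math.sqrt(weight)    # roughly 0.4134

xgb_weighted = XGBClassifier(scale_pos_weight=conservative_weight)
xgb_weighted.fit(X_train, Y_train)
```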
Outputted results:
Looking at the results, it seems that using the default value of scale_pos_weight = 1 actually gives us better accuracy, and accuracy decreases as the dataset becomes more balanced. However, the AUC value increases as the dataset becomes more balanced.
The recall of the minority class is very low, at 0.03, for the model with scale_pos_weight = 1. This shows that the model is biased towards the majority class (“Fully Paid” loans), so it is not the best model. We decide to use the square root of the weight (0.4134) for scale_pos_weight, as we feel this gives us a better balance with a higher overall accuracy and AUC value.
2. max_depth
For the max_depth parameter, we experiment with the values 3, 5, 8, and 10 (6 is the default). We chose to experiment with this parameter to reduce the chance of our model overfitting, and to see whether reducing the depth of the trees would improve our model's accuracy and AUC.
Outputted results:
As you can see, the AUC decreases as the max_depth value increases. This is likely due to the model becoming overfitted on the training data, leading to decreased performance on the testing data. The accuracy, however, decreases from 3 to 5 but then starts to increase from 5 to 10. Although the model with max_depth = 10 has slightly better accuracy, we feel the increase is too small compared to the larger decrease in AUC from 0.68985 to 0.64964. Therefore, the optimal max_depth value is 3.
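The sweep over max_depth described above might look like the following sketch, with scale_pos_weight held at the value chosen in the previous step:

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, roc_curve, auc

for depth in [3, 5, 8, 10]:
    model = XGBClassifier(scale_pos_weight=0.4134, max_depth=depth)
    model.fit(X_train, Y_train)
    fpr, tpr, _ = roc_curve(Y_test, model.predict_proba(X_test)[:, 1])
    print(f"max_depth={depth}: accuracy={accuracy_score(Y_test, model.predict(X_test)):.4f}, "
          f"AUC={auc(fpr, tpr):.5f}")
```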
3. learning_rate
One problem with gradient boosted decision trees is that they are often quick to learn and can overfit the training data. One way to address this is to adjust the learning_rate parameter. We experiment with the values 0.3, 0.2, and 0.1, as these are commonly used values, with 0.3 being the default.
Outputted results:
The accuracy of the model seems to increase as learning_rate decreases. The model using learning_rate = 0.1 gives us the best accuracy of 0.834188; however, the best AUC value is achieved using learning_rate = 0.2. Since the increase in AUC from 0.69085 to 0.69105 is only marginal, we decide to choose learning_rate = 0.1 as the optimal parameter value.
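Putting the tuned parameters together, the final model might be fitted as in the following sketch:

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, roc_curve, auc

final_model = XGBClassifier(scale_pos_weight=0.4134,  # square root of the class ratio
                            max_depth=3,
                            learning_rate=0.1)
final_model.fit(X_train, Y_train)

fpr, tpr, _ = roc_curve(Y_test, final_model.predict_proba(X_test)[:, 1])
print("Accuracy:", accuracy_score(Y_test, final_model.predict(X_test)))
print("AUC:", auc(fpr, tpr))
```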
10. Conclusion
Being able to predict whether loans will be Charged Off or Fully Paid is paramount for financial institutions and financial service providers in order to maximize their potential return on the loans they provide, minimize losses on Charged Off loans, make the application and underwriting process more efficient, and improve the user experience when applying for a loan. Newer machine learning algorithms, such as the regularizing gradient boosting framework provided by the xGBoost library, have been shown to perform significantly better than many other machine learning algorithms.
Original Model:
Final Optimized Model:
When comparing our optimal model to the original model, although accuracy has decreased slightly, the AUC has improved and the recall for the negative class has increased, which shows that the improved model is less biased towards the majority (positive) class. Since our goal is to create a model that is optimized to predict both “Fully Paid” and “Charged Off” loans correctly, we feel that the new model is better tuned to predict the outcomes of future loans than the original model.
Looking back at the project, one improvement we could have made would be to try SMOTE (Synthetic Minority Oversampling Technique) rather than relying on scale_pos_weight or naive resampling: naive oversampling can lead to overfitting on the training data because it simply duplicates minority-class rows, and undersampling by removing rows can lose information about the “Charged Off” minority class in our dataset. With a technique like SMOTE, new minority-class rows are synthesized using the K-Nearest Neighbours algorithm rather than duplicated, which might lead to a better model, especially considering that datasets like ours are usually highly imbalanced due to the nature of the data (defaulting on a loan does not happen very often).
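For illustration, the SMOTE idea could be tried with the imbalanced-learn package, which we did not use in this project; the sketch below resamples only the training split:

```python
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

smote = SMOTE(random_state=42)
X_train_res, Y_train_res = smote.fit_resample(X_train, Y_train)

# Fit the tuned xGBoost model on the resampled (balanced) training data;
# scale_pos_weight is no longer needed once the classes are balanced.
model_smote = XGBClassifier(max_depth=3, learning_rate=0.1)
model_smote.fit(X_train_res, Y_train_res)
```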
Another area we could improve on is experimenting with different feature selection techniques. In this project we removed any variables that we thought were either irrelevant or would cause data leakage, and kept only the variables that we thought would be best for our models, based in part on previous experience experimenting with this dataset in R. While we did plot a heatmap of the correlations between the variables to ensure they were not too highly correlated, we could also try using the f_regression() function from the scikit-learn library in our feature selection, selecting the top k most relevant features with the SelectKBest class.
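A sketch of that feature-selection idea is shown below; k = 20 is an arbitrary example value, and f_classif is the more usual scorer for a binary target, though f_regression also works with a 0/1 outcome:

```python
from sklearn.feature_selection import SelectKBest, f_regression

selector = SelectKBest(score_func=f_regression, k=20)  # keep the 20 highest-scoring features
X_train_sel = selector.fit_transform(X_train, Y_train)
X_test_sel = selector.transform(X_test)
```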