DE PT 1 Report

docx

School

Georgian College

Course

105

Subject

Information Systems

Date

Apr 3, 2024

Type

docx

Pages

4

Uploaded by BailiffRose16582

Report
Dataset Exploration Part 1 Analysis Report – Loan Approval
By Kartik

Description of dataset: The chosen dataset records whether each customer's loan application was approved or rejected. It contains 4269 records and 13 attributes/variables. The dataset has been split into two subsets using random functions. The first subset contains 2229 records and is labelled as training data. The second subset contains the remaining records and is labelled as test data, which will be used later to check the precision of our analysis.

Arranging the dataset: After a close look at the dataset, it has been rearranged for easier analysis. All the categorical predictor variables are placed on the left, and all numerical variables, together with the result (categorical), are placed on the right. Freeze panes are set for the column headings. After the rearrangement, all the variables were checked for missing or invalid values.
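The report does not say which random functions were used for the split, so the following is only a minimal Python sketch of the same idea: shuffle the records and take the first 2229 as training data. The function name `split_records`, the fixed seed, and the use of bare loan IDs in place of full rows are assumptions made for illustration.

```python
import random

def split_records(records, train_size, seed=0):
    """Shuffle a copy of records and split it into (training, test) lists."""
    if not 0 <= train_size <= len(records):
        raise ValueError("train_size must be between 0 and len(records)")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    return shuffled[:train_size], shuffled[train_size:]

# Loan IDs 1..4269 stand in for the full rows of the dataset.
loan_ids = list(range(1, 4270))
train, test = split_records(loan_ids, train_size=2229)
print(len(train), len(test))  # 2229 2040
```

Fixing the seed makes the split reproducible, so the same test records can be reused when the analysis is checked later.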
Dataset Dictionary

Variable | Description | Ranges
Loan ID | Unique Loan/Applicant ID | 1 to 4269
No of dependents | No of dependents of the applicant (discrete) | 0 to 5
Education | Level of education of the applicant (Dichotomous) | Graduate/Not Graduate
Self employed | Employment status of the applicant (Dichotomous) | Yes/No
Income annum | Annual income (in INR) of the applicant | 200000 to 9900000
Loan amount | Amount of loan requested by the applicant | 300000 to 39500000
Loan term | Tenure of Loan (in years) | 2 to 20 years
Cibil score | Credit score of the applicant | 300 to 900
Residential assets value | Value of the residential assets of the applicant | 100000 to 29100000
Commercial assets value | Value of the commercial assets of the applicant | 0 to 19400000
Luxury assets values | Value of the Luxury assets of the applicant | 300000 to 39200000
Bank assets values | Value of the Bank assets of the applicant | 0 to 14700000
Loan status | Status of the loan (Dichotomous) | Approved/Rejected
Assumptions: The currency unit is not given in the data. Judging by typical annual income levels in the region the data comes from, the amounts are assumed to be in INR, Indian Rupee (₹). As multiple asset types are given, a new variable named Total_assets_value, the sum of all the asset values, is created to give a more detailed insight into the loan approval process. Some applicants have negative values under the variable residential asset value, which is not possible. To handle these invalid values, the ABS function is used to remove the minus sign.

Variable | MIN | MAX | COUNT zero | COUNT blank
loan_id | 1 | 4269 | 0 | 0
no_of_dependents | 0 | 5 | 712 | 0
income_annum | 200000 | 9900000 | 0 | 0
loan_amount | 300000 | 39500000 | 0 | 0
loan_term | 2 | 20 | 0 | 0
cibil_score | 300 | 900 | 0 | 0
residential_assets_value | -100000 | 29100000 | 45 | 0
commercial_assets_value | 0 | 19400000 | 107 | 0
luxury_assets_value | 300000 | 39200000 | 0 | 0
bank_asset_value | 0 | 14700000 | 8 | 0

FINER Research Questions:
1. What are the factors that heavily influence the approval of loans? Is it the applicant's income, assets, or educational background?
2. How do the individual and combined impacts of these factors vary?
3. Does CIBIL score really affect the chances of getting a loan approved?
4. How does the number of dependents of the applicant affect their loan application?
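The cleaning steps listed under Assumptions, and the MIN/MAX/COUNT summary table, appear to have been done with spreadsheet functions such as ABS. A rough Python equivalent might look like the sketch below. The helper names `clean_row` and `summarize` are hypothetical, blanks are assumed to be represented as `None`, and the two sample rows are made up for illustration rather than taken from the dataset.

```python
ASSET_COLUMNS = [
    "residential_assets_value",
    "commercial_assets_value",
    "luxury_assets_value",
    "bank_asset_value",
]

def clean_row(row):
    """Return a copy of row with a non-negative residential value and a total."""
    cleaned = dict(row)
    # Same effect as the spreadsheet ABS function: drop the minus sign.
    cleaned["residential_assets_value"] = abs(row["residential_assets_value"])
    cleaned["Total_assets_value"] = sum(cleaned[col] for col in ASSET_COLUMNS)
    return cleaned

def summarize(rows, column):
    """Return MIN, MAX, zero count and blank count for one column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "min": min(values),
        "max": max(values),
        "count_zero": sum(1 for v in values if v == 0),
        "count_blank": sum(1 for r in rows if r[column] is None),
    }

# Two made-up rows for illustration only; they are not taken from the dataset.
sample = [
    {"residential_assets_value": -100000, "commercial_assets_value": 0,
     "luxury_assets_value": 300000, "bank_asset_value": 0},
    {"residential_assets_value": 2400000, "commercial_assets_value": 1700000,
     "luxury_assets_value": 5000000, "bank_asset_value": 800000},
]
cleaned = [clean_row(r) for r in sample]
print(cleaned[0]["Total_assets_value"])  # 400000
print(summarize(cleaned, "residential_assets_value"))
```

Taking the absolute value treats a negative entry as a sign error. That is the report's own assumption, and it would hide the problem if the negative values were wrong for some other reason.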
Motivation: There are several reasons to study the loan approval process in depth. It might yield important insights into application approval trends, risk evaluation, and decision-making procedures. With a better understanding of the process, loan approval decisions can be made more accurate and financial risks can be reduced. This benefits both financial institutions and customers: customers can pre-assess their application before applying, which improves their chances of getting the loan approved.

Source of the dataset: Loan Approval Dataset - Kaggle