DE PT 1 Report
Dataset Exploration Part 1
Analysis Report – Loan Approval, by Kartik
Description of dataset:
The chosen dataset records whether each customer loan application was
approved or rejected.
It contains 4269 records and has 13 attributes/variables.
The dataset has been divided into two parts using random functions. The
first part contains 2229 records and is labelled as training data.
The other part contains the remaining 2040 records and is labelled as test
data, which will be used later to check the accuracy of our analysis.
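The random split described above can be sketched as follows. This is a minimal illustration, not the method used in the report: the report does not name the random function it used, so the helper `split_records`, the fixed seed, and the use of loan IDs as stand-ins for full records are all assumptions.

```python
import random


def split_records(records, n_train, seed=42):
    """Shuffle a copy of the records and split it into (train, test).

    The function name and the seed are illustrative assumptions.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in for the 4269 loan records: only their loan IDs (1 to 4269).
loan_ids = list(range(1, 4270))
train, test = split_records(loan_ids, n_train=2229)
print(len(train), len(test))  # 2229 2040
```

Fixing the seed makes the split reproducible, so the same test records are held back each time the analysis is rerun.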
Arranging the dataset:
After a close look at the dataset, it has been re-arranged for
better analysis.
All the categorical predictor variables are placed on the left, and all
numerical variables, together with the result (categorical), are placed
on the right side.
Freeze panes are set for the column headings.
After the rearrangement of the data, all the variables were analyzed
for any missing or invalid values.
Dataset Dictionary
| Variable | Description | Range |
|---|---|---|
| Loan ID | Unique Loan/Applicant ID | 1 to 4269 |
| No of dependents | No of dependents of the applicant (discrete) | 0 to 5 |
| Education | Level of education of the applicant (dichotomous) | Graduate/Not Graduate |
| Self employed | Employment status of the applicant (dichotomous) | Yes/No |
| Income annum | Annual income (in INR) of the applicant | ₹200000 to ₹9900000 |
| Loan amount | Amount of loan requested by the applicant | ₹300000 to ₹39500000 |
| Loan term | Tenure of loan (in years) | 2 to 20 years |
| Cibil score | Credit score of the applicant | 300 to 900 |
| Residential assets value | Value of the residential assets of the applicant | ₹100000 to ₹29100000 |
| Commercial assets value | Value of the commercial assets of the applicant | ₹0 to ₹19400000 |
| Luxury assets value | Value of the luxury assets of the applicant | ₹300000 to ₹39200000 |
| Bank assets value | Value of the bank assets of the applicant | ₹0 to ₹14700000 |
| Loan status | Status of the loan (dichotomous) | Approved/Rejected |
Assumptions:
As the currency unit is not given in the data, and by evaluating the
average annual income of the geographic location, it is assumed that
all monetary values are in Indian Rupees (INR, ₹).
As multiple asset values are given, a new variable named
Total_assets_value, which is the total of all the assets, is created; it
can provide more detailed insight into the loan approval process.
Some applicants have negative values under the variable
residential asset value, which is not possible. To handle these invalid
values, the ABS function is used to remove the minus sign.
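The two data-handling assumptions above (taking the absolute value of residential assets and adding a Total_assets_value variable) can be sketched in Python as follows. The report appears to have done this with spreadsheet functions, so this is only an equivalent illustration; the helper name `clean_assets` and the example row are hypothetical and not taken from the dataset.

```python
def clean_assets(row):
    """Return a copy of the row with a non-negative residential asset
    value and a new Total_assets_value field."""
    cleaned = dict(row)
    # Same effect as the spreadsheet ABS function: drop the minus sign.
    cleaned["residential_assets_value"] = abs(row["residential_assets_value"])
    # Total of the four asset columns, using the cleaned residential value.
    cleaned["Total_assets_value"] = (
        cleaned["residential_assets_value"]
        + row["commercial_assets_value"]
        + row["luxury_assets_value"]
        + row["bank_asset_value"]
    )
    return cleaned


# Made-up example row (values are illustrative only).
example = {
    "residential_assets_value": -100000,
    "commercial_assets_value": 500000,
    "luxury_assets_value": 1200000,
    "bank_asset_value": 300000,
}
result = clean_assets(example)
print(result["residential_assets_value"], result["Total_assets_value"])
# 100000 2100000
```

Note that taking the absolute value assumes the negative sign is a data-entry error and the magnitude is correct; treating such values as missing would be an alternative choice.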
| Variable | MIN | MAX | COUNT zero | COUNT blank |
|---|---|---|---|---|
| loan_id | 1 | 4269 | 0 | 0 |
| no_of_dependents | 0 | 5 | 712 | 0 |
| income_annum | 200000 | 9900000 | 0 | 0 |
| loan_amount | 300000 | 39500000 | 0 | 0 |
| loan_term | 2 | 20 | 0 | 0 |
| cibil_score | 300 | 900 | 0 | 0 |
| residential_assets_value | -100000 | 29100000 | 45 | 0 |
| commercial_assets_value | 0 | 19400000 | 107 | 0 |
| luxury_assets_value | 300000 | 39200000 | 0 | 0 |
| bank_asset_value | 0 | 14700000 | 8 | 0 |
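The MIN, MAX, COUNT zero and COUNT blank checks in the table above can be reproduced per column with a short helper. This is a sketch under stated assumptions: blanks are represented as `None`, the function name `summarize` is hypothetical, and the sample column is made up rather than drawn from the dataset.

```python
def summarize(values):
    """Return MIN, MAX, zero count and blank count for one column.

    Blank cells are assumed to be represented as None.
    """
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "count_zero": sum(1 for v in present if v == 0),
        "count_blank": len(values) - len(present),
    }


# Made-up column with one blank, two zeros and one negative value.
sample = [0, 250000, None, -100000, 0, 4100000]
summary = summarize(sample)
print(summary)
# {'min': -100000, 'max': 4100000, 'count_zero': 2, 'count_blank': 1}
```

A negative minimum, as in the residential assets column, is the kind of invalid value this check is meant to surface.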
FINER Research Questions:
1. What are the factors that heavily influence the approval of loans? Is
it the applicant's income, assets, or educational background?
2. How do their individual and collective impacts vary?
3. Does CIBIL score really affect the chances of getting a loan
approved?
4. How does the number of dependents of the applicant affect their loan
application?
Motivation:
There are several reasons to study the loan approval process in depth. Some
of them are as follows:
It might yield important insights into application approval trends,
risk evaluation and decision-making procedures.
By understanding the process, loan approval decisions can be made
more accurate and financial risks can be reduced.
This will be beneficial for both customers and financial institutions,
as customers can pre-assess their application before applying for a
loan to increase their chances of getting it approved.
Source of the dataset: Loan Approval Dataset - Kaggle