Lab 2 300357960

School: Douglas College
Course: 3880
Subject: Statistics
Date: Feb 20, 2024

Uploaded by EarlStrawCaterpillar18

Lab 2-3 Part 1 Transform and Clean the Data

Q1. Which attributes would you expect to contain date values?
Answer: The attributes "issue_d" (date of loan issue) and "earliest_cr_line" (oldest credit account) are likely to contain date values.

Q2. Which attributes would you expect to contain text values?
Answer: Attributes such as "grade," "home_ownership," "loan_status," "title," "zip_code," "addr_state," and "application_type" are likely to contain text values.

Q3. Which attributes would you expect to contain numerical values?
Answer: Attributes like "loan_amnt," "term," "int_rate," "emp_length," "annual_inc," "dti," "delinq_2y," "open_acc," "revol_bal," "revol_util," and "total_acc" are likely to contain numerical values.

Q4. Which attribute most directly impacts a borrower’s cost of capital?
Answer: The attribute "int_rate" (interest rate of the loan) most directly impacts a borrower's cost of capital.

Analysis Questions:

Q1. What do you expect will be major data quality issues with LendingClub’s data?
Answer: Possible data quality issues include missing values in crucial fields, inconsistencies or errors in date formats, outliers or inaccuracies in numerical values, and variations in text entries (e.g., different ways of representing the same information).

Q2. Given this list of attributes, what types of questions do you think you could answer regarding approved loans? (If you worked through Lab 1-2, what concerns do you have with the data’s ability to predict answers to the questions you identified in Chapter 1?)
Answer: With these attributes, one could analyze trends such as the relationship between loan amount and interest rate, the impact of employment length on loan approval, and the correlation between debt-to-income ratio and loan status. Concerns may arise from missing data, potential outliers, and the need for additional variables to create a comprehensive predictive model for credit risk assessment.
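The data quality issues named above (missing values, inconsistent dates, and variations in text entries) can each be checked with a few lines of code. The following is only a minimal sketch in Python with pandas, which is not the tool used in the lab; the sample rows, the "Mon-YYYY" date format, and the percent-formatted interest rates are all invented for illustration and are not taken from the actual LendingClub file.

```python
import pandas as pd

# Hypothetical sample rows; every value here is invented for illustration.
loans = pd.DataFrame({
    "loan_amnt": [5000, 12000, None, 8000],
    "int_rate": ["10.5%", "13.2%", "7.9%", None],
    "issue_d": ["Dec-2015", "Jan-2016", "not a date", "Mar-2016"],
    "home_ownership": ["RENT", "rent", "OWN", "MORTGAGE"],
})

# Issue 1: missing values, counted per column.
missing_counts = loans.isna().sum()

# Issue 2: inconsistent dates. Entries that do not match the assumed
# "Mon-YYYY" format become NaT instead of raising an error.
parsed_dates = pd.to_datetime(loans["issue_d"], format="%b-%Y", errors="coerce")
bad_date_count = int(parsed_dates.isna().sum())

# Issue 3: text variations. "RENT" and "rent" count as two categories
# until the case is normalized.
raw_categories = loans["home_ownership"].nunique()
normalized_categories = loans["home_ownership"].str.upper().nunique()

print(missing_counts)
print(bad_date_count, raw_categories, normalized_categories)
```

Outliers would need a separate check (for example, comparing each numerical value against a plausible range), which is left out here to keep the sketch short.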
Lab 2-3 Part 1 Transform and Clean the Data

How many records or rows appear in your cleaned dataset?
199+

How many attributes or columns appear in your cleaned dataset?
19

Why do you think it is important to remove text values from your data before you conduct your analysis?
Ans. So that calculations can be performed on the data. Calculations cannot be performed on text values.

What do you think would happen in your analysis if you didn’t remove the text values?
Ans. Errors would come up and the data would not be clean.
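To illustrate why text values block calculations, here is a small sketch in Python with pandas (an assumption; the lab itself does not use Python). It assumes, hypothetically, that interest rates are stored as text with a trailing percent sign; the three values are invented.

```python
import pandas as pd

# Hypothetical interest rates stored as text, as they might appear before cleaning.
int_rate_text = pd.Series(["10.5%", "13.2%", "7.9%"])

# A direct numeric conversion cannot read any of these values because of
# the "%" character; errors="coerce" turns each failure into NaN.
coerced = pd.to_numeric(int_rate_text, errors="coerce")
unusable = int(coerced.isna().sum())

# After removing the text character, the values convert cleanly and
# ordinary calculations such as the mean become possible.
int_rate_num = int_rate_text.str.rstrip("%").astype(float)
average_rate = int_rate_num.mean()

print(unusable, round(average_rate, 2))
```

Until the percent sign is stripped, every value is unusable for arithmetic, which is the kind of error the answer above describes.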
Did you run into any major issues when you attempted to clean the data? How did you resolve those?
Ans. The main issue was a recurring error on the emp_length column: its values could not be converted to whole numbers because they were stored in text format. I then changed only that column's type to text, and the error disappeared.

What are some steps you could take to clean the data and resolve the difficulties you identified?
Ans. Some steps: converting the data to numbers and removing the text from the data.
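As one possible way to carry out the "changing data to numbers" step for emp_length, the sketch below uses Python with pandas, which is an assumption and not the tool used in the lab. The text formats shown ("10+ years", "< 1 year", "n/a") are hypothetical examples of how employment length might be written, and mapping "< 1 year" to 0 is a choice made here for illustration.

```python
import pandas as pd

# Hypothetical emp_length values stored as text; the formats are assumed.
emp_length = pd.Series(["10+ years", "3 years", "< 1 year", "n/a"])

# Assumption: treat "less than one year" as 0 full years.
prepared = emp_length.replace({"< 1 year": "0 years"})

# Pull out the digits from each entry; entries without digits become NaN.
digits = prepared.str.extract(r"(\d+)", expand=False)

# Convert the extracted digit strings to numbers for use in calculations.
emp_years = pd.to_numeric(digits, errors="coerce")

print(emp_years.tolist())
```

This keeps the column numeric rather than changing its type to text, so it can still be used in calculations; entries with no recognizable number are left as missing values to be handled separately.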