Lab Assignment 2
Samantha West
02/11/2024

Load Packages and Data

# Load packages I may need
library(scales)
library(dplyr)
library(ggplot2)
library(grid)
library(gridExtra)
library(gtable)
library(histogram)
library(knitr)
library(markdown)
library(plyr)
library(stargazer)
library(tibble)
library(tidyr)
library(tidyverse)
library(xtable)
library(magrittr)
library(ggpubr)
library(gmodels)
library(descr)

# Upload the HMDA dataset:
hmda20 <- read.csv("/Users/samanthawest/Downloads/hmda20.csv")

Question 1. Provide the appropriate labels for the following variables:

Gender (female): 0 - male, 1 - female
Race (race): 1 - white, 2 - black, 3 - Latino, 4 - Asian, 5 - Native American, 6 - Pacific Island
Age (age): 1 - <25 yrs, 2 - 25-34 yrs, 3 - 35-44 yrs, 4 - 45-54 yrs, 5 - 55-64 yrs, 6 - 65-74 yrs, 7 - >75 yrs
Debt-to-Income Ratio (dti): 1 - <36%, 2 - 36-40%, 3 - 40-45%, 4 - >45%
Co-applicant (coapplicant): 0 - single, 1 - co-applicant

hmda20$female1 <- factor(hmda20$female, levels = c(0, 1),
                         labels = c("male", "female"))
view(hmda20$female1)

hmda20$race1 <- factor(hmda20$race, levels = c(1, 2, 3, 4, 5, 6),
                       labels = c("white", "black", "latino", "asian",
                                  "native american", "pacific island"))
view(hmda20$race1)

hmda20$age1 <- factor(hmda20$age, levels = c(1, 2, 3, 4, 5, 6, 7),
                      labels = c("<25 years", "25-34 years", "35-44 years",
                                 "45-54 years", "55-64 years", "65-74 years",
                                 ">75 years"))
view(hmda20$age1)

hmda20$dti1 <- factor(hmda20$dti, levels = c(1, 2, 3, 4),
                      labels = c("<36%", "36-40%", "40-45%", ">45%"))
view(hmda20$dti1)

hmda20$coapplicant1 <- factor(hmda20$coapplicant, levels = c(0, 1),
                              labels = c("single", "coapplicant"))
view(hmda20$coapplicant1)

Question 2: Create the following interval ratio variables (Show summary statistics):

# down payment percentage (negative down payments are set to 0)
hmda20$down_payment <- (hmda20$property_value1 - hmda20$loan_amount1)
hmda20$down_payment[hmda20$down_payment < 0] <- 0
hmda20$down_payment_percentage <- (hmda20$down_payment / hmda20$property_value1) * 100
summary(hmda20$down_payment_percentage)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    0.00   13.86   26.51   32.32   46.07   99.08

# loan to value ratio (ratios above 1 are capped at 1)
hmda20$loan_to_value_ratio1 <- (hmda20$loan_amount1 / hmda20$property_value1)
hmda20$loan_to_value_ratio1[hmda20$loan_to_value_ratio1 > 1] <- 1
summary(hmda20$loan_to_value_ratio1)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## 0.009174 0.539267 0.734940 0.676819 0.861386 1.000000

Answer 2:
The mean down payment percentage is 32.32, and the median is 26.51.
The mean loan to value ratio is 0.677, and the median is 0.735.
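The two variables in Question 2 are mirror images of each other: once negative down payments are floored at 0 and the loan to value ratio is capped at 1, the down payment percentage equals 100 × (1 − LTV). That is why the reported means (32.32 and 0.677) and medians (26.51 and 0.735) line up. A minimal sketch of the check, using a made-up three-row data frame (the `toy` values are hypothetical and not taken from hmda20):

```r
# Toy data in thousands of dollars (hypothetical values, not from hmda20)
toy <- data.frame(property_value1 = c(200, 400, 300),
                  loan_amount1    = c(150, 100, 330))

# Down payment percentage, floored at 0
down <- pmax(toy$property_value1 - toy$loan_amount1, 0)
down_pct <- down / toy$property_value1 * 100

# Loan to value ratio, capped at 1
ltv <- pmin(toy$loan_amount1 / toy$property_value1, 1)

# The two measures carry the same information
all.equal(down_pct, 100 * (1 - ltv))
```

Here `pmax()` and `pmin()` do the flooring and capping in one step instead of the subset-and-assign approach used above; both give the same result when there are no missing values.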
Question 3: Provide a histogram for down payment percentage and loan to value ratio with a distribution line in the histogram.

hist(hmda20$down_payment_percentage, freq = FALSE, col = "blue",
     xlab = "Down Payment Percentage Level",
     main = "Down Payment Percentage Histogram")
# normal distribution line
curve(dnorm(x, mean = mean(hmda20$down_payment_percentage, na.rm = TRUE),
            sd = sd(hmda20$down_payment_percentage, na.rm = TRUE)),
      add = TRUE, col = "red")

hist(hmda20$loan_to_value_ratio1, freq = FALSE, col = "orange",
     main = "Loan-to-Value Histogram", xlab = "Loan/Value")
# normal distribution line
curve(dnorm(x, mean = mean(hmda20$loan_to_value_ratio1, na.rm = TRUE),
            sd = sd(hmda20$loan_to_value_ratio1, na.rm = TRUE)),
      add = TRUE, col = "green")
Question 4: Provide the frequency distribution of debt-to-income ratio, age, and race

freq(hmda20$dti1)
## hmda20$dti1
##        Frequency Percent
## <36%       27237   54.47
## 36-40%      6439   12.88
## 40-45%      8061   16.12
## >45%        8263   16.53
## Total      50000  100.00

freq(hmda20$age1)
## hmda20$age1
##             Frequency Percent
## <25 years        1308   2.616
## 25-34 years     10479  20.958
## 35-44 years     13468  26.936
## 45-54 years     10977  21.954
## 55-64 years      8145  16.290
## 65-74 years      4292   8.584
## >75 years        1331   2.662
## Total           50000 100.000

freq(hmda20$race1)
## hmda20$race1
##                 Frequency Percent
## white               36862  73.724
## black                2956   5.912
## latino               5648  11.296
## asian                3998   7.996
## native american       391   0.782
## pacific island        145   0.290
## Total               50000 100.000

Answer 4:
Debt-to-income ratio (1st table): the majority of respondents (54.47%) have a ratio below 36%.
Age (2nd table): the largest group of respondents (26.94%) is between 35 and 44 years old.
Race (3rd table): a vast majority of the respondents (73.72%) are white.

Question 5: Provide the summary statistics of down payment, loan to value ratio (mean, median, s.d)

summary(hmda20$down_payment)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     0.0    30.0    90.0   169.8   200.0 11890.0
sd(hmda20$down_payment, na.rm = TRUE)
## [1] 286.4509

summary(hmda20$loan_to_value_ratio1)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## 0.009174 0.539267 0.734940 0.676819 0.861386 1.000000
sd(hmda20$loan_to_value_ratio1, na.rm = TRUE)
## [1] 0.2477057

Answer 5:
Down payment: mean: $169.8k, median: $90k, standard deviation: $286.5k
Loan to value ratio: mean: ~0.677, median: ~0.735, standard deviation: ~0.248

Question 6: Create a 6-category variable of income (Show frequency table):

summary(hmda20$income1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     0.0    61.0    94.0   241.2   147.0 10000.0

hmda20$level_income2[hmda20$income1 < 100] <- 1
hmda20$level_income2[hmda20$income1 >= 100 & hmda20$income1 < 200] <- 2
hmda20$level_income2[hmda20$income1 >= 200 & hmda20$income1 < 300] <- 3
hmda20$level_income2[hmda20$income1 >= 300 & hmda20$income1 < 400] <- 4
hmda20$level_income2[hmda20$income1 >= 400 & hmda20$income1 < 500] <- 5
hmda20$level_income2[hmda20$income1 >= 500] <- 6
hmda20$level_income3 <- factor(hmda20$level_income2, levels = c(1, 2, 3, 4, 5, 6),
                               labels = c("<$100k", "$100-200k", "200-300k",
                                          "$300-400k", "$400-500k", ">$500k"))
freq(hmda20$level_income3)
## hmda20$level_income3
##           Frequency Percent
## <$100k        26532  53.064
## $100-200k     16902  33.804
## 200-300k       3755   7.510
## $300-400k      1166   2.332
## $400-500k       426   0.852
## >$500k         1219   2.438
## Total         50000 100.000

Answer 6:
I chose increments of $100k because that seemed appropriate given the median income of $94k. The majority of respondents (53.06%) have an income below $100k. The mean is still above $200k ($241.2k) because a small number of very high incomes (up to $10,000k) pull it upward.

Question 7: Generate a bar graph of the income variable

counts <- table(hmda20$level_income3)
barplot(counts, main = "Income Level of Respondents",
        xlab = "Income Ranges of Respondents")
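The six income bands from Question 6 can also be built in a single call with base R's `cut()`. This is only an alternative sketch, not the method used in the lab: the `income` vector is made up for illustration, and the last label is written `>=$500k` to match the `>= 500` rule.

```r
# Hypothetical incomes in thousands of dollars (not from hmda20)
income <- c(0, 61, 94, 100, 147, 250, 399, 450, 500, 10000)

# right = FALSE closes each band on the left: [-Inf,100), [100,200), ...
income_band <- cut(income,
                   breaks = c(-Inf, 100, 200, 300, 400, 500, Inf),
                   labels = c("<$100k", "$100-200k", "$200-300k",
                              "$300-400k", "$400-500k", ">=$500k"),
                   right = FALSE)

# Frequency table of the resulting factor
table(income_band)
```

Setting `right = FALSE` matters here: with the default `right = TRUE`, an income of exactly 100 would fall into the lowest band instead of the second one.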
Question 8: Create a bivariate table of income and gender

table(hmda20$level_income3, hmda20$female1)
##
##              male female
##   <$100k    15743  10789
##   $100-200k 11772   5130
##   200-300k   2804    951
##   $300-400k   919    247
##   $400-500k   339     87
##   >$500k      859    360

Question 9: Create a bivariate table of income and gender among single applicants

single_respondents <- subset(hmda20, hmda20$coapplicant1 == "single")
view(single_respondents)
table(single_respondents$level_income3, single_respondents$female1)
##
##             male female
##   <$100k    9555   8208
##   $100-200k 4658   2223
##   200-300k   860    286
##   $300-400k  316     65
##   $400-500k  123     26
##   >$500k     343    165

Question 10: Conduct a two-sample t-test examining down payment percentage and gender
a. Include: null and research hypothesis
b. Alpha level
c. T-value
d. P-value
e. Interpret results

t.test(hmda20$down_payment_percentage ~ hmda20$female1)
##
##  Welch Two Sample t-test
##
## data:  hmda20$down_payment_percentage by hmda20$female1
## t = 4.6269, df = 34749, p-value = 3.726e-06
## alternative hypothesis: true difference in means between group male and group female is not equal to 0
## 95 percent confidence interval:
##  0.6264607 1.5473127
## sample estimates:
##   mean in group male mean in group female
##             32.69993             31.61305

Answer 10:
a. Null hypothesis: the mean down payment percentages of males and females are equal.
   Research hypothesis: the mean down payment percentages of males and females are not equal.
b. Alpha level: 0.05
c. T-value: 4.6269
d. P-value: 0.000003726
e. Interpret results: because the p-value is much smaller than the alpha level of 0.05, the null hypothesis is rejected. There is evidence of a statistically significant difference in mean down payment percentage between males (32.70) and females (31.61).

Question 11: Conduct a two-sample t-test examining income and co-applicant
a. Include: null and research hypothesis
b. Alpha level
c. T-value
d. P-value
e. Interpret results

t.test(hmda20$income1 ~ hmda20$coapplicant1)
##
##  Welch Two Sample t-test
##
## data:  hmda20$income1 by hmda20$coapplicant1
## t = -9.7547, df = 44567, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group single and group coapplicant is not equal to 0
## 95 percent confidence interval:
##  -116.18798  -77.30858
## sample estimates:
##      mean in group single mean in group coapplicant
##                  196.3392                  293.0875

Answer 11:
a. Null hypothesis: the mean incomes of the single and coapplicant groups are equal.
   Research hypothesis: the mean incomes of the single and coapplicant groups are not equal.
b. Alpha level: 0.05
c. T-value: -9.7547
d. P-value: less than 2.2e-16 (below 0.00000000000000022)
e. Interpret results: the p-value is far smaller than the alpha level of 0.05, so there is sufficient evidence to reject the null hypothesis. The mean incomes of the two groups differ significantly, with the coapplicant group (293.09) having a higher mean income than the single group (196.34).
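The t-value, degrees of freedom, and p-value that `t.test()` reports in Questions 10 and 11 can be reproduced by hand, which helps explain why the degrees of freedom (34749 and 44567) are not simply the sample size minus two. The sketch below uses simulated groups `a` and `b` (hypothetical data, not from hmda20) and the standard Welch formulas:

```r
# Simulated groups (hypothetical data, not from hmda20)
set.seed(1)
a <- rnorm(40, mean = 32, sd = 20)
b <- rnorm(55, mean = 30, sd = 25)

# Squared standard error of each group mean (variances not assumed equal)
va <- var(a) / length(a)
vb <- var(b) / length(b)

# Welch t-statistic: difference in means over its standard error
t_manual <- (mean(a) - mean(b)) / sqrt(va + vb)

# Welch-Satterthwaite degrees of freedom
df_manual <- (va + vb)^2 /
  (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# Two-sided p-value
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Compare with t.test(), which runs Welch's test by default
res <- t.test(a, b)
all.equal(unname(res$statistic), t_manual)
all.equal(unname(res$parameter), df_manual)
all.equal(res$p.value, p_manual)
```

The sign of the t-value depends only on the order of the groups (first factor level minus second), which is why Question 11 reports a negative value while the conclusion rests on its magnitude.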