Homework 5 - MATH 4322

Instructions

1. Due date: November 2, 2023
2. Answer the questions fully for full credit.
3. Scan or type your answers and submit only one file. (If you submit several files, only the most recently uploaded one will be graded.)
4. Preferably save your file as a PDF before uploading.
5. Submit in Canvas.
6. These questions are from An Introduction to Statistical Learning with Applications in R by James et al., chapter 8.

Problem 1

We will consider the Boston housing data set from the MASS library.

a. Based on this data set, provide an estimate for the population mean of medv. Call this estimate $\hat{\mu}$.

```r
library(MASS)
hat_mu = mean(Boston$medv)
hat_mu
```

```
[1] 22.53281
```

b. Provide an estimate of the standard error of $\hat{\mu}$. Interpret this result. Hint: we can compute the standard error of the sample mean by dividing the sample standard deviation by the square root of the number of observations.

```r
sd_hat_mu = sd(Boston$medv) / sqrt(nrow(Boston))
sd_hat_mu
```

```
[1] 0.4088611
```

The estimated standard error is 0.4089: across repeated samples of this size, the sample mean of medv would typically differ from the population mean by about 0.41 (medv is measured in $1000s).

c. Now estimate the standard error of $\hat{\mu}$ using the bootstrap. How does this compare to your answer from (b)?
```r
library(boot)
set.seed(10)
boot.fn = function(data, index) {
  mu = mean(data[index])
  return(mu)
}
boot_sd_hat_mu = boot(Boston$medv, boot.fn, 1000)
boot_sd_hat_mu
```

```
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Boston$medv, statistic = boot.fn, R = 1000)

Bootstrap Statistics :
    original        bias    std. error
t1* 22.53281 -0.008041502   0.4017124
```

The bootstrap estimate of the standard error of $\hat{\mu}$ is 0.4017124, which is very close to the formula-based answer from (b), 0.4088611.

d. Based on your bootstrap estimate from (c), provide a 95% confidence interval for the mean of medv. Compare it to the results obtained using t.test(Boston$medv). Hint: you can approximate a 95% confidence interval using the formula $[\hat{\mu} - 2\,\mathrm{SE}(\hat{\mu}),\ \hat{\mu} + 2\,\mathrm{SE}(\hat{\mu})]$.

```r
boot_sd_hat_mu$t0 - 2 * 0.4017124
boot_sd_hat_mu$t0 + 2 * 0.4017124
t.test(Boston$medv)
```

```
[1] 21.72938
[1] 23.33623

	One Sample t-test

data:  Boston$medv
t = 55.111, df = 505, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 21.72953 23.33608
sample estimates:
mean of x 
 22.53281 
```

The 95% confidence interval from t.test(Boston$medv) and the one from the bootstrap estimate are very close.
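As an aside, the boot package can also compute a bootstrap percentile interval directly from the boot object fit above; a minimal sketch (its output is not reproduced in the original document):

```r
# Percentile bootstrap 95% CI from the same boot object as in part (c)
boot.ci(boot_sd_hat_mu, conf = 0.95, type = "perc")
```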
e. Based on this data set, provide an estimate, $\hat{\mu}_{med}$, for the median value of medv in the population.

```r
med_hat_mu = median(Boston$medv)
med_hat_mu
```

```
[1] 21.2
```

f. We now would like to estimate the standard error of $\hat{\mu}_{med}$. Unfortunately, there is no simple formula for computing the standard error of the median. Instead, estimate the standard error of the median using the bootstrap. Comment on your findings.

```r
set.seed(10)
boot.fn = function(data, index) {
  med = median(data[index])
  return(med)
}
se_boot_med_hat_mu = boot(Boston$medv, boot.fn, 1000)
se_boot_med_hat_mu
```

```
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Boston$medv, statistic = boot.fn, R = 1000)

Bootstrap Statistics :
    original   bias    std. error
t1*     21.2 -0.00665   0.3745779
```

The estimated standard error of the median is 0.3745779, which is relatively small compared to the estimate of 21.2.

g. Based on this data set, provide an estimate for the tenth percentile of medv in Boston suburbs. Call this quantity $\hat{\mu}_{0.1}$. (You can use the quantile() function.)

```r
quantity_hat_mu_0.1 = quantile(Boston$medv, 0.1)
quantity_hat_mu_0.1
```

```
  10% 
12.75 
```

h. Use the bootstrap to estimate the standard error of $\hat{\mu}_{0.1}$. Comment on your findings.

```r
set.seed(10)
boot.fn = function(data, index) {
  hat_mu_0.1 = quantile(data[index], 0.1)
  return(hat_mu_0.1)
}
se_hat_mu_0.1 = boot(Boston$medv, boot.fn, 1000)
se_hat_mu_0.1
```
```
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Boston$medv, statistic = boot.fn, R = 1000)

Bootstrap Statistics :
    original   bias    std. error
t1*    12.75 -0.0101   0.5064874
```

The standard error of the tenth-percentile estimate is 0.5064874, which is relatively small compared to the estimate of 12.75.

Problem 2

The questions relate to the following plots. [The two plots, a partition of a two-dimensional predictor space on the left and a tree on the right, are not reproduced in this extract.]

a. Sketch the tree corresponding to the partition of the predictor space illustrated in the left-hand plot. The numbers inside the boxes indicate the mean of Y within each region.

b. Create a diagram similar to the left-hand plot using the tree illustrated in the right-hand plot. You should divide the predictor space into the correct regions and indicate the mean for each region.
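Since the plots themselves are not included, here is a generic, hypothetical illustration of the partition-to-tree correspondence in base R; the splits and region means below are invented for illustration and are not the homework's actual values:

```r
# Hypothetical partition of [0,1]^2: split first at X1 = 0.5; the right half
# is split again at X2 = 0.5. The number in each region is the mean of Y there.
plot(NULL, xlim = c(0, 1), ylim = c(0, 1), xlab = "X1", ylab = "X2")
abline(v = 0.5)                 # first split: X1 = 0.5
segments(0.5, 0.5, 1, 0.5)      # second split: X2 = 0.5, right side only
text(0.25, 0.50, "5")           # mean of Y for X1 < 0.5
text(0.75, 0.25, "10")          # mean of Y for X1 >= 0.5, X2 < 0.5
text(0.75, 0.75, "15")          # mean of Y for X1 >= 0.5, X2 >= 0.5
```

The corresponding tree has a root split on X1 < 0.5 whose left leaf predicts 5, and a second split on X2 < 0.5 whose leaves predict 10 and 15.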
Problem 3

Suppose we produce ten bootstrapped samples from a data set containing red and green classes. We then apply a classification tree to each bootstrapped sample and, for a specific value of X, produce 10 estimates of P(Class is Red | X): 0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, and 0.75. There are two common ways to combine these results into a single class prediction. One is the majority vote approach discussed in this chapter. The second approach is to classify based on the average probability. In this example, what is the final classification under each of these two approaches?

Using the majority vote method, we classify X as "Red", since Red is the most frequently predicted class among the 10 trees: 6 of the probabilities exceed 0.5 (votes for Red) versus 4 below 0.5 (votes for Green). Using the average-probability approach, X is classified as "Green", because the 10 probabilities sum to 4.5, so their average is 4.5/10 = 0.45 < 0.5.

Problem 4

Provide a detailed explanation of the algorithm that is used to fit a regression tree.

The algorithm used to fit a regression tree is recursive binary splitting: a greedy, top-down partitioning of the data set used to predict a continuous target variable. At each step it considers every predictor and every candidate cutpoint, and selects the pair that splits the current region into two subsets with the smallest resulting residual sum of squares (equivalently, the smallest mean squared error). The data are divided by this split, and the process repeats recursively within each subset. A stopping criterion, such as a maximum tree depth or a minimum number of observations in a leaf node, prevents the tree from growing until it overfits; after the tree is grown, optional pruning (e.g., cost-complexity pruning) can simplify it and further reduce overfitting. Each leaf node stores a predicted value, namely the mean of the response for the training observations that fall in it. To make a prediction, a data point traverses the tree from the root according to the splits, and its predicted value is the mean stored in the leaf it reaches. The resulting regression tree is a simple, interpretable model for predicting continuous values; a minimal sketch of the split-selection step follows.
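To make the split-selection step concrete, here is a minimal, self-contained R sketch (not from the homework) of one greedy search for the best single split by RSS; best_split and the toy data set are illustrative names, and a full tree would apply this search recursively to each resulting subset:

```r
# Hypothetical helper: find the single split (predictor, cutpoint) minimizing
# RSS. x is a data frame of numeric predictors; y is the numeric response.
best_split <- function(x, y) {
  best <- list(rss = Inf, var = NA, cut = NA)
  for (j in names(x)) {
    for (s in unique(x[[j]])) {
      left  <- y[x[[j]] <  s]
      right <- y[x[[j]] >= s]
      if (length(left) == 0 || length(right) == 0) next
      # RSS of the two candidate regions, each predicted by its own mean
      rss <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
      if (rss < best$rss) best <- list(rss = rss, var = j, cut = s)
    }
  }
  best
}

# Toy example: y jumps at x1 = 0.5, so the search should recover a cutpoint
# near 0.5 on x1 as the best first split.
set.seed(1)
toy <- data.frame(x1 = runif(50), x2 = runif(50))
y <- ifelse(toy$x1 < 0.5, 5, 10) + rnorm(50)
best_split(toy, y)
```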
Problem 5

This problem involves the OJ data set, which is part of the ISLR2 package.

a. Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

```r
library(ISLR2)
library(tree)
set.seed(10)
train = sample(1:nrow(OJ), 800)
OJ.train = OJ[train, ]
OJ.test = OJ[-train, ]
```

```
Warning: package 'ISLR2' was built under R version 4.2.3

Attaching package: 'ISLR2'

The following object is masked from 'package:MASS':

    Boston

Warning: package 'tree' was built under R version 4.2.3
```

b. Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?

```r
tree.oj = tree(Purchase ~ ., data = OJ.train)
summary(tree.oj)
```

```
Classification tree:
tree(formula = Purchase ~ ., data = OJ.train)
Variables actually used in tree construction:
[1] "LoyalCH"   "DiscMM"    "PriceDiff"
Number of terminal nodes:  7 
Residual mean deviance:  0.7983 = 633 / 793 
Misclassification error rate: 0.1775 = 142 / 800 
```

The training error rate is 17.75%, and the tree has 7 terminal nodes. Only three variables (LoyalCH, DiscMM, and PriceDiff) are used in the splits.

c. Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes, and interpret the information displayed.

```r
tree.oj
```
```
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 800 1067.000 CH ( 0.61375 0.38625 )  
   2) LoyalCH < 0.48285 290  315.900 MM ( 0.23448 0.76552 )  
     4) LoyalCH < 0.035047 51    9.844 MM ( 0.01961 0.98039 ) *
     5) LoyalCH > 0.035047 239  283.600 MM ( 0.28033 0.71967 )  
      10) DiscMM < 0.47 220  270.500 MM ( 0.30455 0.69545 ) *
      11) DiscMM > 0.47 19     0.000 MM ( 0.00000 1.00000 ) *
   3) LoyalCH > 0.48285 510  466.000 CH ( 0.82941 0.17059 )  
     6) LoyalCH < 0.764572 245  300.200 CH ( 0.69796 0.30204 )  
      12) PriceDiff < 0.145 99  137.000 MM ( 0.47475 0.52525 )  
        24) DiscMM < 0.47 82  112.900 CH ( 0.54878 0.45122 ) *
        25) DiscMM > 0.47 17   12.320 MM ( 0.11765 0.88235 ) *
      13) PriceDiff > 0.145 146  123.800 CH ( 0.84932 0.15068 ) *
   7) LoyalCH > 0.764572 265  103.700 CH ( 0.95094 0.04906 ) *
```

Interpreting terminal node 7 (terminal nodes are marked with *): its split criterion is LoyalCH > 0.764572, it contains 265 observations with a deviance of 103.7, its overall prediction is CH, and the class proportions are (CH, MM) = (0.95094, 0.04906). In other words, about 95% of highly CH-loyal customers in this node bought Citrus Hill.

d. Create a plot of the tree, and interpret the results.

```r
plot(tree.oj)
text(tree.oj, pretty = 0)
```

Interpretation: the most important indicator of Purchase appears to be LoyalCH, which drives the first and several subsequent splits. All observations with LoyalCH < 0.48285 are classified as MM.
e. Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?

```r
pred = predict(tree.oj, OJ.test, type = "class")
table(pred, OJ.test$Purchase)
tree.test.error = round(mean(pred != OJ.test$Purchase) * 100, 2)
tree.test.error
```

```
pred  CH  MM
  CH 135  20
  MM  27  88

[1] 17.41
```

The test error rate is 17.41%.

f. Apply the cv.tree() function to the training set in order to determine the optimal tree size.

```r
set.seed(10)
cv.oj = cv.tree(tree.oj, FUN = prune.misclass)
cv.oj
```

```
$size
[1] 7 5 2 1

$dev
[1] 149 149 164 309

$k
[1]       -Inf   0.000000   4.333333 154.000000

$method
[1] "misclass"

attr(,"class")
[1] "prune"         "tree.sequence"
```

g. Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.

```r
plot(cv.oj$size, cv.oj$dev, type = "b", xlab = "Tree size", ylab = "Deviance")
```
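Rather than reading the optimal size off the printed list, it can also be extracted programmatically from the cv.oj object above; a small sketch (best.size is an illustrative name):

```r
# Smallest tree size attaining the minimum cross-validated error; with ties
# (sizes 7 and 5 both have dev = 149 here), min() picks the simpler tree.
best.size <- min(cv.oj$size[cv.oj$dev == min(cv.oj$dev)])
best.size  # 5
```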
h. Which tree size corresponds to the lowest cross-validated classification error rate?

Tree sizes 5 and 7 both correspond to the lowest cross-validated classification error rate (deviance 149).

i. Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.

```r
prune.oj = prune.misclass(tree.oj, best = 2)
summary(prune.oj)
plot(prune.oj)
text(prune.oj, pretty = 0)
```

```
Classification tree:
snip.tree(tree = tree.oj, nodes = 2:3)
Variables actually used in tree construction:
[1] "LoyalCH"
Number of terminal nodes:  2 
Residual mean deviance:  0.9798 = 781.8 / 798 
Misclassification error rate: 0.1938 = 155 / 800 
```
The pruned tree's training error rate is 19.38%.

j. Compare the training error rates between the pruned and unpruned trees. Which is higher?

```r
summary(tree.oj)
summary(prune.oj)
```

```
Classification tree:
tree(formula = Purchase ~ ., data = OJ.train)
Variables actually used in tree construction:
[1] "LoyalCH"   "DiscMM"    "PriceDiff"
Number of terminal nodes:  7 
Residual mean deviance:  0.7983 = 633 / 793 
Misclassification error rate: 0.1775 = 142 / 800 

Classification tree:
snip.tree(tree = tree.oj, nodes = 2:3)
Variables actually used in tree construction:
[1] "LoyalCH"
Number of terminal nodes:  2 
Residual mean deviance:  0.9798 = 781.8 / 798 
Misclassification error rate: 0.1938 = 155 / 800 
```

The unpruned tree's training error rate is 17.75%, while the pruned tree's is 19.38%; the pruned tree's training error rate is higher.
k. Compare the test error rates between the pruned and unpruned trees. Which is higher?

```r
prune.pred = predict(prune.oj, OJ.test, type = "class")
table(prune.pred, OJ.test$Purchase)
prune_test_error = round(mean(prune.pred != OJ.test$Purchase) * 100, 2)
prune_test_error
tree.test.error
```

```
prune.pred  CH  MM
        CH 136  23
        MM  26  85

[1] 18.15
[1] 17.41
```

The unpruned tree's test error rate is 17.41%, while the pruned tree's is 18.15%; the pruned tree's test error rate is a little higher.

Problem 6

We will use the Carseats data set from the ISLR package to predict Sales using regression trees and related approaches.

a. Split the data set into a training set and a test set.
```r
library(ISLR)
library(tree)
set.seed(10)
train = sample(1:nrow(Carseats), nrow(Carseats) / 3)
carseats.train = Carseats[train, ]
carseats.test = Carseats[-train, ]
```

```
Warning: package 'ISLR' was built under R version 4.2.3

Attaching package: 'ISLR'

The following objects are masked from 'package:ISLR2':

    Auto, Credit
```

b. Fit a regression tree to the training set. Plot the tree, and interpret the results. What test MSE do you obtain?

```r
tree.mod = tree(Sales ~ ., data = carseats.train)
plot(tree.mod)
text(tree.mod, pretty = 0)
tree.pred = predict(tree.mod, newdata = carseats.test)
test.mse = mean((tree.pred - carseats.test$Sales)^2)
test.mse
```

```
[1] 4.712178
```

The test MSE of the regression tree is 4.712178.

c. Use cross-validation in order to determine the optimal level of tree complexity. Does pruning the tree improve the test MSE?

```r
set.seed(10)
cv.carseats = cv.tree(tree.mod)
plot(cv.carseats$size, cv.carseats$dev, type = 'b')
```
```r
prune.tree.mod = prune.tree(tree.mod, best = 6)
pred = predict(prune.tree.mod, newdata = carseats.test)
test.mse = mean((pred - carseats.test$Sales)^2)  # squared errors; the original run omitted the ^2
test.mse
```

```
[1] -0.2688159
```

Note: the displayed value -0.2688159 was produced without squaring the residuals, so it is the mean residual rather than a test MSE (an MSE cannot be negative), and it cannot be compared directly with the unpruned tree's test MSE of 4.712178. Re-running with the corrected formula above would give the pruned tree's actual test MSE.

d. Use the bagging approach in order to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important.

```r
## install.packages("randomForest")
library(randomForest)
```

```
Warning: package 'randomForest' was built under R version 4.2.3
randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.
```
```r
# Note: randomForest() defaults to mtry = p/3 for regression; true bagging
# would set mtry = 10 (all predictors in Carseats). The output below comes
# from the default call as originally run.
bag.mod = randomForest(Sales ~ ., data = carseats.train, importance = TRUE, ntree = 500)
bag.pred = predict(bag.mod, newdata = carseats.test)
bag.mse = mean((bag.pred - carseats.test$Sales)^2)
bag.mse
var.import = importance(bag.mod)
var.import
```

```
[1] 3.049626
```

```
              %IncMSE IncNodePurity
CompPrice   6.6922722     102.50503
Income      0.4555865      80.88747
Advertising 4.2274957      58.41494
Population  2.7570513      82.94427
Price      28.9763275     283.95598
ShelveLoc  32.9346930     288.05188
Age        14.5134007     145.10358
Education   1.7750247      60.09735
Urban      -1.0629503      11.25107
US         -1.0544127      10.48187
```

The test MSE is 3.049626, lower than the 4.712178 obtained by the very first regression tree that was fit. By both %IncMSE and IncNodePurity, ShelveLoc and Price are the most important variables.

e. Use random forests to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important. Describe the effect of m, the number of variables considered at each split, on the error rate obtained.

```r
rf.3 <- randomForest(Sales ~ ., data = carseats.train, mtry = 3, importance = TRUE)
rf.5 <- randomForest(Sales ~ ., data = carseats.train, mtry = 5, importance = TRUE)
importance(rf.3)
importance(rf.5)
```

```
              %IncMSE IncNodePurity
CompPrice   6.9149537      96.76824
Income      0.5994820      82.77402
Advertising 0.3013074      59.02568
Population  2.1145327      81.72381
Price      27.4648083     293.14237
ShelveLoc  32.5074590     288.28943
Age        12.5549058     142.19720
Education  -0.9138709      57.44404
Urban      -1.0527757      12.95036
US         -1.9190993      10.62627
```
```
               %IncMSE IncNodePurity
CompPrice   9.80620729     99.467996
Income     -0.20504120     67.064586
Advertising 2.09977476     53.953454
Population -0.32824533     64.900113
Price      34.60331639    348.849571
ShelveLoc  40.08566026    338.951640
Age        16.23689797    128.835190
Education  -0.08444231     41.509218
Urban      -0.17444958      8.265974
US          1.42968841      7.354699
```

```r
pred <- predict(rf.3, newdata = carseats.test)
test.mse <- mean((pred - carseats.test$Sales)^2)
test.mse
pred <- predict(rf.5, newdata = carseats.test)
test.mse <- mean((pred - carseats.test$Sales)^2)
test.mse
```

```
[1] 3.004786
[1] 2.679892
```

In both forests, ShelveLoc and Price are by far the most important variables. The forest with mtry = 5 gives a slightly lower test MSE (2.679892) than the one with mtry = 3 (3.004786), so in this case increasing m modestly improves the test error rate.
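To describe the effect of m more systematically, one could sweep mtry over its full range on the same train/test split; a minimal sketch (mse.by.m is an illustrative name, and the results are not run here):

```r
# Fit a random forest for each m from 1 to all 10 predictors and record the
# test MSE, then plot test MSE against m.
mse.by.m <- sapply(1:10, function(m) {
  rf <- randomForest(Sales ~ ., data = carseats.train, mtry = m)
  mean((predict(rf, newdata = carseats.test) - carseats.test$Sales)^2)
})
plot(1:10, mse.by.m, type = "b", xlab = "mtry (m)", ylab = "Test MSE")
```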