HW_5
School: Georgia Institute Of Technology
Course: 6501
Subject: Statistics
Date: Feb 20, 2024
HW_5 2023-09-24

8.1

A linear regression model would be great for trying to predict what a fair price should be for a home versus what an inflated price is. It would also be great for trying to predict my grades based on factors like study time, sleep, or attendance. Another example that most people should be familiar with is climate change and the relationship between global temperature and greenhouse gas emissions in historical data. Another would be exercise: understanding what contributes to weight loss by looking at diet, types of weightlifting, or types of cardio. Finally, businesses would love to know what directly correlates with an increase in sales, whether that be advertising, seasonality, or something like inventory management.
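As a quick sketch of the last example, a model like this could be fit in R. All variable names and numbers here are hypothetical, invented only to illustrate the idea:

```r
# Hypothetical sales data -- these numbers do not come from any real dataset
sales_data <- data.frame(
  sales       = c(120, 135, 160, 150, 180, 210, 175, 195),
  advertising = c(10, 12, 15, 14, 18, 22, 17, 20),
  season      = factor(c("Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4"))
)

# Linear regression of sales on advertising spend and seasonality
sales_model <- lm(sales ~ advertising + season, data = sales_data)
summary(sales_model)
```

The coefficient on advertising would then estimate the change in sales per unit of advertising spend, holding season fixed.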
library(ggplot2)
library(tidyverse)

## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v dplyr     1.1.3     v readr     2.1.4
## v forcats   1.0.0     v stringr   1.5.0
## v lubridate 1.9.2     v tibble    3.2.1
## v purrr     1.0.2     v tidyr     1.3.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(caret)

## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
##     lift

crime <- read.table("crime.txt", stringsAsFactors = FALSE, header = TRUE)
vice <- data.frame(M = 14.0, So = 0, Ed = 10.0, Po1 = 12.0, Po2 = 15.5,
                   LF = 0.640, M.F = 94.0, Pop = 150, NW = 1.1, U1 = 0.120,
                   U2 = 3.6, Wealth = 3200, Ineq = 20.1, Prob = 0.04,
                   Time = 39.0)
set.seed(42)

# a simple linear model
model_1 <- lm(Crime ~ ., data = crime)
summary(model_1)

##
## Call:
## lm(formula = Crime ~ ., data = crime)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -395.74  -98.09   -6.69  112.99  512.67
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.984e+03  1.628e+03  -3.675 0.000893 ***
## M            8.783e+01  4.171e+01   2.106 0.043443 *
## So          -3.803e+00  1.488e+02  -0.026 0.979765
## Ed           1.883e+02  6.209e+01   3.033 0.004861 **
## Po1          1.928e+02  1.061e+02   1.817 0.078892 .
## Po2         -1.094e+02  1.175e+02  -0.931 0.358830
## LF          -6.638e+02  1.470e+03  -0.452 0.654654
## M.F          1.741e+01  2.035e+01   0.855 0.398995
## Pop         -7.330e-01  1.290e+00  -0.568 0.573845
## NW           4.204e+00  6.481e+00   0.649 0.521279
## U1          -5.827e+03  4.210e+03  -1.384 0.176238
## U2           1.678e+02  8.234e+01   2.038 0.050161 .
## Wealth       9.617e-02  1.037e-01   0.928 0.360754
## Ineq         7.067e+01  2.272e+01   3.111 0.003983 **
## Prob        -4.855e+03  2.272e+03  -2.137 0.040627 *
## Time        -3.479e+00  7.165e+00  -0.486 0.630708
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 209.1 on 31 degrees of freedom
## Multiple R-squared: 0.8031, Adjusted R-squared: 0.7078
## F-statistic: 8.429 on 15 and 31 DF, p-value: 3.539e-07

predict(model_1, vice)

## 1
## 155.4349

range(crime$Crime)

## [1] 342 1993

model_2 <- lm(Crime ~ M + Ed + Ineq + Prob, data = crime, x = TRUE, y = TRUE)
summary(model_2)

The predicted crime rate is very close to the minimum crime rate in our data set. While this is possible, we should take another look at our regression model to see whether this is due to overfitting. To do this we look at coefficients of the model that are not significant for the response (those with a p-value greater than 0.05). Based on the summary table above, we will go ahead and use the following variables: M, Ed, Ineq, Prob.

##
## Call:
## lm(formula = Crime ~ M + Ed + Ineq + Prob, data = crime, x = TRUE,
##     y = TRUE)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -532.97 -254.03  -55.72  137.80  960.21
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1339.35    1247.01  -1.074  0.28893
## M              35.97      53.39   0.674  0.50417
## Ed            148.61      71.92   2.066  0.04499 *
## Ineq           26.87      22.77   1.180  0.24458
## Prob        -7331.92    2560.27  -2.864  0.00651 **
## ---
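Since caret is already loaded, one more direct way to check whether the low prediction from the full model reflects overfitting is to estimate out-of-sample performance with cross-validation. This is only a sketch, assuming crime has been read in as above:

```r
# 5-fold cross-validation of the full model; a CV R-squared well below the
# training R-squared of 0.80 would suggest the full model is overfit
set.seed(42)
cv_model <- train(Crime ~ ., data = crime, method = "lm",
                  trControl = trainControl(method = "cv", number = 5))
cv_model$results  # RMSE, Rsquared, MAE averaged over the folds
```

The same call can be repeated for each reduced model to compare their cross-validated errors on equal footing.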
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 347.5 on 42 degrees of freedom
## Multiple R-squared: 0.2629, Adjusted R-squared: 0.1927
## F-statistic: 3.745 on 4 and 42 DF, p-value: 0.01077

predict(model_2, vice)

## 1
## 897.2307

model_3 <- lm(Crime ~ M + Ed + U2 + Ineq + Prob, data = crime, x = TRUE, y = TRUE)
summary(model_3)

This prediction works better with our data because we only used predictors with low p-values, although in doing so we also saw our adjusted R-squared value drop significantly. Before settling on this model, I am curious to see whether the variable U2, which barely missed our p-value cutoff of 0.05, makes any difference.

##
## Call:
## lm(formula = Crime ~ M + Ed + U2 + Ineq + Prob, data = crime,
##     x = TRUE, y = TRUE)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -478.8 -233.6  -46.5  143.2  797.1
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3336.52    1435.26  -2.325  0.02512 *
## M              85.33      54.39   1.569  0.12437
## Ed            214.69      73.20   2.933  0.00547 **
## U2            160.01      65.54   2.441  0.01903 *
## Ineq           29.50      21.56   1.368  0.17880
## Prob        -6897.24    2427.81  -2.841  0.00697 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 328.6 on 41 degrees of freedom
## Multiple R-squared: 0.3565, Adjusted R-squared: 0.278
## F-statistic: 4.542 on 5 and 41 DF, p-value: 0.002186

predict(model_3, vice)

## 1
## 898.1004
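A more formal way to judge whether adding U2 helps is a nested-model F-test or an information-criterion comparison. This is a sketch, assuming model_2 and model_3 have been fit as above:

```r
# F-test: does adding U2 significantly reduce the residual sum of squares?
# model_2 (M, Ed, Ineq, Prob) is nested inside model_3 (adds U2)
anova(model_2, model_3)

# AIC: lower is better; it penalizes model_3 for its extra parameter
AIC(model_2, model_3)
```

If the F-test p-value is small and AIC favors model_3, the data support keeping U2 despite it barely missing the 0.05 cutoff in the full model.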
model_4 <- lm(Crime ~ Ed + Prob, data = crime, x = TRUE, y = TRUE)
summary(model_4)

After comparing the model that includes only variables with a p-value less than 0.05 against the one that also includes U2, which barely missed the cutoff, we see very little change, so our previous model is probably OK.

##
## Call:
## lm(formula = Crime ~ Ed + Prob, data = crime, x = TRUE, y = TRUE)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -650.98 -279.57  -14.06  198.00  957.48
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   517.30     588.48   0.879   0.3842
## Ed             63.67      50.26   1.267   0.2119
## Prob        -6049.00    2472.93  -2.446   0.0185 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 351.2 on 44 degrees of freedom
## Multiple R-squared: 0.2115, Adjusted R-squared: 0.1756
## F-statistic: 5.899 on 2 and 44 DF, p-value: 0.005373

predict(model_4, vice)

## 1
## 912.0796

Finally, I want to highlight one additional model. From our second model we saw that the p-values of the predictors changed again, and I wanted to investigate what would happen if we kept only the lowest p-values from that model to fine-tune it further. Note that this is bad practice and not how regression should be used, because it does not allow other variables to play a role in the model, and ultimately in the decision-making process that would stem from it. By looking at only two variables, a decision might be made to increase policing where it is unnecessary, simply because not enough variables were taken into account and poor statistical analysis was done. Also, to answer the question posed in the homework: we predict that the crime rate in a city with the given data is 897.2307.

vice_pred <- data.frame(M = 14.0, So = 0, Ed = 10.0, Po1 = 12.0, Po2 = 15.5,
                        LF = 0.640, M.F = 94.0, Pop = 150, NW = 1.1, U1 = 0.120,
                        U2 = 3.6, Wealth = 3200, Ineq = 20.1, Prob = 0.04,
                        Time = 39.0)
predicted_value <- predict(model_2, newdata = vice_pred[1, ])
confidence_interval <- predict(model_2, newdata = vice_pred[1, ],
                               interval = "confidence", level = 0.95)
print(predicted_value)

## 1
## 897.2307

print(confidence_interval)

##        fit      lwr      upr
## 1 897.2307 767.6195 1026.842

I also constructed a confidence interval to see how good my prediction is. Looking at the results of the confidence interval, I would say that the model is OK, but further analysis should be done before taking any action based on the analysis done up to this point.
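One caveat worth noting: the confidence interval above covers the mean response for cities with these predictor values. For a single new city, a prediction interval, which also accounts for the residual noise around an individual observation, is wider and may be the more honest measure of uncertainty. A sketch using the same model_2 and vice_pred:

```r
# Prediction interval for one new observation; wider than the confidence
# interval because it adds the residual variance of a single city
predict(model_2, newdata = vice_pred[1, ],
        interval = "prediction", level = 0.95)
```

If this interval is very wide relative to the range of Crime (342 to 1993), that is further reason to be cautious before acting on the point estimate.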