HW_5
School: Georgia Institute Of Technology
Course: 6501
Subject: Statistics
Date: Feb 20, 2024
HW_5 2023-09-24

8.1

A linear regression model would be great for trying to predict what a fair price should be for a home versus what an inflated price is. It would also be great for trying to predict my grades based on factors like study time, sleep, or attendance. Another example that most people should be familiar with is climate change and the relationship between global temperature and greenhouse gas emissions in historical data. Another would be exercise: understanding what contributes to weight loss by looking at diet, types of weightlifting, or types of cardio. Finally, businesses would love to know what directly correlates with an increase in sales, whether that be advertising, seasonality, or something like inventory management.
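As a quick sketch of the last example, a model like this could be fit in R. All variable names and numbers here are hypothetical, invented only to illustrate the idea:

```r
# Hypothetical sales data -- these numbers do not come from any real dataset
sales_data <- data.frame(
  sales       = c(120, 135, 160, 150, 180, 210, 175, 195),
  advertising = c(10, 12, 15, 14, 18, 22, 17, 20),
  season      = factor(c("Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4"))
)

# Linear regression of sales on advertising spend and seasonality
sales_model <- lm(sales ~ advertising + season, data = sales_data)
summary(sales_model)
```

The coefficient on advertising would then estimate the change in sales per unit of advertising spend, holding season fixed.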
library(ggplot2)
library(tidyverse)

## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v dplyr     1.1.3     v readr     2.1.4
## v forcats   1.0.0     v stringr   1.5.0
## v lubridate 1.9.2     v tibble    3.2.1
## v purrr     1.0.2     v tidyr     1.3.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(caret)

## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
##     lift

crime <- read.table("crime.txt", stringsAsFactors = FALSE, header = TRUE)
vice <- data.frame(M = 14.0, So = 0, Ed = 10.0, Po1 = 12.0, Po2 = 15.5,
                   LF = 0.640, M.F = 94.0, Pop = 150, NW = 1.1, U1 = 0.120,
                   U2 = 3.6, Wealth = 3200, Ineq = 20.1, Prob = 0.04,
                   Time = 39.0)
set.seed(42)

# a simple linear model
model_1 <- lm(Crime ~ ., data = crime)
summary(model_1)

##
## Call:
## lm(formula = Crime ~ ., data = crime)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -395.74  -98.09   -6.69  112.99  512.67
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.984e+03  1.628e+03  -3.675 0.000893 ***
## M            8.783e+01  4.171e+01   2.106 0.043443 *
## So          -3.803e+00  1.488e+02  -0.026 0.979765
## Ed           1.883e+02  6.209e+01   3.033 0.004861 **
## Po1          1.928e+02  1.061e+02   1.817 0.078892 .
## Po2         -1.094e+02  1.175e+02  -0.931 0.358830
## LF          -6.638e+02  1.470e+03  -0.452 0.654654
## M.F          1.741e+01  2.035e+01   0.855 0.398995
## Pop         -7.330e-01  1.290e+00  -0.568 0.573845
## NW           4.204e+00  6.481e+00   0.649 0.521279
## U1          -5.827e+03  4.210e+03  -1.384 0.176238
## U2           1.678e+02  8.234e+01   2.038 0.050161 .
## Wealth       9.617e-02  1.037e-01   0.928 0.360754
## Ineq         7.067e+01  2.272e+01   3.111 0.003983 **
## Prob        -4.855e+03  2.272e+03  -2.137 0.040627 *
## Time        -3.479e+00  7.165e+00  -0.486 0.630708
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 209.1 on 31 degrees of freedom
## Multiple R-squared: 0.8031, Adjusted R-squared: 0.7078
## F-statistic: 8.429 on 15 and 31 DF, p-value: 3.539e-07

predict(model_1, vice)

## 1
## 155.4349

range(crime$Crime)

## [1] 342 1993

model_2 <- lm(Crime ~ M + Ed + Ineq + Prob, data = crime, x = TRUE, y = TRUE)
summary(model_2)

The predicted crime rate is very close to the minimum crime rate in our data set. While this is possible, we should take another look at our regression model to see whether this is due to overfitting. To do this we look at coefficients of the model that are not significant for the response (those with a p-value greater than 0.05). Based on the summary table above, we will go ahead and use the following variables: M, Ed, Ineq, Prob.

##
## Call:
## lm(formula = Crime ~ M + Ed + Ineq + Prob, data = crime, x = TRUE,
##     y = TRUE)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -532.97 -254.03  -55.72  137.80  960.21
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1339.35    1247.01  -1.074  0.28893
## M              35.97      53.39   0.674  0.50417
## Ed            148.61      71.92   2.066  0.04499 *
## Ineq           26.87      22.77   1.180  0.24458
## Prob        -7331.92    2560.27  -2.864  0.00651 **
## ---
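Since caret is already loaded, one more direct way to check whether the low prediction from the full model reflects overfitting is to estimate out-of-sample performance with cross-validation. This is only a sketch, assuming crime has been read in as above:

```r
# 5-fold cross-validation of the full model; a CV R-squared well below the
# training R-squared of 0.80 would suggest the full model is overfit
set.seed(42)
cv_model <- train(Crime ~ ., data = crime, method = "lm",
                  trControl = trainControl(method = "cv", number = 5))
cv_model$results  # RMSE, Rsquared, MAE averaged over the folds
```

The same call can be repeated for each reduced model to compare their cross-validated errors on equal footing.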
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 347.5 on 42 degrees of freedom
## Multiple R-squared: 0.2629, Adjusted R-squared: 0.1927
## F-statistic: 3.745 on 4 and 42 DF, p-value: 0.01077

predict(model_2, vice)

## 1
## 897.2307

model_3 <- lm(Crime ~ M + Ed + U2 + Ineq + Prob, data = crime, x = TRUE, y = TRUE)
summary(model_3)

This prediction works better with our data because we only used predictors with low p-values, although in doing so we also saw our adjusted R-squared value drop significantly. Before settling on this model, I am curious to see whether the variable U2, which barely missed our p-value cutoff of 0.05, makes any difference.

##
## Call:
## lm(formula = Crime ~ M + Ed + U2 + Ineq + Prob, data = crime,
##     x = TRUE, y = TRUE)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -478.8 -233.6  -46.5  143.2  797.1
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3336.52    1435.26  -2.325  0.02512 *
## M              85.33      54.39   1.569  0.12437
## Ed            214.69      73.20   2.933  0.00547 **
## U2            160.01      65.54   2.441  0.01903 *
## Ineq           29.50      21.56   1.368  0.17880
## Prob        -6897.24    2427.81  -2.841  0.00697 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 328.6 on 41 degrees of freedom
## Multiple R-squared: 0.3565, Adjusted R-squared: 0.278
## F-statistic: 4.542 on 5 and 41 DF, p-value: 0.002186

predict(model_3, vice)

## 1
## 898.1004
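A more formal way to judge whether adding U2 helps is a nested-model F-test or an information-criterion comparison. This is a sketch, assuming model_2 and model_3 have been fit as above:

```r
# F-test: does adding U2 significantly reduce the residual sum of squares?
# model_2 (M, Ed, Ineq, Prob) is nested inside model_3 (adds U2)
anova(model_2, model_3)

# AIC: lower is better; it penalizes model_3 for its extra parameter
AIC(model_2, model_3)
```

If the F-test p-value is small and AIC favors model_3, the data support keeping U2 despite it barely missing the 0.05 cutoff in the full model.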
model_4 <- lm(Crime ~ Ed + Prob, data = crime, x = TRUE, y = TRUE)
summary(model_4)

After comparing the model that includes only variables with a p-value less than 0.05 against the one that also includes U2, which barely missed the cutoff, we see very little change, so our previous model is probably OK.

##
## Call:
## lm(formula = Crime ~ Ed + Prob, data = crime, x = TRUE, y = TRUE)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -650.98 -279.57  -14.06  198.00  957.48
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   517.30     588.48   0.879   0.3842
## Ed             63.67      50.26   1.267   0.2119
## Prob        -6049.00    2472.93  -2.446   0.0185 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 351.2 on 44 degrees of freedom
## Multiple R-squared: 0.2115, Adjusted R-squared: 0.1756
## F-statistic: 5.899 on 2 and 44 DF, p-value: 0.005373

predict(model_4, vice)

## 1
## 912.0796

Finally, I want to highlight one additional model. From our second model we saw that the p-values of the predictors changed again, and I wanted to investigate what would happen if we kept only the lowest p-values from that model to fine-tune it further. Note that this is bad practice and not how regression should be used, because it does not allow other variables to play a role in the model, and ultimately in the decision-making process that would stem from it. By looking at only two variables, a decision might be made to increase policing where it is unnecessary, simply because not enough variables were taken into account and poor statistical analysis was done. Also, to answer the question posed in the homework: we predict that the crime rate in a city with the given data is 897.2307.

vice_pred <- data.frame(M = 14.0, So = 0, Ed = 10.0, Po1 = 12.0, Po2 = 15.5,
                        LF = 0.640, M.F = 94.0, Pop = 150, NW = 1.1, U1 = 0.120,
                        U2 = 3.6, Wealth = 3200, Ineq = 20.1, Prob = 0.04,
                        Time = 39.0)
predicted_value <- predict(model_2, newdata = vice_pred[1, ])
confidence_interval <- predict(model_2, newdata = vice_pred[1, ],
                               interval = "confidence", level = 0.95)
print(predicted_value)

## 1
## 897.2307

print(confidence_interval)

##        fit      lwr      upr
## 1 897.2307 767.6195 1026.842

I also constructed a confidence interval to see how good my prediction is. Looking at the results of the confidence interval, I would say that the model is OK, but further analysis should be done before taking any action based on the analysis done up to this point.
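One caveat worth noting: the confidence interval above covers the mean response for cities with these predictor values. For a single new city, a prediction interval, which also accounts for the residual noise around an individual observation, is wider and may be the more honest measure of uncertainty. A sketch using the same model_2 and vice_pred:

```r
# Prediction interval for one new observation; wider than the confidence
# interval because it adds the residual variance of a single city
predict(model_2, newdata = vice_pred[1, ],
        interval = "prediction", level = 0.95)
```

If this interval is very wide relative to the range of Crime (342 to 1993), that is further reason to be cautious before acting on the point estimate.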