Lab 8

pdf

School

Syracuse University *

*We aren’t endorsed by this school

Course

687

Subject

Statistics

Date

Feb 20, 2024

Type

pdf

Pages

8

Uploaded by DeanTigerMaster997

10/19/23, 10:14 PM Lab8.knit file:///C:/Users/mahaj/Downloads/IDS/Lab8.html

Intro to Data Science - Lab 8
Copyright 2022, Jeffrey Stanton and Jeffrey Saltz
Please do not post online.

Week 8 - Linear Models

# Enter your name here: Swapnil Deore

Please include nice comments.

Instructions: Run the necessary code on your own instance of R-Studio.

Attribution statement: (choose only one and delete the rest)
# 1. I did this lab assignment by myself, with help from the book and the professor.

Linear modeling, also referred to as regression analysis or multiple regression, is a technique for fitting a line, plane, or higher order linear object to data. In their simplest form, linear models have one metric outcome variable and one or more predictor variables (any combination of metric values, ordered scales such as ratings, or dummy codes).

Make sure to library the MASS and ggplot2 packages before running the following:

ggplot(data=Boston) + aes(x=rm, y=medv) + geom_point() + geom_smooth(method="lm", se=FALSE)

library(ggplot2); library(MASS)
ggplot(data=Boston) + aes(x=rm, y=medv) + geom_point() + geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula = 'y ~ x'
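The line drawn by geom_smooth(method="lm") is an ordinary least-squares fit of y on x, so its intercept and slope can be recovered numerically with lm(). The following is a minimal sketch, assuming the MASS package is installed; the names rmFit, rmCoef, and slope are illustrative and not part of the lab.

```r
library(MASS)  # provides the Boston data frame

# Fit the same least-squares line that geom_smooth(method = "lm") draws
# for medv (outcome) against rm (predictor).
rmFit <- lm(medv ~ rm, data = Boston)

# Named numeric vector holding "(Intercept)" and the slope for rm.
rmCoef <- coef(rmFit)
print(rmCoef)

# The slope is the change in predicted medv for one additional room.
slope <- unname(rmCoef["rm"])
```

Reading the slope off the fitted object gives a number to go with the visual impression from the plot.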
1. Explore this dataset description by typing ?Boston in a code cell.

?Boston

## starting httpd help server ... done

2. The graphic you just created fits a best line to a cloud of points. Copy and modify the code to produce a plot where crim is the x variable instead of rm.

ggplot(data=Boston) + aes(x=crim, y=medv) + geom_point() + geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula = 'y ~ x'
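One way to put a number on how closely the cloud of points follows the fitted line is the Pearson correlation. For a model with a single predictor, its square equals the model's R-squared. This is a sketch under the assumption that MASS is installed; the names r, crimFit, and rSquared are illustrative and not part of the lab.

```r
library(MASS)  # provides the Boston data frame

# Pearson correlation between crime rate and median home value.
r <- cor(Boston$crim, Boston$medv)

# Single-predictor least-squares fit, the same line drawn in the plot.
crimFit <- lm(medv ~ crim, data = Boston)
rSquared <- summary(crimFit)$r.squared

# For simple linear regression, r^2 and R-squared agree
# up to floating-point error.
print(c(r = r, r_squared = r^2, model_r_squared = rSquared))
```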
3. Produce a histogram and descriptive statistics for Boston$crim. Write a comment describing any anomalies or oddities.

hist(Boston$crim)
summary(Boston$crim)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
##  0.00632  0.08204  0.25651  3.61352  3.67708 88.97620

# The crime rate is strongly right-skewed: the mean (3.61) is far above the median (0.26), and the maximum (88.98) is an extreme value.
# Frequency keeps decreasing as the crime rate rises from 0 to 60, with the highest frequency between 0 and 20.

4. Produce a linear model, using the lm( ) function where crim predicts medv. Remember that in R's formula language, the outcome variable comes first and is separated from the predictors by a tilde, like this: medv ~ crim

Try to get in the habit of storing the output object that is produced by lm and other analysis procedures. For example, I often use lmOut <- lm( . . .)

lm1 <- lm(medv~crim, data = Boston)
lm1
##
## Call:
## lm(formula = medv ~ crim, data = Boston)
##
## Coefficients:
## (Intercept)         crim
##     24.0331      -0.4152

5. Run a multiple regression where you use rm, crim, and dis (distance to Boston employment centers). You will use all three predictors in one model with this formula: medv ~ crim + rm + dis

Now run three separate models, one for each independent variable.

lm2 <- lm(medv~crim+rm+dis, data = Boston)
summary(lm2)

##
## Call:
## lm(formula = medv ~ crim + rm + dis, data = Boston)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -21.247  -2.930  -0.572   2.390  39.072
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -29.45838    2.60010 -11.330  < 2e-16 ***
## crim         -0.25405    0.03532  -7.193 2.32e-12 ***
## rm            8.34257    0.40870  20.413  < 2e-16 ***
## dis           0.12627    0.14382   0.878     0.38
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.238 on 502 degrees of freedom
## Multiple R-squared:  0.5427, Adjusted R-squared:  0.5399
## F-statistic: 198.6 on 3 and 502 DF,  p-value: < 2.2e-16

lm3 <- lm(medv~crim, data = Boston)
summary(lm3)
##
## Call:
## lm(formula = medv ~ crim, data = Boston)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -16.957  -5.449  -2.007   2.512  29.800
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.03311    0.40914   58.74   <2e-16 ***
## crim        -0.41519    0.04389   -9.46   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.484 on 504 degrees of freedom
## Multiple R-squared:  0.1508, Adjusted R-squared:  0.1491
## F-statistic: 89.49 on 1 and 504 DF,  p-value: < 2.2e-16

lm4 <- lm(medv~rm, data = Boston)
summary(lm4)

##
## Call:
## lm(formula = medv ~ rm, data = Boston)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -23.346  -2.547   0.090   2.986  39.433
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -34.671      2.650  -13.08   <2e-16 ***
## rm             9.102      0.419   21.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.616 on 504 degrees of freedom
## Multiple R-squared:  0.4835, Adjusted R-squared:  0.4825
## F-statistic: 471.8 on 1 and 504 DF,  p-value: < 2.2e-16

lm5 <- lm(medv~dis, data = Boston)
summary(lm5)
##
## Call:
## lm(formula = medv ~ dis, data = Boston)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -15.016  -5.556  -1.865   2.288  30.377
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  18.3901     0.8174  22.499  < 2e-16 ***
## dis           1.0916     0.1884   5.795 1.21e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.914 on 504 degrees of freedom
## Multiple R-squared:  0.06246, Adjusted R-squared:  0.0606
## F-statistic: 33.58 on 1 and 504 DF,  p-value: 1.207e-08

6. Interpret the results of your analysis in a comment. Make sure to mention the p-value, the adjusted R-squared, the list of significant predictors and the coefficient for each significant predictor.

# Adjusted R-squared is the share of the variance in medv that the predictors explain, penalized for the number of predictors; values closer to 1 indicate a better fit. It is highest for medv ~ crim + rm + dis (0.5399), compared with 0.4825 for rm alone, 0.1491 for crim alone, and 0.0606 for dis alone.
# The overall p-value indicates whether the relationship is statistically significant; values below 0.05 are conventionally treated as significant. All four models are significant: p < 2.2e-16 for the multiple regression and for the crim and rm models, and p = 1.207e-08 for the dis model.
# In the multiple regression, crim (coefficient -0.254, p = 2.32e-12) and rm (coefficient 8.343, p < 2e-16) are significant predictors, while dis (coefficient 0.126, p = 0.38) is not.
# In the single-predictor models, each predictor is significant on its own: crim (-0.415), rm (9.102), and dis (1.092).

7. Create a one-row data frame that contains some plausible values for the predictors. For example, this data frame contains the median values for each predictor: predDF <- data.frame(crim = 0.26, dis=3.2, rm=6.2)

# The values used here are the approximate medians from the example above, so each lies within the observed range of its variable.

predDF <- data.frame(crim = 0.26, dis=3.2, rm=6.2)
predDF

##   crim dis  rm
## 1 0.26 3.2 6.2

8. Use the predict( ) command to predict a new value of medv from the one-row data frame. If you stored the output of your lm model in lmOut, the command would look like this: predict(lmOut, predDF)

predict(lm2,predDF)
##        1
## 22.60355
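For a linear model, predict( ) evaluates the fitted equation: the intercept plus each coefficient multiplied by the corresponding predictor value. The sketch below recomputes the prediction by hand to make that arithmetic visible. It assumes the MASS package is installed and refits the same lm2 model; the names b, manual, and viaPredict are illustrative and not part of the lab.

```r
library(MASS)  # provides the Boston data frame

# Same multiple regression and one-row data frame as in the lab.
lm2 <- lm(medv ~ crim + rm + dis, data = Boston)
predDF <- data.frame(crim = 0.26, dis = 3.2, rm = 6.2)

# Named coefficient vector: (Intercept), crim, rm, dis.
b <- coef(lm2)

# Hand-computed prediction: intercept + sum of coefficient * value.
manual <- b["(Intercept)"] +
  b["crim"] * predDF$crim +
  b["rm"]   * predDF$rm +
  b["dis"]  * predDF$dis
manual <- unname(manual)

# Should agree with predict() up to floating-point error.
viaPredict <- unname(predict(lm2, predDF))
print(c(manual = manual, predict = viaPredict))
```

Indexing the coefficients by name rather than by position keeps the calculation correct even though predDF lists its columns in a different order from the model formula.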