Assignment 7 - BAS 320 - Multiple Regression Part 2

School: The University of Tennessee, Knoxville
Course: BAS 320 (Statistics)
Date: Apr 3, 2024
BAS 320 - Assignment 7 - Multiple Regression Part 2
Matthew Zook

M <- lm(review_taste ~ ., data=BEER)
drop1(M, test="F")

## Single term deletions
##
## Model:
## review_taste ~ ABV + Min.IBU + Max.IBU + Astringency + Body +
##     Alcohol + Bitter + Sweet + Sour + Salty + Fruits + Hoppy +
##     Spices + Malty
##             Df Sum of Sq    RSS     AIC F value    Pr(>F)
## <none>                   173.65 -1720.7
## ABV          1    6.8645 180.52 -1683.9 38.9373 6.491e-10 ***
## Min.IBU      1    2.2290 175.88 -1710.0 12.6432 0.0003948 ***
## Max.IBU      1    0.0203 173.67 -1722.6  0.1150 0.7346205
## Astringency  1    0.0989 173.75 -1722.1  0.5610 0.4540476
## Body         1    2.1096 175.76 -1710.6 11.9661 0.0005650 ***
## Alcohol      1    3.1541 176.81 -1704.7 17.8911 2.557e-05 ***
## Bitter       1    0.1093 173.76 -1722.1  0.6200 0.4312491
## Sweet        1    0.7480 174.40 -1718.4  4.2429 0.0396778 *
## Sour         1    1.8677 175.52 -1712.0 10.5939 0.0011732 **
## Salty        1    0.3135 173.97 -1720.9  1.7784 0.1826602
## Fruits       1    2.6961 176.35 -1707.3 15.2932 9.836e-05 ***
## Hoppy        1    0.2920 173.94 -1721.0  1.6563 0.1984012
## Spices       1    6.5096 180.16 -1685.9 36.9244 1.753e-09 ***
## Malty        1    1.6272 175.28 -1713.4  9.2299 0.0024437 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

BEER <- BEER[, c("review_taste", "ABV", "Fruits", "Sour", "Spices", "Body")]
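The single-term deletion table above is one pass of backward elimination; the search can be iterated automatically with step(). A minimal sketch, using a small synthetic data frame in place of BEER (the real data is not reproduced here, so the column names and effect sizes below are illustrative assumptions):

```r
# Sketch: automated backward elimination with step(), mirroring repeated drop1().
# 'toy' is a synthetic stand-in for BEER (assumption: the real data is not shown).
set.seed(1)
n <- 200
toy <- data.frame(ABV    = rnorm(n, 6, 1.5),
                  Spices = rpois(n, 15),
                  Body   = rpois(n, 45),
                  noise1 = rnorm(n))        # a predictor with no real effect
toy$review_taste <- 3 + 0.04*toy$ABV + 0.003*toy$Spices +
                    0.006*toy$Body + rnorm(n, sd = 0.4)

M_full <- lm(review_taste ~ ., data = toy)
M_back <- step(M_full, direction = "backward", trace = 0)  # drop terms that raise AIC
names(coef(M_back))  # predictors that survive the search
```

step() stops when no single deletion lowers the AIC, so the returned model's AIC is never worse than the full model's.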
Analysis of Beer Taste Reviews

I'm using a multiple regression model to predict the overall taste review from the alcohol by volume (ABV), the fruits score, the sour score, the spices score, and the body score. I built this model because I am curious which characteristic has the largest effect on the taste score, and how well the taste score can be predicted from these predictors. The data comes from https://www.kaggle.com/datasets/ruthgn/beer-profile-and-ratings-data-set/ and contains 1,000 rows; after variable selection I kept 5 predictors.

Investigation of the relationship between the taste score and spices

I am investigating the relationship between the overall taste score and the amount of spices in the beer. A polynomial would do a better job than a simple linear model: based on adjusted R-squared and AICc, I would suggest a fourth-order polynomial, predicting the taste score from Spices, Spices^2, Spices^3, and Spices^4. The fifth-order fit has a marginally lower AICc, but the improvement is negligible, so the simpler fourth-order model is preferred.

M <- lm(review_taste ~ Spices, data=BEER)
choose_order(M)

##   order      R2adj     AICc
## 1     1 0.06564513 1417.809
## 2     2 0.14089397 1336.854
## 3     3 0.16009325 1317.265
## 4     4 0.18179266 1294.106
## 5     5 0.18539948 1292.707
## 6     6 0.18551032 1295.592

Multiple regression model and checking of assumptions

While there are some violations of the assumptions, they are not severe enough to abandon the model. Statistically, linearity holds for Body, Fruits, and Sour; the equal-spread and normality tests fail, but the diagnostic plots suggest those violations are relatively small.

M <- lm(review_taste ~ ., data=BEER)  # fit a multiple linear regression
summary(M)

## Call:
## lm(formula = review_taste ~ ., data = BEER)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.24191 -0.20268  0.04858  0.25555  1.48930
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.9959999  0.0413381  72.475  < 2e-16 ***
## ABV         0.0362721  0.0052925   6.853 1.26e-11 ***
## Fruits      0.0022523  0.0007679   2.933  0.00343 **
## Sour        0.0018520  0.0006586   2.812  0.00502 **
## Spices      0.0028970  0.0006593   4.394 1.23e-05 ***
## Body        0.0059590  0.0005607  10.628  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4326 on 994 degrees of freedom
## Multiple R-squared:  0.2743, Adjusted R-squared:  0.2706
## F-statistic: 75.13 on 5 and 994 DF,  p-value: < 2.2e-16

check_regression(M, extra=TRUE)
##
## Tests of Assumptions: (sample size n = 1000):
##   Linearity
##     p-value for ABV :           0
##     p-value for Fruits :        0.9313
##     p-value for Sour :          0.9122
##     p-value for Spices :        0
##     p-value for Body :          0.2418
##     p-value for overall model : NA (not enough duplicate rows)
##   Equal Spread: p-value is 0
##   Normality:    p-value is 0
##
## Advice: if n<25 then all tests must be passed.
## If n >= 25 and test is failed, refer to diagnostic plot to see if violation is severe
## or is small enough to be ignored.
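check_regression() appears to be a course-provided helper; where it is unavailable, base R offers standard substitutes for the normality and equal-spread checks. A sketch on synthetic data (these are assumptions about reasonable stand-ins, not the helper's exact tests):

```r
# Base-R stand-ins for the assumption checks above (assumption: these are
# common substitutes, not the exact tests run by check_regression()).
set.seed(3)
toy <- data.frame(x = runif(100))
toy$y <- 1 + 2 * toy$x + rnorm(100, sd = 0.3)
M <- lm(y ~ x, data = toy)

# Normality of residuals: Shapiro-Wilk test
shapiro.test(residuals(M))$p.value

# Rough equal-spread check: do |residuals| trend with the fitted values?
cor.test(abs(residuals(M)), fitted(M))$p.value

# Diagnostic plots (residuals vs. fitted, normal QQ):
# plot(M, which = 1:2)
```

As with the advice printed above, for large n a small p-value alone is not decisive; the diagnostic plots show whether a violation is severe.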
Identification of influential points

There are 7 influential points. The first influential point (row 210) is unusual because it has scores of 0 for Fruits, Sour, and Spices, yet has a very high taste review.

influence_plot(M)

## $Leverage
## [1] 210 477 523 690 824 885 907

influential.rows <- influence_plot(M)$Leverage

To see where these rows fall relative to the rest of the data, the following computes each influential row's percentile on every variable:

INFLUENCE <- data.frame(matrix(0, nrow=length(influential.rows), ncol=ncol(BEER)))
names(INFLUENCE) <- names(BEER)
for (r in 1:length(influential.rows)) {
  x <- as.numeric(BEER[influential.rows[r], ])
  INFLUENCE[r, ] <- sapply(1:length(x), function(i) { mean(BEER[[i]] <= x[i]) })
}
round(INFLUENCE, digits=2)

##   review_taste  ABV Fruits Sour Spices Body
## 1         0.99 0.98   0.02 0.01   0.05 0.02
## 2         0.14 0.57   0.99 0.98   0.32 0.68
## 3         0.10 0.07   0.47 0.52   1.00 0.60
## 4         0.28 1.00   0.23 0.14   0.15 0.03
## 5         0.11 0.33   0.06 0.05   0.98 0.94
## 6         0.08 0.56   0.40 0.56   1.00 0.14
## 7         0.06 0.11   0.98 0.46   0.98 0.21

BEER[210, ]

##      review_taste ABV Fruits Sour Spices Body
## 3098          4.5  12      0    0      0    4
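influence_plot() also appears to be a course-provided helper; the same leverage screen can be done in base R with hatvalues(). A minimal sketch on synthetic data (assumption: the helper's exact cutoff is not shown; a common rule flags hat values above 2(k+1)/n):

```r
# Flag high-leverage rows with hatvalues() (base-R analogue of the course
# helper influence_plot(); the 2*(k+1)/n cutoff is a common convention and
# an assumption about what the helper uses).
set.seed(4)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$y <- 1 + toy$x1 + rnorm(50)
toy[1, c("x1", "x2")] <- c(8, 8)   # plant one unusual predictor combination
M <- lm(y ~ x1 + x2, data = toy)

h <- hatvalues(M)
cutoff <- 2 * length(coef(M)) / nrow(toy)   # 2*(k+1)/n, here k+1 = 3
which(h > cutoff)                           # rows with unusual predictor values
```

Leverage depends only on the predictor values, which is why row 210 above stands out: its combination of predictor scores is far from the bulk of the data.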
summary(BEER)

##   review_taste        ABV             Fruits            Sour
##  Min.   :1.439   Min.   : 0.000   Min.   :  0.00   Min.   :  0.00
##  1st Qu.:3.495   1st Qu.: 5.000   1st Qu.: 12.00   1st Qu.: 11.00
##  Median :3.787   Median : 5.900   Median : 28.00   Median : 22.00
##  Mean   :3.697   Mean   : 6.502   Mean   : 37.31   Mean   : 33.09
##  3rd Qu.:4.028   3rd Qu.: 7.500   3rd Qu.: 56.00   3rd Qu.: 42.00
##  Max.   :4.923   Max.   :57.500   Max.   :148.00   Max.   :241.00
##      Spices            Body
##  Min.   :  0.00   Min.   :  0.00
##  1st Qu.:  4.00   1st Qu.: 29.00
##  Median : 10.00   Median : 39.00
##  Mean   : 17.41   Mean   : 45.17
##  3rd Qu.: 22.25   3rd Qu.: 57.25
##  Max.   :170.00   Max.   :175.00

Investigation of an interaction between Body and Spices

At the 1st and 3rd quartiles of Body (29 and 57.25), the fitted lines for the taste score as a function of Spices are approximately:

Review = 3.505842 + 0.005891*Spices (Body = 29)
Review = 3.706360 + 0.004089*Spices (Body = 57.25)

At low spice scores, beers with higher body scores get higher taste scores, but because the interaction is negative, the advantage of a high body score shrinks as the spice score increases.

M2 <- lm(review_taste ~ .^2, data=BEER)  # fit all two-way interactions
see_interactions(M2, many=TRUE)
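see_interactions() is another course helper; the same screen can be approximated in base R by fitting all two-way interactions and inspecting the interaction rows of the coefficient table. A sketch on synthetic data (assumption: the helper's plots and exact criteria are not shown here):

```r
# Screen all two-way interactions for small p-values with base R
# (assumption: a plain substitute for the course helper see_interactions(),
# not its exact output).
set.seed(5)
toy <- data.frame(Body = rpois(120, 45), Spices = rpois(120, 15),
                  Sour = rpois(120, 30))
toy$review_taste <- 3.3 + 0.007*toy$Body + 0.008*toy$Spices -
  6e-5*toy$Body*toy$Spices + rnorm(120, sd = 0.4)

M2 <- lm(review_taste ~ .^2, data = toy)     # all predictors + all pairs
ct <- coef(summary(M2))
ct[grepl(":", rownames(ct)), , drop = FALSE] # the interaction rows only
```

The rows whose names contain ":" are the candidate interactions; those with small p-values are worth examining with an interaction plot before keeping them.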
# Candidate interactions: Body and Sour; Body and Spices
M.int <- lm(review_taste ~ Body * Spices, data=BEER)
visualize_model(M.int)
##
## Interaction term has p-value 0.005983

summary(M.int)

## Call:
## lm(formula = review_taste ~ Body * Spices, data = BEER)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.17116 -0.21003  0.03287  0.28643  1.48974
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.300e+00  3.614e-02  91.322  < 2e-16 ***
## Body         7.098e-03  7.275e-04   9.756  < 2e-16 ***
## Spices       7.742e-03  1.428e-03   5.423 7.35e-08 ***
## Body:Spices -6.381e-05  2.316e-05  -2.755  0.00598 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4655 on 996 degrees of freedom
## Multiple R-squared:  0.158,  Adjusted R-squared:  0.1555
## F-statistic:  62.3 on 3 and 996 DF,  p-value: < 2.2e-16

summary(BEER$Body)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    0.00   29.00   39.00   45.17   57.25  175.00
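The fitted lines in the interaction plot can be recovered directly from the coefficient table above. A sketch that hard-codes the printed estimates rather than refitting (an assumption: the data is not reproduced here, and the estimates are only as precise as their printed rounding):

```r
# Implied intercept and Spices slope at chosen Body values, computed from
# the coefficient estimates printed above (hard-coded to printed precision).
b <- c(Intercept  = 3.300e+00,
       Body       = 7.098e-03,
       Spices     = 7.742e-03,
       BodySpices = -6.381e-05)

line_at_body <- function(body) {
  c(intercept = unname(b["Intercept"] + b["Body"] * body),
    slope     = unname(b["Spices"] + b["BodySpices"] * body))
}

line_at_body(29)     # Body at its 1st quartile
line_at_body(57.25)  # Body at its 3rd quartile
```

Because the Body:Spices coefficient is negative, the Spices slope shrinks as Body increases, which is exactly the convergence of the two fitted lines described in the interaction section.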