Apr 3, 2024






BAS 320 - Assignment 7 - Multiple Regression Part 2
Matthew Zook

M <- lm(review_taste ~ ., data = BEER)
drop1(M, test = "F")
## Single term deletions
##
## Model:
## review_taste ~ ABV + Min.IBU + Max.IBU + Astringency + Body +
##     Alcohol + Bitter + Sweet + Sour + Salty + Fruits + Hoppy +
##     Spices + Malty
##             Df Sum of Sq    RSS     AIC F value    Pr(>F)
## <none>                   173.65 -1720.7
## ABV          1    6.8645 180.52 -1683.9 38.9373 6.491e-10 ***
## Min.IBU      1    2.2290 175.88 -1710.0 12.6432 0.0003948 ***
## Max.IBU      1    0.0203 173.67 -1722.6  0.1150 0.7346205
## Astringency  1    0.0989 173.75 -1722.1  0.5610 0.4540476
## Body         1    2.1096 175.76 -1710.6 11.9661 0.0005650 ***
## Alcohol      1    3.1541 176.81 -1704.7 17.8911 2.557e-05 ***
## Bitter       1    0.1093 173.76 -1722.1  0.6200 0.4312491
## Sweet        1    0.7480 174.40 -1718.4  4.2429 0.0396778 *
## Sour         1    1.8677 175.52 -1712.0 10.5939 0.0011732 **
## Salty        1    0.3135 173.97 -1720.9  1.7784 0.1826602
## Fruits       1    2.6961 176.35 -1707.3 15.2932 9.836e-05 ***
## Hoppy        1    0.2920 173.94 -1721.0  1.6563 0.1984012
## Spices       1    6.5096 180.16 -1685.9 36.9244 1.753e-09 ***
## Malty        1    1.6272 175.28 -1713.4  9.2299 0.0024437 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

BEER <- BEER[, c("review_taste", "ABV", "Fruits", "Sour", "Spices", "Body")]
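As a quick check on how to read the drop1() table, the F value for a single term can be rebuilt by hand from its Sum of Sq and the full-model RSS. The sketch below does this for ABV using numbers copied from the table. It assumes the full model was fit on all 1000 rows with 14 predictors, so 985 residual degrees of freedom; the variable names are illustrative only.

```r
# Values copied from the drop1() output above
rss_full <- 173.65   # RSS of the full model (the <none> row)
ss_abv   <- 6.8645   # increase in RSS when ABV is dropped (Sum of Sq)

# Assumption: n = 1000 rows and 14 predictors plus an intercept
n        <- 1000
p        <- 14
df_resid <- n - p - 1

# Partial F statistic: extra sum of squares over the full-model mean squared error
f_abv <- ss_abv / (rss_full / df_resid)

# Upper-tail p-value on 1 and df_resid degrees of freedom
p_abv <- pf(f_abv, df1 = 1, df2 = df_resid, lower.tail = FALSE)

print(c(F = f_abv, p.value = p_abv))
```

The result should land close to the F value of 38.9373 reported for ABV, with small differences coming from the rounded RSS in the printed table.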
Analysis of … (replace the … with what you’re analyzing)

I’m using a multiple regression model to predict the overall taste review from the alcohol by volume, amount of fruits, sour score, amount of spices, and body score. I’m making this model because I am curious to see which score has the largest effect on the overall score and how well I can predict the overall score from these predictors. The data I’m using comes from and-ratings-data-set/ and contains a total of 1000 rows and 6 columns (the taste review plus 5 predictors).

Investigation of the relationship between the taste score and spices

I am investigating the relationship between the overall taste score and the amount of spices in the beer. A polynomial does a better job than the simple linear regression model. I would suggest the fourth-order polynomial, predicting the taste score from Spices, Spices^2, Spices^3, and Spices^4. Its AICc (1294.106) is within 2 of the lowest AICc (1292.707, at order 5), so the simpler model is a reasonable choice.

M <- lm(review_taste ~ Spices, data = BEER); choose_order(M)
##   order      R2adj     AICc
## 1     1 0.06564513 1417.809
## 2     2 0.14089397 1336.854
## 3     3 0.16009325 1317.265
## 4     4 0.18179266 1294.106
## 5     5 0.18539948 1292.707
## 6     6 0.18551032 1295.592
#choose_order(M)

Multiple regression model and checking of assumptions

While there are some violations, they are not severe enough to cause us to ditch the model. From a statistical standpoint, we pass linearity for Body, Fruits, and Sour but fail it for ABV and Spices. We also fail the equal spread and normality tests, but the violations appear relatively small.

M <- lm(review_taste ~ ., data = BEER) #Fit a multiple linear regression
summary(M)
##
## Call:
## lm(formula = review_taste ~ ., data = BEER)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.24191 -0.20268  0.04858  0.25555  1.48930
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.9959999  0.0413381  72.475  < 2e-16 ***
## ABV         0.0362721  0.0052925   6.853 1.26e-11 ***
## Fruits      0.0022523  0.0007679   2.933  0.00343 **
## Sour        0.0018520  0.0006586   2.812  0.00502 **
## Spices      0.0028970  0.0006593   4.394 1.23e-05 ***
## Body        0.0059590  0.0005607  10.628  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4326 on 994 degrees of freedom
## Multiple R-squared:  0.2743, Adjusted R-squared:  0.2706
## F-statistic: 75.13 on 5 and 994 DF,  p-value: < 2.2e-16
check_regression(M, extra = TRUE)
##
## Tests of Assumptions: ( sample size n = 1000 ):
## Linearity
##    p-value for ABV : 0
##    p-value for Fruits : 0.9313
##    p-value for Sour : 0.9122
##    p-value for Spices : 0
##    p-value for Body : 0.2418
##    p-value for overall model : NA (not enough duplicate rows)
## Equal Spread:  p-value is 0
## Normality:  p-value is 0
##
## Advice:  if n<25 then all tests must be passed.
## If n >= 25 and test is failed, refer to diagnostic plot to see if violation is severe
##  or is small enough to be ignored.
##
## Press [enter] to continue to Predictor vs. Residuals plots or q (then Return) to quit ( 5 plots to show )
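For readers without the check_regression() helper, the three kinds of checks reported above have rough base-R analogues. The sketch below runs them on simulated stand-in data (the data frame SIM and its columns are hypothetical, not the BEER data), and the specific tests are my own choices, not necessarily the ones check_regression() uses: a squared-term F-test for linearity, a Breusch-Pagan-style test (Koenker's n times R-squared form) for equal spread, and Shapiro-Wilk for normality.

```r
set.seed(320)
n <- 1000

# Simulated stand-in data (hypothetical): y is linear in x1 but curved in x2
x1  <- runif(n, 0, 10)
x2  <- runif(n, 0, 10)
y   <- 3 + 0.5 * x1 + 0.1 * (x2 - 5)^2 + rnorm(n, sd = 0.5)
SIM <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 + x2, data = SIM)

# Linearity in x2: does adding a squared term significantly improve the fit?
fit_curved  <- update(fit, . ~ . + I(x2^2))
p_linearity <- anova(fit, fit_curved)[["Pr(>F)"]][2]

# Equal spread: regress squared residuals on the predictors; n * R^2 is
# compared with a chi-squared distribution (df = number of predictors)
aux      <- lm(resid(fit)^2 ~ x1 + x2, data = SIM)
bp_stat  <- n * summary(aux)$r.squared
p_spread <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

# Normality of residuals (shapiro.test accepts 3 to 5000 observations)
p_normality <- shapiro.test(resid(fit))$p.value

print(c(linearity = p_linearity, spread = p_spread, normality = p_normality))
```

With n this large, even small departures give tiny p-values, which is why the advice in the output says to judge severity from the diagnostic plots; in base R, plot(fit, which = 1:2) draws the residuals-vs-fitted and normal Q-Q plots.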
Identification of influential points

There are 7 influential points. The first influential point (row 210) is unusual because it has a 0 in Fruits, Sour, and Spices yet has a very high review score.

#M <- lm() #Fit a multiple regression model
influence_plot(M)
## $Leverage
## [1] 210 477 523 690 824 885 907
influential.rows <- influence_plot(M)$Leverage
INFLUENCE <- data.frame(matrix(0, nrow = length(influential.rows), ncol = ncol(BEER)))
names(INFLUENCE) <- names(BEER)
# For each influential row, record the fraction of rows whose value in each
# column is <= that row's value (its percentile within the column)
for (r in 1:length(influential.rows)) {
  x <- as.numeric(BEER[influential.rows[r], ])
  INFLUENCE[r, ] <- sapply(1:length(x), function(i) { mean(BEER[[i]] <= x[i]) })
}
round(INFLUENCE, digits = 2)
##   review_taste  ABV Fruits Sour Spices Body
## 1         0.99 0.98   0.02 0.01   0.05 0.02
## 2         0.14 0.57   0.99 0.98   0.32 0.68
## 3         0.10 0.07   0.47 0.52   1.00 0.60
## 4         0.28 1.00   0.23 0.14   0.15 0.03
## 5         0.11 0.33   0.06 0.05   0.98 0.94
## 6         0.08 0.56   0.40 0.56   1.00 0.14
## 7         0.06 0.11   0.98 0.46   0.98 0.21
BEER[210, ]
##      review_taste ABV Fruits Sour Spices Body
## 3098          4.5  12      0    0      0    4
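
The same two steps, flagging unusual rows and then looking up their percentiles, can be sketched in base R without influence_plot(). This version runs on simulated data (SIM is hypothetical, with one extreme value planted in row 1) and flags rows whose leverage exceeds the common 2p/n rule of thumb. That cutoff is an assumption for illustration and is not necessarily the criterion influence_plot() applies.

```r
set.seed(7)
n <- 200

# Simulated stand-in data (hypothetical) with an extreme x1 planted in row 1
SIM <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
SIM$x1[1] <- 8
SIM$y <- 1 + 2 * SIM$x1 - SIM$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = SIM)

# Leverage (hat values) and the 2p/n rule-of-thumb cutoff
h      <- hatvalues(fit)
p      <- length(coef(fit))   # number of coefficients, intercept included
cutoff <- 2 * p / n
high_leverage <- which(h > cutoff)

# Percentile of each flagged row within every column, via the empirical CDF;
# ecdf(col)(v) equals mean(col <= v), matching the loop above
percentiles <- as.data.frame(
  lapply(SIM, function(col) ecdf(col)(col[high_leverage])),
  row.names = as.character(high_leverage)
)

print(round(percentiles, 2))
```

A percentile near 0 or 1 in a predictor column shows which variable makes a flagged row unusual, which is how row 210 stands out in the table above.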
