BAS 320 - Assignment 7 - Multiple Regression Part 2
Matthew Zook

M <- lm(review_taste ~ ., data = BEER)
drop1(M, test = "F")

## Single term deletions
##
## Model:
## review_taste ~ ABV + Min.IBU + Max.IBU + Astringency + Body +
##     Alcohol + Bitter + Sweet + Sour + Salty + Fruits + Hoppy +
##     Spices + Malty
##             Df Sum of Sq    RSS     AIC F value    Pr(>F)
## <none>                   173.65 -1720.7
## ABV          1    6.8645 180.52 -1683.9 38.9373 6.491e-10 ***
## Min.IBU      1    2.2290 175.88 -1710.0 12.6432 0.0003948 ***
## Max.IBU      1    0.0203 173.67 -1722.6  0.1150 0.7346205
## Astringency  1    0.0989 173.75 -1722.1  0.5610 0.4540476
## Body         1    2.1096 175.76 -1710.6 11.9661 0.0005650 ***
## Alcohol      1    3.1541 176.81 -1704.7 17.8911 2.557e-05 ***
## Bitter       1    0.1093 173.76 -1722.1  0.6200 0.4312491
## Sweet        1    0.7480 174.40 -1718.4  4.2429 0.0396778 *
## Sour         1    1.8677 175.52 -1712.0 10.5939 0.0011732 **
## Salty        1    0.3135 173.97 -1720.9  1.7784 0.1826602
## Fruits       1    2.6961 176.35 -1707.3 15.2932 9.836e-05 ***
## Hoppy        1    0.2920 173.94 -1721.0  1.6563 0.1984012
## Spices       1    6.5096 180.16 -1685.9 36.9244 1.753e-09 ***
## Malty        1    1.6272 175.28 -1713.4  9.2299 0.0024437 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

BEER <- BEER[, c("review_taste", "ABV", "Fruits", "Sour", "Spices", "Body")]
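The drop1() table above informed a manual choice of five predictors. For comparison only, backward elimination can be automated with base R's step(); the following is a sketch (M_full and M_back are illustrative names, not part of the submitted analysis), and it would need to run on the original 14-predictor data, i.e., before the subsetting line above:

# Sketch: automated backward elimination by AIC using base R's step().
# Starting from the full model, step() repeatedly drops the predictor whose
# removal lowers AIC the most and stops when no deletion improves AIC.
M_full <- lm(review_taste ~ ., data = BEER)   # BEER with all 14 predictors
M_back <- step(M_full, direction = "backward", trace = 0)
summary(M_back)  # may keep a different subset than the manual choice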
Analysis of Beer Taste Reviews

I'm using a multiple regression model to predict the overall taste review from alcohol by volume (ABV), the fruits score, the sour score, the spices score, and the body score. I built this model because I am curious which characteristic has the largest effect on the taste score, and I want to see how well the taste score can be predicted from these characteristics. The data come from https://www.kaggle.com/datasets/ruthgn/beer-profile-and-ratings-data-set/ and contain 1,000 rows; after the variable selection above, five predictors remain.

Investigation of the relationship between the taste score and spices

I am investigating the relationship between the overall taste score and the amount of spices in the beer. A polynomial would do a better job than the simple linear regression model: choose_order() shows large gains in adjusted R-squared and AICc through order 4, with only marginal changes at orders 5 and 6. I would therefore suggest a fourth-order polynomial, predicting the taste score from Spices, Spices^2, Spices^3, and Spices^4.

M <- lm(review_taste ~ Spices, data = BEER)
choose_order(M)

##   order      R2adj     AICc
## 1     1 0.06564513 1417.809
## 2     2 0.14089397 1336.854
## 3     3 0.16009325 1317.265
## 4     4 0.18179266 1294.106
## 5     5 0.18539948 1292.707
## 6     6 0.18551032 1295.592
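For reference, the fourth-order model suggested above can be fit explicitly with poly(); this is a quick sketch rather than part of the required output:

# Sketch: fit the suggested fourth-order polynomial in Spices.
# poly(Spices, 4) uses orthogonal polynomials, which are numerically more
# stable than raw powers but produce the same fitted values as
# Spices + I(Spices^2) + I(Spices^3) + I(Spices^4).
M_poly <- lm(review_taste ~ poly(Spices, 4), data = BEER)
summary(M_poly)$adj.r.squared  # should match the 0.182 reported by choose_order()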
Multiple regression model and checking of assumptions

While there are some violations of the assumptions, they are not severe enough to make us abandon the model. From a statistical standpoint, Body, Fruits, and Sour pass the linearity test, while ABV and Spices fail it; the model also fails the equal spread and normality tests, but with n = 1000 the diagnostic plots suggest these violations are relatively small.

M <- lm(review_taste ~ ., data = BEER)  # fit the multiple linear regression
summary(M)

##
## Call:
## lm(formula = review_taste ~ ., data = BEER)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.24191 -0.20268  0.04858  0.25555  1.48930
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.9959999  0.0413381  72.475  < 2e-16 ***
## ABV         0.0362721  0.0052925   6.853 1.26e-11 ***
## Fruits      0.0022523  0.0007679   2.933  0.00343 **
## Sour        0.0018520  0.0006586   2.812  0.00502 **
## Spices      0.0028970  0.0006593   4.394 1.23e-05 ***
## Body        0.0059590  0.0005607  10.628  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4326 on 994 degrees of freedom
## Multiple R-squared:  0.2743, Adjusted R-squared:  0.2706
## F-statistic: 75.13 on 5 and 994 DF,  p-value: < 2.2e-16
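As a quick supplement (not part of the original output), uncertainty for these estimates can be quantified with base R's confint():

# 95% confidence intervals for the fitted coefficients; intervals that
# exclude 0 correspond to the terms flagged as significant in summary(M)
confint(M, level = 0.95)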
check_regression(M, extra = TRUE)

##
## Tests of Assumptions: ( sample size n = 1000 ):
## Linearity
##    p-value for ABV : 0
##    p-value for Fruits : 0.9313
##    p-value for Sour : 0.9122
##    p-value for Spices : 0
##    p-value for Body : 0.2418
##    p-value for overall model : NA (not enough duplicate rows)
## Equal Spread: p-value is 0
## Normality: p-value is 0
##
## Advice: if n<25 then all tests must be passed.
## If n >= 25 and test is failed, refer to diagnostic plot to see if violation is severe
## or is small enough to be ignored.

(The five predictor vs. residuals diagnostic plots appear here in the knitted report.)
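check_regression() comes from the regclass package and bundles these tests with diagnostic plots. Roughly equivalent pictures can be drawn in base R; the following is a sketch, assuming M is the fitted model above:

# Sketch: base-R analogues of the diagnostic plots check_regression() shows
par(mfrow = c(1, 3))
plot(fitted(M), resid(M),                 # residuals vs fitted: look for curvature
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(resid(M)); qqline(resid(M))        # normality of residuals
plot(fitted(M), sqrt(abs(resid(M))),      # scale-location: look for unequal spread
     xlab = "Fitted values", ylab = "sqrt(|residuals|)")
par(mfrow = c(1, 1))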
Identification of influential points

There are 7 influential points. The first influential point (row 210) is unusual because it scores 0 on Fruits, Sour, and Spices, yet has a very high taste review.

influence_plot(M)

## $Leverage
## [1] 210 477 523 690 824 885 907

influential.rows <- influence_plot(M)$Leverage
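influence_plot() also comes from regclass and flags these rows from a leverage plot. Base R exposes the underlying quantities directly; here is a sketch of one common rule of thumb, not necessarily the package's exact criterion:

# Sketch: flag high-leverage rows by hand. A common rule of thumb flags
# leverages above 2-3 times the average leverage, which equals (p + 1) / n.
lev <- hatvalues(M)
which(lev > 3 * mean(lev))                        # candidate high-leverage rows
head(sort(cooks.distance(M), decreasing = TRUE))  # most influential by Cook's distance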
# For each influential row, compute the percentile of each of its values
# within the full dataset: mean(BEER[[i]] <= x[i]) is the fraction of rows
# at or below that value in column i.
INFLUENCE <- data.frame(matrix(0, nrow = length(influential.rows), ncol = ncol(BEER)))
names(INFLUENCE) <- names(BEER)
for (r in 1:length(influential.rows)) {
  x <- as.numeric(BEER[influential.rows[r], ])
  INFLUENCE[r, ] <- sapply(1:length(x), function(i) { mean(BEER[[i]] <= x[i]) })
}
round(INFLUENCE, digits = 2)

##   review_taste  ABV Fruits Sour Spices Body
## 1         0.99 0.98   0.02 0.01   0.05 0.02
## 2         0.14 0.57   0.99 0.98   0.32 0.68
## 3         0.10 0.07   0.47 0.52   1.00 0.60
## 4         0.28 1.00   0.23 0.14   0.15 0.03
## 5         0.11 0.33   0.06 0.05   0.98 0.94
## 6         0.08 0.56   0.40 0.56   1.00 0.14
## 7         0.06 0.11   0.98 0.46   0.98 0.21

BEER[210, ]

##      review_taste ABV Fruits Sour Spices Body
## 3098          4.5  12      0    0      0    4
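The loop above can also be written more compactly with ecdf(); this is an optional sketch, assuming every column of BEER is numeric:

# Sketch: same percentile table via ecdf(); ecdf(col)(val) equals mean(col <= val)
INFLUENCE2 <- t(sapply(influential.rows, function(r)
  mapply(function(col, val) ecdf(col)(val), BEER, BEER[r, ])
))
round(INFLUENCE2, digits = 2)  # matches the table produced by the loop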