Assignment8-320-CategoricalVariables

School

The University of Tennessee, Knoxville

Course

320

Subject

Statistics

Date

Apr 3, 2024

BAS 320 - Assignment 8 - Categorical Predictors
Matthew Zook

The following gives a skeleton/template that you could fill out to narrate your analysis. You'll need to adapt words throughout to make it specific to your dataset. Feel free to deviate from the skeleton as long as you're hitting all the required points (see Rubric and detailed list of requirements on Canvas)! Delete this paragraph (and any prompts provided in other paragraphs) before knitting/submitting. The document should flow nicely as a professional report.

Code you should keep: lm, summary, drop1, visualize_model; basically everything in the R chunks below this.
Code you should delete after knitting: code reading in the data, combining categories, creating categorical variables from numerical ones if you used it; basically everything in the R chunk above this (and any other code you created that isn't requested).

Analysis of … (replace the … with what you're analyzing)

I'm using a multiple regression model to predict a beer's overall taste score (review_taste) from its Spices score. I'm making this model because I am curious to see whether the Spices score can tell us anything about the overall taste score. The data I'm using comes from https://www.kaggle.com/datasets/ruthgn/beer-profile-and-ratings-data-set/ and contains 3,197 rows (the number of beers the models below are fit to) and 6 predictors.

Task 1 - Investigation of the relationship between … (replace with Y) and … (replace with a numeric X1 and a categorical X2)

(If the interaction IS statistically significant, write out the full regression equation and the implicit regression equations for both levels. Write a short paragraph telling us which level has the stronger relationship between Y and X1. Further, discuss the difference in the average value of Y between levels when X1 is small and when X1 is big. Does the difference shrink? Grow? Flip signs? Because this requires more writing, you'll get a small bonus if the interaction is statistically significant.)
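As a worked instance of the algebra this prompt asks for, here are the equations implied by the summary(M2) estimates reported below, rounded. H is shorthand introduced here for the BodyCategory(40,175] indicator. It equals 1 for high-body beers and 0 for the low-body reference level.

```latex
\begin{aligned}
\widehat{\text{taste}} &= 3.458 + 0.00523\,\text{Spices} + 0.360\,H - 0.00285\,(\text{Spices}\times H) \\[4pt]
H = 0 \text{ (low body):}\quad \widehat{\text{taste}} &= 3.458 + 0.00523\,\text{Spices} \\
H = 1 \text{ (high body):}\quad \widehat{\text{taste}} &= (3.458 + 0.360) + (0.00523 - 0.00285)\,\text{Spices} = 3.818 + 0.00238\,\text{Spices} \\[4pt]
\widehat{\text{taste}}_{\text{high}} - \widehat{\text{taste}}_{\text{low}} &= 0.360 - 0.00285\,\text{Spices} = 0
\;\Longrightarrow\; \text{Spices} \approx 0.360 / 0.00285 \approx 126
\end{aligned}
```

Setting the gap to zero gives the Spices value where the two implicit lines cross. Below that value, high-body beers are predicted to score higher. Above it, low-body beers are predicted to score higher.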
The full regression equation (M2) is taste = 3.458 + 0.00523 Spices + 0.360 BodyCategory(40,175] - 0.00285 Spices × BodyCategory(40,175]. In this equation, BodyCategory(40,175] is 1 for high-body beers and 0 for low-body beers. The implicit regression equation for low-body beers is taste = 3.458 + 0.00523 Spices. For high-body beers it is taste = 3.818 + 0.00238 Spices.

Low-body beers have the stronger relationship between Spices and taste. Their slope is a little more than twice the high-body slope, so each additional unit of Spices is associated with a larger increase in predicted taste score. When Spices is small, high-body beers tend to have higher taste reviews, by about 0.36 points at Spices = 0. Because the high-body line is flatter, that gap shrinks by about 0.00285 points per unit of Spices. It flips sign near Spices ≈ 126, so at higher spice levels low-body beers are predicted to score higher.

M1 <- lm(review_taste ~ Spices + BodyCategory, data = BEER) #Fit a regression predicting y from x1 (numeric) and x2 (categorical) that EXCLUDES the interaction
summary(M1)
##
## Call:
## lm(formula = review_taste ~ Spices + BodyCategory, data = BEER)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.51084 -0.21876  0.04815  0.30742  1.38592
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          3.481846   0.012950  268.86   <2e-16 ***
## Spices               0.003616   0.000358   10.10   <2e-16 ***
## BodyCategory(40,175] 0.308704   0.017007   18.15   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4744 on 3194 degrees of freedom
## Multiple R-squared:  0.1363, Adjusted R-squared:  0.1358
## F-statistic: 252.1 on 2 and 3194 DF,  p-value: < 2.2e-16
visualize_model(M1)
##
## Effect test for BodyCategory has p-value 3.718e-70
M2 <- lm(review_taste ~ Spices * BodyCategory, data = BEER)
summary(M2)
##
## Call:
## lm(formula = review_taste ~ Spices * BodyCategory, data = BEER)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.64245 -0.22413  0.04427  0.30390  1.40610
##
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  3.4584374  0.0142091 243.397  < 2e-16 ***
## Spices                       0.0052311  0.0005421   9.649  < 2e-16 ***
## BodyCategory(40,175]         0.3596036  0.0212880  16.892  < 2e-16 ***
## Spices:BodyCategory(40,175] -0.0028532  0.0007206  -3.959 7.68e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4734 on 3193 degrees of freedom
## Multiple R-squared:  0.1406, Adjusted R-squared:  0.1398
## F-statistic: 174.1 on 3 and 3193 DF,  p-value: < 2.2e-16
visualize_model(M2)
##
## Effect test for interaction with BodyCategory has p-value 7.682e-05

Task 2 - Investigation of the relationship between … (replace with Y) and … (replace with a numeric X3 and a categorical X4)

(If the interaction IS NOT statistically significant, write out the implicit regression equations for the reference level and for a level of your choice. Specify the names of these levels in your writeup. Write a short paragraph telling us which of those 2 levels has the higher average value of Y at all values of X3 (and by how much). Continue this discussion and comment on which level overall has the highest/lowest values of Y at all values of X3.)
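When a model has no interaction, it implies one line per BeerStyle level, and every line has the same Spices slope. Under R's default treatment coding, each non-reference level's intercept is the model intercept plus that level's dummy coefficient. The sketch below shows one way to tabulate those implicit equations from a fitted lm() object. The helper name implicit_lines() and the simulated toy data (x, g, y) are hypothetical and made up for illustration. They are not part of the assignment, the regclass package, or the beer data.

```r
# Hypothetical helper (not from any package): tabulate the implicit
# regression lines of a no-interaction model y ~ x + group fitted with lm()
# under R's default treatment coding.
implicit_lines <- function(model, x_name, group_name) {
  b   <- coef(model)
  lev <- model$xlevels[[group_name]]      # factor levels; the first is the reference
  dummies <- paste0(group_name, lev[-1])  # R names dummy coefficients "<group><level>"
  data.frame(
    level     = lev,
    intercept = unname(b["(Intercept)"] + c(0, b[dummies])),  # reference adds 0
    slope     = unname(b[x_name])  # one shared slope because there is no interaction
  )
}

# Simulated toy data, purely for illustration (not the beer data).
set.seed(1)
toy <- data.frame(
  x = runif(60, min = 0, max = 100),
  g = factor(rep(c("A", "B", "C"), each = 20))
)
shift <- c(A = 0, B = 0.3, C = -0.2)
toy$y <- 3 + 0.004 * toy$x + shift[as.character(toy$g)] + rnorm(60, sd = 0.1)

fit <- lm(y ~ x + g, data = toy)
lines_tbl <- implicit_lines(fit, "x", "g")
print(lines_tbl)
```

With the actual model, implicit_lines(M3, "Spices", "BeerStyle") should list an intercept for every style alongside the shared Spices slope. The difference between any two intercepts is the constant gap in average taste between those styles at every value of Spices.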
For the implicit regression equations, we compare the reference level of BeerStyle (the style absorbed into the intercept) with the Lambic - Fruit style. The equation for Lambic - Fruit beers is taste = 3.964 + 0.0055 Spices. Because M3 has no interaction, every style shares the same Spices slope. The reference-level equation therefore has the same slope and differs only in its intercept. Lambic - Fruit beers have the higher average taste score at every value of Spices, by a constant amount equal to the BeerStyleLambic - Fruit coefficient in M3.

M3 <- lm(review_taste ~ Spices + BeerStyle, data = BEER) #Fit a regression predicting y from x3 (numeric) and x4 (categorical) that EXCLUDES the interaction
#summary(M3)
drop1(M3, test = "F")
## Single term deletions
##
## Model:
## review_taste ~ Spices + BeerStyle
##           Df Sum of Sq    RSS     AIC F value    Pr(>F)
## <none>                 724.72 -4730.9
## Spices     1    30.776 755.49 -4600.0 135.470 < 2.2e-16 ***
## BeerStyle  5    68.409 793.13 -4452.6  60.224 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
visualize_model(M3)
##
## Effect test for BeerStyle has p-value 4.044e-60
M4 <- lm(review_taste ~ Spices * BeerStyle, data = BEER) #Fit a regression predicting y from x3 (numeric) and x4 (categorical) that INCLUDES the interaction
summary(M4)
##
## Call:
## lm(formula = review_taste ~ Spices * BeerStyle, data = BEER)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.72459 -0.20952  0.06275  0.30202  1.23056
##
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              3.6451969  0.0110485 329.927  < 2e-16
## Spices                                   0.0041414  0.0003601  11.500  < 2e-16
## BeerStyleLager - Adjunct                -1.0990617  0.1232489  -8.917  < 2e-16
## BeerStyleLager - European Pale          -0.7385926  0.1252382  -5.898 4.08e-09
## BeerStyleLambic - Fruit                  0.3137275  0.1066297   2.942  0.00328
## BeerStyleStout - Irish Dry               0.0825196  0.1065284   0.775  0.43862
## BeerStyleWheat Beer - Hefeweizen        -0.3579474  0.2258641  -1.585  0.11311
## Spices:BeerStyleLager - Adjunct          0.0330344  0.0290612   1.137  0.25574
## Spices:BeerStyleLager - European Pale    0.0056527  0.0223584   0.253  0.80042
## Spices:BeerStyleLambic - Fruit           0.0014617  0.0129964   0.112  0.91046
## Spices:BeerStyleStout - Irish Dry       -0.0025620  0.0064937  -0.395  0.69321
## Spices:BeerStyleWheat Beer - Hefeweizen  0.0100828  0.0062049   1.625  0.10427
##
## (Intercept)                             ***
## Spices                                  ***
## BeerStyleLager - Adjunct                ***
## BeerStyleLager - European Pale          ***
## BeerStyleLambic - Fruit                 **
## BeerStyleStout - Irish Dry
## BeerStyleWheat Beer - Hefeweizen
## Spices:BeerStyleLager - Adjunct
## Spices:BeerStyleLager - European Pale
## Spices:BeerStyleLambic - Fruit
## Spices:BeerStyleStout - Irish Dry
## Spices:BeerStyleWheat Beer - Hefeweizen
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4767 on 3185 degrees of freedom
## Multiple R-squared:  0.1306, Adjusted R-squared:  0.1276
## F-statistic: 43.48 on 11 and 3185 DF,  p-value: < 2.2e-16
drop1(M4, test = "F")
## Single term deletions
##
## Model:
## review_taste ~ Spices * BeerStyle
##                  Df Sum of Sq    RSS     AIC F value Pr(>F)
## <none>                        723.77 -4725.1
## Spices:BeerStyle  5   0.94664 724.72 -4730.9  0.8332 0.5259
visualize_model(M4)
##
## Effect test for interaction with BeerStyle has p-value 0.5259
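drop1(model, test = "F") performs a partial F test for each term. It takes the increase in residual sum of squares when the term is dropped, divides it by the term's degrees of freedom, and compares that to the full model's residual mean square. The sketch below recomputes the printed F values and the interaction p-value from the numbers in the two drop1() tables, using base R's pf(). The helper partial_f() is hypothetical. The residual degrees of freedom for M3 (3190) is an assumption inferred from n = 3197 observations minus M3's 7 coefficients; M4's 3185 is printed in summary(M4).

```r
# Hypothetical helper: partial F test as reported by drop1(model, test = "F").
#   F = (extra RSS from dropping the term / its df) / (full-model RSS / residual df)
partial_f <- function(sum_sq, df_term, rss_full, df_resid) {
  f_value <- (sum_sq / df_term) / (rss_full / df_resid)
  p_value <- pf(f_value, df_term, df_resid, lower.tail = FALSE)  # upper-tail probability
  c(F = f_value, p = p_value)
}

# Values copied from the drop1() tables above (rounded as printed).
# M4: dropping Spices:BeerStyle (5 df); residual df 3185 from summary(M4).
m4_interaction <- partial_f(sum_sq = 0.94664, df_term = 5, rss_full = 723.77, df_resid = 3185)

# M3: residual df assumed to be 3197 - 7 = 3190 (7 coefficients, n = 3197).
m3_spices    <- partial_f(sum_sq = 30.776, df_term = 1, rss_full = 724.72, df_resid = 3190)
m3_beerstyle <- partial_f(sum_sq = 68.409, df_term = 5, rss_full = 724.72, df_resid = 3190)

print(rbind(m4_interaction, m3_spices, m3_beerstyle))
```

Because the printed sums of squares are rounded, the recomputed values agree with the tables only up to rounding. The interaction's p-value of about 0.53 is why it is treated as not statistically significant in Task 2.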