Assignment7-320-MultipleRegression2.docx
School: The University of Tennessee, Knoxville
Course: 320
Subject: Statistics
Date: Apr 3, 2024
BAS 320 - Assignment 7 - Multiple Regression Part 2
Matthew Zook
M <- lm(review_taste ~ ., data=BEER)
drop1(M, test="F")
## Single term deletions
##
## Model:
## review_taste ~ ABV + Min.IBU + Max.IBU + Astringency + Body +
##     Alcohol + Bitter + Sweet + Sour + Salty + Fruits + Hoppy +
##     Spices + Malty
##             Df Sum of Sq    RSS     AIC F value    Pr(>F)
## <none>                   173.65 -1720.7
## ABV          1    6.8645 180.52 -1683.9 38.9373 6.491e-10 ***
## Min.IBU      1    2.2290 175.88 -1710.0 12.6432 0.0003948 ***
## Max.IBU      1    0.0203 173.67 -1722.6  0.1150 0.7346205
## Astringency  1    0.0989 173.75 -1722.1  0.5610 0.4540476
## Body         1    2.1096 175.76 -1710.6 11.9661 0.0005650 ***
## Alcohol      1    3.1541 176.81 -1704.7 17.8911 2.557e-05 ***
## Bitter       1    0.1093 173.76 -1722.1  0.6200 0.4312491
## Sweet        1    0.7480 174.40 -1718.4  4.2429 0.0396778 *
## Sour         1    1.8677 175.52 -1712.0 10.5939 0.0011732 **
## Salty        1    0.3135 173.97 -1720.9  1.7784 0.1826602
## Fruits       1    2.6961 176.35 -1707.3 15.2932 9.836e-05 ***
## Hoppy        1    0.2920 173.94 -1721.0  1.6563 0.1984012
## Spices       1    6.5096 180.16 -1685.9 36.9244 1.753e-09 ***
## Malty        1    1.6272 175.28 -1713.4  9.2299 0.0024437 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
BEER <- BEER[, c("review_taste", "ABV", "Fruits", "Sour", "Spices", "Body")]
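As a rough illustration of how single-term deletion tests can guide which predictors to keep, here is a minimal sketch on R's built-in mtcars data rather than the BEER data. The 0.05 cutoff, the choice of mtcars columns, and the object names (fit, tab, keep, reduced) are assumptions made for illustration; they are not the rule that was used to pick the five predictors in this report.

```r
# Illustration only: built-in mtcars data stands in for BEER (an assumption).
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Each row of the table tests whether dropping that one predictor
# significantly worsens the fit, given all the other predictors.
tab <- drop1(fit, test = "F")
print(tab)

# Pull out the partial F-test p-values; the first row is <none>, so skip it.
pvals <- tab[["Pr(>F)"]][-1]
names(pvals) <- rownames(tab)[-1]

# One possible (hypothetical) rule: keep predictors with p < 0.05.
keep <- names(pvals)[pvals < 0.05]
reduced <- mtcars[, c("mpg", keep), drop = FALSE]
print(names(reduced))
```

A p-value cutoff is only one way to shortlist predictors; subject-matter interest can justify a different subset.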
Analysis of … (replace the … with what you’re analyzing)
I’m using a multiple regression model to predict a beer's overall taste review from its alcohol by volume (ABV), amount of fruits, sour score, amount of spices, and body score.
I’m making this model because I am curious which predictor has the largest effect on the taste review and how well I can predict the review from these predictors.
The data I’m using comes from https://www.kaggle.com/datasets/ruthgn/beer-profile-and-ratings-data-set/ and, as used here, contains 1000 rows and 6 columns: the response and 5 predictors.
Investigation of the relationship between the taste score and spices
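I am investigating the relationship between the overall taste score and the amount of spices in the beer. A polynomial does a better job than the simple linear regression model: adjusted R² rises from about 0.066 at order 1 to about 0.182 at order 4. I would suggest the fourth-order polynomial, which predicts taste from Spices, Spices^2, Spices^3, and Spices^4. Order 5 has a slightly lower AICc (1292.707 versus 1294.106), but the gap is under 2, so the simpler fourth-order model is a reasonable choice.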
M <- lm(review_taste ~ Spices, data=BEER); choose_order(M)
##   order      R2adj     AICc
## 1     1 0.06564513 1417.809
## 2     2 0.14089397 1336.854
## 3     3 0.16009325 1317.265
## 4     4 0.18179266 1294.106
## 5     5 0.18539948 1292.707
## 6     6 0.18551032 1295.592
#choose_order(M)
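The order comparison above can be approximated in base R. The following is a minimal sketch on simulated data, not the BEER data: the sample size, the true quadratic relationship, and all object names are assumptions. The AICc column uses the standard small-sample correction to AIC and may not match the output of choose_order exactly.

```r
# Simulated data (an assumption): the true relationship is quadratic.
set.seed(320)
n <- 200
x <- runif(n, 0, 10)
y <- 3 + 0.5 * x - 0.08 * x^2 + rnorm(n, sd = 0.5)

orders <- 1:6
results <- data.frame(order = orders, R2adj = NA_real_, AICc = NA_real_)

for (k in orders) {
  fit <- lm(y ~ poly(x, k))            # polynomial of degree k in x
  p <- length(coef(fit)) + 1           # coefficients plus the error variance
  results$R2adj[k] <- summary(fit)$adj.r.squared
  # AIC with the usual small-sample correction
  results$AICc[k] <- AIC(fit) + 2 * p * (p + 1) / (n - p - 1)
}

print(results)
```

When two orders have AICc values within about 2 of each other, the lower order is usually the safer pick because it is easier to interpret.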
Multiple regression model and checking of assumptions
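While there are some violations, they are not severe enough to make us ditch the model. From a statistical standpoint, the linearity tests pass for Fruits, Sour, and Body (p-values of 0.9313, 0.9122, and 0.2418) but fail for ABV and Spices (p-values of 0). The equal spread and normality tests also fail (p-values of 0), but with n = 1000 a failed test does not rule out the model by itself, and the violations are relatively small.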
M <- lm(review_taste ~ ., data=BEER) #Fit a multiple linear regression
summary(M)
##
## Call:
## lm(formula = review_taste ~ ., data = BEER)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.24191 -0.20268  0.04858  0.25555  1.48930
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.9959999  0.0413381  72.475  < 2e-16 ***
## ABV         0.0362721  0.0052925   6.853 1.26e-11 ***
## Fruits      0.0022523  0.0007679   2.933  0.00343 **
## Sour        0.0018520  0.0006586   2.812  0.00502 **
## Spices      0.0028970  0.0006593   4.394 1.23e-05 ***
## Body        0.0059590  0.0005607  10.628  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4326 on 994 degrees of freedom
## Multiple R-squared:  0.2743, Adjusted R-squared:  0.2706
## F-statistic: 75.13 on 5 and 994 DF,  p-value: < 2.2e-16
check_regression(M, extra=TRUE)
##
## Tests of Assumptions: ( sample size n = 1000 ):
## Linearity
##    p-value for ABV : 0
##    p-value for Fruits : 0.9313
##    p-value for Sour : 0.9122
##    p-value for Spices : 0
##    p-value for Body : 0.2418
##    p-value for overall model : NA (not enough duplicate rows)
## Equal Spread:  p-value is 0
## Normality:  p-value is 0
##
## Advice:  if n<25 then all tests must be passed.
## If n >= 25 and test is failed, refer to diagnostic plot to see if violation is severe
## or is small enough to be ignored.
##
## Press [enter] to continue to Predictor vs. Residuals plots or q (then Return) to quit ( 5 plots to show )
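For readers who want to see what checks of this kind look like in base R, here is a minimal sketch on the built-in mtcars data. It is not a reproduction of check_regression: the specific tests used below (Shapiro-Wilk for normality, a Breusch-Pagan-style auxiliary regression for equal spread, and an added quadratic term for linearity) are assumptions chosen for illustration, as are all object names.

```r
# Illustration only: built-in mtcars data stands in for BEER (an assumption).
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- resid(fit)

# Normality of residuals: Shapiro-Wilk test.
norm_p <- shapiro.test(res)$p.value

# Equal spread: regress squared residuals on the predictors and compare
# n * R^2 to a chi-squared distribution (Breusch-Pagan-style check).
aux_data <- transform(mtcars, res2 = res^2)
aux <- lm(res2 ~ wt + hp, data = aux_data)
bp_stat <- nrow(mtcars) * summary(aux)$r.squared
spread_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

# Linearity in one predictor: does adding a squared term improve the fit?
fit_quad <- update(fit, . ~ . + I(wt^2))
lin_p <- anova(fit, fit_quad)[["Pr(>F)"]][2]

print(c(normality = norm_p, equal_spread = spread_p, linearity_wt = lin_p))

# Diagnostic plots help judge whether a failed test is severe.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

With large samples these tests can reject for small departures, which is why the plots matter more than the p-values when n is in the hundreds or thousands.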
Identification of influential points
There are 7 influential points (rows 210, 477, 523, 690, 824, 885, and 907). The first, row 210, is unusual because it scores 0 in Fruits, Sour, and Spices and only 4 in Body, yet its taste review of 4.5 is very high (99th percentile).
#M <- lm() #Fit a multiple regression model
influence_plot(M)
## $Leverage
## [1] 210 477 523 690 824 885 907
influential.rows <- influence_plot(M)$Leverage
# One row per flagged point, one column per variable in BEER
INFLUENCE <- data.frame( matrix(0, nrow=length(influential.rows), ncol=ncol(BEER)) )
names(INFLUENCE) <- names(BEER)
for (r in 1:length(influential.rows)) {
  x <- as.numeric(BEER[influential.rows[r],])
  # Percentile of each value: fraction of rows with a value <= this one
  INFLUENCE[r,] <- sapply(1:length(x), function(i) { mean(BEER[[i]] <= x[i]) } )
}
round(INFLUENCE, digits=2)
##   review_taste  ABV Fruits Sour Spices Body
## 1         0.99 0.98   0.02 0.01   0.05 0.02
## 2         0.14 0.57   0.99 0.98   0.32 0.68
## 3         0.10 0.07   0.47 0.52   1.00 0.60
## 4         0.28 1.00   0.23 0.14   0.15 0.03
## 5         0.11 0.33   0.06 0.05   0.98 0.94
## 6         0.08 0.56   0.40 0.56   1.00 0.14
## 7         0.06 0.11   0.98 0.46   0.98 0.21
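The percentile table above can also be built without an explicit loop. The sketch below uses the built-in mtcars data and flags rows by leverage with the common 2p/n rule of thumb. That threshold is an assumption and may differ from whatever influence_plot uses internally; the object names are hypothetical as well.

```r
# Illustration only: built-in mtcars data stands in for BEER (an assumption).
dat <- mtcars[, c("mpg", "wt", "hp")]
fit <- lm(mpg ~ wt + hp, data = dat)

# Flag high-leverage rows with the 2p/n rule of thumb (an assumption).
h <- hatvalues(fit)
p <- length(coef(fit))
flagged <- which(h > 2 * p / nrow(dat))

# For each flagged row, the fraction of rows at or below its value,
# computed column by column. vapply keeps the result a matrix even
# when zero or one rows are flagged.
pct <- t(vapply(
  flagged,
  function(r) vapply(dat, function(col) mean(col <= col[r]), numeric(1)),
  numeric(ncol(dat))
))

print(round(pct, 2))

# Cook's distance is another common measure of influence.
print(round(sort(cooks.distance(fit), decreasing = TRUE)[1:3], 3))
```

Percentiles near 0 or 1 show which variables make a flagged row unusual, which is how row 1 of the table above stands out.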
BEER[210,]
##      review_taste ABV Fruits Sour Spices Body
## 3098          4.5  12      0    0      0    4
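To see how far this beer sits from the fitted model, the reported coefficients can be applied by hand. The sketch below copies the estimates and the residual standard error from the summary output earlier in the report; the variable names are hypothetical. It gives a predicted taste of about 3.46 against the observed 4.5, a residual of about 1.04, or roughly 2.4 residual standard errors.

```r
# Coefficient estimates copied from the summary(M) output in the report.
b <- c(Intercept = 2.9959999, ABV = 0.0362721, Fruits = 0.0022523,
       Sour = 0.0018520, Spices = 0.0028970, Body = 0.0059590)

# Predictor values for row 210 (1 multiplies the intercept).
x <- c(Intercept = 1, ABV = 12, Fruits = 0, Sour = 0, Spices = 0, Body = 4)

predicted <- sum(b * x)          # fitted taste review for this beer
residual  <- 4.5 - predicted     # observed minus fitted

# Rough scale: residual in units of the residual standard error (0.4326).
# This is not a formal studentized residual.
scaled <- residual / 0.4326

print(round(c(predicted = predicted, residual = residual, scaled = scaled), 3))
```

The model underpredicts this beer by about one full rating point, which is consistent with it being flagged.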