C1M3_peer_reviewed
May 29, 2024
Module 3: Peer Reviewed Assignment
1.0.1 Outline:
The objectives for this assignment:
1. Learn how to read and interpret p-values for coefficients in R.
2. Apply Partial F-tests to compare different models.
3. Compute confidence intervals for model coefficients.
4. Understand model significance using the Overall F-test.
5. Observe the variability of coefficients using the simulated data.
General tips:
1. Read the questions carefully to understand what is being asked.
2. This work will be reviewed by another human, so make sure that your explanations and answers are clear and concise.
[16]:
# Load Required Packages
library(ggplot2)
1.1 Problem 1: Individual t-tests
The dataset below measures the chewiness (mJ) of different berries along with their sugar equiv-
alent and salt (NaCl) concentration. Let’s use these data to create a model to finally understand
chewiness.
Here are the variables:
1. nacl: salt concentration (NaCl)
2. sugar: sugar equivalent
3. chewiness: chewiness (mJ)
Dataset Source: I. Zouid, R. Siret, F. Jourjion, E. Mehinagic, L. Rolle (2013). “Impact of Grapes
Heterogeneity According to Sugar Level on Both Physical and Mechanical Berries Properties and
their Anthocyanins Extractability at Harvest,” Journal of Texture Studies, Vol. 44, pp. 95-103.
1. (a) Simple linear regression (SLR) parameters
In the code below, we load in the data and fit a SLR model to it, using chewiness as the response and sugar as the predictor. The summary of the model is printed. Let α = 0.05.
Look at the results and answer the following questions:
* What is the hypothesis test related to the p-value 2.95e-09? Clearly state the null and alternative hypotheses and the decision made based on the p-value.
* Does this mean the coefficient is statistically significant?
* What does it mean for a coefficient to be statistically significant?
[4]:
# Load the data
chew.data = read.csv("berry_sugar_chewy.csv")
chew.lmod = lm(chewiness ~ sugar, data = chew.data)
summary(chew.lmod)
Call:
lm(formula = chewiness ~ sugar, data = chew.data)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4557 -0.5604  0.1045  0.5249  1.9559
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.662878   0.756610  10.128  < 2e-16 ***
sugar       -0.022797   0.003453  -6.603 2.95e-09 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9178 on 88 degrees of freedom
Multiple R-squared:  0.3313,    Adjusted R-squared:  0.3237
F-statistic: 43.59 on 1 and 88 DF,  p-value: 2.951e-09
1. The p-value tests the null hypothesis that sugar has no effect on chewiness. Null hypothesis: the slope coefficient for sugar is zero, i.e., sugar has no linear effect on chewiness. Alternative hypothesis: the slope coefficient for sugar is nonzero, i.e., there is a linear relationship between chewiness and sugar. Since the p-value (2.95e-09) is far below α = 0.05, we reject the null hypothesis.
2. Yes, the very low p-value (<< 0.05) indicates that the coefficient for sugar is statistically significant.
3. Statistical significance means that the null hypothesis can be rejected, i.e., the data provide strong evidence that the coefficient is not zero, so chewiness is expected to change as sugar changes.
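As a quick numerical check (a sketch, not part of the original assignment), the t-statistic and p-value reported for sugar can be reproduced directly from the estimate and standard error in the summary above:

# Sketch: recompute the t-statistic and two-sided p-value for the sugar coefficient
est <- -0.022797                         # estimated slope for sugar (from summary)
se  <-  0.003453                         # its standard error (from summary)
t.stat <- est / se                       # about -6.603
p.val  <- 2 * pt(-abs(t.stat), df = 88)  # two-sided p-value, about 2.95e-09
c(t = t.stat, p = p.val)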
1. (b) MLR parameters
Now let's see if the second predictor/feature nacl is worth adding to the model. In the code below, we create a second linear model fitting chewiness as the response with sugar and nacl as predictors.
Look at the results and answer the following questions:
* Which, if any, of the slope parameters are statistically significant?
* Did the statistical significance of the parameter for sugar stay the same, when compared to 1 (a)? If the statistical significance changed, explain why it changed. If it didn't change, explain why it didn't change.
[5]:
chew.lmod.2 = lm(chewiness ~ ., data = chew.data)
summary(chew.lmod.2)
Call:
lm(formula = chewiness ~ ., data = chew.data)
Residuals:
    Min      1Q  Median      3Q     Max
-2.3820 -0.6333  0.1234  0.5231  1.9731
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -7.1107    13.6459  -0.521    0.604
nacl          0.6555     0.6045   1.084    0.281
sugar        -0.4223     0.3685  -1.146    0.255
Residual standard error: 0.9169 on 87 degrees of freedom
Multiple R-squared:  0.3402,    Adjusted R-squared:  0.325
F-statistic: 22.43 on 2 and 87 DF,  p-value: 1.395e-08
The statistical significance of the parameter "sugar" changed (it decreased) with the addition of nacl as a predictor; in the MLR model, neither slope is statistically significant at α = 0.05. The decrease in the statistical significance of the sugar parameter could be due to multiple reasons: (1) nacl and sugar might be correlated, so they carry overlapping information and the standard errors of both slopes are inflated, and (2) the more complex model aims to explain more variance of the response variable, but the added parameter (nacl) does not add substantial new information, which can affect the perceived significance of the original parameter.
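One way to check the multicollinearity explanation (a sketch, not part of the original assignment; it assumes chew.data is loaded as above) is to look at the correlation between the two predictors and a hand-computed variance inflation factor:

# Sketch: is the loss of significance consistent with correlated predictors?
cor(chew.data$nacl, chew.data$sugar)                      # correlation between nacl and sugar
r2.sugar  <- summary(lm(sugar ~ nacl, data = chew.data))$r.squared
vif.sugar <- 1 / (1 - r2.sugar)                           # VIF for sugar; values well above 1 suggest collinearity
vif.sugar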
1. (c) Model Selection
Determine which of the two models we should use. Explain how you arrived at your conclusion and write out the actual equation for your selected model.
[6]:
anova(chew.lmod, chew.lmod.2)
A anova: 2 × 6
  Res.Df      RSS Df Sum of Sq        F    Pr(>F)
1     88 74.12640 NA        NA       NA        NA
2     87 73.13801  1 0.9883882 1.175719 0.2812249
chew.lmod (the sugar-only model) should be used, since the partial F-test comparing the two models gives Pr(>F) = 0.281, which is well above 0.05; adding nacl does not significantly improve the fit. The selected model is: chewiness = 7.663 - 0.023*sugar.
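For reference, the partial F-statistic in the ANOVA table above can be reproduced by hand from the two residual sums of squares (a sketch, not part of the original assignment):

# Sketch: partial F-test comparing the reduced (sugar) and full (sugar + nacl) models
rss.reduced <- 74.12640   # RSS of chew.lmod, 88 residual df
rss.full    <- 73.13801   # RSS of chew.lmod.2, 87 residual df
f.stat <- ((rss.reduced - rss.full) / 1) / (rss.full / 87)   # about 1.176
p.val  <- pf(f.stat, df1 = 1, df2 = 87, lower.tail = FALSE)  # about 0.281
c(F = f.stat, p = p.val)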
1. (d) Parameter Confidence Intervals
Compute 95% confidence intervals for each parameter in your selected model. Then, in words, state what these confidence intervals mean.
[6]:
# Your Code Here
confint(chew.lmod)
A matrix: 2 × 2 of type dbl
                   2.5 %      97.5 %
(Intercept)  6.15927388  9.16648152
sugar       -0.02965862 -0.01593536
(1) Intercept: we can be 95% confident that the true intercept parameter falls between 6.159 and 9.166. (2) Sugar (slope): we can be 95% confident that the true sugar parameter falls between -0.0296 and -0.0159; that is, each unit increase in sugar equivalent is associated with a decrease in mean chewiness of between about 0.016 and 0.030 mJ.
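These intervals come from the usual estimate ± t × SE construction; as a sketch (not part of the original assignment), the interval for the sugar slope can be reproduced from the summary output in 1(a):

# Sketch: 95% CI for the sugar slope computed by hand
est    <- -0.022797              # estimated slope (from summary)
se     <-  0.003453              # its standard error (from summary)
t.crit <- qt(0.975, df = 88)     # critical t value with 88 residual degrees of freedom
c(lower = est - t.crit * se, upper = est + t.crit * se)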
2 Problem 2: Variability of Slope in SLR
In this exercise we’ll look at the variability of slopes of simple linear regression models fitted to
realizations of simulated data.
Write a function, called sim_data(), that returns a simulated sample of size n = 20 from the model
Y = 1 + 2.5X + ε, where ε ~ iid N(0, 1).
We will then use this generative function to understand how fitted slopes can vary, even for the same underlying population.
[7]:
sim_data <- function(n = 20, var = 1, beta.0 = 1, beta.1 = 2.5){
    # BEGIN SOLUTION HERE
    x = seq(-1, 1, length.out = n)   # evenly spaced predictor values on [-1, 1]
    e = rnorm(n, 0, sqrt(var))       # iid normal errors with variance `var`
    y = beta.0 + beta.1 * x + e      # response generated from the true model
    # END SOLUTION HERE
    data = data.frame(x = x, y = y)
    return(data)
}
2. (a) Fit a slope
Execute the following code to generate 20 data points, fit a simple linear regression model and plot the results.
Just based on this plot, how well does our linear model fit the data?
[11]:
data = sim_data()
lmod = lm(y ~ x, data = data)
ggplot(aes(x = x, y = y), data = data) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "#CFB87C")
Based on the plot, the linear model fits the data reasonably well; the simulated points scatter fairly evenly around the fitted line.
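To complement the visual impression with numbers (a sketch, not part of the original notebook), one could look at a couple of summaries of the fitted model:

# Sketch: numeric summaries to back up the visual fit assessment
summary(lmod)$r.squared   # proportion of variance in y explained by x
summary(lmod)$sigma       # residual standard error; should be near sqrt(var) = 1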
2. (b) Do the slopes change?
Now we want to see how the slope of our line varies with different random samples of data. Call our data generation function 50 times to gather 50 independent samples. Then we can fit a SLR model to each of those samples and plot the resulting slope. The function below performs this for us.
Experiment with different variances and report on what effect that has on the spread of the slopes.
[15]:
gen_slopes <- function(num.slopes = 50, var = 1, num.samples = 20){
    g = ggplot()
    # Repeat the sample for the number of slopes
    for (ii in 1:num.slopes){
        # Generate a random sampling of data
        data = sim_data(n = num.samples, var = var)
        # Add the slope of the best fit linear model to the plot
        g = g + stat_smooth(aes(x = x, y = y), data = data, method = "lm", geom = "line",
                            se = FALSE, alpha = 0.4, color = "#CFB87C", size = 1)
    }
    return(g)
}
[21]:
# gen_slopes()
gen_slopes(50, 10, 20)
`geom_smooth()` using formula 'y ~ x'  (this message is repeated once for each of the 50 fitted lines)
The spread of the slopes increases as the variance increases. At low variance (like 1 or 2), most of the lines are bunched closely together, with only one or two lines falling outside the cluster.
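One way to run the experiment described above (a sketch, not part of the original notebook) is simply to call gen_slopes() with several variances and compare the plots:

# Sketch: compare the spread of fitted slopes for a few error variances
print(gen_slopes(num.slopes = 50, var = 1))    # slopes tightly clustered
print(gen_slopes(num.slopes = 50, var = 5))    # noticeably more spread
print(gen_slopes(num.slopes = 50, var = 10))   # widest spread of the three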
2. (c) Distributions of Slopes
As we see above, the slopes are somewhat random. That means that they follow some sort of distribution, which we can try to discern. The code below computes num.slopes independent realizations of the model data, fits the SLR model to each, and generates a histogram of the resulting slopes.
Again, experiment with different variances for the simulated data and record what you notice. What do you notice about the shapes of the resulting histograms?
[23]:
hist_slopes <- function(num.slopes = 500, var = 1, num.samples = 20){
    slopes = rep(0, num.slopes)
    # For num.slopes, compute a SLR model slope
    for (i in 1:num.slopes){
        # Simulate the desired data
        data = sim_data(var = var, n = num.samples)
        # Fit an SLR model to the data
        lmod = lm(y ~ x, data = data)
        # Add the slopes to the vector of slopes
        slopes[i] = lmod$coef[2]
    }
    # Plot a histogram of the resulting slopes
    g = ggplot() + aes(slopes) + geom_histogram(color = "black", fill = "#CFB87C")
    return(g)
}
[35]:
# hist_slopes()
hist_slopes(500, 1, 20)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The slopes generally follow a normal distribution for all variances. However, as the variance increases, the histogram of slopes spreads out over a wider range of values.
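To quantify what the histograms show (a sketch, not part of the original notebook), the standard deviation of the simulated slopes can be compared with the theoretical standard error of the slope, sqrt(σ² / Σ(xᵢ - x̄)²):

# Sketch: empirical spread of simulated slopes vs. the theoretical standard error
set.seed(1)
x <- seq(-1, 1, length.out = 20)                 # same design points as sim_data()
slopes <- replicate(500, coef(lm(y ~ x, data = sim_data(var = 1)))[2])
sd(slopes)                                       # empirical standard deviation of the slopes
sqrt(1 / sum((x - mean(x))^2))                   # theoretical SE of the slope when var = 1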
2. (d) Confidence Intervals of Slopes
What does that all mean? It means that when we fit a linear regression model, our parameter estimates will not be equal to the true parameters. Instead, the estimates will vary from sample to sample, and form a distribution. This is true for any linear regression model with any data - not just simulated data - as long as we assume that there is a large population that we can resample the response from (at fixed predictor values). Also note that we only demonstrated this fact with the slope estimate, but the same principle is true for the intercept, or if we had several slope parameters.
This simulation shows that there is a chance for a linear regression model to have a slope that is very different from the true slope. But with a large sample size, n, or small error variance, σ², the distribution will become narrower. Confidence intervals can help us understand this variability. The procedure that generates confidence intervals for our model parameters has a high probability of covering the true parameter. And, the higher n is for a fixed σ², or the smaller σ² is for a fixed n, the narrower the confidence interval will be!
Draw a single sample of size n = 20 from sim_data() with variance σ² = 1. Use your sample to compute a 95% confidence interval for the slope. Does the known slope for the model (which we can recall is 2.5) fall inside your confidence interval? How does the value of σ² affect the CI width?
[38]:
# Your code here
lmod = lm(y ~ x, data = data)
confint(lmod)
A matrix: 2 × 2 of type dbl
               2.5 %   97.5 %
(Intercept) 0.325123 1.196637
x           1.651787 3.087614
1. The known slope of the model (2.5) falls within the confidence interval (1.65, 3.09).
2. The CI width increases as the variance increases (var = 1, width = 1.44; var = 2, width = 2.04; var = 3, width = 2.92; var = 5, width = 4.19).
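The widths reported above can be reproduced with a small experiment like the following (a sketch, not part of the original notebook; exact widths will differ from run to run because each call draws a new random sample):

# Sketch: how the 95% CI width for the slope changes with the error variance
ci.width <- function(v){
    d  <- sim_data(n = 20, var = v)
    ci <- confint(lm(y ~ x, data = d))["x", ]
    unname(ci[2] - ci[1])
}
sapply(c(1, 2, 3, 5), ci.width)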