Check whether a linear model is appropriate for the data using the scatterplot.

Chapter 12.4, Problem 48E

a.

To determine

Check whether a linear model is appropriate for the data using the scatterplot.

Answer to Problem 48E

Output using MINITAB software is given below:

[Figure: MINITAB scatterplot of % total suspended solids removed (y) versus amount filtered (x)]

Yes, a simple linear model is appropriate for the data.

Explanation of Solution

Given info:

The data consist of values of % total suspended solids removed (y) and amount filtered (x). Amount filtered is measured in thousands of liters.

Justification:

Software Procedure:

The step-by-step procedure to obtain the scatterplot in MINITAB is:

Choose Graph > Scatterplot.
Choose Simple, and then click OK.
Under Y variables, enter the column of % total suspended solids removed.
Under X variables, enter the column of amount filtered.
Click OK.

Observation:

The scatterplot shows that as amount filtered increases, % total suspended solids removed decreases roughly linearly. Thus, there is a negative association between amount filtered and % total suspended solids removed.

Appropriateness of regression linear model:

A simple linear regression model is appropriate when the following conditions hold:

Straight Enough Condition: The relationship between y and x is straight enough to proceed with a linear regression model.
Outlier Condition: There are no outliers that strongly influence the fit of the least-squares line.
Thickness Condition: The spread of the data around the generally straight relationship is roughly constant for all values of x.

The scatterplot shows a reasonably linear relationship between amount filtered and % total suspended solids removed, and the spread of the data appears roughly constant.

Moreover, the scatterplot does not show any outliers.

Therefore, all three conditions are satisfied, and a simple linear model is appropriate for the data.

b.

To determine

Find the regression line for the variables % total suspended solids removed (y) and amount filtered (x).

Answer to Problem 48E

The regression line for % total suspended solids removed (y) on amount filtered (x) is $\hat{y} = 52.6 - 0.220x$.

Explanation of Solution

Calculation:

Linear regression model:

A simple linear regression model is written as $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ (equivalently, $\hat{y} = b_0 + b_1 x$), where $\hat{y}$ is the predicted value of the response variable, x is the predictor variable, $\hat{\beta}_1 = b_1$ is the estimated slope, and $\hat{\beta}_0 = b_0$ is the estimated intercept.

In this problem, % total suspended solids removed is the response variable y and amount filtered is the predictor variable x.

Regression:

Software procedure:

The step-by-step procedure to obtain the regression equation in MINITAB is:

Choose Stat > Regression > Fit Regression Line.
In Response (Y), enter the column of % total suspended solids removed.
In Predictor (X), enter the column of amount filtered.
Click OK.
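The coefficients MINITAB reports come from the ordinary least-squares formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. The minimal Python sketch below illustrates those formulas; it is not MINITAB's implementation. The ten data pairs are not reproduced in this solution, so the hypothetical helper `least_squares` takes the x and y columns as inputs.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected sum of cross-products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1
```

Applied to the amount-filtered and % removed columns, this should reproduce, up to rounding, the intercept 52.6 and slope −0.220 shown in the output.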

The MINITAB output is shown below:

[Figure: MINITAB regression output for % total suspended solids removed versus amount filtered]

Thus, the regression line for % total suspended solids removed (y) on amount filtered (x) is $\hat{y} = 52.6 - 0.220x$.

Interpretation:

The slope estimate $\hat{\beta}_1 = -0.220$ means that % total suspended solids removed decreases by an estimated 0.220 percentage points, on average, for every 1,000-liter increase in amount filtered (an increase of 1 in x, since x is measured in thousands of liters).
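Because x is in thousands of liters, the predicted change for a given increase in liters is the slope times that increase divided by 1,000. A small Python sketch of this unit conversion (the helper name `predicted_change` is hypothetical):

```python
def predicted_change(liters, b1=-0.220):
    """Predicted change in % TSS removed when amount filtered rises by `liters`."""
    delta_x = liters / 1_000   # x is measured in thousands of liters
    return b1 * delta_x

for liters in (1_000, 10_000):
    print(f"+{liters:,} L -> {predicted_change(liters):+.2f} percentage points")
```

This prints a predicted change of −0.22 percentage points per 1,000 liters and −2.20 per 10,000 liters.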

c.

To determine

Find the proportion of observed variation in % total suspended solids removed that can be explained by amount filtered using the simple linear regression model.

Answer to Problem 48E

The proportion of observed variation in % total suspended solids removed that can be explained by the simple linear regression model on amount filtered is $r^2 = 0.701$.

Explanation of Solution

Justification:

R2(R-squared):

The coefficient of determination ($R^2$) is the proportion of the variation in the observed values of the response variable that is explained by the regression. In simple linear regression it equals the square of the sample correlation between x and y:

$$R^2 = r^2$$

From the regression output obtained in part (b), the coefficient of determination is 0.701.

Thus, $r^2 = 0.701$.

Interpretation:

Thus, about 70.1% of the observed variation in % total suspended solids removed is explained by the linear relationship with amount filtered. The remaining 29.9% is attributable to other factors and random error.
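As a consistency check, in simple linear regression the slope t statistic and $r^2$ satisfy $r^2 = t^2/(t^2 + n - 2)$. The sketch below plugs in the t value reported in part (d) and n = 10; the function name is an illustrative assumption.

```python
def r_squared_from_t(t, n):
    """Simple linear regression identity: r^2 = t^2 / (t^2 + n - 2)."""
    return t * t / (t * t + n - 2)

print(round(r_squared_from_t(-4.33, 10), 3))  # 0.701
```

The result agrees with the $r^2 = 0.701$ read from the MINITAB output.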

d.

To determine

Test whether there is enough evidence to conclude that amount filtered is useful for predicting % total suspended solids removed at $\alpha = 0.05$.

Answer to Problem 48E

There is sufficient evidence to conclude that amount filtered is useful for predicting % total suspended solids removed.

Explanation of Solution

Calculation:

From the MINITAB output obtained in part (b), the regression line for % total suspended solids removed (y) on amount filtered (x) is $\hat{y} = 52.6 - 0.220x$.

The test hypotheses are given below:

Null hypothesis:

$H_0: \beta_1 = 0$

That is, there is no useful linear relationship between % total suspended solids removed (y) and amount filtered (x).

Alternative hypothesis:

$H_1: \beta_1 \neq 0$

That is, there is a useful linear relationship between % total suspended solids removed (y) and amount filtered (x).

T-test statistic:

The test statistic is

$$t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} \sim t_{n-2}$$

From the MINITAB output obtained in part (b), the test statistic is $t = -4.33$ and the P-value is 0.003.

Level of significance:

The level of significance is $\alpha = 0.05$.

Decision rule based on P-value:

If P-value $\le \alpha$, reject the null hypothesis $H_0$.

If P-value $> \alpha$, fail to reject the null hypothesis $H_0$.

Conclusion:

The P-value is 0.003 and $\alpha = 0.05$.

Since $0.003\ (= P) < 0.05\ (= \alpha)$, reject the null hypothesis.

Thus, there is sufficient evidence to conclude that amount filtered is useful for predicting % total suspended solids removed.
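The test statistic and two-sided P-value can be recomputed from the slope and its standard error. The sketch below is stdlib-only: the helpers `t_pdf` and `t_cdf` are hypothetical names, and the t CDF is approximated with Simpson's rule rather than a statistics library (in practice `scipy.stats.t.sf` would be the usual choice).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density over [0, |x|] with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

b1_hat, se_b1, n = -0.220, 0.05088, 10
t_stat = b1_hat / se_b1                            # H0: beta1 = 0
p_value = 2 * (1 - t_cdf(abs(t_stat), n - 2))      # two-sided P-value
print(f"t = {t_stat:.2f}, two-sided P = {p_value:.4f}")
```

With the rounded coefficients this gives t ≈ −4.32 and P ≈ 0.0025, which rounds to MINITAB's 0.003; the small difference from MINITAB's −4.33 presumably comes from rounding of the slope and its standard error.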

e.

To determine

Test whether there is enough evidence to infer that the true average decrease in % total suspended solids removed associated with a 10,000-liter increase in amount filtered exceeds 2, at $\alpha = 0.05$.

Answer to Problem 48E

There is not sufficient evidence to infer that the true average decrease in % total suspended solids removed associated with a 10,000-liter increase in amount filtered exceeds 2.

Explanation of Solution

Calculation:

Linear regression model:

As in part (b), the model is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ (equivalently, $\hat{y} = b_0 + b_1 x$), where $\hat{y}$ is the predicted value of the response variable, x is the predictor variable, $\hat{\beta}_1 = b_1$ is the estimated slope, and $\hat{\beta}_0 = b_0$ is the estimated intercept.

From the MINITAB output in part (b), the estimated slope is $b_1 = \hat{\beta}_1 = -0.220$.

Here, $\hat{\beta}_1$ is the slope of the sample regression line and $\beta_1$ is the slope of the population regression line.

The claim is that when amount filtered increases by 10,000 liters, the true average decrease in % total suspended solids removed exceeds 2.

Decrease in % total suspended solids removed per 1,000-liter increase in amount filtered:

Because x is measured in thousands of liters, a 10,000-liter increase corresponds to $\Delta x = 10$. A decrease of 2 per 10,000 liters therefore corresponds to a decrease of

$$\frac{2}{10} = 0.2$$

per 1,000 liters, that is, a slope of $-0.2$.

So the claim is that the true average decrease in % total suspended solids removed per 1,000-liter increase in amount filtered exceeds 0.2, or equivalently $\beta_1 < -0.2$.

The test hypotheses are given below:

Null hypothesis:

$H_0: \beta_1 \ge -0.2$

That is, the true average decrease in % total suspended solids removed per 1,000 liters is at most 0.2.

Alternative hypothesis:

$H_1: \beta_1 < -0.2$

That is, the true average decrease in % total suspended solids removed per 1,000 liters exceeds 0.2.

Test statistic:

The test statistic is

$$t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} \sim t_{n-2}$$

Degrees of freedom:

The sample size is $n = 10$, so the degrees of freedom are

$$df = n - 2 = 10 - 2 = 8$$

The level of significance is $\alpha = 0.05$.

Critical value:

Software procedure:

The step-by-step procedure to obtain the critical value in MINITAB is:

Choose Graph > Probability Distribution Plot, choose View Probability, and click OK.
From Distribution, choose the t distribution and enter 8 as the degrees of freedom.
Click the Shaded Area tab.
Choose Probability Value, and choose Left Tail for the region of the curve to shade.
Enter 0.05 as the probability value.
Click OK.

Output using the MINITAB software is given below:

[Figure: MINITAB probability distribution plot of the t distribution with 8 degrees of freedom, left-tail area 0.05 shaded]

From the output, the critical value is −1.860.

Thus, the critical value is $-t_{0.05,\,8} = -1.860$.

From the MINITAB output obtained in part (b), the estimated standard error of the slope is $s_{\hat{\beta}_1} = 0.05088$.

Test statistic under the null hypothesis:

Under the null hypothesis (at the boundary value $\beta_1 = -0.2$), the test statistic is

$$t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} = \frac{-0.22 - (-0.2)}{0.05088} = -0.3931$$

Thus, the test statistic is −0.3931.

Decision criteria for the classical approach:

If $t < -t_{\alpha,\,n-2}$ (test statistic < critical value), reject the null hypothesis $H_0$; otherwise, fail to reject $H_0$.

Conclusion:

Here, the test statistic is −0.3931 and the critical value is −1.860.

The test statistic is greater than the critical value:

$$-0.3931\ (\text{test statistic}) > -1.860\ (\text{critical value})$$

Based on the decision rule, fail to reject the null hypothesis.

Hence, the data do not provide convincing evidence that the true average decrease in % total suspended solids removed per 1,000 liters exceeds 0.2.

Therefore, there is not sufficient evidence to infer that the true average decrease in % total suspended solids removed associated with a 10,000-liter increase in amount filtered exceeds 2.
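The lower-tailed test above reduces to a few lines of arithmetic. This sketch hard-codes the slope, its standard error, and the critical value −1.860 taken from the MINITAB output; the variable names are illustrative.

```python
b1_hat = -0.220        # estimated slope, per 1,000 liters
se_b1 = 0.05088        # estimated standard error of the slope
beta1_null = -2 / 10   # decrease of 2 per 10,000 L -> slope -0.2 per 1,000 L
t_crit = -1.860        # lower-tail critical value, t with 8 df, area 0.05

t_stat = (b1_hat - beta1_null) / se_b1
reject_h0 = t_stat < t_crit  # lower-tailed test: reject only if t falls below t_crit
print(f"t = {t_stat:.4f}; reject H0: {reject_h0}")
```

It prints `t = -0.3931; reject H0: False`, matching the conclusion that $H_0$ is not rejected.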

f.

To determine

Find the 95% confidence interval for the true mean % total suspended solids removed when amount filtered is 100,000 liters.

Compare the widths of the confidence intervals for amounts filtered of 100,000 liters and 200,000 liters.

Answer to Problem 48E

The 95% confidence interval for the true mean % total suspended solids removed when amount filtered is 100,000 liters is (22.37244, 38.82756).

The confidence interval at 100,000 liters is narrower than the interval at 200,000 liters.

Explanation of Solution

Calculation:

From the MINITAB output obtained in part (b), the regression line for % total suspended solids removed (y) on amount filtered (x) is $\hat{y} = 52.6 - 0.220x$.

Here, amount filtered (x) is measured in thousands of liters.

Hence, an amount filtered of 100,000 liters corresponds to $x = 100$.

Expected % total suspended solids removed when $x = 100$:

The estimated mean % total suspended solids removed at $x = 100$ is

$$\hat{y}_p = 52.6 - 0.220(100) = 30.6$$

Thus, the estimated mean % total suspended solids removed at $x = 100$ is 30.6.

Confidence interval:

The general formula for the $100(1-\alpha)\%$ confidence interval for the conditional mean at $x = x_p$ is

$$\hat{y}_p \pm t_{\alpha/2,\,n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}}$$

where $\hat{y}_p$ is the point estimate of the conditional mean of the response at $x = x_p$, and $S_{xx} = \sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}$.

From the MINITAB output in part (b), the standard error of the estimate is $s = 10.5350$.

The value of $S_{xx}$ is obtained as follows.

From the given data, the sum of the amounts filtered is $\sum x_i = 1{,}251$ and $\sum x_i^2 = 199{,}365$.

The mean amount filtered is

$$\bar{x} = \frac{\sum x_i}{n} = \frac{1{,}251}{10} = 125.1$$

Sum of squares $S_{xx}$:

$$S_{xx} = \sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n} = 199{,}365 - \frac{1{,}251^2}{10} = 199{,}365 - 156{,}500.1 = 42{,}864.9$$

Thus, the corrected sum of squares of x is $S_{xx} = 42{,}864.9$.
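The same two summary quantities can be computed directly from the reported sums. A minimal sketch using only the values stated above:

```python
n = 10
sum_x, sum_x2 = 1_251, 199_365   # sums of x and x^2 reported in the solution

x_bar = sum_x / n                 # mean amount filtered (thousands of liters)
s_xx = sum_x2 - sum_x ** 2 / n    # corrected sum of squares of x
print(x_bar, round(s_xx, 1))
```

This prints `125.1 42864.9`, in agreement with the hand calculation.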

Critical value:

For a 95% confidence level,

$$1 - \alpha = 0.95 \;\Rightarrow\; \alpha = 0.05 \;\Rightarrow\; \frac{\alpha}{2} = 0.025$$

Degrees of freedom:

The sample size is $n = 10$, so

$$df = n - 2 = 10 - 2 = 8$$

From Table A.5 of the t-distribution in Appendix A, the critical value corresponding to a right-tail area of 0.025 and 8 degrees of freedom is 2.306.

Thus, the critical value is $t_{0.025,\,8} = 2.306$.

The 95% confidence interval is

$$\hat{y}_p \pm t_{0.025,\,8}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}} = 30.6 \pm 2.306 \times 10.5350 \sqrt{\frac{1}{10} + \frac{(100 - 125.1)^2}{42{,}864.9}} = 30.6 \pm 8.227558 = (22.37244,\ 38.82756)$$

Thus, the 95% confidence interval for the true mean % total suspended solids removed when amount filtered is 100,000 liters is (22.37244, 38.82756).
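The interval arithmetic can be checked with a short Python sketch. It hard-codes the fitted coefficients, s, $\bar{x}$, $S_{xx}$, and the table value 2.306 from the solution; the function name `mean_ci` is a hypothetical helper.

```python
import math

b0, b1 = 52.6, -0.220
s, n, x_bar, s_xx = 10.5350, 10, 125.1, 42_864.9
t_crit = 2.306   # t_{0.025, 8} from the t table

def mean_ci(x_p):
    """95% CI for the mean response at x_p (x in thousands of liters)."""
    y_hat = b0 + b1 * x_p
    half = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / s_xx)
    return y_hat - half, y_hat + half

lo, hi = mean_ci(100)
print(f"({lo:.4f}, {hi:.4f})")
```

The printed interval is approximately (22.3724, 38.8276), matching the result above.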

Interpretation:

We are 95% confident that the true mean % total suspended solids removed when amount filtered is 100,000 liters lies between 22.37244 and 38.82756.

Comparison:

An amount filtered of 100,000 liters corresponds to $x = 100$, and 200,000 liters corresponds to $x = 200$.

The mean amount filtered is $\bar{x} = 125.1$.

Here, $x^* = 100$ is closer to the mean $\bar{x} = 125.1$ than $x^* = 200$ is.

The estimated standard deviation of $\hat{Y}$ at $x^*$ is

$$s_{\hat{Y}} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

For $x^* = 100$:

$$s_{\hat{Y}(100)} = s \sqrt{\frac{1}{10} + \frac{(100 - 125.1)^2}{42{,}864.9}}$$

For $x^* = 200$:

$$s_{\hat{Y}(200)} = s \sqrt{\frac{1}{10} + \frac{(200 - 125.1)^2}{42{,}864.9}}$$

The two quantities differ only in the terms $(100 - 125.1)^2 = 630.01$ and $(200 - 125.1)^2 = 5{,}610.01$.

Because $x^* = 100$ is closer to $\bar{x} = 125.1$ than $x^* = 200$, $(200 - 125.1)^2$ is larger than $(100 - 125.1)^2$.

Therefore, $s_{\hat{Y}(200)}$ is larger than $s_{\hat{Y}(100)}$.

The confidence interval is wider when $s_{\hat{Y}}$ is larger, since its half-width is $t_{\alpha/2,\,n-2}\, s_{\hat{Y}}$.

Thus, the confidence interval is wider at $x^* = 200$ than at $x^* = 100$.
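Evaluating $s_{\hat{Y}}$ at both points makes the comparison concrete. This sketch reuses the summary values from the solution; `se_mean_response` is a hypothetical helper name.

```python
import math

s, n, x_bar, s_xx = 10.5350, 10, 125.1, 42_864.9

def se_mean_response(x_star):
    """Estimated standard deviation of Y-hat at x_star (thousands of liters)."""
    return s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / s_xx)

for x_star in (100, 200):
    print(x_star, round(se_mean_response(x_star), 4))
```

It gives roughly 3.568 at $x^* = 100$ and 5.062 at $x^* = 200$, so the interval at 200,000 liters is about 42% wider.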

g.

To determine

Find the 95% prediction interval for a single value of % total suspended solids removed when amount filtered is 100,000 liters.

Compare the widths of the prediction intervals for amounts filtered of 100,000 liters and 200,000 liters.

Answer to Problem 48E

The 95% prediction interval for a single value of % total suspended solids removed when amount filtered is 100,000 liters is (4.950886, 56.24911).

The prediction interval at 100,000 liters is narrower than the interval at 200,000 liters.

Explanation of Solution

Calculation:

From the MINITAB output obtained in part (b), the regression line for % total suspended solids removed (y) on amount filtered (x) is $\hat{y} = 52.6 - 0.220x$.

From part (f), the predicted value of % total suspended solids removed at $x = 100$ is 30.6.

Prediction interval for a single future value:

A prediction interval is used to predict a single value of the response variable that will be observed at some future time. Unlike a confidence interval, which estimates the mean response, a prediction interval must also account for the variability of an individual observation around that mean.

The general formula for the $100(1-\alpha)\%$ prediction interval for a single future value at $x = x_p$ is

$$\hat{y}_p \pm t_{\alpha/2,\,n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}}$$

where $\hat{y}_p$ is the predicted value of the response at $x = x_p$ and $S_{xx} = \sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}$.

From the MINITAB output in part (b), the standard error of the estimate is $s = 10.5350$.

From part (f), the mean amount filtered is $\bar{x} = 125.1$ and $S_{xx} = 42{,}864.9$.

Critical value:

For a 95% confidence level,

$$1 - \alpha = 0.95 \;\Rightarrow\; \alpha = 0.05 \;\Rightarrow\; \frac{\alpha}{2} = 0.025$$

Degrees of freedom:

The sample size is $n = 10$, so

$$df = n - 2 = 10 - 2 = 8$$

From Table A.5 of the t-distribution in Appendix A, the critical value corresponding to a right-tail area of 0.025 and 8 degrees of freedom is 2.306.

Thus, the critical value is $t_{0.025,\,8} = 2.306$.

The 95% prediction interval is

$$\hat{y}_p \pm t_{0.025,\,8}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}} = 30.6 \pm (2.306)(10.5350) \sqrt{1 + \frac{1}{10} + \frac{(100 - 125.1)^2}{42{,}864.9}} = 30.6 \pm 25.64911 = (4.950886,\ 56.24911)$$

Thus, the 95% prediction interval for a single value of % total suspended solids removed when amount filtered is 100,000 liters is (4.950886, 56.24911).
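A Python sketch of the same computation follows; the only change from the confidence-interval formula is the extra 1 under the square root. The summary values are hard-coded from the solution, and `prediction_interval` is a hypothetical helper name.

```python
import math

b0, b1 = 52.6, -0.220
s, n, x_bar, s_xx = 10.5350, 10, 125.1, 42_864.9
t_crit = 2.306   # t_{0.025, 8} from the t table

def prediction_interval(x_p):
    """95% PI for a single future response at x_p (x in thousands of liters)."""
    y_hat = b0 + b1 * x_p
    half = t_crit * s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / s_xx)
    return y_hat - half, y_hat + half

lo, hi = prediction_interval(100)
print(f"({lo:.4f}, {hi:.4f})")
```

The printed interval is approximately (4.9509, 56.2491), matching the result above.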

Interpretation:

We are 95% confident that a single future observation of % total suspended solids removed at an amount filtered of 100,000 liters will lie between 4.950886 and 56.24911.

Comparison:

An amount filtered of 100,000 liters corresponds to $x = 100$, and 200,000 liters corresponds to $x = 200$.

The mean amount filtered is $\bar{x} = 125.1$.

Here, $x^* = 100$ is closer to the mean $\bar{x} = 125.1$ than $x^* = 200$ is.

The estimated standard deviation of $\hat{Y}$ at $x^*$ is

$$s_{\hat{Y}} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

and the half-width of the prediction interval is $t_{\alpha/2,\,n-2}\sqrt{s^2 + s_{\hat{Y}}^2}$, which equals the formula above.

For $x^* = 100$:

$$s_{\hat{Y}(100)} = s \sqrt{\frac{1}{10} + \frac{(100 - 125.1)^2}{42{,}864.9}}$$

For $x^* = 200$:

$$s_{\hat{Y}(200)} = s \sqrt{\frac{1}{10} + \frac{(200 - 125.1)^2}{42{,}864.9}}$$

The two quantities differ only in the terms $(100 - 125.1)^2 = 630.01$ and $(200 - 125.1)^2 = 5{,}610.01$.

Because $x^* = 100$ is closer to $\bar{x} = 125.1$ than $x^* = 200$, $(200 - 125.1)^2$ is larger than $(100 - 125.1)^2$.

Therefore, $s_{\hat{Y}(200)}$ is larger than $s_{\hat{Y}(100)}$.

Since $s^2$ is the same at both points, the prediction interval is wider when $s_{\hat{Y}}$ is larger.

Thus, the prediction interval is wider at $x^* = 200$ than at $x^* = 100$.
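The half-widths $t_{\alpha/2,\,n-2}\sqrt{s^2 + s_{\hat{Y}}^2}$ at both points can be compared numerically. A minimal sketch with the summary values from the solution (the helper name `pi_half_width` is hypothetical):

```python
import math

s, n, x_bar, s_xx, t_crit = 10.5350, 10, 125.1, 42_864.9, 2.306

def pi_half_width(x_star):
    """Half-width of the 95% prediction interval at x_star."""
    se_fit = s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / s_xx)  # s_Y-hat
    return t_crit * math.sqrt(s ** 2 + se_fit ** 2)

for x_star in (100, 200):
    print(x_star, round(pi_half_width(x_star), 2))
```

The half-widths are about 25.65 and 26.95. The gap is much smaller than for the confidence intervals because the common $s^2$ term dominates both.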
