Beginning Statistics, 2nd Edition
ISBN: 9781932628678
Author: Carolyn Warren; Kimberly Denley; Emily Atchley
Publisher: Hawkes Learning Systems
Question
Chapter 12.CR, Problem 9CR
To determine

(a)

To calculate:

The sum of squared errors, SSE.

Expert Solution

Answer to Problem 9CR

Solution:

The required SSE is 16.7246.

Explanation of Solution

Given Information:

The following data give the thread count and the price (in dollars) of eight bed sheets.

Thread Count 150 200 225 250 275 300 350 400
Price (in Dollars) 18 21 25 28 30 31 35 45

The least-squares regression line, also called the line of best fit, is the line for which the average variation from the data is the smallest. It is given by

$$\hat{y} = b_0 + b_1 x,$$

where $b_1$ is the slope of the least-squares regression line for paired data from a sample and $b_0$ is the y-intercept of the regression line.

Formula used:

The equation of the least-squares regression line is

$$\hat{y} = b_0 + b_1 x,$$

where the slope $b_1$ is given by

$$b_1 = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

and the y-intercept $b_0$ is given by

$$b_0 = \frac{\sum y_i}{n} - b_1\,\frac{\sum x_i}{n}.$$

Here $n$ is the number of data pairs in the sample, $x_i$ is the $i$th value of the explanatory variable, and $y_i$ is the $i$th value of the response variable.
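The slope and intercept formulas translate directly into code. The following is a minimal sketch in plain Python, applied to the thread-count data from the problem; the function name `least_squares_line` and the variable names are illustrative assumptions, not part of the original solution.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1*x using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean(y) - b1 * mean(x)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


thread_counts = [150, 200, 225, 250, 275, 300, 350, 400]
prices = [18, 21, 25, 28, 30, 31, 35, 45]

b0, b1 = least_squares_line(thread_counts, prices)
print(round(b0, 4), round(b1, 4))
```

Without intermediate rounding the intercept comes out near 1.5918, the value reported in the Excel output of part (d). The hand calculation rounds $b_1$ to 0.1024 before computing $b_0$, which is why it arrives at 1.6050.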

The sum of squared errors (SSE) for a regression line is calculated as

$$SSE = \sum \left(y_i - \hat{y}_i\right)^2,$$

where $y_i$ is the $i$th value of the response variable and $\hat{y}_i$ is the predicted value of $y_i$ from the least-squares regression model.

Calculation:

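| Thread Count $x_i$ | Price (in Dollars) $y_i$ | $x_i y_i$ | $x_i^2$ | $y_i^2$ |
|---|---|---|---|---|
| 150 | 18 | 2700 | 22500 | 324 |
| 200 | 21 | 4200 | 40000 | 441 |
| 225 | 25 | 5625 | 50625 | 625 |
| 250 | 28 | 7000 | 62500 | 784 |
| 275 | 30 | 8250 | 75625 | 900 |
| 300 | 31 | 9300 | 90000 | 961 |
| 350 | 35 | 12250 | 122500 | 1225 |
| 400 | 45 | 18000 | 160000 | 2025 |
| $\sum x_i = 2150$ | $\sum y_i = 233$ | $\sum x_i y_i = 67325$ | $\sum x_i^2 = 623750$ | $\sum y_i^2 = 7285$ |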
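The column totals in the table can be reproduced with a few lines of code. This is a small sketch in plain Python; the variable names are illustrative and not taken from the original solution.

```python
thread_counts = [150, 200, 225, 250, 275, 300, 350, 400]
prices = [18, 21, 25, 28, 30, 31, 35, 45]

# One row per data pair: x, y, x*y, x^2, y^2
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(thread_counts, prices)]

# Column sums in the same order as the table columns
totals = tuple(sum(column) for column in zip(*rows))
print(totals)  # (2150, 233, 67325, 623750, 7285)
```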

Let $x_i$ be the thread counts of the bed sheets and $y_i$ be the prices of the bed sheets.

The slope of the least-squares regression line is calculated as

$$b_1 = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2},$$

where $\sum x_i = x_1 + x_2 + \dots + x_8$.

Substitute 150 for $x_1$, 200 for $x_2$, ..., 400 for $x_8$:

$$\sum x_i = 150 + 200 + \dots + 350 + 400 = 2150$$

Proceed in the same manner to calculate $\sum y_i$, $\sum x_i y_i$, $\sum x_i^2$, and $\sum y_i^2$; the individual terms are listed in the table.

$$\sum y_i = 18 + 21 + \dots + 35 + 45 = 233$$

$$\sum x_i y_i = 2700 + 4200 + \dots + 12250 + 18000 = 67325$$

$$\sum x_i^2 = 22500 + 40000 + \dots + 122500 + 160000 = 623750$$

$$\sum y_i^2 = 324 + 441 + 625 + \dots + 1225 + 2025 = 7285$$

Substitute 2150 for $\sum x_i$, 233 for $\sum y_i$, 67325 for $\sum x_i y_i$, 623750 for $\sum x_i^2$, and 8 for $n$ in the slope formula:

$$b_1 = \frac{(8 \times 67325) - (2150 \times 233)}{8(623750) - (2150)^2} = \frac{37650}{367500} \approx 0.1024$$

The y-intercept of the regression line is calculated as

$$b_0 = \frac{\sum y_i}{n} - b_1\,\frac{\sum x_i}{n}.$$

Substitute 2150 for $\sum x_i$, 233 for $\sum y_i$, 8 for $n$, and 0.1024 for $b_1$:

$$b_0 = \frac{233}{8} - (0.1024)\,\frac{2150}{8} = 1.6050$$

The equation of the least-squares regression line is

$$\hat{y} = b_0 + b_1 x.$$

Substitute 1.6050 for $b_0$ and 0.1024 for $b_1$:

$$\hat{y} = 1.6050 + 0.1024x$$

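| Thread Count $x_i$ | Price (in Dollars) $y_i$ | Predicted value $\hat{y}_i$ | $y_i - \hat{y}_i$ | $(y_i - \hat{y}_i)^2$ |
|---|---|---|---|---|
| 150 | 18 | 16.965 | 1.035 | 1.071225 |
| 200 | 21 | 22.085 | -1.085 | 1.177225 |
| 225 | 25 | 24.645 | 0.355 | 0.126025 |
| 250 | 28 | 27.205 | 0.795 | 0.632025 |
| 275 | 30 | 29.765 | 0.235 | 0.055225 |
| 300 | 31 | 32.325 | -1.325 | 1.755625 |
| 350 | 35 | 37.445 | -2.445 | 5.978025 |
| 400 | 45 | 42.565 | 2.435 | 5.929225 |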

The predicted values are calculated from

$$\hat{y} = 1.6050 + 0.1024x.$$

The predicted value $\hat{y}_1$ is

$$\hat{y}_1 = 1.6050 + 0.1024x_1.$$

Substitute 150 for $x_1$:

$$\hat{y}_1 = 1.6050 + 0.1024(150) = 16.965$$

Proceed in the same manner to calculate $\hat{y}_i$ for the rest of the data; the table lists the remaining values.

The residual is calculated as $y_i - \hat{y}_i$. Substitute 18 for $y_1$ and 16.965 for $\hat{y}_1$:

$$y_1 - \hat{y}_1 = 18 - 16.965 = 1.035$$

Square the residual:

$$(y_1 - \hat{y}_1)^2 = (1.035)^2 = 1.0712$$

Proceed in the same manner to calculate $(y_i - \hat{y}_i)^2$ for $1 \le i \le n$; the table lists the remaining values. The sum of the squared residuals is

$$SSE = \sum (y_i - \hat{y}_i)^2 = 1.0712 + 1.1772 + 0.1260 + \dots + 5.9292 = 16.7246$$
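The residual table and its total can be checked in a few lines. This sketch, in plain Python, deliberately uses the rounded coefficients 1.6050 and 0.1024 from the hand calculation so that it reproduces the same figures; the variable names are illustrative.

```python
thread_counts = [150, 200, 225, 250, 275, 300, 350, 400]
prices = [18, 21, 25, 28, 30, 31, 35, 45]

# Rounded coefficients from the hand calculation above
B0, B1 = 1.6050, 0.1024

predicted = [B0 + B1 * x for x in thread_counts]
residuals = [y - y_hat for y, y_hat in zip(prices, predicted)]
sse = sum(e ** 2 for e in residuals)

print(round(sse, 4))  # 16.7246
```

With unrounded coefficients the sum is slightly smaller, about 16.7245, which is the Residual SS value in the Excel output of part (d).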

Conclusion:

Thus, the SSE is 16.7246.

To determine

(b)

To calculate:

The standard error of estimate, $S_e$.

Expert Solution

Answer to Problem 9CR

Solution:

The required standard error of estimate is 1.6696.

Explanation of Solution

Given Information:

The following data give the thread count and the price (in dollars) of eight bed sheets.

Thread Count 150 200 225 250 275 300 350 400
Price (in Dollars) 18 21 25 28 30 31 35 45

Formula used:

The standard error of estimate, which measures how much the sample data points deviate from the regression line, is given by

$$S_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}},$$

where $y_i$ is the $i$th value of the response variable, $\hat{y}_i$ is the predicted value of $y_i$ from the least-squares regression model, $n$ is the number of data pairs in the sample, and $SSE$ is the sum of squared errors.

Calculation:

The standard error of estimate is calculated as

$$S_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}}.$$

Substitute 16.7246 for $SSE$ and 8 for $n$:

$$S_e = \sqrt{\frac{16.7246}{8-2}} = 1.6696$$
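As a quick numerical check of this step, here is a minimal sketch in plain Python. The function name `standard_error_of_estimate` is an illustrative choice, and the SSE value is the one obtained in part (a).

```python
import math


def standard_error_of_estimate(sse, n):
    """Return S_e = sqrt(SSE / (n - 2)) for simple linear regression."""
    return math.sqrt(sse / (n - 2))


s_e = standard_error_of_estimate(16.7246, 8)
print(round(s_e, 4))  # 1.6696
```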

Conclusion:

Thus, the standard error of estimate is 1.6696.

To determine

(c)

The 95% prediction interval for the price of 350-thread count sheets.

Expert Solution

Answer to Problem 9CR

Solution:

The required prediction interval is approximately (32.843, 42.047).

Explanation of Solution

Given Information:

The following data give the thread count and the price (in dollars) of eight bed sheets.

Thread Count 150 200 225 250 275 300 350 400
Price (in Dollars) 18 21 25 28 30 31 35 45

Formula used:

The margin of error of a prediction interval for an individual y-value is calculated as

$$E = t_{\alpha/2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x_i^2 - \left(\sum x_i\right)^2}},$$

where the $t$-distribution is applied with $df = n - 2$ degrees of freedom. Here $x_i$ is the $i$th value of the explanatory variable, $x_0$ is the fixed value of the explanatory variable, $\bar{x}$ is the sample mean of the $x_i$, $S_e$ is the standard error of estimate (computed from the sum of squared errors, SSE), and $n$ is the number of data pairs in the sample.

The prediction interval for an individual y-value is then

$$(\hat{y} - E,\ \hat{y} + E).$$

Calculation:

The confidence level is 0.95, so the level of significance is

$$\alpha = 1 - 0.95 = 0.05.$$

Then, with $df = 8 - 2 = 6$,

$$t_{\alpha/2} = t_{0.025} = 2.447.$$

The mean thread count is calculated as

$$\bar{x} = \frac{\sum x_i}{n}.$$

Substitute 2150 for $\sum x_i$ and 8 for $n$:

$$\bar{x} = \frac{2150}{8} = 268.75$$

The margin of error of a prediction interval for an individual y-value is calculated as

$$E = t_{\alpha/2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x_i^2 - \left(\sum x_i\right)^2}}.$$

Substitute 2.447 for $t_{\alpha/2}$, 1.6696 for $S_e$, 8 for $n$, 350 for $x_0$, 623750 for $\sum x_i^2$, 2150 for $\sum x_i$, and 268.75 for $\bar{x}$:

$$E = 2.447 \times 1.6696 \times \sqrt{1 + \frac{1}{8} + \frac{8(350 - 268.75)^2}{8(623750) - (2150)^2}} = 2.447 \times 1.6696 \times 1.1264 \approx 4.602$$

The value of $\hat{y}$ is calculated by substituting $x_0$ into the regression line equation.

The regression line is

$$\hat{y} = 1.6050 + 0.1024x.$$

Substitute 350 for $x$:

$$\hat{y} = 1.6050 + 0.1024(350) = 37.4450$$

The prediction interval is

$$(\hat{y} - E,\ \hat{y} + E) = (37.4450 - 4.602,\ 37.4450 + 4.602) = (32.843,\ 42.047)$$

Conclusion:

The required prediction interval is approximately (32.843, 42.047).
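The margin-of-error formula has three terms under the square root, and it is easy to drop one when working by hand. The sketch below, in plain Python, evaluates the formula term by term for $x_0 = 350$; it assumes the rounded values $S_e = 1.6696$, $t_{\alpha/2} = 2.447$ (read from a $t$-table for 6 degrees of freedom), and the rounded regression line from part (a). The variable names are illustrative.

```python
import math

thread_counts = [150, 200, 225, 250, 275, 300, 350, 400]
N = len(thread_counts)

S_E = 1.6696             # standard error of estimate from part (b)
T_CRIT = 2.447           # t_{0.025} with 6 degrees of freedom (table value)
B0, B1 = 1.6050, 0.1024  # rounded regression coefficients from part (a)
X0 = 350                 # thread count at which to predict

sum_x = sum(thread_counts)
sum_x2 = sum(x * x for x in thread_counts)
x_bar = sum_x / N

# sqrt(1 + 1/n + n(x0 - x_bar)^2 / (n*sum(x^2) - (sum x)^2))
root = math.sqrt(1 + 1 / N + N * (X0 - x_bar) ** 2 / (N * sum_x2 - sum_x ** 2))
margin = T_CRIT * S_E * root

y_hat = B0 + B1 * X0
interval = (y_hat - margin, y_hat + margin)

print(round(root, 4), round(margin, 3))
print(tuple(round(v, 3) for v in interval))
```

With all three terms included, the square root evaluates to about 1.1264 and the margin of error to about 4.602.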

To determine

(d)

The 95% confidence interval for the y-intercept of the regression line.

Expert Solution

Answer to Problem 9CR

Solution:

The required confidence interval is (-3.7304, 6.9141).

Explanation of Solution

Given Information:

The following data give the thread count and the price (in dollars) of eight bed sheets.

Thread Count 150 200 225 250 275 300 350 400
Price (in Dollars) 18 21 25 28 30 31 35 45

Formula Used:

The correlation coefficient measures the strength of the linear relationship between a response variable and an explanatory variable. It is calculated as

$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}.$$

The coefficient of determination, $r^2$, is the square of the correlation coefficient. It measures the proportion of variation in the response variable that is explained by the explanatory variable.
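To make the two definitions concrete, the following sketch computes $r$ and $r^2$ for the thread-count data in plain Python. The function name `correlation` and the variable names are illustrative assumptions.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient from the summation formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


thread_counts = [150, 200, 225, 250, 275, 300, 350, 400]
prices = [18, 21, 25, 28, 30, 31, 35, 45]

r = correlation(thread_counts, prices)
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))  # 0.9831 0.9665
```

These agree with the Multiple R and R Square entries of the Excel regression output.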

The standard error of estimate, $S_e$, measures how much the sample data points deviate from the regression line. It is calculated as

$$S_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}}$$

In ANOVA,

The grand mean is the weighted mean of the $k$ sample means, one from each of the $k$ populations, which is equivalent to the mean of all the sample data combined. It is given by

$$\bar{\bar{x}} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$

The Sum of Squares among Treatments (SST) measures the variation between the sample means and the grand mean. It is given by

$$SST = \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{\bar{x}}\right)^2$$

The Sum of Squares for Error (SSE) measures the variation in the sample data resulting from the variability within each sample:

$$SSE = \sum_{j=1}^{n_1} \left(x_{1j} - \bar{x}_1\right)^2 + \sum_{j=1}^{n_2} \left(x_{2j} - \bar{x}_2\right)^2 + \dots + \sum_{j=1}^{n_k} \left(x_{kj} - \bar{x}_k\right)^2$$

The Total Variation is the sum of the squared deviations from the grand mean for all of the data values in each sample. It is given by

$$\text{Total Variation} = \sum_{j=1}^{n_1} \left(x_{1j} - \bar{\bar{x}}\right)^2 + \sum_{j=1}^{n_2} \left(x_{2j} - \bar{\bar{x}}\right)^2 + \dots + \sum_{j=1}^{n_k} \left(x_{kj} - \bar{\bar{x}}\right)^2 = SST + SSE$$

The Mean Square for Treatments (MST) is found by dividing the sum of squares among treatments by its degrees of freedom:

$$MST = \frac{SST}{DFT}, \quad \text{with } DFT = k - 1$$

The Mean Square for Error (MSE) is found by dividing the sum of squares for error by its degrees of freedom:

$$MSE = \frac{SSE}{DFE}, \quad \text{with } DFE = n_T - k$$

The test statistic for an ANOVA test is used when independent, simple random samples are taken from populations with variances that are unknown and assumed to be equal, and all of the $k$ population distributions are approximately normal. It is given by

$$F = \frac{MST}{MSE}, \quad \text{with } df_1 = DFT = k - 1 \text{ and } df_2 = DFE = n_T - k$$
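The ANOVA quantities above chain together in a fixed order: group means, grand mean, SST, SSE, the two mean squares, and finally $F$. The sketch below walks through that chain in plain Python. The three samples are hypothetical numbers invented purely for illustration (they are not part of this problem), and the function name `one_way_anova_f` is likewise an assumption.

```python
def one_way_anova_f(samples):
    """Return (SST, SSE, F) for a one-way ANOVA on a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    # Grand mean: weighted mean of the sample means
    grand_mean = sum(len(s) * m for s, m in zip(samples, means)) / n_total
    # Variation between sample means and the grand mean
    sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Variation within each sample
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mst = sst / (k - 1)        # DFT = k - 1
    mse = sse / (n_total - k)  # DFE = n_T - k
    return sst, sse, mst / mse


# Hypothetical data, for illustration only
hypothetical_samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
sst, sse, f_stat = one_way_anova_f(hypothetical_samples)
print(round(sst, 4), round(sse, 4), round(f_stat, 4))
```

For these made-up samples SST is 26 and SSE is 6, so MST = 13, MSE = 1, and $F$ = 13.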

Calculation:

To generate the regression table in Excel, follow these steps:

1. Under the Data tab, choose Data Analysis and then select Regression.

2. Select the Input Y Range and enter the range of the given $y_i$ data, then select the Input X Range and enter the range of the given $x_i$ data.

3. Set the Confidence Level to 95% and click OK.

The following table will appear.

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.983094904 |
| R Square | 0.96647559 |
| Adjusted R Square | 0.960888189 |
| Standard Error | 1.66955532 |
| Observations | 8 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 482.1505102 | 482.1505102 | 172.9740696 | 1.19253E-05 |
| Residual | 6 | 16.7244898 | 2.787414966 | | |
| Total | 7 | 498.875 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 1.591836735 | 2.17509095 | 0.73184835 | 0.491845191 | -3.730419077 | 6.914092547 |
| Slope | 0.10244898 | 0.007789635 | 13.15196067 | 1.19253E-05 | 0.083388428 | 0.121509531 |

RESIDUAL OUTPUT

| Observation | Predicted y | Residuals | Standard Residuals |
|---|---|---|---|
| 1 | 16.95918367 | 1.040816327 | 0.673359012 |
| 2 | 22.08163265 | -1.081632653 | -0.699765248 |
| 3 | 24.64285714 | 0.357142857 | 0.231054563 |
| 4 | 27.20408163 | 0.795918367 | 0.514921598 |
| 5 | 29.76530612 | 0.234693878 | 0.151835856 |
| 6 | 32.32653061 | -1.326530612 | -0.858202663 |
| 7 | 37.44897959 | -2.448979592 | -1.584374147 |
| 8 | 42.57142857 | 2.428571429 | 1.571171029 |
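The main entries of the Excel output can be cross-checked without Excel. The sketch below, in plain Python, assumes the standard simple-regression decomposition (total sum of squares = regression sum of squares + residual sum of squares) and uses unrounded coefficients; the variable names are illustrative.

```python
import math

thread_counts = [150, 200, 225, 250, 275, 300, 350, 400]
prices = [18, 21, 25, 28, 30, 31, 35, 45]
n = len(thread_counts)

x_bar = sum(thread_counts) / n
y_bar = sum(prices) / n
s_xx = sum((x - x_bar) ** 2 for x in thread_counts)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(thread_counts, prices))

b1 = s_xy / s_xx           # Slope coefficient
b0 = y_bar - b1 * x_bar    # Intercept coefficient

ss_total = sum((y - y_bar) ** 2 for y in prices)
ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(thread_counts, prices))
ss_reg = ss_total - ss_resid

ms_resid = ss_resid / (n - 2)      # Residual MS, df = n - 2
f_stat = ss_reg / ms_resid         # Regression MS has df = 1
r_square = ss_reg / ss_total
std_error = math.sqrt(ms_resid)    # "Standard Error" in Regression Statistics

print(round(ss_reg, 4), round(ss_resid, 4), round(ss_total, 4))
print(round(f_stat, 3), round(r_square, 4), round(std_error, 4))
```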

The confidence interval for the y-intercept is constructed by subtracting the margin of error from, and adding it to, the point estimate. Microsoft Excel reports the resulting endpoints directly.

From the regression statistics, the coefficient of determination $R^2$ is 0.9665, which implies that 96.65% of the variation in the response variable is explained by the explanatory variable.

The Standard Error entry is the standard error of estimate, $S_e$, calculated by the formula given above.

The Lower 95% and Upper 95% columns give the confidence interval for the y-intercept.

The Intercept row of the coefficients table gives $b_0$ of the regression line, approximately 1.59, and the Slope row gives $b_1$, approximately 0.1024.

So the regression line is

$$\hat{y} = 1.59 + 0.1024x$$

The lower and upper endpoints of the 95% confidence interval for the y-intercept of the regression line, $\beta_0$, are (-3.7304, 6.9141).

Conclusion:

Thus, the 95% confidence interval for the y-intercept of the regression line is (-3.7304, 6.9141).
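The endpoints reported by Excel follow the usual pattern of point estimate plus or minus critical value times standard error. A minimal sketch in plain Python, taking the intercept and its standard error from the Excel output and assuming the table value $t_{0.025} = 2.447$ for 6 degrees of freedom; the variable names are illustrative.

```python
B0 = 1.591836735     # Intercept coefficient from the Excel output
SE_B0 = 2.17509095   # Standard error of the intercept from the Excel output
T_CRIT = 2.447       # t_{0.025} with n - 2 = 6 degrees of freedom (table value)

margin = T_CRIT * SE_B0
lower, upper = B0 - margin, B0 + margin
print(round(lower, 3), round(upper, 3))  # -3.731 6.914
```

The small difference from Excel's endpoints in the fourth decimal place comes from rounding the critical value to 2.447.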

To determine

(e)

Construct a 95% confidence interval for the slope of the regression line.

Expert Solution

Answer to Problem 9CR

Solution:

The required confidence interval is (0.0834, 0.1215).

Explanation of Solution

Given Information:

The following data give the thread count and the price (in dollars) of eight bed sheets.

Thread Count 150 200 225 250 275 300 350 400
Price (in Dollars) 18 21 25 28 30 31 35 45

Formula Used:

The correlation coefficient measures the strength of the linear relationship between a response variable and an explanatory variable. It is calculated as

$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}.$$

The coefficient of determination, $r^2$, is the square of the correlation coefficient. It measures the proportion of variation in the response variable that is explained by the explanatory variable.

The standard error of estimate, $S_e$, measures how much the sample data points deviate from the regression line. It is calculated as

$$S_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}}$$

In ANOVA,

The grand mean is the weighted mean of the $k$ sample means, one from each of the $k$ populations, which is equivalent to the mean of all the sample data combined. It is given by

$$\bar{\bar{x}} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$

The Sum of Squares among Treatments (SST) measures the variation between the sample means and the grand mean. It is given by

$$SST = \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{\bar{x}}\right)^2$$

The Sum of Squares for Error (SSE) measures the variation in the sample data resulting from the variability within each sample:

$$SSE = \sum_{j=1}^{n_1} \left(x_{1j} - \bar{x}_1\right)^2 + \sum_{j=1}^{n_2} \left(x_{2j} - \bar{x}_2\right)^2 + \dots + \sum_{j=1}^{n_k} \left(x_{kj} - \bar{x}_k\right)^2$$

The Total Variation is the sum of the squared deviations from the grand mean for all of the data values in each sample. It is given by

$$\text{Total Variation} = \sum_{j=1}^{n_1} \left(x_{1j} - \bar{\bar{x}}\right)^2 + \sum_{j=1}^{n_2} \left(x_{2j} - \bar{\bar{x}}\right)^2 + \dots + \sum_{j=1}^{n_k} \left(x_{kj} - \bar{\bar{x}}\right)^2 = SST + SSE$$

The Mean Square for Treatments (MST) is found by dividing the sum of squares among treatments by its degrees of freedom:

$$MST = \frac{SST}{DFT}, \quad \text{with } DFT = k - 1$$

The Mean Square for Error (MSE) is found by dividing the sum of squares for error by its degrees of freedom:

$$MSE = \frac{SSE}{DFE}, \quad \text{with } DFE = n_T - k$$

The test statistic for an ANOVA test is used when independent, simple random samples are taken from populations with variances that are unknown and assumed to be equal, and all of the $k$ population distributions are approximately normal. It is given by

$$F = \frac{MST}{MSE}, \quad \text{with } df_1 = DFT = k - 1 \text{ and } df_2 = DFE = n_T - k$$

Calculation:

To generate the regression table in Excel, follow these steps:

1. Under the Data tab, choose Data Analysis and then select Regression.

2. Select the Input Y Range and enter the range of the given $y_i$ data, then select the Input X Range and enter the range of the given $x_i$ data.

3. Set the Confidence Level to 95% and click OK.

The following table will appear.

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.983094904 |
| R Square | 0.96647559 |
| Adjusted R Square | 0.960888189 |
| Standard Error | 1.66955532 |
| Observations | 8 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 482.1505102 | 482.1505102 | 172.9740696 | 1.19253E-05 |
| Residual | 6 | 16.7244898 | 2.787414966 | | |
| Total | 7 | 498.875 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 1.591836735 | 2.17509095 | 0.731848356 | 0.491845191 | -3.730419077 | 6.914092547 |
| Slope | 0.10244898 | 0.007789635 | 13.15196067 | 1.19253E-05 | 0.083388428 | 0.121509531 |

RESIDUAL OUTPUT

| Observation | Predicted y | Residuals | Standard Residuals |
|---|---|---|---|
| 1 | 16.95918367 | 1.040816327 | 0.673359012 |
| 2 | 22.08163265 | -1.081632653 | -0.699765248 |
| 3 | 24.64285714 | 0.357142857 | 0.231054563 |
| 4 | 27.20408163 | 0.795918367 | 0.514921598 |
| 5 | 29.76530612 | 0.234693878 | 0.151835856 |
| 6 | 32.32653061 | -1.326530612 | -0.858202663 |
| 7 | 37.44897959 | -2.448979592 | -1.584374147 |
| 8 | 42.57142857 | 2.428571429 | 1.571171029 |

The Lower 95% and Upper 95% columns give the confidence interval for the slope.

The Intercept row of the coefficients table gives $b_0$ of the regression line, approximately 1.59, and the Slope row gives $b_1$, approximately 0.1024.

So the regression line is

$$\hat{y} = 1.59 + 0.1024x$$

The lower and upper endpoints of the 95% confidence interval for the slope of the regression line, $\beta_1$, are (0.0834, 0.1215).

Conclusion:

Thus, the 95% confidence interval for the slope of the regression line is (0.0834, 0.1215).
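The slope interval can also be rebuilt from the data rather than read off the Excel table. The sketch below, in plain Python, assumes the standard formula for the standard error of the slope, $S_e / \sqrt{\sum (x_i - \bar{x})^2}$, together with the table value $t_{0.025} = 2.447$ for 6 degrees of freedom; the variable names are illustrative.

```python
import math

thread_counts = [150, 200, 225, 250, 275, 300, 350, 400]

S_E = 1.66955532   # Standard Error from the regression statistics
B1 = 0.10244898    # Slope coefficient from the Excel output
T_CRIT = 2.447     # t_{0.025} with n - 2 = 6 degrees of freedom (table value)

x_bar = sum(thread_counts) / len(thread_counts)
s_xx = sum((x - x_bar) ** 2 for x in thread_counts)

# Standard error of the slope: S_e / sqrt(sum of squared deviations of x)
se_b1 = S_E / math.sqrt(s_xx)
margin = T_CRIT * se_b1
lower, upper = B1 - margin, B1 + margin
print(round(se_b1, 6), round(lower, 4), round(upper, 4))
```

The computed standard error matches the 0.007789635 shown in the Slope row, and the endpoints agree with the Lower 95% and Upper 95% entries to about four decimal places.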
