
Question


Answer 1

Textbook Question

Chapter 14, Problem 18CE

Suppose that the sales manager of a large automotive parts distributor wants to estimate the total annual sales for each of the company’s regions. Five factors appear to be related to regional sales: the number of retail outlets in the region, the number of automobiles in the region registered as of April 1, the total personal income recorded in the first quarter of the year, the average age of the automobiles (years), and the number of sales supervisors in the region. The data for each region were gathered for last year. For example, see the following table. In region 1 there were 1,739 retail outlets stocking the company’s automotive parts, there were 9,270,000 registered automobiles in the region as of April 1, and so on. The region’s sales for that year were $37,702,000.

[Image: table of regional data]

a. Consider the following correlation matrix. Which single variable has the strongest correlation with the dependent variable? The correlations between the independent variables outlets and income and between outlets and number of automobiles are fairly strong. Could this be a problem? What is this condition called?

[Image: correlation matrix]

b. The output for all five variables is shown below. What percent of the variation is explained by the regression equation?

[Image: regression output for all five independent variables]

c. Conduct a global test of hypothesis to determine whether any of the regression coefficients are not zero. Use the .05 significance level.
d. Conduct a test of hypothesis on each of the independent variables. Would you consider eliminating “outlets” and “bosses”? Use the .05 significance level.
e. The regression has been rerun below with “outlets” and “bosses” eliminated. Compute the coefficient of determination. How much has R² changed from the previous analysis?

[Image: regression output with “outlets” and “bosses” eliminated]

f. Following is a histogram of the residuals. Does the normality assumption appear reasonable? Why?

[Image: histogram of the residuals]

g. Following is a plot of the fitted values of y (i.e., ŷ) and the residuals. What do you observe? Do you see any violations of the assumptions?

[Image: plot of the residuals against the fitted values]

a.

Expert Solution

To determine

Find the single variable that has the strongest correlation with the dependent variable.

Explain whether the fairly strong correlations between outlets and income, and between outlets and number of automobiles, could be a problem.

Provide the name of the condition.

Answer to Problem 18CE

The single variable with the strongest correlation with the dependent variable is “income”.

Yes, the fairly strong correlations among the independent variables could be a problem. The name of the condition is multicollinearity.

Explanation of Solution

Multiple linear regression model:

A multiple linear regression model is given as $\hat{y} = a + b_1x_1 + b_2x_2 + b_3x_3 + \dots + b_kx_k$, where $\hat{y}$ is the predicted value of the response (dependent) variable $y$, and $x_1, x_2, \dots, x_k$ are the $k$ quantitative independent variables, where $k$ is a positive integer.

Here, $a$ is the intercept of the regression model, that is, the predicted value of $y$ when all the $x_i$ are 0, and the $b_i$ are the slopes, that is, the change in the predicted value of $y$ for a one-unit increase in $x_i$ when all other independent variables are held constant.

In the given problem, the dependent variable $y$ is the annual sales. The number of retail outlets, the number of automobiles registered, personal income, the average age of automobiles, and the number of supervisors are defined as $x_1, x_2, x_3, x_4$ and $x_5$, respectively.

Correlation:

The correlation between two variables measures the linear relationship between those two variables.

According to the given output, the independent variable “income” has the strongest correlation with the dependent variable “sales”. The correlation coefficient between “income” and “sales” is 0.964.

This strong positive correlation indicates that regions with higher personal income tend to have higher annual sales.

Multicollinearity:

In a multiple regression model, multicollinearity occurs when two or more independent variables are highly correlated with each other.

The correlations between the independent variables outlets and income, and between outlets and number of automobiles, are fairly strong: 0.825 and 0.775, respectively.

These correlations can cause multicollinearity in the regression model.

Because of multicollinearity, the standard errors of the regression coefficients are inflated and the partial regression coefficients cannot be estimated precisely. Moreover, it becomes difficult to assess the relative importance of the individual independent variables.
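One common way to quantify multicollinearity is the variance inflation factor (VIF), $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing one independent variable on all the others. The regional data are not reproduced here, so the Python sketch below (the helper name `vif_lower_bound` is hypothetical) uses only the two pairwise correlations quoted above. Because $R_j^2$ is at least as large as any single squared pairwise correlation, $1/(1 - r^2)$ is only a lower bound on the VIF of “outlets”, not the VIF itself.

```python
def vif_lower_bound(r: float) -> float:
    """Lower bound on a variance inflation factor from one pairwise correlation r.

    The full VIF is 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on
    all other predictors. Since R_j^2 >= r^2 for any single pairwise r, this
    value can only understate the true VIF.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Pairwise correlations quoted in the solution above.
r_outlets_income = 0.825
r_outlets_automobiles = 0.775

bound_income = vif_lower_bound(r_outlets_income)
bound_automobiles = vif_lower_bound(r_outlets_automobiles)

print(f"VIF(outlets) is at least {max(bound_income, bound_automobiles):.2f}")
```

A frequently cited rule of thumb treats a VIF above 10 as a sign of serious multicollinearity; computing the actual VIFs would require the regional data.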

b.

Expert Solution

To determine

Find the percent of the variation that is explained by the regression equation.

Answer to Problem 18CE

The coefficient of multiple determination is approximately 0.9943, that is, about 99.43% of the variation is explained by the regression equation.

Explanation of Solution

Calculation:

From an ANOVA table, the coefficient of multiple determination is computed as

$R^2 = \dfrac{SSR}{\text{SS total}}$,

where SSR is the regression sum of squares and SS total is the total sum of squares.

According to the output, SSR and SS total are 1,593.81 and 1,602.89, respectively.

Hence, the coefficient of multiple determination is

$R^2 = \dfrac{1{,}593.81}{1{,}602.89} = 0.9943$.

Thus, the coefficient of multiple determination is approximately 99.43%.

Hence, 99.43% of the variation is explained by the regression equation.
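The arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name `r_squared` is illustrative, and the two sums of squares are the values used in the calculation above.

```python
def r_squared(ssr: float, ss_total: float) -> float:
    """Coefficient of multiple determination: SSR / SS total."""
    if ss_total <= 0:
        raise ValueError("SS total must be positive")
    return ssr / ss_total


# Sums of squares for the five-variable model, as used in the calculation above.
SSR_FULL = 1593.81
SS_TOTAL = 1602.89

r2_full = r_squared(SSR_FULL, SS_TOTAL)
print(f"R^2 = {r2_full:.4f} ({r2_full:.2%} of the variation explained)")
```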

c.

Expert Solution

To determine

Perform a global hypothesis test to check whether any of the regression coefficients is not zero at the 0.05 significance level.

Answer to Problem 18CE

There is strong evidence that at least one of the regression coefficients is not 0 at the 0.05 significance level.

Explanation of Solution

Calculation:

Let $y$ be the dependent variable and $x_i$ the independent variables, where $\beta_i$ is the population regression coefficient corresponding to $x_i$, for $i = 1, 2, 3, 4, 5$.

State the hypotheses:

Null hypothesis:

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$.

That is, the model is not significant.

Alternative hypothesis:

$H_1$: At least one $\beta_i$ is not equal to 0.

That is, the model is significant.

For the global test, the F test statistic is defined as

$F = \dfrac{SSR/k}{SSE/(n-k-1)}$, where SSR, SSE, $n$ and $k$ are the regression sum of squares, the error sum of squares, the sample size and the number of independent variables, respectively.

According to the output, the value of the F statistic is 140.36, with $k = 5$ numerator degrees of freedom and $n - k - 1 = 4$ denominator degrees of freedom (the same 4 degrees of freedom used for the individual t tests in part d).

The level of significance is $\alpha = 0.05$.

Decision rule:

If p-value ≤ α, then reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.

Conclusion:

Here, the p-value corresponding to the global test is approximately 0.

Hence, p-value (≈ 0) < α (= 0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that at least one of the regression coefficients is not 0 at the 0.05 significance level.
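As a check, the F statistic can be recomputed from the sums of squares. This is a minimal Python sketch under stated assumptions: SSE is obtained as SS total minus SSR using the values from part b, and `N = 10` is not stated in the solution but inferred from the 4 degrees of freedom used for the t tests in part d ($n - k - 1 = 4$ with $k = 5$). The critical value 6.26 is the tabulated $F_{0.05}$ for 5 and 4 degrees of freedom.

```python
def global_f(ssr: float, sse: float, n: int, k: int) -> float:
    """Global F statistic: (SSR / k) / (SSE / (n - k - 1))."""
    df_error = n - k - 1
    if df_error <= 0:
        raise ValueError("need n > k + 1")
    return (ssr / k) / (sse / df_error)


SSR = 1593.81          # regression sum of squares (part b)
SS_TOTAL = 1602.89     # total sum of squares (part b)
SSE = SS_TOTAL - SSR   # error sum of squares, about 9.08
K = 5                  # number of independent variables
N = 10                 # assumption: inferred from n - k - 1 = 4
F_CRITICAL = 6.26      # tabulated F(0.05; 5, 4)

f_stat = global_f(SSR, SSE, N, K)
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

The recomputed value (about 140.4) differs slightly from the 140.36 in the output, which is consistent with rounding in the printed sums of squares; either way it is far above the critical value.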

d.

Expert Solution

To determine

Perform individual tests of each independent variable at the 0.05 significance level.

Explain whether the independent variables “outlets” and “bosses” should be eliminated.

Answer to Problem 18CE

There is no significant relationship between $y$ and $x_1$, $x_4$ and $x_5$, whereas there is a significant relationship between $y$ and $x_2$ and $x_3$.

The independent variables “number of retail outlets”, “average age of automobiles” and “number of supervisors” can be eliminated.

Explanation of Solution

Calculation:

For independent variable $x_1$:

Let $\beta_1$ be the population regression coefficient of the independent variable $x_1$.

State the hypotheses:

Null hypothesis:

$H_0: \beta_1 = 0$.

That is, there is no significant relationship between $y$ and $x_1$.

Alternative hypothesis:

$H_1: \beta_1 \neq 0$.

That is, there is a significant relationship between $y$ and $x_1$.

For the test of an individual regression coefficient, the t test statistic is defined as

$t = \dfrac{b_i - 0}{s_{b_i}}$, where $b_i$ is the $i$-th estimated regression coefficient and $s_{b_i}$ is its standard error, with $n - k - 1$ degrees of freedom.

According to the given output, the t statistic corresponding to $x_1$ is −0.24 with 4 degrees of freedom.

The level of significance is $\alpha = 0.05$.

Decision rule:

If p-value ≤ α, then reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.

Conclusion:

Here, the p-value corresponding to outlets ($x_1$) is 0.823.

Hence, p-value (= 0.823) > α (= 0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between $y$ and $x_1$.

For independent variable $x_2$:

Let $\beta_2$ be the population regression coefficient of the independent variable $x_2$.

State the hypotheses:

Null hypothesis:

$H_0: \beta_2 = 0$.

That is, there is no significant relationship between $y$ and $x_2$.

Alternative hypothesis:

$H_1: \beta_2 \neq 0$.

That is, there is a significant relationship between $y$ and $x_2$.

According to the regression output, the t test statistic corresponding to $x_2$ is 3.15 with 4 degrees of freedom.

Conclusion:

Here, the p-value corresponding to automobiles ($x_2$) is 0.035.

Hence, p-value (= 0.035) < α (= 0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is a significant relationship between $y$ and $x_2$.

For independent variable $x_3$:

Let $\beta_3$ be the population regression coefficient of the independent variable $x_3$.

State the hypotheses:

Null hypothesis:

$H_0: \beta_3 = 0$.

That is, there is no significant relationship between $y$ and $x_3$.

Alternative hypothesis:

$H_1: \beta_3 \neq 0$.

That is, there is a significant relationship between $y$ and $x_3$.

According to the regression output, the t test statistic corresponding to $x_3$ is 9.35 with 4 degrees of freedom.

Conclusion:

Here, the p-value corresponding to income ($x_3$) is 0.001.

Hence, p-value (= 0.001) < α (= 0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is a significant relationship between $y$ and $x_3$.

For independent variable $x_4$:

Let $\beta_4$ be the population regression coefficient of the independent variable $x_4$.

State the hypotheses:

Null hypothesis:

$H_0: \beta_4 = 0$.

That is, there is no significant relationship between $y$ and $x_4$.

Alternative hypothesis:

$H_1: \beta_4 \neq 0$.

That is, there is a significant relationship between $y$ and $x_4$.

According to the regression output, the t test statistic corresponding to $x_4$ is 2.32 with 4 degrees of freedom.

Conclusion:

Here, the p-value corresponding to age ($x_4$) is 0.081.

Hence, p-value (= 0.081) > α (= 0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between $y$ and $x_4$.

For independent variable $x_5$:

Let $\beta_5$ be the population regression coefficient of the independent variable $x_5$.

State the hypotheses:

Null hypothesis:

$H_0: \beta_5 = 0$.

That is, there is no significant relationship between $y$ and $x_5$.

Alternative hypothesis:

$H_1: \beta_5 \neq 0$.

That is, there is a significant relationship between $y$ and $x_5$.

According to the regression output, the t test statistic corresponding to $x_5$ is small in absolute value, with 4 degrees of freedom.

Conclusion:

Here, the p-value corresponding to bosses ($x_5$) is 0.864.

Hence, p-value (= 0.864) > α (= 0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between $y$ and $x_5$.

As there is no significant relationship between the dependent variable and the independent variables $x_1$ and $x_5$, these variables can be eliminated.

That is, there is no significant relationship between the annual sales and either the number of retail outlets or the number of supervisors, so the independent variables “number of retail outlets” and “number of supervisors” can be omitted from the model.

Moreover, there is no significant relationship at the 0.05 level between the dependent variable and the independent variable $x_4$ (p-value 0.081), so the independent variable “average age of automobiles” is also a candidate for elimination.
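The p-values quoted in this part can be reproduced from the t statistics. For exactly 4 degrees of freedom, Student's t distribution has a closed-form CDF, so no statistics library is needed; the Python sketch below relies on that special case (the function name `two_sided_p_df4` is illustrative, and the formula does not apply to other degrees of freedom). Only the statistics for $x_1$ through $x_4$ are recomputed; for $x_5$ the quoted p-value is used directly.

```python
import math


def two_sided_p_df4(t: float) -> float:
    """Two-sided p-value for a t statistic with exactly 4 degrees of freedom.

    Uses the closed-form CDF of Student's t for df = 4:
    F(t) = 1/2 + (3/8) * u * (1 - u**2 / 12), where u = t / sqrt(1 + t**2 / 4).
    """
    u = abs(t) / math.sqrt(1.0 + t * t / 4.0)
    upper_cdf = 0.5 + 0.375 * u * (1.0 - u * u / 12.0)
    return 2.0 * (1.0 - upper_cdf)


ALPHA = 0.05

# t statistics quoted in the solution (4 degrees of freedom each).
t_stats = {"outlets": -0.24, "automobiles": 3.15, "income": 9.35, "age": 2.32}

p_values = {name: two_sided_p_df4(t) for name, t in t_stats.items()}
p_values["bosses"] = 0.864  # p-value taken directly from the solution

for name, p in p_values.items():
    decision = "reject H0" if p <= ALPHA else "fail to reject H0"
    print(f"{name:12s} p = {p:.3f} -> {decision}")

# Variables whose coefficients differ significantly from 0 at the 0.05 level.
significant = [name for name, p in p_values.items() if p <= ALPHA]
```

The recomputed values agree with the quoted p-values to within about 0.001; the small differences come from the rounded t statistics. Only “automobiles” and “income” end up in `significant`.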

e.

Expert Solution

To determine

Find the coefficient of determination.

Find the change in $R^2$ from the previous analysis.

Answer to Problem 18CE

The coefficient of multiple determination is approximately 99.42%, that is, 99.42% of the variation is explained by the reduced regression equation. $R^2$ has decreased by only about 0.01 percentage point from the previous analysis.

Explanation of Solution

Calculation:

From an ANOVA table, the coefficient of multiple determination is computed as

$R^2 = \dfrac{SSR}{\text{SS total}}$,

where SSR is the regression sum of squares and SS total is the total sum of squares.

According to the output after eliminating “outlets” and “bosses”, SSR and SS total are 1,593.66 and 1,602.89, respectively.

Hence, the coefficient of multiple determination is

$R^2 = \dfrac{1{,}593.66}{1{,}602.89} = 0.9942$.

Thus, the coefficient of multiple determination is approximately 99.42%.

Hence, $R^2$ changes by only about 0.01 percentage point (= 99.43 − 99.42) from the previous analysis.
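The comparison of the two models can be sketched in Python as follows. The sums of squares are the values used in parts b and e; the variable names are illustrative.

```python
SS_TOTAL = 1602.89
SSR_FULL = 1593.81      # all five independent variables (part b)
SSR_REDUCED = 1593.66   # "outlets" and "bosses" eliminated (part e)

r2_full = SSR_FULL / SS_TOTAL
r2_reduced = SSR_REDUCED / SS_TOTAL

# Change in R^2, expressed in percentage points.
change_points = (r2_full - r2_reduced) * 100
print(f"R^2 full = {r2_full:.4f}, reduced = {r2_reduced:.4f}, "
      f"drop = {change_points:.2f} percentage points")
```

Dropping two of the five independent variables leaves the explained variation essentially unchanged, which supports eliminating them.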

f.

Expert Solution

To determine

Explain whether the normality assumption appears reasonable.

Explanation of Solution

Assessing normality from a histogram:

The majority of the observations should lie in the middle, centered on a mean of 0.
There should be lower frequencies in the tails of the distribution.

According to the given histogram, most of the residuals are centered on the mean of 0 and there are lower frequencies in the tails of the distribution.

Hence, the normality assumption appears reasonable.

g.

Expert Solution

To determine

Describe the residual plot and explain whether any assumptions are violated.

Explanation of Solution

Assumptions checked in residual analysis for the regression model:

In the plot of the residuals vs. the fitted values ($\hat{y}$), the points should fall roughly in a horizontal band that is symmetric about the horizontal line at 0.
In a normal probability plot, the residuals should fall roughly along a straight line.
There should not be any observable pattern.

According to the given residual plot, the points fall roughly in a horizontal band and are more or less symmetric about the horizontal axis. Moreover, there is no particular pattern in the residual plot; the points appear haphazard and random.

Hence, the assumptions checked with the residual plot (linearity and constant variance of the residuals) do not appear to be violated.


Answer 2

Textbook Question

Chapter 14, Problem 18CE

Suppose that the sales manager of a large automotive parts distributor wants to estimate the total annual sales for each of the company’s regions. Five factors appear to be related to regional sales: the number of retail outlets in the region, the number of automobiles in the region registered as of April 1, the total personal income recorded in the first quarter of the year, the average age of the automobiles (years), and the number of sales supervisors in the region. The data for each region were gathered for last year. For example, see the following table. In region 1 there were 1,739 retail outlets stocking the company’s automotive parts, there were 9,270,000 registered automobiles in the region as of April 1, and so on. The region’s sales for that year were $37,702,000.

Chapter 14, Problem 18CE, Suppose that the sales manager of a large automotive parts distributor wants to estimate the total , example 1

a. Consider the following correlation matrix. Which single variable has the strongest correlation with the dependent variable? The correlations between the independent variables outlets and income and between outlets and number of automobiles are fairly strong. Could this be a problem? What is this condition called?

Chapter 14, Problem 18CE, Suppose that the sales manager of a large automotive parts distributor wants to estimate the total , example 2

b. The output for all five variables is shown below. What percent of the variation is explained by the regression equation?

Chapter 14, Problem 18CE, Suppose that the sales manager of a large automotive parts distributor wants to estimate the total , example 3

c. Conduct a global test of hypothesis to determine whether any of the regression coefficients are not zero. Use the .05 significance level.
d. Conduct a test of hypothesis on each of the independent variables. Would you consider eliminating “outlets” and “bosses”? Use the .05 significance level.
e. The regression has been rerun below with “outlets” and “bosses” eliminated. Compute the coefficient of determination. How much has R² changed from the previous analysis?

Chapter 14, Problem 18CE, Suppose that the sales manager of a large automotive parts distributor wants to estimate the total , example 4

f. Following is a histogram of the residuals. Does the normality assumption appear reasonable? Why?

Chapter 14, Problem 18CE, Suppose that the sales manager of a large automotive parts distributor wants to estimate the total , example 5

g. Following is a plot of the fitted values of y (i.e., y ^ ) and the residuals. What do you observe? Do you see any violations of the assumptions?

Chapter 14, Problem 18CE, Suppose that the sales manager of a large automotive parts distributor wants to estimate the total , example 6

a.

Expert Solution

To determine

Find the single variable that has the strongest correlation with the dependent variable.

Explain whether the fairly strong correlations between outlets and income and outlets and number of automobiles, will be any problem.

Provide the name of the condition.

Answer to Problem 18CE

The single variable that has the strongest correlation with the dependent variable, is “income”.

The name of the condition is multicollinearity.

Explanation of Solution

Multiple linear regression model:

A multiple linear regression model is given as y^=a+b1x1+b2x2+b3x3+...+bkxk where y is the response or dependent variable, and x1,x2,...,xk are the k quantitative independent variables where k is a positive integer.

Here, a is the intercept term of the regression model, that is, the value of predicted value of y when X’s are 0 and bi’s are the slopes, that is, the amount of change of the predicted value of y for one unit increase in xi when all other independent variables are constant.

In the given problem the predicted dependent variable y is the annual sales. The number of retail outlets, the number of automobiles registered, personal income, the average of automobiles and the number of supervisors, are defined as x1,x2,x3,x4 and x5, respectively.

Correlation:

The correlation between two variables measures the linear relationship between those two variables.

According to the given output there is a strongest correlation between the independent variable “income” and the dependent variable “sales”. The correlation coefficient between “income” and “sales” is 0.964.

Thus, it implies that as the personal income increases the annual sales also increase.

Multicollinearity:

In a multiple regression model, when there is high correlation between two or more independent variables, then multicollinearity occurs.

The correlation between the independent variables outlets and income and between outlets and number of automobiles are fairly strong, such as, 0.825 and 0.775, respectively.

These correlations can occur multicollinearity in the regression model.

Due to this multicollinearity the standard errors will be high and there will be no exact estimate of the partial regression coefficient. Moreover, there will be difficulty to measure the relative significance of independent variables.

b.

Expert Solution

To determine

Find the percent of the variation that is explained by the regression equation.

Answer to Problem 18CE

The approximate value of coefficient of multiple determination is 99.43%, that is, 99.43% of the variation is explained by the regression equation.

Explanation of Solution

Calculation:

According to an ANOVA table the coefficient of multiple determination is defined as,

R2=SSRSS total ,

Where SSR is the regression sum of squares and SS total is the total sum of square.

According to the output the SSR and SS total are 1,593.91 and 1,602.89, respectively.

Hence, the coefficient of multiple determination is,

R2=1,593.811,602.89=0.9943.

Thus, the approximate value of coefficient of multiple determination is 99.43%.

Hence, 99.43% of the variation is explained by the regression equation.

c.

Expert Solution

To determine

Perform a global hypothesis test to check whether any of the regression coefficients is not zero at 0.05 significance level.

Answer to Problem 18CE

There is strong evidence that at least any of the regression coefficient is not 0 at 0.05 significance level.

Explanation of Solution

Calculation:

Consider that y is dependent variable and xi's are the independent variables where βi's are the corresponding population regression coefficient for all i=1,2,3,4,5.

State the hypotheses:

Null hypothesis:

H0:β1=β2=β3=β4=β5=0.

That is, the model is not significant.

Alternative hypothesis:

H1:At least one βi is not equal to 0.

That is, the model is significant.

In case of global test the F test statistic is defined as,

F=SSRkSSEn−k−1, where SSR, SSE, n and k are the regression sum of square, error sum of square, sample size and the number of independent variables.

According to the output, the value of F statistic is 140.36 with numerator degrees of freedom 5 and denominator degrees of freedom 5.

The level of significance is α=0.05.

Decision rule:

If p-value≤α, then reject the null hypothesis.
Otherwise failed to reject the null hypothesis.

Conclusion:

Here, p-value corresponding to the global test is 0.

Hence, p-value(=0)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that at least any of the regression coefficient is not 0 at 0.05 significance level.

d.

Expert Solution

To determine

Perform individual tests of each independent variable at 0.05 significance level.

Explain whether the independent variables “outlets” and “bosses” will be eliminated.

Answer to Problem 18CE

There is no significant relation between y and x1,x4 and x5, whereas there is significant relation between y and x2, and x3.

The independent random variables “the number of retail outlets”, “average age of automobiles” and “number of supervisors” can be eliminated.

Explanation of Solution

Calculation:

For independent variable x1:

Consider that β1 is the population regression coefficient of independent variable x1.

State the hypotheses:

Null hypothesis:

H0:β1=0.

That is, there is no significant relationship between y and x1.

Alternative hypothesis:

H1:β1≠0.

That is, there is significant relationship between y and x1.

In case of individual regression coefficient test the t test statistic is defined as,

t=bisbi, where bi and sbi are the i^th regression coefficient and the standard deviation of the i^th regression coefficient.

According to the given information the t statistic value corresponding to x1 is –0.24 with 4 degrees of freedom.

The level of significance is α=0.05.

Decision rule:

If p-value≤α, then reject the null hypothesis.
Otherwise failed to reject the null hypothesis.

Conclusion:

Here, p-value corresponding to the outlets (x1) is 0.823

Hence, p-value(=0.823)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x1.

For independent variable x2:

Consider that β2 is the population regression coefficient of independent variable x2.

State the hypotheses:

Null hypothesis:

H0:β2=0.

That is, there is no significant relationship between y and x2.

Alternative hypothesis:

H1:β2≠0.

That is, there is significant relationship between y and x2.

According to the given ANOVA table the value of t test statistic corresponding to x2 is 3.15 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the automobiles (x2) is 0.035.

Hence, p-value(=0.035)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is significant relationship between y and x2.

For independent variable x3:

Consider that β3 is the population regression coefficient of independent variable x3.

State the hypotheses:

Null hypothesis:

H0:β3=0.

That is, there is no significant relationship between y and x3.

Alternative hypothesis:

H1:β3≠0.

That is, there is significant relationship between y and x3.

According to the given ANOVA table the value of t test statistic corresponding to x3 is 9.35 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the income (x3) is 0.001.

Hence, p-value(=0.001)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is significant relationship between y and x3.

For independent variable x4:

Consider that β4 is the population regression coefficient of independent variable x4.

State the hypotheses:

Null hypothesis:

H0:β4=0.

That is, there is no significant relationship between y and x4.

Alternative hypothesis:

H1:β4≠0.

That is, there is significant relationship between y and x4.

According to the given ANOVA table the value of t test statistic corresponding to x4 is 2.32 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the age (x4) is 0.081.

Hence, p-value(=0.081)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x4.

For independent variable x5:

Consider that β4 is the population regression coefficient of independent variable x5.

State the hypotheses:

Null hypothesis:

H0:β4=0.

That is, there is no significant relationship between y and x5.

Alternative hypothesis:

H1:β4≠0.

That is, there is significant relationship between y and x5.

According to the given ANOVA table the value of t test statistic corresponding to x5 is 2.32 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the bosses (x5) is 0.864.

Hence, p-value(=0.864)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x5.

As there are no significant relationship between the dependent variable and the independent variables x1 and x5, it is better to eliminate these variables.

Hence, it can be said that there is no significant relationship between the annual sales and the number of retail outlets and the number of supervisors. Thus, it is better to omit these independent random variables “the number of retail outlets” and “the number of supervisors”.

Moreover, there is no significant relationship between the dependent variable and the independent variable x4, it is better to eliminate this variable.

Hence, it can be said that there is no significant relationship between the annual sales and the average age of automobiles. Thus, it is better to omit this independent random variable “average age of automobiles” also.

e.

Expert Solution

To determine

Find the coefficient of determination.

Find the change of R2 from the previous analysis.

Answer to Problem 18CE

The approximate value of coefficient of multiple determination is 99.43%, that is, 99.43% of the variation is explained by the regression equation.

Explanation of Solution

Calculation:

According to an ANOVA table the coefficient of multiple determination is defined as,

R2=SSRSS total ,

Where SSR is the regression sum of squares and SS total is the total sum of square.

According to the output after eliminating “outlets” and “bosses”, the SSR and SS total are 1,593.66 and 1,602.89, respectively.

Hence, the coefficient of multiple determination is,

R2=1,593.661,602.89=0.9942.

Thus, the approximate value of coefficient of multiple determination is 99.42%.

Hence, there is only 0.01%(=99.43−99.42) change of R2 from the previous analysis.

f.

Expert Solution

To determine

Explain whether the normality assumptions appear reasonably.

Explanation of Solution

Assumption of normality from a histogram:

The majority of the observations lie in the middle, centered on the mean of 0.
There are lower frequencies in the tails of the distribution.

In the given histogram, most of the residuals are centered on the mean of 0 and there are fewer observations in the tails of the distribution.

Hence, the normality assumption appears reasonable.

g.

Expert Solution

To determine

Describe the residual plot and explain whether any assumptions are violated.

Explanation of Solution

Assumptions checked in the residual analysis of the regression model:

In the plot of the residuals against the fitted values ŷ, the points should fall roughly in a horizontal band that is symmetric about the horizontal axis at 0.
In a normal probability plot, the residuals should fall roughly along a straight line.
There should not be any observable pattern.

In the given residual plot, the points fall roughly in a horizontal band and are more or less symmetric about the horizontal axis. Moreover, there is no particular pattern in the plot; the points are scattered haphazardly.

Hence, no violations of the regression assumptions are apparent from the residual plot.
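The visual check described above can be backed by a rough numeric summary. The sketch below is only an illustration: the fitted values and residuals are hypothetical numbers invented for the example (the textbook's residuals are not reproduced here), and `residual_band_summary` is an assumed helper name. It compares the spread of the residuals for low and high fitted values; similar spreads and a mean near 0 are what a horizontal band looks like in numbers.

```python
import statistics

def residual_band_summary(fitted, residuals):
    """Mean of all residuals, plus their spread in the lower and upper
    halves of the fitted values."""
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return {
        "mean_all": statistics.mean(residuals),
        "sd_low": statistics.stdev(low),
        "sd_high": statistics.stdev(high),
    }

# HYPOTHETICAL values for illustration only; NOT the textbook's data.
fitted = [5.0, 9.0, 12.0, 16.0, 20.0, 24.0, 29.0, 33.0, 38.0, 42.0]
residuals = [0.8, -1.1, 0.4, -0.6, 1.2, -0.9, 0.5, -1.0, 0.9, -0.2]

summary = residual_band_summary(fitted, residuals)
print(summary)
```

A spread that grows or shrinks markedly with the fitted values would suggest unequal variance, which a horizontal band rules out.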


Answer 3

Textbook Question

Chapter 14, Problem 18CE

Suppose that the sales manager of a large automotive parts distributor wants to estimate the total annual sales for each of the company’s regions. Five factors appear to be related to regional sales: the number of retail outlets in the region, the number of automobiles in the region registered as of April 1, the total personal income recorded in the first quarter of the year, the average age of the automobiles (years), and the number of sales supervisors in the region. The data for each region were gathered for last year. For example, see the following table. In region 1 there were 1,739 retail outlets stocking the company’s automotive parts, there were 9,270,000 registered automobiles in the region as of April 1, and so on. The region’s sales for that year were $37,702,000.

[Table: data for each region]

a. Consider the following correlation matrix. Which single variable has the strongest correlation with the dependent variable? The correlations between the independent variables outlets and income and between outlets and number of automobiles are fairly strong. Could this be a problem? What is this condition called?

[Image: correlation matrix]

b. The output for all five variables is shown below. What percent of the variation is explained by the regression equation?

[Image: regression output for all five variables]

c. Conduct a global test of hypothesis to determine whether any of the regression coefficients are not zero. Use the .05 significance level.
d. Conduct a test of hypothesis on each of the independent variables. Would you consider eliminating “outlets” and “bosses”? Use the .05 significance level.
e. The regression has been rerun below with “outlets” and “bosses” eliminated. Compute the coefficient of determination. How much has R² changed from the previous analysis?

[Image: regression output with “outlets” and “bosses” eliminated]

f. Following is a histogram of the residuals. Does the normality assumption appear reasonable? Why?

[Image: histogram of the residuals]

g. Following is a plot of the fitted values of y (i.e., ŷ) and the residuals. What do you observe? Do you see any violations of the assumptions?

[Image: plot of the residuals against the fitted values]

a.

Expert Solution

To determine

Find the single variable that has the strongest correlation with the dependent variable.

Explain whether the fairly strong correlations between outlets and income and between outlets and number of automobiles could be a problem.

Provide the name of the condition.

Answer to Problem 18CE

The single variable that has the strongest correlation with the dependent variable is “income”.

The name of the condition is multicollinearity.

Explanation of Solution

Multiple linear regression model:

A multiple linear regression model is given as $\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$, where $\hat{y}$ is the predicted value of the response (dependent) variable $y$, and $x_1, x_2, \dots, x_k$ are the $k$ quantitative independent variables, where $k$ is a positive integer.

Here, $a$ is the intercept of the regression model, that is, the predicted value of $y$ when all the $x$’s are 0, and the $b_i$’s are the slopes, that is, the change in the predicted value of $y$ for a one-unit increase in $x_i$ when all other independent variables are held constant.

In the given problem, the dependent variable $y$ is the annual sales. The number of retail outlets, the number of automobiles registered, the personal income, the average age of the automobiles and the number of supervisors are defined as $x_1, x_2, x_3, x_4$ and $x_5$, respectively.

Correlation:

The correlation between two variables measures the linear relationship between those two variables.

According to the given output, the independent variable “income” has the strongest correlation with the dependent variable “sales”. The correlation coefficient between “income” and “sales” is 0.964.

Because this correlation is positive, annual sales tend to increase as personal income increases.

Multicollinearity:

In a multiple regression model, multicollinearity occurs when two or more independent variables are highly correlated with each other.

The correlations between the independent variables outlets and income and between outlets and number of automobiles are fairly strong: 0.825 and 0.775, respectively.

These correlations can cause multicollinearity in the regression model.

Multicollinearity inflates the standard errors of the regression coefficients, so the partial regression coefficients cannot be estimated precisely. It also makes it difficult to assess the relative importance of the individual independent variables.
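One simple way to screen for this condition is to flag pairs of independent variables whose correlation is large in absolute value. The sketch below uses only the two correlations quoted above; the cutoff of 0.70 is a common rule of thumb and is an assumption here, not a value taken from the solution, and `flag_correlated_pairs` is an illustrative name.

```python
def flag_correlated_pairs(correlations, threshold=0.70):
    """Return the predictor pairs whose |r| exceeds the threshold.

    The default threshold of 0.70 is an assumed rule of thumb.
    """
    return [pair for pair, r in correlations.items() if abs(r) > threshold]

# Correlations between independent variables reported in the solution.
reported = {
    ("outlets", "income"): 0.825,
    ("outlets", "automobiles"): 0.775,
}

flagged = flag_correlated_pairs(reported)
for first, second in flagged:
    print(f"Possible multicollinearity: {first} and {second}")
```

Both reported pairs involve “outlets”, which is consistent with that variable turning out to be non-significant once the others are in the model.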

b.

Expert Solution

To determine

Find the percent of the variation that is explained by the regression equation.

Answer to Problem 18CE

The approximate value of coefficient of multiple determination is 99.43%, that is, 99.43% of the variation is explained by the regression equation.

Explanation of Solution

Calculation:

From the ANOVA table, the coefficient of multiple determination is defined as

$$R^2 = \frac{SSR}{SS\ total},$$

where SSR is the regression sum of squares and SS total is the total sum of squares.

According to the output, SSR and SS total are 1,593.81 and 1,602.89, respectively.

Hence, the coefficient of multiple determination is

$$R^2 = \frac{1{,}593.81}{1{,}602.89} = 0.9943.$$

Thus, the approximate value of coefficient of multiple determination is 99.43%.

Hence, 99.43% of the variation is explained by the regression equation.

c.

Expert Solution

To determine

Perform a global hypothesis test to check whether any of the regression coefficients is not zero at 0.05 significance level.

Answer to Problem 18CE

There is strong evidence that at least one of the regression coefficients is not 0 at the 0.05 significance level.

Explanation of Solution

Calculation:

Consider that y is the dependent variable and the xi's are the independent variables, where the βi's are the corresponding population regression coefficients for i=1,2,3,4,5.

State the hypotheses:

Null hypothesis:

H0:β1=β2=β3=β4=β5=0.

That is, the model is not significant.

Alternative hypothesis:

H1:At least one βi is not equal to 0.

That is, the model is significant.

For the global test, the F test statistic is defined as

$$F = \frac{SSR/k}{SSE/(n-k-1)},$$

where SSR, SSE, n and k are the regression sum of squares, the error sum of squares, the sample size and the number of independent variables, respectively.

According to the output, the value of the F statistic is 140.36 with 5 numerator degrees of freedom (k) and 4 denominator degrees of freedom (n − k − 1).

The level of significance is α=0.05.

Decision rule:

If p-value≤α, then reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.

Conclusion:

Here, the p-value corresponding to the global test is approximately 0.

Hence, p-value(≈0)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that at least one of the regression coefficients is not 0 at the 0.05 significance level.
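The global test can be reproduced from the sums of squares. The following is a minimal sketch using only the standard library; it assumes SSR = 1,593.81 and SS total = 1,602.89 from part b, k = 5, and n = 10 (inferred from the 4 error degrees of freedom, not stated in the solution). The tail probability uses the identity P(F > f) = I_y(d2/2, d1/2) with y = d2/(d2 + d1·f), integrated numerically; in practice a statistics library function such as `scipy.stats.f.sf` would normally be used instead.

```python
import math

def f_statistic(ssr, sse, n, k):
    """Global F statistic: (SSR / k) / (SSE / (n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))

def f_upper_tail(f, d1, d2, steps=2000):
    """P(F > f) for F(d1, d2), via the regularized incomplete beta
    I_y(d2/2, d1/2) with y = d2 / (d2 + d1 * f), using Simpson's rule.
    Assumes d2 >= 2 so the integrand is finite at 0."""
    a, b = d2 / 2, d1 / 2
    y = d2 / (d2 + d1 * f)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = y / steps
    total = integrand(0.0) + integrand(y)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return (total * h / 3) / math.exp(log_beta)

SSR, SS_TOTAL = 1593.81, 1602.89
SSE = SS_TOTAL - SSR          # error sum of squares
N, K = 10, 5                  # N = 10 is inferred from n - k - 1 = 4
ALPHA = 0.05

f_value = f_statistic(SSR, SSE, N, K)
p_value = f_upper_tail(f_value, K, N - K - 1)
reject_h0 = p_value <= ALPHA

print(f"F = {f_value:.2f}, p-value = {p_value:.5f}, reject H0: {reject_h0}")
```

The F value computed from the rounded sums of squares differs slightly from the 140.36 in the output, which is based on unrounded figures; the conclusion is the same.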

d.

Expert Solution

To determine

Perform individual tests of each independent variable at 0.05 significance level.

Explain whether the independent variables “outlets” and “bosses” will be eliminated.

Answer to Problem 18CE

There is no significant relationship between y and x1, x4 and x5, whereas there is a significant relationship between y and each of x2 and x3.

The independent variables “the number of retail outlets”, “average age of automobiles” and “number of supervisors” can be eliminated.

Explanation of Solution

Calculation:

For independent variable x1:

Consider that β1 is the population regression coefficient of independent variable x1.

State the hypotheses:

Null hypothesis:

H0:β1=0.

That is, there is no significant relationship between y and x1.

Alternative hypothesis:

H1:β1≠0.

That is, there is significant relationship between y and x1.

For the test of an individual regression coefficient, the t test statistic is defined as

$$t = \frac{b_i}{s_{b_i}},$$

where $b_i$ is the i-th regression coefficient and $s_{b_i}$ is its standard error.

According to the regression output, the t statistic corresponding to x1 is –0.24 with 4 degrees of freedom (n − k − 1).

The level of significance is α=0.05.

Decision rule:

If p-value≤α, then reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.

Conclusion:

Here, the p-value corresponding to outlets (x1) is 0.823.

Hence, p-value(=0.823)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x1.

For independent variable x2:

Consider that β2 is the population regression coefficient of independent variable x2.

State the hypotheses:

Null hypothesis:

H0:β2=0.

That is, there is no significant relationship between y and x2.

Alternative hypothesis:

H1:β2≠0.

That is, there is significant relationship between y and x2.

According to the regression output, the value of the t test statistic corresponding to x2 is 3.15 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the automobiles (x2) is 0.035.

Hence, p-value(=0.035)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is significant relationship between y and x2.

For independent variable x3:

Consider that β3 is the population regression coefficient of independent variable x3.

State the hypotheses:

Null hypothesis:

H0:β3=0.

That is, there is no significant relationship between y and x3.

Alternative hypothesis:

H1:β3≠0.

That is, there is significant relationship between y and x3.

According to the regression output, the value of the t test statistic corresponding to x3 is 9.35 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the income (x3) is 0.001.

Hence, p-value(=0.001)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is significant relationship between y and x3.

For independent variable x4:

Consider that β4 is the population regression coefficient of independent variable x4.

State the hypotheses:

Null hypothesis:

H0:β4=0.

That is, there is no significant relationship between y and x4.

Alternative hypothesis:

H1:β4≠0.

That is, there is significant relationship between y and x4.

According to the regression output, the value of the t test statistic corresponding to x4 is 2.32 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the age (x4) is 0.081.

Hence, p-value(=0.081)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x4.

For independent variable x5:

Consider that β5 is the population regression coefficient of independent variable x5.

State the hypotheses:

Null hypothesis:

H0:β5=0.

That is, there is no significant relationship between y and x5.

Alternative hypothesis:

H1:β5≠0.

That is, there is significant relationship between y and x5.

According to the regression output, the t test statistic corresponding to x5 has 4 degrees of freedom; its absolute value is only about 0.18, the value implied by the reported p-value of 0.864.

Conclusion:

Here, p-value corresponding to the bosses (x5) is 0.864.

Hence, p-value(=0.864)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x5.

Because there is no significant relationship between the dependent variable and the independent variables x1 and x5, these variables are candidates for elimination.

That is, neither the number of retail outlets nor the number of supervisors is significantly related to annual sales at the 0.05 level, so “outlets” and “bosses” can be dropped from the model.

The independent variable x4 (average age of automobiles) is also not significant at the 0.05 level, although its p-value of 0.081 is much closer to the cutoff, so it could be considered for elimination as well.
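The individual tests can be cross-checked by recomputing each two-sided p-value from its t statistic. The sketch below uses only the standard library and integrates the t density numerically; the function names are illustrative, and the t statistics are the rounded values quoted for x1 to x4, so the recomputed p-values should agree with the reported ones only to within rounding. The variable x5 is left out because its decision rests on the reported p-value of 0.864.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density from 0 to |t|,
    with the integral evaluated by Simpson's rule."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * (total * h / 3)

ALPHA = 0.05
DF = 4  # n - k - 1, as stated in the solution

# t statistics quoted in the solution (rounded to two decimals).
t_stats = {
    "outlets (x1)": -0.24,
    "automobiles (x2)": 3.15,
    "income (x3)": 9.35,
    "age (x4)": 2.32,
}

p_values = {name: two_sided_p_value(t, DF) for name, t in t_stats.items()}
decisions = {
    name: "reject H0" if p <= ALPHA else "fail to reject H0"
    for name, p in p_values.items()
}

for name, t in t_stats.items():
    print(f"{name}: t = {t:6.2f}, p = {p_values[name]:.3f}, {decisions[name]}")
```

Only “automobiles” and “income” fall below the 0.05 cutoff, matching the conclusions drawn above.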


Answer 5

Textbook Question

Answer 6

Textbook Question

Answer 7

a.

Expert Solution

To determine

Find the single variable that has the strongest correlation with the dependent variable.

Explain whether the fairly strong correlations between outlets and income and between outlets and number of automobiles could be a problem.

Provide the name of the condition.

Answer to Problem 18CE

The single variable that has the strongest correlation with the dependent variable is “income”.

The name of the condition is multicollinearity.

Explanation of Solution

Multiple linear regression model:

A multiple linear regression model is given as $\hat{y} = a + b_1x_1 + b_2x_2 + b_3x_3 + \dots + b_kx_k$, where ŷ is the predicted value of the response (dependent) variable y, and x1, x2, ..., xk are the k quantitative independent variables, with k a positive integer.

Here, a is the intercept of the regression model, that is, the predicted value of y when all of the x’s are 0, and the bi’s are the slopes, that is, the change in the predicted value of y for a one-unit increase in xi when all other independent variables are held constant.

In the given problem, the dependent variable y is the annual sales. The number of retail outlets, the number of automobiles registered, personal income, the average age of automobiles and the number of supervisors are defined as x1, x2, x3, x4 and x5, respectively.

Correlation:

The correlation between two variables measures the linear relationship between those two variables.

According to the given output, the independent variable “income” has the strongest correlation with the dependent variable “sales”. The correlation coefficient between “income” and “sales” is 0.964.

This positive value implies that annual sales tend to increase as personal income increases.

Multicollinearity:

In a multiple regression model, multicollinearity occurs when two or more independent variables are highly correlated with each other.

The correlations between the independent variables outlets and income and between outlets and number of automobiles are fairly strong: 0.825 and 0.775, respectively.

These correlations can cause multicollinearity in the regression model.

Under multicollinearity, the standard errors of the coefficients are inflated, so the partial regression coefficients cannot be estimated precisely. Moreover, it becomes difficult to judge the relative importance of the individual independent variables.
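
The screening step described above can be sketched in a few lines of Python. This is only an illustration: it uses the two predictor correlations quoted in the solution, and the 0.70 cutoff is an assumed rule of thumb, not a value stated in the output. The names `predictor_correlations`, `THRESHOLD` and `flag_collinear_pairs` are hypothetical.

```python
# Correlations between independent variables quoted in the solution.
# Only these two values appear in the text; the rest of the matrix is not reproduced.
predictor_correlations = {
    ("outlets", "income"): 0.825,
    ("outlets", "automobiles"): 0.775,
}

# Assumed rule of thumb: |r| above 0.70 between two predictors deserves a closer look.
THRESHOLD = 0.70


def flag_collinear_pairs(correlations, threshold=THRESHOLD):
    """Return the predictor pairs whose absolute correlation exceeds the threshold."""
    return sorted(pair for pair, r in correlations.items() if abs(r) > threshold)


flagged = flag_collinear_pairs(predictor_correlations)
for first, second in flagged:
    r = predictor_correlations[(first, second)]
    print(f"{first} vs {second}: r = {r}")
```

Both quoted pairs exceed the assumed cutoff, which matches the concern raised about “outlets”.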

b.

Expert Solution

To determine

Find the percent of the variation that is explained by the regression equation.

Answer to Problem 18CE

The coefficient of multiple determination is approximately 99.43%; that is, 99.43% of the variation is explained by the regression equation.

Explanation of Solution

Calculation:

From the ANOVA table, the coefficient of multiple determination is defined as

$R^2 = \dfrac{SSR}{SS\ total}$,

where SSR is the regression sum of squares and SS total is the total sum of squares.

According to the output, the SSR and SS total are 1,593.81 and 1,602.89, respectively.

Hence, the coefficient of multiple determination is

$R^2 = \dfrac{1{,}593.81}{1{,}602.89} = 0.9943$.

Thus, the coefficient of multiple determination is approximately 99.43%.

Hence, 99.43% of the variation is explained by the regression equation.
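
As a quick check of the arithmetic, the ratio can be reproduced with a minimal Python sketch. It uses SSR = 1,593.81, the value that appears in the ratio and reproduces 0.9943; the function name `r_squared` is a hypothetical helper.

```python
def r_squared(ssr, ss_total):
    """Coefficient of multiple determination: share of total variation explained."""
    return ssr / ss_total


ssr_full = 1593.81   # regression sum of squares, all five variables
ss_total = 1602.89   # total sum of squares

r2_full = r_squared(ssr_full, ss_total)
print(f"R^2 = {r2_full:.4f} ({r2_full:.2%})")  # R^2 = 0.9943 (99.43%)
```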

c.

Expert Solution

To determine

Perform a global hypothesis test to check whether any of the regression coefficients is not zero at the 0.05 significance level.

Answer to Problem 18CE

There is strong evidence that at least one of the regression coefficients is not 0 at the 0.05 significance level.

Explanation of Solution

Calculation:

Let y be the dependent variable and the xi's the independent variables, with βi the population regression coefficient corresponding to xi for i=1,2,3,4,5.

State the hypotheses:

Null hypothesis:

H0:β1=β2=β3=β4=β5=0.

That is, the model is not significant.

Alternative hypothesis:

H1:At least one βi is not equal to 0.

That is, the model is significant.

For the global test, the F test statistic is defined as

$F = \dfrac{SSR/k}{SSE/(n-k-1)}$,

where SSR, SSE, n and k are the regression sum of squares, the error sum of squares, the sample size and the number of independent variables, respectively.

According to the output, the value of the F statistic is 140.36, with 5 numerator degrees of freedom (k) and 4 denominator degrees of freedom (n − k − 1).

The level of significance is α=0.05.

Decision rule:

If p-value≤α, reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.

Conclusion:

Here, the p-value corresponding to the global test is 0 to the reported precision.

Hence, p-value(≈0)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that at least one of the regression coefficients is not 0 at the 0.05 significance level.
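
The global test can also be sketched numerically. The following Python snippet is an illustration under stated assumptions: SSE is taken as SS total minus SSR, n = 10 is inferred from the 4 error degrees of freedom used in the individual t tests (n − k − 1 = 4 with k = 5), and the critical value 6.26 is the standard F-table entry for α = 0.05 with (5, 4) degrees of freedom. The helper name `global_f` is hypothetical.

```python
def global_f(ssr, sse, n, k):
    """Global F statistic: mean square regression over mean square error."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


ssr = 1593.81
ss_total = 1602.89
sse = ss_total - ssr   # about 9.08 (assumes SS total = SSR + SSE)
k = 5                  # number of independent variables
n = 10                 # assumption: inferred from n - k - 1 = 4

f_stat = global_f(ssr, sse, n, k)

F_CRITICAL = 6.26      # F-table value for alpha = 0.05 with (5, 4) degrees of freedom
reject_h0 = f_stat > F_CRITICAL

print(f"F = {f_stat:.2f}, critical value = {F_CRITICAL}, reject H0: {reject_h0}")
```

Because the sums of squares are rounded to two decimals, this recomputation gives roughly 140.4 rather than exactly the 140.36 printed in the output; the decision to reject the null hypothesis is the same either way.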

d.

Expert Solution

To determine

Perform individual tests of each independent variable at the 0.05 significance level.

Explain whether the independent variables “outlets” and “bosses” should be eliminated.

Answer to Problem 18CE

There is no significant relationship between y and x1, x4 and x5, whereas there is a significant relationship between y and x2 and x3.

The independent variables “the number of retail outlets”, “average age of automobiles” and “number of supervisors” can be eliminated.

Explanation of Solution

Calculation:

For independent variable x1:

Consider that β1 is the population regression coefficient of independent variable x1.

State the hypotheses:

Null hypothesis:

H0:β1=0.

That is, there is no significant relationship between y and x1.

Alternative hypothesis:

H1:β1≠0.

That is, there is a significant relationship between y and x1.

For the test of an individual regression coefficient, the t test statistic is defined as

$t = \dfrac{b_i - 0}{s_{b_i}}$,

where $b_i$ and $s_{b_i}$ are the i-th regression coefficient and the standard error of the i-th regression coefficient, respectively.

According to the given output, the t statistic corresponding to x1 is –0.24 with 4 degrees of freedom.

The level of significance is α=0.05.

Decision rule:

If p-value≤α, reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.

Conclusion:

Here, the p-value corresponding to outlets (x1) is 0.823.

Hence, p-value(=0.823)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x1.

For independent variable x2:

Consider that β2 is the population regression coefficient of independent variable x2.

State the hypotheses:

Null hypothesis:

H0:β2=0.

That is, there is no significant relationship between y and x2.

Alternative hypothesis:

H1:β2≠0.

That is, there is a significant relationship between y and x2.

According to the given regression output, the t test statistic corresponding to x2 is 3.15 with 4 degrees of freedom.

Conclusion:

Here, the p-value corresponding to automobiles (x2) is 0.035.

Hence, p-value(=0.035)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is a significant relationship between y and x2.

For independent variable x3:

Consider that β3 is the population regression coefficient of independent variable x3.

State the hypotheses:

Null hypothesis:

H0:β3=0.

That is, there is no significant relationship between y and x3.

Alternative hypothesis:

H1:β3≠0.

That is, there is a significant relationship between y and x3.

According to the given regression output, the t test statistic corresponding to x3 is 9.35 with 4 degrees of freedom.

Conclusion:

Here, the p-value corresponding to income (x3) is 0.001.

Hence, p-value(=0.001)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is a significant relationship between y and x3.

For independent variable x4:

Consider that β4 is the population regression coefficient of independent variable x4.

State the hypotheses:

Null hypothesis:

H0:β4=0.

That is, there is no significant relationship between y and x4.

Alternative hypothesis:

H1:β4≠0.

That is, there is a significant relationship between y and x4.

According to the given regression output, the t test statistic corresponding to x4 is 2.32 with 4 degrees of freedom.

Conclusion:

Here, the p-value corresponding to age (x4) is 0.081.

Hence, p-value(=0.081)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x4.

For independent variable x5:

Consider that β5 is the population regression coefficient of independent variable x5.

State the hypotheses:

Null hypothesis:

H0:β5=0.

That is, there is no significant relationship between y and x5.

Alternative hypothesis:

H1:β5≠0.

That is, there is a significant relationship between y and x5.

According to the given regression output, the t test statistic corresponding to x5 has an absolute value of approximately 0.18 with 4 degrees of freedom, which is the magnitude implied by its p-value of 0.864.

Conclusion:

Here, the p-value corresponding to bosses (x5) is 0.864.

Hence, p-value(=0.864)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x5.

As there is no significant relationship between the dependent variable and the independent variables x1 and x5, it is better to eliminate these variables.

Hence, it can be said that there is no significant relationship between the annual sales and either the number of retail outlets or the number of supervisors. Thus, it is better to omit the independent variables “the number of retail outlets” and “the number of supervisors”.

Moreover, there is no significant relationship between the dependent variable and the independent variable x4 at the 0.05 level (p-value = 0.081), so this variable is also a candidate for elimination.

Hence, it can be said that there is no significant relationship between the annual sales and the average age of automobiles. Thus, the independent variable “average age of automobiles” can also be omitted.
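
The five decisions above all apply the same rule, so they can be summarized in a short Python sketch. The p-values are the ones quoted in the solution; the dictionary keys and the helper `split_by_significance` are hypothetical names chosen for illustration.

```python
ALPHA = 0.05

# p-values of the individual t tests, as quoted in the solution
p_values = {
    "outlets": 0.823,      # x1
    "automobiles": 0.035,  # x2
    "income": 0.001,       # x3
    "age": 0.081,          # x4
    "bosses": 0.864,       # x5
}


def split_by_significance(p_values, alpha=ALPHA):
    """Apply the decision rule: reject H0 (coefficient = 0) when p-value <= alpha."""
    significant = [name for name, p in p_values.items() if p <= alpha]
    not_significant = [name for name, p in p_values.items() if p > alpha]
    return significant, not_significant


significant, not_significant = split_by_significance(p_values)
print("Significant:", significant)
print("Candidates for elimination:", not_significant)
```

Note that “age” fails the test only narrowly compared with “outlets” and “bosses”, whose p-values are far above 0.05.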

e.

Expert Solution

To determine

Find the coefficient of determination.

Find the change in R² from the previous analysis.

Answer to Problem 18CE

The coefficient of multiple determination is approximately 99.42%; that is, 99.42% of the variation is explained by the regression equation. R² decreases by only 0.01 percentage point from the previous analysis.

Explanation of Solution

Calculation:

From the ANOVA table, the coefficient of multiple determination is defined as

$R^2 = \dfrac{SSR}{SS\ total}$,

where SSR is the regression sum of squares and SS total is the total sum of squares.

According to the output after eliminating “outlets” and “bosses”, the SSR and SS total are 1,593.66 and 1,602.89, respectively.

Hence, the coefficient of multiple determination is

$R^2 = \dfrac{1{,}593.66}{1{,}602.89} = 0.9942$.

Thus, the coefficient of multiple determination is approximately 99.42%.

Hence, R² decreases by only 0.01 percentage point (99.43% − 99.42%) from the previous analysis.
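
The comparison of the two models can be reproduced with a few lines of Python. This is a minimal sketch using the sums of squares quoted in parts b and e (with SSR = 1,593.81 for the five-variable model); the variable names are hypothetical.

```python
ss_total = 1602.89      # total sum of squares (same for both models)
ssr_full = 1593.81      # SSR with all five independent variables
ssr_reduced = 1593.66   # SSR after eliminating "outlets" and "bosses"

r2_full = ssr_full / ss_total
r2_reduced = ssr_reduced / ss_total

# Change expressed in percentage points
change_points = (r2_full - r2_reduced) * 100

print(f"Full model:    R^2 = {r2_full:.4f}")
print(f"Reduced model: R^2 = {r2_reduced:.4f}")
print(f"Decrease: {change_points:.2f} percentage point")
```

Dropping two variables costs almost nothing in explained variation, which supports the simpler model.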

f.

Expert Solution

To determine

Explain whether the normality assumption appears reasonable.

Explanation of Solution

Checking the normality assumption from a histogram:

The majority of the observations should lie in the middle, centered on the mean of 0.
There should be lower frequencies in the tails of the distribution.

In the given histogram, most of the observations are centered on the mean of 0 and the frequencies are lower in the tails of the distribution.

Hence, the normality assumption appears reasonable.
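
To show how such a histogram is built, here is a minimal Python sketch that bins residuals and prints a text histogram. The residual values are hypothetical, made up purely for illustration; they are not the residuals from this exercise. The helper `bin_residuals` is also a hypothetical name.

```python
import math
from collections import Counter


def bin_residuals(residuals, width=1.0):
    """Count residuals per bin; each bin covers [edge, edge + width)."""
    counts = Counter(math.floor(r / width) * width for r in residuals)
    return dict(sorted(counts.items()))


# Hypothetical residuals for illustration only (they sum to zero, as OLS residuals do)
example_residuals = [-1.6, -0.9, -0.5, -0.3, -0.1, 0.2, 0.4, 0.6, 0.8, 1.4]

width = 1.0
histogram = bin_residuals(example_residuals, width)
for edge, count in histogram.items():
    print(f"[{edge:+.1f}, {edge + width:+.1f}): {'*' * count}")
```

In this made-up sample, most values fall in the two bins next to 0 and only one value lands in each tail, which is the pattern one looks for when judging normality by eye.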

g.

Expert Solution

To determine

Describe the residual plot and explain whether any assumptions are violated.

Explanation of Solution

Conditions checked in the residual analysis of a regression model:

In the plot of the residuals against the fitted values ŷ, the points should fall roughly in a horizontal band that is symmetric about the horizontal axis (residual = 0).
In a normal probability plot, the residuals should fall roughly along a straight line.
There should not be any observable pattern.

In the given residual plot, the points fall roughly in a horizontal band and are more or less symmetric about the horizontal axis. Moreover, there is no particular pattern in the plot; the residuals appear haphazard and random.

Hence, no violation of the regression assumptions is apparent from the residual plot.

Answer 30

d.

Expert Solution

Answer 31

d.

Expert Solution

Answer 32

Expert Solution

Answer 33

To determine

Perform individual tests of each independent variable at 0.05 significance level.

Explain whether the independent variables “outlets” and “bosses” will be eliminated.

Answer 34

Answer to Problem 18CE

There is no significant relation between y and x1,x4 and x5, whereas there is significant relation between y and x2, and x3.

The independent random variables “the number of retail outlets”, “average age of automobiles” and “number of supervisors” can be eliminated.

Answer 35

Explanation of Solution

Calculation:

For independent variable x1:

Consider that β1 is the population regression coefficient of independent variable x1.

State the hypotheses:

Null hypothesis:

H0:β1=0.

That is, there is no significant relationship between y and x1.

Alternative hypothesis:

H1:β1≠0.

That is, there is significant relationship between y and x1.

In case of individual regression coefficient test the t test statistic is defined as,

t=bisbi, where bi and sbi are the i^th regression coefficient and the standard deviation of the i^th regression coefficient.

According to the given information the t statistic value corresponding to x1 is –0.24 with 4 degrees of freedom.

The level of significance is α=0.05.

Decision rule:

If p-value≤α, then reject the null hypothesis.
Otherwise failed to reject the null hypothesis.

Conclusion:

Here, p-value corresponding to the outlets (x1) is 0.823

Hence, p-value(=0.823)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x1.

For independent variable x2:

Consider that β2 is the population regression coefficient of independent variable x2.

State the hypotheses:

Null hypothesis:

H0:β2=0.

That is, there is no significant relationship between y and x2.

Alternative hypothesis:

H1:β2≠0.

That is, there is significant relationship between y and x2.

According to the given ANOVA table the value of t test statistic corresponding to x2 is 3.15 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the automobiles (x2) is 0.035.

Hence, p-value(=0.035)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is significant relationship between y and x2.

For independent variable x3:

Consider that β3 is the population regression coefficient of independent variable x3.

State the hypotheses:

Null hypothesis:

H0:β3=0.

That is, there is no significant relationship between y and x3.

Alternative hypothesis:

H1:β3≠0.

That is, there is significant relationship between y and x3.

According to the given ANOVA table the value of t test statistic corresponding to x3 is 9.35 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the income (x3) is 0.001.

Hence, p-value(=0.001)<α(=0.05).

That is, the p-value is less than the level of significance.

Therefore, reject the null hypothesis.

Hence, it can be concluded that there is significant relationship between y and x3.

For independent variable x4:

Consider that β4 is the population regression coefficient of independent variable x4.

State the hypotheses:

Null hypothesis:

H0:β4=0.

That is, there is no significant relationship between y and x4.

Alternative hypothesis:

H1:β4≠0.

That is, there is significant relationship between y and x4.

According to the given ANOVA table the value of t test statistic corresponding to x4 is 2.32 with 4 degrees of freedom.

Conclusion:

Here, p-value corresponding to the age (x4) is 0.081.

Hence, p-value(=0.081)>α(=0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x4.

For independent variable x5:

Consider that β4 is the population regression coefficient of independent variable x5.

State the hypotheses:

Null hypothesis:

H0:β4=0.

That is, there is no significant relationship between y and x5.

Alternative hypothesis:

H1:β4≠0.

That is, there is significant relationship between y and x5.

According to the given ANOVA table the value of t test statistic corresponding to x5 is 2.32 with 4 degrees of freedom.

Conclusion:

Here, the p-value corresponding to bosses (x5) is 0.864.

Hence, p-value (= 0.864) > α (= 0.05).

That is, the p-value is greater than the level of significance.

Therefore, fail to reject the null hypothesis.

Hence, it can be concluded that there is no significant relationship between y and x5.

As there is no significant relationship between the dependent variable and the independent variables x1 and x5, it is better to eliminate these variables.

That is, annual sales are not significantly related to the number of retail outlets or to the number of supervisors, so the independent variables “outlets” and “bosses” can be omitted.

Moreover, as there is no significant relationship between the dependent variable and the independent variable x4, it is better to eliminate this variable as well.

That is, annual sales are not significantly related to the average age of the automobiles, so the independent variable “age” can also be omitted.

Answer 36

e.

Expert Solution

To determine

Find the coefficient of determination.

Find the change in R² from the previous analysis.

Answer to Problem 18CE

The approximate value of the coefficient of multiple determination is 99.42%; that is, about 99.42% of the variation is explained by the regression equation. This is only 0.01 percentage points lower than the 99.43% of the previous analysis.

Explanation of Solution

Calculation:

In terms of the ANOVA table, the coefficient of multiple determination is defined as

$$R^2 = \frac{\text{SSR}}{\text{SS total}},$$

where SSR is the regression sum of squares and SS total is the total sum of squares.

According to the output after eliminating “outlets” and “bosses”, the SSR and SS total are 1,593.66 and 1,602.89, respectively.

Hence, the coefficient of multiple determination is

$$R^2 = \frac{1{,}593.66}{1{,}602.89} = 0.9942.$$

Thus, the approximate value of the coefficient of multiple determination is 99.42%.

Hence, R² has changed by only 0.01 percentage points (= 99.43 − 99.42) from the previous analysis.
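
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation: the sums of squares and the earlier 99.43% figure are taken from the text, while the function name `coefficient_of_determination` is an illustrative choice.

```python
def coefficient_of_determination(ssr, ss_total):
    """R^2 = SSR / SS total, the share of total variation explained by the regression."""
    return ssr / ss_total


ssr = 1593.66            # regression sum of squares after dropping "outlets" and "bosses"
ss_total = 1602.89       # total sum of squares
r2_previous_pct = 99.43  # R^2 of the five-variable model, in percent (from the text)

r2 = coefficient_of_determination(ssr, ss_total)
r2_pct = round(100 * r2, 2)
change = round(r2_previous_pct - r2_pct, 2)

print(f"R^2 = {r2:.4f} ({r2_pct}%)")
print(f"Change from the previous analysis: {change} percentage points")
```

The comparison uses the rounded percentages, as the solution does, so the reported change is 0.01 percentage points.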

Answer 43

f.

Expert Solution

To determine

Explain whether the normality assumption appears reasonable.

Explanation of Solution

Checking the normality assumption from a histogram:

The majority of the observations should lie in the middle, centered on the mean of 0.
There should be lower frequencies in the tails of the distribution.

According to the given histogram, most of the observations are centered on the mean of 0 and there are lower frequencies in the tails of the distribution.

Hence, the normality assumption appears reasonable.
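
The two visual criteria can be illustrated with a rough sketch in Python. The residuals below are hypothetical values made up for illustration (they are not the exercise's data), and `bin_counts` and `looks_bell_shaped` are illustrative helper names. The check is a crude stand-in for reading a histogram, not a formal normality test.

```python
from collections import Counter

# Hypothetical residuals for illustration only; NOT the exercise's data.
residuals = [-1.6, -0.9, -0.4, -0.2, 0.1, 0.3, 0.4, 0.8, 1.8, -0.3]


def bin_counts(values, width=1.0):
    """Count values per bin index k; bin k covers [(k - 0.5) * width, (k + 0.5) * width)."""
    return Counter(round(v / width) for v in values)


def non_increasing(seq):
    """True if no element is larger than the one before it."""
    return all(a >= b for a, b in zip(seq, seq[1:]))


def looks_bell_shaped(counts):
    """Crude check: counts never rise when moving from the bin at 0 toward either tail."""
    lo, hi = min(counts), max(counts)
    right = [counts.get(k, 0) for k in range(0, hi + 1)]
    left = [counts.get(k, 0) for k in range(0, lo - 1, -1)]
    return non_increasing(right) and non_increasing(left)


counts = bin_counts(residuals)
# Text histogram: one row per bin, one star per observation
for k in range(min(counts), max(counts) + 1):
    print(f"{k:+d} | {'*' * counts.get(k, 0)}")
print("Roughly bell-shaped around 0:", looks_bell_shaped(counts))
```

With these made-up residuals, most observations fall in the bin centered on 0 and only single observations fall in the outer bins, which is the pattern described above.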

Answer 49

g.

Expert Solution

To determine

Describe the residual plot and explain whether any assumptions are violated.

Explanation of Solution

Assumptions for residual analysis of the regression model:

In a plot of the residuals against the fitted values (or against a predictor variable), the points should fall roughly in a horizontal band that is symmetric about the x-axis.
In a normal probability plot, the residuals should be roughly linear.
There should not be any observable pattern.

According to the given residual plot, the points fall roughly in a horizontal band and are more or less symmetric about the x-axis. Moreover, there is no particular pattern in the residual plot; the scatter appears completely haphazard and random.

Hence, the regression assumptions do not appear to be violated.
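
The "horizontal band" reading can be sketched numerically. The (fitted value, residual) pairs below are hypothetical and serve only as an illustration (they are not the exercise's data), and `band_summary` is an illustrative helper name. The idea is to split the points at the median fitted value and compare the mean and spread of the residuals in the two halves; similar values in both halves are consistent with a horizontal band.

```python
import statistics

# Hypothetical (fitted value, residual) pairs for illustration only; NOT the exercise's data.
points = [(6.0, 0.4), (10.5, -0.9), (15.0, 0.3), (20.0, -0.2), (24.5, 1.0),
          (29.0, -1.1), (33.0, 0.8), (37.5, -0.3), (42.0, 0.5), (46.0, -0.5)]


def band_summary(points):
    """Split the points at the median fitted value; return (mean, stdev) of residuals per half."""
    ordered = sorted(points)  # sorts by fitted value
    half = len(ordered) // 2
    summary = {}
    for label, part in (("low fitted", ordered[:half]), ("high fitted", ordered[half:])):
        res = [r for _, r in part]
        summary[label] = (statistics.mean(res), statistics.stdev(res))
    return summary


summary = band_summary(points)
for label, (mean, spread) in summary.items():
    print(f"{label}: mean residual = {mean:+.2f}, stdev = {spread:.2f}")

# A spread ratio far from 1 would hint at a funnel shape (non-constant variance)
ratio = summary["high fitted"][1] / summary["low fitted"][1]
print(f"spread ratio (high / low) = {ratio:.2f}")
```

For these made-up points the mean residual is close to 0 in both halves and the spreads are similar, which is what a patternless horizontal band looks like in numbers.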
