Unit 6, Ch.13 & 14 Homework
School
Park University
Course
MG315
Subject
Industrial Engineering
Date
Dec 6, 2023
Type
docx
Pages
12
Uploaded by ConstableHare1868
Unit 06 | Ch. 13 &14 HW
1.
What is the definition of the standard error of estimate?
a.
The dispersion (scatter) of observed values around the line of regression for a given X.
2.
Which of the following is NOT an example of correlation analysis?
a.
Hypothesis testing for equality of means.
3.
Which of the following is usually the first step in a correlation analysis?
a.
Making a scatter diagram.
4.
Which two of the following describe the independent variable in a relationship between two variables?
a.
It is used to predict the other variable.
b.
On a scatter diagram, it is the horizontal axis.
5.
What symbol is used for the Pearson correlation coefficient, which shows the strength of the relation between two variables?
a.
r
6.
In analyzing the strength of the relationship between two variables, what does the symbol r represent?
a.
The Pearson correlation coefficient.
7.
Which of the following symbolizes the standard error of estimate?
a.
$s_{y \cdot x}$
8.
What is Correlation Analysis?
a.
A group of techniques to measure the relationship between two variables.
9.
Place the following steps in correlation analysis in the order that makes the most sense.
a.
Make a scatter diagram, calculate a correlation coefficient, and draw a least-squares fit line.
10.
If two variables are correlated to each other, which two of the following are characteristics of the dependent variable?
a.
It is usually shown on the vertical axis of a scatter diagram.
b.
In a cause-and-effect relationship, it is the effect.
11.
What values can the correlation coefficient assume?
a.
-1≤ r ≤ 1
12.
A scatter diagram in which the points move from the bottom left to the upper right would be characterized by what type of correlation coefficient?
a.
Positive
13.
A study of finishing time versus standardized test scores found a correlation coefficient of r = 0.13. How would you describe this relationship?
a.
A weak positive correlation.
14.
A study found a correlation of r = 0.68 between the size of someone's vocabulary and their income. What can you reasonably conclude?
a.
A third variable may be related to both vocabulary and income.
b.
Reason: For example, level of education may be related to both vocabulary and income.
15.
Which of these statements correctly describes the values that can be assumed by the correlation coefficient, r?
a.
It can be any number from -1 to +1, inclusive.
16.
Which of the following is the correct null hypothesis for the test of a sample correlation?
a.
The population correlation is zero.
b.
Reason: $H_0: \rho = 0$, $H_1: \rho \neq 0$
17.
For two variables the correlation coefficient is found to be nearly equal to zero. How would you describe the relationship between the two variables?
a.
There is very little, if any, relationship between the variables.
18.
Which of the following is the test statistic for the correlation coefficient?
a.
$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with $n - 2$ degrees of freedom
19.
An experiment of study times versus test scores found a correlation coefficient of r = 0.49. How would you describe this relationship?
a.
A moderate positive correlation.
20.
What is the best definition of a regression equation?
a.
An equation that expresses the linear relation between two variables.
21.
A study found a correlation of r = 0.7 between watching Netflix and grade point average. What can you reasonably conclude?
a.
There may be a third variable involved in the relationship.
b.
Reason: For example, smarter people have more time to watch movies.
22.
A line is drawn through the points on a scatter diagram. Which three of the following are not likely to be a least squares fit?
a.
Nearly all of the data points are below the line.
b.
The line passes through the largest and smallest data points.
c.
All of the data points are above the line.
23.
Which of the following is a valid null hypothesis for the test of significance of the correlation coefficient?
a.
$H_0: \rho = 0$
24.
The formula for the test of significance of the sample correlation is: $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$. Match the variables to their description.
a.
t = t-distribution test statistic
b.
n – 2 = degrees of freedom
c.
r = sample correlation
d.
n = sample size
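The test statistic described by these variables can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual form t = r·sqrt(n − 2)/sqrt(1 − r²); the function name and the sample values (r = 0.49, n = 27) are hypothetical and are not taken from the homework.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """Return t = r * sqrt(n - 2) / sqrt(1 - r^2) for a sample correlation r."""
    if n <= 2:
        raise ValueError("need n > 2 so that n - 2 degrees of freedom is positive")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical sample: r = 0.49 measured on n = 27 paired observations.
t_value = correlation_t_statistic(0.49, 27)
degrees_of_freedom = 27 - 2
print(f"t = {t_value:.3f} with {degrees_of_freedom} degrees of freedom")
```

The computed t would then be compared with a critical value from the t-distribution with n − 2 degrees of freedom to decide whether to reject the null hypothesis that the population correlation is zero.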
25.
The general form of the regression equation is written like this: $\hat{Y} = a + bX$. Match the variables to their description.
a.
Ŷ= estimated Y value
b.
X = the independent variable
c.
a = the Y intercept
d.
b = slope of the line
26.
Which of the following is the equation for the slope of the regression line?
a.
$b = r \frac{s_y}{s_x}$
27.
When a line is drawn on a scatter diagram using the least squares principle, what is the quantity that is minimized?
a.
The sum of the squared difference between the line and the data points.
28.
Which of these is the equation for the y-intercept of the regression line?
a.
$a = \bar{Y} - b\bar{X}$
29.
The equation for the y-intercept of the regression line is: $a = \bar{Y} - b\bar{X}$. Drag and drop the descriptions against their corresponding variables.
a.
a = the y-intercept
b.
b = the slope of the regression line
c.
$\bar{Y}$ = mean of the dependent variable
d.
$\bar{X}$ = mean of the independent variable
30.
The general form of the regression equation is written like this: $\hat{Y} = a + bX$. Why is the dependent variable written as Ŷ instead of just Y?
a.
The hat is to emphasize that the equation estimates the Y-value for a given X.
31.
If X is the size (in square feet) of a home and Y is its sales price and the regression equation relating them is Ŷ = $92,000 + 86X, what is the predicted sales price of a home when X=0? Assume that all homes used to build the model were between 1,800 and 2,500 square feet.
a.
$92,000
b.
Reason: X=0 means there is no home.
32.
The slope of the regression line is given by $b = r \frac{s_y}{s_x}$. Match the variables to their description.
a.
b = the slope
b.
$s_y$ = standard deviation of sample Y values
c.
$s_x$ = standard deviation of sample X values
d.
r = sample correlation coefficient
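As a rough illustration of how the slope and y-intercept follow from r, the two sample standard deviations and the two means, the sketch below assumes the usual forms b = r·(s_y/s_x) and a = Ȳ − b·X̄. The helper names and the five data points are made up for the example.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Return (a, b) using b = r * s_y / s_x and a = mean(Y) - b * mean(X)."""
    b = pearson_r(xs, ys) * sample_std(ys) / sample_std(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"Y-hat = {a:.2f} + {b:.2f} X")
```

For this made-up data the line works out to an intercept of about 2.2 and a slope of about 0.6.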
33.
Which of the following regression equations looks like it matches the scatter diagram below?
a.
Ŷ = 14.4 - 1.9X.
34.
A test for the slope of the regression line uses the hypotheses $H_0: \beta = 0$, $H_1: \beta \neq 0$. What are we seeking to discover with this test?
a.
If the regression line has predictive power for the dependent variable.
35.
A test for the slope of the regression line uses the hypotheses $H_0: \beta = 0$, $H_1: \beta \neq 0$. If we reject $H_0$, what have we demonstrated about the regression line?
a.
The regression line has some power to predict the value of the dependent variable.
36.
If X is the amount a grocer spends on advertising and Y is his gross sales and the regression equation relating them is
Ŷ = 23044 + 10.4X, what is his predicted income if he spends $4000 on advertising?
a.
$64,644
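The arithmetic behind this answer is a direct substitution into the regression equation. A minimal sketch follows; the function name `predict` is illustrative only.

```python
def predict(intercept: float, slope: float, x: float) -> float:
    """Evaluate the fitted line Y-hat = intercept + slope * x."""
    return intercept + slope * x


# Y-hat = 23044 + 10.4 X, evaluated at X = 4000 dollars of advertising.
predicted_sales = predict(23044, 10.4, 4000)
print(f"Predicted gross sales: ${predicted_sales:,.0f}")
```

The slope term contributes 10.4 × 4000 = 41,600, which is added to the intercept of 23,044.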
37.
The equation for the test for the slope of a regression line is: $t = \frac{b - 0}{s_b}$. Match the variables and their description for this equation.
a.
t = the test statistic
b.
n – 2 = degrees of freedom for t
c.
b = the slope of the regression line
d.
$s_b$ = standard error of the slope
38.
Which of the following regression equations looks like it matches the scatter diagram below?
a.
Ŷ = 3.0 + 1.6X.
39.
Which of the following tests gives the same result as a test of the regression line slope?
a.
The t-test for the correlation coefficient.
b.
Reason: They are mathematically the same.
40.
Compare the test for the slope of the regression line and the test for the correlation coefficient.
a.
They are mathematically the same and give the same result.
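A quick numerical check can make this equivalence concrete. The sketch below computes both test statistics on a small made-up data set. It assumes the standard textbook formula for the standard error of the slope, s_b = s_{y·x} / sqrt(Σ(x − x̄)²), which is not spelled out in this homework.

```python
import math


def slope_and_correlation_t(xs, ys):
    """Return (t for the slope, t for the correlation) from the same sample."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    # Least squares line and its standard error of estimate.
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))

    # Assumed formula for the standard error of the slope.
    s_b = s_yx / math.sqrt(sxx)
    t_slope = (b - 0) / s_b

    # t-test for the correlation coefficient.
    r = sxy / math.sqrt(sxx * syy)
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t_slope, t_corr


# Hypothetical data.
t_slope, t_corr = slope_and_correlation_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"t (slope) = {t_slope:.4f}, t (correlation) = {t_corr:.4f}")
```

Both values agree up to floating-point rounding, and both are compared against the same t-distribution with n − 2 degrees of freedom.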
41.
What is the equation for the standard error of estimate?
a.
$s_{y \cdot x} = \sqrt{\frac{\sum (y - \hat{Y})^2}{n - 2}}$
42.
The equation for the standard error of estimate is: $s_{y \cdot x} = \sqrt{\frac{\sum (y - \hat{Y})^2}{n - 2}}$. Match the variables to their descriptions.
a.
$s_{y \cdot x}$ = standard error of Y for given X
b.
Ŷ = estimated Y for a given X
c.
n – 2 = df: sample size minus 2
d.
y = an observed value of Y
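The quantities matched above combine as follows: square each residual y − Ŷ, sum the squares, divide by n − 2, and take the square root. A minimal sketch, with made-up observations and a hypothetical fitted line Ŷ = 2.2 + 0.6X:

```python
import math


def standard_error_of_estimate(ys, y_hats):
    """Return sqrt(sum((y - y_hat)^2) / (n - 2)) for simple linear regression."""
    n = len(ys)
    if n <= 2:
        raise ValueError("need more than two observations")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    return math.sqrt(sse / (n - 2))


# Hypothetical observations and a hypothetical fitted line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hats = [2.2 + 0.6 * x for x in xs]
s_yx = standard_error_of_estimate(ys, y_hats)
print(f"standard error of estimate = {s_yx:.4f}")
```

A smaller result means the observed points sit closer to the fitted line.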
43.
Which of the following is true regarding the standard error of the estimate?
a.
The smaller the standard error of the estimate, the closer the points are to a straight line.
44.
Which of the following statistical distributions is used for the test for the slope of the regression equation?
a.
Student's t-distribution
b.
Reason: $t = (b - 0) / s_b$
45.
What is the definition of the coefficient of determination?
a.
The proportion of the total variation in Y that is explained by the variation in X.
46.
What is the term that is used for the proportion of the total variation in Y that is explained by the variation in X?
a.
The Coefficient of Determination
47.
How do you calculate the coefficient of determination?
a.
The coefficient of determination, $R^2$, is the square of the correlation coefficient, r.
48.
What is the symbol for the coefficient of determination?
a.
$R^2$, equal to r squared.
49.
In evaluating a regression equation, what does it mean if the standard error of estimate is small? Choose all that apply.
a.
The data is close to the regression line.
b.
The predicted Y will have small error.
50.
The correlation between wait time on a helpline and customer satisfaction was found to be r = -0.85. What percentage of variation in customer satisfaction can be predicted by wait time using a regression line?
a.
72%
b.
Reason: $R^2 = (-0.85)^2 \approx 0.72$
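Written out in full, the arithmetic for this answer looks like the following. Note that the negative sign of r disappears when it is squared, so a strong negative correlation explains as much variation as an equally strong positive one.

```latex
R^2 = r^2 = (-0.85)^2 = 0.7225 \approx 72\%
```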
51.
How is the standard error of the estimate calculated from ANOVA information?
a.
$s_{y \cdot x} = \sqrt{\frac{SSE}{n - 2}}$
52.
How is the coefficient of determination calculated from ANOVA information?
a.
$R^2 = \frac{SSR}{SS\ total} = 1 - \frac{SSE}{SS\ total}$
53.
Which two of the following are statistics that regression analysis provides to evaluate the predictive ability of the regression equation?
a.
The standard error of the estimate.
b.
The coefficient of determination.
54.
In regression analysis it is assumed that for any given X the Y values are normally distributed (the "Normality" assumption). What else is assumed about these distributions?
Choose all that apply.
a.
Their means lie on the regression line.
b.
They are independent.
c.
They have equal standard deviations.
55.
The correlation between the weight of an automobile and its gas mileage was found to be r = 0.77. What percentage of variation in mileage can be predicted from a car's weight using a regression line?
a.
59%
b.
Reason: $R^2 = 0.77^2 \approx 0.59$
56.
Assume you have obtained a regression model to predict the sales price of a house based on the house's square footage. If the standard error of the estimate was found to be $8,400, which of the following would be true?
a.
95% of your predictions would be within $16,800 of the actual value.
57.
Assume you have obtained a regression model to predict the sales revenue based on marketing expenditures. If the standard error of the estimate was found to be $12,200, which of the following would be true?
a.
68% of your predictions would be within $12,200 of the actual value.
58.
If the standard error of estimate for a regression line is large, what would you expect for the coefficient of determination?
a.
The coefficient of determination should be small.
59.
In order to properly apply regression analysis, what assumption must be made about the distribution of Y values?
a.
For each X value, the Y values are normally distributed.
60.
How is the "confidence interval" used as a part of regression analysis?
a.
It is used when predicting the mean value of Y for a given X.
61.
How is the "prediction interval" used as a part of regression analysis?
a.
It is used when predicting a particular value of Y for a given X.
62.
In regression analysis it is assumed that for any given X the Y values are normally distributed (the "Normality" assumption). What else is assumed about these distributions?
Choose all that apply.
a.
They are independent.
b.
They have equal standard deviations.
c.
Their means lie on the regression line.
63.
Which do we expect to be larger, the confidence interval or the prediction interval, and why?
a.
The prediction interval, because it is more accurate to predict a mean than to predict a single value.
64.
What are we estimating when we use the "confidence interval" in conjunction with a regression line?
a.
The mean of the distribution of Y for a given X.
65.
How can you transform a non-linear relationship to better use correlation analysis?
a.
Replace one or both variables with its log, square root, reciprocal, etc.
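To see why a transformation can help, the sketch below compares the correlation before and after taking logs of a made-up exponential relationship. The data and the helper name are hypothetical, and the log transform is just one of the options listed in the answer.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical non-linear data: Y grows exponentially with X.
xs = list(range(1, 11))
ys = [math.exp(x) for x in xs]

r_raw = pearson_r(xs, ys)                         # X versus Y
r_log = pearson_r(xs, [math.log(y) for y in ys])  # X versus log(Y)
print(f"r before transform = {r_raw:.3f}, after log transform = {r_log:.3f}")
```

Because log(Y) is exactly linear in X for this data, the correlation after the transform is essentially 1, while the untransformed correlation is noticeably lower.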
66.
How can you tell if the relationship between two variables is non-linear?
a.
You can easily see a non-linear relationship on a scatter diagram.
67.
What are we estimating when we use the "prediction interval" in conjunction with a regression line?
a.
The value of Y for a given value of X.
68.
Why does it make sense that the prediction interval for Y would be wider than the confidence interval?
a.
The confidence interval is for the mean of Y, and the prediction interval is for a single value.
b.
Reason: The predicted value at the center of each interval is the same; only the width differs.
69.
How can we use correlation analysis to explore non-linear relationships?
a.
By transforming one or both variables, so the new relationship is linear.
70.
The distance an object falls is directly related to the time it takes to fall, yet the variables time and distance fallen have a low correlation coefficient. How can this be true?
a.
The variables have a non-linear relationship.
71.
What is the difference between simple linear regression and multiple regression?
a.
Simple linear regression has one independent variable and multiple regression has two or more.
72.
How are the coefficients found for the multiple regression equation?
a.
By the least squares method using a statistical software package.
73.
The results of a multiple regression can be summarized in an ANOVA table. Drag and drop the descriptions against their corresponding terms from ANOVA in regression analysis.
a.
Regression = Explained variation of Y
b.
Residual = Unexplained (random) variation of Y
c.
Df = Degrees of freedom
d.
F = Ratio of the explained variance and the unexplained variance
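The ANOVA terms above can be tied together with a short calculation. The sketch below is an illustration under stated assumptions: the fitted values come from a least squares fit with an intercept (so explained plus unexplained variation equals the total), k is the number of independent variables, and the data and fitted line are made up. It also shows how the coefficient of determination and the standard error of estimate fall out of the same table.

```python
import math


def regression_anova(ys, y_hats, k):
    """Build the main ANOVA quantities for a fitted regression with k predictors."""
    n = len(ys)
    y_bar = sum(ys) / n
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # Residual
    ssr = ss_total - sse                                         # Regression
    df_regression = k
    df_residual = n - (k + 1)
    msr = ssr / df_regression
    mse = sse / df_residual
    return {
        "SSR": ssr,
        "SSE": sse,
        "SS total": ss_total,
        "df regression": df_regression,
        "df residual": df_residual,
        "F": msr / mse,                     # explained variance / unexplained variance
        "R squared": ssr / ss_total,        # coefficient of determination
        "standard error": math.sqrt(mse),   # standard error of estimate
    }


# Hypothetical data with the hypothetical least squares line Y-hat = 2.2 + 0.6 X.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
table = regression_anova(ys, [2.2 + 0.6 * x for x in xs], k=1)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

With one independent variable the residual degrees of freedom reduce to n − 2, which matches the divisor used for the standard error of estimate in simple regression.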
74.
In an ANOVA table, how is the "Residual" related to the regression equation?
a.
It is the variation in the value of Y not explained by the regression.
75.
How do you interpret the "Standard Error" in a multiple regression output table?
a.
It is the typical "error" when the regression equation is used to predict Y.
b.
Reason
: I.e. the standard deviation of the distribution of Y – Ŷ.
76.
Which of the following is characteristic of multiple regression but not simple linear regression?
a.
Two or more independent variables.
77.
Which statement(s) correctly describe the Coefficient of Multiple Determination ($R^2$)? Select all that apply.
a.
It can range from 0 to 1.
b.
It is the percent of explained variation.
c.
It explains the percent of variation explained by the regression.
d.
It cannot assume negative values.
78.
As more independent variables are added to a regression model, the coefficient of determination tends to increase. How is this bad?
a.
It can lead to adding variables with no predictive power.
79.
The general form of the multiple regression equation is: $\hat{Y} = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$. Match the variables to their description.
a.
Ŷ = the predicted dependent variable
b.
a = the intercept
c.
$b_n$ = a coefficient
d.
$x_n$ = the independent variable
80.
What is the purpose of the Adjusted Coefficient of Determination?
a.
It adjusts $R^2$ to reflect the number of independent variables used.
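One common way to write this adjustment is 1 − (1 − R²)(n − 1)/(n − (k + 1)), where n is the sample size and k the number of independent variables. The sketch below assumes that form; the function name and the two example models are hypothetical.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - (k + 1))."""
    if n - (k + 1) <= 0:
        raise ValueError("need n > k + 1 so the residual degrees of freedom are positive")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - (k + 1))


# Hypothetical comparison on n = 25 observations:
# three extra variables raise R^2 only from 0.80 to 0.81.
small_model = adjusted_r_squared(0.80, n=25, k=3)
large_model = adjusted_r_squared(0.81, n=25, k=6)
print(f"k = 3: adjusted R^2 = {small_model:.4f}")
print(f"k = 6: adjusted R^2 = {large_model:.4f}")
```

In this made-up comparison the larger model has the higher R² but the lower adjusted R², which is the kind of situation the adjustment is meant to expose.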
81.
If a regression equation predicts a Y-value of 15 with a standard error of 5, what does this mean?
a.
About 68% of the sample Y-values are between 10 and 20.
82.
What effect does increasing the number of independent variables in the regression have on the coefficient of determination?
a.
It makes it larger.
83.
What are the hypotheses for the Global test of the multiple regression model with three independent variables?
a.
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$, $H_1$: Not all β's are 0
84.
What happens in a regression analysis if the number of independent variables is equal to the sample size?
a.
The coefficient of determination becomes 1.
85.
The global test of the regression model examines the ratio of two variances. Which of these is the correct description of the test statistic?
a.
$F = \frac{SSR / k}{SSE / (n - (k + 1))}$
86.
What distribution is used with the global test of the regression model to reject the null hypothesis?
a.
The F-distribution.
87.
Which one of the following entries from the output tables of a multiple regression model could be used to reject the null hypothesis of equal coefficients in the global Test?
a.
Significance F = 0.002
88.
Which entry in a multiple regression output table is used to draw the conclusion for the Global Test?
a.
"Significance F" in the ANOVA table.
89.
When you use the global test for the multiple regression model, what are you testing?
a.
It tests the null hypothesis that all population coefficients are zero.
90.
When testing individual coefficients of a multiple regression model, what is the sampling distribution?
a.
The t-distribution with n - (k + 1) degrees of freedom.
91.
What is the formula for the test statistic used to test individual coefficients of the multiple regression equation?
a.
$t = \frac{b_i - 0}{s_{b_i}}$
92.
The formula used in testing individual coefficients in the multiple regression equation is: $t = \frac{b_i - 0}{s_{b_i}}$. Match the terms to their description.
a.
t = the t-distribution.
b.
$b_i$ = the coefficient of the i-th independent variable.
c.
$s_{b_i}$ = standard deviation of the distribution of $b_i$.
d.
0 = the hypothetical population coefficient.
93.
What should you do if your regression analysis finds that $b_1$ and $b_3$ are significant but $b_2$ is not (t = -0.83, p = 0.15) and neither is $b_4$ (t = -0.25, p = 0.34)?
a.
Drop $X_4$ and rerun the regression.
94.
Regression analysis makes several assumptions. Which of these best describes the "linearity assumption"?
a.
The relationship between the dependent and individual independent variables is a straight line.
95.
Describe the sampling distribution of the test statistic for testing regression coefficients.
a.
The t-distribution with n - (k + 1) degrees of freedom.
96.
If the linearity assumption is violated, what might you see in a residual plot? Select all that apply.
a.
Most of the residuals are positive.
b.
There are more negative values in one part of the range.
97.
What is the term used for the assumption that the variation around the regression line will appear to be the same for the whole range of the residual plot?
a.
Homoscedasticity
98.
When you run a "stepwise regression" or "best subset regression", the software may work "too hard" to find an equation that fits the quirks of your data set. What characteristics should you look for in the regression equation?
a.
It should be simple and logical.
b.
It should make sense, based on your knowledge of the connection among the variables.
99.
What are the two kinds of plots that allow us to visually evaluate the "linearity assumption"? Select all that apply.
a.
Residual Plot
b.
Scatter Diagram
100.
On a residual plot the points are close to zero on the left side but widely scattered on the right side. This indicates a possible violation of which multiple regression assumption?
a.
Homoscedasticity
101.
What would you expect to see in a residual plot if the linearity assumption is correct? Select all that apply.
a.
The positive and negative values are evenly spread across the whole range.
b.
The points are scattered and there is no obvious pattern.
102.
What is meant by "homoscedasticity" in regard to a multiple regression model?
a.
The variation around the regression line is the same for all values of the independent variables.
103.
What is the preferred procedure if more than one coefficient of the multiple regression equation is found to be not significant?
a.
Drop the independent variable with the smallest absolute t-value and rerun the regression.
104.
If the points on a scatter diagram seem to be best described by a curving line, which one of the regression assumptions might be violated?
a.
The linearity assumption.
105.
One of the assumptions of regression analysis is that the distribution of the Y values about the regression line is approximately normal. Which of these tools can you use to check this?
a.
The histogram of residuals
106.
The normal probability plot shows each residual plotted according to the percentile it represents in the set of residuals. What does it look like if the normality assumption is true?
a.
The points closely approximate a line with positive slope.
107.
What is the impact of correlated independent variables?
a.
Correlated independent variables make inferences about individual regression coefficients difficult.
108.
Your experience tells you that an independent variable is positively correlated to the dependent variable, but a multiple regression model gives it a negative coefficient. What could cause this?
a.
The model may have correlated independent variables.
109.
Most software packages provide a histogram of residuals as part of regression analysis. How would you use this?
a.
To visually evaluate the normality assumption
110.
Multicollinearity can have many adverse effects on a multiple regression equation. Which of the following would it be?
a.
A variable known to be an important predictor has a non-significant coefficient.
b.
The value or sign of one or more coefficients violates common sense.
111.
One of the requirements of regression analysis is called the multicollinearity assumption. How is multicollinearity defined?
a.
Multicollinearity exists when independent variables are correlated.
112.
What is a normal probability plot? What is it used for? Select all that apply.
a.
It is used to check the normality assumption of regression.
b.
It plots the percentiles vs. the residuals.
113.
Which of the following are reasons to avoid correlation between independent variables (multicollinearity)? Select all that apply.
a.
It may lead to erroneous results in hypothesis tests of independent variables.
b.
It is difficult to make inferences about the individual regression coefficients.
114.
Multicollinearity can have many adverse effects on a multiple regression model. Which of these could be one of them?
a.
Removing a non-significant variable results in drastic changes in the values of the remaining coefficients.
115.
The formula for the variance inflation factor is $VIF = \frac{1}{1 - R_j^2}$. If VIF > 10, then multicollinearity is excessive. What is the meaning of $R_j^2$?
a.
The coefficient of determination of a regression with $X_j$ as the dependent variable against the other independent variables
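A small sketch of the variance inflation factor, assuming the usual form VIF = 1/(1 − R_j²). The function name and the example values are illustrative. With only two independent variables, R_j² is simply the squared correlation between them, which is what the first example uses.

```python
def variance_inflation_factor(r_squared_j: float) -> float:
    """VIF = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other X's."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R_j^2 must be in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


# Two predictors correlated at r = 0.7, so R_j^2 = 0.49 (hypothetical value).
vif_moderate = variance_inflation_factor(0.7 ** 2)

# R_j^2 = 0.90 sits right at the VIF = 10 threshold mentioned in the question.
vif_threshold = variance_inflation_factor(0.90)

print(f"VIF at R_j^2 = 0.49: {vif_moderate:.2f}")
print(f"VIF at R_j^2 = 0.90: {vif_threshold:.2f}")
```

Read this way, the VIF > 10 rule corresponds to more than 90% of the variation in X_j being explained by the other independent variables.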
116.
What level of correlation between two independent variables in a regression model generally does not cause multicollinearity problems?
a.
A correlation coefficient between -0.7 and +0.7.
117.
What does the independent observation assumption mean for the residuals plot?
a.
There is no pattern to the residuals.
118.
Autocorrelation can be identified by examining which of the following types of plots?
a.
A scatter diagram of residuals versus fitted values.
119.
Nominal level variables can be used in regression analysis, in which case they are known as qualitative variables. Identify the qualitative variables from this list. Select all that apply.
a.
Whether or not a car has air conditioning.
b.
Right or left handedness
c.
Male or Female
120.
What is a "dummy variable" in the context of regression analysis?
a.
A qualitative variable that has been replaced by a number, usually 0 or 1.
121.
What level of correlation between two independent variables in a regression model generally will cause multicollinearity problems?
a.
A correlation coefficient less than -0.7.
b.
A correlation coefficient greater than +0.7.
122.
If the pattern of residuals seems to cluster around a line with mostly positive values on the left and mostly negative values on the right, what regression assumption is violated?
a.
The independent observations assumption.
123.
Data which is collected over time (time series data) often violates which of these regression assumptions?
a.
Independent Observations
124.
What is an "interaction term" in the context of regression analysis?
a.
A new variable created by multiplying two independent variables.
125.
Many situations can occur when studying interactions among variables. Which of these situations is not a valid example of an interaction?
a.
Interaction occurring as the sum of two variables.
b.
Reason: The multiple regression model already represents the sum of variables.
126.
Many situations can occur when studying interactions among variables. Which of these situations are valid examples of an interaction?
a.
An interaction among three variables.
b.
A nominal scale variable interacting with a ratio-scale variable.
127.
Suppose someone is building a model to predict the sales price of a house and they would like to include a variable to indicate whether or not the house has a pool. Which of the following could be used to model that variable?
a.
Use 0 if there is no pool and 1 if there is one.
128.
What does autocorrelation mean in the context of multiple regression analysis?
a.
Successive residuals are correlated.
129.
In the population model $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$, what is the interaction term?
a.
$X_1 X_2$
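To make the role of the interaction term concrete, the sketch below evaluates a model of this form with a 0/1 dummy variable as the second predictor. All coefficient values and the house-price framing are hypothetical and chosen only for round numbers.

```python
def predict_with_interaction(alpha, beta1, beta2, beta3, x1, x2):
    """Evaluate Y = alpha + beta1*X1 + beta2*X2 + beta3*(X1*X2)."""
    interaction = x1 * x2  # the new variable created by multiplying X1 and X2
    return alpha + beta1 * x1 + beta2 * x2 + beta3 * interaction


# Hypothetical coefficients: X1 = square feet, X2 = 1 if the house has a pool, else 0.
alpha, beta1, beta2, beta3 = 50_000, 80, 10_000, 5

price_no_pool = predict_with_interaction(alpha, beta1, beta2, beta3, x1=2000, x2=0)
price_pool = predict_with_interaction(alpha, beta1, beta2, beta3, x1=2000, x2=1)
print(price_no_pool, price_pool)
```

When the dummy is 0 the interaction term vanishes and each square foot adds beta1. When it is 1 each square foot adds beta1 + beta3, so the interaction lets the effect of one variable depend on the value of the other.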
130.
What is the term used to denote the process that builds a regression equation one independent variable at a time, starting with the one the most highly correlated and keeping only terms with significant coefficients?
a.
Stepwise regression
131.
Which of the following is not an advantage of stepwise regression models?
a.
The model selected always has the highest $R^2$.
132.
Define step-wise regression.
a.
A method that builds a regression equation one independent variable at a time.
133.
What is the main advantage of using stepwise regression?
a.
It is an efficient way to find a regression equation with only significant coefficients.
134.
Which method of building a regression model starts with all independent variables and removes one variable at a time until all remaining variables are significant?
a.
Backward elimination method
135.
Suppose the following variables are being considered to build a model that predicts the sales price of a home: size of home in square feet, age in years, number of bedrooms, number of bathrooms, and size of yard in square feet. How many possible regression models are considered using the best-subset method?
a.
31
b.
Reason: $2^5 - 1 = 31$
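The count can be confirmed by listing every non-empty subset of the five candidate variables. This is a minimal sketch; the short variable names are stand-ins for the descriptions in the question.

```python
from itertools import combinations


def candidate_models(names):
    """Return every non-empty subset of the candidate independent variables."""
    return [
        subset
        for size in range(1, len(names) + 1)
        for subset in combinations(names, size)
    ]


# Stand-in names for the five variables listed in the question.
variables = ["size_sqft", "age_years", "bedrooms", "bathrooms", "yard_sqft"]
models = candidate_models(variables)
print(len(models))
```

Each variable is either in or out of a model, giving 2^5 combinations, and the one combination with no variables at all is excluded.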
136.
How does the backward elimination method build a regression model?
a.
It starts with all variables in the model and takes insignificant ones out one at a time.
137.
Suppose k independent variables are being considered for building a regression model. Which method builds the best 1-variable model, the best 2-variable model, ..., the best k-variable model?
a.
Best-subset method