Sample Exam
School: Northeastern University
Course: 7280
Subject: Industrial Engineering
Date: Feb 20, 2024
Pages: 8
Uploaded by DrWhale4087
IE 7280 Statistical Methods in Engineering Practice
Midterm Problems
Problem 1: Indicate whether each of the following statements is true or false.
(1) The Pearson correlation coefficient is robust to outliers.
(2) If $y$ is usually less than $x$, the correlation coefficient between $x$ and $y$ will be negative.
(3) The coefficient of determination ($R^2$) gives the fraction of variation unexplained by the model.
(4) The sequential (Type I) sums of squares always sum to the error sum of squares (SSE).
(5) The unit of measurement for standardized regression coefficients is the standard
deviation.
(6) If $X$ and $Y$ have a strong linear correlation, that indicates a causal relation exists between $X$ and $Y$.
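Statements (1) and (2) can be checked numerically. The sketch below uses made-up illustrative data and a hand-written helper `pearson_r` (a hypothetical name, not a library function). It builds one data set where $y$ is always below $x$ yet $r = +1$, and one perfectly linear data set whose correlation changes sign after a single outlier is appended.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(1, 11))

# Statement (2): y is always one unit below x, yet the correlation is +1.
y_below = [xi - 1 for xi in x]
r_below = pearson_r(x, y_below)

# Statement (1): a perfectly linear data set, then the same data plus one outlier.
y_line = [2 * xi for xi in x]
r_clean = pearson_r(x, y_line)
r_outlier = pearson_r(x + [11], y_line + [-50])

print(r_below, r_clean, r_outlier)
```

A single extreme point is enough to move $r$ from $+1$ to a negative value, because $r$ is built from squared deviations around the mean.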
Problem 2: Short answer questions.
(1) Suppose that, given your domain knowledge about a certain problem, you expect
diminishing returns in the effects of x on y, i.e., when x is small, a unit change in
x is associated with a larger change in y than when x is larger. You do not expect
negative returns (where the slope becomes negative), and both x and y take only
positive values. What do you suggest doing before estimating a linear regression
model?
(2) Would we always prefer the multiple linear regression model with the larger $R^2$? Explain why or why not.
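For part (1), one common option is to transform $x$, for example by using $\log(x)$ (defined because $x$ is positive), so that a model that is linear in $\log(x)$ captures diminishing returns. A minimal sketch, assuming hypothetical noise-free data generated as $y = 1 + 2\log(x)$, compares the $R^2$ of $y$ on $x$ with that of $y$ on $\log(x)$; `simple_r2` is a helper written only for this illustration.

```python
import math


def simple_r2(x, y):
    """R^2 of the least-squares line y = b0 + b1*x (equals the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical data with diminishing returns: y grows like log(x).
x = [float(v) for v in range(1, 21)]
y = [1.0 + 2.0 * math.log(v) for v in x]

r2_linear = simple_r2(x, y)                        # y regressed on x
r2_logx = simple_r2([math.log(v) for v in x], y)   # y regressed on log(x)
print(round(r2_linear, 3), round(r2_logx, 3))
```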
Problem 3:
A sample of 34 stores in a chain is selected for a test-market study of OmniPower. All
the stores selected have approximately the same monthly sales volumes. Two independent
variables are considered here: the price of an OmniPower bar, measured in cents, and
the monthly budget for in-store promotional expenditures, measured in dollars. In-store
promotional expenditures typically include signs and displays, in-store coupons, and
free samples. The dependent variable is the number of OmniPower bars sold in a month.
Regression output is given below.
Source     DF    Sum of Squares   Mean Square   F Value    Pr > F
Model      ____  ________         ________      ________   2.86E-10
Error      ____  12620947         ________
Total      ____  52093677

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    5837.52              628.15           9.29      1.79E-10
price       1    -53.22               6.85             -7.77     9.20E-09
promotion   1    3.61                 0.69             5.27      9.82E-06
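The blank cells follow from standard ANOVA identities for a model with $k$ predictors fitted to $n$ observations: $DF_{\text{Model}} = k$, $DF_{\text{Error}} = n - k - 1$, $SST = SSR + SSE$, $MS = SS/DF$, $F = MSR/MSE$, $R^2 = SSR/SST$, and $\text{RMSE} = \sqrt{MSE}$. The sketch below applies them with $n = 34$ and $k = 2$ taken from the problem statement.

```python
import math

# Values given in the OmniPower output: n = 34 stores, k = 2 predictors.
n, k = 34, 2
sst = 52093677.0   # total sum of squares
sse = 12620947.0   # error sum of squares

df_model = k               # one DF per predictor
df_error = n - k - 1       # n minus the number of estimated coefficients
df_total = n - 1

ssr = sst - sse            # SST = SSR + SSE
msr = ssr / df_model
mse = sse / df_error
f_value = msr / mse
r_squared = ssr / sst
rmse = math.sqrt(mse)      # standard error of the regression

print(df_model, df_error, df_total)
print(ssr, round(msr, 1), round(mse, 1), round(f_value, 2))
print(round(r_squared, 4), round(rmse, 2))
```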
(1) Is the overall regression significant at the 0.05 level? State the null and alternative
hypotheses, the P-value, and your conclusion.
(2) Complete the missing numbers in the ANOVA table above.
(3) Compute the value of $R^2$.
(4) Compute the standard error of the regression (RMSE).
(5) State the estimated regression equation.
(6) Does price have a significant effect on sales? State the null and alternative hypotheses,
the P-value, and your decision.
(7) Construct a 95% confidence interval for the slope of price.
(8) How many OmniPower bars do you expect to sell in a month when the price is 59 cents
and the promotion budget is 200 dollars? You may assume that you are not extrapolating.
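For parts (6) through (8), the t statistic is the estimate divided by its standard error, the confidence interval is $b \pm t_{n-k-1,\,0.025}\,SE(b)$, and the point prediction plugs the given values into the fitted equation. Because the Python standard library has no Student-t quantile function, the sketch below includes hypothetical helpers `t_cdf` and `t_quantile` that integrate the t density with Simpson's rule and invert it by bisection; in practice a statistics package would supply this value.

```python
import math


def t_cdf(x, df, steps=1000):
    """Student-t CDF: Simpson's rule on the density over [0, |x|], then symmetry."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on t_cdf over [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Values from the OmniPower output: n = 34 stores, k = 2 predictors.
n, k = 34, 2
df_error = n - k - 1
b0, b_price, b_promo = 5837.52, -53.22, 3.61
se_price = 6.85

# (6) t statistic for H0: beta_price = 0 versus H1: beta_price != 0
t_price = b_price / se_price

# (7) 95% confidence interval for the price slope
t_crit = t_quantile(0.975, df_error)
ci_price = (b_price - t_crit * se_price, b_price + t_crit * se_price)

# (8) point prediction at price = 59 (cents) and promotion = 200 (dollars)
y_hat = b0 + b_price * 59 + b_promo * 200

print(round(t_price, 2), round(t_crit, 4), [round(v, 2) for v in ci_price], round(y_hat, 2))
```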
Problem 4:
We randomly collect data $X_1, X_2, \ldots, X_n$ to study the income of all U.S. citizens.
Suppose the underlying distribution has an unknown population mean, denoted by $\mu$.
Based on the sampling distribution of $\bar{X}$, we build a two-sided confidence interval
(CI) for the expected income $\mu$ and a two-sided prediction interval (PI) for an
individual income $X$, both at significance level $\alpha$. Ideally, then, the coverage of
the CI and of the PI should be $1 - \alpha$.
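A minimal sketch of the two intervals, assuming incomes are approximately normal so that the usual t-based formulas apply: the CI is $\bar{x} \pm t_{n-1,\alpha/2}\, s/\sqrt{n}$ and the PI is $\bar{x} \pm t_{n-1,\alpha/2}\, s\sqrt{1 + 1/n}$. Python's standard library has no Student-t quantile function, so the hypothetical helpers `t_cdf` and `t_quantile` integrate the t density numerically with Simpson's rule and invert it by bisection. The loop holds the sample standard deviation fixed at a hypothetical $s = 1$ to show how the widths change with $n$ and with the confidence level.

```python
import math


def t_cdf(x, df, steps=1000):
    """Student-t CDF: Simpson's rule on the density over [0, |x|], then symmetry."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on t_cdf over [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ci_and_pi(xbar, s, n, conf=0.95):
    """Two-sided t-based CI for the mean and PI for one new observation."""
    t = t_quantile(1 - (1 - conf) / 2, n - 1)
    ci_half = t * s / math.sqrt(n)            # shrinks like 1/sqrt(n)
    pi_half = t * s * math.sqrt(1 + 1 / n)    # tends to z * s, not to 0
    return (xbar - ci_half, xbar + ci_half), (xbar - pi_half, xbar + pi_half)


for conf in (0.90, 0.95, 0.99):
    for n in (5, 25, 100, 10_000):
        ci, pi = ci_and_pi(xbar=0.0, s=1.0, n=n, conf=conf)
        print(conf, n, round(ci[1] - ci[0], 3), round(pi[1] - pi[0], 3))
```

The design choice worth noting is the extra `1` under the square root in the PI: it carries the variability of a single new observation, which does not average away as $n$ grows.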
(1) Would the CI be a random or a deterministic interval? How about the PI? [Hint: If
we choose different samples $X_1, X_2, \ldots, X_n$, would we get different intervals?]
(2) As $1 - \alpha$ increases, would the widths of the CI and the PI become larger or smaller?
(3) As the sample size $n$ increases, would the width of the CI decrease? What will happen
to the CI as $n$ goes to infinity? Would its width shrink to zero or not?
(4) As the sample size $n$ increases, would the width of the PI decrease? What will happen
to the PI as $n$ goes to infinity? Would its width shrink to zero or not?
(5) We randomly collect $n$ observations and build a CI for $\mu$ with coverage $1 - \alpha$.
Define a new random variable $Y$, equal to 1 if the interval covers $\mu$ and 0 otherwise.
What is the distribution of $Y$? What are $E[Y]$ and $\operatorname{Var}[Y]$?
(6) Suppose we know that $X$ follows a normal distribution $N(\mu, \sigma^2)$, with $\mu$ and
$\sigma^2$ unknown. Given data $X_1, X_2, \ldots, X_n$, we have the sample mean $\bar{X}$ and
the sample standard deviation $S$. We could build two confidence intervals for $\mu$:

$$CI_1 = \left[\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right]$$

and

$$CI_2 = \left[\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right].$$
What is the expected coverage of $CI_1$ and $CI_2$? (Is each greater than, equal to, or
smaller than $1 - \alpha$?)
(7) Suppose we observe a sample of size $n = 25$ with sample mean $\bar{x} = \$4000$ and
sample standard deviation $s = \$500$. Calculate a 95% CI for the mean income and a PI
for a single individual's income.
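A sketch for parts (5) through (7). For (7) it assumes incomes are roughly normal and that the PI is also at the 95% level, and it uses $t_{24,0.025} \approx 2.064$ from a standard t table. For (5) and (6) it simulates normal samples (a hypothetical choice of $n = 5$, $\mu = 0$, $\sigma = 1$) and records the indicator $Y$ for each interval. If an interval attains coverage $1 - \alpha$, then $Y \sim \text{Bernoulli}(1 - \alpha)$, so $E[Y] = 1 - \alpha$ and $\operatorname{Var}[Y] = \alpha(1 - \alpha)$, and the simulated mean and variance of $Y$ should be close to these values. The critical values are $z_{0.025}$ from `statistics.NormalDist` and $t_{4,0.025} \approx 2.776$ from a t table; the helper `coverage_indicators` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist

# Part (7): n = 25, xbar = 4000, s = 500 (dollars); 95% level assumed for both intervals.
n, xbar, s = 25, 4000.0, 500.0
t24 = 2.064                                  # t_{24, 0.025} from a t table
ci_half = t24 * s / math.sqrt(n)             # margin for the mean income
pi_half = t24 * s * math.sqrt(1 + 1 / n)     # margin for one person's income
ci_income = (xbar - ci_half, xbar + ci_half)
pi_income = (xbar - pi_half, xbar + pi_half)
print("CI:", [round(v, 1) for v in ci_income], "PI:", [round(v, 1) for v in pi_income])


# Parts (5)-(6): coverage indicators Y for CI_1 (z-based) and CI_2 (t-based).
def coverage_indicators(crit, n_obs=5, mu=0.0, sigma=1.0, reps=20_000, seed=1):
    """List of Y values: 1 if xbar +/- crit * S / sqrt(n_obs) covers mu, else 0."""
    rng = random.Random(seed)  # same seed, so both intervals see the same samples
    ys = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        m = sum(sample) / n_obs
        sd = math.sqrt(sum((v - m) ** 2 for v in sample) / (n_obs - 1))
        half = crit * sd / math.sqrt(n_obs)
        ys.append(1 if m - half <= mu <= m + half else 0)
    return ys


z_crit = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for alpha = 0.05
t4 = 2.776                             # t_{4, 0.025} from a t table
results = {}
for name, crit in (("CI_1", z_crit), ("CI_2", t4)):
    ys = coverage_indicators(crit)
    mean_y = sum(ys) / len(ys)                              # estimates E[Y]
    var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)    # estimates Var[Y]
    results[name] = (mean_y, var_y)
    print(name, round(mean_y, 3), round(var_y, 4))
```

Reusing the same seed for both intervals (common random numbers) makes the comparison between $CI_1$ and $CI_2$ less noisy than two independent simulations would.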
Problem 5:
Consider regressing overall satisfaction with a health plan (overall) on satisfaction
with the medical care (medcare) and satisfaction with the cost (cost). Suppose that
we have a random sample of members from a particular health-care provider. The sample
of observations is large, and all variables are measured on 5-point scales.
(a) The correlation between medcare and cost is 0.65, and the estimated regression
equation is

$$\text{overall} = 0.53 + 0.40\,\text{medcare} + 0.31\,\text{cost}.$$

Suppose we dropped medcare from the model, regressing overall on cost alone. Would the
slope from this model be larger than, less than, or equal to the slope for cost from the
two-variable model (0.31)? Or can we not say what will happen without more information?
Explain.
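In-sample, least-squares fits obey an exact identity: the slope from regressing overall on cost alone equals $b_{\text{cost}} + b_{\text{medcare}} \times (\text{slope of medcare on cost})$, and the slope of medcare on cost has the same sign as their correlation. With $b_{\text{medcare}} = 0.40 > 0$ and a correlation of 0.65, the added term is positive. The sketch below checks the identity on simulated data; the variables are hypothetical continuous stand-ins, not 5-point survey responses, and the helpers `cov` and `two_predictor_ols` are illustrative names.

```python
import math
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def two_predictor_ols(y, x1, x2):
    """Slopes (b1, b2) of y on x1 and x2 from the centered normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(42)
n, rho = 5000, 0.65
# Hypothetical stand-ins for cost and medcare, correlated at about 0.65.
cost = [rng.gauss(0, 1) for _ in range(n)]
medcare = [rho * c + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for c in cost]
overall = [0.53 + 0.40 * m + 0.31 * c + rng.gauss(0, 0.5) for m, c in zip(medcare, cost)]

b_med, b_cost = two_predictor_ols(overall, medcare, cost)
slope_cost_alone = cov(cost, overall) / cov(cost, cost)

# In-sample identity: simple slope = b_cost + b_med * (slope of medcare on cost)
identity = b_cost + b_med * cov(medcare, cost) / cov(cost, cost)
print(round(b_cost, 3), round(slope_cost_alone, 3), round(identity, 3))
```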
The healthcare provider offers three types of plans: health maintenance organizations
(HMO), preferred provider organizations (PPO), and point-of-service products (POS).
The organization wants to know if different types of satisfaction “drive” overall
satisfaction for different types of plans. Dummy variables were added for POS and PPO
products. For example, PPO equals 1 when the plan type is a PPO and equals 0 when
the plan is POS or HMO. If $y$ is overall satisfaction, $x_1$ is satisfaction with medical
care, $x_2$ is satisfaction with cost, $x_3$ is the POS dummy, and $x_4$ is the PPO dummy, the
model being estimated is

$$y = \beta_0 + x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + x_4\beta_4 + x_1x_3\beta_{13} + x_1x_4\beta_{14} + x_2x_3\beta_{23} + x_2x_4\beta_{24} + e.$$

Use these parameter names in the statements of hypotheses below; e.g., $\beta_1$ is the main
effect for medical care.
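Substituting the dummy values into the model gives a separate intercept and separate slopes for each plan type, with HMO as the baseline ($x_3 = x_4 = 0$). A sketch of the algebra:

$$\text{HMO } (x_3 = 0,\ x_4 = 0):\quad E[y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

$$\text{PPO } (x_3 = 0,\ x_4 = 1):\quad E[y] = (\beta_0 + \beta_4) + (\beta_1 + \beta_{14})\,x_1 + (\beta_2 + \beta_{24})\,x_2$$

$$\text{POS } (x_3 = 1,\ x_4 = 0):\quad E[y] = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13})\,x_1 + (\beta_2 + \beta_{23})\,x_2$$

Differences between plan-specific slopes are therefore single coefficients or contrasts: the cost slope for PPO minus that for HMO is $\beta_{24}$, and the cost slope for POS minus that for PPO is $(\beta_2 + \beta_{23}) - (\beta_2 + \beta_{24}) = \beta_{23} - \beta_{24}$.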
(a) In the drop1 output, what does the medcare:type row tell you? State the null and
alternative hypotheses using the parameters defined above, the P-value, and your decision.
Also write one sentence in plain English (i.e., so that a non-technical person could
understand) summarizing what this tells you.
(b) What is the estimated regression equation for HMOs?
(c) What is the estimated regression equation for PPOs?
(d) What is the estimated regression equation for POSs?
(e) Test whether the slope for satisfaction with cost for HMOs equals the slope for
cost for PPOs. State the null and alternative hypotheses, the P-value, and your
decision.
(f) How can you test whether the slope for satisfaction with cost for POSs equals the
slope for cost for PPOs? State the null and alternative hypotheses and estimate the
difference between the two slopes. (The P-value is not easy to obtain from this output.
For you to think about but not turn in: how could you get the P-value in R?)
(g) Briefly discuss the managerial implications of this analysis.
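For the question about comparing the POS and PPO cost slopes, the difference is the contrast $\beta_{23} - \beta_{24}$, whose estimated standard error is $\sqrt{\widehat{\operatorname{Var}}(\hat\beta_{23}) + \widehat{\operatorname{Var}}(\hat\beta_{24}) - 2\,\widehat{\operatorname{Cov}}(\hat\beta_{23}, \hat\beta_{24})}$. The sketch below computes a general linear contrast and its t statistic from coefficient estimates and their covariance matrix. In R these inputs could be read from `coef()` and `vcov()` on the fitted model; refitting with a different baseline level (for example via `relevel()`) is another route. The function name `contrast_t` is hypothetical, and the resulting t statistic would be compared with a t distribution on the model's residual degrees of freedom.

```python
import math


def contrast_t(estimates, cov, weights):
    """Estimate, standard error, and t statistic for H0: sum_i w_i * beta_i = 0.

    estimates: coefficient estimates, e.g. [b23, b24]
    cov:       their estimated covariance matrix (list of lists)
    weights:   contrast weights, e.g. [1, -1] for b23 - b24
    """
    k = len(weights)
    est = sum(w * b for w, b in zip(weights, estimates))
    var = sum(weights[i] * weights[j] * cov[i][j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return est, se, est / se
```

Usage: `contrast_t([b23, b24], [[v23, c], [c, v24]], [1, -1])`, where `b23`, `b24`, `v23`, `v24`, and `c` are placeholders for the estimates, variances, and covariance taken from the fitted model.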