Statistical Methods in Engineering: Midterm Problems Explained

IE 7280 Statistical Methods in Engineering
Practice Midterm Problems

Problem 1: Indicate whether each of the following statements is true or false.
(1) The Pearson correlation coefficient is robust to outliers.
(2) If y is usually less than x, the correlation coefficient between x and y will be negative.
(3) The coefficient of determination ($R^2$) gives the fraction of variation unexplained by the model.
(4) The sequential sums of squares (Type I) always sum to the error sum of squares (SSE).
(5) The unit of measurement for standardized regression coefficients is the standard deviation.
(6) If X and Y have a strong linear correlation, that indicates there exists a causal relation between X and Y.
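A small numerical check can make statements (1) and (2) concrete. The sketch below uses made-up toy data (not from the exam) and a hand-written Pearson correlation function, `pearson_r`, which is a hypothetical helper. It compares a perfectly linear data set in which y is always smaller than x with the same data after one response is replaced by an extreme value.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient computed from centered sums."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data (hypothetical): y is always 5 less than x, yet r is +1,
# because r measures how x and y move together, not which one is larger.
xs = list(range(1, 11))
ys = [x - 5 for x in xs]
r_clean = pearson_r(xs, ys)

# Replace a single response with an extreme value.
ys_outlier = ys[:-1] + [-100]
r_outlier = pearson_r(xs, ys_outlier)

print(f"r without outlier: {r_clean:.3f}")
print(f"r with one outlier: {r_outlier:.3f}")
```

In this toy example, one aberrant point out of ten moves r from +1 to roughly -0.45. That is the sense in which the Pearson coefficient is not robust.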

Problem 2: Short answer questions.
(1) Suppose that, given your domain knowledge about a certain problem, you expect diminishing returns in the effects of x on y, i.e., when x is small, a unit change in x is associated with a larger change in y than when x is larger. You do not expect negative returns (where the slope becomes negative), and both x and y take only positive values. What do you suggest doing before estimating a linear regression model?
(2) Would we always prefer the multiple linear model with the larger $R^2$? Explain why.

Problem 3: A sample of 34 stores in a chain is selected for a test-market study of OmniPower. All the stores selected have approximately the same monthly sales volumes. Two independent variables are considered here: the price of an OmniPower bar, measured in cents, and the monthly budget for in-store promotional expenditures, measured in dollars. In-store promotional expenditures typically include signs and displays, in-store coupons, and free samples. The dependent variable is the number of OmniPower bars sold in a month. Regression output is given below.

Source     DF     Sum of Squares   Mean Square   F Value    Pr > F
Model      ____   ________         ________      ________   2.86E-10
Error      ____   12620947         ________
Total      ____   52093677

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    5837.52              628.15           9.29      1.79E-10
price       1    -53.22               6.85             -7.77     9.20E-09
promotion   1    3.61                 0.69             5.27      9.82E-06

(1) Is the overall regression significant at the .05 level? State the null and alternative hypotheses, the P-value, and your conclusion.
(2) Complete the missing numbers in the ANOVA table above.
(3) Compute the value of $R^2$.
(4) Compute the standard error of the regression (RMSE).
(5) State the estimated regression equation.
(6) Does price have a significant effect on sales? State the null and alternative hypotheses, the P-value, and your decision.
(7) Construct a 95% confidence interval for the slope of price.
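Parts (2) through (7) reduce to arithmetic on the printed output, using the standard relations SSR = SST - SSE, MS = SS/DF, F = MSR/MSE, $R^2$ = SSR/SST and RMSE = $\sqrt{\text{MSE}}$. The sketch below reproduces that arithmetic in plain Python. Two assumptions apply. First, the critical value $t_{0.025,31} \approx 2.0395$ is hard-coded from a t table rather than computed, to avoid a SciPy dependency. Second, `predict_sales` (a hypothetical helper name) evaluates the fitted equation with the rounded coefficients exactly as printed, so its result can differ slightly from software that uses unrounded estimates. The adjusted $R^2$ line is included because it penalizes additional predictors.

```python
import math

# Values read from the Problem 3 regression output.
n = 34               # stores in the sample
k = 2                # predictors: price and promotion
sse = 12620947       # error sum of squares
sst = 52093677       # total sum of squares
b0, b_price, b_promo = 5837.52, -53.22, 3.61
se_price = 6.85

# (2) Degrees of freedom, sums of squares, mean squares and F.
df_model, df_error, df_total = k, n - k - 1, n - 1
ssr = sst - sse
msr = ssr / df_model
mse = sse / df_error
f_value = msr / mse

# (3) R^2, plus adjusted R^2, which penalizes extra predictors.
r2 = ssr / sst
adj_r2 = 1 - (sse / df_error) / (sst / df_total)

# (4) Standard error of the regression (root mean squared error).
rmse = math.sqrt(mse)

# (7) 95% CI for the price slope; t_{0.025, 31} taken from a t table (assumption).
t_crit = 2.0395
half_width = t_crit * se_price
ci_price = (b_price - half_width, b_price + half_width)


def predict_sales(price, promotion):
    """Point prediction from the fitted equation (rounded coefficients)."""
    return b0 + b_price * price + b_promo * promotion


print(f"DF: model={df_model}, error={df_error}, total={df_total}")
print(f"SSR={ssr}, MSR={msr:.1f}, MSE={mse:.1f}, F={f_value:.2f}")
print(f"R^2={r2:.4f}, adjusted R^2={adj_r2:.4f}, RMSE={rmse:.2f}")
print(f"95% CI for price slope: ({ci_price[0]:.2f}, {ci_price[1]:.2f})")
print(f"Predicted sales at price=59, promotion=200: {predict_sales(59, 200):.2f}")
```

With these inputs the completed table has DF of 2, 31 and 33, SSR = 39,472,730, and F of about 48.5. $R^2$ comes out near 0.758.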

(8) What sales do you expect when price is 59 and promotion is 200? You may assume that you are not extrapolating.

Problem 4: We randomly collect data $X_1, X_2, \ldots, X_n$ to study the income of all U.S. citizens. Suppose the underlying distribution has an unknown population mean, denoted by $\mu$. Then, based on the sampling distribution of $\bar{X}$, we build a two-sided confidence interval (CI) for the expected income $\mu$ and a two-sided prediction interval (PI) for an individual income $X$ at significance level $\alpha$. That means that, ideally, the coverage of the CI and the PI should be $1-\alpha$.
(1) Would the CI be a random or a deterministic interval? How about the PI? [Hint: If we choose different samples $X_1, X_2, \ldots, X_n$, would we get different intervals?]
(2) As $1-\alpha$ increases, would the widths of the CI and PI become larger or smaller?
(3) As the sample size $n$ increases, would the width of the CI decrease? What will happen to the CI as $n$ goes to infinity? Would it shrink to zero or not?
(4) As the sample size $n$ increases, would the width of the PI decrease? What will happen to the PI as $n$ goes to infinity? Would it shrink to zero or not?
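The width behaviour asked about in parts (2) to (4) can be tabulated directly. The sketch below is simplified: it assumes a normal population with a known, hypothetical $\sigma = 500$. Under that assumption the CI is $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ and the PI for one new observation is $\bar{X} \pm z_{\alpha/2}\,\sigma\sqrt{1 + 1/n}$, the extra 1 coming from the new observation's own variance. The z values are standard table entries, and the names `ci_width` and `pi_width` are illustrative helpers.

```python
import math

# Hypothetical population standard deviation, treated as known so the
# widths depend only on n and the confidence level (assumption).
SIGMA = 500.0

# Two-sided standard normal critical values z_{alpha/2} (standard table values).
Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def ci_width(n, level=0.95, sigma=SIGMA):
    """Width of the z-interval for the mean: 2 * z * sigma / sqrt(n)."""
    return 2 * Z[level] * sigma / math.sqrt(n)


def pi_width(n, level=0.95, sigma=SIGMA):
    """Width of the z prediction interval for one new X: 2 * z * sigma * sqrt(1 + 1/n)."""
    return 2 * Z[level] * sigma * math.sqrt(1 + 1 / n)


for n in (10, 100, 1000, 100000):
    print(f"n={n:>6}: CI width={ci_width(n):8.2f}, PI width={pi_width(n):8.2f}")

# The PI width tends to 2 * z * sigma (spread of a single X), not to zero.
print("PI width limit:", 2 * Z[0.95] * SIGMA)

# Higher confidence levels give wider intervals at a fixed n.
for level in (0.90, 0.95, 0.99):
    print(f"level={level}: CI width at n=25 = {ci_width(25, level):.2f}")
```

In this setting the CI width falls like $1/\sqrt{n}$ toward zero. The PI width levels off near $2 z_{\alpha/2}\sigma$, because a single new income keeps its own spread no matter how precisely $\mu$ is estimated. Replacing $\sigma$ with $S$ and $z$ with $t$ changes the numbers but not this limiting behaviour.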

(5) We randomly collect $n$ observations and build a CI for $\mu$ with coverage $1-\alpha$. Define a new variable $Y$: let $Y = 1$ if the interval covers $\mu$ and $Y = 0$ otherwise. What is the distribution of $Y$? What are $E[Y]$ and $\mathrm{Var}[Y]$?
(6) Suppose we know that $X$ follows a normal distribution $N(\mu, \sigma^2)$, with $\mu$ and $\sigma^2$ unknown. Given data $X_1, X_2, \ldots, X_n$, we have the sample mean $\bar{X}$ and the sample standard deviation $S$. We could build two confidence intervals for $\mu$:
$CI_1 = \left[\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right]$ and
$CI_2 = \left[\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right]$.
What is the expected coverage of $CI_1$ and $CI_2$? (Is each greater than, equal to, or smaller than $1-\alpha$?)
(7) Suppose we observe a sample of $n = 25$ with sample mean $\bar{x} = 4000$ dollars and sample standard deviation $s = 500$ dollars. Calculate a 95% CI for mean income and a PI for a single individual income.

Problem 5: Consider regressing overall satisfaction with a health plan (overall) on satisfaction with the medical care (medcare) and satisfaction with the cost (cost). Suppose that we have a random sample of members from a particular health-care provider. The sample is large, and all variables are measured on 5-point scales.
(a) The correlation between medcare and cost is 0.65, and the estimated regression equation is overall = 0.53 + 0.40 medcare + 0.31 cost. Suppose we dropped medcare from the model, regressing overall on cost alone. Would the slope from this model be larger than, less than, or equal to the cost slope from the two-variable model (0.31)? Or can we not say what will happen without more information? Explain.

The healthcare provider offers three types of plans: health maintenance organizations (HMO), preferred provider organizations (PPO), and point-of-service products (POS). The organization wants to know whether different types of satisfaction “drive” overall satisfaction for different types of plans. Dummy variables were added for POS and PPO products. For example, PPO equals 1 when the plan type is a PPO and 0 when the plan is POS or HMO. If $y$ is overall satisfaction, $x_1$ is satisfaction with medical care, $x_2$ is satisfaction with cost, $x_3$ is the POS dummy and $x_4$ is the PPO dummy, the model being estimated is

$$y = \beta_0 + x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + x_4\beta_4 + x_1x_3\beta_{13} + x_1x_4\beta_{14} + x_2x_3\beta_{23} + x_2x_4\beta_{24} + e.$$

Use these parameter names in the statements of hypotheses below; e.g., $\beta_1$ is the main effect for medical care.
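Part (a) of Problem 5 can be explored with a simulation. The sketch below makes several assumptions that are not from the survey data. It treats medcare and cost as continuous unit-variance normal variables (instead of 5-point scales) with correlation 0.65, takes the estimated coefficients 0.53, 0.40 and 0.31 as the true values, and uses a hypothetical noise standard deviation of 0.5. It fits both models by least squares from sample covariances. It also checks the omitted-variable identity: the simple slope equals the full-model cost slope plus the medcare slope times the slope from regressing medcare on cost. This identity holds exactly for least-squares fits.

```python
import random

random.seed(7280)  # fixed seed so the sketch is reproducible

N = 20_000                              # "large sample" (assumed size)
RHO = 0.65                              # corr(medcare, cost), from the problem
B0, B_MED, B_COST = 0.53, 0.40, 0.31    # treated as true coefficients (assumption)
NOISE_SD = 0.5                          # hypothetical error standard deviation

medcare, cost, overall = [], [], []
for _ in range(N):
    c = random.gauss(0, 1)
    # medcare has unit variance and correlation RHO with cost
    m = RHO * c + (1 - RHO ** 2) ** 0.5 * random.gauss(0, 1)
    y = B0 + B_MED * m + B_COST * c + random.gauss(0, NOISE_SD)
    medcare.append(m)
    cost.append(c)
    overall.append(y)


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


# Two-predictor OLS slopes from the centered 2x2 normal equations.
s11, s22, s12 = cov(medcare, medcare), cov(cost, cost), cov(medcare, cost)
s1y, s2y = cov(medcare, overall), cov(cost, overall)
det = s11 * s22 - s12 ** 2
b_med_full = (s22 * s1y - s12 * s2y) / det
b_cost_full = (s11 * s2y - s12 * s1y) / det

# Simple regression of overall on cost alone.
b_cost_alone = s2y / s22

# Slope from regressing medcare on cost, used in the omitted-variable identity.
delta = s12 / s22

print(f"cost slope, two-variable model: {b_cost_full:.3f}")
print(f"cost slope, cost alone:         {b_cost_alone:.3f}")
print(f"identity check:                 {b_cost_full + b_med_full * delta:.3f}")
```

Under these assumptions the cost-only slope lands near $0.31 + 0.40 \times 0.65 = 0.57$. The identity shows the direction of the change depends on the sign of the medcare slope times the medcare-on-cost slope, and both are positive here.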

(a) In the drop1 output, what does the medcare:type row tell you? State the null and alternative hypotheses using the parameters defined above, the P-value, and your decision. Also write one sentence in plain English (i.e., so that a non-technical person could understand) summarizing what this tells you.
(b) What is the estimated regression equation for HMOs?
(c) What is the estimated regression equation for PPOs?
(d) What is the estimated regression equation for POSs?
(e) Test whether the slope for satisfaction with cost for HMOs equals the slope for cost for PPOs. State the null and alternative hypotheses, the P-value, and your decision.
(f) How can you test whether the slope for satisfaction with cost for POSs equals the slope for cost for PPOs? State the null and alternative hypotheses and estimate the difference between the two slopes. (The P-value is not easy to obtain from this output. For you to think about but not turn in: how could you get the P-value in R?)
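Substituting the dummy values into the model stated above gives the plan-specific equations that parts (b) through (f) work with. The algebra below uses only the parameters defined in the problem, because the fitted coefficients from the drop1 output are not reproduced here. HMO is the baseline because it has no dummy of its own.

```latex
\begin{aligned}
\text{HMO } (x_3 = 0,\ x_4 = 0):\quad & E[y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{POS } (x_3 = 1,\ x_4 = 0):\quad & E[y] = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13})\, x_1 + (\beta_2 + \beta_{23})\, x_2 \\
\text{PPO } (x_3 = 0,\ x_4 = 1):\quad & E[y] = (\beta_0 + \beta_4) + (\beta_1 + \beta_{14})\, x_1 + (\beta_2 + \beta_{24})\, x_2 \\[4pt]
\text{cost slope (PPO minus HMO)}:\quad & (\beta_2 + \beta_{24}) - \beta_2 = \beta_{24} \\
\text{cost slope (POS minus PPO)}:\quad & (\beta_2 + \beta_{23}) - (\beta_2 + \beta_{24}) = \beta_{23} - \beta_{24}
\end{aligned}
```

The last two lines show which single parameter, or which difference of parameters, each slope comparison reduces to. Each estimated equation follows by plugging in the corresponding fitted coefficients.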

(g) Briefly discuss the managerial implications of this analysis.