In Class Final Study Guide MGSC2301_SP 23_answers (1)

School

Northeastern University

Course

2301

Subject

Statistics

Date

Apr 3, 2024

Type

docx

Pages

17

Uploaded by sophielai08

Mills College at Northeastern University
MGSC #2301: BUSINESS STATISTICS
FINAL EXAM STUDY GUIDE

Questions: The exam will be a combination of questions and problems. You are responsible for all assigned material.

Suggestions for Study:

Course Element: Concepts / Terms
Test Question: Short questions that test general understanding of these terms. Please be concise and precise. The space allotted is indicative of the length of your answer. Please do not exceed the allotted space.
How to Study: Please read chapters 10, 14, 15 in your textbook.

Course Element: Problems
Test Question: A problem that is similar to one of the problems in your homework assignments.
How to Study: Work the assigned problems until you fully understand how to do them.

Course Element: In-Class Material
Test Question: Lectures, class discussion.
How to Study: Review your reactions and notes.

Course Element: Formulas
Test Question: Do not take time to memorize formulas. However, know which formula to use to solve a particular problem.
How to Study: Know when and how to apply a formula to solve a problem.

Guidelines for Final Exam
1. For problems involving calculation, please show your work. If you do not get the right answer, but you worked the problem correctly, you may get partial credit.
2. In answering short questions, please be explicit, direct, and complete. Answers that are vague and incomplete will be marked down. The space allotted for the answer serves as a guideline to how much detail is expected.
3. The questions provided in the next few pages are indicative of the type of questions to expect in the final exam. They are by no means exhaustive. To get an exhaustive picture, please make sure you cover ALL assignments from chapters 10, 14, 15.

Carol Theokary for Mills College MGSC 2301-Final Exam Study Guide
Problem 1
A bank with a branch located in a commercial district of a city has the business objective of developing an improved process for serving customers during the noon-to-1 P.M. lunch period. Management decides to first study the waiting time in the current process. The waiting time is defined as the number of minutes that elapse from when the customer enters the line until they reach the teller window. Data collected from a random sample of 15 customers show a mean waiting time of 4.29 minutes. Assume the standard deviation of the population is known and is equal to 1.64 minutes.

Suppose that another branch, located in a residential area, is also concerned with improving the process of serving customers in the noon-to-1 P.M. lunch period. Data collected from a random sample of 15 customers in the residential area branch show a mean waiting time of 7.11 minutes. Assume the standard deviation of the population of that other branch is 2.08 minutes.

Is there evidence of a difference in the mean waiting time between the two branches? Use α = 0.05. Clearly state your hypotheses and conclusions.

H0: µ1 - µ2 = 0. The true mean waiting time is the same at both branches.
Ha: µ1 - µ2 ≠ 0. The true mean waiting time at the commercial district branch is not equal to the true mean waiting time at the residential area branch.

Test statistic: Z STAT = ((4.29 - 7.11) - 0) / sqrt((1.64*1.64/15) + (2.08*2.08/15)) = -2.82 / 0.684 = -4.12

Z crit = +/- 1.96, since the upper-tail area = 0.025.

Decision Rule: Reject the null hypothesis if Z STAT < -1.96 or Z STAT > 1.96.

Conclusion: Z STAT = -4.12 < -1.96, so we reject the null hypothesis. There is sufficient evidence to conclude that the true mean waiting time at the commercial district branch is not equal to the true mean waiting time at the residential branch.
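The Problem 1 arithmetic can be checked with a short script. This is only a sketch using Python's standard library; the function name `two_sample_z` is made up for illustration and is not part of the course material. It assumes, as the problem does, that both population standard deviations are known. The same standard error also yields the 95% confidence interval asked for in the next part of the problem.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2, alpha=0.05):
    """Z test and confidence interval for mu1 - mu2 with known sigmas.

    Hypothetical helper written for this study guide.
    """
    diff = mean1 - mean2
    # Standard error of the difference between two independent sample means
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z_stat = diff / se
    # Two-tailed critical value and p-value from the standard normal
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z_stat, z_crit, p_value, ci


# Commercial branch (sample 1) versus residential branch (sample 2)
z_stat, z_crit, p_value, ci = two_sample_z(4.29, 1.64, 15, 7.11, 2.08, 15)
print(round(z_stat, 2), round(z_crit, 2))   # -4.12 1.96
print(tuple(round(x, 2) for x in ci))       # (-4.16, -1.48)
```

Because the p-value is far below 0.05, the script reaches the same decision as the critical-value rule above.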
Construct and interpret a 95% confidence interval estimate of the difference between the population means in the two branches.

CI lower limit = (4.29 - 7.11) - 1.96 * sqrt((1.64*1.64/15) + (2.08*2.08/15)) = -2.82 - 1.96*0.684 = -4.16 minutes
CI upper limit = (4.29 - 7.11) + 1.96 * sqrt((1.64*1.64/15) + (2.08*2.08/15)) = -2.82 + 1.96*0.684 = -1.48 minutes

We are 95% confident that the true mean waiting time at the commercial district branch is shorter than that of the residential branch by between 1.48 and 4.16 minutes.

Problem 2
Below is a random sample of shoe sizes for 10 mothers and their daughters. At α = 0.01, does this sample show that women's shoe sizes have increased, i.e., can you claim that the true mean of daughters' shoe sizes is bigger than that of mothers' shoe sizes? Think carefully about the choice of the appropriate test, then clearly state hypotheses and conclusions.

Daughter  Mother
8         7
8         7
7.5       7.5
8         8
9         8.5
9         8.5
8.5       7.5
9         7.5
9         6
8         7.5

This is a paired t-test: each daughter is matched with her own mother, and because mother and daughter share some genes the two samples are related rather than independent.

H0: μd ≤ 0. The true mean of daughters' shoe sizes is at most equal to the true mean of mothers' shoe sizes.
Ha: μd > 0. The true mean of daughters' shoe sizes is bigger than the true mean of mothers' shoe sizes.

This is an upper-tail test.
Test statistic: First, compute the paired differences d_i = Daughter - Mother.

Daughter  Mother  d_i
8         7       1
8         7       1
7.5       7.5     0
8         8       0
9         8.5     0.5
9         8.5     0.5
8.5       7.5     1
9         7.5     1.5
9         6       3
8         7.5     0.5

d-bar = (Σ d_i) / n = (1+1+0+0+0.5+0.5+1+1.5+3+0.5)/10 = 0.9

S_d = sqrt(((1-0.9)^2 + (1-0.9)^2 + (0-0.9)^2 + (0-0.9)^2 + (0.5-0.9)^2 + (0.5-0.9)^2 + (1-0.9)^2 + (1.5-0.9)^2 + (3-0.9)^2 + (0.5-0.9)^2) / 9) = sqrt(6.9/9) = 0.876

t STAT = (0.9 - 0) / (0.876 / sqrt(10)) = 3.25

t crit = t_α = 2.821 from the t table with upper-tail area = 0.01 and degrees of freedom = 9.

Decision Rule: Reject the null hypothesis if t STAT > 2.821.
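The paired-difference computation above can be reproduced in a few lines. This is a minimal sketch using only the standard library; the variable names are illustrative, and the critical value 2.821 is copied from the t table cited in the text because the standard library has no t distribution.

```python
import math
from statistics import mean, stdev

# Shoe sizes from the table, one pair per mother/daughter
daughters = [8, 8, 7.5, 8, 9, 9, 8.5, 9, 9, 8]
mothers = [7, 7, 7.5, 8, 8.5, 8.5, 7.5, 7.5, 6, 7.5]

# Paired differences d_i = daughter - mother
diffs = [d - m for d, m in zip(daughters, mothers)]
n = len(diffs)

d_bar = mean(diffs)        # sample mean of the differences
s_d = stdev(diffs)         # sample standard deviation (divides by n - 1)
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

T_CRIT = 2.821             # upper-tail 0.01, 9 degrees of freedom (from the t table)
reject_h0 = t_stat > T_CRIT

print(round(d_bar, 2), round(s_d, 3), round(t_stat, 2), reject_h0)
# 0.9 0.876 3.25 True
```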
Since t STAT = 3.25 > t crit = 2.821, we reject the null hypothesis. There is sufficient evidence that the true mean of daughters' shoe sizes is bigger than that of mothers' shoe sizes.

Problem 3
In order to measure the level of satisfaction with Vail Resorts' Web sites, the Vail Resorts marketing team periodically surveys a random sample of guests and asks them to rate their likelihood of recommending the Web site to a friend or a colleague. In 2017, from a random sample of 2,386 Vail ski mountain guests, there were 2,014 active promoters. In 2018, from a random sample of 2,309 Vail ski mountain guests, there were 2,048 active promoters.
a) At the 0.01 level of significance, is the proportion of active promoters different in 2018 as compared to 2017?
b) How much of a difference is there between both years?

H0: p1 = p2. The proportion of active promoters is the same in both years.
Ha: p1 ≠ p2. The proportion of active promoters in 2018 is different from the proportion of active promoters in 2017.

Test statistic:
p1 (2018) = 2048/2309 = .8870
p2 (2017) = 2014/2386 = .8441
Pooled proportion: p-bar = (n1*p1 + n2*p2) / (n1 + n2) = (2309*0.887 + 2386*0.8441) / (2309 + 2386) = .8652

Z STAT = (0.887 - 0.8441) / sqrt(0.8652 * (1 - 0.8652) * (1/2309 + 1/2386)) = 4.30

Z crit = +/- 2.57 (or 2.58) with upper-tail area = 0.005 (look in the Z table for probability = 0.995).

Decision Rule: Reject the null hypothesis if Z STAT > 2.57 or Z STAT < -2.57.
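The pooled two-proportion Z statistic for Problem 3 can be verified as follows. This is a sketch with hypothetical function names, using only the standard library. It also computes the unpooled confidence interval for part b), using the critical value 2.57 quoted in the text rather than a more precise normal quantile.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p1, p2, p_pool, (p1 - p2) / se


def diff_ci(p1, n1, p2, n2, z):
    """Confidence interval for p1 - p2 (unpooled standard error)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Sample 1 = 2018 (2,048 promoters of 2,309); sample 2 = 2017 (2,014 of 2,386)
p1, p2, p_pool, z_stat = pooled_z(2048, 2309, 2014, 2386)
lower, upper = diff_ci(p1, 2309, p2, 2386, z=2.57)  # 2.57 taken from the text

print(round(p1, 4), round(p2, 4), round(p_pool, 4))  # 0.887 0.8441 0.8652
print(round(z_stat, 1))                              # 4.3
print(lower, upper)                                  # roughly 0.017 and 0.068
```

The statistic is well beyond 2.57 in either rounding, so the decision does not depend on the last digit.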
We reject H0. There is sufficient evidence that the proportion of active promoters in 2018 is different from the proportion of active promoters in 2017.

(Note: check n1*p1 ≥ 5 and n1*(1 - p1) ≥ 5; verified since n1 = 2309 and p1 = 0.8870. Check n2*p2 ≥ 5 and n2*(1 - p2) ≥ 5; verified since n2 = 2386 and p2 = 0.8441.)

b) How much of a difference is there between both years?
Here we need to compute a 99% confidence interval for p1 - p2.

CI lower limit = (0.8870 - 0.8441) - 2.57 * sqrt((0.8870*0.1130/2309) + (0.8441*0.1559/2386)) = 0.0429 - 2.57*0.00993 = 0.0174
CI upper limit = (0.8870 - 0.8441) + 2.57 * sqrt((0.8870*0.1130/2309) + (0.8441*0.1559/2386)) = 0.0429 + 2.57*0.00993 = 0.0684

We are 99% confident that the proportion of active promoters has increased by between 1.74 and 6.84 percentage points.

Problem 4
11. a. [Scatter diagram: x = Price ($), axis from 400 to 1400; y = Overall Score, axis from 50 to 85.]
b. The scatter diagram indicates a positive linear relationship between x = price ($) and y = overall score.
c.
d. The slope of .0212 means that an additional $100 in price is associated with an increase of approximately two points in the predicted overall score.
e. A prediction of the overall score is

Problem 5
One of the most common questions of prospective house buyers pertains to the average cost of heating in dollars (Y). To provide its customers with information on that matter, a large real estate firm used the following 4 variables to predict heating costs: the daily minimum outside temperature in degrees Fahrenheit (X1), the amount of insulation in inches (X2), the number of windows in the house (X3), and the age of the furnace in years (X4). Given below are the EXCEL outputs based on a sample size of 20.

ANOVA
              df    SS            MS
Regression    4     169503.4241   42375.86
Residual      15    40262.3259    2684.155
Total         19    209765.75

                    Coefficients  Standard Error  t Stat    P-value    Lower 90.0%  Upper 90.0%
Intercept           421.4277      77.8614         5.4125    7.2E-05    284.9327     557.9227
X1 (Temperature)    -4.5098       0.8129          -5.5476   5.58E-05   -5.9349      -3.0847
X2 (Insulation)     -14.9029      5.0508                    0.0099     -23.7573     -6.0485
X3 (Windows)        0.2151        4.8675          0.0442    0.9653     -8.3181      8.7484
X4 (Furnace Age)    6.3780        4.1026          1.5546    0.1408     -0.8140      13.5702

1. Write the output of the model in terms of the equation line.
y-hat = 421.43 - 4.51 Temp - 14.90 Insul + 0.22 Wind + 6.38 Furn

2. Interpret the value of the estimated regression parameter b1.
All else equal, a 1 degree increase in the daily minimum outside temperature results in an estimated expected decrease in average heating costs of $4.51.

3. Compute the R^2 and adjusted R^2 of the model. How strong is your model?
R^2 = SSR / SST = 169503.4241 / 209765.75 = 0.808
Adjusted R^2 = 1 - (1 - R^2) * ((n - 1) / (n - p - 1)), where n = 20 and p = 4
             = 1 - (1 - 0.808) * ((20 - 1) / (20 - 4 - 1))
             = 1 - (1 - 0.808) * (19/15) = 0.757
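The quantities derived from the ANOVA table can be recomputed directly from the sums of squares. The sketch below (variable names are illustrative) covers R^2 and adjusted R^2 from question 3 and the overall F ratio MSR / MSE used in the following question; it does not compute the F critical value, which the guide takes from Excel.

```python
# Sums of squares from the ANOVA table
ssr = 169503.4241   # regression
sse = 40262.3259    # residual (error)
sst = 209765.75     # total

n = 20   # sample size
p = 4    # number of independent variables

r2 = ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

msr = ssr / p              # mean square regression
mse = sse / (n - p - 1)    # mean square error
f_stat = msr / mse

print(round(r2, 3), round(adj_r2, 3), round(f_stat, 2))
# 0.808 0.757 15.79
```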
The model explains 80.8% of the sample variability of average heating costs; after correcting for the degrees of freedom, the model explains 75.7% of the sample variability of average heating costs.

4. Compute Fstat and Fcrit at the 0.05 level of significance. Is the model overall significant?
Fstat = MSR / MSE = 42375.86 / 2684.155 = 15.787
Fcrit = F.INV.RT(0.05,4,15) = 3.056
Since Fstat > Fcrit, reject H0 and conclude that the 4 independent variables taken as a group have significant linear effects on average heating costs.

5. Referring to the Excel output, compute the tStat for X2 (Insulation). What is your decision and conclusion for the test of significance of the coefficient β2 at the α = 0.01 level of significance?
tstat = -14.9029 / 5.0508 = -2.9506
The p-value of 0.0099 is below 0.01, so reject H0 and conclude that the amount of insulation has a negative linear effect on average heating costs.

6. Referring to the Excel output, what is the 90% confidence interval for the expected change in average heating costs as a result of a 1 degree Fahrenheit change in the daily minimum outside temperature?
[-5.93, -3.08]

Multiple Choice Questions

Are Japanese managers more motivated than American managers? A randomly selected group of each were administered the Sarnoff Survey of Attitudes Toward Life (SSATL), which measures motivation for upward mobility. The SSATL scores are summarized below.

                      American  Japanese
Sample Size           211       100
Mean SSATL Score      65.75     79.83
Population Std. Dev.  11.07     6.41

1. Referring to the Table above, judging from the way the data were collected, which test would likely be most appropriate to employ?
a) Paired t test
b) Pooled-variance t test for the difference between two means
c) Independent samples Z test for the difference between two means
d) Related samples Z test for the mean difference

2. Referring to the Table above, give the null and alternative hypotheses to determine if the average SSATL score of Japanese managers differs from the average SSATL score of American managers.
a) b) c) d)

3. Referring to the Table above, assuming the independent samples procedure was used, calculate the value of the test statistic.
a) b) c) d)

4. Referring to the Table above, suppose that the test statistic is Z = 2.45. Find the p-value if we assume that the alternative hypothesis was a two-tailed test.
a) 0.0071
b) 0.0142
c) 0.4929
d) 0.9858

TABLE A
A real estate company is interested in testing whether, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have. Assume that the two population variances are equal. A random sample of 100 families from Gotham and a random sample of 150 families in Metropolis yield the following data on length of residence in current homes.
Gotham: X-bar_G = 35 months, σ_G^2 = 900
Metropolis: X-bar_M = 50 months, σ_M^2 = 1050

5. Referring to Table A, which of the following represents the relevant hypotheses tested by the real estate company?
a) b) c) d)

6. Referring to Table A, what is an unbiased point estimate for the mean of the sampling distribution of the difference between the 2 sample means?
a) -22
b) -10
c) -15
d) 0

7. Referring to Table A, suppose Zstat = -3.69 and α = 0.10. Which of the following represents the result of the relevant hypothesis test?
a) The alternative hypothesis is rejected.
b) The null hypothesis is rejected.
c) The null hypothesis is not rejected.
d) Insufficient information exists on which to make a decision.

8. Referring to Table A, suppose α = 0.10. Which of the following represents the correct conclusion?
a) There is not enough evidence that, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have.
b) There is enough evidence that, on average, families in Gotham have been living in their current homes for less time than families in Metropolis have.
c) There is not enough evidence that, on average, families in Gotham have been living in their current homes for no less time than families in Metropolis have.
d) There is enough evidence that, on average, families in Gotham have been living in their current homes for no less time than families in Metropolis have.

9. A few years ago, Pepsi invited consumers to take the "Pepsi Challenge." Consumers were asked to decide which of two sodas, Coke or Pepsi, they preferred in a blind taste test. Pepsi was interested in determining what factors played a role in people's taste preferences. One of the factors studied was the gender of the consumer. Below are the results of analyses comparing the taste preferences of men and women with the proportions depicting preference for Pepsi.
Males: n = 109, p_M = 0.422018
Females: n = 52, p_F = 0.25
p_M - p_F = 0.172018
Z = 2.11825
To determine if a difference exists in the taste preferences of men and women, give the correct alternative hypothesis that Pepsi would test.
a) H1:
b) H1:
c) H1: 0
d) H1: = 0

10. Suppose that the two-tailed p-value for the "Pepsi Challenge" was 0.0734. State the proper conclusion.
a) At α = 0.05, there is sufficient evidence to indicate the proportion of males preferring Pepsi differs from the proportion of females preferring Pepsi.
b) At α = 0.10, there is sufficient evidence to indicate the proportion of males preferring Pepsi differs from the proportion of females preferring Pepsi.
c) At α = 0.05, there is sufficient evidence to indicate the proportion of males preferring Pepsi equals the proportion of females preferring Pepsi.
d) At α = 0.08, there is insufficient evidence to indicate the proportion of males preferring Pepsi differs from the proportion of females preferring Pepsi.

11. The Y-intercept (b0) represents the
a) predicted value of Y when X = 0.
b) change in estimated average Y per unit change in X.
c) predicted value of Y.
d) variation around the sample regression line.

12. The slope (b1) represents
a) predicted value of Y when X = 0.
b) the estimated average change in Y per unit change in X.
c) the predicted value of Y.
d) variation around the line of regression.

13. The least squares method minimizes which of the following?
a) SSR
b) SSE
c) SST
d) All of the above

TABLE B
A large national bank charges local companies for using their services. A bank official reported the results of a regression analysis designed to predict the bank's charges (Y), measured in dollars per month, for services rendered to local companies. One independent variable used to predict service charge to a company is the company's sales revenue (X), measured in millions of dollars. Data for 21 companies who use the bank's services were used to fit the model: The results of the simple linear regression are provided below.

14. Referring to Table B, interpret the estimate of β0, the Y-intercept of the line.
a) All companies will be charged at least $2,700 by the bank.
b) There is no practical interpretation since a sales revenue of $0 is a nonsensical value.
c) About 95% of the observed service charges fall within $2,700 of the least squares line.
d) For every $1 million increase in sales revenue, we expect a service charge to decrease $2,700.

15. Referring to Table B, interpret the p-value for testing whether β1 exceeds 0.
a) There is sufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful linear predictor of service charge (Y).
b) There is insufficient evidence (at α = 0.10) to conclude that sales revenue (X) is a useful linear predictor of service charge (Y).
c) Sales revenue (X) is a poor predictor of service charge (Y).
d) For every $1 million increase in sales revenue, we expect a service charge to increase $0.034.

16. Referring to Table B, a 95% confidence interval for β1 is (15, 30). Interpret the interval.
a) We are 95% confident that the mean service charge will fall between $15 and $30 per month.
b) We are 95% confident that the sales revenue (X) will increase between $15 and $30 million for every $1 increase in service charge (Y).
c) We are 95% confident that average service charge (Y) will increase between $15 and $30 for every $1 million increase in sales revenue (X).
d) At the α = 0.05 level, there is no evidence of a linear relationship between service charge (Y) and sales revenue (X).

17. Based on the residual plot below, you will conclude that there might be a violation of which of the following assumptions?
[Figure: Footage Residual Plot. Horizontal axis: Footage, 0 to 6,000; vertical axis: Residuals, -4000 to 6000.]
a) Linearity of the relationship
b) Normality of errors
c) Homoscedasticity (constant variance)
d) Independence of errors

18. The residuals represent
a) the difference between the actual Y values and the mean of Y.
b) the difference between the actual Y values and the predicted Y values.
c) the square root of the slope.
d) the predicted value of Y for the average X value.

19. What do we mean when we say that a simple linear regression model is "statistically" useful?
a) All the statistics computed from the sample make sense.
b) The model is an excellent predictor of Y.
c) The model is "practically" useful for predicting Y.
d) The model is a better predictor of Y than the sample mean, Y-bar.

20. In a multiple regression problem involving two independent variables, if b1 is computed to be +2.0, it means that
a) the relationship between X1 and Y is significant.
b) the estimated average of Y increases by 2 units for each increase of 1 unit of X1, holding X2 constant.
c) the estimated average of Y increases by 2 units for each increase of 1 unit of X1, without regard to X2.
d) the estimated average of Y is 2 when X1 equals zero.

21. In a multiple regression model, the value of the coefficient of multiple determination
a) has to fall between -1 and +1.
b) has to fall between 0 and +1.
c) has to fall between -1 and 0.
d) can fall between any pair of real numbers.

22. The variation attributable to factors other than the relationship between the independent variables and the explained variable in a regression analysis is represented by
a) regression sum of squares.
b) error sum of squares.
c) total sum of squares.
d) regression mean squares.

TABLE C
An economist is interested to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below.

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.991
R Square            0.982
Adjusted R Square   0.976
Standard Error      0.299
Observations        10

ANOVA
             df   SS        MS        F         Signif F
Regression   2    33.4163   16.7082   186.325   0.0001
Residual     7    0.6277    0.0897
Total        9    34.0440

            Coeff     StdError  t Stat   P-value
Intercept   -0.0861   0.5674    -0.152   0.8837
GDP         0.7654    0.0574    13.340   0.0001
Price       -0.0006   0.0028    -0.219   0.8330
23. Referring to Table C, the p-value for GDP is
a) 0.05
b) 0.01
c) 0.001
d) None of the above.

24. Referring to Table C, the p-value for the aggregated price index is
a) 0.05
b) 0.8330
c) 0.001
d) None of the above.

25. Referring to Table C, the p-value for the regression model as a whole is
a) 0.05
b) 0.01
c) 0.0001
d) None of the above.

26. Referring to Table C, what is the predicted consumption level for an economy with GDP equal to $4 billion and an aggregate price index of 150?
a) $1.39 billion
b) $2.89 billion
c) $4.75 billion
d) $9.45 billion

27. Referring to Table C, one economy in the sample had an aggregate consumption level of $3 billion, a GDP of $3.5 billion, and an aggregate price level of 125. What is the residual for this data point?
a) $2.52 billion
b) $0.48 billion
c) -$1.33 billion
d) -$2.52 billion

28. Referring to Table C, to test for the significance of the coefficient on aggregate price index, the value of the relevant t-statistic is
a) 2.365
b) 0.143
c) -0.219
d) -1.960

29. Referring to Table C, which of the following statements is supported by the analysis shown?
a) There is sufficient evidence (at α = 0.05) of a linear relationship between gross domestic product and consumption.
b) There is insufficient evidence (at α = 0.05) of a linear relationship between gross domestic product and consumption.
c) There is sufficient evidence (at α = 0.05) of a linear relationship between price and consumption.
d) None of the above
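Questions 26 and 27 both come down to plugging values into the fitted equation from Table C. The sketch below shows that mechanism; the function name `predict_consumption` is hypothetical, and the coefficients are the rounded values printed in the Excel output, so results are only accurate to about two decimals.

```python
# Rounded coefficients from the Table C output
B0 = -0.0861       # intercept
B_GDP = 0.7654     # slope on GDP ($ billions)
B_PRICE = -0.0006  # slope on the aggregate price index


def predict_consumption(gdp, price_index):
    """Predicted consumption ($ billions) from the fitted equation."""
    return B0 + B_GDP * gdp + B_PRICE * price_index


# Prediction for GDP = $4 billion and a price index of 150
prediction = predict_consumption(4, 150)

# Residual = actual - predicted, for the economy with actual consumption of $3 billion
residual = 3 - predict_consumption(3.5, 125)

print(round(prediction, 2), round(residual, 2))  # 2.89 0.48
```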
30. A dummy variable is used as an independent variable in a regression model when
a) the variable involved is numerical.
b) the variable involved is categorical.
c) a curvilinear relationship is suspected.
d) 2 independent variables interact.

31. The critical F value with 5 numerator and 8 denominator degrees of freedom at α = 0.05 is:
a) 3.69
b) 0.369
c) 6.39
d) 9.36