_STAT3032_Exam3_F2022_Solution_Section001

School

University of Minnesota-Twin Cities

Course

3032

Subject

Statistics

Date

Jan 9, 2024

STAT 3032 Fall 2022 Exam 3
Time Limit: 50 min

Name (Print): ______ Solution _______________
Student ID number: _______________________

Instructions:

Do not begin or turn this page until you are instructed.

This exam contains 8 pages (including this cover page) and is worth a total of 40 points. Check to see if any pages are missing.

You may use one cheat sheet (size 8.5 by 11 inches) with notes on both sides. The cheat sheet cannot be typed.

You may bring a calculator. However, graphing calculators are not allowed. If you do not bring a calculator, you may write your final answer (for numerical calculations) in calculator-ready form. For example, a final answer of $88 + 4$ is calculator-ready, but a final answer of $88 + x^2$ is not calculator-ready since the value of $x$ is not plugged in.

Enter the answers of the multiple choice questions in the box next to each question.

Show all your work on each problem for full credit, except for multiple choice questions or those specified as having no need to explain.

Honesty Statement and Pledge: I have not given or received any aid or assistance to or from any other student in this course during the exam period. Everything I have written on this exam represents my own work and knowledge. I sign this knowing that infringements on the University's Academic Honesty policy may result in failure or expulsion.

Signed by _____________________ Date __________________________
Problem I: The Study of WholeFoods [18 pts]

(Disclaimer: The data used in this problem are artificial.)

Amazon is considering opening a new WholeFoods grocery store in the neighborhood. To assess the viability of this business plan, Amazon contracted a consulting firm to conduct an online survey among the residents. The survey responses were saved in the dataset "WholeFoods". The following information is collected:

Interest   Whether or not the respondent is interested in a new WholeFoods grocery store in the neighborhood: 1 = interested; 0 = not interested
Budget     The average monthly grocery budget of the respondent, measured in thousands of dollars
Store      Where the respondent usually shops for groceries: "Target", "CubFoods", and "Costco"

The consultant fitted Model A, which uses the main effect of Budget and the main effect of Store to predict Interest. See below for part of the summary of Model A from R:

Coefficients:
               Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)       -3.77        0.35    -10.8   < 2e-16  ***
StoreCubFoods      2.92        0.20     14.6   < 2e-16  ***
StoreTarget        0.40        0.21      1.9     0.057  .
Budget             2.54        0.45      5.6   2.1e-08  ***
---
Null deviance: 3946.9 on 2879 degrees of freedom
Residual deviance: 3003.9 on 2876 degrees of freedom

Q1 [2 pts]: What is the sample size of the dataset? In other words, how many residents answered the survey? Please explain how you obtained the answer.

Solution: n = df of the residual deviance + number of model coefficients = 2876 + 4 = 2880
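The Model A table and the Q1 answer can be cross-checked by hand. The sketch below is only a calculator-style check in Python, not the R code used in the course: it recomputes each Wald z value as estimate divided by standard error, the two-sided p-value for StoreTarget from the standard normal distribution, and the sample size from the residual degrees of freedom. The helper names `wald_z` and `two_sided_p` are made up for illustration, and small differences from the printed table are expected because the table values are rounded.

```python
import math

# (estimate, standard error) pairs copied from the Model A output
coefs = {
    "(Intercept)": (-3.77, 0.35),
    "StoreCubFoods": (2.92, 0.20),
    "StoreTarget": (0.40, 0.21),
    "Budget": (2.54, 0.45),
}


def wald_z(estimate, std_error):
    """Wald test statistic: estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided standard normal p-value, 2 * P(Z > |z|), via erfc."""
    return math.erfc(abs(z) / math.sqrt(2))


z_values = {name: wald_z(b, se) for name, (b, se) in coefs.items()}
p_target = two_sided_p(z_values["StoreTarget"])

# Q1: sample size = residual df + number of estimated coefficients
residual_df = 2876
n = residual_df + len(coefs)

for name, z in z_values.items():
    print(f"{name:15s} z = {z:7.2f}")
print(f"StoreTarget two-sided p = {p_target:.3f}")
print(f"n = {n}")
```

The same sample size also follows from the null deviance line, since the null model has one coefficient: 2879 + 1 = 2880.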
Q2 [2 pts]: Complete the R code that fits Model A in R.

> modA = glm(Interest ~ 1 + Store + Budget, _________________________, data = WholeFoods)

Solution: family = binomial, OR family = "binomial"

Q3 [2 pts]: In Model A, the slope of StoreTarget is 0.40. Please interpret this slope (0.40) in context, with respect to the odds (rather than the log odds).

Solution: $e^{0.4} - 1 \approx 0.49$

After controlling for the monthly grocery budget, the odds of being interested in the new WholeFoods store are, on average, 49% higher for a person who usually shops at Target than for a person who usually shops at Costco.

Jesse is a 20-year-old college student with an average grocery budget of $450 per month (which is equivalent to 0.45 thousand dollars). He usually buys his groceries from the CubFoods store. Please answer Q4-5.

Q4 [2 pts]: Based on Model A, what is the estimated probability that Jesse is interested in the new WholeFoods store? Please show your work. You can leave your final answer in the calculator-ready form.

Solution:

$\widehat{\text{Interest}} = \dfrac{1}{1 + \exp(3.77 - 2.92 \times \text{StoreCubFoods} - 0.40 \times \text{StoreTarget} - 2.54 \times \text{Budget})}$

Setting Budget = 0.45, StoreCubFoods = 1, and StoreTarget = 0,

$\widehat{\text{Interest}} = \dfrac{1}{1 + \exp(3.77 - 2.92 - 2.54 \times 0.45)} \approx 0.573$

Q5 [2 pts]: Now suppose that we have access to R. Please complete the R code that estimates the probability of Jesse being interested in the new WholeFoods store.

> ____________(modA, newdata = data.frame(Store = 'CubFoods', Budget = 0.45), type = ____________)

Solution: predict, "response"
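The numeric answers to Q3 and Q4 can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, used here purely as a calculator rather than as a substitute for the `predict` call in Q5. It uses the coefficients from the Model A summary table (StoreCubFoods = 2.92, StoreTarget = 0.40); the function name `interest_probability` is hypothetical.

```python
import math

# Coefficients copied from the Model A summary table
b_intercept, b_cub, b_target, b_budget = -3.77, 2.92, 0.40, 2.54


def interest_probability(budget, store):
    """Estimated P(Interest = 1) under Model A; Costco is the baseline store."""
    cub = 1 if store == "CubFoods" else 0
    target = 1 if store == "Target" else 0
    log_odds = (b_intercept + b_cub * cub
                + b_target * target + b_budget * budget)
    # Inverse logit turns log odds into a probability
    return 1 / (1 + math.exp(-log_odds))


# Q3: relative change in odds, Target versus Costco, at a fixed budget
odds_change_target = math.exp(b_target) - 1

# Q4: Jesse shops at CubFoods with a budget of 0.45 thousand dollars
p_jesse = interest_probability(0.45, "CubFoods")

print(round(odds_change_target, 2))  # 0.49
print(round(p_jesse, 3))             # 0.573
```

The value 0.573 is what `predict(..., type = "response")` in Q5 would be expected to return up to rounding, since the response scale of a binomial GLM is the inverse logit of the linear predictor.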
Q6 [2 pts]: The consultant also fitted Model B, which uses the main effect of Budget, the main effect of Store, and the interaction of Budget and Store to predict Interest. A partial summary of Model B from R is provided below.

Coefficients:
                      Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)              -4.71        1.32     -3.6    0.0003  ***
StoreCubFoods             4.09        1.36      3.0    0.0027  **
StoreTarget               0.89        1.44      0.6      0.55
Budget                    4.01        2.01      2.0     0.045  *
StoreCubFoods:Budget     -1.82        2.08    -0.88      0.38
StoreTarget:Budget       -0.75        2.20    -0.34      0.73
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3946.9 on 2879 degrees of freedom
Residual deviance: 3002.3 on 2874 degrees of freedom

Based on Model B, write down the equation of the fitted model for the residents that usually buy groceries from the Target store. Please let the left side of the equation be the estimated log odds (rather than the probability). Hint: the only predictor variable in your final equation should be Budget.

Solution: Setting the dummy variables StoreCubFoods = 0 and StoreTarget = 1, we have the fitted model

$\log\left(\dfrac{\widehat{\text{Interest}}}{1 - \widehat{\text{Interest}}}\right) = -4.71 + 0 + 0.89 + 4.01 \times \text{Budget} + 0 - 0.75 \times 1 \times \text{Budget} = -3.82 + 3.26 \times \text{Budget}$

The consultant used the $\chi^2$ (Chi-squared) test to compare Model A and Model B. Please answer Q7-9.

Q7 [2 pts]: What are the null hypothesis and the alternative hypothesis of this Chi-squared test?

Solution:
$H_0$: Model A (Interest ~ 1 + Budget + Store)
$H_a$: Model B (Interest ~ 1 + Budget + Store + Budget:Store)

Alternatively: Let $\beta_4$ and $\beta_5$ be the coefficients of the interaction terms of Budget and Store.
$H_0$: $\beta_4 = \beta_5 = 0$
$H_a$: At least one of $\beta_4$ and $\beta_5$ is not zero.
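The substitution used in Q6 works the same way for every store: add the store's dummy coefficient to the intercept and its interaction coefficient to the Budget slope. As a rough illustration (Python used as a calculator only, with a hypothetical helper named `store_line`), the sketch below collapses Model B into one log-odds line per store. The two interaction coefficients it adds to the slope are the $\beta_4$ and $\beta_5$ that the Q7 null hypothesis sets to zero.

```python
# Coefficients copied from the Model B summary table
model_b = {
    "(Intercept)": -4.71,
    "StoreCubFoods": 4.09,
    "StoreTarget": 0.89,
    "Budget": 4.01,
    "StoreCubFoods:Budget": -1.82,
    "StoreTarget:Budget": -0.75,
}


def store_line(store):
    """Return (intercept, Budget slope) of the fitted log-odds line for a store."""
    intercept = model_b["(Intercept)"]
    slope = model_b["Budget"]
    if store != "Costco":  # Costco is the baseline level, so nothing is added
        intercept += model_b[f"Store{store}"]
        slope += model_b[f"Store{store}:Budget"]
    return round(intercept, 2), round(slope, 2)


lines = {store: store_line(store) for store in ("Costco", "CubFoods", "Target")}
for store, (intercept, slope) in lines.items():
    print(f"{store:8s} log odds = {intercept} + {slope} * Budget")
```

If both interaction coefficients were zero, all three lines would share the slope 4.01 and differ only in their intercepts, which is the structure of Model A.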
Q8 [2 pts]: What is the value of the test statistic? Please show your work. You may use any information given in this problem so far, including the model summary of Model A and Model B.

Solution: The test statistic = the residual deviance of modA - the residual deviance of modB = 3003.9 - 3002.3 = 1.6

Q9 [2 pts]: Based on your answer to Q8, do you reject the null hypothesis? Please use the significance level 0.05 and utilize the following R output in your answer. Hint: compare the test statistic value to 4.

> pchisq(4, df = 2, lower.tail = F)
[1] 0.1353353

Solution: The test statistic under the null hypothesis has the Chi-squared distribution with df = 2. Since 1.6 < 4, the p-value is greater than 0.135. Therefore we know that the p-value is greater than 0.05. We will not reject the null hypothesis.

Problem 2: The Study of a Theoretical Time Series Model [8 pts]

The time series $\{S_t\}$ has the following population model:

$S_t = -2.8 + e_t - 1.5\, e_{t-1}$, where $e_t \overset{iid}{\sim} N(0, 0.64)$.

Note, 0.64 is the variance of the white noise.

Q10 [2 pts]: Based on the theoretical model, we know that $\{S_t\}$ has an MA model. What is the order of this MA model? ______________. There is no need to explain.

Solution: 1

Q11 [2 pts]: What is the (theoretical) variance of $\{S_t\}$? Please show your work. You can leave your final answer in the calculator-ready form.

Solution: $\mathrm{Var}(S_t) = \mathrm{Var}(e_t - 1.5\, e_{t-1}) = \mathrm{Var}(e_t) + (-1.5)^2\, \mathrm{Var}(e_{t-1}) = 0.64 + 2.25 \times 0.64 = 2.08$

Q12 [2 pts]: What is the (theoretical) autocorrelation of $\{S_t\}$ at Lag 5? Please explain your answer.

Solution: Since this is an MA(1) model, the autocorrelation at any lag greater than 1 is 0.
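The numeric answers on this page (Q8, Q9, Q11, and Q12) can be cross-checked with the short sketch below. It is written in Python as a plain calculator and is not the course's R code. Two facts it relies on: the degrees of freedom of the test equal the difference in residual degrees of freedom (2876 - 2874 = 2), and for df = 2 the Chi-squared upper-tail probability has the closed form exp(-x/2), which is why `pchisq(4, df = 2, lower.tail = F)` equals exp(-2). The helper names are made up for illustration.

```python
import math

# --- Q8 and Q9: deviance (likelihood ratio) test of Model A against Model B ---
dev_a, df_a = 3003.9, 2876   # residual deviance and df of Model A
dev_b, df_b = 3002.3, 2874   # residual deviance and df of Model B

lr_stat = dev_a - dev_b      # test statistic
lr_df = df_a - df_b          # number of extra coefficients in Model B


def chisq_upper_tail_df2(x):
    """P(X > x) for a Chi-squared variable with 2 df; valid only for df = 2."""
    return math.exp(-x / 2)


p_at_4 = chisq_upper_tail_df2(4)         # matches the R output in Q9
p_value = chisq_upper_tail_df2(lr_stat)  # exact p-value of the test

# --- Q11 and Q12: moments of the MA(1) model with coefficient -1.5 ---
theta = -1.5    # coefficient on the lagged white noise term
sigma2 = 0.64   # white noise variance
var_s = (1 + theta ** 2) * sigma2


def ma1_autocorrelation(lag):
    """Theoretical autocorrelation of an MA(1) process at the given lag."""
    if lag == 0:
        return 1.0
    if abs(lag) == 1:
        return theta / (1 + theta ** 2)
    return 0.0  # an MA(1) process has no correlation beyond lag 1


print(f"LR statistic = {lr_stat:.1f} on {lr_df} df, p-value = {p_value:.3f}")
print(f"Upper tail at 4 = {p_at_4:.7f}")
print(f"Var = {var_s:.2f}, lag-5 autocorrelation = {ma1_autocorrelation(5)}")
```

The exact p-value comes out near 0.45, consistent with the bound "greater than 0.135" used in the Q9 solution.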
Q13 [2 pts]: Which of the following statements is correct? Please select the best answer.

A. $\{S_t\}$ is stationary.
B. $\{S_t\}$ is not stationary.
C. Cannot determine if $\{S_t\}$ is stationary.

Solution: A

Problem 3: The Study of Web Traffic [14 pts]

(Disclaimer: The data used in this problem are artificial.)

A website called "Happy Statisticians" recorded the daily web traffic (how many people visited the website) in 2018 and 2019. Let's denote the time series of the daily web traffic as $\{W_t\}$. The time series data is saved in "WebTraffic". Assume that $\{W_t\}$ is stationary. See below for the ACF plot and the PACF plot of the time series WebTraffic.

Q14 [2 pts]: Justify that an AR(3) model is appropriate for $\{W_t\}$.

Solution: The ACF plot decays and the PACF plot cuts off after Lag 3. Therefore an AR(3) model is appropriate.

Q15 [2 pts]: Complete the R code that fits the AR(3) model to $\{W_t\}$.

> mod = arima(web$WebTraffic, order = _____________)

Solution: c(3, 0, 0)
Q16 [2 pts]: See below for the model output of the fitted AR(3) model.

Coefficients:
         ar1     ar2     ar3  intercept
        0.78    0.72   -0.61     198.05
s.e.   0.029   0.031   0.029      1.757

sigma^2 estimated as 27.89: log likelihood = -2252.31, aic = 4514.61

Write down the equation for the fitted model. You can use the "mean format" or the "intercept format".

Solution: The fitted model (using the mean format) is

$\hat{W}_t - 198.05 = 0.78 \times (W_{t-1} - 198.05) + 0.72 \times (W_{t-2} - 198.05) - 0.61 \times (W_{t-3} - 198.05)$

where $W_t$ is the daily web traffic on day $t$.

Q17 [2 pts]: The ACF plot of the residuals of the fitted AR(3) model (mod) is shown below. Are the residuals independent from each other? Please explain.

Solution: Yes, the residuals are independent from each other, since all of the estimated autocorrelations beyond Lag 0 are approximately 0.
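Q16 allows either the mean format or the intercept format. The value that R's `arima` labels "intercept" is the estimated mean of the series, so the constant in the intercept format has to be derived as mean × (1 - ar1 - ar2 - ar3). The sketch below (Python as a calculator, not the course's R code) does that conversion and confirms that both formats give the same one-step prediction. The three "recent" observations are hypothetical values chosen only for the check, since the exam does not list the data.

```python
# Fitted AR(3) coefficients and mean copied from the arima output
phi = (0.78, 0.72, -0.61)   # ar1, ar2, ar3
mu = 198.05                 # reported as "intercept"; it is the series mean

# Constant term of the intercept format
intercept = mu * (1 - sum(phi))


def predict_mean_format(last_three):
    """One-step prediction from (lag 1, lag 2, lag 3) values, mean format."""
    return mu + sum(p * (w - mu) for p, w in zip(phi, last_three))


def predict_intercept_format(last_three):
    """One-step prediction from (lag 1, lag 2, lag 3) values, intercept format."""
    return intercept + sum(p * w for p, w in zip(phi, last_three))


# Hypothetical recent observations (most recent first), for illustration only
recent = (205.0, 190.0, 200.0)

print(f"intercept-format constant = {intercept:.4f}")
print(f"mean format prediction      = {predict_mean_format(recent):.4f}")
print(f"intercept format prediction = {predict_intercept_format(recent):.4f}")
```

Under these numbers the constant works out to 198.05 × 0.11 = 21.7855, so the intercept-format equation is the prediction 21.7855 + 0.78 × (lag 1) + 0.72 × (lag 2) - 0.61 × (lag 3).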
Q18 [2 pts]: What happens if we use the fitted AR(3) model to predict into the distant future (many days after the end of 2019)? Please select the best answer.

A. The prediction will approach the estimated intercept of the model of $\{W_t\}$.
B. The prediction will approach the estimated mean of $\{W_t\}$.
C. The prediction will approach the estimated autocorrelation of $\{W_t\}$ at Lag 3.
D. None of the above answers are correct.

Solution: B

Q19 [2 pts]: Suppose that we define the time series of the cumulative web traffic since the beginning of 2018 as $\{C_t\}$. In other words, $\{W_t\}$ is identical to $\{\Delta C_t\}$. What model type does $\{C_t\}$ have? Please select the best answer.

A. AR(2) model
B. AR(3) model
C. AR(4) model
D. Cannot determine the model type of $\{C_t\}$

Solution: C

Q20 [2 pts]: Have you entered the answers of the multiple choice questions in the boxes?

A. Yes, I have.
B. No, I haven't. But I will do it before I submit the exam.

Solution: A or B
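The answers to Q18 and Q19 can both be illustrated numerically with the fitted AR(3) coefficients (0.78, 0.72, -0.61) and mean 198.05. The sketch below, in Python as a calculator rather than the course's R code, does two things. First, it iterates the forecast recursion from hypothetical starting values (the last observations of 2019 are not given in the exam) and shows the forecasts settling at the estimated mean. Second, it multiplies the AR(3) lag polynomial by (1 - B), the differencing operator, to show that the cumulative series satisfies a recursion with four lags. The function names are made up for illustration.

```python
# Fitted AR(3) coefficients and mean from the arima output
phi = [0.78, 0.72, -0.61]
mu = 198.05


def forecast(history, steps):
    """Iterate the AR(3) mean-format recursion; history is ordered oldest first."""
    values = list(history)
    for _ in range(steps):
        # phi[0] multiplies the most recent value, phi[2] the third most recent
        nxt = mu + sum(p * (values[-k - 1] - mu) for k, p in enumerate(phi))
        values.append(nxt)
    return values[len(history):]


# Hypothetical last three observations, for illustration only
start = [150.0, 230.0, 210.0]
far = forecast(start, 300)


def multiply_polynomials(a, b):
    """Multiply two polynomials given as coefficient lists in increasing powers."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# AR(3) lag polynomial 1 - 0.78 B - 0.72 B^2 + 0.61 B^3, times (1 - B)
ar3_poly = [1.0] + [-p for p in phi]
cumulative_poly = multiply_polynomials(ar3_poly, [1.0, -1.0])

# Coefficients on lags 1 to 4 of the cumulative series
ar4_coefficients = [round(-c, 2) for c in cumulative_poly[1:]]

print(f"forecast 300 days ahead = {far[-1]:.4f} (mean = {mu})")
print(f"lag coefficients of the cumulative series = {ar4_coefficients}")
```

The four lag coefficients sum to 1, which reflects the unit root introduced by accumulating the series. The cumulative series therefore follows a four-lag recursion but is not stationary, unlike the daily series itself.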