MKT EXAM 3 Quiz Examples

School: Michigan State University
Course: 317
Subject: Statistics
Date: Jan 9, 2024
Type: pdf
Pages: 4
Uploaded by ChefFlowerPanther37

Quiz 8

Suppose you have a very large data set. The first three rows of the data set are given below:

  ID   X   Y
  1    2   20
  2    6   22
  3    3   16

Suppose you use the entire data set (with all 500 rows of data), and obtain the model

  Ŷ = 10 + 2(X)

What is the OBSERVED value for ID #1 (where the x-value is 2 and the y-value is 20)?

Answer: 20 (the observed value is the Y recorded in the data, not the model's prediction)

Suppose you have a very large data set. The values for ID numbers 4, 5, and 6 are given below:

  ID   X   Y
  4    2   19
  5    8   25
  6    2   15

Suppose you use the entire data set (with all 500 rows of data), and obtain the model

  Ŷ = 10 + 2(X)

What is the RESIDUAL (prediction error) value for ID #4 (where the x-value is 2 and the y-value is 19)?

Answer: 5 (the predicted value is 10 + 2(2) = 14, so the residual is 19 - 14 = 5)

The "log" in this question refers to the natural log. Suppose we create a model in R using the model lm(Y ~ log(X)). We use the predict() command in R to compute the 95% confidence interval for prediction when X = 100. We obtain the output below.

  fit   lwr   upr
  20    18    22

We conclude that when X = 100, we're 95% confident that the value of Y is

  • equal to log(20)
  • equal to 20
  • between 18 and 22
  • between e^(18) and e^(22)
  • between log(18) and log(22)
  • equal to e^(20)

Answer: between 18 and 22 (Y itself is not transformed in this model, so the interval is already in the units of Y)

Suppose we create a regression model. From that model, we compute the predicted values of Y. In the plot below, we are given dots on a scatter plot representing the predicted value of Y and the observed value of Y. The dots on this scatter plot appear to approximately follow the diagonal line y = x.

  [Figure: scatter plot of observed Y against predicted Y with the reference line y = x; axis label "Predicted Y"]

What can we conclude?

  • The model does not fit the data / the model is highly inaccurate at matching the trends in the data.
  • The model fits the data / the model is generally accurate at matching the trends in the data.

Answer: The model fits the data / the model is generally accurate at matching the trends in the data.

The "log" in this question refers to the natural log, and the "sqrt" below refers to the square root. Suppose we create a model in R using the model lm(sqrt(Y) ~ log(X)). We use the predict() command in R to compute the 95% confidence interval for prediction when X = 100. We obtain the output below.

  fit   lwr   upr
  8     7     9

We conclude that when X = 100, we're 95% confident that the value of Y is between

  • log(7) and log(9)
  • 7 and 9
  • e^(7) and e^(9)
  • 7^2 and 9^2
  • sqrt(7) and sqrt(9)

Answer: 7^2 and 9^2 (the interval is for sqrt(Y), so squaring both ends gives the interval for Y: 49 to 81)

In modules 2 and 3, we created scatter plots with trend lines in Tableau, and we were able to assess if the "model fits the data" based on how closely the trend line matched the general pattern of the dots in the scatter plot. Why did we need to learn more complicated methods to assess model accuracy in this module?

*Recall that a model "fits the data" when the model's trend line follows the same general pattern/trend in the data.

  • The accuracy plots we learned in this module were extra information that is not necessary in assessing if a model fits the data. For any type of model, we can determine if the model fits the data based on the p-values of the model.
  • Because the graphical methods in Modules 2 and 3 only work when we have one quantitative x-variable; the plots we learned in this module can be used to assess model accuracy even for models that have many quantitative x-variables.
  • The accuracy plots we learned in this module were extra information that is not necessary in assessing if a model fits the data. For any type of model, we can determine if the model fits the data based on the R-squared of the model.

Answer: Because the graphical methods in Modules 2 and 3 only work when we have one quantitative x-variable; the plots we learned in this module can be used to assess model accuracy even for models that have many quantitative x-variables.

Suppose we create a model in R using the model lm(Y ~ X). We use the predict() command in R to compute the 95% confidence interval for prediction when X = 100. We obtain the output below.

  fit   lwr   upr
  10    6     14

What do we conclude?

  • When X = 100, we're 95% confident that the value of Y is between 10 and 14.
  • When X = 100, we're 95% confident that the value of Y is between 6 and 14.
  • When X = 100, we're 95% confident that the value of Y will equal 10.
  • When X = 100, we're 95% confident that the value of Y is between 10 and 14.

Answer: When X = 100, we're 95% confident that the value of Y is between 6 and 14.

The "log" in this question refers to the natural log. Suppose we create a model in R using the model lm(log(Y) ~ X). We use the predict() command in R to compute the 95% confidence interval for prediction when X = 100. We obtain the output below.

  fit   lwr   upr
  2     1     3

We conclude that when X = 100, we're 95% confident that the value of Y is

  • equal to log(2)
  • equal to e^(2)
  • between 1 and 3
  • between e^(1) and e^(3)
  • between log(1) and log(3)
  • equal to 2

Answer: between e^(1) and e^(3) (the interval is for log(Y), so exponentiating both ends gives the interval for Y)
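
The predict() questions above all turn on the same step: the interval that predict() reports is on the scale of the model's response, so it has to be back-transformed when the response is log(Y) or sqrt(Y). The sketch below illustrates this with hypothetical simulated data (the variables X and Y, the seed, and the coefficients are invented for illustration and are not the quiz's data set). It uses interval = "confidence"; the quiz does not say whether "confidence" or "prediction" was used, so that choice is an assumption.

```r
# Hypothetical simulated data, for illustration only (not the quiz's data set).
set.seed(1)
X <- runif(200, min = 1, max = 200)
Y <- exp(0.5 + 0.01 * X + rnorm(200, sd = 0.2))

# The response is log(Y), so predict() reports fit/lwr/upr on the log scale.
log_model <- lm(log(Y) ~ X)
log_scale <- predict(log_model, newdata = data.frame(X = 100),
                     interval = "confidence", level = 0.95)

# exp() undoes the natural log and gives the interval in the units of Y.
y_scale <- exp(log_scale)
print(log_scale)
print(y_scale)

# The same idea applied to the numbers quoted in the quiz questions:
print(exp(c(lwr = 1, upr = 3)))  # log(Y) model: Y between e^1 and e^3
print(c(lwr = 7, upr = 9)^2)     # sqrt(Y) model: Y between 49 and 81
```

When only X is transformed, as in lm(Y ~ log(X)), no back-transformation is needed because the response is still Y.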
Which of the following are the "prediction errors from a model"?

  • observed values
  • residuals
  • predicted values
  • fitted values

Answer: residuals

  Residuals:
      Min       1Q   Median       3Q      Max
  -6.8610  -2.0033   0.0808   2.0726   6.9634

  Coefficients:
               Estimate  Standard Error  t value  Pr(>|t|)
  (Intercept)   1.90793         0.55611    3.431  0.000882 ***
  X             3.02168         0.09262   32.623   < 2e-16 ***

  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Residual standard error: 2.987 on 98 degrees of freedom
  Multiple R-squared: 0.9157, Adjusted R-squared: 0.9148
  F-statistic: 1064 on 1 and 98 DF, p-value: < 2.2e-16

We can conclude that about half of the Y-values in the data set named Data are within ____ of their predicted values from this model.

  • 2
  • 3
  • 0.9157
  • 7

Selected: 3 (marked incorrect). The first and third quartiles of the residuals (-2.0033 and 2.0726) show that the middle half of the residuals fall within about 2 of zero.

Suppose we have data from 20 stores: their revenue in November 2021 and their revenue in November 2022. We wish to determine if there has been a statistically significant change in revenue when comparing November 2021 and November 2022 (after controlling for individual variability between stores). What method can we use?

  • Independent t-test
  • We are not allowed to use any form of t-test in this situation.
  • Paired t-test

Answer: Paired t-test

Quiz 9

Suppose we have a large data set with information from an online store. The store is showing each customer a randomly chosen webpage layout: version A or version B. We would like to determine if the average customer revenue is different when comparing version A vs. version B. What method can we use?

  • We can not use any type of t-test in this situation.
  • Paired t-test
  • Independent t-test

Answer: Independent t-test
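
The two t-test questions above differ in whether each observation in one group is matched to a specific observation in the other. A minimal sketch of both cases follows, using hypothetical simulated revenue figures (all variable names, sample sizes other than the 20 stores, and means are invented for illustration). It relies on R's t.test(), which runs a paired test when paired = TRUE and a Welch two-sample test by default.

```r
# Hypothetical simulated data, for illustration only.
set.seed(42)

# Paired case: the same 20 stores measured in two different years.
rev_2021 <- rnorm(20, mean = 100, sd = 15)
rev_2022 <- rev_2021 + rnorm(20, mean = 5, sd = 4)
paired_result <- t.test(rev_2022, rev_2021, paired = TRUE)

# Independent case: different customers see version A or version B.
rev_A <- rnorm(150, mean = 50, sd = 10)
rev_B <- rnorm(150, mean = 53, sd = 10)
independent_result <- t.test(rev_A, rev_B)

print(paired_result)
print(independent_result)
```

The paired test works on the 20 store-by-store differences, which is how it controls for variability between stores; the independent test compares two unrelated groups of customers.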
Suppose we would like to predict the probability that a customer leaves a positive review on a company's social media page based on the location of the store that they visited (location 1 or location 2). What method may we use?

  • We can not use any type of t-test in this situation.
  • Independent t-test
  • Paired t-test

Answer: We can not use any type of t-test in this situation.

Suppose we have data with two quantitative variables, X and Y, and the general trend changes direction (for example, increases and then decreases). Suppose we would like to determine if there is a relationship between X and Y, and if so, we wish to create a model that allows us to predict the expected value of Y based on X. What is a good method to use?

  • First we can use an ANOVA to determine if there are differences in average Y based on X (or a binned variable created from X). If the p-value for that ANOVA is small, then we can use a polynomial to model the predicted Y based on X.
  • First try to use a linear regression model. If the linear model is not accurate enough, try an exponential, logarithmic, or power law regression model.
  • If a trend is increasing and then decreasing, it is impossible to create a model that is a reasonably accurate representation of the relationship between X and Y.

Answer: First we can use an ANOVA to determine if there are differences in average Y based on X (or a binned variable created from X). If the p-value for that ANOVA is small, then we can use a polynomial to model the predicted Y based on X.

Y = Sales. Independent variables: Location (North/South) and Segment (A/B/C). We compute a Factorial ANOVA, and the p-value for the interaction of Segment and Location is large: about 0.59. What do we conclude?

  • All six groups (Segment A North, Segment A South, Segment B North, Segment B South, etc.) have about the same average sales.
  • The impact of Segment on Sales is about the same in both locations.
  • Segment and Location both impact Sales about the same amount.
  • Sales are increasing in one location and decreasing in the other location.

Selected: Sales are increasing in one location and decreasing in the other location. (marked incorrect)
Correct answer: The impact of Segment on Sales is about the same in both locations.

Suppose we are comparing the average Sales between three segments: Segment A, Segment B, and Segment C. We compute an ANOVA, and the p-value is small: approximately 0.000003. What can we conclude?

  • After controlling for variability within each segment, all three segments have about the same average sales.
  • All three segments have different values for average sales; we can rank the segments from lowest average sales to highest average sales, and no two segments are about the same.
  • We're confident the average sales is different for at least two of the segments.

Selected: All three segments have different values for average sales; we can rank the segments from lowest average sales to highest average sales, and no two segments are about the same. (marked incorrect)
Correct answer: We're confident the average sales is different for at least two of the segments.

You will need R for this question. For this question, you will use a data set named iris, which is pre-programmed into R. This data set consists of observations from 150 flowers, 50 from each of the three species in the data set. Please use the command below to view the data.

  View(iris)

Compute an ANOVA that answers the question, "are there statistically significant differences in average petal length when comparing the three species?" What is the F value (F-statistic) for the ANOVA you calculated?

Answer: 1,180
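
One way the iris ANOVA above might be computed is sketched here. It assumes the built-in iris data set with its standard column names Petal.Length and Species, and uses aov() for a one-way ANOVA; the variable names anova_model, anova_table, f_value, and p_value are illustrative choices, not taken from the quiz.

```r
# One-way ANOVA: does mean petal length differ across the three species?
anova_model <- aov(Petal.Length ~ Species, data = iris)

# summary() returns a list whose first element is the ANOVA table.
anova_table <- summary(anova_model)[[1]]
print(anova_table)

# Extract the F statistic and p-value for the Species row.
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
print(round(f_value))  # 1180
```

A small p-value here only says that at least two species differ in average petal length; a follow-up such as TukeyHSD(anova_model) would be needed to see which pairs differ.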
For this question, you will need R. Use the command below to view the mtcars data set.

  View(mtcars)

Suppose we wish to calculate an equation for the predicted value of mpg based on hp. Compute the equation outlined below.

  Predicted mpg = ____ + ____*(hp)

What is the value you obtained for the intercept of the equation above?

Answer: 30.1

Quiz 10