STAT 252 Practice Final Exam Study Guide and Questions

1

University of Alberta
Department of Mathematical and Statistical Sciences
STAT 252 PRACTICE FINAL
Instructor: Greg Wagner

Student Name: ______________________________________
Signature: __________________________________

Instructions: (READ ALL INSTRUCTIONS CAREFULLY.)

1. This is a closed book exam.
2. You are permitted to use a NON-PROGRAMMABLE calculator, and the formula sheets and tables provided.
3. Please turn off your cellular phones or pagers.
4. You have 3 hours to complete the exam.
5. The exam is out of a total of 64 marks.
6. This exam has 14 pages (including this cover and all computer output tables). Please ensure that you have all pages.
7. Make sure your name and signature are on the front and your student ID number is at the top of page two.
8. For questions that state you should show all steps, be sure that you do this in order to obtain full credit. Conclusions must also be clearly stated. Your answers must have adequate justification.
9. For questions that state that you do not need to show all steps, read the question carefully and follow the exact instructions regarding what is required.
10. If you run out of space in the blank area provided, use the back of the page to complete your answers as needed and label such answers so that it is clear which question they belong to.
11. Also use the reverse sides of the pages for all rough work.
12. When referring to “log”, I am always referring to the natural log.

BEST WISHES!!

2

Student ID Number: ___________________

Question 1 (2 marks): What is an indicator or dummy variable? What is its application in regression?

Question 2 (Two parts totaling 5 marks): A randomized experiment was conducted on washing hands using four different methods and determining subsequent bacterial counts. The output below is from an ANOVA F-test which resulted in rejecting the null hypothesis and concluding that there is a difference in the bacterial counts after using the four methods of washing hands.

SUMMARY
Groups                Count   Sum   Average   Variance
Just water                8   936     117      969.1429
Alcohol (65%)             8   300      37.5    705.4286
Anti-bacterial soap       8   740      92.5   1760.857
Regular soap              8   848     106     2205.143

ANOVA
Source of Variation      SS   df         MS       F   P-value
Between Groups        29882    3   9960.667   7.064    0.0011
Within Groups         39484   28   1410.143
Total                 69366   31

(a) (3 marks): Using the Bonferroni method at the 94% confidence level, determine the individual comparison-wise error rate ($\alpha_I$), find the critical value (two-sided) from the appropriate statistical table and calculate the margin of error. You do not need to perform all the steps. [Note: You only have to calculate the margin of error once since the sample sizes are equal for all treatment groups.]
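As a rough arithmetic check for Question 2(a), and not an official solution, the Bonferroni steps can be sketched as below. The sketch assumes the family consists of all pairwise comparisons among the four groups, and the critical value is an assumption copied from a standard t-table (two-sided, 0.005 in each tail, 28 degrees of freedom); the variable names are illustrative only.

```python
import math

# Values read from the ANOVA output in Question 2.
k = 4                  # number of treatment groups
n_per_group = 8        # observations in each group
mse = 1410.143         # Within Groups mean square
df_error = 28          # Within Groups degrees of freedom

# Bonferroni: split the familywise error rate across all pairwise comparisons.
alpha_family = 1 - 0.94                          # 94% confidence overall
n_comparisons = k * (k - 1) // 2                 # all pairs among 4 groups
alpha_individual = alpha_family / n_comparisons  # comparison-wise error rate

# ASSUMPTION: t critical value for 0.005 in each tail with 28 df,
# taken from a standard t-table rather than computed here.
t_crit = 2.763

# Standard error of a difference between two group means (equal n).
se_diff = math.sqrt(mse * (1 / n_per_group + 1 / n_per_group))
margin_of_error = t_crit * se_diff

print(f"number of comparisons = {n_comparisons}")
print(f"alpha_I = {alpha_individual:.4f}")
print(f"margin of error = {margin_of_error:.2f}")
```

Because every group has 8 observations, the same margin of error applies to each of the pairwise differences.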

3

(b) (2 marks): Develop a linear combination (contrast) to test whether there is a difference between using alcohol versus the other three methods combined. However, you do not need to perform all steps of a hypothesis test; just develop the contrast and calculate the estimate of the contrast.

Question 3 (Four parts totaling 16 marks): The average saturated fat consumption (in grams) and cholesterol level (in mg/100 ml of blood) of a random sample of 8 men were recorded. The data obtained fit the assumptions of simple linear regression analysis. SPSS output obtained after analysis of the data is shown below, with some values missing from the tables. You may also need some of the following information: $\bar{x} = 52.625$, $\bar{y} = 189.250$ and $S_{xx} = 1587.875$.

[Figure: Scatterplot (done with SPSS)]
[Figure: Normal Probability Plot (done with SPSS)]
[Figure: Fat consumption Residual Plot (done with Excel), residuals versus fat consumption]


4

Model Summary (b)
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
1
a. Predictors: (Constant), Fat_consumption
b. Dependent Variable: Cholesterol_level

ANOVA (a)
Model           Sum of Squares   df   Mean Square   F   Sig.
1 Regression          5089.335
  Residual             502.165
  Total               5591.500
a. Dependent Variable: Cholesterol_level
b. Predictors: (Constant), Fat_consumption

Coefficients (a)
                     Unstandardized Coefficients                    99.0% Confidence Interval for B
Model                B         Std. Error     t     Sig.            Lower Bound   Upper Bound
1 (Constant)         95.036    12.507               0.000270
  Fat_consumption     1.790      .230               0.000234
a. Dependent Variable: Cholesterol_level

(a) (5 marks): Using a simple linear regression ANOVA test, at the 1% significance level, test whether there is a relationship between saturated fat consumption and cholesterol level in men. In other words, test for the significance of the slope of the regression line. Perform ALL steps of the hypothesis test. Give both the exact P-value from the computer output and the P-value obtained from the F-table.
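The blank cells of the Question 3 ANOVA table follow from the sums of squares and the sample size. The sketch below is only an arithmetic aid under the usual simple linear regression degrees of freedom (1 for regression, n - 2 for residual); it does not replace the written steps the question asks for, and the variable names are illustrative.

```python
import math

# Values given in Question 3.
n = 8                      # number of men sampled
ss_regression = 5089.335
ss_residual = 502.165
ss_total = 5591.500

# Degrees of freedom for simple linear regression (one predictor).
df_regression = 1
df_residual = n - 2
df_total = n - 1

# Mean squares and the ANOVA F statistic.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# Model Summary quantities that follow from the same numbers.
r_squared = ss_regression / ss_total
std_error_estimate = math.sqrt(ms_residual)

print(f"df = ({df_regression}, {df_residual}), total {df_total}")
print(f"MS residual = {ms_residual:.3f}")
print(f"F = {f_statistic:.2f}")
print(f"R Square = {r_squared:.4f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.3f}")
```

The F statistic is then compared with the F-table at 1 and 6 degrees of freedom, and the exact P-value is read from the Sig. column for the slope.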

5

(b) (3 marks): Calculate the Pearson correlation coefficient for the relationship between saturated fat consumption and cholesterol level in men. You do not need to do all the steps of a hypothesis test; just state the correlation coefficient, the P-value (both the exact P-value from the computer output and the P-value obtained from the r-table) and your conclusion.

(c) (4 marks): Since it is fairly common knowledge that high saturated fat consumption increases cholesterol level, perform a regression t-test, at the 1% significance level, to test the hypothesis that there is a positive relationship between saturated fat consumption and cholesterol level in men. Perform ALL steps of the hypothesis test. Give both the exact P-value from the computer output and also the P-value obtained from the t-table.

6

(d) (4 marks): Calculate a 95% confidence interval for the mean response of cholesterol level for men whose average fat consumption is 60 g/day.

Question 4 (Two parts totaling 8 marks): An experiment was conducted to test the ultimate strength (in MPa) of random samples of three types of metals (steel, alloy and titanium) produced by two methods (Method 1 and Method 2). The following is incomplete SPSS output. [Note: This is a balanced design where n = 42 and there are 7 replicates for each combination of the two factors.]

Tests of Between-Subjects Effects
Dependent Variable: Strength
Source            Type III Sum of Squares   df   Mean Square   F   Sig.
Corrected Model
Intercept                    31037405.357
Method                             48.214
Metal                           30389.286
Method * Metal                  10246.429
Error                           53485.714
Total
Corrected Total                 94169.643   41
a. R Squared = .432 (Adjusted R Squared = .353)


7

(a) (6 marks): Perform a hypothesis test, at the 1% significance level, to determine whether the overall model is significant.
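For Question 4(a), the missing Corrected Model row can be recovered from the entries that are given in the Tests of Between-Subjects Effects table. The following is a hedged arithmetic sketch, assuming the balanced 2 x 3 design stated in the question (so the Type III sums of squares add up to the model sum of squares); the variable names are illustrative.

```python
# Design and sums of squares taken from the Question 4 output.
n_total = 42
levels_method = 2
levels_metal = 3

ss_method = 48.214
ss_metal = 30389.286
ss_interaction = 10246.429
ss_error = 53485.714
ss_corrected_total = 94169.643

# In a balanced design the model SS is the sum of the effect SS.
ss_model = ss_method + ss_metal + ss_interaction

# Degrees of freedom: (number of cells - 1) for the model, n - cells for error.
n_cells = levels_method * levels_metal
df_model = n_cells - 1
df_error = n_total - n_cells

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_model = ms_model / ms_error

# Cross-check against the R Squared footnote in the output.
r_squared = ss_model / ss_corrected_total

print(f"SS model = {ss_model:.3f}, df = ({df_model}, {df_error})")
print(f"MS error = {ms_error:.3f}")
print(f"F = {f_model:.3f}")
print(f"R Squared = {r_squared:.3f}")
```

The computed R Squared agrees with the .432 reported under the table, which is a useful check that the Corrected Model row has been filled in consistently.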

8

(b) (2 marks): The table below shows the results of multiple comparisons for the difference in strength between the three types of metals (Steel, Alloy and Titanium) both separately for Methods 1 and 2 and overall for the two methods combined. Firstly, construct a means comparison diagram summarizing the results of multiple comparisons for the difference between the three types of metals, regardless of the method used (that is, based on the means for the totals). Secondly, write a conclusion in words about what the multiple comparisons show.

Descriptive Statistics
Dependent Variable: Strength
Method     Metal      Mean     Std. Deviation    N
Method 1   Alloy      820.00   40.415            7
           Steel      864.29   36.904            7
           Titanium   891.43   36.710            7
           Total      858.57   47.041           21
Method 2   Alloy      824.29   41.975            7
           Steel      903.57   39.761            7
           Titanium   854.29   35.051            7
           Total      860.71   49.932           21
Total      Alloy      822.14   39.648           14
           Steel      883.93   42.116           14
           Titanium   872.86   39.502           14
           Total      859.64   47.925           42

Multiple Comparisons (Combining the two methods)
Dependent Variable: Strength
Tukey HSD
                        Mean Difference                        95% Confidence Interval
(I) Metal   (J) Metal   (I-J)             Std. Error   Sig.    Lower Bound   Upper Bound
Alloy       Steel       -61.79*           14.569       .000    -97.40        -26.18
            Titanium    -50.71*           14.569       .004    -86.32        -15.10
Steel       Alloy        61.79*           14.569       .000     26.18         97.40
            Titanium     11.07            14.569       .730    -24.54         46.68
Titanium    Alloy        50.71*           14.569       .004     15.10         86.32
            Steel       -11.07            14.569       .730    -46.68         24.54
Based on observed means.
The error term is Mean Square (Error) = 1485.714.
*. The mean difference is significant at the 0.05 level.

9

Question 5 (Five parts totaling 15 marks): A marine ecologist wanted to examine the relationship between water depth, light intensity, and diatom density (response variable). At 9 different depths in the ocean, he recorded depth (in meters), light intensity (as percentage of the surface intensity) and diatom density (in cells per milliliter of ocean water). The first table below shows the raw data recorded. Below that is incomplete SPSS output of the data analysis.

Water depth (m)   Light intensity (% of surface intensity)   Interaction   Diatom density (cells/ml)
              1                                         95            95                          96
              5                                         84           420                          85
             10                                         69           690                          77
             20                                         53          1060                          60
             30                                         44          1320                          52
             40                                         37          1480                          45
             60                                         22          1320                          28
             80                                         14          1120                          18
            100                                          5           500                           7

Model Summary
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
1
a. Predictors: (Constant), Interaction, Depth, Light_intensity

ANOVA (a)
Model           Sum of Squares   df   Mean Square   F   Sig.
1 Regression          7510.626
  Residual               9.374
  Total
a. Dependent Variable: Diatoms
b. Predictors: (Constant), Interaction, Depth, Light_intensity

Coefficients (a)
                     Unstandardized Coefficients   Standardized Coefficients
Model                B         Std. Error          Beta                        t   Sig.
1 (Constant)         35.208    14.431
  Depth               -.306      .118
  Light_intensity      .639      .152
  Interaction         -.002      .003
a. Dependent Variable: Diatoms


10

(a) (5 marks): At the 1% significance level, perform a hypothesis test to determine whether the overall multiple regression model is significant or useful for making predictions about diatom density.

(b) (3 marks): Calculate a 95% confidence interval for the slope of the interaction term (representing interaction between depth and light intensity). Using this confidence interval, what conclusion can you make about the significance of the slope of the interaction term? Explain your answer.

(c) (1 mark): Find the standard error of the model (the standard error of the estimate).
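The numerical pieces of Question 5(a) to (c) can be checked with the short sketch below. It is an aid under stated assumptions, not the expected written answer: residual degrees of freedom are taken as n - 4 (three predictors plus the constant), and the t critical value is an assumption copied from a standard t-table (0.025 in each tail, 5 degrees of freedom). Variable names are illustrative.

```python
import math

# Values taken from the Question 5 output.
n = 9                     # number of depths sampled
n_predictors = 3          # Depth, Light_intensity, Interaction
ss_regression = 7510.626
ss_residual = 9.374

df_regression = n_predictors
df_residual = n - n_predictors - 1

# (a) Overall F statistic for the multiple regression model.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# (c) Standard error of the estimate is the square root of MS residual.
std_error_estimate = math.sqrt(ms_residual)

# (b) 95% confidence interval for the interaction slope.
# ASSUMPTION: t(0.025, df = 5) read from a standard t-table.
t_crit = 2.571
b_interaction = -0.002
se_interaction = 0.003
ci_lower = b_interaction - t_crit * se_interaction
ci_upper = b_interaction + t_crit * se_interaction

print(f"F = {f_statistic:.1f} on ({df_regression}, {df_residual}) df")
print(f"standard error of the estimate = {std_error_estimate:.3f}")
print(f"95% CI for interaction slope = ({ci_lower:.4f}, {ci_upper:.4f})")
```

Whether the interval contains zero is what drives the conclusion asked for in part (b).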

11

(d) (2 marks): At a depth of 70 meters and a light intensity of 18%, suppose that the observed diatom density was 19.6 cells per milliliter. What was the residual or error of this observation?

(e) (4 marks): Based on the values of the predictor variables given in part (d) (depth = 70 m, light = 18%, interaction term = 1260), what is the 95% prediction interval for a single observed response of diatom density at those values of the predictor variables? [Note: SE(Fit) = 0.793]
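Parts (d) and (e) of Question 5 reduce to a fitted value, a residual, and a prediction interval. The sketch below uses the rounded coefficients printed in the output, so results may differ slightly from software that keeps more decimals. It assumes the usual prediction standard error, the square root of MS residual plus SE(Fit) squared, and a t-table critical value for 5 degrees of freedom; the names are illustrative.

```python
import math

# Rounded coefficients from the Question 5 Coefficients table.
b_constant = 35.208
b_depth = -0.306
b_light = 0.639
b_interaction = -0.002

# Predictor values from part (d).
depth = 70
light = 18
interaction = depth * light     # 1260, as stated in part (e)
observed = 19.6

# (d) Fitted value and residual (observed minus fitted).
fitted = (b_constant + b_depth * depth
          + b_light * light + b_interaction * interaction)
residual = observed - fitted

# (e) 95% prediction interval for a single new observation.
ms_residual = 9.374 / 5         # residual SS over residual df (n - 4 = 5)
se_fit = 0.793                  # given in the question
se_prediction = math.sqrt(ms_residual + se_fit ** 2)

# ASSUMPTION: t(0.025, df = 5) read from a standard t-table.
t_crit = 2.571
pi_lower = fitted - t_crit * se_prediction
pi_upper = fitted + t_crit * se_prediction

print(f"fitted = {fitted:.2f}, residual = {residual:.2f}")
print(f"95% PI = ({pi_lower:.2f}, {pi_upper:.2f})")
```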

12

Question 6 (5 marks): A certain company wanted to analyze the relationship between total sales (response variable) and the money they spend advertising through magazines, television, and radio. All data were recorded in millions of dollars based on a random sample of 10 business transactions. At the 5% significance level, perform the most appropriate test, showing all steps, to determine whether magazines have any effect on sales after accounting for the effect of TV and radio advertising. Consider the following three models and the corresponding ANOVA tables below them:

Model 1: $\mu\{Sales \mid magazines\} = \beta_0 + \beta_1\, magazines$
Model 2: $\mu\{Sales \mid TV, radio\} = \beta_0 + \beta_2\, TV + \beta_3\, radio$
Model 3: $\mu\{Sales \mid magazines, TV, radio\} = \beta_0 + \beta_1\, magazines + \beta_2\, TV + \beta_3\, radio$

ANOVA table for Model 1: Effect of magazines
ANOVA (a)
Model           Sum of Squares   df   Mean Square   F   Sig.
1 Regression           353.361
  Residual             958.275
  Total               1311.636
a. Dependent Variable: Sales
b. Predictors: (Constant), Magazines

ANOVA table for Model 2: Effect of Radio and TV
ANOVA (a)
Model           Sum of Squares   df   Mean Square   F   Sig.
1 Regression          1117.732
  Residual             193.904
  Total               1311.636
a. Dependent Variable: Sales
b. Predictors: (Constant), Radio, TV

ANOVA table for Model 3: Effect of Magazines, Radio and TV (Full Model)
ANOVA (a)
Model           Sum of Squares   df   Mean Square   F   Sig.
1 Regression          1194.529
  Residual             117.107
  Total               1311.636
a. Dependent Variable: Sales
b. Predictors: (Constant), Radio, TV, Magazines
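One natural reading of Question 6 is an extra-sum-of-squares F-test comparing Model 2 (reduced) with Model 3 (full). The sketch below shows only the arithmetic under that assumption; the F critical value is an assumption copied from a standard F-table (5% level, 1 and 6 degrees of freedom), and the variable names are illustrative.

```python
# Residual sums of squares from the ANOVA tables in Question 6.
n = 10
ss_res_reduced = 193.904    # Model 2: TV and radio only
ss_res_full = 117.107       # Model 3: magazines, TV and radio

# Residual df = n minus the number of estimated coefficients.
df_res_reduced = n - 3      # beta_0, beta_2, beta_3
df_res_full = n - 4         # beta_0, beta_1, beta_2, beta_3

# Extra sum of squares explained by adding magazines.
extra_ss = ss_res_reduced - ss_res_full
extra_df = df_res_reduced - df_res_full

f_statistic = (extra_ss / extra_df) / (ss_res_full / df_res_full)

# ASSUMPTION: F(0.05; 1, 6) read from a standard F-table.
f_crit = 5.99
reject_null = f_statistic > f_crit

print(f"extra SS = {extra_ss:.3f} on {extra_df} df")
print(f"F = {f_statistic:.3f}, critical value = {f_crit}")
print(f"reject H0 (beta_1 = 0): {reject_null}")
```

Model 1 is not needed for this comparison, since it does not account for TV and radio.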


13

Solution for Question 6:

Question 7 (Two parts totaling 3 marks): Consider the following model:

$\mu\{Sales \mid magazines, TV, radio\} = \beta_0 + \beta_1\, magazines + \beta_2\, TV + \beta_3\, radio + \beta_4 (magazines \times TV) + \beta_5 (magazines \times radio) + \beta_6 (TV \times radio) + \beta_7 (magazines \times TV \times radio)$

(a) (2 marks): What is the effect of magazines on sales? How would you redefine the model? No calculations are necessary.

(b) (1 mark): What would be the null hypotheses for testing for the effect of magazines?

Question 8 (2 marks): Suppose the relationship between the annual rate of hip fractures (per 100,000 people) and age follows the model $\hat{\mu}(\ln(fractures) \mid age) = -2.09 + 0.0912\, age$. For an increase in age from 40 to 50 years old, what would be your interpretation regarding the rate of hip fractures on the original scale?
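For Question 8, a slope on the natural-log scale turns into a multiplicative change on the original scale. The sketch below computes that multiplier for a 10-year increase in age; it assumes the slope of 0.0912 per year of age, and the variable names are illustrative.

```python
import math

# Slope of ln(fractures) on age, from the model in Question 8.
slope = 0.0912
age_from = 40
age_to = 50

# A change of d years adds slope * d on the log scale,
# which multiplies the rate by exp(slope * d) on the original scale.
age_change = age_to - age_from
multiplier = math.exp(slope * age_change)
percent_change = (multiplier - 1) * 100

print(f"multiplicative change = {multiplier:.3f}")
print(f"percent change = {percent_change:.1f}%")
```

The intercept cancels out of the ratio, so only the slope and the age difference matter. On the original scale the statement is usually made about the median rate, since back-transforming a mean of logs gives a median rather than a mean.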

14

Question 9 (6 marks): All parametric hypothesis tests have assumptions about normality. However, the specific requirements regarding normality differ from one test to the other. For each of the hypothesis tests mentioned below, explain what the specific requirement is regarding normality. In answering this question, do not make reference to the Central Limit Theorem, assuming that sample sizes are not large enough to apply that theorem.

1. Two-sample t-test (independent samples)
2. Paired-sample t-test
3. One-way ANOVA
4. Two-way ANOVA
5. Simple linear regression
6. Multiple linear regression

Question 10 (2 marks): There are two types of inferential statistics that can be applied to a research problem; one type is a hypothesis test and the other is a confidence interval. What advantage does a hypothesis test have over calculating a confidence interval? Explain your answer.

