STATS 252-HW3-Solutions

School

University of Alberta

Course

252

Subject

Statistics

Date

Apr 3, 2024

Type

pdf

Pages

7

Uploaded by SuperHumanSteel1194

Stat 252 Homework # 3 Solutions Winter 2024

STAT 252 Homework 3 (52 marks)
Due Tuesday, March 12 at 9:59 pm

For questions that state “SHOW ALL STEPS”, write all the steps of a hypothesis test or confidence interval as indicated below. For other questions that say do “NOT” show all steps, read the question carefully and follow the exact instructions regarding what is required.

Whenever you are asked to “carry out the most appropriate test” and “SHOW ALL STEPS”:
i) Select the most appropriate hypothesis test and define the parameter(s) of interest.
ii) State clearly the null and alternative hypothesis in terms of the parameter(s).
iii) Calculate the test statistic, being sure to state its components (estimate and standard error).
iv) Calculate df. Determine the P-value for the test AND state the strength of the evidence against H_0. State whether P is less than or greater than alpha and, based on this comparison, decide whether to reject or not reject H_0.
v) Based on the research problem and referring to the significance level given, write a conclusion in words.

Whenever you are asked to calculate a “confidence interval” and “SHOW ALL STEPS”:
i) State the critical value of the test statistic.
ii) Calculate the confidence interval, stating its components (estimate and standard error).
iii) Interpret the interval.

Other: If you need to use the t-table and the t-distribution you need is NOT on the table, round your degrees of freedom DOWN to the nearest one.

1. (Ten parts; 33 marks in total) Consider a simple linear regression (SLR) model to explore the association between the selling price of a house (in thousands of dollars) and the square footage of the house. Use the SPSS output given below, based on a random sample of 108 houses. Assume that all the assumptions are met for the required analysis. Use the following model for parts (a) to (j):

\[ \mu(\text{price} \mid \text{sqft}) = \beta_0 + \beta_1(\text{sqft}) \]

ANOVA^a
Model           Sum of Squares    df    Mean Square    F    Sig.
1  Regression   854698.400                                  .000^a
   Residual
   Total        1170535.000
a. Dependent Variable: Price
b. Predictors: (Constant), sqft
a) (2 marks) According to the linear regression model, what is the predicted price of a house that is 2,000 square feet?

From the output, \( \hat{\mu}(\text{price} \mid \text{sqft}) = 18.056 + 0.2154644\,(\text{sqft}) \).

For a house that is 2,000 square feet, \( \hat{\mu}(\text{price} \mid \text{sqft} = 2000) = 18.056 + (0.2154644)(2{,}000) = 448.9848 \).

Therefore, a 2,000 square foot house is predicted to have a price of 448.9848 (in thousands of dollars), that is, $448,984.80.

b) (2 marks) What is the linear correlation coefficient for the relationship between price and square feet?

\[ R^2 = \frac{SS_{REGR}}{SS_{TOTAL}} = \frac{854{,}698.400}{1{,}170{,}535.000} = 0.7302 \]

Since the slope is positive, \( r = +\sqrt{R^2} = +\sqrt{0.7302} = +0.8545 \). Thus, the linear correlation between price and square feet is approximately 0.8545.

c) (2 marks) What is the standard error of the model (standard deviation of the residuals)?

\[ SS_{Error} = SS_{TOTAL} - SS_{REGR} = 1{,}170{,}535.000 - 854{,}698.400 = 315{,}836.600 \]

\[ \hat{\sigma} = \sqrt{MS_{Error}} = \sqrt{\frac{SS_{Error}}{n-2}} = \sqrt{\frac{315{,}836.600}{108-2}} = \sqrt{2979.590566} = 54.585626 \]

d) (6 marks) Is there a linear relationship between average price and square footage? Carry out the most appropriate test using the t-distribution to answer this question. Use a significance level of 0.01.

\[ H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0 \] (1 mark)

Estimate: \( \hat{\beta}_1 = 0.2154644 \). Standard error: \( S.E.(\text{Estimate}) = SE(\hat{\beta}_1) = 0.0127218 \).

\[ \text{Test statistic: } t^* = \frac{\text{Estimate} - H_0\ \text{value}}{S.E.(\text{Estimate})} = \frac{0.2154644 - 0}{0.0127218} = 16.937 \] (2.5 marks)

\[ p\text{-value} = 2P(t_{n-2} > |t^*|) = 2P(t_{106} > 16.937) < 2(0.0005) = 0.001 \] (1.5 marks)

Or P < 0.001. There is extremely strong evidence against H_0. P < α (0.01), therefore reject H_0. (1 mark)

At the 1% significance level, there is sufficient evidence to conclude that there is a linear relationship between square footage and mean selling price of a house.
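The arithmetic in parts (a) to (d) can be re-checked with a short script. This is only a sketch: the variable names are illustrative, the inputs are the values quoted from the SPSS output in the solutions, and the P-value itself is not recomputed because that would need a t-distribution routine outside the standard library.

```python
import math

# Quantities quoted in the solutions above (SPSS output, n = 108 houses).
n = 108
ss_regr = 854698.400
ss_total = 1170535.000
b0_hat = 18.056
b1_hat = 0.2154644
se_b1 = 0.0127218

# (a) Predicted price, in thousands of dollars, at 2,000 square feet.
pred_2000 = b0_hat + b1_hat * 2000

# (b) R^2 and the correlation; the sign of r matches the sign of the slope.
r_squared = ss_regr / ss_total
r = math.copysign(math.sqrt(r_squared), b1_hat)

# (c) Residual standard deviation: sqrt(SS_Error / (n - 2)).
ss_error = ss_total - ss_regr
sigma_hat = math.sqrt(ss_error / (n - 2))

# (d) t statistic for H0: beta_1 = 0.
t_stat = (b1_hat - 0) / se_b1

print(f"(a) predicted price = {pred_2000:.4f}")
print(f"(b) R^2 = {r_squared:.4f}, r = {r:.4f}")
print(f"(c) sigma_hat = {sigma_hat:.6f}")
print(f"(d) t = {t_stat:.3f}")
```

The printed values should agree with the hand calculations up to rounding.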
e) (4 marks) Calculate a 98% confidence interval for the slope of the regression line describing the relationship between square footage and mean selling price of a house.

For 98% confidence, \( CV = t^*_{\alpha/2,\,n-2} = t^*_{0.02/2,\,108-2} = t^*_{0.01,\,106} \approx t^*_{0.01,\,100} = 2.364 \) (df rounded down to 100 on the t-table). (1 mark)

Estimate: \( \hat{\beta}_1 = 0.2154644 \)

\( S.E.(\text{Estimate}) = SE(\hat{\beta}_1) = 0.0127218 \)

The 98% confidence interval for the slope is then:

\[ \text{Estimate} \pm CV \times SE(\text{Estimate}) = 0.2154644 \pm (2.364)(0.0127218) = 0.2154644 \pm 0.0300743 = (0.185,\ 0.246) \] (2 marks)

Conclusion: It is estimated with 98% confidence that the slope of the regression line describing the relationship between square footage and mean selling price of a house is between 0.185 and 0.246 thousand dollars per square foot; that is, each additional square foot is associated with an increase in mean selling price of between $185 and $246. (Note the data for selling price of a house is in thousands of dollars, so multiply the endpoints of the confidence interval by 1000.) (1 mark)

f) (5 marks) Calculate a 96% confidence interval for the average price of houses that are 1500 square feet.

Solution:

Estimate: \( \hat{\mu}(Y \mid x = 1500) = \hat{\beta}_0 + \hat{\beta}_1(1500) = 18.056 + (0.2154644)(1500) = 341.2526 \) (1 mark)

For 96% confidence, \( CV = t^*_{\alpha/2,\,n-2} = t^*_{0.04/2,\,108-2} = t^*_{0.02,\,106} \approx t^*_{0.02,\,100} = 2.081 \) (df rounded down to 100 on the t-table). (1 mark)

\[ \hat{y}_p \pm t_{\alpha/2,\,n-2}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}} \qquad \text{where } S_{xx} = (n-1)s_x^2 = (108-1)(414.80024)^2 = 18410338.58 \]

\[ 341.2526 \pm 2.081 \times 54.5856\sqrt{\frac{1}{108} + \frac{(1500 - 1760.0648)^2}{18410338.58}} = 341.2526 \pm 2.081 \times 6.2076 = 341.2526 \pm 12.9180 = (328.335,\ 354.171) \] (2 marks)

Conclusion: It is estimated with 96% confidence that the mean selling price of houses that are 1500 square feet is between $328,335 and $354,171. (Note the data for selling price of a house is in thousands of dollars, so multiply the endpoints of the confidence interval by 1000.) (1 mark)
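The two intervals in parts (e) and (f) share the form estimate ± critical value × standard error, which the sketch below reproduces. The helper names `interval` and `se_mean_response` are illustrative, not from the course material, and the critical values 2.364 and 2.081 are taken from the t-table at df = 100 as in the solutions rather than computed.

```python
import math

def interval(estimate, critical_value, std_error):
    """Return (lower, upper) for estimate +/- critical_value * std_error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin

def se_mean_response(sigma_hat, n, x_p, x_bar, s_xx):
    """Standard error of the estimated mean response at x_p in SLR."""
    return sigma_hat * math.sqrt(1 / n + (x_p - x_bar) ** 2 / s_xx)

# Values quoted in the solutions above.
n = 108
sigma_hat = 54.5856
x_bar = 1760.0648
s_x = 414.80024
s_xx = (n - 1) * s_x ** 2

# (e) 98% confidence interval for the slope (t* = 2.364 from the table).
slope_ci = interval(0.2154644, 2.364, 0.0127218)

# (f) 96% confidence interval for the mean price at 1500 sq ft (t* = 2.081).
y_hat = 18.056 + 0.2154644 * 1500
mean_ci = interval(y_hat, 2.081, se_mean_response(sigma_hat, n, 1500, x_bar, s_xx))

print("slope CI:", tuple(round(v, 3) for v in slope_ci))
print("mean-response CI:", tuple(round(v, 3) for v in mean_ci))
```

Both intervals are in thousands of dollars, so the endpoints still need to be multiplied by 1000 for the written conclusion.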
g) (3 marks) Calculate a 96% prediction interval for the selling price of a house that is 1500 square feet.

Solution:

Estimate: \( \hat{\mu}(Y \mid x = 1500) = \hat{\beta}_0 + \hat{\beta}_1(1500) = 18.056 + (0.2154644)(1500) = 341.2526 \) (From part (f))

For 96% confidence, \( CV = t^*_{\alpha/2,\,n-2} = t^*_{0.04/2,\,108-2} = t^*_{0.02,\,106} \approx t^*_{0.02,\,100} = 2.081 \) (From part (f))

\[ \hat{y}_p \pm t_{\alpha/2,\,n-2}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}} \qquad \text{where } S_{xx} = (n-1)s_x^2 = (108-1)(414.80024)^2 = 18410338.58 \] (From part (f))

\[ 341.2526 \pm 2.081 \times 54.5856\sqrt{1 + \frac{1}{108} + \frac{(1500 - 1760.0648)^2}{18410338.58}} = 341.2526 \pm 2.081 \times 54.9374 = 341.2526 \pm 114.3247 = (226.928,\ 455.577) \] (2 marks)

Conclusion: It is estimated with 96% confidence that the selling price of a house that is 1500 square feet is between $226,928 and $455,577. (Note the data for selling price of a house is in thousands of dollars, so multiply the endpoints of the prediction interval by 1000.) (1 mark)

h) (1 mark) How do your intervals in (f) and (g) compare? Which is wider?

The prediction interval is much wider because it is for an individual observation rather than for a mean.

i) (6 marks) Is there a linear relationship between average price and square footage? Carry out the most appropriate test using the F-distribution to answer this question. Use a significance level of 0.01.

Solution:

H_0: β_1 = 0 (There is no linear relationship between average price and square footage.)
H_A: β_1 ≠ 0 (There is a linear relationship between average price and square footage.) (1 mark)

\[ SS_{Error} = SS_{Total} - SS_{REGR} = 1170535 - 854698.4 = 315836.6 \]

\[ F = \frac{MS_{REGR}}{MS_{Error}} = \frac{SS_{REGR}/1}{SS_{Error}/(n-2)} = \frac{854698.4/1}{315836.6/(108-2)} = 286.851 \] (1.5 marks)

\( df = (1,\ n-2) = (1,\ 108-2) = (1,\ 106) \) OR \( F \sim F_{1,\,n-2} = F_{1,\,108-2} = F_{1,\,106} \) (0.5 marks)

The P-value: From the F-table, P < 0.001 OR \( P(F_{1,\,106} > 286.851) < 0.001 \). (1 mark)

There is extremely strong evidence against H_0. Since P-value ≤ α (0.01), reject H_0. (1 mark)

Conclusion: At the 1% significance level, the data provide sufficient evidence that there is a linear relationship between average price and square footage. (1 mark)
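As a cross-check on parts (g) and (i), the sketch below recomputes the prediction interval and the F statistic from the quoted values. The name `variance_factor` is an illustrative label for the term 1/n + (x_p − x̄)²/S_xx, and the critical value 2.081 is again taken from the t-table rather than computed.

```python
import math

# Values quoted in the solutions above.
n = 108
sigma_hat = 54.5856
x_bar, s_x = 1760.0648, 414.80024
s_xx = (n - 1) * s_x ** 2
x_p = 1500
t_crit = 2.081  # t-table value at df = 100, alpha/2 = 0.02
y_hat = 18.056 + 0.2154644 * x_p

# Term shared by the mean-response and prediction standard errors.
variance_factor = 1 / n + (x_p - x_bar) ** 2 / s_xx
se_mean = sigma_hat * math.sqrt(variance_factor)
se_pred = sigma_hat * math.sqrt(1 + variance_factor)  # extra 1: a single house

# (g) 96% prediction interval, in thousands of dollars.
pred_interval = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

# (i) F statistic: MS_REGR / MS_Error with (1, n - 2) degrees of freedom.
ss_regr, ss_total = 854698.4, 1170535.0
f_stat = (ss_regr / 1) / ((ss_total - ss_regr) / (n - 2))

print("prediction interval:", tuple(round(v, 3) for v in pred_interval))
print(f"se_mean = {se_mean:.4f}, se_pred = {se_pred:.4f}")
print(f"F = {f_stat:.3f}")
```

Comparing `se_mean` with `se_pred` shows numerically why the prediction interval in (g) is so much wider than the confidence interval in (f): the added 1 under the square root dominates the other two terms.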
j) (2 marks) Compare the test statistic and p-value from part (d) to the test statistic and p-value obtained in part (i). How are the test statistics related? How are the p-values related?

The square of the t-statistic is equal to the F-statistic. (Any difference is due to rounding.)

\[ (t\text{-statistic})^2 = (16.937)^2 = 286.862 \approx 286.851 = F\text{-statistic} \]

Both tests give the same P-value: P < 0.001.

2. (Three parts; 6 marks in total) The gestation time (time) between fertilization and birth for a mammal is related to the birth weight (weight) by the relationship \( \mu(\ln(\text{time}) \mid \text{weight}) = \beta_0 + \beta_1\,\text{weight} \). The approximate gestation time and birth weights (in kg) of 11 selected mammals were recorded. The least-squares estimate of the regression line was observed to be \( \hat{\mu}(\ln(\text{time}) \mid \text{weight}) = 5.231 + 0.011\,\text{weight} \). Assume all the required assumptions for this model are satisfied.

a) (2 marks) On the original scale, estimate the difference in gestation time for mammals with birth weights that differ by 1 kg.

Solution: Taking the antilog, where k = 1 (difference of 1 kg):

\[ e^{k\hat{\beta}_1} = e^{(1)(0.011)} = e^{0.011} = 1.0111 \]

Interpretation: It is estimated that a one-kilogram increase in weight is associated with a multiplicative change of \( e^{0.011} = 1.0111 \) in the median gestation time.

OR: The median gestation time of mammals at a weight of x + 1 kilograms will be 1.0111 times the median gestation time of mammals that weigh x kilograms.

(Not required for full marks: For example, the median gestation time of mammals that weigh 41 kilograms is 1.0111 times the median gestation time of mammals that weigh 40 kilograms. OR: The median gestation time of mammals that weigh 41 kilograms is 1.11% more than the median gestation time of mammals that weigh 40 kilograms.)

b) (2 marks) A 95% confidence interval for the slope of this regression line is (0.006, 0.016). Interpret this confidence interval on the original scale.

Solution: Taking antilogs of the endpoints, we obtain a 95% confidence interval for the multiplicative effect:

\[ (e^{0.006},\ e^{0.016}) = (1.0060,\ 1.0161) \]

Interpretation: It is estimated, with 95% confidence, that a one-kilogram increase in weight is associated with a multiplicative change of between 1.0060 and 1.0161 in the median gestation time of mammals.

OR: The median gestation time of mammals will be multiplied by a factor of between 1.0060 and 1.0161 for each one-kilogram increase in weight.
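The antilog conversions in parts (a) and (b) of question 2 can be verified numerically. This is a minimal sketch with illustrative variable names; it simply exponentiates the slope estimate and the interval endpoints quoted in the solutions.

```python
import math

# Values quoted in question 2: slope on the log scale and its 95% CI.
slope_hat = 0.011
slope_ci = (0.006, 0.016)
k = 1  # difference in birth weight, in kg

# Multiplicative change in the median gestation time per k kg.
ratio = math.exp(k * slope_hat)
ratio_ci = tuple(math.exp(k * endpoint) for endpoint in slope_ci)

# The same effect expressed as a percentage increase.
percent_increase = (ratio - 1) * 100

print(f"ratio = {ratio:.4f}")
print("ratio CI:", tuple(round(v, 4) for v in ratio_ci))
print(f"percent increase = {percent_increase:.2f}%")
```

Changing `k` gives the multiplicative effect for other weight differences under the same fitted model.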
(Not required for full marks: For example, it is estimated with 95% confidence that the median gestation time of mammals that weigh 41 kilograms is between 1.0060 and 1.0161 times the median gestation time of mammals that weigh 40 kilograms. OR: It is estimated with 95% confidence that the median gestation time of mammals that weigh 41 kilograms is between 0.60% and 1.61% more than the median gestation time of mammals that weigh 40 kilograms.)

c) (2 marks) A 95% prediction interval for the natural log of gestation time of a lion that weighs 1.2 kg at birth is (4.59, 5.90). Interpret this prediction interval on the original scale.

Solution: By taking antilogs, a 95% prediction interval for the gestation time of the lion with a birth weight of 1.2 kg is \( (e^{4.59},\ e^{5.90}) = (98.494,\ 365.037) \).

Conclusion: It is predicted with 95% confidence that the gestation time of a lion with a birth weight of 1.2 kg is between 98.494 and 365.037 time units (days, on the original scale).

3. (Four parts; 9 marks in total) An experiment was conducted to determine the extent to which the growth rate of a certain fungus could be affected by filling test tubes containing the same medium at the same temperature with different inert gases. Three such experiments were performed for each of six gases (thus n = 6), and the average growth rate over these three tests was used as the response. The linear regression model was \( \hat{\mu}\{Y \mid X\} = 3.707 - 0.012X \), where X is the molecular weight of each gas. In addition, 95% of the variation in Y is explained by the simple linear regression on X. Assume that all the required assumptions for this model are satisfied.

a) (2 marks) What is the linear correlation coefficient for the relationship between X and Y?

Since 95% of the variation in Y was explained by X, we have \( R^2 = 0.95 \). For the simple linear regression model, we have \( r = \pm\sqrt{R^2} \), where the sign of r is the same as the sign of the slope. Since \( \hat{\beta}_1 = -0.012 \), the sample correlation is negative, and therefore \( r = -\sqrt{0.95} = -0.9747 \).

(One mark for the square root of 0.95 and one mark for pointing out that r is negative.)

b) (3 marks) What is the value of the F-statistic used to test if there is any linear relationship? Note: There is sufficient information given above to answer this question.

Solution:

\[ F = \frac{MS_{REGR}}{MS_{Error}} = \frac{SS_{REGR}/1}{SS_{Error}/(n-2)} = (n-2)\,\frac{SS_{REGR}}{SS_{Error}} = (n-2)\,\frac{SS_{REGR}}{SS_{TOTAL} - SS_{REGR}} \]

\[ = (n-2)\,\frac{SS_{REGR}/SS_{TOTAL}}{(SS_{TOTAL}/SS_{TOTAL}) - (SS_{REGR}/SS_{TOTAL})} = (n-2)\,\frac{R^2}{1 - R^2} \]

Note: \( R^2 = \dfrac{SS_{REGR}}{SS_{TOTAL}} \)

\[ F = (6-2)\,\frac{0.95}{1 - 0.95} = 76.000 \]

Note: Students may present their solutions in a somewhat different way than is shown above; nevertheless, if their logic is correct, they should get full marks for the correct answer.
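The identity derived in part (b), F = (n − 2)R²/(1 − R²), is easy to check numerically, along with the signed correlation from part (a). The function name `f_from_r_squared` is an illustrative choice; the inputs are the values stated in question 3.

```python
import math

def f_from_r_squared(r_squared, n):
    """F statistic for simple linear regression, from R^2 and sample size n."""
    return (n - 2) * r_squared / (1 - r_squared)

# Values stated in question 3.
r_squared = 0.95
n = 6
slope_hat = -0.012

# (a) Correlation: square root of R^2, carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope_hat)

# (b) F statistic with (1, n - 2) degrees of freedom.
f_stat = f_from_r_squared(r_squared, n)

print(f"r = {r:.4f}")
print(f"F = {f_stat:.3f}")
```

Applying the same function to question 1, with R² = 854698.4/1170535 and n = 108, returns approximately 286.851, the F statistic obtained there from the sums of squares.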
c) (2 marks) What is the value of the test statistic for testing H_0: β_1 = 0 versus H_A: β_1 < 0?

Solution: The test statistic for the left-tailed alternative is a t-test statistic. Since the estimate of the slope of the regression line (-0.012) is negative, the value of the t-test statistic is negative. Therefore,

\[ t = -\sqrt{F} = -\sqrt{76.000} = -8.718 \]

(Markers should take away one mark if a student found the value of t-stat = +8.718.)

d) (2 marks) What is the standard error of the estimate of the slope of the regression line?

Solution:

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \quad\Longrightarrow\quad SE(\hat{\beta}_1) = \frac{\hat{\beta}_1}{t} = \frac{-0.012}{-8.718} = 0.00138 \]

4. (4 marks) Single-factor ANOVA and simple linear regression analysis both have assumptions about normality and equal standard deviations. However, the specific requirements regarding these assumptions differ in these two cases. Therefore, for these two types of analyses, explain what the specific requirements are for each of the assumptions mentioned.

For single-factor ANOVA, all populations being compared must be approximately normally distributed or the sample size should be large (n ≥ 30). However, for simple linear regression analysis, for each value of the explanatory (or predictor) variable, the corresponding values of the response variable must be approximately normally distributed.

For single-factor ANOVA, all populations being compared must have approximately equal standard deviations. However, for simple linear regression analysis, for all values of the predictor variable, the corresponding values of the response variable must have approximately equal standard deviations.
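Returning to question 3, parts (c) and (d): the step from F to the signed t statistic, and from t to the standard error of the slope, can be sketched as follows. The variable names are illustrative, and the inputs are the values F = 76.000 and slope −0.012 from the solutions.

```python
import math

# Values from question 3.
f_stat = 76.000
slope_hat = -0.012

# (c) In simple linear regression t^2 = F, and t carries the sign of the slope.
t_stat = math.copysign(math.sqrt(f_stat), slope_hat)

# (d) Rearranging t = slope_hat / SE(slope_hat) gives the standard error.
se_slope = slope_hat / t_stat

print(f"t = {t_stat:.3f}")
print(f"SE(slope) = {se_slope:.5f}")
```

Because `t_stat` and `slope_hat` share the same sign, the resulting standard error is positive, as it must be.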