Final-Practice-Solutions

School

Ohio State University

Course

3470

Subject

Statistics

Date

Feb 20, 2024

STAT 3470-AU 23
Solutions to Practice Questions - Final
December 6, 2023
Nasser Sadeghkhani

Q.1 Regression methods were used to analyze the data from a study investigating the relationship between roadway surface temperature in °F ($x$) and pavement deflection ($y$). Summary quantities were $n = 20$, $\sum y_i = 12.75$, $\sum y_i^2 = 8.86$, $\sum x_i = 1478$, $\sum x_i^2 = 143{,}215.8$, $\sum x_i y_i = 1083.67$.

(a) Calculate the least squares estimates of the slope and intercept. Estimate $\sigma^2$.
(b) Use the equation of the fitted line to predict what pavement deflection would be observed when the surface temperature is 90°F.
(c) Give a point estimate of the mean pavement deflection when the surface temperature is 85°F.
(d) What change in mean pavement deflection would be expected for a 1°F change in surface temperature?
(e) Test for significance of regression using $\alpha = 0.05$. What conclusion can you draw?
(f) Estimate the standard errors of the slope and intercept.
(g) Find a 95% CI for $\beta_0$ and for $\beta_1$.
(h) Find a 95% CI for the expected value of pavement deflection (the true mean of $Y$) when the surface temperature is 85°F.
(i) Find a 95% PI for a future pavement deflection when the surface temperature is 85°F.
(j) Complete the ANOVA table. What are your null and alternative hypotheses here?
(k) What is your conclusion in (j)?
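Sol.

(a) We have
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad \hat\sigma^2 = \mathrm{MSE} = \frac{S_{yy} - \hat\beta_1 S_{xy}}{n-2},$$
where
$$S_{xy} = \sum x_i y_i - \frac{1}{n}\Big(\sum x_i\Big)\Big(\sum y_i\Big) = 141.445,$$
$$S_{xx} = \sum x_i^2 - \frac{1}{n}\Big(\sum x_i\Big)^2 = 33991.6,$$
$$S_{yy} = \sum y_i^2 - \frac{1}{n}\Big(\sum y_i\Big)^2 = 0.731875,$$
so that $\hat\beta_1 = 0.00416$, $\hat\beta_0 = 0.32999$, and $\hat\sigma^2 = 0.00797$.

(b) $\hat y(90) = \hat\beta_0 + \hat\beta_1 \cdot 90 = 0.70$.

(c) The point estimate of the mean response at $x = 85$ is the fitted value there, so the question can be rephrased as "use the equation of the fitted line to predict what pavement deflection would be observed when the surface temperature is 85°F", i.e. $\hat y(85) = \hat\beta_0 + \hat\beta_1 \cdot 85 = 0.68$.

(d) This is the interpretation of the slope: the mean deflection is expected to change by $\hat\beta_1 = 0.00416$ for each 1°F increase in surface temperature.

(e) We test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. The test statistic is
$$T_0 = \frac{\hat\beta_1 - 0}{\sqrt{\hat\sigma^2 / S_{xx}}} = \frac{0.00416}{\sqrt{0.00797 / 33991.6}} = 8.6,$$
and we reject $H_0$ in favour of a linear relationship between $x$ and $y$, since $|8.6| > t(18, 0.025) = 2.1$.

(f) The standard errors are
$$\mathrm{se}(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{S_{xx}}}, \qquad \mathrm{se}(\hat\beta_0) = \sqrt{\hat\sigma^2 \left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]}.$$
So $\mathrm{se}(\hat\beta_1) = 0.00048$ and $\mathrm{se}(\hat\beta_0) = 0.04098$.

(g) CI for $\beta_1$: $0.00416 \pm 0.00048 \times 2.1$, i.e. approximately $(0.00315, 0.00517)$, where 2.1 is the critical value $t(18, 0.025)$.
CI for $\beta_0$: $0.32999 \pm 0.04098 \times 2.1$, i.e. approximately $(0.2439, 0.4160)$.

(h) $0.68 \pm 2.1 \sqrt{0.00797 \left[\frac{1}{20} + \frac{(85 - 73.9)^2}{33991.6}\right]} \approx 0.68 \pm 0.043$, where $73.9$ is $\bar x$.

(i) $0.68 \pm 2.1 \sqrt{0.00797 \left[1 + \frac{1}{20} + \frac{(85 - 73.9)^2}{33991.6}\right]} \approx 0.68 \pm 0.192$.

(j) We use ANOVA to test $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$. Here,
SST = $S_{yy}$ = 0.731875, with df = 19,
SSR = $\hat\beta_1 S_{xy}$ = 0.00416 × 141.445 = 0.5884, with df = 1,
SSE = SST − SSR = 0.731875 − 0.5884 = 0.14347, with df = 18.
Therefore MSR = 0.5884 / 1 = 0.5884 and MSE = 0.14347 / 18 = 0.008, so
$$F_0 = \frac{\mathrm{MSR}}{\mathrm{MSE}} = \frac{0.5884}{0.008} = 73.55.$$

(k) Since $F_0$ is (much) greater than $F(1, 18, 0.05) = 4.41$, we reject $H_0$ in favor of a linear relationship between $x$ and $y$.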
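The arithmetic in Q.1 can be checked numerically. The following is a minimal Python sketch, not part of the original solution, that recomputes the estimates from the summary quantities. The variable and function names (`half_width`, `fitted`, `T_CRIT`) are illustrative choices, and the critical value 2.101 is assumed to be read from a t table rather than computed. Because the script keeps full precision, some results differ slightly from the hand-rounded figures (for example, $F_0$ comes out a little above the 73.55 obtained with MSE rounded to 0.008).

```python
import math

# Summary quantities given in Q.1
n = 20
sum_y, sum_y2 = 12.75, 8.86
sum_x, sum_x2 = 1478.0, 143215.8
sum_xy = 1083.67

# Corrected sums of squares and cross-products
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
x_bar, y_bar = sum_x / n, sum_y / n

# Least squares estimates and the error variance estimate
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ssr = b1 * s_xy
sse = s_yy - ssr
mse = sse / (n - 2)

# Standard errors, t statistic, and ANOVA F statistic
se_b1 = math.sqrt(mse / s_xx)
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / s_xx))
t0 = b1 / se_b1
f0 = ssr / mse  # equals t0 ** 2 in simple linear regression

# Assumption: critical value t(18, 0.025) read from a t table
T_CRIT = 2.101


def fitted(x0):
    """Fitted value on the estimated regression line at x0."""
    return b0 + b1 * x0


def half_width(x0, prediction=False):
    """Half-width of the 95% CI for the mean response at x0,
    or of the 95% prediction interval when prediction=True."""
    extra = 1.0 if prediction else 0.0
    return T_CRIT * math.sqrt(mse * (extra + 1 / n + (x0 - x_bar) ** 2 / s_xx))


print(f"b1={b1:.5f}, b0={b0:.5f}, mse={mse:.5f}")
print(f"se(b1)={se_b1:.5f}, se(b0)={se_b0:.5f}, t0={t0:.2f}, f0={f0:.2f}")
print(f"mean CI at 85: {fitted(85):.3f} +/- {half_width(85):.3f}")
print(f"PI at 85:      {fitted(85):.3f} +/- {half_width(85, prediction=True):.3f}")
```

The prediction interval is always wider than the confidence interval at the same $x$, because of the extra "1 +" term under the square root.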
Q.2 The number of emails arriving at a server per minute is claimed to follow a Poisson distribution. To test this claim, the number of emails arriving in 70 randomly chosen 1-minute intervals is recorded. The table below summarises the results. Test the hypothesis that the number of emails per minute follows a Poisson distribution. Use $\alpha = 0.05$.

# emails   freq.
0          13
1          22
2          23
3          12
4          0
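Sol. We know $P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$ for $x = 0, 1, \ldots$. Let us estimate $\lambda$ first. We have already seen that $\hat\lambda_{ML} = \bar x$, and here $\bar x = 104/70 = 1.49$. To find the test statistic we require the expected frequencies $70 \cdot P(X = x)$. The last bin pools all counts of 3 or more, so its expected frequency is 70 minus the sum of the other three:

# emails    freq.   expected freq.
0           13      15.78
1           22      23.51
2           23      17.5
3 or more   12      13.2

Note that the number of bins is $k = 4$. The test statistic is
$$\chi^2_0 = \frac{(15.78 - 13)^2}{15.78} + \cdots + \frac{(13.2 - 12)^2}{13.2} = 2.417.$$
Since one parameter ($\lambda$) was estimated, the degrees of freedom are $4 - 1 - 1 = 2$. The critical value is $\chi^2(2, 0.05) = 5.99$, and $2.417 < 5.99$, so we do not reject $H_0$: the data are consistent with a Poisson distribution.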
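The goodness-of-fit calculation in Q.2 can be reproduced with a short Python sketch using only the standard library. The variable names are illustrative, and the critical value 5.991 is assumed to be taken from a chi-square table. The script uses the unrounded $\hat\lambda = 104/70$, so its statistic differs slightly from the hand value 2.417 (which uses $\hat\lambda = 1.49$); the conclusion is unchanged.

```python
import math

# Observed frequencies from Q.2: number of emails -> number of intervals
observed = {0: 13, 1: 22, 2: 23, 3: 12, 4: 0}
n = sum(observed.values())

# Maximum likelihood estimate of the Poisson mean (the sample mean)
lam = sum(k * count for k, count in observed.items()) / n

# Bins: 0, 1, 2, and "3 or more" (the tail is pooled into the last bin)
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]
obs_binned = [observed[0], observed[1], observed[2], observed[3] + observed[4]]

# Pearson chi-square statistic
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_binned, expected))

# Degrees of freedom: bins - 1 - (number of estimated parameters)
df = len(obs_binned) - 1 - 1

# Assumption: chi-square(2, 0.05) critical value read from a table
CHI2_CRIT = 5.991
reject_h0 = chi2 > CHI2_CRIT

print(f"lambda_hat={lam:.4f}, expected={[round(e, 2) for e in expected]}")
print(f"chi2={chi2:.3f}, df={df}, reject H0: {reject_h0}")
```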
Q.3 A researcher believes the number of fish in a certain river ($Y$) depends on the pH ($X$) of the water. He collected 40 observations from different locations along the river. Complete the following ANOVA table for the regression analysis. State the null and alternative hypotheses for the F-test as well as your conclusion in sentence form. Assume $\alpha = 0.05$.

Source of Variation   Sum of Squares   df   Mean Square   F_0
Regression            55.3             ?    ?             ?
Error                 ?                ?    ?
Total                 60               ?
Sol.

Source of Variation   Sum of Squares   df   Mean Square   F_0
Regression            55.3             1    55.3          445.97
Error                 4.70             38   0.124
Total                 60               39

Here SSE = 60 − 55.3 = 4.70, the error df is $n - 2 = 38$, MSE = 4.70 / 38 = 0.124, and $F_0$ = 55.3 / 0.124 = 445.97. The hypotheses are $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$. We reject the null hypothesis since $F_0 > F(1, 38, 0.05)$, and conclude that there is a significant linear relationship between the pH of the water and the fish count.
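As a rough check on the Q.3 table, the sketch below fills in a simple linear regression ANOVA table from SSR, SST, and $n$. The function name `simple_regression_anova` is a hypothetical helper introduced here for illustration. It does not round MSE before dividing, so its $F_0$ is slightly larger than the 445.97 obtained by hand with MSE rounded to 0.124.

```python
def simple_regression_anova(ssr, sst, n):
    """Complete the ANOVA table for a simple linear regression
    (one predictor) given SSR, SST and the sample size n."""
    sse = sst - ssr
    df_reg, df_err, df_tot = 1, n - 2, n - 1
    msr = ssr / df_reg
    mse = sse / df_err
    return {
        "SSE": sse,
        "df": (df_reg, df_err, df_tot),
        "MSR": msr,
        "MSE": mse,
        "F0": msr / mse,
    }


# Values from Q.3: SSR = 55.3, SST = 60, n = 40
table = simple_regression_anova(ssr=55.3, sst=60.0, n=40)
for name, value in table.items():
    print(name, value)
```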
Q.4 A manager in a trucking company wants to predict the total daily travel time for the drivers. He believes that total travel time (hours) depends on the number of miles traveled in making deliveries, $x_1$, and the number of deliveries, $x_2$. A random sample of 10 driving assignments is taken. Below is part of the Minitab output (missing values are denoted by ?).

The regression equation is
Time = - 0.869 + 0.0611 Miles + 0.923 Deliveries

Predictor     Coef       SE Coef    T      P
Constant     -0.8687     0.9515    -0.91   0.392
Miles         0.061135   0.009888   6.18   0.000
Deliveries    0.9234     0.2211     4.18   0.004

R-Sq = 90.4%   R-Sq(adj) = 87.6%

Analysis of Variance
SOURCE   DF   SS       MS   F   p
Reg.     ?    21.601   ?    ?   0.000
Error.   ?    ?        ?
Total.   ?    ?

(a) Find the estimated regression equation.
(b) Predict the travel time when miles traveled $x_1$ and number of deliveries $x_2$ are 80 and 4, respectively.
(c) Interpret the estimated coefficient for Miles, i.e. 0.061135.
(d) Complete the ANOVA table by finding the missing values denoted by ?.
Sol.

(a) Time = - 0.869 + 0.0611 Miles + 0.923 Deliveries

(b) $\hat y = -0.869 + 0.0611 \times 80 + 0.923 \times 4 = 7.711$ hrs.

(c) 0.061135 hours is an estimate of the expected increase in travel time corresponding to an increase of one mile in the distance traveled when the number of deliveries is held constant.

(d) We know $R^2 = \mathrm{SSR}/\mathrm{SST}$; here $0.904 = 21.601/\mathrm{SST}$, and hence SST = 23.90 and SSE = 23.90 − 21.601 = 2.299. With two predictors and $n = 10$, the degrees of freedom are 2 for regression, $10 - 2 - 1 = 7$ for error, and 9 in total.

Analysis of Variance
SOURCE   DF   SS       MS      F       p
Reg.     2    21.601   10.8    32.88   0.000
Error.   7    2.299    0.328
Total.   9    23.900
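The steps in Q.4 (b) and (d) can be sketched in Python as follows. The names are illustrative, not from the Minitab output. The script recovers SST from $R^2$ and SSR, fills in the ANOVA quantities for $k = 2$ predictors, and, as a consistency check, recomputes the adjusted $R^2$, which should land close to the reported 87.6%. Unrounded intermediate values give an $F$ statistic that differs slightly from the hand-computed 32.88.

```python
# Values read from the Minitab output in Q.4
n, k = 10, 2            # sample size and number of predictors
ssr = 21.601            # regression sum of squares
r_sq = 0.904            # R-Sq = 90.4%

# Recover the remaining sums of squares from R^2 = SSR / SST
sst = ssr / r_sq
sse = sst - ssr

# Degrees of freedom, mean squares and F statistic
df_reg, df_err, df_tot = k, n - k - 1, n - 1
msr = ssr / df_reg
mse = sse / df_err
f0 = msr / mse

# Consistency check: adjusted R^2 = 1 - MSE / (SST / (n - 1))
r_sq_adj = 1 - mse / (sst / df_tot)


def predict_time(miles, deliveries):
    """Predicted travel time (hours) from the fitted equation in part (a)."""
    return -0.869 + 0.0611 * miles + 0.923 * deliveries


print(f"SST={sst:.3f}, SSE={sse:.3f}, MSR={msr:.3f}, MSE={mse:.3f}, F={f0:.2f}")
print(f"adjusted R^2={r_sq_adj:.3f}")
print(f"predicted time for 80 miles, 4 deliveries: {predict_time(80, 4):.3f} h")
```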
Q.5 Below is a survey of average used vehicle prices in America in 1957.

X = Vehicle age (years)   1     2     3     4     5    6    7    8    9    10
Y = Average price ($)     2651  1943  1494  1087  765  538  484  290  226  204

The scatter plots of the raw data $Y$ against $X$ (left) and of the log-transformed response $\log(Y)$ against $X$ (right) are depicted below.

[Figure: scatter plots of $Y$ vs. $X$ (left) and $\log(Y)$ vs. $X$ (right)]

Below is the R summary of the regression of $Y$ on $X$:

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  2371.47   210.25      11.279   3.43e-06 ***
age          -255.14   33.89       -7.529   6.74e-05 ***
---
Residual standard error: 307.8 on 8 degrees of freedom
Multiple R-squared: 0.8763, Adjusted R-squared: 0.8609

Below is the R summary of the regression of $\log(Y)$ on $X$:

Coefficients:
             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)  8.164585   0.057051    143.11   6.36e-15 ***
age          -0.297680  0.009195    -32.38   9.03e-10 ***
---
Residual standard error: 0.08351 on 8 degrees of freedom
Multiple R-squared: 0.9924, Adjusted R-squared: 0.9915
Answer the following questions:
(a) Do you think applying the transformation to $Y$ was a good idea? Why? Use the R outputs as well as the scatter plots to justify your answer.
(b) What are the equations of the estimated regression lines in these two regressions, respectively?
(c) Predict the average car price (the value of the $Y$ variable) for a used vehicle of age $X = 2.5$ years, based on the regression of $\log(Y)$ on $X$.
Sol.

(a) Yes. It is clear from the scatter plot that the relationship between $X$ and $Y$ is not linear, but the log transformation of $Y$ makes the relationship linear. This is also reflected in the two $R^2$ values: without the transformation $R^2 = 0.87$, and with the transformation $R^2 = 0.99$.

(b) $\hat y = 2371.47 - 255.14\,x$ and $\widehat{\log y} = 8.164585 - 0.297680\,x$.

(c) $\widehat{\log y} = 8.164585 - 0.297680 \times 2.5 = 7.42$, so $\hat y = e^{7.42} = 1669.7$ (using the unrounded exponent 7.4204).

Q.6 Use the ANOVA procedure to test if the linear regression is significant for the following table. Compare the test statistic with the critical value. Set $\alpha = 0.05$.

Sol. This was already solved in class.
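Returning to Q.5 (c), the back-transformation from the log scale can be sketched in Python. The coefficients are copied from the R summary of the regression of $\log(Y)$ on $X$; the function name `predict_price` is an illustrative choice, and the natural logarithm is assumed (R's default `log`). The sketch also prints $e^{\hat\beta_1}$, the estimated multiplicative change in price per additional year of age under the log-linear model.

```python
import math

# Coefficients from the R summary of the regression of log(Y) on X
b0_log, b1_log = 8.164585, -0.297680


def predict_price(age):
    """Predicted average price ($) from the log-linear fit,
    back-transformed with exp (natural log assumed)."""
    return math.exp(b0_log + b1_log * age)


# Prediction for a 2.5-year-old vehicle, as in part (c)
price_2_5 = predict_price(2.5)

# Each extra year multiplies the predicted price by exp(b1_log)
yearly_factor = math.exp(b1_log)

print(f"predicted price at age 2.5: {price_2_5:.1f}")
print(f"multiplicative change per year: {yearly_factor:.4f}")
```

Under this model the predicted price falls by a constant percentage each year rather than by a constant dollar amount, which matches the curved pattern described for the raw scatter plot.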