NEWTest2Solutions 3

School

York University

Course

3A03

Subject

Statistics

Date

Feb 20, 2024

Type

pdf

Pages

9

Uploaded by SuperHumanOtterMaster523

Email Address:

Stats 3A03 Term Test 2: SOLUTIONS
Date: November 24, 2022
Instructor: Dr. K. Davies
Duration: 75 minutes
Total Marks: 30

Instructions
1. Please fill in the above information as neatly as possible.
2. This test has 6 questions and there are 9 pages in total, including a formula sheet.
3. Please make sure your test paper is complete!
4. Solve all the questions and show your work to get full credit.
5. Marks for each question are as indicated.
6. Only the McMaster standard calculator is permitted.
7. Do not write on the QR code at the top of each page.
1. How is plutonium activity related to alpha particle counts? Plutonium emits subatomic particles called alpha particles. Devices used to detect plutonium record the intensity of alpha particle strikes in counts per second. To investigate the relationship between plutonium activity (X, in pCi/g) and alpha count rate (Y, in number per second), a study was conducted on 23 samples of plutonium. Beside each plot below, in 1-3 sentences, make comments and identify any possible model violations based on the plot. (4 marks: 2 each)

Solution, first plot: Based on this plot, we see a violation of the constant variance assumption. In particular, the variability increases with X.

Solution, second plot: Based on this plot, we see departures from normality. That is, the assumption that the error terms are normally distributed seems to be violated (seen in this plot as a lack of linearity).
2. (a) In a regression context, what is an outlier? (1 mark)
(b) In general, are all outliers influential points? Justify your answer. (1 mark)
(c) Fill in the blanks:
(i) ______ measures the influence of an observation on all fitted values.
(ii) ______ measures the influence of an observation on its own fitted value.
(iii) ______ measures the influence of an observation on a particular regression coefficient.
(3 marks: 1 each)

Solution:
(a) An outlier is a point that deviates from the model.
(b) No, not all outliers are influential points. Consider an outlying point A: if the line fitted with A is essentially the same as the line fitted without A, then A is an outlier but is not influential. [Sketch: fitted lines with and without point A]
(c) (i) Cook's Distance (ii) DFFITS (iii) DFBETAS
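The answer to 2(b) can be illustrated numerically. The following is a minimal sketch, assuming a simple linear regression and made-up data (none of it comes from the test): the same vertical shift is applied once to a point in the middle of the X range and once to a point at the edge, and Cook's distance is computed from the formula-sheet expression $D_i = \frac{r_i^2}{p+1}\cdot\frac{h_{ii}}{1-h_{ii}}$. The function name `slr_influence` is hypothetical.

```python
import math

def slr_influence(x, y):
    """Leverages, internally studentized residuals and Cook's distances
    for a simple linear regression (p = 1 predictor)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = ybar - b1 * xbar               # intercept estimate
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    # Leverage in simple linear regression: 1/n + (x_i - xbar)^2 / S_xx
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    r = [e / math.sqrt(sigma2 * (1 - hi)) for e, hi in zip(resid, h)]
    p = 1
    cooks = [ri ** 2 / (p + 1) * hi / (1 - hi) for ri, hi in zip(r, h)]
    return h, r, cooks

# Illustrative data only (an assumption, not taken from the test).
x = [float(v) for v in range(1, 10)]
base_y = [1.1, 1.9, 3.2, 3.8, 5.0, 6.1, 6.9, 8.2, 8.8]

y_mid = list(base_y)
y_mid[4] += 3.0      # outlier at the centre of the X range
y_end = list(base_y)
y_end[8] += 3.0      # same shift, but at the edge of the X range

_, _, cooks_mid = slr_influence(x, y_mid)
h, _, cooks_end = slr_influence(x, y_end)

print("Cook's D, outlier in the middle:", round(cooks_mid[4], 3))
print("Cook's D, outlier at the edge:  ", round(cooks_end[8], 3))
```

Both shifted points are outliers of the same size, but the one at the edge has higher leverage and therefore a larger Cook's distance, which is the sense in which an outlier need not be influential.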
3. Consider a data set about coal deposits, which consists of 75 observations on three variables: thickness value (Y), North Distance (X1) and North Distance squared (X2). Upon regressing Y on X1 and X2, the following plots are produced. Explain what each plot indicates, including identifying potential outliers, high leverage points and/or influential points. (3 marks)

Solution:
Points 44, 8, 63 and 65 appear to be influential points (they exceed 4/75 ≈ 0.05).
Points 44, 8, 63 and 65 appear to be potential outliers.
Points 44, 8, 19, 16, 63 and 65 are high-leverage points and are potential influential points.
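The 4/n rule of thumb used in the solution can be sketched as below. The Cook's distance values here are invented placeholders keyed by arbitrary labels; they are not read from the plots in the test, and the helper name `flag_influential` is hypothetical.

```python
def flag_influential(cooks_d, n):
    """Return the 4/n cutoff and the labels whose Cook's distance exceeds it."""
    cutoff = 4 / n
    flagged = [label for label, d in sorted(cooks_d.items()) if d > cutoff]
    return cutoff, flagged

n_obs = 75  # number of observations stated in the question
# Placeholder distances for illustration only (assumed values).
placeholder_d = {1: 0.01, 2: 0.12, 3: 0.04, 4: 0.06}

cutoff, flagged = flag_influential(placeholder_d, n_obs)
print(f"cutoff = {cutoff:.4f}, flagged labels = {flagged}")
```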
4. Classify each of the following models as linear, linearizable, or neither linear nor linearizable. If the model is linearizable, write out the transformed model and show it results in a linear model. Be sure to justify each classification and do not consider Box-Cox or Box-Tidwell transformations. (6 marks: 2 each)
(a) $Y = \beta_0 + \beta_1 e^{X} + \varepsilon$
(b) $Y = \beta_0 + \beta_1 e^{\beta_2 X} + \varepsilon$
(c) $Y = \varepsilon \alpha X^{\beta}$

Solution:
(a) This is a linear model since the parameters $\beta_0$ and $\beta_1$ enter the model linearly.
(b) This model is neither linear nor linearizable.
(c) This model is not linear but is linearizable. Consider applying the natural logarithm:
$\log(Y) = \log(\varepsilon \alpha X^{\beta}) = \log(\alpha) + \log(\varepsilon) + \beta \log(X)$
So the new model is $Y' = \beta_0 + \beta_1 X' + \varepsilon'$, where $Y' = \log(Y)$, $X' = \log(X)$ and $\varepsilon' = \log(\varepsilon)$, with $\beta_0 = \log(\alpha)$ and $\beta_1 = \beta$.
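As a quick check of the linearization in part (c), the sketch below fits an ordinary least-squares line to $\log(Y)$ against $\log(X)$ and back-transforms the intercept. The data are generated from assumed values $\alpha = 2$ and $\beta = 1.5$ with no error term ($\varepsilon = 1$), purely for illustration; `fit_power_law` is a hypothetical helper name.

```python
import math

def fit_power_law(x, y):
    """Fit Y = alpha * X**beta by least squares on the log-log scale.
    Requires strictly positive x and y."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    beta_hat = sxy / sxx                 # slope on the log scale
    log_alpha_hat = my - beta_hat * mx   # intercept on the log scale
    return math.exp(log_alpha_hat), beta_hat

# Noise-free illustrative data with assumed alpha = 2 and beta = 1.5.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * v ** 1.5 for v in x]

alpha_hat, beta_hat = fit_power_law(x, y)
print(alpha_hat, beta_hat)
```

The same approach does not work for model (b), because no transformation of Y separates $\beta_0$ from the exponential term.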
5. Consider a data set consisting of 24 observations on the variables X = Age and Y = Plasma Level. A Box-Cox transformation was considered and from SAS, we have the following plot:
(a) Based on the plot, for practical purposes, what transformation would we use? State the new model and be sure to define any new variables you introduce. (3 marks)
(b) What type of model violations is the Box-Cox transformation aimed at correcting? Are there any guidelines to its use and/or particular cases where it works best? (2 marks)

Solution:
(a) Since $\lambda_{opt} = -0.5$, we would consider $Y' = Y^{-1/2} = 1/\sqrt{Y}$. The new model would be $Y' = \beta_0 + \beta_1 X + \varepsilon$.
(b) The Box-Cox transformation aims at correcting non-normality of errors and/or non-constant variance. It is mostly useful for strictly positive quantities and works best for $\lambda \in (-2, +2)$.
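The Box-Cox family on the formula sheet, $y^{(\lambda)} = (y^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $\log y$ for $\lambda = 0$, can be written as a small function. This is a sketch for illustration; the input value 4.0 is arbitrary and not from the plasma data.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a single strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive y")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

y_example = 4.0                      # arbitrary positive value
t_half = box_cox(y_example, -0.5)    # lambda = -0.5
t_log = box_cox(y_example, 0)        # lambda = 0 gives the log

# For lambda = -0.5 the transform equals 2 - 2/sqrt(y), a linear
# function of 1/sqrt(y), so using Y' = 1/sqrt(Y) is equivalent in practice.
print(t_half, 2 - 2 / math.sqrt(y_example), t_log)
```

Because the transform at $\lambda = -0.5$ is just a linear rescaling of $1/\sqrt{Y}$, regressing $1/\sqrt{Y}$ on X changes only the scale and sign of the coefficients, not the quality of the fit.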
6. (a) In what scenarios would one consider using weighted least-squares regression? (2 marks)
(b) In a study of 27 industrial establishments of varying size, the number of supervised workers (X) and the number of supervisors (Y) were recorded. It was decided to study the relationship between the two variables and the following model was postulated
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, (1)
where $\varepsilon_i \sim N(0, \sigma^2)$. A scatterplot also was suggestive of this model. In fitting the model, however, some model violations were apparent and a transformation was applied.
(i) The observations on Y and X were entered in SAS as y and x in a data set called work. Based on the following SAS code and output, what transformation was used and what is the transformed model? Also, what is the variance of the new error terms? (3 marks)

Solution:
(a) One would consider weighted least-squares when the assumption of constant variance appears to be violated.
(b)(i) The transformation used was $Y' = Y/X$, $X' = 1/X$ and $\varepsilon' = \varepsilon/X$. The transformed model is $Y' = \beta_0' + \beta_1' X' + \varepsilon'$, with $\mathrm{Var}(\varepsilon') = \sigma^2$.
(ii) Provide a value and an interpretation of $\hat{\beta}_1$, the least-squares estimator of the slope of the untransformed model (1). Make sure that your interpretation is in context. (2 marks)

Solution:
Note that $\beta_0' = \beta_1$ and $\beta_1' = \beta_0$, so here $\hat{\beta}_1 = 0.12099$, which can be interpreted as: for every additional supervised worker, we predict an increase of 0.121 supervisors.

Bonus Question: What term is used to describe non-constant variance (and it must be spelled correctly!) (1 mark)

Solution: heteroscedasticity or heteroskedasticity
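The correspondence $\beta_0' = \beta_1$, $\beta_1' = \beta_0$ can be checked numerically: ordinary least squares on $(1/X, Y/X)$ gives the same estimates as weighted least squares on $(X, Y)$ with weights $1/X^2$, with the roles of intercept and slope swapped. The sketch below uses invented data (not the supervisor data from the test) and hypothetical helper names `ols` and `wls`.

```python
def ols(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def wls(x, y, w):
    """Weighted least squares for y = b0 + b1*x with weights w; returns (b0, b1)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return yw - b1 * xw, b1

# Invented illustrative data (assumed values).
x = [30.0, 50.0, 80.0, 120.0, 200.0, 350.0, 500.0]
y = [5.0, 8.0, 11.0, 17.0, 25.0, 48.0, 60.0]

# Transformed model: Y' = Y/X regressed on X' = 1/X.
b0_t, b1_t = ols([1 / xi for xi in x], [yi / xi for xi, yi in zip(x, y)])
# Weighted fit of the original model with weights 1/x^2.
b0_w, b1_w = wls(x, y, [1 / xi ** 2 for xi in x])

print("transformed intercept (original slope):", b0_t, b1_w)
print("transformed slope (original intercept):", b1_t, b0_w)
```

This is why the slope of model (1) is read from the intercept of the transformed fit.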
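STATS 3A03: Fall 2022 Formula Sheet

1. Preliminaries
$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$
$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y$
$\rho = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
$\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$

2. For the simple linear regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
$\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$
$\mathrm{Var}(\hat{\beta}_0) = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} \right)$
$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{S_{xx}}$
$\hat{\sigma}^2 = \dfrac{SSE}{n-2} = \dfrac{\sum (y_i - \hat{y}_i)^2}{n-2}$
$\mathrm{Var}(\hat{\mu}_0) = \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}} \right) \sigma^2$
$\mathrm{Var}(\hat{y}_0) = \left( 1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}} \right) \sigma^2$
$F = \dfrac{MSReg}{MSE} \sim F_{df_{reg},\, df_E}$
$R^2 = 1 - SSE/SST$

3. For the multiple linear regression model: $Y = X\beta + \varepsilon$
$H = X(X'X)^{-1}X'$
$\hat{\beta} = (X'X)^{-1}X'y$
$\mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$
$\hat{\sigma}^2 = SSE/(n-p-1)$
$\mathrm{Var}(\hat{\mu}_0) = \sigma^2 \left( x_0'(X'X)^{-1}x_0 \right)$
$\mathrm{Var}(\hat{y}_0) = \sigma^2 \left( 1 + x_0'(X'X)^{-1}x_0 \right)$
$\dfrac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-p-1}$, $j = 0, 1, \ldots, p$
$F = \dfrac{MSReg}{MSE} \sim F_{df_{reg},\, df_E}$
$R^2 = 1 - SSE/SST$
$R^2_{adj} = 1 - \dfrac{SSE/(n-p-1)}{SST/(n-1)}$
$F = \dfrac{(SSE_{red} - SSE_{full})/(df_{red} - df_{full})}{SSE_{full}/df_{full}} \sim F_{df_{red} - df_{full},\, df_{full}}$
$D_i = \dfrac{(r_i)^2}{p+1} \cdot \dfrac{h_{ii}}{1 - h_{ii}}$
$DFBETAS_{j,i} = \dfrac{\hat{\beta}_j - \hat{\beta}_{j(i)}}{\hat{\sigma}_{(i)} \sqrt{(X'X)^{-1}_{jj}}}$
$DFFITS_i = r_i \sqrt{\dfrac{h_{ii}}{1 - h_{ii}}}$
$y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\ \log y_i & \text{if } \lambda = 0. \end{cases}$
$\hat{\beta}_{WLS} = (X'WX)^{-1}X'Wy$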