STAT 151A Final Review Session
December 6, 2023

This worksheet doesn't include everything you need to review for the final exam. Please see the final study guide posted on bCourses for a more comprehensive list of concepts, examples and exercises.

1 Regression output

For this question we consider the seatpos dataset from R. Here is a description from the R help file:

    Car drivers like to adjust the seat position for their own comfort. Car designers would find it helpful to know where different drivers will position the seat depending on their size and age. Researchers at the HuMoSim laboratory at the University of Michigan collected data on 38 drivers.

We focus on a random subset of 33 drivers. The dataset contains the following variables:

- Age (in years)
- Weight (in lbs)
- HtShoes (height in shoes in cm)
- Ht (height barefoot in cm)
- Seated (seated height in cm)
- Arm (lower arm length in cm)
- Thigh (thigh length in cm)
- Leg (lower leg length in cm)
- hipcenter (horizontal distance of the midpoint of the hips from a fixed location in the car in mm)

Using the variables given in the dataset, you decide to create four new variables $v_1$, $v_2$, $v_3$ and $v_4$ via

    > v1 = seatpos$HtShoes - 171.3
    > v2 = seatpos$Arm - 0.2252*seatpos$HtShoes + 6.346
    > v3 = seatpos$Thigh - 0.1662*seatpos$HtShoes - 0.3376*seatpos$Arm + 0.6548
    > v4 = seatpos$Leg - 0.2374*seatpos$HtShoes - 0.2280*seatpos$Arm + 0.0746*seatpos$Thigh + 8.872

You then fit the model

\[ \text{hipcenter} = \beta_0 + \beta_1 v_1 + \beta_2 v_2 + \beta_3 v_3 + \beta_4 v_4 + e \]

to the data using R, which gives the following output:
    > model = lm(seatpos$hipcenter ~ v1 + v2 + v3 + v4)
    > summary(model)

    Call:
    lm(formula = seatpos$hipcenter ~ v1 + v2 + v3 + v4)

    Residuals:
       Min     1Q Median     3Q    Max
    -89.50 -23.21  -4.82  24.41  60.39

    Coefficients:
                Estimate Std. Error
    (Intercept) -163.201      6.654
    v1            -4.206      0.580
    v2             0.112      3.095
    v3            -0.613      2.516
    v4            -8.927       XXXX
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 38.23 on 28 degrees of freedom
    Multiple R-squared: 0.666, Adjusted R-squared: XXXX
    F-statistic: XXXX on 4 and 28 DF

The $X^T X$ matrix for this linear model is given by (approximately)

    > X = model.matrix(model)
    > t(X) %*% X
                (Intercept)     v1     v2     v3    v4
    (Intercept)          33      0      0      0     0
    v1                    0 4343.7      0      0     0
    v2                    0      0 152.54      0     0
    v3                    0      0      0 230.76     0
    v4                    0      0      0      0 58.53

a) Fill in the three missing values in the R output above, giving appropriate reasons.
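One possible way to check an answer to part (a), sketched using only the numbers printed above. It relies on two standard facts: when $X^T X$ is diagonal, the standard error of each coefficient is the residual standard error divided by the square root of the matching diagonal entry, and the adjusted $R^2$ and overall F-statistic are functions of $R^2$, the sample size and the number of predictors. The variable names (`sigma_hat`, `xtx_diag`, and so on) are illustrative, and the results are only as precise as the rounded values in the printed output.

```r
# Values copied from the printed summary and the X^T X matrix above.
sigma_hat <- 38.23   # residual standard error
n <- 33              # number of drivers
p <- 4               # number of predictors (v1, ..., v4)
r2 <- 0.666          # multiple R-squared
xtx_diag <- c(`(Intercept)` = 33, v1 = 4343.7, v2 = 152.54,
              v3 = 230.76, v4 = 58.53)

# With a diagonal X^T X, its inverse is diagonal as well, so
# SE(beta_j) = sigma_hat / sqrt((X^T X)_jj).
se <- sigma_hat / sqrt(xtx_diag)

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F = (R^2 / p) / ((1 - R^2) / (n - p - 1)), on p and n - p - 1 DF
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

print(round(se, 3))
print(round(adj_r2, 4))
print(round(f_stat, 2))
```

The first four entries of `se` should agree with the printed standard errors up to rounding, which gives a quick sanity check on the formula before reading off the entry for v4.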
b) You decide to drop the variables v3 and v4 from the above model, which results in the following fit:

    Call:
    lm(formula = seatpos$hipcenter ~ v1 + v2)

    Residuals:
         Min       1Q   Median       3Q      Max
    -101.318  -27.873    7.407   23.938   71.741

    Coefficients:
                Estimate Std. Error
    (Intercept)   XXXXXX     XXXXXX
    v1            XXXXXX     XXXXXX
    v2            XXXXXX     XXXXXX
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 39.02 on 30 degrees of freedom
    Multiple R-squared: 0.6273, Adjusted R-squared: 0.6024
    F-statistic: 25.24 on 2 and 30 DF

Fill in the six missing values above, explaining your answers.
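A hedged sketch of one line of reasoning for part (b). Because the columns of the design matrix are mutually orthogonal ($X^T X$ is diagonal), each least squares estimate equals $x_j^T y / x_j^T x_j$ and does not depend on which other columns are in the model. Under that reasoning the estimates carry over from the full fit, and only the residual standard error (now 39.02) changes the standard errors. The variable names are illustrative and all inputs are the rounded values printed in the two outputs.

```r
# Estimates for the retained terms, copied from the full-model output.
est_full <- c(`(Intercept)` = -163.201, v1 = -4.206, v2 = 0.112)

# Matching diagonal entries of X^T X (unchanged by dropping v3 and v4).
xtx_diag <- c(`(Intercept)` = 33, v1 = 4343.7, v2 = 152.54)

# Residual standard error of the reduced fit, from its printed output.
sigma_hat_reduced <- 39.02

# Orthogonal columns: estimates are unchanged when v3 and v4 are dropped.
est_reduced <- est_full

# Standard errors are rescaled by the new residual standard error.
se_reduced <- sigma_hat_reduced / sqrt(xtx_diag)

print(cbind(Estimate = est_reduced, `Std. Error` = round(se_reduced, 3)))
```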
2 Hypothesis Testing

Let's say I have collected the following data: a single categorical variable with 3 categories, a continuous independent variable (call this $\vec{x}_1$), and a proportion outcome, $y \in (0, 1)$. I assume there are no other relevant variables for the data-generating process.

Design a linear model and an F-distributed test statistic to jointly test these null hypotheses:

- There is no relationship between $\vec{x}_1$ and $y$ within categorical group 1.
- The relationship between $\vec{x}_1$ and $y$ within categorical group 2 is the same as the relationship between $\vec{x}_1$ and $y$ within categorical group 3.

Toward this end, please answer the following questions:

a) What is the data matrix for the linear model?

b) Write out the model formulation.

c) Note any assumptions that make the F-test valid (i.e., our canonical assumptions for linear modeling in the course thus far).

d) Construct a matrix $L$ for the general linear hypothesis to jointly test the two null hypotheses. How do you compute the relevant F-test statistic (including its degrees-of-freedom parameters, as a function of the sample size $n$), and what distribution does it follow?
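The mechanics of a general linear hypothesis test can be sketched on simulated data. This is one possible setup, not necessarily the intended solution: it assumes a model with group indicators, $\vec{x}_1$, and group-by-$\vec{x}_1$ interactions, with group 1 as the reference level, so that the group 1 slope is the coefficient on `x1` and the group 2 and group 3 slopes differ from it by the two interaction coefficients. The data, sample size and coefficient values below are hypothetical, and the usual linear model assumptions (independent, homoskedastic, normal errors) are taken for granted.

```r
set.seed(1)

# Hypothetical data: 3 groups of 30, continuous x1, outcome kept inside (0, 1).
n <- 90
g <- factor(rep(1:3, each = n / 3))
x1 <- runif(n)
y <- 0.4 + 0.05 * (g == "2") + 0.2 * x1 * (g != "1") + rnorm(n, sd = 0.03)

# Design matrix columns: (Intercept), g2, g3, x1, g2:x1, g3:x1
X <- model.matrix(~ g * x1)
fit <- lm(y ~ g * x1)
beta_hat <- coef(fit)

# Row 1: slope in group 1 is zero (coefficient on x1).
# Row 2: slopes in groups 2 and 3 are equal (g2:x1 minus g3:x1 is zero).
L <- rbind(c(0, 0, 0, 1, 0, 0),
           c(0, 0, 0, 0, 1, -1))
q <- nrow(L)                 # number of constraints
df_resid <- n - ncol(X)      # n - 6 residual degrees of freedom
sigma2_hat <- sum(resid(fit)^2) / df_resid

# F = (L b)^T [L (X^T X)^{-1} L^T]^{-1} (L b) / (q * sigma2_hat)
Lb <- L %*% beta_hat
f_stat <- drop(t(Lb) %*% solve(L %*% solve(t(X) %*% X) %*% t(L)) %*% Lb) /
  (q * sigma2_hat)

# Under the null hypothesis, F follows an F distribution on (q, n - 6) DF.
p_value <- pf(f_stat, q, df_resid, lower.tail = FALSE)
print(c(F = f_stat, df1 = q, df2 = df_resid, p = p_value))
```

The same statistic can be obtained by comparing the full fit with a reduced fit that imposes both constraints, for example via `anova()`, which is a useful cross-check on the choice of $L$.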
3 Categorical regression

Let's imagine that 80 students took a particular course: 20 were freshmen, 20 were sophomores, 20 were juniors and 20 were seniors. In R, I have saved the final scores (out of 100) for the 20 freshmen in the vector g1, for the 20 sophomores in g2, juniors in g3 and seniors in g4. Also, for $i = 1, \dots, 80$, let

- $y_i$: Final score of the $i$th student in the class
- $x_{i1}$: Takes the value of 1 if the $i$th student is a freshman and 0 otherwise
- $x_{i2}$: Takes the value of 1 if the $i$th student is a sophomore and 0 otherwise
- $x_{i3}$: Takes the value of 1 if the $i$th student is a junior and 0 otherwise
- $x_{i4}$: Takes the value of 1 if the $i$th student is a senior and 0 otherwise

I fit the linear model

\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \epsilon_i \]

to this data via R to obtain the following output:

a) Why does the R output above say "1 not defined because of singularities"? How would you fix the problem?

b) Fill in the 3 missing values in the R output with proper reasoning.

c) Explain why the standard error estimates for the coefficients of $x_1$, $x_2$, $x_3$ are all the same.
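The behaviour asked about in parts (a) and (c) can be reproduced on simulated scores. This is a sketch under assumptions: the group means and noise level below are hypothetical and are not the worksheet's data. With an intercept and all four indicators, the columns satisfy $x_1 + x_2 + x_3 + x_4 = 1$, so the design matrix is rank deficient and `lm` reports one coefficient as not estimable. With equal group sizes of 20, each remaining coefficient is a difference between two group means, which gives a common standard error of $\hat\sigma \sqrt{1/20 + 1/20}$.

```r
set.seed(1)

# Hypothetical scores: 20 students per class year (1 = freshman, ..., 4 = senior).
year <- rep(1:4, each = 20)
y <- rnorm(80, mean = c(70, 75, 78, 80)[year], sd = 8)

# One indicator per class year, as defined in the worksheet.
x1 <- as.numeric(year == 1)
x2 <- as.numeric(year == 2)
x3 <- as.numeric(year == 3)
x4 <- as.numeric(year == 4)

# Intercept plus all four indicators: x4 is a linear combination of the
# earlier columns, so its coefficient comes back as NA.
fit <- lm(y ~ x1 + x2 + x3 + x4)
print(coef(fit))

# Standard errors of the estimable indicator coefficients.
coef_table <- summary(fit)$coefficients
se <- coef_table[c("x1", "x2", "x3"), "Std. Error"]
sigma_hat <- summary(fit)$sigma
print(se)
print(sigma_hat * sqrt(1 / 20 + 1 / 20))

# One possible fix: drop one indicator (or the intercept) so the design has full rank.
fit_fixed <- lm(y ~ x1 + x2 + x3)
print(coef(fit_fixed))
```

Dropping $x_4$ makes seniors the reference group; dropping the intercept instead (`y ~ 0 + x1 + x2 + x3 + x4`) would make each coefficient a group mean. Either choice resolves the singularity, and which one is preferable depends on the comparison of interest.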