Stats-101A-HW-7

School

University of California, Los Angeles

Course

101A

Subject

Statistics

Date

Jan 9, 2024

Stats 101A HW 7
Ian Zhang
UID: 205702810
2023-05-17

Question 1

b

wwh <- read.table("waistweightheight.txt", header = TRUE)
m1 <- lm(Weight ~ Waist + Height, data = wwh)
anova(m1)

## Analysis of Variance Table
##
## Response: Weight
##            Df Sum Sq Mean Sq F value    Pr(>F)
## Waist       1 358074  358074 3590.77 < 2.2e-16 ***
## Height      1  29843   29843  299.26 < 2.2e-16 ***
## Residuals 504  50259     100
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#SYY = SSReg + RSS = 358074 + 29843 + 50259 = 438176
#SSReg = 387917
#RSS = 50259
summary(m1)

##
## Call:
## lm(formula = Weight ~ Waist + Height, data = wwh)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -32.760  -6.405  -0.420   5.656  45.474
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -165.5332     8.2517  -20.06   <2e-16 ***
## Waist          4.9605     0.1229   40.37   <2e-16 ***
## Height         2.4884     0.1438   17.30   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.986 on 504 degrees of freedom
## Multiple R-squared:  0.8853, Adjusted R-squared:  0.8848
## F-statistic:  1945 on 2 and 504 DF,  p-value: < 2.2e-16
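The R-squared, adjusted R-squared, and overall F-statistic in the summary can be recovered by hand from the sums of squares in the ANOVA table. The sketch below is a check on the arithmetic only: the variable names are illustrative, and the sample size n = 507 is inferred from the 504 residual degrees of freedom plus two slopes and an intercept.

```r
# Sums of squares copied from the anova(m1) table
ss_waist  <- 358074
ss_height <- 29843
rss       <- 50259

p <- 2            # number of predictors (Waist, Height)
n <- 504 + p + 1  # residual df + predictors + intercept (inferred, not read from the data)

ss_reg <- ss_waist + ss_height  # variation explained by the regression
syy    <- ss_reg + rss          # total variation in Weight

r_squared     <- ss_reg / syy
adj_r_squared <- 1 - (rss / (n - p - 1)) / (syy / (n - 1))
f_statistic   <- (ss_reg / p) / (rss / (n - p - 1))

round(c(R2 = r_squared, adjR2 = adj_r_squared, F = f_statistic), 4)
```

Because the table prints rounded sums of squares, the results agree with summary(m1) only to the precision shown there.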
#R-squared = 0.8853
#Adjusted R-squared = 0.8848
#The slope of height means that among people with the same waist size,
#those who are 1 inch taller are on average 2.4884 pounds heavier

b

set.seed(23)
new.df <- transform(wwh, worthless = rnorm(dim(wwh)[1], 0, 5))
m2 <- lm(Weight ~ Waist + Height + worthless, data = new.df)
anova(m2)

## Analysis of Variance Table
##
## Response: Weight
##            Df Sum Sq Mean Sq   F value Pr(>F)
## Waist       1 358074  358074 3584.4800 <2e-16 ***
## Height      1  29843   29843  298.7400 <2e-16 ***
## worthless   1     12      12    0.1176 0.7318
## Residuals 503  50247     100
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#SSReg = 387929
#RSS = 50247
#SYY = 387929 + 50247 = 438176
#SSReg has increased by 12 and RSS has decreased by 12.
#Overall, SYY has stayed the same.
#An increase in SSReg means the model explains more of the total variation,
#and a decrease in RSS means there is less unexplained variation.
#However, SSReg can never decrease when a predictor is added, and the
#change here is only 12 out of 438176 (F = 0.1176, p = 0.7318), so by
#itself it is not evidence that worthless improves the model.
summary(m2)

##
## Call:
## lm(formula = Weight ~ Waist + Height + worthless, data = new.df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -32.981  -6.384  -0.350   5.800  45.435
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -165.54777    8.25903 -20.044   <2e-16 ***
## Waist          4.95999    0.12300  40.325   <2e-16 ***
## Height         2.48874    0.14397  17.286   <2e-16 ***
## worthless      0.02992    0.08724   0.343    0.732
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.995 on 503 degrees of freedom
## Multiple R-squared:  0.8853, Adjusted R-squared:  0.8846
## F-statistic:  1294 on 3 and 503 DF,  p-value: < 2.2e-16

#R-squared = 0.8853, Adjusted R-squared = 0.8846
#Adjusted R-squared decreased from 0.8848 to 0.8846

set.seed(23)
m3 <- lm(Weight ~ worthless + Waist + Height, data = new.df)
anova(m3)

## Analysis of Variance Table
##
## Response: Weight
##            Df Sum Sq Mean Sq   F value Pr(>F)
## worthless   1     58      58    0.5828 0.4456
## Waist       1 358020  358020 3583.9463 <2e-16 ***
## Height      1  29850   29850  298.8086 <2e-16 ***
## Residuals 503  50247     100
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#SSReg = 387928
#RSS = 50247
#SYY = 387928 + 50247 = 438175
#Reordering the predictors changes the sequential sum of squares credited
#to each term, but not SSReg, RSS, or SYY for the model as a whole.
#The apparent decrease of 1 in SSReg and SYY is rounding in the printed table.
#R-squared = 0.8853, Adjusted R-squared = 0.8846
#Adjusted R-squared decreased from 0.8848 to 0.8846

d

Adjusted R-squared is a more reliable guide than R-squared for deciding whether a new variable should be added, because adjusted R-squared increases only if the new variable improves the model more than would be expected by chance. This lets you judge whether a new variable is just noise or actually improves the model. Here the adjusted R-squared decreased from 0.8848 to 0.8846. The difference is very small, but a decrease indicates that the worthless variable does not improve the model.

e

SSReg does not tell us whether we should add a new variable. SSReg is the part of the total variation explained by the regression, so it never decreases when new variables are added, regardless of whether those variables actually contribute to the model. Partial tests tell us whether the added variables provide significantly more explanatory power, which is what supports adding them.
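The points in d and e can be illustrated on simulated data. This is a minimal sketch, not the homework data: the data frame sim, its columns, the seed, and the coefficients used to generate y are all made up for the example. It checks that RSS cannot increase when a predictor is added, that the partial F-statistic for one added predictor equals the square of its t-statistic, and that adjusted R-squared rises exactly when that partial F exceeds 1.

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
n     <- 200
x1    <- rnorm(n)
x2    <- rnorm(n)
noise <- rnorm(n)              # unrelated to y by construction
y     <- 1 + 2 * x1 - x2 + rnorm(n)
sim   <- data.frame(y, x1, x2, noise)

reduced <- lm(y ~ x1 + x2, data = sim)
full    <- lm(y ~ x1 + x2 + noise, data = sim)

# RSS cannot go up when a predictor is added, so SSReg cannot go down
rss_reduced <- sum(resid(reduced)^2)
rss_full    <- sum(resid(full)^2)
rss_full <= rss_reduced

# Partial F-test for the added predictor
partial   <- anova(reduced, full)
f_partial <- partial$F[2]
t_noise   <- summary(full)$coefficients["noise", "t value"]

# With a single added predictor, the partial F equals the squared t-statistic
all.equal(f_partial, t_noise^2)

# Adjusted R-squared increases only when the partial F exceeds 1
adj_reduced <- summary(reduced)$adj.r.squared
adj_full    <- summary(full)$adj.r.squared
(adj_full > adj_reduced) == (f_partial > 1)
```

In the homework output, worthless has t = 0.343, so its partial F is below 1, which is consistent with the drop in adjusted R-squared from 0.8848 to 0.8846.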
Partial tests also weigh the variability explained by the new variable against the variability left unexplained, and they account for how many predictors are in the model. Essentially, a partial test lets us determine whether the newly added variable is significant.

Question 2

cars <- read.csv("cars04.csv", header = TRUE)
m4 <- lm(SuggestedRetailPrice ~ DealerCost + EngineSize + Cylinders +
           Horsepower + CityMPG + HighwayMPG + Weight +
           WheelBase + Length + Width, data = cars)
m4$coefficients

## (Intercept)  DealerCost  EngineSize   Cylinders  Horsepower     CityMPG
## 349.9762772   1.0541853 -32.2471983 228.3295245   2.3621206 -16.7423882
##  HighwayMPG      Weight   WheelBase      Length       Width
##  46.7575373   0.6991997  27.0534500  -7.3201885 -84.7084980

Equation:
SuggestedRetailPrice = 349.9762772 + 1.0541853(DealerCost)
  - 32.2471983(EngineSize) + 228.3295245(Cylinders)
  + 2.3621206(Horsepower) - 16.7423882(CityMPG)
  + 46.7575373(HighwayMPG) + 0.6991997(Weight)
  + 27.0534500(WheelBase) - 7.3201885(Length) - 84.7084980(Width)

#We exclude Vehicle name because the names aren't numerical variables
#and will mess up the final analysis. The names are also not very
#relevant to this regression as they have no impact on the
#suggested retail price of the cars.
summary(m4)

##
## Call:
## lm(formula = SuggestedRetailPrice ~ DealerCost + EngineSize +
##     Cylinders + Horsepower + CityMPG + HighwayMPG + Weight +
##     WheelBase + Length + Width, data = cars)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1403.85  -276.86   -55.03   257.55  2584.11
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  349.97628 1461.40052   0.239 0.810953
## DealerCost     1.05418    0.00564 186.923  < 2e-16 ***
## EngineSize   -32.24720  123.05642  -0.262 0.793523
## Cylinders    228.32952   71.99492   3.171 0.001730 **
## Horsepower     2.36212    1.42851   1.654 0.099624 .
## CityMPG      -16.74239   21.46286  -0.780 0.436181
## HighwayMPG    46.75754   24.17910   1.934 0.054403 .
## Weight         0.69920    0.20751   3.370 0.000887 ***
## WheelBase     27.05345   16.36168   1.653 0.099644 .
## Length        -7.32019    7.12296  -1.028 0.305209
## Width        -84.70850   30.21238  -2.804 0.005496 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 532.3 on 223 degrees of freedom
## Multiple R-squared:  0.9989, Adjusted R-squared:  0.9989
## F-statistic: 2.073e+04 on 10 and 223 DF,  p-value: < 2.2e-16

#Estimated slope for Cylinders = 228.32952
#T-statistic = 3.171
#p-value = 0.001730
#Assuming that all necessary model conditions are valid,
#we can reject the null hypothesis that the slope is 0,
#as the p-value is 0.00173. This means that
#we can conclude that the slope is not 0, which indicates a
#linear association. This means that there is enough evidence
#to say that the variable Cylinders contributes significantly
#to the variation in the suggested retail price
am4 <- anova(m4)
#getting t statistic from anova
sqrt(am4$`F value`[3])

## [1] 3.099739

#t-value from anova is 3.099739
#This does not match the t-statistic of 3.171 from summary(m4) because
#anova() reports sequential F-tests: the F for Cylinders adjusts only for
#DealerCost and EngineSize, not for the predictors listed after it.
#The two agree only for the last term, Width: sqrt(7.8611) = 2.804 = |t|.
sum4 <- summary(m4)
#F-statistic from summary = 20730 on 10 and 223 DF
#This is a high number, which essentially means that there
#is strong evidence against the null hypothesis that all
#slopes are 0. Essentially this means that we can conclude that at
#least one of the predictor variables has a significant
#impact on the suggested retail price

f

partialm4 <- lm(SuggestedRetailPrice ~ DealerCost + EngineSize + Cylinders +
                  Horsepower + Weight + WheelBase + Length + Width,
                data = cars)
anova(partialm4)

## Analysis of Variance Table
##
## Response: SuggestedRetailPrice
##             Df     Sum Sq    Mean Sq    F value    Pr(>F)
## DealerCost   1 5.8714e+10 5.8714e+10 2.0204e+05 < 2.2e-16 ***
## EngineSize   1 7.7453e+06 7.7453e+06 2.6651e+01 5.353e-07 ***
## Cylinders    1 2.7222e+06 2.7222e+06 9.3670e+00  0.002478 **
## Horsepower   1 7.0394e+05 7.0394e+05 2.4223e+00  0.121028
## Weight       1 5.3446e+05 5.3446e+05 1.8391e+00  0.176418
## WheelBase    1 7.3600e+02 7.3600e+02 2.5000e-03  0.959900
## Length       1 1.4322e+06 1.4322e+06 4.9281e+00  0.027421 *
## Width        1 1.4236e+06 1.4236e+06 4.8985e+00  0.027885 *
## Residuals  225 6.5388e+07 2.9061e+05
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m4)

## Analysis of Variance Table
##
## Response: SuggestedRetailPrice
##             Df     Sum Sq    Mean Sq    F value    Pr(>F)
## DealerCost   1 5.8714e+10 5.8714e+10 2.0724e+05 < 2.2e-16 ***
## EngineSize   1 7.7453e+06 7.7453e+06 2.7338e+01 3.925e-07 ***
## Cylinders    1 2.7222e+06 2.7222e+06 9.6084e+00  0.002186 **
## Horsepower   1 7.0394e+05 7.0394e+05 2.4847e+00  0.116377
## CityMPG      1 2.1856e+05 2.1856e+05 7.7150e-01  0.380714
## HighwayMPG   1 2.1052e+05 2.1052e+05 7.4310e-01  0.389601
## Weight       1 1.2563e+06 1.2563e+06 4.4344e+00  0.036341 *
## WheelBase    1 3.9621e+04 3.9621e+04 1.3990e-01  0.708785
## Length       1 1.6483e+06 1.6483e+06 5.8179e+00  0.016673 *
## Width        1 2.2271e+06 2.2271e+06 7.8611e+00  0.005496 **
## Residuals  223 6.3178e+07 2.8331e+05
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(partialm4, m4)

## Analysis of Variance Table
##
## Model 1: SuggestedRetailPrice ~ DealerCost + EngineSize + Cylinders +
##     Horsepower + Weight + WheelBase + Length + Width
## Model 2: SuggestedRetailPrice ~ DealerCost + EngineSize + Cylinders +
##     Horsepower + CityMPG + HighwayMPG + Weight + WheelBase +
##     Length + Width
##   Res.Df      RSS Df Sum of Sq      F  Pr(>F)
## 1    225 65387880
## 2    223 63178392  2   2209488 3.8994 0.02165 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is 0.02165, which is below 0.05, so we reject the null hypothesis that the coefficients of CityMPG and HighwayMPG are both 0. Thus, we can conclude that the full model is in fact a better fit than the partial model, and that the MPG variables jointly contribute significantly to explaining the suggested retail price.
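The partial F-statistic reported by anova(partialm4, m4) can be reproduced from the two residual sums of squares and their degrees of freedom. The sketch below uses only the numbers printed in the comparison table; the variable names are illustrative.

```r
# RSS and residual df copied from the anova(partialm4, m4) table
rss_reduced <- 65387880   # partial model, without CityMPG and HighwayMPG
df_reduced  <- 225
rss_full    <- 63178392   # full model m4
df_full     <- 223

q <- df_reduced - df_full  # number of coefficients being tested (2)

# Partial F: extra variation explained per tested coefficient,
# relative to the residual mean square of the full model
f_partial <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_value   <- pf(f_partial, q, df_full, lower.tail = FALSE)

round(c(F = f_partial, p = p_value), 5)
```

The denominator uses the full model's residual mean square because, under either hypothesis, it is the estimate of the error variance that does not depend on the tested coefficients being 0.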