Project-1

School: Foothill College
Course: Statistics 103
Date: Jan 9, 2024

Project 1 | Group 76

Joey Liu, Noemi Loera, Dale Kim

2023-10-15

```r
library(AER)
data("MurderRates")
summary(MurderRates$income)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.760   1.550   1.830   1.781   2.070   2.390 
library(lmtest)
library(leaps)
library(HH)
```
```r
hist(MurderRates$income, main = "Income Histogram")
```
[Figure: Income Histogram (x-axis: MurderRates$income; y-axis: Frequency)]

```r
boxplot(MurderRates$income)
```
[Figure: boxplot of MurderRates$income]

5.

```r
regsub <- regsubsets(rate ~ ., method = "exhaustive", nbest = 2, data = MurderRates)
summaryHH(regsub)
##            model p   rsq rss adjr2    cp   bic stderr
## 1              s 2 0.588 353 0.578 18.40 -31.4   2.90
## 2              n 2 0.560 377 0.550 22.27 -28.6   2.99
## 3            n-s 3 0.675 278 0.659  8.01 -38.1   2.61
## 4            t-n 3 0.641 308 0.623 12.88 -33.7   2.74
## 5          t-n-s 4 0.705 253 0.683  5.81 -38.6   2.51
## 6          c-n-s 4 0.698 259 0.675  6.84 -37.5   2.54
## 7        c-t-n-s 5 0.729 232 0.701  4.43 -38.5   2.44
## 8        t-i-n-s 5 0.711 247 0.682  6.89 -35.8   2.52
## 9      c-t-i-n-s 6 0.736 226 0.701  5.38 -35.9   2.44
## 10     c-e-t-n-s 6 0.730 231 0.695  6.25 -34.9   2.47
## 11   c-t-i-l-n-s 7 0.744 219 0.703  6.22 -33.5   2.43
## 12   c-e-t-i-n-s 7 0.740 223 0.697  6.88 -32.7   2.46
## 13 c-e-t-i-l-n-s 8 0.746 218 0.696  8.00 -30.0   2.46
## 
## Model variables with abbreviations
##                 model
## s              southernyes
## n              noncauc
## n-s            noncauc-southernyes
## t-n            time-noncauc
```
```r
## t-n-s          time-noncauc-southernyes
## c-n-s          convictions-noncauc-southernyes
## c-t-n-s        convictions-time-noncauc-southernyes
## t-i-n-s        time-income-noncauc-southernyes
## c-t-i-n-s      convictions-time-income-noncauc-southernyes
## c-e-t-n-s      convictions-executions-time-noncauc-southernyes
## c-t-i-l-n-s    convictions-time-income-lfp-noncauc-southernyes
## c-e-t-i-n-s    convictions-executions-time-income-noncauc-southernyes
## c-e-t-i-l-n-s  convictions-executions-time-income-lfp-noncauc-southernyes
## 
## model with largest adjr2
## 11
## 
## Number of observations
## 44
```

```r
south <- lm(rate ~ time + noncauc + southern, data = MurderRates)
summary(south)
## 
## Call:
## lm(formula = rate ~ time + noncauc + southern, data = MurderRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5143 -1.5177 -0.1934  1.4288  6.7320 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.419658   1.232067   3.587 0.000901 ***
## time        -0.014227   0.007091  -2.006 0.051600 .  
## noncauc     16.256184   4.719997   3.444 0.001358 ** 
## southernyes  3.548793   1.204250   2.947 0.005333 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.514 on 40 degrees of freedom
## Multiple R-squared:  0.7049, Adjusted R-squared:  0.6828 
## F-statistic: 31.85 on 3 and 40 DF,  p-value: 1.092e-10
```

#Ask for clarification when comparing the two

The R² for (3) is 0.7459, compared with 0.7049 for (south). Because (3) has the higher R², it fits the data better overall. However, R² cannot decrease when regressors are added to a model, so a model with more regressors has a built-in advantage on this measure. The gap here is only about 0.04, which suggests that the three regressors in (south), time, noncauc and southern, account for most of the explained variation. (south) corresponds to row 5 (t-n-s) of the subset table above, which has the lowest BIC (-38.6) of all the models listed.

6.

```r
# Hand-built 0/1 dummy: 1 for southern states, 0 otherwise
region <- ifelse(MurderRates$southern == "yes", 1, 0)
newReg <- lm(rate ~ time + noncauc + region, data = MurderRates)
# Standard lm diagnostic plots for the (south) model
plot(south, main = "Plot of South")
```
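The `region` variable built with `ifelse()` is the same 0/1 indicator that `lm()` creates internally for the factor `southern` (it appears as `southernyes` in the summary of `south`), so `newReg` and `south` should produce identical estimates; this is presumably why the diagnostic plots can be drawn from `south`. The minimal base-R sketch below illustrates that equivalence on simulated data; the data and the names `fit_factor` and `fit_dummy` are illustrative assumptions, not part of the project.

```r
# Simulated stand-in data (NOT MurderRates): a yes/no factor and one regressor
set.seed(1)
n <- 44
southern <- factor(sample(c("no", "yes"), n, replace = TRUE))
x <- rnorm(n)
y <- 2 + 0.5 * x + 3 * (southern == "yes") + rnorm(n)

# lm() expands the factor into a 0/1 column named "southernyes"
# (treatment coding, with "no" as the reference level)
fit_factor <- lm(y ~ x + southern)

# The same indicator built by hand, mirroring the ifelse() recoding of `region`
region <- ifelse(southern == "yes", 1, 0)
fit_dummy <- lm(y ~ x + region)

# Only the coefficient names differ; the estimates and fitted values agree
coef(fit_factor)
coef(fit_dummy)
```

Since both design matrices contain exactly the same columns, every downstream quantity (residuals, R², diagnostic plots, RESET) is the same for the two fits.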
[Figure: Plot of South, Residuals vs Fitted for lm(rate ~ time + noncauc + southern) (x-axis: fitted values; y-axis: residuals; observations 1, 37 and 28 labeled)]
[Figure: Plot of South, Q-Q Residuals for lm(rate ~ time + noncauc + southern) (x-axis: theoretical quantiles; y-axis: standardized residuals; observations 1, 37 and 28 labeled)]

[Figure: Plot of South, Scale-Location for lm(rate ~ time + noncauc + southern) (x-axis: fitted values; y-axis: standardized residuals; observations 1, 37 and 28 labeled)]
[Figure: Plot of South, Residuals vs Leverage for lm(rate ~ time + noncauc + southern) (x-axis: leverage; y-axis: standardized residuals; Cook's distance contours at 0.5 and 1; observations 1, 43 and 35 labeled)]

The residuals show a clear pattern: their spread fans out in a cone shape, which indicates that the errors are heteroskedastic. This violates Assumption 3 of the multiple linear regression model, homoskedasticity (constant error variance).

7. The RESET hypothesis test:

$H_0: \beta_4 = 0$ and $\beta_5 = 0$ versus $H_1: \beta_4 \neq 0$ and/or $\beta_5 \neq 0$,

where $\beta_4$ and $\beta_5$ are the coefficients on $\hat{y}^2$ and $\hat{y}^3$, the powers of the fitted values that `resettest()` adds to the model by default.

```r
# RESET test for functional-form misspecification
resettest(newReg)
## 
##  RESET test
## 
## data:  newReg
## RESET = 1.1266, df1 = 2, df2 = 38, p-value = 0.3347
```

Since the p-value (0.3347) is greater than the significance level α = 0.05, we fail to reject the null hypothesis that the model is correctly specified; the test finds no evidence of functional-form misspecification.

8.

```r
reg2 <- lm(rate ~ convictions + executions + time + income + lfp + noncauc + region,
           data = MurderRates)
# GQ test for heteroskedasticity, ordering the observations by convictions
gqtest(reg2, point = 0.5, alternative = "greater", order.by = MurderRates$convictions)
```
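```r
## 
##  Goldfeld-Quandt test
## 
## data:  reg2
## GQ = 0.32275, df1 = 14, df2 = 14, p-value = 0.9787
## alternative hypothesis: variance increases from segment 1 to 2
```

#Ask which X (explanatory variable) corresponds with order.by

Since the p-value of the GQ test is 0.9787, we fail to reject the null hypothesis that the error variance is the same in the two sub-samples. The test therefore gives no evidence of heteroskedasticity in reg2, at least of the kind it checks for: error variance that increases with convictions, the variable passed to order.by. This does not contradict the residual plots in question 6, which come from a different model (south) and do not order the observations by convictions.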
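The conclusion above can be cross-checked by computing the Goldfeld-Quandt statistic directly: sort the observations by the ordering variable, fit the regression separately on the lower and upper halves, and take the ratio of the two estimated error variances, (RSS₂/df₂)/(RSS₁/df₁), which follows an F distribution under the null of constant variance (with normal errors). The base-R sketch below mirrors the settings used here (a midpoint split with no middle observations dropped, and the one-sided "variance increases" alternative); the function name `gq_by_hand` and the simulated data are illustrative assumptions, not part of the project or of lmtest.

```r
# Goldfeld-Quandt test by hand (base R only), assuming a split at the
# midpoint (point = 0.5), no central observations dropped, and the
# one-sided alternative "variance increases from segment 1 to 2".
gq_by_hand <- function(y, X, order_by) {
  ord <- order(order_by)                 # sort rows by the ordering variable
  y <- y[ord]
  X <- X[ord, , drop = FALSE]
  n <- length(y)
  n1 <- floor(n / 2)                     # size of the first (low) segment
  seg1 <- seq_len(n1)
  seg2 <- (n1 + 1):n
  fit1 <- lm.fit(X[seg1, , drop = FALSE], y[seg1])
  fit2 <- lm.fit(X[seg2, , drop = FALSE], y[seg2])
  s2_1 <- sum(fit1$residuals^2) / fit1$df.residual  # error variance, segment 1
  s2_2 <- sum(fit2$residuals^2) / fit2$df.residual  # error variance, segment 2
  gq <- s2_2 / s2_1
  # Upper-tail F probability, matching the "greater" alternative
  p <- pf(gq, fit2$df.residual, fit1$df.residual, lower.tail = FALSE)
  list(statistic = gq, df1 = fit2$df.residual, df2 = fit1$df.residual,
       p.value = p)
}

# Simulated example: 44 observations and 8 columns (intercept + 7 regressors),
# so each half of 22 rows leaves 22 - 8 = 14 residual degrees of freedom,
# the same df1 and df2 as in the gqtest() output for reg2.
set.seed(42)
n <- 44
X <- cbind(1, matrix(rnorm(n * 7), nrow = n, ncol = 7))
z <- X[, 2]
# Error standard deviation grows with z, so the variance increases along z
y <- drop(X %*% rep(1, 8)) + rnorm(n, sd = exp(0.8 * z))
res <- gq_by_hand(y, X, order_by = z)
str(res)
```

Because both halves have the same degrees of freedom (14 and 14), P(F > 1) = 0.5, so any statistic below 1, such as the GQ = 0.32275 reported above, gives a p-value above 0.5 under this alternative. Applied to the project data, a call such as `gq_by_hand(MurderRates$rate, model.matrix(reg2), MurderRates$convictions)` would be expected to match the `gqtest()` output, assuming rows with tied `convictions` values are ordered the same way.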