Assignment 4 (5%)

School: Western University
Course: 2143
Subject: Statistics
Date: Apr 3, 2024

4/30/2021 Assignment 4 (5%) https://owl.uwo.ca/access/content/attachment/9155fbbb-4ff5-4431-b269-d36c50cda88c/Announcements/8ccd0319-b998-403d-b26f-afc4e841838e/Sol…

Instructions

Submit one PDF document per team with the names and student numbers of all members. The assignment is due Monday, April 12 (10:00PM), and is to be submitted via Gradescope.

In this assignment you will perform linear regression analysis to identify the underlying drivers of gun violence in the United States (US). The data set contains socio-economic factors by state in 2014, 2015, and 2016, which were selected in accordance with the main talking points of the US gun debate, consistent with previous peer-reviewed studies. The dataset “project4_data” contains observations for the following:

  State
  Year
  Rate: firearm mortality rate per 100,000 total population (Centers for Disease Control and Prevention)
  Density: population density by square mile (US Census Bureau)
  Unemployment: yearly average of monthly unemployment data (St. Louis FRED)

Answer each of the questions below with full sentences accompanied by reproducible code from the software of your choice (e.g. Excel, RStudio, Python, WolframAlpha). Report answers with software precision.

    # Import data and load packages
    data <- read.csv("~/ss2143/project4/project4_data.csv")
    # Attach the data so that columns such as Rate can be referenced by name
    attach(data)

Question 1 (4 points): Estimate the following linear regression model, where the response variable is the firearm mortality rate (per 100,000 total population) and the explanatory variable is the unemployment rate:

$$\text{Rate}_i = \beta_0 + \beta_1 \, \text{Unemployment}_i + \varepsilon_i$$

Report the estimated regression coefficients (1 point) and their standard errors (1 point). Estimate the variance of the residuals (1 point). Calculate the coefficient of determination of the model (1 point).

Answer

A regression summary is printed below.

    # Fit regression model with R function 'lm'
    model1 <- lm(Rate ~ Unemployment, data = data)
    # Print regression summary
    summary(model1)

    ##
    ## Call:
    ## lm(formula = Rate ~ Unemployment, data = data)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -10.421  -2.772  -0.270   3.311  10.684
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)   10.0670     1.6699   6.028 1.26e-08 ***
    ## Unemployment   0.4351     0.3170   1.373    0.172
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 4.62 on 148 degrees of freedom
    ## Multiple R-squared:  0.01257,  Adjusted R-squared:  0.005899
    ## F-statistic: 1.884 on 1 and 148 DF,  p-value: 0.1719

    # Number of observations in the data
    nobs <- nrow(data)
    # Number of parameters in the model
    nparam <- length(model1$coefficients)
    # Error sum of squares
    SSE <- sum(model1$residuals^2)
    # Estimate of the variance of the residuals
    sigma2 <- SSE/(nobs-nparam)
    # Total sum of squares
    SST <- sum((Rate-mean(Rate))^2)
    # Coefficient of determination
    R2 <- (SST-SSE)/SST

We find that $\{\hat{\beta}_0, \hat{\beta}_1\} = \{10.0669936, 0.4351143\}$ with corresponding standard errors $\{1.6699275, 0.3169844\}$.

An estimate of the variance of the residuals can be computed as the error sum of squares divided by the number of degrees of freedom. The error sum of squares is $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. The number of degrees of freedom is defined as the number of observations used to estimate the model parameters minus the number of estimated parameters (including the intercept), here $n - p = 150 - 2 = 148$. We obtain $\hat{\sigma}^2 = SSE/(n - p) = 21.3444756$.

To calculate the coefficient of determination, we compute the total sum of squares as $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$. The coefficient of determination is then $R^2 = (SST - SSE)/SST = 0.0125711$.

Question 2 (2 points): Interpret the signs and the values of the coefficients (1 point). Are they significantly different from zero at the 95% confidence level (1 point)?

Answer

The intercept is 10.0669936 and the slope coefficient is 0.4351143. This means that the expected firearm mortality rate when unemployment is 0% is 10.0669936 per 100,000 total population. For every one percentage point increase in unemployment, the expected firearm mortality rate increases by 0.4351143 per 100,000 total population. The intercept is statistically significant at the 95% confidence level (p-value of 1.26e-08), but the effect of unemployment is not significant, with a p-value of 0.172.

Question 3 (4 points): Estimate a second linear regression model where the explanatory variable is the population density. Report the estimated regression coefficients (1 point) and their standard errors (1 point). Estimate the variance of the residuals (1 point). Calculate the coefficient of determination of the model (1 point).

    # Fit regression model with R function 'lm'
    model2 <- lm(Rate ~ Density, data = data)
    # Print regression summary
    summary(model2)

    ##
    ## Call:
    ## lm(formula = Rate ~ Density, data = data)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -10.1988  -2.6815   0.0911   2.8339   8.9613
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) 14.452857   0.403426  35.825  < 2e-16 ***
    ## Density     -0.012755   0.001514  -8.422 2.96e-14 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 3.823 on 148 degrees of freedom
    ## Multiple R-squared:  0.324,  Adjusted R-squared:  0.3194
    ## F-statistic: 70.94 on 1 and 148 DF,  p-value: 2.956e-14

    # Number of parameters in the new model
    nparam <- length(model2$coefficients)
    # Error sum of squares
    SSE <- sum(model2$residuals^2)
    # Estimate of the variance of the residuals
    sigma2 <- SSE/(nobs-nparam)
    # Coefficient of determination
    R2 <- (SST-SSE)/SST

We compute the variance of the residuals and the coefficient of determination in the same way as in Question 1. We find $\hat{\sigma}^2 = 14.6124008$ and $R^2 = 0.3240075$.
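As a cross-check on the manual calculations above, the same quantities can be read directly from the object returned by summary(). The sketch below is only an illustration: it assumes a synthetic data frame `toy` (hypothetical values standing in for project4_data.csv, which is not reproduced here), so its numbers will not match those reported in the assignment.

```r
# Synthetic stand-in for project4_data.csv (hypothetical values, NOT the real data)
set.seed(1)
toy <- data.frame(Density = runif(150, 1, 1200))
toy$Rate <- 14 - 0.012 * toy$Density + rnorm(150, sd = 4)

fit <- lm(Rate ~ Density, data = toy)
s <- summary(fit)

# Hand-computed quantities, following the steps used in the assignment
SSE <- sum(residuals(fit)^2)
SST <- sum((toy$Rate - mean(toy$Rate))^2)
df <- nrow(toy) - length(coef(fit))
sigma2 <- SSE / df
R2 <- (SST - SSE) / SST

# Built-in equivalents: coefficient table, residual variance, R-squared
est <- coef(s)[, "Estimate"]
se <- coef(s)[, "Std. Error"]
print(cbind(est, se))
print(all.equal(sigma2, s$sigma^2))    # residual standard error, squared
print(all.equal(R2, s$r.squared))

# The t statistics and two-sided p-values follow from est and se
tval <- est / se
pval <- 2 * pt(-abs(tval), df = df)
print(all.equal(tval, coef(s)[, "t value"]))
print(all.equal(pval, coef(s)[, "Pr(>|t|)"]))
```

Each all.equal() call compares a hand-computed value with the one stored in the summary object, which is a quick way to confirm that a formula was implemented as intended before reporting the result.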

Estimate the same model without an intercept term. Report the estimated regression coefficient (1 point) and its standard error (1 point). Estimate the variance of the residuals (1 point). Calculate the coefficient of determination of the model without an intercept (1 point).

    # Fit regression model with R function 'lm'
    model3 <- lm(Rate ~ Density - 1, data = data)
    # Print regression summary
    summary(model3)

    ##
    ## Call:
    ## lm(formula = Rate ~ Density - 1, data = data)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -16.865   5.867   9.776  13.970  23.376
    ##
    ## Coefficients:
    ##         Estimate Std. Error t value Pr(>|t|)
    ## Density 0.021620   0.003631   5.954 1.81e-08 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 11.85 on 149 degrees of freedom
    ## Multiple R-squared:  0.1922,  Adjusted R-squared:  0.1867
    ## F-statistic: 35.44 on 1 and 149 DF,  p-value: 1.807e-08

    # Number of parameters in the new model
    nparam <- length(model3$coefficients)
    # Error sum of squares
    SSE <- sum(model3$residuals^2)
    # Estimate of the variance of the residuals
    sigma2 <- SSE/(nobs-nparam)
    # Total sum of squares
    # Computed differently when there is no intercept term
    SST <- sum((Rate)^2)
    # Coefficient of determination
    R2 <- (SST-SSE)/SST

We compute the variance of the residuals in the same way as in Question 1, now with 149 degrees of freedom because only one parameter is estimated. For the coefficient of determination, the total sum of squares is uncentered, $SST = \sum_{i=1}^{n} y_i^2$, because the model has no intercept term. We find $\hat{\sigma}^2 = 140.3822823$ and $R^2 = 0.1921677$.

Question 4 (5 points): Using your answers for Question 3, discuss the differences between the models with and without an intercept term.

Answer

Your answer should include the following elements, or something along those lines.

The sign of the slope coefficient changed from negative (-0.012755) to positive (0.021620) when the intercept was removed. This may indicate that one of the two models is poorly specified. In this case, the model without an intercept is not adequate, as the fitted line does not align with the scatter plot. (1 point)

We find that the first coefficient of determination (0.3240075) is higher than the second (0.1921677) because the first model includes an intercept term. The intercept term adds flexibility to the model, which makes it easier to capture variability in the response variable. (1 point)

The variance of the residuals is much higher in the model with no intercept (140.3822823 versus 14.6124008). This can be seen as evidence that the first model performs better. (1 point)

Do not say something that is completely wrong. (2 points)
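The comparison between the two coefficients of determination can be made more concrete by computing the no-intercept value on both scales. The following is a sketch on synthetic data (the data frame `toy` and the parameters used to generate it are assumptions, not the assignment data). It relies on the fact that summary() reports an uncentered R-squared when the intercept is dropped, which matches SST <- sum((Rate)^2) in Question 3, whereas a centered version uses the same SST as the model with an intercept.

```r
# Synthetic stand-in for the assignment data (hypothetical values, NOT the real data)
set.seed(2143)
toy <- data.frame(Density = runif(150, 1, 1200))
toy$Rate <- 14 - 0.012 * toy$Density + rnorm(150, sd = 4)

fit_int <- lm(Rate ~ Density, data = toy)        # with intercept
fit_noint <- lm(Rate ~ Density - 1, data = toy)  # without intercept

# Error sums of squares of the two fits
SSE_int <- sum(residuals(fit_int)^2)
SSE_noint <- sum(residuals(fit_noint)^2)

# Centered and uncentered total sums of squares
SST_centered <- sum((toy$Rate - mean(toy$Rate))^2)
SST_uncentered <- sum(toy$Rate^2)

R2_int <- 1 - SSE_int / SST_centered
R2_noint_uncentered <- 1 - SSE_noint / SST_uncentered  # reported by summary()
R2_noint_centered <- 1 - SSE_noint / SST_centered      # same scale as R2_int

# Slopes and R-squared values side by side
print(c(slope_int = unname(coef(fit_int)["Density"]),
        slope_noint = unname(coef(fit_noint)["Density"])))
print(c(R2_int = R2_int,
        R2_noint_uncentered = R2_noint_uncentered,
        R2_noint_centered = R2_noint_centered))
```

On the centered scale the no-intercept model can never score higher than the model with an intercept, because adding an intercept cannot increase the error sum of squares. The centered value can even be negative when forcing the line through the origin fits worse than predicting the sample mean.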