HW4

School: Boston University
Course: 555
Subject: Statistics
Date: Jan 9, 2024
Uploaded by jihobang96

#1)
Form: Linear
Direction: Positive association
Strength: Strong
There is a strong positive linear association between the variables Prestige and Education.
Correlation coefficient: 0.8501769

#2)
Conclusion Summary

Residuals:
     Min       1Q   Median       3Q      Max
-26.0397  -6.5228   0.6611   6.7430  18.1636

Coefficients:
                         Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)               -10.732       3.677   -2.919   0.00434
Education.Level..years.     5.361       0.332   16.148   < 2e-16

Residual standard error: 9.103 on 100 DF
Multiple R-squared: 0.7228, Adjusted R-squared: 0.72
F-statistic: 260.8 on 1 and 100 DF, p-value: < 2.2e-16

The model assumptions are linearity, independence, normality, and constant variance. The scatter plot of prestige against education shows that the linearity assumption is met. Each individual contributes only one observation, so the independence assumption is met as well. The histogram of residuals is slightly left-skewed, but the departure is small enough that the normality assumption is reasonable. Lastly, the constant variance assumption is met, since the residuals in the residual plot are not widely spread.

The IQR outlier detection function in R found no outliers. The cooks.distance function in R identified IDs 24, 53, and 67 as influence points; these points affect the slope of the regression model.

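The IQR rule and Cook's distance checks described in #2 could be carried out along the following lines. This is a minimal sketch, not the code used for the homework: it runs on simulated stand-in data (the names education and prestige and the generating values are made up), it assumes the IQR rule is applied to the residuals with the usual 1.5 x IQR fences, and it assumes a 4/n rule-of-thumb cutoff for Cook's distance. The cutoffs actually used are not stated in the write-up.

```r
# Simulated stand-in data (NOT the homework data set); values are hypothetical
set.seed(1)
n <- 50
education <- runif(n, min = 6, max = 16)
prestige  <- -10 + 5 * education + rnorm(n, sd = 9)

fit <- lm(prestige ~ education)

# IQR rule on the residuals (assumption: 1.5 * IQR fences around the quartiles)
res   <- resid(fit)
q1    <- unname(quantile(res, 0.25))
q3    <- unname(quantile(res, 0.75))
fence <- 1.5 * IQR(res)
outlier_ids <- which(res < q1 - fence | res > q3 + fence)

# Cook's distance (assumption: flag observations above the 4/n rule of thumb)
cd <- cooks.distance(fit)
influential_ids <- which(cd > 4 / n)

print(outlier_ids)
print(influential_ids)
```

With the real data, fit would be replaced by the fitted model object (reg.score.education in the R code section), and the flagged row numbers would correspond to the reported IDs only if the same cutoffs were used.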
#3)
1. H0: beta_education = beta_income = beta_women = 0. Education, income, and percent of the workforce that are women are not significant predictors of prestige score.
   H1: The beta for education is != 0 and/or the beta for income is != 0 and/or the beta for percent of women is != 0. At least one of the slope coefficients is different from 0.
   Alpha level: 0.05
2. F = Reg MS / Res MS with 3 and n - k - 1 = 102 - 3 - 1 = 98 degrees of freedom
3. Critical value: qf(.95, df1 = 3, df2 = 98) = 2.697423. Reject H0 if F ≥ 2.697423.
4. F_stat <- modelsum$fstatistic[1] = 129.19
   P-value: < 2.2e-16
5. Reject H0 since 129.19 ≥ 2.697423. We have significant evidence at the α = 0.05 level that income, education, and percent of women, when taken together, are predictive of prestige.

#4)
Conclusions Summary

Model 1: Prestige.Score ~ Education.Level..years.

Residuals:
     Min       1Q   Median       3Q      Max
-26.0397  -6.5228   0.6611   6.7430  18.1636

Coefficients:
                         Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)               -10.732       3.677   -2.919   0.00434
Education.Level..years.     5.361       0.332   16.148   < 2e-16

Residual standard error: 9.103 on 100 DF
Multiple R-squared: 0.7228, Adjusted R-squared: 0.72
F-statistic: 260.8 on 1 and 100 DF, p-value: < 2.2e-16
Critical F value (1 and 100 DF, alpha = 0.05): 3.936143

Model 2: Prestige.Score ~ Income....

Residuals:
    Min      1Q  Median      3Q     Max
-33.007  -8.378  -2.378   8.432  32.084

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  2.714e+01   2.268e+00    11.97    <2e-16
Income....   2.897e-03   2.833e-04    10.22    <2e-16

Residual standard error: 12.09 on 100 DF
Multiple R-squared: 0.5111, Adjusted R-squared: 0.5062
F-statistic: 104.5 on 1 and 100 DF, p-value: < 2.2e-16
Critical F value (1 and 100 DF, alpha = 0.05): 3.936143

Model 3: Prestige.Score ~ Percent.of.Workforce.that.are.Women

Residuals:
    Min       1Q   Median       3Q      Max
-33.444  -12.391   -4.126   13.034   39.185

Coefficients:
                                     Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                          48.69300     2.30760   21.101    <2e-16
Percent.of.Workforce.that.are.Women  -0.06417     0.05385   -1.192     0.236

Residual standard error: 17.17 on 100 DF
Multiple R-squared: 0.014, Adjusted R-squared: 0.004143
F-statistic: 1.42 on 1 and 100 DF, p-value: 0.2362

The above results show that the percent of the workforce that are women is not a significant predictor of prestige on its own: its F-statistic (1.42) is below the critical value of 3.936143 and its p-value (0.2362) is greater than 0.05. Income and education are both significant, with F-statistics (104.5 and 260.8) well above the critical value and extremely low p-values. We can interpret the slope estimate for education as follows: each additional year of education is associated with a 5.361-point increase in prestige. For income, each one-unit increase in income is associated with a 2.897e-03 (0.002897) point increase in prestige.

#5)
The model assumptions are linearity, independence, normality, and constant variance. The scatter plot shows that the linearity assumption is met. Each individual contributes only one observation, so the independence assumption is met as well. The histogram of residuals is approximately normal, so the normality assumption is met. Lastly, the constant variance assumption is met, since the residuals in the residual plot are not widely spread.

Outliers ID = 2, 17, 24, 25, 26
Influence Points ID = 2, 20, 24, 27, 29, 53, 54, 67, 82

R code

[#1]
> setwd("~/Desktop/BU/CS 555/HW")
> data <- read.csv("cs555data4.csv", header=TRUE)
> attach(data)
> plot(Education.Level..years., Prestige.Score, main = "Prestige Score vs. Education")
> cor(Education.Level..years., Prestige.Score)
[1] 0.8501769

[#2]
> reg.score.education <- lm(Prestige.Score ~ Education.Level..years.)
> reg.score.education

Call:
lm(formula = Prestige.Score ~ Education.Level..years.)

Coefficients:
            (Intercept)  Education.Level..years.
                -10.732                    5.361

> summary(reg.score.education)

Call:
lm(formula = Prestige.Score ~ Education.Level..years.)

Residuals:
     Min       1Q   Median       3Q      Max
-26.0397  -6.5228   0.6611   6.7430  18.1636

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)              -10.732      3.677  -2.919  0.00434 **
Education.Level..years.    5.361      0.332  16.148  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.103 on 100 degrees of freedom
Multiple R-squared: 0.7228, Adjusted R-squared: 0.72
F-statistic: 260.8 on 1 and 100 DF, p-value: < 2.2e-16

> residual <- resid(reg.score.education)
> plot(Education.Level..years., residual, main = "Residual Plot")
> abline(0,0)
> hist(residual, main= "Histogram of Residuals")
> plot(reg.score.education, 4)

[#3]
> model <- lm(formula = Prestige.Score ~ (Education.Level..years. + Income.... + Percent.of.Workforce.that.are.Women), data=data)
> summary(model)

Call:
lm(formula = Prestige.Score ~ (Education.Level..years. + Income.... + Percent.of.Workforce.that.are.Women), data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-19.8246  -5.3332  -0.1364   5.1587  17.5045

Coefficients:
                                      Estimate Std. Error t value Pr(>|t|)
(Intercept)                         -6.7943342  3.2390886  -2.098   0.0385 *
Education.Level..years.              4.1866373  0.3887013  10.771  < 2e-16 ***
Income....                           0.0013136  0.0002778   4.729 7.58e-06 ***
Percent.of.Workforce.that.are.Women -0.0089052  0.0304071  -0.293   0.7702
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.846 on 98 degrees of freedom
Multiple R-squared: 0.7982, Adjusted R-squared: 0.792
F-statistic: 129.2 on 3 and 98 DF, p-value: < 2.2e-16

> anova(model)
Analysis of Variance Table

Response: Prestige.Score
                                    Df  Sum Sq Mean Sq  F value    Pr(>F)
Education.Level..years.              1 21608.4 21608.4 350.9741 < 2.2e-16 ***
Income....                           1  2248.1  2248.1  36.5153 2.739e-08 ***
Percent.of.Workforce.that.are.Women  1     5.3     5.3   0.0858    0.7702
Residuals                           98  6033.6    61.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> qf(0.95, df1=3, df2 =98)
[1] 2.697423
> modelsum <- summary(model)
> fstat <- modelsum$fstatistic[1]
> fstat
   value
129.1917

[#4]
> model1 <- lm(formula = Prestige.Score ~ Education.Level..years., data=data)
> model2 <- lm(formula = Prestige.Score ~ Income...., data=data)
> model3 <- lm(formula = Prestige.Score ~ Percent.of.Workforce.that.are.Women, data=data)
> summary(model1)

Call:
lm(formula = Prestige.Score ~ Education.Level..years., data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-26.0397  -6.5228   0.6611   6.7430  18.1636

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)              -10.732      3.677  -2.919  0.00434 **
Education.Level..years.    5.361      0.332  16.148  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.103 on 100 degrees of freedom
Multiple R-squared: 0.7228, Adjusted R-squared: 0.72
F-statistic: 260.8 on 1 and 100 DF, p-value: < 2.2e-16

> summary(model2)

Call:
lm(formula = Prestige.Score ~ Income...., data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-33.007  -8.378  -2.378   8.432  32.084

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.714e+01  2.268e+00   11.97   <2e-16 ***
Income....  2.897e-03  2.833e-04   10.22   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.09 on 100 degrees of freedom
Multiple R-squared: 0.5111, Adjusted R-squared: 0.5062
F-statistic: 104.5 on 1 and 100 DF, p-value: < 2.2e-16

> summary(model3)

Call:
lm(formula = Prestige.Score ~ Percent.of.Workforce.that.are.Women, data = data)

Residuals:
    Min       1Q   Median       3Q      Max
-33.444  -12.391   -4.126   13.034   39.185

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                         48.69300    2.30760  21.101   <2e-16 ***
Percent.of.Workforce.that.are.Women -0.06417    0.05385  -1.192    0.236
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.17 on 100 degrees of freedom
Multiple R-squared: 0.014, Adjusted R-squared: 0.004143
F-statistic: 1.42 on 1 and 100 DF, p-value: 0.2362

> qf(0.95, df1 = 3, df2 = 98)
[1] 2.697423

[#5]
> residual <- resid(model)
> hist(residual, main="Histogram of Residuals")
> plot(fitted(model), resid(model), axes = TRUE, frame.plot=TRUE, xlab="Fitted Values", ylab= "Residual", main = "Model Residual by Fitted")
> abline(h=0)
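
As a cross-check on the global F-test in #3, the statistic F = Reg MS / Res MS can be rebuilt by hand from the sums of squares printed by anova(model) in the [#3] output. The sketch below is only an illustration: the variable names (ss_reg, ss_res, and so on) are made up here, and the inputs are the rounded values copied from that table, so the result agrees with the reported 129.1917 only up to rounding.

```r
# Sums of squares copied from the anova(model) table (rounded as printed)
ss_reg <- c(education = 21608.4, income = 2248.1, women = 5.3)
ss_res <- 6033.6

k      <- length(ss_reg)  # number of predictors (3)
df_res <- 98              # residual degrees of freedom, n - k - 1 = 102 - 3 - 1

ms_reg <- sum(ss_reg) / k      # regression mean square
ms_res <- ss_res / df_res      # residual mean square

f_stat  <- ms_reg / ms_res                            # approximately 129.19
f_crit  <- qf(0.95, df1 = k, df2 = df_res)            # critical value at alpha = 0.05
p_value <- pf(f_stat, k, df_res, lower.tail = FALSE)  # upper-tail p-value

# Multiple R-squared from the same table: SS_reg / SS_total
r_squared <- sum(ss_reg) / (sum(ss_reg) + ss_res)     # approximately 0.798

print(c(F = f_stat, critical = f_crit, R2 = r_squared))
print(p_value)
```

Because f_stat exceeds f_crit, the hand calculation leads to the same decision as step 5 of #3, and r_squared agrees with the Multiple R-squared of 0.7982 reported by summary(model).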