HW4

School: Boston University
Course: 555
Subject: Statistics
Date: Jan 9, 2024
Uploaded by jihobang96

#1)
Form: Linear
Direction: Positive association
Strength: Strong
There is a strong positive linear association between the variables Prestige and Education.
Correlation Coefficient: 0.8501769

#2)
Conclusion Summary

Residuals:
     Min       1Q   Median       3Q      Max
-26.0397  -6.5228   0.6611   6.7430  18.1636

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               -10.732      3.677  -2.919  0.00434
Education.Level..years.     5.361      0.332  16.148  < 2e-16

Residual standard error: 9.103 on 100 DF
Multiple R-squared: 0.7228, Adjusted R-squared: 0.72
F-statistic: 260.8 on 1 and 100 DF, p-value: < 2.2e-16

The model assumptions are linearity, independence, normality, and equal variance. The scatter plot shows a linear trend, so the linearity assumption is met. Each individual contributes only one observation, so the independence assumption is met as well. The histogram of residuals is slightly left-skewed, but the overall deviation is small, so the normality assumption is reasonably met. Lastly, the equal variance assumption is met as well, since the residual plot shows no wide spread in the residuals.

Using the IQR outlier detection rule in R, no outliers were found. For influence points, IDs 24, 53, and 67 were identified using the cooks.distance function in R; these points affect the slope of the regression model.
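The outlier and influence checks described above can be sketched in R as follows. This is only an illustration under stated assumptions: it uses R's built-in `cars` data rather than the homework data set, the 4/n cutoff for Cook's distance is a common rule of thumb that the write-up does not confirm, and the variable names are made up for the example.

```r
# Illustration on R's built-in `cars` data, not the homework data set.
fit <- lm(dist ~ speed, data = cars)

# Cook's distance for every observation in the fitted model.
cd <- cooks.distance(fit)

# Assumed rule of thumb: flag points whose Cook's distance exceeds 4 / n.
n <- nrow(cars)
influential <- which(cd > 4 / n)

# IQR rule on the residuals: flag values more than 1.5 * IQR
# below the first quartile or above the third quartile.
res <- resid(fit)
q <- unname(quantile(res, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
outliers <- which(res < q[1] - fence | res > q[2] + fence)

print(influential)  # row IDs flagged as influential
print(outliers)     # row IDs flagged as residual outliers
```

The same two checks can be applied to any `lm` fit; `plot(fit, 4)`, which the R code section uses, draws the Cook's distance values graphically.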

#3)
1. H0: β_education = β_income = β_women = 0. Education, income, and percent of the workforce that are women are not significant predictors of prestige score.
   H1: β_education ≠ 0 and/or β_income ≠ 0 and/or β_women ≠ 0. At least one of the slope coefficients is different from 0.
   Alpha level: 0.05
2. F = Reg MS / Res MS with 3 and n - k - 1 = 102 - 3 - 1 = 98 degrees of freedom
3. Critical value: qf(.95, df1 = 3, df2 = 98) = 2.697423. Reject H0 if F ≥ 2.697423.
4. F_stat <- modelsum$fstatistic[1] = 129.19
   P-value: < 2.2e-16
5. Reject H0 since 129.19 ≥ 2.697423. We have significant evidence at the α = 0.05 level that income, education, and percent of women, when taken together, are predictive of prestige.

#4)
Conclusions Summary

Prestige ~ Education

Residuals:
     Min       1Q   Median       3Q      Max
-26.0397  -6.5228   0.6611   6.7430  18.1636

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               -10.732      3.677  -2.919  0.00434
Education.Level..years.     5.361      0.332  16.148  < 2e-16

Residual standard error: 9.103 on 100 DF
Multiple R-squared: 0.7228, Adjusted R-squared: 0.72
F-statistic: 260.8 on 1 and 100 DF, p-value: < 2.2e-16
Critical F value (1 and 100 DF, α = 0.05): 3.936143

Prestige ~ Income

Residuals:
    Min      1Q  Median      3Q     Max
-33.007  -8.378  -2.378   8.432  32.084

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.714e+01  2.268e+00   11.97   <2e-16
Income .... 2.897e-03  2.833e-04   10.22   <2e-16

Residual standard error: 12.09 on 100 DF
Multiple R-squared: 0.5111, Adjusted R-squared: 0.5062
F-statistic: 104.5 on 1 and 100 DF, p-value: < 2.2e-16
Critical F value (1 and 100 DF, α = 0.05): 3.936143

Prestige ~ Percent of workforce that are women

Residuals:
    Min      1Q  Median      3Q     Max
-33.444 -12.391  -4.126  13.034  39.185

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                         48.69300    2.30760  21.101   <2e-16
Percent.of.Workforce.that.are.Women -0.06417    0.05385  -1.192    0.236

Residual standard error: 17.17 on 100 DF
Multiple R-squared: 0.014, Adjusted R-squared: 0.004143
F-statistic: 1.42 on 1 and 100 DF, p-value: 0.2362

The above results show that the percent of the workforce that are women is not a significant predictor of prestige (its F-statistic of 1.42 is below the critical value of 3.936143 and its p-value of 0.2362 is greater than 0.05). Income and education have the greatest impact, with F-statistics well above the critical value and extremely low p-values. We can interpret the slope for education as follows: each additional year of education is associated with a 5.361-point increase in prestige score. For income, each one-unit increase in income is associated with a 2.897e-03 (0.002897) point increase in prestige score.

#5)

The model assumptions are linearity, independence, normality, and equal variance. The scatter plot shows a linear trend, so the linearity assumption is met. Each individual contributes only one observation, so the independence assumption is met as well. The histogram of residuals is approximately normal. Lastly, the equal variance assumption is met as well, since the residual plot shows no wide spread in the residuals.

Outliers ID = 2, 17, 24, 25, 26
Influence Points ID = 2, 20, 24, 27, 29, 53, 54, 67, 82

R code

[#1]
> setwd("~/Desktop/BU/CS 555/HW")
> data <- read.csv("cs555data4.csv", header=TRUE)
> attach(data)
> plot(Education.Level..years., Prestige.Score, main = "Prestige Score vs. Education")
> cor(Education.Level..years., Prestige.Score)
[1] 0.8501769

[#2]
> reg.score.education <- lm(Prestige.Score ~ Education.Level..years.)

> reg.score.education

Call:
lm(formula = Prestige.Score ~ Education.Level..years.)

Coefficients:
            (Intercept)  Education.Level..years.
                -10.732                    5.361

> summary(reg.score.education)

Call:
lm(formula = Prestige.Score ~ Education.Level..years.)

Residuals:
     Min       1Q   Median       3Q      Max
-26.0397  -6.5228   0.6611   6.7430  18.1636

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               -10.732      3.677  -2.919  0.00434 **
Education.Level..years.     5.361      0.332  16.148  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.103 on 100 degrees of freedom
Multiple R-squared: 0.7228, Adjusted R-squared: 0.72
F-statistic: 260.8 on 1 and 100 DF, p-value: < 2.2e-16

> residual <- resid(reg.score.education)
> plot(Education.Level..years., residual, main = "Residual Plot")
> abline(0,0)
> hist(residual, main= "Histogram of Residuals")
> plot(reg.score.education, 4)

[#3]
> model <- lm(formula = Prestige.Score ~ (Education.Level..years. + Income .... + Percent.of.Workforce.that.are.Women), data=data)
> summary(model)

Call:
lm(formula = Prestige.Score ~ (Education.Level..years. + Income .... + Percent.of.Workforce.that.are.Women), data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-19.8246  -5.3332  -0.1364   5.1587  17.5045

Coefficients:
                                      Estimate Std. Error t value Pr(>|t|)
(Intercept)                         -6.7943342  3.2390886  -2.098   0.0385 *
Education.Level..years.              4.1866373  0.3887013  10.771  < 2e-16 ***
Income ....                          0.0013136  0.0002778   4.729 7.58e-06 ***
Percent.of.Workforce.that.are.Women -0.0089052  0.0304071  -0.293   0.7702
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.846 on 98 degrees of freedom
Multiple R-squared: 0.7982, Adjusted R-squared: 0.792
F-statistic: 129.2 on 3 and 98 DF, p-value: < 2.2e-16

> anova(model)
Analysis of Variance Table

Response: Prestige.Score
                                    Df  Sum Sq Mean Sq  F value    Pr(>F)
Education.Level..years.              1 21608.4 21608.4 350.9741 < 2.2e-16 ***
Income ....                          1  2248.1  2248.1  36.5153 2.739e-08 ***
Percent.of.Workforce.that.are.Women  1     5.3     5.3   0.0858    0.7702
Residuals                           98  6033.6    61.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> qf(0.95, df1=3, df2 =98)
[1] 2.697423
> modelsum <- summary(model)
> fstat <- modelsum$fstatistic[1]
> fstat
   value
129.1917

[#4]
> model1 <- lm(formula = Prestige.Score ~ Education.Level..years., data=data)
> model2 <- lm(formula = Prestige.Score ~ Income .... , data=data)
> model3 <- lm(formula = Prestige.Score ~ Percent.of.Workforce.that.are.Women, data=data)

> summary(model1)

Call:
lm(formula = Prestige.Score ~ Education.Level..years., data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-26.0397  -6.5228   0.6611   6.7430  18.1636

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               -10.732      3.677  -2.919  0.00434 **
Education.Level..years.     5.361      0.332  16.148  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.103 on 100 degrees of freedom
Multiple R-squared: 0.7228, Adjusted R-squared: 0.72
F-statistic: 260.8 on 1 and 100 DF, p-value: < 2.2e-16

> summary(model2)

Call:
lm(formula = Prestige.Score ~ Income .... , data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-33.007  -8.378  -2.378   8.432  32.084

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.714e+01  2.268e+00   11.97   <2e-16 ***
Income .... 2.897e-03  2.833e-04   10.22   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.09 on 100 degrees of freedom
Multiple R-squared: 0.5111, Adjusted R-squared: 0.5062
F-statistic: 104.5 on 1 and 100 DF, p-value: < 2.2e-16

> summary(model3)

Call:
lm(formula = Prestige.Score ~ Percent.of.Workforce.that.are.Women, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-33.444 -12.391  -4.126  13.034  39.185

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                         48.69300    2.30760  21.101   <2e-16 ***
Percent.of.Workforce.that.are.Women -0.06417    0.05385  -1.192    0.236
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.17 on 100 degrees of freedom
Multiple R-squared: 0.014, Adjusted R-squared: 0.004143
F-statistic: 1.42 on 1 and 100 DF, p-value: 0.2362

> qf(0.95, df1 = 3, df2 = 98)
[1] 2.697423

[#5]
> residual <- resid(model)
> hist(residual, main="Histogram of Residuals")
> plot(fitted(model), resid(model), axes = TRUE, frame.plot=TRUE, xlab="Fitted Values", ylab= "Residual", main = "Model Residual by Fitted")
> abline(h=0)
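
As a cross-check on the global F test in #3, the statistic F = Reg MS / Res MS can be recomputed by hand from the sums of squares printed by anova(model) in [#3]. The sketch below is not part of the original submission. It assumes the rounded sums of squares shown in that table, so the result agrees with the reported 129.1917 only up to rounding, and the variable names are illustrative.

```r
# Sums of squares copied from the anova(model) output in [#3].
reg_ss <- 21608.4 + 2248.1 + 5.3  # regression SS, summed over the three predictors
res_ss <- 6033.6                  # residual SS
k <- 3                            # number of predictors
n <- 102                          # number of observations

# Mean squares and the global F statistic.
reg_ms <- reg_ss / k
res_ms <- res_ss / (n - k - 1)
f_stat <- reg_ms / res_ms

# Critical value at alpha = 0.05 and the upper-tail p-value.
crit <- qf(0.95, df1 = k, df2 = n - k - 1)
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

cat("F =", f_stat, " critical value =", crit, "\n")
cat("Reject H0:", f_stat >= crit, "\n")
```

Working from the ANOVA table rather than from summary(model) makes the 3 and 98 degrees of freedom explicit: 3 is the number of predictors, and 98 is n - k - 1.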
