STAT3032_004_HW6_S2023_Solution (Shared)

School: University of Minnesota-Twin Cities
Course: 3032
Subject: Statistics
Date: Feb 20, 2024

STAT 3032 Regression and Correlated Data
Homework 6 (Solution)

Please show your work on each problem for full credit. A correct answer, unsupported by the necessary explanation, R code or output will receive very little if any credit. Your work needs to be organized in a reasonably neat and coherent way, and submitted as a pdf file on Canvas. Please do not share this handout outside the class.

Problem 1

The Current Population Survey (CPS) is used to supplement census information between census years. The data file cps1985.csv contains a random sample of 534 persons from the CPS data collected in 1985, with information on wages and other characteristics of the workers. The variables we will use in the analyses are listed below:

wage    Wage (dollars per hour).
age     The age of the worker.
union   Whether the worker has union membership. The possible values are “Yes” and “No”.

Download the cps1985.csv data file from Canvas. Import the dataset into R and answer the following questions.

(a) [1 pt] Fit Model A (wage ~ 1 + age + union) and provide the model summary.

Solution:

> modA = lm(wage~1+age+union,data = cps1985)
> summary(modA)

Call:
lm(formula = wage ~ 1 + age + union, data = cps1985)

Residuals:
   Min     1Q Median     3Q    Max
-8.862 -3.352 -1.187  1.977 36.929

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.09967    0.71624   8.516  < 2e-16 ***
age          0.07009    0.01866   3.757 0.000191 ***
unionYes     1.90745    0.56921   3.351 0.000862 ***
---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.015 on 531 degrees of freedom
Multiple R-squared:  0.05138,  Adjusted R-squared:  0.04781
F-statistic: 14.38 on 2 and 531 DF,  p-value: 8.282e-07

(b) [2 pts] Interpret the slope of age in Model A in context.

Solution: Holding union membership constant, for every 1-year increase in age, the wage increases by 0.07009 dollars per hour on average.

(c) [2 pts] Interpret the slope of unionYes in Model A in context.

Solution: Holding age constant, workers who have union membership earn 1.90745 dollars per hour more on average than workers without union membership.

Alternatively: holding age constant, the wage increases by 1.90745 dollars per hour on average when switching from not being in the union to being in the union.

(d) [1 pt] Fit Model B (wage ~ 1 + age + union + age:union) and provide the model summary.

Solution:

> modB = lm(wage~1+age+union+age:union,data = cps1985)
> summary(modB)

Call:
lm(formula = wage ~ 1 + age + union + age:union, data = cps1985)

Residuals:
   Min     1Q Median     3Q    Max
-8.123 -3.342 -1.167  1.879 37.137

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.60183    0.78091   7.173 2.48e-12 ***
age           0.08385    0.02055   4.081 5.18e-05 ***
unionYes      4.93788    1.99130   2.480   0.0135 *
age:unionYes -0.07736    0.04872  -1.588   0.1129
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.008 on 530 degrees of freedom

Multiple R-squared:  0.05587,  Adjusted R-squared:  0.05053
F-statistic: 10.45 on 3 and 530 DF,  p-value: 1.084e-06

(e) [2 pts] Based on Model B, write down the equations of the fitted models (wage ~ 1 + age) for the workers with union membership and without union membership. Does age have a larger impact on the wage for the workers with union membership or without union membership? Please explain your answer.
Hint: Compare the slopes of age in these two equations.

Solution: Denote the expected wage with and without union membership as $\widehat{wage}_U$ and $\widehat{wage}_N$, respectively. Then

$$\widehat{wage}_U = (5.60183 + 4.93788) + (0.08385 - 0.07736)\,age = 10.53971 + 0.00649\,age$$

$$\widehat{wage}_N = 5.60183 + 0.08385\,age$$

Since the slope is larger and positive for those without union membership, age has a larger impact on wage for workers without union membership.

(f) [2 pts] Interpret the coefficient of age in Model B in context.
Hint: consider the workers without the union membership.

Solution: For every 1-year increase in age of a worker without union membership, the wage increases by 0.08385 dollars per hour on average.

Problem 2

In this problem, we continue to use the cps1985.csv data. The variables we will use in the analyses are listed below:

wage    Wage (dollars per hour).
exper   Number of years of work experience.
sector  Worker sector. The values are “clerical”, “const”, “manag”, “manuf”, “other”, “prof”, “sales”, and “service”.

(a) [1 pt] Fit Model C that uses exper to predict wage and provide the model summary.

Solution:

> modC = lm(wage~1 + exper,data = cps1985)
> summary(modC)

Call:

lm(formula = wage ~ 1 + exper, data = cps1985)

Residuals:
   Min     1Q Median     3Q    Max
-8.247 -3.601 -1.111  2.332 36.084

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.37997    0.38895  21.545   <2e-16 ***
exper        0.03614    0.01793   2.016   0.0443 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.124 on 532 degrees of freedom
Multiple R-squared:  0.007579,  Adjusted R-squared:  0.005714
F-statistic: 4.063 on 1 and 532 DF,  p-value: 0.04433

(b) [1 pt] Fit Model D that uses exper and sector (main effects only) to predict wage and provide the model summary.

Solution:

> modD = lm(wage~1+exper+sector,data = cps1985)
> summary(modD)

Call:
lm(formula = wage ~ 1 + exper + sector, data = cps1985)

Residuals:
    Min      1Q  Median      3Q     Max
-12.011  -3.001  -0.945   1.924  32.679

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    6.51341    0.55243  11.791  < 2e-16 ***
exper          0.05172    0.01643   3.147  0.00174 **
sectorconst    1.87394    1.14080   1.643  0.10105
sectormanag    5.25580    0.78286   6.714 4.94e-11 ***
sectormanuf    0.49955    0.73441   0.680  0.49667
sectorother    1.19459    0.73445   1.627  0.10444
sectorprof     4.63452    0.65406   7.086 4.47e-12 ***
sectorsales    0.12505    0.88767   0.141  0.88802
sectorservice -1.02039    0.69479  -1.469  0.14253
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.638 on 525 degrees of freedom
Multiple R-squared:  0.1978,  Adjusted R-squared:  0.1856
F-statistic: 16.18 on 8 and 525 DF,  p-value: < 2.2e-16
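
As a reading aid for the Model D coefficient table, the fitted lines for a few sectors can be written out explicitly. This is only a sketch: it assumes R's default treatment coding, under which clerical (the one level with no dummy variable in the output) is the baseline, and the subscripts are labels introduced here for illustration. Each sector shares the slope on exper, and its intercept is the baseline intercept plus that sector's coefficient:

```latex
\begin{align*}
\widehat{wage}_{\text{clerical}} &= 6.51341 + 0.05172\,exper \\
\widehat{wage}_{\text{manag}}    &= (6.51341 + 5.25580) + 0.05172\,exper = 11.76921 + 0.05172\,exper \\
\widehat{wage}_{\text{service}}  &= (6.51341 - 1.02039) + 0.05172\,exper = 5.49302 + 0.05172\,exper
\end{align*}
```

The remaining five sectors follow the same pattern, with only the intercept changing.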

(c) [1 pt] What would Model D look like in space? Please explain.

Solution: Model D would be 8 parallel lines. Since the model contains one quantitative variable (exper), the shape will be a line. Since there is one categorical predictor variable (sector) with 8 levels, there will be 8 lines. Since there is no interaction, the lines are parallel to each other.

(d) [1 pt] What is the difference between the degrees of freedom of RSS in Model C and Model D? Your answer should be a number. Briefly explain why this is not 1.

Solution: The difference between the degrees of freedom is 532 − 525 = 7. This is because the additional predictor variable in Model D (sector) has 8 levels and therefore requires 7 dummy variables, and we need to use 7 degrees of freedom to estimate their slopes.

(e) [3 pts] Use the Partial F test to compare Model C and Model D.
(i) [1 pt] What are the null and alternative hypotheses? Please define the parameters.
(ii) [1 pt] What is the test statistic value?
(iii) [1 pt] Based on the test result, do you prefer Model C or Model D? Please explain. You may use 0.05 as the significance level.

Solution:

(i) The hypotheses are

$H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$
$H_A$: at least one $\beta_j \neq 0$ for $j = 2, 3, \ldots, 7, 8$

where $\beta_2, \ldots, \beta_8$ are the coefficients of the dummy variables for sector.

OR:

$H_0$: wage ~ 1 + exper
$H_A$: wage ~ 1 + exper + sector

OR:

$H_0: E(wage) = \beta_0 + \beta_1\,exper$
$H_A: E(wage) = \beta_0 + \beta_1\,exper + \beta_2\,sectorconst + \beta_3\,sectormanag + \beta_4\,sectormanuf + \beta_5\,sectorother + \beta_6\,sectorprof + \beta_7\,sectorsales + \beta_8\,sectorservice$

(ii) From R, the anova output is

> anova(modC,modD)
Analysis of Variance Table


Model 1: wage ~ 1 + exper
Model 2: wage ~ 1 + exper + sector
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    532 13970
2    525 11292  7      2678 17.787 < 2.2e-16 ***

Thus, the test statistic value is F = 17.787, with a corresponding p-value less than $2.2 \times 10^{-16}$, which is almost 0.

(iii) Since the p-value is less than the significance level of 0.05, we reject the null hypothesis and conclude that the intercept should differ across the levels of sector, i.e., that Model C is insufficient. Model D is preferred.

(f) [1 pt] Review the summary of Model D in Part (b). The coefficient table contains the results of the t tests for the coefficients ($H_0$: coefficient = 0 vs. $H_A$: coefficient ≠ 0). Let’s focus on the 7 coefficients of the dummy variables of sector. Do all of these t tests reject the null hypotheses (coefficient = 0)? Please use 0.05 as the significance level.

Solution: No, only the t tests for the slopes of sectormanag and sectorprof reject the null hypotheses.

(g) [2 pts] Explain in words why the conclusion of the Partial F test in Part (e) does not contradict the results of the t tests in Part (f).

Solution: The partial F test is a test of whether all of the slopes of the dummy variables for sector are zero vs. whether at least one is nonzero. We rejected the null hypothesis in the partial F test, which means that at least one slope is not zero; it does not require every slope to be nonzero. This conclusion is consistent with the results of the t tests, since the t tests show that 2 slopes are significantly different from 0.
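
As a check on Part (e)(ii), the partial F statistic can be reproduced by hand from the residual sums of squares and degrees of freedom in the ANOVA table. The subscripts C and D below are labels introduced here for the two models, and the RSS values are the rounded ones printed by R, so the result agrees with the reported 17.787 only up to rounding:

```latex
\[
F \;=\; \frac{(RSS_C - RSS_D)\,/\,(df_C - df_D)}{RSS_D \,/\, df_D}
  \;=\; \frac{(13970 - 11292)\,/\,(532 - 525)}{11292 \,/\, 525}
  \;=\; \frac{2678 / 7}{11292 / 525}
  \;\approx\; \frac{382.57}{21.51}
  \;\approx\; 17.79
\]
```

Under the null hypothesis this statistic is compared with an F distribution on 7 and 525 degrees of freedom, which is where the very small p-value in the ANOVA table comes from.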
