439HW5Sol_F23
School: Washington University in St Louis
Course: 439
Subject: Mathematics
Date: Jan 9, 2024
MATH 439. Solutions HW 5.
Problem 1.
a)
> library(faraway) # assumption: sat is the data set shipped with the faraway package
> data(sat)
> reduced<-lm(math~I(expend+ratio)+salary,data=sat)
> full<-lm(math~expend+ratio+salary,data=sat)
> anova(reduced,full)
Analysis of Variance Table
Model 1: math ~ I(expend + ratio) + salary
Model 2: math ~ expend + ratio + salary
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1     47 65343
2     46 64834  1    508.19 0.3606 0.5511
The reduced model imposes H0: the coefficients of expend and ratio are equal.
Since the p-value (0.5511) is large, we fail to reject H0 and conclude that the full
model is not significantly better than the reduced model.
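As a quick check, the F statistic in the table can be reproduced by hand from the printed sums of squares. The sketch below hard-codes the values from the anova() output above; the variable names are ours and are not part of the original solution.

```r
# Values copied from the anova() output above
rss_full <- 64834    # residual sum of squares, full model
df_full  <- 46       # residual degrees of freedom, full model
ss_extra <- 508.19   # extra sum of squares (RSS reduced - RSS full)
df_extra <- 1        # number of linear restrictions being tested

# Partial F statistic and its p-value
f_stat <- (ss_extra / df_extra) / (rss_full / df_full)
p_val  <- pf(f_stat, df_extra, df_full, lower.tail = FALSE)
round(c(F = f_stat, p = p_val), 4)
```

Because the inputs are rounded, the result should agree with the table only up to rounding.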
b)
> data(sat)
> reduced<-lm(math~expend+I(ratio-salary),data=sat)
> full<-lm(math~expend+ratio+salary,data=sat)
> anova(reduced,full)
Analysis of Variance Table
Model 1: math ~ expend + I(ratio - salary)
Model 2: math ~ expend + ratio + salary
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1     47 64976
2     46 64834  1    141.59 0.1005 0.7527
Here the reduced model imposes H0: the coefficients of ratio and salary sum to zero.
Since the p-value (0.7527) is large, we fail to reject H0 and conclude that the full
model is not significantly better than the reduced model.
Problem 2.
a)
We can then explicitly write the ellipse:
$$E_{0.1} = \{(\beta_0, \beta_2) \in \mathbb{R}^2 : 7.9727(5.62 - \beta_0)^2 + 91.1162(-1.327 - \beta_2)^2 + 2(12.5285)(5.62 - \beta_0)(-1.327 - \beta_2) \le 2(2.82)(2.49)\}$$
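To see how the region is used, one can test whether a candidate pair of parameter values satisfies the inequality. This is a minimal sketch: the helper name in_region and its argument names u and v (first and second coordinate) are hypothetical, and all constants are copied from the expression above.

```r
# Hypothetical helper: TRUE if the pair (u, v) satisfies the ellipse inequality
in_region <- function(u, v) {
  d1 <- 5.62 - u      # deviation from the first estimate
  d2 <- -1.327 - v    # deviation from the second estimate
  q  <- 7.9727 * d1^2 + 91.1162 * d2^2 + 2 * 12.5285 * d1 * d2
  q <= 2 * 2.82 * 2.49
}

in_region(5.62, -1.327)  # the center of the ellipse: quadratic form is 0
in_region(0, 0)          # a point far from the center
```

The center always lies in the region; a pair for which the quadratic form exceeds 2(2.82)(2.49) is rejected at the joint level.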
b)
The Bonferroni adjustment consists of changing alpha to alpha/L, where L is the
number of parameters we want to estimate jointly. In this case, L=2 and we need the
critical value corresponding to the level 0.1/(2*2) = 0.025:
> qt(1-0.1/4,31)
[1] 2.039513
Then, the intervals are:
$0.82 \pm 2.0395\sqrt{2.82(0.170)} \rightarrow (-0.592133,\ 2.232133)$
$-1.327 \pm 2.0395\sqrt{2.82(0.014)} \rightarrow (-1.732243,\ -0.9217574)$
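The two intervals can be reproduced numerically. The sketch below only re-uses the numbers that appear in this part (estimates, the value 2.82, the multipliers 0.170 and 0.014, and 31 degrees of freedom); the variable names are ours.

```r
# Numbers copied from the solution above
est    <- c(0.82, -1.327)   # point estimates
c_diag <- c(0.170, 0.014)   # multipliers of 2.82 inside the square roots
mse    <- 2.82
t_crit <- qt(1 - 0.1 / (2 * 2), df = 31)   # Bonferroni critical value, L = 2

half <- t_crit * sqrt(mse * c_diag)        # half-widths of the intervals
ci   <- cbind(lower = est - half, upper = est + half)
ci
```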
c)
> qt(1-.1/8,31)
[1] 2.355568
> qchisq(1-.1/8,31)
[1] 51.25556
> qchisq(.1/8,31)
[1] 16.07876
Problem 3.
(a)
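> library(MPV) # assumption: table.b3 is the data set shipped with the MPV package
> g<-lm(y~.,data=table.b3)
> X<-model.matrix(g)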
> C<-solve(t(X)%*%X)
> D<-C[c(2,7),c(2,7)]
> solve(D)
           x1        x6
x1 3902.45440 68.177148
x6   68.17715  7.457435
> (f<-qf(.95,2,18))
[1] 3.554557
> sigma(g)^2
[1] 10.41115
> summary(g)
Call:
lm(formula = y ~ ., data = table.b3)
Residuals:
    Min      1Q  Median      3Q     Max
-5.3441 -1.6711 -0.4486  1.4906  5.2508
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.339838  30.355375   0.571   0.5749
x1          -0.075588   0.056347  -1.341   0.1964
x2          -0.069163   0.087791  -0.788   0.4411
x3           0.115117   0.088113   1.306   0.2078
x4           1.494737   3.101464   0.482   0.6357
x5           5.843495   3.148438   1.856   0.0799 .
x6           0.317583   1.288967   0.246   0.8082
x7          -3.205390   3.109185  -1.031   0.3162
x8           0.180811   0.130301   1.388   0.1822
x9          -0.397945   0.323456  -1.230   0.2344
x10         -0.005115   0.005896  -0.868   0.3971
x11          0.638483   3.021680   0.211   0.8350
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.227 on 18 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.8355, Adjusted R-squared: 0.7349
F-statistic: 8.31 on 11 and 18 DF, p-value: 5.231e-05
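The quantities printed in part (a) are the ingredients of the usual joint 95% confidence region for the coefficients of x1 and x6: the set of pairs whose quadratic form around the estimates, weighted by solve(D), is at most 2 times sigma(g)^2 times the F quantile. The sketch below hard-codes the printed values; the helper in_region is hypothetical and only illustrates how the region is evaluated.

```r
# Values copied from the R output in part (a)
Dinv  <- matrix(c(3902.45440, 68.177148,
                  68.177148,  7.457435), nrow = 2, byrow = TRUE)  # solve(D)
b_hat <- c(x1 = -0.075588, x6 = 0.317583)  # estimates from summary(g)
s2    <- 10.41115                          # sigma(g)^2
f     <- 3.554557                          # qf(.95, 2, 18)

# Hypothetical helper: TRUE if beta = (beta_x1, beta_x6) lies in the region
in_region <- function(beta) {
  d <- b_hat - beta
  as.numeric(t(d) %*% Dinv %*% d) <= 2 * s2 * f
}

in_region(c(0, 0))   # is (0, 0) a plausible value for the pair?
in_region(c(1, 0))   # a pair far from the estimates
```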
(b)
> g<-lm(y~.,data=table.b3)
> confint(g,level=1-0.05/2)
                  1.25 %      98.75 %
(Intercept) -56.87922409 91.558899681
x1           -0.21335587  0.062179526
x2           -0.28381101  0.145485669
x3           -0.10032074  0.330554961
x4           -6.08836036  9.077833615
x5           -1.85445496 13.541444016
x6           -2.83394765  3.469114242
x7          -10.80736493  4.396585230
x8           -0.13777439  0.499397229
x9           -1.18879757  0.392907146
x10          -0.01953174  0.009301153
x11          -6.74954265  8.026508114
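The call confint(g, level = 1 - 0.05/2) gives Bonferroni intervals for L = 2 coefficients because confint() splits the remaining 0.05/2 evenly between the two tails, which is why the columns are labelled 1.25 % and 98.75 %. A minimal sketch of this equivalence, using the built-in mtcars data purely for illustration (it is not the homework data):

```r
# Illustration on a built-in data set (mtcars), not the homework data
fit   <- lm(mpg ~ wt + hp, data = mtcars)
alpha <- 0.05
L     <- 2                              # number of intervals held jointly

# Bonferroni intervals via confint(): each interval has level 1 - alpha/L
bonf <- confint(fit, level = 1 - alpha / L)

# Same intervals by hand: estimate +/- t critical value * standard error
est    <- coef(fit)
se     <- sqrt(diag(vcov(fit)))
t_crit <- qt(1 - alpha / (2 * L), df = df.residual(fit))
manual <- cbind(est - t_crit * se, est + t_crit * se)

all.equal(unname(bonf), unname(manual))
```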
(c)
> alpha=0.05
> delta=sqrt(2*qf(1-alpha,2,18))
> alpha.eff=2*(1-pt(delta,18))
> confint(g,level=1-alpha.eff)
                 0.787 %    99.213 %
(Intercept) -63.59646238 98.27613797
x1           -0.22582462  0.07464827
x2           -0.30323788  0.16491254
x3           -0.11981907  0.35005329
x4           -6.77467285  9.76414611
x5           -2.55116224 14.23815130
x6           -3.11917874  3.75434534
x7          -11.49538600  5.08460630
x8           -0.16660818  0.52823102
x9           -1.26037411  0.46448369
x10          -0.02083651  0.01060592
x11          -7.41820008  8.69516554
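The Scheffe-type intervals replace the t critical value by delta = sqrt(2 F), and alpha.eff is the two-sided t level that has delta as its critical value, so that confint() can be reused. The sketch below recomputes these quantities and checks the x1 row by hand; the estimate and standard error are copied from summary(g), and the variable names are ours.

```r
alpha <- 0.05
q     <- 2     # dimension of the joint region (two coefficients)
df    <- 18    # residual degrees of freedom of the fitted model

delta     <- sqrt(q * qf(1 - alpha, q, df))   # Scheffe multiplier
alpha_eff <- 2 * (1 - pt(delta, df))          # equivalent two-sided t level

# Check against the x1 row: estimate and standard error from summary(g)
x1_int <- -0.075588 + c(-1, 1) * delta * 0.056347
round(c(delta = delta, tail_pct = 100 * alpha_eff / 2), 3)
x1_int
```

The tail percentage should match the column labels of the confint() output above, and x1_int should match its x1 row up to rounding of the inputs.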
(d)
The plot is generated with the R code below:
library(ellipse)
plot(ellipse(g, c(2, 7)), type = "l", main = 'Confidence Region')
points(coef(g)[2], coef(g)[7], pch = 18)
# Regular CIs
abline(v = confint(g)[2, ], lty = 2)
abline(h = confint(g)[7, ], lty = 2)
# Bonferroni CIs
abline(v = confint(g, level = 1 - 0.05/(2))[2, ], lty = 3, col = 'red')
abline(h = confint(g, level = 1 - 0.05/(2))[7, ], lty = 3, col = 'red')
# Scheffe CIs
abline(v = confint(g, level = 1 - alpha.eff)[2, ], lty = 4, col = 'blue')
abline(h = confint(g, level = 1 - alpha.eff)[7, ], lty = 4, col = 'blue')
Problem 4.
Constant Variance:
Strong NonConstant Variance:
Comments: The residuals plot shows the variability around the x-axis increasing
as the fitted value gets larger. The standardized residuals plot (3rd plot)
also indicates that sigma may depend linearly on x. Interestingly, the R2 is
relatively small, even though the p-value for significance of regression is quite small.
The QQ plot also shows some issues with normality, but this is spurious since, as
we know, the errors are truly normally distributed.
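The fanning pattern described in these comments can be reproduced with a small simulation. This is only a sketch under our own assumptions (sample size, coefficients, seed, and an error standard deviation equal to x); it is not the simulation used to produce the plots in this problem.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 200
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)  # normal errors whose sd grows linearly with x

fit <- lm(y ~ x)

# Residuals vs fitted values: the spread should fan out to the right
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Numerical counterpart: |residual| tends to increase with the fitted value
cor(abs(resid(fit)), fitted(fit))
```

Here the errors are normal by construction, so any departure in a QQ plot of the raw residuals comes from the unequal variances rather than from non-normal errors.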
Mild NonConstant Variance:
Comments: The residuals plot (first plot) shows the variability increasing
as the fitted value gets larger. The standardized residuals plot (3rd plot) also
shows that sigma depends on x. Interestingly, the R2 is now large, as
opposed to the previous case. The QQ plot does not show significant issues with
normality.
Nonlinearity:
Comments: The residuals plot shows the nonlinear pattern of the
regression. The standardized residuals and QQ plots are OK, as they should be, since there
are no issues with the error model assumptions here. Interestingly, the R2 and
p-values are quite bad, as expected.
Problem 5. Ex. 4.1 from MPV:
This is because there seems to be some linear correlation between the errors and
variable x2.
Note: For the last conclusion, the correlation between the errors and x2 seems mild,
so concluding that it won’t improve significantly is fine as well.
Problem 6.
a)
b)
The code below finds the hat matrix and the leverage of all the points:
> data(table.b3)
> mymodel<-lm(y~x1+x8,data=table.b3)
> X<-model.matrix(mymodel)
> H<-X%*%vcov(mymodel)%*%t(X)/sigma(mymodel)^2
> diag(H)
         1          2          3          4          5          6
0.04170355 0.04239733 0.06255061 0.04253231 0.07502672 0.35532964
         7          8          9         10         11         12
0.04400666 0.05657843 0.13464435 0.11491736 0.05241265 0.12545435
        13         14         15         16         17         18
0.06743573 0.10948015 0.08119224 0.04022419 0.13969058 0.15875219
        19         20         21         22         23         24
0.04824053 0.03393987 0.04400666 0.07890021 0.11171762 0.11491736
        25         26         27         28         29         30
0.09149888 0.13050830 0.08650701 0.13012246 0.09450602 0.09926926
        31         32
0.05504747 0.13648931
The point with the largest leverage is observation 6: (x1,x8,y)=(440.0, 184.5, 11.20),
with hmax=0.3553.
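The leverages printed in this part can be checked directly. The sketch below hard-codes them, locates the maximum, and verifies that they sum to the number of parameters p = 3 (the trace of the hat matrix). The 2p/n cutoff in the last line is a common rule of thumb that we add for illustration; it is not used in the solution itself.

```r
# Leverages copied from the diag(H) output (observations 1 to 32)
h <- c(0.04170355, 0.04239733, 0.06255061, 0.04253231, 0.07502672, 0.35532964,
       0.04400666, 0.05657843, 0.13464435, 0.11491736, 0.05241265, 0.12545435,
       0.06743573, 0.10948015, 0.08119224, 0.04022419, 0.13969058, 0.15875219,
       0.04824053, 0.03393987, 0.04400666, 0.07890021, 0.11171762, 0.11491736,
       0.09149888, 0.13050830, 0.08650701, 0.13012246, 0.09450602, 0.09926926,
       0.05504747, 0.13648931)
p <- 3            # intercept, x1, x8
n <- length(h)

which.max(h)          # observation with the largest leverage
max(h)                # its leverage, hmax
sum(h)                # trace of the hat matrix, equal to p up to rounding
which(h > 2 * p / n)  # observations above the 2p/n rule of thumb (our addition)
```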
c)
d)
e)
When applying the Bonferroni correction, we compare R-student to the t critical
value at level alpha/(2n), still with n-p-1 degrees of freedom. This value turns out to be
larger than the critical value in c). Since the largest R-student does not exceed the
t-value of c), it will not exceed the t-value here either. Therefore, we conclude that
there is no atypical observation.
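The Bonferroni critical value can be computed directly. In this sketch, n = 32 and p = 3 correspond to the model lm(y~x1+x8), the largest absolute R-student value is copied from the output in part f), and alpha = 0.05 is our assumption since the level is not stated in the text above.

```r
n     <- 32    # observations used by lm(y ~ x1 + x8)
p     <- 3     # intercept, x1, x8
alpha <- 0.05  # assumption: the level is not stated in the text above

# Bonferroni critical value: n two-sided tests, n - p - 1 degrees of freedom
t_bonf <- qt(1 - alpha / (2 * n), df = n - p - 1)
t_bonf

# Largest |R-student| in the output of part f) (observation 12)
max_abs_rstudent <- 2.54869186
max_abs_rstudent > t_bonf   # TRUE would flag an atypical observation
```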
f)
The R output for the leverage is
> # Verification
> lm.influence(mymodel)$hat
         1          2          3          4          5          6
0.04170355 0.04239733 0.06255061 0.04253231 0.07502672 0.35532964
         7          8          9         10         11         12
0.04400666 0.05657843 0.13464435 0.11491736 0.05241265 0.12545435
        13         14         15         16         17         18
0.06743573 0.10948015 0.08119224 0.04022419 0.13969058 0.15875219
        19         20         21         22         23         24
0.04824053 0.03393987 0.04400666 0.07890021 0.11171762 0.11491736
        25         26         27         28         29         30
0.09149888 0.13050830 0.08650701 0.13012246 0.09450602 0.09926926
        31         32
0.05504747 0.13648931
This is the R output for the studentized residuals:
> rstandard(mymodel)
          1           2           3           4           5           6           7           8           9
 0.57292819 -0.05055391 -0.61214921  0.37496156 -0.98650013 -0.71628401 -0.22268216  0.04064724  1.79358391
         10          11          12          13          14          15          16          17          18
 0.43360011 -0.22048348  2.33686180 -1.37529946 -0.63292011 -2.27493766 -0.52427197  1.50345414  0.72099394
         19          20          21          22          23          24          25          26          27
 0.20899578 -0.73433532  0.24323034  1.61637231  0.60065010  0.94509677  0.78404364  0.47043307 -1.17059787
         28          29          30          31          32
 0.38957467 -1.06239901 -1.27744319 -0.95100714 -0.08171735
This is the R output for the R-student residuals:
> rstudent(mymodel)
          1           2           3           4           5           6           7           8           9
 0.56617682 -0.04967684 -0.60542659  0.36933638 -0.98602805 -0.71013577 -0.21899644  0.03994141  1.86910420
         10          11          12          13          14          15          16          17          18
 0.42744650 -0.21683051  2.54869186 -1.39772951 -0.62625234 -2.46623906 -0.51761230  1.53847888  0.71489022
         19          20          21          22          23          24          25          26          27
 0.20551563 -0.72836698  0.23924408  1.66503209  0.59390910  0.94329960  0.77870451  0.46402495 -1.17841607
         28          29          30          31          32
 0.38380456 -1.06484892 -1.29210551 -0.94938803 -0.08030532
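The two sets of residuals are linked by a standard identity: with r the studentized residual, R-student equals r times sqrt((n-p-1)/(n-p-r^2)). The sketch below checks it on observations 12 and 15, using values copied from the output above with n = 32 and p = 3; the variable names are ours.

```r
n <- 32   # number of observations
p <- 3    # intercept, x1, x8

# Studentized residuals of observations 12 and 15, from the output above
r <- c(`12` = 2.33686180, `15` = -2.27493766)

# R-student recovered from the studentized residuals
t_from_r <- r * sqrt((n - p - 1) / (n - p - r^2))
t_from_r
```

The results should agree with the corresponding R-student entries (2.54869186 and -2.46623906) up to rounding.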
Problem 7.
We can also look at the qq-normal plot.