F23_hw2
School: Rutgers University
Course: 225
Subject: Statistics
Date: Feb 20, 2024
Homework 2 (82 pts)
Data is accessible from the course website: Data and Resource > Data used in class> BikeProject.csv
Note:
1.
In the confidence interval problems, note that the components of a confidence interval include the point estimate, the critical value, the standard error, and the margin of error. The result should be computed at the end.
For example, compute a CI as $5 \pm 2 \times 4 = 5 \pm 8 = (-3, 13)$.
2.
In computation problems, a basic rule is to keep 3 or more significant decimal places for numbers during the working stage and 2 or more significant decimal places in the number reported at the end.
3.
In the fill-in-the-blank questions, when you denote or write the formula for a term, show both the general and the specific form based on the question. For example, the critical value for a one-sided t-test, $H_0: \mu = 0$, $H_a: \mu > 0$, is denoted by $t(1 - \alpha, n - 1) = t(0.95, 30)$.
The test statistic, $t_s$, can be computed with the formula $t_s = \frac{\bar{Y}}{s/\sqrt{n}} = \frac{10}{20/\sqrt{25}}$, and a value of 2.5, where the general form is $t_s = \frac{\bar{Y}}{s/\sqrt{n}}$ and the specific form is $\frac{10}{20/\sqrt{25}}$.
The p-value can be computed with the formula $\Pr(t > t_s \mid \mu_0 \text{ is true}) = \Pr(t > 2.5, \text{ given } \mu_0 = 0)$, where the general form is $\Pr(t > t_s \mid \mu_0 \text{ is true})$ and the specific form is $\Pr(t > 2.5, \text{ given } \mu_0 = 0)$.
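The two worked examples in the notes can be checked numerically. The following is a minimal sketch in Python (standard library only) that recomputes the example interval 5 ± 2 × 4 and the example test statistic 10/(20/√25). The function names `confidence_interval` and `one_sample_t` are illustrative assumptions, not part of the assignment.

```python
import math


def confidence_interval(point_estimate, critical_value, standard_error):
    """Return (lower, upper) = point estimate -/+ critical value * standard error."""
    margin = critical_value * standard_error  # margin of error
    return point_estimate - margin, point_estimate + margin


def one_sample_t(sample_mean, sample_sd, n, mu0=0.0):
    """One-sample t statistic: (Ybar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))


# Example from note 1: point estimate 5, critical value 2, standard error 4.
ci = confidence_interval(5, 2, 4)

# Example from note 3: Ybar = 10, s = 20, n = 25, H0: mu = 0.
t_s = one_sample_t(10, 20, 25)

print("CI:", ci)
print("t_s:", t_s)
```

Keeping the margin of error as its own intermediate value mirrors the note's advice to show each component before reporting the final interval.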
Consider a simple linear regression Y ~ X, where X is the humidity and Y is the rental counts. The goal is to study the impact of X on Y.
1.
(28) Complete the confidence interval questions. (1 pt each blank, no partial credit)
a). (8) To estimate the mean response value of Y when X=0.5, the point estimate can be estimated as _$\hat{Y}_h = b_0 + b_1 x_h$: $\hat{Y}_h = 378.88 - 303.59(0.5)$_ (both the general formula and specific formula in this question) = _227.085_ (computed as this value). The standard error of this estimation is denoted _$s\{\hat{Y}_h\} = s\sqrt{\frac{1}{n} + \frac{(x_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$: $171.66\sqrt{\frac{1}{17379} + \frac{0.016187}{646.84}}$_ (both the general formula and specific formula in this question) = _1.560_ (computed as the value). At the confidence level of 95%, the t-value is denoted by _$t(1 - \alpha/2; n - 2)$: $t(1 - 0.025; 17379 - 2)$_ (both the general formula and specific formula in this question) = _1.9601_ (computed as this value).
b). (8) To predict the single response (the next observation value), the point estimate can be estimated as _$\hat{Y}_h = b_0 + b_1 x_h$: $\hat{Y}_h = 378.88 - 303.59(0.5)$_ (both the general formula and specific formula in this question) = _227.085_ (computed as this value). The standard error of this estimation is denoted _$s\{\text{pred}\} = s\sqrt{\frac{1}{n} + \frac{(x_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} + 1}$: $171.66\sqrt{\frac{1}{17379} + \frac{0.016187}{646.84} + 1}$_ (both the general formula and specific formula in this question) = _171.67_ (computed as the value). At the confidence level of 95%, the t-value is denoted by _$t(1 - \alpha/2; n - 2)$: $t(1 - 0.025; 17379 - 2)$_ (both the general formula and specific formula in this question) = _1.9601_ (computed as this value).
c). (8) To predict the mean of m responses (the average of the next m observation values), the point estimate can be estimated as _$\hat{Y}_h = b_0 + b_1 x_h$: $\hat{Y}_h = 378.88 - 303.59(0.5)$_ (both the general formula and specific formula in this question) = _227.085_ (computed as this value, as m=3). The standard error of this estimation is denoted _$s\{\text{predmean}\} = s\sqrt{\frac{1}{n} + \frac{(x_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} + \frac{1}{m}}$: $171.66\sqrt{\frac{1}{17379} + \frac{0.016187}{646.84} + \frac{1}{3}}$_ (both the general formula and specific formula in this question) = _99.12022_ (computed as the value). At the confidence level of 95%, the t-value is denoted by _$t(1 - \alpha/2; n - 2)$: $t(1 - 0.025; 17379 - 2)$_ (both the general formula and specific formula in this question) = _1.9601_ (computed as this value; the critical value is the same as in parts a) and b) because the confidence level and degrees of freedom are unchanged).
d). (2) Answer this question without computation: when estimating the mean response value at X=0.6, the corresponding standard error is _smaller than_ (bigger than/smaller than/the same as) that at X=0.5, because _X=0.6 is closer to the sample mean of X (about 0.627, as implied by $(0.5 - \bar{X})^2 = 0.016187$), so the term $(x_h - \bar{X})^2$ in the standard error is smaller; the standard error grows only as $x_h$ moves away from $\bar{X}$_.
e). (2) Answer this question without computation: when estimating the mean of 10 responses, the corresponding standard error is _smaller than_ (bigger than/smaller than/the same as) that for the mean of 3 responses at the same X level, because _the $1/m$ term under the square root decreases as m increases from 3 to 10_.
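The three standard errors in question 1 differ only in the last term under the square root, which a short computation makes easy to see. The sketch below is in Python with the standard library only. It reuses the numbers quoted in the answers (s = 171.66, n = 17379, $(x_h - \bar{X})^2$ = 0.016187, $\sum (X_i - \bar{X})^2$ = 646.84, and the t-value 1.9601) and takes all of them as given rather than recomputing them from BikeProject.csv.

```python
import math

# Values quoted in the worked answers for question 1 (taken as given).
b0, b1 = 378.88, -303.59   # fitted intercept and slope
s = 171.66                 # residual standard error
n = 17379                  # number of observations
dev_sq = 0.016187          # (x_h - Xbar)^2 at x_h = 0.5
sxx = 646.84               # sum of (X_i - Xbar)^2
t_crit = 1.9601            # t(0.975; n - 2) as reported in the answers
m = 3                      # number of future responses averaged in part c)

x_h = 0.5
y_hat = b0 + b1 * x_h                 # point estimate, shared by a), b), c)
base = 1 / n + dev_sq / sxx           # part common to all three variances

se_mean = s * math.sqrt(base)               # a) mean response
se_pred = s * math.sqrt(base + 1)           # b) single new observation
se_pred_mean = s * math.sqrt(base + 1 / m)  # c) mean of m new observations

for label, se in [("mean response", se_mean),
                  ("new observation", se_pred),
                  ("mean of 3 responses", se_pred_mean)]:
    lower = y_hat - t_crit * se
    upper = y_hat + t_crit * se
    print(f"{label}: SE = {se:.3f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

All three intervals are centered at the same point estimate; only the width changes, which is why parts d) and e) can be answered by looking at which term under the square root shrinks.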
2.
(10 pts, no partial credit) (Compare the hypothesis test between the linear impact and linear correlation) Using the R-generated summary and ANOVA table for the model Y~X, answer the following questions.
a) (6) For a two-sided hypothesis test on the linear impact, $H_0: \beta_1 = 0$, $H_a: \beta_1 \ne 0$, if a t-test is used, the test statistic is computed with the formula: _$t_s = \frac{b_1}{s\{b_1\}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-303.59}{6.75}$_, which is computed as _−44.98_ (value). The critical value has the notation _$t(1 - \alpha/2; n - 2)$: $t(1 - 0.025; 17379 - 2)$_, and a value of _1.9601_. The p-value of the test can be computed with the formula _$2\Pr(t > |t_s| \mid \beta_1 = 0 \text{ is true}) = 2\Pr(t > 44.98)$_, and the value is _p < 2×10^-16 ≈ 0_.
b) (2 pts) Verify your answer by highlighting the corresponding p-values in the R output for the t-test.
c) (2) Adjust the HT components from a two-sided test to a one-sided test. Consider the one-sided HT $H_0: \beta_1 = 0$, $H_a: \beta_1 > 0$. The test statistic is the same as in the two-sided test, but the p-value needs to be adjusted with the formula _$\Pr(t > t_s \mid \beta_1 = 0 \text{ is true}) = \Pr(t > -44.98)$_ and is computed as _p ≈ 1_ in this question, because the test statistic is negative while the alternative points to the upper tail. (For the opposite alternative, $H_a: \beta_1 < 0$, the p-value would be half the two-sided value, still below 2×10^-16.)
The critical value changes to $t(1 - \alpha; n - 2)$: $t(1 - 0.05; 17379 - 2) = 1.6449$.
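The pieces of the slope t-test in question 2 can be reproduced with a few lines. The sketch below uses only the Python standard library, so it makes one loudly stated assumption: with n − 2 = 17377 degrees of freedom, the t distribution is replaced by the standard normal. The resulting critical values therefore differ from the exact t values (1.9601 and 1.6449) in the fourth decimal. The slope is the one from the fitted line in question 1 and the standard error 6.75 is the value quoted in question 2a).

```python
from statistics import NormalDist

b1 = -303.59    # slope estimate from the fitted line in question 1
se_b1 = 6.75    # standard error of the slope quoted in question 2a)
alpha = 0.05

t_s = b1 / se_b1   # test statistic, identical for one- and two-sided tests

# Assumption: normal approximation to t with 17377 degrees of freedom.
z = NormalDist()
crit_two_sided = z.inv_cdf(1 - alpha / 2)   # approximates t(0.975; n - 2)
crit_one_sided = z.inv_cdf(1 - alpha)       # approximates t(0.95; n - 2)

p_two_sided = 2 * z.cdf(-abs(t_s))   # H_a: beta_1 != 0
p_upper = 1 - z.cdf(t_s)             # H_a: beta_1 > 0 (upper tail)
p_lower = z.cdf(t_s)                 # H_a: beta_1 < 0 (lower tail)

print(f"t_s = {t_s:.3f}")
print(f"critical values: two-sided {crit_two_sided:.4f}, one-sided {crit_one_sided:.4f}")
print(f"p-values: two-sided {p_two_sided:.3g}, upper {p_upper:.3g}, lower {p_lower:.3g}")
```

Because the statistic is far in the lower tail, the two-sided and lower-tailed p-values underflow to zero in floating point, consistent with R printing a value below 2×10^-16, while the upper-tailed p-value is essentially 1.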
3. (5) For a two-sided hypothesis test on the significance of the linear correlation coefficient between X and Y, $H_0: \rho = 0$, $H_a: \rho \ne 0$:
1) (2) If a t-test is used, the test statistic is computed with the sample correlation, r, with the formula: _$t_s = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$: $\frac{-0.3229\sqrt{17379-2}}{\sqrt{1-0.1043}}$_, which is computed as _−44.98_ (value).
2) (1) Is this test statistic the same as the t-test in question 2a)? _Yes_ (Y/N).
(2) Discuss when the results of the hypothesis test on the linear impact and the linear association are equivalent.
They are equivalent when both are tested against 0 in a simple linear regression. Linear impact is the effect X has on Y, while linear association is how well the fitted line fits the data. The two are tied together because $b_1 = r \, s_Y / s_X$, so the slope is zero exactly when the correlation is zero: if there is a linear impact of X on Y, there is by default some linear association, and the two t statistics coincide.
4. (4) Use R to compute a 95% confidence interval for the linear correlation coefficient between Y and X. Use the confidence interval to verify the hypothesis test in 3, i.e., how to use the confidence interval to conclude the hypothesis problem in question 3; and then compare the conclusion to that in question 3.
We are 95% confident that the linear correlation coefficient between the BikeProject count and the BikeProject humidity is between −0.3361 and −0.3095. This means there is a negative correlation between count and humidity: the higher the humidity, the lower the count of bike rentals. The sample value of r falls inside the confidence interval, and the interval does not contain 0, so we reject $H_0: \rho = 0$ at the 5% level. This is the same conclusion as the t-test in question 3.
5. (21) (Compare the ANOVA F-test and the T-test on the significance of a SLR model) We know that both the ANOVA F test and the t-test can be used to address the significance of the linear impact of X on Y, $H_0: \beta_1 = 0$, $H_a: \beta_1 \ne 0$.
We have completed the t-test in the previous questions; now complete an ANOVA F-test.
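As a cross-check on the correlation interval reported in question 4, the sketch below rebuilds a 95% interval from the sample correlation alone using the Fisher z-transform, the usual large-sample method (and, to my understanding, the one behind the interval that R's `cor.test` prints for Pearson correlation). It assumes r = −0.3229 (negative, matching the sign of the fitted slope) and n = 17379, and uses only the Python standard library.

```python
import math
from statistics import NormalDist

r = -0.3229   # sample correlation between count and humidity (sign from the slope)
n = 17379     # number of observations

z_r = math.atanh(r)                    # Fisher z-transform of r
se_z = 1 / math.sqrt(n - 3)            # approximate standard error on the z scale
z_crit = NormalDist().inv_cdf(0.975)   # two-sided 95% normal critical value

# Build the interval on the z scale, then map back to the correlation scale.
lower = math.tanh(z_r - z_crit * se_z)
upper = math.tanh(z_r + z_crit * se_z)

# H0: rho = 0 is rejected at the 5% level when 0 lies outside the interval.
rejects_h0 = not (lower <= 0 <= upper)

print(f"95% CI for rho: ({lower:.4f}, {upper:.4f}); reject H0: {rejects_h0}")
```

The interval agrees with the reported (−0.3361, −0.3095) to about three decimals; small differences come from rounding r to four digits.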
Note: the significance of a linear model.
In the simple linear regression model (SLR) with only one X, the test on the significance of the linear impact is equivalent to the test on the significance of the linear model. In the multiple linear regression model (MLR) with multiple Xs, the test on an individual linear impact cannot imply the significance of the linear model. The significance of the linear model, $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$, $H_a$: at least one $\beta$ is different from 0, can be tested with the Global F test, or the ANOVA F test on the entire model, $F_s = MSR/MSE$.
a)
(12) Do a global ANOVA F test for the significance of the SLR model. The significance of the SLR model can be defined in symbols:
$H_0: \beta_1 = 0$, $H_a: \beta_1 \ne 0$
Or define it in the following statements. $H_0$: The predictor X has no impact on Y in a linear model Y~X. $H_0$: The predictor X can be dropped from the linear model Y~X.
SST can be computed with the formula: _$\sum (Y_i - \bar{Y})^2$_, and a value of _571761591_; the degree of freedom is computed with the formula: _n−1_, and a value of _17378_.
SSE can be computed with the formula: _$\sum (Y_i - \hat{Y}_i)^2$_, and a value of _512143240_; the degree of freedom is computed with the formula: _n−2_, and a value of _17377_.
SSR can be computed with the formula: _$\sum (\hat{Y}_i - \bar{Y})^2$_, and a value of _59618351_; the degree of freedom is computed with the formula: _1 (the number of predictors)_, and a value of _1_.
b) (4) Compute the test statistic for the F test, $F_s$.
(2) The formula is _$F_s = \frac{MSR}{MSE} = \frac{59618351}{29472.48}$_ and has a value of _2022.848_.
(2) How is it related to the $t_s$ in question 3a) and 3b)?
It is related to $t_s$ because $t_s^2 = F_s$: the square root of 2022.848 is 44.976, which is the magnitude of the t statistic.
c) (2) The critical value of the F-test can be denoted by _$F(1 - \alpha; 1, n - 2) = F(1 - 0.05; 1, 17379 - 2)$_, and has a value of _3.842_ (the square of the two-sided t critical value, $1.9601^2$).
d) (3) Compute the p-value for the F-test with the formula _$\Pr(F > F_s \mid \beta_1 = 0 \text{ is true}) = \Pr(F > 2022.848)$_, and the value is _p < 2×10^-16 ≈ 0_. It is _the same as_ (same as / different from) the p-value for the t-test in question 3.
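The ANOVA quantities in question 5 follow from SST, SSE, and n alone, so they are easy to verify. The sketch below (Python, standard library only) takes the SST and SSE values reported in part a) as given, derives SSR, the degrees of freedom, the mean squares, and $F_s$, and confirms the $t_s^2 = F_s$ relation. The variable names are illustrative.

```python
import math

n = 17379
sst = 571761591   # total sum of squares reported in part a)
sse = 512143240   # error sum of squares reported in part a)

ssr = sst - sse   # regression sum of squares (SST = SSR + SSE)

df_total = n - 1  # degrees of freedom for SST
df_error = n - 2  # degrees of freedom for SSE
df_reg = 1        # degrees of freedom for SSR: one predictor in SLR

msr = ssr / df_reg
mse = sse / df_error
f_s = msr / mse   # global F statistic

print(f"SSR = {ssr}, df = ({df_reg}, {df_error}, {df_total})")
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F_s = {f_s:.3f}")
print(f"sqrt(F_s) = {math.sqrt(f_s):.3f}  # magnitude of the slope t statistic")
```

The degrees of freedom add up the same way the sums of squares do: 1 + 17377 = 17378.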
6. (14) (Compare the ANOVA F test and the GLT test on the significance of the SLR model) Consider a General Linear Test (GLT) on the impact of X on Y.
a) (2) The general linear test (GLT) compares two models that are established under Ho and Ha. Specifically, the full model is established under _Ha_ (Ho/Ha), and the reduced model is established under _Ho_ (Ho/Ha).
b) (4) The total error in the full model, SSE(Full) or SSE(F), has a value of _$\sum (y_i - \hat{Y}_i)^2 = 512143240$_ and a degree of freedom of _n−2 = 17379−2 = 17377_. The total error in the reduced model, SSE(Reduced) or SSE(R), has a value of _$\sum (y_i - \bar{y})^2 = 571761591$_ and a degree of freedom of _n−1 = 17379−1 = 17378_.
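The GLT statistic is built from exactly the two error sums of squares in part b). The sketch below uses the standard form $F^* = \frac{[SSE(R) - SSE(F)]/(df_R - df_F)}{SSE(F)/df_F}$, which is not written out in the assignment text and is stated here as the assumed definition. The helper name `glt_f` is hypothetical.

```python
def glt_f(sse_reduced, df_reduced, sse_full, df_full):
    """General linear test statistic comparing a reduced model to a full model."""
    # Extra error explained by the full model, per extra parameter.
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    # Error variance estimate from the full model (its MSE).
    denominator = sse_full / df_full
    return numerator / denominator


n = 17379
sse_r, df_r = 571761591, n - 1   # reduced model (intercept only): SSE(R) = SST
sse_f, df_f = 512143240, n - 2   # full model Y ~ X: SSE(F) = SSE

f_star = glt_f(sse_r, df_r, sse_f, df_f)
print(f"GLT F* = {f_star:.3f}")
```

In simple linear regression SSE(R) − SSE(F) equals SSR and the degrees of freedom differ by 1, so $F^*$ reduces to MSR/MSE, the global F statistic of about 2022.85.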
c). Discuss the connection between the Global F test and the GLT F test in the following perspectives.
1). (2) The null and alternative hypothesis
Global F test: The null hypothesis (Ho) tests whether all regression coefficients are equal to zero, and the alternative hypothesis (Ha) is that at least one of the coefficients is not equal to zero (at least one predictor has an effect).
GLT F test: The null hypothesis (Ho) in a GLT F test can be more specific, targeting a particular subset of predictors or specific linear combinations of them. The alternative hypothesis (Ha) is then that the coefficients of interest are not equal to zero.
2). (2) The test statistic
In both tests, the test statistic is an F-statistic, which measures the ratio of the explained variance to the
unexplained variance. For ANOVA F test, it measures the overall significance of the model, while for GLT F test, it focuses on the significance of specific linear combinations of predictors.
3). (4) Situations when the two methods are equivalent, and situations when only the GLT F test is appropriate.
The global F test is used for testing the entire model, and it is the same as the GLT F test when the reduced model drops every predictor, i.e., when you are testing the full model against the intercept-only model. When you are testing a hypothesis about the significance of the slope coefficient in an SLR model, the global F test and the GLT give the same results. Only the GLT is appropriate for the analysis of a subset of a full model: if you have a more complex model, or want to test hypotheses involving linear combinations of parameters beyond just the slope coefficient (for example, testing whether a combination of coefficients is equal to zero), then the GLT is the appropriate test. This allows you to perform a wider range of hypothesis tests within the same modeling framework.