Stats 101A HW 5
Ian Zhang UID: 205702810
2023-05-05
Question 4
rse <- 2.418
df <- 33
r2 <- .7254
fstat <- 87.17
RSS <- df * rse^2
RSS
## [1] 192.9419
SSreg <- fstat * (RSS/df)
SSreg
## [1] 509.6589
meanSSreg <- SSreg / df
meanSSreg
## [1] 15.44421
totalSS <- SSreg + RSS
totalSS
## [1] 702.6008
r <- sqrt(SSreg / totalSS)
r
## [1] 0.8516977
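As a sanity check on the arithmetic above, the sketch below recomputes the sums of squares from the same four reported values and compares the implied R-squared with the reported one. It assumes a single predictor (regression df of 1), which the write-up does not state explicitly; the names `ms_res`, `ss_reg`, and `r2_implied` are illustrative only.

```r
# Reported summary values from the question
rse   <- 2.418    # residual standard error
df    <- 33       # residual degrees of freedom
r2    <- 0.7254   # reported multiple R-squared
fstat <- 87.17    # reported F-statistic

# Assumption: one predictor, so the regression df is 1
reg_df <- 1
ms_res <- rse^2                     # residual mean square
rss    <- df * ms_res               # residual sum of squares
ss_reg <- fstat * ms_res * reg_df   # regression SS = F * MSres * regression df
r2_implied <- ss_reg / (ss_reg + rss)

# The implied R-squared should agree with the reported one up to rounding
print(round(r2_implied, 4))
```

If the one-predictor assumption holds, the regression mean square is SSreg / 1, that is, SSreg itself. Dividing SSreg by the residual df, as `meanSSreg` does, gives a different quantity.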
Question 1
arm <- read.csv("armspans2022_gender.csv")
mean(arm$is.female)
## [1] 0.3478261
m1 <- lm(armspan ~ is.female, data = arm)
summary(m1)
##
## Call:
## lm(formula = armspan ~ is.female, data = arm)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.7586 -2.0248  0.2414  2.2414  8.2414
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  69.7586     0.7399  94.284  < 2e-16 ***
## is.female    -7.7338     1.2408  -6.233 1.68e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.984 on 43 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.4746, Adjusted R-squared:  0.4624
## F-statistic: 38.85 on 1 and 43 DF,  p-value: 1.676e-07
plot(armspan ~ is.female, data = arm)
[Figure: scatterplot of armspan (vertical axis) against is.female (horizontal axis)]
b) The intercept is 69.7586, which is the estimated mean armspan of males (the group coded is.female = 0).
c) The slope is -7.7338, which is the estimated difference in mean armspan between females and males: on average, females' armspans are 7.7338 units shorter than males'.
d) The t statistic and p-value for the slope test whether mean armspan differs between males and females. The null hypothesis is that the slope = 0, meaning there is no difference; the alternative is that the slope != 0. The p-value for the slope is 1.68e-07, so we reject the null hypothesis and conclude that there is a significant difference in mean armspan between males and females.
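The interpretation of the intercept and slope in parts b) and c) can be checked numerically. The sketch below uses simulated data, not the armspan data set, and the variable names are hypothetical. It fits a regression on a 0/1 indicator and compares the coefficients with the two group means.

```r
# Simulated data; values are illustrative, not the armspan data set
set.seed(1)
is_female <- rep(c(0, 1), times = c(30, 15))
y <- 70 - 8 * is_female + rnorm(45, sd = 4)

fit <- lm(y ~ is_female)
group_means <- tapply(y, is_female, mean)   # named "0" and "1"

# Intercept equals the mean of the group coded 0
intercept_matches <- isTRUE(all.equal(unname(coef(fit)[1]),
                                      unname(group_means["0"])))
# Slope equals (mean of group coded 1) - (mean of group coded 0)
slope_matches <- isTRUE(all.equal(unname(coef(fit)[2]),
                                  unname(group_means["1"] - group_means["0"])))
print(c(intercept = intercept_matches, slope = slope_matches))
```

Both comparisons hold for any 0/1 predictor, which is why the fitted intercept can be read as a group mean and the slope as a difference in means.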
Question 2
iowa <- read.delim("iowatest.txt")
temp <- ifelse(iowa$City == "Iowa City", 1, 0)
iowa$is.iowa <- temp
m2 <- lm(Test ~ is.iowa, data = iowa)
plot(Test ~ is.iowa, data = iowa)
abline(m2)
[Figure: scatterplot of Test (vertical axis) against is.iowa (horizontal axis) with the fitted line from m2]
summary(m2)
##
## Call:
## lm(formula = Test ~ is.iowa, data = iowa)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.353  -9.353  -0.353   7.647  31.647
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   49.353      1.347  36.626  < 2e-16 ***
## is.iowa       14.705      3.769   3.902 0.000152 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.51 on 131 degrees of freedom
## Multiple R-squared:  0.1041, Adjusted R-squared:  0.09727
## F-statistic: 15.22 on 1 and 131 DF,  p-value: 0.000152
The intercept gives the mean test score for all the cities that aren’t Iowa City, 49.353, while the estimated mean test score for Iowa City is 49.353 + 14.705 = 64.058. On the estimates alone, Iowa City has a higher mean test score than the other cities. In addition, the p-value for the slope (0.000152) is less than 0.05, so we reject the null hypothesis that the difference in means is 0 and conclude that Iowa City has a higher mean test score than the other cities.
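The slope test used here is the same test as a pooled-variance two-sample t-test on the two groups. The sketch below illustrates that on simulated data (not the Iowa data; `group` and `score` are hypothetical names), comparing the slope's t statistic and p-value from `lm` with `t.test(..., var.equal = TRUE)`.

```r
# Simulated data; values are illustrative, not the Iowa test data
set.seed(2)
group <- rep(c(0, 1), times = c(100, 20))   # hypothetical 0/1 indicator
score <- 50 + 15 * group + rnorm(120, sd = 14)

fit <- lm(score ~ group)
coefs <- summary(fit)$coefficients
slope_t <- coefs["group", "t value"]
slope_p <- coefs["group", "Pr(>|t|)"]

# Pooled-variance two-sample t-test, group 1 minus group 0
tt <- t.test(score[group == 1], score[group == 0], var.equal = TRUE)

same_t <- isTRUE(all.equal(slope_t, unname(tt$statistic)))
same_p <- isTRUE(all.equal(slope_p, tt$p.value))
print(c(same_t = same_t, same_p = same_p))
```

The equivalence depends on `var.equal = TRUE`; the default Welch test in `t.test` does not pool the variances and generally gives a slightly different statistic.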
Question 3
m3 <- lm(Test ~ Poverty, data = iowa)
plot(Test ~ Poverty, data = iowa)
abline(m3)
[Figure: scatterplot of Test (vertical axis) against Poverty (horizontal axis) with the fitted line from m3]
summary(m3)
##
## Call:
## lm(formula = Test ~ Poverty, data = iowa)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -27.2812  -6.2097   0.5058   4.8252  22.3610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.60578    1.61325   46.25   <2e-16 ***
## Poverty     -0.53578    0.03262  -16.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.766 on 131 degrees of freedom
## Multiple R-squared:  0.6731, Adjusted R-squared:  0.6707
## F-statistic: 269.8 on 1 and 131 DF,  p-value: < 2.2e-16
From the scatterplot and regression line, there is a fairly strong negative linear association between test scores and poverty: as poverty increases, test scores decrease. The line appears to fit the data well (R-squared = 0.6731), and in the summary the p-value for the slope is < 2e-16, so we reject the null hypothesis of zero slope and conclude that there is a linear association.
Question 4
plot(m3)
[Figure: the four diagnostic plots for lm(Test ~ Poverty)]
[Residuals vs Fitted: Residuals against Fitted values; labeled points 70, 43, 47]
[Normal Q-Q: Standardized residuals against Theoretical Quantiles; labeled points 70, 47, 43]
[Scale-Location: Standardized residuals against Fitted values; labeled points 70, 47, 43]
[Residuals vs Leverage: Standardized residuals against Leverage, with Cook's distance; labeled points 47, 90, 7]
Looking at the residuals vs fitted plot, there is no clear pattern or trend in the residuals; the points look randomly scattered. There also isn’t a fan shape, which supports the constant variance condition.
Looking at the normal Q-Q plot, the points follow the straight line closely, so the normal distribution condition does not appear to be violated.
Looking at the scale-location plot, we can determine that it does not violate the constant variance
condition: the points are randomly scattered with no clear trend, the red line is roughly horizontal, and the values are evenly spread around it, which supports the validity of the model.
Overall, based on the three residual plots, neither the constant variance nor the normal distribution condition appears to be violated, so the model is valid.
Question 5
leverage <- hatvalues(m3)
leverage[leverage == max(leverage)]
##         27
## 0.04997855
# row 27
highlev <- leverage > (4/133)
standardRes <- rstandard(m3)
standardRes[highlev] < -2
##     7    27    46    64    67    89   109   120   126
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
standardRes[highlev] > 2
##     7    27    46    64    67    89   109   120   126
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Since none of the high leverage points has a standardized residual < -2 or > 2, there are no bad leverage points in this data.
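For contrast, the sketch below shows what the same check looks like when a bad leverage point is present. The data are simulated, with one point deliberately placed at an extreme x value and far off the line; all names and numbers are illustrative. It uses the same cutoffs as above: leverage greater than 4/n (for one predictor) and an absolute standardized residual greater than 2.

```r
# Simulated data with one deliberately bad leverage point (observation 50)
set.seed(3)
x <- c(runif(49, 0, 10), 30)          # last point has an extreme x value
y <- 2 + 0.5 * x + rnorm(50)
y[50] <- y[50] + 15                   # and it is pushed far off the line

fit <- lm(y ~ x)
n <- length(x)

lev <- hatvalues(fit)
std_res <- rstandard(fit)

high_leverage <- lev > 4 / n          # 2 * (p + 1) / n with p = 1 predictor
outlier <- abs(std_res) > 2
bad_leverage <- which(high_leverage & outlier)
print(bad_leverage)
```

Combining both conditions with `&` in one step avoids checking the two tails of the standardized residuals separately.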
Question 6
The F-test in Question 3 tests whether there is a significant linear relationship between poverty and test scores. The null hypothesis is that the slope of the regression line is 0, and the alternative hypothesis is that it != 0. If the slope is 0, there is no linear association; if it isn’t 0, there is a linear relationship between the two. The p-value for this test is < 2.2e-16, which is less than 0.05, so we have significant evidence to reject the null hypothesis and conclude that there is a linear association between poverty and test scores.
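With a single predictor, the F-test and the t-test for the slope are the same test, because F equals the square of the slope's t statistic. The sketch below first checks this against the rounded values printed in the Question 3 summary, then verifies the identity on simulated data (illustrative values, not the Iowa data).

```r
# Values copied from the printed summary(m3) output in Question 3
t_slope <- -16.43
f_stat  <- 269.8

# With one predictor, F should equal t^2 up to rounding in the printed output
print(t_slope^2)

# The same identity checked on simulated (illustrative) data
set.seed(4)
x <- runif(60, 0, 100)
y <- 75 - 0.5 * x + rnorm(60, sd = 9)
s <- summary(lm(y ~ x))
t_sim <- s$coefficients["x", "t value"]
f_sim <- unname(s$fstatistic["value"])
identity_holds <- isTRUE(all.equal(t_sim^2, f_sim))
print(identity_holds)
```

The small gap between the squared printed t value and the printed F-statistic comes from rounding in the summary output.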