School: McGill University
Course: 204
Subject: Mathematics
Date: Feb 20, 2024
Pages: 19
Uploaded by EarlBeaver3892
McGill University
Faculty of Science
Final examination
Principles of Statistics II
Math 204
INSTRUCTIONS
1. The seven questions have to be answered in the exam booklets provided.
2. The total possible number of points for the exam is 180.
3. This is a closed book exam. One 8 1/2” × 11” double-sided crib sheet is allowed.
4. Calculators (both programmable and non-programmable) are permitted.
5. Use of a regular dictionary is permitted.
6. Use of a translation dictionary is permitted.
This exam comprises the cover page, eight pages of questions and output, with questions numbered 1 to
7, and five pages of statistical tables.
Math 204 Final Exam
Page 2
1. (10 pts) A particular measure of ceramic strength was obtained for two different batches of ceramic
material, with 10 random samples collected from each batch. The sample statistics for each batch
are contained in the table below. Test the hypothesis that the population standard deviation for the first batch is larger than the population standard deviation for the second batch with Type I error rate $\alpha = 0.05$.
Batch   # of Samples   Mean     Standard deviation   Min.     Max.
1       10             671.08   71.68                518.65   751.67
2       10             610.4    56.06                531.37   747.54
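A minimal sketch of the usual one-sided F-test for this question, assuming both batches are independent random samples from normal populations (the assumption the classical variance-ratio test relies on). The variance of batch 1 goes in the numerator and is compared with an upper-tail critical value on (9, 9) degrees of freedom; the value 3.18 for $F_{0.05,9,9}$ is hard-coded from a standard F table rather than computed.

```python
# One-sided F-test of H0: sigma1 = sigma2 vs. Ha: sigma1 > sigma2,
# assuming independent random samples from normal populations.
s1, n1 = 71.68, 10   # batch 1: sample standard deviation and sample size
s2, n2 = 56.06, 10   # batch 2: sample standard deviation and sample size

F = s1 ** 2 / s2 ** 2          # ratio of sample variances (batch 1 on top)
df1, df2 = n1 - 1, n2 - 1      # numerator and denominator degrees of freedom
F_CRIT = 3.18                  # F_{0.05, 9, 9} from an F table (hard-coded)

reject = F > F_CRIT
print(f"F = {F:.3f} on ({df1}, {df2}) df; reject H0: {reject}")
```

With these summary statistics the variance ratio is about 1.63, well below 3.18, so under these assumptions the data do not show a larger standard deviation for batch 1.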
2. (15 points) A study was designed to evaluate the effects of an herbal remedy, Echinacea purpurea, on
upper respiratory infections (URI) in children. Children with URI, aged 2 to 11 years, were assigned
to receive either echinacea or placebo (parents did not know the assignment) and then followed up
after recovering from the illness.
Parents were then asked to rate their child’s severity of illness
as mild, moderate, or severe. The results of the study are contained in the table below. Test the
hypothesis that there is an association between the treatment variable and the parental assessment
of severity. Use a Type I error rate of 0.10.
                      Group
Parental assessment   Echinacea   Placebo
Mild                  153         170
Moderate              128         157
Severe                48          40
Answer:
In order to test for an association between the two factors, we can use a chi-square test of independence. Our null hypothesis is that the two factors are independent; the alternative hypothesis is that they are dependent. Under the null hypothesis of independence, the expected value for the $(i, j)$-th cell is
$$E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n},$$
where $n_{i\cdot}$ is the $i$-th row total, $n_{\cdot j}$ is the $j$-th column total and $n$ is the grand total, giving the following table of expected values:
Expected values under $H_0$
                      Group
Parental assessment   Echinacea   Placebo   Total
Mild                  152.7       170.3     323
Moderate              134.7       150.3     285
Severe                41.6        46.4      88
Total                 329         367       696
The test statistic for this problem is:
$$X^2 = \sum_{i=1}^{3}\sum_{j=1}^{2}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = \frac{(153-152.7)^2}{152.7} + \frac{(170-170.3)^2}{170.3} + \frac{(128-134.7)^2}{134.7} + \frac{(157-150.3)^2}{150.3} + \frac{(48-41.6)^2}{41.6} + \frac{(40-46.4)^2}{46.4} = 2.50$$
We reject $H_0$ for $X^2$ larger than $\chi^2_{0.10,(3-1)(2-1)} = \chi^2_{0.10,2} = 4.61$. Since $X^2 = 2.50 < 4.61$, we do not reject $H_0$, and we cannot conclude that the two factors are associated (or dependent).
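The same computation can be sketched with the Python standard library. It relies on one convenient fact: with 2 degrees of freedom the chi-square upper-tail probability is exactly $P(\chi^2_2 > x) = e^{-x/2}$, so both the critical value $-2\ln\alpha$ and the p-value follow without a table. That shortcut holds only for df = 2; other table shapes would need a general chi-square distribution function, which this sketch does not include.

```python
import math

# Observed counts: rows = Mild, Moderate, Severe; columns = Echinacea, Placebo
observed = [[153, 170],
            [128, 157],
            [48, 40]]

row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
n = sum(row_tot)

# Expected counts under independence: E_ij = (row total i) * (column total j) / n
expected = [[r * c / n for c in col_tot] for r in row_tot]

x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row))
df = (len(row_tot) - 1) * (len(col_tot) - 1)     # (3 - 1)(2 - 1) = 2

# For df = 2 only: P(chi2_2 > x) = exp(-x / 2), so no table is needed
alpha = 0.10
crit = -2 * math.log(alpha)      # chi^2_{0.10, 2}
p_value = math.exp(-x2 / 2)
print(f"X^2 = {x2:.2f}, df = {df}, critical value = {crit:.2f}, p = {p_value:.3f}")
```

The p-value of roughly 0.29 is well above 0.10, which matches the decision not to reject $H_0$.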
3. (25 pts) A certain suspect garage in the Plateau was suspected of insurance fraud by an insurance
company.
The insurance company took 10 damaged cars that had been serviced by the suspect
garage to a more trusted garage and had a second damage estimate completed. Here are the damage
estimates for the 10 automobiles at the two garages:
Car   Suspect Garage   Trusted Garage
1     1375             1250
2     1550             1300
3     1250             1250
4     1300             1200
5     900              950
6     1500             1575
7     1750             1600
8     3600             3300
9     2250             2125
10    2800             2600
(a) (10 points) Conduct a sign test to determine whether the suspect garage is charging higher estimates of damage, at Type I error rate $\alpha = 0.05$.
Suspect Garage   Trusted Garage   Difference
1375             1250             125
1550             1300             250
1250             1250             0
1300             1200             100
900              950              -50
1500             1575             -75
1750             1600             150
3600             3300             300
2250             2125             125
2800             2600             200
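The remaining sign-test steps can be sketched as follows: drop the zero difference, count the positive differences, and compute the one-sided p-value from a Binomial(n, 1/2) distribution, which is the null distribution of the number of positive signs. Only `math.comb` from the standard library is used.

```python
from math import comb

suspect = [1375, 1550, 1250, 1300, 900, 1500, 1750, 3600, 2250, 2800]
trusted = [1250, 1300, 1250, 1200, 950, 1575, 1600, 3300, 2125, 2600]

diffs = [s - t for s, t in zip(suspect, trusted)]
nonzero = [d for d in diffs if d != 0]   # zero differences carry no sign and are dropped
n = len(nonzero)
y = sum(d > 0 for d in nonzero)          # number of positive differences

# Under H0 the number of positive signs is Binomial(n, 1/2);
# the one-sided alternative (suspect garage higher) uses the upper tail P(Y >= y).
p_value = sum(comb(n, k) for k in range(y, n + 1)) / 2 ** n
print(f"n = {n}, positives = {y}, one-sided p-value = {p_value:.4f}")
```

Here 7 of the 9 non-zero differences are positive, giving $p = 46/512 \approx 0.090 > 0.05$, so the sign test alone would not reject $H_0$.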
(b) (10 points) Conduct a signed rank test to test the same hypothesis in part (a) (again at $\alpha = 0.05$). Do you come to the same conclusion?
Answer:
For the Wilcoxon signed rank test, we take the absolute values of the differences and then rank them (excluding the 0 difference):
Suspect Garage   Trusted Garage   Diff   Abs(Diff)   Rank
1375             1250             125    125         4.5
1550             1300             250    250         8
1250             1250             0      0           (excluded)
1300             1200             100    100         3
900              950              -50    50          1
1500             1575             -75    75          2
1750             1600             150    150         6
3600             3300             300    300         9
2250             2125             125    125         4.5
2800             2600             200    200         7
We then sum the ranks of the negative differences to obtain $T^-$ and sum the ranks of the positive differences to obtain $T^+$. Therefore, $T^- = 1 + 2 = 3$ and $T^+ = 3 + 4.5 + 4.5 + 6 + 7 + 8 + 9 = 42$. This
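means that $T_0 = \min(T^-, T^+) = 3$. We see from the table for the Wilcoxon signed rank statistic that, for $n = 9$ and a one-sided test at $\alpha = 0.05$, we would reject $H_0$ for $T_0 \le 8$. Therefore, using this more sensitive test, we are able to reject $H_0$ and conclude that the suspect garage has a higher median estimate for cars. Since the sign test in part (a) does not reject $H_0$ at $\alpha = 0.05$, we do not come to the same conclusion.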
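The ranking and rank sums can be checked with the sketch below. Tied absolute differences receive the average of the ranks they span (the two 125s both get 4.5). As an alternative to the printed table, it also computes an exact one-sided p-value by enumerating all $2^9 = 512$ equally likely sign patterns under $H_0$; that enumeration and the helper name `average_ranks` are our own illustration, not the method used in the solution.

```python
from itertools import product

suspect = [1375, 1550, 1250, 1300, 900, 1500, 1750, 3600, 2250, 2800]
trusted = [1250, 1300, 1250, 1200, 950, 1575, 1600, 3300, 2125, 2600]
diffs = [s - t for s, t in zip(suspect, trusted) if s != t]   # drop zero differences

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

ranks = average_ranks([abs(d) for d in diffs])
t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

# Under H0 each rank is equally likely to carry either sign: enumerate all
# 2^n sign patterns and count those whose negative-rank sum is <= observed T-.
n = len(diffs)
count = sum(1 for signs in product((0, 1), repeat=n)
            if sum(r for r, neg in zip(ranks, signs) if neg) <= t_minus)
p_exact = count / 2 ** n
print(f"T+ = {t_plus}, T- = {t_minus}, exact one-sided p = {p_exact:.4f}")
```

Only 5 of the 512 sign patterns give $T^- \le 3$, so the exact one-sided p-value is $5/512 \approx 0.0098$, consistent with rejecting $H_0$ at $\alpha = 0.05$.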
(c) (5 points) Briefly state one reason why you may want to use one of the non-parametric tests in
parts (a) or (b) instead of a paired t-test.
Answer:
We would want to use the non-parametric tests instead of a t-test if we believed that
the distribution of the differences was not normal.
4. (30 pts) Three types (labelled A, B, and C) of soil preparation were each randomly installed in plots
at four different locations (labelled 1, 2, 3 and 4), i.e. each type of preparation was installed at a
random plot at each of the four locations. The researcher measured the growth of seedlings planted
in each of the 12 plots and constructed the following ANOVA table (although some of the cells are
missing):
             Df   Sum Sq    Mean Sq   F value
Soil Prep         48.667
Location          51.333
Residuals
Total        11   108.667
(a) (10 pts) Write down the ANOVA table above in your exam booklet, correctly filling in the
missing cells.
Answer:
             Df   Sum Sq    Mean Sq   F value
Soil Prep     2   48.667    24.333    16.85
Location      3   51.333    17.111    11.85
Residuals     6    8.667     1.444
Total        11   108.667
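The missing cells follow from the additivity of the randomized block decomposition, $SS_{Total} = SS_{Trt} + SS_{Block} + SS_E$, with the degrees of freedom split the same way. A short sketch, assuming $a = 3$ treatments and $b = 4$ blocks as in the question; computing with unrounded mean squares gives F values of about 16.85 and 11.85.

```python
a, b = 3, 4                                  # soil preparations (treatments), locations (blocks)
ss_trt, ss_blk, ss_tot = 48.667, 51.333, 108.667

df_trt, df_blk, df_tot = a - 1, b - 1, a * b - 1
df_err = df_tot - df_trt - df_blk            # equals (a - 1)(b - 1) = 6
ss_err = ss_tot - ss_trt - ss_blk            # residual SS by subtraction

ms_trt = ss_trt / df_trt
ms_blk = ss_blk / df_blk
ms_err = ss_err / df_err
f_trt, f_blk = ms_trt / ms_err, ms_blk / ms_err

for name, df, ss, ms, f in [("Soil Prep", df_trt, ss_trt, ms_trt, f_trt),
                            ("Location", df_blk, ss_blk, ms_blk, f_blk)]:
    print(f"{name:10s} {df:3d} {ss:9.3f} {ms:8.3f} {f:7.2f}")
print(f"{'Residuals':10s} {df_err:3d} {ss_err:9.3f} {ms_err:8.3f}")
```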
(b) (5 pts) Using your answer to part (a), is there evidence to indicate mean differences in growth between the soil preparations at a significance level of $\alpha = 0.05$?
Answer:
The null hypothesis is that the mean growth is the same for all three soil preparations. Using the F-table, we find that the rejection value for the test is $F_{0.05,2,6} = 5.14$. Therefore, because $F = 16.85$ is larger than 5.14, we are able to reject $H_0$ and conclude that there is a difference in mean growth between the soil preparations.
(c) (5 pts) Name the experimental design that was used.
Answer:
The design used was a randomized block design.
(d) (5 pts) Construct the one-way ANOVA table that compares the three brands of soil treatment, ignoring the location factor.
Answer:
Because the soil preparation (treatment) sum of squares and the total sum of squares stay the same whether or not the blocks are included, we can recompute the table by moving the Location degrees of freedom and sum of squares into the Residuals row:
             Df   Sum Sq    Mean Sq   F value
Soil Prep     2   48.667    24.333    3.65
Residuals     9   60.000     6.667
Total        11   108.667
(e) (5 pts) Using your answer to part (d), would you come to the same conclusion as in part (b)?
Why or why not?
Answer:
In this case, the rejection value is $F_{0.05,2,9} = 4.26$. Since $F = 3.65 < 4.26$, we would NOT reject $H_0$, so we would not come to the same conclusion as in part (b). The reason is that, without accounting for the heterogeneity between blocks, the location-to-location variation is pooled into the error term, inflating the MSE (from 1.44 to 6.67); the differences in means between the preparations are therefore less statistically significant.
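The effect of ignoring the blocks can be sketched numerically: the Location sum of squares and degrees of freedom are folded into the error term, which leaves the treatment mean square unchanged but inflates the MSE. The critical values 5.14 and 4.26 are the tabled values quoted in the solution, hard-coded here.

```python
ss_trt, ss_blk, ss_err_rcbd = 48.667, 51.333, 8.667
df_trt, df_blk, df_err_rcbd = 2, 3, 6
ms_trt = ss_trt / df_trt

# Randomized block analysis: location variation has its own row
f_rcbd = ms_trt / (ss_err_rcbd / df_err_rcbd)

# One-way analysis: location SS and df are pooled into the error term
ss_err_1way = ss_blk + ss_err_rcbd
df_err_1way = df_blk + df_err_rcbd
f_1way = ms_trt / (ss_err_1way / df_err_1way)

F_CRIT_2_6, F_CRIT_2_9 = 5.14, 4.26      # tabled F_{0.05} values quoted in the solution
print(f"blocked: F = {f_rcbd:.2f}, reject H0: {f_rcbd > F_CRIT_2_6}")
print(f"one-way: F = {f_1way:.2f}, reject H0: {f_1way > F_CRIT_2_9}")
```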
5. (30 pts) Some researchers believed that the iron content of food could be affected by the type of pot
used to cook the food. The researchers conducted a study using three different kinds of Ethiopian
cookware: iron, clay, and aluminum pots. They randomly selected 12 pots of each kind for the study
(yielding 36 pots in total). They randomly assigned each pot to cook one of three different types of
food (meat, legumes, or vegetables) in a completely randomized, balanced two factor design. The
food was cooked for the same amount of time in each case and the iron content of the food was then
measured.
(a) (5 points) List the different treatments for this experiment, identify the experimental unit and
determine the number of experimental units assigned to each treatment for this design.
Answer:
There are three kinds of pots and three kinds of food, so there are 9 different treatments: (Iron:Meat, Iron:Legumes, Iron:Vegetables, Clay:Meat, Clay:Legumes, Clay:Vegetables, Alum:Meat, Alum:Legumes, Alum:Vegetables). The experimental unit is a single pot, and so there are 36/9 = 4 pots assigned to each treatment.
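A minimal sketch of the treatment structure: the treatments are all pot-by-food combinations (a full factorial), and in a balanced design the 36 pots are split equally among them. The labels below are shortened forms of the factor levels.

```python
from itertools import product

pots = ["Iron", "Clay", "Alum"]
foods = ["Meat", "Legumes", "Vegetables"]

# Every pot type crossed with every food type is one treatment
treatments = [f"{p}:{f}" for p, f in product(pots, foods)]

n_pots = 36
per_treatment = n_pots // len(treatments)   # balanced design: equal replication
print(len(treatments), "treatments,", per_treatment, "pots each")
```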
(b) (20 points) The two-way ANOVA table for the data and diagnostic plots for the model (Figure 1, next page) are below. Conduct a complete analysis of variance for the model below and clearly state your conclusions. Conduct all hypothesis tests at $\alpha = 0.01$. Be sure to state and assess the validity of your assumptions for the model.
> iron.model = aov(iron~type*food)
> summary(iron.model)
          Df  Sum Sq Mean Sq F value    Pr(>F)
type       2 24.8940 12.4470  92.263 8.531e-13 ***
food       2  9.2969  4.6484  34.456 3.699e-08 ***
type:food  4  2.6404  0.6601   4.893  0.004247 **
Residuals 27  3.6425  0.1349
Answer:
The first hypothesis we test is the null hypothesis of no interaction. From the table, we see that the F-statistic for this hypothesis test is 4.893 with a corresponding p-value of 0.004. Therefore, at $\alpha = 0.01$, we would clearly reject $H_0$ and conclude that there is an interaction between the type of pot and the type of food: the differences in mean iron level between pots depend on the kind of food. Because the interaction is significant, the main-effect F-tests for type and food (both with p-values far below 0.01) should not be interpreted on their own.
The assumptions for the analysis of variance model are that the model errors are independent and normally distributed with mean 0 and constant variance, i.e. the variance should be the same across treatments. Looking at the diagnostic plots (Figure 1), we see that of the 36 pots, there is a single outlying residual in the Clay:Meat group. This might be cause for concern. Otherwise, the residuals look roughly normally distributed. The variance of the residuals by treatment is more of a concern, as a couple of groups seem to have much larger residual variances than the others, so the constant-variance assumption is questionable and we should be suspicious of our results.
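As a consistency check on the table (not a replacement for fitting the model in R), each F value is the effect mean square divided by the residual mean square, and the degrees of freedom should add up to $36 - 1 = 35$. The interaction p-value is copied from the R output rather than recomputed, since that would need an F distribution function.

```python
# (source, df, sum of squares) copied from summary(iron.model)
effects = [("type", 2, 24.8940), ("food", 2, 9.2969), ("type:food", 4, 2.6404)]
df_err, ss_err = 27, 3.6425
n_pots = 36

assert sum(df for _, df, _ in effects) + df_err == n_pots - 1   # total df = N - 1

ms_err = ss_err / df_err
f_values = {name: (ss / df) / ms_err for name, df, ss in effects}

alpha = 0.01
p_interaction = 0.004247          # Pr(>F) for type:food, copied from the output
for name, f in f_values.items():
    print(f"{name:10s} F = {f:7.3f}")
print("interaction significant at 0.01:", p_interaction < alpha)
```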
(c) (5 points) Could you conclude from the results in part (b) that a single type of pot would have,
on average, higher iron levels for all three kinds of food? Why or why not?
Answer:
You could not conclude this because of the presence of the interaction term and the
figure which shows that the Iron pots do not have a higher mean for all foods.
Figure 1: Diagnostic plots for Question 5 (panels: mean of iron by pot type (Aluminum, Clay, Iron); standardized residuals by factor level combination (Type:Food); Normal Q-Q plot of the standardized residuals; histogram of stdres(iron.model)).
Figure 2: Plot of data for Question 6 (deaths per 100K population from kidney cancer against number of cigarettes sold per capita).
6. (20 points) Fraumeni (1968, Journal of the National Cancer Institute) collected data from 43 states
and the District of Columbia on the number of cigarettes sold per capita and deaths per 100K people
from various kinds of cancer. The figure above (Figure 2) is a plot of the number of cigarettes sold
and the number of deaths per 100K people from kidney cancer for each of the 44 observations. The
regression model output for this data follows.
> kidney.model1 = lm(KID~CIG)
> summary(kidney.model1)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.66359    0.32020   5.196 5.63e-06 ***
CIG          0.04539    0.01255   3.617 0.000792 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4586 on 42 degrees of freedom
Multiple R-squared: 0.2375, Adjusted R-squared: 0.2194
F-statistic: 13.09 on 1 and 42 DF, p-value: 0.0007922
(a) (6 points) Test for a linear association between the number of cigarettes sold per capita and the number of kidney cancer deaths per 100K people with Type I error rate $\alpha = 0.05$.
Answer:
We can use the t-test for the slope coefficient in the model above to test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$. We have a t-statistic of 3.617 and a p-value of 0.000792, so we clearly reject $H_0$ at $\alpha = 0.05$ and conclude that there is strong evidence of a linear association.
(b) (5 points) What is the sample correlation between the number of deaths due to kidney cancer
per 100K and the number of cigarettes sold per capita?
Answer:
The sample correlation can be found by taking the square root of the $R^2$ value and giving it the sign of the estimated slope (positive here): $r = +\sqrt{0.2375} = 0.487$.
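The quantities in parts (a) and (b) are linked in simple linear regression: the slope t-statistic is the estimate divided by its standard error, the overall F-statistic equals $t^2$, and the correlation can be recovered either as $\pm\sqrt{R^2}$ (with the sign of the slope) or as $r = t/\sqrt{t^2 + \mathrm{df}}$. A small check using the values from the output:

```python
import math

# Values copied from summary(kidney.model1)
b1, se_b1 = 0.04539, 0.01255
r_squared, df_resid = 0.2375, 42

t = b1 / se_b1                                  # slope t-statistic for H0: beta1 = 0
F = t ** 2                                      # overall F equals t^2 with one predictor
r = math.copysign(math.sqrt(r_squared), b1)     # correlation takes the sign of the slope
r_from_t = t / math.sqrt(t ** 2 + df_resid)     # the same correlation recovered from t

print(f"t = {t:.3f}, F = {F:.2f}, r = {r:.3f}, r from t = {r_from_t:.3f}")
```

Small discrepancies (for example $t^2 \approx 13.08$ against the printed 13.09) come from the rounding of the coefficients in the printed output.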
(c) (9 points) State the model assumptions that are necessary for your conclusions in part (b) to
be valid. Assess the appropriateness of those assumptions using the figure above (Figure 2), the
figure on the next page (Figure 3), and/or the output for the linear regression fit.
Answer:
For the simple linear regression model we assume that the model errors are independent, mean 0 Normal random variables with equal variance $\sigma^2$. From the diagnostic plots, we detect a slight positive skew in the residuals, which seems to indicate a non-normal distribution. There are two moderate outliers on the right-hand side of the distribution, although they are not too serious. There is a possibility of a non-linear trend at the far right of the range of cigarette sales, but it is not clear whether this reflects a couple of outlying points or a real indication of a quadratic relationship.
Figure 3: Diagnostic plots for Question 6 (panels: standardized residuals from kidney.model1; histogram of stdres(kidney.model1); standardized residuals versus fitted values from kidney.model1; Q-Q normal plot for kidney.model1).
7. (50 points) It is assumed that wages will rise with experience (or length of service, LOS). A random
sample of 60 women working in Indiana banks was taken. LOS is measured in months of experience
and wages are yearly total income divided by number of weeks worked. The size of the bank where
each woman worked was also measured and banks were classified into two different categories: Large
and Small.
Four different regression models were fit to the data. The regression output for these four models is contained on the next page. In the output, los denotes the LOS variable and size denotes the size of the bank.
(a) (7 points) Using the output for model1, estimate the mean wage for a woman with LOS equal to 60 months. Provide an approximate 95% confidence interval for your estimate. Hint: use the standard error of $\hat\beta_1$ OR the standard deviation of los to find $S_{XX}$.
Answer:
We would estimate the mean wage for a woman with LOS = 60 using:
$$\hat y = \hat\beta_0 + \hat\beta_1 \times 60 = 44.213 + 0.0731 \times 60 = 48.60$$
The 95% confidence interval for the mean can be constructed using the following, noting that $S_{XX} = s_x^2 (n-1) = 51.73^2 \times 59 \approx 157884$, or equivalently $S_{XX} = s^2/\mathrm{s.e.}(\hat\beta_1)^2 = 11.98^2/0.03015^2 \approx 157884$:
$$\hat y \pm 2s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{XX}}} = 48.60 \pm 2 \times 11.98\sqrt{\frac{1}{60} + \frac{(60 - 70.48)^2}{157884}} = 48.60 \pm 3.16 = (45.44,\ 51.76)$$
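The interval can be computed with a short sketch. As in the hand calculation, the multiplier 2 stands in for $t_{0.025,58} \approx 2.00$, and $\bar x = 70.48$ and $s_x = 51.73$ are taken from the describe(los) output; both routes to $S_{XX}$ are computed to show that they agree. Carrying the full slope 0.07310 (rather than 0.07) gives a point estimate of about 48.60.

```python
import math

# Values copied from summary(model1) and describe(los)
b0, b1, se_b1 = 44.21281, 0.07310, 0.03015
s, n = 11.98, 60              # residual standard error and sample size
xbar, sd_x = 70.48, 51.73     # mean and SD of los
x0 = 60                       # LOS value of interest (months)

y_hat = b0 + b1 * x0
sxx = sd_x ** 2 * (n - 1)             # S_XX from the SD of los
sxx_alt = s ** 2 / se_b1 ** 2         # S_XX from se(b1) = s / sqrt(S_XX)

# Multiplier 2 approximates t_{0.025, 58}, as in the hand calculation
half_width = 2 * s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
ci = (y_hat - half_width, y_hat + half_width)
print(f"y_hat = {y_hat:.2f}, S_XX = {sxx:.0f} vs {sxx_alt:.0f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```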
(b) (4 points) Interpret the value of the two slope coefficients in
model3
.
Answer:
The slope coefficient for LOS indicates that each additional month of LOS is associated with an estimated 0.08417 increase in mean wage, after adjusting for bank size. The slope coefficient for size indicates that Small bank workers make, on average, an estimated 10.23 less than Large bank workers, after adjusting for LOS.
(c) (4 points) Using the output for model4, give a prediction for the wages for a woman with LOS of 60 months who is working at a large bank. You do not need to provide a prediction interval.
Answer:
Since Large is the baseline level of size, the sizeSmall and los:sizeSmall terms are zero, and we simply calculate:
$$\hat y = 49.54 + 0.056 \times 60 = 52.9$$
(d) (5 points) Interpret the value of the interaction coefficient in model4.
Answer:
The interaction coefficient indicates that the estimated rate of change in mean wage with respect to LOS is 0.04828 larger for small banks than for large banks (0.05595 + 0.04828 = 0.10423 per month versus 0.05595 per month). That is, the regression line for modelling wage as a function of LOS is slightly steeper for small banks than it is for large banks.
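Because Large is the baseline level, model4 amounts to two fitted lines, one per bank size. A minimal sketch (the helper `predict` is our own, not an R function) that reproduces the part (c) prediction and the two LOS slopes implied by the interaction coefficient:

```python
# Coefficients copied from summary(model4); Large is the baseline level of size
b0, b_los, b_small, b_inter = 49.54532, 0.05595, -13.63087, 0.04828

def predict(los, small):
    """Estimated mean wage from model4; small = 1 for a Small bank, 0 for a Large bank."""
    return b0 + b_los * los + b_small * small + b_inter * los * small

slope_large = b_los             # change in mean wage per month of LOS at Large banks
slope_small = b_los + b_inter   # at Small banks the interaction adds 0.04828 per month

print(f"Large bank, LOS = 60: {predict(60, 0):.2f}")
print(f"slopes: Large {slope_large:.5f}, Small {slope_small:.5f}")
```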
(e) (5 points) Using only the values of $R^2$ and adjusted $R^2$ for model3 and model4, explain which of these two models should be preferred.
Answer:
We should use adjusted $R^2$ to choose a model, as it penalizes for complexity; plain $R^2$ never decreases when a term is added (0.2660 for model4 vs. 0.2564 for model3). According to adjusted $R^2$, we would choose model3, as its value is slightly higher (0.2303 vs. 0.2267).
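Adjusted $R^2$ rescales the unexplained variation by its degrees of freedom, $R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$ with $p$ predictors (intercept not counted). Plugging in the reported $R^2$ values with $n = 60$ reproduces the printed adjusted values:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 60
adj3 = adjusted_r2(0.2564, n, p=2)   # model3: los + size
adj4 = adjusted_r2(0.2660, n, p=3)   # model4: los + size + los:size

print(f"model3: {adj3:.4f}, model4: {adj4:.4f}")
```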
(f) (8 points) Using the output for model4, test the hypothesis that the association between LOS and wages depends on the size of the bank with Type I error $\alpha = 0.01$. State your conclusion.
Answer:
The null hypothesis for this test is $H_0: \beta_{LOS:Size} = 0$ (against $H_a: \beta_{LOS:Size} \neq 0$). We see from the R output that the t-statistic is 0.857, which corresponds to a p-value of 0.395. Therefore, we cannot reject $H_0$ at $\alpha = 0.01$, and we would not conclude that there is an interaction.
(g) (6 points) Using forward step-wise regression and the output for all four models, choose an appropriate model for the data using F-tests and $\alpha = 0.05$.
Answer:
Here are the steps for the forward stepwise regression:
i. Our first comparison in forward selection is to attempt to put one covariate into the model, i.e. compare model1 and model2 to a model with no covariates. This can be done using the overall F-test (or, equivalently, the slope t-test) for each model. We see that both model1 and model2 provide significant improvement, but model2 provides more improvement (based on the F-statistic/p-value: 9.121 vs. 5.878), so we choose model2, i.e. we first add size.
ii. Our second comparison is to see whether we should add LOS to the model including size, i.e. compare model2 to model3. This requires a test of nested hypotheses where the complete model is model3 and the reduced model is model2. The F-statistic is:
$$F = \frac{(7920.8 - 6816.6)/1}{119.59} = 9.23$$
Compared to an $F_{0.05,1,57}$ rejection value (which can be conservatively approximated using $F_{0.05,1,40} = 4.08$ from the table), we would clearly reject the reduced model and add LOS to the model.
iii. Our last comparison is to see whether we should add the interaction LOS:Size to the model, i.e. compare model3 to model4. Again we can use the nested F-test, only now model3 is the reduced model and model4 is the complete model:
$$F = \frac{(6816.6 - 6728.3)/1}{120.15} = 0.735$$
Compared again to our conservative rejection value of 4.08, we would NOT reject the reduced model and would stop the procedure, having selected model3.
(h) (6 points) Using backward step-wise regression and the output for all four models, choose an appropriate model for the data using F-tests and $\alpha = 0.05$.
Answer:
Here are the steps for the backward stepwise regression:
i. In this case we begin with model4 as the complete model and model3 as the reduced model. We know from the previous part that we cannot reject model3 in favor of model4, so we would choose model3 as the new complete model and drop the interaction term.
ii. Now we need to test whether we can drop either of the two covariates. We know from the previous part that the F-test rejects model2 in favor of model3, i.e. we would not drop LOS from the model because the F-test indicated that it should be included. Therefore, we only need to check whether we should drop Size from the model, i.e. compare model3 (complete) to model1 (reduced):
$$F = \frac{(8322.9 - 6816.6)/1}{119.59} = 12.596$$
And again, we clearly see that model3 provides significant improvement over model1, so we would not drop Size either. Therefore, backward stepwise regression also chooses model3.
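All three nested comparisons in parts (g) and (h) use the same partial F-statistic, $F = \frac{(SSE_R - SSE_C)/\text{(extra df)}}{MSE_C}$. The sketch below recomputes them from the residual sums of squares in the anova() output; the cutoff 4.08 is the tabled $F_{0.05,1,40}$, used as a slightly conservative stand-in for $F_{0.05,1,57}$. The helper name `partial_f` is our own.

```python
def partial_f(sse_reduced, sse_complete, extra_df, mse_complete):
    """Nested-model F statistic: ((SSE_R - SSE_C) / extra_df) / MSE_C."""
    return (sse_reduced - sse_complete) / extra_df / mse_complete

# Residual sums of squares and mean squares copied from the anova() tables
SSE = {"model1": 8322.9, "model2": 7920.8, "model3": 6816.6, "model4": 6728.3}
MSE = {"model3": 119.59, "model4": 120.15}
F_CRIT = 4.08   # F_{0.05,1,40} from a table; slightly larger than F_{0.05,1,57}

f_add_los = partial_f(SSE["model2"], SSE["model3"], 1, MSE["model3"])     # (g) step ii
f_add_inter = partial_f(SSE["model3"], SSE["model4"], 1, MSE["model4"])   # (g) iii / (h) i
f_drop_size = partial_f(SSE["model1"], SSE["model3"], 1, MSE["model3"])   # (h) step ii

for name, f in [("add los", f_add_los), ("add los:size", f_add_inter),
                ("drop size", f_drop_size)]:
    print(f"{name:13s} F = {f:7.3f}  significant: {f > F_CRIT}")
```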
(i) (5 points) Are the models you selected in parts (g) and (h) the same model? Will this always be the case? Explain your answer.
Answer:
Yes, both procedures select model3. However, this will not always be the case. A variable may not be statistically significant enough to be added during a forward stepwise regression that stops at a certain point, yet be too important to be deleted during a backward stepwise regression, so that the two procedures end with different sets of variables. This typically happens when the regressors are correlated (multicollinearity).
> describe(los)
  var  n  mean    sd median trimmed   mad min max range skew kurtosis   se
1   1 60 70.48 51.73     60   62.69 44.48   7 228   221 1.33     1.42 6.68
> summary(size)
Large Small
   35    25
## Model 1
> summary(model1)
Call:
lm(formula = wages ~ los)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.21281    2.62824  16.822   <2e-16 ***
los          0.07310    0.03015   2.425   0.0185 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 11.98 on 58 degrees of freedom
Multiple R-squared: 0.09202, Adjusted R-squared: 0.07637
F-statistic: 5.878 on 1 and 58 DF, p-value: 0.01847
> anova(model1)
Analysis of Variance Table
Response: wages
          Df Sum Sq Mean Sq F value  Pr(>F)
los        1  843.5  843.51  5.8782 0.01847 *
Residuals 58 8322.9  143.50
### Model 2
> model2 = lm(wages~size)
> summary(model2)
Call:
lm(formula = wages ~ size)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   53.216      1.975   26.94  < 2e-16 ***
sizeSmall     -9.242      3.060   -3.02  0.00375 **
---
Residual standard error: 11.69 on 58 degrees of freedom
Multiple R-squared: 0.1359, Adjusted R-squared: 0.121
F-statistic: 9.121 on 1 and 58 DF, p-value: 0.003754
> anova(model2)
Analysis of Variance Table
Response: wages
          Df Sum Sq Mean Sq F value   Pr(>F)
size       1 1245.6 1245.60  9.1208 0.003754 **
Residuals 58 7920.8  136.57
> model3 = lm(wages~los + size)
> summary(model3)
Call:
lm(formula = wages ~ los + size)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  47.69407    2.59207  18.400  < 2e-16 ***
los           0.08417    0.02770   3.039 0.003582 **
sizeSmall   -10.22840    2.88197  -3.549 0.000782 ***
---
Residual standard error: 10.94 on 57 degrees of freedom
Multiple R-squared: 0.2564, Adjusted R-squared: 0.2303
F-statistic: 9.825 on 2 and 57 DF, p-value: 0.0002157
> anova(model3)
Analysis of Variance Table
Response: wages
          Df Sum Sq Mean Sq F value    Pr(>F)
los        1  843.5  843.51  7.0535 0.0102409 *
size       1 1506.3 1506.35 12.5961 0.0007823 ***
Residuals 57 6816.6  119.59
# Model 4
> model4 = lm(wages~los*size)
> summary(model4)
Call:
lm(formula = wages ~ los * size)
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    49.54532    3.37887  14.663  < 2e-16 ***
los             0.05595    0.04307   1.299  0.19925
sizeSmall     -13.63087    4.90998  -2.776  0.00747 **
los:sizeSmall   0.04828    0.05634   0.857  0.39511
---
Residual standard error: 10.96 on 56 degrees of freedom
Multiple R-squared: 0.266, Adjusted R-squared: 0.2267
F-statistic: 6.764 on 3 and 56 DF, p-value: 0.0005667
> anova(model4)
Analysis of Variance Table
Response: wages
          Df Sum Sq Mean Sq F value    Pr(>F)
los        1  843.5  843.51  7.0206 0.0104534 *
size       1 1506.3 1506.35 12.5374 0.0008115 ***
los:size   1   88.2   88.24  0.7344 0.3951072
Residuals 56 6728.3  120.15