C2M2_peer_reviewed (pdf)
School: Poolesville High · Course: DTSA5003 · Subject: Statistics
Uploaded Jun 3, 2024 by MateComputerAntelope99
C2M2_peer_reviewed
May 19, 2024
# 1 C2M2: Peer Reviewed Assignment

### 1.0.1 Outline:
The objectives for this assignment:
1. Utilize contrasts to see how different pairwise comparison tests can be conducted.
2. Understand power and why it’s important to statistical conclusions.
3. Understand the different kinds of post-hoc tests and when they should be used.
General tips:
1. Read the questions carefully to understand what is being asked.
2. This work will be reviewed by another human, so make sure that you are clear and concise in your explanations and answers.
## 2 Problem 1: Contrasts and Coupons
Consider a hardness testing machine that presses a rod with a pointed tip into a metal specimen
with a known force. By measuring the depth of the depression caused by the tip, the hardness of
the specimen is determined.
Suppose we wish to determine whether or not four different tips produce different readings on a
hardness testing machine. The experimenter has decided to obtain four observations on Rockwell C-
scale hardness for each tip. There is only one factor - tip type - and a completely randomized single-
factor design would consist of randomly assigning each one of the 4×4=16 runs to an experimental
unit, that is, a metal coupon, and observing the hardness reading that results. Thus, 16 different
metal test coupons would be required in this experiment, one for each run in the design.
[8]:
```r
tip <- factor(rep(1:4, each = 4))
coupon <- factor(rep(1:4, times = 4))
y <- c(9.3, 9.4, 9.6, 10, 9.4, 9.3, 9.8, 9.9,
       9.2, 9.4, 9.5, 9.7, 9.7, 9.6, 10, 10.2)
hardness <- data.frame(y, tip, coupon)
hardness
```
A data.frame: 16 × 3

| y \<dbl\> | tip \<fct\> | coupon \<fct\> |
|------|-----|--------|
| 9.3  | 1 | 1 |
| 9.4  | 1 | 2 |
| 9.6  | 1 | 3 |
| 10.0 | 1 | 4 |
| 9.4  | 2 | 1 |
| 9.3  | 2 | 2 |
| 9.8  | 2 | 3 |
| 9.9  | 2 | 4 |
| 9.2  | 3 | 1 |
| 9.4  | 3 | 2 |
| 9.5  | 3 | 3 |
| 9.7  | 3 | 4 |
| 9.7  | 4 | 1 |
| 9.6  | 4 | 2 |
| 10.0 | 4 | 3 |
| 10.2 | 4 | 4 |
### 2.0.1 1. (a) Visualize the Groups

Before we start throwing math at anything, let's visualize our data to get an idea of what to expect from the eventual results.

Construct interaction plots for `tip` and `coupon` using ggplot(). Be sure to explain what you can learn from the plots.
[14]:
```r
# Your Code Here
library(dplyr)
library(ggplot2)

hardness %>%
  ggplot(aes(x = tip, y = y, fill = coupon)) +
  geom_boxplot() +
  labs(x = "Tip Type", y = "Hardness", fill = "Coupon Type",
       title = "Boxplot of Hardness by Tip Type and Coupon Type")

hardness %>%
  ggplot(aes(y = y, x = tip)) +
  geom_smooth(method = "lm", se = TRUE, aes(group = coupon, color = coupon)) +
  labs(x = "Tip Type", color = "Coupon Type", y = "Hardness",
       title = "Linear Models for Tip type Vs. Hardness")

hardness %>%
  ggplot(aes(y = y, x = tip)) +
  geom_line(aes(group = coupon, color = coupon)) +
  labs(x = "Tip Type", color = "Coupon Type", y = "Hardness",
       title = "Line Plots of Tip type Vs. Hardness, group by Coupon Type")

hardness %>%
  ggplot(aes(x = tip, y = y)) +
  geom_point(aes(color = coupon)) +
  facet_wrap(~ coupon) +
  labs(x = "Tip Type", y = "Hardness", color = "Coupon Type",
       title = "Scatterplot for Tip type Vs. Hardness, Faceted by Coupon Type")
```

```
`geom_smooth()` using formula 'y ~ x'
```
### 2.0.2 1. (b) Interactions

Should we test for interactions between `tip` and `coupon`? Maybe there is an interaction between the different metals that goes beyond our current scientific understanding!

Fit a linear model to the data with predictors `tip` and `coupon`, and an interaction between the two. Display the summary and explain why (or why not) an interaction term makes sense for this data.
[16]:
```r
# Your Code Here
# Fit a linear model with interaction
model <- lm(y ~ tip * coupon, data = hardness)
summary(model)

print("2nd Model")
model <- lm(y ~ tip + coupon, data = hardness)
summary(model)
```
```
Call:
lm(formula = y ~ tip * coupon, data = hardness)

Residuals:
ALL 16 residuals are 0: no residual degrees of freedom!

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     9.300e+00         NA      NA       NA
tip2            1.000e-01         NA      NA       NA
tip3           -1.000e-01         NA      NA       NA
tip4            4.000e-01         NA      NA       NA
coupon2         1.000e-01         NA      NA       NA
coupon3         3.000e-01         NA      NA       NA
coupon4         7.000e-01         NA      NA       NA
tip2:coupon2   -2.000e-01         NA      NA       NA
tip3:coupon2    1.000e-01         NA      NA       NA
tip4:coupon2   -2.000e-01         NA      NA       NA
tip2:coupon3    1.000e-01         NA      NA       NA
tip3:coupon3   -3.758e-15         NA      NA       NA
tip4:coupon3   -3.869e-15         NA      NA       NA
tip2:coupon4   -2.000e-01         NA      NA       NA
tip3:coupon4   -2.000e-01         NA      NA       NA
tip4:coupon4   -2.000e-01         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1,	Adjusted R-squared:    NaN
F-statistic:   NaN on 15 and 0 DF,  p-value: NA

[1] "2nd Model"

Call:
lm(formula = y ~ tip + coupon, data = hardness)

Residuals:
     Min       1Q   Median       3Q      Max
-0.10000 -0.05625 -0.01250  0.03125  0.15000

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.35000    0.06236 149.934  < 2e-16 ***
tip2         0.02500    0.06667   0.375 0.716345
tip3        -0.12500    0.06667  -1.875 0.093550 .
tip4         0.30000    0.06667   4.500 0.001489 **
coupon2      0.02500    0.06667   0.375 0.716345
coupon3      0.32500    0.06667   4.875 0.000877 ***
coupon4      0.55000    0.06667   8.250 1.73e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.09428 on 9 degrees of freedom
Multiple R-squared:  0.938,	Adjusted R-squared:  0.8966
F-statistic: 22.69 on 6 and 9 DF,  p-value: 5.933e-05
```
The first model, which includes the `tip:coupon` interaction, is saturated: with only one observation per tip-coupon combination, all 16 residuals are zero and there are no residual degrees of freedom left, so standard errors, t-values, and p-values cannot be estimated. An interaction term therefore cannot be tested with this design and does not make sense for this data.

The second model, with additive predictors `tip` and `coupon`, offers a more straightforward interpretation. The intercept indicates an expected hardness of about 9.35 when both `tip` and `coupon` are at their baseline levels. Among the tip coefficients, only `tip4` shows a statistically significant effect, increasing hardness by 0.30 units relative to the baseline tip. For coupon, all levels except `coupon2` exhibit significant effects, with `coupon4` having the most substantial impact (a 0.55-unit increase). The model's high adjusted R-squared of 0.8966 indicates that roughly 90% of the variability in hardness is explained by these predictors.
### 2.0.3 1. (c) Contrasts

Let's take a look at the use of contrasts. Recall that a contrast takes the form

$$\sum_{i=1}^{t} c_i \mu_i = 0,$$

where $c = (c_1, \ldots, c_t)$ is a constant vector and $\mu = (\mu_1, \ldots, \mu_t)$ is a parameter vector (e.g., $\mu_i$ is the mean of the $i$th group).

We can note that $c = (1, -1, 0, 0)$ corresponds to the null hypothesis $H_0 : \mu_2 - \mu_1 = 0$, where $\mu_1$ is the mean associated with tip1 and $\mu_2$ is the mean associated with tip2. The code below tests this hypothesis.

Repeat this test for the hypothesis $H_0 : \mu_4 - \mu_3 = 0$. Interpret the results. What are your conclusions?
[17]:
```r
library(multcomp)

lmod = lm(y ~ tip + coupon, data = hardness)

fit.gh2 = glht(lmod, linfct = mcp(tip = c(1, -1, 0, 0)))

# estimate of mu_2 - mu_1
with(hardness,
     sum(y[tip == 2]) / length(y[tip == 2]) -
     sum(y[tip == 1]) / length(y[tip == 1]))
```

```
0.0250000000000021
```
The code above fits a linear model (`lmod`) to the hardness data with additive predictors `tip` and `coupon`. It then sets up a general linear hypothesis test (`glht`) from the multcomp package with the contrast $c = (1, -1, 0, 0)$ on `tip`, which tests the null hypothesis that the mean hardness for tip2 equals the mean hardness for tip1. The directly computed difference in sample means between tip2 and tip1 is approximately 0.025, indicating that hardness is, on average, only slightly higher for tip2 than for tip1.
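The requested $\mu_4 - \mu_3$ contrast works the same way. Below is a minimal base-R sketch: it recreates the assignment's data and computes the contrast estimate directly from the group means (the names `group_means` and `cvec` are illustrative, not from the assignment):

```r
# Recreate the hardness data from the assignment
tip <- factor(rep(1:4, each = 4))
y <- c(9.3, 9.4, 9.6, 10, 9.4, 9.3, 9.8, 9.9,
       9.2, 9.4, 9.5, 9.7, 9.7, 9.6, 10, 10.2)

# Contrast c = (0, 0, -1, 1) picks out mu_4 - mu_3
group_means <- tapply(y, tip, mean)
cvec <- c(0, 0, -1, 1)
estimate <- sum(cvec * group_means)
estimate  # 0.425: tip4 reads about 0.425 units harder than tip3 on average
```

With multcomp, the analogous call would mirror the cell above with `mcp(tip = c(0, 0, -1, 1))`; note that this estimated difference matches the `4-3` row of the Tukey output in part (d).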
### 2.0.4 1. (d) All Pairwise Comparisons

What if we want to test all possible pairwise comparisons between treatments? This can be done by setting the treatment factor (`tip`) to "Tukey". Notice that the p-values are adjusted (because we are testing multiple hypotheses!).

Perform all possible Tukey pairwise tests. What are your conclusions?
[18]:
```r
# Your Code Here
tukey <- TukeyHSD(aov(lmod))
tukey
```

```
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = lmod)

$tip
      diff         lwr        upr     p adj
2-1  0.025 -0.18311992 0.23311992 0.9809005
3-1 -0.125 -0.33311992 0.08311992 0.3027563
4-1  0.300  0.09188008 0.50811992 0.0066583
3-2 -0.150 -0.35811992 0.05811992 0.1815907
4-2  0.275  0.06688008 0.48311992 0.0113284
4-3  0.425  0.21688008 0.63311992 0.0006061

$coupon
     diff         lwr       upr     p adj
2-1 0.025 -0.18311992 0.2331199 0.9809005
3-1 0.325  0.11688008 0.5331199 0.0039797
4-1 0.550  0.34188008 0.7581199 0.0000830
3-2 0.300  0.09188008 0.5081199 0.0066583
4-2 0.525  0.31688008 0.7331199 0.0001200
4-3 0.225  0.01688008 0.4331199 0.0341762
```
The Tukey multiple comparisons of means test on the `tip` and `coupon` factors supports the following conclusions. For the `tip` factor, tip4 differs significantly in hardness from tip1, tip2, and tip3 (adjusted p = 0.0067, 0.0113, and 0.0006, respectively), while the remaining tip comparisons are not statistically significant. For the `coupon` factor, every pairwise difference is significant except coupon2 vs. coupon1, with the largest gap between coupon4 and coupon1 (diff = 0.55, adjusted p ≈ 8.3e-05), indicating clear variation in hardness across coupons.
## 3 Problem 2: Ethics in my Math Class!
In your own words, answer the following questions:
• What is power, in the statistical context?
• Why is power important?
• What are potential consequences of ignoring/not including power calculations in statistical
analyses?
Power in statistics refers to the probability that a statistical test will correctly reject a null hypothesis when the alternative hypothesis is true. In simpler terms, it's the ability of a test to detect an effect or difference if one exists in the population being studied. Power is a crucial concept because it directly impacts the reliability and accuracy of statistical analyses. Here's a more detailed breakdown:

1. Definition of Power: Power quantifies the likelihood of finding a statistically significant result in a study, given that a true effect or difference exists. It is calculated as 1 minus the probability of a Type II error, which occurs when the test fails to detect a true effect (i.e., a false negative).

2. Importance of Power: Power directly influences the sensitivity of a statistical test to detect effects or differences in the population being studied. A high-power test is more likely to detect even small but meaningful effects, which is essential in fields such as medicine where subtle treatment effects can have significant implications. Power also affects the validity of study results, resource efficiency, and ethical considerations: adequate power ensures that research outcomes are trustworthy, resources are used effectively, and participants are not exposed to unnecessary risks by underpowered studies.

3. Consequences of Ignoring Power: The primary consequence is missed discoveries, since low power increases the risk of overlooking genuine effects or differences. This can lead to incomplete or misleading research outcomes, fostering a false sense of security in null findings. Additionally, ignoring power considerations wastes resources, as underpowered studies may require repeated efforts or larger sample sizes to achieve meaningful results.
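The points above can be made concrete with base R's `power.t.test()`. This is an illustrative sketch only; the effect sizes and standard deviation below are made-up numbers, not values from this assignment:

```r
# Power of a two-sample t-test with n = 4 per group to detect a large
# difference (delta = 0.3) when the noise is small (sd = 0.1)
big_effect <- power.t.test(n = 4, delta = 0.3, sd = 0.1, sig.level = 0.05)
big_effect$power  # close to 1: a large effect is easy to detect

# Flip the question: how many observations per group are needed for 80%
# power on a subtler effect (delta = 0.1, i.e., effect size d = 1)?
needed <- power.t.test(delta = 0.1, sd = 0.1, power = 0.80, sig.level = 0.05)
ceiling(needed$n)  # observations required per group
```

Running a study with fewer observations than the second calculation suggests would leave the test underpowered, which is exactly the "missed discoveries" risk described in point 3.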
## 4 Problem 3: Post-Hoc Tests
There are so many different post-hoc tests! Let's try to understand them better. Answer the following questions in the markdown cell:
• Why are there multiple post-hoc tests?
• When would we choose to use Tukey’s Method over the Bonferroni correction, and vice versa?
• Do some outside research on other post-hoc tests. Explain what the method is and when it
would be used.
1. Post-hoc tests exist to address the familywise error-rate problem that arises when many comparisons are run on the same data, and there are many correction methods because there is no consensus on a universally superior approach. Different tests are chosen based on the research context and priorities.

2. The Bonferroni correction offers certain advantages and drawbacks compared to Tukey's method. It tends to be conservative, holding the familywise error rate slightly below the desired cutoff, which can be beneficial or limiting depending on the situation and the number of tests conducted. As the number of tests increases, Bonferroni becomes increasingly conservative and loses power relative to Tukey for the same set of comparisons. Bonferroni is well suited to testing a limited, pre-specified set of simultaneous hypotheses, whereas Tukey automatically conducts all pairwise comparisons, which can be resource-intensive for factors with many levels. Tukey's method is particularly effective for factors with few levels and allows precise alpha specification using the Studentized Range distribution.

3. Exploring post-hoc tests further: Tukey's HSD test is commonly used after ANOVA to identify which specific groups or treatments differ significantly from each other. It compares all possible pairs of means and determines whether the difference between any two means is statistically significant. Tukey's HSD is robust and widely used, especially when the sample sizes are equal. It helps researchers gain a deeper understanding of the differences among groups flagged as significant by the initial ANOVA, providing valuable insights into the relationships between variables in the study.
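The Bonferroni conservatism described in point 2, along with one further "outside research" method (Holm's step-down procedure, a uniformly less conservative refinement of Bonferroni), can be illustrated with base R's `p.adjust()`. The raw p-values here are invented purely for illustration:

```r
# Three hypothetical raw p-values from three pairwise comparisons
raw_p <- c(0.004, 0.020, 0.035)

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
p.adjust(raw_p, method = "bonferroni")  # 0.012 0.060 0.105

# Holm adjusts step-down: its adjusted p-values are never larger than
# Bonferroni's, so it rejects at least as many hypotheses while still
# controlling the familywise error rate
p.adjust(raw_p, method = "holm")        # 0.012 0.040 0.040
```

At α = 0.05, Bonferroni rejects only the first hypothesis here, while Holm rejects all three; Tukey's method, by contrast, is built specifically for all-pairwise mean comparisons rather than arbitrary lists of p-values.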
[ ]: