C2M2_peer_reviewed

Course

DTSA5003

Subject

Statistics

Date

Jun 3, 2024

Uploaded by MateComputerAntelope99

May 19, 2024

1 C2M2: Peer Reviewed Assignment

1.0.1 Outline:

The objectives for this assignment:
1. Utilize contrasts to see how different pairwise comparison tests can be conducted.
2. Understand power and why it's important to statistical conclusions.
3. Understand the different kinds of post-hoc tests and when they should be used.

General tips:
1. Read the questions carefully to understand what is being asked.
2. This work will be reviewed by another human, so make sure that you are clear and concise in your explanations and answers.

2 Problem 1: Contrasts and Coupons

Consider a hardness testing machine that presses a rod with a pointed tip into a metal specimen with a known force. By measuring the depth of the depression caused by the tip, the hardness of the specimen is determined.

Suppose we wish to determine whether or not four different tips produce different readings on a hardness testing machine. The experimenter has decided to obtain four observations on Rockwell C-scale hardness for each tip. There is only one factor - tip type - and a completely randomized single-factor design would consist of randomly assigning each one of the 4×4=16 runs to an experimental unit, that is, a metal coupon, and observing the hardness reading that results. Thus, 16 different metal test coupons would be required in this experiment, one for each run in the design.

[8]:
    tip <- factor(rep(1:4, each = 4))
    coupon <- factor(rep(1:4, times = 4))
    y <- c(9.3, 9.4, 9.6, 10, 9.4, 9.3, 9.8, 9.9, 9.2, 9.4, 9.5, 9.7, 9.7, 9.6, 10, 10.2)
    hardness <- data.frame(y, tip, coupon)
    hardness
    A data.frame: 16 × 3
    y      tip    coupon
    <dbl>  <fct>  <fct>
     9.3   1      1
     9.4   1      2
     9.6   1      3
    10.0   1      4
     9.4   2      1
     9.3   2      2
     9.8   2      3
     9.9   2      4
     9.2   3      1
     9.4   3      2
     9.5   3      3
     9.7   3      4
     9.7   4      1
     9.6   4      2
    10.0   4      3
    10.2   4      4

2.0.1 1. (a) Visualize the Groups

Before we start throwing math at anything, let's visualize our data to get an idea of what to expect from the eventual results. Construct interaction plots for tip and coupon using ggplot(). Be sure to explain what you can from the plots.

[14]:
    # Your Code Here
    library(dplyr)
    library(ggplot2)

    # Boxplot of hardness for each tip, split by coupon
    hardness %>%
      ggplot(aes(x = tip, y = y, fill = coupon)) +
      geom_boxplot() +
      labs(x = "Tip Type", y = "Hardness", fill = "Coupon Type",
           title = "Boxplot of Hardness by Tip Type and Coupon Type")

    # Linear fit of hardness against tip, one line per coupon
    hardness %>%
      ggplot(aes(y = y, x = tip)) +
      geom_smooth(method = "lm", se = TRUE, aes(group = coupon, color = coupon)) +
      labs(x = "Tip Type", color = "Coupon Type", y = "Hardness",
           title = "Linear Models for Tip type Vs. Hardness")

    # Interaction-style line plot: one trace per coupon across tips
    hardness %>%
      ggplot(aes(y = y, x = tip)) +
      geom_line(aes(group = coupon, color = coupon)) +
      labs(x = "Tip Type", color = "Coupon Type", y = "Hardness",
           title = "Line Plots of Tip type Vs. Hardness, group by Coupon Type")

    # Scatterplot of hardness against tip, one panel per coupon
    hardness %>%
      ggplot(aes(x = tip, y = y)) +
      geom_point(aes(color = coupon)) +
      facet_wrap(~ coupon) +
      labs(x = "Tip Type", y = "Hardness", color = "Coupon Type",
           title = "Scatterplot for Tip type Vs. Hardness, Faceted by Coupon Type")

    `geom_smooth()` using formula 'y ~ x'
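As a cross-check on the plots above, the following is a minimal sketch of an interaction plot. It uses base R's `interaction.plot()` instead of the `ggplot()` calls in the cell, so it runs without extra packages. The name `cell_means` is introduced here for illustration only. The data are rebuilt from the assignment so the block is self-contained.

```r
# Rebuild the assignment's data so this block runs on its own.
tip    <- factor(rep(1:4, each = 4))
coupon <- factor(rep(1:4, times = 4))
y <- c(9.3, 9.4, 9.6, 10, 9.4, 9.3, 9.8, 9.9,
       9.2, 9.4, 9.5, 9.7, 9.7, 9.6, 10, 10.2)
hardness <- data.frame(y, tip, coupon)

# Cell means: rows are tips, columns are coupons.
# With one observation per cell, each "mean" is just that observation.
cell_means <- with(hardness,
                   tapply(y, list(tip = tip, coupon = coupon), mean))
print(cell_means)

# One trace per coupon across tip types. Roughly parallel traces
# suggest little tip-by-coupon interaction.
with(hardness,
     interaction.plot(x.factor = tip, trace.factor = coupon, response = y,
                      fun = mean, xlab = "Tip Type", ylab = "Hardness",
                      trace.label = "Coupon"))
```

Because each tip and coupon combination is observed only once, the traces show raw readings rather than averages, so any apparent crossing cannot be separated from measurement noise.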
2.0.2 1. (b) Interactions

Should we test for interactions between tip and coupon? Maybe there is an interaction between the different metals that goes beyond our current scientific understanding! Fit a linear model to the data with predictors tip and coupon, and an interaction between the two. Display the summary and explain why (or why not) an interaction term makes sense for this data.

[16]:
    # Your Code Here
    # Fit a linear model with interaction
    model <- lm(y ~ tip * coupon, data = hardness)
    summary(model)

    print("2nd Model")
    model <- lm(y ~ tip + coupon, data = hardness)
    summary(model)

    Call:
    lm(formula = y ~ tip * coupon, data = hardness)

    Residuals:
    ALL 16 residuals are 0: no residual degrees of freedom!

    Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
    (Intercept)    9.300e+00         NA      NA       NA
    tip2           1.000e-01         NA      NA       NA
    tip3          -1.000e-01         NA      NA       NA
    tip4           4.000e-01         NA      NA       NA
    coupon2        1.000e-01         NA      NA       NA
    coupon3        3.000e-01         NA      NA       NA
    coupon4        7.000e-01         NA      NA       NA
    tip2:coupon2  -2.000e-01         NA      NA       NA
    tip3:coupon2   1.000e-01         NA      NA       NA
    tip4:coupon2  -2.000e-01         NA      NA       NA
    tip2:coupon3   1.000e-01         NA      NA       NA
    tip3:coupon3  -3.758e-15         NA      NA       NA
    tip4:coupon3  -3.869e-15         NA      NA       NA
    tip2:coupon4  -2.000e-01         NA      NA       NA
    tip3:coupon4  -2.000e-01         NA      NA       NA
    tip4:coupon4  -2.000e-01         NA      NA       NA

    Residual standard error: NaN on 0 degrees of freedom
    Multiple R-squared: 1, Adjusted R-squared: NaN
    F-statistic: NaN on 15 and 0 DF, p-value: NA

    [1] "2nd Model"

    Call:
    lm(formula = y ~ tip + coupon, data = hardness)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.10000 -0.05625 -0.01250  0.03125  0.15000

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  9.35000    0.06236 149.934  < 2e-16 ***
    tip2         0.02500    0.06667   0.375 0.716345
    tip3        -0.12500    0.06667  -1.875 0.093550 .
    tip4         0.30000    0.06667   4.500 0.001489 **
    coupon2      0.02500    0.06667   0.375 0.716345
    coupon3      0.32500    0.06667   4.875 0.000877 ***
    coupon4      0.55000    0.06667   8.250 1.73e-05 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.09428 on 9 degrees of freedom
    Multiple R-squared: 0.938, Adjusted R-squared: 0.8966
    F-statistic: 22.69 on 6 and 9 DF, p-value: 5.933e-05

An interaction term does not make sense for this data. The first model, which includes the interaction, is saturated: it estimates 16 coefficients from 16 observations, so all residuals are 0, no residual degrees of freedom remain, and no standard errors, t values, or p-values can be computed. With only one observation per tip and coupon combination, an interaction cannot be separated from random error.

The second model, which includes predictors for "tip" and "coupon" without an interaction term, can be estimated and offers a more straightforward interpretation of the data. The intercept suggests that the expected hardness is around 9.35 when both "tip" and "coupon" are at their baseline levels (tip1 and coupon1). Among the coefficients for "tip," only "tip4" shows a statistically significant effect on hardness, with an increase of 0.3 units compared to the baseline. For "coupon," all levels except "coupon2" exhibit significant effects on hardness, with "coupon4" having the most substantial impact, a 0.55 unit increase. The model's multiple R-squared of 0.938 indicates that approximately 93.8% of the variability in hardness is explained by the predictors (adjusted R-squared: 0.8966).

2.0.3 1. (c) Contrasts

Let's take a look at the use of contrasts. Recall that a contrast takes the form $\sum_{i=1}^{t} c_i \mu_i = 0$, where $c = (c_1, \dots, c_t)$ is a constant vector and $\mu = (\mu_1, \dots, \mu_t)$ is a parameter vector (e.g., $\mu_i$ is the mean of the $i$th group).

We can note that $c = (1, -1, 0, 0)$ corresponds to the null hypothesis $H_0: \mu_2 - \mu_1 = 0$, where $\mu_1$ is the mean associated with tip1 and $\mu_2$ is the mean associated with tip2. The code below tests this hypothesis.

Repeat this test for the hypothesis $H_0: \mu_4 - \mu_3 = 0$. Interpret the results. What are your conclusions?

[17]:
    library(multcomp)
    lmod = lm(y ~ tip + coupon, data = hardness)
    fit.gh2 = glht(lmod, linfct = mcp(tip = c(1, -1, 0, 0)))

    # estimate of mu_2 - mu_1
    with(hardness, sum(y[tip == 2]) / length(y[tip == 2]) - sum(y[tip == 1]) / length(y[tip == 1]))

    0.0250000000000021

The code above fits a linear model (lmod) to the hardness data with predictors for "tip" and "coupon." It then sets up a general linear hypothesis test (glht) using the multcomp package to test whether the mean hardness for "tip2" equals the mean hardness for "tip1" under the lmod model. The estimated difference in means between "tip2" and "tip1" is approximately 0.025, so on average the hardness is slightly higher for "tip2" than for "tip1." This difference is not statistically significant: the tip2 coefficient in the additive model has p = 0.716.

For the requested hypothesis $H_0: \mu_4 - \mu_3 = 0$, the same calculation on the sample means gives 9.875 - 9.450 = 0.425, a much larger difference than the one between "tip2" and "tip1."

2.0.4 1. (d) All Pairwise Comparisons

What if we want to test all possible pairwise comparisons between treatments? This can be done by setting the treatment factor (tip) to "Tukey". Notice that the p-values are adjusted (because we are conducting multiple hypothesis tests!). Perform all possible Tukey pairwise tests. What are your conclusions?

[18]:
    # Your Code Here
    tukey <- TukeyHSD(aov(lmod))
    tukey

      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = lmod)

    $tip
          diff         lwr        upr     p adj
    2-1  0.025 -0.18311992 0.23311992 0.9809005
    3-1 -0.125 -0.33311992 0.08311992 0.3027563
    4-1  0.300  0.09188008 0.50811992 0.0066583
    3-2 -0.150 -0.35811992 0.05811992 0.1815907
    4-2  0.275  0.06688008 0.48311992 0.0113284
    4-3  0.425  0.21688008 0.63311992 0.0006061

    $coupon
         diff         lwr       upr     p adj
    2-1 0.025 -0.18311992 0.2331199 0.9809005
    3-1 0.325  0.11688008 0.5331199 0.0039797
    4-1 0.550  0.34188008 0.7581199 0.0000830
    3-2 0.300  0.09188008 0.5081199 0.0066583
    4-2 0.525  0.31688008 0.7331199 0.0001200
    4-3 0.225  0.01688008 0.4331199 0.0341762

The Tukey multiple comparisons of means on the "tip" and "coupon" factors lead to the following conclusions at the 5% family-wise level.

For the "tip" factor, "tip4" differs significantly from each of the other tips: "tip4" vs "tip1" (p adj = 0.0067), "tip4" vs "tip2" (p adj = 0.0113), and "tip4" vs "tip3" (p adj = 0.0006). The comparisons among "tip1," "tip2," and "tip3" are not statistically significant.

For the "coupon" factor, every pair differs significantly except "coupon2" vs "coupon1" (p adj = 0.981): "coupon3" vs "coupon1," "coupon4" vs "coupon1," "coupon3" vs "coupon2," "coupon4" vs "coupon2," and "coupon4" vs "coupon3" all have adjusted p-values below 0.05, indicating real variation in hardness across coupons.

3 Problem 2: Ethics in my Math Class!

In your own words, answer the following questions:
• What is power, in the statistical context?
• Why is power important?
• What are potential consequences of ignoring/not including power calculations in statistical analyses?

Power in statistics refers to the probability that a statistical test will correctly reject a null hypothesis when the alternative hypothesis is true. In simpler terms, it's the ability of a test to detect an effect or difference if it exists in the population being studied. Power is a crucial concept because it directly impacts the reliability and accuracy of statistical analyses. Here's a more detailed breakdown:

1. Definition of power: Power quantifies the likelihood of finding a statistically significant result in a study, given that a true effect or difference exists. It's calculated as 1 minus the probability of a Type II error, which occurs when the test fails to detect a true effect (i.e., a false negative).

2. Why power is important: Power directly determines how sensitive a statistical test is to effects or differences in the population being studied. A high-power test is more likely to detect even small but meaningful effects, which is essential in fields such as medicine, where subtle treatment effects can have significant implications. Power also affects the validity of study results, resource efficiency, and ethical considerations. Adequate power helps ensure that research outcomes are trustworthy, that resources are used effectively, and that participants are not exposed to risk in a study too small to answer its question.

3. Consequences of ignoring power: The primary consequence is missed discoveries, because low power increases the risk of overlooking genuine effects or differences. This can lead to incomplete or misleading research outcomes and a false sense of security in null findings. Ignoring power can also waste resources, since an underpowered study may need to be repeated or enlarged before it yields meaningful results.
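To make the link between sample size and power concrete, here is a minimal sketch using `power.t.test()` from base R's stats package. The effect size `delta` and standard deviation `sd` are hypothetical planning values chosen for illustration. They are not taken from the assignment.

```r
# Hypothetical planning values (assumptions, not from the assignment):
# a true mean difference of 0.3 units and a standard deviation of 0.2.
delta <- 0.3
sd    <- 0.2

# Power of a two-sample t-test with 4 observations per group, alpha = 0.05.
small <- power.t.test(n = 4, delta = delta, sd = sd, sig.level = 0.05)

# Same effect and noise, but 16 observations per group.
large <- power.t.test(n = 16, delta = delta, sd = sd, sig.level = 0.05)

# Solve for the per-group sample size that reaches 80% power.
needed <- power.t.test(power = 0.80, delta = delta, sd = sd, sig.level = 0.05)

cat("Power with n = 4 per group: ", round(small$power, 3), "\n")
cat("Power with n = 16 per group:", round(large$power, 3), "\n")
cat("n per group for 80% power:  ", ceiling(needed$n), "\n")
```

Running this kind of calculation before collecting data shows whether a planned design has a realistic chance of detecting the effect of interest, which is the ethical point raised in the answers above.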
4 Problem 3: Post-Hoc Tests

There are so many different post-hoc tests! Let's try to understand them better. Answer the following questions in the markdown cell:
• Why are there multiple post-hoc tests?
• When would we choose to use Tukey's Method over the Bonferroni correction, and vice versa?
• Do some outside research on other post-hoc tests. Explain what the method is and when it would be used.

1. Post-hoc tests address the familywise error problem that arises when many comparisons are made, and several correction methods are available. No single approach is universally superior, so the choice of test depends on the research context and priorities.

2. The Bonferroni correction has both advantages and drawbacks compared to Tukey's method. It is conservative: the actual familywise error rate falls somewhat below the desired cutoff, which can be helpful or limiting depending on the situation and the number of tests conducted. As the number of tests increases, Bonferroni becomes more conservative and loses power relative to Tukey for the same set of comparisons. Bonferroni suits a limited number of simultaneous hypotheses chosen in advance, whereas Tukey automatically covers all pairwise comparisons, which can be a large number for factors with many levels. Tukey is particularly effective when all pairwise comparisons are of interest, and it controls the familywise error rate at the chosen alpha using the Studentized range distribution.

3. Tukey's HSD test is a post-hoc test commonly used after ANOVA to identify which specific groups or treatments differ significantly from each other. It compares all possible pairs of means and determines whether the difference between any two means is statistically significant. Tukey's HSD is robust and widely used, especially when the sample sizes are equal. It helps researchers understand which group differences drive a significant ANOVA result.
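The Tukey and Bonferroni adjustments can be compared side by side on the hardness data from Problem 1. The sketch below uses base R only. The Bonferroni p-values are computed by hand from the additive model's residual variance, which is one reasonable way to do it and not necessarily how the course intends. The names `fit`, `raw_p`, `bonf_p`, and `tukey_p` are introduced here for illustration.

```r
# Rebuild the assignment's data so this block runs on its own.
tip    <- factor(rep(1:4, each = 4))
coupon <- factor(rep(1:4, times = 4))
y <- c(9.3, 9.4, 9.6, 10, 9.4, 9.3, 9.8, 9.9,
       9.2, 9.4, 9.5, 9.7, 9.7, 9.6, 10, 10.2)
hardness <- data.frame(y, tip, coupon)

# Additive model with coupon as a block and tip as the treatment.
fit <- aov(y ~ coupon + tip, data = hardness)

# Tukey-adjusted p-values for all six pairwise tip comparisons.
tukey_tab <- TukeyHSD(fit, which = "tip")$tip
diffs     <- tukey_tab[, "diff"]
tukey_p   <- tukey_tab[, "p adj"]

# Unadjusted t-tests on the same differences, using the model's
# residual variance and 4 observations per tip.
mse   <- deviance(fit) / df.residual(fit)
se    <- sqrt(2 * mse / 4)
raw_p <- 2 * pt(abs(diffs / se), df = df.residual(fit), lower.tail = FALSE)

# Bonferroni: multiply each raw p-value by the number of tests (capped at 1).
bonf_p <- p.adjust(raw_p, method = "bonferroni")

print(round(cbind(diff = diffs, tukey = tukey_p, bonferroni = bonf_p), 4))
```

Printing the two columns next to each other makes it easy to see how the adjustments differ when every pairwise comparison is tested.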