Assignment 3 STAT2040

School: Toronto Metropolitan University
Course: 324
Subject: Statistics
Date: Feb 20, 2024
Uploaded by SargentEnergy8424

Part A: A Bit on t-Tests

The value of the t-test statistic I obtained is -2.5737, and the p-value is 0.9898. Because this p-value is far greater than 0.05, we fail to reject the null hypothesis at the 5% significance level: there is not sufficient evidence to conclude that the 'Before' measurements are, on average, greater than the 'After' measurements. The 95% confidence interval I obtained for the mean difference is (-15.898, -1.537). Failing to reject this one-sided null hypothesis does not mean the difference is zero. The t-statistic is negative and the interval lies entirely below 0, so the observed difference points in the opposite direction to the alternative hypothesis.

PART B: One-Way ANOVA

B1:
B2: The total SS I calculated is 15.2161 + 6.2731 = 21.49.

B3: The p-value for the F test statistic is 9.848996e-10. This is far smaller than the 5% significance level (α = 0.05), so we reject the null hypothesis that all breed means are equal.

B4: Based on this analysis, the Jersey breed tends to have the highest butterfat percentage, while the Holstein-Friesian breed appears to have the lowest. Tukey's HSD test shows which pairs of breed means differ significantly. A pair whose adjusted p-value is less than 0.05 has significantly different means. The underscore diagram summarizes the same result visually by underlining groups of breeds whose means are not significantly different, which helps show how butterfat percentage varies across the breeds.
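The sums-of-squares identity used in B2, the F statistic behind B3, and the Tukey HSD comparison in B4 can be sketched in plain Python. This is a minimal sketch, assuming equal group sizes for the Tukey step. The toy data and the `q_crit` values are hypothetical placeholders, not the butterfat data or real studentized range table values, and the function names are illustrative only.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """Return (SS_between, SS_within, SS_total, F) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups SS: group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (error) SS: squared distances from each group's own mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total SS: squared distances from the grand mean (equals between + within)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    k, n_total = len(groups), len(all_values)
    ms_between = ss_between / (k - 1)
    mse = ss_within / (n_total - k)
    return ss_between, ss_within, ss_total, ms_between / mse


def tukey_hsd_pairs(means, n_per_group, mse, q_crit):
    """Pairs of group labels whose means differ by more than the HSD.

    Assumes equal group sizes. q_crit is the studentized range critical
    value for the chosen familywise error rate, supplied by the caller.
    """
    hsd = q_crit * sqrt(mse / n_per_group)
    return {
        (a, b)
        for a, b in combinations(sorted(means), 2)
        if abs(means[a] - means[b]) > hsd
    }


if __name__ == "__main__":
    # Hypothetical toy data, two groups of three observations
    print(one_way_anova([[1, 2, 3], [4, 5, 6]]))
    # The B2 total: SS from the two rows of the ANOVA table
    print(round(15.2161 + 6.2731, 2))
    # Hypothetical means; a smaller q_crit (larger familywise rate) flags more pairs
    toy_means = {"A": 1.0, "B": 2.0, "C": 4.0}
    print(tukey_hsd_pairs(toy_means, 10, 1.0, q_crit=3.0))
    print(tukey_hsd_pairs(toy_means, 10, 1.0, q_crit=4.0))
```

Because each pair is flagged exactly when its mean difference exceeds q_crit × sqrt(MSE/n), lowering q_crit can only add pairs, which is the mechanism behind the answer to B5.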
B5: If Tukey's HSD procedure had used a 10% familywise error rate instead of 5%, the critical value of the studentized range would be smaller, so the honestly significant difference would shrink and the simultaneous confidence intervals would be narrower. As a result, at least as many pairs of breed means, and possibly more, would be declared significantly different, which could expand the conclusions about which breeds differ. The trade-off is a higher chance of at least one false positive across the family of comparisons.

B6:

B7: One possible violation is that the boxes in the boxplots differ somewhat in size across breeds. Since the length of each box is the interquartile range, this suggests that the spread differs between breeds and raises the possibility that the homogeneity of variance assumption has been violated. The boxplots also point to a second possible violation: the boxes for some breeds are asymmetric, which suggests that their butterfat percentage distributions are skewed. This may be a somewhat more serious concern, as it indicates a departure from the ANOVA normality assumption.

B8:
MSE = 0.174

PART C: Simple Linear Regression Analysis

C1: The least squares regression equation fitted to the data is

Duration.jumping.s = Intercept + Slope × No.jumps
Duration.jumping.s = 2.1532 + 1.4882 × No.jumps

C2: The critical value for a 95% confidence interval for both coefficients, with 24 degrees of freedom, is approximately 2.064, taken from the t-distribution table.

For the slope (No.jumps): Estimate = 1.4882, SE = 0.1187
For the y-intercept (Intercept): Estimate = 2.1532, SE = 3.1716

Formula: Confidence interval = Estimate ± Critical value × SE

Slope: 1.4882 ± 2.064 × 0.1187 = (1.243, 1.733)
Intercept: 2.1532 ± 2.064 × 3.1716 = (−4.393, 8.699)

C3: Duration.jumping.s = Intercept + Slope × No.jumps
Intercept: 2.1532
Slope: 1.4882
Duration.jumping.s = 2.1532 + 1.4882 × 33 = 51.26

Therefore, the predicted value of Duration.jumping.s when No.jumps is 33 is approximately 51.26.

C4: Residual = (actual y value) − (predicted y value). The residual for the second-to-last observation in the data set is −4.965.

C5: The value of the correlation coefficient is 0.931.

C6: This value corresponds to the residual sum of squares in the ANOVA table, i.e. the "Sum Sq" entry in the Residuals row.

C7: Each t-value in the summary table tests a pair of hypotheses about one coefficient. For the No.jumps coefficient:

Null hypothesis: the slope coefficient for No.jumps equals zero, i.e. there is no linear relationship between the predictor variable (No.jumps) and the response variable (Duration.jumping.s).
Alternative hypothesis: the slope coefficient for No.jumps is not equal to zero, i.e. there is a significant linear relationship between No.jumps and Duration.jumping.s.

The t-value for the intercept tests, in the same way, whether the intercept equals zero.

At the 5% level of significance, a p-value below 0.05 is strong evidence against the null hypothesis, so we would reject it and conclude that there is a statistically significant relationship between No.jumps and Duration.jumping.s. If the p-value is greater than 0.05, there isn't enough evidence to reject the null hypothesis, and we could not conclude that such a relationship exists. The summary table gives a p-value of 5.01e-12 for No.jumps, far below 0.05, so we reject the null hypothesis and conclude that there is a significant relationship between No.jumps and Duration.jumping.s.

C8: The F-test statistic from the ANOVA table and the t-test for the No.jumps coefficient both test the significance of the relationship between the predictor (No.jumps) and the response (Duration.jumping.s). In simple linear regression they are equivalent: the F-value equals the square of the t-value for the slope, and the two tests give the same p-value. Here the t-value for No.jumps is 12.539, and 12.539² ≈ 157.2, which matches the F-value of 157.22. In general, the F-test assesses the significance of the overall model, while each t-test assesses an individual coefficient. With only one predictor these amount to the same test; they differ only when the model has several predictors.
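The arithmetic in C2, C3, C4, C5 and C8 can be checked with a short Python sketch. It only reuses the estimates, standard errors, critical value and t-value reported above and does not refit the model, since the raw data are not reproduced here. The helper names are hypothetical. The relation r² = t² / (t² + df) assumes simple linear regression with a single predictor, and the sign of r follows the sign of the slope (positive here).

```python
from math import sqrt

# Values reported in C1, C2 and C8 of this assignment
INTERCEPT, SLOPE = 2.1532, 1.4882
SE_INTERCEPT, SE_SLOPE = 3.1716, 0.1187
T_CRIT = 2.064      # t critical value with 24 df, as used in C2
DF_RESIDUAL = 24
T_SLOPE = 12.539    # t-value for No.jumps from the summary table


def conf_interval(estimate, se, t_crit=T_CRIT):
    """Estimate ± critical value × SE."""
    margin = t_crit * se
    return estimate - margin, estimate + margin


def predict(no_jumps):
    """Fitted Duration.jumping.s for a given No.jumps."""
    return INTERCEPT + SLOPE * no_jumps


def residual(actual, no_jumps):
    """Residual = actual y value minus predicted y value."""
    return actual - predict(no_jumps)


def f_from_t(t):
    """With one predictor, the ANOVA F statistic equals t squared."""
    return t ** 2


def r_from_t(t, df):
    """|r| recovered from the slope t-statistic: r^2 = t^2 / (t^2 + df)."""
    return sqrt(t ** 2 / (t ** 2 + df))


if __name__ == "__main__":
    print("slope CI:", [round(v, 3) for v in conf_interval(SLOPE, SE_SLOPE)])
    print("intercept CI:", [round(v, 3) for v in conf_interval(INTERCEPT, SE_INTERCEPT)])
    print("prediction at 33 jumps:", round(predict(33), 2))
    # Slightly different from the reported 12.539 because the inputs are rounded
    print("t from estimate/SE:", round(SLOPE / SE_SLOPE, 3))
    print("F = t^2:", round(f_from_t(T_SLOPE), 2))
    print("r:", round(r_from_t(T_SLOPE, DF_RESIDUAL), 3))
```

Recovering r from t avoids needing the raw data. With the full data set, the correlation could be computed directly from the two columns instead.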