Assignment 3 (5%)

School

Western University

Course

2143

Subject

Statistics

Date

Apr 3, 2024

4/30/2021 Assignment 3 (5%) https://owl.uwo.ca/access/content/attachment/9155fbbb-4ff5-4431-b269-d36c50cda88c/Announcements/186d6c5a-0960-49ac-9edc-11a3b367dbf6/So…

Instructions

Submit one PDF document per team with the names and student numbers of all members. The project is due Friday, April 2 (10:00PM), and is to be submitted via Gradescope.

In this assignment you will use a sample of top chess players to conduct a variety of parametric hypothesis tests. Adapted from data published in August 2020 by the International Chess Federation (FIDE), the dataset “project3_data” provides the following data for 1987 players:

  • Country
  • Gender
  • FIDE title
  • Name
  • Standard game rating (>90 minute game)
  • Rapid rating (10 to 60 minutes)
  • Blitz rating (<10 minutes)

For the purposes of the assignment, the original dataset was modified to keep titled players from India, Russia and the United States only.

Answer each of the questions below with full sentences accompanied by reproducible code from the software of your choice (e.g. Excel, RStudio, Python, WolframAlpha). Report answers with software precision. Conduct all hypothesis tests at the 95% confidence level.

    # Import data and load packages
    data <- read.csv("~/ss2143/project3/project3_data.csv")
    attach(data)

Question 1 (9 points):

Are the averages of the Rapid ratings across the three countries equal? Identify the null and the alternative hypotheses (1 point), the test statistic (1 point), the rejection region (1 point), and the conclusion (1 point).

Answer

To compare more than two means, we use a single-factor ANOVA. We denote the mean Rapid game rating for India, Russia, and the United States as $\mu_{IND}$, $\mu_{RUS}$, and $\mu_{USA}$, respectively. The null and alternative hypotheses are

$H_0: \mu_{IND} = \mu_{RUS} = \mu_{USA}$
$H_a:$ at least two of the means differ.
    # Subsamples
    xIND <- Rapid[Country=='IND']
    xRUS <- Rapid[Country=='RUS']
    xUSA <- Rapid[Country=='USA']
    # Number of countries
    ncountries <- length(unique(Country))
    # Number of observations
    nobs <- length(Rapid)
    nobsIND <- length(xIND)
    nobsRUS <- length(xRUS)
    nobsUSA <- length(xUSA)
    # Sums of squares
    SST <- sum(xIND)^2/nobsIND + sum(xRUS)^2/nobsRUS + sum(xUSA)^2/nobsUSA - sum(Rapid)^2/nobs
    SSErr <- sum(Rapid^2) - (sum(xIND)^2/nobsIND + sum(xRUS)^2/nobsRUS + sum(xUSA)^2/nobsUSA)
    # Mean squares
    MST <- SST/(ncountries - 1)
    MSErr <- SSErr/(nobs - ncountries)
    # Test statistic
    Fstat <- MST/MSErr
    # Critical value
    Fcritval <- qf(0.95, ncountries - 1, nobs - ncountries)

The test statistic for a single-factor ANOVA is the ratio of the mean square for treatments (MST) to the mean square for error (MSErr), $F = \mathrm{MST}/\mathrm{MSErr}$, resulting in 82.7448513. The rejection region at the 95% level is any value greater than 3.0002602. Since the test statistic is in the rejection region, we reject the null hypothesis according to which the mean rating for Rapid games is equal across India, Russia, and the United States.

If the countries are statistically different, which country outperforms the other two (1 point)? Is that country's average Rapid rating greater than the average of the other two countries? Identify the null and the alternative hypotheses (1 point), calculate the test statistic (1 point), state the rejection region (1 point), and draw the conclusion (1 point).

Answer

    xbarIND <- mean(xIND)
    xbarRUS <- mean(xRUS)
    xbarUSA <- mean(xUSA)

The sample averages for Rapid game ratings are 2059, 2227.2023653, and 2308.1156463 for India, Russia, and the United States, respectively. The United States therefore seems to outperform the other two countries in Rapid chess.
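The hand-computed single-factor ANOVA can be cross-checked against R's built-in routines. The following is a minimal sketch on simulated ratings: the data frame `sim`, its group sizes, and its means are hypothetical and are not the assignment data. It repeats the shortcut sums-of-squares formulas and compares the resulting F statistic with the one reported by `anova(lm(...))`.

```r
# Simulated (hypothetical) ratings for three countries; not the assignment data
set.seed(1)
sim <- data.frame(
  Country = rep(c("IND", "RUS", "USA"), times = c(30, 50, 20)),
  Rapid   = c(rnorm(30, 2050, 150), rnorm(50, 2200, 150), rnorm(20, 2300, 150))
)

# Group totals and group sizes
grp_sums <- tapply(sim$Rapid, sim$Country, sum)
grp_n    <- tapply(sim$Rapid, sim$Country, length)
k <- length(grp_n)   # number of groups
n <- nrow(sim)       # total number of observations

# Shortcut formulas for the treatment and error sums of squares
SST   <- sum(grp_sums^2 / grp_n) - sum(sim$Rapid)^2 / n
SSErr <- sum(sim$Rapid^2) - sum(grp_sums^2 / grp_n)

# F statistic computed by hand
Fmanual <- (SST / (k - 1)) / (SSErr / (n - k))

# F statistic from the built-in ANOVA table
Fbuiltin <- anova(lm(Rapid ~ Country, data = sim))[["F value"]][1]

# Critical value at the 95% level
Fcrit <- qf(0.95, k - 1, n - k)
```

If `Fmanual` and `Fbuiltin` agree up to rounding, the manual formulas were likely transcribed correctly; the same comparison could be run on the real data with `Rapid` and `Country`.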
Let us test whether the United States has a greater average Rapid game rating than the other two countries. To compare two means, we use a two-sample $t$-test. We denote the mean Rapid rating across India and Russia as $\mu_{IR}$. The null and alternative hypotheses are

$H_0: \mu_{USA} = \mu_{IR}$
$H_a: \mu_{USA} > \mu_{IR}$

    # Sample of non-USA ratings
    xIR <- Rapid[Country!='USA']
    # Sample size
    nobsIR <- length(xIR)
    # Average
    xbarIR <- mean(xIR)
    # Standard deviations
    sdUSA <- sd(xUSA)
    sdIR <- sd(xIR)
    # Test statistic
    Tstat <- (xbarUSA - xbarIR)/sqrt(sdUSA^2/nobsUSA + sdIR^2/nobsIR)
    # Degrees of freedom
    dof <- (sdUSA^2/nobsUSA + sdIR^2/nobsIR)^2 /
      ((sdUSA^2/nobsUSA)^2/(nobsUSA-1) + (sdIR^2/nobsIR)^2/(nobsIR-1))
    # Critical value
    Tcritval <- qt(0.95, dof)

The test statistic for the two-sample $t$-test is 5.1743882. The rejection region at the 95% level for a one-sided test is any value greater than 1.6539298. This value corresponds to the 95th percentile of a $t$-distribution with 168.8182402 degrees of freedom. Since the test statistic is in the rejection region, we reject the null hypothesis according to which the mean rating for Rapid games is equal between the United States and the other two countries.

Question 2 (3 points):

Do players tend to have higher ratings in Standard games than in Rapid games? Identify the null and the alternative hypotheses (1 point), calculate the p-value (1 point), and draw the conclusion (1 point).

Answer

To compare the ratings of Standard and Rapid games, we use the paired $t$-test. We denote the mean difference between Standard and Rapid games as $\mu_D = \mu_S - \mu_R$, where $\mu_S$ and $\mu_R$ denote average Standard and Rapid ratings. The null and alternative hypotheses are

$H_0: \mu_D = 0$
$H_a: \mu_D > 0$
    # Sample of differences
    diff <- Standard - Rapid
    # Test statistic
    Tstat <- mean(diff)/sd(diff)*sqrt(nobs)
    # P-value
    pvalue <- 1-pt(Tstat, nobs-1)

The test statistic for the paired $t$-test is 23.8802265. The p-value can be defined as the smallest $\alpha$ such that the null hypothesis would be rejected at the $1-\alpha$ confidence level. This corresponds to defining a critical value equal to the test statistic. The null hypothesis would therefore be rejected if $t_{\alpha, 1986} \le$ 23.8802265. This value is the quantile associated with probability $1-\alpha$, where $\alpha$ = 0 (to software precision) for a Student’s $t$ distribution with 1986 degrees of freedom. Since the p-value is less than 5%, we reject the null hypothesis according to which the mean rating for Rapid games is equal to the mean rating for Standard games.

Question 3 (4 points):

Is the proportion of Grandmasters (GM) among Indian players different from the proportion of GMs among Russian players? Identify the null and the alternative hypotheses (1 point), calculate the p-value (1 point), and draw the conclusion (1 point).

Answer

We denote the proportion of GMs among Indian and Russian players as $p_{IND}$ and $p_{RUS}$, respectively. To test the difference between two proportions, the null and alternative hypotheses are

$H_0: p_{IND} = p_{RUS}$
$H_a: p_{IND} \neq p_{RUS}$

    # Sample proportions
    pIND <- mean(Title[Country=='IND']=='GM')
    pRUS <- mean(Title[Country=='RUS']=='GM')
    # Variances
    vIND <- pIND*(1-pIND)/nobsIND
    vRUS <- pRUS*(1-pRUS)/nobsRUS
    # Test statistic
    Zstat <- (pIND-pRUS)/sqrt(vIND+vRUS)
    # P-value
    pvalue <- 2*(1-pnorm(abs(Zstat)))

The test statistic for the two-sample $z$-test for proportions is 2.4852422.
The p-value can be defined as the smallest $\alpha$ such that the null hypothesis would be rejected at the $1-\alpha$ confidence level. This corresponds to defining a critical value equal to the test statistic. The null hypothesis would therefore be rejected if $z_{\alpha/2} \le$ 2.4852422. This value is the standard normal quantile associated with probability $1-\alpha/2$, where $\alpha$ = 0.0129463. Since the p-value is less than 5%, we reject the null hypothesis according to which the proportions of GMs among Indian and Russian players are equal.

Alternatively, one can use the large sample testing procedure, which pools the two samples to estimate the common proportion under the null hypothesis, as outlined below.

    # Proportion of GMs across Indian and Russian players
    phat <- mean(Title[Country=='IND'|Country=='RUS']=='GM')
    # Large sample procedure test statistic
    Zstat2 <- (pIND-pRUS)/sqrt(phat*(1-phat)*(1/nobsIND + 1/nobsRUS))
    # P-value
    pvalue2 <- 2*(1-pnorm(abs(Zstat2)))

The test statistic for the two-sample $z$-test for proportions is 2.7366541. The corresponding p-value is 0.0062068.

What is the power of this test given the sample proportions? In other words, evaluate $1 - \beta(p_{IND}, p_{RUS})$ at $p_{IND} = \hat{p}_{IND}$ and $p_{RUS} = \hat{p}_{RUS}$ (1 point).

Answer

The two-sided test has type II error probability

$$\beta(p_{IND}, p_{RUS}) = \Phi\left(\frac{z_{\alpha/2}\sqrt{\bar{p}\,\bar{q}\,(1/n_{IND} + 1/n_{RUS})} - (p_{IND} - p_{RUS})}{\sigma}\right) - \Phi\left(\frac{-z_{\alpha/2}\sqrt{\bar{p}\,\bar{q}\,(1/n_{IND} + 1/n_{RUS})} - (p_{IND} - p_{RUS})}{\sigma}\right)$$

where $\bar{p} = (n_{IND}\,p_{IND} + n_{RUS}\,p_{RUS})/(n_{IND} + n_{RUS})$, $\bar{q} = (n_{IND}(1-p_{IND}) + n_{RUS}(1-p_{RUS}))/(n_{IND} + n_{RUS})$, $n_{IND}$ and $n_{RUS}$ are the two sample sizes, and $\sigma = \sqrt{p_{IND}(1-p_{IND})/n_{IND} + p_{RUS}(1-p_{RUS})/n_{RUS}}$. The power of the test is $1 - \beta$.

    barp <- (nobsIND*pIND + nobsRUS*pRUS)/(nobsIND + nobsRUS)
    barq <- (nobsIND*(1-pIND) + nobsRUS*(1-pRUS))/(nobsIND + nobsRUS)
    sigma <- sqrt(pIND*(1-pIND)/nobsIND + pRUS*(1-pRUS)/nobsRUS)
    zalpha <- qnorm(0.975)
    # Power calculation
    beta <- pnorm((zalpha*sqrt(barp*barq*(1/nobsIND + 1/nobsRUS)) - (pIND-pRUS))/sigma) -
      pnorm((-zalpha*sqrt(barp*barq*(1/nobsIND + 1/nobsRUS)) - (pIND-pRUS))/sigma)
    power <- 1 - beta

If the sample proportions are good approximations of the true population proportions, the power of the test is approximately 0.7597097. The power measures the probability of correctly rejecting the null hypothesis (i.e. rejecting the null hypothesis when it should be rejected).
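The power calculation can be wrapped in a small function so that it is easy to evaluate at other proportions or sample sizes. This is only a sketch: the function name `power_two_prop`, its argument names, and the example inputs are hypothetical and are not taken from the assignment data. A useful sanity check is that when the two proportions are equal, the power collapses to the significance level.

```r
# Approximate power of the two-sided large-sample z-test for two proportions.
# p1, p2: assumed true proportions; n1, n2: sample sizes; alpha: significance level.
power_two_prop <- function(p1, p2, n1, n2, alpha = 0.05) {
  barp   <- (n1 * p1 + n2 * p2) / (n1 + n2)   # weighted average proportion
  barq   <- 1 - barp
  sigma  <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  zalpha <- qnorm(1 - alpha / 2)
  se0    <- sqrt(barp * barq * (1 / n1 + 1 / n2))  # standard error under H0
  beta   <- pnorm(( zalpha * se0 - (p1 - p2)) / sigma) -
            pnorm((-zalpha * se0 - (p1 - p2)) / sigma)
  1 - beta
}

# Hypothetical inputs, chosen only for illustration
power_equal <- power_two_prop(0.30, 0.30, 200, 300)  # no true difference
power_diff  <- power_two_prop(0.30, 0.40, 200, 300)  # true difference of 0.10
```

With the assignment's variables, the call would be `power_two_prop(pIND, pRUS, nobsIND, nobsRUS)`.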
Question 4 (4 points):
Are the variances of Blitz ratings different for males and females? Identify the null and the alternative hypotheses (1 point), calculate the test statistic (1 point), state the rejection region (1 point), and draw the conclusion (1 point).

Answer

The sample variances for female and male players are denoted $s_F^2$ and $s_M^2$, and population variances for female and male players are denoted $\sigma_F^2$ and $\sigma_M^2$. To test the difference between two variances, the null and alternative hypotheses are

$H_0: \sigma_F^2 = \sigma_M^2$
$H_a: \sigma_F^2 \neq \sigma_M^2$

    # Sample variances
    vMale <- var(Blitz[Gender=='M'])
    vFemale <- var(Blitz[Gender=='F'])
    # Sample size
    nobsMale <- sum(Gender=='M')
    nobsFemale <- sum(Gender=='F')
    # Test statistic
    Fstat <- vMale/vFemale
    # Critical values (degrees of freedom are the sample sizes minus one)
    Fcritval <- qf(c(0.025,0.975), nobsMale - 1, nobsFemale - 1)

The sample variances of women's and men’s Blitz ratings are computed as `vFemale` and `vMale`, respectively. The test statistic for a comparison of variances is $F = \frac{s_M^2/\sigma_M^2}{s_F^2/\sigma_F^2}$. Under the null hypothesis, we have $\sigma_M^2 = \sigma_F^2$, which then yields $F = s_M^2/s_F^2$ = 0.7202129. The rejection region at the 95% level is any value outside of the interval (0.8649173, 1.1631673). These bounds were obtained as the 2.5th and 97.5th percentiles of an $F$-distribution with 1532 and 455 degrees of freedom, i.e. the two sample sizes. Strictly, the degrees of freedom are the sample sizes minus one (1531 and 454), as in the code above; this changes the critical values only marginally and does not affect the conclusion. Since the test statistic is in the rejection region, we reject the null hypothesis according to which the variances of Blitz ratings are equal for female and male players.

Alternatively, one can invert the ratio when computing the test statistic. One must then swap the degrees of freedom of the distribution when computing the critical values for the rejection region.

    # Test statistic
    Fstat2 <- vFemale/vMale
    # Critical values
    Fcritval2 <- qf(c(0.025,0.975), nobsFemale - 1, nobsMale - 1)

This yields a test statistic of 1.3884782. The rejection region is any value outside of the interval (0.8597216, 1.15618), again as obtained with the sample sizes as degrees of freedom. The conclusion of the test is the same.
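The comparison of two variances can also be cross-checked with R's built-in `var.test`, which uses the sample sizes minus one as degrees of freedom. The sketch below runs on simulated ratings: the vectors `blitz_m` and `blitz_f`, their sizes, and their parameters are hypothetical and are not the assignment data.

```r
# Simulated (hypothetical) Blitz ratings; not the assignment data
set.seed(2)
blitz_m <- rnorm(40, mean = 2200, sd = 150)
blitz_f <- rnorm(25, mean = 2100, sd = 180)

# Manual F test for equality of variances
df1   <- length(blitz_m) - 1
df2   <- length(blitz_f) - 1
Fstat <- var(blitz_m) / var(blitz_f)
crit  <- qf(c(0.025, 0.975), df1, df2)   # two-sided rejection bounds
lower <- pf(Fstat, df1, df2)             # lower-tail probability
pval  <- 2 * min(lower, 1 - lower)       # two-sided p-value

# Built-in equivalent
vt <- var.test(blitz_m, blitz_f)

# Decision at the 95% level from the rejection region
reject_manual <- Fstat < crit[1] || Fstat > crit[2]
```

Comparing `Fstat`, `pval`, and the degrees of freedom with `vt$statistic`, `vt$p.value`, and `vt$parameter` gives a quick check on the manual computation; on the real data the call would be `var.test(Blitz[Gender=='M'], Blitz[Gender=='F'])`.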
Remarks

Drawing conclusions from hypothesis tests

A common mistake when drawing conclusions from hypothesis tests is to confirm one of the two hypotheses. One never accepts the null hypothesis, but only fails to reject it given a predetermined confidence level. Similarly, rejecting the null hypothesis with a predetermined confidence level is not the same thing as saying that the alternative hypothesis is true. If we think of hypothesis tests as a means for scientific inquiry, we are measuring the strength of the evidence for a claim/hypothesis. No matter how strong the evidence, you can never be assertive when you draw conclusions from statistical inference. Falsifiability is a fundamental principle of the philosophy of science, which is why theories are never confirmed, but only unrefuted.

Here are some examples of incorrect conclusions:

  • $H_0$ is accepted.
  • $H_a$ is accepted.
  • $H_0$ is rejected so $H_a$ is true.
  • $H_0$ is not rejected so $H_0$ is true.
  • $H_0$ is rejected in favor of $H_a$.
  • $H_a$ is rejected in favor of $H_0$.
  • $H_0$ is incorrect.
  • $H_a$ is incorrect.

Here are some examples of correct conclusions:

  • $H_0$ is rejected at the $\alpha$ significance level. The data gives strong support for $H_a$.
  • $H_0$ is not rejected at the $\alpha$ significance level. The data does not give strong support for $H_a$.

Reporting reproducible code

Many students are decidedly reluctant to include reproducible code in their answers. Consistent with the previous assignments’ marking scheme, one point was deducted for each question where instructions were ignored. Software commands that refer to undefined values are not considered reproducible (e.g. calculating a formula with the value in cell Z9 in Excel, where the value in cell Z9 is never defined).