Dasgupta_Module6HW

School: Northeastern University
Course: 7340
Subject: Statistics
Date: Jan 9, 2024

Module 6 Homework

Problem 1

We have the Golub data set and consider the H4/j gene from row 2972 and the APS Prostate Specific Antigen gene from row 2989.

(a) We can set up the null and alternative hypotheses as follows:

H0: The mean H4/j gene expression value in ‘ALL’ patients is -0.9.
HA: The mean H4/j gene expression value in ‘ALL’ patients is greater than -0.9.

To test the hypotheses, we apply a one-sided, one-sample t-test:

    t.test(H4j_data_ALL, alternative="greater", mu=-0.9)

            One Sample t-test

    data:  H4j_data_ALL
    t = 2.2659, df = 26, p-value = 0.01601
    alternative hypothesis: true mean is greater than -0.9
    95 percent confidence interval:
     -0.844439       Inf
    sample estimates:
     mean of x
    -0.6753033

Since the p-value of 0.01601 is smaller than 0.05, data like these would be surprising if H0 were true, so the result is statistically significant at the 5% level. Hence, we reject the null hypothesis in favor of the alternative: the mean H4/j expression value in ‘ALL’ patients is greater than -0.9.

(b) We can set up the null and alternative hypotheses as follows:

H0: The mean H4/j gene expression value in the ‘ALL’ group is the same as that in the ‘AML’ group.
HA: The mean H4/j gene expression value in the ‘ALL’ group is not the same as that in the ‘AML’ group.

To test the hypotheses, we apply Welch’s two-sample t-test:

    t.test(H4j_data~gol.fac)

            Welch Two Sample t-test

    data:  H4j_data by gol.fac
    t = -1.4988, df = 29.978, p-value = 0.1444
    alternative hypothesis: true difference in means between group ALL and group AML is not equal to 0
    95 percent confidence interval:
     -0.48627436  0.07463315
    sample estimates:
    mean in group ALL mean in group AML
           -0.6753033        -0.4694827

Since the p-value of 0.1444 is greater than 0.05, the data are not surprising under H0, so the result is not statistically significant. Hence, we fail to reject the null hypothesis: the data do not provide enough evidence that the mean H4/j expression value differs between the ‘ALL’ and ‘AML’ groups. This is not proof that the two means are equal; the 95% confidence interval for the difference, (-0.486, 0.075), contains 0 but also a range of nonzero values.

(c) We can set up the null and alternative hypotheses as follows:

H0: In ‘ALL’ patients, the mean expression value of the H4/j gene is equal to the mean expression value of the APS Prostate Specific Antigen gene.
HA: In ‘ALL’ patients, the mean expression value of the H4/j gene is less than the mean expression value of the APS Prostate Specific Antigen gene.

Both genes are measured on the same patients, so we use a paired one-sided t-test:

    t.test(H4j_data_ALL, APS_data_ALL, alternative="less", paired=T)

            Paired t-test

    data:  H4j_data_ALL and APS_data_ALL
    t = -1.8366, df = 26, p-value = 0.03886
    alternative hypothesis: true mean difference is less than 0
    95 percent confidence interval:
            -Inf -0.02175309
    sample estimates:
    mean difference
         -0.3050307

Since the p-value of 0.03886 is smaller than 0.05, the result is statistically significant at the 5% level. Hence, we reject the null hypothesis in favor of the alternative: in ‘ALL’ patients, the mean expression value of the H4/j gene is less than that of the APS Prostate Specific Antigen gene.
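As a cross-check on parts (a) to (c), the reported p-values can be recovered from the printed t statistics and degrees of freedom with base R's pt(). This is a minimal sketch that uses only the rounded values shown in the outputs above, not the Golub data itself; the variable names p_a, p_b and p_c are illustrative.

```r
# Recompute the p-values from the rounded t statistics and df printed above.

# (a) one-sample test, alternative = "greater": upper-tail probability
p_a <- pt(2.2659, df = 26, lower.tail = FALSE)

# (b) Welch two-sample test, two-sided: twice the lower tail of the negative t
p_b <- 2 * pt(-1.4988, df = 29.978)

# (c) paired test, alternative = "less": lower-tail probability
p_c <- pt(-1.8366, df = 26)

print(round(c(a = p_a, b = p_b, c = p_c), 4))
```

The direction of the tail follows the alternative hypothesis in each part, which is why (a) uses the upper tail, (c) the lower tail, and (b) doubles one tail.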
(d) We can set up the hypotheses as follows, where ph4/j is the proportion of patients whose H4/j expression value is greater than -0.6 (the threshold used in the code below):

H0: ph4/j in the ‘ALL’ group is equal to 0.5.
HA: ph4/j in the ‘ALL’ group is less than 0.5.

To test the hypotheses, we use a one-sided exact binomial test:

    ph4j <- (H4j_data_ALL > (-0.6))
    binom.test(sum(ph4j), length(ph4j), p=0.5, alternative="less")

            Exact binomial test

    data:  sum(ph4j) and length(ph4j)
    number of successes = 10, number of trials = 27, p-value = 0.1239
    alternative hypothesis: true probability of success is less than 0.5
    95 percent confidence interval:
     0.0000000 0.5466402
    sample estimates:
    probability of success
                 0.3703704

The p-value of 0.1239 is greater than 0.05, so we fail to reject the null hypothesis H0. A large p-value is not evidence in favor of H0; it only means that the data do not show that ph4/j in the ‘ALL’ group is less than 0.5.

(e) We can set up the hypotheses as follows:

H0: The proportion ph4/j in the ‘ALL’ group is equal to the proportion ph4/j in the ‘AML’ group.
HA: The proportion ph4/j in the ‘ALL’ group differs from the proportion ph4/j in the ‘AML’ group.

To test the hypotheses, we use the two-sample test for equality of proportions:

    ph4j_ALL <- (H4j_data_ALL > (-0.6))  # same indicator as ph4j in part (d)
    ph4j_AML <- (H4j_data_AML > (-0.6))
    prop.test(x=c(sum(ph4j_ALL),sum(ph4j_AML)), n=c(length(ph4j_ALL),length(ph4j_AML)), alternative="two.sided")

            2-sample test for equality of proportions with continuity correction

    data:  c(sum(ph4j_ALL), sum(ph4j_AML)) out of c(length(ph4j_ALL), length(ph4j_AML))
    X-squared = 2.6901, df = 1, p-value = 0.101
    alternative hypothesis: two.sided
    95 percent confidence interval:
     -0.74094690  0.02714219
    sample estimates:
       prop 1    prop 2
    0.3703704 0.7272727

The p-value of 0.101 is greater than 0.05, so we fail to reject the null hypothesis H0: the data do not provide enough evidence that the proportion ph4/j differs between the ‘ALL’ and ‘AML’ groups.

Problem 2

The number of rejections follows a binomial distribution with n = 3000 and p = 0.03.

(a) The expected number of rejections = 3000*0.03 = 90.

(b) The probability of fewer than 75 rejections is P(X <= 74). It equals the p-value of a one-sided exact binomial test with 74 successes:

    binom.test(74, 3000, p=0.03, alternative="less")

            Exact binomial test

    data:  74 and 3000
    number of successes = 74, number of trials = 3000, p-value = 0.04538
    alternative hypothesis: true probability of success is less than 0.03
    95 percent confidence interval:
     0.00000000 0.02985101
    sample estimates:
    probability of success
                0.02466667

The probability of fewer than 75 rejections is 0.04538.

Problem 3

Rejection rate (Type I error rate) = 0.1049

95% confidence interval for the rejection rate (lower bound, estimate, upper bound):

    0.09889419 0.10490000 0.11090581

Yes, the test is valid: the 95% confidence interval (0.0989, 0.1109) contains the new nominal alpha of 0.1, so the observed Type I error rate is consistent with the nominal rate.
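The binomial results in Problem 1(d), Problem 2 and Problem 3 can be cross-checked with base R alone. This is a sketch under stated assumptions: for a one-sided ("less") exact binomial test the p-value is the lower-tail probability given by pbinom(), and for Problem 3 the number of simulated tests, n_sim = 10000, is an assumption (it is not stated in the text, but a normal-approximation interval with this value reproduces the reported bounds). All variable names are illustrative.

```r
# Problem 1(d): P(X <= 10) for X ~ Binomial(27, 0.5)
p_1d <- pbinom(10, size = 27, prob = 0.5)

# Problem 2(a): expected number of rejections, n * p
expected_rej <- 3000 * 0.03

# Problem 2(b): P(X < 75) = P(X <= 74) for X ~ Binomial(3000, 0.03)
p_2b <- pbinom(74, size = 3000, prob = 0.03)

# Problem 3: normal-approximation 95% CI for the observed rejection rate.
# ASSUMPTION: n_sim = 10000 simulated tests (not given in the text).
rate <- 0.1049
n_sim <- 10000
half_width <- qnorm(0.975) * sqrt(rate * (1 - rate) / n_sim)
ci <- c(lower = rate - half_width, upper = rate + half_width)

print(c(p_1d = p_1d, expected = expected_rej, p_2b = p_2b))
print(ci)
```

Using pbinom() directly makes it explicit that Problem 2(b) asks for a probability rather than a hypothesis test; binom.test() returns the same number only because its one-sided p-value is that tail probability.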
Problem 4

(a) Number of genes found to be differentially expressed under each adjustment:

    Bonferroni: 103
    FDR: 695

(b) The gene names of the three most strongly differentially expressed genes:

    "Zyxin"
    "FAH Fumarylacetoacetate"
    "APLP2 Amyloid beta (A4) precursor-like protein 2"

After Bonferroni adjustment:

    "Zyxin"
    "FAH Fumarylacetoacetate"
    "APLP2 Amyloid beta (A4) precursor-like protein 2"

After FDR adjustment:

    "Zyxin"
    "FAH Fumarylacetoacetate"
    "APLP2 Amyloid beta (A4) precursor-like protein 2"
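The pattern in Problem 4 (fewer genes under Bonferroni than under FDR, and the same top three genes in every ranking) can be illustrated with base R's p.adjust(). This is a minimal sketch on made-up p-values, not the Golub results: the gene labels, the p-values and the 0.05 cutoff are all assumptions for illustration.

```r
# Hypothetical raw p-values, for illustration only (not from the Golub data).
p_raw <- c(geneA = 0.00001, geneB = 0.0004, geneC = 0.012,
           geneD = 0.03,    geneE = 0.2,    geneF = 0.6)

# Bonferroni multiplies each p-value by the number of tests (capped at 1).
p_bon <- p.adjust(p_raw, method = "bonferroni")

# "fdr" is the Benjamini-Hochberg adjustment.
p_fdr <- p.adjust(p_raw, method = "fdr")

# Count genes below the (assumed) 0.05 cutoff under each adjustment.
n_bon <- sum(p_bon < 0.05)
n_fdr <- sum(p_fdr < 0.05)

# Rank genes by raw and adjusted p-values and keep the three smallest.
top3_raw <- names(sort(p_raw))[1:3]
top3_bon <- names(sort(p_bon))[1:3]
top3_fdr <- names(sort(p_fdr))[1:3]

print(c(bonferroni = n_bon, fdr = n_fdr))
print(rbind(raw = top3_raw, bonferroni = top3_bon, fdr = top3_fdr))
```

Both adjustments are non-decreasing functions of the raw p-values, so they never reorder genes; they only change how many fall below the cutoff. Bonferroni controls the family-wise error rate and is the more conservative of the two, which is consistent with the 103 versus 695 counts reported above.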