Chapter 10

Comparison of Two Independent Samples

Biologists are often interested in the comparison of groups. Consider the following examples. Do two different species of swallow produce similar eggs on average? Does a type of fertilizer produce larger plants on average, compared to another type of fertilizer?

In this chapter, we introduce methods to compare two independent groups. We discuss how interval estimation and hypothesis testing can be used to infer whether there are differences between the two populations. We first discuss techniques to compare means, and end the chapter with techniques to compare proportions.

10.1 Study/Experimental Design

When analyzing data, it is important to consider the design of the study or experiment. This is especially true when comparing groups. The design of the study often dictates the probability model that will be used to describe the data collection process from the populations of interest. It is only when the probability model is appropriate, that we can generalize our results from the samples to the populations.

Scientists often want to compare groups that are outcomes from a controlled experiment which is run under different experimental conditions. For example, a simple experiment might be designed to test a claim that a particular type of fertilizer produces taller plants compared to another type of fertilizer. The response variable in this instance is the height of the plants. The primary factor for this experiment is the fertilizer. The levels of the factor are called treatments. So the treatments in this case are the types of fertilizer. In a controlled experiment we assign the treatments to the experimental units, which could be plots with one seedling in this case. This assignment determines the treatment groups.

Balan, R., & Lamothe, G. (2017). Expect the unexpected: A first course in biostatistics (second edition). World Scientific Publishing Company. Created from ottawa on 2023-09-29 20:15:50.
Copyright © 2017. World Scientific Publishing Company. All rights reserved.
It is possible that there are uncontrolled factors that might affect the response variable. These are called nuisance factors. For example the genetic predisposition of a seedling to produce a tall plant might be a nuisance factor. Randomization is used to average the effects of the nuisance factors over the different groups. We should randomly assign the types of fertilizer to the seedlings.

The purpose of a controlled experiment is to determine if there is a cause-and-effect relationship. In our case, this means that the use of the new fertilizer produces taller plants on average. If the controlled experiment is randomized and the treatment groups are statistically significantly different, then we can be confident that there is indeed a cause-and-effect relationship.

One of the simplest experimental designs is called a completely randomized design. For completely randomized designs, the levels of the primary factor are randomly assigned to the experimental units. Our fertilizer experiment has such a design. The tools introduced in this chapter apply to experiments with a completely randomized design.

In some circumstances, the distribution of the response variable can be highly spread-out. This variability might be due to nuisance factors. For example, females and males might react differently to a particular drug. This noise can be prohibitive, in the sense that we would need very large samples in order to identify significant treatment effects. To reduce this noise we can construct homogeneous subgroups, called blocks. The variance within each block should be smaller than the variance of the entire sample. So the estimates within the blocks should be more precise. As we combine the estimates across blocks, we should obtain an estimate of the treatment effect that is more precise than without blocking.
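The random assignment behind a completely randomized design, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `completely_randomized_assignment`, the sixteen numbered plots, the equal group sizes and the seed are all hypothetical choices, not part of the text.

```python
import random


def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly split the experimental units into equal-sized treatment groups.

    Assumption for this sketch: the number of units is a multiple of the
    number of treatments, so that every group has the same size.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)          # every ordering of the units is equally likely
    size = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[i * size:(i + 1) * size])
        for i, treatment in enumerate(treatments)
    }


# Hypothetical example: sixteen plots, two types of fertilizer.
plots = list(range(1, 17))
assignment = completely_randomized_assignment(
    plots, ["old fertilizer", "new fertilizer"], seed=2017
)
for treatment, group in assignment.items():
    print(treatment, group)
```

Because the plots are shuffled before being split, a nuisance factor such as a seedling's genetic predisposition has no systematic way of ending up in one group rather than the other.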
If we randomly assign all of the treatments to the experimental units within each block, then we say that the experiment has a randomized complete block design. As an example, if we want to compare a drug to a placebo and we believe that gender also has an effect on the response, we divide the subjects into blocks according to their gender. If we have ten subjects of each gender, we randomly assign the drug to five subjects of each gender. The remainder of the subjects are given the placebo. We do not discuss the analysis of block designs in this chapter.

The techniques presented in this chapter do not apply only to completely randomized experiments. They are also applicable in a non-experimental setting. Consider the study [64], where the authors compare the breeding biology of the Welcome Swallow in Australia and New Zealand. The factor (in this case, the location) is not assigned to the unit of study (the bird). Such a study is called an observational study.

An observational study can identify associations, but not causality. We are not randomly assigning the treatments to the units of study. So there is a danger that any association that we find between the response and the factor may be due to some third variable, called a lurking variable, which is not evenly distributed among the groups. Maybe it is access to food that caused the difference in breeding biology, and not the location. So we should not say that it is the observational factor that caused the significant result. However, we can say that there is an association.

The techniques in this chapter can be used to compare samples from an observational study as long as it is reasonable to assume that observations within the samples are independent, and that there is independence between the two samples.

10.2 Confidence Intervals and Tests for Means: Large Samples

In this section, we discuss techniques to compare the means of two independent populations, when both sample sizes are large.

We use $X_1$ and $X_2$ to denote the random measurements from population 1 and population 2, respectively. Their means are denoted by $\mu_1 = E(X_1)$ and $\mu_2 = E(X_2)$ and their variances are denoted by $\sigma_1^2 = \mathrm{Var}(X_1)$ and $\sigma_2^2 = \mathrm{Var}(X_2)$.

We assume that we have a random sample of size $n_1 \geq 40$ from population 1, whose mean and variance are denoted by $\bar{X}_1$ and $S_1^2$, respectively. Similarly, we have a random sample of size $n_2 \geq 40$ from population 2, whose mean and variance are denoted by $\bar{X}_2$ and $S_2^2$, respectively. From Example 7.8, we know that
\[ E(\bar{X}_1) = \mu_1, \quad \mathrm{Var}(\bar{X}_1) = \frac{\sigma_1^2}{n_1}, \quad E(\bar{X}_2) = \mu_2, \quad \mathrm{Var}(\bar{X}_2) = \frac{\sigma_2^2}{n_2}. \]

To compare the two means, we examine the difference in means $\mu_1 - \mu_2$. We begin the discussion with point estimation.
A natural estimator of $\mu_1 - \mu_2$ is the difference in sample means $\bar{X}_1 - \bar{X}_2$. This estimator is unbiased since its expected value is
\[ E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2. \]
The variance of the estimator is
\[ \mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
(see Theorem 7.1). Similar to the estimation of the mean in the one sample case, the larger the sample sizes, the more precise the estimate. Furthermore, as we standardize $\bar{X}_1 - \bar{X}_2$, we obtain that
\[ \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \]
has approximately an $N(0, 1)$ distribution.

When both sample sizes are large (i.e. $n_1 \geq 40$ and $n_2 \geq 40$), we can use the sample variances instead of the population variances. More precisely,
\[ \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} \quad \text{has approximately an } N(0, 1) \text{ distribution.} \tag{10.1} \]

This approximation can be used even if the populations are not normally distributed. To justify it, recall that by Theorem 8.1, $\bar{X}_1$ has approximately an $N(\mu_1, S_1^2/n_1)$ distribution, and $\bar{X}_2$ has approximately an $N(\mu_2, S_2^2/n_2)$ distribution. Moreover, $\bar{X}_1$ and $\bar{X}_2$ are independent random variables, since the two populations are independent. By Theorem 7.2, it follows that $\bar{X}_1 - \bar{X}_2$ also has approximately a normal distribution, with mean $\mu_1 - \mu_2$ and variance $S_1^2/n_1 + S_2^2/n_2$. Relation (10.1) follows by standardization.

The theory that we present in this section is based upon the standardization (10.1). We emphasize that this standardization should be used only when both sample sizes are greater than or equal to 40.

We consider the inference concerning the difference $\mu_1 - \mu_2$. The null hypothesis is of the form $H_0: \mu_1 - \mu_2 = \delta_0$, where $\delta_0$ is a given numeric value. Note that when $\delta_0 = 0$, the null hypothesis becomes $H_0: \mu_1 - \mu_2 = 0$, or equivalently $H_0: \mu_1 = \mu_2$. We use the following test statistic:
\[ Z_0 = \frac{\bar{X}_1 - \bar{X}_2 - \delta_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}. \tag{10.2} \]
If $H_0$ holds, then the sampling distribution of $Z_0$ is approximately standard normal. Hence we can use Table 18.3 to compute the corresponding p-value.
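As a rough sketch of how the statistic (10.2) and the corresponding standard normal tail areas can be computed without a table, the following Python snippet uses only the standard library. The function names `large_sample_z` and `normal_tail_areas` are hypothetical; `statistics.NormalDist` stands in for Table 18.3. The summary statistics fed into it are those of the lake whitefish data in Example 10.1 of this chapter, and which tail area is the p-value depends on the alternative hypothesis (Table 10.1).

```python
from math import sqrt
from statistics import NormalDist


def large_sample_z(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Observed value of the large-sample test statistic (10.2)."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (xbar1 - xbar2 - delta0) / standard_error


def normal_tail_areas(z0):
    """Return P(Z < z0), P(Z > z0) and 2 P(Z > |z0|) for a standard normal Z."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    lower = std_normal.cdf(z0)
    upper = 1.0 - lower
    two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z0)))
    return lower, upper, two_sided


# Summary statistics from Example 10.1 (both sample sizes are at least 40).
z0 = large_sample_z(xbar1=7.18, s1=0.55, n1=175, xbar2=7.31, s2=0.70, n2=225)
lower, upper, two_sided = normal_tail_areas(z0)
print(f"z0 = {z0:.2f}")
print(f"P(Z < z0) = {lower:.4f}, P(Z > z0) = {upper:.4f}, 2 P(Z > |z0|) = {two_sided:.4f}")
```

The two-sided area may differ from a table-based value in the last digit, since the table rounds $z_0$ to two decimals before looking up the probability.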
Recall that the p-value is the probability of observing a value at least as extreme as the current observed value, under the assumption that the null hypothesis holds. Since our definition of an extreme value depends on the alternative hypothesis, the computation of the p-value depends on the alternative hypothesis. Table 10.1 gives the p-value for testing the null hypothesis $H_0: \mu_1 - \mu_2 = \delta_0$ against one of the alternative hypotheses $H_1$. In this table, $Z$ has a standard normal distribution and
\[ z_0 = \frac{\bar{x}_1 - \bar{x}_2 - \delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \]
is the observed value of the test statistic $Z_0$ given by (10.2). This will produce a large sample test for comparing the means.

Table 10.1 The p-value for comparison of two means: large samples

Alternative Hypothesis                 p-value
$H_1: \mu_1 - \mu_2 > \delta_0$          $P(Z > z_0)$
$H_1: \mu_1 - \mu_2 < \delta_0$          $P(Z < z_0)$
$H_1: \mu_1 - \mu_2 \neq \delta_0$       $2P(Z > |z_0|)$

The p-value is a measure of how much evidence we have against the null hypothesis. The smaller the p-value, the greater the inconsistency between the data and the null hypothesis. Actually, the p-value is the smallest level of significance at which the null hypothesis can be rejected with the given data. We will use the same rule as in Section 9.1:

if p-value $< \alpha$, then we reject $H_0$;
if p-value $\geq \alpha$, then we fail to reject $H_0$.

This rule ensures that the probability of type I error is approximately equal to $\alpha$. When the null hypothesis $H_0: \mu_1 - \mu_2 = 0$ is rejected, it is often said that the difference between $\mu_1$ and $\mu_2$ is statistically significant.

The p-value is a valuable statistic that measures the risk associated with rejecting the null hypothesis. However, it does not give us the whole picture. Think of the hypothesis test as a diagnostic tool. We must assess its specificity and its sensitivity (often called power in the context of hypothesis testing). We can control its specificity (our chances of failing to reject $H_0$ when $H_0$ is true) with the use of a significance level. We can use a confidence interval to assess the sensitivity (our chances of rejecting $H_0$ when $H_1$ is true). A confidence interval is also useful as a stand-alone tool if the goal is simply to estimate the difference in means.

An (approximate) confidence interval for $\mu_1 - \mu_2$ at a level of confidence of $(1 - \alpha)100\%$ is
\[ \bar{x}_1 - \bar{x}_2 \pm z \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \]
where $z$ is a value such that $P(-z < Z < z) = 1 - \alpha$ and $Z$ follows a standard normal distribution. This means that $P(Z > z) = \alpha/2$, i.e. $z = z_{\alpha/2}$.

Regardless of whether the difference is found to be statistically significant or not, it is important to assess the sensitivity of the hypothesis test.
This will be demonstrated through the use of examples. To assess the sensitivity of the test we must first determine practical (biological or clinical) significance. As an example, consider the comparison of mean triglyceride levels for two groups. The researcher might decide that a difference in means of 5 mg/dl is not biologically important, but a difference of 20 mg/dl is important. Researchers determine practical importance using their good judgment and experience.

Suppose that we found a statistically significant difference in the mean triglyceride levels. The researcher produces a 95% confidence interval for the difference in means and finds that the difference in means is between 2.3 mg/dl and 4.7 mg/dl. The researcher concludes that the means are statistically different, but the difference is not biologically (or clinically) important. In this instance, the test is highly sensitive since it can detect differences in means which have no practical significance.

Now suppose a scenario where the p-value is large, so we fail to reject the null hypothesis that the means are equal. The researcher produces a 95% confidence interval for the difference in means and finds that the difference in means is between -2.5 mg/dl and 24.1 mg/dl. The maximum error of the estimate is very large. Perhaps the failure to reject the null hypothesis was caused by an inadequate sample size. The test is not sensitive enough (that is, not powerful enough) to detect a difference of biological importance.

A large p-value should not automatically be interpreted as evidence in support of the null hypothesis, and a small p-value should not automatically be interpreted as evidence in support of practical significance. All biologists should be ultimately interested in biological importance, which may be assessed using confidence intervals.

Example 10.1. We want to compare the lipid content (% of weight) of the lake whitefish Coregonus clupeaformis in two large neighboring lakes. The focus of the study was on medium sized fish, from 600 grams to 1,000 grams. We collected $n_1 = 175$ fish from lake 1 and $n_2 = 225$ fish from lake 2. The observed sample means and standard deviations are $\bar{x}_1 = 7.18$, $\bar{x}_2 = 7.31$, $s_1 = 0.55$ and $s_2 = 0.70$.

We test $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 \neq 0$. The observed value of the test statistic for this large sample test is
\[ z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{7.18 - 7.31}{\sqrt{(0.55)^2/175 + (0.70)^2/225}} = -2.08. \]
The p-value is (approximately) equal to
\[ 2P(Z > |z_0|) = 2P(Z > 2.08) = 2(1 - 0.9812) = 0.0376. \]
At a level of significance of $\alpha = 0.05$, we can reject the hypothesis that the lake whitefish have equal mean lipid content in both lakes.

Using their good judgment and experience the researchers had determined beforehand that the absolute difference $|\mu_1 - \mu_2|$ would have to be at least 1 to be of biological importance. The biological significance cannot be determined from the p-value. We must analyze the error of the estimate of $\mu_1 - \mu_2$.

A point estimate for $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2 = -0.13$ and its estimated standard error is $\sqrt{s_1^2/n_1 + s_2^2/n_2} = 0.0625$. A 95% (approximate) confidence interval for $\mu_1 - \mu_2$ is
\[ \bar{x}_1 - \bar{x}_2 \pm 1.96 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = -0.13 \pm 0.1225 = [-0.25, -0.01]. \]
We are 95% confident that $|\mu_1 - \mu_2| < 1$. The statistically significant difference between the means has no biological importance.

10.3 Confidence Intervals and Tests for Means: Small Samples

In this section, we consider the same problem of comparison of the means $\mu_1$ and $\mu_2$ of two independent populations, in the case of small samples. We use the same notation as in Section 10.2. In addition, we suppose that both $X_1$ and $X_2$ are normally distributed. Under this assumption, by Theorem 7.3, we know that $\bar{X}_1$ has an $N(\mu_1, \sigma_1^2/n_1)$ distribution, and $\bar{X}_2$ has an $N(\mu_2, \sigma_2^2/n_2)$ distribution. Therefore,
\[ \bar{X}_1 - \bar{X}_2 \text{ has an } N\!\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) \text{ distribution.} \]
We consider two cases: (1) the population variances are equal; (2) the population variances are not equal.

Case (1). Normal Populations with Equal Variances

In this case, the underlying assumptions of our model are independent normal populations with equal variances: $\sigma_1^2 = \sigma_2^2$. In addition, the sample sizes could be small. We denote the common variance by $\sigma^2$. With the added assumption of homogeneity of the variance, the standardization of the estimator $\bar{X}_1 - \bar{X}_2$ becomes
\[ \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma \sqrt{1/n_1 + 1/n_2}}, \]
which has an $N(0, 1)$ distribution.
Since $\sigma^2$ is unknown, we cannot base our inference on this statistic. Let $S_i^2$ denote the sample variance from population $i$, for $i = 1, 2$. Since $E(S_i^2) = \sigma_i^2 = \sigma^2$, both $S_1^2$ and $S_2^2$ are unbiased estimators of the common variance $\sigma^2$. We combine them to obtain a better estimator of $\sigma^2$. One possible combination is to take a weighted average of the variances with weights based on their respective degrees of freedom. This gives us an unbiased estimator of $\sigma^2$, known as the pooled sample variance:
\[ S_p^2 = \frac{\nu_1}{\nu_1 + \nu_2} S_1^2 + \frac{\nu_2}{\nu_1 + \nu_2} S_2^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}, \]
where $\nu_i = n_i - 1$, for $i = 1, 2$. The pooled sample standard deviation is $S_p = \sqrt{S_p^2}$.

As we replace $\sigma$ by $S_p$ in the standardization of $\bar{X}_1 - \bar{X}_2$, we get the following studentization:
\[ \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{S_p \sqrt{1/n_1 + 1/n_2}} \quad \text{has a } T(n_1 + n_2 - 2) \text{ distribution.} \tag{10.3} \]

For testing $H_0: \mu_1 - \mu_2 = \delta_0$, we use the test statistic:
\[ T_0 = \frac{\bar{X}_1 - \bar{X}_2 - \delta_0}{S_p \sqrt{1/n_1 + 1/n_2}}. \]
If $H_0$ is true, $T_0$ has a $T(n_1 + n_2 - 2)$ distribution. A hypothesis test based on this test statistic is called Student's two-sample t-test. The p-value is given in Table 10.2, where
\[ t_0 = \frac{\bar{x}_1 - \bar{x}_2 - \delta_0}{s_p \sqrt{1/n_1 + 1/n_2}} \]
is the observed value of the test statistic $T_0$, and $T$ has a $T(n_1 + n_2 - 2)$ distribution.

Table 10.2 The p-value for comparison of two means: $\sigma_1^2 = \sigma_2^2$

Alternative Hypothesis                 p-value
$H_1: \mu_1 - \mu_2 > \delta_0$          $P(T > t_0)$
$H_1: \mu_1 - \mu_2 < \delta_0$          $P(T < t_0)$
$H_1: \mu_1 - \mu_2 \neq \delta_0$       $2P(T > |t_0|)$

A $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is
\[ \bar{x}_1 - \bar{x}_2 \pm t\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \]
where $t$ is a value such that $P(-t < T < t) = 1 - \alpha$, and $T$ has a $T(n_1 + n_2 - 2)$ distribution. This means that $P(T > t) = \alpha/2$, i.e. $t = t_{\alpha/2,\, n_1 + n_2 - 2}$.

Example 10.2. An agriculture researcher wants to test the claim that on average, a new fertilizer yields taller plants at maturity. A completely randomized design is used to generate the data. Sixteen similar plots with one seedling (the experimental units) are randomly assigned to the treatments, which in this case are the new and the old fertilizer. A balanced design is used, i.e. both treatment groups are of equal size. The plants are measured at maturity (in cm). Here are the data:

Old Fertilizer    New Fertilizer
46.1              49.8
37.7              51.5
54.2              50.7
44.7              50.7
30.9              41.9
38.5              36.4
38.0              59.4
55.0              41.9

Summary Data

Group             Size          Mean                   Variance
Old fertilizer    $n_1 = 8$     $\bar{x}_1 = 43.14$    $s_1^2 = 71.65$
New fertilizer    $n_2 = 8$     $\bar{x}_2 = 47.79$    $s_2^2 = 52.66$

The researcher wants to test $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 < 0$ using Student's two-sample t-test.

Figure 10.1 gives an overlay of the normal probability plots for the two samples. There are no systematic tendencies away from the lines, hence we do not have strong evidence against normality. Furthermore, the slopes of the lines are similar. So it appears that the equal variance assumption holds. To further assess this underlying assumption, we can also do a comparative box plot analysis (see Figure 10.1). The first sample (old fertilizer) appears to be slightly more spread out, but this difference in variability is not striking. We do not have strong evidence against the equal variance assumption. It is reasonable to assume that the populations are normal with equal variances.

The pooled sample variance and standard deviation are
\[ s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = 62.155 \quad \text{and} \quad s_p = \sqrt{62.155} = 7.8838. \]
The observed test statistic is
\[ t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}} = \frac{43.14 - 47.79}{7.8838 \sqrt{1/8 + 1/8}} = -1.18. \]
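The computation of the pooled variance and of $t_0$ in Example 10.2 can be retraced from the raw plant heights. The sketch below is a minimal illustration using Python's standard library; the function names `pooled_variance` and `student_t_statistic` are hypothetical, and the split of the data into the two columns is a reconstruction that was checked against the summary means and variances quoted in the example.

```python
from math import sqrt
from statistics import mean, variance

# Plant heights (cm) from Example 10.2.
old = [46.1, 37.7, 54.2, 44.7, 30.9, 38.5, 38.0, 55.0]
new = [49.8, 51.5, 50.7, 50.7, 41.9, 36.4, 59.4, 41.9]


def pooled_variance(sample1, sample2):
    """Weighted average of the two sample variances (weights n_i - 1)."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance is the sample variance, with divisor n - 1
    return ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)


def student_t_statistic(sample1, sample2, delta0=0.0):
    """Observed value of Student's two-sample t statistic."""
    n1, n2 = len(sample1), len(sample2)
    sp = sqrt(pooled_variance(sample1, sample2))
    return (mean(sample1) - mean(sample2) - delta0) / (sp * sqrt(1 / n1 + 1 / n2))


sp2 = pooled_variance(old, new)
t0 = student_t_statistic(old, new)
print(f"mean(old) = {mean(old):.2f}, mean(new) = {mean(new):.2f}")
print(f"pooled variance = {sp2:.3f}, pooled sd = {sqrt(sp2):.4f}, t0 = {t0:.2f}")
```

Working from the unrounded data gives a pooled variance that differs from the hand computation only in the third decimal, because the example uses sample variances rounded to two decimals.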
Fig. 10.1 Normal probability plots and comparative box plots for the plant heights

The p-value is
\[ P(T < t_0) = P(T < -1.18) = P(T > 1.18), \]
where $T$ has a $T(n_1 + n_2 - 2) = T(14)$ distribution. Referring to row $\nu = 14$ in Table 18.4, 1.18 falls between 0.692 and 1.345, which have areas to the right of 0.25 and 0.10. Thus, 0.10 < p-value < 0.25. Using a statistical package, we see that p-value = 0.129. At a significance level of $\alpha = 0.05$, we cannot reject $H_0$. The data do not appear to support the hypothesis that the use of the new fertilizer produces taller plants.

A 95% confidence interval for $\mu_1 - \mu_2$ is
\[ \bar{x}_1 - \bar{x}_2 \pm t\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = -4.65 \pm 8.4554 = [-13.11, 3.81], \]
where $t = t_{0.025, 14} = 2.145$. We are 95% confident that the difference in means is between -13.11 cm and 3.81 cm. We are highly confident that the absolute difference in means is not larger than 14 cm. However we cannot say the same about 5 cm, since -5 lies in the confidence interval.

In the next example, we see that we can sometimes use a log-transformation to satisfy the underlying conditions to use Student's two-sample t-test.

Example 10.3. Dichloromethane is a volatile liquid that is widely used as a solvent. A chemical engineer wants to compare the dichloromethane concentration at two water treatment plants near industrial facilities. She suspects that the distributions of the dichloromethane concentration are skewed to the right due to occasional higher discharges from the industrial facilities. She verifies her hunch with histograms (see Figure 10.2).

She decides to apply a log transformation, that is, the new measurements are read in ln(μg/L). The normal probability plots for the data in the original scale and the log scale are given in Figure 10.3. It is evident from the normal probability plots that the data in the original scale are not normal, and furthermore, it appears that the variances are not equal. However, the log data appear to be normal and the variances appear to be equal since the lines in the probability plots are nearly parallel. It is safe to assume that the log concentrations from the two plants follow normal distributions with equal variances.

Fig. 10.2 Histograms for the dichloromethane concentrations from plants 1 and 2

Fig. 10.3 Normal probability plots for the concentrations and log-concentrations

To compare the dichloromethane concentration at the two plants, the chemical engineer tests $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 \neq 0$, where $\mu_i$ is the mean of the log concentrations from plant $i$, for $i = 1, 2$. The summary data for the log concentrations are

Plant    $n$    $\bar{x}$    $s^2$
1        25     2.934        1.162
2        25     2.664        1.209

The pooled sample variance is
\[ s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = 1.1855. \]
The observed value of Student's two-sample t-test statistic is
\[ t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}} = 0.88. \]
The p-value is
\[ 2P(T > |t_0|) = 2P(T > 0.88), \]
where $T$ follows a $T(n_1 + n_2 - 2) = T(48)$ distribution. We cannot find the range of the p-value using Table 18.4, since this table does not include the row $\nu = 48$. We can approximate the p-value using the row $\nu = \infty$. The approximate interval is 0.2 < p-value < 0.5. Using statistical software, the chemical engineer computed p-value = 0.385. Since the p-value is large, we should not reject the hypothesis that the mean log-concentrations are the same. It appears that the means of the log concentrations are not different.

In Example 10.3, we transformed the data using a logarithm. We did this because Student's two-sample t-test requires that the populations follow a normal distribution. After inspecting the transformed data, the samples appeared to come from normal populations with equal variances, thus we could safely compare the means of the transformed data with Student's two-sample t-test.
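The test in Example 10.3 starts from summary statistics rather than raw data, and the table lookup falls back on the row $\nu = \infty$, which corresponds to the standard normal distribution. A minimal sketch of both steps follows; the function name `pooled_t_from_summary` is hypothetical, and the normal-based p-value is only the approximation that the $\nu = \infty$ row provides, not the exact $T(48)$ value of 0.385 reported from statistical software.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t_from_summary(n1, xbar1, var1, n2, xbar2, var2, delta0=0.0):
    """Pooled variance and Student's two-sample t statistic from summary data."""
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t0 = (xbar1 - xbar2 - delta0) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return sp2, t0


# Summary data for the log concentrations in Example 10.3.
sp2, t0 = pooled_t_from_summary(25, 2.934, 1.162, 25, 2.664, 1.209)

# Two-sided p-value approximated with the standard normal distribution,
# i.e. the nu = infinity row of a t table (an approximation for T(48)).
approx_p = 2 * (1 - NormalDist().cdf(abs(t0)))

print(f"pooled variance = {sp2:.4f}, t0 = {t0:.2f}")
print(f"approximate two-sided p-value = {approx_p:.3f}")
```

The approximation should land inside the interval 0.2 < p-value < 0.5 obtained from the table and close to the software value, since with 48 degrees of freedom the $T$ distribution is already near the standard normal.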
Note that, when comparing the means of the log transformed data, we are actually comparing the geometric means of the data on the original scale. To clarify the distinction between the mean and the geometric mean Balan, R., & Lamothe, G. (2017). Expect the unexpected : A first course in biostatistics (second edition). World Scientific Publishing Company. Created from ottawa on 2023-09-29 20:15:50. Copyright © 2017. World Scientific Publishing Company. All rights reserved.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Comparison of Two Independent Samples 175 (for the population or the sample), we introduce the following definition. Definition 10.1. Let X 1 , X 2 , . . . , X n be a random sample from a popula- tion represented by the random variable X . The geometric mean of the population is G = e μ , where μ = E (ln X ). An estimate for G is the (sam- ple) geometric mean defined by g = e (1 /n ) n i =1 ln x i = ( Q n i =1 x i ) 1 /n , where x 1 , . . . , x n are the observed values of the random sample X 1 , . . . , X n . Example 10.3 (continued). We construct a 95% confidence interval for the difference in means of the log-concentrations for the data from Ex- ample 10.3. Since it is reasonable to assume that the two populations of log-concentrations are independent and normally distributed with equal variances, then a 95% confidence interval for μ 1 - μ 2 is x 1 - x 2 ± t s p r 1 n 1 + 1 n 2 = 0 . 270 ± 0 . 61930 = [ - 0 . 349 , 0 . 889] , where t = 2 . 011 satisfies 95% = P ( - t < T < t ) and T follows a T (48) distribution. (The value of t = 2 . 011 was obtained using a statistical pack- age.) We are 95% confident that μ 1 - μ 2 is between - 0 . 349 and 0 . 889 (in ln( μg/L )). Since 0 lies within the confidence interval, the means of the log-concentrations do not appear to be different. We denote by G i the geometric mean of the population i , consisting of the dichloromethane concentrations (in μg/L ) from plant i , for i = 1 , 2. Note that G i = e μ i , where μ i is the mean of the log-concentration from plant i , for i = 1 , 2. Exponentiating the difference in means gives us the ratio of the geometric means, that is e μ 1 - μ 2 = e μ 1 /e μ 2 = G 1 /G 2 . Since we are 95% confident that - 0 . 349 < μ 1 - μ 2 < 0 . 889, then we are also 95% confident that 0 . 71 = e - 0 . 349 < G 1 /G 2 < e 0 . 889 = 2 . 43. Since 1 lies within the interval, there appears to be no difference between the geometric means of the concentrations. Case (2). 
Normal Populations with Unequal Variances The assumption of equality of the two variances is sometimes not rea- sonable. So we should try to adapt our techniques to the case of unequal variances: σ 2 1 6 = σ 2 2 . This is known as the Behrens-Fisher problem . There are exact solutions to Behrens-Fisher problem (see [21]). These solutions are beyond the scope of this book. We present an approximate solution. In 1938, Welch [71] proposed an approximate solution to the Behrens- Fisher problem. Welch argued that the inference concerning μ 1 - μ 2 for Balan, R., & Lamothe, G. (2017). Expect the unexpected : A first course in biostatistics (second edition). World Scientific Publishing Company. Created from ottawa on 2023-09-29 20:15:50. Copyright © 2017. World Scientific Publishing Company. All rights reserved.
two independent normal populations can be based on

$$\frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}, \qquad (10.4)$$

which follows approximately a $T$ distribution with $\nu$ degrees of freedom, where

$$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}. \qquad (10.5)$$

$\nu$ is called Welch's number of degrees of freedom. It follows that we can construct the following approximate $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$:

$$\bar{x}_1 - \bar{x}_2 \pm t \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

where $P(-t \le T \le t) = 1 - \alpha$ and $T$ has a $T(\nu)$ distribution. Note that $t = t_{\alpha/2}$ since $P(T > t) = \alpha/2$.

To use the tables of the $T$ distribution, which list only integer numbers of degrees of freedom, we round $\nu$ down to the nearest integer. Rounding down is the conservative choice. For instance, if $\nu = 7.8$, we need to decide between $\nu = 7$ and $\nu = 8$. Since the value $t_{0.025}(7) = 2.365$ is greater than $t_{0.025}(8) = 2.306$, the 95% confidence interval based on the $T$ distribution with $\nu = 7$ degrees of freedom will be wider than the 95% confidence interval based on the $T$ distribution with $\nu = 8$ degrees of freedom. Hence, the narrower interval (based on $\nu = 8$) may not actually contain the value $\mu_1 - \mu_2$. By working with the wider interval, we reduce the risk that the interval does not contain the value $\mu_1 - \mu_2$.

To test $H_0: \mu_1 - \mu_2 = \delta_0$, we use the test statistic

$$T_0 = \frac{\bar{X}_1 - \bar{X}_2 - \delta_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}.$$

A test based on this test statistic is often called Welch's approximate two-sample t-test. This test is sometimes also called the Welch-Satterthwaite t-test or the Satterthwaite t-test. The p-value of this test is given in Table 10.3, where $t_0$ is the observed value of $T_0$, $T$ has a $T(\nu)$ distribution, and $\nu$ is given in (10.5).

Welch's method is not exact, but is generally a good approximation. However, if the population variances are equal, or if the sample sizes are
rather small and the population variances can be assumed to be approximately equal, it is more accurate to use Student's two-sample t-test. Furthermore, when the population variances are equal, Student's two-sample t-test is more powerful.

Table 10.3 The p-value for comparison of two means: $\sigma_1^2 \neq \sigma_2^2$

    Alternative Hypothesis                 p-value
    $H_1: \mu_1 - \mu_2 > \delta_0$        $P(T > t_0)$
    $H_1: \mu_1 - \mu_2 < \delta_0$        $P(T < t_0)$
    $H_1: \mu_1 - \mu_2 \neq \delta_0$     $2P(T > |t_0|)$

Example 10.4. An ornithologist wants to compare the breeding biology of two different species of swallows. In particular, she wants to compare the average egg mass (in grams). Here are the summary data:

                 Sample Size    Mean     Var.
    Species 1    18             1.872    0.264
    Species 2    12             2.783    2.060

                 Min.     Q1       Median    Q3       Max.
    Species 1    0.900    1.400    1.900     2.300    2.800
    Species 2    0.400    1.250    3.300     3.800    4.700

She wants to test $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 \neq 0$, where $\mu_i$ is the mean egg mass (in grams) for species $i$, for $i = 1, 2$, with a two-sample t-test. To verify the underlying assumptions of the test, she produced normal probability plots and comparative box plots (see Figure 10.4).

The points show no systematic departure from the fitted lines of the normal probability plots, hence we do not have strong evidence against normality. However, the slopes of the lines are different, so it appears that the equal variance assumption may not hold. To further assess the underlying assumptions, we look at the comparative box plots. The egg masses for the second species are more spread out. It might not be sensible to assume that the population variances are equal.

She decides to use Welch's approximate two-sample t-test. The observed value of the test statistic is

$$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = -2.11.$$

The p-value is $2P(T > |t_0|) = 2P(T > 2.11)$, where $T$ has an approximate
$T(\nu)$ distribution with the following number of degrees of freedom:

$$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)} = 12.89.$$

We round the number of degrees of freedom down to the nearest integer, that is, $\nu = 12$. Referring to row $\nu = 12$ in Table 18.4, 2.11 falls between 1.782 and 2.179, which have areas to the right of 0.05 and 0.025, respectively. So $0.025 < P(T > 2.11) < 0.05$ and, doubling for the two-sided test, 0.05 < p-value < 0.10. The p-value computed with a statistical package is 0.056. At a level of significance of $\alpha = 0.10$, we can accept the alternative hypothesis that the egg masses of the two species are different on average.

Fig. 10.4 Normal probability plots and comparative box plots for the egg masses

Technology Component using R:

Assume that the data for the two populations are saved in the numerical vectors x1 and x2, respectively. To produce the overlaid QQ-plots for x1 (in blue) and x2 (in red), together with the fitted lines, we use:

    # common y-axis limits so that both samples fit in the same plot
    lmts <- range(x1, x2)
    # QQ-plot and fitted line for x1 (blue)
    qqnorm(x1, ylim = lmts, col = "blue")
    abline(mean(x1), sd(x1), col = "blue")
    # draw the QQ-plot and fitted line for x2 (red) on top of the first plot
    par(new = TRUE)
    qqnorm(x2, ylim = lmts, col = "red")
    abline(mean(x2), sd(x2), col = "red")
    par(new = FALSE)

Remark: The above procedure gives the plot of the pairs $(z_i, y_i)$ with the fitted line of equation $y = \hat{\mu} + \hat{\sigma} z$ with $\hat{\mu} = \bar{x}$ and $\hat{\sigma} = s$, for each of the two variables. This procedure is used for verifying the assumption that the two populations are normally distributed with equal variances. We consider the two populations to be normally distributed if both plots seem to be linear, and to have equal variances if the two lines seem to be parallel.

To produce side-by-side boxplots, we use:

    boxplot(x1, x2)

Remark: If you assigned the data to a dataframe (for example, using the function read.table()), refer to the last item of the Technology component at the end of Section 7.3 to see how to produce side-by-side boxplots and overlaid normal probability plots in the same graphics window.

To test the hypothesis $H_0: \mu_1 = \mu_2$ against $\mu_1 \neq \mu_2$ and calculate a 95% confidence interval for $\mu_1 - \mu_2$ when the two populations are normally distributed with equal variances, we use:

    t.test(x1, x2, var.equal = TRUE)

Remark: In the case of normal populations with unequal variances, we use the same command as above, but without including var.equal=TRUE.

To change the confidence level to 98% (or any other value), we use:

    t.test(x1, x2, conf.level = 0.98, var.equal = TRUE)

To test the hypothesis $H_0: \mu_1 = \mu_2$ against $\mu_1 > \mu_2$ when the two populations are normally distributed with equal variances, we use:

    t.test(x1, x2, alternative = "greater", var.equal = TRUE)

Remark: This procedure also produces a one-sided confidence interval, which is not discussed in this book. In the case of normal populations with unequal variances, we use the same command as above, but without including var.equal=TRUE.

To test the hypothesis $H_0: \mu_1 = \mu_2$ against $\mu_1 < \mu_2$ when the two populations are normally distributed with equal variances, we use:
    t.test(x1, x2, alternative = "less", var.equal = TRUE)

Remark: This procedure also produces a one-sided confidence interval, which is not discussed in this book. In the case of normal populations with unequal variances, we use the same command as above, but without including var.equal=TRUE.

If you assigned the data to a dataframe (for example with the function read.table()), we use:

    t.test(y ~ x, data = data)

where in the dataframe data, we have a numerical vector y, and a categorical vector x identifying the two groups. You should also use the arguments var.equal and alternative as above.

10.4 Confidence Intervals and Tests for Proportions

To compare two proportions $p_1$ and $p_2$ from two independent populations, we discuss inferences concerning the difference $p_1 - p_2$. We begin with the point estimation of the difference in proportions, and follow with interval estimation and hypothesis testing.

Consider two independent binomial experiments. The probability of success for the $i$-th experiment is $p_i$ and the number of successes is a random measurement denoted by $Y_i$, for $i = 1, 2$. The numbers of observations per experiment are $n_1$ and $n_2$, respectively. The respective sample proportions are $\hat{p}_1 = Y_1/n_1$ and $\hat{p}_2 = Y_2/n_2$.

A natural estimator for $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$. The estimator is unbiased since its expected value is $E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$. Since the two experiments are independent, the variance of this estimator is equal to

$$\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2) = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}.$$

Similar to the estimation of the difference in means, the larger the samples, the more precise the estimate. Assuming that both samples are large, as we standardize $\hat{p}_1 - \hat{p}_2$, we obtain that (approximately)

$$\frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} \quad \text{has an } N(0, 1) \text{ distribution.} \qquad (10.6)$$

As in the one sample case, the latter standardization cannot be used directly since the variance is unknown (it involves the true proportions $p_1$ and $p_2$). However, if we use the estimated variance, it can be shown that (approximately)

$$\frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}} \quad \text{has an } N(0, 1) \text{ distribution,} \qquad (10.7)$$
when $n_1$ and $n_2$ are large. What are large sample sizes? This is not an easy question to answer. A common rule of thumb is to not use the latter normal approximation when the observed number of successes or the observed number of failures in either one of the groups is less than 5.

Using (10.7), we construct the (approximate) confidence interval for $p_1 - p_2$ at a level of confidence of $(1 - \alpha)100\%$. This interval is:

$$\hat{p}_1 - \hat{p}_2 \pm z \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}},$$

where $z$ is the value such that $P(-z < Z < z) = 1 - \alpha$, and $Z$ follows a standard normal distribution.

In practice, we usually want to compare our data against a model with equal proportions. In other words, we would like to test the null hypothesis $H_0: p_1 - p_2 = 0$ (or equivalently $H_0: p_1 = p_2$) against an appropriate alternative hypothesis. Assuming that $H_0$ holds, the probability of success is the same for all trials in both experiments. This common probability is $p = p_1 = p_2$. If this is the case, we can consider the $n = n_1 + n_2$ observations as a sample from a binomial distribution with $n$ trials and probability $p$ of success. The corresponding sample proportion (called the pooled sample proportion) is

$$\hat{p} = \frac{Y_1 + Y_2}{n} = \frac{n_1}{n}\hat{p}_1 + \frac{n_2}{n}\hat{p}_2.$$

Note that the pooled sample proportion is a weighted average of the respective sample proportions, where the weights are the relative sample sizes.

Assuming that $H_0$ is true (i.e. $p_1 = p_2$), the standardization in (10.6) becomes

$$\frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{p(1 - p)}\sqrt{1/n_1 + 1/n_2}}.$$

Using $\hat{p}$ instead of $p$, we get the following test statistic:

$$Z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})}\sqrt{1/n_1 + 1/n_2}}.$$

Since the p-value is the probability of observing a value as extreme as $z_0$ (which is the observed value of the test statistic) in the direction of the alternative hypothesis, this hypothesis must be taken into consideration when computing the p-value. We usually want to test the null hypothesis of equality against one of the following three alternative forms. Table 10.4 gives the corresponding p-value in the three cases. Here $Z$ has approximately a standard normal distribution. The test is a large sample test.
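The large-sample procedure described above (pooled test statistic, the three p-values, and the unpooled confidence interval) can be sketched in R, the language of the Technology Components. This is only a minimal illustration under the assumptions of this section, not a definitive implementation: the helper name two_prop_z and the counts in the usage lines are hypothetical, and only the base R functions pnorm() and qnorm() are used.

```r
# Large-sample comparison of two proportions from two independent samples.
# two_prop_z is a hypothetical helper name, not a base R function.
# y1, y2: observed numbers of successes; n1, n2: sample sizes.
two_prop_z <- function(y1, n1, y2, n2, conf.level = 0.95) {
  # rule of thumb from the text: at least 5 successes and 5 failures per group
  if (min(y1, n1 - y1, y2, n2 - y2) < 5) {
    warning("fewer than 5 successes or failures in a group: ",
            "the normal approximation may be poor")
  }
  p1 <- y1 / n1
  p2 <- y2 / n2
  # pooled sample proportion, used only for the test of H0: p1 = p2
  p_pool <- (y1 + y2) / (n1 + n2)
  z0 <- (p1 - p2) / (sqrt(p_pool * (1 - p_pool)) * sqrt(1 / n1 + 1 / n2))
  # unpooled standard error, used for the confidence interval
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z <- qnorm(1 - (1 - conf.level) / 2)
  list(
    estimate    = p1 - p2,
    pooled      = p_pool,
    z0          = z0,
    p_greater   = pnorm(z0, lower.tail = FALSE),           # H1: p1 - p2 > 0
    p_less      = pnorm(z0),                               # H1: p1 - p2 < 0
    p_two_sided = 2 * pnorm(abs(z0), lower.tail = FALSE),  # H1: p1 - p2 != 0
    conf_int    = (p1 - p2) + c(-1, 1) * z * se
  )
}

# Hypothetical counts, for illustration only
res <- two_prop_z(y1 = 30, n1 = 100, y2 = 45, n2 = 120)
print(res)
```

The test statistic uses the pooled proportion because it is computed under $H_0$, whereas the interval uses the separate sample proportions, as in (10.7). Base R also provides prop.test(); called on the two counts with correct = FALSE, its reported chi-squared statistic should correspond to $z_0^2$.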
Table 10.4 The p-value for the comparison of two proportions

    Alternative Hypothesis        p-value
    $H_1: p_1 - p_2 > 0$          $P(Z > z_0)$
    $H_1: p_1 - p_2 < 0$          $P(Z < z_0)$
    $H_1: p_1 - p_2 \neq 0$       $2P(Z > |z_0|)$

Example 10.5. Refer to Example 3.7. We denote by $p_1$ and $p_2$ the proportions of recaptured moths in the light-colored and dark-colored populations, respectively. Among the $n_1 = 137$ light-colored moths, $y_1 = 18$ were recaptured, whereas among the $n_2 = 493$ dark-colored moths, $y_2 = 131$ were recaptured. The proportions of recaptured moths are $\hat{p}_1 = 0.131$ for the light-colored moths and $\hat{p}_2 = 0.266$ for the dark-colored moths. Is there a statistically significant difference between the proportions of recaptured moths, at a level of significance of $\alpha = 0.05$? If so, we wish to investigate the biological (practical) significance.

We assume that the samples are independent. Since both sample sizes are large and the observed numbers of successes and of failures are not too small, we can safely perform a large sample test. To test $H_0: p_1 - p_2 = 0$ against $H_1: p_1 - p_2 \neq 0$, we compute the test statistic:

$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})}\sqrt{1/n_1 + 1/n_2}} = \frac{0.131 - 0.266}{\sqrt{(0.2365)(1 - 0.2365)}\sqrt{1/137 + 1/493}} = -3.29,$$

where the pooled sample proportion is

$$\hat{p} = \frac{y_1 + y_2}{n_1 + n_2} = \frac{18 + 131}{137 + 493} = 0.2365.$$

The p-value is the probability of observing a difference in proportions as extreme as $\hat{p}_1 - \hat{p}_2 = -0.135$, under the assumption that both proportions are equal. This is approximately equal to $2P(Z > |z_0|) = 2P(Z > 3.29)$, where $Z$ follows approximately a standard normal distribution. Using Table 18.3, we can argue that the p-value is 0.001. There is a statistically significant difference between the proportions.

To investigate the biological significance, we construct a 95% confidence interval for $p_1 - p_2$:

$$\hat{p}_1 - \hat{p}_2 \pm 1.96\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = -0.135 \pm 0.0687 = [-0.204, -0.066].$$

We are 95% confident that the difference in proportions $p_2 - p_1$ is between 6.6% and 20.4%. Recall that biological significance cannot be determined by a test. Only by using their
good judgement and experience can scientists determine what is biologically significant. In this instance, if we assume that an absolute difference in proportions of at least 5% is biologically significant, then our findings are significant.

Kettlewell hypothesized that a larger proportion of the dark-colored moths will be recaptured. We compute the corresponding p-value to test his claim. We want to test $H_0: p_1 - p_2 = 0$ against $H_1: p_1 - p_2 < 0$. The observed value of the test statistic is $z_0 = -3.29$. The p-value for this left-tailed test is approximately equal to $P(Z < z_0) = P(Z < -3.29)$, where $Z$ has a standard normal distribution. Using Table 18.2, we can argue that the p-value is 0.0005.

10.5 Problems

Problem 10.1. It is believed that nutritional deprivation affects various components of the immune system, such as the tuberculin skin reactivity. In the study [58], a sample of 8 Sprague-Dawley male rats were fed a normal diet consisting of 18% protein. A state of malnutrition was induced in another sample of 8 rats, which were fed a diet consisting of only 5% protein. After 4 weeks, the rats were given an intradermal injection of 25 μg of purified protein derivative of tuberculin. The following table gives the skin reactivity diameter of erythema and induration (in mm) for the two groups of rats.

    18% Protein Diet    5% Protein Diet
    13.3                5.1
    16.3                8.7
    9.9                 8.7
    9.3                 8.5
    16.1                8.1
    9.7                 6.9
    9.7                 6.9
    14.1                12.3

(a) Using statistical software, verify the assumption that the two populations are normal with equal variances.
(b) Test the hypothesis $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$, where $\mu_1$ is the average level of tuberculin reactivity in the rats with a normal diet, and $\mu_2$ is the average level of tuberculin reactivity in the malnourished rats. State your conclusion.
(c) Construct a 95% confidence interval for $\mu_1 - \mu_2$. Can we say that the skin reactivity diameter in the malnourished rats is at least 7 mm smaller than in the control group?

Problem 10.2. A study was conducted to see if vitamin D and calcium supplementation has any effect on the risk of breast cancer (see [14]). In this study, 36,282 women were randomly assigned to two groups. The first group, consisting of 18,176 women, took a supplement of 1,000 mg of calcium with 400 IU of vitamin D daily. The second group, consisting of 18,106 women, was the placebo group. Both groups were followed up for a period of 7 years. At the end of this period, it was found that 528 patients in the first group and 546 patients in the second group had developed breast cancer. Find a 90% confidence interval for the difference $p_1 - p_2$, where $p_1$ denotes the proportion of women with breast cancer among those who take a daily calcium-vitamin D supplement and $p_2$ is the proportion of women with breast cancer in the general population. Using this interval, can we say that calcium and vitamin D supplementation decreases or increases the risk of breast cancer?

Problem 10.3. It is claimed that supplementation with Coenzyme Q10 (CoQ10) during pregnancy reduces the rate of pre-eclampsia, or pregnancy induced hypertension (see [65]). 235 pregnant women at risk of pre-eclampsia were randomly divided into two groups. The first group of 118 women received 200 mg of CoQ10 daily from the 20th week of pregnancy until delivery. The other group of 117 women received a placebo. 17 women in the CoQ10 group developed pre-eclampsia, compared with 30 women in the placebo group. Can we conclude that supplementation with CoQ10 reduces the risk of developing pre-eclampsia? Justify your conclusion using a test of hypothesis at significance level $\alpha = 0.05$, and a 95% confidence interval.

Problem 10.4. We continue with the situation in Problem 8.8. Assume that the two sample sizes are $n_1 = 19$ and $n_2 = 12$ and the two sample variances are $s_1^2 = 0.81$ and $s_2^2 = 0.49$. Is there enough evidence that families from culled populations have a lower bunching intensity than families from non-culled populations? Use a test of hypothesis at level $\alpha = 0.005$. Suppose that the two populations are normally distributed with equal variances.

Problem 10.5. Rhodamine 6G (R6G) is a fluorochrome mitochondrial dye with potential use for cancer treatment. One of the objectives of the
study [24] was to show that the administration of R6G during a period of hypoglycemia reduces the growth rate of the Walker 256 tumor. A group of $n_1 = 7$ rats underwent implantation of 100 mg of viable fragments of Walker 256 carcinosarcoma, and after 48 hours they were administered R6G. The animals were fasted for 24 hours prior to the drug administration and 8 hours after. After a week, the tumors were weighed, yielding a sample average of $\bar{x}_1 = 3.6$ g and a sample standard deviation of $s_1 = 0.3$ g. A control group of $n_2 = 7$ rats which received the same tumor transplant had a sample average of $\bar{x}_2 = 7.1$ g and a sample standard deviation of $s_2 = 0.7$ g. Can we conclude that the administration of R6G reduces the tumor growth rate? Justify your answer using a test of hypothesis and a 95% confidence interval. Assume that the two populations have normal distributions with equal variances.

Problem 10.6. Nurses interested in the effect of prenatal care divided 18 expectant mothers into two groups of size 9. Group 1 received prenatal consultations, while those in group 2 received no prenatal consultations. The summary statistics on birth weight are $\bar{x}_1 = 99.6$ ounces and $s_1 = 6.82$ ounces for group 1, and $\bar{x}_2 = 85.3$ ounces and $s_2 = 16.75$ ounces for group 2. Construct a 95% confidence interval for $\mu_1 - \mu_2$, where $\mu_1$ denotes the average birth weight for babies whose mothers received prenatal consultations, and $\mu_2$ denotes the average birth weight for babies whose mothers received no prenatal consultations. Using this interval, can we conclude that babies whose mothers did not receive prenatal consultations have a smaller weight at birth? Assume that the two populations are normal with unequal variances.

Problem 10.7. Recent studies have shown that exercise is one of the most efficient ways of increasing the release of the growth hormone in children and teenagers. However, when exercise is combined with L-arginine supplementation, children seem to grow less. The height increase (in cm) in one year was recorded for two samples of 14-year-old boys. The boys in the first group participated in a physical activity for at least 3 hours a week. The boys in the second group participated in the same activities, and had a supplementation of L-arginine included in their diet. The following table gives the summary of the data:

    Group                      Size          Mean                  Standard Deviation
    Exercise                   $n_1 = 50$    $\bar{x}_1 = 23.5$    $s_1 = 5.6$
    Exercise and L-arginine    $n_2 = 60$    $\bar{x}_2 = 21.4$    $s_2 = 6.9$
Use a large sample test to check if there is enough evidence that the L-arginine supplementation slows down the release of the growth hormone, when compared to exercise alone. Use the level $\alpha = 0.05$.

Problem 10.8. Measles is among the world's most contagious diseases, and can cause severe complications and even death. This disease is easily preventable through vaccination. Herd immunity occurs when the vaccination of a significant portion of the population provides protection even to the non-vaccinated individuals. For measles, it is estimated that this portion should be at least 83%. During a measles outbreak in sub-Saharan Africa, in a sample of 989 children from a country in which the measles vaccination rate was higher than 83%, 43 became infected with measles, while in a sample of 845 children from a neighboring country in which the measles vaccination rate was lower than 83%, 148 became infected with measles. Using these data, can we conclude that measles vaccination is effective for lowering the infection rate? Use a test of hypothesis at level $\alpha = 0.005$.
Hint: Compare the proportion $p_1$ of infected children in the country in which the vaccination rate was higher than 83% with the proportion $p_2$ of infected children in the country in which the vaccination rate was lower than 83%.

Problem 10.9. A pH level of the soil between 5.3 and 6.5 is optimal for strawberries. To measure the pH level, a field is divided into two lots. In each lot, we randomly select 20 samples of soil. The data are below.

    Lot 1    5.66  5.73  5.68  5.77  5.73  5.71  5.68  5.58  6.11  5.37
             5.67  5.53  5.59  5.94  5.84  5.53  5.64  5.73  5.30  5.65
    Lot 2    5.25  6.73  6.25  5.21  5.63  6.41  5.89  6.76  5.13  5.64
             5.94  6.16  5.64  6.54  5.79  5.91  6.17  6.90  5.76  6.07

(a) Using statistical software, verify the assumption that the two populations are normally distributed.
(b) Using statistical software, assess the assumption that the two populations have equal variances.
(c) Test the hypothesis $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$, where $\mu_1$ is
the mean pH level of the soil in lot 1, and $\mu_2$ is the mean pH level of the soil in lot 2. State your conclusion. Use the level $\alpha = 0.05$.

Problem 10.10. The table below gives the size of human groups involved in bear-human interactions at a particular park. The interactions were classified according to the behavior of the bear.

    Behavior              Inquisitive          Avoidance
    Mean                  $\bar{x}_1 = 3.5$    $\bar{x}_2 = 2.4$
    Standard Deviation    $s_1 = 5.2$          $s_2 = 2.3$
    Sample Size           $n_1 = 65$           $n_2 = 55$

Can we conclude that the mean size of the human groups involved in bear interactions differs according to the behavior of the bear? Use the level $\alpha = 0.05$. Which test did you use to compare the two means?

Problem 10.11. In a particular park it is believed that the type of behavior observed during human-bear interactions depends on the type of location. In the front country, among 109 human-bear interactions, 35 involved a neutral or an avoidance behavior. In the back country, among 83 human-bear interactions, 69 involved a neutral or an avoidance behavior. Can we conclude that the proportion of human-bear interactions that are classified as a neutral or an avoidance behavior is larger in the back country compared to the front country? Use the level $\alpha = 0.05$.

Problem 10.12. A botanist is testing a new tomato fertilizer. He grew two different groups of 8 plants each, using the standard fertilizer for the first group, and the new fertilizer for the second group. After 70 days, he measured the tomato yield (in kg) for each plant. The data are given in the table below:
    Plant                 Standard Fertilizer    Plant                 New Fertilizer
    1                     4.76                   1                     5.60
    2                     4.25                   2                     4.98
    3                     3.98                   3                     5.12
    4                     3.44                   4                     3.86
    5                     3.87                   5                     4.56
    6                     4.78                   6                     5.76
    7                     3.99                   7                     4.21
    8                     3.21                   8                     4.05
    Mean                  4.035                  Mean                  4.767
    Standard Deviation    0.56                   Standard Deviation    0.71

Find a 95% confidence interval for $\mu_1 - \mu_2$, where $\mu_1$ is the average tomato yield per plant using the standard fertilizer, and $\mu_2$ is the average tomato yield per plant using the new fertilizer. Interpret the result.

Problem 10.13. The Nobel Prize in Chemistry in 1937 was divided between Norman Haworth for his work on carbohydrates and vitamin C, and Paul Karrer for his work on carotenoids, flavins and vitamins A and B2. Vitamin C is an ascorbic acid with antioxidant properties. A study is undertaken to compare the amount of ascorbic acid (in mg) in two popular brands of vitamin C (labeled as 100 mg). The summary of the data follows:

                          Brand 1              Brand 2
    Mean                  $\bar{x}_1 = 118$    $\bar{x}_2 = 122$
    Standard Deviation    $s_1 = 1.2$          $s_2 = 1.75$
    Number of Tablets     $n_1 = 15$           $n_2 = 15$

Assume that the amount of ascorbic acid in a tablet is normally distributed, and the variance of this amount is the same for the two brands.
(a) Compute the pooled standard deviation for the two samples.
(b) Give the range of the p-value of Student's two-sample t-test to compare the mean amount of ascorbic acid per tablet for the two brands. What can we conclude? (Use a two-sided test of level $\alpha = 0.01$.)
(c) Construct a 95% confidence interval for $\mu_1 - \mu_2$, where $\mu_1$ is the mean amount of ascorbic acid per tablet for brand 1, and $\mu_2$ is the mean amount of ascorbic acid per tablet for brand 2.

Problem 10.14. We want to compare the density of organisms (in number of organisms per square meter) at two different locations along a river.
Below are the descriptive statistics for two samples of size 12 each, taken from the two locations.

                  Mean        Standard Deviation
    Location 1    9,168.75    3,700.57
    Location 2    2,168.33    815.26

Assume that the samples are selected from independent normal populations with unequal variances. Can we conclude that the mean densities of organisms at the two locations are different? Use the level $\alpha = 0.05$.

Problem 10.15. We want to compare the germination rate of a new strain of a plant against an old strain of the same plant. Below are the data.

                  Germinated    Did Not Germinate    Total
    Old Strain    125           15                   140
    New Strain    152           8                    160

(a) Can we conclude that the germination rates differ? Use the level $\alpha = 0.05$.
(b) Construct a 98% confidence interval for the difference between the germination rates.

Problem 10.16. Consider a study comparing two medications for severe bladder infections. The variable $x$ is the length of time (in days) to recovery. For the $n_1 = 15$ patients who were given medication 1, we observed a mean recovery time of $\bar{x}_1 = 16.87$ days. The mean recovery time was $\bar{x}_2 = 19.09$ days for the $n_2 = 18$ patients who were given medication 2.
(a) Here are the overlaid quantile-quantile plots for the two samples of recovery times. Is it reasonable to assume that both populations of recovery times are normally distributed with equal variances?
(b) Based on the following R output, compute the value of the pooled standard deviation $s_p$.

    > t.test(x1,x2,var.equal=TRUE)

            Two Sample t-test

    data:  x1 and x2
    t = -5.174, df = 31, p-value = 1.304e-05
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.105940 -1.349615
    sample estimates:
    mean of x mean of y
     16.86667  19.09444

(c) Based on the R output in (b), give a 95% confidence interval for the difference between the mean recovery time on medication 1 and the mean recovery time on medication 2.
(d) Based on the confidence interval from (c), which medication is better?

Did you know? In 1923, the Nobel Prize committee credited the practical extraction of insulin to a team at the University of Toronto, and awarded the Nobel Prize in Physiology/Medicine to Frederick Banting and John James Richard Macleod for the discovery of insulin. Banting shared his prize with his assistant Charles Best, who was chosen by the flip of a coin to help him carry out the lab work in the summer of 1921. Macleod shared the prize with the biochemist James Collip, who helped to purify the extracts from ox pancreas. The first injection of insulin was given at the Toronto General Hospital to a 14-year-old dying diabetic patient in January 1922. The patent for insulin was sold to the University of Toronto for one dollar.