Chapter 10 Comparison of Two Independent Samples

Biologists are often interested in the comparison of groups. Consider the following examples. Do two different species of swallow produce similar eggs on average? Does one type of fertilizer produce larger plants on average than another type of fertilizer?

In this chapter, we introduce methods to compare two independent groups. We discuss how interval estimation and hypothesis testing can be used to infer whether there are differences between the two populations. We first discuss techniques to compare means, and end the chapter with techniques to compare proportions.

10.1 Study/Experimental Design

When analyzing data, it is important to consider the design of the study or experiment. This is especially true when comparing groups. The design of the study often dictates the probability model that will be used to describe the data collection process from the populations of interest. Only when the probability model is appropriate can we generalize our results from the samples to the populations.

Scientists often want to compare groups that are outcomes of a controlled experiment run under different experimental conditions. For example, a simple experiment might be designed to test a claim that a particular type of fertilizer produces taller plants than another type of fertilizer. The response variable in this instance is the height of the plants. The primary factor for this experiment is the fertilizer. The levels of the factor are called treatments. So the treatments in this case are the types of fertilizer. In a controlled experiment we assign the treatments to the experimental units, which in this case could be plots with one seedling each. This assignment determines the treatment groups.

Balan, R., & Lamothe, G. (2017). Expect the unexpected : A first course in biostatistics (second edition). World Scientific Publishing Company. Created from ottawa on 2023-09-29 20:15:50.
Copyright © 2017. World Scientific Publishing Company. All rights reserved.

It is possible that there are uncontrolled factors that might affect the response variable. These are called nuisance factors. For example, the genetic predisposition of a seedling to produce a tall plant might be a nuisance factor. Randomization is used to average the effects of the nuisance factors over the different groups. We should randomly assign the types of fertilizer to the seedlings.

The purpose of a controlled experiment is to determine if there is a cause-and-effect relationship. In our case, this means that the use of the new fertilizer produces taller plants on average. If the controlled experiment is randomized and the treatment groups are statistically significantly different, then we can be confident that there is indeed a cause-and-effect relationship.

One of the simplest experimental designs is called a completely randomized design. For completely randomized designs, the levels of the primary factor are randomly assigned to the experimental units. Our fertilizer experiment has such a design. The tools introduced in this chapter apply to experiments with a completely randomized design.

In some circumstances, the distribution of the response variable can be highly spread out. This variability might be due to nuisance factors. For example, females and males might react differently to a particular drug. This noise can be prohibitive, in the sense that we would need very large samples in order to identify significant treatment effects. To reduce this noise we can construct homogeneous subgroups, called blocks. The variance within each block should be smaller than the variance of the entire sample. So the estimates within the blocks should be more precise. As we combine the estimates across blocks, we should obtain an estimate of the treatment effect that is more precise than without blocking.
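The random assignment used in a completely randomized design, as described above for the fertilizer experiment, can be sketched in a few lines of Python. This is only an illustration under assumed details: the function name `completely_randomized_assignment`, the twelve seedling labels, the two treatment names, and the choice of equal group sizes are all hypothetical and do not come from the text.

```python
import random


def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly split `units` into equal-sized groups, one per treatment.

    Hypothetical helper: equal group sizes are an assumption of this sketch.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)          # seeded generator so the assignment is reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)              # random order removes any systematic pattern
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[i * group_size:(i + 1) * group_size])
        for i, treatment in enumerate(treatments)
    }


# Hypothetical experimental units: twelve plots with one seedling each.
seedlings = [f"seedling_{k:02d}" for k in range(1, 13)]
groups = completely_randomized_assignment(
    seedlings, ["fertilizer_A", "fertilizer_B"], seed=1
)
for treatment, members in groups.items():
    print(treatment, members)
```

Because every seedling is equally likely to end up in either group, nuisance factors such as genetic predisposition are balanced across the treatment groups on average, not necessarily in any single assignment.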
If we randomly assign all of the treatments to the experimental units within each block, then we say that the experiment has a randomized complete block design. As an example, if we want to compare a drug to a placebo and we believe that gender also has an effect on the response, we divide the subjects into blocks according to their gender. If we have ten subjects of each gender, we randomly assign the drug to five subjects of each gender. The remainder of the subjects are given the placebo. We do not discuss the analysis of block designs in this chapter.

The techniques presented in this chapter apply not only to completely randomized experiments. They are also applicable in a non-experimental setting. Consider the study [64], where the authors compare the breeding biology of the Welcome Swallow in Australia and New Zealand. The factor (in this case, the location) is not assigned to the unit of study (the bird). Such a study is called an observational study.

An observational study can identify associations, but not causality. We are not randomly assigning the treatments to the units of study. So there is a danger that any association that we find between the response and the factor may be due to some third variable, called a lurking variable, which is not evenly distributed among the groups. Maybe it is access to food that caused the difference in breeding biology, and not the location. So we should not say that it is the observational factor that caused the significant result. However, we can say that there is an association.

The techniques in this chapter can be used to compare samples from an observational study as long as it is reasonable to assume that observations within the samples are independent, and that there is independence between the two samples.

10.2 Confidence Intervals and Tests for Means: Large Samples

In this section, we discuss techniques to compare the means of two independent populations, when both sample sizes are large. We use $X_1$ and $X_2$ to denote the random measurements from population 1 and population 2, respectively. Their means are denoted by $\mu_1 = E(X_1)$ and $\mu_2 = E(X_2)$, and their variances are denoted by $\sigma_1^2 = \mathrm{Var}(X_1)$ and $\sigma_2^2 = \mathrm{Var}(X_2)$.

We assume that we have a random sample of size $n_1 \geq 40$ from population 1, whose sample mean and sample variance are denoted by $\bar{X}_1$ and $S_1^2$, respectively. Similarly, we have a random sample of size $n_2 \geq 40$ from population 2, whose sample mean and sample variance are denoted by $\bar{X}_2$ and $S_2^2$, respectively. From Example 7.8, we know that

$$E(\bar{X}_1) = \mu_1, \quad \mathrm{Var}(\bar{X}_1) = \frac{\sigma_1^2}{n_1}, \quad E(\bar{X}_2) = \mu_2, \quad \mathrm{Var}(\bar{X}_2) = \frac{\sigma_2^2}{n_2}.$$

To compare the two means, we examine the difference in means $\mu_1 - \mu_2$. We begin the discussion with point estimation.
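The two facts about a sample mean quoted above, that its expected value is the population mean and its variance is the population variance divided by the sample size, can be checked empirically. The sketch below is a simulation, not a proof. It assumes a normal population with hypothetical parameters (mean 10, standard deviation 2) and a sample size of 40; neither the distribution nor the numbers come from the text.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, replications, seed=0):
    """Draw `replications` samples of size `n` and return their sample means.

    Assumption of this sketch: the population is normal(mu, sigma).
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replications)
    ]


# Hypothetical population parameters and sample size.
means = simulate_sample_means(mu=10.0, sigma=2.0, n=40, replications=5000)

print("average of the sample means: ", statistics.fmean(means))
print("theoretical expected value:  ", 10.0)
print("variance of the sample means:", statistics.variance(means))
print("theoretical variance:        ", 2.0 ** 2 / 40)
```

With several thousand replications, the simulated average and variance should land close to the theoretical values, with small discrepancies due to simulation noise.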
A natural estimator of $\mu_1 - \mu_2$ is the difference in sample means $\bar{X}_1 - \bar{X}_2$. This estimator is unbiased since its expected value is

$$E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2.$$

Because the two samples are independent, the variance of the estimator is

$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
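As a rough illustration of the point estimate and its variance formula, the following Python sketch computes the difference in sample means for two samples. It goes one step beyond the excerpt: since the population variances are usually unknown, it substitutes the sample variances for them to obtain an estimated standard error. That plug-in step, the function name `difference_in_means`, and the simulated "egg length" data are assumptions of this sketch, not statements from the text.

```python
import math
import random
import statistics


def difference_in_means(sample1, sample2):
    """Return (point estimate, estimated standard error) for mu_1 - mu_2.

    Assumption of this sketch: the unknown population variances are
    replaced by the sample variances S_1^2 and S_2^2 (plug-in estimate).
    """
    n1, n2 = len(sample1), len(sample2)
    estimate = statistics.fmean(sample1) - statistics.fmean(sample2)
    var_hat = statistics.variance(sample1) / n1 + statistics.variance(sample2) / n2
    return estimate, math.sqrt(var_hat)


# Hypothetical data: simulated egg lengths (mm) for two groups of 50 eggs each.
rng = random.Random(42)
group1 = [rng.gauss(18.5, 1.0) for _ in range(50)]
group2 = [rng.gauss(18.0, 1.2) for _ in range(50)]

estimate, std_error = difference_in_means(group1, group2)
print(f"estimated difference in means: {estimate:.3f}")
print(f"estimated standard error:      {std_error:.3f}")
```

Note that `statistics.variance` uses the divisor n - 1, which matches the usual definition of the sample variance.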
