Chapter 13 Regression and Correlation

Balan, R., & Lamothe, G. (2017). Expect the unexpected: A first course in biostatistics (second edition). World Scientific Publishing Company. Created from ottawa on 2023-12-03 18:33:31. Copyright © 2017. World Scientific Publishing Company. All rights reserved.

Biologists are often interested in the relationship between two variables. In this chapter, we learn to describe the relationship between two quantitative variables with a correlation analysis. We also learn to describe one of the variables as a linear function of the other variable; this is called a regression analysis.

13.1 Sample Covariance and Correlation

In this section, we introduce some techniques that describe the association between two quantitative variables. We consider two examples. In Example 13.1, we describe the association between the heights of mothers and daughters. This is an example of a positive linear association, where the heights of the daughters tend to increase as the heights of the mothers increase. In Example 13.2, we examine the relationship between the number of colds and vitamin C. This is an example of a negative linear association: as the dosage of vitamin C increases, the number of colds tends to decrease on average.

Consider $n$ paired observations $(x_i, y_i)$, for $i = 1, \ldots, n$, from a pair $(X, Y)$ of random variables. We can use a scatter plot to describe the association between $x$ and $y$. In Figure 13.1, we have an illustration of linear associations. For each scatter plot, we display a horizontal line at $\bar{y}$ and a vertical line at $\bar{x}$. These lines define four quadrants. If there is a positive linear association between $X$ and $Y$, then most of the points lie in quadrants I and III, where $(x_i - \bar{x})(y_i - \bar{y})$ is positive. For a negative association, most of the points lie in quadrants II and IV, where $(x_i - \bar{x})(y_i - \bar{y})$ is negative.

To describe the linear association between the two variables, we can use the sample covariance

$$\widehat{\mathrm{cov}}_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\left(\sum_{i=1}^{n} x_i y_i\right) - (1/n)\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n - 1}.$$

It is positive for positive linear associations and negative for negative linear associations. So the covariance captures the sign (also called the direction) of a linear association.

Fig. 13.1 An illustration of linear associations

We now define a statistic which is based on the covariance. The sample correlation is

$$r_{xy} = \frac{\widehat{\mathrm{cov}}_{xy}}{s_x s_y},$$

where $s_x$ and $s_y$ are the respective sample standard deviations. The sample correlation is also called Pearson's correlation, or the product-moment correlation. The sample correlation satisfies the following properties, which justify its suitability as a descriptive measure of the intensity of the linear association:

• It is invariant to linear scaling (with positive scale factors). In other words, the correlation remains the same regardless of whether we measure height in millimeters, centimeters or meters.
• It has the same sign as the covariance, so it is negative for negative linear associations and positive for positive linear associations.
• A correlation is always between $-1$ and $1$. It is equal to $1$ or $-1$ if and only if the points $(x_1, y_1), \ldots, (x_n, y_n)$ fall exactly on a line.

Furthermore, if there is no association between $X$ and $Y$, then the correlation should be near 0.
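The covariance and correlation formulas above can be sketched in plain Python. This is only an illustrative sketch: the function names and the height/weight values are hypothetical and do not come from the text. It checks that the two forms of the covariance agree and that rescaling one variable (centimeters to meters) leaves the correlation unchanged.

```python
import math


def sample_covariance(x, y):
    """Sample covariance, definitional form with the n - 1 denominator."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_covariance_shortcut(x, y):
    """Same quantity via (sum x*y - (1/n) * sum x * sum y) / (n - 1)."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (sum_xy - sum(x) * sum(y) / n) / (n - 1)


def sample_correlation(x, y):
    """Pearson correlation r_xy = cov_xy / (s_x * s_y)."""
    s_x = math.sqrt(sample_covariance(x, x))  # cov(x, x) is the sample variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


# Hypothetical data, invented for illustration only.
heights_cm = [150.0, 160.0, 165.0, 172.0, 180.0]
weights_kg = [52.0, 58.0, 63.0, 70.0, 77.0]

# Changing units rescales the covariance but not the correlation.
heights_m = [h / 100 for h in heights_cm]
r_cm = sample_correlation(heights_cm, weights_kg)
r_m = sample_correlation(heights_m, weights_kg)
print(r_cm, r_m)
```

The standard deviations are obtained here as the square root of the covariance of a variable with itself, which keeps the sketch to a single formula; a library routine for the standard deviation would serve equally well.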

If the relationship between $X$ and $Y$ is linear, then we interpret this relationship as being stronger as $r$ approaches $1$ or $-1$ and as being weaker as $r$ approaches 0. If the relationship between $X$ and $Y$ is not linear, then the sample correlation is more difficult to interpret. In Example 13.3, we have two variables that are strongly related, but the correlation is near zero.

Example 13.1. The data below give the heights (in cm) for a sample of $n = 12$ pairs of mother and daughter.

Height (cm)
Daughter   160  165  156  169  152  156
Mother     163  165  162  161  161  160
Daughter   162  156  161  160  164  162
Mother     164  159  164  161  163  168

Figure 13.2 gives the scatter plot of the height $Y$ of the daughter against the height $X$ of the mother. There appears to be a positive linear association between the two variables. The sample covariance is $\widehat{\mathrm{cov}}_{xy} = 4.9318$ and the respective standard deviations are $s_x = 2.4664$ and $s_y = 4.6928$. The sample correlation between the heights of the daughters and the mothers is equal to

$$r_{xy} = \frac{\widehat{\mathrm{cov}}_{xy}}{s_x s_y} = 0.426.$$

The intensity of the linear association between the heights of the mother and the daughter is moderately weak.

Fig. 13.2 Scatter plot for pairs of mother and daughter
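The figures in Example 13.1 can be reproduced with a short Python sketch. It assumes that the mother and daughter heights are paired in the order in which they are listed (first daughter with first mother, and so on); the variable names are chosen for illustration.

```python
import statistics

# Heights in cm from Example 13.1, paired by position (assumed pairing).
mother = [163, 165, 162, 161, 161, 160, 164, 159, 164, 161, 163, 168]    # x
daughter = [160, 165, 156, 169, 152, 156, 162, 156, 161, 160, 164, 162]  # y

n = len(mother)
x_bar = statistics.mean(mother)
y_bar = statistics.mean(daughter)

# Sample covariance with the n - 1 denominator.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(mother, daughter)) / (n - 1)

# Sample standard deviations (statistics.stdev also divides by n - 1).
s_x = statistics.stdev(mother)
s_y = statistics.stdev(daughter)

r_xy = cov_xy / (s_x * s_y)

print(round(cov_xy, 4))  # 4.9318
print(round(s_x, 4))     # 2.4664
print(round(s_y, 4))     # 4.6928
print(round(r_xy, 3))    # 0.426
```

Under that pairing, the printed values agree with the covariance, standard deviations and correlation reported in the example.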
