Chapter 13 Regression and Correlation

Balan, R., & Lamothe, G. (2017). Expect the unexpected : A first course in biostatistics (second edition). World Scientific Publishing Company. Created from ottawa on 2023-12-03 18:33:31. Copyright © 2017. World Scientific Publishing Company. All rights reserved.

Biologists are often interested in the relationship between two variables. In this chapter, we learn to describe the relationship between two quantitative variables with a correlation analysis. We also learn to describe one of the variables as a linear function of the other variable. This is called a regression analysis.

13.1 Sample Covariance and Correlation

In this section, we introduce some techniques that describe the association between two quantitative variables. We consider two examples. In Example 13.1, we describe the association between the heights of mothers and daughters. This is an example of a positive linear association, where the heights of the daughters tend to increase as the heights of the mothers increase. In Example 13.2, we examine the relationship between the number of colds and vitamin C. This is an example of a negative linear association: as the dosage of vitamin C increases, the number of colds tends to decrease on average.

Consider $n$ paired observations $(x_i, y_i)$, for $i = 1, \ldots, n$, from a pair $(X, Y)$ of random variables. We can use a scatter plot to describe the association between $x$ and $y$. In Figure 13.1, we have an illustration of linear associations. For each scatter plot, we display a horizontal line at $\bar{y}$ and a vertical line at $\bar{x}$. These lines define four quadrants. If there is a positive linear association between $X$ and $Y$, then most of the points lie in quadrants I and III, where $(x_i - \bar{x})(y_i - \bar{y})$ is positive. For a negative association, most of the points lie in quadrants II and IV, where $(x_i - \bar{x})(y_i - \bar{y})$ is negative.

To describe the linear association between the two variables, we can use the sample covariance

$$\widehat{\mathrm{cov}}_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\left(\sum_{i=1}^{n} x_i y_i\right) - (1/n)\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n - 1}.$$

It is positive for positive linear associations and negative for negative linear associations. So the covariance captures the sign (also called the direction) of a linear association.

Fig. 13.1 An illustration of linear associations

We now define a statistic which is based on the covariance. The sample correlation is

$$r_{xy} = \frac{\widehat{\mathrm{cov}}_{xy}}{s_x s_y},$$

where $s_x$ and $s_y$ are the respective sample standard deviations. The sample correlation is also called Pearson's correlation, or the product-moment correlation. The sample correlation satisfies the following properties, which justify its suitability as a descriptive measure of the intensity of the linear association:

• It is invariant to linear scaling. In other words, the correlation remains the same regardless of whether we measure height in millimeters, centimeters or meters.
• It has the same sign as the covariance, so it is negative for negative linear associations and positive for positive linear associations.
• A correlation is always between $-1$ and $1$. It is equal to $1$ or $-1$ if and only if the points $(x_1, y_1), \ldots, (x_n, y_n)$ fall exactly on a line with nonzero slope. Furthermore, if there is no association between $X$ and $Y$, then the correlation should be near 0.
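As a rough illustration of the definitions above, the following R sketch computes the sample covariance from both forms of its formula, derives the sample correlation from it, and compares the results with R's built-in cov and cor functions. The vectors x and y hold made-up values chosen only for illustration (they are not data from the text), and all variable names are our own.

```r
# Hypothetical illustrative data (NOT from the text): two sets of heights in cm
x <- c(150, 155, 160, 165, 170)
y <- c(152, 151, 163, 160, 171)
n <- length(x)

# Sample covariance from the defining formula
cov_def <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Equivalent computational form based on raw sums
cov_alt <- (sum(x * y) - sum(x) * sum(y) / n) / (n - 1)

# Sample correlation: covariance divided by the product of standard deviations
r_def <- cov_def / (sd(x) * sd(y))

# Each hand computation should agree with the built-in function
print(all.equal(cov_def, cov(x, y)))
print(all.equal(cov_alt, cov(x, y)))
print(all.equal(r_def, cor(x, y)))

# Invariance to a change of units: converting cm to mm leaves r unchanged
print(all.equal(cor(10 * x, 10 * y), cor(x, y)))
```

Each printed comparison should be TRUE, which is one way to convince yourself that the two forms of the covariance formula are equivalent and that rescaling both variables does not change the correlation.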
If the relationship between $X$ and $Y$ is linear, then we interpret this relationship as being stronger as $r$ approaches $1$ or $-1$ and as being weaker as $r$ approaches 0. If the relationship between $X$ and $Y$ is not linear, then the sample correlation is more difficult to interpret. In Example 13.3, we have two variables that are strongly related, but the correlation is near zero.

Example 13.1. The data below give the heights (in cm) for a sample of $n = 12$ pairs of mother and daughter.

Height (cm)
Daughter   160  165  156  169  152  156
Mother     163  165  162  161  161  160
Daughter   162  156  161  160  164  162
Mother     164  159  164  161  163  168

Figure 13.2 gives the scatter plot of the height $Y$ of the daughter against the height $X$ of the mother. There appears to be a positive linear association between the two variables. The sample covariance is $\widehat{\mathrm{cov}}_{xy} = 4.9318$ and the respective standard deviations are $s_x = 2.4664$ and $s_y = 4.6928$. The sample correlation between the heights of the daughters and the mothers is equal to

$$r_{xy} = \widehat{\mathrm{cov}}_{xy} / (s_x s_y) = 0.426.$$

The intensity of the linear association between the heights of the mother and the daughter is moderately weak.

Fig. 13.2 Scatter plot for pairs of mother and daughter
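The summary statistics of Example 13.1 can be reproduced with a few lines of R. This is only a sketch: the vectors are transcribed from the table in the example, and the names mother and daughter are our own.

```r
# Heights (cm) transcribed from the table in Example 13.1
mother   <- c(163, 165, 162, 161, 161, 160, 164, 159, 164, 161, 163, 168)
daughter <- c(160, 165, 156, 169, 152, 156, 162, 156, 161, 160, 164, 162)

cov_xy <- cov(mother, daughter)  # sample covariance
s_x    <- sd(mother)             # standard deviation of the predictor
s_y    <- sd(daughter)           # standard deviation of the response
r_xy   <- cor(mother, daughter)  # sample (Pearson) correlation

# Display the four statistics rounded to four decimals
print(round(c(cov_xy = cov_xy, s_x = s_x, s_y = s_y, r_xy = r_xy), 4))
```

Adding the command plot(mother, daughter) would draw a scatter plot comparable to Figure 13.2.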
Example 13.2. Consider an experiment where different daily dosages of vitamin C (in mg) were randomly assigned to subjects. For each subject, we count the number of times that the person contracted the common cold over a period of three years. Here are the data:

Dosage (in mg)   Number of Colds
 0               12  10  10  15  14
15               14   7   8  11   9
30               10  12   9   8  11
50                7  10   8   4   6

Figure 13.3 gives the scatter plot of the number $Y$ of colds against the daily dosage $X$ of vitamin C. There appears to be a negative linear association between $X$ and $Y$. The sample covariance is $\widehat{\mathrm{cov}}_{xy} = -34.0132$ and the respective standard deviations are $s_x = 18.9789$ and $s_y = 2.8074$. The sample correlation between the two variables is equal to $r_{xy} = -0.638$. The intensity of the linear association between the number of colds and the dosage of vitamin C is moderately strong.

Fig. 13.3 Scatter plot: number of colds against vitamin C

It is recommended that you always produce a scatter plot. The scatter plot is a useful diagnostic tool. It allows us to verify the underlying assumption of linearity between $y$ and $x$, as seen in the following example.

Example 13.3. To investigate the effect of a particular stimulant on reaction times, the researchers randomly assigned a dosage of the stimulant to the subjects. The treatment groups are 0 mg, 10 mg, 20 mg, 30 mg, 40 mg and 50 mg. Each group contains 10 subjects. The response $y$ is the reaction time (in seconds) and the predictor $x$ is the assigned dosage (in mg). The correlation is $r_{xy} = -0.1673$. Since the correlation is negative, it appears that on average an increase in the dosage will decrease the reaction time. Furthermore, if we assume that the association is linear, then the association is weak (since $r_{xy}$ is close to zero).

The investigators were prudent and were not ready to conclude that the stimulant has little to no effect on the reaction times. They produced the scatter plot of the pairs $(x_i, y_i)$ (see Figure 13.4), and noticed that the relationship between $y$ and $x$ does not appear to be linear. They concluded that the correlation does not adequately measure the strength of the association in this case. In fact, using techniques that are outside the scope of this book, it can be shown that there is a strong association between the reaction times and the dosage of the stimulant. It is just that this association is not linear. For a fixed dosage of the stimulant, the reaction time does not vary a lot.

Fig. 13.4 The least squares line of reaction time

We end this section with a short discussion on causation. Scientists generally want to establish a causal relationship, i.e. a relationship in which the response (or effect) is a consequence of a cause (or causal factor). The scientific method can be used to establish a cause-and-effect relationship. The method involves performing experiments in which we can control the cause and the other possible causal factors, and observe a significant effect.
However, in biology and medicine, it is often unethical to assign a factor to a unit. For example, it is unethical to ask someone to smoke two cigarettes a day. Nevertheless, there are acceptable methods that can be used to distinguish causal from noncausal associations. Refer to Chapter 2 in [57] for a discussion on establishing causality in the context of epidemiology.

An aspect that is important for a causal association is the strength of association. If there is a causal relationship, then there is an association between the cause and effect. Therefore, a strong correlation between two variables can hint at the existence of a causal relationship. But a large correlation alone is not proof of causation.

Let us consider an example. Say we select a few communities at random, and we measure the correlation between the number of bananas consumed in a month per capita and the prevalence of a disease. Say the confidence interval for the correlation is $[-0.90, -0.85]$. We have observed a strong correlation between the two variables. Does this mean that eating more bananas causes the risk of developing the disease to decrease? It is doubtful. In this case, it is likely that there are lurking variables (such as a healthy lifestyle) that are causes of both eating more bananas and the decreased risk of disease.

A significant correlation between two variables is evidence of an association between the two variables. It is not sufficient evidence for a cause-and-effect relationship; however, it does hint at the possibility that such a relationship exists.

Technology Component using R: Suppose that x and y are numerical vectors of equal length. To compute the covariance between the two variables, we use

    cov(x,y)

To compute the correlation between the two variables, we use

    cor(x,y)

13.2 Least Squares Line

In this section, we begin by describing the association between a variable $y$ (also called the response) and a variable $x$ (also called the predictor) with a line of best fit. We assume that we have a random sample of paired observations $(x_i, y_i)$ for $i = 1, \ldots, n$.
Example 13.4 (Part 1). Consider the data from Example 13.2. The predictor variable $x$ is the dosage of vitamin C and the response variable $y$ is the number of colds. For these data, the line of best fit is

$$\hat{y} = 12.0 - 0.0944\,x,$$

which is overlaid on the scatter plot in Figure 13.5. We can use the line to estimate the mean number of colds in three years for a dosage of 35 mg:

$$\hat{\mu}_{Y \mid x = 35} = 12.0 - 0.0944\,(35) = 8.696.$$

Fig. 13.5 Least squares line for the number of colds against dosage of vitamin C

To find the line of best fit, denoted by $\hat{y} = \hat{\alpha} + \hat{\beta} x$, we must define what we mean by "best". Consider the $i$-th case $(x_i, y_i)$. The corresponding fitted value $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ is the evaluation of the estimated line at $x = x_i$. The difference between the $i$-th observed response and the $i$-th fitted value is called the $i$-th residual $e_i = y_i - \hat{y}_i$. A residual is sometimes called an observed error. The sum of the squared residuals,

$$L = \sum_{i=1}^{n} \left[ y_i - (\hat{\alpha} + \hat{\beta} x_i) \right]^2,$$

is used as a measure of fit. In some sense, $L$ represents a distance between the observed responses and the estimated line. We say that the line of best fit is the line that minimizes $L$. This criterion of fit was independently proposed around the turn of the 19th century by the German mathematician Carl Friedrich Gauss and by the French mathematician Adrien-Marie Legendre. It is known as the method of least-squares.
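To make the least-squares criterion concrete, here is a small R sketch that evaluates $L$ for the vitamin C data and checks that the line returned by lm gives a smaller value than nearby lines. The data vectors are transcribed from Example 13.2. The names dosage, colds, L, a_hat and b_hat are our own, and the perturbation sizes 0.5 and 0.01 are arbitrary choices.

```r
# Vitamin C data transcribed from Example 13.2 (5 subjects per dosage)
dosage <- rep(c(0, 15, 30, 50), each = 5)
colds  <- c(12, 10, 10, 15, 14,
            14,  7,  8, 11,  9,
            10, 12,  9,  8, 11,
             7, 10,  8,  4,  6)

# Least-squares criterion: sum of squared residuals for the line a + b * x
L <- function(a, b) sum((colds - (a + b * dosage))^2)

# Least-squares estimates from R's linear model function
fit   <- lm(colds ~ dosage)
a_hat <- unname(coef(fit)[1])  # estimated intercept
b_hat <- unname(coef(fit)[2])  # estimated slope
print(round(c(intercept = a_hat, slope = b_hat), 4))

# The criterion is smallest at the least-squares estimates:
# moving the intercept or the slope away from them increases L
L_min <- L(a_hat, b_hat)
print(L(a_hat + 0.5, b_hat) > L_min)
print(L(a_hat, b_hat + 0.01) > L_min)

# Estimated mean number of colds at a dosage of 35 mg (unrounded coefficients)
print(a_hat + b_hat * 35)
```

Because the last line uses the unrounded coefficients, its result (about 8.69) differs slightly from the value 8.696 obtained in Example 13.4 with the rounded coefficients 12.0 and 0.0944.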
The minimum of the least-squares criterion $L$ can be found by differentiating it with respect to $\hat{\alpha}$ and $\hat{\beta}$ and by setting these partial derivatives equal to zero. We obtain a system of two equations in $\hat{\alpha}$ and $\hat{\beta}$ that we need to solve. After some simplification, these equations can be shown to be

$$\sum_{i=1}^{n} y_i = n\hat{\alpha} + \hat{\beta} \sum_{i=1}^{n} x_i \quad \text{and} \quad \sum_{i=1}^{n} x_i y_i = \hat{\alpha} \sum_{i=1}^{n} x_i + \hat{\beta} \sum_{i=1}^{n} x_i^2. \tag{13.1}$$

These equations are called the normal equations. Isolating $\hat{\alpha}$ in the first equation and substituting it into the second equation to solve for $\hat{\beta}$, we get the least-squares estimates of the intercept

$$\hat{\alpha} = \frac{\sum_{i=1}^{n} y_i}{n} - \hat{\beta}\, \frac{\sum_{i=1}^{n} x_i}{n}, \tag{13.2}$$

and of the slope

$$\hat{\beta} = \frac{\left(\sum_{i=1}^{n} x_i y_i\right) - (1/n)\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\left(\sum_{i=1}^{n} x_i^2\right) - (1/n)\left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \tag{13.3}$$

All the quantities involved in this solution should seem familiar. In fact, the slope of the least-squares line has a few other useful representations, such as

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)}{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)} = \frac{\widehat{\mathrm{cov}}_{xy}}{s_x^2} = r_{xy}\, \frac{s_y}{s_x},$$

where $\bar{x}$ and $\bar{y}$ are respectively the sample means of the predictors and the responses, $s_x^2$ is the sample variance of the predictors, $\widehat{\mathrm{cov}}_{xy}$ is the sample covariance between $x$ and $y$, and $r_{xy}$ is the sample correlation between $x$ and $y$. Note that the slope of the least-squares line will always have the same sign as the sample correlation between the response and the predictor.

Example 13.5. Refer to the mother-daughter sample of size $n = 12$ from Example 13.1. The response variable $y$ is the height of the daughter and the predictor variable $x$ is the height of the mother. We summarize the data with the following sums: $\sum x_i = 1{,}951.0$, $\sum y_i = 1{,}923.0$, $\sum x_i^2 = 317{,}267.0$, $\sum x_i y_i = 312{,}702.0$ and $\sum y_i^2 = 308{,}403$. Using (13.2) and (13.3) to compute the least squares estimates, we get the following estimated line: $\hat{y} = 28.4421 + 0.8107\,x$. Figure 13.6 gives the scatter plot of the pairs $(x_i, y_i)$ and the estimated regression line.
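The arithmetic of Example 13.5 can be sketched in R directly from the summary sums. The block below applies (13.3) and (13.2), and then, as a cross-check, solves the normal equations (13.1) as a 2-by-2 linear system with solve. All variable names are our own.

```r
# Summary sums for the mother-daughter data, as given in Example 13.5
n      <- 12
sum_x  <- 1951     # sum of mothers' heights (predictor)
sum_y  <- 1923     # sum of daughters' heights (response)
sum_x2 <- 317267   # sum of squared mothers' heights
sum_xy <- 312702   # sum of cross products

# Slope from (13.3) and intercept from (13.2)
beta_hat  <- (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x^2 / n)
alpha_hat <- sum_y / n - beta_hat * sum_x / n
print(round(c(alpha_hat = alpha_hat, beta_hat = beta_hat), 4))

# Cross-check: write the normal equations (13.1) as A %*% c(alpha, beta) = b
A <- matrix(c(n,     sum_x,
              sum_x, sum_x2), nrow = 2, byrow = TRUE)
b <- c(sum_y, sum_xy)
sol <- solve(A, b)  # sol[1] is the intercept, sol[2] is the slope
print(round(sol, 4))
```

Both routes should give the same intercept and slope as the estimated line reported in the example, up to rounding.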
Fig. 13.6 Least squares line for the heights of mothers and daughters

The least squares line describes the central tendency of the response $y$ as a function of the predictor $x$. We should also describe the dispersion about the line, since not all of the observations will fall on the line. We can measure the variability about the least squares line with the residual standard deviation, which is defined as

$$s_e = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}.$$

Note that $\sum_{i=1}^{n} e_i^2 / (n - 2)$ is approximately the average squared deviation of the $n$ responses away from the least squares line. However, instead of dividing by $n$, we divide by $n - 2$, which corresponds to the number of degrees of freedom in this case. To describe the variability about the center, we first need to estimate the center by estimating the slope and the intercept. This leads to a loss of 2 degrees of freedom.

Example 13.4 (Part 2). Consider the number of colds data from Example 13.2. To describe the precision of least squares estimation, we compute the residual standard deviation. Using R, we get $s_e = 2.220$ colds. So typically, the number of colds in three years is about 2.22 colds away from the least squares line.

Technology Component using R: Suppose that x and y are numerical vectors of equal length. We use lm(y~x) to compute the line of least squares.
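The definition of the residual standard deviation can be checked numerically. The sketch below fits the least squares line to the vitamin C data transcribed from Example 13.2, computes $s_e$ from the residuals with $n - 2$ in the denominator, and compares it with the value that summary reports for the fitted model. The names dosage, colds, mod, e and s_e are our own choices.

```r
# Vitamin C data transcribed from Example 13.2 (5 subjects per dosage)
dosage <- rep(c(0, 15, 30, 50), each = 5)
colds  <- c(12, 10, 10, 15, 14,
            14,  7,  8, 11,  9,
            10, 12,  9,  8, 11,
             7, 10,  8,  4,  6)

mod <- lm(colds ~ dosage)   # least squares line
e   <- residuals(mod)       # residuals e_i = y_i - fitted value
n   <- length(colds)

# Residual standard deviation: divide by n - 2 degrees of freedom
s_e <- sqrt(sum(e^2) / (n - 2))
print(s_e)

# The degrees of freedom stored in the model equal n - 2
print(df.residual(mod) == n - 2)

# summary() reports the same quantity as the "residual standard error"
print(all.equal(s_e, summary(mod)$sigma))
```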
We assign the estimated linear model to mod with the command

    mod=lm(y~x)

The commands mod$residuals and mod$df.residual give the vector of the residuals and the corresponding degrees of freedom, respectively. The following command will give the residual standard deviation:

    sqrt(sum((mod$residuals)^2)/mod$df.residual)

To produce a scatter plot of y against x, we use

    plot(x,y)

To overlay the least squares line onto the plot, we use

    abline(lm(y~x))

13.3 Problems

Problem 13.1. The height of a child as an adult can be predicted using the child's height at the age of 2. The following table gives the heights of 20 women (in cm), as adults and at the age of 2:

Adult        Height at     Adult        Height at
Height (y)   Age of 2 (x)  Height (y)   Age of 2 (x)
164.6        86.4          158.3        83.1
166.1        87.6          159.8        84.5
167.4        88.9          160.6        85.2
163.8        85.7          162.5        84.3
162.9        84.1          173.5        93.9
168.1        89.0          171.9        92.7
169.3        90.1          165.3        85.2
167.4        87.2          164.1        84.2
168.5        88.3          167.5        86.3
165.9        86.3          175.3        95.2

For these data, we have:

$$\sum_{i=1}^{20} y_i = 3{,}322.8, \quad \sum_{i=1}^{20} x_i = 1{,}748.2, \quad \sum_{i=1}^{20} y_i^2 = 552{,}414.5, \quad \sum_{i=1}^{20} x_i^2 = 153{,}028, \quad \sum_{i=1}^{20} x_i y_i = 290{,}710.1.$$
(a) Calculate the estimated least squares line.
(b) Find the sample correlation between the height as an adult and the height at the age of 2.
(c) Estimate the mean height of a girl as an adult, whose height at age 2 is 87.2 cm.
(d) Predict the height of a girl as an adult, whose height at age 2 is 84 cm.

Problem 13.2. Melanoma is a type of skin cancer which forms from melanocytes (pigment-producing cells). Melanoma is considered the most dangerous form of skin cancer. It is not the most common of the skin cancers in North America, but it does cause the most deaths. Melanoma is caused mainly by exposure to ultraviolet radiation (either from the sun or tanning beds). The authors of article [23] studied the association between melanoma mortality rates and the geographical latitude. The data are in the file SkinCancer.txt. The latitude ($x$) of the largest city in each state or province was used as an estimate of the geographical center of population. The mortality rate ($y$) for the male population is the number of deaths per year per 100,000 individuals. (The mortality rates are age-standardized to account for populations of different ages.) Here is a scatter plot of melanoma mortality rates for the male population against the latitude of the state or province.

(a) Here are a few summary statistics: $\bar{x} = 40.3762$; $\bar{y} = 1.3506$; $s_x = 5.6851$; $s_y = 0.4036$ and

$$\widehat{\mathrm{cov}}_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = -1.8003.$$
Compute the correlation between the melanoma mortality rate and the latitude. Based on the above scatter plot and the value of the correlation, describe the association between the melanoma mortality rate and the latitude.
(b) Using the statistics from part (a), compute the least squares line to describe the melanoma mortality rates for the male population as a function of the latitude of the province or state. Give an interpretation of the slope of this line.
(c) Consider the female population. Using statistical software, compute the least squares line to describe the melanoma mortality rates as a function of the latitude of the province or state. Give an interpretation of the slope of this line. Furthermore, construct a scatter plot of the melanoma mortality rates against the latitude.
(d) Consider the scatter plot from part (c). There is a state/province with a much lower than expected mortality rate. Identify this state or province.

Problem 13.3. Systolic arterial blood pressure (SBP) and diastolic arterial blood pressure (DBP) frequently display a linear relationship characterized by the systolic-versus-diastolic slope and the sample correlation (see [30]). The following table gives the SBP and the DBP for 15 men aged between 40 and 65:

SBP (y)   DBP (x)   SBP (y)   DBP (x)
112       63        156       100
120       69        124       82
135       70        99        56
142       82        105       65
132       76        124       73
115       67        144       89
119       71        134       76
128       73

(a) Calculate the mean SBP and the mean DBP for this sample.
(b) Calculate the sample covariance $\widehat{\mathrm{cov}}_{xy}$, the sample variances $s_x^2$, $s_y^2$, and the sample correlation $r_{xy}$.
(c) Give the line of best fit which expresses the SBP as a function of the DBP.
(d) Give the point estimate for the SBP of a man aged between 40 and 65 whose DBP is equal to 75.
Problem 13.4. Pulmonary vascular resistance (PVR) occurs when the pulmonary artery creates resistance against the blood flowing into it from the right ventricle. An elevated PVR is frequently observed in patients with advanced heart failure. The researchers in [46] hypothesized that inhalation of nitric oxide would decrease PVR in such patients. To test this hypothesis, they studied the hemodynamic effects of inhalation of nitric oxide (80 ppm) for 10 minutes in 19 patients with heart failure associated with left ventricular dysfunction. Here is a scatter plot that displays the change in PVR (in percentage) against the PVR at baseline.

(a) Denote the change in PVR (in percentage) as $y$ and the PVR at baseline as $x$. Here are the covariance between $x$ and $y$ and also the respective standard deviations:

$$\widehat{\mathrm{cov}}_{xy} = -2783.822; \quad s_y = 29.6938; \quad s_x = 136.4879.$$

Compute the correlation between these two variables.
(b) Describe the association between the change in PVR (in percentage) and PVR at baseline.

Problem 13.5. Since Confederation, the Canadian population has been growing steadily. The following table gives the population of Canada (in millions) since 1951. The data are taken from the Statistics Canada website.

Year   Population   Year   Population   Year   Population
1951   14.009       1961   18.239       1971   21.963
1953   14.845       1963   18.931       1973   22.494
1955   15.698       1965   19.644       1975   23.143
1957   16.610       1967   20.500       1977   23.727
1959   17.483       1969   21.001       1979   24.203

Year   Population   Year   Population   Year   Population
1981   24.821       1991   27.945       2001   31.012
1983   25.367       1993   28.682       2003   31.676
1985   25.843       1995   29.303       2005   32.359
1987   26.449       1997   29.965       2007   33.115
1989   27.056       1999   30.404       2009   33.894

We denote by $y$ the Canadian population and by $x$ the year. We have:

$$\sum_{i=1}^{30} x_i = 59{,}400, \quad \sum_{i=1}^{30} y_i = 730.381, \quad \sum_{i=1}^{30} x_i y_i = 1{,}449{,}110, \quad \sum_{i=1}^{30} x_i^2 = 117{,}620{,}990, \quad \sum_{i=1}^{30} y_i^2 = 18{,}756.71.$$

(a) Construct a scatter plot of the data. Give the estimated regression line of the population as a function of the year.
(b) Calculate the sample correlation $r_{xy}$. Interpret the result.
(c) Compute the residual standard deviation $s_e$.

Problem 13.6. We would like to describe the relationship between the mean adult female body mass (in kg) of grizzly bears ($y$) and the percentage of meat in the diet ($x$). Below are the data for $n = 12$ different regions.

x     y      x     y
5     120    42    169
6     122    42    171
7     117    60    201
11    129    76    210
12    132    77    225
26    139    79    220

(a) Calculate the mean and standard deviation for the mean adult female body mass and for the percentage of meat in the diet.
(b) Draw a scatter plot of the mean adult female body mass against the percentage of meat in the diet.
(c) Calculate the sample covariance and the sample correlation between the percentage of meat in the diet and the mean adult female body mass.

Problem 13.7. A large study was conducted to test the hypothesis that the skeletal muscle mass of women reduces with age. All women involved in the study had a body mass index of at most 35. For each of the 125 women participating in this study, the researchers recorded their total skeletal muscle mass (in kg) and their age (in years). The data are found in the file SkeletalMass.txt. The first column gives the skeletal muscle mass and the second column gives the age.

(a) Construct a scatter plot of the data. Give the estimated regression line of the skeletal muscle mass as a function of age.
(b) Calculate the sample correlation $r_{xy}$. Interpret the result.
(c) Compute the residual standard deviation.

Problem 13.8. Bears play a role in the transfer of marine isotopes, in particular those taken from salmon, into the terrestrial ecosystem (see [36]). The values of the nitrogen isotope signature $\delta^{15}N$ (in per mil) measured from a certain foliage are modeled as a function of the distance from the river (in metres). Below are the data from a river with few bears and little to no salmon.

Distance        50      100     150     200     250     300     350     400
$\delta^{15}N$  -3.48   -4.02   -3.00   -3.24   -3.96   -3.80   -3.14   -3.80

(a) Produce a scatter plot and compute the least squares line describing the value of the nitrogen isotope signature as a function of the distance from the river.
(b) Compute the residual standard deviation.
(c) Calculate the sample correlation $r_{xy}$. Can we conclude that the value of the nitrogen isotope signature and the distance from the river are correlated?

Problem 13.9. Continue with the situation in Problem 13.8. Consider now the following data from a river with few bears and an abundant salmon population.
Distance        50      75      100     125     150     200     225
$\delta^{15}N$  0.18    -0.97   -1.74   -1.96   -2.13   -2.31   -2.65

Distance        250     300     325     350     375     400
$\delta^{15}N$  -2.53   -2.52   -2.55   -2.59   -2.71   -2.87

(a) Produce a scatter plot and compute the least squares line describing the value of the nitrogen isotope signature as a function of the distance from the river. Does the association appear to be linear?
(b) Because there is an abundant salmon population, but few bears for the nitrogen transfer, it is hypothesized that the value of the nitrogen isotope signature is correlated with the inverse distance from the river. We transform the data by defining the predictor $x = 1/\text{distance}$. Produce a scatter plot and compute the least squares line describing the values of the nitrogen isotope signature as a function of $x$. What are your findings?
(c) Compute the correlation between the values of the nitrogen isotope signature $y$ and $x = 1/\text{distance}$.

Problem 13.10. With an increase in human activity in bear habitats, there are more human-bear interactions (see [1]). The following data were collected over a few years in the back country of a particular park. They represent the number of human-bear interactions and the number of people using a shuttle bus during a two-week period.

Number of    Human-Bear      Number of    Human-Bear
Bus Users    Interactions    Bus Users    Interactions
1,750        1               14,000       16
2,000        1               14,025       10
5,880        2               14,035       8
6,000        2               14,250       12
7,775        2               15,004       10
10,002       4               15,250       12
10,025       5               15,300       9
10,035       3               15,750       11
11,050       5               15,750       20
12,004       9               16,000       12

(a) Produce a scatter plot and compute the least squares line describing the number of human-bear interactions as a function of the number of bus users. Does the association appear to be linear?
(b) Apply a logarithm transformation to the response by defining a new response variable $y = \ln(\text{number of interactions})$. Produce a scatter plot and compute the least squares line describing $y$ as a function of the number of bus users. Does the association appear to be linear?
(c) Use the residuals from part (b) to produce a normal probability plot of the residuals and a residual plot. Use these plots to perform diagnostics of the underlying assumptions of the simple linear model. What are your findings?
(d) Using the least squares line from part (b), predict the number of human-bear interactions for a two-week period in which there are 8,000 shuttle bus users. Construct the corresponding 95% prediction interval and interpret the result.
(e) Using the least squares line from part (b), estimate the mean number of human-bear interactions for a two-week period in which there are 8,000 shuttle bus users. Construct the corresponding 95% confidence interval and interpret the result.

Did you know? More than two thirds of the world's plant species are found in the tropical rainforests, which are renowned for their massive biodiversity. Rainforests once covered 14% of the earth's land surface; they now cover only 6%. Nearly half of the world's species of plants, animals and microorganisms will be destroyed or severely threatened over the next 25 years, due to rainforest deforestation. Experts estimate that the last remaining rainforests could be consumed in less than 40 years. The Tropical Plants Database is an international project dedicated to providing accurate and factual information on the plants of the Amazon Rainforest, created by the joint efforts of botanists, ethnobotanists, health professionals and phytochemists. More information about this project can be found at http://www.rain-tree.com/plants.htm.