Lab 2 handout

School

University of Guelph

Course

4160

Subject

Statistics

Date

Jan 9, 2024

Laboratory 2: Statistical Tools for Plant Breeders
MBG*4160/PLNT*6340 Plant Breeding

One of the most important tools used in plant breeding is statistics. Statistical methods and techniques are applied by plant breeders to assist them in evaluating and comparing individual plants, populations and varieties using a limited number of observations. The application of statistical methods allows the breeder to make such comparisons with a certain degree of confidence, which is crucial to the success of a plant breeding program. Many decisions that a plant breeder makes will depend to a large extent on the reliability and repeatability of the performance of the selected material, which is achieved by the application of sound experimental and statistical methods.

The objective of this laboratory is for students to:

1. Know how to recognize a randomized complete block design (RCBD)
2. Know how to construct a data table and compute an analysis of variance (ANOVA) for a RCBD
3. Use and explain the meaning of the F-test; and
4. Apply a Least Significant Difference (LSD) test to the means to determine the significance of differences between entries in an experiment

Since STAT*2040 (Statistics I) was a prerequisite for MBG*4160 Plant Breeding, we will work on the assumption that you have covered ANOVA for Completely Randomized Design (CRD) and Randomized Complete Block Design (RCBD). The intention of this laboratory will be to show you how these designs can be put to practical use to the advantage of a plant breeder. For a more thorough theoretical explanation of the two statistical designs, you should consult your statistics class notes or one of many textbooks on statistics.

Note: Attached in Appendix 1 is a summary report on a field trial for a set of experimental soybean genotypes compared to the check varieties, which was conducted by the University of Guelph.
The information included in Appendix 2 is intended to provide you with a general background on experimental designs that are commonly used in plant breeding experiments. You may save this for your records in case you end up working in the plant breeding industry. You do not need to memorize the information in either appendix; you only need to understand the concepts and experimental designs presented.

Please note: Your lab report is due in the DropBox on the day of your next lab period, which is on Monday, September 25th or Wednesday, September 27th, depending on your lab section. It is worth 5% of your overall grade.
Central tendency and dispersion

The two main characteristics of any set of data gathered in an experiment are its mean and variance. A mean is a measure of central tendency. A variance is a measure of dispersion. For a set of n pieces of data, the sample mean is calculated as:

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X_i}{n}$$

Note that the subscript 'i' has replaced the 1, 2, 3, ... in the more general form of the equation. The sample variance of those n pieces of data is calculated as:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1}$$

The numerator, $\sum (X_i - \bar{X})^2$, is called the Sum of Squares, or SS. Notice that if the data deviate greatly from the mean, the SS will be very large, and if the data do not deviate much from the mean, the SS will be small. Also, as the number of pieces of data increases, the sum of squares increases. The denominator, n - 1, is called the degrees of freedom, usually abbreviated 'df'. Notice that since the formula for the sample variance has df in the denominator, the size of the variance depends on the number of observations in the sample.

Question: Would more observations lead to a larger or a smaller variance? The answer is a bit more complicated than you might first think. More observations will increase the size of the numerator, as well as increasing the df. What do you think? Would you expect an increase in sample size to make the variance increase, decrease, or remain unchanged?

Standard deviation, CV, and standard error

You likely learned in statistics that the square root of the sample variance, $\sqrt{s^2}$, equals the standard deviation (s). Did you also learn that the Coefficient of Variation (CV) is $100 \times s / \bar{X}$? If you calculated the mean of n pieces of data, the variance of that mean equals the sample variance divided by n. In other words, $s_{\bar{X}}^2 = s^2 / n$. It follows that the standard deviation of the mean, called the standard error of the mean (se), has the following equation:

$$se = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}}$$

Look over the following three sets of data. Notice their means, variances, standard errors of the mean, and CVs. What do you see?

| Data set | Data | Mean | Variance | se | CV (%) |
|---|---|---|---|---|---|
| A | 77, 79, 78, 78, 78, 77, 79, 79, 77, 77 | 77.9 | 0.77 | 0.28 | 1.1 |
| B | 73, 83, 78, 77, 78, 72, 84, 85, 70, 79 | 77.9 | 26.3 | 1.62 | 6.6 |
| C | 37, 119, 58, 98, 18, 136, 78, 89, 67, 77 | 77.7 | 1263.12 | 11.24 | 45.7 |

Clearly, the means are similar, but the variance, se, and CV are larger in B and even larger in C. This demonstrates that knowing the mean of a sample does not provide you with a complete picture of the sample. The variance is very important in describing the dispersion of the data. It also means that the magnitude of the variance will depend on the variation within each set of data and not simply on the number of observations.
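The summary statistics for data sets A, B and C can be checked with a few lines of code. The following is a minimal sketch, not part of the lab procedure; the function name `describe` and the dictionary layout are illustrative choices. It computes the SS in both its definitional and shortcut forms and then derives the variance, se and CV from it.

```python
from math import sqrt


def describe(data):
    """Return mean, variance, standard error and CV (%) for one data set."""
    n = len(data)
    mean = sum(data) / n
    # Sum of squares (SS), definitional form: sum of squared deviations.
    ss = sum((x - mean) ** 2 for x in data)
    # Shortcut form: sum(X^2) - (sum X)^2 / n. Both forms give the same SS.
    ss_shortcut = sum(x * x for x in data) - sum(data) ** 2 / n
    assert abs(ss - ss_shortcut) < 1e-6
    variance = ss / (n - 1)          # SS divided by the degrees of freedom
    sd = sqrt(variance)              # standard deviation
    return {
        "mean": mean,
        "variance": variance,
        "se": sd / sqrt(n),          # standard error of the mean
        "cv": 100 * sd / mean,       # coefficient of variation, in percent
    }


# The three data sets from the table above.
datasets = {
    "A": [77, 79, 78, 78, 78, 77, 79, 79, 77, 77],
    "B": [73, 83, 78, 77, 78, 72, 84, 85, 70, 79],
    "C": [37, 119, 58, 98, 18, 136, 78, 89, 67, 77],
}

summary = {name: describe(values) for name, values in datasets.items()}
for name, s in summary.items():
    print(f"{name}: mean={s['mean']:.1f} variance={s['variance']:.2f} "
          f"se={s['se']:.2f} CV={s['cv']:.1f}%")
```

All three sets have the same n, so the differences in variance here come entirely from how widely the values are spread around the mean.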
EXERCISE

The objective of this exercise is for you to compare the 100 seed weight of three seedlots of soybean: Variety A, Variety B, and Variety C. Your task is to determine if there are differences in 100 seed weight among the three seedlots, and to determine if the Variety C seedlot is similar to Variety A or Variety B.

(Normally, the instructions for the hands-on work in the lab are as follows: The seed has been obtained from a RCBD experiment which had four replicates (blocks). Thus for each seedlot, there are four packages, one package collected from each replicate. For each seedlot, count out 100 seeds and record the weight of the 100 seed subsamples on a data sheet.)

What should this data sheet look like? How about something like the following:

100 seedlot weights

| Block | Seedlot A | Seedlot B | Seedlot C | Total ΣX.j | Total ΣX²ij |
|---|---|---|---|---|---|
| 1 | | | | | |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| Total ΣXi. | | | | | |
| Total ΣX²ij | | | | | |
| Mean | | | | | |

ANOVA

After you have obtained the data, calculate the two types of row and column totals in the table (the sum of the values and the sum of the squared values). In the lightly shaded area (where the total rows and total columns meet), insert the grand row/column totals. As a check of your values, the grand total of the rows should equal the grand total of the columns. Calculate the means for each of the three seedlots. Does it appear that the seedlots differ in 100 seed weight?

Now we will calculate the ANOVA for this RCBD experiment. We will use the simple form of the SS equation in order to do this:
$$SS = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = \sum X_i^2 - CF$$

Calculate the Correction Factor (CF), or correction term, section of the SS equation. Remember that the n value is the total number of data points you are analyzing (3 varieties x 4 replicates = 12), so n = rt, where r is the number of blocks (replicates) and t is the number of treatments (varieties).

$$CF = \frac{(\sum X_{ij})^2}{n} = \frac{(\sum X_{ij})^2}{rt}$$

Now calculate the Total SS:

$$SS_{Total} = \sum X_{ij}^2 - CF$$

Now calculate the Variety SS using the totals for each of the varieties. Note the divisor is the number of blocks:

$$SS_{Variety} = \frac{\sum X_{i.}^2}{r} - CF$$

Now calculate the Block (or replicate) SS using the totals for each of the replicates. Note the divisor is the number of treatments:

$$SS_{Block} = \frac{\sum X_{.j}^2}{t} - CF$$

The Error SS is the remainder:

$$SS_{Error} = SS_{Total} - (SS_{Variety} + SS_{Block})$$
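The sums-of-squares steps above can be sketched in code. This is only an illustration: the 100-seed weights below are invented (hypothetical) values, not lab data, and the function name `rcbd_sums_of_squares` is an arbitrary choice. Each variety has one value per block, listed in block order 1 to 4.

```python
# Hypothetical 100-seed weights (g); one value per block for each variety.
weights = {
    "A": [18.0, 17.0, 19.0, 18.0],
    "B": [20.0, 21.0, 22.0, 21.0],
    "C": [19.0, 20.0, 21.0, 20.0],
}


def rcbd_sums_of_squares(data):
    """Partition the total SS of an RCBD into variety, block and error SS."""
    t = len(data)                               # number of treatments (varieties)
    r = len(next(iter(data.values())))          # number of blocks (replicates)
    n = r * t                                   # total number of observations

    variety_totals = [sum(values) for values in data.values()]
    block_totals = [sum(values[j] for values in data.values()) for j in range(r)]
    grand_total = sum(variety_totals)
    # Check from the data sheet: both sets of totals add to the same grand total.
    assert abs(grand_total - sum(block_totals)) < 1e-9

    cf = grand_total ** 2 / n                   # correction factor
    ss_total = sum(x * x for values in data.values() for x in values) - cf
    ss_variety = sum(tot ** 2 for tot in variety_totals) / r - cf
    ss_block = sum(tot ** 2 for tot in block_totals) / t - cf
    ss_error = ss_total - (ss_variety + ss_block)   # the remainder
    return {
        "CF": cf,
        "Total": ss_total,
        "Varieties": ss_variety,
        "Blocks": ss_block,
        "Error": ss_error,
    }


ss = rcbd_sums_of_squares(weights)
for source, value in ss.items():
    print(f"{source:10s} {value:10.4f}")
```

Note that the variety totals are divided by r and the block totals by t, because each variety total is a sum over r blocks and each block total is a sum over t varieties.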
Now enter the SS and df in the following table and complete the ANOVA:

ANOVA for the 100 seed weight data

| Source of variation | df | Sum of Squares (SS) | Mean square (MS) | F |
|---|---|---|---|---|
| Total | | | | |
| Blocks | | | | |
| Varieties | | | | |
| Error | | | | |

The Mean square (MS) is another name for the variance (s²). It is the SS divided by the corresponding degrees of freedom (df), just as the sample variance is the SS divided by n - 1. Note that you obtain the error mean square by dividing the residual SS (i.e. the SS not accounted for by either blocks or varieties) by the error df. You do not subtract mean squares to get it.

The F ratio is calculated as the Variety MS ÷ Error MS. It is a way of determining whether the Variety MS is larger than what would be expected by random error. An F-table is attached. What is the critical F-value for the test of the Variety MS/Error MS ratio? What is the critical F-value for the test of the Block MS/Error MS ratio? Is the Variety MS in the above analysis significant? What does it mean?

One of the main points of this exercise is for you to partition the Total sum of squares into parts associated with blocks, varieties, and error. Do you understand what we mean by partitioning the total sum of squares into parts? This is important, so if you don't see what we mean, ask.

Mean comparison

At this point, we know whether the effect of varieties as a whole was significant or not, but we are still not able to say anything about the significance of the difference between two specific varieties. In order to do this, we must perform a comparison between their mean values. There are a number of different tests that can be used, the simplest being the least significant difference (LSD) test. The LSD provides a t-test of the difference between the means of a pair of varieties. Here are the steps involved in this test:
1. Compute the LSD value at the α level of significance as:

   $$LSD_{\alpha} = t_{\alpha,\,df} \times s_d$$

   where $t_{\alpha}$ is the tabular t value, at the α level of significance, with df = error degrees of freedom, and $s_d$ is the standard error of the mean difference, computed as:

   $$s_d = \sqrt{\frac{2 \times \text{Error MS}}{r}}$$

   Note: Always calculate and present the LSD to one more decimal place than the variety means.

2. Compute the mean difference between the pair of varieties:

   $$d_{ij} = (\text{mean of Variety } i) - (\text{mean of Variety } j)$$

3. Compare the absolute value of $d_{ij}$ with the LSD. If it is greater than the LSD, then the two varieties differ significantly. If it is smaller than the LSD, then you conclude that there is no significant difference between the two treatments.

When constructing a table listing means for a number of treatments, the LSD value for the experiment can be included at the end of the list. The results of mean comparisons are often presented by letters following the mean:

Example list of means:

| Entry | Yield | Maturity (days) |
|---|---|---|
| 1 | 9.35 a | 154 a |
| 2 | 9.16 ab | 152 ab |
| 3 | 9.01 ab | 148 ab |
| 4 | 8.89 bc | 145 b |
| 5 | 8.73 c | 149 ab |

Means followed by the same letter are not significantly different according to a ____ test.

In the above example, Variety 1 is higher in yield than Varieties 4 and 5 but no different from Varieties 2 and 3. Variety 1 is later in maturing compared to Variety 4, but no different from the other three.
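The three LSD steps can be sketched as follows. The variety means and the Error MS below are hypothetical values chosen for illustration, not lab results; the standard error of the mean difference is taken as the square root of (2 × Error MS)/r, and the t value of 2.447 is the 5% entry for 6 error df in Table 2 of this handout.

```python
from itertools import combinations
from math import sqrt


def lsd(error_ms, r, t_value):
    """Least significant difference: tabular t times the s.e. of a mean difference."""
    s_d = sqrt(2 * error_ms / r)    # standard error of the difference of two means
    return t_value * s_d


# Hypothetical inputs, for illustration only.
means = {"A": 18.0, "B": 21.0, "C": 20.0}   # variety means (g per 100 seeds)
error_ms = 0.2222                            # Error MS from the ANOVA table
r = 4                                        # number of blocks
t_05_df6 = 2.447                             # Table 2, 5% level, error df = 6

lsd_value = lsd(error_ms, r, t_05_df6)
print(f"LSD(0.05) = {lsd_value:.2f}")

# Step 2 and 3: compare every pair of means against the LSD.
differs = {}
for a, b in combinations(means, 2):
    d = abs(means[a] - means[b])
    differs[(a, b)] = d > lsd_value
    verdict = "differ significantly" if differs[(a, b)] else "not significantly different"
    print(f"{a} vs {b}: |d| = {d:.2f} -> {verdict}")
```

With more than a few varieties the number of pairs grows quickly, which is one reason results are usually summarized with letters after the means rather than listed pair by pair.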
For your study, complete the following table. (Remember the CV is calculated using the test mean, and the se using the MS error and the number of blocks):

| Variety | 100 seed weight |
|---|---|
| | |
| | |
| | |
| LSD (0.05) | |
| CV* (%) | |

Means followed by the same letter are not significantly different according to a LSD (α=0.05) test.

* The easiest way to calculate your standard deviation (s or σ) as part of the formula for CV is to take the square root of the Mean Square Error term from the ANOVA table. The formula for CV is provided earlier in this handout, under "Standard deviation, CV, and standard error". Typically, the MSE in an ANOVA table equals the s² for the whole experiment.

Is the seedlot for Variety C similar to Variety A or Variety B? Explain in your own words what it means statistically.

Table 1. The 5% points for the F distribution. Adapted from Steel & Torrie. 1960. Principles and Procedures of Statistics. McGraw-Hill, Toronto.

| Denominator df | Numerator df: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 161.4 | 199.5 | 215.7 | 224.6 | 230.2 | 234.0 | 236.8 | 238.9 | 240.5 | 241.9 | 243.9 |
| 2 | 18.51 | 19.00 | 19.16 | 19.25 | 19.30 | 19.33 | 19.35 | 19.37 | 19.38 | 19.40 | 19.41 |
| 3 | 10.13 | 9.55 | 9.28 | 9.12 | 9.01 | 8.94 | 8.89 | 8.85 | 8.81 | 8.79 | 8.74 |
| 4 | 7.71 | 6.94 | 6.59 | 6.39 | 6.26 | 6.16 | 6.09 | 6.04 | 6.00 | 5.96 | 5.91 |
| 5 | 6.61 | 5.79 | 5.41 | 5.19 | 5.05 | 4.95 | 4.88 | 4.82 | 4.77 | 4.74 | 4.68 |
| 6 | 5.99 | 5.14 | 4.76 | 4.53 | 4.39 | 4.28 | 4.21 | 4.15 | 4.10 | 4.06 | 4.00 |
| 7 | 5.59 | 4.74 | 4.35 | 4.12 | 3.97 | 3.87 | 3.79 | 3.73 | 3.68 | 3.64 | 3.57 |
| 8 | 5.32 | 4.46 | 4.07 | 3.84 | 3.69 | 3.58 | 3.50 | 3.44 | 3.39 | 3.35 | 3.28 |
| 9 | 5.12 | 4.26 | 3.86 | 3.63 | 3.48 | 3.37 | 3.29 | 3.23 | 3.18 | 3.14 | 3.07 |
| 10 | 4.96 | 4.10 | 3.71 | 3.48 | 3.33 | 3.22 | 3.14 | 3.07 | 3.02 | 2.98 | 2.91 |
| 11 | 4.84 | 3.98 | 3.59 | 3.36 | 3.20 | 3.09 | 3.01 | 2.95 | 2.90 | 2.85 | 2.79 |
| 12 | 4.75 | 3.89 | 3.49 | 3.26 | 3.11 | 3.00 | 2.91 | 2.85 | 2.80 | 2.75 | 2.69 |
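As a rough illustration of how Table 1 and the CV footnote are used together, the sketch below completes an ANOVA from a set of sums of squares. The SS values and the grand mean are hypothetical, not lab results. The critical values 4.76 and 5.14 are read from Table 1 at denominator df = 6 with numerator df = 3 (blocks) and numerator df = 2 (varieties), which are the df for 3 varieties in 4 blocks.

```python
from math import sqrt

t, r = 3, 4   # 3 varieties, 4 blocks

# Hypothetical sums of squares, for illustration only.
ss = {"Blocks": 4.6667, "Varieties": 18.6667, "Error": 1.3333}
df = {"Blocks": r - 1, "Varieties": t - 1, "Error": (r - 1) * (t - 1)}

# Mean square = SS divided by its own df.
ms = {source: ss[source] / df[source] for source in ss}

# F ratio = MS of the source divided by the Error MS.
f_ratio = {source: ms[source] / ms["Error"] for source in ("Blocks", "Varieties")}

# 5% critical values read from Table 1 (denominator df = 6).
f_critical = {"Blocks": 4.76, "Varieties": 5.14}
significant = {source: f_ratio[source] > f_critical[source] for source in f_ratio}

# CV uses the square root of the Error MS as the standard deviation.
grand_mean = 236.0 / 12            # hypothetical test mean (g per 100 seeds)
cv = 100 * sqrt(ms["Error"]) / grand_mean

for source in ("Blocks", "Varieties"):
    print(f"{source:10s} df={df[source]} MS={ms[source]:.4f} "
          f"F={f_ratio[source]:.2f} critical={f_critical[source]} "
          f"significant={significant[source]}")
print(f"Error      df={df['Error']} MS={ms['Error']:.4f}")
print(f"CV = {cv:.1f}%")
```

The df for Blocks, Varieties and Error add up to the Total df (rt - 1 = 11), which is a quick check that the table has been filled in consistently.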
Table 2. The 5% values for the t distribution. Adapted from Steel & Torrie. 1960. Principles and Procedures of Statistics. McGraw-Hill, Toronto.

| df | t | df | t | df | t | df | t |
|---|---|---|---|---|---|---|---|
| 1 | 12.706 | 11 | 2.201 | 21 | 2.080 | 40 | 2.021 |
| 2 | 4.303 | 12 | 2.179 | 22 | 2.074 | 60 | 2.000 |
| 3 | 3.182 | 13 | 2.160 | 23 | 2.069 | 120 | 1.980 |
| 4 | 2.776 | 14 | 2.145 | 24 | 2.064 | ∞ | 1.960 |
| 5 | 2.571 | 15 | 2.131 | 25 | 2.060 | | |
| 6 | 2.447 | 16 | 2.120 | 26 | 2.056 | | |
| 7 | 2.365 | 17 | 2.110 | 27 | 2.052 | | |
| 8 | 2.306 | 18 | 2.101 | 28 | 2.048 | | |
| 9 | 2.262 | 19 | 2.093 | 29 | 2.045 | | |
| 10 | 2.228 | 20 | 2.086 | 30 | 2.042 | | |