Understanding Standard Deviation and the Empirical Rule

Example. What is the standard deviation of the Keurig data? What are the units?

$s = \sqrt{0.816} = 0.903$ ounces

How many of the six cups of coffee fall within one standard deviation of the mean?

$8.0 - 0.903 = 7.097$ and $8.0 + 0.903 = 8.903$

Four of the six cups of coffee fall within one standard deviation of the mean: 8.1, 7.5, 7.8, 8.3

Example. Find the mean and the standard deviation for the following sample of 10 shoe sizes. How many observations fall within one standard deviation of the mean?

Shoe Size   Deviation   Deviation^2
8           -1.2        1.44
7           -2.2        4.84
11           1.8        3.24
10           0.8        0.64
9           -0.2        0.04
12           2.8        7.84
10           0.8        0.64
8           -1.2        1.44
7           -2.2        4.84
10           0.8        0.64

The squared deviations sum to 25.60, so $s = \sqrt{25.60 / 9} \approx 1.7$.

$\bar{x} = 9.2$, $s = 1.7$

One standard deviation from the mean spans $9.2 - 1.7 = 7.5$ to $9.2 + 1.7 = 10.9$. Six out of ten observations fall within one standard deviation of the mean: 8, 10, 9, 10, 8, 10
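The shoe-size example above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
import statistics

# The ten shoe sizes from the example.
shoe_sizes = [8, 7, 11, 10, 9, 12, 10, 8, 7, 10]

mean = statistics.mean(shoe_sizes)
s = statistics.stdev(shoe_sizes)  # sample standard deviation (divides by n - 1)

# Keep the observations that lie within one standard deviation of the mean.
low, high = mean - s, mean + s
within_one_sd = [x for x in shoe_sizes if low <= x <= high]

print(f"mean = {mean:.1f}, s = {s:.1f}")
print(f"{len(within_one_sd)} of {len(shoe_sizes)} within one sd: {within_one_sd}")
```

Note that `statistics.stdev` uses the sample formula, which matches the hand calculation in the table.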

Understanding the Standard Deviation

Roughly speaking, a small standard deviation tells us that most of the observations are relatively close to one another. In contrast, a large standard deviation means the observations are more spread out.

Example. Each of the following samples has a mean of 50. Which has the largest standard deviation? Which has the smallest standard deviation? (We discussed this example on 9/11, and we worked on it again today.)

1. 0 20 40 50 60 80 100
2. 0 48 49 50 51 52 100
3. 0 1 2 50 98 99 100

Sample 3 has the largest standard deviation. Sample 2 has the smallest standard deviation.

Example. Find the standard deviation for the following three samples.

1. 1 4 6 7 8 10
2. 5 8 10 11 12 14
3. 3 12 18 21 24 30

The mean of the first sample is 6. The mean of the second sample is 10. Each value in the second sample is 4 more than the corresponding value in the first, so the deviations from the mean are the same and the calculation of the standard deviation for the first two samples is identical:

$$s = \sqrt{\frac{(-5)^2 + (-2)^2 + (0)^2 + (1)^2 + (2)^2 + (4)^2}{5}} = \sqrt{10}$$

The data values in the third sample are obtained by multiplying each of the data values in the first sample by three, so the standard deviation is also multiplied by three:

$$s = 3\sqrt{10}$$
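The three-sample example can be verified numerically. The sketch below builds the second and third samples from the first (adding 4 to each value, and multiplying each value by 3) and compares the sample standard deviations. The variable names are illustrative.

```python
import math
import statistics

sample_1 = [1, 4, 6, 7, 8, 10]
sample_2 = [x + 4 for x in sample_1]  # 5 8 10 11 12 14
sample_3 = [3 * x for x in sample_1]  # 3 12 18 21 24 30

s1 = statistics.stdev(sample_1)
s2 = statistics.stdev(sample_2)
s3 = statistics.stdev(sample_3)

# Shifting every value leaves the spread unchanged;
# scaling every value by 3 scales the spread by 3.
print(f"s1 = {s1:.4f}, sqrt(10) = {math.sqrt(10):.4f}")
print(f"s2 = {s2:.4f}")
print(f"s3 = {s3:.4f}, 3*sqrt(10) = {3 * math.sqrt(10):.4f}")
```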

3.5 The Empirical Rule

If the distribution of a data set is symmetric and bell-shaped, we can use the standard deviation to estimate the proportion of observations that will be within a certain distance from the mean.

• Approximately 68% of the data is within one standard deviation of the mean.
• Approximately 95% of the data is within two standard deviations of the mean.
• Approximately 99.7% of the data is within three standard deviations of the mean.

Example. The mean height of a population of 430 college students is 65.9 inches with a standard deviation of 4.6 inches. Label the histograms below showing one, two and three standard deviations above and below the mean.


Example. The distribution of shoe sizes in a large sample of college students is symmetric and bell-shaped with a mean of $\bar{x} = 8.75$ and a standard deviation of $s = 2.07$. Approximately 95% of the shoe sizes in the sample will fall within what range?

$\bar{x} - 2s = 8.75 - 2(2.07) = 4.61$
$\bar{x} + 2s = 8.75 + 2(2.07) = 12.89$

Example. The waiting time at a bus stop has a mean of $\mu = 19.5$ minutes and a standard deviation of $\sigma = 2.1$ minutes. Assuming that the distribution of wait times is symmetric and bell-shaped, approximately what percentage of wait times will be less than 17.4 minutes?

Observe that $17.4 = 19.5 - 2.1$ is one standard deviation below the mean. According to the Empirical Rule, approximately 68% of the wait times will fall between 17.4 and 21.6 minutes. Thus, approximately 32% of the wait times will fall outside this range. By symmetry, approximately 16% of the wait times will be less than 17.4 minutes.

Example. The Wechsler Intelligence Scale has a mean of $\mu = 100$ with a standard deviation of $\sigma = 15$. If the scores are symmetric and bell-shaped, what percentage of test takers will receive a score higher than 130?

Note that $130 = 100 + 2(15)$ is two standard deviations above the mean. According to the Empirical Rule, approximately 95% of the scores will fall between 70 and 130. Thus, approximately 5% of the scores will be outside this range. By symmetry, approximately 2.5% of the scores will be below 70, and approximately 2.5% of the scores will be above 130.
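The reasoning in the three examples above follows one pattern: find how many standard deviations from the mean the cutoff is, look up the Empirical Rule percentage, and split the remainder evenly between the two tails. A minimal sketch of that pattern follows. The helper names `empirical_interval` and `tail_percent` are hypothetical and chosen only for illustration.

```python
# Empirical Rule percentages, keyed by number of standard deviations.
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_interval(mean, sd, k):
    """Return (low, high, percent inside) for k standard deviations."""
    return mean - k * sd, mean + k * sd, EMPIRICAL[k]

def tail_percent(k):
    """Approximate percent in ONE tail beyond k standard deviations,
    assuming a symmetric, bell-shaped distribution."""
    return (100.0 - EMPIRICAL[k]) / 2

# Shoe sizes: range containing about 95% of the sample.
low, high, pct = empirical_interval(8.75, 2.07, 2)
print(f"about {pct}% of shoe sizes fall between {low:.2f} and {high:.2f}")

# Bus stop: 17.4 minutes is one standard deviation below the mean.
print(f"wait times below 17.4 minutes: about {tail_percent(1)}%")

# Intelligence scale: 130 is two standard deviations above the mean.
print(f"scores above 130: about {tail_percent(2)}%")
```

These are approximations that only apply when the data are symmetric and bell-shaped, as the examples assume.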

z-Scores

The z-score of an observation is the number of standard deviations the observation is above or below the mean.

• A positive z-score indicates that the data value is above average.
• A negative z-score indicates that the data value is below average.
• A z-score of zero indicates that the data value is equal to the mean.

To find the z-score for an observation, calculate the deviation by subtracting the mean, then divide by the standard deviation.

Population z-score: $z = \frac{x - \mu}{\sigma}$

Sample z-score: $z = \frac{x - \bar{x}}{s}$

Example. The following table summarizes the heights of 431 college students grouped by gender.

Gender   Mean   Standard Deviation
Female   63.6   3.5
Male     70.0   3.4
Other    65.4   2.4

The tallest female student is 75 inches, the tallest male student is 80 inches, and the tallest nonbinary student is 68 inches. Which student is the tallest relative to their gender?

Female: $z = \frac{75 - 63.6}{3.5} = \frac{11.4}{3.5} = 3.26$
Male: $z = \frac{80 - 70.0}{3.4} = \frac{10.0}{3.4} = 2.94$
Other: $z = \frac{68 - 65.4}{2.4} = \frac{2.6}{2.4} = 1.08$

The female student has the largest z-score, so she is the tallest relative to her gender.
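The height comparison above can be reproduced with a few lines of code. This sketch assumes a small helper, `z_score`, which is an illustrative name rather than anything defined in the notes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (tallest height, group mean, group standard deviation) from the table.
groups = {
    "Female": (75, 63.6, 3.5),
    "Male": (80, 70.0, 3.4),
    "Other": (68, 65.4, 2.4),
}

z = {group: z_score(*values) for group, values in groups.items()}
for group, value in z.items():
    print(f"{group}: z = {value:.2f}")

# The largest z-score identifies the student who is tallest relative to their group.
tallest = max(z, key=z.get)
print(f"Tallest relative to their group: {tallest}")
```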


Standardizing the data by converting all the observations to z-scores allows us to compare multiple data sets on equivalent scales. Standardizing does not affect the shape of the data. It is just a change of scale, like converting temperatures from Fahrenheit to Celsius. The mean of the z-scores will always be 0 and the standard deviation will always be 1.
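The claim that standardized data always has mean 0 and standard deviation 1 can be checked on any data set. The sketch below reuses the ten shoe sizes from the earlier example and the sample standard deviation. The variable names are illustrative.

```python
import math
import statistics

shoe_sizes = [8, 7, 11, 10, 9, 12, 10, 8, 7, 10]

mean = statistics.mean(shoe_sizes)
s = statistics.stdev(shoe_sizes)

# Standardize: subtract the mean, then divide by the standard deviation.
z_scores = [(x - mean) / s for x in shoe_sizes]

z_mean = statistics.mean(z_scores)
z_sd = statistics.stdev(z_scores)

# Up to floating-point rounding, the mean is 0 and the standard deviation is 1.
print(f"mean of z-scores = {z_mean:.6f}")
print(f"sd of z-scores   = {z_sd:.6f}")
```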

3.8 Measures of Central Tendency with Excel

Although it is important to understand how to compute the various measures of central tendency by hand, in practice we will usually use technology such as a graphing calculator, Microsoft Excel, or another statistical analysis program to calculate the mean, median and/or mode of a data set.

Example. Twenty MATH 1680 students were asked how many siblings they have. The data is shown below. Use Microsoft Excel to find the mean and the median.

1 1 3 2 1 2 1 3 5 0 2 3 1 1 5 4 1 3 0 2

1. Open the Microsoft Excel file named “MATH 1680 Chapter 3 Data”
2. Click on the Siblings sheet
3. In cell C1, type “Mean” in bold
4. In cell C2, enter the formula =AVERAGE(A2:A21)
5. In cell C3, enter the formula =AVERAGE(A:A)
6. In cell D1, type “Median” in bold
7. In cell D2, enter the formula =MEDIAN(A2:A21)
8. In cell D3, enter the formula =MEDIAN(A:A)

Finding the mode of a data set with Microsoft Excel is a bit more complicated because there may be zero, one, or many modes. If there is a unique mode, we can use the Excel function =MODE.SNGL():

9. In cell E1, type “Mode” in bold
10. In cell E2, enter the formula =MODE.SNGL(A2:A21)
11. In cell E3, enter the formula =MODE.SNGL(A:A)

When there are multiple modes, we can use the array formula =MODE.MULT(), which outputs the modes into different cells:

1. Click on the Multimodal sheet
2. In cell C1, type “Modes” in bold
3. Highlight the cell range C2:C5, enter the formula =MODE.MULT(A:A), then press CTRL+SHIFT+ENTER
4. Highlight column A, select Data > Sort & Filter > Sort Smallest to Largest, and observe what happens to the modes in the cell range C2:C5
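For readers without Excel, the mean, median, and mode steps above have rough counterparts in Python's standard `statistics` module. This is a sketch of an alternative, not part of the Excel workflow. The `two_modes` list is made-up data included only to show a multimodal case.

```python
import statistics

# Number of siblings reported by the twenty students.
siblings = [1, 1, 3, 2, 1, 2, 1, 3, 5, 0, 2, 3, 1, 1, 5, 4, 1, 3, 0, 2]

mean = statistics.mean(siblings)        # comparable to =AVERAGE()
median = statistics.median(siblings)    # comparable to =MEDIAN()
mode = statistics.mode(siblings)        # comparable to =MODE.SNGL()
modes = statistics.multimode(siblings)  # comparable to =MODE.MULT(); returns a list

print(f"mean = {mean}, median = {median}, mode = {mode}, modes = {modes}")

# Hypothetical data with two modes, to show the list output.
two_modes = [1, 1, 2, 2, 3]
print(statistics.multimode(two_modes))
```

`statistics.multimode` requires Python 3.8 or later.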

The =MODE.MULT() function behaves differently when there is only one mode:

5. Click on the Siblings sheet
6. In cell F1, type “Modes” in bold
7. Highlight the cell range F2:F5, enter the formula =MODE.MULT(A:A), then press CTRL+SHIFT+ENTER

Unfortunately, the =MODE.SNGL() and =MODE.MULT() functions only work with quantitative data:

1. Click on the Car Sales sheet
2. In cell C1, type “Modes” in bold
3. Highlight the cell range C2:C5, enter the formula =MODE.MULT(A:A), then press CTRL+SHIFT+ENTER

Box-and-Whisker Plots with Excel

Box-and-whisker plots are especially useful when comparing different categories or subgroups of the population with respect to a quantitative variable.

Example. Use Excel to construct side-by-side boxplots for the height of MATH 1680 students grouped by gender.

1. Click on the Height sheet
2. Highlight columns A:C, then select Insert > Charts > Insert Statistic Chart > Box and Whisker
3. Right-click on the vertical scale of the boxplot, then select Format Axis
4. Set the minimum bound to 45 and the maximum bound to 85


3.10 Measures of Spread with Excel

Variance and Standard Deviation with Excel

We can also use Excel to calculate the variance and standard deviation of a data set. You should use the =VAR.P() and =STDEV.P() functions when analyzing census data and the =VAR.S() and =STDEV.S() functions when analyzing survey data.

Example. Use Excel to find the population mean, population variance, and population standard deviation of the heights of MATH 1680 students grouped by gender.

1. Click on the Height sheet
2. In cell E2, type “Mean” in bold
3. In cell E3, type “Variance” in bold
4. In cell E4, type “StDev” in bold
5. In cell F1, type “Female” in bold
6. In cell G1, type “Male” in bold
7. In cell H1, type “Other” in bold
8. In cell F2, enter the formula =AVERAGE(A:A)
9. In cell G2, enter the formula =AVERAGE(B:B)
10. In cell H2, enter the formula =AVERAGE(C:C)
11. In cell F3, enter the formula =VAR.P(A:A)
12. In cell G3, enter the formula =VAR.P(B:B)
13. In cell H3, enter the formula =VAR.P(C:C)
14. In cell F4, enter the formula =STDEV.P(A:A)
15. In cell G4, enter the formula =STDEV.P(B:B)
16. In cell H4, enter the formula =STDEV.P(C:C)

Finding z-Scores with Excel

Recall that the z-score of an observation indicates how many standard deviations it falls above or below the mean. Converting all of the observations in a data set into z-scores is called standardizing the data. In the next two examples, we will standardize the heights of MATH 1680 students grouped by gender.

Example. Use Excel to find the deviation of the heights of MATH 1680 students grouped by gender.

1. Click on the Height sheet
2. In cell J1, type “Female Deviation” in bold
3. In cell K1, type “Male Deviation” in bold
4. In cell L1, type “Other Deviation” in bold
5. In cell J2, enter the formula =A2-F$2
6. Highlight the cell range J2:J269, then select Home > Editing > Fill > Down
7. In cell K2, enter the formula =B2-G$2
8. Highlight the cell range K2:K153, then select Home > Editing > Fill > Down
9. In cell L2, enter the formula =C2-H$2
10. Highlight the cell range L2:L10, then select Home > Editing > Fill > Down
11. Copy the cell range E1:H4 to the cell range N1:Q4

Notice that the average deviation is zero for all three data sets, but the variance and standard deviation didn’t change.

Example. Now we will convert the deviations into z-scores by dividing by the standard deviation.

1. In cell S1, type “Female z-score”
2. In cell T1, type “Male z-score”
3. In cell U1, type “Other z-score”
4. In cell S2, enter the formula =J2/F$4
5. Highlight the cell range S2:S269, then select Home > Editing > Fill > Down
6. In cell T2, enter the formula =K2/G$4
7. Highlight the cell range T2:T153, then select Home > Editing > Fill > Down
8. In cell U2, enter the formula =L2/H$4
9. Highlight the cell range U2:U10, then select Home > Editing > Fill > Down
10. Copy the cell range N1:Q4 to the cell range W1:Z4

Notice that the mean z-score is zero for all three data sets, while the variance and standard deviation are both one.
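The spreadsheet workflow above (population statistics, then deviations, then z-scores) can be mirrored in Python. This is a sketch only: the `heights` list is hypothetical data made up for illustration, since the course spreadsheet is not reproduced here, and the comments map each call to the Excel function it roughly corresponds to.

```python
import math
import statistics

# Hypothetical heights in inches (NOT the MATH 1680 data).
heights = [62.0, 64.5, 66.0, 61.5, 68.0]

mean = statistics.mean(heights)
pop_var = statistics.pvariance(heights)  # population variance, like =VAR.P()
pop_sd = statistics.pstdev(heights)      # population sd, like =STDEV.P()
samp_var = statistics.variance(heights)  # sample variance, like =VAR.S()

# Deviation column (like =A2-F$2 filled down).
deviations = [x - mean for x in heights]

# z-score column (like =J2/F$4 filled down).
z_scores = [d / pop_sd for d in deviations]

print(f"population variance = {pop_var:.3f}, sample variance = {samp_var:.3f}")
print(f"mean deviation = {statistics.mean(deviations):.6f}")
print(f"variance of deviations = {statistics.pvariance(deviations):.3f}")
print(f"variance of z-scores = {statistics.pvariance(z_scores):.6f}")
```

The sample variance is larger than the population variance because it divides by n - 1 instead of n.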

