Handout 2.4 Variability Relative to the MEDIAN

School

College of San Mateo

Course

20

Subject

Mathematics

Date

Feb 20, 2024

Math 80 – Statistics
Variability Relative to the MEDIAN
Handout 2.4

In the last handout, we found representative values (mean and median) for the center of the distributions of the monthly normal temperatures in St. Louis and San Francisco. In this lesson, we will consider the variability (or spread) in the data. Recall the monthly normal temperatures for St. Louis and San Francisco. Here are the data again.

Monthly Normal Temperatures (°F) for St. Louis and San Francisco

Month          Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  79.8  77.6  70.2  58.4  46.2  33.9
San Francisco  51.1  54.4  54.9  56.0  56.6  58.4  59.1  60.1  62.3  62.0  57.2  51.7

1 Look at the dot plots on the right. Write a sentence comparing how variable temperatures are for the two cities. You can use words such as “more spread out”, “less spread out”, “greater variability”, etc.

[Figure: Dotplots of Temperatures for St. Louis and San Francisco; axis: Temperatures, 30 to 80]

2 In previous lessons, we explored how to represent and interpret data in graphical displays such as dot plots and histograms, and then we learned to summarize the center of a distribution numerically. Reporting a number to represent the distribution is important, but in this example the centers of the distributions are similar and the variability is very different.

One statistic (or number) that can be used to represent the variability is the range (or the overall range). This is the maximum value minus the minimum value:

range = maximum – minimum

Compute the ranges for the two cities. Do these values capture the differences in the distribution for St. Louis and San Francisco monthly normal temperatures?

One problem with the range is that it is sensitive to outliers and extreme observations because it depends on only two values, the maximum and the minimum, which might not represent the central portion of the data well.
It is possible that these two values are unusual compared to the rest of the distribution.

Another approach to quantify variability is to find quartiles, which are the first and third quarter points in the data. Given the monthly temperatures of St. Louis, sorted from smallest to largest, we break the data up into quarters. Since there are 12 observations in the St. Louis data, each quarter of the data contains 12/4 = 3 observations. The dividers in the following table illustrate the four quarters of the data.

St. Louis   29.3  33.9  33.9 | 45.1  46.2  56.7 | 58.4  66.1  70.2 | 75.4  77.6  79.8

The median value of 57.55 °F, which you computed in the last handout, falls in the middle of this list, at the middle divider.

The first-quarter point occurs between the values 33.9 and 45.1. This value is called the first quartile, and is denoted by Q1. In this example, Q1 = (33.9 + 45.1)/2 = 39.5 °F. (Note that the first quartile (Q1) represents the median of the lower half of the data set. That is, a median of half of the data set is a quarter point.)

The third-quarter point occurs between the values of 70.2 and 75.4 °F. The third quartile is denoted by Q3 and is found to be Q3 = (70.2 + 75.4)/2 = 72.8 °F. Once again, the third quartile represents the median of the upper half of the data set.

Note that when the median is an exact value from the data, that value is not included in either the upper half or lower half of the data.

STATWAY™ STUDENT HANDOUT PAGE 1
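
The range and the median-of-halves quartile rule described above can be checked with a short Python sketch. This is an illustration, not part of the original handout: the function names `median` and `five_number_summary` are made up for this example, and statistical software often uses slightly different quartile rules, so its Q1 and Q3 may not match the hand method exactly.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # lower half of the sorted data
    upper = data[half + n % 2:]    # upper half; skips the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


# St. Louis monthly normal temperatures (°F), January through December
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4,
            79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

minimum, q1, med, q3, maximum = five_number_summary(st_louis)
data_range = maximum - minimum     # range = maximum - minimum

print(f"min={minimum:.2f}  Q1={q1:.2f}  median={med:.2f}  Q3={q3:.2f}  max={maximum:.2f}")
print(f"range={data_range:.1f}")
```

Passing the San Francisco temperatures to the same function gives that city's summary for comparison.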
3 Together with the median, minimum, and maximum, the quartiles form what is called the five-number summary. The first table shows the five-number summary for the temperatures in St. Louis. Find the five-number summary for San Francisco.

St. Louis                          San Francisco
Minimum                29.3        Minimum                51.1
Q1 (first quartile)    39.5        Q1 (first quartile)    54.65
Median                 57.55       Median                 56.9
Q3 (third quartile)    72.8        Q3 (third quartile)    59.6
Maximum                79.8        Maximum                62.3

4 An alternative way to quantify variability in numeric data is to find the distance between the quartiles (Q1 and Q3). This value is called the interquartile range (or IQR). The formula is

IQR = Q3 – Q1

The middle 50% of the data fall between Q1 and Q3, so the IQR gives the width of the middle 50%. For the monthly normal temperatures in St. Louis, the IQR is Q3 – Q1 = 72.8 – 39.5 = 33.3 °F. Find the IQR for San Francisco. Write a sentence to compare the values of the IQRs for the two cities.

For SF: IQR = Q3 – Q1 = 59.6 – 54.65 = 4.95 °F. The IQR in St. Louis is more than 6 times the IQR in San Francisco.

The values in a five-number summary can be represented in a graph called a boxplot (sometimes called a “box and whiskers plot”). The graph below is the boxplot for the monthly normal temperatures for St. Louis.

6 Sketch the boxplot for San Francisco to the right of the St. Louis boxplot. Write a brief comparison of the variability (spread) represented in the boxplots of the monthly normal temperatures for the two cities.

[Figure: Boxplots of St. Louis and San Francisco Temperatures; axis: Temperature, 30 to 80]

DETERMINING OUTLIERS

We mentioned outliers earlier. An outlier is a data value that deviates greatly from the overall pattern. In statistics we need a rule to define outliers. We cannot just use our own judgment.
While there is no universally agreed upon rule, one common rule is based on the IQR, the interquartile range. We start at Q1 and Q3 and go outward 1.5 IQRs. Here are the formulas to compute the fences:
Left fence = Q1 – 1.5(IQR)
Right fence = Q3 + 1.5(IQR)

These are called the fences for outliers. Any value below the left fence or above the right fence is an outlier.

7 The ages of the last 30 Academy Award winners for Best Actress are given in the table below.

21  25  26  26  28  29  29  29  30  32
33  33  33  33  34  35  35  35  38  39
41  43  45  45  49  49  61  61  74  80

A The ages are already sorted. Find the five-number summary for the data and the IQR.

Min = 21, Max = 80
Median (Q2) = (34 + 35)/2 = 34.5
Q1 = 29, Q3 = 45
IQR = Q3 – Q1 = 45 – 29 = 16

B Find the fences for outliers. Identify any outliers in the data.

Left fence = Q1 – 1.5(IQR) = 29 – 1.5(16) = 5. No age is below 5, so there are no left outliers.
Right fence = Q3 + 1.5(IQR) = 45 + 1.5(16) = 69. The ages 74 and 80 are above 69, so there are 2 right outliers.

When we draw boxplots for data that contain outliers, we stop the line at the last data value that is not an outlier and draw the outliers separately. The boxplot for the actresses is given on the right. Notice that boxplots can be vertical or horizontal.

[Figure: Boxplot of Best Actress Oscar Winner Ages, 1982-2011; axis: Age, 20 to 80]

SUMMARY

The temperature example in this lesson focuses attention on differences in variability because the centers of the two distributions are nearly the same. The context of monthly normal temperatures is accessible and provides the opportunity to concentrate on the concept of quantifying variability. The range is simple to compute, but it is sensitive to outliers and extreme observations. The IQR offers a relatively simple measure of variability and is resistant to the effect of outliers and extreme observations. Boxplots provide a graph with a simple structure that contains visual representations of center and variability. A boxplot is analogous to a skeleton of a data set: it is always built from the same five summary values, like bones, yet it can represent data sets of different sizes and characteristics, like the different bodies built on top of skeletons.
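
The fence rule applied to the Best Actress ages in item 7 can be sketched in Python as a check on the hand calculation. This is a minimal sketch, not part of the original handout: the helper `median` and the variable names are chosen for this example, and the quartiles are taken as medians of the lower and upper halves, as in this handout.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


# Ages of the 30 Best Actress winners from item 7 (already sorted)
ages = [21, 25, 26, 26, 28, 29, 29, 29, 30, 32,
        33, 33, 33, 33, 34, 35, 35, 35, 38, 39,
        41, 43, 45, 45, 49, 49, 61, 61, 74, 80]

n = len(ages)
half = n // 2
q1 = median(ages[:half])            # median of the lower half
q3 = median(ages[half + n % 2:])    # median of the upper half
iqr = q3 - q1

left_fence = q1 - 1.5 * iqr
right_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = [a for a in ages if a < left_fence or a > right_fence]

# Boxplot whiskers stop at the most extreme values that are not outliers.
inside = [a for a in ages if left_fence <= a <= right_fence]
whisker_low, whisker_high = inside[0], inside[-1]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", left_fence, right_fence)
print("outliers:", outliers)
print("whiskers end at:", whisker_low, whisker_high)
```

The last two values show where the boxplot lines would stop once the outliers are drawn separately.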
The IQR also allows us to determine whether high or low values are outliers. We use the heart of the data to determine fences, or boundaries, where outliers start. We need to remember that the method we used is somewhat arbitrary and will sometimes give us neighboring values where one is an outlier and the other is not.

HOMEWORK

1 The table below gives summary statistics for the weights (measured in grams) of newborns whose mothers were smokers. Determine if there are any left or right outliers.
Min    Q1      Median   IQR     Range
896    2,296   2,940    1,006   4,368

Left fence: LF = Q1 – 1.5(IQR) = 2,296 – 1.5(1,006) = 787. Since 787 < Min = 896, there are no left outliers.
Q3 = Q1 + IQR = 2,296 + 1,006 = 3,302
Right fence: UF = Q3 + 1.5(IQR) = 3,302 + 1.5(1,006) = 4,811
Max = Range + Min = 4,368 + 896 = 5,264
Since Max = 5,264 > UF = 4,811, there is at least one right outlier.

2 Suppose an error was made when the data values for monthly normal temperatures in St. Louis were recorded in the table below. The first two digits of the July temperature were reversed.

Month        Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
Temperature  29.3  33.9  45.1  56.7  66.1  75.4  97.8  77.6  70.2  58.4  46.2  33.9

A How does this error affect the range? Does it increase, decrease, or leave the range unchanged? Explain briefly.

B How does this error affect the IQR? Does it increase the IQR, decrease the IQR, or leave it unchanged? Explain briefly.

C How does the change affect the boxplot for the data? Explain, or illustrate by sketching the boxplot.

3 One of the items on the student survey for an introductory statistics course was "Rate your aptitude to succeed in this class on a scale of 1 to 10" where 1 = Lowest and 10 = Highest. Below is the distribution of this variable for the 30 women in the class. Which boxplot on the right represents the same data set as this histogram?

4 StatCrunch Practice. Go to www.statcrunch.com and log in. Select Explore and click on Data. Under Browse all, type in the file name “How Does Working Affect GPA?”. Open the file. Click on Graph and select Boxplot. You should see a dialog box. Under Select Column(s), select GPA. Under Group by, select PT/FT Student.
Check the boxes for “Use fences to identify outliers” and “Draw boxes horizontally”. Click Compute. Use the boxplots you constructed to do the following:

A Copy the boxplots into an MS Word document (or Google Doc) and answer the questions below.

B The range is the same for both boxplots. Does that mean the variability in GPA is the same for both full-time and part-time students? Explain briefly.

C Is the proportion of part-time students with a GPA of 3.5 or higher larger than, smaller than, or about the same as the proportion for full-time students? Or is it impossible to tell? Explain briefly.
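
The worked answer to Homework 1 relies only on summary statistics, and the same steps can be sketched in Python. This is an illustrative sketch, not part of the original handout: the variable names are invented for this example, and the calculation only shows whether the minimum or maximum lies beyond a fence, not how many outliers there are.

```python
# Summary statistics from the Homework 1 table (newborn weights in grams)
minimum = 896
q1 = 2296
iqr = 1006
data_range = 4368

# Recover the values the table does not list directly.
q3 = q1 + iqr                   # because IQR = Q3 - Q1
maximum = minimum + data_range  # because range = maximum - minimum

left_fence = q1 - 1.5 * iqr
right_fence = q3 + 1.5 * iqr

# If the minimum is not below the left fence, no value is.
has_left_outlier = minimum < left_fence
# If the maximum is above the right fence, at least one value is.
has_right_outlier = maximum > right_fence

print("Q3:", q3, "Max:", maximum)
print("fences:", left_fence, right_fence)
print("left outlier:", has_left_outlier, "right outlier:", has_right_outlier)
```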