Stat 116 - Section 2.3 Fall 2023

School

University of Kentucky *

Course

116

Subject

Statistics

Date

Apr 3, 2024

Pages

8

STAT 116 2.3 - One quantitative variable: measures of spread

Which city has greater variability in temperature?

Measures of spread: range, quartiles, interquartile range, standard deviation

1. Standard deviation

The standard deviation is a statistic that measures how much variability there is in the data. It is computed from the deviations from the mean: each deviation is squared, the squared deviations are added and divided by n - 1 (for a sample), and the square root of the result is taken. The deviations themselves always add up to zero, so they cannot simply be summed.

s = sqrt( Σ (x - x̄)² / (n - 1) )

A larger standard deviation means the data values are more spread out and have more variability.

Ex: Estimate the standard deviation for the following histograms.
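As an illustration of how a sample standard deviation can be computed, the following Python sketch works through the calculation step by step and compares the result with the standard library's `statistics.stdev`. The data values are hypothetical, chosen only to keep the arithmetic simple; they do not come from the course materials.

```python
import math
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Deviations from the mean add up to zero, so they are squared before summing.
deviations = [x - mean for x in data]
sum_of_squares = sum(d ** 2 for d in deviations)

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(sum_of_squares / (n - 1))

print(f"mean = {mean}")
print(f"sum of deviations = {sum(deviations)}")
print(f"s (step by step) = {s:.3f}")
print(f"s (statistics.stdev) = {statistics.stdev(data):.3f}")
```

The two values of s agree because `statistics.stdev` also uses the n - 1 divisor for a sample.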
95% rule

If a distribution of data is approximately symmetric and bell-shaped, about 95% of the data should fall within two standard deviations of the mean.

Ex:

Number of standard deviations from the mean

Z-scores

A common way to determine how unusual a single data value is, independent of the units used, is to count how many standard deviations it lies from the mean. This quantity is known as the z-score.

If the data have a distribution that is symmetric and bell-shaped, we know from the 95% rule that about 95% of the data will fall within two standard deviations of the mean. This means that only about 5% of the data values will have z-scores beyond ±2.

z = (x - x̄) / s   for a sample
z = (x - µ) / σ   for a population
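The 95% rule and the z-score can be sketched in a few lines of Python. The function names `z_score` and `two_sd_interval`, and the mean and standard deviation used here, are hypothetical and serve only as an illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def two_sd_interval(mean, sd):
    """Interval expected to hold about 95% of roughly bell-shaped data."""
    return (mean - 2 * sd, mean + 2 * sd)


# Hypothetical summary statistics, not taken from the notes.
mean, sd = 100, 15

low, high = two_sd_interval(mean, sd)
z = z_score(136, mean, sd)

print(f"About 95% of values are expected between {low} and {high}")
print(f"z-score of 136: {z:.2f}")
print("More than 2 standard deviations from the mean?", abs(z) > 2)
```

A value whose z-score lies beyond ±2 falls outside the interval returned by `two_sd_interval`, which is why such values are considered unusual for bell-shaped data.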
Ex:

1) A study of 66,831 dairy cows found that the mean milk yield was 12.5 kg per milking with a standard deviation of 4.3 kg per milking (data from Berry, et al., 2013).
   a) A cow produces 18.1 kg per milking. What is this cow’s z-score?
   b) A cow produces 12.5 kg per milking. What is this cow’s z-score?
   c) A cow produces 8 kg per milking. What is this cow’s z-score?

2) Which is better, an ACT score of 28 or a combined SAT score of 2100?
   ACT: µ = 21, σ = 5
   SAT: µ = 1500, σ = 325
   Assume ACT and SAT scores have approximately bell-shaped distributions. Answer the question using z-scores.

2. Range

The range is the difference between the maximum and minimum values in a data set. The minimum and maximum identify the extremes of the distribution: the smallest and largest values, respectively.

Range = maximum value - minimum value
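One way to check the z-score exercises above with technology is sketched below in Python. The helper name `z_score` is an assumption made for this example; the means and standard deviations are the ones given in the exercises.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Dairy cows: mean 12.5 kg and standard deviation 4.3 kg per milking.
cow_z = {kg: z_score(kg, 12.5, 4.3) for kg in (18.1, 12.5, 8)}
for kg, z in cow_z.items():
    print(f"{kg} kg per milking: z = {z:.2f}")

# ACT versus SAT: compare the scores on a common, unit-free scale.
act_z = z_score(28, 21, 5)
sat_z = z_score(2100, 1500, 325)
better = "SAT" if sat_z > act_z else "ACT"

print(f"ACT 28: z = {act_z:.2f}")
print(f"SAT 2100: z = {sat_z:.2f}")
print(f"Relatively better score: {better}")
```

Because z-scores carry no units, the two test scores can be compared directly even though the tests use different scales.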
3. Quartiles and interquartile range

Quartiles break the data set into quarters, just as the median breaks it in half. The median is the 50th percentile, since it divides the data into two equal halves. If we divide each of those halves again, we obtain two additional statistics known as the first (Q1) and third (Q3) quartiles, which are the 25th and 75th percentiles.

The minimum, first quartile, median, third quartile, and maximum provide a good summary of important characteristics of the distribution and are known as the five number summary.

Interquartile range (IQR): difference between the third and first quartiles

IQR = Q3 - Q1

Ex:

1. Find the five number summary for the following dataset:
   4, 5, 8, 4, 11, 8, 18, 12, 5, 15, 22, 7, 14, 11, 12

Detection of outliers: IQR method

Upper fence: Q3 + 1.5 (IQR)
Lower fence: Q1 - 1.5 (IQR)

We call a data value an outlier if it is:
   smaller than the lower fence, or
   larger than the upper fence.

When only a five number summary is available, we check the minimum and maximum values against the fences.
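A five number summary can be computed as in the following Python sketch. It assumes the "median of halves" rule for quartiles, where the overall median is left out of both halves when n is odd; statistical software may use other conventions and report slightly different quartiles. The function name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: when n is odd, the median is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]
    upper_half = data[(n + 1) // 2:]
    return (
        data[0],
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
        data[-1],
    )


# Dataset from the exercise above.
dataset = [4, 5, 8, 4, 11, 8, 18, 12, 5, 15, 22, 7, 14, 11, 12]

minimum, q1, median, q3, maximum = five_number_summary(dataset)
iqr = q3 - q1

print("Five number summary:", (minimum, q1, median, q3, maximum))
print("IQR:", iqr)
```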
Ex:

1. The data below describe a sample. The information given includes the five number summary, the sample size, and the largest and smallest data values in the tails of the distribution. Clearly identify any outliers using the IQR method.
   Five number summary: (42, 72, 78, 80, 99); n = 120.
   Tails: 42, 63, 65, 67, 68, …, 88, 89, 95, 96, 99.

2. Consider the serum cholesterol level of a sample (n = 25) of overweight men. Use the dataset below to answer the following questions.
   a) Give the five number summary.
   b) Use the IQR method to identify any outliers.
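The IQR method for the first exercise can be sketched in Python as follows. The function name `iqr_fences` is an assumption for this example, and only the tail values listed in the exercise are checked, since the values elided by "…" are not given.

```python
def iqr_fences(q1, q3):
    """Return (lower fence, upper fence) for the IQR outlier method."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Five number summary from the exercise: (42, 72, 78, 80, 99).
q1, q3 = 72, 80

# Only the tail values listed in the exercise.
low_tail = [42, 63, 65, 67, 68]
high_tail = [88, 89, 95, 96, 99]

lower_fence, upper_fence = iqr_fences(q1, q3)
outliers = [x for x in low_tail + high_tail
            if x < lower_fence or x > upper_fence]

print(f"Lower fence: {lower_fence}, upper fence: {upper_fence}")
print("Outliers among the listed tail values:", outliers)
```

Values between the fences, such as 63 or 89, are not flagged even though they sit in the tails of the distribution.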
Percentiles

The Pth percentile is the value of a quantitative variable that is greater than P percent of the data.

1. If a student scored in the 75th percentile on a critical reading test, this means...

2. We already used z-scores to determine whether an SAT score of 2100 or an ACT score of 28 is better. We could also have used percentiles:
   ACT score of 28: 91st percentile; SAT score of 2100: 97th percentile

Summary - one quantitative variable

Measures of spread: range, quartiles, interquartile range, standard deviation
Measures of center: mean, median, mode

Comparing statistics

Measures of center:
   Mean (not resistant)
   Median (resistant)

Measures of spread:
   Standard deviation (not resistant)
   IQR (resistant)
   Range (not resistant)

Resistant means the statistic is not heavily influenced by outliers.
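The definition of a percentile can be illustrated by counting the share of data values that lie below a given value. In this Python sketch the function name `percent_below` and the list of test scores are hypothetical and are not taken from the course materials.

```python
def percent_below(data, value):
    """Percent of the data values that are strictly less than value."""
    return 100 * sum(x < value for x in data) / len(data)


# Hypothetical test scores for 20 students.
scores = [48, 52, 55, 60, 61, 63, 67, 70, 72, 75,
          78, 80, 83, 85, 88, 90, 92, 94, 96, 99]

p = percent_below(scores, 90)
print(f"A score of 90 is greater than {p:.0f}% of the scores")
```

Under the definition above, a score that is greater than 75% of the data sits at the 75th percentile.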
Section 2.3: One Quantitative Variable: Measures of Spread

Example 1: Tips for a Pizza Delivery Person

A pizza delivery person recorded all of her tips (and other variables) over several shifts. She discusses the results, and much more, on “Diary of a Pizza Girl” on the Slice website. The variable Tip in the PizzaGirl dataset includes the 24 tips she recorded, and the values are also given below. Use technology to find the mean and the standard deviation for these values. Give answers to two decimal places.

2, 4, 6, 2, 3, 3, 2, 0, 5, 3, 3, 4.5, 2.5, 8, 2, 2, 2, 3, 3, 3, 3, 5, 2, 0

Example 2: Percent of Body Fat in Men

The variable BodyFat in the BodyFat dataset gives the percent of weight made up of body fat for 100 men. For this sample, the mean percent body fat is 18.6 and the standard deviation is 8.0. The distribution of the body fat values is roughly symmetric and bell-shaped. (If you are on a computer, check this!)

a). Find an interval that is likely to contain roughly 95% of the data values.
b). The largest percent body fat of any man in the sample is 40.1 and the smallest is 3.7. Find and interpret the z-score for each of these values. Which is relatively more extreme?

Quick Self-Quiz: Estimating Mean and Standard Deviation

The histogram below shows the data for the quantitative variable Height for 355 students.

a). Estimate the mean and the standard deviation from the histogram.
b). Estimate the value of the maximum height for a person in the sample and use your estimated values of mean and standard deviation to find and interpret an estimated z-score for this person’s height.
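For Example 1, "technology" could be any statistical package. As one possibility, the following Python sketch uses the standard library's `statistics` module; `statistics.stdev` computes the sample standard deviation with the n - 1 divisor.

```python
import statistics

# The 24 tips listed in Example 1.
tips = [2, 4, 6, 2, 3, 3, 2, 0, 5, 3, 3, 4.5, 2.5, 8,
        2, 2, 2, 3, 3, 3, 3, 5, 2, 0]

mean_tip = statistics.mean(tips)
sd_tip = statistics.stdev(tips)  # sample standard deviation (n - 1 divisor)

print(f"n = {len(tips)}")
print(f"mean = {mean_tip:.2f}")
print(f"standard deviation = {sd_tip:.2f}")
```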
Example 3: Tips for a Pizza Girl, revisited

We revisit the data given for tips for a pizza girl, given in Example 1.

a). Use technology to find the five number summary.
b). What is the range? What is the IQR? If one of the pizza delivery customers had given a $20 tip, which measure of spread would the large tip have a greater effect on: the range or the IQR?

Example 4: Percentiles of SAT Scores

A score of 400 on the SAT Mathematics General Test is at the 16th percentile for all 2012 college-bound seniors taking the SAT. Clearly explain in terms of SAT scores what it means to be “at the 16th percentile”.

Quick Self-Quiz: Five Number Summary and Skewness

1. Heights, revisited. A histogram of heights of 355 students is shown on the reverse. Use this graph to give a rough estimate of the five number summary of the heights in this sample.

2. For each five number summary below, indicate whether the data appear to be symmetric, skewed to the right, or skewed to the left.
   a). (10, 57, 85, 88, 93)
   b). (200, 300, 400, 500, 600)
   c). (5, 30, 40, 50, 75)
   d). (5, 7, 8, 15, 42)
   e). (100, 430, 600, 620, 650)
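The question in Example 3 about the $20 tip can be explored with a short Python sketch. It uses `statistics.quantiles` with its default "exclusive" method, which is one of several quartile conventions, so other software may report slightly different quartiles. The helper name `range_and_iqr` is hypothetical.

```python
import statistics


def range_and_iqr(values):
    """Return (range, IQR), with quartiles from statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return max(values) - min(values), q3 - q1


# The 24 tips listed in Example 1.
tips = [2, 4, 6, 2, 3, 3, 2, 0, 5, 3, 3, 4.5, 2.5, 8,
        2, 2, 2, 3, 3, 3, 3, 5, 2, 0]

range_before, iqr_before = range_and_iqr(tips)
range_after, iqr_after = range_and_iqr(tips + [20])  # add one $20 tip

print(f"Range: {range_before} -> {range_after}")
print(f"IQR:   {iqr_before} -> {iqr_after}")
```

The range depends only on the two most extreme values, so a single large tip moves it directly, while the IQR depends on the middle half of the data and shifts much less.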