Show that the computing formula for $s^2$ on page 44 is equivalent to the one used to define $s^2$ on page 37.
Transcribed Image Text: time plot made it clear that the trend was the key feature, not the average, which was
a poor summary. The testing machine required more work.
44 Chapter 2 Organization and Description of Data
2.7 The Calculation of $\bar{x}$ and $s$
Here, we discuss methods for calculating $\bar{x}$ and $s$ from data that are already grouped
into intervals. These calculations are, in turn, based on the formulas for the mean
and standard deviation for data consisting of all of the individual observations. In
this latter case, we obtain $\bar{x}$ by summing all of the observations and dividing by the
sample size n.
An alternative formula for $s^2$ forms the basis of the grouped data formula for
variance. It was originally introduced to simplify hand calculations.
Variance (handheld calculator formula)
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$
(In Exercise 2.51 you will be asked to show that this formula is, in fact, equivalent
to the one on page 37.) This expression for variance is without $\bar{x}$, which reduces
roundoff error when using a handheld calculator.
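One way to carry out the equivalence check is to expand the squared deviations. The sketch below assumes the page 37 definition is $s^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2/(n-1)$ with $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, so that $\sum_{i=1}^{n} x_i = n\bar{x}$.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} \left( x_i^2 - 2\bar{x}\,x_i + \bar{x}^2 \right) \\
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
     && \text{since } \textstyle\sum_{i=1}^{n} x_i = n\bar{x} \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
     && \text{since } \bar{x} = \tfrac{1}{n}\textstyle\sum_{i=1}^{n} x_i
\end{align*}
```

Dividing both ends of this chain by $n - 1$ gives the handheld calculator formula on the right and the definition of $s^2$ on the left, so the two expressions agree for any data set with $n \ge 2$.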
EXAMPLE 18
Calculating variance using the handheld calculator formula
Find the mean and the standard deviation of the following miles per gallon (mpg)
obtained in 20 test runs performed on urban roads with an intermediate-size car:
19.7  21.5  22.5  22.2  22.6
21.9  20.5  19.3  19.9  21.7
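The handheld calculator formula can be sketched in Python as follows. This is only an illustration: the list `mpg_visible` holds the ten values that are legible in the transcription, while the example refers to 20 test runs, so the printed results are not the textbook's answer for Example 18. The names `handheld_variance` and `mpg_visible` are hypothetical, chosen for this sketch.

```python
import math
import statistics


def handheld_variance(values):
    """Sample variance from running totals: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(values)                    # sum of x_i
    total_sq = sum(x * x for x in values)  # sum of x_i^2
    return (total_sq - total * total / n) / (n - 1)


# Assumption: only these ten of the 20 mpg values are legible in the transcription.
mpg_visible = [19.7, 21.5, 22.5, 22.2, 22.6, 21.9, 20.5, 19.3, 19.9, 21.7]

mean_visible = sum(mpg_visible) / len(mpg_visible)
s2_visible = handheld_variance(mpg_visible)
s_visible = math.sqrt(s2_visible)

print(f"mean = {mean_visible:.3f}, s^2 = {s2_visible:.4f}, s = {s_visible:.4f}")

# Cross-check against the standard library, which works from deviations about the mean.
print(math.isclose(s2_visible, statistics.variance(mpg_visible), rel_tol=1e-9))
```

Only the two totals, the sum of the values and the sum of their squares, need to be accumulated, which is why the formula suited hand calculation.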
Transcribed Image Text: 37 Sec 2.5 Descriptive Measures
important aspect of a set of data, their "middle" or their "average", but they tell
us nothing about the extent of variation.
We observe that the dispersion of a set of data is small if the values are closely
bunched about their mean, and that it is large if the values are scattered widely about
their mean. It would seem reasonable, therefore, to measure the variation of a set of
data in terms of the amounts by which the values deviate from their mean.
If a set of numbers $x_1, x_2, \ldots, x_n$ has mean $\bar{x}$, the differences
$$x_1 - \bar{x},\; x_2 - \bar{x},\; \ldots,\; x_n - \bar{x}$$
are called the deviations from the mean. We might use the average of the deviations
as a measure of variation in the data set. Unfortunately, this will not do. For instance,
refer to the observations 11, 9, 17, 19, 4, 15, displayed above in Figure 2.12, where
$\bar{x} = 12.5$ is the balance point. The six deviations are -1.5, -3.5, 4.5, 6.5, -8.5, and
2.5. The sum of positive deviations
4.5 + 6.5 + 2.5 = 13.5
exactly cancels the sum of the negative deviations
-1.5 – 3.5 – 8.5 = -13.5
so the sum of all the deviations is 0.
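The cancellation can be checked numerically. The sketch below uses the six observations 11, 9, 17, 19, 4, 15 from the text; the variable names are illustrative, and a single data set is a check rather than the general proof that Exercise 2.50 asks for.

```python
observations = [11, 9, 17, 19, 4, 15]

# Mean (balance point) of the six observations.
mean = sum(observations) / len(observations)

# Deviation of each observation from the mean.
deviations = [x - mean for x in observations]

# Split the deviations by sign to see the cancellation.
positive_sum = sum(d for d in deviations if d > 0)
negative_sum = sum(d for d in deviations if d < 0)
total = sum(deviations)

print("mean:", mean)
print("deviations:", deviations)
print("sum of positive deviations:", positive_sum)
print("sum of negative deviations:", negative_sum)
print("sum of all deviations:", total)
```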
As you will be asked to show in Exercise 2.50, the sum of the deviations is
always zero. That is,
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
so the mean of the deviations is always zero. Because the deviations sum to zero, we
need to remove their signs. Absolute value and square are two natural choices. If we
take their absolute value, so each negative deviation is treated as positive, we would
obtain a measure of variation. However, to obtain the most common measure of variation,
we square each deviation. The sample variance, $s^2$, is essentially the average
of the squared deviations from the mean, $\bar{x}$, and is defined by the following formula.
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
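As a small worked illustration, the sketch below applies the sample variance definition to the six observations 11, 9, 17, 19, 4, 15 and compares the result with the computing formula from page 44. The function names `definition_variance` and `computing_variance` are hypothetical, the definition is assumed to be the sum of squared deviations divided by $n - 1$, and the page 44 formula is assumed to be $\left(\sum x_i^2 - (\sum x_i)^2/n\right)/(n - 1)$.

```python
import math


def definition_variance(values):
    """Assumed page 37 form: sum of squared deviations from the mean over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def computing_variance(values):
    """Assumed page 44 form: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)


six_obs = [11, 9, 17, 19, 4, 15]
s2_definition = definition_variance(six_obs)
s2_computing = computing_variance(six_obs)

print("definition:", s2_definition)
print("computing formula:", s2_computing)
print("agree:", math.isclose(s2_definition, s2_computing))
```

For these data the squared deviations sum to 155.5, and $\sum x_i^2 - (\sum x_i)^2/n = 1093 - 5625/6 = 155.5$ as well, so both routes give $s^2 = 155.5/5 = 31.1$.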