PHW142 Practice Problems
Using The Normal Curve For Sums And Averages KEY
1: Blood Cholesterol
based on Baldi and Moore exercise 13.5 (both editions)
A government sample survey plans to measure the blood cholesterol levels of a simple random sample (SRS)
of men aged 20 to 34 years. The researchers will report the mean x̄ from their sample as an estimate of the
mean cholesterol level µ in this population.
1a
Explain to someone who knows no statistics what it means to say that x̄ is an “unbiased” estimator of µ.
1a answer
In the long run, the average of the x̄ values from many samples is equal to µ.
As Baldi and Moore say, ‘x̄ is not systematically higher or lower than µ.’
1b
The sample result x̄ is an unbiased estimator of the true population mean µ no matter what size SRS the study
uses. Explain to someone who knows no statistics why a large sample gives more trustworthy results than a
small sample. [Think about the concentration of the sample averages x̄ around the mean µ.]
1b answer
The sampling distribution of x̄ is much more concentrated around µ for large samples. We’ll use the term
‘precise’ to describe such estimates.
As Baldi and Moore say, more informally, ‘With large samples, x̄ is more likely to be close to µ.’
2: More On Blood Cholesterol
based on Baldi and Moore exercise 13.7 (both editions)
Suppose that in fact the blood cholesterol levels of all men aged 20 to 34 years follow the normal distribution
with mean µ = 188 milligrams per deciliter (mg/dl) and standard deviation σ = 41 mg/dl.
2a
What is the probability that an individual selected at random from this population has a blood cholesterol
level between 185 and 191 mg/dl?
2a answer
pnorm(191, mean = 188, sd = 41, lower.tail = TRUE) -
  pnorm(185, mean = 188, sd = 41, lower.tail = TRUE)
## [1] 0.05832974
2b
Choose a simple random sample (SRS) of 100 men from this population. What are the mean and standard
deviation of the sampling distribution of x̄? What is its shape, and how do you know the shape?
2b answer
The mean of the sampling distribution of x̄ is the mean of the population, 188 mg/dl.
The standard deviation of x̄ is σ/√n = 41/√100 = 4.1 mg/dl.
Shape: a normal curve, because the distribution of blood cholesterol levels in the population is itself normal.
2c
For our SRS of 100 men, what is the probability that x̄ takes a value between 185 and 191 mg/dl? (Another
way to describe this probability is that x̄ estimates µ within ±3 mg/dl.)
2c answer
pnorm(191, mean = 188, sd = 4.1, lower.tail = TRUE) -
  pnorm(185, mean = 188, sd = 4.1, lower.tail = TRUE)
## [1] 0.5356528
The probability that an SRS of size 100 from this population has an average blood cholesterol level
between 185 and 191 mg/dl is about 0.5357.
2d
Now choose an SRS of 1000 men from this population. What is the probability that x̄ falls within ±3 mg/dl
of µ?
2d answer
std dev of x̄ = 41/√1000 ≈ 1.297 mg/dl
pnorm(191, mean = 188, sd = 1.297, lower.tail = TRUE) -
  pnorm(185, mean = 188, sd = 1.297, lower.tail = TRUE)
## [1] 0.979279
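Parts a, c, and d all use the same calculation with a different standard deviation. A minimal sketch that wraps it in one function follows; the name `prob_within` and its arguments are hypothetical, not from the original key, and the function assumes the sampling distribution is normal, as established in part b.

```r
# Population parameters given in the problem statement.
mu <- 188
sigma <- 41

# Probability that the average of an SRS of size n lies within
# `margin` of mu, assuming a normal sampling distribution.
prob_within <- function(n, margin = 3) {
  se <- sigma / sqrt(n)  # standard deviation of x-bar
  pnorm(mu + margin, mean = mu, sd = se) -
    pnorm(mu - margin, mean = mu, sd = se)
}

prob_within(1)     # a single individual (part a)
prob_within(100)   # part c
prob_within(1000)  # part d
```

Because the function uses the unrounded value of 41/√1000 rather than 1.297, its result for n = 1000 may differ from the printed answer in the later decimal places.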
2e
Explain why the answers to questions (a), (c), and (d) are different.
2e answer
Part a asks about the distribution of the blood cholesterol levels of individuals in the population.
These values have a standard deviation of 41 mg/dl, so very few of them fall in the narrow range 185 to 191
mg/dl, even though this interval is centered on the mean, 188 mg/dl.
Parts c and d are about the distribution of the averages of samples drawn from this population. In part
c, the standard deviation of the averages of simple random samples of size 100 is 4.1 mg/dl (from part b).
The spread of the averages is much narrower than the spread of the individuals, so the proportion
in this range is much higher. In part d, the standard deviation of the averages drops to about 1.3 mg/dl;
the larger sample is much more likely to give a precise estimate of µ, and therefore the proportion of
x̄ values in this narrow range is even higher.
3: Gypsy Moths
Baldi and Moore exercise 13.9 (3rd edition), not in 4th edition
The gypsy moth is a serious threat to oak and aspen trees. A state agriculture department places traps
throughout the state to detect the moths. When traps are checked periodically, the mean number of moths
trapped is only 0.5, but some traps have several moths. The distribution of moth counts is discrete and
strongly skewed, with a standard deviation of 0.7.
3a
Why do Baldi and Moore say that “the distribution of moth counts is discrete”?
3a answer
Because counts of moths in traps are integers.
3b
Using the 68-95-99.7 percent empirical rule for symmetric, mound-shaped distributions, explain how we know
that the distribution of moth counts is strongly skewed.
3b answer
3 * σ for this population would be 3 * 0.7 = 2.1.
If the population were anything like a symmetric, mound-shaped distribution, nearly all X values would lie
within 3σ of the mean, so the lower limit for the X values would be 0.5 - 2.1 = -1.6.
Since the counts cannot be below 0, the distribution must have a long right tail to have a standard deviation
of 0.7 with a mean of 0.5.
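The arithmetic in the answer above can be extended to one, two, and three standard deviations from the mean. The sketch below only tabulates mean ± kσ for the values given in the exercise; the variable names are illustrative.

```r
mu <- 0.5     # mean moths per trap
sigma <- 0.7  # standard deviation of moths per trap
k <- 1:3      # number of standard deviations from the mean

# Limits a symmetric, mound-shaped distribution would need to reach.
limits <- data.frame(k = k,
                     lower = mu - k * sigma,
                     upper = mu + k * sigma)
limits
```

Every lower limit is negative, including the one at a single standard deviation, which is impossible for counts. That is the sense in which the empirical rule signals strong right skew here.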
3c
Assuming the mean of 0.5 moths per trap and the standard deviation of 0.7 moths per trap, what are the
mean and standard deviation of x̄, the average number of moths per trap in 50 traps?
3c answer
The expected value of x̄ is the population mean, 0.5 moths/trap.
The standard deviation of x̄ is σ/√n = 0.7/√50 ≈ 0.099 moths/trap.
3d
Explain why the shape of this sampling distribution of x̄ is approximately a normal curve.
3d answer
For larger sample sizes, the CLT applies and the sampling distribution of x̄ will be approximately a normal
curve. Because the population is strongly skewed, for n = 50 the sampling distribution of x̄ may still show
some departure from the normal curve.
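One way to see how close to normal the sampling distribution is at n = 50 is to simulate it. The exercise does not say which distribution the moth counts follow, so the sketch below assumes a Poisson distribution with rate 0.5 as a stand-in: it is discrete, right-skewed, and has mean 0.5 and standard deviation √0.5 ≈ 0.71, close to the stated 0.7.

```r
# Stand-in population: Poisson counts with rate 0.5. This is an assumption
# for illustration only; the exercise does not name the distribution.
set.seed(1)
n_traps <- 50
xbar <- replicate(10000, mean(rpois(n_traps, lambda = 0.5)))

mean(xbar)  # should be close to 0.5
sd(xbar)    # should be close to sqrt(0.5) / sqrt(50) = 0.1

# Histogram of the 10,000 sample averages with a normal curve overlaid.
hist(xbar, breaks = 30, freq = FALSE,
     main = "Averages of 50 trap counts", xlab = "x-bar")
curve(dnorm(x, mean = 0.5, sd = sqrt(0.5 / n_traps)), add = TRUE)
```

Under this assumed population, the histogram should be roughly mound-shaped, and any remaining right skew is the kind of departure from the normal curve that the answer above mentions.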
4: Sleep Times Of College Students
Suppose a very large study has determined that the total daily sleep times of college students have a mean
of 7 hours, with a standard deviation of 2.5 hours.
4a
Explain why researchers may use the normal curve to answer questions about the behavior of averages of
simple random samples of size 100 from this population.
4a answer
The sample size is 100, so the Central Limit Theorem tells us the sampling distribution of x̄ will be
approximately a normal curve.
4b
Suppose researchers took a random sample of size 100 from this population.
What is the probability of
getting a sample average between 6.5 and 7.5 hours?
4b answer
std dev of x̄ = 2.5/√100 = 0.25 hours
pnorm(7.5, mean = 7, sd = 0.25, lower.tail = TRUE) -
  pnorm(6.5, mean = 7, sd = 0.25, lower.tail = TRUE)
## [1] 0.9544997
4c
Suppose researchers took a random sample of size 100 from this population and found that the sample
average was 7.7 hours. Explain, using probability, why this is a rare event.
4c answer
pnorm(7.7, mean = 7, sd = 0.25, lower.tail = FALSE)
## [1] 0.00255513
If the mean is 7 hours, fewer than 3 in 1,000 samples of size 100 would give an x̄ value of 7.7 hours or larger.
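The same tail probability can be reached by first standardizing the sample average. The sketch below uses only the values given in problem 4; the variable names are illustrative.

```r
mu <- 7        # population mean sleep time (hours)
sigma <- 2.5   # population standard deviation (hours)
n <- 100       # sample size
xbar <- 7.7    # observed sample average (hours)

se <- sigma / sqrt(n)   # standard deviation of x-bar: 0.25 hours
z <- (xbar - mu) / se   # number of standard errors above the mean

z
pnorm(z, lower.tail = FALSE)  # same tail probability as the answer above
```

A sample average 2.8 standard errors above the mean sits well out in the upper tail of the normal curve, which is why the event is rare.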
5: Central Limit Theorem
Baldi and Moore exercise 13.8 (both editions)
Asked what the central limit theorem says, a student replies, “As you take larger and larger samples from a
population, the histogram of the sample values looks more and more Normal.”
Is the student right? Explain your answer.
5 answer
No. The histogram of the sample values will look like the population distribution, whatever it might happen
to be.
(For example, if we roll a fair die many times, the histogram of sample values should look relatively flat—
probability close to 1/6 for each value 1, 2, 3, 4, 5, and 6.)
However, the central limit theorem says that the histogram of sample averages (from many large samples)
will look more and more normal.
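The die example can be checked with a small simulation. This is a sketch: the sample sizes (10,000 rolls, and 10,000 samples of 50 rolls each) are arbitrary choices for illustration, not values from the exercise.

```r
set.seed(1)

# One large sample of 10,000 die rolls: the sample values themselves.
rolls <- sample(1:6, size = 10000, replace = TRUE)

# 10,000 samples of 50 rolls each, keeping only each sample's average.
avgs <- replicate(10000, mean(sample(1:6, size = 50, replace = TRUE)))

# Proportions of each face stay near 1/6: flat, not normal.
round(table(rolls) / length(rolls), 3)

# Side by side: flat bar chart of values, mound-shaped histogram of averages.
par(mfrow = c(1, 2))
barplot(table(rolls), main = "Sample values")
hist(avgs, breaks = 30, main = "Sample averages", xlab = "x-bar")
```

The left panel should stay roughly flat however many rolls are taken, while the right panel should form a mound centered near 3.5.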
6: Workers’ Compensation
Baldi and Moore exercise 13.10a (both editions)
An insurance company knows that, in the entire population of millions of insured workers, the mean annual
cost of workers’ compensation claims is µ = $650 per insured worker, and the standard deviation is σ =
$60,000. The distribution of losses is strongly right-skewed: Most policies have no loss, but a few have large
losses, up to millions of dollars.
If the company sells 90,000 policies, what is the shape, mean, and standard deviation of the sampling
distribution of the mean claim loss? Consider these 90,000 policies a random sample of all workers’ compensation
insurance policies.
6 answer
The central limit theorem says that, in spite of the skewness of the population distribution, the sampling
distribution of the average claim among 90,000 policies will be approximately Normal.
We also know that it has mean equal to the population mean ($650) and standard deviation equal to:
σ/√n = 60000/√90000 = 60000/300 = 200
This is the distribution N($650, $200).
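The standard deviation above can be reproduced in a few lines. The sketch also applies the 95 part of the 68-95-99.7 rule to show what the result implies; that interval is an added illustration, not part of the original answer.

```r
mu <- 650       # mean annual claim cost per insured worker ($)
sigma <- 60000  # standard deviation of claim cost ($)
n <- 90000      # number of policies sold

se <- sigma / sqrt(n)  # standard deviation of the mean claim loss
se

# Under the normal approximation, about 95% of sample means fall
# within two standard errors of mu.
c(lower = mu - 2 * se, upper = mu + 2 * se)
```

Even with individual losses varying by tens of thousands of dollars, the average over 90,000 policies varies by only a few hundred dollars.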