sampling_dist_avg_key

pdf

School

University of California, Berkeley *

Course

156

Subject

Statistics

Date

Feb 20, 2024

Type

pdf

Uploaded by BailiffAardvarkMaster964

PHW142 Practice Problems Using The Normal Curve For Sums And Averages KEY

1: Blood Cholesterol
based on Baldi and Moore exercise 13.5 (both editions)

A government sample survey plans to measure the blood cholesterol levels of a simple random sample (SRS) of men aged 20 to 34 years. The researchers will report the mean x̄ from their sample as an estimate of the mean cholesterol level µ in this population.

1a Explain to someone who knows no statistics what it means to say that x̄ is an “unbiased” estimator of µ.

1a answer
In the long run, the average of the sample means x̄ from many samples is equal to µ. As Baldi and Moore say, “x̄ is not systematically higher or lower than µ.”

1b The sample result x̄ is an unbiased estimator of the true population mean µ no matter what size SRS the study uses. Explain to someone who knows no statistics why a large sample gives more trustworthy results than a small sample. [Think about the concentration of the sample averages x̄ around the mean µ.]

1b answer
The sampling distribution of x̄ is much more concentrated around µ for large samples. We’ll use the term “precise” to describe such estimates. As Baldi and Moore say, more informally, “With large samples, x̄ is more likely to be close to µ.”
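The two answers to problem 1 can be illustrated with a small simulation. This is only a sketch under stated assumptions: problem 1 gives no population values, so the normal population with µ = 188 and σ = 41 is borrowed from problem 2, and the seed, the sample sizes (10 and 100), and the number of repetitions are arbitrary choices.

```r
# Sketch: unbiasedness and precision of the sample mean.
# Assumed (not given in problem 1): normal population, mu = 188, sigma = 41.
set.seed(142)            # arbitrary seed so the run is reproducible
mu    <- 188
sigma <- 41
reps  <- 5000            # number of simulated samples per sample size

# Sample means from many small samples (n = 10) and many large samples (n = 100)
xbar_small <- replicate(reps, mean(rnorm(10,  mean = mu, sd = sigma)))
xbar_large <- replicate(reps, mean(rnorm(100, mean = mu, sd = sigma)))

# Unbiased: both long-run averages of x-bar sit close to mu
mean(xbar_small)
mean(xbar_large)

# Precise: the means from large samples are far less spread out
sd(xbar_small)           # theory: 41 / sqrt(10)
sd(xbar_large)           # theory: 41 / sqrt(100)
```

Both sets of sample means center on µ, which is the sense of “unbiased” in 1a; the smaller spread for n = 100 is the sense of “precise” in 1b.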

2: More On Blood Cholesterol
based on Baldi and Moore exercise 13.7 (both editions)

Suppose that in fact the blood cholesterol levels of all men aged 20 to 34 years follow the normal distribution with mean µ = 188 milligrams per deciliter (mg/dl) and standard deviation σ = 41 mg/dl.

2a What is the probability that an individual selected at random from this population has a blood cholesterol level between 185 and 191 mg/dl?

2a answer
pnorm(191, mean = 188, sd = 41, lower.tail = TRUE) - pnorm(185, mean = 188, sd = 41, lower.tail = TRUE)
## [1] 0.05832974

2b Choose a simple random sample (SRS) of 100 men from this population. What are the mean and standard deviation of the sampling distribution of x̄? What is its shape, and how do you know the shape?

2b answer
Mean: the mean of the sampling distribution of x̄ is the mean of the population, 188 mg/dl.
Standard deviation: the standard deviation of x̄ is σ/√n = 41/√100 = 4.1 mg/dl.
Shape: normal curve, because the blood cholesterol levels in the population themselves follow a normal curve.

2c For our SRS of 100 men, what is the probability that x̄ takes a value between 185 and 191 mg/dl? (Another way to describe this probability is that x̄ estimates µ within ± 3 mg/dl.)

2c answer
pnorm(191, mean = 188, sd = 4.1, lower.tail = TRUE) - pnorm(185, mean = 188, sd = 4.1, lower.tail = TRUE)
## [1] 0.5356528
The probability that an SRS of size 100 from this population has an average blood cholesterol level between 185 and 191 mg/dl is about 0.5357.
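The 2b and 2c answers can be reproduced without hard-coding 4.1 by computing the standard deviation of x̄ from σ and n. The sketch below uses only the numbers given in the problem; the variable names are illustrative.

```r
# Sketch: standard deviation of x-bar and the 2c probability, from sigma and n.
mu    <- 188
sigma <- 41
n     <- 100

se <- sigma / sqrt(n)    # standard deviation of x-bar: 41 / 10 = 4.1

# P(185 < x-bar < 191), the same calculation as the 2c answer
p <- pnorm(191, mean = mu, sd = se) - pnorm(185, mean = mu, sd = se)

# By symmetry around mu, the same probability via a z-score of 3 / se
p_sym <- 2 * pnorm(3 / se) - 1

p        # agrees with the key's 0.5356528
p_sym
```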

2d Now choose an SRS of 1000 men from this population. What is the probability that x̄ falls within ± 3 mg/dl of µ?

2d answer
The standard deviation of x̄ is now 41/√1000 ≈ 1.297 mg/dl.
pnorm(191, mean = 188, sd = 1.297, lower.tail = TRUE) - pnorm(185, mean = 188, sd = 1.297, lower.tail = TRUE)
## [1] 0.979279

2e Explain why the answers to questions (a), (c), and (d) are different.

2e answer
Part a asks about the distribution of the blood cholesterol levels of the individuals in the population. These values have a standard deviation of 41 mg/dl, so very few of them are in the narrow range 185 to 191 mg/dl, even though this interval is centered on the mean 188 mg/dl.

Parts c and d are about the distribution of the averages of samples drawn from this population. In part c, the standard deviation of the averages of simple random samples of size 100 drawn from this population is 4.1 mg/dl. The spread of the averages is much narrower than the spread of the individuals, so the proportion in this range is high. In part d, the larger sample is much more likely to give a precise estimate of µ, and therefore the proportion of x̄ values in this narrow range is even higher.
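The comparison in 2e can be seen in one calculation by treating an individual as a sample of size 1. The sketch below assumes a hypothetical helper name, `prob_within`, and uses the unrounded standard deviation 41/√n, so the value for n = 1000 differs from the key's 0.979279 (which uses the rounded 1.297) only in the later decimal places.

```r
# Sketch: P(x-bar within +/- 3 mg/dl of mu) as a function of sample size n.
# prob_within is a hypothetical helper, not part of the original key.
prob_within <- function(n, mu = 188, sigma = 41, half_width = 3) {
  se <- sigma / sqrt(n)  # standard deviation of x-bar for samples of size n
  pnorm(mu + half_width, mean = mu, sd = se) -
    pnorm(mu - half_width, mean = mu, sd = se)
}

# n = 1 is a single individual (part a); n = 100 is part c; n = 1000 is part d
probs <- sapply(c(1, 100, 1000), prob_within)
probs
```

The probabilities rise with n because the interval stays fixed while the sampling distribution of x̄ narrows around µ.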


3: Gypsy Moths
Baldi and Moore exercise 13.9 (3rd edition), not in 4th edition

The gypsy moth is a serious threat to oak and aspen trees. A state agriculture department places traps throughout the state to detect the moths. When traps are checked periodically, the mean number of moths trapped is only 0.5, but some traps have several moths. The distribution of moth counts is discrete and strongly skewed, with a standard deviation of 0.7.

3a Why do Baldi and Moore say that “the distribution of moth counts is discrete”?

3a answer
Because counts of moths in traps are integers.

3b Using the 68-95-99.7 percent empirical rule for symmetric, mound-shaped distributions, explain how we know that the distribution of moth counts is strongly skewed.

3b answer
3σ for this population would be 3 × 0.7 = 2.1. If the population were anything like a symmetric, mound-shaped distribution, the lower limit for the X values would be 0.5 - 2.1 = -1.6. Since the counts cannot be below 0, the distribution must have a long right tail to have a standard deviation of 0.7 with a mean of 0.5.

3c Assuming the mean of 0.5 moths per trap and the standard deviation of 0.7 moths per trap, what are the mean and standard deviation of x̄, the average number of moths per trap in 50 traps?

3c answer
The expected value of x̄ is the population mean, 0.5 moths per trap.
The standard deviation of x̄ is 0.7/√50 ≈ 0.099 moths per trap.

3d Explain why the shape of this sampling distribution of x̄ is approximately a normal curve.

3d answer
For large sample sizes, the CLT applies and the sampling distribution of x̄ will be approximately a normal curve. For n = 50, the sampling distribution of x̄ still shows some departure from the normal curve.
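The 3c and 3d answers can be checked by simulation, but only after choosing a concrete population, which the exercise does not specify. As a stand-in, the sketch below assumes moth counts follow a Poisson distribution with rate 0.5, which is discrete, right-skewed, and has mean 0.5 and standard deviation √0.5 ≈ 0.707 (close to, but not exactly, the stated 0.7). The seed and repetition count are arbitrary.

```r
# Sketch: sampling distribution of the mean count over 50 traps.
# ASSUMPTION: counts are Poisson(0.5); the exercise does not name a distribution.
set.seed(139)            # arbitrary seed for reproducibility
n_traps <- 50
reps    <- 10000

xbar <- replicate(reps, mean(rpois(n_traps, lambda = 0.5)))

mean(xbar)               # close to the population mean 0.5
sd(xbar)                 # close to sqrt(0.5) / sqrt(50) = 0.1

# Sample skewness of the simulated means: 0 for a perfectly symmetric curve.
# A small positive value reflects the remaining departure from normality.
skew <- mean((xbar - mean(xbar))^3) / sd(xbar)^3
skew
```

Under this assumed population the averages are centered and scaled as 3c predicts, while the mild positive skewness is consistent with the remark in 3d that n = 50 still shows some departure from the normal curve.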

4: Sleep Times Of College Students

Suppose a very large study has determined that the total daily sleep times of college students have a mean of 7 hours, with a standard deviation of 2.5 hours.

4a Explain why researchers may use the normal curve to answer questions about the behavior of averages of simple random samples of size 100 from this population.

4a answer
The sample size is 100, so the Central Limit Theorem tells us the sampling distribution of x̄ will be approximately a normal curve.

4b Suppose researchers took a random sample of size 100 from this population. What is the probability of getting a sample average between 6.5 and 7.5 hours?

4b answer
The standard deviation of x̄ is 2.5/√100 = 0.25 hours.
pnorm(7.5, mean = 7, sd = 0.25, lower.tail = TRUE) - pnorm(6.5, mean = 7, sd = 0.25, lower.tail = TRUE)
## [1] 0.9544997

4c Suppose researchers took a random sample of size 100 from this population and found that the sample average was 7.7 hours. Explain, using probability, why this is a rare event.

4c answer
pnorm(7.7, mean = 7, sd = 0.25, lower.tail = FALSE)
## [1] 0.00255513
If the mean is 7 hours, fewer than 3 in 1,000 samples of size 100 would give an x̄ value of 7.7 hours or larger.
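The 4b and 4c answers can also be read in standard-deviation units, which makes the "rare event" argument more concrete. The sketch below uses only the numbers from the problem; the variable names are illustrative.

```r
# Sketch: z-scores for the sleep-time sample average.
mu    <- 7
sigma <- 2.5
n     <- 100

se <- sigma / sqrt(n)            # standard deviation of x-bar: 0.25 hours

# 4b: 6.5 and 7.5 are each 2 standard deviations of x-bar away from mu
z_b  <- (7.5 - mu) / se
p_4b <- 2 * pnorm(z_b) - 1       # same value as the two-pnorm difference

# 4c: an observed average of 7.7 hours expressed as a z-score
z_c  <- (7.7 - mu) / se
p_4c <- pnorm(z_c, lower.tail = FALSE)

z_c                              # 2.8 standard deviations above the mean
p_4c                             # agrees with the key's 0.00255513
```

An average 2.8 standard deviations above µ lies well outside the ±2 range that contains about 95% of sample averages, which is why the upper-tail probability is so small.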

5: Central Limit Theorem
Baldi and Moore exercise 13.8 (both editions)

Asked what the central limit theorem says, a student replies, “As you take larger and larger samples from a population, the histogram of the sample values looks more and more Normal.” Is the student right? Explain your answer.

5 answer
No. The histogram of the sample values will look like the population distribution, whatever it might happen to be. (For example, if we roll a fair die many times, the histogram of sample values should look relatively flat, with probability close to 1/6 for each value 1, 2, 3, 4, 5, and 6.) However, the central limit theorem says that the histogram of sample averages (from many large samples) will look more and more normal.

6: Worker’s Compensation
Baldi and Moore exercise 13.10a (both editions)

An insurance company knows that, in the entire population of millions of insured workers, the mean annual cost of workers’ compensation claims is µ = $650 per insured worker, and the standard deviation is σ = $60,000. The distribution of losses is strongly right-skewed: most policies have no loss, but a few have large losses, up to millions of dollars.

If the company sells 90,000 policies, what is the shape, mean, and standard deviation of the sampling distribution of the mean claim loss? Consider these 90,000 policies a random sample of all workers’ compensation insurance policies.

6 answer
The central limit theorem says that, in spite of the skewness of the population distribution, the average claim among 90,000 policies will be approximately Normal. We also know that it has mean equal to the population mean ($650) and standard deviation equal to:

σ/√n = 60000/√90000 = 60000/300 = 200

This is the distribution N($650, $200).
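The die example from the answer to problem 5 and the arithmetic in the answer to problem 6 can both be sketched in a few lines. The number of rolls, the sample size of 30 for the averages, the repetition count, and the seed are arbitrary choices made for illustration.

```r
# Sketch for problem 5: sample values versus sample averages for a fair die.
set.seed(13)                     # arbitrary seed for reproducibility

# Many individual rolls: the proportions stay flat, near 1/6 each
rolls      <- sample(1:6, size = 60000, replace = TRUE)
roll_props <- as.numeric(table(rolls)) / length(rolls)
roll_props

# Averages of many samples of 30 rolls: these pile up around 3.5
avg_of_30 <- replicate(5000, mean(sample(1:6, size = 30, replace = TRUE)))
mean(avg_of_30)                  # theory: 3.5
sd(avg_of_30)                    # theory: sqrt(35 / 12) / sqrt(30)
# hist(rolls); hist(avg_of_30)   # uncomment to compare the two shapes

# Sketch for problem 6: standard deviation of the mean claim
sigma    <- 60000
n        <- 90000
se_claim <- sigma / sqrt(n)      # 60000 / 300 = 200
se_claim
```

The histogram of individual rolls mirrors the flat population distribution no matter how many rolls are taken; it is the histogram of averages that takes on the mound shape described by the central limit theorem.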
