MNET 315 Ch 11 Text Nonparametric Tests (Missing from book)
School: New Jersey Institute of Technology
Course: 315
Subject: Statistics
Date: Jan 9, 2024
Pages: 53
Uploaded by LieutenantKookabura2603
CHAPTER 11
Nonparametric Tests
11.1 The Sign Test
11.2 The Wilcoxon Tests
Case Study
11.3 The Kruskal-Wallis Test
11.4 Rank Correlation
11.5 The Runs Test
Uses and Abuses
Real Statistics—Real Decisions
Technology
In a recent year, the most common form of reported identity theft was employment- or tax-related fraud, which accounted for 34% of cases. The second most common form was credit card fraud, which accounted for 33% of cases.
Where You’re Going
In this chapter, you will study additional statistical tests that do not require the population distribution to meet any specific conditions. Each of these tests has usefulness in real-life applications.
With the data above, the number of fraud complaints F and the number of identity theft victims V can be related by the regression equation V = 0.145F + 429.103. The correlation coefficient is approximately 0.915, so there is a strong positive correlation. You can determine that the correlation is significant by using Table 11 in Appendix B. Further analysis of the data, however, can show that the variables do not appear to have a bivariate normal distribution, which is one of the requirements for using the Pearson correlation coefficient.
So, although a simple correlation test might indicate a relationship between the number of fraud complaints and the number of identity theft victims, one might question the results because the data do not fit the requirements for the test. Similar tests you will study in this chapter, such as Spearman’s rank correlation test, will give you additional information. The Spearman’s rank correlation coefficient for this data is approximately 0.962. At α = 0.01, there is in fact a significant correlation between the number of fraud complaints and the number of identity theft victims for each state.
[Figure: scatter plot titled "Number of Fraud Complaints and Identity Theft Victims for 25 States"; x-axis: fraud complaints (20,000 to 120,000); y-axis: identity theft victims (5,000 to 25,000)]
Where You’ve Been
Up to this point in the text, you have studied dozens of different statistical formulas and tests that can help you in a decision-making process. Specific conditions had to be satisfied in order to use these formulas and tests.
Suppose it is believed that as the number of fraud complaints in a state increases, the number of identity theft victims also increases. Can this belief be supported by actual data? The table below shows the numbers of fraud complaints and the numbers of identity theft victims for 25 randomly selected states in a recent year. (Source: Federal Trade Commission)
Fraud complaints         39,344  45,528  33,745  21,117  7593  117,189  5768  7800  14,635
Identity theft victims   4007    8748    6203    4933    1484  12,787   789   1348  2532

Fraud complaints         5642  48,594  107,557  4600  25,636  7525  112,006  77,213
Identity theft victims   1170  8251    17,430   711   3993    1352  20,205   11,009

Fraud complaints         20,350  22,385  7206  2775  51,036  12,750  40,423  9948
Identity theft victims   3337    4312    1216  503   5718    2540    8310    1093
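The Spearman rank correlation coefficient quoted in the chapter opener (approximately 0.962) can be checked against the table above. The sketch below is only an illustration: it assumes the standard formula for data without ties, r_s = 1 - 6Σd²/(n(n² - 1)), which is not stated in this part of the text, and the function names `ranks` and `spearman_no_ties` are hypothetical.

```python
def ranks(values):
    """Rank values from smallest (1) to largest. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)); no-ties formula."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Data transcribed from the table of 25 states (paired by column).
fraud = [39344, 45528, 33745, 21117, 7593, 117189, 5768, 7800, 14635,
         5642, 48594, 107557, 4600, 25636, 7525, 112006, 77213,
         20350, 22385, 7206, 2775, 51036, 12750, 40423, 9948]
victims = [4007, 8748, 6203, 4933, 1484, 12787, 789, 1348, 2532,
           1170, 8251, 17430, 711, 3993, 1352, 20205, 11009,
           3337, 4312, 1216, 503, 5718, 2540, 8310, 1093]

r_s = spearman_no_ties(fraud, victims)
print(round(r_s, 3))  # 0.962
```

Because neither column contains repeated values, the no-ties shortcut applies here; data with ties would need average ranks and a different computation.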
11.1 The Sign Test
What You Should Learn
How to use the sign test to test a population median
How to use the paired-sample sign test to test the difference between two population medians (dependent samples)
The Sign Test for a Population Median
The Paired-Sample Sign Test
The Sign Test for a Population Median
Many of the hypothesis tests studied so far have imposed one or more requirements for a population distribution. For instance, some tests require that a population must have a normal distribution, and other tests require that population variances be equal. What should you do when such requirements cannot be met? For these cases, statisticians have developed hypothesis tests that are “distribution free.” Such tests are called nonparametric tests.
DEFINITION
A nonparametric test is a hypothesis test that does not require any specific conditions concerning the shapes of population distributions or the values of population parameters.
Nonparametric tests are usually easier to perform than corresponding parametric tests. They are, however, usually less efficient than parametric tests. Stronger evidence is required to reject a null hypothesis using the results of a nonparametric test. Consequently, whenever possible, you should use a parametric test. One of the easiest nonparametric tests to perform is the sign test.
The only condition necessary to use a sign test is that the sample is randomly selected.
DEFINITION
The sign test is a nonparametric test that can be used to test a population median against a hypothesized value k.
The sign test for a population median can be left-tailed, right-tailed, or two-tailed. The null and alternative hypotheses for each type of test are shown below.
Left-tailed test: $H_0$: median $\ge k$ and $H_a$: median $< k$
Right-tailed test: $H_0$: median $\le k$ and $H_a$: median $> k$
Two-tailed test: $H_0$: median $= k$ and $H_a$: median $\ne k$
To use the sign test, first compare each entry in the sample with the hypothesized median k. When the entry is below the median, assign it a - sign; when the entry is above the median, assign it a + sign; and when the entry is equal to the median, assign it a 0. Then compare the number of + and - signs. (The 0’s are ignored.) When there is a large difference between the number of + signs and the number of - signs, it is likely that the median is different from the hypothesized value and you should reject the null hypothesis.
Study Tip
For many nonparametric tests, statisticians test the median instead of the mean.
Table 8 in Appendix B lists the critical values for the sign test for selected levels of significance and sample sizes. When the sign test is used, the sample size n is the total number of + and - signs. When the sample size is greater than 25, you can use the standard normal distribution to find the critical values.
Test Statistic for the Sign Test
When $n \le 25$, the test statistic for the sign test is $x$, the smaller number of + or - signs.
When $n > 25$, the test statistic for the sign test is
$$z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}$$
where $x$ is the smaller number of + or - signs and $n$ is the sample size, i.e., the total number of + and - signs.
Because x is defined to be the smaller number of + or - signs, the rejection region is always in the left tail. Consequently, the sign test for a population median is always a left-tailed test or a two-tailed test. When the test is two-tailed, use only the left-tailed critical value. (When x is defined to be the larger number of + or - signs, the rejection region is always in the right tail. Right-tailed sign tests are presented in the exercises.)
GUIDELINES
Performing a Sign Test for a Population Median
1. In words: Verify that the sample is random.
2. In words: Identify the claim. State the null and alternative hypotheses. In symbols: State $H_0$ and $H_a$.
3. In words: Specify the level of significance. In symbols: Identify $\alpha$.
4. In words: Determine the sample size $n$ by assigning + signs, - signs, and 0’s to the sample data. In symbols: $n$ = total number of + and - signs.
5. In words: Determine the critical value. In symbols: When $n \le 25$, use Table 8 in Appendix B. When $n > 25$, use Table 4 in Appendix B.
6. In words: Find the test statistic. In symbols: When $n \le 25$, use $x$ = smaller number of + or - signs. When $n > 25$, use
$$z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}.$$
7. In words: Make a decision to reject or fail to reject the null hypothesis. In symbols: If the test statistic is less than or equal to the critical value, then reject $H_0$. Otherwise, fail to reject $H_0$.
8. In words: Interpret the decision in the context of the original claim.
Study Tip
Because the 0’s are ignored, there are two possible outcomes when comparing a data entry with a hypothesized median: a + or a - sign. If the median is $k$, then about half of the values will be above $k$ and half will be below. As such, the probability for each sign is 0.5. Table 8 in Appendix B is constructed using the binomial distribution where $p = 0.5$.
When $n > 25$, you can use the normal approximation (with a continuity correction) for the binomial. In this case, use $\mu = np = 0.5n$ and $\sigma = \sqrt{npq} = \sqrt{n}/2$.
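To see how a binomial distribution with p = 0.5 can produce sign-test critical values, here is a minimal sketch. It assumes that a one-tailed critical value is the largest x whose cumulative probability P(X ≤ x) does not exceed α; the exact convention used to build Table 8 (especially for two-tailed tests) is not stated in the text, and the function name `sign_test_critical_value` is hypothetical.

```python
from math import comb


def sign_test_critical_value(n, alpha):
    """Largest x with P(X <= x) <= alpha for X ~ Binomial(n, 0.5).

    Returns None when even x = 0 has probability greater than alpha.
    """
    cumulative = 0.0
    critical = None
    for x in range(n + 1):
        cumulative += comb(n, x) / 2 ** n  # P(X = x) when p = 0.5
        if cumulative > alpha:
            break
        critical = x
    return critical


print(sign_test_critical_value(19, 0.05))   # 5
print(sign_test_critical_value(10, 0.025))  # 1
```

Under this assumption, the two printed values agree with the one-tailed critical values that the worked examples in this section read from Table 8 for n = 19 and n = 10.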
Using the Sign Test
A website administrator for a company claims that the median number of visitors per day to the company’s website is no more than 1500. An employee doubts the accuracy of this claim. The numbers of visitors per day for 20 randomly selected days are listed below. At α = 0.05, can the employee reject the administrator’s claim?
1469 1462 1634 1602 1500 1463 1476 1570 1544 1452 1487 1523 1525 1548 1511 1579 1620 1568 1492 1649
SOLUTION
The claim is “the median number of visitors per day to the company’s website is no more than 1500.” So, the null and alternative hypotheses are
$H_0$: median $\le 1500$ (Claim) and $H_a$: median $> 1500$.
To compare each data entry with the hypothesized median 1500, subtract 1500 from each data entry and assign the appropriate sign or 0. For instance, here are the comparisons for the first row of data entries.
1469 - 1500 = -31, assign a - sign
1462 - 1500 = -38, assign a - sign
1634 - 1500 = +134, assign a + sign
1602 - 1500 = +102, assign a + sign
1500 - 1500 = 0, assign a 0
The results of comparing each data entry with the hypothesized median 1500 are shown.
-  -  +  +  0
-  -  +  +  -
-  +  +  +  +
+  +  +  -  +
You can see that there are 7 - signs and 12 + signs. So, n = 12 + 7 = 19. Because n ≤ 25, use Table 8 in Appendix B to find the critical value. The test is a one-tailed test with α = 0.05 and n = 19. So, the critical value is 5. Because n ≤ 25, the test statistic x is the smaller number of + or - signs. So, x = 7. Because x = 7 is greater than the critical value, the employee should fail to reject the null hypothesis.
Interpretation
There is not enough evidence at the 5% level of significance for the employee to reject the website administrator’s claim that the median number of visitors per day to the company’s website is no more than 1500.
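The counting in this example can be reproduced with a few lines of code. The following is a sketch, not part of the text: the variable names are illustrative, and the critical value 5 is copied from the solution above rather than computed.

```python
# Visitors per day for the 20 sampled days, as listed in the example.
visitors = [1469, 1462, 1634, 1602, 1500, 1463, 1476, 1570, 1544, 1452,
            1487, 1523, 1525, 1548, 1511, 1579, 1620, 1568, 1492, 1649]
hypothesized_median = 1500

plus = sum(1 for v in visitors if v > hypothesized_median)   # entries above 1500
minus = sum(1 for v in visitors if v < hypothesized_median)  # entries below 1500
n = plus + minus        # entries equal to 1500 are ignored
x = min(plus, minus)    # test statistic when n <= 25

critical_value = 5      # taken from the text (one-tailed, alpha = 0.05, n = 19)
reject = x <= critical_value

print(plus, minus, n, x, reject)  # 12 7 19 7 False
```

The output matches the solution: 12 + signs, 7 - signs, n = 19, x = 7, and the null hypothesis is not rejected.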
TRY IT YOURSELF 1
A real estate agency claims that the median number of days a home is on the market in its city is greater than 120. A homeowner wants to verify the accuracy of this claim. The numbers of days on the market for 24 randomly selected homes are shown below. At α = 0.025, can the homeowner support the agency’s claim?
118 167 72 79 76 106 102 113 73 119 162 114 120 93 135 147 77 157 115 88 152 70 65 91
Answer: Page T1
EXAMPLE 1
Using the Sign Test
An organization claims that the median annual attendance for museums in the United States is at least 39,000. A random sample of 125 museums reveals that the annual attendances for 79 museums were less than 39,000, the annual attendances for 42 museums were more than 39,000, and the annual attendances for 4 museums were 39,000. At α = 0.01, is there enough evidence to reject the organization’s claim? (Adapted from American Association of Museums)
SOLUTION
The claim is “the median annual attendance for museums in the United States is at least 39,000.” So, the null and alternative hypotheses are
$H_0$: median $\ge 39{,}000$ (Claim) and $H_a$: median $< 39{,}000$.
Because $n > 25$, use Table 4 in Appendix B, the Standard Normal Table, to find the critical value. Because the test is a left-tailed test with $\alpha = 0.01$, the critical value is $z_0 = -2.33$. Of the 125 museums, there are 79 - signs and 42 + signs. When the 0’s are ignored, the sample size is $n = 79 + 42 = 121$, and $x = 42$.
With these values, the test statistic is
$$z = \frac{(42 + 0.5) - 0.5(121)}{\sqrt{121}/2} = \frac{-18}{5.5} \approx -3.27.$$
The figure shows the location of the rejection region and the test statistic z. Because z is less than the critical value, it is in the rejection region. So, you reject the null hypothesis.
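[Figure: standard normal curve with the rejection region (α = 0.01) to the left of the critical value z_0 = −2.33 and the test statistic z ≈ −3.27 marked inside it]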
Interpretation
There is enough evidence at the 1% level of significance to reject the organization’s claim that the median annual attendance for museums in the United States is at least 39,000.
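For the large-sample case, the arithmetic in this example can be sketched in code. This is an illustration under stated assumptions: the helper name `sign_test_z` is hypothetical, and the critical value is computed with Python's standard-library `statistics.NormalDist` instead of being read from Table 4.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_z(x, n):
    """Normal approximation with continuity correction: ((x + 0.5) - 0.5n) / (sqrt(n) / 2)."""
    return ((x + 0.5) - 0.5 * n) / (sqrt(n) / 2)


minus, plus = 79, 42          # counts from the museum example; the 4 zeros are ignored
n = minus + plus              # 121
x = min(minus, plus)          # 42
z = sign_test_z(x, n)
z0 = NormalDist().inv_cdf(0.01)   # left-tailed critical value for alpha = 0.01

print(round(z, 2), round(z0, 2))  # -3.27 -2.33
print(z <= z0)                    # True, so H0 is rejected
```

Both rounded values agree with the solution above, and the comparison reproduces the decision to reject the null hypothesis.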
TRY IT YOURSELF 2
An organization claims that the median age of museum workers in the United States is 46 years old. A random sample of 95 museum workers reveals that 57 museum workers were less than 46 years old, 34 museum workers were more than 46 years old, and 4 museum workers were 46 years old. At α = 0.10, can you reject the organization’s claim? (Adapted from American Association of Museums)
Answer: Page T1
EXAMPLE 2
Picturing the World
For recent college graduates in the United States, a financial analyst claims that the median auto loan is $21,883. A random sample of recent college graduates reveals that the loans for 42 graduates were less than $21,883 and the loans for 35 graduates were greater than $21,883. (Adapted from lendedu.com)
Would you use a parametric test or a nonparametric test to test the claim that for recent college graduates in the United States, the median auto loan is $21,883? Explain your reasoning.
Study Tip
When performing a two-tailed sign test, remember to use only the left-tailed critical value.
The Paired-Sample Sign Test
In Section 8.3, you learned how to use a t-test for the difference between means of dependent samples. That test required both populations to be normally distributed. When the parametric condition of normality cannot be satisfied, you can use the paired-sample sign test to test the difference between two population medians. To perform the paired-sample sign test for the difference between two population medians, these conditions must be met.
1. A sample must be randomly selected from each population.
2. The samples must be dependent (paired).
The paired-sample sign test can be left-tailed, right-tailed, or two-tailed. This test is similar to the sign test for a single population median. However, instead of comparing each data entry with a hypothesized median and recording a +, -, or 0, you find the difference between corresponding data entries and record the sign of the difference. Generally, to find the difference, subtract the entry representing the second variable from the entry representing the first variable. Then compare the number of + and - signs. (The 0’s are ignored.) When the number of + signs is approximately equal to the number of - signs, you should fail to reject the null hypothesis. When there is a large difference between the number of + signs and the number of - signs, you should reject the null hypothesis.
GUIDELINES
Performing a Paired-Sample Sign Test
1. In words: Verify that the samples are random and dependent.
2. In words: Identify the claim. State the null and alternative hypotheses. In symbols: State $H_0$ and $H_a$.
3. In words: Specify the level of significance. In symbols: Identify $\alpha$.
4. In words: Determine the sample size $n$ by finding the difference for each data pair. Assign a + sign for a positive difference, a - sign for a negative difference, and a 0 for no difference. In symbols: $n$ = total number of + and - signs.
5. In words: Determine the critical value. In symbols: Use Table 8 in Appendix B.
6. In words: Find the test statistic. In symbols: $x$ = smaller number of + or - signs.
7. In words: Make a decision to reject or fail to reject the null hypothesis. In symbols: If the test statistic is less than or equal to the critical value, then reject $H_0$. Otherwise, fail to reject $H_0$.
8. In words: Interpret the decision in the context of the original claim.
Using the Paired-Sample Sign Test
A psychologist claims that the number of repeat offenders will decrease when first-time offenders complete a particular rehabilitation course. You randomly select 10 prisons and record the number of repeat offenders during a two-year period. Then, after first-time offenders complete the course, you record the number of repeat offenders at each prison for another two-year period. The results are shown in the table below. At α = 0.025, can you support the psychologist’s claim?
Prison   1   2   3   4   5   6   7   8   9   10
Before   21  34  9   45  30  54  37  36  33  40
After    19  22  16  31  21  30  22  18  17  21
SOLUTION
To support the psychologist’s claim, use the null and alternative hypotheses below.
$H_0$: The number of repeat offenders will not decrease.
$H_a$: The number of repeat offenders will decrease. (Claim)
The table below shows the sign of the differences between the “before” and “after” data.
Prison   1   2   3   4   5   6   7   8   9   10
Before   21  34  9   45  30  54  37  36  33  40
After    19  22  16  31  21  30  22  18  17  21
Sign     +   +   -   +   +   +   +   +   +   +
You can see that there is 1 - sign and there are 9 + signs. So, n = 1 + 9 = 10. Because the test is a one-tailed test with α = 0.025 and n = 10, the critical value is 1. The test statistic x is the smaller number of + or - signs. So, x = 1. Because x is equal to the critical value, you reject the null hypothesis.
Interpretation
There is enough evidence at the 2.5% level of significance to support the psychologist’s claim that the number of repeat offenders will decrease.
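The paired-sample version of the calculation can be sketched the same way. The code below is illustrative only: the variable names are not from the text, and the critical value 1 is copied from the solution above.

```python
# Repeat offenders at the 10 prisons, from the example's table.
before = [21, 34, 9, 45, 30, 54, 37, 36, 33, 40]
after = [19, 22, 16, 31, 21, 30, 22, 18, 17, 21]

# Difference = first variable minus second variable (before - after).
differences = [b - a for b, a in zip(before, after)]

plus = sum(1 for d in differences if d > 0)
minus = sum(1 for d in differences if d < 0)
n = plus + minus        # pairs with a difference of 0 would be ignored
x = min(plus, minus)    # test statistic

critical_value = 1      # taken from the text (one-tailed, alpha = 0.025, n = 10)
reject = x <= critical_value

print(plus, minus, n, x, reject)  # 9 1 10 1 True
```

The result agrees with the solution: 9 + signs, 1 - sign, x = 1, and the null hypothesis is rejected.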
TRY IT YOURSELF 3
A medical researcher claims that a new vaccine will decrease the number of colds in adults. You randomly select 14 adults and record the number of colds each has in a one-year period. After giving the vaccine to each adult, you again record the number of colds each has in a one-year period. The results are shown in the table below. At α = 0.05, can you support the researcher’s claim?
Adult            1  2  3  4  5  6  7  8  9  10  11  12  13  14
Before vaccine   3  4  2  1  3  6  4  5  2  0   2   5   3   3
After vaccine    2  1  0  1  1  3  3  2  2  2   3   4   3   2
Answer: Page T1
EXAMPLE 3
11.1 EXERCISES
For Extra Help:
MyLab Statistics
Building Basic Skills and Vocabulary
1. What is a nonparametric test? How does a nonparametric test differ from a parametric test? What are the advantages and disadvantages of using a nonparametric test?
2. When the sign test is used, what population parameter is being tested?
3. Describe the test statistic for the sign test when the sample size n is less than or equal to 25 and when n is greater than 25.
4. In your own words, explain why the hypothesis test discussed in this section is called the sign test.
5. Explain how to use the sign test to test a population median.
6. List the two conditions that must be met in order to use the paired-sample sign test.
Using and Interpreting Concepts
Performing a Sign Test
In Exercises 7–22, (a) identify the claim and state $H_0$ and $H_a$, (b) find the critical value, (c) find the test statistic, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim.
7. Credit Card Charges
A financial service accountant claims that the median credit card balance of college students is more than $300. You randomly select the credit card accounts of 12 college students and record the balance for each account. The balances (in dollars) are listed below. At α = 0.01, can you support the accountant’s claim? (Adapted from Sallie Mae)
346.71 382.59
255.03 202.17
309.80 265.88
299.41 270.38
296.54 318.46
245.92 309.47
8. Temperature
A meteorologist claims that the median daily high temperature for the month of July in Pittsburgh is 83° Fahrenheit. The high temperatures (in degrees Fahrenheit) for 15 randomly selected July days in Pittsburgh are listed below. At α = 0.01, is there enough evidence to reject the meteorologist’s claim? (Adapted from U.S. National Oceanic and Atmospheric Administration)
74 79 81 86 90 79 81 83 81 74 78 76 84 82 85
9. Sales Prices of Homes
A real estate agent claims that the median sales price of new privately owned one-family homes sold in a recent month is $253,000 or less. The sales prices (in dollars) of 10 randomly selected homes are listed below. At α = 0.05, is there enough evidence to reject the agent’s claim? (Adapted from National Association of Realtors)
262,600 300,100 269,200 249,400 183,400 253,500 325,600 223,500 241,300 271,300
10. Temperature
During a weather report, a meteorologist claims that the median daily high temperature for the month of January in San Diego is 66° Fahrenheit. The high temperatures (in degrees Fahrenheit) for 16 randomly selected January days in San Diego are listed below. At α = 0.01, can you reject the meteorologist’s claim? (Adapted from U.S. National Oceanic and Atmospheric Administration)
78 74 72 72 70 70 72 78 74 71 72 74 77 79 75 73
11. Credit Card Debt
A financial services institution claims that the median amount of credit card debt for families holding such debts is at least $2300. In a random sample of 104 families with credit card debt, the debts of 60 families were less than $2300 and the debts of 44 families were greater than $2300. At α = 0.02, can you reject the institution’s claim? (Adapted from Board of Governors of the Federal Reserve System)
12. Financial Debt
A financial services accountant claims that the median amount of financial debt for families holding such debts is less than $60,000. In a random sample of 70 families with financial debt, the debts of 24 families were less than $60,000 and the debts of 46 families were greater than $60,000. At α = 0.025, can you support the accountant’s claim? (Adapted from Board of Governors of the Federal Reserve System)
13. Social Media
A research group claims that the median age of the users of a social media website is greater than 30 years old. In a random sample of 24 users, 11 are less than 30 years old, 10 are more than 30 years old, and 3 are 30 years old. At α = 0.01, can you support the research group’s claim? (Adapted from Pew Research Center)
14. Social Networking
A research group claims that the median age of the users of a social networking website is less than 32 years old. In a random sample of 20 users, 5 are less than 32 years old, 13 are more than 32 years old, and 2 are 32 years old. At α = 0.05, can you support the research group’s claim? (Adapted from Pew Research Center)
15. Unit Size
A renters’ organization claims that the median number of rooms in renter-occupied units is four. You randomly select 120 renter-occupied units and obtain the results shown below. At α = 0.05, can you reject the organization’s claim? (Adapted from U.S. Census Bureau)
TABLE FOR EXERCISE 15
Unit size            Number of units
Fewer than 4 rooms   29
4 rooms              38
More than 4 rooms    53

TABLE FOR EXERCISE 16
Square footage   Number of units
Less than 1000   13
1000             2
More than 1000   7
16. Square Footage
A renters’ organization claims that the median square footage of renter-occupied units is 1000 square feet. You randomly select 22 renter-occupied units and obtain the results shown above. At α = 0.10, can you reject the organization’s claim? (Adapted from U.S. Census Bureau)
17. Hourly Wages
A labor organization claims that the median hourly wage of computer systems analysts is $41.93. In a random sample of 45 computer systems analysts, 18 earn less than $41.93 per hour, 25 earn more than $41.93 per hour, and 2 earn $41.93 per hour. At α = 0.01, can you reject the labor organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)
18. Hourly Wages
A labor organization claims that the median hourly wage of podiatrists is at least $60.01. In a random sample of 23 podiatrists, 17 earn less than $60.01 per hour, 5 earn more than $60.01 per hour, and 1 earns $60.01 per hour. At α = 0.05, can you reject the labor organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)
19. Lower Back Pain
A physician claims that lower back pain intensity scores will decrease after receiving acupuncture treatment. The table shows the lower back pain intensity scores for eight patients before and after receiving acupuncture for eight weeks. At α = 0.05, is there enough evidence to support the physician’s claim? (Adapted from Archives of Internal Medicine)
Patient                    1     2     3     4     5     6     7     8
Intensity score (before)   59.2  46.3  65.4  74.0  79.3  81.6  44.4  59.1
Intensity score (after)    12.4  22.5  18.6  59.3  70.1  70.2  13.2  25.9
20. Lower Back Pain
A physician claims that lower back pain intensity scores will decrease after taking anti-inflammatory drugs. The table shows the lower back pain intensity scores for 12 patients before and after taking anti-inflammatory drugs for 8 weeks. At α = 0.05, is there enough evidence to support the physician’s claim? (Adapted from Archives of Internal Medicine)
Patient                    1     2     3     4     5     6
Intensity score (before)   71.0  42.1  79.1  57.5  64.0  60.4
Intensity score (after)    60.1  23.4  86.2  62.1  44.2  49.7

Patient                    7     8     9     10    11    12
Intensity score (before)   68.3  95.2  48.1  78.6  65.4  59.9
Intensity score (after)    58.3  72.6  51.8  82.5  63.2  47.9
21. Improving SAT Scores
A tutoring agency claims that by completing a special course, students will improve their math SAT scores. In part of a study, 12 students take the math part of the SAT, complete the special course, then take the math part of the SAT again. The students’ scores are shown below. At α = 0.05, is there enough evidence to support the agency’s claim?
Student               1    2    3    4    5    6
Score on first SAT    300  450  350  430  300  470
Score on second SAT   300  520  400  410  300  480

Student               7    8    9    10   11   12
Score on first SAT    530  200  200  350  360  250
Score on second SAT   700  250  390  350  480  300
22. SAT Scores
A guidance counselor claims that students who take the SAT twice will improve their scores the second time they take the SAT. The table shows both math SAT scores for 12 students who took the SAT twice. At α = 0.01, can you support the guidance counselor’s claim?
Student               1    2    3    4    5    6
Score on first SAT    440  510  420  450  620  450
Score on second SAT   440  570  510  470  610  450

Student               7    8    9    10   11   12
Score on first SAT    350  470  320  510  630  570
Score on second SAT   370  530  290  500  640  600
23. Feeling Your Age
A research organization conducts a survey by randomly selecting adults and asking each, “How do you feel relative to your age?” The results are shown in the figure. (Adapted from Pew Research Center)
Figure data: Younger 11, Older 3, My age 9
(a) Use a sign test to test the null hypothesis that the proportion of adults who feel older is equal to the proportion of adults who feel younger. Assign a + sign to each adult who responded “older,” assign a - sign to each adult who responded “younger,” and assign a 0 to each adult who responded “my age.” Use α = 0.05.
(b) What can you conclude?
24. Contacting Parents
A research organization conducts a survey by randomly selecting adults and asking each, “How frequently do you contact your parents by phone?” The results are shown in the figure. (Adapted from Pew Research Center)
Figure data: Weekly 12, Daily 8, Other 6
(a) Use a sign test to test the null hypothesis that the proportion of adults who contact their parents by phone weekly is equal to the proportion of adults who contact their parents by phone daily. Assign a + sign to each adult who responded “weekly,” assign a - sign to each adult who responded “daily,” and assign a 0 to each adult who responded “other.” Use α = 0.05.
(b) What can you conclude?
Extending Concepts
More on Sign Tests
When you are using a sign test for $n > 25$ and the test is left-tailed, you know you can reject the null hypothesis when the test statistic
$$z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}$$
is less than or equal to the left-tailed critical value, where $x$ is the smaller number of + or - signs. For a right-tailed test, you can reject the null hypothesis when the test statistic
$$z = \frac{(x - 0.5) - 0.5n}{\sqrt{n}/2}$$
is greater than or equal to the right-tailed critical value, where $x$ is the larger number of + or - signs.
In Exercises 25–28, use a right-tailed test and (a) identify the claim and state $H_0$ and $H_a$, (b) find the critical value, (c) find the test statistic, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim.
25. Weekly Earnings
A labor organization claims that the median weekly earnings of female workers is less than or equal to $765. To test this claim, you randomly select 50 female workers and ask each to provide her weekly earnings. The table shows the results. At α = 0.01, can you reject the organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)
TABLE FOR EXERCISE 25
Weekly earnings   Number of workers
Less than $765    18
$765              3
More than $765    29

TABLE FOR EXERCISE 26
Weekly earnings   Number of workers
Less than $950    23
$950              2
More than $950    45
26. Weekly Earnings
A labor organization claims that the median weekly earnings of male workers is greater than $950. To test this claim, you randomly select 70 male workers and ask each to provide his weekly earnings. The table shows the results. At α = 0.01, can you support the organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)
27. Ages of Brides
A marriage counselor claims that the median age of brides at the time of their first marriage is less than or equal to 27 years old. In a random sample of 65 brides, 24 are less than 27 years old, 35 are more than 27 years old, and 6 are 27 years old. At α = 0.05, can you reject the counselor’s claim? (Adapted from U.S. Census Bureau)
28. Ages of Grooms
A marriage counselor claims that the median age of grooms at the time of their first marriage is greater than 28 years old. In a random sample of 56 grooms, 33 are less than 28 years old and 23 are more than 28 years old. At α = 0.05, can you support the counselor’s claim? (Adapted from U.S. Census Bureau)
11.2 The Wilcoxon Tests
What You Should Learn
How to use the Wilcoxon signed-rank test to determine whether two dependent samples are selected from populations having the same distribution
How to use the Wilcoxon rank sum test to determine whether two independent samples are selected from populations having the same distribution
The Wilcoxon Signed-Rank Test
The Wilcoxon Rank Sum Test
The Wilcoxon Signed-Rank Test
In this section, you will study the Wilcoxon signed-rank test and the Wilcoxon rank sum test. Unlike the sign test from Section 11.1, the strength of these two nonparametric tests is that each considers the magnitude, or size, of the data entries.
In Section 8.3, you used a t-test together with dependent samples to determine whether there was a difference between two populations. To use the t-test to test such a difference, you must assume (or know) that the dependent samples are randomly selected from populations having a normal distribution. But, what should you do when the normality assumption cannot be made? Instead of using the two-sample t-test, you can use the Wilcoxon signed-rank test.
DEFINITION
The Wilcoxon signed-rank test is a nonparametric test that can be used to determine whether two dependent samples were selected from populations having the same distribution.
GUIDELINES
Performing a Wilcoxon Signed-Rank Test
1. In words: Verify that the samples are random and dependent.
2. In words: Identify the claim. State the null and alternative hypotheses. In symbols: State $H_0$ and $H_a$.
3. In words: Specify the level of significance. In symbols: Identify $\alpha$.
4. In words: Determine the sample size $n$, which is the number of pairs of data for which the difference is not 0.
5. In words: Determine the critical value. In symbols: Use Table 9 in Appendix B.
6. In words: Find the test statistic $w_s$.
   a. Complete a table using these headers: Sample 1, Sample 2, Difference, Absolute value, Rank, and Signed rank. Signed rank takes on the same sign as its corresponding difference.
   b. Find the sum of the positive ranks and the sum of the negative ranks.
   c. Select the smaller absolute value of the sums.
7. In words: Make a decision to reject or fail to reject the null hypothesis. In symbols: If $w_s$ is less than or equal to the critical value, then reject $H_0$. Otherwise, fail to reject $H_0$.
8. In words: Interpret the decision in the context of the original claim.
Study Tip
Recall that the absolute value of a number is its value, disregarding its sign. A pair of vertical bars, $|\ |$, is used to denote absolute value. For example, $|3| = 3$ and $|-7| = 7$.
Performing a Wilcoxon Signed-Rank Test
A golf club manufacturer claims that golfers can lower their scores by using the manufacturer’s newly designed golf clubs. The table shows the scores of 10 golfers while using the old design and while using the new design on the same golf course. At $\alpha = 0.05$, can you support the manufacturer’s claim?
Golfer               1   2   3   4   5   6   7   8   9   10
Score (old design)   89  84  96  74  91  85  95  82  92  81
Score (new design)   83  83  92  76  91  80  87  85  90  77
SOLUTION
The claim is “golfers can lower their scores.” To test this claim, use the null and alternative hypotheses below.
$H_0$: The new design does not lower scores.
$H_a$: The new design lowers scores. (Claim)
This Wilcoxon signed-rank test is a one-tailed test with $\alpha = 0.05$, and because one data pair has a difference of 0, $n = 9$ instead of 10. From Table 9 in Appendix B, the critical value is 8. To find the test statistic $w_s$, complete a table as shown below.
Score (old design)   Score (new design)   Difference   Absolute value   Rank     Signed rank
89                   83                    6           6                8         8
84                   83                    1           1                1         1
96                   92                    4           4                5.5       5.5
74                   76                   -2           2                2.5      -2.5
91                   91                    0           0                (none)   (none)
85                   80                    5           5                7         7
95                   87                    8           8                9         9
82                   85                   -3           3                4        -4
92                   90                    2           2                2.5       2.5
81                   77                    4           4                5.5       5.5
The sum of the negative ranks is
$$-2.5 + (-4) = -6.5.$$
The sum of the positive ranks is
$$8 + 1 + 5.5 + 7 + 9 + 2.5 + 5.5 = 38.5.$$
The test statistic is the smaller absolute value of these two sums. Because $|-6.5| < |38.5|$, the test statistic is $w_s = 6.5$. Because the test statistic is less than the critical value, that is, $6.5 < 8$, you reject the null hypothesis.
Interpretation
There is enough evidence at the 5% level of significance to support the claim that golfers can lower their scores by using the newly designed clubs.
EXAMPLE 1
Study Tip
Do not assign a rank to any difference of 0. In the case of a tie between data entries, use the average of the corresponding ranks. For instance, when two data entries are tied for the fifth rank, use the average of 5 and 6, which is 5.5, as the rank for both entries. The next data entry will be assigned a rank of 7, not 6.
When three entries are tied for the fifth rank, use the average of 5, 6, and 7, which is 6, as the rank for all three data entries. The next data entry will be assigned a rank of 8.
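The ranking rules in this Study Tip and the signed-rank steps in the guidelines can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the text: the helper names `average_ranks` and `signed_rank_statistic` are made up for this example, and the data are the golf scores from Example 1. With measurements that are not whole numbers, floating-point rounding could hide ties, so such data may need rounding first.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    ordered = sorted(values)
    # A value first appears at 0-based index f and occurs c times, so it
    # occupies ranks f+1 .. f+c, whose mean is f + (c + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def signed_rank_statistic(sample1, sample2):
    """Return (n, sum of positive ranks, sum of negative ranks, w_s)."""
    # Differences of 0 are dropped and receive no rank.
    diffs = [a - b for a, b in zip(sample1, sample2) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = -sum(r for d, r in zip(diffs, ranks) if d < 0)
    w_s = min(abs(positive), abs(negative))  # smaller absolute value of the sums
    return len(diffs), positive, negative, w_s


# Golf scores from Example 1 (old design, new design).
old = [89, 84, 96, 74, 91, 85, 95, 82, 92, 81]
new = [83, 83, 92, 76, 91, 80, 87, 85, 90, 77]

n, positive, negative, w_s = signed_rank_statistic(old, new)
print(n, positive, negative, w_s)  # 9 38.5 -6.5 6.5
```

The critical value still has to be read from Table 9 in Appendix B; the sketch only reproduces the table arithmetic.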
TRY IT YOURSELF 1
A quality control inspector wants to test the claim that a spray-on water repellent is effective. To test this claim, he selects 12 pieces of fabric, sprays water on each, and measures the amount of water repelled (in milliliters). He then applies the water repellent and repeats the experiment. The table shows the results. At $\alpha = 0.01$, can he conclude that the water repellent is effective?
Fabric              1   2   3   4   5   6   7   8   9   10  11  12
No repellent        8   7   7   4   6   10  9   5   9   11  8   4
Repellent applied   15  12  11  6   6   8   8   6   12  8   14  8
Answer: Page T1
The Wilcoxon Rank Sum Test
In Sections 8.1 and 8.2, you used a z-test ($\sigma_1$ and $\sigma_2$ known) or a t-test ($\sigma_1$ and $\sigma_2$ unknown) together with independent samples to determine whether there was a difference between two populations. To use a z-test or a t-test to test such a difference, you must assume (or know) that the samples are random and independent, and either the populations are normally distributed or each sample size is at least 30. But what should you do when the normality and sample size assumptions cannot be made? You can still compare the populations using the Wilcoxon rank sum test.
DEFINITION
The Wilcoxon rank sum test is a nonparametric test that can be used to determine whether two independent samples were selected from populations having the same distribution.
A requirement for the Wilcoxon rank sum test is that the sample size of each sample must be at least 10. When calculating the test statistic for the Wilcoxon rank sum test, let $n_1$ represent the sample size of the smaller sample and $n_2$ represent the sample size of the larger sample. When the two samples have the same size, it does not matter which one is $n_1$ or $n_2$.
When calculating the sum of the ranks $R$, combine both samples and rank the combined data. Then sum the ranks for the smaller of the two samples. When the two samples have the same size, you can use the ranks from either sample, but you must use the ranks from the sample you associate with $n_1$.
Test Statistic for the Wilcoxon Rank Sum Test
For two independent samples, the test statistic $z$ for the Wilcoxon rank sum test is
$$z = \frac{R - \mu_R}{\sigma_R}$$
where $R$ is the sum of the ranks for the smaller sample,
$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2},$$
and
$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}.$$
Picturing the World
To help determine when knee arthroscopy patients can resume driving after surgery, the driving reaction times (in milliseconds) of 10 right knee arthroscopy patients were measured before surgery and 4 weeks after surgery using a computer-linked car simulator. The table shows the results. (Adapted from Knee Surgery, Sports Traumatology, Arthroscopy Journal)
Patient   Reaction time before surgery   Reaction time 4 weeks after surgery
1         720                            730
2         750                            645
3         735                            745
4         730                            640
5         755                            660
6         745                            670
7         730                            650
8         725                            730
9         770                            675
10        700                            705
At $\alpha = 0.05$, can you conclude that the reaction times changed significantly four weeks after surgery?
Study Tip
Use the Wilcoxon signed-rank test for dependent samples and the Wilcoxon rank sum test for independent samples.
GUIDELINES
Performing a Wilcoxon Rank Sum Test
1. Verify that the samples are random and independent.
2. Identify the claim. State the null and alternative hypotheses. (State $H_0$ and $H_a$.)
3. Specify the level of significance. (Identify $\alpha$.)
4. Determine the critical value(s) and the rejection region(s). (Use Table 4 in Appendix B.)
5. Determine the sample sizes. ($n_1 \le n_2$)
6. Find the sum of the ranks $R$ for the smaller sample.
   a. List the combined data in ascending order.
   b. Rank the combined data.
   c. Add the ranks for the smaller sample, the one of size $n_1$.
7. Find the test statistic and sketch the sampling distribution.
   $$z = \frac{R - \mu_R}{\sigma_R}$$
8. Make a decision to reject or fail to reject the null hypothesis. (If $z$ is in the rejection region, then reject $H_0$. Otherwise, fail to reject $H_0$.)
9. Interpret the decision in the context of the original claim.
Performing a Wilcoxon Rank Sum Test
The table shows the earnings (in thousands of dollars) of a random sample of 10 male and 12 female pharmaceutical sales representatives. At $\alpha = 0.10$, can you conclude that there is a difference between the males’ and females’ earnings?
Male earnings     78  93  114  101  98  94  86  95  117  99
Female earnings   86  77  101  93   85  98  91  87  84   97  100  90
SOLUTION
The claim is “there is a difference between the males’ and females’ earnings.” To test this claim, use the null and alternative hypotheses below.
$H_0$: There is no difference between the males’ and the females’ earnings.
$H_a$: There is a difference between the males’ and the females’ earnings. (Claim)
Because the test is a two-tailed test with $\alpha = 0.10$, the critical values are $-z_0 = -1.645$ and $z_0 = 1.645$. The rejection regions are $z < -1.645$ and $z > 1.645$.
EXAMPLE 2
The sample size for men is 10 and the sample size for women is 12. Because $10 < 12$, $n_1 = 10$ and $n_2 = 12$. Before calculating the test statistic, you must find the values of $R$, $\mu_R$, and $\sigma_R$. The table shows the combined data listed in ascending order and the corresponding ranks.
Ordered data   Sample   Rank
77             F        1
78             M        2
84             F        3
85             F        4
86             M        5.5
86             F        5.5
87             F        7
90             F        8
91             F        9
93             M        10.5
93             F        10.5
94             M        12
95             M        13
97             F        14
98             M        15.5
98             F        15.5
99             M        17
100            F        18
101            M        19.5
101            F        19.5
114            M        21
117            M        22
Because the smaller sample is the sample of males, $R$ is the sum of the male rankings.
$$R = 2 + 5.5 + 10.5 + 12 + 13 + 15.5 + 17 + 19.5 + 21 + 22 = 138$$
Using $n_1 = 10$ and $n_2 = 12$, you can find $\mu_R$ and $\sigma_R$ as follows.
$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{10(10 + 12 + 1)}{2} = \frac{230}{2} = 115$$
Study Tip
Remember that in the case of a tie between data entries, use the average of the corresponding ranks.
$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(10)(12)(10 + 12 + 1)}{12}} = \sqrt{\frac{2760}{12}} = \sqrt{230} \approx 15.17$$
When $R = 138$, $\mu_R = 115$, and $\sigma_R \approx 15.17$, the test statistic is
$$z = \frac{R - \mu_R}{\sigma_R} \approx \frac{138 - 115}{15.17} \approx 1.52.$$
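The figure shows the location of the rejection regions and the test statistic $z$. Because $z$ is not in the rejection region, you fail to reject the null hypothesis.
[Figure: standard normal curve with a rejection region of area $\frac{1}{2}\alpha = 0.05$ in each tail, critical values $-z_0 = -1.645$ and $z_0 = 1.645$, central area $1 - \alpha = 0.90$, and the test statistic $z \approx 1.52$ marked on the $z$-axis]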
Interpretation
There is not enough evidence at the 10% level of significance to conclude that there is a difference between the males’ and females’ earnings.
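As a rough check on the arithmetic in Example 2, the calculation can be sketched in Python. This is an illustrative sketch rather than the text's method: the function names `average_ranks` and `rank_sum_test` are invented here, and the critical value is computed with the standard library's `statistics.NormalDist` instead of being read from Table 4 in Appendix B.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def rank_sum_test(sample_a, sample_b):
    """Return (R, mu_R, sigma_R, z), taking the smaller sample as n1."""
    # sorted() is stable, so with equal sizes sample_a is treated as the n1 sample.
    small, large = sorted((sample_a, sample_b), key=len)
    n1, n2 = len(small), len(large)
    ranks = average_ranks(small + large)  # rank the combined data
    R = sum(ranks[:n1])                   # ranks that belong to the smaller sample
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return R, mu_R, sigma_R, (R - mu_R) / sigma_R


# Earnings (in thousands of dollars) from Example 2.
male = [78, 93, 114, 101, 98, 94, 86, 95, 117, 99]
female = [86, 77, 101, 93, 85, 98, 91, 87, 84, 97, 100, 90]

R, mu_R, sigma_R, z = rank_sum_test(male, female)

alpha = 0.10
z0 = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.645
reject = abs(z) > z0

print(R, mu_R, round(sigma_R, 2), round(z, 2), round(z0, 3), reject)
```

Under these assumptions the sketch gives $R = 138$, $\mu_R = 115$, and $z \approx 1.52$, which does not fall in either rejection region.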
TRY IT YOURSELF 2
You are investigating the automobile insurance claims paid (in thousands of dollars) by two insurance companies. The table shows a random sample of 12 claims paid by each of the two insurance companies. At $\alpha = 0.05$, can you conclude that there is a difference in the claims paid by the companies?
Company A   6.2  10.6  2.5  4.5  6.5  7.4  9.9   3.0  5.8  3.9  6.0  6.3
Company B   7.3  5.6   3.4  1.8  2.2  4.7  10.8  4.1  1.7  3.0  4.4  5.3
Answer: Page T1
11.2 EXERCISES
Building Basic Skills and Vocabulary
1. How do you know whether to use a Wilcoxon signed-rank test or a Wilcoxon rank sum test?
2. What is the requirement for the sample size of each sample when using the Wilcoxon rank sum test?
Using and Interpreting Concepts
Performing a Wilcoxon Test
In Exercises 3–8,
(a) identify the claim and state $H_0$ and $H_a$.
(b) decide whether to use a Wilcoxon signed-rank test or a Wilcoxon rank sum test.
(c) find the critical value(s).
(d) find the test statistic.
(e) decide whether to reject or fail to reject the null hypothesis.
(f) interpret the decision in the context of the original claim.
3. Calcium Supplements and Blood Pressure
In a study testing the effects of calcium supplements on blood pressure in men, 12 men were randomly chosen and given a calcium supplement for 12 weeks. The table shows the measurements for each subject’s diastolic blood pressure taken before and after the 12-week treatment period. At $\alpha = 0.01$, can you reject the claim that there was no reduction in diastolic blood pressure? (Adapted from The Journal of the American Medical Association)
Patient            1    2    3    4    5    6    7    8    9    10   11   12
Before treatment   108  109  120  129  112  111  117  135  124  118  130  115
After treatment    99   115  105  116  115  117  108  122  120  126  128  106
4. Wholesale Trade and Manufacturing
A private industry analyst claims that there is no difference in the salaries earned by workers in the wholesale trade and manufacturing industries. The table shows the salaries (in thousands of dollars) of a random sample of 10 wholesale trade workers and 10 manufacturing workers. At $\alpha = 0.10$, can you reject the analyst’s claim? (Adapted from U.S. Bureau of Economic Analysis)
Wholesale trade   70  66  65  80  62  69  73  77  74  72
Manufacturing     71  67  56  74  54  65  76  58  64  52
5. Earnings by Degree
A college administrator claims that there is a difference in the earnings of people with bachelor’s degrees and those with advanced degrees. The table shows the earnings (in thousands of dollars) of a random sample of 11 people with bachelor’s degrees and 10 people with advanced degrees. At $\alpha = 0.05$, is there enough evidence to support the administrator’s claim? (Adapted from U.S. Census Bureau)
Bachelor’s degree   62  58  71  84  78  58  52  64  68  60  62
Advanced degree     88  91  99  85  90  91  98  98  95  87
6. Headaches
A medical researcher wants to determine whether a new drug affects the number of headache hours experienced by headache sufferers. To do so, the researcher randomly selects seven patients and asks each to give the number of headache hours (per day) each experiences before and after taking the drug. The table shows the results. At $\alpha = 0.05$, can the researcher conclude that the new drug affects the number of headache hours?
Patient                   1    2    3    4    5    6    7
Headache hours (before)   0.8  2.4  2.8  2.6  2.7  0.9  1.2
Headache hours (after)    1.6  1.3  1.6  1.4  1.5  1.6  1.7
7. Teacher Salaries
A teacher’s union representative claims that there is a difference in the salaries earned by teachers in Wisconsin and Michigan. The table shows the salaries (in thousands of dollars) of a random sample of 11 teachers from Wisconsin and 12 teachers from Michigan. At $\alpha = 0.05$, is there enough evidence to support the representative’s claim? (Adapted from National Education Association)
Wisconsin   55  59  49  56  51  61  55  61  53  47  52
Michigan    61  65  55  62  57  67  61  67  59  53  58  76
8. Heart Rate
A physician wants to determine whether an experimental medication affects an individual’s heart rate. The physician randomly selects 15 patients and measures the heart rate of each. The subjects then take the medication and have their heart rates measured after one hour. The table shows the results. At $\alpha = 0.05$, can the physician conclude that the experimental medication affects an individual’s heart rate?
Patient               1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Heart rate (before)   72  81  75  76  79  74  65  67  76  83  66  75  76  78  68
Heart rate (after)    73  80  75  79  74  76  73  67  74  77  70  77  76  75  74
Extending Concepts
Wilcoxon Signed-Rank Test for $n > 30$
When you are performing a Wilcoxon signed-rank test and the sample size $n$ is greater than 30, you can use the Standard Normal Table and the formula below to find the test statistic.
$$z = \frac{w_s - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n(n + 1)(2n + 1)}{24}}}$$
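The large-sample formula translates directly into a short Python function. This is a sketch under stated assumptions: the name `signed_rank_z` is invented here, and the values of $w_s$ and $n$ in the usage lines are hypothetical numbers chosen only to exercise the formula, not answers to any exercise.

```python
from math import sqrt


def signed_rank_z(w_s, n):
    """Normal approximation to the Wilcoxon signed-rank statistic for n > 30."""
    mean = n * (n + 1) / 4                       # center of the distribution of w_s
    std = sqrt(n * (n + 1) * (2 * n + 1) / 24)   # its standard deviation
    return (w_s - mean) / std


# Hypothetical inputs: 33 nonzero differences and a rank sum of 100.
z = signed_rank_z(100, 33)
print(round(z, 2))

# When w_s equals n(n + 1)/4, the statistic is exactly 0.
print(signed_rank_z(280.5, 33))  # 0.0
```

Because $w_s$ is the smaller of the two rank sums, it can never exceed $n(n + 1)/4$, so $z$ computed this way is zero or negative.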
In Exercises 9 and 10, perform the Wilcoxon signed-rank test using the test statistic for $n > 30$.
9. Fuel Additive
A petroleum engineer wants to know whether a certain fuel additive improves a car’s gas mileage. To decide, the engineer records the gas mileages (in miles per gallon) of 33 randomly selected cars with and without the fuel additive. The table shows the results. At $\alpha = 0.10$, can the engineer conclude that the gas mileage is improved?
Car                1     2     3     4     5     6     7     8     9     10    11
Without additive   36.4  36.4  36.6  36.6  36.8  36.9  37.0  37.1  37.2  37.2  36.7
With additive      36.7  36.9  37.0  37.5  38.0  38.1  38.4  38.7  38.8  38.9  36.3

Car                12    13    14    15    16    17    18    19    20    21    22
Without additive   37.5  37.6  37.8  37.9  37.9  38.1  38.4  40.2  40.5  40.9  35.0
With additive      38.9  39.0  39.1  39.4  39.4  39.5  39.8  40.0  40.0  40.1  36.3

Car                23    24    25    26    27    28    29    30    31    32    33
Without additive   32.7  33.6  34.2  35.1  35.2  35.3  35.5  35.9  36.0  36.1  37.2
With additive      32.8  34.2  34.7  34.9  34.9  35.3  35.9  36.4  36.6  36.6  38.3
10. Fuel Additive
A petroleum engineer claims that a fuel additive improves gas mileage. The table shows the gas mileages (in miles per gallon) of 32 randomly selected cars measured with and without the fuel additive. Test the petroleum engineer’s claim at $\alpha = 0.05$.
Car                1     2     3     4     5     6     7     8     9     10    11
Without additive   34.0  34.2  34.4  34.4  34.6  34.8  35.6  35.7  30.2  31.6  32.3
With additive      36.6  36.7  37.2  37.2  37.3  37.4  37.6  37.7  34.2  34.9  34.9

Car                12    13    14    15    16    17    18    19    20    21    22
Without additive   33.0  33.1  33.7  33.7  33.8  35.7  36.1  36.1  36.6  36.6  36.8
With additive      34.9  35.7  36.0  36.2  36.5  37.8  38.1  38.2  38.3  38.3  38.7

Car                23    24    25    26    27    28    29    30    31    32
Without additive   37.1  37.1  37.2  37.9  37.9  38.0  38.0  38.4  38.8  42.1
With additive      38.8  38.9  39.1  39.1  39.2  39.4  39.8  40.3  40.8  43.2
CASE STUDY
College Ranks
Each year, Forbes and the Center for College Affordability and Productivity (CCAP) release a list of the best colleges in the United States. Over 600 colleges and universities are ranked according to factors that fall into one of five categories.
1. Postgraduate success, which is based on salary of alumni by school and the alumni who appear on CCAP’s America’s Leaders list
2. Student debt, which is based on three components: average federal student loan debt load, student loan default rates, and predicted versus actual percent of students taking federal loans
3. Student satisfaction, which is based on student retention rates and student evaluations of professors
4. Graduation rate, which is based on how many students actually finish their degrees in four years and the actual versus predicted rate
5. Academic success, which is based on students who have won competitive scholarships and fellowships, and students who have gone on to earn Ph.D.s
The table shows the student populations for randomly selected colleges by region on the 2016 list.
EXERCISES
1. Construct a side-by-side box-and-whisker plot for the four regions. Do any of the median student populations appear to be the same? Do any appear to be different?
In Exercises 2–5, use the sign test to test the claim. What can you conclude? Use $\alpha = 0.05$.
2. The median student population at a college in the Northeast is less than or equal to 7000.
3. The median student population at a college in the Midwest is greater than or equal to 8000.
4. The median student population at a college in the South is 10,000.
5. The median student population at a college in the West is different from 8000.
In Exercises 6 and 7, use the Wilcoxon rank sum test to test the claim. Use $\alpha = 0.01$.
6. There is no difference between student populations for colleges in the Midwest and colleges in the West.
7. There is a difference between student populations for colleges in the Northeast and colleges in the South.
Student populations
Northeast   Midwest   South    West
1,805       24,766    6,621    1,498
9,181       2,948     14,769   1,394
14,317      1,459     29,175   1,144
2,113       3,688     15,984   8,132
20,445      3,418     2,850    12,820
1,632       14,747    27,511   50,320
5,123       14,906    24,932   31,354
755         5,931     49,610   2,127
15,117      2,791     10,033   19,934
18,090      11,458    1,575    31,332
11.3 The Kruskal-Wallis Test
What You Should Learn
How to use the Kruskal-Wallis test to determine whether three or more samples were selected from populations having the same distribution
The Kruskal-Wallis Test
The Kruskal-Wallis Test
In Section 10.4, you learned how to use one-way ANOVA techniques to compare the means of three or more populations. When using one-way ANOVA, you should verify that each independent sample is selected from a population that is normally, or approximately normally, distributed. When you cannot verify that the populations are normal, you can still compare the distributions of three or more populations. To do so, you can use the Kruskal-Wallis test.
DEFINITION
The Kruskal-Wallis test is a nonparametric test that can be used to determine whether three or more independent samples were selected from populations having the same distribution.
For a Kruskal-Wallis test, the null and alternative hypotheses are always similar to these statements.
$H_0$: All of the populations have the same distribution.
$H_a$: At least one population has a distribution that is different from the others.
The conditions for using the Kruskal-Wallis test are that the samples must be random and independent, and the size of each sample must be at least 5. If these conditions are met, then the sampling distribution for the Kruskal-Wallis test is approximated by a chi-square distribution with $k - 1$ degrees of freedom, where $k$ is the number of samples. You can calculate the Kruskal-Wallis test statistic using the formula below.
Test Statistic for the Kruskal-Wallis Test
For three or more independent samples, the test statistic for the Kruskal-Wallis test is
$$H = \frac{12}{N(N + 1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N + 1)$$
where
$k$ is the number of samples,
$n_i$ is the size of the $i$th sample,
$N$ is the sum of the sample sizes, and
$R_i$ is the sum of the ranks of the $i$th sample.
Performing a Kruskal-Wallis test consists of combining and ranking the sample data. The data are then separated according to sample and the sum of the ranks of each sample is calculated.
These sums are then used to calculate the test statistic $H$, which is an approximation of the variance of the rank sums. When the samples are selected from populations having the same distribution, the sums of the ranks will be approximately equal, $H$ will be small, and you should fail to reject the null hypothesis.
When the samples are selected from populations not having the same distribution, the sums of the ranks will be quite different, $H$ will be large, and you should reject the null hypothesis.
Because you only reject the null hypothesis when $H$ is significantly large, the Kruskal-Wallis test is always a right-tailed test.
GUIDELINES
Performing a Kruskal-Wallis Test
1. Verify that the samples are random and independent, and each sample size is at least 5.
2. Identify the claim. State the null and alternative hypotheses. (State $H_0$ and $H_a$.)
3. Specify the level of significance. (Identify $\alpha$.)
4. Identify the degrees of freedom. (d.f. $= k - 1$)
5. Determine the critical value and the rejection region. (Use Table 6 in Appendix B.)
6. Find the sum of the ranks for each sample.
   a. List the combined data in ascending order.
   b. Rank the combined data.
7. Find the test statistic and sketch the sampling distribution.
   $$H = \frac{12}{N(N + 1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N + 1)$$
8. Make a decision to reject or fail to reject the null hypothesis. (If $H$ is in the rejection region, then reject $H_0$. Otherwise, fail to reject $H_0$.)
9. Interpret the decision in the context of the original claim.
Performing a Kruskal-Wallis Test
You want to compare the numbers of crimes reported in three police precincts in a city. To do so, you randomly select 10 weeks for each precinct and record the numbers of crimes reported. The table shows the results. At $\alpha = 0.01$, can you conclude that the distribution of the numbers of crimes reported in at least one precinct is different from the others?
Number of crimes reported for the week
101st Precinct (Sample 1)   106th Precinct (Sample 2)   113th Precinct (Sample 3)
60                          65                          69
52                          55                          51
49                          64                          70
52                          66                          61
50                          53                          67
48                          58                          65
57                          50                          62
45                          54                          59
44                          70                          60
56                          62                          63
SOLUTION
You want to test the claim that the distribution of the numbers of crimes reported in at least one precinct is different from the others. The null and alternative hypotheses are as follows.
$H_0$: The distribution of the numbers of crimes reported is the same in all three precincts.
$H_a$: The distribution of the numbers of crimes reported in at least one precinct is different from the others. (Claim)
The test is a right-tailed test with $\alpha = 0.01$ and d.f. $= k - 1 = 3 - 1 = 2$. From Table 6 in Appendix B, the critical value is $\chi^2_0 = 9.210$. The rejection region is $\chi^2 > 9.210$. Before calculating the test statistic, you must find the sum of the ranks for each sample. The table shows the combined data listed in ascending order and the corresponding ranks.
Ordered data   Sample   Rank
44             101st    1
45             101st    2
48             101st    3
49             101st    4
50             101st    5.5
50             106th    5.5
51             113th    7
52             101st    8.5
52             101st    8.5
53             106th    10
54             106th    11
55             106th    12
56             101st    13
57             101st    14
58             106th    15
59             113th    16
60             101st    17.5
60             113th    17.5
61             113th    19
62             106th    20.5
62             113th    20.5
63             113th    22
64             106th    23
65             106th    24.5
65             113th    24.5
66             106th    26
67             113th    27
69             113th    28
70             106th    29.5
70             113th    29.5
EXAMPLE 1
The sum of the ranks for each sample is as follows.
$$R_1 = 1 + 2 + 3 + 4 + 5.5 + 8.5 + 8.5 + 13 + 14 + 17.5 = 77$$
$$R_2 = 5.5 + 10 + 11 + 12 + 15 + 20.5 + 23 + 24.5 + 26 + 29.5 = 177$$
$$R_3 = 7 + 16 + 17.5 + 19 + 20.5 + 22 + 24.5 + 27 + 28 + 29.5 = 211$$
Using these sums and the values $n_1 = 10$, $n_2 = 10$, $n_3 = 10$, and $N = 30$, the test statistic is
$$H = \frac{12}{30(30 + 1)}\left(\frac{77^2}{10} + \frac{177^2}{10} + \frac{211^2}{10}\right) - 3(30 + 1) \approx 12.521.$$
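The figure shows the location of the rejection region and the test statistic $H$. Because $H$ is in the rejection region, you reject the null hypothesis.
[Figure: chi-square distribution with 2 degrees of freedom, showing the rejection region of area $\alpha = 0.01$ to the right of the critical value $\chi^2_0 = 9.210$ and the test statistic $H \approx 12.521$ marked on the $\chi^2$-axis]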
Interpretation
There is enough evidence at the 1% level of significance to support the claim that the distribution of the numbers of crimes reported in at least one precinct is different from the others.
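The rank sums and the value of $H$ in this example can be reproduced with a short Python sketch. It is offered as an illustration only: the names `average_ranks` and `kruskal_wallis_h` are invented here, and the critical value uses a shortcut that holds only for 2 degrees of freedom, where the right-tail area of the chi-square distribution is $e^{-x/2}$, so $\chi^2_0 = -2\ln\alpha$. For other degrees of freedom, use Table 6 in Appendix B.

```python
from math import log


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def kruskal_wallis_h(samples):
    """Return (list of rank sums R_i, test statistic H) for a list of samples."""
    combined = [x for sample in samples for x in sample]
    ranks = average_ranks(combined)  # rank the combined data
    N = len(combined)
    rank_sums, total, start = [], 0.0, 0
    for sample in samples:
        R_i = sum(ranks[start:start + len(sample)])  # ranks of this sample
        rank_sums.append(R_i)
        total += R_i ** 2 / len(sample)
        start += len(sample)
    H = 12 / (N * (N + 1)) * total - 3 * (N + 1)
    return rank_sums, H


# Weekly crime counts from Example 1.
precinct_101 = [60, 52, 49, 52, 50, 48, 57, 45, 44, 56]
precinct_106 = [65, 55, 64, 66, 53, 58, 50, 54, 70, 62]
precinct_113 = [69, 51, 70, 61, 67, 65, 62, 59, 60, 63]

rank_sums, H = kruskal_wallis_h([precinct_101, precinct_106, precinct_113])

alpha = 0.01
critical = -2 * log(alpha)  # valid only for d.f. = 2; about 9.210
reject = H > critical

print(rank_sums, round(H, 3), round(critical, 3), reject)
```

Under these assumptions the sketch gives rank sums of 77, 177, and 211 and $H \approx 12.521$, which lies in the rejection region.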
TRY IT YOURSELF 1
You want to compare the salaries of veterinarians who work in Texas, Florida, and California. To compare the salaries, you randomly select several veterinarians in each state and record their salaries. The table shows the salaries (in thousands of dollars). At $\alpha = 0.05$, can you conclude that the distribution of the veterinarians’ salaries in at least one state is different from the others? (Adapted from U.S. Bureau of Labor Statistics)
Sample salaries (in thousands of dollars)
TX (Sample 1)
FL (Sample 2)
CA (Sample 3)
85.3
143.3
111.3
149.9
135.9
83.4
97.9
121.6
126.8
91.0
80.4
146.1
89.6
116.6
154.0
147.7
106.7
160.2
63.3
84.7
57.6
74.8
95.0
113.2
118.7
105.3
131.0
101.1
Answer: Page T1
Picturing the World
The randomly collected data below were used to compare the water temperatures (in degrees Fahrenheit) of cities bordering the Gulf of Mexico. (Adapted from National Oceanographic Data Center)
Cedar Key, FL (Sample 1)   Eugene Island, LA (Sample 2)   Dauphin Island, AL (Sample 3)
62                         51                             63
69                         55                             51
77                         57                             54
59                         63                             60
60                         74                             75
75                         82                             80
83                         85                             70
65                         60                             78
79                         64                             82
86                         76                             84
82                         83                             86
At $\alpha = 0.05$, can you conclude that at least one temperature distribution is different from the others?
11.3 EXERCISES
Building Basic Skills and Vocabulary
1. What are the conditions for using a Kruskal-Wallis test?
2. Explain why the Kruskal-Wallis test is always a right-tailed test.
Using and Interpreting Concepts
Performing a Kruskal-Wallis Test
In Exercises 3–6, (a) identify the claim and state $H_0$ and $H_a$, (b) find the critical value and identify the rejection region, (c) find the test statistic $H$, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim.
3. Home Insurance
The table shows the annual premiums for a random sample of home insurance policies in Connecticut, Massachusetts, and Virginia. At $\alpha = 0.05$, can you conclude that the distribution of the annual premiums in at least one state is different from the others? (Adapted from National Association of Insurance Commissioners)
State           Annual premium (in dollars)
Connecticut     1303  1098  1263  1413  1538  1179  1320
Massachusetts   1382  1302  1257  1572  1387  1166  1034
Virginia        1035  950   766   845   1132  838   755
4. Hourly Rates
A researcher wants to determine whether there is a difference in the hourly pay rates for registered nurses in Indiana, Kentucky, and Ohio. The researcher randomly selects several registered nurses in each state and records the hourly pay rate for each. The table shows the results. At $\alpha = 0.05$, can the researcher conclude that the distribution of the hourly pay rates of registered nurses in at least one state is different from the others? (Adapted from U.S. Bureau of Labor Statistics)
State      Hourly pay rate (in dollars)
Indiana    28.83  29.28  27.68  28.43  31.27  26.13  30.47
Kentucky   27.77  26.40  28.92  31.02  29.37  32.42  25.42
Ohio       27.84  32.24  33.64  33.91  27.34  29.89
5. Annual Salaries
The table shows the annual salaries for a random sample of private industry workers in Kentucky, North Carolina, South Carolina, and West Virginia. At $\alpha = 0.10$, can you conclude that the distribution of the annual salaries of private industry workers in at least one state is different from the others? (Adapted from U.S. Bureau of Labor Statistics)
State            Annual salary (in thousands of dollars)
Kentucky         39.9  41.6  50.5  62.1  38.3  32.9  39.9
North Carolina   48.8  47.2  41.9  59.6  40.8  44.9  48.8
South Carolina   35.4  43.0  49.1  48.5  40.3  41.7  35.4
West Virginia    34.8  45.9  36.6  45.1  50.3  38.1  34.8
6. Caffeine Content
The table shows the amounts of caffeine (in milligrams) in 16-ounce servings for a random sample of beverages. At $\alpha = 0.01$, can you conclude that the distribution of the amounts of caffeine in at least one beverage is different from the others? (Adapted from Center for Science in the Public Interest)
Beverage        Amount of caffeine in 16-ounce serving (in milligrams)
Coffees         320  300  206  150  266
Soft drinks     95   96   56   51   71   72   47
Energy drinks   200  141  160  152  154  166
Teas            100  106  42   15   32   10
Extending Concepts
Comparing Two Tests
In Exercises 7 and 8,
(a) perform a Kruskal-Wallis test.
(b) perform a one-way ANOVA test, assuming that each population is normally distributed and the population variances are equal.
(c) compare the results.
7. Hospital Patient Stays
An insurance underwriter claims that the number of days patients spend in the hospital is different in at least one region of the United States. The table shows the numbers of days randomly selected patients spent in the hospital in four U.S. regions. At $\alpha = 0.01$, can you support the underwriter’s claim? (Adapted from U.S. National Center for Health Statistics)
Region      Number of days
Northeast   8  6  6  3  5  11  3  8  1  6
Midwest     5  4  3  9  1  4   6  3  4  7
South       5  8  1  5  8  7   5  1
West        2  3  6  6  5  4   3  6  5
8. Energy Consumption
The table shows the energy consumed (in millions of Btu) in one year for a random sample of households from four U.S. regions. At $\alpha = 0.01$, can you conclude that the energy consumed is different in at least one region? (Adapted from U.S. Energy Information Administration)

| Region | Energy consumed (in millions of Btu) |
| --- | --- |
| Northeast | 61, 95, 140, 127, 93, 97, 84, 123, 89, 163 |
| Midwest | 59, 158, 169, 140, 95, 187, 123, 104, 88, 37, 72 |
| South | 86, 35, 67, 86, 142, 69, 65, 62 |
| West | 81, 39, 85, 35, 113, 46, 125, 70, 77, 63 |
11.4 Rank Correlation
What You Should Learn
How to use the Spearman rank correlation coefficient to determine whether the correlation between two variables is significant
The Spearman Rank Correlation Coefficient
In Section 9.1, you learned how to measure the strength of the relationship between two variables using the Pearson correlation coefficient $r$. Two requirements for the Pearson correlation coefficient are that the variables are linearly related and that the variables have a bivariate normal distribution. When these requirements cannot be met, you can examine the relationship between two variables using the nonparametric equivalent to the Pearson correlation coefficient: the Spearman rank correlation coefficient.
The Spearman rank correlation coefficient has several advantages over the Pearson correlation coefficient. For instance, the Spearman rank correlation coefficient can be used to describe the relationship between linear or nonlinear data. The Spearman rank correlation coefficient can be used for data at the ordinal level. And, the Spearman rank correlation coefficient is easier to calculate by hand than the Pearson correlation coefficient.
DEFINITION
The Spearman rank correlation coefficient $r_s$ is a measure of the strength of the relationship between two variables. The Spearman rank correlation coefficient is calculated using the ranks of paired sample data entries. If there are no ties in the ranks of either variable, then the formula for the Spearman rank correlation coefficient is

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where $n$ is the number of paired data entries and $d$ is the difference between the ranks of a paired data entry. If there are ties in the ranks and the number of ties is small relative to the number of data pairs, then the formula can still be used to approximate $r_s$.
The values of $r_s$ range from $-1$ to $1$, inclusive. When the ranks of corresponding data pairs are exactly identical, $r_s$ is equal to $1$. When the ranks are in “reverse” order, $r_s$ is equal to $-1$. When the ranks of corresponding data pairs have no relationship, $r_s$ is equal to $0$.
After calculating the Spearman rank correlation coefficient, you can determine whether the correlation between the variables is significant. You can make this determination by performing a hypothesis test for the population correlation coefficient $\rho_s$. The null and alternative hypotheses for this test are listed below.
$H_0$: $\rho_s = 0$ (There is no correlation between the variables.)
$H_a$: $\rho_s \ne 0$ (There is a significant correlation between the variables.)
Table 10 in Appendix B lists the critical values for the Spearman rank correlation coefficient for selected levels of significance and sample sizes. The test statistic for the hypothesis test is the Spearman rank correlation coefficient $r_s$.
GUIDELINES
Testing the Significance of the Spearman Rank Correlation Coefficient

| In Words | In Symbols |
| --- | --- |
| 1. Identify the claim. State the null and alternative hypotheses. | State $H_0$ and $H_a$. |
| 2. Specify the level of significance. | Identify $\alpha$. |
| 3. Determine the critical value. | Use Table 10 in Appendix B. |
| 4. Find the test statistic. | $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ |
| 5. Make a decision to reject or fail to reject the null hypothesis. | If $\lvert r_s \rvert$ is greater than the critical value, then reject $H_0$. Otherwise, fail to reject $H_0$. |
| 6. Interpret the decision in the context of the original claim. | |
EXAMPLE 1
The Spearman Rank Correlation Coefficient
The table shows the school enrollments of males and females for a random sample of 10 colleges. At $\alpha = 0.05$, can you conclude that there is a significant correlation between the number of males and the number of females enrolled at a college?

| Male | Female |
| --- | --- |
| 1786 | 2182 |
| 4246 | 4415 |
| 1419 | 1537 |
| 1188 | 1236 |
| 2394 | 2182 |
| 1079 | 919 |
| 4049 | 4209 |
| 3595 | 3741 |
| 1102 | 1086 |
| 1345 | 1282 |

SOLUTION
The claim is “there is a significant correlation between the number of males and the number of females enrolled at a college.” The null and alternative hypotheses are listed below.
$H_0$: $\rho_s = 0$ (There is no correlation between the number of males and the number of females enrolled at a college.)
$H_a$: $\rho_s \ne 0$ (There is a significant correlation between the number of males and the number of females enrolled at a college.) (Claim)
Each data set has 10 entries. Because $\alpha = 0.05$ and $n = 10$, the critical value is 0.648. Before calculating the test statistic, you must find $\sum d^2$, the sum of the squares of the differences of the ranks of the data sets. You can use a table to calculate $\sum d^2$, as shown below.

| Male | Rank | Female | Rank | $d$ | $d^2$ |
| --- | --- | --- | --- | --- | --- |
| 1786 | 6 | 2182 | 6.5 | -0.5 | 0.25 |
| 4246 | 10 | 4415 | 10 | 0 | 0 |
| 1419 | 5 | 1537 | 5 | 0 | 0 |
| 1188 | 3 | 1236 | 3 | 0 | 0 |
| 2394 | 7 | 2182 | 6.5 | 0.5 | 0.25 |
| 1079 | 1 | 919 | 1 | 0 | 0 |
| 4049 | 9 | 4209 | 9 | 0 | 0 |
| 3595 | 8 | 3741 | 8 | 0 | 0 |
| 1102 | 2 | 1086 | 2 | 0 | 0 |
| 1345 | 4 | 1282 | 4 | 0 | 0 |
| | | | | | $\sum d^2 = 0.5$ |

When $n = 10$ and $\sum d^2 = 0.5$, the test statistic is

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(0.5)}{10(10^2 - 1)} \approx 0.997.$$

Because $\lvert r_s \rvert \approx 0.997 > 0.648$, you reject the null hypothesis.
Interpretation
There is enough evidence at the 5% level of significance to conclude that there is a significant correlation between the number of males and the number of females enrolled at a college.
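The calculation in Example 1 can be checked with a short script. This is only a sketch: the helper names `average_ranks` and `spearman_rs` are illustrative and do not come from the text, and the critical value 0.648 is copied from the example rather than computed.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)), the formula used in the text."""
    n = len(x)
    d_squared = [(rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y))]
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))


# Enrollment data from Example 1.
male = [1786, 4246, 1419, 1188, 2394, 1079, 4049, 3595, 1102, 1345]
female = [2182, 4415, 1537, 1236, 2182, 919, 4209, 3741, 1086, 1282]

r_s = spearman_rs(male, female)
critical_value = 0.648  # taken from the example (alpha = 0.05, n = 10)
print(round(r_s, 3), abs(r_s) > critical_value)
```

The two tied female enrollments (2182) each receive rank 6.5, which matches the Study Tip about averaging the ranks of tied entries.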
TRY IT YOURSELF 1
The table shows the prices (in dollars per bushel) received for oat and wheat for a random sample of seven U.S. farmers. At $\alpha = 0.10$, can you conclude that there is a significant correlation between the oat and wheat prices? (Adapted from U.S. Department of Agriculture)

| Oat | Wheat |
| --- | --- |
| 1.84 | 3.67 |
| 1.97 | 3.49 |
| 2.03 | 3.68 |
| 2.25 | 3.88 |
| 2.35 | 3.91 |
| 2.31 | 4.02 |
| 2.40 | 4.15 |

Answer: Page T1
Picturing the World
The table shows the retail prices (in dollars per pound) for ground beef and fresh whole chicken for a random sample of nine U.S. grocery stores. (Adapted from U.S. Bureau of Labor Statistics)

| Beef | Chicken |
| --- | --- |
| 3.69 | 1.44 |
| 3.66 | 1.42 |
| 3.65 | 1.48 |
| 3.68 | 1.50 |
| 3.60 | 1.47 |
| 3.55 | 1.46 |
| 3.55 | 1.41 |
| 3.56 | 1.47 |
| 3.59 | 1.46 |

Does a significant correlation exist between ground beef and chicken prices in U.S. grocery stores? Use $\alpha = 0.10$.
Study Tip
Remember that in the case of a tie between data entries, use the average of the corresponding ranks.
11.4 EXERCISES
Building Basic Skills and Vocabulary
1. What are some advantages of the Spearman rank correlation coefficient over the Pearson correlation coefficient?
2. Describe the ranges of the Spearman rank correlation coefficient and the Pearson correlation coefficient.
3. What does it mean when $r_s$ is equal to $1$? What does it mean when $r_s$ is equal to $-1$? What does it mean when $r_s$ is equal to $0$?
4. Explain, in your own words, what $r_s$ and $\rho_s$ represent in Example 1.
Using and Interpreting Concepts
Testing a Claim
In Exercises 5–8, (a) identify the claim and state $H_0$ and $H_a$, (b) find the critical value, (c) find the test statistic $r_s$, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim.
5. Farming Expenses
In an agricultural report, a commodities analyst claims that there is a significant correlation between purchased seed expenses and fertilizer and lime expenses in the farming business. The table shows the total purchased seed expenses and fertilizer and lime expenses for farms in eight randomly selected states for a recent year. At $\alpha = 0.05$, is there enough evidence to support the analyst’s claim? (Source: U.S. Department of Agriculture)

| State | Purchased seed expenses (in millions of dollars) | Fertilizer and lime expenses (in millions of dollars) |
| --- | --- | --- |
| Arkansas | 490 | 480 |
| California | 1530 | 2060 |
| Florida | 490 | 480 |
| Kentucky | 266 | 402 |
| Michigan | 741 | 642 |
| North Carolina | 380 | 470 |
| Ohio | 879 | 858 |
| Washington | 360 | 560 |
6. Exercise Machines
The table shows the overall scores and the prices for a random sample of nine different models of elliptical exercise machines. The overall score represents the ergonomics, exercise range, ease of use, construction, heart-rate monitoring, and safety. At $\alpha = 0.05$, can you conclude that there is a significant correlation between the overall score and the price? (Source: Consumer Reports)

| Overall score | Price (in dollars) |
| --- | --- |
| 77 | 3700 |
| 75 | 1700 |
| 73 | 1300 |
| 71 | 900 |
| 66 | 1000 |
| 66 | 1400 |
| 64 | 1800 |
| 62 | 1000 |
| 58 | 700 |
7. Crop Prices
The table shows the prices (in dollars per bushel) received for barley and corn for a random sample of nine U.S. farmers. At $\alpha = 0.05$, can you conclude that there is a significant correlation between the barley and corn prices? (Adapted from U.S. Department of Agriculture)

| Barley | Corn |
| --- | --- |
| 4.89 | 3.21 |
| 4.52 | 3.22 |
| 4.85 | 3.29 |
| 4.97 | 3.23 |
| 5.12 | 3.33 |
| 4.91 | 3.40 |
| 5.08 | 3.44 |
| 4.98 | 3.49 |
| 4.87 | 3.43 |
8. Vacuum Cleaners
The table shows the overall scores and the prices for a random sample of 12 different models of vacuum cleaners. The overall score represents cleaning, airflow, handling, noise, and emissions. At $\alpha = 0.10$, can you conclude that there is a significant correlation between the overall score and the price? (Source: Consumer Reports)

| Overall score | Price (in dollars) |
| --- | --- |
| 65 | 150 |
| 71 | 200 |
| 69 | 550 |
| 47 | 350 |
| 55 | 470 |
| 38 | 90 |
| 47 | 80 |
| 47 | 130 |
| 47 | 210 |
| 57 | 190 |
| 34 | 300 |
| 65 | 260 |
Test Scores and GNI
In Exercises 9–12, use the table below. The table shows the average achievement scores of 15-year-olds in science and mathematics along with the gross national incomes (GNI) of nine randomly selected countries for a recent year. (The GNI is a measure of the total value of goods and services produced by the economy of a country.) (Source: Organization for Economic Cooperation and Development; The World Bank)

| Country | Science average | Mathematics average | GNI (in billions of dollars) |
| --- | --- | --- | --- |
| Canada | 528 | 516 | 1,529 |
| France | 495 | 493 | 2,458 |
| Germany | 509 | 506 | 3,437 |
| Italy | 481 | 490 | 1,815 |
| Japan | 538 | 532 | 4,549 |
| Mexico | 416 | 408 | 1,143 |
| Spain | 493 | 486 | 1,192 |
| Sweden | 493 | 494 | 503 |
| United States | 496 | 470 | 18,496 |
9. Science and GNI
At $\alpha = 0.10$, can you conclude that there is a significant correlation between science achievement scores and GNI?
10. Math and GNI
At $\alpha = 0.10$, can you conclude that there is a significant correlation between mathematics achievement scores and GNI?
11. Science and Math
At $\alpha = 0.10$, can you conclude that there is a significant correlation between science and mathematics achievement scores?
12. Writing a Summary
Use the results from Exercises 9–11 to write a summary about the correlation (or lack of correlation) between test scores and GNI.
Extending Concepts
Testing the Spearman Rank Correlation Coefficient for $n > 30$
When you are testing the significance of the Spearman rank correlation coefficient and the sample size $n$ is greater than 30, you can use the expression below to find the critical value.

$$\frac{\pm z}{\sqrt{n - 1}}, \quad z \text{ corresponds to the level of significance}$$
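As a rough illustration of the large-sample expression, the sketch below computes the positive critical value with Python's standard library. It assumes a two-tailed test, so that $z$ is the standard normal value leaving $\alpha/2$ in the upper tail; the function name `spearman_critical_value` and the values $n = 50$, $\alpha = 0.05$ are hypothetical and not taken from the exercises.

```python
from math import sqrt
from statistics import NormalDist


def spearman_critical_value(n, alpha):
    """Positive large-sample critical value z / sqrt(n - 1), intended for n > 30.

    Assumes a two-tailed test: z cuts off alpha / 2 in the upper tail.
    """
    if n <= 30:
        raise ValueError("for n <= 30, use the critical-value table instead")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(n - 1)


# Hypothetical sample size and significance level, for illustration only.
cv = spearman_critical_value(50, 0.05)
print(round(cv, 3))  # reject H0 when |r_s| exceeds this value
```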
In Exercises 13 and 14, test the Spearman rank correlation coefficient.
13. Work Injuries
The table shows the average hours worked per week and the numbers of on-the-job injuries for a random sample of U.S. companies in a recent year. At $\alpha = 0.10$, can you conclude that there is a significant correlation between average hours worked and the number of on-the-job injuries?

| Hours worked | Injuries |
| --- | --- |
| 46 | 22 |
| 43 | 25 |
| 41 | 18 |
| 40 | 17 |
| 41 | 20 |
| 42 | 22 |
| 45 | 28 |
| 45 | 29 |
| 42 | 24 |
| 45 | 26 |
| 44 | 26 |
| 44 | 25 |
| 45 | 27 |
| 46 | 29 |
| 47 | 29 |
| 47 | 30 |
| 46 | 29 |
| 46 | 29 |
| 49 | 30 |
| 50 | 30 |
| 50 | 30 |
| 42 | 23 |
| 41 | 22 |
| 42 | 23 |
| 41 | 21 |
| 41 | 19 |
| 41 | 18 |
| 41 | 18 |
| 40 | 17 |
| 39 | 16 |
| 38 | 16 |
| 39 | 16 |
| 39 | 16 |
14. Work Injuries in Construction
The table shows the average hours worked per week and the numbers of on-the-job injuries for a random sample of U.S. construction companies in a recent year. At $\alpha = 0.05$, can you conclude that there is a significant correlation between average hours worked and the number of on-the-job injuries?

| Hours worked | Injuries |
| --- | --- |
| 38 | 11 |
| 38 | 11 |
| 37 | 9 |
| 38 | 10 |
| 38 | 10 |
| 40 | 17 |
| 39 | 15 |
| 39 | 14 |
| 39 | 14 |
| 40 | 16 |
| 39 | 15 |
| 41 | 17 |
| 41 | 17 |
| 42 | 21 |
| 41 | 18 |
| 41 | 18 |
| 41 | 18 |
| 42 | 22 |
| 42 | 21 |
| 42 | 19 |
| 42 | 21 |
| 41 | 18 |
| 41 | 17 |
| 39 | 12 |
| 38 | 12 |
| 38 | 11 |
| 39 | 13 |
| 39 | 12 |
| 36 | 6 |
| 37 | 6 |
| 36 | 6 |
| 37 | 6 |
| 37 | 7 |
| 37 | 8 |
| 37 | 7 |
11.5 The Runs Test
What You Should Learn
How to use the runs test to determine whether a data set is random
The Runs Test for Randomness
In obtaining a sample of data, it is important for the data to be selected randomly. But how do you know whether the sample data are truly random? One way to test for randomness in a data set is to use a runs test for randomness.
Before using a runs test for randomness, you must first know how to determine the number of runs in a data set.
DEFINITION
A run is a sequence of data having the same characteristic. Each run is preceded by and followed by data with a different characteristic or by no data at all. The number of data in a run is called the length of the run.
EXAMPLE 1
Finding the Number of Runs
A liquid-dispensing machine has been designed to fill one-liter bottles. A quality control inspector decides whether each bottle is filled to an acceptable level and passes inspection (P) or fails inspection (F). Determine the number of runs for each sequence and find the length of each run.
1. P P P P P P P P F F F F F F F F
2. P F P F P F P F P F P F P F P F
3. P P F F F F P F F F P P P P P P
SOLUTION
1. There are two runs. The first 8 P’s form a run of length 8 and the first 8 F’s form another run of length 8, as shown below.
P P P P P P P P (1st run)  F F F F F F F F (2nd run)
2. There are 16 runs each of length 1, as shown below.
P (1st run)  F (2nd run)  P F P F P F P F P F P F P  F (16th run)
3. There are 5 runs, the first of length 2, the second of length 4, the third of length 1, the fourth of length 3, and the fifth of length 6, as shown below.
P P (1st run)  F F F F (2nd run)  P (3rd run)  F F F (4th run)  P P P P P P (5th run)
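Counting runs by hand becomes error-prone for long sequences. One possible way to automate it, sketched here with Python's standard `itertools.groupby`, is to group consecutive equal symbols; the helper name `run_lengths` is an illustrative choice, and the sequence is part 3 of Example 1.

```python
from itertools import groupby


def run_lengths(sequence):
    """Return (symbol, length) for each run of consecutive equal symbols."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]


# Sequence 3 from Example 1.
seq3 = "P P F F F F P F F F P P P P P P".split()
runs = run_lengths(seq3)
print(len(runs), [length for _, length in runs])
```

The number of runs is the length of the returned list, and the second element of each pair is the length of that run.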
TRY IT YOURSELF 1
A machine produces engine parts. An inspector measures the diameter of each engine part and determines whether the part passes inspection (P) or fails inspection (F). The results are shown below. Determine the number of runs in the sequence and find the length of each run.
P P P F P F P P P P F F P F P P F F F P P P F P P P
Answer: Page T1
When each value in a set of data can be categorized into one of two separate categories, you can use the runs test for randomness to determine whether the data are random.
DEFINITION
The runs test for randomness is a nonparametric test that can be used to determine whether a sequence of sample data is random.
The runs test for randomness considers the number of runs in a sequence of sample data in order to test whether a sequence is random. When a sequence has too few or too many runs, it is usually not random. For instance, the sequence
P P P P P P P P F F F F F F F F
from Example 1, part 1, has too few runs (only 2 runs). The sequence
P F P F P F P F P F P F P F P F
from Example 1, part 2, has too many runs (16 runs). So, these sample data are probably not random.
You can use a hypothesis test to determine whether the number of runs in a sequence of sample data is too high or too low. The runs test is a two-tailed test, and the null and alternative hypotheses are listed below.
$H_0$: The sequence of data is random.
$H_a$: The sequence of data is not random.
When using the runs test, let $n_1$ represent the number of data that have one characteristic and let $n_2$ represent the number of data that have the second characteristic. It does not matter which characteristic you choose to be represented by $n_1$. Let $G$ represent the number of runs.
$n_1$ = number of data with one characteristic
$n_2$ = number of data with the other characteristic
$G$ = number of runs
Table 12 in Appendix B lists the critical values for the runs test for selected values of $n_1$ and $n_2$ at the $\alpha = 0.05$ level of significance. (In this text, you will use only the $\alpha = 0.05$ level of significance when performing runs tests.) When $n_1$ or $n_2$ is greater than 20, you can use the standard normal distribution to find the critical values.
You can calculate the test statistic for the runs test as follows.
Test Statistic for the Runs Test
When $n_1 \le 20$ and $n_2 \le 20$, the test statistic for the runs test is $G$, the number of runs.
When $n_1 > 20$ or $n_2 > 20$, the test statistic for the runs test is

$$z = \frac{G - \mu_G}{\sigma_G}$$

where

$$\mu_G = \frac{2n_1n_2}{n_1 + n_2} + 1 \quad \text{and} \quad \sigma_G = \sqrt{\frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}}.$$
GUIDELINES
Performing a Runs Test for Randomness

| In Words | In Symbols |
| --- | --- |
| 1. Identify the claim. State the null and alternative hypotheses. | State $H_0$ and $H_a$. |
| 2. Specify the level of significance. | Identify $\alpha$. (Use $\alpha = 0.05$ for the runs test.) |
| 3. Determine the number of data that have each characteristic and the number of runs. | Determine $n_1$, $n_2$, and $G$. |
| 4. Determine the critical values. | When $n_1 \le 20$ and $n_2 \le 20$, use Table 12 in Appendix B. When $n_1 > 20$ or $n_2 > 20$, use Table 4 in Appendix B. |
| 5. Find the test statistic. | When $n_1 \le 20$ and $n_2 \le 20$, use $G$. When $n_1 > 20$ or $n_2 > 20$, use $z = \frac{G - \mu_G}{\sigma_G}$. |
| 6. Make a decision to reject or fail to reject the null hypothesis. | If $G$ is less than or equal to the lower critical value or greater than or equal to the upper critical value, then reject $H_0$. Otherwise, fail to reject $H_0$. Or, if $z$ is in the rejection region, then reject $H_0$. Otherwise, fail to reject $H_0$. |
| 7. Interpret the decision in the context of the original claim. | |
EXAMPLE 2
Using the Runs Test
As people enter a concert, an usher records where they are sitting. The results for 13 people are shown, where L represents a lawn seat and P represents a pavilion seat. At $\alpha = 0.05$, can you conclude that the sequence of seat locations is not random?
L L L P P L P P P L L P L
SOLUTION
The claim is “the sequence of seat locations is not random.” To test this claim, use the null and alternative hypotheses below.
$H_0$: The sequence of seat locations is random.
$H_a$: The sequence of seat locations is not random. (Claim)
To find the critical values, first determine $n_1$, the number of L’s; $n_2$, the number of P’s; and $G$, the number of runs.
L L L (1st run)  P P (2nd run)  L (3rd run)  P P P (4th run)  L L (5th run)  P (6th run)  L (7th run)
$n_1$ = number of L’s = 7
$n_2$ = number of P’s = 6
$G$ = number of runs = 7
Because $n_1 \le 20$, $n_2 \le 20$, and $\alpha = 0.05$, use Table 12 to find the lower critical value 3 and the upper critical value 12. The test statistic is the number of runs $G = 7$. Because the test statistic $G$ is between the critical values 3 and 12, you fail to reject the null hypothesis.
Interpretation
There is not enough evidence at the 5% level of significance to support the claim that the sequence of seat locations is not random. So, it appears that the sequence of seat locations is random.
TRY IT YOURSELF 2
The genders of 15 students as they enter a classroom are shown below, where F represents a female and M represents a male. At $\alpha = 0.05$, can you conclude that the sequence of genders is not random?
M F F F M M F F M F M M F F F
Answer: Page T1
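The small-sample decision rule used in Example 2 can be sketched in a few lines. The critical values 3 and 12 are copied from the example, since Table 12 is not reproduced here; everything else is computed from the seat sequence, and the variable names are illustrative.

```python
from itertools import groupby

# Seat sequence from Example 2 (L = lawn, P = pavilion).
seats = "L L L P P L P P P L L P L".split()

n1 = seats.count("L")
n2 = seats.count("P")
G = sum(1 for _ in groupby(seats))  # each group of equal neighbors is one run

# Critical values as stated in Example 2 for n1 = 7, n2 = 6, alpha = 0.05.
lower, upper = 3, 12

# Reject H0 when G is at or beyond either critical value.
reject_h0 = G <= lower or G >= upper
print(n1, n2, G, reject_h0)
```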
EXAMPLE 3
Using the Runs Test
You want to determine whether the selection of recently hired employees in a large company is random with respect to gender. The genders of 36 recently hired employees are shown below, where F represents a female and M represents a male. At $\alpha = 0.05$, can you conclude that the sequence of employees is not random?
M M F F F F M M M M M M F F F F F M M M M M M M F F F M M M M F M M F M
SOLUTION
The claim is “the sequence of employees is not random.” To test this claim, use the null and alternative hypotheses below.
$H_0$: The sequence of employees is random.
$H_a$: The sequence of employees is not random. (Claim)
To find the critical values, first determine $n_1$, the number of F’s; $n_2$, the number of M’s; and $G$, the number of runs.
M M (1st run)  F F F F (2nd run)  M M M M M M (3rd run)
F F F F F (4th run)  M M M M M M M (5th run)
F F F (6th run)  M M M M (7th run)  F (8th run)  M M (9th run)  F (10th run)  M (11th run)
$n_1$ = number of F’s = 14
$n_2$ = number of M’s = 22
$G$ = number of runs = 11
Because $n_2 > 20$, use Table 4 in Appendix B to find the critical values. Because the test is a two-tailed test with $\alpha = 0.05$, the critical values are $-z_0 = -1.96$ and $z_0 = 1.96$.
Before calculating the test statistic, find the values of $\mu_G$ and $\sigma_G$, as follows.

$$\mu_G = \frac{2n_1n_2}{n_1 + n_2} + 1 = \frac{2(14)(22)}{14 + 22} + 1 = \frac{616}{36} + 1 \approx 18.11$$

$$\sigma_G = \sqrt{\frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}} = \sqrt{\frac{2(14)(22)[2(14)(22) - 14 - 22]}{(14 + 22)^2(14 + 22 - 1)}} \approx 2.81$$

You can find the test statistic as follows.

$$z = \frac{G - \mu_G}{\sigma_G} \approx \frac{11 - 18.11}{2.81} \approx -2.53$$

The figure shows the location of the rejection regions and the test statistic $z$. Because $z$ is in the rejection region, you reject the null hypothesis.
[Figure: standard normal curve with rejection regions to the left of $-z_0 = -1.96$ and to the right of $z_0 = 1.96$, each with area $\frac{1}{2}\alpha = 0.025$; the central area is $1 - \alpha = 0.95$; the test statistic $z \approx -2.53$ lies in the left rejection region.]
Interpretation
There is enough evidence at the 5% level of significance to support the claim that the sequence of employees with respect to gender is not random.
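The arithmetic in Example 3 can be reproduced with the sketch below, which applies the formulas for $\mu_G$ and $\sigma_G$ given earlier in this section to the hiring sequence. The variable names are illustrative. Note that the script carries full precision, whereas the example rounds $\mu_G$ and $\sigma_G$ before dividing, so later digits may differ slightly.

```python
from itertools import groupby
from math import sqrt

# Hiring sequence from Example 3 (F = female, M = male).
hires = ("M M F F F F M M M M M M F F F F F M M M M M M M "
         "F F F M M M M F M M F M").split()

n1 = hires.count("F")
n2 = hires.count("M")
G = sum(1 for _ in groupby(hires))  # number of runs

# Mean and standard deviation of the number of runs under H0.
mu_G = 2 * n1 * n2 / (n1 + n2) + 1
sigma_G = sqrt(
    2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
    / ((n1 + n2) ** 2 * (n1 + n2 - 1))
)

z = (G - mu_G) / sigma_G
reject_h0 = abs(z) > 1.96  # two-tailed critical values from the example
print(n1, n2, G, round(mu_G, 2), round(sigma_G, 2), round(z, 2), reject_h0)
```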
TRY IT YOURSELF 3
Let S represent a day in a small town in which it snowed and let N represent a day in the same town in which it did not snow. The snowfall results for the entire month of January are shown below. At $\alpha = 0.05$, can you conclude that the sequence is not random?
N N N S S N N S N S N N N N N S N S N S N N S N S S N N N N N
Answer: Page T1
When $n_1$ or $n_2$ is greater than 20, you can also use a P-value to perform a hypothesis test for the randomness of the data. In Example 3, you can calculate the P-value to be 0.0114. Because $P < \alpha$, you reject the null hypothesis.
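One way to obtain such a P-value is to double the standard normal tail area beyond the test statistic. The sketch below assumes the rounded statistic $z = -2.53$ from Example 3 and uses Python's standard `statistics.NormalDist`; using an unrounded $z$ would change the last digit slightly.

```python
from statistics import NormalDist

z = -2.53     # rounded test statistic from Example 3
alpha = 0.05

# Two-tailed P-value: twice the area in the tail beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))
print(round(p_value, 4), p_value < alpha)
```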
Picturing the World
The sequence shows the National Football League conference of each winning team for the first 51 Super Bowls, where A represents the American Football Conference and N represents the National Football Conference. (Source: National Football League)
N N A A A N A A A A A N A A A N N A N N N N N N N N N N N N N A A N A A N A A A A N A N N N A N A A A
At $\alpha = 0.05$, can you conclude that the sequence of conferences of Super Bowl winning teams is random?
11.5 EXERCISES
Building Basic Skills and Vocabulary
1. In your own words, explain why the hypothesis test discussed in this section is called the runs test.
2. Describe the test statistic for the runs test when the sample sizes $n_1$ and $n_2$ are less than or equal to 20 and when either $n_1$ or $n_2$ is greater than 20.
Using and Interpreting Concepts
Finding the Number of Runs
In Exercises 3 – 6, determine the number of runs in the sequence. Then find the length of each run.
3. T F T F T T T F F F T F
4. U U D D U D U U D D U D U U
5. M F M F M F F F F F F M M M F F M M M M
6. A A A B B B A B B A A A A A A B A A B A B B
7. Find the values of $n_1$ and $n_2$ in Exercise 3.
8. Find the values of $n_1$ and $n_2$ in Exercise 4.
9. Find the values of $n_1$ and $n_2$ in Exercise 5.
10. Find the values of $n_1$ and $n_2$ in Exercise 6.
Finding Critical Values
In Exercises 11–14, use the sequence and Table 12 in Appendix B to determine the number of runs that are considered too high and the number of runs that are considered too low for the data to be in random order.
11.
T F T F T F T F T F T F
12.
M F M M M M M M F F M M
13.
N S S S N N N N N S N S N S S N N N
14.
X X X X X X X Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Performing a Runs Test
In Exercises 15–20, (a) identify the claim and state $H_0$ and $H_a$, (b) find the critical values, (c) find the test statistic, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim. Use $\alpha = 0.05$.
15. Coin Toss
A coach records the results of the coin toss at the beginning of each football game for a season. The results are shown, where H represents heads and T represents tails. The coach claimed the tosses were not random. Test the coach’s claim.
H T T T H T H H T T T T H T H H
16. Senate
The sequence shows the majority party of the U.S. Senate after each election for a recent group of years, where R represents the Republican party and D represents the Democratic party. Can you conclude that the sequence is not random? (Source: U.S. Senate)
R D D D R R R R R R R D D D D D D D R D D R D D D D D D D D D D D D D R R R D D D D R R R D R R D D D D R R
17. Baseball
The sequence shows the Major League Baseball league of each World Series winning team from 1969 to 2016, where N represents the National League and A represents the American League. Can you conclude that the sequence of leagues of World Series winning teams is not random? (Source: Major League Baseball)
N A N A A A N N A A N N N N A A A N A N A N A A A N A N A A A N A N A A N A N A N N N A N A N
18. Number Generator
A number generator outputs the sequence of digits shown, where O represents an odd digit and E represents an even digit. Test the claim that the digits were not randomly generated.
O O O E E E E O O O O O E E E E O O E E E E O O O O E E E E O O
19. Dog Identifications
A team of veterinarians records, in order, the gender of every dog that is microchipped at their pet hospital in one month. The genders of recently microchipped dogs are shown, where F represents a female and M represents a male. A veterinarian claims that the microchips are random by gender. Do you have enough evidence to reject the veterinarian’s claim?
M M F M F F F F F M M M F F F M F F F F F M F F F M F F F
20. Golf Tournament
A golf tournament official records whether each past winner is American-born (A) or foreign-born (F). The results are shown for every year the tournament has existed. Can you conclude that the sequence is not random?
F F A F F A F F A F F A F F A F F A F F F F F F A F F A F F A F F A F F A F A F F A F F F F F A F F F F F A F F F A
Extending Concepts
Runs Test with Quantitative Data
In Exercises 21–23, use the following information to perform a runs test. You can also use the runs test for randomness with quantitative data. First, calculate the median. Then assign a $+$ sign to those values above the median and a $-$ sign to those values below the median. Ignore any values that are equal to the median. Use $\alpha = 0.05$.
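The conversion from quantitative data to a two-symbol sequence might be sketched as follows. The data in `sample` are hypothetical values made up for illustration (they are not from any exercise), and the helper name `signs_about_median` is likewise an assumption.

```python
from itertools import groupby
from statistics import median


def signs_about_median(data):
    """Replace each value with '+' (above the median) or '-' (below it).

    Values equal to the median are dropped, as the instructions describe.
    """
    m = median(data)
    return ["+" if x > m else "-" for x in data if x != m]


# Hypothetical data, for illustration only.
sample = [12, 15, 16, 9, 14, 20, 8, 7, 17, 11, 19]

signs = signs_about_median(sample)
n1 = signs.count("+")
n2 = signs.count("-")
G = sum(1 for _ in groupby(signs))  # number of runs in the sign sequence
print("".join(signs), n1, n2, G)
```

Once $n_1$, $n_2$, and $G$ are known, the test proceeds exactly as for categorical data.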
21. Daily High Temperatures
The sequence shows the daily high temperatures (in degrees Fahrenheit) for a city during the month of July. Test the claim that the daily high temperatures do not occur randomly.
84 87 92 93 95 84 82 83 81 87 92 98 99 93 84 85 86 92 91 95 84 92 83 81 87 92 98 89 93 84 85
22. Exam Scores
The sequence shows the exam scores of a class based on the order in which the students finished the test. Test the claim that the scores occur randomly.
83 94 80 76 92 89 65 75 82 87 90 91 81 99 97 72 72 89 90 92 87 76 74 66 88 81 90 92 89 76 80
23.
Use technology to generate a sequence of 30 numbers from 1 to 99, inclusive. Test the claim that the sequence of numbers is not random.
USES AND ABUSES Statistics in the Real World
EXERCISES
1.
Insufficient Evidence
Give an example of a nonparametric test in which there is not enough evidence to reject the null hypothesis.
2.
Using an Inappropriate Test
Discuss the nonparametric tests described in this chapter and match each test with its parametric counterpart, which you studied in earlier chapters.
Uses
Nonparametric Tests
Before you could perform many of the hypothesis tests you learned about in previous chapters, you had to ensure that certain conditions about the population were satisfied. For instance, before you could perform a t-test, you had to verify that the population was normally distributed or the sample size was at least 30. One advantage of the nonparametric tests shown in this chapter is that they are distribution free. That is, they do not require any particular information about the population or populations being tested. Another advantage of nonparametric tests is that they are easier to perform than their parametric counterparts. This means that they are easier to understand and quicker to use. Nonparametric tests can often be used when data are at the nominal or ordinal level.
Abuses
Insufficient Evidence
Stronger evidence is needed to reject a null hypothesis in a nonparametric test than in a corresponding parametric test. That is, when you are trying to support a claim represented by the alternative hypothesis, you might need a larger sample when performing a nonparametric test. When the outcome of a nonparametric test results in failure to reject the null hypothesis, you should investigate the sample size used. It may be that a larger sample will produce different results.
Using an Inappropriate Test
In general, when information about the population (such as the condition of normality) is known, it is more efficient to use a parametric test. When information about the population is not known, however, nonparametric tests can be helpful.
Chapter 11 Summary
What Did You Learn?
Each item lists its related Example(s) and Review Exercises.
Section 11.1
How to use the sign test to test a population median (Examples 1, 2; Review Exercises 1–3, 6)
$$z = \frac{(x + 0.5) - 0.5n}{\sqrt{n}/2}$$
How to use the paired-sample sign test to test the difference between two population medians (dependent samples) (Example 3; Review Exercises 4, 5)
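The Section 11.1 test statistic can be sketched in a few lines of Python. This is an illustration only: it assumes that x is the smaller of the number of + signs and the number of − signs, and that n is the total number of + and − signs after ties are dropped. The function name `sign_test_z` and the counts used are hypothetical and do not come from any exercise.

```python
import math

def sign_test_z(x: int, n: int) -> float:
    """Standardized sign test statistic z = ((x + 0.5) - 0.5n) / (sqrt(n) / 2).

    x: smaller of the number of + signs and the number of - signs (assumed)
    n: total number of + and - signs, with ties dropped (assumed)
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    return ((x + 0.5) - 0.5 * n) / (math.sqrt(n) / 2)

# Hypothetical counts: 40 plus signs and 60 minus signs.
plus_signs, minus_signs = 40, 60
z = sign_test_z(min(plus_signs, minus_signs), plus_signs + minus_signs)
print(round(z, 3))  # -1.9
```

The value of z would then be compared with the critical value for the chosen level of significance.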
Section 11.2
How to use the Wilcoxon signed-rank test and the Wilcoxon rank sum test to determine whether two samples are selected from populations having the same distribution (Examples 1, 2; Review Exercises 7, 8)
$$z = \frac{R - \mu_R}{\sigma_R}, \quad \mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \quad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
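A minimal Python sketch of the Wilcoxon rank sum statistic from Section 11.2 follows. It assumes that R is the sum of the ranks of the smaller sample and that tied values share the average of their ranks. The helper names `average_ranks` and `rank_sum_z` are hypothetical, and the tiny samples are made up only to keep the arithmetic easy to follow.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_z(sample1, sample2):
    """z = (R - mu_R) / sigma_R, with R the rank sum of the smaller sample (assumed)."""
    small, large = sorted((list(sample1), list(sample2)), key=len)
    n1, n2 = len(small), len(large)
    ranks = average_ranks(small + large)
    r = sum(ranks[:n1])  # ranks of the smaller sample come first
    mu_r = n1 * (n1 + n2 + 1) / 2
    sigma_r = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (r - mu_r) / sigma_r

# Hypothetical data: every value in the first sample is below the second sample.
print(round(rank_sum_z([1, 2, 3], [4, 5, 6, 7]), 4))  # -2.1213
```

Here R = 1 + 2 + 3 = 6, the mean is 12, and the standard deviation is the square root of 8, which gives the printed value.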
Section 11.3
How to use the Kruskal-Wallis test to determine whether three or more samples were selected from populations having the same distribution (Example 1; Review Exercises 9, 10)
$$H = \frac{12}{N(N + 1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N + 1)$$
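The Kruskal-Wallis statistic H from Section 11.3 depends only on the rank sums and the sample sizes, so it can be sketched as below. The function name `kruskal_wallis_h` and the rank sums are hypothetical; the sketch assumes the ranks were already assigned across the combined data.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1).

    rank_sums: R_1, ..., R_k, the sum of the ranks in each sample
    sizes:     n_1, ..., n_k, the number of entries in each sample
    """
    if len(rank_sums) != len(sizes) or len(sizes) < 3:
        raise ValueError("need matching lists for three or more samples")
    big_n = sum(sizes)
    # The ranks 1..N always add up to N(N + 1) / 2, even with averaged ties.
    if abs(sum(rank_sums) - big_n * (big_n + 1) / 2) > 1e-9:
        raise ValueError("rank sums do not add up to N(N + 1) / 2")
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (big_n * (big_n + 1)) * weighted - 3 * (big_n + 1)

# Hypothetical ranks: sample 1 holds ranks 1-3, sample 2 ranks 4-6, sample 3 ranks 7-9.
h = kruskal_wallis_h([6, 15, 24], [3, 3, 3])
print(round(h, 4))  # 7.2
```

With N = 9, the weighted sum is 12 + 75 + 192 = 279, so H = (12/90)(279) − 30 = 7.2.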
Section 11.4
How to use the Spearman rank correlation coefficient to determine whether the correlation between two variables is significant (Example 1; Review Exercises 11, 12)
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
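The Spearman rank correlation coefficient from Section 11.4 can be sketched as follows, assuming each variable has already been ranked and d is the difference between the paired ranks. The function name `spearman_rs` and the ranks are hypothetical.

```python
def spearman_rs(ranks_x, ranks_y):
    """r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the difference in paired ranks."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rank lists of equal length, n >= 2")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks for five paired observations.
r_s = spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r_s)  # 0.8
```

The differences are −1, 1, −1, 1, 0, so the sum of squared differences is 4 and r_s = 1 − 24/120 = 0.8.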
Section 11.5
How to use the runs test to determine whether a data set is random (Examples 1–3; Review Exercises 13, 14)
$G$ = number of runs,
$$z = \frac{G - \mu_G}{\sigma_G}, \quad \mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \quad \sigma_G = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$
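Counting runs by hand is error-prone, so here is a hedged Python sketch of the Section 11.5 quantities. It assumes the sequence contains exactly two distinct symbols; the function name `runs_test` and the sequence are hypothetical, and the short sequence is only for checking the arithmetic.

```python
import math
from itertools import groupby

def runs_test(sequence):
    """Return (G, n1, n2, z) for a sequence made of exactly two distinct symbols."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    g = sum(1 for _ in groupby(sequence))  # each group of equal neighbors is one run
    n1 = sum(1 for s in sequence if s == symbols[0])
    n2 = sum(1 for s in sequence if s == symbols[1])
    mu_g = 2 * n1 * n2 / (n1 + n2) + 1
    sigma_g = math.sqrt(
        2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    return g, n1, n2, (g - mu_g) / sigma_g

# Hypothetical sequence with runs AAA, BBB, A, B.
g, n1, n2, z = runs_test("AAABBBAB")
print(g, n1, n2, round(z, 4))  # 4 4 4 -0.7638
```

For this sequence the mean number of runs is 5 and the standard deviation is the square root of 12/7, so z = (4 − 5)/1.3093.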
The table summarizes parametric and nonparametric tests. Always use the parametric test when the conditions for that test are satisfied.
Test application | Parametric test | Nonparametric test
One-sample tests | z-test for a population mean; t-test for a population mean | Sign test for a population median
Two-sample tests, dependent samples | t-test for the difference between means | Paired-sample sign test; Wilcoxon signed-rank test
Two-sample tests, independent samples | z-test for the difference between means; t-test for the difference between means | Wilcoxon rank sum test
Tests involving three or more samples | One-way ANOVA | Kruskal-Wallis test
Correlation | Pearson correlation coefficient | Spearman rank correlation coefficient
Randomness | (No parametric test) | Runs test
Review Exercises
Section 11.1
In Exercises 1–6, use a sign test to test the claim by doing the following.
(a) Identify the claim and state $H_0$ and $H_a$.
(b) Find the critical value.
(c) Find the test statistic.
(d) Decide whether to reject or fail to reject the null hypothesis.
(e) Interpret the decision in the context of the original claim.
1. A store manager claims that the median number of customers per day is no more than 650. The numbers of customers per day for 17 randomly selected days are listed below. At $\alpha = 0.01$, can you reject the manager’s claim?
675 665 601 642 554 653 639 650 645 550 677 569 650 660 682 689 590
2. A company claims that the median credit score for U.S. adults is at least 710. The credit scores for 13 randomly selected U.S. adults are listed below. At $\alpha = 0.05$, can you reject the company’s claim? (Adapted from Fair Isaac Corporation)
750 782 805 695 700 706 625 589 690 772 745 704 710
3. A government agency claims that the median sentence length for all federal prisoners is 2 years. In a random sample of 180 federal prisoners, 65 have sentence lengths that are less than 2 years, 109 have sentence lengths that are more than 2 years, and 6 have sentence lengths that are 2 years. At $\alpha = 0.10$, can you reject the agency’s claim? (Adapted from U.S. Sentencing Commission)
4. In a study testing the effects of calcium supplements on blood pressure in men, 10 randomly selected men were given a calcium supplement for 12 weeks. The table shows the measurements for each subject’s diastolic blood pressure taken before and after the 12-week treatment period. At $\alpha = 0.05$, can you reject the claim that there was no reduction in diastolic blood pressure? (Adapted from the American Medical Association)
Patient: 1 2 3 4 5 6 7 8 9 10
Before treatment: 107 110 123 129 112 111 107 112 136 102
After treatment: 100 114 105 112 115 116 106 102 125 104
5. In a study testing the effects of an herbal supplement on blood pressure in men, 11 randomly selected men were given an herbal supplement for 12 weeks. The table shows the measurements for each subject’s diastolic blood pressure taken before and after the 12-week treatment period. At $\alpha = 0.05$, can you reject the claim that there was no reduction in diastolic blood pressure? (Adapted from The Journal of the American Medical Association)
Patient: 1 2 3 4 5 6 7 8 9 10 11
Before treatment: 123 109 112 102 98 114 119 112 110 117 130
After treatment: 124 97 113 105 95 119 114 114 121 118 133
6. An association claims that the median annual salary of lawyers is $118,160. In a random sample of 125 lawyers, 76 were paid less than $118,160, and 49 were paid more than $118,160. At $\alpha = 0.05$, can you reject the association’s claim? (Adapted from U.S. Bureau of Labor Statistics)
Section 11.2
In Exercises 7 and 8, use a Wilcoxon test to test the claim by doing the following.
(a) Identify the claim and state $H_0$ and $H_a$.
(b) Decide whether to use a Wilcoxon signed-rank test or a Wilcoxon rank sum test.
(c) Find the critical value(s).
(d) Find the test statistic.
(e) Decide whether to reject or fail to reject the null hypothesis.
(f) Interpret the decision in the context of the original claim.
7. A career placement advisor claims that there is a difference in the total times required to earn a doctorate degree by female and male graduate students. The table shows the total times (in years) to earn a doctorate for a random sample of 12 female and 12 male graduate students. At $\alpha = 0.01$, can you support the advisor’s claim? (Adapted from Survey of Earned Doctorates)
Female: 9 11 9 12 11 8 10 13 6 6 8 9
Male: 8 7 8 10 9 7 7 9 10 8 9 7
8. A medical researcher claims that a new drug affects the number of headache hours experienced by headache sufferers. The numbers of headache hours (per day) experienced by eight randomly selected patients before and after taking the drug are shown in the table. At $\alpha = 0.05$, can you support the researcher’s claim?
Patient: 1 2 3 4 5 6 7 8
Headache hours (before): 0.9 2.3 2.7 2.4 2.9 1.9 1.2 3.1
Headache hours (after): 1.4 1.5 1.4 1.8 1.3 0.6 0.7 1.9
Section 11.3
In Exercises 9 and 10, use the Kruskal-Wallis test to test the claim by doing the following.
(a) Identify the claim and state $H_0$ and $H_a$.
(b) Find the critical value and identify the rejection region.
(c) Find the test statistic H.
(d) Decide whether to reject or fail to reject the null hypothesis.
(e) Interpret the decision in the context of the original claim.
9. The table shows the ages for a random sample of doctorate recipients in three fields of study. At $\alpha = 0.01$, can you conclude that the distribution of the ages of the doctorate recipients in at least one field of study is different from the others? (Adapted from Survey of Earned Doctorates)
Field of study: Age
Life sciences: 31 32 34 31 30 32 35 31 32 34 29
Physical sciences: 30 31 32 31 30 29 31 30 32 33 30
Social sciences: 32 35 31 33 34 31 35 36 32 30 33
10. The table shows the starting salaries for a random sample of college graduates in four fields of engineering. At $\alpha = 0.05$, can you conclude that the distribution of the starting salaries in at least one field of engineering is different from the others? (Adapted from National Association of Colleges and Employers)
Field of engineering: Starting salary (in thousands of dollars)
Chemical: 68.4 65.9 71.7 70.5 64.3 69.9 67.5 65.7 69.4 71.1
Computer: 68.2 67.6 65.8 66.4 69.5 72.6 67.0 70.2 68.5 66.4
Electrical: 66.9 65.5 66.1 64.4 67.6 67.3 68.9 68.1 67.1 67.4
Mechanical: 65.5 64.8 65.6 63.7 65.6 65.3 68.1 68.6 64.9 62.7
Section 11.4
In Exercises 11 and 12, use the Spearman rank correlation coefficient to test the claim by doing the following.
(a) Identify the claim and state $H_0$ and $H_a$.
(b) Find the critical value.
(c) Find the test statistic $r_s$.
(d) Decide whether to reject or fail to reject the null hypothesis.
(e) Interpret the decision in the context of the original claim.
11. The table shows the overall scores and the prices for six randomly selected video disk players. The overall score is based mainly on picture quality. At $\alpha = 0.10$, can you conclude that there is a significant correlation between the overall score and the price? (Source: Consumer Reports)
Overall score: 93 91 90 87 85 69
Price (in dollars): 500 300 500 150 250 130
12. The table shows the overall scores and the prices per gallon for seven randomly selected interior paints. The overall score represents hiding, surface smoothness, and resistance to staining, scrubbing, gloss change, sticking, mildew, and fading. At $\alpha = 0.10$, can you conclude that there is a significant correlation between the overall score and the price? (Adapted from Consumer Reports)
Overall score: 46 73 64 56 94 86 50
Price per gallon (in dollars): 24 40 25 24 40 38 26
Section 11.5
In Exercises 13 and 14, (a) identify the claim and state $H_0$ and $H_a$, (b) find the critical values, (c) find the test statistic, (d) decide whether to reject or fail to reject the null hypothesis, and (e) interpret the decision in the context of the original claim. Use $\alpha = 0.05$.
13. A highway patrol officer stops speeding vehicles on an interstate highway. The genders of the last 25 drivers who were stopped are shown, where F represents a female driver and M represents a male driver. Can you conclude that the stops were not random by gender?
F M M M F M F M F F F M M F F F M M M F M M F F M
14. The sequence shows the departure status of the last 18 buses to leave a bus station, where T represents a bus that departed on time and L represents a bus that departed late. Can you conclude that the departure status of the buses is not random?
T T T T L L L L T L L L T T T T T T
Chapter Quiz
Take this quiz as you would take a quiz in class. After you are done, check your work against the answers given in the back of the book.
In Exercises 1–5, (a) identify the claim and state $H_0$ and $H_a$, (b) decide which nonparametric test to use, (c) find the critical value(s), (d) find the test statistic, (e) decide whether to reject or fail to reject the null hypothesis, and (f) interpret the decision in the context of the original claim.
1. An organization claims that the median number of annual volunteer hours is 52. In a random sample of 75 people who volunteered last year, 47 volunteered for less than 52 hours, 23 volunteered for more than 52 hours, and 5 volunteered for 52 hours. At $\alpha = 0.05$, can you reject the organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)
2. A labor organization claims that there is a difference in the hourly earnings of union workers and nonunion workers in state and local governments. The table shows the hourly earnings (in dollars) for a random sample of 10 union workers and 10 nonunion workers in state and local governments. At $\alpha = 0.10$, can you support the organization’s claim? (Adapted from U.S. Bureau of Labor Statistics)
Union: 29.75 28.15 32.30 35.52 32.88 27.85 27.35 29.05 27.60 26.75
Nonunion: 26.15 23.10 21.20 26.95 22.05 24.75 22.50 22.25 21.40 20.45
3. The table shows the sales prices for a random sample of apartment condominiums and cooperatives in four U.S. regions. At $\alpha = 0.01$, can you conclude that the distribution of the sales prices in at least one region is different from the others? (Adapted from National Association of Realtors)
Region: Sales price (in thousands of dollars)
Northeast: 257.3 250.3 242.7 275.0 270.7 254.8 264.2 243.4
Midwest: 166.9 183.1 178.9 153.9 148.5 169.9 163.3 165.1
South: 181.3 156.7 155.6 170.4 175.3 196.3 178.4 166.8
West: 320.2 303.6 357.4 331.7 291.6 327.4 321.7 308.0
4. The table shows the numbers of emails sent and the numbers of emails received in a week for a random sample of nine people. At $\alpha = 0.01$, can you conclude that there is a significant correlation between the number of emails sent and the number of emails received?
Emails sent: 30 30 25 26 24 18 18 25 28
Emails received: 32 36 21 22 20 20 22 23 23
5. A meteorologist wants to determine whether days with rain occur randomly in April in his hometown. To do so, the meteorologist records whether it rains for each day in April. The results are shown, where R represents a day with rain and N represents a day with no rain. At $\alpha = 0.05$, can the meteorologist conclude that days with rain are not random?
N R R N N N N R N R R N R R R N R R R R N N N N R N R N N R
Chapter Test
Take this test as you would take a test in class.
In Exercises 1–5, (a) identify the claim and state $H_0$ and $H_a$, (b) decide which nonparametric test to use, (c) find the critical value(s), (d) find the test statistic, (e) decide whether to reject or fail to reject the null hypothesis, and (f) interpret the decision in the context of the original claim.
1. The mayor called on council members at a town meeting in the sequence shown, where R represents a Republican council member and D represents a Democrat council member. At $\alpha = 0.05$, can you conclude that the selection of members was not random?
R D D D R R D R D D R D D D R R D R R R R D R R R D D D R D R D R R
2. An employment agency representative wants to determine whether there is a difference in the annual household incomes in four regions of the United States. The representative randomly selects several households in each region and records the annual household income for each. The table shows the results. At $\alpha = 0.01$, can the representative conclude that the distribution of the annual household incomes in at least one region is different from the others? (Adapted from U.S. Census Bureau)
Region: Household income (in thousands of dollars)
Northeast: 64.2 57.0 65.6 64.7 59.9 62.4 61.5
Midwest: 56.0 61.1 51.9 55.2 57.4 58.5 58.7
South: 49.3 50.5 54.1 46.4 51.3 54.1 51.9
West: 64.0 61.9 58.6 60.7 59.6 61.2 63.1
3. An investment company claims that the median age of people with mutual funds is 51 years. The ages (in years) of 20 randomly selected mutual fund owners are listed below. At $\alpha = 0.01$, is there enough evidence to reject the company’s claim? (Adapted from Investment Company Institute)
46 34 33 27 58 64 54 36 38 42 26 51 49 44 46 50 39 34 51 63
4. An employment agency claims that there is a difference in the weekly earnings of workers who are union members and workers who are not union members. The table shows the weekly earnings (in dollars) for a random sample of nine union members and eight nonunion members. At $\alpha = 0.05$, can you support the agency’s claim? (Adapted from U.S. Bureau of Labor Statistics)
Member: 951 1090 788 896 980 1087 1136 1000 890 919 1026
Nonmember: 850 783 954 649 747 906 895 730 790 687
5. The table shows the overall scores and the prices for a random sample of eight different suitcases. The overall score represents the ease of use, features, construction, and durability of a suitcase. At $\alpha = 0.05$, can you conclude that there is a significant correlation between the overall score and the price? (Adapted from Consumer Reports)
Overall score: 90 85 81 78 72 68 64 61
Price (in dollars): 495 230 190 160 350 230 260 200
Putting it all together
Real Statistics—Real Decisions
In a recent year, according to the Bureau of Labor Statistics, the median number of years that wage and salary workers had been with their current employer (called employee tenure) was 4.2 years. Information on employee tenure has been gathered since 1996 using the Current Population Survey (CPS), a monthly survey of about 60,000 households that provides information on employment, unemployment, earnings, demographics, and other characteristics of the U.S. population ages 16 and over. With respect to employee tenure, the questions measure how long workers have been with their current employers, not how long they plan to stay with their employers.
EXERCISES
1. How Would You Do It?
(a) What sampling technique would you use to select the sample for the CPS?
(b) Do you think the technique in part (a) will give you a sample that is representative of the U.S. population? Why or why not?
(c) Identify possible flaws or biases in the survey on the basis of the technique you chose in part (a).
2. Is There a Difference?
A congressional representative claims that the median tenure for workers from the representative’s district is less than the national median tenure of 4.2 years. The claim is based on the representative’s data, which are shown in the table for Exercise 2. (Assume that the employees were randomly selected.)
(a) Is it possible that the claim is true? What questions should you ask about how the data were collected?
(b) How would you test the representative’s claim? Can you use a parametric test, or do you need to use a nonparametric test?
(c) State the null hypothesis and the alternative hypothesis.
(d) Test the claim using $\alpha = 0.05$. What can you conclude?
3. Comparing Male and Female Employee Tenures
A congressional representative claims that there is a difference between the median tenures for male workers and female workers. The claim is based on the representative’s data, which are shown in the table for Exercise 3. (Assume that the employees were randomly selected from the representative’s district.)
(a) How would you test the representative’s claim? Can you use a parametric test, or do you need to use a nonparametric test?
(b) State the null hypothesis and the alternative hypothesis.
(c) Test the claim using $\alpha = 0.05$. What can you conclude?
www.bls.gov
Table for Exercise 2: Employee Tenure of 20 Workers
4.6 2.6 3.3 2.8 1.5 1.9 4.0 5.0 3.9 5.1 3.7 5.4 3.6 3.9 6.2 1.7 4.6 3.1 4.4 3.6
Employee tenure for a sample of male workers
Employee tenure for a sample of female workers
3.9
4.4
4.4
4.9
4.7
5.4
4.3
4.3
4.9
4.0
3.8
1.8
3.6
5.1
4.7
5.1
2.3
3.3
6.5
2.2
0.9
5.2
5.1
3.0
1.3
4.0
TABLE FOR EXERCISE 3
Technology
U.S. Income and Economic Research
Extended solutions are given in the technology manuals that accompany this text. Technical instruction is provided for Minitab, Excel, and the TI-84 Plus.
The National Bureau of Economic Research (NBER) is a private, nonprofit, nonpartisan research organization. The NBER provides information for better understanding of how the U.S. economy works. Researchers at the NBER concentrate on four types of empirical research: developing new statistical measurements, estimating quantitative models of economic behavior, assessing the effects of public policies on the U.S. economy, and projecting the effects of alternative policy proposals.
One of the NBER’s interests is the median income of people in different regions of the United States. The table at the right shows the annual incomes (in dollars) of a random sample of people (15 years and over) in a recent year in four U.S. regions: Northeast, Midwest, South, and West.
In Exercises 1–5, refer to the annual incomes of people in the table. Use $\alpha = 0.05$ for all tests.
1. Construct a box-and-whisker plot for each region. Do the median annual incomes appear to differ between regions?
2. Use technology to perform a sign test to test the claim that the median annual income in the Midwest is greater than $30,000.
3. Use technology to perform a Wilcoxon rank sum test to test the claim that the median annual incomes in the Northeast and South are the same.
4. Use technology to perform a Kruskal-Wallis test to test the claim that the distributions of annual incomes for all four regions are the same.
5. Use technology to perform a one-way ANOVA to test the claim that the average annual incomes for all four regions are the same. Assume that the populations of incomes are normally distributed, the samples are independent, and the population variances are equal. How do your results compare with those in Exercise 4?
6. Repeat Exercises 1, 3, 4, and 5 using the data in the table below. The table shows the annual incomes (in dollars) of a random sample of families in a recent year in four U.S. regions: Northeast, Midwest, South, and West.
Annual income of families (in dollars)
Northeast | Midwest | South | West
70,225 | 67,357 | 61,072 | 70,527
128,686 | 97,795 | 63,918 | 80,168
91,252 | 45,198 | 54,699 | 59,137
127,864 | 64,479 | 99,562 | 76,928
79,411 | 84,647 | 61,082 | 61,302
62,529 | 60,658 | 39,088 | 90,710
56,461 | 79,352 | 66,672 | 69,716
80,559 | 72,338 | 42,988 | 98,707
59,332 | 75,972 | 71,434 | 99,676
88,559 | 66,853 | 58,433 | 47,719
54,603 | 72,805 | 85,764 | 76,136
79,256 | 69,636 | 56,547 | 54,417
70,807 | 82,608 | 65,464 | 71,171
87,708 | 71,869 | 49,965 | 76,402
69,976 | 91,479 | 61,471 | 53,273
EXERCISES
Annual income of people (in dollars)
Northeast | Midwest | South | West
45,481 | 25,781 | 19,946 | 37,922
31,922 | 28,326 | 35,140 | 31,198
27,750 | 26,910 | 33,323 | 24,129
23,179 | 34,609 | 36,008 | 32,194
24,304 | 32,945 | 18,030 | 34,924
32,216 | 32,119 | 24,251 | 22,491
30,393 | 30,990 | 24,581 | 28,668
28,897 | 44,317 | 32,005 | 42,207
25,981 | 18,021 | 37,091 | 24,465
20,439 | 42,193 | 33,866 | 20,776
40,562 | 25,054 | 21,746 | 28,521
48,863 | 27,703 | 26,324 | 37,422