Handout 2.2 Constructing Histograms
School: College of San Mateo
Subject: Mathematics
Date: Feb 20, 2024
Math 80 – Statistics
Constructing Histograms Handout 2.2
HISTOGRAMS OF A SINGLE QUANTITATIVE DATA SET
In the last handout, we analyzed data from the 2006–2007 NBA season, when the league changed to a new synthetic basketball. The NBA responded to pressure from the players by changing back to the traditional leather basketball on January 1, 2007. We also examined whether the change back to the traditional ball seemed to be associated with differences in the distribution of total points scored by the two teams in the games during the last week of 2006 and the first week of 2007. It is possible that the change in basketballs might have been associated with a change in the difference in points scored by the teams. But there might be other explanations.
In this lesson, we will consider the distribution of a different variable to address the question “Is there a home-court advantage in the NBA?” To answer this question, we will look at the difference in the teams’ scores, which is calculated by taking the home team’s score and subtracting the visiting team’s score. For example, if the final score of a game is 110 for the home team and 90 for the visiting team, then the difference is 20 (= 110 – 90). And if the final score of a game is 88 for the home team and 100 for the visiting team, then the difference is –12 (= 88 – 100). The following tables show the differences in scores for the games during the last week of 2006 and the first week of 2007.
2006 Data
16, 13, 11, 1, 19, 5, 2, 23, –7, 8, 10
8, –7, 6, 19, –3, 7, 6, 10, 15, –13, –7
6, 23, –16, –25, 6, –13, 15, 25, 29, 2, –4
10, –10, 14, 17, 3, 2, 14, 23, 9, 10, –1
10, 26, 9, –10, 23, 10, 14, 22, 1, –11, 10
2007 Data
–6, –8, –1, 4, 24, –11, –8, 5, 12, –3, 7, 23
17, 4, 3, 8, 9, –15, 2, –18, –2, 11, 3, 9
–24, –4, 14, –3, 5, –7, –14, 4, 19, –9, –9, 2
5, 32, 28, –5, –18, 13, 11, 5, –12, –5, 1
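The score difference defined above can be sketched in a few lines of Python. This is only an illustration: the function names `score_difference` and `describe` are hypothetical and are not part of the handout.

```python
def score_difference(home_score, visitor_score):
    """Home team's score minus the visiting team's score."""
    return home_score - visitor_score


def describe(diff):
    """Put a score difference into words (hypothetical helper)."""
    if diff > 0:
        return f"home team won by {diff}"
    if diff < 0:
        return f"visiting team won by {-diff}"
    return "difference of 0"


# The two worked examples from the text.
for home, visitor in [(110, 90), (88, 100)]:
    diff = score_difference(home, visitor)
    print(home, visitor, diff, describe(diff))
```

A positive difference means the home team won, and a negative difference means the visiting team won.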
Based only on a visual examination of the data values above, can you determine if there’s a home-court advantage?
2 We will summarize the 2006 dataset by grouping it into bins. A table of bins is created below to help you group the final score differences into intervals of five points each (for example, 5 to 9 points, 10 to 14 points, and so on). The bins start with the lowest final score difference of –25 and end with the highest final score difference of 29.
Last Week of 2006 Season Only
Score Diff      Frequency    Relative Frequency
–25 to –21      1            1/55 = 0.018 = 1.8%
–20 to –16      1            1.8%
–15 to –11      3            5.5%
–10 to –6       5            9.1%
–5 to –1        3            5.5%
0 to 4          6            10.9%
5 to 9          10           18.2%
10 to 14        12           21.8%
15 to 19
20 to 24
25 to 29
A For each value in the 2006 Data table, we will determine the bin it falls into. For example, the first final score difference of 16 belongs in the bin that represents the range 15 to 19. The frequency is equal to the number of values in that bin (interval). For example, the frequency is 5 for the bin labeled –10 to –6, as shown in the table. The relative frequency is the proportion of the total number of observations that fall in each range of the table. There were 55 games played in the final week of 2006, so the first relative frequency is 1/55 = 0.018 = 1.8% for the bin labeled –25 to –21.
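The binning described above can be sketched in Python as a way to check a hand-built table. This is a minimal sketch, assuming bins of width 5 that start at –25 as in the table. The names `bin_start` and `frequency_table` are hypothetical, and the data list is copied from the 2006 Data table.

```python
# 2006 score differences (home minus visitor), copied from the 2006 Data table.
DIFFS_2006 = [
    16, 13, 11, 1, 19, 5, 2, 23, -7, 8, 10,
    8, -7, 6, 19, -3, 7, 6, 10, 15, -13, -7,
    6, 23, -16, -25, 6, -13, 15, 25, 29, 2, -4,
    10, -10, 14, 17, 3, 2, 14, 23, 9, 10, -1,
    10, 26, 9, -10, 23, 10, 14, 22, 1, -11, 10,
]


def bin_start(value, width=5, lowest=-25):
    """Lower edge of the bin that contains value (assumes value >= lowest)."""
    return lowest + ((value - lowest) // width) * width


def frequency_table(values, width=5, lowest=-25):
    """Return rows of (low, high, frequency, relative frequency)."""
    counts = {}
    start = lowest
    while start <= max(values):      # create every bin, even an empty one
        counts[start] = 0
        start += width
    for v in values:
        counts[bin_start(v, width, lowest)] += 1
    n = len(values)
    return [(low, low + width - 1, c, c / n) for low, c in counts.items()]


for low, high, count, rel in frequency_table(DIFFS_2006):
    print(f"{low:>4} to {high:>3}  {count:>2}  {rel:.1%}")
```

Each relative frequency is the bin's frequency divided by the total number of games, so the relative frequencies add up to 1 (100%).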
Now, complete the table. Can you think of a reason why we would want to examine relative frequency?
STATWAY™ STUDENT HANDOUT
The table created above is called a frequency distribution table. A table is one way to display data. Another is a graph.
[Figure: blank grid for the Frequency Histogram of 2006 Score Differences. Horizontal axis: Score Difference, –25 to 30 in steps of 5. Vertical axis: Frequency, 0 to 12 in steps of 2.]
B Use the frequency column and the score difference bins to construct a graph called a frequency histogram. Each grouping of score differences (–25 to –21, –20 to –16, and so on) from the table above is used to create a bar on the graph. Draw the bars for each group in the table on the blank graph above. Look at the bars of the histogram carefully. What does the height of each bar represent?
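A rough text version of the frequency histogram can be sketched as follows. It is an illustration only: the bars are drawn sideways, so the length of each bar (one `#` per game) plays the role of the bar height on the graph. The function name `histogram_rows` is hypothetical, and the bin width of 5 starting at –25 follows the table.

```python
# 2006 score differences (home minus visitor), copied from the 2006 Data table.
DIFFS_2006 = [
    16, 13, 11, 1, 19, 5, 2, 23, -7, 8, 10,
    8, -7, 6, 19, -3, 7, 6, 10, 15, -13, -7,
    6, 23, -16, -25, 6, -13, 15, 25, 29, 2, -4,
    10, -10, 14, 17, 3, 2, 14, 23, 9, 10, -1,
    10, 26, 9, -10, 23, 10, 14, 22, 1, -11, 10,
]


def histogram_rows(values, width=5, lowest=-25):
    """Build one text row per bin; each '#' stands for one game."""
    n_bins = (max(values) - lowest) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - lowest) // width] += 1   # index of the bin holding v
    rows = []
    for i, count in enumerate(counts):
        low = lowest + i * width
        rows.append(f"{low:>4} to {low + width - 1:>3} | " + "#" * count)
    return rows


for row in histogram_rows(DIFFS_2006):
    print(row)
```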
3 A relative frequency histogram represents the relative frequencies for the bins instead of the frequencies. In other words, the height of each bar corresponds to the percentage of the total number of score differences that fall in that bin. Below are the graphs of the frequency and relative frequency histograms. Do you notice any similarities between the two histograms?
In a case like this, where the distribution extends further to the left than to the right of the peak, we say the shape of the distribution is skewed to the left (or left-skewed). If the distribution extends further to the right, we say it is skewed to the right (or right-skewed). If the distribution looks similar on both sides of the center, we say its shape is symmetric.
4 The following questions relate to important features of a graph such as a histogram.
A How would you describe the shape of the distribution of final score differences in the last week of 2006? Is it symmetrical or does it extend more in one direction?
B Estimate the value for the center of the distribution, and the typical range (or spread) of the distribution.
It is often a good idea to imagine what the histogram might look like before you make the graph. That way you’ll be less likely to be fooled by errors in the data or by accidentally graphing the wrong variable. Also, different features of the distribution may appear more obvious at different bin width choices. When you use technology, it’s usually easy to vary the bin width so you can make sure that a feature you think you see isn’t a consequence of a certain bin width choice.
[Figure: Relative Frequency Histogram of 2006 Score Difference. Horizontal axis: 2006 Score Difference, –25 to 30 in steps of 5. Vertical axis: Relative Frequency, 0 to .25 in steps of .05.]
Here are two more histograms created by choosing different bin widths (a wider one and a narrower one) for the score differences data from the last week of 2006.
[Figure: Histogram of 2006, wider bins. Horizontal axis: 2006, –30 to 30 in steps of 10. Vertical axis: Frequency, 0 to 20 in steps of 5.]
[Figure: Histogram of 2006, narrower bins. Horizontal axis: 2006, –26 to 30 in steps of 2. Vertical axis: Frequency, 0 to 9.]
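The effect of the bin width can be explored with a short Python sketch. The widths used here (10 starting at –30, and 2 starting at –26) are assumptions chosen for illustration and may not match the bins of the two histograms exactly. The function name `bin_counts` is hypothetical.

```python
# 2006 score differences (home minus visitor), copied from the 2006 Data table.
DIFFS_2006 = [
    16, 13, 11, 1, 19, 5, 2, 23, -7, 8, 10,
    8, -7, 6, 19, -3, 7, 6, 10, 15, -13, -7,
    6, 23, -16, -25, 6, -13, 15, 25, 29, 2, -4,
    10, -10, 14, 17, 3, 2, 14, 23, 9, 10, -1,
    10, 26, 9, -10, 23, 10, 14, 22, 1, -11, 10,
]


def bin_counts(values, width, start):
    """Map each bin's lower edge to the number of values in that bin."""
    counts = {}
    edge = start
    while edge <= max(values):       # create every bin, even an empty one
        counts[edge] = 0
        edge += width
    for v in values:
        counts[start + ((v - start) // width) * width] += 1
    return counts


wide = bin_counts(DIFFS_2006, width=10, start=-30)    # assumed wider bins
narrow = bin_counts(DIFFS_2006, width=2, start=-26)   # assumed narrower bins

for label, counts in [("width 10", wide), ("width 2", narrow)]:
    print(label)
    for low, count in counts.items():
        print(f"  from {low:>4}: " + "#" * count)
```

Wider bins smooth the picture into a few tall bars, while narrower bins show more detail along with more gaps and bumps, so it helps to look at more than one width before describing the shape.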
Another important feature to look for in a histogram is outliers. We should always mention any stragglers, or outliers, that stand apart from the body of the distribution. Outliers can affect almost every method we discuss in this course, so we’ll always be on the lookout for them. An outlier can be the most informative part of our data, or it might just be an error. Don’t throw it away without comment. Treat it specially and discuss it when you describe your data, or find the error and fix it if you are able. When we discuss variability, we’ll have a rule of thumb for deciding when a point might be considered an outlier.
HOMEWORK
1 Here is a histogram of the score differences for the first week of 2007.
[Figure: Histogram of 2007 Score Differences. Horizontal axis: Score Difference, –20 to 30 in steps of 10. Vertical axis: Frequency, 0 to 9.]
A In how many games did the home team win by 10 or more points?
11 games
B Interpret the height of the third bar.
There were 4 games whose score difference was between –15 and –11.
C How many games were played during the first week of 2007?
47 games
D In what percent of the games did the home team win by 10 or more points?
11/47 = 0.234 or 23.4%
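The counts asked for in this problem can also be checked directly against the 2007 Data table. The sketch below is an illustration, not part of the handout. It assumes the 2007 histogram uses the same five-point bins as the 2006 table, so the third bar covers –15 to –11. A count taken from the listed values can differ from a reading of the histogram if a bar height is misread.

```python
# 2007 score differences (home minus visitor), copied from the 2007 Data table.
DIFFS_2007 = [
    -6, -8, -1, 4, 24, -11, -8, 5, 12, -3, 7, 23,
    17, 4, 3, 8, 9, -15, 2, -18, -2, 11, 3, 9,
    -24, -4, 14, -3, 5, -7, -14, 4, 19, -9, -9, 2,
    5, 32, 28, -5, -18, 13, 11, 5, -12, -5, 1,
]

total_games = len(DIFFS_2007)

# Home wins by 10 or more points: differences of at least 10.
big_home_wins = sum(1 for d in DIFFS_2007 if d >= 10)

# Third bar, assuming it covers the bin -15 to -11.
third_bar = sum(1 for d in DIFFS_2007 if -15 <= d <= -11)

print("games played:", total_games)
print("home wins by 10 or more:", big_home_wins)
print("games in the bin -15 to -11:", third_bar)
print(f"share of games won by 10 or more: {big_home_wins / total_games:.1%}")
```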
2 The histogram below gives the distribution of the number of grams of fat for items on the Burger King menu (source: Burger King’s Your Guide to Nutrition).
[Figure: Grams of Fat in Burger King's Menu. Horizontal axis: Bin, 0 to 60 in steps of 10, followed by “More”. Vertical axis: Frequency, 0 to 10 in steps of 2.]
A How many items have 30 or fewer grams of fat?
19 items
B What percent of the items have more than 40 grams of fat?
2/25 = 0.08 or 8%
C Describe the histogram. When describing the histogram, make sure to discuss the center (use the word “typical” to represent the center), spread (use the word “majority” to represent the spread), and shape in the context of this problem. Use complete sentences.
3 StatCrunch practice
The February 2011 issue of Consumer Reports magazine provides the Overall Score ratings for exercise treadmills. There are two treadmill categories: non-folding and folding. Here are the Overall Score ratings for these two types of treadmill:
Non-folding Treadmills
85, 84, 83, 82, 81, 78, 78, 69, 65, 60
Folding Treadmills
81, 79, 76, 76, 75, 75, 75, 74, 73, 73, 73
72, 71, 71, 71, 70, 70, 70, 70, 69, 66, 66
65, 65, 64, 63, 61, 61, 50, 50, 50
Suppose your friend is going to purchase a treadmill, but cannot decide whether to purchase a non-folding or folding model. Regardless of the price, which type would you recommend for the highest quality? What evidence do you have to recommend one type over the other? To answer the above questions, you will use StatCrunch to help you construct side-by-side histograms to compare the distributions of the Overall Score ratings for the two types of treadmills. First you will learn how to enter data into StatCrunch and then you will create the histograms.
Go to www.statcrunch.com and log in.
Select the tab Open StatCrunch. It will bring you to a blank spreadsheet. Click on “var1” and replace it with “Type” for the two different types of treadmill. Click on “var2” and replace it with “Rating” for the treadmill rating.
Now enter the dataset for the Non-folding treadmills, followed by the Folding treadmills. Your spreadsheet should look similar to the table below.
Row    Type           Rating    var3    var4
1      Non-folding    85
2      Non-folding    84
3      …
10     Non-folding    60
11     Folding        81
12     Folding        79
…
After you’ve entered all the data, you are ready to make histograms. Click on Graph and select Histogram.
You should now see a dialog box. Under Select Column(s), click on Rating. Under Group by, select Type.
Before you click on Compute, you can choose a different width for the bins. You can also make adjustments after you’ve viewed your histograms. Use the histograms to help you answer the questions above.
If you’d like, you can click on the tab Data and select Save to save this dataset.
Copy the histograms into an MS Word (or Google Docs) document and answer the following questions:
A Regardless of the price, which type would you recommend for the highest quality?
B What evidence do you have to recommend one type over the other?
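For readers without access to StatCrunch, the grouped histograms can be roughly approximated in Python. This sketch is not the StatCrunch procedure. The bin width of 10 and the range 50 to 89 are assumptions chosen for illustration, and the name `bin_counts` is hypothetical. The ratings are copied from the two lists above.

```python
# Overall Score ratings copied from the two treadmill lists.
RATINGS = {
    "Non-folding": [85, 84, 83, 82, 81, 78, 78, 69, 65, 60],
    "Folding": [
        81, 79, 76, 76, 75, 75, 75, 74, 73, 73, 73,
        72, 71, 71, 71, 70, 70, 70, 70, 69, 66, 66,
        65, 65, 64, 63, 61, 61, 50, 50, 50,
    ],
}

START, STOP, WIDTH = 50, 90, 10   # assumed bins: 50-59, 60-69, 70-79, 80-89


def bin_counts(values, start=START, stop=STOP, width=WIDTH):
    """Count how many ratings fall in each bin of the given width."""
    counts = [0] * ((stop - start) // width)
    for v in values:
        counts[(v - start) // width] += 1
    return counts


# Using the same bins for both types keeps the two histograms comparable.
for kind, values in RATINGS.items():
    print(f"{kind} (n = {len(values)})")
    for i, count in enumerate(bin_counts(values)):
        low = START + i * WIDTH
        print(f"  {low} to {low + WIDTH - 1} | " + "#" * count)
```

Because the two groups have different numbers of treadmills, comparing the proportion of each group in a bin is more informative than comparing raw counts.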