EC285 A,B Tutorial Questions
Week 3
Topic: Displaying and Describing Categorical Data
Edda Claus
EC285 A,B - Fall 2023
6 October 2023
1.
Below are histograms a., b. and c. for the grades of 114 randomly sampled students in an economics class. Approximately what percentage of students in each case (a., b. and c.) got a grade of 70 or more?
a.
Frequency histogram: height = n in each bin. A grade of 70 or more covers the last two bins, with n = 19 and n = 7, so the percentage is roughly 26/114 = 23%.
b.
It’s the same histogram, just in relative frequency terms. The answers will therefore actually be exactly the same. But if I only had this chart, I would say: we have roughly 6% in the over 80 bin, and 17% in the 70-80 bin, so 23% overall.
c.
Again, same histogram, just in density terms. Fraction in a particular range = sum of areas of the bins. Width of bins is 10. First bin looks like it’s around 0.005 in height, and second around 0.017 in height? That would mean 0.005 × 10 + 0.017 × 10 = 0.22, or 22%?
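The three readings above can be checked side by side. This is a minimal sketch in Python; the bar heights are the eyeballed values quoted in the answers (assuming the 70-80 bin is the one holding 19 students and the top bin the one holding 7), so the results are only as precise as those readings.

```python
# Share of the 114 sampled students with a grade of 70 or more,
# read from each version of the histogram. All heights are eyeballed.
n_students = 114
bin_width = 10  # each bin spans 10 grade points

freq_heights = [19, 7]            # counts (assumed order: 70-80 bin, then top bin)
rel_heights = [0.17, 0.06]        # relative frequencies
density_heights = [0.017, 0.005]  # densities

# Frequency histogram: add the counts and divide by the sample size.
share_from_freq = sum(freq_heights) / n_students
# Relative frequency histogram: the heights are already shares.
share_from_rel = sum(rel_heights)
# Density histogram: share = area = height x width, summed over bins.
share_from_density = sum(h * bin_width for h in density_heights)

print(f"frequency histogram:          {share_from_freq:.1%}")
print(f"relative frequency histogram: {share_from_rel:.1%}")
print(f"density histogram:            {share_from_density:.1%}")
```

All three land between 22% and 23%, which is the point of part d: the small gaps come from reading heights off a chart, not from the data.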
d.
Explain any differences in your answers to a, b, and c.
Any differences have to be due to rounding/eyeballing errors, since they all represent exactly the same distribution.
e.
Averaged over all sections, of the entire class of 2100 students, 25% had a grade above 70%. How does this differ from the answers you got for Question 1? Give a plausible explanation using concepts from class and the readings.
I’m going to go with 22% have a grade above 70 in the sample of 114 students. This is a bit below 25%. The main reason there would be a difference between the two is just due to randomness in the sample selection – I just happened to get a sample of students who had a slightly weaker grade than the class on average. If the actual percentage with a grade above 70 were 25%, then we’d expect in a sample of 114 to get about 28 or 29 with a 70 or above. 22% of 114 is 25. So we maybe had 3 or 4 fewer students than we’d expect getting above a 70. That is not very many! There is the possibility that the sample deviated systematically from perfect randomization, so that there might be some non-sampling error too – perhaps we only took a sample of one section, and that section had students who were a bit less strong than the other students? But there’s really no reason to think there’s anything going on but sampling error. Note also that sample statistics from relatively small samples are less good at estimating things on the extremes of the distribution than at the centre, and generally less good at estimating things with small probability or frequency in the population.
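The head-count comparison in this answer is simple arithmetic, sketched here in Python. The 22% figure is the eyeballed sample share from Question 1, so the counts are approximate.

```python
sample_size = 114
class_share = 0.25    # share above 70 across the whole class of 2100
sample_share = 0.22   # eyeballed share above 70 in the sample

# Students we would expect above 70 if the sample matched the class exactly.
expected_count = class_share * sample_size          # 28.5
# Students actually above 70 in the sample, rounded to a whole person.
observed_count = round(sample_share * sample_size)  # 25

shortfall = expected_count - observed_count
print(f"expected about {expected_count}, observed about {observed_count}, "
      f"shortfall about {shortfall}")
```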
2.
The professor in the class is told that the mean grade in the class overall – which is 62% – is too low and needs to be adjusted up, to get to 72%. [1]
a.
If the professor decides to do this by adding a fixed number of percentage points to each student’s grade, how much would she have to add? Write a formula for this adjustment (AdjustedGrade = f(Grade) – but what function exactly?)
AdjustedGrade = Grade + 10
b.
The summary statistics and relative frequency histogram for the sample of 114 students before the adjustment are given below. What would be the new values for each statistic after the adjustment was applied (you can check your answers below)? How does the adjustment affect the shape of the histogram?
[1] Note: this has never, to my knowledge, happened in Economics at WLU.
Variable           Obs   Mean       Std. Dev.   Min        Max
grade              114   59.12265   12.59926    28.01079   87.377
After adjustment   114   69.12      12.599      38.01      97.377
The shape of the histogram is completely unaffected – its position on the x axis just shifts up by 10.
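A quick way to convince yourself of this is to apply the shift to a small set of grades and compare the statistics. The grades below are hypothetical, made up for illustration; they are not the tutorial's sample of 114 students.

```python
import statistics

# Hypothetical grades, for illustration only (not the tutorial's data).
grades = [41.0, 52.5, 58.0, 63.5, 70.0, 84.0]
adjusted = [g + 10 for g in grades]  # AdjustedGrade = Grade + 10

# Measures of location move by exactly the constant that was added...
mean_shift = statistics.mean(adjusted) - statistics.mean(grades)
# ...while measures of spread do not change at all.
sd_change = statistics.stdev(adjusted) - statistics.stdev(grades)
range_change = (max(adjusted) - min(adjusted)) - (max(grades) - min(grades))

print(f"mean shift: {mean_shift:.2f}")
print(f"change in s.d.: {sd_change:.2f}")
print(f"change in range: {range_change:.2f}")
```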
c.
Suppose the professor increases each student’s score by 17% (i.e. for a grade of 50% the adjusted grade would be 50 × 1.17 = 58.5 per cent). What is the equation that describes the adjustment? Comparing the histogram of the adjusted scores (below) with the histogram of the unadjusted scores, how does this adjustment affect the shape of the histogram? The mean? The s.d.? In your opinion, is this fair?
AdjustedGrade2 = Grade × 1.17
The histograms look pretty similar, just (a) shifted up by 10, and (b) going a bit further up. The range is now from 33 to 102 which is a bit bigger than the original range of 28 to 87. So the spread has increased. That said, if we just adjusted the width of the bars to be 17% more than in the first panel, there’d be no difference at all. The mean has increased by roughly 10 percentage points (by construction). But because the percent adjustment means that those who started off with bigger scores get a bigger jump in grade, the range increased, as did the standard deviation.
d.
Suppose the professor adjusts these by multiplying by 1.05 and then adding 7 points to each grade. What is the equation that describes the adjustment? Comparing the histogram of these adjusted scores (below) with the histogram of the unadjusted scores, how does this adjustment affect the shape of the histogram? The mean? The s.d.? In your opinion, is this fair for all of the students?
AdjustedGrade3 = Grade × 1.05 + 7
The histograms again look pretty similar. Again, if we made each bar 5% wider, and started up a bit further in the grade distribution, they’d be identical.
Which you think is the fairest depends on whether you think the grading was harsher at the bottom or at the top – and probably also on where you would fall in the distribution (e.g. the person who gets more than 100% under the adjustment, but who really can’t benefit from that, might not think it’s fair that everyone else’s letter grade goes up and hers doesn’t).
e.
A linear transformation is one that takes the form Y = a + bX where a and b are constants. A linear transformation does not change the shape of the histogram: if it is skewed it will remain skewed; if it is symmetric it will remain symmetric; if it is bell-shaped it will remain bell-shaped; if it is bimodal it will remain bimodal. Are the adjustments in (b) – (d) examples of linear transformations? Why does the histogram after Adjustment 2 look a bit different from the original, and what could we do to make it look identical (hint: how many bars are in each histogram)? Would changes in units of measurement (e.g. dollars versus thousands of dollars) also be examples of linear transformations?
Yes, all the transformations here are linear: adding 10 has a = 10 and b = 1; multiplying by 1.17 has a = 0 and b = 1.17; multiplying by 1.05 and adding 7 has a = 7 and b = 1.05. Each of them leaves the shape of the histogram unchanged (if you start from the lowest value and have the same number of bins). The reason the histogram for adjusted grade 2 looks a bit different is that it has more bins than the others. Changes in units of measurement are also linear transformations.
Summary Statistics for each of the grade adjustments:
Variable      Obs   Mean       Std. Dev.   Min        Max
AdjustedGr1   114   69.12265   12.59926    38.0108    97.377
AdjustedGr2   114   69.1735    14.74113    32.77263   102.2311
AdjustedGr3   114   69.07879   13.22922    36.41133   98.74585
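The adjusted statistics follow from the original ones alone, because under Y = a + bX the mean, minimum and maximum become a + b × (old value) and the standard deviation becomes b × (old s.d.). A minimal sketch, assuming b > 0; the helper name `transform_summary` is made up for this illustration.

```python
def transform_summary(stats, a, b):
    """Summary statistics of Y = a + b*X, given those of X (assumes b > 0)."""
    return {
        "mean": a + b * stats["mean"],
        "sd": b * stats["sd"],        # the constant a drops out of the spread
        "min": a + b * stats["min"],
        "max": a + b * stats["max"],
    }

# Statistics of the unadjusted grades, from the table in Question 2b.
grade = {"mean": 59.12265, "sd": 12.59926, "min": 28.01079, "max": 87.377}

adjusted1 = transform_summary(grade, a=10, b=1.0)   # Grade + 10
adjusted2 = transform_summary(grade, a=0, b=1.17)   # Grade x 1.17
adjusted3 = transform_summary(grade, a=7, b=1.05)   # Grade x 1.05 + 7

for name, row in [("AdjustedGr1", adjusted1),
                  ("AdjustedGr2", adjusted2),
                  ("AdjustedGr3", adjusted3)]:
    print(name, {key: round(value, 5) for key, value in row.items()})
```

Only the adjustments with b different from 1 change the standard deviation, which is why AdjustedGr1 keeps the original spread while the other two widen it.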
3.
Using the descriptive terms for histograms given in lecture, how would you describe the graph below? Roughly how many observations are negative?
If asked to describe this chart, I would say it is roughly symmetric around 29. It appears to be bimodal, with modes in roughly the 18-27 and 41-48 ranges. That said, this is a sample, so the bimodality is likely a result of sampling error. It seems unlikely we’d be able to say that the underlying population has a bimodal distribution.
In a density histogram, the % of observations in a given bar is equal to its area – so height by width.
The width of the bars is roughly 50/5.6 = 9 units. Eyeballing it, I would put the heights of the bars below zero at 0.002, 0.003, 0.004 and 0.007. Those sum to 0.016. Multiplied by 9 to get the area, that’s 0.144. So around 14 to 15% of observations look to be below zero.
4.
What are the main differences between a bar chart and a histogram?
Bar charts are best used for categorical data. Histograms are a lot like a bar chart, except the categories are ranges of numerical data. The horizontal axis for a bar chart is just categories, whereas for a histogram it is the real number line (ie it’s quantitative). But in terms of construction and interpretation, they are quite similar. The key difference is that the ordering of the categories in a bar chart doesn’t have to mean anything, since the categories may be nominal. (A bar chart of an ordinal categorical variable is more like a histogram.)
5.
Consider the data given below on weights of trucks in tons at a weighbridge on highway 401 in Ontario:
62.5  52.0  70.5  63.0  57.0  67.0  64.0  58.5  67.5  64.5
61.0  63.5  59.5  66.0  53.0  74.5  66.0  55.5  62.5  62.5
59.5  69.5  52.0  70.0  63.0  64.0  63.5  56.5  74.0  64.5
a.
Develop a frequency distribution using the following bins: 50-55, 55-60, 60-65, 65-70, and 70-75.
b.
Develop a relative frequency distribution using the bins from part a.
c.
Draw a histogram based on the relative frequency distribution from part b.
d.
Is this distribution symmetric or skewed?
a. & b. Note that 70.0 is assigned to the last bin.
Bins       Frequency   Relative Frequency
50 – 55    3           0.10
55 – 60    6           0.20
60 – 65    12          0.40
65 – 70    5           0.17
70 – 75    4           0.13
c.
d. Fairly symmetric
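The counts in the table for parts a and b can be reproduced with a short loop. This sketch assumes each bin includes its lower edge and excludes its upper edge, which is the convention that puts 70.0 in the last bin.

```python
weights = [
    62.5, 52.0, 70.5, 63.0, 57.0, 67.0, 64.0, 58.5, 67.5, 64.5,
    61.0, 63.5, 59.5, 66.0, 53.0, 74.5, 66.0, 55.5, 62.5, 62.5,
    59.5, 69.5, 52.0, 70.0, 63.0, 64.0, 63.5, 56.5, 74.0, 64.5,
]
edges = [50, 55, 60, 65, 70, 75]

# Count the weights falling in each bin [lower, upper).
counts = [0] * (len(edges) - 1)
for w in weights:
    for i in range(len(edges) - 1):
        if edges[i] <= w < edges[i + 1]:
            counts[i] += 1
            break

# Relative frequency = count divided by the number of observations.
relative = [c / len(weights) for c in counts]

for i, (c, r) in enumerate(zip(counts, relative)):
    print(f"{edges[i]} - {edges[i + 1]}: {c:2d}  {r:.2f}")
```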
6.
Consider the data given below on the average number of products returned per day at a retail store in Fredericton, NB, during 20 consecutive months.
2.9  2.1  3.5  4.2  5.1  6.5  7.8  8.1  4.5  3.8
2.8  2.2  3.3  3.4  4.4  5.5  6.1  4.2  2.5  2.1
a.
Construct a stem-and-leaf display.
b.
Is the distribution unimodal, bimodal or multimodal?
c.
Is the distribution symmetric or skewed?
d.
What would be a better measure of the centre for this distribution, the mean or the median?
a.
2 | 1 1 2 5 8 9
3 | 3 4 5 8
4 | 2 2 4 5
5 | 1 5
6 | 1 5
7 | 8
8 | 1
b.
Unimodal
c.
Skewed to the right
d.
Median, as the distribution is skewed
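A stem-and-leaf display can be built mechanically by splitting each value into its whole-number part (the stem) and its first decimal (the leaf). The sketch below does this for the returns data as listed in the question and also compares the mean with the median, which is one way to see the right skew.

```python
import statistics
from collections import defaultdict

returns = [
    2.9, 2.1, 3.5, 4.2, 5.1, 6.5, 7.8, 8.1, 4.5, 3.8,
    2.8, 2.2, 3.3, 3.4, 4.4, 5.5, 6.1, 4.2, 2.5, 2.1,
]

# Group leaves (tenths digit) under their stem (units digit).
stems = defaultdict(list)
for x in returns:
    tenths = round(x * 10)            # e.g. 2.9 -> 29, avoiding float noise
    stem, leaf = divmod(tenths, 10)
    stems[stem].append(leaf)

lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in sorted(stems[stem]))}"
    for stem in sorted(stems)
]
print("\n".join(lines))

# With a right-skewed distribution the mean is pulled above the median.
mean_returns = statistics.mean(returns)
median_returns = statistics.median(returns)
print(f"mean = {mean_returns:.2f}, median = {median_returns:.2f}")
```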
7.
Consider the data given below on monthly average gas prices in Belleville, ON.
110  125  99  115  119  95  110  132  85
a.
Compute the mean.
b.
Compute the median.
c.
What is the mode?
d.
What is the lower quartile, Q1?
e.
What is the upper quartile, Q3?
a.
110
b.
110
c.
110
d.
99
e.
119
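These answers can be checked with Python's standard `statistics` module. Quartile conventions differ between textbooks and software; `method="inclusive"` is an assumption chosen here because it reproduces the answers given, and other conventions can give slightly different quartiles.

```python
import statistics

prices = [110, 125, 99, 115, 119, 95, 110, 132, 85]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
mode_price = statistics.mode(prices)

# Quartiles under the "inclusive" convention (an assumption, see above).
q1, q2, q3 = statistics.quantiles(prices, n=4, method="inclusive")

print(f"mean = {mean_price}, median = {median_price}, mode = {mode_price}")
print(f"Q1 = {q1}, Q3 = {q3}")
```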
8.
A population increased by 3% at the end of year 1, by 5% at the end of year 2, and by 1% at the end of year 3. What was the compound annual growth rate?
a.
3%
b.
2.99%
c.
2.98%
d.
3.1%
e.
28.97%
b. 2.99%
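The compound annual growth rate is the constant yearly rate that would produce the same total growth, so it is the geometric mean of the yearly growth factors minus one, not the simple average of 3%, 5% and 1%. A short check in Python:

```python
growth_rates = [0.03, 0.05, 0.01]  # year 1, year 2, year 3

# Total growth factor over the three years: 1.03 * 1.05 * 1.01.
total_growth = 1.0
for rate in growth_rates:
    total_growth *= 1 + rate

# Constant yearly rate giving the same total growth.
cagr = total_growth ** (1 / len(growth_rates)) - 1

print(f"total growth factor: {total_growth:.6f}")
print(f"compound annual growth rate: {cagr:.2%}")
```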
9.
Using the data set
110  125  99  115  119  95  110  132  85
a.
Compute the range.
b.
Compute the interquartile range (IQR).
c.
Compute the variance.
d.
Compute the standard deviation.
(a)
47
(b)
20
(c)
223.25
(d)
14.94
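The four measures of spread can be verified as follows. Two assumptions are worth flagging: the quartiles use the "inclusive" convention, and the variance is the sample variance (dividing by n - 1), which is what `statistics.variance` computes and what matches the answer of 223.25.

```python
import statistics

prices = [110, 125, 99, 115, 119, 95, 110, 132, 85]

data_range = max(prices) - min(prices)

# Interquartile range under the "inclusive" quartile convention (assumed).
q1, _, q3 = statistics.quantiles(prices, n=4, method="inclusive")
iqr = q3 - q1

# Sample variance and standard deviation (divisor n - 1).
variance = statistics.variance(prices)
std_dev = statistics.stdev(prices)

print(f"range = {data_range}, IQR = {iqr}")
print(f"variance = {variance:.2f}, s.d. = {std_dev:.2f}")
```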
10.
The data given below shows the average number of customers per day purchasing a promotional product in a health food store in Guelph, Ontario, over 13 consecutive weeks.
10.5  27.0  21.5  15.0  19.0  20.0  12.5  22.5  18.5  19.0  17.0  20.0  21.0
a.
Provide the five-number summary.
b.
Compute the value of the upper fence.
c.
Compute the value of the lower fence.
d.
Show the boxplot.
a.
Max                  27.0
Upper Quartile, Q3   21.0
Median               19.0
Lower Quartile, Q1   17.0
Min                  10.5
b.
27.0
c.
11.0
d.
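The fences come from the usual boxplot rule: upper fence = Q3 + 1.5 × IQR and lower fence = Q1 - 1.5 × IQR. The sketch below reproduces the five-number summary and both fences, again assuming the "inclusive" quartile convention because it matches the values given.

```python
import statistics

customers = [10.5, 27.0, 21.5, 15.0, 19.0, 20.0, 12.5,
             22.5, 18.5, 19.0, 17.0, 20.0, 21.0]

# Five-number summary (quartile convention is an assumption).
q1, median, q3 = statistics.quantiles(customers, n=4, method="inclusive")
five_number = (min(customers), q1, median, q3, max(customers))

# Boxplot fences at 1.5 interquartile ranges beyond the quartiles.
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Points strictly outside the fences would be drawn individually.
outside = [x for x in customers if x < lower_fence or x > upper_fence]

print("five-number summary:", five_number)
print(f"lower fence = {lower_fence}, upper fence = {upper_fence}")
print("outside the fences:", outside)
```

On these numbers the minimum, 10.5, sits just below the lower fence, while the maximum, 27.0, sits exactly on the upper fence rather than beyond it.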
11.
The number of orders received per day at a clothing web site in British Columbia has a mean of 1175 and a standard deviation of 250.
a.
One day the number of clothing orders received is 1500. Calculate its standardized value (its z-score).
b.
What does its z-score tell us?
c.
Another day the number of clothing orders received is 950. Calculate its standardized value (its z-score).
d.
What does its z-score tell us?
a.
z = 1.3
b.
This observation is 1.3 standard deviations above the mean.
c.
z = -0.9
d.
This observation is 0.9 standard deviations below the mean.
12.
The number of orders received per day at a jewellery web site in Nova Scotia has a mean of 52 and a standard deviation of 15.
a.
One day the number of jewellery orders received is 97. Calculate its standardized value (its z-score).
b.
What does the z-score tell us?
c.
Which value is more extreme, 97 jewellery orders or 1500 clothing orders (from Question 11a above)? Explain.
d.
What number of jewellery orders would be the same number of standard deviations below its mean as 950 clothing orders (from Question 11c above)?
a.
z = 3
b.
This observation is 3 standard deviations above the mean.
c.
The 97 jewellery orders are more extreme: z = 3 is further from zero than z = 1.3, so that day is more standard deviations from its mean.
d.
38.5
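The z-score calculations in Questions 11 and 12 all use z = (value - mean) / s.d., and part d runs the same formula backwards: value = mean + z × s.d. A minimal sketch; the function name `z_score` is made up for this illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Clothing web site: mean 1175, s.d. 250.
z_clothing_high = z_score(1500, 1175, 250)
z_clothing_low = z_score(950, 1175, 250)

# Jewellery web site: mean 52, s.d. 15.
z_jewellery = z_score(97, 52, 15)

# The more extreme day is the one with the larger absolute z-score.
jewellery_more_extreme = abs(z_jewellery) > abs(z_clothing_high)

# Jewellery orders lying as far below the mean as 950 clothing orders.
matching_jewellery_orders = 52 + z_clothing_low * 15

print(f"clothing: z = {z_clothing_high:.1f} and z = {z_clothing_low:.1f}")
print(f"jewellery: z = {z_jewellery:.1f}")
print(f"matching jewellery orders: {matching_jewellery_orders:.1f}")
```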
13.
The following data represent quarterly profits (in thousands of dollars) for a small clean-tech start up business in Ottawa over a three year period.
-20  -10  -15  10  12  15  12  22  22  28  25  38
a.
Construct a stem-and-leaf display.
b.
Construct a time series plot.
c.
Are the data stationary? Explain.
d.
Which display (stem and leaf or time series plot) is preferable for these data? Why?
a.
-2 | 0
-1 | 0 5
 0 |
 1 | 0 2 2 5
 2 | 2 2 5 8
 3 | 8
b.
c.
No. There is an increasing trend.
d.
The time series plot is preferable because the stem-and-leaf display cannot capture the trend.