STAT1201_Sherlow_Assignment1_21Sept2023
Thompson Rivers University STAT 1201 Assignment #1
Unit 1
Ashley Sherlow
September 21, 2023
Student Number: T00745766
STAT 1201 - Introduction to Probability and Statistics
1. Chapter 1: For the study summarized in the abstract below:
a. Identify the “5 W’s plus H.”
Who: Individuals with type 2 diabetes
What:
- Ambulatory blood pressure (BP)
- Heart rate
- Vascular function
- Oxidative stress
- Dosage (RRR-alpha-tocopherol, mixed tocopherols, placebo)
Why: To investigate the effect of supplementation with either alpha- or mixed tocopherols on BP, pulse pressure, and heart rate
Where: Royal Perth Hospital, Perth, Australia
When: Published Jan 25, 2007
How: Randomized, double-blind, placebo-controlled trial; data collected through measurement of BP, pulse pressure, and heart rate.
b. Name each variable and identify whether it is categorical or quantitative. For each quantitative variable identify the units used.
Variable | Categorical or Quantitative?
Ambulatory BP | Quantitative, mmHg
Heart rate | Quantitative, bpm
Vascular function | Quantitative, pulse pressure in mmHg
Oxidative stress | Unclear how this was measured. Best guess: categorical, as there are no measurement units
Dosage | Categorical
2. Chapter 2 - The Great One
Who - Gretzky
What - Number of games played each season
Why - Unknown, likely to spot trends in the number of games played in each season
When - Over the span of Wayne Gretzky’s career
Where - NHL
a.
b. The distribution of the number of games Gretzky played each season is unimodal, with a mode toward the top end of the data, and skewed to the low end. There is a noteworthy outlier representing two seasons in which he played between 45 and 50 games, and another outlier representing a season in which he played 64 games; gaps separate both outliers from the more densely distributed data.
c. I would expect the median to be higher than the mean, since it is less affected by outliers. Means are more affected by skewness and outliers, which in this case pull the mean toward the low end.
d. I would use the median in this case, as it is more appropriate for summarizing skewed distributions.
e. In my mind, the most unusual feature is the value at the very low end of the distribution: 2 seasons in which Gretzky played between 45 and 49 games (the x-axis value at the left border of a bar is included in the bin, while the value at the right border is not). While there is another value at one season of 64 games, it is closer to the mode and therefore not as drastic a deviation from the median. Some further investigation would be required; this may indicate some years in Gretzky’s career that were abnormal in terms of the number of games played (e.g. individual circumstances such as an injury, or circumstances that affected how many games the whole team could play).
3. Chapter 2
a. Mean
ȳ = Σy / n
ȳ = (10 + 12 + 14 + 16 + 20) / 5 = 72 / 5 = 14.4
Median
Position: (n + 1) / 2 = 3
3rd value = 14
Standard Deviation
SD = √[Σ(y - ȳ)² / (n - 1)]
= √[((10 - 14.4)² + ... + (20 - 14.4)²) / 4]
= √[59.2 / 4]
= √14.8
= 3.85
IQR
Median = 14
Q1: 11
Q3: 18
IQR = Q3 - Q1 = 18 - 11 = 7
Interquartile Range: 7
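The hand calculations for this data set can be cross-checked with a short Python sketch. It assumes the quartile convention the answer appears to use (quartiles taken as the medians of the lower and upper halves of the sorted data, leaving out the middle value when n is odd). The helper name `summarize` is illustrative and not part of the assignment.

```python
import statistics

def summarize(values):
    """Return mean, median, sample SD, Q1, Q3 and IQR for a list of numbers.

    Assumption: quartiles are the medians of the lower and upper halves of
    the sorted data, excluding the middle value when n is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample SD, divides by n - 1
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

part_a = summarize([10, 12, 14, 16, 20])
print(part_a)
```

Other software may use a different quartile convention, so Q1, Q3, and the IQR can differ slightly from tool to tool even when the mean, median, and standard deviation agree.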
b. Mean
ȳ = Σy / n
ȳ = (3 + 10 + 12 + 14 + 16 + 20) / 6 = 75 / 6 = 12.5
Median
n / 2 = 3
n / 2 + 1 = 4
3rd value = 12
4th value = 14
Median = (12 + 14) / 2 = 13
Standard Deviation
SD = √[Σ(y - ȳ)² / (n - 1)]
= √[((3 - 12.5)² + ... + (20 - 12.5)²) / 5]
= √[167.5 / 5]
= √33.5
= 5.79
IQR
Q1: 10
Q3: 16
IQR = Q3 - Q1 = 16 - 10 = 6
Interquartile Range: 6
Interestingly, in the second set of calculations, one of the two middle values (there are two because the number of values is even) stayed the same at 14, while the other, 12, is lower, so the median falls from 14 to 13. Unsurprisingly, the standard deviation changed quite a bit from the first round of calculations, which makes sense given the addition of an outlying, lower number: 3. The mean and the IQR changed less dramatically between the two calculations.
c. Mean
ȳ = Σy / n
ȳ = (20 + 24 + 28 + 32 + 40) / 5 = 144 / 5 = 28.8
Median
Position: (n + 1) / 2 = 3
3rd value = 28
Standard Deviation
SD = √[Σ(y - ȳ)² / (n - 1)]
= √[((20 - 28.8)² + ... + (40 - 28.8)²) / 4]
= √[236.8 / 4]
= √59.2
= 7.69
IQR
Median = 28
Q1: 22
Q3: 36
IQR = Q3 - Q1 = 36 - 22 = 14
Interquartile Range: 14
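One way to see how this data set relates to the one in part a. is to compute both sets of summaries side by side. The sketch below uses Python's `statistics` module. The variable and function names are illustrative, and `statistics.quantiles` with `method="exclusive"` is assumed to match the hand-calculated quartiles for these five-value data sets (it need not match for other sample sizes).

```python
import math
import statistics

original = [10, 12, 14, 16, 20]
doubled = [2 * y for y in original]  # 20, 24, 28, 32, 40

def centre_and_spread(values):
    """Mean, median, sample SD and IQR of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
        q3 - q1,
    )

stats_a = centre_and_spread(original)
stats_c = centre_and_spread(doubled)

# Ratio of each summary of the doubled data to the original summary.
ratios = [c / a for a, c in zip(stats_a, stats_c)]
all_doubled = all(math.isclose(r, 2.0) for r in ratios)
print(ratios, all_doubled)
```

Multiplying every value by a constant rescales the mean, median, standard deviation, and IQR by the same constant, which is what the ratios illustrate.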
Cool! These values are exactly double those in a.
4. Chapter 2: Exercise 48
a. Stem and leaf plot
8 | 2, 3, 6, 8
9 | 7, 8
10 | 1, 1, 5, 6
11 | 8
12 | 4, 6, 8
13 | 1, 3, 6
14 |
15 | 0
16 | 6
17 |
18 | 4
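A stem-and-leaf display like this one can also be generated from the raw values. The sketch below rebuilds the values from the plot itself, assuming the stem holds the tens and the leaf holds the units (consistent with the values 150, 166, and 184 named in part b.). `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# Values read back from the plot, assuming stem = tens and leaf = units.
values = [82, 83, 86, 88, 97, 98, 101, 101, 105, 106, 118,
          124, 126, 128, 131, 133, 136, 150, 166, 184]

def stem_and_leaf(data):
    """Return display lines, one per stem, including stems with no leaves."""
    leaves = defaultdict(list)
    for v in sorted(data):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = ", ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

plot_lines = stem_and_leaf(values)
print("\n".join(plot_lines))
```

Keeping the empty stems (14 and 17) in the output matters, because the gaps are what make the high values stand out.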
b. First thing to note is that the line separating the left column from the right column of the stem-and-leaf plot separates the tens (stem) from the units digit (leaf), so 8 | 2 represents 82 and 18 | 4 represents 184. The distribution of values is bimodal, with two modes relatively close together toward the lower end of the dataset: one in the “8-” bucket and the other in the “10-” bucket. The distribution is skewed to the high end. There are 3 outliers, the most extreme at 184 and two others closer to the modes at 150 and 166. Therefore, the distribution is not symmetrical, uniform, or unimodal.
5. Chapter 3: Exercise 38
Origin | | Driver: Student | Driver: Staff | Total
American | Count | 107 | 105 | 212
American | % of Column | 54.9% | 64.00% | 59.05%
Euro | Count | 33 | 12 | 45
Euro | % of Column | 16.9% | 7.30% | 12.53%
Asian | Count | 55 | 47 | 102
Asian | % of Column | 28.2% | 28.70% | 28.41%
Total | Count | 195 | 164 | 359
a. 40.9%
b. 50.5%
c. 54.9%
d. American 59.1% | European 12.5% | Asian 28.4%
e.
Origin | Driver: Student | Driver: Staff
American | 29.8% | 29.25%
Euro | 9.2% | 3.34%
Asian | 15.3% | 13.09%
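The percentages reported for this exercise can be reproduced from the counts. The sketch below is one way to do it in plain Python; the dictionary layout and variable names are illustrative assumptions, not part of the exercise.

```python
# Counts from the exercise: car origin by driver type.
counts = {
    "American": {"Student": 107, "Staff": 105},
    "Euro": {"Student": 33, "Staff": 12},
    "Asian": {"Student": 55, "Staff": 47},
}

drivers = ["Student", "Staff"]
column_totals = {d: sum(row[d] for row in counts.values()) for d in drivers}
grand_total = sum(column_totals.values())

# Conditional distribution of origin within each driver type (% of column).
column_pct = {
    origin: {d: 100 * row[d] / column_totals[d] for d in drivers}
    for origin, row in counts.items()
}

# Marginal distribution of origin (% of all cars).
marginal_pct = {
    origin: 100 * sum(row.values()) / grand_total
    for origin, row in counts.items()
}

# Each cell as a percentage of the grand total.
total_pct = {
    origin: {d: 100 * row[d] / grand_total for d in drivers}
    for origin, row in counts.items()
}

for origin in counts:
    print(origin, column_pct[origin], round(marginal_pct[origin], 1))
```

Comparing the Student and Staff entries of `column_pct` for each origin is the usual way to judge whether origin and driver type look independent.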
f. While the data for European cars does seem to differ more between Student and Staff driver classifications, compared to the conditional distributions for American and Asian cars, the sample size for European cars in total is relatively small. For this reason, it is likely that the origin of the car is independent of the type of driver. One may want to expand the sample size in
order to confirm.
6. Chapter 4: Exercise 20
a. In the histogram, the distribution of strength ratings is symmetrical and bimodal, with two adjacent modes at ratings from 2-3 and 3-4. There are no outliers. In the boxplot, the distribution for the biceps surgery is skewed to the low end and is, therefore, not symmetrical, but it is clear that this surgery produces higher strength ratings in the years following. Notably, this box plot appears to have no median line, meaning that the median is equal to either the lower or upper quartile. Without a table of data and just the histogram, my guess would be that the median is the line on the lower quartile side. There are no apparent outliers beyond the min and max lines. Conversely, the deltoid results do not show whiskers with the min and max lines, indicating that these far outliers are more than 3 IQRs beyond the nearest quartile on either end. The median line in the fairly short box (therefore small IQR) indicates that the majority of the data collected falls in a small window, with just a few outliers.
b. Overall, the range is 4 (5 - 1 = 4 strength rating range).
For Biceps, the range is 2.10 (4.10 - 2.00 = 2.10 strength rating range).
For Deltoids, the range is 2 (3.00 - 1.00 = 2 strength rating range).
c. From the boxplot, it’s clear that there are distinct differences in strength ratings between the two surgeries, with the biceps surgery being more successful. In the histogram, this fact is hidden since the higher scores compensate for the lower scores from the deltoid surgery and the scores themselves are not grouped into the two surgery types. This can be seen just visually or by looking at the median of each group.
d. As mentioned in earlier responses, the box plot for the biceps surgery appears to have no median line, meaning that the median is equal to either the lower or upper quartile. At worst, the median would then be equal to the lower quartile, which is still higher than the median of the deltoid surgery.
e. Besides one far outlier (that should be investigated further), the biceps surgery outperformed the deltoid surgery almost every time. There is a bit of overlap between the minimum denoted for biceps and the median on the deltoid.
f. While the biceps surgery performed better overall, the height of the boxplot for biceps is far greater, indicating a wider range of results in the middle half. The more extreme minimum and maximum values also mirror this. Conversely, the deltoid surgery box is far shorter, indicating less variance between the values collected overall for this group. This, however, is negated by far outliers but, again, these need to be explored further in order to truly take these into account. Interesting that while the confidence in the success rate for biceps would be higher overall, more consistent results come from the deltoid surgery.
7. Chapter 4
a. Max: 0:31:33
Q3: 0:31:21
Median: 0:30:56
Q1: 0:29:53
Min: 0:27:38
# of Values: 14
Cases: Runners in 2019 VPG Realty Grouse Grind Mountain Run
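To compare the spacing of these summary values, the times can be converted to seconds. This is a small sketch that works only from the five values listed above (the raw race times are not reproduced here); `to_seconds` is an illustrative helper name.

```python
def to_seconds(clock):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds

summary = {
    "min": "0:27:38",
    "q1": "0:29:53",
    "median": "0:30:56",
    "q3": "0:31:21",
    "max": "0:31:33",
}
secs = {name: to_seconds(value) for name, value in summary.items()}

iqr = secs["q3"] - secs["q1"]
lower_gap = secs["median"] - secs["q1"]  # distance from Q1 to the median
upper_gap = secs["q3"] - secs["median"]  # distance from the median to Q3
print(iqr, lower_gap, upper_gap)
```

The two gaps show on which side of the box the median sits, which is a quick numeric check on the direction of skew read from the plots.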
b. c.
d. The distribution of race times is unimodal, with a mode in the 31-minute bin, and skewed to the low end. There are no outliers of note. On the boxplot, the median is found closer to the upper quartile (Q3 is 25 seconds above the median, while Q1 is 63 seconds below it).
8. Chapter 5
a. M = 10, SD = 2.5
i. Lower = 8, Upper = 12
Using a calculator normalcdf function, the proportion of response times within the range given is 57.6%.
ii. Lower = 0, Upper = 9
Because there is no lower limit specified, I used 0 as the lower limit. Using a calculator normalcdf function, the proportion of response times within the range given is 34.5%.
iii. Using a calculator: invNorm(0.8, 10, 2.5) = 12.1 minutes
iv. IQR = 75th percentile (Q3) - 25th percentile (Q1)
Used a calculator invNorm function to find the values for both Q3 and Q1 and then calculated IQR using those values.
IQR = 11.69 - 8.31 = 3.38
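The calculator steps in part a. can be mirrored in Python as a cross-check. This sketch uses `statistics.NormalDist` from the standard library in place of the calculator's normalcdf and invNorm functions; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal model from the question: mean 10 minutes, SD 2.5 minutes.
model = NormalDist(mu=10, sigma=2.5)

# i. Proportion of response times between 8 and 12 minutes.
p_between = model.cdf(12) - model.cdf(8)

# ii. Proportion below 9 minutes (cdf needs no explicit lower limit).
p_below_9 = model.cdf(9)

# iii. 80th percentile of response times.
p80 = model.inv_cdf(0.80)

# iv. Interquartile range from the 25th and 75th percentiles.
q1 = model.inv_cdf(0.25)
q3 = model.inv_cdf(0.75)
iqr = q3 - q1

print(round(p_between, 3), round(p_below_9, 3), round(p80, 1), round(iqr, 2))
```

Working from unrounded quartiles gives an IQR of about 3.37; subtracting quartiles that were first rounded to two decimals gives 3.38, so the two agree up to rounding.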
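b. Mean (M) = 8
95% were at most 12 minutes
Using the 68 - 95 - 99.7 Rule as an approximation, 12 minutes is taken to be 2 standard deviations (SD) above the mean:
12 = M + 2SD
12 = 8 + 2SD
12 - 8 = 2SD
4 / 2 = SD
SD = 2
Strictly, the rule places 95% of values within 2 SD on both sides of the mean, whereas "at most 12 minutes" is a one-sided statement. The 95th percentile of a Normal model lies about 1.645 SD above the mean, which would give SD = 4 / 1.645 ≈ 2.43.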
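The step from "95% were at most 12 minutes" to a standard deviation can be worked two ways, and a short sketch makes the difference visible. The first follows the 68-95-99.7 shortcut; the second, offered as an alternative reading, treats 12 minutes as the 95th percentile of the Normal model. Variable names are illustrative.

```python
from statistics import NormalDist

mean = 8.0
cutoff = 12.0  # 95% of response times are at most this many minutes

# Rule-of-thumb version: treat the cutoff as 2 SD above the mean.
sd_rule = (cutoff - mean) / 2

# Percentile version: use the z-score of the 95th percentile instead.
z95 = NormalDist().inv_cdf(0.95)
sd_percentile = (cutoff - mean) / z95

print(sd_rule, round(z95, 3), round(sd_percentile, 2))
```

The two answers differ because the rule describes the central 95% of values (leaving 2.5% in each tail), while a 95th percentile leaves 5% in the upper tail only.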