Introduction to Probability and Statistics: Assignment 1 Insights

STAT1201_Sherlow_Assignment1_21Sept2023
Thompson Rivers University
STAT 1201 Assignment #1, Unit 1
Ashley Sherlow
September 21, 2023
Student Number: T00745766

STAT 1201 - Introduction to Probability and Statistics

1. Chapter 1: For the study summarized in the abstract below:

a. Identify the "5 W's plus H."
Who: Individuals with type 2 diabetes
What:
- Ambulatory blood pressure (BP)
- Heart rate
- Vascular function
- Oxidative stress
- Dosage (RRR-alpha-tocopherol, mixed tocopherols, or placebo)
Why: To investigate the effect of supplementation with either alpha-tocopherol or mixed tocopherols on BP, pulse pressure, and heart rate
Where: Royal Perth Hospital, Perth, Australia
When: Published January 25, 2007
How: A randomized, double-blind, placebo-controlled trial; data were collected by measuring BP, pulse pressure, and heart rate.

b. Name each variable and identify whether it is categorical or quantitative. For each quantitative variable, identify the units used.

Variable            Categorical or Quantitative?
Ambulatory BP       Quantitative, mmHg
Heart rate          Quantitative, bpm
Vascular function   Quantitative, pulse pressure in mmHg
Oxidative stress    Unclear how this was measured; best guess: categorical, as no measurement units are given
Dosage              Categorical

2. Chapter 2 - The Great One
Who - Gretzky
What - Number of games played each season
Why - Unknown; likely to spot trends in the number of games played each season
When - Over the span of Wayne Gretzky's career
Where - NHL

a. (Figure not reproduced.)

b. The distribution of the number of games Gretzky played each season is unimodal, with its mode toward the top end of the data, and skewed to the low end. There are noteworthy outliers: two seasons in which he played between 45 and 50 games, and one season in which he played 64 games, with a gap between each of these outliers and the more densely grouped data.

c. I would expect the median to be higher than the mean, since the median is resistant to outliers. The mean, by contrast, is pulled toward the long tail and the outliers, which in this case are at the low end.

d. I would use the median, as it is the more appropriate summary of center for a skewed distribution.

e. In my mind, the most unusual feature is the bar at the very low end of the distribution: 2 seasons in which Gretzky played between 45 and 49 games (the x-axis value at the left edge of a bar is included in that bin, while the value at the right edge is not). While there is another unusual value at 64 games (one season), it is closer to the mode and therefore not as drastic a deviation from the median. Some further investigation would be
required; these may have been years in Gretzky's career that were abnormal in terms of the number of games played (e.g., individual circumstances such as an injury, or circumstances that affected how many games the whole team could play).

3. Chapter 2

a.
Mean
ȳ = Σy / n = (10 + 12 + 14 + 16 + 20) / 5 = 72 / 5 = 14.4

Median
Position (n + 1) / 2 = 3, so the median is the 3rd value = 14

Standard Deviation
SD = √[Σ(y - ȳ)² / (n - 1)]
   = √[((10 - 14.4)² + ... + (20 - 14.4)²) / 4]
   = √(59.2 / 4) = √14.8 ≈ 3.85

IQR
Median = 14, Q1 = 11, Q3 = 18
IQR = Q3 - Q1 = 18 - 11 = 7

b.
Mean
ȳ = Σy / n = (3 + 10 + 12 + 14 + 16 + 20) / 6 = 75 / 6 = 12.5

Median
With n = 6 (even), the median is the average of the values in positions n / 2 = 3 and n / 2 + 1 = 4.
3rd value = 12, 4th value = 14, so the median = (12 + 14) / 2 = 13

Standard Deviation
SD = √[Σ(y - ȳ)² / (n - 1)]
   = √[((3 - 12.5)² + ... + (20 - 12.5)²) / 5]
   = √(167.5 / 5) = √33.5 ≈ 5.79

IQR
Q1 = 10, Q3 = 16
IQR = Q3 - Q1 = 16 - 10 = 6

Interestingly, in the second set of calculations, one of the two middle values (there are two because the number of values is even) is the original median of 14, while the other, 12, is lower, so the median drops from 14 to 13. Unsurprisingly, the standard deviation changed quite a bit from the first round of calculations (from 3.85 to 5.79), which makes sense given the addition of an outlying low value: 3. Neither the mean (14.4 to 12.5) nor the IQR (7 to 6) changed dramatically between the two calculations.

c.
Mean
ȳ = Σy / n = (20 + 24 + 28 + 32 + 40) / 5 = 144 / 5 = 28.8

Median
Position (n + 1) / 2 = 3, so the median is the 3rd value = 28

Standard Deviation
SD = √[Σ(y - ȳ)² / (n - 1)]
   = √[((20 - 28.8)² + ... + (40 - 28.8)²) / 4]
   = √(236.8 / 4) = √59.2 ≈ 7.69

IQR
Median = 28, Q1 = 22, Q3 = 36
IQR = Q3 - Q1 = 36 - 22 = 14

Cool! Each of these values is exactly double the corresponding value in a, because every data value was doubled.

4. Chapter 2: Exercise 48
a. Stem-and-leaf plot
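A stem-and-leaf plot splits each value into a stem (every digit except the last) and a leaf (the units digit). As a minimal Python sketch of that construction, the snippet below rebuilds the plot from the 20 values read back off the plot that follows; `stem_and_leaf` is a hypothetical helper name used only for illustration, not a library function.

```python
from collections import defaultdict

# Values read back off the stem-and-leaf plot in this answer (n = 20).
values = [82, 83, 86, 88, 97, 98, 101, 101, 105, 106,
          118, 124, 126, 128, 131, 133, 136, 150, 166, 184]


def stem_and_leaf(data):
    """Return the plot as a list of text rows (hypothetical helper).

    Stem = value // 10 (hundreds and tens digits), leaf = value % 10 (units digit).
    Empty stems between the smallest and largest stem are kept so gaps stay visible.
    """
    leaves = defaultdict(list)
    for v in sorted(data):
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))
    rows = []
    for stem in range(lo, hi + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(f"{stem:>{width}} | {leaf_text}".rstrip())
    return rows


for row in stem_and_leaf(values):
    print(row)
```

Keeping the empty stems (14 and 17) is a deliberate choice: it preserves the gaps that make 150, 166, and 184 stand apart from the rest of the data.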

 8 | 2 3 6 8
 9 | 7 8
10 | 1 1 5 6
11 | 8
12 | 4 6 8
13 | 1 3 6
14 |
15 | 0
16 | 6
17 |
18 | 4
Key: 18 | 4 = 184

b. First, note that the vertical line separating the stems from the leaves marks the split between the tens and units digits: the stem holds the hundreds and tens digits and the leaf holds the units digit, so 18 | 4 represents 184. The distribution of values is bimodal, with two modes relatively close together toward the lower end of the dataset: one at the 8 stem (the 80s) and the other at the 10 stem (100 to 109). The distribution is skewed to the high end. There appear to be 3 outliers: the most outlying value is 184, with two others closer to the modes at 150 and 166. Therefore, the distribution is not symmetrical, uniform, or unimodal.

5. Chapter 3: Exercise 38

                          Driver
Origin      Statistic     Student   Staff   Total
American    Count           107      105     212
            % of column    54.9%    64.0%   59.1%
European    Count            33       12      45
            % of column    16.9%     7.3%   12.5%
Asian       Count            55       47     102
            % of column    28.2%    28.7%   28.4%
Total       Count           195      164     359

a. 40.9%
b. 50.5%
c. 54.9%
d. American 59.1% | European 12.5% | Asian 28.4%
e. Each cell as a percentage of all 359 cars:

            Driver
Origin      Student   Staff
American     29.8%    29.2%
European      9.2%     3.3%
Asian        15.3%    13.1%

f. While the percentages for European cars differ more between student and staff drivers than the conditional distributions for American and Asian cars do, the total sample of European cars is relatively small (45 cars). For this reason, it is likely that the origin of the car is independent of the type of driver. One may want to expand the sample size in order to confirm this.
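The column percentages in the table and the distributions in parts d and e all come from the same six counts. As a minimal Python sketch (the dictionary layout and the `pct` helper are illustrative choices, not part of the exercise), they can be recomputed and rounded to one decimal place:

```python
# Counts from the Exercise 38 table: origin -> (student, staff)
counts = {
    "American": (107, 105),
    "European": (33, 12),
    "Asian": (55, 47),
}

col_totals = [sum(row[j] for row in counts.values()) for j in range(2)]  # [195, 164]
grand_total = sum(col_totals)  # 359


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Conditional distribution of origin within each driver type (% of column).
column_pct = {o: [pct(c, col_totals[j]) for j, c in enumerate(row)]
              for o, row in counts.items()}

# Each cell as a % of all cars (the part e table).
joint_pct = {o: [pct(c, grand_total) for c in row] for o, row in counts.items()}

# Marginal distribution of origin (part d).
marginal_pct = {o: pct(sum(row), grand_total) for o, row in counts.items()}

for origin in counts:
    print(origin, column_pct[origin], joint_pct[origin], marginal_pct[origin])
```

Printing the column percentages side by side is also a quick way to compare the conditional distributions for student and staff drivers that part f discusses.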

6. Chapter 4: Exercise 20

a. In the histogram, the distribution of strength ratings is symmetrical and bimodal, with two adjacent modes at ratings from 2-3 and 3-4. There are no outliers. In the boxplot, the distribution for the biceps surgery is skewed to the low end and is therefore not symmetrical, but it is clear that this surgery produces higher strength ratings in the years following. Notably, this boxplot appears to have no median line, meaning that the median is equal to either the lower or the upper quartile. Without a table of data and with only the histogram, my guess would be that the median coincides with the lower quartile. There are no apparent outliers beyond the min and max lines. Conversely, the deltoid results show no whiskers reaching the min and max, indicating that these far outliers lie more than 3 IQRs beyond the nearest quartile on either end. The median line in the fairly short box (i.e., a small IQR) indicates that most of the data collected falls within a narrow window, with just a few outliers.

b. Overall, the range is 4 (5 - 1 = 4 rating points). For biceps, the range is 2.10 (4.10 - 2.00 = 2.10 rating points). For deltoids, the range is 2 (3.00 - 1.00 = 2 rating points).

c. From the boxplot, it's clear that there are distinct differences in strength ratings between the two surgeries, with the biceps surgery being more successful. In the histogram this is hidden, since the ratings are not split by surgery type and the higher biceps scores offset the lower deltoid scores. The difference can be seen visually or by comparing the median of each group.

d. As mentioned in earlier responses, the boxplot for the biceps surgery appears to have no median line, meaning that the median is equal to either the lower or the upper quartile.
At worst, the median would then equal the lower quartile, which is still higher than the median of the deltoid surgery.

e. Apart from one far outlier (which should be investigated further), the biceps surgery outperformed the deltoid surgery almost every time. There is only a small overlap between the minimum for biceps and the median for deltoids.

f. While the biceps surgery performed better overall, its box is far taller, indicating a wider spread of results in the middle half. The more extreme minimum and maximum values also reflect this. Conversely, the deltoid box is far shorter, indicating less variability among the values collected for this group. This, however, is offset by the far outliers, which, again, need to be explored further before they can be properly taken into account. Interestingly, while confidence in a good outcome would be higher overall for the biceps surgery, the deltoid surgery produces more consistent results.

7. Chapter 4

a. Max: 0:31:33
Q3: 0:31:21
Median: 0:30:56
Q1: 0:29:53
Min: 0:27:38
Number of values: 14
Cases: Runners in the 2019 VPG Realty Grouse Grind Mountain Run

b. (Figure not reproduced.)

c. (Figure not reproduced.)
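Because the five-number summary is in h:mm:ss, spreads are easiest to compare in seconds. The Python sketch below converts the summary, computes the IQR, and applies the common 1.5 × IQR fence rule for flagging boxplot outliers; `to_seconds` and `to_hms` are illustrative helper names, not library functions. Under that rule the lower fence works out to 0:27:41, so the minimum of 0:27:38 sits just 3 seconds below it, a borderline case rather than a clear outlier.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def to_hms(seconds):
    """Convert a number of seconds back to an 'h:mm:ss' string."""
    seconds = round(seconds)
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


summary = {
    "min": "0:27:38",
    "q1": "0:29:53",
    "median": "0:30:56",
    "q3": "0:31:21",
    "max": "0:31:33",
}
secs = {k: to_seconds(v) for k, v in summary.items()}

iqr = secs["q3"] - secs["q1"]          # 88 seconds
lower_fence = secs["q1"] - 1.5 * iqr   # 1661 s
upper_fence = secs["q3"] + 1.5 * iqr   # 2013 s

print("IQR:", iqr, "s")
print("Fences:", to_hms(lower_fence), "to", to_hms(upper_fence))
print("Min below lower fence?", secs["min"] < lower_fence)
print("Median - Q1:", secs["median"] - secs["q1"], "s;",
      "Q3 - median:", secs["q3"] - secs["median"], "s")
```

The last line also shows where the median sits inside the box: 63 seconds above Q1 and 25 seconds below Q3.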

d. The distribution of race times is unimodal, with a mode in the 31-minute bin, and skewed to the low end. There are no outliers of note. On the boxplot, the median is closer to the upper quartile (0:30:56 is 25 seconds below Q3 but 63 seconds above Q1), which is consistent with the skew to the low end.

8. Chapter 5

a. Mean (M) = 10, SD = 2.5

i. Lower = 8, Upper = 12
Using a calculator's normalcdf function, the proportion of response times within this range is 57.6%.

ii. Lower = 0, Upper = 9
Because no lower limit is specified, and response times cannot be negative, I used 0 as the lower limit. Using a calculator's normalcdf function, the proportion of response times within this range is 34.5%.

iii. Using a calculator: invNorm(0.8, 10, 2.5) = 12.1 minutes

iv. IQR = 75th percentile (Q3) - 25th percentile (Q1)
I used the calculator's invNorm function to find Q3 and Q1, then calculated the IQR from those values:
IQR = 11.686 - 8.314 ≈ 3.37
(Subtracting the quartiles after rounding them to 11.69 and 8.31 gives 3.38.)

b. Mean (M) = 8; 95% of response times were at most 12 minutes.
Using the 68-95-99.7 Rule, 12 minutes = 2 standard deviations (SD) above the mean, since 95% of values fall within 2 SD of the mean:
12 = M + 2SD
12 = 8 + 2SD
12 - 8 = 2SD
SD = 4 / 2 = 2
Note that the 68-95-99.7 Rule describes the middle 95% (2 SD on either side of the mean), which would put 97.5% of times below M + 2SD. Because 95% were at most 12 minutes, 12 minutes is more precisely the 95th percentile, at z ≈ 1.645, which gives SD = 4 / 1.645 ≈ 2.43.
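The calculator steps in this question map directly onto Python's standard-library `statistics.NormalDist` (Python 3.8+): `cdf` plays the role of normalcdf and `inv_cdf` the role of invNorm. The sketch below repeats each part of a under the same Normal model; for part b it assumes the one-sided reading in which 12 minutes is the 95th percentile of a Normal model with mean 8.

```python
from statistics import NormalDist

# Part a: response times modeled as Normal(mean = 10, SD = 2.5), in minutes.
times = NormalDist(mu=10, sigma=2.5)

p_i = times.cdf(12) - times.cdf(8)                # i.   like normalcdf(8, 12, 10, 2.5)
p_ii = times.cdf(9) - times.cdf(0)                # ii.  lower limit 0, as in the answer
p80 = times.inv_cdf(0.80)                         # iii. like invNorm(0.8, 10, 2.5)
iqr = times.inv_cdf(0.75) - times.inv_cdf(0.25)   # iv.  Q3 - Q1

# Part b (assumed one-sided reading): 12 minutes is the 95th percentile, mean 8.
z95 = NormalDist().inv_cdf(0.95)                  # about 1.645
sd_b = (12 - 8) / z95

print(f"i.   {p_i:.3f}")
print(f"ii.  {p_ii:.3f}")
print(f"iii. {p80:.2f} minutes")
print(f"iv.  IQR = {iqr:.2f}")
print(f"b.   SD = {sd_b:.2f} (vs. 2 from the 68-95-99.7 Rule)")
```

Working at full precision here avoids the small rounding drift seen when already-rounded quartiles are subtracted in part iv.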
