STAT1201_Sherlow_Assignment2_4October2023 FEEDBACK

STAT1201_Sherlow_Assignment1_4Oct2023
Thompson Rivers University
STAT 1201 Assignment #2
Unit 2
Ashley Sherlow
October 4, 2023
Student Number: T00745766
STAT 1201 - Introduction to Probability and Statistics

1. Chapter 6: Exercise 32
a. A correlation can never be greater than 1 (or less than -1), so the reported value can only come from a calculation error. Outliers or a failure to check the three conditions for correlation (quantitative variables, straight enough, no outliers) can make r misleading, but they cannot push it above 1.
b. The statement makes a bold claim that the correlation shows a causal link between literacy rate and standard of living. Correlation alone cannot establish causation, so the student should restrict the statement to describing the strength and direction of the linear association.

2. Chapter 6: Exercise 52
a. [Scatterplot: Vehicle Static Weight vs. Vehicle Weight-in-Motion]
b. The Vehicle Static Weight vs. Vehicle Weight-in-Motion plot shows little scatter around a generally straight form that trends upward (positively), which indicates that the linear association between the two measurements is consistent and strong.
c. Because the plot is generally linear, the weight-in-motion scale appears to be reasonably accurate compared with the static scale (which is assumed to be correct). The weight-in-motion scale tends to read slightly higher than the static weight, but not for every vehicle. One outlying point is worth investigating to understand what may have affected either scale's accuracy for that vehicle, or whether some other circumstance affected that particular measurement.
d. Mechanics: Using a TI-84 Plus calculator, I turned on the diagnostics function and entered the data to find the correlation coefficient.
r = 0.965
A correlation of 0.965 indicates a positive relationship between Weight-in-Motion and Static Weight, and because the value is close to 1, the linear relationship is strong.
Before using the calculator, I worked through part of the calculation by hand for practice. That work is shown below (weights in 1000s of lbs, n = 10).
Mean for Weight-in-Motion: \( \bar{x} = 32.02 \)
Mean for Static Weight: \( \bar{y} = 31.28 \)
Standard deviation for Weight-in-Motion:
\( s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(26 - 32.02)^2 + \cdots + (40.2 - 32.02)^2}{10 - 1} = \frac{290.876}{9} = 32.319 \), so \( s_x = \sqrt{32.319} \approx 5.685 \)
Standard deviation for Static Weight:
\( s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{(27.9 - 31.28)^2 + \cdots + (35.5 - 31.28)^2}{10 - 1} = \frac{127.036}{9} = 14.115 \), so \( s_y = \sqrt{14.115} \approx 3.757 \)
e. A change in units would not affect the correlation, because r is computed from standardized values (z-scores), which have no units.

3. Chapter 7: Exercise 6
a. Slope: \( b_1 = r \frac{s_y}{s_x} = 0.978 \times \frac{372.265}{8.727} = 41.718 \)
b. The slope of 41.72 means that, on average, the model predicts a disk drive's price to be about $41.72 higher for each additional 1 TB of capacity.
c. Intercept: \( b_0 = \bar{y} - b_1 \bar{x} = 254.34 - (41.718 \times 7.67) = -65.64 \)
d. The y-intercept is the predicted price of a drive with 0 TB of capacity. Here that value, -$65.64, is not meaningful: no drive has zero capacity, and a price cannot be negative.
e. \( \widehat{\text{Price}} = -65.64 + 41.718\,\text{Capacity} \)
f. \( \widehat{\text{Price}} = -65.64 + 41.718(20) = 768.72 \), so the predicted price of a 20 TB drive is $768.72.
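The hand calculations for Exercise 52 (parts d and e) and the summary-statistic formulas for Exercise 6 can be checked with a short script. This is a minimal Python sketch, not the calculator's method: the standard deviations use only the sums of squared deviations reported above (n = 10), the unit-change check uses a few hypothetical toy values rather than the exercise data (which are not reproduced here), and the regression uses the summary statistics given for Exercise 6. The function names are illustrative.

```python
import math

def sample_sd_from_ss(sum_sq_dev: float, n: int) -> float:
    """Sample standard deviation from a sum of squared deviations (divisor n - 1)."""
    return math.sqrt(sum_sq_dev / (n - 1))

def pearson_r(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Exercise 52, part d: sums of squared deviations from the hand work, n = 10.
sd_wim = sample_sd_from_ss(290.876, 10)     # Weight-in-Motion
sd_static = sample_sd_from_ss(127.036, 10)  # Static Weight
print(f"s(WIM) = {sd_wim:.3f}, s(static) = {sd_static:.3f}")

# Exercise 52, part e: rescaling the units leaves r unchanged.
# HYPOTHETICAL toy values in 1000s of lbs (not the exercise data).
wim_k = [20.0, 25.0, 31.0, 38.0]
static_k = [21.0, 24.0, 30.0, 36.0]
r_k = pearson_r(wim_k, static_k)
r_lbs = pearson_r([1000 * w for w in wim_k], [1000 * s for s in static_k])
print(f"r in 1000s of lbs = {r_k:.4f}, r in lbs = {r_lbs:.4f}")

# Exercise 6: least-squares line from summary statistics.
r, s_x, s_y = 0.978, 8.727, 372.265  # x = Capacity (TB), y = Price ($)
x_bar, y_bar = 7.67, 254.34
b1 = r * s_y / s_x           # slope
b0 = y_bar - b1 * x_bar      # intercept
pred_20 = b0 + b1 * 20       # predicted price of a 20 TB drive
print(f"b1 = {b1:.3f}, b0 = {b0:.2f}, predicted price at 20 TB = ${pred_20:.2f}")
print(f"r^2 = {r ** 2:.3f}")
```

Computed without intermediate rounding, the intercept and the 20 TB prediction agree with the hand-rounded answers to within about a cent.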
g. The 20 TB drive on Amazon is priced higher than the model predicts for a drive of that capacity: the listing price exceeds the predicted price by $254.52 (a positive residual). Based on this model, the drive on Amazon is not a good buy.
h. The model underestimates the price of this drive.
i. The correlation between Price and Capacity, r = 0.978, is high, indicating a strong positive association. Squaring it gives \( r^2 = 0.978^2 \approx 0.956 \), so according to this model, about 96% of the variability in Price is accounted for by variation in Capacity. However, a high correlation does not show that a linear model is appropriate; that has to be checked by examining the residuals.

4. Chapter 7: Exercise 38
A. There is a clear pattern in the residuals, so this model has room for improvement.
B. The spread of the residuals decreases as the independent variable increases, suggesting that the error variance is not constant. Because the equal-spread condition is unlikely to hold, this regression is not a good one.
C. This residuals plot shows a fairly horizontal band, suggesting that the variance of the residuals is constant, although the residuals do not gather toward the middle of the plot.

5. Chapter 7: Exercise 68
a. Birthrates declined steadily over the years represented in this graph, 1970-2010.
b. \( \hat{y} = 242.6 - 0.1142x \), where x is the year and \( \hat{y} \) is the predicted number of live births per 1000 population.
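Because the fitted line in part b is explicit, the birthrate predictions in this exercise can be reproduced by evaluating it at each year. This is a minimal Python sketch; the function name is illustrative, and the coefficients are the ones reported in part b.

```python
def predicted_birthrate(year: float) -> float:
    """Fitted line from part b: predicted live births per 1000 population."""
    return 242.6 - 0.1142 * year

# 1978 lies inside the 1970-2010 data; 2020 and 2050 are extrapolations.
for year in (1978, 2020, 2050):
    print(f"{year}: {predicted_birthrate(year):.1f} births per 1000")
```

Reproducing the arithmetic says nothing about whether the trend holds: 2050 is 40 years past the last year of data.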
c. Using the residual plot function on my calculator, I found that the residuals show no apparent pattern, although one point could be examined further as a possible outlier.
d. The slope says that we expect live births per 1000 population in the United States to decrease by about 0.11 per year.
e. The estimated birthrate for 1978 is 16.7 live births per 1000 population.
f. The model's prediction was not exact, leaving a residual (observed minus predicted) of -1.7; the actual 1978 birthrate was lower than predicted.
g. The predicted birthrate for 2020 based on my model is 11.9 per 1000 population. My faith in this prediction is moderate. The number is not guaranteed to be accurate, but it is in line with the general trend of birthrates in the United States. Further, \( R^2 = 80\% \) shows that the model accounts for 80% of the variability in birthrates through variation in year.
h. The predicted birthrate for 2050 based on my model is 8.5 per 1000 population. My faith in this prediction is lower. It is in line with the general trend of birthrates in the United States, but 2050 is 40 years beyond the last year of data, so this is an extrapolation. The model does not take other factors into account, and a prediction this far from the data should be reevaluated as more data are collected. The \( R^2 \) value gives fairly strong confidence in the model within the years observed, but it cannot guarantee that the trend continues.

6. Chapter 8: Exercise 32
For scatterplot a:
A. The point is unusual in terms of its x-value, but its y-value is in line with the pattern of the other points. It has high leverage but not a large residual.
B. Because the point follows the trend of the other points, removing it would not change the slope much, so I would not consider it influential. A high-leverage point is influential only when it pulls the regression line toward itself.
C. Removing the point would make the correlation weaker. Because the point lies far from the mean of x while still following the trend, its product of z-scores, \( z_x z_y \), is large and has the same sign as r, so it reinforces the association.
D. Because the point is not influential, removing it would change the slope only slightly.
For scatterplot b:
A. The point has high leverage, likely with a small residual because it pulls the line toward itself.
B. Yes, the point is influential.
C. The point makes the correlation weaker, closer to zero; in fact, it changes the direction of the correlation entirely. Removing it would make the correlation stronger.
D. Without the point, the regression line would slope downward; with it, the slope changes dramatically and the line slopes upward.
For scatterplot c:
A. The point is an outlier with a large residual, but it does not have high leverage because its x-value is close to the mean of x.
B. The point strongly affects the correlation, but because its x-value is close to the mean of x, it has little influence on the slope.
C. Without the point, the correlation would be extremely close to 1, so it would become stronger if the point were removed.
D. Because the point is so close to the mean of x, it would not affect the slope of the regression line much at all.
For scatterplot d:
A. The point has high leverage because its x-value is far from the mean of x, but it has a small residual.
B. It is not influential on the slope.
C. Without the point, the correlation would still be close to -1, though it would become slightly weaker (closer to 0), because the point follows the trend of the others.
D. The slope would not change much, because the point is in line with the other points.
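The effect of a single stray point on the slope and the correlation can be seen by refitting the line with and without it. This Python sketch uses a small hypothetical cluster and hypothetical stray points chosen to mimic the kinds of points described above; the exercise's actual scatterplots are not reproduced. The cluster trends upward, but the same reasoning applies to a downward trend such as scatterplot d.

```python
import math

def fit_line(xs, ys):
    """Return the least-squares slope, intercept, and correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL cluster with an upward trend (not data from the exercise).
base_x = [1, 2, 3, 4, 5]
base_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# HYPOTHETICAL stray points, one per kind of unusual point discussed above.
strays = {
    "far from mean of x, in line with trend": (15, 30),
    "far from mean of x, off the trend": (15, 0),
    "at mean of x, large residual": (3, 15),
}

base_slope, base_intercept, base_r = fit_line(base_x, base_y)
print(f"cluster only: slope = {base_slope:.2f}, r = {base_r:.3f}")

results = {}
for label, (x, y) in strays.items():
    # Refit with the stray point added to the cluster.
    results[label] = fit_line(base_x + [x], base_y + [y])
    slope, _, r = results[label]
    print(f"{label}: slope = {slope:.2f}, r = {r:.3f}")
```

With these toy values, the in-line high-leverage point barely moves the slope and slightly strengthens r, the off-trend high-leverage point reverses the direction of the slope, and the point at the mean of x leaves the slope unchanged while sharply weakening r.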