Slides_2021-12-07_annotated

STA220 L0201: THE PRACTICE OF STATISTICS
Professor Gwendolyn Eadie
Dec 7, 2021
We will begin at U of T time: 11:10AM ET
OFFICE HOURS
Prof. Gwendolyn Eadie, gwen.eadie@utoronto.ca
Regular office hours this week and next week:
  • Tuesday: 1:10pm-2:00pm
  • Wednesday: 11:10am-12:00pm
REMINDERS & UPDATES
Reminders:
  • Today is the last class
  • Final Assessment: Thursday, Dec. 16, 19:00-22:00 ET
  • Logistics are on Quercus!
  • You must join the Zoom call
  • Multiple choice and short answers (exact number of questions not decided yet)
Results from anonymous poll regarding grading scheme…
RESULTS OF CLASS VOTE ON PROPOSED NEW GRADING SCHEME
TODAY'S CLASS
Gwen’s notes on regression (from last year, very similar to Josh’s guest lecture last week)
  • Review of scatterplots and correlation
  • Simple Linear Regression
  • Cautions about Linear Regression
  • Coefficient of Determination
  • Inference of Slope
  • Checking Conditions
  • Log Transformations for linear regression
  • Examples
REVIEW OF SCATTERPLOTS & CORRELATION
Review of Scatterplots & Correlation
  • A scatterplot allows us to visualize the relationship between two quantitative variables
  • The correlation measures the strength of a linear relationship between the quantitative variables
  • The correlation quantifies the strength of a linear trend or relationship, and can be positive or negative
  • There can be a strong relationship between variables even if the linear correlation is not strong
(Figures: OpenIntro Statistics, 4th Ed., Diez, Çetinkaya-Rundel, and Barr)
Correlation only captures the strength of a linear relationship (image: Wikipedia, by DenisBoigelot; original uploader was Imagecreator)
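To make the point concrete, here is a minimal Python sketch. The data and the helper name `pearson_r` are made up for illustration and are not from the slides. The quadratic data are completely determined by x, yet the correlation is zero because there is no linear trend.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: both responses are exact functions of x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_quadratic = [x ** 2 for x in xs]     # strong relationship, but not linear
ys_linear = [2 * x + 1 for x in xs]     # perfectly linear relationship

r_quadratic = pearson_r(xs, ys_quadratic)
r_linear = pearson_r(xs, ys_linear)
print(f"quadratic: r = {r_quadratic:.3f}, linear: r = {r_linear:.3f}")
```

A scatterplot of the quadratic data would show the relationship immediately, which is why plotting comes before computing the correlation.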
SIMPLE LINEAR REGRESSION
Simple Linear Regression
  • Useful when the relationship between two quantities can be summarized by a straight line
  • Correlation describes the strength of the linear relationship between x and y, whereas the regression line describes the relationship itself
  • The regression line is a model which follows the equation: $\hat{y} = \beta_0 + \beta_1 x$, where $\beta_0$ is the intercept and $\beta_1$ is the slope
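Simple Linear Regression
  • A regression line can tell you something about the effect of the predictor (independent) variable on the response variable
  • The slope of a regression line is related to the correlation $r$ of the points: $\hat{\beta}_1 = r \, s_y / s_x$, where $s_x$ and $s_y$ are the sample standard deviations of x and y
  • The intercept of a regression line is: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$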
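As a minimal sketch of how the slope and intercept relate to the correlation, the Python below uses a small made-up data set. The numbers and function names are illustrative assumptions, not from the slides. It computes the slope as r times s_y / s_x and the intercept from the two means, then compares the slope with the direct least-squares formula.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Sample Pearson correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / ((len(xs) - 1) * sample_sd(xs) * sample_sd(ys))

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)
slope = r * sample_sd(ys) / sample_sd(xs)    # slope from the correlation
intercept = mean(ys) - slope * mean(xs)      # line passes through the point of means

# Direct least-squares slope for comparison: Sxy / Sxx.
mx, my = mean(xs), mean(ys)
slope_direct = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                / sum((x - mx) ** 2 for x in xs))

print(f"r = {r:.4f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```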
Simple Linear Regression
  • The linear regression line is the line that best predicts the response variable. How do we find that line?
  • We find the line that minimizes the sum of the squared vertical distances from the data points to the line!
  • Deviations from the line are called the residuals: $e_i = y_i - \hat{y}_i$
Residuals
  • It's always a good idea to plot the residuals versus the predictor (or the residuals versus the predicted value of the response)
[Figures: response (dependent) variable versus predictor (independent) variable; residual versus predictor (independent) variable]
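The residuals can be computed directly once a line has been fit. The sketch below uses made-up data and a hypothetical helper `fit_line`. It prints each residual (observed minus predicted); these are the values one would plot against the predictor.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.2f}")
```

For a least-squares line with an intercept, the residuals sum to zero, so what matters in the plot is their pattern, not their average.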
CAUTIONS ABOUT LINEAR REGRESSION
A regression line might look like it describes the data well … but it can be dangerous to extrapolate beyond the data points you have.
Influential and Leverage Points
  • A point is influential if the regression line changes substantially when the point is removed.
  • To check if a point is influential, remove the point, find the new regression line, and see how the slope changes
  • A point is a leverage point if it is an outlier in the horizontal direction
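The check described above can be sketched in a few lines of Python. The data are invented for illustration: the last point sits far out in the horizontal direction (a leverage point) and away from the trend of the others. The helper `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical data: the final point (x = 20) is an outlier in x.
xs = [1, 2, 3, 4, 5, 20]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

_, slope_all = fit_line(xs, ys)
_, slope_without = fit_line(xs[:-1], ys[:-1])   # refit with the point removed

print(f"slope with the point:    {slope_all:.3f}")
print(f"slope without the point: {slope_without:.3f}")
```

Here the slope moves from roughly zero to about one when the point is removed, so the point is influential as well as being a leverage point.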
SUMMARY OF CAUTIONS
  • Start with a scatter plot to see if a line is a reasonable choice
  • After doing linear regression, look at the residuals
  • Don't extrapolate beyond the range of the data!
  • Watch for influential points
COEFFICIENT OF DETERMINATION
Coefficient of Determination: $R^2$
  • The proportion of variation in the response variable that is explained by the regression line
  • Expressed as a percentage, $R^2$ gives the percentage of variation explained by the line
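Review of Sum of Squares
  • Regression sum of squares: $SS_{\text{reg}} = \sum_i (\hat{y}_i - \bar{y})^2$
  • Residual sum of squares: $SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2$
  • Total sum of squares: $SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$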
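The three sums of squares and the coefficient of determination can be computed as in the sketch below. It assumes the standard definitions (regression: fitted values about the mean; residual: observations about the fitted values; total: observations about the mean) and uses made-up data and hypothetical variable names.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
mean_y = sum(ys) / len(ys)
fitted = [intercept + slope * x for x in xs]

ss_regression = sum((f - mean_y) ** 2 for f in fitted)
ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
ss_total = sum((y - mean_y) ** 2 for y in ys)

# Proportion of the variation in y explained by the line.
r_squared = ss_regression / ss_total

print(f"SS_reg = {ss_regression:.3f}, SS_res = {ss_residual:.3f}, SS_tot = {ss_total:.3f}")
print(f"R^2 = {r_squared:.4f}")
```

For a least-squares line the regression and residual sums of squares add up to the total, so the same value is obtained from 1 minus the ratio of residual to total.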
IMPORTANT POINTS ABOUT $R^2$
  • There is no general rule for what counts as a good $R^2$ value
  • A large $R^2$ doesn't necessarily mean a good fit
  • $R^2$ gives the percentage of variation in the response that is explained by the line
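A small illustration of the second point, using invented data: the response below is exactly quadratic in x, yet a straight-line fit still gives a large $R^2$. The residuals reveal the problem, since they are positive at both ends and negative in the middle.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical data: a curved (quadratic) relationship.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

mean_y = sum(ys) / len(ys)
r_squared = 1 - sum(e ** 2 for e in residuals) / sum((y - mean_y) ** 2 for y in ys)

print(f"R^2 = {r_squared:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```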
INFERENCE OF SLOPE
Inference of the slope
  • How can we know if the slope is significantly different from 0?
  • Recall that the linear model is $y = \beta_0 + \beta_1 x + \varepsilon$
  • We want to make inferences about $\beta_1$
  • If the errors are normally distributed, then the observed values of the response y are also normally distributed given x.
  • Therefore, our estimate of $\beta_1$ is also normally distributed, centred on the true slope
Inference of the slope: hypothesis test
  • Null hypothesis: $H_0: \beta_1 = 0$
  • Alternative hypothesis: $H_A: \beta_1 \neq 0$
  • The test statistic is: $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$
  • Even if the p-value is small, we must watch out for small sample sizes and influential points! These should make us less certain and more careful about interpreting results
Confidence Interval of the slope
  • The confidence interval for the estimate of the slope is $\hat{\beta}_1 \pm t^*_{n-2} \, SE(\hat{\beta}_1)$
  • Note: we are using a t-distribution with n-2 degrees of freedom
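The following sketch puts the test statistic and the confidence interval together on a made-up data set. It assumes the usual standard error of the slope, the square root of (residual sum of squares / (n - 2)) divided by the sum of squared deviations of x. The function name and the data are hypothetical, and the critical value 3.182 is the t-table entry for 3 degrees of freedom at 95% confidence.

```python
import math

def slope_inference(xs, ys, t_star):
    """Return slope, its standard error, the t statistic for H0: slope = 0,
    and a confidence interval using the supplied critical value t_star."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(ss_residual / (n - 2)) / math.sqrt(sxx)
    t_stat = slope / se_slope
    ci = (slope - t_star * se_slope, slope + t_star * se_slope)
    return slope, se_slope, t_stat, ci

# Hypothetical data with n = 5, so the degrees of freedom are n - 2 = 3.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
T_STAR_95_DF3 = 3.182   # assumed from a t-table: 95% two-sided, 3 df

slope, se_slope, t_stat, ci = slope_inference(xs, ys, T_STAR_95_DF3)
print(f"slope = {slope:.3f}, SE = {se_slope:.4f}, t = {t_stat:.1f}")
print(f"95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With only five points, a tiny p-value here should still be read with the caution about small samples in mind.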
CHECK-IN QUESTIONS
If we find a coefficient of determination of 0.8 for data x and y that were fit with a linear regression, then which of the following is most definitely true?
a) there is a strong causal relationship between x and y
b) 80% of the variation in y is explained by the regression line
c) the regression line is a good fit to the data
d) there is a good chance you can extrapolate using this regression
CHECKING CONDITIONS
Checking Conditions for Linear Regression
  • Look at the data in a scatterplot
      • No curvature
      • No influential points
      • No groups or patterns
  • Observations must be independent
      • Note that observations taken close in time are usually not independent
  • Variation in the errors should be constant
  • Errors should be normally distributed
TRANSFORMATION IN LINEAR REGRESSION
Many relationships are not linear
  • We may transform the data to make the relationship linear on a different scale.
  • In this course, we focus on transforming to log base 10.
  • In this course, we transform either the explanatory variable or the response variable
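As an illustration of transforming the response, the sketch below builds made-up data that grow exponentially in x, then fits a line to log base 10 of the response. The constants 3 and 0.2 and the helper `fit_line` are assumptions chosen for the example.

```python
import math

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical data: y = 3 * 10^(0.2 x), which is not linear in x.
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 10 ** (0.2 * x) for x in xs]

# On the log10 scale the relationship becomes a straight line:
# log10(y) = log10(3) + 0.2 x
log_ys = [math.log10(y) for y in ys]
intercept, slope = fit_line(xs, log_ys)

print(f"intercept = {intercept:.4f} (log10(3) = {math.log10(3):.4f}), slope = {slope:.4f}")
```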
Review of log 10 rules
  • $\log_{10} x = y$
  • If we increase $y$ by 1 unit, what does this do to $x$?
  • $\log_{10}(x) + 1 = \log_{10}(x) + \log_{10}(10) = \log_{10}(10x)$, so $x$ is multiplied by 10
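A quick numerical check of this rule, with an arbitrary starting value chosen for illustration:

```python
import math

x = 250.0                      # arbitrary positive value
y = math.log10(x)

# Increase the log10 value by 1 unit and transform back.
x_new = 10 ** (y + 1)
ratio = x_new / x              # how much x was multiplied by

print(f"log10(x) = {y:.4f}, new x = {x_new:.1f}, ratio = {ratio:.1f}")
```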
EXAMPLES
Short Answer Examples from last year’s final
In astronomy, the Faber-Jackson relation describes the relationship between two quantities: the luminosity of a galaxy (symbol $L$) and the velocity dispersion (symbol $D$) of its central stars. The luminosity of a galaxy is related to its intrinsic brightness, and the velocity dispersion is related to the speed at which stars travel within the center of the galaxy. These quantities are both estimated using data collected from telescopes.
Plotted are the log base 10 of the velocity dispersion versus the log base 10 of the luminosity, for a sample of galaxies measured by the HyperLeda Survey. A linear regression of $\log_{10} D$ versus $\log_{10} L$ was fit (this is the Faber-Jackson relation), and this line is also plotted.
The following may be useful:
  • $a = 10^{\log_{10} a}$
  • $\log_{10}(ab) = \log_{10} a + \log_{10} b$
  • $\log_{10}(a/b) = \log_{10} a - \log_{10} b$
The estimate of the intercept $\beta_0$ for the regression line is 0.741 with standard error 0.222, and the estimate for the slope term $\beta_1$ for the regression line is 0.150 with standard error 0.021. Plotted below are the residuals from the regression line.
a) (1 mark) Write the equation for the regression line in terms of $\log_{10} D$ and $\log_{10} L$.
b) (1 mark) Looking at the residuals plot, what does the dashed line represent?
c) (2 marks) What can you say about the correlation between these two variables? What information do you need to calculate the correlation?
d) (1 mark) Imagine that the degrees of freedom in this study were 382. What is the total number of data points?
e) (3 marks) Describe how you would estimate a 95% confidence interval for the slope, and state which quantities are needed.
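One possible way to work through parts (d) and (e) numerically is sketched below. It uses the slope estimate and standard error quoted in the question and the n - 2 degrees of freedom for simple linear regression. As a stated assumption, the t critical value for 382 degrees of freedom is approximated by the standard normal critical value, so the interval is approximate.

```python
from statistics import NormalDist

DEGREES_OF_FREEDOM = 382
SLOPE_ESTIMATE = 0.150       # from the question
SLOPE_STD_ERROR = 0.021      # from the question

# Simple linear regression uses n - 2 degrees of freedom, so n = df + 2.
n_points = DEGREES_OF_FREEDOM + 2

# Assumption: with 382 df the t-distribution is very close to the standard
# normal, so the normal critical value stands in for t*.
critical_value = NormalDist().inv_cdf(0.975)
margin = critical_value * SLOPE_STD_ERROR
ci = (SLOPE_ESTIMATE - margin, SLOPE_ESTIMATE + margin)

print(f"n = {n_points}")
print(f"approximate 95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The quantities needed are the slope estimate, its standard error, and the critical value for the chosen confidence level and degrees of freedom.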
f) (2 marks) Looking at the residuals plot, do you think the regression is appropriate? Explain.
g) (2 marks) Using the regression line, what is the change in the value of $D$ if $\log_{10} L$ was increased by 1?
The regression line is given by $\log_{10} D = 0.741 + 0.150 \log_{10} L$. If $\log_{10} L$ was increased by 1, then the change in $\log_{10} D$ would be:
change in $\log_{10} D$ = 0.150*(1) → $D$ will be multiplied by $10^{0.150}$ → $D$ will be multiplied by 1.4125
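The multiplicative factor in part (g) can be checked numerically. The sketch below uses the fitted coefficients from the question. The starting values of $\log_{10} L$ are arbitrary and chosen only to show that the factor does not depend on where you start.

```python
INTERCEPT = 0.741   # from the question
SLOPE = 0.150       # from the question

def predicted_D(log10_L):
    """Predicted velocity dispersion D from the fitted line on the log10 scale."""
    return 10 ** (INTERCEPT + SLOPE * log10_L)

factor = 10 ** SLOPE   # multiplicative change in D per unit increase in log10(L)

# Hypothetical starting values of log10(L).
for log10_L in (8.0, 9.0, 10.0):
    ratio = predicted_D(log10_L + 1) / predicted_D(log10_L)
    print(f"log10(L) = {log10_L}: D is multiplied by {ratio:.4f}")

print(f"10^0.150 = {factor:.4f}")
```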
REVIEW EXAMPLES
Short Answer Example 1
Diseases I and II are prevalent among people in a certain population. It is assumed that 15% of the population will contract disease I at some point in their lifetime, and 11% will contract disease II at some point in their lifetime, while 2% of the population will contract both diseases.
a) (3 points) Find the chance that a randomly chosen person from this population will contract at least one of the two diseases.
b) (2 points) If someone has contracted at least one of the two diseases, what are the chances they will contract both?
c) (3 points) Is contracting one disease dependent on the other? Justify numerically and explain your answer.
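One way to set up the calculations for this example is sketched below, using the general addition rule, the definition of conditional probability, and the product check for independence. The variable names are illustrative.

```python
import math

p_disease_1 = 0.15
p_disease_2 = 0.11
p_both = 0.02

# a) General addition rule: P(I or II) = P(I) + P(II) - P(I and II).
p_at_least_one = p_disease_1 + p_disease_2 - p_both

# b) Conditional probability: "both" is a subset of "at least one",
#    so P(both | at least one) = P(both) / P(at least one).
p_both_given_at_least_one = p_both / p_at_least_one

# c) Under independence, P(I and II) would equal P(I) * P(II).
p_both_if_independent = p_disease_1 * p_disease_2
independent = math.isclose(p_both, p_both_if_independent)

print(f"P(at least one) = {p_at_least_one:.2f}")
print(f"P(both | at least one) = {p_both_given_at_least_one:.4f}")
print(f"P(I) * P(II) = {p_both_if_independent:.4f} vs P(both) = {p_both:.2f}")
print(f"independent? {independent}")
```

Because the product of the two marginal probabilities differs from the stated probability of both, the two events are not independent under these numbers.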
Short Answer Example 2
Imagine the daily high temperature in Toronto was recorded every day for 12 days in July, and the average of these temperatures is 27.1°C with a standard deviation of 3.5°C.
a) (4 points) Calculate a 98% confidence interval for the mean assuming all necessary conditions hold. Interpret the confidence interval you computed. Round your final answer to two decimal places.
b) (2 points) Explain the meaning of confidence intervals. Are the end points of confidence intervals random? Explain your answer.
c) (2 points) In the 12 days, the range of the maximum daily temperature was 25°C to 38°C. Recall that the average was 27.1°C. What are the possible implications for the interval you calculated in part (a)?
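A sketch of the arithmetic for part (a), assuming a one-sample t-interval with n - 1 = 11 degrees of freedom. The critical value 2.718 is taken from a t-table (98% two-sided, 11 df) and is an assumption of this example, since the Python standard library does not provide t quantiles.

```python
import math

n = 12
sample_mean = 27.1        # degrees Celsius, from the question
sample_sd = 3.5           # degrees Celsius, from the question
T_STAR_98_DF11 = 2.718    # assumed from a t-table: 98% two-sided, 11 df

standard_error = sample_sd / math.sqrt(n)
margin = T_STAR_98_DF11 * standard_error
ci_low = sample_mean - margin
ci_high = sample_mean + margin

print(f"SE = {standard_error:.4f}, margin of error = {margin:.4f}")
print(f"98% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```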
COURSE EVALUATIONS
We want to hear from you! Please fill out the online course evaluation. Your feedback matters a great deal and is taken very seriously.
STA220 L0201 THANK YOU! This semester has been challenging for all of us, and I truly appreciated your active participation in the class throughout the semester, your great questions, and your commitment to the course! I wish you the best of luck on exams and final projects/assessments in all your courses. Prof. Eadie (& guest star Shadow!)