Entity Academy Lesson 8 – Linear Regression

Sindy Saintclair – Thursday, December 24, 2021

Introduction

Linear regression is a method for investigating the relationship between two variables. In linear regression, the relationship between the variables is represented as a line, and you compute the parameters of the line (how steep it is and where it starts) as well as determine how accurately the line represents the relationship. This lesson begins with scatter plots, which are used extensively to understand the relationship between continuous variables, and then moves on to correlation.

What is regression?

- Allows you to predict y based on values of x
- Both the IV and the DV can be continuous
- The basic statistic behind modeling
- "Simple" = only one IV
- "Linear" = the data form a straight line

Code for Regression

modelName <- lm(DV ~ IV, data)
summary(modelName)

Interpreting Regression

- The omnibus p value is at the bottom of the output; it tells you whether the overall model is significant (only if < 0.05).
- Next is the adjusted R squared, which is the proportion of variability in the DV accounted for by the IV; convert it to a percentage.
- Each specific variable has its own p value; that variable is significant if < 0.05.
- A one-unit increase in a variable changes the DV by the Estimate amount.
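As an extra illustration of this template and the interpretation steps above (this example uses the built-in mtcars data and is my own sketch, not part of the lesson):

# Does car weight (IV) predict fuel economy (DV)?
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
# Reading the output, in order:
# 1. omnibus p-value at the bottom: well below 0.05, so the model is significant
# 2. Adjusted R-squared: the share of variability in mpg accounted for by wt
# 3. the row for wt: its Estimate is the change in mpg per one-unit increase in wt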
Making a scatter plot with best fit line

ggplot(data, aes(x=column, y=column)) +
  geom_point() +
  geom_smooth(method=lm, se=FALSE)

Create scatterplots

In a scatter plot, data are displayed as a collection of points. Each data point is determined by the values of two variables, one on the horizontal (left to right) axis and one on the vertical (up and down) axis. Scatter plots make the relationship between variables easy to see.

In R, creating a scatter plot is relatively simple. In this lesson, you will use ggplot2 to create scatter plots as well. If you have closed R since last using ggplot2, remember that you will need to load it at the beginning of every RStudio session:

library(ggplot2)

You could also click the check box next to ggplot2 in the Packages tab.

You will start by creating a scatter plot using the faithful data set; you will plot eruption times versus waiting times, with eruptions on the horizontal, or x, axis and waiting on the vertical, or y, axis:

d <- ggplot(faithful, aes(x = eruptions, y = waiting))
d + geom_point()

These commands produce the scatter plot. You can add a title and improve the axis labels using the ggtitle(), xlab(), and ylab() functions:
d + geom_point() +
  ggtitle("Old Faithful Eruption vs Waiting Times") +
  xlab("Eruption Time (min)") +
  ylab("Waiting Time (min)")

You see from this plot that there are two clusters of data:

- a short eruption time followed by a short wait until the next eruption
- a long eruption time followed by a long wait

After having created a scatter plot, you may want to see how well the data fit a straight line. You can do this easily in ggplot2 by adding + geom_smooth() and specifying, as an argument to geom_smooth(), that you want the method, or shape of line, to be lm. lm stands for linear model. A linear model will create a straight line on the graph. Here's how all that code fits together:

d + geom_point() + geom_smooth(method = lm)

This gives a scatter plot with a best fit line. The phrase "best fit line" means that the line wasn't just plunked on the graph any old place; it was strategically fit to all the data points to be as close to as many of the points as possible.
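Putting the pieces of this section together, one block builds the labeled plot and the fitted line in a single call chain (the same functions as above, just combined):

library(ggplot2)

d <- ggplot(faithful, aes(x = eruptions, y = waiting))
d + geom_point() +
  geom_smooth(method = lm) +   # best fit line; straight, since method = lm
  ggtitle("Old Faithful Eruption vs Waiting Times") +
  xlab("Eruption Time (min)") +
  ylab("Waiting Time (min)")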
In addition to the line that fits the data, the geom_smooth() function adds a grey area around the line. It may be a little difficult to see, because it does not extend much past the blue line; it is easiest to see at the beginning and the end of the line, where there is a little contrast with the black dots. This grey shading is called the confidence region. Roughly speaking, if the boundaries of the region are close to the line, you can be confident in the accuracy of your estimates of the parameters that define the line; if the region extends away from the line, you should be less confident in the accuracy of the line. You can think of the grey shading as something like a margin of error.

If you don't want your graph to include the grey shading, it can easily be turned off with the argument se = FALSE:

d + geom_point() + geom_smooth(method = lm, se = FALSE)

It may be easier to see the grey shading on the previous graph now that you have something with which to contrast it!
If you feel that black points and a blue regression line are too somber, you can always change the color of the line (the same color= argument works inside geom_point() for the points):

d + geom_point() + geom_smooth(method = lm, se = FALSE, color = "goldenrod2")

This gives you a more colorful plot.

Assess correlations visually and numerically

Correlation Basics

Two variables are correlated if there is a linear relationship between them; in a scatter plot, correlation is indicated if the points in the plot tend to lie close to a straight line. Both words in the phrase "linear relationship" are important.
Linear is important because, if there is no semblance of a straight line in the graph, the relationship cannot be measured with the "standard" correlation, which is called Pearson's Correlation and denoted by the symbol r. Take a look at the graphs below: none of them would count as being correlated, and each has an r value of zero, because they are not linear.

Relationship is equally important; there must be some connection between the two variables. It can't be random; it has to be a pattern that is visible in the data.

Direction of Correlations

There are technically three directions for a correlation:

- Positive
- Negative
- No correlation (uncorrelated)

Positive Correlations

A positive correlation is indicated if the line goes up from left to right. This means that the variables change together in the same direction: as one goes up, the other goes up, or as one goes down, the other goes down. A positive correlation is indicated statistically by a positive r value. The graphic below has positive correlations depicted on the left-hand side.

Negative Correlations

If the line slopes down from left to right, it indicates a negative correlation. This means that the variables change together in different directions: as one goes up, the other goes down. A negative correlation is indicated statistically by a negative r value. The graphic below shows a negative correlation on the right.
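A quick way to see both directions for yourself is to simulate data. This snippet is my own illustration, not part of the lesson; rnorm() draws random values from a normal distribution:

set.seed(42)                          # make the random draws reproducible
x <- rnorm(100)
y_pos <- x + rnorm(100, sd = 0.5)     # y tends to rise as x rises
y_neg <- -x + rnorm(100, sd = 0.5)    # y tends to fall as x rises
cor(x, y_pos)   # positive r, somewhere around +0.9
cor(x, y_neg)   # negative r, somewhere around -0.9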
Strength of Correlations

Not only can you judge a correlation by its direction, but also by its strength. A strong correlation forms a very tight grouping of dots along a line, like the graph below on the far left. A moderate correlation will have the rough shape of a line, but the dots will be a bit more spread out, like the middle graph. A weak correlation shows only a very general trend, like the one on the right. You can have a strong, moderate, or weak correlation that is either positive or negative; check out the full spectrum below.

The numbers indicate the r values for a correlation. Correlations can range from -1 to +1. The closer r is to 1, whether positive or negative, the stronger the correlation is; the closer to 0, again whether positive or negative, the weaker the correlation is.
Here are some correlation interpretation guidelines:

Correlation Strength    Correlation Coefficient (r value)
Strong                  0.7 - 1.0
Moderate                0.3 - 0.69
Weak                    0.1 - 0.29
None                    0.0 - 0.09
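If you'd like R to apply these guidelines for you, a small helper function can translate an r value into a label. This is my own sketch of the table above, not a standard R function:

correlation_strength <- function(r) {
  a <- abs(r)                 # direction doesn't matter for strength
  if (a >= 0.7) "Strong"
  else if (a >= 0.3) "Moderate"
  else if (a >= 0.1) "Weak"
  else "None"
}

correlation_strength(0.83)    # "Strong"
correlation_strength(-0.25)   # "Weak"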
Types of Correlations

There are different "flavors" of correlation for different situations. They are all interpreted the same way, but are calculated differently behind the scenes in R, to make sure that they are as accurate as possible. The three main types of correlations you will learn about are:

- Pearson's r: for 2 normally distributed, continuous variables
- Spearman's Rho: for 2 non-normally distributed continuous variables
- Kendall's Tau: for 2 categorical variables

Correlation DOES NOT EQUAL Causation

Just because two things are related does not mean that one caused the other! Sometimes there are additional variables, not in your dataset, that indirectly drive the relationship between the two variables, and sometimes two variables really are just randomly related without much rationale behind it. Take this scenario as a cautionary tale: as ice cream sales increase, so does the number of armed robberies leading to murder.

Does this mean that after eating ice cream, people are inspired to go out and commit murder? Most likely not. Does this mean that murderers celebrate their deeds with some congratulatory ice cream? Again, unlikely. Ice cream sales did not cause murders, and murders did not cause ice cream sales to increase. Instead, there is a pesky third variable at work: heat! As temperatures rise in the summer, people look to cool off with some ice cream; makes logical sense. And as temperatures rise, so do aggressive tendencies, hence the spike in murder rates.

The lesson to take to heart is that a relationship between two variables does not mean that one was responsible for the other. Reporting this incorrectly can make for some embarrassing mistakes.

Examples

You will next create a plot that indicates very little correlation. The USArrests data frame has gruesome statistics on arrests for violent crimes, in arrests per 100,000 residents, in each of the 50 US states in 1973.
One of the variables in this data frame is Murder, the murder rate for each state. Another is UrbanPop, the percentage of the state's population that lives in an urban area. You can create a scatter plot of these two variables, together with the linear regression line, using the following commands:

d <- ggplot(USArrests, aes(x = UrbanPop, y = Murder))
d + geom_point() + geom_smooth(method = lm, se = FALSE)

In the resulting scatter plot, you can see that the data do not seem to follow much of a pattern. The line is almost flat, with very little slope. You say that the Murder and UrbanPop variables are uncorrelated: they do not have a linear relationship.
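You can back up this visual impression with a number; running cor() on the same two variables gives an r value very close to zero (treat the exact figure as approximate):

cor(USArrests$UrbanPop, USArrests$Murder)   # roughly 0.07: no real correlation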
You can now make a plot that shows a negative correlation. The data frame mtcars has data from the 1974 Motor Trend magazine, covering 32 1973-74 models. One variable in this data frame is mpg, the car's mileage in miles per US gallon of fuel. Another is disp, the engine displacement in cubic inches. You can create a scatter plot of these two variables with the linear regression line using the following commands:

d <- ggplot(mtcars, aes(x = disp, y = mpg))
d + geom_point() + geom_smooth(method = lm, se = FALSE)

Because the linear regression line has a negative slope (it goes from upper left to lower right), these data values are negatively correlated: a larger value of displacement tends to be associated with a smaller value of miles per gallon.
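Again, the numeric r value agrees with the plot, and it matches the mtcars correlation matrix you will build later in this lesson:

cor(mtcars$disp, mtcars$mpg)   # about -0.85: a strong negative correlation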
Create correlation matrices

Calculating correlation

Now that you understand what a correlation is and how to interpret it, you will learn how to calculate correlations in R.

cor.test()

The simplest way to find a correlation is to use the cor.test() function. You select the two variables you want to correlate; with cor.test() you can only do two variables at a time. Here is the code:

cor.test(mtcars$hp, mtcars$cyl, method = "pearson", use = "complete.obs")

This runs cor.test() on the mtcars variables hp and cyl. You use the method= argument to specify "pearson" if you have two continuous variables that are normally distributed. If, however, you have two continuous variables that are NOT normally distributed (this is called non-parametric), you use the argument "spearman", which conducts the non-parametric correlation Spearman's Rho, pronounced "row". If you have two categorical variables that are numeric, or have been recoded to numeric, you can use the argument "kendall", which conducts Kendall's Tau, pronounced like "ow!" with a "t" on the front, as in the beginning of "tower". The use = "complete.obs" argument means that you don't have to have a complete dataset; R will use what it has, as long as it has data for the two variables you are trying to correlate.

The result from cor.test() is below:

Pearson's product-moment correlation

data:  mtcars$hp and mtcars$cyl
t = 8.2286, df = 30, p-value = 3.478e-09
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6816016 0.9154223
sample estimates:
      cor
0.8324475

The first line tells you which analysis you ran: the Pearson's correlation. The second line tells you the data you used. Next you have information about whether this correlation is significant. The t value is not usually reported, but the p value often is; like anything else, if the p value is less than 0.05, then the correlation is significant. The last important part of this output is the number underneath cor. This is your r correlation coefficient, which you will need to interpret. This specific correlation between the horsepower of the car and the number of cylinders in the engine is strongly positive at 0.83.

Detailed Correlation Matrices

cor.test() works just fine, but what if you want to look at all the variables in your dataset at once, just to take a quick peek at what's going on? Using cor.test() over and over would take a long time, but there is a solution: correlation matrices. A correlation matrix lets you look at more than one correlation at a time, in a handy graphic.
The easiest way to create a correlation matrix with the p values included is to use the PerformanceAnalytics library and its chart.Correlation() function. So, get PerformanceAnalytics installed and running:

install.packages("PerformanceAnalytics")
library("PerformanceAnalytics")

Then, because chart.Correlation() will look at all the data in your data frame, you need to limit the data frame to only the quantitative, continuous variables. This can easily be done with subsetting. Here you are telling R that, out of the mtcars data frame, you want to keep all the rows (if you wanted to keep only certain rows, those numbers would go before the comma) and only the first seven columns. You'll name this new truncated dataset mtcars_quant:

mtcars_quant <- mtcars[, c(1, 2, 3, 4, 5, 6, 7)]

Only the first seven columns are kept, compared to the entirety of the mtcars dataset.
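As a side note, consecutive column numbers can also be written as a range, and it's worth peeking at the result to confirm the subset worked:

mtcars_quant <- mtcars[, 1:7]   # same as c(1, 2, 3, 4, 5, 6, 7)
head(mtcars_quant)              # only mpg through qsec should remain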
Now that you have a dataset filled with only quantitative variables, it is time to make your correlation matrix! All you need to do is call the chart.Correlation() function. The first argument is your data frame name, mtcars_quant; the second is histogram=; and the third is method=, the type of correlation. method= takes the same arguments as cor.test(): "pearson", "spearman", and "kendall".

chart.Correlation(mtcars_quant, histogram = FALSE, method = "pearson")

The above code produces the correlation matrix plot.
Although there is a lot to look at here, you only need to pay attention to the right-hand side. You read the plot by the intersections of the variables on the left with the variables on the bottom, so the correlation of mpg with cyl is -0.85, and it is significant at p < 0.001, because three stars are displayed in that cell.

Visually Pleasing Correlation Matrices

The graph above conveniently prints the larger r values in a larger font size, so it is somewhat intuitive to interpret, but it definitely isn't the most pleasing to the eye. If you'd like a chart that is more of a looker, corrplot() is the way to go. However, it doesn't list significance values or specific r values, which is a downside. Another downside to corrplot() is that producing the image is a bit more complex.

Getting a Correlation Matrix Table using cor()

First, you need to turn your data into a correlation matrix, because the corrplot() function won't take your data frame as an argument. To do this, use the function cor() on the quantitative dataset mtcars_quant that you have been using:

corr_matrix <- cor(mtcars_quant)
corr_matrix

When you call your newly created matrix, this is what you will see:
            mpg        cyl       disp         hp        drat         wt        qsec
mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594  0.41868403
cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958 -0.59124207
disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799 -0.43369788
hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479 -0.70822339
drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406  0.09120476
wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000 -0.17471588
qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159  1.00000000

This correlation matrix displays the correlations between the variables along the top and the variables along the left-hand side. This is similar to the correlation matrix graphic above, except that values now fill the whole rectangle rather than just the upper right triangle. Correlation matrices are often displayed as a triangle instead of a rectangle because the information repeats in the second half: the correlation between mpg on the left and cyl on the top is the same as the correlation between cyl on the left and mpg on the top.

The defining feature to look for is the diagonal line of 1s running from the upper left corner to the lower right corner. This line appears because the correlation of any variable with itself is always 1. So mpg with mpg is 1, cyl with cyl is 1, and so on, all the way down.
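Both of those facts, the diagonal of 1s and the mirror-image halves, are easy to verify in code:

diag(corr_matrix)                        # every variable correlates 1 with itself
all.equal(corr_matrix, t(corr_matrix))   # TRUE: the matrix equals its transpose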
Installing corrplot()

Next, you will need to install and make available the library corrplot:

install.packages("corrplot")
library("corrplot")

Using corrplot()

And finally, you are ready to make a beautiful, visually pleasing correlation matrix plot. Note that the p.mat= argument expects a matrix of p-values, not the correlation matrix itself; corrplot's cor.mtest() helper computes those p-values from the original data frame:

p_values <- cor.mtest(mtcars_quant)   # p-values for every pair of variables
corrplot(corr_matrix, type = "upper", order = "hclust",
         p.mat = p_values$p, sig.level = 0.01, insig = "blank")

Remember, corrplot() is based on the correlation matrix you created above, corr_matrix, rather than your data frame. type= lets you pick whether you want to see the top or bottom of the mirror-image matrix; in this case, you have chosen "upper". If you had chosen "lower" instead, you would get the bottom half.
The rest of the arguments to corrplot() control how significance is demonstrated on the chart, by only showing the significant values. You can change the significance level using sig.level=; here it is set to 0.01 to get a more rigorously determined set of correlations, but you can use any threshold you'd like. The insig = "blank" argument then tells R to leave blank anything that isn't significant at the level you specified. This is why not every cell of the matrices above is filled in.

When interpreting either graph, it's important to note that correlations are shown with both a color and a size gradient: the smaller and lighter a cell is, the weaker the relationship between the two variables. Positive correlations are shown in blues, while negative correlations are shown in reds.

Create and analyze linear regression models

INTRODUCTION TO LINEAR REGRESSION

Regression Basics

- 2 continuous variables
- used for prediction
- may be used for causality

Ex: Do cats or dogs love their owners more?

What is Linear Regression?

Linear regression is the way you will compute the parameters of the line that best fits your data. It gives a relationship between the horizontal and vertical values in a scatter plot. In the process of computing the parameters of the line, you can also determine how well the line fits your data. Simple linear regression is when there are only 2 variables. It is possible to do linear regression with more than 2 variables; this is called multiple linear regression.
When you compute a linear regression, you are actually computing the equation of the line that best fits your data. You may remember from algebra that a line is represented by an equation of this form:

y = mx + b

where you have the following information:

- x: the variable on the horizontal axis
- y: the variable on the vertical axis
- m: the slope of the line, which indicates how steep the line is
- b: the intercept of the line, where the line crosses the y axis

If you know the slope and intercept of the line, then you can compute the y value of any point on the line from its x value. (You can also, with some algebra, compute the x value of any point on the line from its y value.) So when you say that linear regression computes the parameters of the line that best fits your data, you are saying that linear regression computes the slope and the intercept of that line.

COMPUTING LINEAR REGRESSION

You will compute a linear regression using the data in the R dataset cars. Previously, you used this dataset to create a box plot. cars includes speed measurements (in miles per hour) and stopping distances (in feet) for cars measured in the 1920s. Take a look:

head(cars)

  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
Computing the linear regression in R is extremely simple. It is computed by the function lm() as follows:

lin_reg <- lm(dist ~ speed, cars)
print(lin_reg)

lin_reg is a variable name. The lm() function returns an object that stores all the information computed by the linear regression, so the assignment statement assigns this object to lin_reg. When you call the print() function on the lin_reg object, it prints the information that lm() was called with, along with the coefficients of the line that best fits the data.

You indicated which variables you wanted to fit the line with using the dist ~ speed argument to lm(). In R terminology, this argument is a formula. In this formula, dist is the variable on the vertical axis (what you called y in the equation for the line above) and speed is the variable on the horizontal axis (what you called x). You can read the tilde ~ as the word "by": you are asking R to produce a line showing stopping distance by speed. The last argument to lm() is the data frame name, cars in this case.

When lin_reg is printed, this is the output R provides:

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed
    -17.579        3.932
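As a side note, if you just want those two numbers programmatically, the standard coef() accessor pulls them out of the model object:

coef(lin_reg)
# (Intercept)       speed
#  -17.579095    3.932409   (approximately, as R prints them)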
The coefficients are the slope and intercept of the line. Intercept is the intercept, or b, for the line, and the slope is labeled by the x variable name that was used to create it: here the slope is labeled speed. So the equation of this particular linear regression line, predicting stopping distance from speed, is:

y = 3.932x - 17.579

Linear Regression Model Summary

The object stored in lin_reg has much more information in it. You can access some of this information with the following command:

summary(lin_reg)

The summary information that R provides is below:

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-29.069  -9.525  -2.272   9.215  43.201

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791     6.7584  -2.601   0.0123 *
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:

- Three asterisks mean 'less than 0.001'
- Two asterisks mean 'less than 0.01'
- One asterisk means 'less than 0.05'
- One dot (or period) means 'less than 0.1'
- No code means 'less than 1'

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12

Is the Overall Model Significant?

The first thing you will want to look for is the F-statistic and p-value, at the very bottom. These tell you whether the overall model is significant. What is meant by that? If the overall model is significant, it means that your x value (or x values, in the case of multiple regression) is a significant predictor of your y value. If the p value isn't significant at p < 0.05 at the very least, then the rest of the output really isn't worth looking at: you didn't find anything interesting to talk about or report!
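As a side note, you can also pull the omnibus test out of the summary object programmatically; summary.lm stores the F statistic and its degrees of freedom, from which the standard pf() function recovers the p value:

s <- summary(lin_reg)
s$fstatistic   # F value (89.57) with its numerator and denominator df (1, 48)
pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3], lower.tail = FALSE)
# about 1.49e-12, the p-value printed at the bottom of the summary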
Which Individual Predictors are Significant?

Luckily, speed is a significant predictor of stopping distance, since the p value is much smaller than 0.05, so you can go on to look at the rest of the output. You'll next want to glance at the Coefficients section. There you can find the information provided by the print() function above, under the column header Estimate: the first row is the intercept, and the second row, labeled speed, is the slope. Those are the values you would plug into your equation, just as before.

The other important parts of the Coefficients table are the t value and Pr(>|t|) columns, for everything but the Intercept. Because you are doing simple linear regression, with only one x variable, that is the only row besides Intercept listed; if you were doing multiple regression, there would be multiple x variables and thus multiple rows after Intercept. R conducts a t-test on each individual predictor of y, to see if it contributes anything to the prediction model. The Pr(>|t|) column is the p value for this t-test, and if it is significant, then you know that the x variable has an impact on the y variable. So speed is a significant predictor of stopping distance according to the t-test as well. Since there was only one x variable, you expect the F-test results to match the t-test results, in that they should both be significant. That is the case here: both F and t are significant.

How Much Variance is Explained by this Model?

Next, move down to the Multiple R-squared and Adjusted R-squared values. These both mean the same thing, but the second is adjusted for the number of variables in the model, in order to reduce the amount of Type I error that may abound; as a general rule, looking at Adjusted R-squared is the more prudent thing to do. The R-squared value is also called the coefficient of determination. It is a measure of the percentage of the variability of the data set that the line explains. In this case, because the Adjusted R-squared value is 0.6438, the line explains approximately 64% of the variability in the data. Put another way, speed is able to explain about 64% of what goes into stopping distance; the rest is covered by other variables that have not been included in the model. The larger the R-squared value, the more closely related the variables in the model are.

Using Linear Regression to Predict Values

You can use this model to predict the necessary stopping distance for a given speed. For example, suppose you want to know the stopping distance for a 1920s vehicle traveling at 21 mph. You can put 21 into the x value of the regression equation you computed above and solve for y to get the predicted stopping distance:

y = mx + b
y = 3.932 * 21 - 17.579
y = 64.99

So your model predicts that a car going 21 mph will require about 64.99 feet to stop. You can describe what you have done graphically as well: to find the predicted stopping distance at 21 mph, you go up from the x axis at a value of 21 until you reach the regression line, then go horizontally left to the y axis and read the value, which would be 64.99.

Does your linear model guarantee that the actual stopping distance for a car going 21 miles per hour will be 64.99 feet? No, because there is variability in the relationship between speed and stopping distance. All you can say is that, based on the data, you would expect a car going 21 miles per hour to require something like 65 feet to stop; it may be longer or shorter than that.

Suppose you want to find the stopping distance of a car going 45 miles per hour. You can put 45 into the x value of the regression equation and compute y:

y = mx + b
y = 3.932 * 45 - 17.579
y = 159.36

Your regression model shows that it will take 159.36 feet to stop.
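Rather than plugging numbers in by hand, you can ask R for predictions directly with the standard predict() function, passing the fitted model and a data frame of new speed values; it reproduces the hand calculations above, up to rounding:

predict(lin_reg, newdata = data.frame(speed = c(21, 45)))
# approximately 65.0 and 159.4 feet for 21 mph and 45 mph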
However, you should be very hesitant to accept this number, for the following reason: you created the model using speeds between 4 and 25 mph. There is no data to verify that your model works well for speeds above 25 mph. In this case, you are using the model to extrapolate beyond the data, and extrapolation can be fraught with peril. In the case of the distance necessary to stop a car, this linear model may not be accurate at higher speeds.

Scatter Plot with a Best Fit Line

Lastly, you can create a scatter plot of the speed versus distance data with a line that best fits the data. This is done with the method = lm argument; you are asking R to fit the linear model into the graph:

d <- ggplot(cars, aes(x = speed, y = dist))
d + geom_point() + geom_smooth(method = lm, se = FALSE)

The blue line in the result is the line described by the linear regression equation you found above.

Summary

When you have 2 continuous variables, analyses like t-tests won't work. Instead, you'll use correlation to determine the relationship between the variables, or simple linear regression to determine causality and/or predict y values. An ideal visualization for two continuous variables is a scatter plot with a best fit line; it allows you to visually assess a correlation. Correlations can be assessed by both their strength (ranging from 0 to 1) and their direction (positive or negative).
A positive correlation has both variables varying in the same direction, while a negative correlation has them varying in opposite directions. Linear regression can be used to predict values, and thus is often called predictive modeling. A line is created using the equation y = mx + b, which involves m, the slope, and b, the y intercept. Luckily, R calculates these values for you.