Sometimes we have data that can be paired together, such as the height and weight of an individual. When two variables are measured and paired together, we have what we call paired data. When we have paired data, we can plot the results using a scatterplot.
For two variable relationships, we can often report the data as ordered pairs in (x, y) format. Our x-variable in this case is known as the explanatory variable, and the y-variable is known as the response variable. We use values of x to predict y.
If there appears to be a linear relationship between the two variables, we can create a least squares regression line. This is a line that “best” fits the dataset we have. The result will be a linear equation and we can use this equation to predict future values.
For simple linear regression (ordered pairs with only two variables), the general form of the regression line is $\hat{y} = \beta_0 + \beta_1 x$, where $\beta_0$ represents the intercept and $\beta_1$ represents the slope. We can now predict values of y for a given x-value.
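For reference, the intercept and slope of a least squares line are usually estimated from the data with the standard formulas below. This is a sketch in notation that does not appear in the original post: $\bar{x}$ and $\bar{y}$ are assumed to denote the sample means of the x- and y-values, and the hats mark estimates computed from the $n$ observed pairs $(x_i, y_i)$.

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
$$

The slope is the amount the predicted y changes for each one-unit increase in x, and the intercept is the predicted y when x is 0.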
For example, suppose we want to see whether there is a relationship between a student’s homework performance (x-variable) and their score on the final exam (y-variable). We can collect each student’s homework grade and final exam grade. We create a scatterplot of the data and see that it looks approximately linear. When we run a regression model, we get $\hat{y} = 50 + 0.45x$. Let’s predict the final exam score for a student who scored an 80 on their homework assignments. We take our regression model, substitute 80 for x, and calculate the predicted y: $\hat{y} = 50 + 0.45(80) = 86$. Based on our model, we would expect a student who scored an 80 on their homework assignments to score an 86 on their final exam.
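The substitution in the homework example can also be written as a few lines of Python. This is only a minimal sketch: the function name `predict_final_exam` and its parameter names are made up for illustration, and the intercept 50 and slope 0.45 are the values quoted in the example.

```python
def predict_final_exam(homework_score, intercept=50.0, slope=0.45):
    """Predicted final exam score from the example model y-hat = 50 + 0.45x.

    The function and parameter names are illustrative assumptions;
    only the two coefficients come from the example in the text.
    """
    return intercept + slope * homework_score


# Substitute x = 80, as in the worked example.
predicted = predict_final_exam(80)
print(f"Predicted final exam score: {predicted:.1f}")
```

Calling the function with a different homework score gives the prediction for that student, provided the score is within the range of the data the model was built from.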
Instructions
For this discussion post, we are going to create a linear regression model from a set of data. Suppose we get a job working for a company that provides their employees with company stock every year they work with the company. The amount of stock earned may be based on a few other variables, but all we are looking at right now is time served with the company. Here is the data we collected:
| Years With Company | Company Stock (Thousands of $) |
|---|---|
| 11 | 51 |
| 14 | 64 |
| 9 | 41 |
| 8 | 40 |
| 14 | 61 |
| 12 | 54 |
| 3 | 21 |
| 7 | 31 |
| 8 | 39 |
| 10 | 51 |
| 7 | 27 |
- Does the data look approximately linear? Or does there appear to be some sort of non-linear trend in our data?
- What is the regression model created from our data set?
- What does the slope represent in our regression model?
- If an employee has been with the company for 13 years, how much company stock would we expect them to have?
- In the problem, it was mentioned there are other variables that could impact the amount of company stock awarded to an employee. What are some other variables we may want to collect next time to add to our model?
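One possible way to compute the regression model for this data set is sketched below in plain Python, using the usual least squares formulas for the slope and intercept. The helper name `fit_line` and the variable names are assumptions made for illustration, not part of the assignment. The data values are copied from the post, and the 13-year prediction simply plugs x = 13 into the fitted line.

```python
# Data copied from the post.
years = [11, 14, 9, 8, 14, 12, 3, 7, 8, 10, 7]
stock = [51, 64, 41, 40, 61, 54, 21, 31, 39, 51, 27]  # thousands of $


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points.

    Hypothetical helper: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(years, stock)

# Expected stock (in thousands of $) for an employee with 13 years of service.
predicted_13 = intercept + slope * 13

print(f"y-hat = {intercept:.3f} + {slope:.3f} x")
print(f"Predicted stock at 13 years: {predicted_13:.2f} thousand dollars")
```

A scatterplot of `years` against `stock` is still worth drawing first, since the fitted line is only meaningful if the points look roughly linear. The prediction at 13 years is reasonable to make because 13 falls inside the observed range of 3 to 14 years.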