To monitor and improve its productivity, a company conducted an investigation and found that absenteeism is the factor that affects productivity the most. The company's data analytics department has collected data on the two variables (productivity and absenteeism) for the past 12 years, as shown in the table below. The company now wants to determine, through regression analysis, whether productivity is statistically affected by the level of absenteeism.

| Year | Absenteeism (number of absent workers) | Productivity (million AED) |
|------|----------------------------------------|----------------------------|
| 1 | 204 | 342 |
| 2 | 352 | 336 |
| 3 | 154 | 406 |
| 4 | 206 | 410 |
| 5 | 422 | 278 |
| 6 | 530 | 214 |
| 7 | 750 | 138 |
| 8 | 482 | 268 |
| 9 | 374 | 262 |
| 10 | 120 | 356 |
| 11 | 188 | 396 |
| 12 | 634 | 152 |
Correlation
Correlation describes the relationship between two variables. It measures the degree to which the variables move in relation to each other: when two sets of data tend to change together, there is a correlation between them.
Linear Correlation
Linear correlation measures the strength and direction of a straight-line relationship between two numerical variables. In other words, it indicates how closely the two quantities are connected to one another. Correlation analysis is the study of how variables are related.
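As an illustration, the linear correlation between absenteeism and productivity can be computed directly from the question's table. The sketch below assumes the sample Pearson coefficient is the measure intended; the function name `pearson_r` is illustrative and not part of the original question.

```python
import math

# Data from the question's table (years 1 to 12).
absenteeism = [204, 352, 154, 206, 422, 530, 750, 482, 374, 120, 188, 634]
productivity = [342, 336, 406, 410, 278, 214, 138, 268, 262, 356, 396, 152]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(absenteeism, productivity)
print(f"r = {r:.3f}")
```

A value of r close to -1 indicates a strong negative linear relationship: years with more absent workers tend to have lower productivity.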
Regression Analysis
Regression analysis is a statistical method that estimates the relationship between a dependent variable and one or more independent variables. The dependent variable is also called the outcome variable, and the independent variables are called predictors. Regression analysis is one way to find trends in data: the predictors provide information about the associated dependent variable, so the fitted model can be used to estimate a particular outcome.
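For the data in this question, a simple regression of productivity on absenteeism could be fitted as in the sketch below. It assumes ordinary least squares is the intended estimation method; `fit_simple_regression` and the other names are illustrative.

```python
# Data from the question's table (years 1 to 12).
absenteeism = [204, 352, 154, 206, 422, 530, 750, 482, 374, 120, 188, 634]
productivity = [342, 336, 406, 410, 278, 214, 138, 268, 262, 356, 396, 152]


def fit_simple_regression(xs, ys):
    """Least-squares intercept and slope for the model y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_regression(absenteeism, productivity)

# Coefficient of determination: share of the variation in productivity
# that the fitted line accounts for.
mean_y = sum(productivity) / len(productivity)
predicted = [intercept + slope * x for x in absenteeism]
ss_res = sum((y - p) ** 2 for y, p in zip(productivity, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in productivity)
r_squared = 1 - ss_res / ss_tot

print(f"productivity = {intercept:.2f} + ({slope:.4f}) * absenteeism")
print(f"r^2 = {r_squared:.3f}")
```

The slope is the estimated change in productivity (in million AED) for each additional absent worker, and the intercept is the estimated productivity when absenteeism is zero.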
Questions:
- Construct a scatter diagram for the data about productivity and absenteeism, then interpret the possible relationship that can be found.
- Construct a simple regression model to predict the annual productivity by the variable Absenteeism. What is the interpretation that can be made based on the regression results?
- Compute r² and r for the regression model constructed earlier. What interpretation can be made?
- Calculate the prediction error for the annual productivity for the years 8, 9, and 10, based on the corresponding regression model constructed previously.
- For the regression model of the annual productivity by the variable Absenteeism, draw the errors graph and check if the errors respect the regression assumptions or not.
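
One way to approach the last two questions is sketched below in Python. It assumes an ordinary least-squares fit of productivity on absenteeism and defines the prediction error as observed minus predicted productivity; the variable names are illustrative. It lists the errors as text rather than drawing the errors graph, which would need a plotting library.

```python
# Data from the question's table, indexed by year 1..12.
years = list(range(1, 13))
absenteeism = [204, 352, 154, 206, 422, 530, 750, 482, 374, 120, 188, 634]
productivity = [342, 336, 406, 410, 278, 214, 138, 268, 262, 356, 396, 152]

# Least-squares fit of productivity = intercept + slope * absenteeism.
n = len(years)
mean_x = sum(absenteeism) / n
mean_y = sum(productivity) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(absenteeism, productivity))
s_xx = sum((x - mean_x) ** 2 for x in absenteeism)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Prediction error (residual) = observed - predicted, keyed by year.
residuals = {
    year: y - (intercept + slope * x)
    for year, x, y in zip(years, absenteeism, productivity)
}

# Errors requested in the question.
for year in (8, 9, 10):
    print(f"Year {year}: error = {residuals[year]:+.2f} million AED")

# All errors listed against absenteeism, as a text stand-in for the errors graph.
for year, x in zip(years, absenteeism):
    print(f"year={year:2d}  absenteeism={x:3d}  error={residuals[year]:+7.2f}")
```

When reading the full list, the usual checks are that the errors scatter around zero with no visible pattern against absenteeism and with a roughly constant spread. With only 12 observations, any such check is indicative rather than conclusive.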
