MAT 243 Project Three Summary Report
Christian French
christian.french@snhu.edu
Southern New Hampshire University
1. Introduction
The data set that I am exploring is a large data set from the years 1995 and 2015. I will use the results to predict the total number of wins for my team and to identify ways to improve its performance. The analysis uses simple linear and multiple regression models.
2. Data Preparation
The variable avg_points_differential is the average point differential between the team and its opponents in a regular season. In other words, it is the average difference between the points scored by our team and the points scored by opposing teams over a regular season. The variable avg_elo_n is the average relative skill of each team within a regular season.
3. Simple Linear Regression: Scatterplot and Correlation for the Total Number of Wins and Average Relative Skill
You constructed a scatterplot of the total number of wins and the average relative skill to study their correlation. You also calculated the Pearson correlation coefficient along with its P-value. See Step 2 in the Python script to address the following items:
Data visualization techniques are used to study the relationship between two variables because plotting the data makes it easier to see whether a trend is present.
The correlation coefficient shows the strength and direction of the association between two variables. If Y tends to increase as X increases, the correlation is positive. If one tends to decrease as the other increases, the correlation is negative. The strength depends on how closely the points on the scatterplot follow a straight line. If they stay very close to the line, the correlation is strong. If they follow it with some variation, it is moderate. If they are moderately spread out, the correlation is weak, and if they are scattered with no linear pattern, there is no correlation.
The Pearson correlation coefficient is 0.9072, and in combination with the scatterplot, it shows that there is a strong positive correlation between the total number of wins and the average relative skill. The correlation coefficient is statistically significant because the P-value, reported as 0.0 (effectively zero after rounding), is lower than the significance level of 1%, or 0.01.
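To make the correlation coefficient concrete, the following is a minimal sketch of how Pearson's r can be computed from its definition. The numbers are made up for illustration and are not taken from the project data set, and the name total_wins is an assumed column name (only avg_elo_n appears in this report). The project's Python script may compute the coefficient with a library routine instead.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only (NOT the project data)
avg_elo_n = [1350, 1420, 1500, 1560, 1640]
total_wins = [22, 33, 41, 45, 58]

r = pearson_r(avg_elo_n, total_wins)
print(f"r = {r:.4f}")
```

A value of r close to +1 corresponds to points that rise together and stay near a straight line, which is the pattern described above for wins and relative skill.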
4. Simple Linear Regression: Predicting the Total Number of Wins using Average Relative Skill
You created a simple linear regression model for the total number of wins in a regular season using the average relative skill as the predictor variable.
See Step 3 in the Python script to address the following items:
A simple linear regression model predicts the response variable (total number of wins) using the predictor variable (average relative skill) with the following equation: E(Y) = B0 + B1X1
Where E(Y) is the expected value of the response variable, B0 and B1 are the regression parameters (intercept and slope), and X1 is the predictor variable. The equation for my model is as follows:
Wins = -128.2475 + 0.1121x
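As a rough illustration of where an intercept and slope like these come from, the sketch below estimates B0 and B1 by ordinary least squares using the closed-form formulas for a single predictor. The data points are invented for the example, and the function name fit_simple_ols is hypothetical. The project's Python script likely relies on a statistics library rather than hand-written code.

```python
def fit_simple_ols(xs, ys):
    """Estimate (b0, b1) for y = b0 + b1 * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical (relative skill, wins) pairs, NOT the project data
skill = [1350, 1420, 1500, 1560, 1640]
wins = [22, 33, 41, 45, 58]

b0, b1 = fit_simple_ols(skill, wins)
print(f"Wins = {b0:.4f} + {b1:.4f}x")
```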
The null hypothesis is H0: B1 = 0; that the slope parameter for average relative skill (B1) is equal to zero. The alternative hypothesis is Ha: B1 ≠ 0; that the slope parameter is not equal to zero. The level of significance is 1%, or 0.01.
Table 1: Hypothesis Test for the Overall F-Test
Statistic | Value
Test Statistic | 2865
P-value | 0.0000
Overall, since the p-value is less than the level of significance, the null hypothesis is rejected in favor of the alternative hypothesis. This means that there is evidence to support a significant linear relationship between average relative skill and total games won using this model. To put this model to use, a team with a relative skill of 1550 is predicted to win 45.51 games, found by plugging 1550 into the regression equation mentioned before, as follows:
Wins = -128.2475 + 0.1121x
Wins = -128.2475 + 0.1121(1550)
Wins = -128.2475 + 173.755
Wins = 45.5075
Doing the same for a relative skill of 1450, the team would be predicted to win about 34.30 games (-128.2475 + 0.1121(1450) = 34.2975).
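The substitution above can be checked with a few lines of Python. This is only a sketch that hard-codes the rounded coefficients reported in this section. The function name predict_wins_simple is made up for illustration.

```python
# Rounded coefficients from the fitted simple linear regression model above
INTERCEPT = -128.2475
SLOPE = 0.1121


def predict_wins_simple(avg_relative_skill):
    """Predicted regular-season wins for a given average relative skill."""
    return INTERCEPT + SLOPE * avg_relative_skill


prediction = predict_wins_simple(1550)
print(round(prediction, 4))
```

The same function can be called with any other relative skill value to repeat the calculation.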
5. Multiple Regression: Scatterplot and Correlation for the Total Number of Wins and Average Points Scored
The Pearson correlation coefficient and the scatterplot tell me that there is a moderate positive correlation between the total number of wins and the average points scored. It is also statistically significant as the p-value of 0.0 is less than the level of significance of 0.01.
6. Multiple Regression: Predicting the Total Number of Wins using Average Points Scored and Average Relative Skill
The multiple regression model is used in the same way as the simple linear regression model, except that it has more than one predictor variable. The equation for a multiple regression model with two predictors is as follows:
Y = B0 + B1X1 + B2X2
Putting this equation to our data, the equation would be as follows:
Y = -152.5736 + 0.3497X1 + 0.1055X2; where X1 is the average points scored and X2 is the average relative skill level.
The null hypothesis is H0: B1 = B2 = 0, meaning both slope parameters are equal to zero. The alternative hypothesis is Ha: at least one Bi ≠ 0, for i = 1, 2, meaning at least one of the slope parameters is not equal to zero and is therefore significant in the model. The level of significance is 1%, or 0.01.
Table 2: Hypothesis Test for the Overall F-Test
Statistic | Value
Test Statistic | 1580.00
P-value | 0.0000
Since the p-value is less than the level of significance, the null hypothesis should be rejected in favor of the alternative hypothesis. This means that at least one of the predictor variables, average points or average relative skill, is significant in the model. For the individual t-tests, the p-value for each predictor variable was 0.0000, which is less than the level of significance, so both are statistically significant in the model for predicting the total number of wins by a team during a season. The coefficient of determination is 0.837, or 83.7%. The closer the coefficient of determination is to 1, the more of the variation in the response variable can be explained by the predictor variables. A coefficient of determination of 83.7% means that the predictor variables explain a relatively high share of the variation in the response variable. To put this model to use, a team with a relative skill level of 1350 and an average of 75 points would be predicted to win 16.1 games, found by plugging 1350 and 75 into the regression model as follows:
Y = -152.5736 + 0.3497X1 +0.1055X2
Y = -152.5736 + 0.3497(75) +0.1055(1350)
Y = 16.0789
For a team that has a relative skill of 1600 and an average of 100 points per game, the predicted total number of games won would be 51.2. This shows the relationship between the model and its variables, as the team with the higher points and relative skill had a higher predicted number of games won.
7. Multiple Regression: Predicting the Total Number of Wins using Average Points Scored, Average Relative Skill, Average Points Differential, and Average Relative Skill Differential
The multiple regression model is used in the same way as the simple linear regression model, except that this model has four predictor variables instead of one. The equation for this multiple regression model is as follows:
Y = B0 + B1X1 + B2X2 + B3X3 + B4X4
Putting this equation to our data, the equation would be as follows:
Y = 34.5753 + 0.2597X1 – 0.0134X2 + 1.6206X3 + 0.0525X4; where X1 is the average points scored, X2 is the average relative skill level, X3 is the average points differential from their opponents, and X4 is the average relative skill differential from their opponents.
The null hypothesis is H0: B1 = B2 = B3 = B4 = 0, meaning all four slope parameters are equal to zero. The alternative hypothesis is Ha: at least one Bi ≠ 0, for i = 1, 2, 3, 4, meaning at least one of the slope parameters is not equal to zero and is therefore significant in the model. The level of significance is 1%, or 0.01.
Table 3: Hypothesis Test for Overall F-Test
Statistic | Value
Test Statistic | 1102
P-value | 0.182
Since the p-value of 0.182 is greater than the level of significance of 0.01, we fail to reject the null hypothesis. This means we cannot conclude that a statistically significant relationship exists.
If this model were significant, the individual t-tests would show three significant predictors: the average points scored and the average points differential, whose p-values are 0.0000, and the average relative skill differential, with a p-value of 0.004. The average relative skill would not be significant, as its p-value is 0.135. The coefficient of determination is 0.877, or 87.7%. If this model were accepted, the predicted number of wins for a team with an average of 75 points, a relative skill of 1350, an average point differential of -5, and an average relative skill differential of -30 would be 26.3 games, plugging the values into the equation as follows:
Y = 34.5753 + 0.2597X1 – 0.0134X2 + 1.6206X3 + 0.0525X4
Y = 34.5753 + 0.2597(75) – 0.0134(1350) + 1.6206(-5) + 0.0525(-30)
Y = 26.2848
For a team averaging 100 points, with a relative skill of 1600, an average point differential of +5, and an average relative skill differential of +95, the predicted total would be 52.2 games won.
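With four predictors, the substitution is easier to check in code than by hand. The sketch below evaluates the model as an intercept plus a sum of coefficient-times-value terms, using the rounded coefficients reported in this section. The function name predict_wins and the list layout are assumptions made for illustration, not part of the project's Python script.

```python
def predict_wins(intercept, coefficients, values):
    """Evaluate intercept + sum(coefficient * value) for a linear model."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Rounded coefficients from the four-predictor model above, in the order:
# average points scored, average relative skill,
# average points differential, average relative skill differential
INTERCEPT = 34.5753
COEFFICIENTS = [0.2597, -0.0134, 1.6206, 0.0525]

# Team averaging 100 points, relative skill 1600,
# point differential +5, relative skill differential +95
team = [100, 1600, 5, 95]
wins = predict_wins(INTERCEPT, COEFFICIENTS, team)
print(round(wins, 1))
```

For these inputs the result rounds to 52.2, in line with the figure reported above. Because the function takes the coefficients as a list, the same code also works for the two-predictor model by passing that model's intercept and two coefficients.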
8. Conclusion
Based on this analysis, the average points scored and the average relative skill level are statistically significant predictors, and both the simple regression model and the first multiple regression model are significant. For the multiple regression model with four predictor variables, the reported p-value was greater than the level of significance, meaning that we could not conclude that a statistically significant relationship exists within that model. Furthermore, we found that higher relative skill and more points scored were associated with more games won, and that relative skill has the stronger correlation with the number of games won. The practical importance of these analyses is that if the team were to increase both their points scored and relative skill level, they could expect to win more games.