MAT 243 Project Three Summary Report Template

School

Southern New Hampshire University

Course

243

Subject

Mathematics

Date

Jan 9, 2024

MAT 243 Project Three Summary Report
Kaelyn Murphy
Kaelyn.Murphy@snhu.edu
Southern New Hampshire University

1. Introduction

As the data analyst for the Dallas Mavericks, I have been tasked with analyzing performance patterns using a large set of historical data. The data set has been aggregated to study the total number of wins in a regular season and includes the following variables:

  • Total number of wins in a regular season
  • Average points scored in a regular season
  • Average relative skill of each team in a regular season
  • Average point differential between the team and their opponents in a regular season
  • Average relative skill differential between the team and their opponent in a regular season

I will use the Python programming language to perform the statistical analysis and to create regression models that can support key decisions aimed at improving the team's performance.

2. Data Preparation

The variable avg_pts_differential represents the average point differential between the team and their opponents in a regular season. This can be thought of as the overall difference in average points between two teams in a regular season. For example, if the Mavericks average 90 points and their opponent, the Bulls, averages 101 points, the average point differential is 90 - 101 = -11. The number is negative because the calculation is our team's average minus the opposing team's average.

The variable avg_elo_n represents the average relative skill of each team in a regular season. Relative skill is calculated from the final score of each game, the game location, and the outcome of the game relative to the probability of that outcome. The higher the number, the higher the relative skill of the team. For example, in Step 1 the avg_elo_n between the years 1995 and 2015 is 1386.60 for the Bucks and 1569.89 for the Bulls, which tells us that the Bulls had the higher average relative skill.

3. Simple Linear Regression: Scatterplot and Correlation for the Total Number of Wins and Average Relative Skill

In general, data visualization techniques are used to illustrate whether a relationship exists between two variables and to identify trends or correlations in the data. Visuals such as scatterplots make trends, outliers, and patterns easy to see. A scatterplot specifically shows the relationship between two quantitative variables. If there is a correlation, we can determine from the plot whether it is positive or negative.

The correlation coefficient indicates the direction and strength of the linear relationship between two variables. When the correlation coefficient is near +1 or -1, the linear relationship is considered strong; when it is near 0, the linear relationship is considered weak. More specifically, in absolute value, a coefficient larger than 0.7 is considered strong, a value between 0.5 and 0.7 is considered moderate, a value between 0.3 and 0.5 is considered weak, and a value less than 0.3 is considered very weak or nonexistent. If both variables increase together, we can
expect a positive correlation. If one variable increases while the other decreases, we can expect a negative correlation.

The scatterplot and the Pearson correlation coefficient both reflect a positive correlation between the total number of wins and the average relative skill. As the average relative skill of the team increases, the total number of wins increases as well. The Pearson correlation coefficient is 0.9, which indicates a strong positive correlation between the two variables. The correlation coefficient is statistically significant because the P-value of 0.0 is less than the 0.01 level of significance.

4. Simple Linear Regression: Predicting the Total Number of Wins using Average Relative Skill

In general, a simple linear regression model is used to predict a response variable from a single predictor variable and to determine whether the predictor has a statistically significant effect on the response. The response variable (Y) is a random variable, while the predictor variable (X) is assumed to be non-random or fixed. The response variable is the variable being explained or predicted, and the predictor variable is the variable used to make that prediction.

For this model, the equation is E(Y) = β₀ + β₁X, where Y is the total number of wins, X is the average relative skill, β₀ is the intercept, and β₁ is the slope (the expected change in total wins for a one-unit increase in average relative skill). E(Y) is the expected value of Y.

The null hypothesis is H₀: β₁ = 0, which states that average relative skill is not a significant predictor of the total number of wins in a season. The alternative hypothesis is Hₐ: β₁ ≠ 0, which states that average relative skill is a significant predictor of the total number of wins in a season. The level of significance is 0.01.
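The correlation-strength cutoffs described in Section 3 can be sketched as a small helper. The function names are illustrative, and the handling of values that fall exactly on a cutoff (0.3, 0.5, 0.7) is an assumption, since the report does not say which category a boundary value belongs to.

```python
def correlation_strength(r: float) -> str:
    """Label the strength of a Pearson correlation using the report's cutoffs.

    Boundary handling (exactly 0.3, 0.5, or 0.7) is an assumption.
    """
    magnitude = abs(r)  # strength depends on the magnitude, not the sign
    if magnitude > 0.7:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.3:
        return "weak"
    return "very weak or nonexistent"


def correlation_direction(r: float) -> str:
    """Report whether the linear relationship is positive, negative, or absent."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Coefficient reported for total wins vs. average relative skill
print(correlation_strength(0.9), correlation_direction(0.9))
```

Applied to the coefficient of 0.9 reported above, the helper returns "strong" and "positive", which matches the interpretation in the text.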
Table 1: Hypothesis Test for the Overall F-Test

Statistic         Value
Test Statistic    2865.00
P-value           0.0000

The P-value of 0.0000 is lower than the 0.01 level of significance, so we reject the null hypothesis. We conclude that there is significant evidence that the average relative skill is a significant predictor of the total number of wins. Based on the results of the overall F-test, the average relative skill can be used to predict the total number of wins in the regular season.

The predicted total number of wins in a regular season for a team that has an average relative skill of 1550 can be calculated as follows:

Total number of wins = -128.2475 + 0.1121 * avg_elo_n
Total number of wins = -128.2475 + 0.1121 * 1550
Total number of wins = 45.5075 ≈ 46 (rounded to the nearest integer)

The predicted total number of wins in a regular season for a team that has an average relative skill of 1450 can be calculated as follows:

Total number of wins = -128.2475 + 0.1121 * avg_elo_n
Total number of wins = -128.2475 + 0.1121 * 1450
Total number of wins = 34.2975 ≈ 34 (rounded to the nearest integer)

These calculations are consistent with the positive relationship between the total number of wins and the average relative skill of the team. The higher the average relative skill of the team, the higher the predicted total number of wins, and vice versa.
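The prediction arithmetic for the simple linear regression model can be reproduced with a few lines of Python. This is a minimal sketch that hard-codes the intercept and slope reported above; the constant and function names are illustrative and are not taken from the project's own script.

```python
# Coefficients of the fitted simple linear regression model, as reported above
INTERCEPT = -128.2475
SLOPE = 0.1121


def predict_wins(avg_elo_n: float) -> float:
    """Predicted total wins for a team with the given average relative skill."""
    return INTERCEPT + SLOPE * avg_elo_n


for elo in (1550, 1450):
    wins = predict_wins(elo)
    # round() gives the nearest whole number of wins
    print(f"avg_elo_n = {elo}: {wins:.4f} -> {round(wins)} wins")
```

Keeping the coefficients in named constants makes it easy to rerun the calculation for any other average relative skill value.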
5. Multiple Regression: Scatterplot and Correlation for the Total Number of Wins and Average Points Scored

The scatterplot and the Pearson correlation coefficient show that there is a positive correlation between the total number of wins and the average points scored. The Pearson correlation coefficient of 0.4777 tells us the correlation is positive and weak because it falls between 0.3 and 0.5. The correlation coefficient is statistically significant because the P-value of 0.00 is less than the 0.01 level of significance.

6. Multiple Regression: Predicting the Total Number of Wins using Average Points Scored and Average Relative Skill

In general, a multiple linear regression model is used to estimate the relationship between two or more predictor variables and one response variable. The predictor variables here are the average points scored and the average relative skill. The response variable is the total number of wins. The equation for this model is E(Y) = β₀ + β₁X₁ + β₂X₂, where X₁ and X₂ are the two predictor variables.

The null hypothesis is H₀: β₁ = β₂ = 0, which states that neither the average relative skill nor the average points scored is a significant predictor of the total number of wins in a season. The alternative hypothesis is Hₐ: β₁ ≠ 0 or β₂ ≠ 0, which states that at least one of the predictor variables (average relative skill or average points scored) is a significant predictor of the total number of wins. The level of significance used is 0.01.
Table 2: Hypothesis Test for the Overall F-Test

Statistic         Value
Test Statistic    1580.00
P-value           0.0000

The P-value of 0.0000 is lower than the 0.01 level of significance, so we reject the null hypothesis. We conclude that there is significant evidence that at least one of the predictors, average relative skill or average points scored, is a significant predictor of the total number of wins. Based on the results of the overall F-test, the model can be used to predict the total number of wins in the regular season.

The t-statistic for each coefficient is the coefficient divided by its standard error. For average relative skill, the t-statistic is 0.1055/0.002 = 52.75. For average points scored, the t-statistic is 0.3497/0.048 = 7.29. The P-value for both variables is 0.000, which is less than the 0.01 level of significance. This indicates that each predictor variable individually has a significant relationship with the response variable.

The coefficient of determination (R²) is 0.837. The coefficient of determination measures how well a statistical model predicts the outcome. If it is 0, the model does not predict the outcome. If it is between 0 and 1, the model partially predicts the outcome. If it is 1, the model perfectly predicts the outcome. From our coefficient of determination, roughly 84% of the variation in total wins can be explained by average relative skill and average points scored.

The predicted total number of wins in a regular season for a team that is averaging 75 points per game with a relative skill of 1350 can be calculated as follows:
Total number of wins = -152.5736 + 0.1055 * avg_elo_n + 0.3497 * avg_pts
Total number of wins = -152.5736 + 0.1055 * 1350 + 0.3497 * 75
Total number of wins = 16.0789 ≈ 16 (rounded to the nearest integer)

The predicted total number of wins in a regular season for a team that is averaging 100 points per game with a relative skill of 1600 can be calculated as follows:

Total number of wins = -152.5736 + 0.1055 * avg_elo_n + 0.3497 * avg_pts
Total number of wins = -152.5736 + 0.1055 * 1600 + 0.3497 * 100
Total number of wins = 51.1964 ≈ 51 (rounded to the nearest integer)

These calculations are consistent with the positive relationship between the total number of wins and both the average relative skill and the average points scored. The higher the average relative skill and the average points scored, the higher the predicted total number of wins, and vice versa.

7. Multiple Regression: Predicting the Total Number of Wins using Average Points Scored, Average Relative Skill, Average Points Differential, and Average Relative Skill Differential

In general, a multiple linear regression model is used to estimate the relationship between two or more predictor variables and one response variable. The predictor variables here are the average points scored, the average relative skill, the average points differential, and the average relative skill differential. The response variable is the total number of wins. The equation for this model is E(Y) = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + β₄X₄.

The null hypothesis is H₀: β₁ = β₂ = β₃ = β₄ = 0, which states that none of the four predictor variables (average relative skill, average points scored, average points differential, or average relative skill differential) is a significant predictor of the total number of wins. The alternative hypothesis is Hₐ: β₁ ≠ 0 or β₂ ≠ 0 or β₃ ≠ 0 or β₄ ≠ 0, which states that at least one of the predictor variables is a significant predictor of the total number of wins.
The level of significance used is 0.01.
Table 3: Hypothesis Test for the Overall F-Test

Statistic         Value
Test Statistic    1102.00
P-value           0.0000

An F-statistic of 1102.00 corresponds to a P-value of 0.0000, which is lower than the 0.01 level of significance, so we reject the null hypothesis. The results are statistically significant. We conclude that there is significant evidence that at least one of the predictors (average relative skill, average points scored, average points differential, or average relative skill differential) is a significant predictor of the total number of wins. Based on the results of the overall F-test, the model can be used to predict the total number of wins in the regular season.

The t-statistic for each coefficient is the coefficient divided by its standard error:

  • Average relative skill: -0.0134/0.017 = -0.788, with a P-value of 0.442. At the 0.01 level of significance, this is not statistically significant.
  • Average points scored: 0.2597/0.043 = 6.040, with a P-value of 0.000. At the 0.01 level of significance, this is statistically significant.
  • Average points differential: 1.6206/0.135 = 12.004, with a P-value of 0.000. At the 0.01 level of significance, this is statistically significant.
  • Average relative skill differential: 0.0525/0.018 = 2.917, with a P-value of 0.004. At the 0.01 level of significance, this is statistically significant.

These results tell us that average points scored, average points differential, and average relative skill differential are each significant in predicting the total number of wins in the season, while average relative skill is not significant once the other predictors are in the model.

The coefficient of determination (R²) is 0.878. The coefficient of determination is a measurement of how well a statistical model predicts the outcome. If the coefficient of determination is 0, the model does not predict the outcome.
If the coefficient of determination is
between 0 and 1, the model partially predicts the outcome. If the coefficient of determination is 1, the model perfectly predicts the outcome. From our coefficient of determination, roughly 88% of the variation in total wins can be explained by the average relative skill, the average points scored, the average points differential, and the average relative skill differential.

The predicted total number of wins in a regular season for a team that is averaging 75 points per game with a relative skill level of 1350, an average point differential of -5, and an average relative skill differential of -30 can be calculated as follows:

Total number of wins = 34.5753 - 0.0134 * avg_elo_n + 0.2597 * avg_pts + 1.6206 * avg_pts_differential + 0.0525 * avg_elo_differential
Total number of wins = 34.5753 - 0.0134 * 1350 + 0.2597 * 75 + 1.6206 * (-5) + 0.0525 * (-30)
Total number of wins = 26.2848 ≈ 26 (rounded to the nearest integer)

The predicted total number of wins in a regular season for a team that is averaging 100 points per game with a relative skill level of 1600, an average point differential of +5, and an average relative skill differential of +95 can be calculated as follows:

Total number of wins = 34.5753 - 0.0134 * avg_elo_n + 0.2597 * avg_pts + 1.6206 * avg_pts_differential + 0.0525 * avg_elo_differential
Total number of wins = 34.5753 - 0.0134 * 1600 + 0.2597 * 100 + 1.6206 * 5 + 0.0525 * 95
Total number of wins = 52.1958 ≈ 52 (rounded to the nearest integer)

These calculations are consistent with the relationship between the total number of wins and the average relative skill, the average points scored, the average points differential, and the average relative skill differential.
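The multiple regression predictions in Sections 6 and 7, and the t-statistics computed as coefficient divided by standard error, can be re-derived with a short script. This is a sketch only: the helper names and the dictionary layout are illustrative assumptions, while the coefficients and standard errors are the rounded values reported in the text.

```python
def predict_wins(intercept, coefficients, values):
    """Evaluate intercept + sum(coefficient * value) over the named predictors."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


def t_statistic(coefficient, standard_error):
    """t-statistic for a single regression coefficient."""
    return coefficient / standard_error


# Rounded coefficients as reported for the two-predictor model (Section 6)
TWO_PREDICTOR = {"avg_elo_n": 0.1055, "avg_pts": 0.3497}
TWO_PREDICTOR_INTERCEPT = -152.5736

# Rounded coefficients as reported for the four-predictor model (Section 7)
FOUR_PREDICTOR = {
    "avg_elo_n": -0.0134,
    "avg_pts": 0.2597,
    "avg_pts_differential": 1.6206,
    "avg_elo_differential": 0.0525,
}
FOUR_PREDICTOR_INTERCEPT = 34.5753

# Second scenario from Section 7: 100 points, skill 1600, differentials +5 and +95
team = {
    "avg_elo_n": 1600,
    "avg_pts": 100,
    "avg_pts_differential": 5,
    "avg_elo_differential": 95,
}

wins_two = predict_wins(TWO_PREDICTOR_INTERCEPT, TWO_PREDICTOR, team)
wins_four = predict_wins(FOUR_PREDICTOR_INTERCEPT, FOUR_PREDICTOR, team)
print(f"two-predictor model:  {wins_two:.4f} -> {round(wins_two)} wins")
print(f"four-predictor model: {wins_four:.4f} -> {round(wins_four)} wins")

# t-statistic for average points differential in the four-predictor model
print(f"t = {t_statistic(1.6206, 0.135):.3f}")
```

Because `predict_wins` only reads the predictors named in the coefficient dictionary, the same team record can be passed to either model.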
8. Conclusion

The results of these analyses indicate that historical team statistics are significantly related to the total number of wins and can be used to predict it. In this scenario, teams with higher average points scored, higher average relative skill differential, and higher average points differential won more games during the regular season, and vice versa. The correlations observed between these variables and the total number of wins were positive. The practical importance is that these results can be used to identify which areas to improve in order to have a more successful season in future years.