Exam 4 Applied

School

University of Arkansas

Course

009

Subject

Industrial Engineering

Date

Jan 9, 2024

Type

docx

Pages

5

Uploaded by ChiefWaterCrab19

Avery Rutkowski
Exam 4 - Applied (answers in red)

Var 1 = Conference
Var 2 = Capacity
Var 3 = Opened
Var 4 = PPG_Scored
Var 5 = Home Games
Var 6 = Average Attendance
Var 7 = Wins

(1) Using StatCrunch and the ‘CollegeBasketball Data’ provided on Blackboard, find and report the following summary statistics, representing both measures of center and measures of variation:

(i) The mean and median for the number of season wins.
Mean = 17.125
Median = 16.5

(ii) The standard deviation and range for the number of season wins.
Standard Deviation = 6.247
Range = 28

(iii) The linear correlation coefficient between wins and each potential explanatory variable, i.e.,
- The linear correlation coefficient between Wins and Capacity: 0.340
- The linear correlation coefficient between Wins and Opened: 0.106
- The linear correlation coefficient between Wins and PPG_Scored: 0.555
- The linear correlation coefficient between Wins and Home Games: 0.308
- The linear correlation coefficient between Wins and Average Attendance: 0.472

(2) Using the StatCrunch results obtained in Part (1), state initial/preliminary conclusions that could be made in five to ten complete sentences.

To find the mean, median, standard deviation, and range for wins in Part (1), I used the summary stats option. The mean was 17.125, the median was 16.5, the standard deviation was 6.247, and the range was 28. The linear correlation coefficient represents the strength of the linear relationship between two variables. I found the linear correlation coefficients between the number of wins and Capacity, Opened, PPG_Scored, Home Games, and Average Attendance. To find these numbers, I used simple linear regression between each pair of variables. PPG_Scored and Wins had the highest coefficient (0.555) and therefore the strongest linear relationship. Opened and Wins had the lowest coefficient (0.106) and therefore the weakest linear relationship.

(3) Using StatCrunch and the ‘CollegeBasketball Data’ provided on Blackboard, create a statistical graph(s) that can be used to visualize any potential linear relationships between wins and potential explanatory variables, i.e., between Wins, Capacity, Opened, PPG_Scored, Home Games, and Average Attendance.

(i) Provide the statistical graph(s) created.

(ii) In two to four complete sentences, state initial/preliminary conclusions that could be made.

To create these graphs, I used simple linear regression to visualize the linear relationships between wins and the potential explanatory variables. After visualizing the data, it is still clear that PPG_Scored (Var 4) has the strongest linear relationship with wins. It is also clear that Opened (Var 3) has the weakest linear relationship. We can see this in how closely the data points follow the fitted line for each pair of variables.
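The summary statistics and correlation coefficients above come from StatCrunch. As a rough cross-check of what those menu options compute, the following is a minimal Python sketch. The two lists are made-up illustrative values, not the CollegeBasketball data, and `pearson_r` is a helper name introduced here for illustration.

```python
import math
import statistics

# Hypothetical values for illustration only; these are NOT the
# CollegeBasketball data used in the exam.
wins = [10, 14, 16, 17, 21, 25]
ppg_scored = [62.0, 66.5, 64.0, 70.0, 72.5, 75.0]


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


summary = {
    "mean": statistics.fmean(wins),
    "median": statistics.median(wins),
    "stdev": statistics.stdev(wins),  # sample standard deviation (n - 1 denominator)
    "range": max(wins) - min(wins),
}
r_wins_ppg = pearson_r(wins, ppg_scored)

print(summary)
print(round(r_wins_ppg, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship and a coefficient near 0 indicates a weak one, which is the basis for ranking PPG_Scored (0.555) above Opened (0.106) in the answers above.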
(4) One research question that could be posed using the available data: “Is it possible to adequately predict overall/total season wins for a college basketball team using the potential explanatory variables?” Essentially, we wish to create a ‘best’ linear model to predict the number of overall/total season wins using any or all of the other variables.

Note: Recall that we have criteria for selecting a ‘best’ linear model.

Determine a method that would be appropriate for answering this research question. Using StatCrunch and the available data, perform the statistical procedure deemed appropriate.

(i) What method did you use to answer the research question, e.g., One-Sample t-Test for Means?

I used the multiple linear regression method.

(ii) Why did you select this particular method to answer the research question?

I used the multiple linear regression method because it predicts the value of the dependent variable from multiple independent variables. In this case, it represents the relationship between the explanatory variables and the number of wins.
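For reference, the full multiple linear regression model behind this method can be written out as below. This is a sketch that assumes only the five numeric explanatory variables are used (Conference is categorical and was not included in the correlations in Part (1)); the coefficient symbols are notation introduced here, not taken from the exam.

```latex
\[
\text{Wins} = \beta_0 + \beta_1\,\text{Capacity} + \beta_2\,\text{Opened}
            + \beta_3\,\text{PPG\_Scored} + \beta_4\,\text{HomeGames}
            + \beta_5\,\text{AvgAttendance} + \varepsilon
\]
\[
H_0:\ \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0
\qquad\text{vs.}\qquad
H_1:\ \text{at least one } \beta_j \neq 0
\]
```

The overall model p-value reported by the regression output tests this pair of hypotheses with an F-statistic.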
(iii) In five to ten complete sentences, summarize and interpret the relevant findings of your analysis in the context of the research question. Make sure to address each of the following:
- What is the one ‘best’ linear model that you selected to predict the number of overall/total season wins?
- What was the overall model p-value? What does this indicate about the ‘best’ linear model?
- What was the overall Adjusted R² value? What does this indicate about the ‘best’ linear model?
- Describe the process used in selecting the ‘best’ linear model among all possible linear models. How did you determine this linear model was the ‘best’?

H0: There is no linear relationship between the number of wins and the explanatory variables.
H1: There is a linear relationship between the number of wins and the explanatory variables.

The ‘best’ linear model that I used to predict the number of overall/total wins was a multiple linear regression model. I selected this method because it relates all of the explanatory variables to the number of wins and gives an overall p-value, R², adjusted R², and F-statistic. The overall model p-value was 0 (to the precision reported). At a significance level of 0.05, 0.1, or 0.01, a p-value this small leads us to reject the null hypothesis: there is sufficient evidence that the model as a whole is useful for predicting wins. The overall adjusted R² was 0.4151. Adjusted R² measures the proportion of the variation in wins that the model explains, adjusted for the number of explanatory variables, so the model explains about 41.5% of that variation. Among the other linear models, I selected this one as the best because it includes the data needed to answer the research question. It tells us that overall/total wins can be predicted from the explanatory variables to a statistically significant degree, but with more than half of the variation left unexplained, the predictions are only moderately accurate.

(iv) Consider and, then, discuss any potential limitations to the current analysis.

Potential limitations to the current analysis include too many explanatory variables, seasonal effects, incomplete data, and misleading correlations.
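The exam refers to criteria for selecting a ‘best’ linear model without restating them. One common criterion is to fit every subset of the explanatory variables and keep the subset with the highest adjusted R², where adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for n observations and k explanatory variables. The sketch below illustrates that idea in plain Python. It is an assumption about the selection rule, not a reproduction of the StatCrunch procedure; the data values are made up, and `solve`, `fit_ols`, and `best_subset` are hypothetical helper names.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(columns, y):
    """Least-squares fit of y on the given columns plus an intercept.

    Returns (coefficients, r2, adjusted_r2).
    """
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return beta, r2, adj_r2


def best_subset(predictors, y):
    """Try every non-empty subset of predictors; return (names, adj_r2, r2) of the best."""
    best = None
    for size in range(1, len(predictors) + 1):
        for names in combinations(predictors, size):
            _, r2, adj = fit_ols([predictors[name] for name in names], y)
            if best is None or adj > best[1]:
                best = (names, adj, r2)
    return best


# Hypothetical values for illustration only; NOT the CollegeBasketball data.
predictors = {
    "PPG_Scored": [62.0, 66.5, 64.0, 70.0, 72.5, 75.0, 68.0, 71.0],
    "Home Games": [14, 15, 16, 15, 17, 16, 14, 18],
    "Average Attendance (thousands)": [3.2, 5.1, 4.0, 6.3, 8.8, 7.5, 4.9, 9.1],
}
wins = [10, 14, 16, 17, 21, 25, 15, 20]

names, adj_r2, r2 = best_subset(predictors, wins)
print(names, round(adj_r2, 4), round(r2, 4))
```

Unlike R², adjusted R² can fall when a variable that adds little is included, which is why it is often preferred for comparing models with different numbers of explanatory variables. A statistics package would also report the overall F-test p-value for the chosen model; that step is omitted here.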