Exam 4 Applied
School
University of Arkansas *
*We aren’t endorsed by this school
Course
009
Subject
Industrial Engineering
Date
Jan 9, 2024
Type
docx
Pages
5
Uploaded by ChiefWaterCrab19
Avery Rutkowski
Exam 4 - Applied
(answers in red)
Var 1= Conference
Var 2= Capacity
Var 3= Opened
Var 4= PPG_scored
Var 5= Home Games
Var 6= Average Attendance
Var 7= Wins
(1) Using StatCrunch and the ‘CollegeBasketball Data’ provided on Blackboard, find and report
the following summary statistics, representing both measures of center and measures of
variation:
(i) The mean and median for the number of season wins.
Mean= 17.125
Median= 16.5
(ii) The standard deviation and range for the number of season wins.
Standard Deviation= 6.247
Range= 28
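The summary statistics above come from StatCrunch. As a rough cross-check of what each one measures, here is a minimal Python sketch. The list `wins` is made-up illustration data, not the CollegeBasketball data, and the sketch assumes the reported standard deviation is the sample standard deviation (n - 1 denominator).

```python
import statistics

# Made-up win totals for illustration only; NOT the CollegeBasketball data.
wins = [8, 12, 15, 16, 18, 21, 24, 30]

# Measures of center
mean_wins = statistics.mean(wins)
median_wins = statistics.median(wins)

# Measures of variation
sd_wins = statistics.stdev(wins)      # sample standard deviation (n - 1 denominator)
range_wins = max(wins) - min(wins)    # largest value minus smallest value

print("Mean =", mean_wins)
print("Median =", median_wins)
print("Standard Deviation =", round(sd_wins, 3))
print("Range =", range_wins)
```

A mean above the median, as in the reported results (17.125 versus 16.5), suggests a few high win totals pull the mean upward.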
(iii) The linear correlation coefficient between wins and each potential explanatory variable, i.e.,
- The linear correlation coefficient between Wins and Capacity: 0.340
- The linear correlation coefficient between Wins and Opened: 0.106
- The linear correlation coefficient between Wins and PPG_Scored: 0.555
- The linear correlation coefficient between Wins and Home Games: 0.308
- The linear correlation coefficient between Wins and Average Attendance: 0.472
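For reference, the linear (Pearson) correlation coefficient can be computed by hand as sketched below. The function name `pearson_r` and the two short lists are hypothetical and made up for illustration. They are not taken from the CollegeBasketball data, so the printed value will not match any coefficient above.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up values for illustration only; NOT the CollegeBasketball data.
ppg_scored = [60, 65, 70, 75, 80]
wins = [10, 14, 13, 20, 22]

r = pearson_r(ppg_scored, wins)
print("r =", round(r, 3))
```

The result always lies between -1 and 1. Values closer to 1 or -1 indicate a stronger linear relationship, and values near 0 indicate a weaker one.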
(2) Using the StatCrunch results obtained in Part (1), state initial/preliminary conclusions that
could be made in five to ten complete sentences.
-
To find the mean, median, standard deviation, and range of wins in Part (1), I used the
Summary Stats option, which gave a mean of 17.125, a median of 16.5, a standard deviation
of 6.247, and a range of 28. The linear correlation coefficient represents the strength of
the linear relationship between two variables. I found the linear correlation coefficients
between the number of wins and Capacity, Opened, PPG_Scored, Home Games, and Average
Attendance. To find these numbers, I used simple linear regression between each pair of
variables. PPG_Scored and Wins had the highest coefficient (0.555), so they have the
strongest linear relationship. Opened and Wins had the lowest coefficient (0.106), so they
have the weakest linear relationship.
(3) Using StatCrunch and the ‘CollegeBasketball Data’ provided on Blackboard, create a
statistical graph(s) that can be used to visualize any potential linear relationships between wins
and potential explanatory variables i.e., between Wins, Capacity, Opened, PPG_Scored, Home
Games, and Average Attendance.
(i) Provide the statistical graph(s) created.
(ii) In two to four complete sentences, state initial/preliminary conclusions that could be
made.
To create these graphs, I used simple linear regression to visualize the linear
relationships between wins and the potential explanatory variables. After visualizing the
data, it is still clear that PPG_Scored (Var 4) has the strongest linear relationship with
wins and that Opened (Var 3) has the weakest. We can see this from how closely the data
points fall along the fitted line in each graph.
(4) One research question that could be posed using the available data: “Is it possible to
adequately predict overall/total season wins for a college basketball team using the potential
explanatory variables?” Essentially, we wish to create a ‘best’ linear model to predict the number
of overall/total season wins using any or all of the other variables.
Note: Recall that we have criteria for selecting a ‘best’ linear model.
Determine a method that would be appropriate for answering this research question. Using
StatCrunch and the available data, perform the statistical procedure deemed appropriate.
(i) What method did you use to answer the research question e.g., One-Sample t-Test for Means?
I used the multiple linear regression method.
(ii) Why did you select this particular method to answer the research question?
I used the multiple linear regression method because it predicts the value of a
dependent variable from multiple independent variables. In this case, it models the relationship
between the explanatory variables and the number of wins.
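To illustrate what a multiple linear regression fit does, here is a minimal sketch that estimates an intercept and one slope per explanatory variable by least squares (solving the normal equations). The helper names `solve`, `fit_ols`, and `predict` are hypothetical, and the data are made up so that the response is exactly 1 + 2*x1 + 3*x2. This is not the StatCrunch procedure or the CollegeBasketball data.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] that minimize the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, row):
    """Predicted response for one observation of the explanatory variables."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], row))

# Made-up data for illustration only: y is exactly 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [9, 8, 19, 18, 29]

coeffs = fit_ols(rows, y)
print("coefficients:", [round(c, 6) for c in coeffs])
print("prediction for (6, 5):", round(predict(coeffs, (6, 5)), 6))
```

With real data the points would not fall exactly on a plane, and the fitted coefficients would only approximate the relationship.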
(iii) In five to ten complete sentences, summarize and interpret the relevant findings of your
analysis in the context of the research question. Make sure to address each of the following:
-
What is the one ‘best’ linear model that you selected to predict the number of overall/total
season wins?
-
What was the overall model p-value? What does this indicate about the ‘best’ linear
model?
-
What was the overall Adjusted R2 value? What does this indicate about the ‘best’ linear
model?
-
Describe the process used in selecting the ‘best’ linear model among all possible linear
models. How did you determine this linear model was the ‘best’?
H0: There is no linear relationship between the number of wins and the explanatory variables
H1: There is a linear relationship between the number of wins and the explanatory variables
The ‘best’ linear model that I used to predict the number of overall/total wins was
a multiple linear regression model. I selected this method because it relates all
of the explanatory variables to the number of wins and gives an overall p-value, R2,
adjusted R2, and F-stat. The overall model p-value was reported as 0, which in
practice means a value too small to display rather than exactly zero. At a
significance level of 0.05, 0.1, or 0.01, the p-value is below the significance
level, so we reject the null hypothesis: there is sufficient evidence of a linear
relationship between wins and the explanatory variables. The overall adjusted R2
was 0.4151, which indicates that the model explains about 41.5% of the variation
in wins after adjusting for the number of explanatory variables. This is a
relatively weak fit. Among the other linear models, I selected this one as the
best because it includes the data that is needed to answer the research question.
It tells us that the explanatory variables are useful as a group, but the model
does not adequately predict overall/total wins.
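The two quantities discussed above can be sketched in a few lines. The standard decision rule rejects H0 when the p-value is below the significance level, and adjusted R2 is computed from R2, the number of observations n, and the number of explanatory variables k. The function names `decide` and `adjusted_r2` are hypothetical, and the values of `p_value`, `r2`, `n`, and `k` below are made up for illustration because the source does not report the sample size or the unadjusted R2.

```python
def decide(p_value, alpha):
    """Standard decision rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values for illustration only; not taken from the StatCrunch output.
p_value = 0.0001
r2, n, k = 0.5, 30, 5

for alpha in (0.01, 0.05, 0.10):
    print("alpha =", alpha, "->", decide(p_value, alpha))

print("adjusted R2 =", round(adjusted_r2(r2, n, k), 4))
```

Because adjusted R2 penalizes each added explanatory variable, it is a common criterion for comparing candidate models with different numbers of predictors.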
(iv) Consider and, then, discuss any potential limitations to the current analysis.
Potential limitations to the current analysis include too many explanatory variables,
seasonal effects, incomplete data, and misleading correlations.