Assignment #3 adv data (3)
School: University of Toronto
Course: 343
Subject: Industrial Engineering
Date: Feb 20, 2024
A)
> library(ggplot2)
> scatter <- ggplot(bball, aes(PPM, AGE))  # PPM on the x-axis, AGE on the y-axis
> scatter + geom_point()
From the scatter plot alone, AGE and PPM appear to have only a weak relationship: any pattern in the direction of the data points is faint. A visual impression is not enough on its own, so the correlation coefficient still needs to be calculated to measure the direction and strength of the relationship. The correlation matrix in part B gives r = -0.04 (p = 0.6544) for AGE and PPM, so the relationship is very weak, slightly negative rather than positive, and not statistically significant.
B)
> data2<-as.matrix(bball[,c("GAMES","PPM","MPG","HGT","FGP","AGE","FTP")])
> Hmisc::rcorr(data2)
      GAMES   PPM   MPG   HGT   FGP   AGE   FTP
GAMES  1.00 -0.06  0.52 -0.17  0.19  0.16  0.31
PPM   -0.06  1.00  0.36  0.21  0.41 -0.04  0.17
MPG    0.52  0.36  1.00 -0.01  0.34  0.18  0.39
HGT   -0.17  0.21 -0.01  1.00 -0.11  0.07 -0.06
FGP    0.19  0.41  0.34 -0.11  1.00  0.11  0.28
AGE    0.16 -0.04  0.18  0.07  0.11  1.00  0.25
FTP    0.31  0.17  0.39 -0.06  0.28  0.25  1.00
n= 105
P
      GAMES  PPM    MPG    HGT    FGP    AGE    FTP
GAMES        0.5444 0.0000 0.0799 0.0489 0.1138 0.0013
PPM   0.5444        0.0002 0.0289 0.0000 0.6544 0.0915
MPG   0.0000 0.0002        0.9158 0.0004 0.0659 0.0000
HGT   0.0799 0.0289 0.9158        0.2726 0.4782 0.5342
FGP   0.0489 0.0000 0.0004 0.2726        0.2711 0.0040
AGE   0.1138 0.6544 0.0659 0.4782 0.2711        0.0110
FTP   0.0013 0.0915 0.0000 0.5342 0.0040 0.0110
I believe that the strongest correlation is between GAMES and MPG (r = 0.52, p < 0.0001). The weakest correlation is between MPG and HGT (r = -0.01, p = 0.9158), since its coefficient is the closest to zero.
> cor.test(bball$GAMES, bball$MPG, alternative="two.sided", method="pearson", conf.level=0.95)
Pearson's product-moment correlation
data: bball$GAMES and bball$MPG
t = 6.2325, df = 103, p-value = 1.019e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.3686154 0.6497989
sample estimates:
cor
0.5233085
We can take these values from the given correlation test: t = 6.2325, df = 103, p-value = 1.019e-08, r = 0.5233085.
The correlation is statistically significant because the p-value (1.019e-08) is far below 0.05 and the 95 percent confidence interval (0.369 to 0.650) does not include 0. It is also a large effect, because r = 0.5233085 is above the ±.5 benchmark for a large effect size. Squaring r measures the amount of variability in one variable that is shared by the other. In this case, 27.4% of the variability in GAMES is shared with MPG.
> cor(bball$GAMES, bball$MPG)^2*100
[1] 27.38518
For explained variance I squared Pearson's correlation coefficient r, which gives R-squared (the coefficient of determination). R-squared shows how much of the variance in one variable is shared with the other variable, and multiplying it by 100 expresses that as a percentage. From the calculation above, I can conclude that 27.4% of the variance in GAMES is shared with MPG. Pearson's method gives a clear representation of the explained variance here because it is most appropriate for linear relationships.
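As a quick check on the cor.test output, the t-value and the shared variance can both be recovered from r and n alone. The sketch below uses only the values reported above (r = 0.5233085, n = 105); the variable names are my own and are only for illustration.

```r
# Values reported by cor.test() for GAMES and MPG
r <- 0.5233085
n <- 105

# t statistic for a Pearson correlation: t = r * sqrt(n - 2) / sqrt(1 - r^2)
df <- n - 2
t_value <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_value), df)

# Shared variance (coefficient of determination) as a percentage
shared_pct <- r^2 * 100

round(t_value, 3)     # close to the reported t = 6.2325
round(shared_pct, 1)  # 27.4
```

The point of the check is that significance comes from the t-value and its p-value, while r and r squared describe the size of the effect.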
C)
I used Spearman's method because the variables include ordinal data. Pearson's method is not appropriate for ordinal data because it assumes interval-level measurement, so Spearman's rank correlation is a much better choice. Additionally, I used cor.test because it computes the level of significance of the correlation between two variables as well as its size. Because the test was run with alternative="less", it only assesses whether rho is less than 0. The estimate is positive (rho = 0.217, a small effect size), so the p-value of 0.9868 gives no evidence of a negative relationship between POS_num and FGP.
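To illustrate the two ideas in this answer, here is a minimal sketch on synthetic data (not the bball data set; x and y are made-up variables). It shows that Spearman's rho is Pearson's r computed on ranks, and that the alternative argument controls which direction cor.test looks in. I set exact = FALSE on the assumption that tied ranks are present, so the large-sample approximation is used.

```r
set.seed(1)  # synthetic data, for illustration only
x <- sample(1:5, 50, replace = TRUE)  # an ordinal score coded 1 to 5
y <- x + rnorm(50, sd = 2)            # a continuous outcome loosely related to x

# Spearman's rho equals Pearson's r on the ranks (ties get average ranks)
rho_builtin <- cor(x, y, method = "spearman")
rho_by_rank <- cor(rank(x), rank(y))
all.equal(rho_builtin, rho_by_rank)

# Same data, three alternative hypotheses
p_less    <- cor.test(x, y, method = "spearman", alternative = "less",
                      exact = FALSE)$p.value
p_greater <- cor.test(x, y, method = "spearman", alternative = "greater",
                      exact = FALSE)$p.value
p_two     <- cor.test(x, y, method = "spearman", alternative = "two.sided",
                      exact = FALSE)$p.value

c(less = p_less, greater = p_greater, two.sided = p_two)
```

When rho is positive, the "less" p-value is above 0.5, so a large p-value from a one-sided test in the opposite direction does not by itself mean the two variables are unrelated.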
> cor.test(bball$POS_num, bball$FGP,alternative="less",method="spearman")
Spearman's rank correlation rho
data: bball$POS_num and bball$FGP
S = 151138, p-value = 0.9868
alternative hypothesis: true rho is less than 0
sample estimates:
rho
0.2165782
D)
> library(ggm)  # provides pcor() and pcor.test()
> pcor(c("FGP", "PPM", "AGE"), var(data2))  # FGP and PPM, controlling for AGE
[1] 0.4139502
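For reference, a first-order partial correlation can also be worked out by hand from the three pairwise correlations. This is a rough sketch that uses the two-decimal values printed in the rcorr matrix in part B, so it only approximates the pcor result; the variable names are my own.

```r
# Pairwise correlations as printed (rounded to 2 decimals) by rcorr()
r_fgp_ppm <- 0.41
r_fgp_age <- 0.11
r_ppm_age <- -0.04

# Partial correlation of FGP and PPM with AGE held constant:
# (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
partial_r <- (r_fgp_ppm - r_fgp_age * r_ppm_age) /
  sqrt((1 - r_fgp_age^2) * (1 - r_ppm_age^2))

round(partial_r, 2)  # roughly 0.42, near pcor()'s 0.4139502
```

The small gap from 0.4139502 comes from the rounded inputs. Because AGE is almost uncorrelated with both FGP and PPM, the partial correlation stays very close to the original r = 0.41.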
> pcor.test(0.4139502,1,105)
$tval
[1] 4.592655
$df
[1] 102
$pvalue
[1] 1.25354e-05
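This shows us that the partial correlation has t = 4.592655, df = 102 and p-value = 1.25354e-05, so it is statistically significant.
> r_squared <- 0.4139502^2 * 100
> r_squared
[1] 17.13548
To determine what proportion of variance PPM and FGP share once AGE is controlled for, I square the partial correlation (not the t-value) and get 0.1713548. This number tells me that 17.1% of the variance is shared between FGP and PPM when AGE is held constant. This is almost the same as the uncontrolled value from part B (r = 0.41, about 16.8%), so AGE accounts for very little of this relationship.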
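As a check on the pcor.test output, the t-value can be recomputed from the partial correlation, the number of control variables and the sample size. The sketch below uses only the numbers reported above, with my own variable names. The shared variance is obtained by squaring the partial correlation itself, because the t-value is a test statistic and not a correlation.

```r
# Values from pcor() and the call pcor.test(0.4139502, 1, 105)
partial_r <- 0.4139502
q <- 1    # number of control variables (AGE)
n <- 105  # sample size

# t statistic for a partial correlation, with df = n - 2 - q
df <- n - 2 - q
t_value <- partial_r * sqrt(df) / sqrt(1 - partial_r^2)
p_value <- 2 * pt(-abs(t_value), df)

# Shared variance as a percentage: square the partial correlation, not t
shared_pct <- partial_r^2 * 100

round(t_value, 3)     # close to the reported 4.592655
round(shared_pct, 1)  # 17.1
```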