Assignment #3 adv data (3)

School

University of Toronto

Course

343

Subject

Industrial Engineering

Date

Feb 20, 2024

Type

docx

Pages

4

Uploaded by MagistrateRookMaster3708

A)
> scatter<-ggplot(bball,aes(PPM,AGE))
> scatter+geom_point()

Judging by the scatter plot alone, AGE and PPM have at most a weak relationship: there is a slight pattern in the direction of the data points, but it is not pronounced. A plot cannot quantify the strength of a relationship, so the correlation coefficient still needs to be calculated. The coefficient reported in part B (r = -0.04, p = 0.6544) shows that there is essentially no linear correlation between the two variables, rather than a weak positive one.
B)
> data2<-as.matrix(bball[,c("GAMES","PPM","MPG","HGT","FGP","AGE","FTP")])
> Hmisc::rcorr(data2)
      GAMES   PPM   MPG   HGT   FGP   AGE   FTP
GAMES  1.00 -0.06  0.52 -0.17  0.19  0.16  0.31
PPM   -0.06  1.00  0.36  0.21  0.41 -0.04  0.17
MPG    0.52  0.36  1.00 -0.01  0.34  0.18  0.39
HGT   -0.17  0.21 -0.01  1.00 -0.11  0.07 -0.06
FGP    0.19  0.41  0.34 -0.11  1.00  0.11  0.28
AGE    0.16 -0.04  0.18  0.07  0.11  1.00  0.25
FTP    0.31  0.17  0.39 -0.06  0.28  0.25  1.00

n= 105

P
      GAMES  PPM    MPG    HGT    FGP    AGE    FTP
GAMES        0.5444 0.0000 0.0799 0.0489 0.1138 0.0013
PPM   0.5444        0.0002 0.0289 0.0000 0.6544 0.0915
MPG   0.0000 0.0002        0.9158 0.0004 0.0659 0.0000
HGT   0.0799 0.0289 0.9158        0.2726 0.4782 0.5342
FGP   0.0489 0.0000 0.0004 0.2726        0.2711 0.0040
AGE   0.1138 0.6544 0.0659 0.4782 0.2711        0.0110
FTP   0.0013 0.0915 0.0000 0.5342 0.0040 0.0110

The strongest correlation is between GAMES and MPG (r = 0.52). The weakest correlation is between MPG and HGT (r = -0.01, p = 0.9158).

> cor.test(bball$GAMES, bball$MPG,alternative="two.sided",method="pearson",conf.level=0.95)

        Pearson's product-moment correlation

data:  bball$GAMES and bball$MPG
t = 6.2325, df = 103, p-value = 1.019e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3686154 0.6497989
sample estimates:
      cor
0.5233085

From this output we can take the following values: t = 6.2325, df = 103, p-value = 1.019e-08, r = 0.5233085. The correlation is statistically significant because the p-value is far below 0.05 and the 95 percent confidence interval does not include 0. It is also a large effect, because r = 0.5233085 is above the ±.5 benchmark for a large effect size. To measure the amount of variability in one variable that is shared by the other, I squared r:

> cor(bball$GAMES, bball$MPG)^2*100
[1] 27.38518

For explained variance I used the coefficient of determination (R-squared), which is the square of Pearson's correlation coefficient, because it shows how much of the variance in one variable can be explained by the other. Multiplying by 100 expresses that relationship as a percentage, so I can conclude that 27.4% of the variance in GAMES is shared with MPG. Pearson's method is appropriate here because it measures the strength of a linear relationship.

C)
I used Spearman's method because we have ordinal data within our variables. Pearson's method assumes interval data, so Spearman's rank correlation is the better choice for ordinal data. I used cor.test because it also computes the level of significance of the correlation between two variables. The results show a small positive effect (rho = 0.2165782) and a p-value of 0.9868. Because the test was one-sided (alternative="less"), this p-value only shows that there is no evidence of a negative relationship between POS_num and FGP; it does not test whether the positive relationship is significant.

> cor.test(bball$POS_num, bball$FGP,alternative="less",method="spearman")

        Spearman's rank correlation rho

data:  bball$POS_num and bball$FGP
S = 151138, p-value = 0.9868
alternative hypothesis: true rho is less than 0
sample estimates:
      rho
0.2165782

D)
> pcor(c("FGP", "PPM", "AGE"), var(data2))
[1] 0.4139502
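As a rough cross-check, the partial correlation can be approximated from the rounded values in the part B matrix using the standard first-order partial correlation formula. This is only a sketch and not the pcor() implementation: the variable names are made up for illustration, and because the inputs are rounded to two decimals the result will only approximate 0.4139502.

```r
# Zero-order correlations, rounded to two decimals, read from the rcorr matrix in part B.
# Variable names are illustrative, not from the original assignment.
r_fgp_ppm <- 0.41    # FGP with PPM
r_fgp_age <- 0.11    # FGP with AGE
r_ppm_age <- -0.04   # PPM with AGE

# First-order partial correlation of FGP and PPM, controlling for AGE
partial_r <- (r_fgp_ppm - r_fgp_age * r_ppm_age) /
  sqrt((1 - r_fgp_age^2) * (1 - r_ppm_age^2))

partial_r
```

The result should be close to the pcor() value and to the zero-order correlation of 0.41, which suggests that AGE has little influence on the relationship between FGP and PPM.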
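> pcor.test(0.4139502,1,105)
$tval
[1] 4.592655

$df
[1] 102

$pvalue
[1] 1.25354e-05

This shows that the partial correlation between FGP and PPM, controlling for AGE, is significant: t = 4.592655, df = 102, p-value = 1.25354e-05.

> r_squared<-4.592655^2
> View(r_squared)
> r_squared
[1] 21.09248

The value 21.09248 is the square of the t-value, not of the correlation, so it cannot be read as a percentage of variance. The proportion of variance shared by PPM and FGP while controlling for AGE comes from squaring the partial correlation itself: 0.4139502^2 ≈ 0.171. About 17.1% of the variance in PPM is therefore shared with FGP once AGE is held constant. This is close to the 16.8% implied by the zero-order correlation (0.41^2), so controlling for AGE changes the relationship very little.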
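The pcor.test values above can be reproduced by hand, which also makes it clear that the proportion of shared variance comes from squaring the partial correlation rather than the t-value. The sketch below assumes the usual t-test for a partial correlation with df = n - 2 - k, where k is the number of control variables; the variable names are illustrative only.

```r
# Values taken from the pcor() and pcor.test() calls in part D
partial_r <- 0.4139502   # partial correlation of FGP and PPM, controlling for AGE
n <- 105                 # sample size
k <- 1                   # number of control variables (AGE)

# t-test for a partial correlation (assumed form: df = n - 2 - k)
df <- n - 2 - k
t_value <- partial_r * sqrt(df / (1 - partial_r^2))
p_value <- 2 * pt(-abs(t_value), df)

# Percentage of variance shared after controlling for AGE
shared_pct <- partial_r^2 * 100

c(t = t_value, df = df, p = p_value, shared_pct = shared_pct)
```

The t-value and df should agree with the pcor.test output, and shared_pct comes out at roughly 17.1.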