6- Week 6 Lab Associations


Week 5: Introduction to the Correlations
Jeremy C. Biesanz
October 05, 2022

Data and Background

The psychTools package has a dataset on personality measures (epi.bfi). These are measures from the International Personality Item Pool (IPIP), a public domain item pool that may be used freely to create personality scales and measures that assess a variety of commonly used personality constructs. We will focus on Neuroticism and on depression as assessed by the Beck Depression Inventory (BDI).

Lecture and Lab Goals:

• Load and examine the data and then define the variables of interest. Examine the description of the datafile that is included in the psychTools package documentation. We will examine the relationship between Neuroticism (the IPIP measure) and depression as assessed by the Beck Depression Inventory.

    # Obtaining the data from the psychTools package
    # The two variables of interest:
    #   bfneur is the Neuroticism measure from the IPIP
    #   bdi is the Beck Depression Inventory
    library(psychTools)  # provides the epi.bfi dataset
    library(psych)       # provides describe(); loaded explicitly to be safe
    data("epi.bfi")
    ?epi.bfi
    describe(epi.bfi, skew = FALSE)

1. Examine descriptives. Do you think these measures are means, sums, or something else?

2. Scatterplot and Loess smoother. Follow the attached template from the lecture notes and use bdi on the x-axis and bfneur on the y-axis. Include the linear relationship and the uncertainty associated with the loess smoother. Try to make this graph beautiful and include appropriate labels for the axes. Does this relationship appear linear, nonlinear, and/or monotonic?

3. Pearson, Spearman, and Distance Correlations. Compute these three correlations. How do these three estimates compare? Are they similar or different?

Second Example

• Load and examine the data and then define the variables of interest. Load the NHANES dataset from the NHANES package and examine its documentation. You will need to install this package before loading it. We will examine the relationship between Age and Weight.
    library(NHANES)
    data("NHANES")
    # Need to remove missing values on Weight for distance correlation
    NHANES <- subset(NHANES, !is.na(Weight))
    ?NHANES
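The filtering step above can be tried on a small made-up data frame first, to see what it does and why it is needed. This is only a sketch: the `toy` data frame and its values are invented for illustration and are not NHANES data.

```r
# Toy data (made up for illustration; not NHANES values)
toy <- data.frame(
  Age    = c(21, 34, 45, 52, 60, 18),
  Weight = c(70.2, NA, 81.5, 77.0, NA, 64.3)
)

# Number of missing values in each column
colSums(is.na(toy))

# By default cor() does not drop missing values, so the result is NA
cor(toy$Age, toy$Weight)

# Pearson correlation using only the complete pairs
cor(toy$Age, toy$Weight, use = "complete.obs")

# Same row filter as in the lab code: keep rows where Weight is observed
toy_complete <- subset(toy, !is.na(Weight))
nrow(toy_complete)
```

Filtering the data frame once, as the lab code does, means that every later analysis (plot, Pearson, Spearman, distance correlation) uses the same set of rows.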

1. Scatterplot and Loess smoother. Follow the attached template from the lecture notes and use Age on the x-axis and Weight on the y-axis. Include the linear relationship and the uncertainty associated with the loess smoother. Try to make this graph beautiful and include appropriate labels for the axes. Does this relationship appear linear, nonlinear, and/or monotonic?

2. Pearson, Spearman, and Distance Correlations. Compute these three correlations. How do these three estimates compare? Are they similar or different? Note that given the large sample size here (N = 9922), the distance correlation function will be quite slow.

Example Code for Figure and Analyses (on Canvas)

    # Creating a small example dataset to illustrate the code
    # for plots and obtaining correlation estimates.
    # Use your dataset instead of x and y below in this example.
    x <- rnorm(200)
    y <- 0.3 * x + rnorm(200)
    example <- data.frame(x, y)

    # Example of a loess plot with the uncertainty around the loess curve (se = TRUE).
    # This also adds the linear relationship (without the uncertainty, se = FALSE).
    # linewidth requires ggplot2 >= 3.4.0; older versions use size instead.
    library(ggplot2)
    ggplot(example, aes(x, y)) +
      geom_point(size = 0.70) +
      geom_smooth(method = "loess", colour = "darkorchid4", linewidth = 1,
                  fill = "orchid4", se = TRUE) +
      geom_smooth(method = "lm", colour = "black", linewidth = 0.75,
                  fill = "grey20", se = FALSE) +
      labs(x = "This is the label for the x-axis",
           y = "This is the label for the y-axis") +
      theme_classic()

    # Pearson and Spearman correlations
    cor(example$x, example$y, method = "pearson")
    cor(example$x, example$y, method = "spearman")

    # Distance correlation
    library(energy)
    dcorT.test(example$x, example$y)
    # Note that dcorT.test() provides the (bias-corrected) squared correlation.
    # Take the square root to get the distance correlation.
    sqrt(dcorT.test(example$x, example$y)$estimate)

    # Test where you estimate the p-value by simulating the
    # null hypothesis. R is the number of samples drawn under the null.
    # For very large sample sizes (N > 1000), start with smaller R values
    # to see how long it takes to run before increasing R.
    dcor.test(example$x, example$y, R = 1000)
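To make it more concrete what the distance correlation measures, and what the R argument of dcor.test() controls, here is a minimal base-R sketch. It assumes the usual sample definition of distance correlation (double-centered pairwise distance matrices) and a simple permutation scheme for the null. The helpers `dcor_sketch()` and `perm_pvalue_sketch()` are hypothetical names written for this illustration. They are not functions from the energy package and are not bias-corrected like dcorT.test(), so use the energy functions for the lab itself.

```r
# Hypothetical helper (not from the energy package): sample distance
# correlation of two numeric vectors via double-centered distance matrices.
dcor_sketch <- function(x, y) {
  stopifnot(length(x) == length(y), !anyNA(x), !anyNA(y))
  center <- function(v) {
    d <- as.matrix(dist(v))  # pairwise distances |v_i - v_j|
    # Subtract row means and column means, then add back the grand mean
    sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
  }
  A <- center(x)
  B <- center(y)
  dcov2 <- max(mean(A * B), 0)             # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B)) # product of distance variances
  if (denom == 0) return(0)                # a constant variable: define as 0
  sqrt(dcov2 / denom)
}

# Hypothetical helper: permutation p-value. Shuffling y breaks any
# dependence on x, so each shuffle is one draw under the null hypothesis.
# R is the number of such draws.
perm_pvalue_sketch <- function(x, y, R = 199) {
  observed <- dcor_sketch(x, y)
  permuted <- replicate(R, dcor_sketch(x, sample(y)))
  (1 + sum(permuted >= observed)) / (R + 1)
}

# A nonlinear, non-monotonic relationship (simulated for illustration)
set.seed(359)
x <- rnorm(200)
y_quadratic <- x^2 + rnorm(200, sd = 0.5)

cor(x, y_quadratic, method = "pearson")
cor(x, y_quadratic, method = "spearman")
dcor_sketch(x, y_quadratic)
perm_pvalue_sketch(x, y_quadratic, R = 199)
```

Each permutation recomputes two N-by-N distance matrices, so the cost grows roughly with R times N squared. This is why the note above recommends starting with a small R when N is large.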
