Lab_Wk03_2023

School

University of Wollongong

Course

251

Subject

Statistics

Date

Jan 9, 2024

STAT251 Fundamentals of Biostatistics
Spring Session
LABORATORY NOTES, Week 3
Exploring Associations between Quantitative Variables

Aim: The focus of this lab is exploring relationships between two quantitative variables.

1. Scatterplots

Open the Student Data.csv file (available for download from Moodle > Labs > Week 3).

We wish to explore how a response (dependent) variable Y depends on an explanatory (independent) variable X. One tool for this purpose is the simple scatterplot. In the following, we will create a scatterplot of standing height vs. left forearm length, with standing height on the y-axis and left forearm length on the x-axis.

First, under the Analyses tab, select Exploration > Scatterplot. Now move the variables left forearm and standing height to the X and Y axes, respectively, as shown in the panel to the right.

The result of this procedure is the following scatterplot:
1.1. Adding a line of best fit

To add a line of best fit, select the Linear option under Regression Line. This line of best fit is the line chosen according to the least squares criterion. That is, out of all possible lines we could choose, this line minimises the sum of the squared vertical distances between the data points and the line.

Discussion questions:
1. Comment on the scatterplot (ignoring the fitted line). What does it indicate about the strength and direction of the relationship between standing height and left forearm? Are there any features visible that might affect the correlation between these variables? Discuss.
2. Comment on the fit of the line of best fit (i.e., how well does the line fit the data?).

1.4. Correlation

The correlation coefficient r measures the strength and direction of the linear association between the two variables in the scatterplot. A value of r close to 1 means a strong positive association, with a narrow cluster of points sloping from lower left to upper right (i.e., a positive gradient). A value of r close to -1 implies a strong negative association, with the cluster of points sloping from upper left to lower right. A value of r close to 0 implies a weak linear association.

To produce information on the correlation between variables, use Analyses > Regression > Correlation Matrix. Then move the variables standing height and left forearm into the right-hand side box. Under Additional Options, deselect Report significance. The following correlation matrix should appear in the output:

Correlation Matrix

                   left forearm    standing height
left forearm
standing height    0.2970
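The least squares criterion and the correlation coefficient described above can also be computed directly. The following is a minimal Python sketch, not part of the jamovi workflow. The function names and the five data values are made up for illustration and are not taken from Student Data.csv.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (intercept, slope) of the line that minimises the sum of
    squared vertical distances between the points and the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative values (cm), NOT the lab data.
forearm = [42.0, 44.5, 46.0, 47.5, 50.0]
height = [160.0, 171.0, 165.0, 178.0, 174.0]

r = pearson_r(forearm, height)
a, b = least_squares_line(forearm, height)
print(f"r = {r:.4f}")
print(f"height = {a:.2f} + {b:.3f} * forearm")
```

Note that the slope and r always share the same sign, because both are driven by the same cross-product sum in the numerator.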
Discussion questions:
1. Does this value for the correlation coefficient r seem reasonable? Comment on both the ‘value’ and the ‘sign’ with particular reference to the scatterplot.

2. The regression equation

Now we will perform linear regression using standing height as the dependent variable and left forearm as the independent variable.

First, pull up the Linear Regression window by selecting Analyses > Regression > Linear Regression. Then select standing height as the dependent variable and left forearm as the covariate (independent variable).

The output from this procedure gives us information on the model fit, as well as estimates of the regression coefficients:

Linear Regression

Model Fit Measures

Model    R         R²
1        0.2970    0.0882

Model Coefficients - standing height

Predictor              Estimate    SE        t          p
Intercept              161.1831    3.6349    44.3432    < .0001
left forearm length    0.3354      0.0789    4.2537     < .0001
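The entries in this output are related to each other: R² is the square of R, each t value is the estimate divided by its standard error, and the two estimates define the fitted line. A small Python sketch of these relationships follows, using the numbers from the tables above. The function names and the forearm value of 45 cm are hypothetical, chosen only for illustration.

```python
# Values copied from the jamovi output tables above.
R = 0.2970
R_SQUARED = 0.0882
INTERCEPT, INTERCEPT_SE = 161.1831, 3.6349
SLOPE, SLOPE_SE = 0.3354, 0.0789


def t_statistic(estimate, se):
    """t value for a coefficient: the estimate divided by its standard error."""
    return estimate / se


def predict_height(forearm_cm):
    """Predicted standing height from the fitted line: intercept + slope * x."""
    return INTERCEPT + SLOPE * forearm_cm


# In simple linear regression, R is the absolute value of the correlation r.
print(f"R squared from R:   {R ** 2:.4f}")
print(f"t for intercept:    {t_statistic(INTERCEPT, INTERCEPT_SE):.4f}")
# The slope's t agrees with the table only approximately, because the
# table shows the estimate and SE rounded to four decimal places.
print(f"t for left forearm: {t_statistic(SLOPE, SLOPE_SE):.4f}")

# Hypothetical forearm length of 45 cm, for illustration only.
print(f"predicted height:   {predict_height(45.0):.2f}")
```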
Discussion questions:
1. What are the values of the slope and intercept of the regression line?
2. Write the equation of the regression line in the form y = a + bx, using the correct variable names for these data.
3. Calculate the predicted height of a student with forearm length of 53 cm.

3. Regression outliers and Influential Points

One way to see if there is a regression outlier (i.e., an outlier which significantly affects the regression equation) with respect to a given model is to look at the standardised residuals (i.e., the Z-scores). A residual is the difference between an observed value and the model's prediction for it (observed minus predicted), and the standardised residuals are the residuals divided by their estimated standard deviation. Usually, any point with a standardised residual greater than 3 or less than -3 (i.e., with |Z| > 3) is considered a potential outlier.

To obtain standardised residuals from the Linear Regression output, click on the Save menu at the bottom of the Linear Regression window and tick the Residuals box. When you look at the Data tab you will now have an extra column called Residuals; these residuals are not yet standardised (converted to Z-scores).

To standardise them, double click on the empty column next to the Residuals column and select New computed variable. Type the following formula into the window that appears (note the Z is UPPER CASE):

The function Z() returns the Z-scores for a given set of numbers. Once you hit Enter, the standardised residuals will appear in the new column.
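The standardisation step can be mirrored outside jamovi. The sketch below is a rough Python equivalent under one stated assumption: each Z-score is (value - mean) / sample standard deviation, using the n - 1 divisor. Whether jamovi's Z() uses exactly this divisor is not confirmed here. The function names and residual values are made up for illustration.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values as (value - mean) / sample SD.

    Assumption: sample SD with the n - 1 divisor; jamovi's Z() may differ.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def flag_potential_outliers(z, threshold=3.0):
    """Return 1-based positions of values with |Z| greater than the threshold."""
    return [i for i, zi in enumerate(z, start=1) if abs(zi) > threshold]


# Made-up residuals: 20 ordinary values plus one extreme value at position 21.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0] * 4 + [-25.0]

standardised = z_scores(residuals)
print(flag_potential_outliers(standardised))
```

With small samples no point can reach |Z| > 3 under this definition, since the largest possible Z-score for n values is (n - 1) / sqrt(n); this is why the example uses 21 values.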
Select Analyses > Exploration > Descriptives and examine the distribution of the standardised residuals.

Discussion questions:
1. Are there any points with standardised residuals outside of the range (-3, 3)? What is the id (record number) of this observation?

We want to select only observations with a standardised residual less than -3 or greater than 3. To do this, we can use Filters.

Important: To work with the standardised residuals we computed, we first need to copy the Standardised Residuals column to another column. Click on the header of the Standardised Residuals column to select the whole column, and right click to select Copy. Right click on an empty column and select Paste. Name this new column SR. We need to do this because Jamovi filters do not work on computed columns (whose values depend on the values of another variable); they only work on static columns.

To filter data points with a standardised residual less than -3 or greater than 3, in the Data tab, select Filter. A new window will appear, where we can specify a condition. Type in the following formula:
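The logic of such a filter condition can be illustrated in Python. This is only a sketch of the idea, not jamovi's filter syntax. The rows, the SR values, and the helper names are hypothetical.

```python
def is_extreme(row):
    """Condition: standardised residual less than -3 or greater than 3."""
    return row["SR"] < -3 or row["SR"] > 3


def filter_rows(rows, condition):
    """Keep only the rows for which the condition is true."""
    return [row for row in rows if condition(row)]


# Hypothetical rows; the SR values are made up for illustration.
rows = [
    {"id": 1, "SR": 0.4},
    {"id": 2, "SR": -3.6},
    {"id": 3, "SR": 1.2},
    {"id": 4, "SR": 3.1},
]

# Rows that satisfy the condition (the potential outliers).
extreme = filter_rows(rows, is_extreme)

# Negating the condition keeps everything except the potential outliers.
kept = filter_rows(rows, lambda row: not is_extreme(row))

print([row["id"] for row in extreme])
print([row["id"] for row in kept])
```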
Data points that do not satisfy the condition will be greyed out and marked with a cross, and those that do will be marked with a tick.

It is clear that the point with id (Record number) of 56 is an outlier. Now we will remove it from the regression and see how things change. Modify the condition in the filter to the following:

Note that the row containing id (Record) 56 has been ‘crossed out’. It will not be included in any processing unless we disable the filter (click on the active button).

With this record removed, continue the analysis. The Linear Regression output should change automatically when the filter is applied. Paste the updated output into the space below and answer the following question.

Tip: click on the button in the Filter window and see what happens!
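The effect of excluding a single record can be sketched in Python by fitting the line twice, once with all points and once without the excluded one, and comparing the coefficients. The data below are made up so that the last point is far from the others; they are not the lab data, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least squares fit; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def refit_without(x, y, ids, excluded_id):
    """Refit the line after dropping the record with the given id."""
    keep = [i for i, rec in enumerate(ids) if rec != excluded_id]
    return fit_line([x[i] for i in keep], [y[i] for i in keep])


# Made-up data: records 1 to 5 lie exactly on y = 2x; record 6 does not.
ids = [1, 2, 3, 4, 5, 6]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]

full_intercept, full_slope = fit_line(x, y)
reduced_intercept, reduced_slope = refit_without(x, y, ids, excluded_id=6)

print(f"all records:      y = {full_intercept:.3f} + {full_slope:.3f} x")
print(f"without record 6: y = {reduced_intercept:.3f} + {reduced_slope:.3f} x")
```

A large change in slope or intercept after removal suggests the excluded point is influential; a negligible change suggests it is not.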
Discussion questions:
2. Based on the change in the fitted line, is the excluded data point influential?