PracWeek8 - Complete

School: Macquarie University
Course: 1170
Subject: Statistics
Date: May 31, 2024
Pages: 7

Linear Regression – Part 1
Linear Regression – Part 1 | Copyright Macquarie University 2020

Employability Skills
As you complete this exercise, think about which of these employability skills you are using.

What will we cover in this Practical?
In this practical exercise we will:
- Create scatterplots.
- Fit a least squares regression line.

Saving your work
Don't forget that it is useful to save your work. Save your work to your storage device to retain a copy.

Download and open the Peru data
In this practical exercise, we use a data set collected on people living in Peru. Anthropologists gathered these data to determine the long-term effects of a change in environment on blood pressure. The data were collected from people who migrated from a primitive culture high in the Andes to mainstream Peruvian society, at a much lower altitude.

Open the Peru.xlsx data file from iLearn.

The Peru data
The variables in the Peru Data worksheet are described below.

Column  Variable name  Description
A       AGE            Age in years
B       YEARS          Years since migration
C       WEIGHT         Weight in kilograms
D       HEIGHT         Height in millimetres
E       CHIN           Chin skin fold thickness in millimetres
F       FOREARM        Forearm skin fold thickness in millimetres
G       CALF           Calf skin fold thickness in millimetres
H       PULSE          Pulse rate in beats per minute
I       DIASTOL        Diastolic blood pressure
J       BMI            Body mass index
K       SYSTOL         Systolic blood pressure

Relations between two numerical variables
We will use Excel to examine the relationship between some of the variables in the Peru data set. Because the primary interest was whether or not there is a relationship between systolic blood pressure (SYSTOL) and years since migration (YEARS), let's begin by creating a scatterplot of this relationship. We will use YEARS as the predictor variable on the x axis and SYSTOL as the response variable on the y axis.

To create a scatterplot, recall that we:
1. Select the columns of data to be displayed in the scatterplot.
2. Select Insert from the main menu bar.
3. Select a Scatter plot from the Charts menu, or choose the Scatter plot from the Recommended Charts.
4. Add a title and axis titles to the scatterplot.

Sketch the scatterplot in the space below.

[Figure: scatterplot "Systolic Blood Pressure vs Years Since Migration"; x axis: Years Since Migration (0 to 50); y axis: Systolic Blood Pressure (80 to 180)]

Do you think that there is a positive, negative or no relation between systolic blood pressure and the number of years since migration?
There appears to be no relation. The data points are scattered around a fairly flat line.
Now make scatterplots examining the relationships between systolic blood pressure and chin skin fold thickness, and between systolic blood pressure and weight. For each scatterplot, SYSTOL should be the response variable on the y axis; CHIN and WEIGHT, respectively, are the predictor variables on the x axis.

You can use the Format Axis command to make the relation in the data easier to see. To do this, select the x or y axis of a scatterplot, right-click, and use Format Axis to change the minimum value for that axis to a suitable value. For example, for SYSTOL on the y axis, 80 could be used as the lower bound.

Describe the relations you see in the table below.

Peru variables      Relationship
SYSTOL and YEARS    There appears to be no relation. The data points are scattered around a fairly flat line.
SYSTOL and CHIN     There appears to be no relation. The data points are scattered around a fairly flat line.
SYSTOL and WEIGHT   There appears to be a positive linear relation between SYSTOL and WEIGHT.

Now, for each of these variables, we will perform a linear regression to determine whether or not there is a linear relationship between that variable and systolic blood pressure.

Linear regression: SYSTOL vs YEARS
We will fit a linear model to these data and obtain appropriate plots to check the assumptions.
1. Click on Data, then click on Data Analysis.
2. In the window that pops up, select Regression, then click OK.
3. Select the SYSTOL data as the Input Y Range and the YEARS data as the Input X Range.
4. Select Labels if you are also highlighting the column labels.
5. Select Residuals and Residual Plots.
6. Select New Worksheet Ply.
7. Click OK.

From the Residual Output (contained in the Summary Output created by Excel), construct a histogram of the residuals using the default bins.

Excel shows the results of the regression analysis, but we need to examine the assumptions for the hypothesis test to ensure that it is valid to conduct the linear regression.
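Excel's Regression tool does the least squares arithmetic for us. As an optional cross-check outside Excel, the following is a minimal Python sketch (not part of the practical) of what it computes for a single predictor: the slope b1 = Sxy / Sxx, the intercept b0 = ȳ − b1·x̄, the fitted values, the residuals (as in the Residual Output) and the standard error of the slope. The function name `least_squares_fit` is hypothetical, and the five data points are toy values chosen for easy hand-checking, not the Peru data.

```python
import math


def least_squares_fit(x, y):
    """Simple linear regression of y on x by ordinary least squares.

    Returns the intercept b0, slope b1, fitted values, residuals and the
    standard error of the slope (the quantities reported in the
    "Coefficients", "Standard Error" and "Residual Output" parts of
    Excel's Regression output).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error uses n - 2 degrees of freedom
    sse = sum(r ** 2 for r in residuals)
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(s_xx)
    return b0, b1, fitted, residuals, se_b1


# Toy data (not the Peru data)
b0, b1, fitted, residuals, se_b1 = least_squares_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 4), round(b1, 4), round(se_b1, 4))  # prints 2.2 0.6 0.2828
```

The residuals returned here are the values you would bin for the histogram, and they always sum to (numerically) zero for a least squares line with an intercept.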
Our assumption checks are listed in the table below. Fill in the table to determine whether or not the assumptions are met:

Linear regression assumptions
Was there a linear relationship between SYSTOL and YEARS? [Check the scatterplot]
Yes – the scatterplot shows no curved pattern, so a straight line is appropriate, although its slope is close to zero.

Was the histogram of residuals approximately normal (or at least not too skewed)?
Yes – the histogram of the residuals is approximately Normal: unimodal and not too skewed.

Are the residuals versus fitted values scattered evenly either side of the horizontal line? [i.e. there is no obvious pattern]
Yes – the residuals are scattered evenly either side of the horizontal line, with no obvious pattern.

Now, based on the above results, answer the following:
All/Not all of the assumptions were met, so we can/cannot continue on and examine the regression output to test whether or not there is a relationship between SYSTOL and YEARS.
All of the assumptions for the regression are met, so we can continue and examine the regression output.

Now examine the regression output. Do you get the same results as shown below?

SUMMARY OUTPUT: SYSTOL versus YEARS
            Coefficients  Standard Error  t Stat    P-value    Lower 95%  Upper 95%
Intercept   128.9593      3.7015          34.8399   3.461E-28  121.4370   136.4816
YEARS       -0.0832       0.2121          -0.3923   0.6972     -0.5144    0.3479

What is the equation of the fitted line?
$\widehat{SYSTOL} = 128.9593 - 0.0832 \times YEARS$

Let's perform a hypothesis test on the slope for YEARS to determine whether or not a statistically significant linear relation exists between systolic blood pressure and years since migration.

HATPDC hypothesis test for the slope, $\beta_1$:

Write down the null and alternative hypotheses.
$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$

Are the assumptions met? Why or why not?
Yes. The scatterplot shows a linear relation. The histogram of the residuals is approximately Normal. The residual plot shows that the residuals have constant standard deviation along the range of x values.

What is the value of the test statistic?
$t = -0.3923$, $df = 36 - 2 = 34$

What is the p-value?
$0.6972$
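The test statistic and p-value in the Excel output can be reproduced from the slope row alone: $t = b_1 / SE(b_1)$ with $n - 2$ degrees of freedom, and the two-sided p-value is $P(|T| \ge |t|)$ for a Student t distribution. The sketch below is an illustration only: it assumes just the Python standard library, so it approximates the t probability with Simpson's rule instead of calling a statistics package, the function names are hypothetical, and n = 36 is taken from the df = 36 − 2 calculation in the worksheet. Because the inputs are the rounded four-decimal values, results can differ from Excel in the last decimal place.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on the density over [0, |t|]."""
    a = abs(t_stat)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3  # P(0 <= T <= |t|)
    # By symmetry, P(-|t| <= T <= |t|) = 2 * area, so both tails are 1 minus that
    return 1 - 2 * area_0_to_a


def slope_t_test(b1, se_b1, n, alpha=0.05):
    """t test of H0: beta1 = 0 against H1: beta1 != 0 at significance level alpha."""
    t_stat = b1 / se_b1
    df = n - 2
    p = two_sided_p_value(t_stat, df)
    decision = "reject H0" if p < alpha else "do not reject H0"
    return t_stat, df, p, decision


# Slope row for YEARS from the Excel output (n = 36, so df = 34)
t_stat, df, p, decision = slope_t_test(-0.0832, 0.2121, 36)
print(f"t = {t_stat:.4f}, df = {df}, p = {p:.4f}: {decision}")
```

The same function applies unchanged to the CHIN and WEIGHT slope rows; only the coefficient and standard error change.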
Do you reject or not reject the null hypothesis?
Do not reject the null hypothesis, since the p-value (0.6972) ≥ 0.05.

Write your conclusion in words:
There is not enough evidence to indicate a significant linear relationship between systolic blood pressure and years since migration.

Linear regression: SYSTOL vs CHIN
Now we repeat the exercise and examine systolic blood pressure versus CHIN. Repeat the regression analysis, replacing the predictor variable with CHIN. Examine whether or not the assumptions were met and fill in the table below.

Linear regression assumptions

Was there a linear relationship between SYSTOL and CHIN? [Check the scatterplot]
Yes – the scatterplot shows no curved pattern, so a straight line is appropriate, although its slope is close to zero.

Was the histogram of residuals approximately normal (or at least not too skewed)?
Yes – the histogram of the residuals is approximately Normal: unimodal and not too skewed.

Are the residuals versus fitted values scattered evenly either side of the horizontal line? [i.e. no pattern]
Yes – the residuals are scattered evenly either side of the horizontal line, with no obvious pattern.

Now, based on the above results, answer the following:
All/Not all of the assumptions were met, so we can/cannot continue on and examine the regression output to test whether or not there is a relationship between SYSTOL and CHIN.
All of the assumptions for the regression are met, so we can continue and examine the regression output.

Now examine the regression output and fill in the missing values in the spaces below:

SUMMARY OUTPUT: SYSTOL versus CHIN
            Coefficients  Standard Error  t Stat    P-value    Lower 95%  Upper 95%
Intercept   124.7463      5.2400          23.8066   0.0000     114.0973   135.3952
CHIN        0.5020        0.7917          0.6341    0.5303     -1.1069    2.1109

What is the equation of the fitted line?
$\widehat{SYSTOL} = 124.7463 + 0.5020 \times CHIN$

Let's perform a hypothesis test on the slope for CHIN to determine whether or not a statistically significant linear relation exists between systolic blood pressure and chin skin fold thickness.
HATPDC hypothesis test for the slope, $\beta_1$:
Write down the null and alternative hypotheses.
$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$

Are the assumptions met? Why or why not?
Yes. The scatterplot shows a linear relation. The histogram of the residuals is approximately Normal. The residual plot shows that the residuals have constant standard deviation along the range of x values.

What is the value of the test statistic?
$t = 0.6341$, $df = 36 - 2 = 34$

What is the p-value?
$0.5303$

Do you reject or not reject the null hypothesis?
Do not reject the null hypothesis, since the p-value (0.5303) ≥ 0.05.

Write your conclusion in words:
There is not enough evidence to indicate a significant linear relationship between systolic blood pressure and chin skin fold thickness.

Linear regression: SYSTOL vs WEIGHT
Now we repeat the exercise and examine systolic blood pressure versus WEIGHT. Repeat the analysis, replacing the predictor variable with WEIGHT. Examine whether or not the assumptions were met and fill in the table below.

Linear regression assumptions

Was there a linear relationship between SYSTOL and WEIGHT? [Check the scatterplot]
Yes – the scatterplot shows a positive linear relation.

Was the histogram of residuals approximately normal (or at least not too skewed)?
Yes – the histogram of the residuals is approximately Normal.

Are the residuals versus fitted values scattered evenly either side of the horizontal line? [i.e. no pattern]
Yes – the residuals are scattered evenly either side of the horizontal line, with no obvious pattern.

Were all of the assumptions met?
Yes, so we can continue on, examine the regression output and test whether or not there is a relationship between SYSTOL and WEIGHT.

Now examine the regression output and fill in the missing values in the spaces below:

SUMMARY OUTPUT: SYSTOL versus WEIGHT
            Coefficients  Standard Error  t Stat    P-value    Lower 95%  Upper 95%
Intercept   76.7929       22.2100         3.4576    0.0015     31.6568    121.9291
WEIGHT      0.8112        0.3519          2.3050    0.0274     0.0960     1.5264

What is the equation of the fitted line?
$\widehat{SYSTOL} = 76.7929 + 0.8112 \times WEIGHT$

Let's perform a hypothesis test on the slope for WEIGHT to determine whether or not a significant linear relation exists between systolic blood pressure and weight in kg.

HATPDC hypothesis test for the slope, $\beta_1$:

Write down the null and alternative hypotheses.
$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$

Are the assumptions met? Why or why not?
Yes. The scatterplot shows a linear relation. The histogram of the residuals is approximately Normal. The residual plot shows that the residuals have constant standard deviation along the range of x values.

What is the value of the test statistic?
$t = 2.3050$, $df = 36 - 2 = 34$

What is the p-value?
$0.0274$

Do you reject or not reject the null hypothesis?
Reject the null hypothesis, since the p-value (0.0274) < 0.05.

Write your conclusion in words:
There is a significant positive linear relationship between systolic blood pressure and weight. For each additional kilogram of weight, systolic blood pressure increases by 0.8112 units, on average.

Summarise your analyses below, being sure to comment on each of the variables examined:
Weight was a statistically significant predictor of systolic blood pressure in the linear regression. For each additional kilogram of weight, systolic blood pressure increases by 0.8112 units, on average. Years since migration and chin skin fold thickness were not statistically significant predictors of systolic blood pressure in their linear regressions, so neither of these variables should be used to predict systolic blood pressure.
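The Lower 95% and Upper 95% columns in each Excel output are 95% confidence intervals, $b_1 \pm t^{*} \times SE(b_1)$, where $t^{*}$ is the 97.5th percentile of the t distribution with $n - 2 = 34$ degrees of freedom. For WEIGHT the interval (0.0960, 1.5264) excludes 0, which agrees with rejecting $H_0$ at the 5% level; the YEARS and CHIN intervals both include 0. Below is a minimal standard-library Python sketch of that calculation, offered only as a cross-check: the function names are hypothetical, and $t^{*}$ is found by bisection on a Simpson's-rule approximation of the t distribution rather than by any Excel function.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_prob(a, df, steps=2000):
    """P(-a <= T <= a) for a >= 0, using Simpson's rule on [0, a]."""
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(df, level=0.95):
    """t* such that P(-t* <= T <= t*) = level, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_central_prob(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def slope_confidence_interval(b1, se_b1, n, level=0.95):
    """Confidence interval b1 +/- t* x SE(b1), with n - 2 degrees of freedom."""
    t_star = t_critical(n - 2, level)
    margin = t_star * se_b1
    return b1 - margin, b1 + margin, t_star


# WEIGHT slope row from the Excel output (n = 36)
lower, upper, t_star = slope_confidence_interval(0.8112, 0.3519, 36)
print(f"t* = {t_star:.4f}, 95% CI for slope: ({lower:.4f}, {upper:.4f})")
print("interval excludes 0" if lower > 0 or upper < 0 else "interval includes 0")
```

Since the inputs are rounded to four decimals, the endpoints can differ from Excel's in the fourth decimal place.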