Lab 2

School

East Tennessee State University

Course

BSTA-535

Subject

Statistics

Date

Feb 20, 2024

Uploaded by ChefStarZebra10

Question 1 (6 points)

A. In a study of grade school children, ages, heights, weights, and scores on a physical fitness exam were obtained from a random sample of 10 children. The data are given below. Fit a multiple linear regression model relating the scores to the ages, heights, and weights of the children.

Data in the following order: score (Y), age (X1), height (X2), weight (X3).

Score (Y)   Age (X1)   Height (X2)   Weight (X3)
58          7          47.5          53
54          7          45            50
55          9          52.5          85
74          7          48            52
86          9          55            76
98          8          51            64
96          9          53            75
70          7          46            75
40          7          48            68
67          9          50.5          74

Questions 1 thru 5 are based on this data analysis:

1. What is the multiple linear regression model fit to the data?
2. Which of the variables are significantly associated with physical fitness in the children?
3. Discuss the relationship between physical fitness and height when adjusted for age and weight.
4. What percentage of the variation of the physical fitness score in the sample of children can be explained by the regression model?
5. Based on the 90% confidence intervals (CI) we have obtained, what can we conclude?

B. For Questions 6 thru 10. The human resource department of a major mobile company decides to determine the cause of depression among its customer care representatives. The number of calls received per day and the number of work hours per week of the customer care representatives are employed as predictors for depression.

The data is in the following order: depression score, number of calls, and work hours.

Depression   Calls   Hours
59           47.5    53
55           45      50
54           52.5    85
75           48      52
87           55      76
99           51      64
95           53      75
71           46      75
42           48      68
68           50.5    74
40           45      40
42           48.5    66
48           50.5    65
44           49.0    70
91           51.5    70
49           46.5    60
96           53.5    77
40           45      65
66           52.5    65
71           51.5    67

C. For Questions 6 thru 10. Fit a multiple linear regression model. The dependent variable is depression score and the two predictors are number of calls and work hours.

6. Based on the Covariance of Estimates table in the SAS output, what is the variance of the estimated slope (b1) for the variable number of calls?
7. What is the standardized regression coefficient for work hours? (Hint: use the option 'stb' to request)
8. Add an interaction term between the two predictors to the model above. What can we conclude? (Hint: instead of proc reg, use proc glm with model statement option 'solution')
9. Continued from question 8. To test the null hypothesis H0: beta1 = beta2 = beta3 = 0 vs H1: at least one of them is not 0, what is the F-statistic?

D. For Questions 11-12. The human resource department then decided to add age to the predictors to determine the cause of depression among its customer care representatives.

The data is in the following order: depression score, number of calls/day, work hours/week, age.

59    47.5   53   18
55    45     50   18
54    52.5   85   19
75    48     52   19
87    55     76   20
99    51     64   20
95    53     75   21
71    46     75   22
42    48     68   22
68    50.5   74   30
40    45     40   40
42    48.5   66   66
48    50.5   65   25
44    49.0   70   18
91    51.5   70   29
49    46.5   28   60
96    53.5   45   77
40    45     34   65
66    52.5   65   35

10. Fit a multiple regression model of depression score as predicted by number of calls/day, work hours/week, and age.
11. Fit a simple linear regression model between depression and hours, and then fit a multiple linear regression model between depression and hours and age. Is it possible that age is a confounder of hours?
12. For each part, please answer the following questions:
(1) What are the research purposes of the data analysis?
(2) What are the advantages and disadvantages of the statistical methods used for this data analysis?

First, for Questions 1 thru 5. In a study of grade school children, ages, heights, weights, and scores on a physical fitness exam were obtained from a random sample of 10 children. The data are given below. Fit a multiple linear regression model relating the scores to the ages, heights, and weights of the children.

Data in the following order: score (Y), age (X1), height (X2), weight (X3).

Score (Y)   Age (X1)   Height (X2)   Weight (X3)
58          7          47.5          53
54          7          45            50
55          9          52.5          85
74          7          48            52
86          9          55            76
98          8          51            64
96          9          53            75
70          7          46            75
40          7          48            68
67          9          50.5          74

What is the multiple linear regression model fit to the data?

Question 1 options:

Question 2 (6 points)

Which of the variables are significantly associated with physical fitness in the children?

Question 2 options:
Age.
Weight.
Height.
All of the above.
None of the above.

Question 3 (6 points)

Which of the following statements are true?

Question 3 options:
The physical fitness score increases by 1.70 for each year increase in age.
The physical fitness score increases by 4.22 for each inch increase in height.
The physical fitness score decreases by 0.60 for each pound increase in weight.
The physical fitness score increases by 4.22 for each inch increase in height when age and weight are held constant.
The physical fitness score changes by 4.22 for each pound increase in weight controlling for age and height.

Question 4 (6 points)

Based on the SAS output, we can conclude that 18.39% of the variation of the physical fitness score in the sample of children can be explained by the regression model.

Question 4 options:
True.
False.

Question 5 (7 points)

Based on the 90% confidence intervals (CI) we have obtained, we can conclude that:

Question 5 options:
None of the explanatory variables in the model is significant at the alpha=0.10 level.
The variable weight is significantly associated with physical fitness at the alpha=0.10 level since the CI does not contain 1.
All explanatory variables in the model are significant at the alpha=0.05 level.
All explanatory variables in the model are significant at the alpha=0.10 level.

Question 6 (7 points)

B. For Questions 6 thru 9. The human resource department of a major mobile company decides to determine the cause of depression among its customer care representatives. The number of calls received per day and the number of work hours per week of the customer care representatives are employed as predictors for depression.

The data is in the following order: depression score, number of calls, and work hours.

Depression   Calls   Hours
59           47.5    53
55           45      50
54           52.5    85
75           48      52
87           55      76
99           51      64
95           53      75
71           46      75
42           48      68
68           50.5    74
40           45      40
42           48.5    66
48           50.5    65
44           49.0    70
91           51.5    70
49           46.5    60
96           53.5    77
40           45      65
66           52.5    65
71           51.5    67

C. For Questions 6 thru 9. Fit a multiple linear regression model. The dependent variable is depression score and the two predictors are number of calls and work hours.
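For readers without SAS at hand, the fit described in part C can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SAS procedure itself: the helper names (`invert`, `ols`, `sample_sd`) are hypothetical, the covariance of the estimates is computed as MSE times the inverse of X'X, and each standardized coefficient is computed as b_j * sd(x_j) / sd(y), which is the usual definition of a standardized estimate. No output values are shown here, so the printed numbers should be checked against the SAS output.

```python
import math

# Part B data as listed: (depression score, calls per day, work hours per week).
DATA = [
    (59, 47.5, 53), (55, 45, 50), (54, 52.5, 85), (75, 48, 52),
    (87, 55, 76), (99, 51, 64), (95, 53, 75), (71, 46, 75),
    (42, 48, 68), (68, 50.5, 74), (40, 45, 40), (42, 48.5, 66),
    (48, 50.5, 65), (44, 49.0, 70), (91, 51.5, 70), (49, 46.5, 60),
    (96, 53.5, 77), (40, 45, 65), (66, 52.5, 65), (71, 51.5, 67),
]


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols(y, columns):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Returns (coefficients, covariance matrix of the estimates).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
           for r in range(k)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    xtx_inv = invert(xtx)
    coef = [sum(xtx_inv[r][c] * xty[c] for c in range(k)) for r in range(k)]
    resid = [y[i] - sum(X[i][c] * coef[c] for c in range(k)) for i in range(n)]
    mse = sum(e * e for e in resid) / (n - k)  # residual mean square
    cov = [[mse * v for v in row] for row in xtx_inv]
    return coef, cov


def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


y = [row[0] for row in DATA]
calls = [row[1] for row in DATA]
hours = [row[2] for row in DATA]

b, cov = ols(y, [calls, hours])
std_err = [math.sqrt(cov[j][j]) for j in range(3)]
stb = [b[j + 1] * sample_sd(x) / sample_sd(y)
       for j, x in enumerate([calls, hours])]

print("coefficients (intercept, calls, hours):", b)
print("covariance of estimates:")
for row in cov:
    print(row)
print("standard errors:", std_err)
print("standardized coefficients (calls, hours):", stb)
```

The diagonal of the covariance matrix holds the variances of the intercept and the two slopes, so each standard error is the square root of the matching diagonal entry, and the off-diagonal entries are the covariances between pairs of estimates.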
Based on the Covariance of Estimates table in the SAS output, we can conclude that:

Question 6 options:
The covariance between the dependent variable and the intercept is 4487.22.
The variance of the estimated slope for the variable number of calls is 2.6120.
The variance of the estimated slope for the variable number of calls is 1.5397.
The standard error of the population slope for the variable hours is 4.9051.

Question 7 (7 points)

Based on the Covariance of Estimates table in the SAS output, we can conclude that:

Question 7 options:
The standard error of b1 is the square root of 4071.1309.
The covariance between b1 and b2 is the square root of -0.5028.
The standard error of b2 is the square root of 0.2158.
The variance of b2 is 10.6757.

Question 8 (11 points)

The standardized regression coefficient for hours is (Hint: use the option 'stb' to request).

Question 8 options:
-0.14322.
0.0000.
-0.27247.
-0.1846.

Question 9 (11 points)

Add an interaction term between the two predictors to the model above. We can conclude that (Hint: instead of proc reg, use proc glm with model statement option 'solution'):

Question 9 options:
The interaction between number of calls and work hours is not significant at the alpha=0.05 level.
The interaction between number of calls and work hours is significant at the alpha=0.10 level.
To determine whether the interaction between number of calls and work hours is significant, we should look at the p-values for the two predictors respectively.
The interaction between number of calls and work hours is non-significant at the alpha=0.05 level since both p-values of the two predictors are greater than 0.05.
To determine whether the interaction between number of calls and work hours is significant, we should look at two simple linear regression models respectively.

Question 10 (11 points)

Continued from question 9. To test the null hypothesis H0: beta1 = beta2 = beta3 = 0 vs H1: at least one of them is not 0, the F-statistic =

Question 10 options:
6.7000.
0.0071.
4.3300.
1.3100.

Question 11 (11 points)

For Questions 11-12. The human resource department then decided to add age to the predictors to determine the cause of depression among its customer care representatives.

The data is in the following order: depression score, number of calls/day, work hours/week, age.

59    47.5   53   18
55    45     50   18
54    52.5   85   19
75    48     52   19
87    55     76   20
99    51     64   20
95    53     75   21
71    46     75   22
42    48     68   22
68    50.5   74   30
40    45     40   40
42    48.5   66   66
48    50.5   65   25
44    49     70   18
91    51.5   70   29
49    46.5   28   60
96    53.5   45   77
40    45     34   65
66    52.5   65   35

Fit a multiple regression model of depression score as predicted by number of calls/day, work hours/week, and age.

Question 11 options:
Age is significant at the alpha=0.05 level.
Number of calls/day is significant at the alpha=0.05 level.
Number of work hours/week is significant at the alpha=0.05 level.
The overall model is not significant and the number of calls/day is not significant at the alpha=0.05 level.
The overall model is significant and age is significant at the alpha=0.05 level.

Question 12 (11 points)

Fit a simple linear regression model between depression and hours, and then fit a multiple linear regression model between depression and hours and age.

Question 12 options:
It is possible that age can be a confounder of hours.
It is not possible that age can be a confounder of hours.
We need to fit more models to answer the questions in parts a and b.
None of the above.
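
The interaction model and overall F-test from Questions 9 and 10 can be sketched in plain Python as a rough cross-check on the SAS output. This is an illustration under stated assumptions, not the proc glm computation itself: the helper names (`invert`, `fit`) are hypothetical, and the two predictors are centered before their product is formed, purely to keep the hand-rolled matrix inversion numerically stable. Centering leaves the interaction coefficient, its t-statistic, and the overall F-statistic unchanged, but the intercept and main-effect coefficients will differ from an uncentered 'solution' table. No p-values or expected values are shown.

```python
import math

# Three-column depression data as listed: (depression score, calls, work hours).
DATA = [
    (59, 47.5, 53), (55, 45, 50), (54, 52.5, 85), (75, 48, 52),
    (87, 55, 76), (99, 51, 64), (95, 53, 75), (71, 46, 75),
    (42, 48, 68), (68, 50.5, 74), (40, 45, 40), (42, 48.5, 66),
    (48, 50.5, 65), (44, 49.0, 70), (91, 51.5, 70), (49, 46.5, 60),
    (96, 53.5, 77), (40, 45, 65), (66, 52.5, 65), (71, 51.5, 67),
]


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit(y, columns):
    """Least-squares fit with intercept; returns (coefficients, standard errors, SSE)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
           for r in range(k)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    xtx_inv = invert(xtx)
    coef = [sum(xtx_inv[r][c] * xty[c] for c in range(k)) for r in range(k)]
    sse = sum((y[i] - sum(X[i][c] * coef[c] for c in range(k))) ** 2
              for i in range(n))
    mse = sse / (n - k)
    se = [math.sqrt(mse * xtx_inv[j][j]) for j in range(k)]
    return coef, se, sse


y = [row[0] for row in DATA]
calls = [row[1] for row in DATA]
hours = [row[2] for row in DATA]
n = len(y)

# Assumption of this sketch: center the predictors before forming the product.
mean_calls = sum(calls) / n
mean_hours = sum(hours) / n
calls_c = [c - mean_calls for c in calls]
hours_c = [h - mean_hours for h in hours]
interaction = [c * h for c, h in zip(calls_c, hours_c)]

b, se, sse = fit(y, [calls_c, hours_c, interaction])

# Overall F-test of H0: beta1 = beta2 = beta3 = 0, with p = 3 model terms.
p = 3
y_mean = sum(y) / n
sst = sum((v - y_mean) ** 2 for v in y)
f_stat = ((sst - sse) / p) / (sse / (n - p - 1))

# t-statistic for the interaction term alone.
t_interaction = b[3] / se[3]

print("interaction coefficient:", b[3])
print("t-statistic for interaction:", t_interaction, "on", n - p - 1, "df")
print("overall F-statistic:", f_stat, "on", p, "and", n - p - 1, "df")
```

Whether the interaction matters is judged from the t-statistic (or p-value) of the product term itself, while the F-statistic tests all three slopes jointly.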