1690 HW 10

School: University of Texas
Course: PHM1690
Subject: Statistics
Date: Apr 3, 2024

Malaysia Marshall
PHWM 1690 HW 10

Part A

1. Identify Relationships, Part II
   a. This plot shows a strong relationship. The data are non-linear, however, so fitting a linear model would not be reasonable.
   b. This plot shows a strong, positive relationship that is non-linear. For this reason, fitting a linear model would not be reasonable.
   c. This plot shows a strong, positive relationship. Because the pattern is linear, fitting a linear model would be reasonable.
   d. This plot shows a weak, positive association. Fitting a linear model would be reasonable, but there may be better alternatives.
   e. This plot shows a weak, negative association. A linear model would be reasonable, but possibly not the best choice.
   f. This plot shows a moderate, linear relationship. Fitting a linear model would be reasonable.

2. Match the Correlation, Part II
   a. Scatterplot 2 best matches a correlation coefficient of 0.49.
   b. Scatterplot 4 best matches a correlation coefficient of -0.48.
   c. Scatterplot 3 best matches a correlation coefficient of -0.03.
   d. Scatterplot 1 best matches a correlation coefficient of -0.85.

3. Nutrition at Starbucks, Part I
   a. The relationship between the number of calories and the amount of carbohydrates can be described as weakly positive and linear.
   b. The explanatory variable is the number of calories in the Starbucks food items. The response variable is the amount of carbohydrates (in grams) that each Starbucks food item contains.
   c. We may want to fit a regression line to these data to determine whether the linear relationship is statistically significant and to determine the correlation coefficient of the two variables.
   d. These data satisfy the four conditions required for fitting a least squares line (linearity, nearly normal residuals, constant variability, and independent observations).

4. Beer and Blood Alcohol Content
   a. The relationship between the number of cans of beer and BAC is moderately positive and linear.
   b. The equation of the regression line is ŷ = -0.0127 + 0.0180 × (cans of beer).
      i. For each additional can of beer, we expect BAC to increase on average by 0.0180 grams/deciliter.
      ii. The intercept, -0.0127, is the expected BAC when the number of cans of beer is 0. Because BAC cannot be negative, this value has no practical meaning on its own.
   c. Is drinking more cans of beer associated with an increase in blood alcohol?
      i. The null and alternative hypotheses can be written as:
         1. H0: Drinking more cans of beer is not associated with an increase in BAC.
         2. HA: Drinking more cans of beer is associated with an increase in BAC.
      ii. Using the provided information, the p-value was reported as 0.0000 (that is, p < 0.0001).
      iii. We have enough statistical evidence to conclude that the number of cans of beer is a significant predictor of BAC.
   d. R² = (0.89)² = 0.7921. About 79.21% of the variability in BAC is explained by the simple linear regression model with the number of cans of beer as the predictor.
   e. The expected (average) BAC of a person who drinks 10 cans of beer is -0.0127 + 0.0180 × 10 = 0.1673 grams/deciliter. We can trust this predicted value because all the necessary assumptions have been (fairly) satisfied.

Part B

1. Body Measurements
   a. The summary statistics table for shoulder girth and height is shown below.
   b. Using height as the response variable, a scatterplot of height against shoulder girth has been generated below. The relationship between the two variables can be described as moderately positive and linear.
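The arithmetic in Part A, problem 4 (parts d and e) can be checked with a short sketch. Python is used here only for illustration, since the assignment itself relies on provided output and Stata. The helper name `predict_bac` is hypothetical; the coefficients and r = 0.89 are taken from the answers to problem 4.

```python
# Coefficients of the fitted line reported in Part A, problem 4b.
INTERCEPT = -0.0127   # expected BAC at 0 cans of beer
SLOPE = 0.0180        # expected change in BAC per additional can


def predict_bac(cans: float) -> float:
    """Return the predicted BAC for a given number of cans (hypothetical helper)."""
    return INTERCEPT + SLOPE * cans


# In simple linear regression, R-squared is the square of the correlation r.
r = 0.89
r_squared = r ** 2

print(round(r_squared, 4))        # share of BAC variability explained by the model
print(round(predict_bac(10), 4))  # predicted BAC after 10 cans
```

The two printed values should agree with the 0.7921 and 0.1673 reported in parts d and e.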
   c. Is there a significant linear association between height and shoulder girth?
      i. Histograms of height and shoulder girth are shown below. Both show an approximately normal distribution with no visible skew or outliers.
      ii. Using the Stata command ‘pwcorr shoulder_girth height, sig’, a correlation coefficient of 0.6657 was computed. The value is positive, confirming a positive relationship, and because it is neither close to 0 nor close to 1, the relationship can be described as moderate.
      iii. The null and alternative hypotheses can be written as:
         1. H0: ρ = 0
         2. HA: ρ ≠ 0
      iv. The p-value was reported as 0.0000 (that is, p < 0.0001). We have enough statistical evidence to conclude that the true correlation coefficient between shoulder girth and height is different from zero.
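The test of H0: ρ = 0 in part c rests on a t statistic with n - 2 degrees of freedom, t = r × sqrt(n - 2) / sqrt(1 - r²). The sketch below is a minimal illustration in Python rather than Stata. It assumes n = 507, the number of observations shown in the regression output for part d, and the function name is hypothetical.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.6657   # sample correlation reported by pwcorr
n = 507      # assumed sample size (number of observations in the regression output)

t_stat = correlation_t_statistic(r, n)
print(round(t_stat, 2), "on", n - 2, "degrees of freedom")
```

A t statistic of about 20 on 505 degrees of freedom corresponds to a p-value far below 0.0001, which is consistent with the reported 0.0000. In simple linear regression this statistic also equals the t statistic for the slope.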
   d. Using Stata, the regression for predicting height using shoulder girth was computed as follows.

      . regress height shoulder_girth

            Source |       SS           df       MS      Number of obs   =       507
      -------------+----------------------------------   F(1, 505)       =    401.97
             Model |   19846.089         1   19846.089   Prob > F        =    0.0000
          Residual |  24932.6382       505  49.3715608   R-squared       =    0.4432
      -------------+----------------------------------   Adj R-squared   =    0.4421
             Total |  44778.7272       506  88.4955084   Root MSE        =    7.0265

      ------------------------------------------------------------------------------
              height | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      ---------------+--------------------------------------------------------------
      shoulder_girth |   .6036442    .030108    20.05   0.000     .5444918    .6627966
               _cons |   105.8325    3.27245    32.34   0.000     99.40317    112.2618
      ------------------------------------------------------------------------------

      i. The regression equation is ŷ = 105.8325 + 0.6036442x, where x is shoulder girth (cm) and ŷ is predicted height (cm).
      ii. For each additional 1 cm of shoulder girth, we would expect height to increase on average by 0.6036442 cm.
      iii. The expected height at a shoulder girth of zero is 105.8325 cm. This value has no interpretive meaning in this setting because a shoulder girth of 0 cm is not possible.
      iv. R² = 0.4432. About 44.32% of the variability in height is explained by the simple linear regression model with shoulder girth as the predictor.
   e. The residuals-based assumptions for the model are:
      i. Independence of subjects (pairs): there is no specific pattern in the residual plot, and the Stata command ‘duplicates list id’ shows no duplicated data points.
      ii. Linearity: there is no specific pattern in the residual plot.
      iii. Normality of residuals: all residuals are randomly scattered around 0 with no specific outliers.
      iv. Constant variability: there is no observable pattern in terms of ‘window-size’ in the residual plot.
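The summary figures in the regression output for part d are tied together by a few identities, which can be checked with a short sketch. Python is used only for illustration, all numbers are copied from the output, and the variable names are hypothetical.

```python
# Sums of squares and sample size copied from the regression output.
ss_model = 19846.089
ss_residual = 24932.6382
ss_total = 44778.7272
n_obs = 507

# R-squared is the share of total variability captured by the model.
r_squared = ss_model / ss_total

# Root MSE is the square root of the residual mean square (df = n - 2).
root_mse = (ss_residual / (n_obs - 2)) ** 0.5

# With a single predictor, the overall F statistic equals the slope's t squared.
slope, slope_se = 0.6036442, 0.030108
t_slope = slope / slope_se
f_stat = t_slope ** 2

print(round(r_squared, 4), round(root_mse, 4), round(t_slope, 2), round(f_stat, 1))
```

The results should land close to the reported R-squared (0.4432), Root MSE (7.0265), t (20.05), and F (401.97); small differences in the last digit come from rounding in the printed output.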
      The residuals vs. fitted values plot used to check these assumptions is in the image below. This linear model is appropriate because it (fairly) satisfies all assumptions.
   f. A student with a shoulder girth of 100 cm and an observed height of 160 cm.
      i. Using the Stata command ‘margins, at(shoulder_girth = 100)’, the predicted height of this student was computed to be 166.1969 cm.
      ii. The residual (observed minus predicted) is 160 - 166.1969 = -6.1969 cm. This means the observed height for this student was 6.1969 cm lower than the regression model predicted.
   g. Based on the summary statistics, it would not be appropriate to use this linear model with shoulder girth to predict the height of this child. Prediction should only be done within the range of the available data points, and his shoulder girth does not fall within the range of values listed in part a.
   h. Regression with sex as a predictor for height
      i. Using Stata, the regression line has been fit as follows:
      ii. The equation for the fitted line is ŷ = 164.8723 + 12.87304 × (sex).
      iii. The expected height of a female based on the fitted model is 164.8723 cm (sex = 0).
      iv. The expected height of a male based on the fitted model is 164.8723 + 12.87304 = 177.74534 cm (sex = 1).
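The predictions in parts f and h follow directly from the two fitted equations, as the sketch below illustrates. Python is used only for illustration and the function names are hypothetical. The coding of sex as 0 = female and 1 = male is an assumption inferred from the answers to h.iii and h.iv, not something stated in the output.

```python
def predict_height_from_girth(shoulder_girth_cm: float) -> float:
    """Predicted height (cm) from the shoulder-girth model in part d."""
    return 105.8325 + 0.6036442 * shoulder_girth_cm


def predict_height_from_sex(sex: int) -> float:
    """Predicted height (cm) from the model in part h.

    Assumes sex is coded 0 = female, 1 = male.
    """
    return 164.8723 + 12.87304 * sex


# Part f: prediction and residual for shoulder girth 100 cm, observed height 160 cm.
predicted = predict_height_from_girth(100)
residual = 160 - predicted   # residual = observed - predicted

# Part h: expected heights by sex under the assumed 0/1 coding.
female_height = predict_height_from_sex(0)
male_height = predict_height_from_sex(1)

print(round(predicted, 4), round(residual, 4))
print(round(female_height, 4), round(male_height, 5))
```

Because sex is a two-level indicator, the intercept is the mean predicted height for the group coded 0 and the slope is the difference between the two group means.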