Malaika Wauters Math 11 - Lab 8

School

University of California, San Diego

Course

11

Subject

Mathematics

Date

May 30, 2024

Malaika Wauters
Math 11
Denise Rava
Lab 8: Predicting Children’s Growth

1. Begin by using linear regression to predict a child's height at age 9 from the child's height at age 2. What is the equation of your regression line? Based on your scatter plot and residual plot, does linear regression seem like an appropriate way to predict heights?
   a. Equation of regression line: HT9 = 31.93 + 1.796(HT2)
   b. Based on my scatter plot and residual plot, linear regression appears appropriate for predicting heights: the residual plot shows no curvature or heteroskedasticity, and the points are scattered roughly at random.

2. Next try using linear regression to predict a child's height at age 18 from the child's height at age 9. What is the equation of your regression line? Does linear regression seem appropriate for these data?
   a. Equation of regression line: HT18 = 32.2 + 1.035(HT9)
   b. Based on the scatter plot and residual plot, linear regression once again seems an appropriate model for predicting height at age 18. The residual plot shows no heteroskedasticity or curvature, and the points appear randomly scattered.

3. Is there a big difference between how much boys and girls grow in height between age 2 and age 9, or does the regression line you found in question 1 appear to work pretty well for both boys and girls?
   a. The scatter plot shows that the regression line from question 1 works well for both boys and girls. The two slopes are very similar; the girls' slope is only slightly steeper than the boys' slope.

4. Now consider the period between age 9 and age 18. Is there a big difference between the growth patterns of boys and girls in height during this period, or does the regression line you found in question 2 work well for both boys and girls?
   a. The regression line from question 2 does not work well for both boys and girls, because the separate regression lines for boys and girls on this scatter plot have different slopes and y-intercepts.
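One way to check questions 3 and 4 numerically is to fit a separate least-squares line for each sex and compare the slopes and intercepts with the pooled line. The following is a minimal sketch in Python. The records are made-up toy values chosen only for illustration (they are not the lab data), and the helper names fit_line and fit_by_group are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fit_by_group(rows):
    """Fit one line per group label; rows are (label, x, y) tuples."""
    groups = {}
    for label, x, y in rows:
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(x)
        ys.append(y)
    return {label: fit_line(xs, ys) for label, (xs, ys) in groups.items()}


# Hypothetical toy records: (sex, height at 9 in cm, height at 18 in cm).
# These are NOT the lab data; they only illustrate the comparison.
records = [
    ("M", 130.0, 172.0), ("M", 135.0, 178.0),
    ("M", 140.0, 183.0), ("M", 145.0, 189.0),
    ("F", 130.0, 160.0), ("F", 135.0, 163.0),
    ("F", 140.0, 167.0), ("F", 145.0, 170.0),
]

fits = fit_by_group(records)
pooled = fit_line([r[1] for r in records], [r[2] for r in records])

for label, (b0, b1) in sorted(fits.items()):
    print(f"{label}: HT18 = {b0:.2f} + {b1:.3f}(HT9)")
print(f"pooled: HT18 = {pooled[0]:.2f} + {pooled[1]:.3f}(HT9)")
```

If the per-group slopes and intercepts are close to the pooled ones, a single line is adequate, as in question 3. If they differ noticeably, separate lines are needed, as in question 4.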
5. Find the equation of a regression line that can be used to predict a boy's height at age 18 from the boy's height at age 9.
   a. HT18 = 35.08 + 1.059(HT9)

6. What percentage of the variation in the boys' heights at age 18 is explained by this regression?
   a. R^2 = 76.54%

7. Are the assumptions required for statistical inference satisfied? Explain how you arrive at your conclusions and provide supporting plots.
   a. The scatter plot looks linear, so the linearity assumption is met. The residuals show no pattern, which is consistent with the independence assumption. The residual plot shows an even, random scatter, so the equal variance assumption is met. Because the sample size (n = 66) is greater than 30, the central limit theorem makes inference on the slope robust to moderate departures from normality, so the normality assumption is not a concern. Therefore, the assumptions for statistical inference are satisfied.

8. Can you conclude that there is an association between boys' heights at age 18 and their heights at age 9? Make sure to state your null and alternative hypotheses and give the T-statistic and p-value for your test. Use significance level .05.
   a. H0: β1 = 0
      HA: β1 ≠ 0
      T-statistic = 14.45
      P-value ≈ 0
   b. Since the p-value is less than 0.05, we reject the null hypothesis of no association between boys' heights at age 9 and age 18. There is statistically significant evidence of an association between the two heights.

9. Find a 95 percent confidence interval for the slope of your regression line. Explain carefully in a sentence or two what this confidence interval means.
   a. df = n - 2 = 64
   b. ME = (1.9975)(0.07328) = 0.146
   c. The 95% confidence interval is b1 ± ME = 1.059 ± 0.146 = (0.913, 1.21)
   d. We can be 95% confident that each additional 1 cm of height at age 9 is associated with an average increase of between 0.913 cm and 1.21 cm in height at age 18.
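The arithmetic behind questions 8 and 9 can be reproduced from the reported values alone. The sketch below assumes the slope, its standard error, and the critical value exactly as quoted in the answers; the variable names are illustrative.

```python
import math

b1 = 1.059          # slope reported in question 5
se_b1 = 0.07328     # standard error of the slope used in question 9
t_star = 1.9975     # critical value used in question 9 (df = 64)
df = 64
r_squared = 0.7654  # from question 6

n = df + 2                      # df = n - 2
t_stat = b1 / se_b1             # test statistic for H0: slope = 0
margin = t_star * se_b1         # margin of error for the slope
ci = (b1 - margin, b1 + margin)
r = math.sqrt(r_squared)        # positive root because the slope is positive

print(f"n = {n}")
print(f"t = {t_stat:.2f}")
print(f"ME = {margin:.3f}")
print(f"95% CI for slope = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"r = {r:.3f}")
```

Because the interval for the slope lies entirely above 0, it agrees with rejecting the null hypothesis in question 8.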
10. If a boy is 140 centimeters tall at age 9, find an interval that you are 95 percent confident will contain the boy's height at age 18.
   a. (176.9, 189.8)

11. Find an interval that you are 95 percent confident will contain the average height at age 18 of all boys who are 140 centimeters tall at age 9.
   a. (182.3, 184.3)

12. Find the equation of a regression line that can be used to predict a boy's weight at age 18 from the boy's weight at age 9. Comment on what you see in the scatterplot and the residual plot.
   a. Equation of regression line: WT18 = 37.11 + 1.048(WT9)
   b. Linear regression does not seem to be the most appropriate model here. The data points are clustered at the low end of the weight range, and there are 2 obvious outliers with high leverage, which may undermine the normality assumption. Furthermore, the points in the residual plot are not entirely randomly scattered.
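As a quick consistency check on questions 10 and 11, both intervals should be centered on the point prediction from the boys' regression line in question 5. The sketch below uses only the reported equation and intervals; the function names are illustrative.

```python
def predict_ht18(ht9_cm):
    """Boys' regression line from question 5: HT18 = 35.08 + 1.059(HT9)."""
    return 35.08 + 1.059 * ht9_cm


def midpoint(interval):
    return (interval[0] + interval[1]) / 2


def half_width(interval):
    return (interval[1] - interval[0]) / 2


point = predict_ht18(140)
prediction_interval = (176.9, 189.8)  # question 10: one boy's height
mean_interval = (182.3, 184.3)        # question 11: average height

print(f"point prediction = {point:.2f} cm")
print(f"prediction interval: center {midpoint(prediction_interval):.2f}, "
      f"half-width {half_width(prediction_interval):.2f}")
print(f"interval for the mean: center {midpoint(mean_interval):.2f}, "
      f"half-width {half_width(mean_interval):.2f}")
```

The interval for an individual boy is much wider than the interval for the average because it must also cover the boy-to-boy variation around the line, not just the uncertainty in the line itself.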
13. You should have noticed that the data set contains some outliers, including one rather extreme outlier that represents a boy who weighed nearly 67 kilograms at age 9. Try removing this outlier. Then do the linear regression again. This time, do the assumptions for inference appear to be satisfied?
   a. With the extreme outlier removed, the linear regression model seems much more appropriate, and the assumptions for statistical inference appear to be satisfied. The normality assumption is also more plausible without the outlier. The regression line fits the scatter plot better, and the residual plot is much more randomly dispersed.

14. How much effect was the outlier having on the slope of the regression line? Would you say that this outlier is an influential point? Is it a high leverage point?
   a. The outlier had a large effect on the regression: removing it changed the slope from 1.0481 to 1.667. Because removing it changes the slope this much, it is an influential point. Because its weight at age 9 lies far from the other boys' weights, it is also a high leverage point.

15. Find an interval that you are 95 percent confident will contain the weight at age 18 of a boy who weighs 30 kilograms at age 9.
   a. Using the model from question 13, in which the outlier was removed, the 95% prediction interval is (54.6, 81.7).
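The distinction in question 14 between leverage (an unusual x-value) and influence (a large change in the fit when the point is removed) can be sketched as follows. The weights are made-up toy values, not the lab data; only the 67 kg age-9 weight echoes the outlier described in question 13. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx in simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Hypothetical toy weights in kg (NOT the lab data).
wt9 = [25.0, 28.0, 30.0, 32.0, 35.0, 67.0]
wt18 = [55.0, 60.0, 65.0, 68.0, 72.0, 75.0]

h = leverages(wt9)
_, slope_with = fit_line(wt9, wt18)
_, slope_without = fit_line(wt9[:-1], wt18[:-1])

# Relative change between the two slopes reported in question 14.
reported_change = (1.667 - 1.0481) / 1.0481

print("leverages:", [round(v, 3) for v in h])
print(f"toy slope with outlier = {slope_with:.3f}, without = {slope_without:.3f}")
print(f"reported slope change = {reported_change:.1%}")
```

In the toy data, the point with the extreme x-value has by far the largest leverage, and dropping it changes the slope substantially, which is the same pattern the answer to question 14 describes.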