OUTPUT B.95 Output for Problem 10 (output footnote: R = large residual)
10. Diabetes. In the research article “Capillary Basement Membrane Width in Diabetic Children” (American Journal of Medicine, 58, pp. 365–372), P. Raskin et al. obtained data on age and on width of the quadriceps muscle capillary basement membrane of individuals with and without diabetes. The membrane width can be used to diagnose the presence of diabetic microangiopathy. The table below provides the data obtained by the researchers. We want to predict membrane width based on age and whether the person is a diabetic. We introduce the indicator variable diabetic defined by
- a. Output B.91 on page B-170 shows a plot of width versus age, with the plot symbol being a solid black circle for diabetics and an open circle for non-diabetics. Based on this plot, does it appear that diabetic is a useful predictor variable? Explain your answer.
- b. We obtained the regression analysis of width on age and diabetic shown in Output B.92 on page B-170. Conduct the t-tests for the individual utility of each of the two predictor variables. Use a 5% level of significance and interpret your results.
- c. Based on Output B.92, obtain the regression equations relating width to age for diabetics and non-diabetics, separately.
- d. Outputs B.93(a), (b), (c), and (d), given on page B-171, provide, respectively, plots of residuals versus fitted values, residuals versus age, residuals versus diabetic, and a normal probability plot of the residuals. Perform a residual analysis to assess the appropriateness of the regression equation, constancy of the conditional standard deviations, and normality of the conditional distributions. Check for outliers and influential observations.
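Parts (b) and (c) rest on two small calculations, sketched below in Python. The sketch assumes the usual 0/1 coding (diabetic = 1 for diabetics, 0 for non-diabetics), which is an assumption because the definition is not reproduced above. The function names and all numbers are hypothetical placeholders, not values from Output B.92, and the critical value must be looked up for the appropriate degrees of freedom.

```python
from typing import NamedTuple, Tuple


class Line(NamedTuple):
    intercept: float
    slope: float


def t_statistic(coef: float, se_coef: float) -> float:
    """t-statistic for H0: beta = 0 (estimate divided by its standard error)."""
    return coef / se_coef


def is_useful(coef: float, se_coef: float, t_crit: float) -> bool:
    """Two-tailed decision: reject H0 when |t| is at least the critical value."""
    return abs(t_statistic(coef, se_coef)) >= t_crit


def group_lines(b0: float, b1: float, b2: float) -> Tuple[Line, Line]:
    """Split width-hat = b0 + b1*age + b2*diabetic into one line per group.

    Assumes diabetic is coded 1 for diabetics and 0 for non-diabetics.
    """
    non_diabetic = Line(intercept=b0, slope=b1)    # diabetic = 0
    diabetic = Line(intercept=b0 + b2, slope=b1)   # diabetic = 1
    return non_diabetic, diabetic


# Placeholder coefficients for illustration only; NOT taken from Output B.92.
non_diab, diab = group_lines(b0=100.0, b1=2.0, b2=50.0)
print("non-diabetics:", non_diab)
print("diabetics:    ", diab)

# Placeholder estimate, standard error, and critical value.
print("useful predictor?", is_useful(coef=2.0, se_coef=0.5, t_crit=2.0))
```

Under this model the two lines share the slope b1 and differ only in intercept, so the coefficient of diabetic is the vertical distance between two parallel lines.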
Table for Problem 10
OUTPUT B.91
OUTPUT B.92 Output for Problem 10
Regression Analysis: WIDTH versus AGE, DIABETIC
OUTPUT B.93 Residual plots for Problem 10
- e. Output B.94 provides a plot of width versus age with regression lines for diabetics and non-diabetics. Based on this output and your residual analysis in part (d), do you feel that the model fits the data well? Explain your answer.
- f. To check for interaction between the two predictor variables, we obtained the regression analysis of width on age, diabetic, and the product diabetic·age. The output is given in Output B.95 on page B-172. Is there an interaction between age and diabetic? Use α = 0.05.
- g. What other analyses should be performed on these data? Explain your answer.
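For part (f), the interaction term is the product of diabetic and age, and it lets the two groups have different slopes as well as different intercepts. The Python sketch below shows how such a model decomposes. It again assumes the 0/1 coding of diabetic, and the function names and coefficients are hypothetical placeholders rather than values from Output B.95.

```python
from typing import List, Sequence, Tuple


def design_row(age: float, diabetic: int) -> List[float]:
    """Predictor values for one subject: constant, age, indicator, product."""
    return [1.0, float(age), float(diabetic), float(diabetic * age)]


def predict(coefs: Sequence[float], age: float, diabetic: int) -> float:
    """Fitted width for coefficients (b0, b1, b2, b3)."""
    return sum(b * x for b, x in zip(coefs, design_row(age, diabetic)))


def group_lines_with_interaction(
    b0: float, b1: float, b2: float, b3: float
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return (intercept, slope) for non-diabetics, then for diabetics.

    Model: width-hat = b0 + b1*age + b2*diabetic + b3*(diabetic*age),
    with diabetic assumed to be coded 1 for diabetics and 0 otherwise.
    """
    non_diabetic = (b0, b1)            # diabetic = 0
    diabetic = (b0 + b2, b1 + b3)      # diabetic = 1
    return non_diabetic, diabetic


# Placeholder coefficients for illustration only; NOT taken from Output B.95.
coefs = (100.0, 2.0, 50.0, 1.0)
print(group_lines_with_interaction(*coefs))
print(predict(coefs, age=10, diabetic=0))
print(predict(coefs, age=10, diabetic=1))
```

Testing for interaction amounts to a t-test of H0: b3 = 0. If b3 is not significantly different from zero at α = 0.05, the slopes for the two groups are not shown to differ, and the parallel-lines model without the product term remains a reasonable description.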
INTRO.STATISTICS,TECH.UPDT.-W/MYSTATLAB