Arndt-Kohlway_assignment2


School

University of Maryland Global Campus (UMGC)

Course

630

Subject

Electrical Engineering

Date

Apr 3, 2024


Statistical Data Mining

Logistic Regression on Congenital Heart Defects

Nicholas Arndt-Kohlway
DATA630: Machine Learning (2215)
Professor Ami Gates

Objective

The goal of this analysis is to identify the contributing factors of congenital heart defects by using logistic regression along with the Naïve Bayes method. The specific question to be answered is: what variables contribute to an increased risk of having a congenital heart defect? The logistic regression will provide estimates of how much each variable contributes to the overall chance of having a congenital heart defect. The Naïve Bayes method will identify conditional probabilities for congenital heart defects.

The prevalence of certain types of congenital heart defects (CHDs) has been increasing, while other types of CHDs have remained stable over the past few years. “CHDs affect nearly 1% of—or about 40,000—births per year in the United States” (Reller, M. D., Strickland, M. J., Riehle-Colarusso, T., Mahle, W. T., & Correa, A.). Treatment increases the survival rate of people with CHDs. However, there is no U.S. population-based tracking system to identify adolescents and adults with CHDs. “During 1999-2006, there were 41,494 deaths related to CHDs in the United States…Nearly half (48%) of the deaths due to CHDs occurred during infancy (younger than 1 year of age)” (Gilboa, S. M., Salemi, J. L., Nembhard, W. N., Fixler, D. E., & Correa, A.). Using the data provided by KEEL, it is possible to track and identify possible CHD cases so that the condition can be treated before serious illness or death occurs.

For this dataset, logistic regression is the best choice for identifying possible indicators of increased CHD risk. Because having a CHD is a yes-or-no outcome, logistic regression is more useful than linear regression, which is better suited to a numerical dependent variable than to a binary one. The Naïve Bayes method will help identify the variable values that are more likely to contribute to a CHD.
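The reasoning for preferring logistic over linear regression with a binary outcome can be illustrated with a minimal sketch. It is not the analysis performed in this paper: the ten data points are made up for illustration (they are not from the KEEL dataset), the single predictor `x` stands in for any standardized risk factor, and the helper names (`fit_logistic`, `fit_linear`, `logistic_prob`, `linear_pred`) are hypothetical. The model is fit with plain gradient descent so that the example needs only the Python standard library.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def fit_linear(xs, ys):
    """Ordinary least squares for y = a0 + a1*x (for comparison only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a1 = sxy / sxx
    return my - a1 * mx, a1


# Made-up illustrative data (NOT from KEEL): one standardized predictor
# and a binary outcome coded 0 = no, 1 = yes.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
a0, a1 = fit_linear(xs, ys)


def logistic_prob(x):
    """Predicted probability from the logistic fit."""
    return sigmoid(b0 + b1 * x)


def linear_pred(x):
    """Predicted value from the linear fit; not constrained to [0, 1]."""
    return a0 + a1 * x


for x in (-4.0, 0.0, 4.0):
    print(f"x={x:+.1f}  logistic={logistic_prob(x):.3f}  linear={linear_pred(x):.3f}")
```

On data like this, the logistic predictions stay between 0 and 1 and can be read as probabilities, while the straight-line fit produces values below 0 and above 1 at the extremes. The fitted slope `b1` is on the log-odds scale, so `math.exp(b1)` would be the multiplicative change in the odds for a one-unit increase in `x`.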
The Naïve Bayes method goes deeper than an Apriori-style analysis by including the probabilities of a CHD occurring given certain circumstances. Between logistic regression and Naïve Bayes, there will be an increase in the efficiency and effectiveness of tracking CHDs among the older population.

Analysis

The congenital heart disease dataset was kindly provided by Knowledge Extraction based on Evolutionary Learning (KEEL). The dataset was produced from a sample of men in a heart-disease high-risk region of the Western Cape in South Africa. It includes 462 observations over 10 variables:

1. Sbp (systolic blood pressure), with values ranging from 101 to 218.
2. Tobacco (cumulative tobacco in kilograms), with values ranging from 0 to 31.2.
3. Ldl (low density lipoprotein cholesterol), with values ranging from 0.98 to 15.33.
4. Adiposity, with values ranging from 6.74 to 42.49.
5. Famhist (family history of heart disease), with values Present or Absent.
6. Typea (type-A behavior), with values ranging from 13 to 78.
7. Obesity, with values ranging from 14.7 to 46.58.
8. Alcohol (current alcohol consumption), with values ranging from 0 to 147.19.
9. Age (age at onset), with values ranging from 15 to 64.
10. Chd (congenital heart defect), with values 0 = no and 1 = yes.

In the summary in Figure 1, the maximum systolic blood pressure (SBP) is 218, which is very concerning. Quite frankly, I did not even know it was possible to reach an SBP of 218. A significant number of men in this dataset have an SBP above the normal level of 120. For alcohol, obesity, adiposity, and typea, it is difficult to identify the units of measurement. Assumptions were made that obesity is measured by Body Mass
