M5_R Practise_CharanThota

School

Northeastern University

Course

6010

Subject

Statistics

Date

Jan 9, 2024

Regression Analysis

Introduction

In this report we perform a correlation and regression analysis of a heart disease data set. The data set comprises observations on the percentage of people biking to work each day, the percentage of people smoking, and the percentage of people with heart disease in a sample of 500 townships.

Data Analysis

1. Import and summary of the heart data set

We imported the heart.data.csv data set into R Markdown. The data set contains three columns: biking, smoking, and heart disease. Using the summary function, we can observe the minimum, quartiles, median, mean, and maximum of each column.

2. Correlation table and plot of the heart data set
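The import, summary, and correlation steps described above can be sketched roughly as follows. This is a minimal sketch, not the code from the attached R Markdown file: heart.data.csv is not available here, so a synthetic stand-in is simulated, and the column name heart.disease, the intercept of 15, and the noise level are assumptions made only for illustration.

```r
# Synthetic stand-in for heart.data.csv (assumption: the real file has
# columns named biking, smoking and heart.disease; the intercept and the
# noise level below are arbitrary illustrative values).
set.seed(1)
n <- 500
heart_data <- data.frame(
  biking  = runif(n, min = 1, max = 75),
  smoking = runif(n, min = 0.5, max = 30)
)
heart_data$heart.disease <- 15 - 0.2 * heart_data$biking +
  0.178 * heart_data$smoking + rnorm(n, sd = 0.65)

# With the real file one would instead read it in, for example:
# heart_data <- read.csv("heart.data.csv")

# Minimum, quartiles, median, mean and maximum of each column
summary(heart_data)

# Standard deviations are not part of summary(), so compute them separately
sapply(heart_data, sd)

# Pairwise Pearson correlations between the three columns
cor_table <- cor(heart_data)
round(cor_table, 3)
```

A correlation between biking and smoking close to zero in such a table is what justifies keeping both as predictors in the same model.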

Observation: The correlation table and correlation plot show that the association between biking and smoking is very weak (a correlation coefficient of about 0.015), so we can include both factors as predictors in our regression model.

3. Linearity and histogram of heart disease

Observation: From the histogram, we can see that heart disease is approximately normally distributed. To check linearity we drew two scatter plots: one for biking and heart disease, and the other for smoking and heart disease. The relationship between smoking and heart disease is less obvious, although it still appears to be linear. We can therefore proceed with the regression analysis.

4. Performing regression model analysis

Observation: In our analysis of 500 townships, there is a linear relationship between biking to work, smoking, and heart disease. Between 1 and 75 percent of people bike to work, between 0.5 and 30 percent smoke, and between 0.5 and 20.5 percent have heart disease. Biking has an estimated effect of -0.2 on heart disease, while smoking has an estimated effect of 0.178. For both factors, it is very unlikely that the estimated effect is due to chance. The residuals show no skew, so the model appears to satisfy the homoscedasticity assumption.
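The distribution check, the linearity check, and the model fit described in sections 3 and 4 might look roughly like this in R. It is a sketch under assumptions: the data are a simulated stand-in for heart.data.csv, the column name heart.disease is assumed, and base R graphics are used here rather than whatever plotting code the original analysis used.

```r
# Synthetic stand-in for heart.data.csv (column names and the simulation
# parameters are assumptions for illustration only).
set.seed(1)
n <- 500
heart_data <- data.frame(
  biking  = runif(n, min = 1, max = 75),
  smoking = runif(n, min = 0.5, max = 30)
)
heart_data$heart.disease <- 15 - 0.2 * heart_data$biking +
  0.178 * heart_data$smoking + rnorm(n, sd = 0.65)

# Distribution of the response variable
hist(heart_data$heart.disease, main = "Heart disease", xlab = "Percent")

# Linearity checks: one scatter plot per predictor
plot(heart.disease ~ biking, data = heart_data)
plot(heart.disease ~ smoking, data = heart_data)

# Multiple linear regression with both predictors
fit <- lm(heart.disease ~ biking + smoking, data = heart_data)
summary(fit)   # coefficient estimates, standard errors, p-values
confint(fit)   # 95% confidence intervals for the coefficients

# Standard diagnostic plots (residuals vs fitted, Q-Q plot, and so on)
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```

In the residuals-versus-fitted panel, a roughly constant spread around zero is the usual visual evidence for homoscedasticity.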

5. Regression plot model

Observation: Here we built a separate data frame with three levels of smoking over which to estimate heart disease rates, choosing the minimum, mean, and maximum values of smoking. Next, we stored the predicted values of heart disease in a predicted y variable. Finally, we added three regression lines to the plot, one each for the minimum, mean, and maximum smoking level.

Summary

We observed a substantial relationship between biking to work and heart disease, as well as between smoking and heart disease, in 500 townships. For every 1 percent rise in biking, the rate of heart disease decreased by 0.2 percent (± 0.0014), and for every 1 percent increase in smoking, the rate of heart disease increased by 0.178 percent (± 0.0035).

References

1. Tutorialspoint. (2021). R - Linear Regression. Tutorialspoint. https://www.tutorialspoint.com/r/r_linear_regression.htm
2. Peng, R. S. D. K. (2020, December 20). 4.1 Basic Plotting With ggplot2 | Mastering Software Development in R. Bookdown. https://bookdown.org/rdpeng/RProgDA/basic-plotting-with-ggplot2.html
3. Kabacoff, R. I. (2021). Quick-R: Correlations. Statmethods. https://www.statmethods.net/stats/correlations.html

Appendix

1. For the data analysis I used the data set named heart.data.csv.
2. I attached the R Markdown file, which is named M5_RPractice_CharanThota.Rmd.
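As an illustration of the regression plot described in section 5, the following sketch builds the prediction data frame and draws the three lines. It is not the code from the attached R Markdown file: the data are simulated, the names plotting_data and predicted_y are hypothetical, and base R graphics are used to keep the example free of package dependencies.

```r
# Synthetic stand-in for heart.data.csv (assumed column names, arbitrary
# simulation parameters).
set.seed(1)
n <- 500
heart_data <- data.frame(
  biking  = runif(n, min = 1, max = 75),
  smoking = runif(n, min = 0.5, max = 30)
)
heart_data$heart.disease <- 15 - 0.2 * heart_data$biking +
  0.178 * heart_data$smoking + rnorm(n, sd = 0.65)

fit <- lm(heart.disease ~ biking + smoking, data = heart_data)

# Three smoking levels: minimum, mean and maximum of the observed values
smoking_levels <- c(min(heart_data$smoking),
                    mean(heart_data$smoking),
                    max(heart_data$smoking))

# Grid of biking values crossed with the three smoking levels
plotting_data <- expand.grid(
  biking  = seq(min(heart_data$biking), max(heart_data$biking),
                length.out = 30),
  smoking = smoking_levels
)

# Store the model predictions alongside the grid
plotting_data$predicted_y <- predict(fit, newdata = plotting_data)

# Observed points, then one regression line per smoking level
plot(heart_data$biking, heart_data$heart.disease, col = "grey70",
     xlab = "Biking to work (%)", ylab = "Heart disease (%)")
lines_by_level <- split(plotting_data, plotting_data$smoking)
for (i in seq_along(lines_by_level)) {
  lines(lines_by_level[[i]]$biking, lines_by_level[[i]]$predicted_y,
        lty = i, lwd = 2)
}
legend("topright", lty = 1:3, lwd = 2,
       legend = c("min smoking", "mean smoking", "max smoking"))
```

Because the model has no interaction term, the three lines are parallel and differ only in height, which reflects the estimated effect of smoking.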
