2-1 Lab Assignment and Report Brief R Linear Regression
Running head: [SHORTENED TITLE UP TO 50 CHARACTERS]
IT 460 Lab and Report Brief
Daniel F. Origoni
Southern New Hampshire University
The data set contains information collected by an insurance company: each beneficiary's age, sex, BMI (body mass index), number of children, smoker status, region (northeast, northwest, southeast, or southwest), and the medical expenses charged to the plan for the calendar year. The data excludes anyone under the age of 18 or over the age of 64, since those people are covered under government care or under someone else’s insurance.
The first R command is summary(insurance$bmi), which outputs summary statistics for the variable bmi. The results of the command are displayed in the screenshot, and they show that the median and the mean are very close to each other. This suggests that the distribution of bmi is roughly symmetric, which is consistent with an approximately normal distribution, although the summary statistics alone do not establish normality. The second command is pairs.panels(insurance[c("age", "bmi", "children", "charges")]). This command generates the scatterplot matrix displayed in the second image. “Above the diagonal a correlation matrix [is displayed]. On the diagonal, a histogram depicting the distribution of values for each feature is shown. Finally, the scatterplots below the diagonal are now presented with additional visual information.” Here the histogram on the diagonal gives a visual representation of the approximately normal distribution of the bmi variable. The third command is insurance$bmi2 <- insurance$bmi^2. This command creates a new variable called bmi2, the square of bmi. Although it is not used in the final formula, a squared term of this kind lets a linear model capture a curved (nonlinear) relationship between bmi and charges.
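The checks described above can be sketched in base R. Because the insurance data set itself is not reproduced here, the sketch runs on a simulated stand-in: the simulated `insurance` data frame, its distributions, and the seed are assumptions made for illustration only. Base R's `cor()` stands in for the correlation panel of `pairs.panels()`, which comes from the psych package, and `shapiro.test()` is added as one possible numeric check of normality that the lab itself does not use.

```r
# Simulated stand-in for the insurance data (assumption: these values are
# made up for illustration and do not reproduce the real data set).
set.seed(42)
n <- 500
insurance <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  bmi      = rnorm(n, mean = 30, sd = 6),
  children = sample(0:5, n, replace = TRUE)
)
insurance$charges <- 250 * insurance$age + 300 * insurance$bmi +
  rnorm(n, mean = 0, sd = 5000)

# Summary statistics for bmi, as in the lab's first command.
bmi_summary <- summary(insurance$bmi)
print(bmi_summary)

# A small gap between mean and median points to a roughly symmetric
# distribution; it does not by itself show that the data are normal.
mean_median_gap <- abs(mean(insurance$bmi) - median(insurance$bmi))
print(mean_median_gap)

# Shapiro-Wilk test as a numeric normality check (valid for 3 to 5000 values).
normality_check <- shapiro.test(insurance$bmi)
print(normality_check$p.value)

# Correlation matrix for the same four numeric features that the lab
# passes to pairs.panels().
num_cols <- c("age", "bmi", "children", "charges")
cor_matrix <- cor(insurance[num_cols])
print(round(cor_matrix, 2))

# Squared bmi term, as in the lab's third command.
insurance$bmi2 <- insurance$bmi^2
```

For the visual counterparts, `hist(insurance$bmi)` draws the histogram and `pairs(insurance[num_cols])` draws a plain scatterplot matrix without the extra annotations that `pairs.panels()` adds.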
The last two commands are ins_model3 <- lm(charges ~ age + children + bmi + sex + + region, data = insurance) and summary(ins_model3). The first creates a linear regression model using charges as the dependent variable and age, children, bmi, sex, and region as the predictor variables (the command line contains a typo, a doubled +). The second produces a statistical report on that model, showing the estimated effect of each predictor on the dependent variable.
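A minimal sketch of this model-fitting step follows, written without the doubled `+`. It again uses a simulated stand-in for the insurance data, so the data frame, the coefficients used to generate `charges`, and the seed are all assumptions for illustration; the numbers it prints will not match the lab's screenshots.

```r
# Simulated stand-in for the insurance data (assumption: illustrative values
# only; the real data set is not reproduced here).
set.seed(1)
n <- 300
insurance <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  children = sample(0:5, n, replace = TRUE),
  bmi      = rnorm(n, mean = 30, sd = 6),
  sex      = factor(sample(c("female", "male"), n, replace = TRUE)),
  region   = factor(sample(
    c("northeast", "northwest", "southeast", "southwest"),
    n, replace = TRUE
  ))
)
insurance$charges <- 1000 + 250 * insurance$age + 500 * insurance$children +
  300 * insurance$bmi + rnorm(n, mean = 0, sd = 4000)

# Fit the linear regression: charges is the dependent variable and the
# remaining terms are the predictors. Factors such as sex and region are
# expanded into dummy variables automatically by lm().
ins_model3 <- lm(charges ~ age + children + bmi + sex + region,
                 data = insurance)

# summary() reports coefficient estimates, standard errors, t-values,
# p-values, and overall fit statistics such as R-squared.
model_summary <- summary(ins_model3)
print(model_summary)

# The coefficient table and R-squared can also be extracted directly.
coef_table <- coef(model_summary)
r_squared  <- model_summary$r.squared
```

Each row of `coef_table` corresponds to one term in the model. For a factor, R reports one row per level other than the reference level, so `sex` contributes a single `sexmale` row and `region` contributes three rows.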