IDRIPR Word
School: University Canada West
Course: 650
Subject: Economics
Date: Feb 20, 2024
Pages: 17

Individual Descriptive Report
University Canada West
Contents
Table of Figures
Introduction
Background
Methodology
Result and Discussion
  Data Cleaning
  Data Exploration
  Regression Model
  Some intriguing Questions
Conclusion
References
ZeroGPT
Table of Figures
Figure 1: Box plot of insurance charges
Figure 2: Average charge by age
Figure 3: Average charges by BMI
Figure 4: Average charges for smokers and non-smokers
Figure 5: Charges and predicted charges with a trendline
Introduction
In this report, we predict insurance charges for policyholders based on their age, smoking habits, body mass index (BMI) and region. This is useful for insurance companies, which can price policies for new customers accurately using data from their previous customers. The Background section gives more details of this study, and findings from previous reports in this field are cited where relevant. The Methodology section describes the methods used in this report, the Result and Discussion section presents and analyses the tables and charts, and the Conclusion summarises our findings.
Background
We have data on the insurance charges of individuals with different characteristics: age, sex, BMI, number of children, smoking status and region. This report analyses the data and then predicts insurance charges.
Methodology
We used Excel features such as filters and box plots, built line and column charts from pivot tables, and used functions and formulas to calculate quartiles, standard deviation, correlation (CORREL), random numbers for shuffling (RAND), mean squared error (MSE) and mean absolute error (MAE). Dummy variables were used to convert the categorical characteristics smoker, sex and region into numeric values, and the Data Analysis ToolPak was used to run the regressions.
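The dummy-variable step can be illustrated with a short Python sketch. It assumes each record is a dictionary with hypothetical keys `age`, `sex`, `bmi`, `children`, `smoker` and `region`, that smoker is stored as "yes"/"no" and sex as "male"/"female", and it uses hypothetical region labels, since the report does not list the actual column names or region values. Two-level characteristics get a single 0/1 dummy, and region gets one dummy per level except a baseline, which keeps the dummies from being perfectly collinear with the regression intercept.

```python
# Hypothetical region labels; the report does not list the dataset's actual region values.
REGIONS = ["northeast", "northwest", "southeast", "southwest"]


def encode_row(row):
    """Replace the categorical fields of one record with 0/1 dummy variables."""
    encoded = {
        "age": row["age"],
        "bmi": row["bmi"],
        "children": row["children"],
        # Two-level characteristics need a single dummy each.
        "smoker": 1 if row["smoker"] == "yes" else 0,
        "male": 1 if row["sex"] == "male" else 0,
    }
    # One dummy per region except the first (baseline) region, so the dummies
    # are not perfectly collinear with the regression intercept.
    for region in REGIONS[1:]:
        encoded["region_" + region] = 1 if row["region"] == region else 0
    return encoded


sample = {"age": 19, "sex": "female", "bmi": 27.9, "children": 0,
          "smoker": "yes", "region": "southwest"}
print(encode_row(sample))
```

A record from the baseline region has all region dummies equal to 0, so its effect is absorbed by the intercept.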
Result and Discussion
Data Cleaning
Data cleaning makes data more reliable by correcting or removing inaccurate, redundant and duplicate records (Stedman, 2022). The following steps were applied to the dataset:
1. Checking for duplicates: one duplicate record was found and removed.
2. Checking for blank cells: there was no missing data.
3. Formatting the charges column as currency.
4. Checking for outliers in charges: we calculated the minimum and maximum whiskers and removed the outliers above the maximum whisker of $34,679.
The following box plot was created from the charges data.
Figure 1: Box plot of insurance charges
Note: Outliers are shown above the maximum whisker in the box plot.
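The duplicate check and the outlier rule above can be sketched in Python. This assumes records are hashable tuples and that the maximum whisker is the usual upper fence Q3 + 1.5 × IQR, with quartiles from `statistics.quantiles(..., method="inclusive")`, which is intended to mirror Excel's QUARTILE.INC. The report does not state which quartile method or multiplier was used, so both are assumptions, and the sample records are hypothetical.

```python
import statistics


def drop_duplicates(rows):
    """Keep the first occurrence of each identical record (rows must be hashable, e.g. tuples)."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def upper_whisker(values):
    """Upper fence Q3 + 1.5 * IQR, using inclusive quartiles (intended to mirror QUARTILE.INC)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def remove_high_outliers(rows, charge_index):
    """Drop records whose charge lies above the upper whisker; low values are kept, as in the report."""
    limit = upper_whisker([row[charge_index] for row in rows])
    return [row for row in rows if row[charge_index] <= limit]


# Hypothetical (age, charges) records; the last one has an extreme charge.
records = [(18 + i, c) for i, c in
           enumerate([1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 100000])]
records.append(records[0])  # an exact duplicate of the first record
cleaned = remove_high_outliers(drop_duplicates(records), charge_index=1)
print(len(cleaned))
```

Duplicates are removed before the whisker is computed, matching the order of the steps listed above.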
After removing the outliers, 1,200 records remain.
Data Exploration
This section describes how charges vary with age, BMI and smoking status. Insurance prices increase with age: older people are more likely to be hospitalised, so they pay higher premiums (Walker, 2022).
Figure 2: Line chart of average charge by age (ages 18 to 64)
Note: Average insurance charges by age. The line chart shows an upward trend in price as age rises, and three age groups, each with a comparable price range, can be seen: young people (18 to 34), middle-aged people (35 to 49) and older people (50 and above), with a steep rise after the end of each group.
Companies also use a person's BMI to price insurance: anyone with a BMI of 30 or more is categorised as obese and pays more, because obese people are more prone to obesity-related diseases such as diabetes (GoodRx, n.d.).
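A pivot table that averages charges by category can be approximated with a small Python helper. The age bands follow the three groups described above, the BMI split uses the obesity threshold of 30, and the records are hypothetical dictionaries with `age`, `bmi` and `charges` keys, not values from the report's dataset.

```python
from collections import defaultdict


def age_group(age):
    """Three bands from the line chart: young (under 35), middle-aged (35 to 49), older (50+)."""
    if age < 35:
        return "young"
    if age < 50:
        return "middle-aged"
    return "older"


def average_by(rows, key):
    """Pivot-table style average: group rows by key(row) and average their charges."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += row["charges"]
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical records for illustration only.
rows = [
    {"age": 20, "bmi": 25.0, "charges": 2000.0},
    {"age": 30, "bmi": 32.0, "charges": 4000.0},
    {"age": 40, "bmi": 28.0, "charges": 8000.0},
    {"age": 60, "bmi": 35.0, "charges": 14000.0},
]
by_age = average_by(rows, lambda r: age_group(r["age"]))
by_obesity = average_by(rows, lambda r: r["bmi"] >= 30)  # True means a BMI of 30 or more
print(by_age, by_obesity)
```

Passing a different `key` function (for example, the smoker value) gives the other column charts in the same way.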
Figure 3: Column chart of average charges by BMI
Note: Column chart of BMI and average insurance charges.
The figure shows that people with a BMI greater than 30 pay comparatively higher average insurance charges than those with a BMI below 30.
Insurance prices are higher for smokers because they are more prone to health problems such as lung cancer and heart disease, so they pay a premium (Does Smoking Affect Life Insurance?, n.d.).
Figure 4: Column chart of average charges for non-smokers (no) and smokers (yes)
Note: Average insurance charges for smokers and non-smokers.
The figure shows that smokers pay nearly four times as much for insurance as non-smokers.
The correlation coefficient between insurance charges and age is 0.298, and between charges and BMI it is 0.198. The correlation between the number of children and charges is only 0.06, so this variable is not used. Dummy variables were created for sex, smoker and region.
Regression Model
Using the RAND function, we shuffled the dataset above and split it into training and test sets (80% and 20%), giving 960 and 261 records, respectively. The dependent variable for the regression model is charges, and the independent variables are age, smoker and BMI. The following groups were formed:
1. Smoker = 1, BMI < 30, 18 <= Age < 35
2. Smoker = 1, BMI < 30, 35 <= Age < 50
3. Smoker = 1, BMI < 30, Age >= 50
4. Smoker = 0, BMI > 30, 18 <= Age < 35
5. Smoker = 0, BMI > 30, 35 <= Age < 50
6. Smoker = 0, BMI > 30, Age >= 50
7. Smoker = 1, BMI > 30, 18 <= Age < 35
8. Smoker = 0, BMI < 30, 18 <= Age < 35
9. Smoker = 0, BMI < 30, 35 <= Age < 50
10. Smoker = 0, BMI < 30, Age >= 50
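The grouping above and the per-group regressions described in the next paragraph can be sketched in Python. This is a simplified illustration under stated assumptions: records are hypothetical dictionaries with `age`, `bmi`, `smoker` (the 0/1 dummy) and `charges`; a seeded `random.Random` shuffle stands in for shuffling on a RAND() column; a dictionary keyed by (smoker, BMI ≥ 30, age band) replaces the nested IF/AND lookup; and each group's model uses age as its only predictor, whereas the report's ToolPak regressions may have included more variables.

```python
import random
from collections import defaultdict


def age_band(age):
    """0: 18 <= age < 35, 1: 35 <= age < 50, 2: age >= 50, as in the groups above."""
    if age < 35:
        return 0
    return 1 if age < 50 else 2


def group_key(row):
    # Assumption: a BMI of exactly 30 is placed in the high-BMI group.
    return (row["smoker"], row["bmi"] >= 30, age_band(row["age"]))


def train_test_split(rows, train_share=0.8, seed=0):
    """Shuffle a copy of the rows (like sorting on a RAND() column) and cut it 80/20."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = intercept + slope * xs."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_groups(train):
    """Fit one model per group; skip groups with fewer than two distinct ages."""
    by_group = defaultdict(list)
    for row in train:
        by_group[group_key(row)].append(row)
    models = {}
    for key, members in by_group.items():
        if len({r["age"] for r in members}) >= 2:
            models[key] = fit_line([r["age"] for r in members],
                                   [r["charges"] for r in members])
    return models


def predict(models, row):
    """Dictionary lookup in place of the nested IF/AND; None if the group has no model."""
    coeffs = models.get(group_key(row))
    if coeffs is None:
        return None
    intercept, slope = coeffs
    return intercept + slope * row["age"]
```

Typical use: `train, test = train_test_split(rows)`, then `models = fit_groups(train)` and `predict(models, row)` for each test row. Keying on a tuple means every combination gets its own model automatically, instead of one IF branch per listed group.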
Using these cases, we divided the training data into groups and ran a separate regression for each group with the Data Analysis ToolPak, then copied each group's coefficients into the training sheet. To obtain a prediction, we multiplied each coefficient by its variable and used a nested IF with the AND function to select which group's coefficients apply to each row. We then applied the same formula, built on the training data, to the test data to predict charges.
Figure 5: Scatter plot of charges and predicted charges with a linear trendline, f(x) = 0.8x + 2502.76, R² = 0.74
Note: Values of charges and predicted values with a trendline.
The trendline has a y-intercept of 2502.8, and its coefficient of determination (R²) is 0.7421. This indicates that the model's predictions explain about three-quarters of the variability in charges, leaving about 26% unexplained.
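The trendline R² above and the MSE, RMSE and MAE reported below can be reproduced with a few helper functions, sketched here in Python. `pearson_r` computes the same statistic as Excel's CORREL, `trendline` returns the slope, intercept and R² of a least-squares line (for a simple linear fit, R² equals the squared correlation), and `error_metrics` returns MSE, RMSE and MAE. Which series sits on the x-axis of Figure 5 is an assumption left to the caller; it changes the slope and intercept but not R².

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def trendline(xs, ys):
    """Slope, intercept and R² of a least-squares line, like an Excel linear trendline."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx, pearson_r(xs, ys) ** 2


def error_metrics(actual, predicted):
    """Return (MSE, RMSE, MAE) for paired actual and predicted charges."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, math.sqrt(mse), mae
```

As a consistency check, `math.sqrt(15660103)` is about 3957.28, matching the reported RMSE. Because MSE squares each error, a few large misses raise it much more than they raise MAE.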
The mean squared error (MSE) of the regression model is 15,660,103, and the root mean squared error (RMSE) is 3,957.28. Because RMSE is measured in the same units as the charges, it indicates a typical prediction error of about $3,957, which is comparatively low relative to the predicted charges. The mean absolute error (MAE) is 2,371.9; it is a better measure for checking the model's accuracy here because it is less sensitive to outliers.
Some intriguing Questions
Question 1: Is there a relation between insurance charges and the number of children?
Answer: The correlation coefficient of 0.06 shows that there is almost no relation between them.
Question 2: Is there a relation between the sex of customers and insurance charges?
Answer: The correlation is slightly negative and close to zero, so there is almost no relation between sex and insurance charges.
Conclusion
In conclusion, we used variables such as age, BMI, smoking status and region to examine their relation to the insurance price charged, and created a regression model to predict that price. This report shows how insurance charges vary with these variables. Prospective policyholders can estimate their policy price and, where possible, improve some of these factors (such as smoking or BMI) before buying insurance, while insurance companies can use the regression model to quote policy prices for a large number of people.
References
Does smoking affect life insurance? (n.d.). https://www.cooperators.ca/en/resource-centre/protect-what-matters/life-insurance-for-smokers
GoodRx. (n.d.). https://www.goodrx.com/insurance/health-insurance/health-insurance-charge-more-obesity
Walker, E. (2022). How age impacts your health insurance costs. PeopleKeep, Inc. https://www.peoplekeep.com/blog/how-age-impacts-your-health-insurance-costs
ZeroGPT