hw03-sol

School

University of California, Berkeley

Course

142

Subject

Statistics

Date

Feb 20, 2024

Assignment 3: Predicting insurance charges by age and BMI

Your name and student ID

February 05, 2024

Run this chunk of code to load the autograder package!

Instructions

• Solutions will be released by Sunday, February 2nd.
• This semester, homework assignments are for practice only and will not be turned in for marks.

Helpful hints:

• Every function you need to use was taught during lecture! You may need to revisit the lecture code by opening the relevant files on Datahub. Alternatively, you may wish to view the code in the condensed PDFs posted on the course website. Good luck!
• Knit your file early and often to minimize knitting errors! If you copy and paste code from the slides, you are bound to get an error that is hard to diagnose. Typing out the code is the way to smooth knitting! We recommend knitting your file each time you write a few sentences or add a new code chunk, so you can detect the source of knitting errors more easily. This will save you and the GSIs from frustration! You must knit correctly before submitting.
• It is good practice to not allow your code to run off the page. To avoid this, look at your knitted PDF and ensure all the code fits in the file. If it doesn’t look right, go back to your .Rmd file and add new lines using the return or enter key so that the code continues onto the next line.

library(readr)
library(dplyr)
library(ggplot2)
library(broom)
library(forcats)

Predicting insurance charges by age and BMI

Problem: Medical insurance charges can vary according to the complexity of a procedure or condition that requires medical treatment. You are tasked with determining how these charges are associated with age for patients who are smokers and have a body mass index (bmi) in the “normal” range (bmi between 16 and 25).

Plan: You have chosen to address the problem with tools for examining the relationship between two variables, in particular scatter plots and simple linear regression.

Data: You have access to the dataset insurance.csv, a claims dataset from an insurance provider.

Analysis and Conclusion: In this assignment you will perform the analysis and make a conclusion to help answer the problem statement.
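The plan above (subset to smokers with a bmi in the normal range, draw a scatter plot, then fit a simple linear regression) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the assignment's solution: `toy_data` and its values are invented so the code runs on its own, the object names `normal_smokers`, `age_plot`, and `age_model` are hypothetical, and treating the bmi endpoints 16 and 25 as inclusive is an assumption. Only the column names `age`, `bmi`, `smoker`, and `charges` come from the dataset.

```r
library(dplyr)
library(ggplot2)
library(broom)

# Hypothetical toy data, invented only so this sketch runs on its own.
# Column names match insurance.csv; the values do NOT come from it.
toy_data <- tibble(
  age     = c(19, 25, 31, 40, 47, 52, 60, 35, 44),
  bmi     = c(22.1, 24.3, 19.8, 23.5, 21.0, 24.9, 18.7, 30.2, 22.4),
  smoker  = c("yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "no"),
  charges = c(13000, 15500, 16800, 19900, 21500, 23800, 26100, 38000, 7000)
)

# Keep smokers whose bmi falls in the 16 to 25 range named in the problem.
# Including both endpoints is an assumption made for this sketch.
normal_smokers <- toy_data %>%
  filter(smoker == "yes", bmi >= 16, bmi <= 25)

# Scatter plot of charges against age for the subset.
age_plot <- ggplot(normal_smokers, aes(x = age, y = charges)) +
  geom_point() +
  labs(x = "Age", y = "Charges")

# Simple linear regression of charges on age; tidy() returns the
# intercept and slope estimates as a data frame.
age_model <- lm(charges ~ age, data = normal_smokers)
tidy(age_model)
```

With the real data, the same steps would presumably start from `insure_data` rather than `toy_data`; printing `age_plot` displays the scatter plot.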

1. [1 point] Type one line of code to import these data into R. Assign the data to insure_data. Execute the code by hitting the green arrow and ensure the dataset has been saved by looking at the environment tab and viewing the data set by clicking the table icon to the right of its name.

insure_data <- read_csv("data/insurance.csv")

## Rows: 1338 Columns: 7
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (3): sex, smoker, region
## dbl (4): age, bmi, children, charges
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

insure_data

## # A tibble: 1,338 x 7
##      age sex      bmi children smoker region    charges
##    <dbl> <chr>  <dbl>    <dbl> <chr>  <chr>       <dbl>
##  1    19 female  27.9        0 yes    southwest  16885.
##  2    18 male    33.8        1 no     southeast   1726.
##  3    28 male    33          3 no     southeast   4449.
##  4    33 male    22.7        0 no     northwest  21984.
##  5    32 male    28.9        0 no     northwest   3867.
##  6    31 female  25.7        0 no     southeast   3757.
##  7    46 female  33.4        1 no     southeast   8241.
##  8    37 female  27.7        3 no     northwest   7282.
##  9    37 male    29.8        2 no     northeast   6406.
## 10    60 female  25.8        0 no     northwest  28923.
## # i 1,328 more rows

. = ottr::check("tests/p1.R")

##
## All tests passed!
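The message printed by read_csv suggests specifying the column types to quiet it. A minimal sketch of that option follows, assuming a readr version that accepts literal data wrapped in `I()`. To stay self-contained, it reads the first three rows of the preview above retyped as inline text instead of the real file. The names `preview_csv` and `preview_data` are hypothetical, and the charges are the rounded values shown in the printout, not the full-precision values in data/insurance.csv.

```r
library(readr)

# First three rows of the printed preview, retyped as inline CSV.
# Charges are the rounded values from the tibble printout, not the
# full-precision values stored in data/insurance.csv.
preview_csv <- I(
  "age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16885
18,male,33.8,1,no,southeast,1726
28,male,33,3,no,southeast,4449"
)

# Declaring the column types up front documents the expected schema
# and suppresses the column-specification message.
preview_data <- read_csv(
  preview_csv,
  col_types = cols(
    age      = col_double(),
    sex      = col_character(),
    bmi      = col_double(),
    children = col_double(),
    smoker   = col_character(),
    region   = col_character(),
    charges  = col_double()
  )
)

preview_data
```

The same `col_types` argument could be passed when reading "data/insurance.csv" itself. The assignment asks for a single line of code, so the plain read_csv call shown in the solution remains the expected answer.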