Week6_Assignment

School

New England College

Course

CRN129

Subject

Statistics

Date

Jan 9, 2024

2023-12-10

Sections: Introduction, Prerequisites, Variation, Visualizing Distributions, Typical Values, Unusual Values

Exercises: 1, 2, 3, 4

library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from
##   +.gg   ggplot2

library(tidyverse) #calling the "tidyverse" library
## Warning: package 'tidyverse' was built under R version 4.1.3
## Warning: package 'tibble' was built under R version 4.1.3
## Warning: package 'tidyr' was built under R version 4.1.3
## Warning: package 'readr' was built under R version 4.1.3
## Warning: package 'purrr' was built under R version 4.1.3
## Warning: package 'forcats' was built under R version 4.1.3
## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v dplyr     1.1.4     v readr     2.1.4
## v forcats   1.0.0     v stringr   1.5.1
## v lubridate 1.9.3     v tibble    3.2.1
## v purrr     1.0.1     v tidyr     1.3.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

#1. Explore the distribution of each of the x, y, and z variables in diamonds. What do you learn? Think about a diamond and how you might decide which dimension is the length, width, and depth.

#solution:
diamonds %>%
  gather(key = dist, value = vals, x, y, z) %>%
  ggplot(aes(vals, colour = dist)) +
  geom_freqpoly(bins = 100)

#What is hard to see at first is that the distributions of x and y are almost the same: with `bins = 30`, the same graph does not show the x distribution at all, because the y curve overlaps it almost perfectly. The correlation between the two is `cor(diamonds$x, diamonds$y)`. If we round each measurement to the nearest millimetre, the share of diamonds whose x and y round to the same value is `mean(with(diamonds, round(x, 0) == round(y, 0)))`. So far, then, x (the length) tracks y (the width) almost one to one.

diamonds %>%
  filter(y < 30) %>%
  select(x, y, z) %>%
  ggpairs()
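The two expressions quoted in the solution to exercise 1 can be evaluated directly. The following is a minimal sketch, assuming only that ggplot2 is installed (it ships the `diamonds` data); the names `xy_cor` and `xy_same` are illustrative and not part of the original solution.

```r
library(ggplot2)  # provides the diamonds data set

# Pearson correlation between x and y
xy_cor <- cor(diamonds$x, diamonds$y)

# Share of diamonds whose x and y agree after rounding to the nearest mm
xy_same <- mean(with(diamonds, round(x, 0) == round(y, 0)))

# Show both values side by side
c(correlation = xy_cor, share_equal = xy_same)
```

Both values lie between 0 and 1; the closer they are to 1, the stronger the case that x and y measure nearly the same thing.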

#Yet the panels relating x and y to z look almost flat, as expected. This is after excluding the 2 diamonds with unreasonable y values (the `filter(y < 30)` step).

#2. Explore the distribution of price. Do you discover anything unusual or surprising? (Hint: Carefully think about the binwidth and make sure you try a wide range of values.)

#solution:
#graph <- map(seq(50, 1000, 100),
#             ~ ggplot(diamonds, aes(x = price)) +
#               geom_histogram(bins = .x) +
#               labs(x = NULL, y = NULL) +
#               scale_x_continuous(labels = NULL) +
#               scale_y_continuous(labels = NULL))
#multiplot(plotlist = graph)

#The count of diamonds falls as price rises, as expected, but there is a cut (a gap) in the distribution: prices sit either above or below a certain threshold, with few in between.

#3. How many diamonds are 0.99 carat? How many are 1 carat? What do you think is the cause of the difference?

#solution:
diamonds %>%
  filter(between(carat, 0.96, 1.05)) %>%
  group_by(carat) %>%
  summarize(count = n())
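The two carat values that exercise 3 asks about can also be pulled out on their own. The following is a minimal sketch, assuming ggplot2 and dplyr are installed; `carat_counts` is an illustrative name, and `near()` is used here instead of `==` as a precaution against floating-point comparison issues, which is a choice made for this sketch rather than something the original solution does.

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)

# Keep only diamonds of (approximately) 0.99 or 1 carat, then count each value
carat_counts <- diamonds %>%
  filter(near(carat, 0.99) | near(carat, 1)) %>%
  count(carat, name = "count")

carat_counts
```

The result has one row per carat value, which makes the two counts easy to compare side by side.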
