School: University of California, Berkeley
Course: 20
Subject: Statistics
Date: Feb 20, 2024

Stat 20 PS: Data Pipelines

The following practice problems all deal with a data set that’s near and dear to our hearts, the Palmer penguins dataset. As always, make sure you start your analysis by loading the necessary packages and data sets.

1. For each question, write R code and copy the code you used into the space provided. Unless otherwise specified, your answer to each question should consist of one dplyr data pipeline!

2. For the questions below that ask you to extract a subset of the data (marked with an asterisk), check your answer by using the new data frame you make to modify the plot made using the code provided below.

    ggplot(data = penguins,
           mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
      geom_point() +
      lims(x = c(30, 60), y = c(12, 23))

Questions 1-3

Question 1*
Extract a data frame that excludes the Adelie penguins in two different ways:
• One without using a data pipeline;
• One using a data pipeline.

Question 2*
Extract a data frame that excludes the Adelie penguins and retains penguins with bill lengths between 40 and 50 mm.

Question 3
Sort the data frame from Question 2 in decreasing order by bill length.
Questions 4-6

Question 4
Calculate the mean bill length and bill depth for each of the three species of penguins. Do those statistics line up with what you see in the penguins plot made by the original code given above?

Question 5
Consider a new metric called bill_size that’s the sum of the bill length and bill depth. What are the average bill size and its standard deviation for each species, broken out by island? You may end up with up to nine pairs of statistics. Sort your resulting data frame in decreasing order by average bill size.

Question 6
What is the total number of penguins in the data set belonging to each species-island combination? Why might you not have gotten nine pairs of statistics in the last question?

Question 7
What is the proportion of penguins of each species that have a body mass greater than 4000 grams? Hint: use a logical variable!
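The hint in Question 7 relies on the fact that R treats TRUE as 1 and FALSE as 0, so the mean of a logical variable is the proportion of TRUE values. Here is a minimal sketch of that idea in base R. The `toy` data frame and its `group` and `mass` columns are made up for illustration and are not the penguins data.

```r
# Toy data, invented for illustration only (not the penguins data set).
toy <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  mass  = c(3500, 4200, 4800, 3900, 4100)
)

# A comparison produces a logical vector: TRUE where mass exceeds 4000.
heavy <- toy$mass > 4000

# TRUE counts as 1 and FALSE as 0, so the mean is the proportion of TRUE.
overall_prop <- mean(heavy)

# The same calculation done separately within each group.
group_prop <- tapply(heavy, toy$group, mean)

print(overall_prop)
print(group_prop)
```

The same trick should carry over to a dplyr pipeline: create the logical variable first, then take its mean within each group.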