worksheet_wrangling
School: University of British Columbia
Course: DSCI100
Subject: Statistics
Date: Feb 20, 2024
Pages: 22
Uploaded by CountKuduMaster478
Worksheet 3: Cleaning and Wrangling
Data
Lecture and Tutorial Learning Goals:
After completing this week's lecture and tutorial work, you will be able to:
distinguish vectors from data frames in R, and describe how they relate to each other
define the term "tidy data"
discuss the advantages and disadvantages of storing data in a tidy data
format
recall and use the following tidyverse functions and operators for their
intended data wrangling tasks:
select
filter
|>
map
mutate
summarize
group_by
pivot_longer
separate
%in%
This worksheet covers parts of the Wrangling chapter of the online textbook. You
should read this chapter before attempting the worksheet.
```r
### Run this cell before continuing.
library(tidyverse)
library(repr)
source("tests.R")
source("cleanup.R")
options(repr.matrix.max.rows = 6)
```
Question 0.0
Multiple Choice:
{points: 1}
Which statement below is incorrect about vectors and data frames in R?
A. the columns of data frames are vectors
B. data frames can have columns of different types (e.g., a column of numeric
data, and a column of character data)
C. vectors can have elements of different types (e.g., element one can be numeric,
and element 2 can be a character)
D. data frames are a special kind of list
Assign your answer to an object called answer0.0. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer0.0 <- "C"
### END SOLUTION
```

```r
test_0.0()
```
Question 0.1
Multiple Choice:
{points: 1}
Which of the following does not
characterize a tidy dataset?
A. each row is a single observation
B. each value should not be in a single cell
C. each column is a single variable
D. each value is a single cell
Assign your answer to an object called answer0.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer0.1 <- "B"
### END SOLUTION
```

```r
test_0.1()
```
Question 0.2
Multiple Choice:
{points: 1}
For which scenario would using group_by() + summarize() be appropriate?
A. To apply the same function to every row.
B. To apply the same function to every column.
C. To apply the same function to groups of rows.
D. To apply the same function to groups of columns.
Assign your answer to an object called answer0.2. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer0.2 <- "C"
### END SOLUTION
```

```r
test_0.2()
```
Question 0.3
Multiple Choice:
{points: 1}
For which scenario would using one of purrr's map_* functions be appropriate?
A. To apply the same function to groups of rows.
B. To apply the same function to every column.
C. To apply the same function to groups of columns.
D. All of the above.
Assign your answer to an object called answer0.3. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer0.3 <- "B"
### END SOLUTION
```

```r
test_0.3()
```
1. Assessing avocado prices to inform
restaurant menu planning
It is well known that millennials LOVE avocado toast (joking... well, mostly), and so many restaurants offer menu items that centre around this delicious food! Like many food items, avocado prices fluctuate, so a restaurant that wants to maximize profits on avocado-containing dishes might ask whether there are times when avocados are less expensive to purchase. If such times exist, those are when the restaurant should put avocado-containing dishes on the menu to maximize its profits on those dishes.
Source: https://www.averiecooks.com/egg-hole-avocado-toast/
To answer this question we will analyze a data set of avocado sales from multiple US markets. This data was downloaded from the Hass Avocado Board website in May of 2018 and compiled into a single CSV. Each row in the data set contains weekly sales data for a region. The data set spans the years 2015-2018.
Some relevant columns in the dataset:
Date - The date in year-month-day format
average_price - The average price of a single avocado
type - conventional or organic
yr - The year
region - The city or region of the observation
small_hass_volume - in pounds (lbs)
large_hass_volume - in pounds (lbs)
extra_l_hass_volume - in pounds (lbs)
wk - integer number for the calendar week in the year (e.g., the first week of January is 1, and the last week of December is 52)
To answer our question of whether there are times in the year when avocados are typically less expensive (and thus we can make more profitable menu items with them at a restaurant), we will want to create a scatter plot of average_price (y-axis) versus Date (x-axis).
Question 1.1
Multiple Choice:
{points: 1}
Which of the following is not included in the csv
file?
A. Average price of a single avocado.
B. The farming practice (production with/without the use of chemicals).
C. Average price of a bag of avocados.
D. All options are included in the data set.
Assign your answer to an object called answer1.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer1.1 <- "C"
### END SOLUTION
```

```r
test_1.1()
```
Question 1.2
Multiple Choice:
{points: 1}
The rows in the data frame represent:
A. daily avocado sales data for a region
B. weekly avocado sales data for a region
C. bi-weekly avocado sales data for a region
D. yearly avocado sales data for a region
Assign your answer to an object called answer1.2. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer1.2 <- "B"
### END SOLUTION
```

```r
test_1.2()
```
Question 1.3
{points: 1}
The first step to plotting total volume against average price is to read the file
avocado_prices.csv
using the shortest relative path. The data file was given to
you along with this worksheet, but you will have to look to see where it is in the
worksheet_03
directory to correctly load it. When you do this, you should also
preview the file to help you choose an appropriate read_*
function to read the
data.
Assign your answer to an object called avocado.

```r
#... <- ...("...")
### BEGIN SOLUTION
avocado <- read_csv("data/avocado_prices.csv")
### END SOLUTION
avocado
```

```r
test_1.3()
```
Question 1.4
Multiple Choice:
{points: 1}
Why are the 2nd to 5th columns col_double
instead of col_integer
?
A. They aren't "real" numbers.
B. They contain decimals.
C. They are numbers created using text/letters.
D. They are col_integer
...
Assign your answer to an object called answer1.4. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

```r
# Make sure the correct answer is an uppercase letter.
# Surround your answer with quotation marks.
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer1.4 <- "B"
### END SOLUTION
```

```r
test_1.4()
```
Before we get started doing our analysis, let's learn about the pipe operator, |>
,
as it can be very helpful when doing data analysis in R!
Pipe Operators: |>
The pipe operator allows you to chain together different functions: it takes the output of one statement and makes it the input of the next statement. A chain of processing functions is known as a pipeline.
If we wanted to subset the avocado data to obtain just the average prices for organic avocados, we would first need to use the filter() function to keep the rows where the type is organic. Then we would need to use the select() function to get just the average price column.
Below we illustrate how to do this using the pipe operator, |>, instead of creating an intermediate object as we have in past worksheets:

Note: the indentation on the second line of the pipeline is not required, but added for readability.

```r
# run this cell
filter(avocado, type == "organic") |>
    select(average_price)
```
We can even start off a pipeline by passing the data frame into the first function. This is convenient and aids readability; you will see this used often in this course going forward. Below we show an example doing the same task we just completed above (subsetting the average price data for organic avocados).

```r
avocado |>
    filter(type == "organic") |>
    select(average_price)
```
Question 1.5
{points: 1}
To answer our question, let's now create the scatter plot where we plot average_price on the y-axis versus Date on the x-axis. Fill in the ... in the cell below. Copy and paste your finished answer in place of fail(). Assign your answer to an object called avocado_plot. Don't forget to create proper English axis labels.
```r
options(repr.plot.width = 14, repr.plot.height = 7) # Modifies the size of the plot
#... <- ... |>
#    ggplot(aes(x = ..., y = ...)) +
#    geom_...() +
#    xlab("...") +
#    ylab("...") +
#    theme(text = element_text(size = 20))
### BEGIN SOLUTION
avocado_plot <- avocado |>
    ggplot(aes(x = Date, y = average_price)) +
    geom_point() +
    xlab("Date") +
    ylab("Average Price (in US Dollars)") +
    theme(text = element_text(size = 20))
### END SOLUTION
avocado_plot
```

```r
test_1.5()
```
We might be able to squint and start to see some pattern in the data above, but what we see in the plot above is not very informative. Why? Because there is a lot of overplotting (data points sitting on top of other data points). What can we do? One solution is to reduce/aggregate the data in a meaningful way to help answer our question. Remember that we are interested in determining if there are times when avocados are less expensive, so that we can recommend when restaurants should put avocado-containing dishes on the menu to maximize their profits on those dishes.
In the data we plotted above, each row is the avocado sales for a region for a given week. Let's use group_by + summarize to calculate the average price for each week across years and regions. We can then plot that aggregated price against the week and perhaps get a clearer picture.
Question 1.6
{points: 1}
Create a reduced/aggregated version of the avocado data set and name it avocado_aggregate. To do this you will want to group_by the wk column and then use summarize to calculate the average price (name that column average_price).
```r
#... <- ... |>
#    group_by(...) |>
#    summarize(... = mean(average_price, na.rm = TRUE))
### BEGIN SOLUTION
avocado_aggregate <- avocado |>
    group_by(wk) |>
    summarize(average_price = mean(average_price, na.rm = TRUE))
### END SOLUTION
avocado_aggregate
```

```r
test_1.6()
```
Question 1.7
{points: 1}
Now let's take the avocado_aggregate data frame and use it to create a scatter plot where we plot average_price on the y-axis versus wk on the x-axis. Assign your answer to an object called avocado_aggregate_plot. Don't forget to create proper English axis labels.
```r
#... <- ... |>
#    ggplot(aes(x = ..., y = ...)) +
#    ...() +
#    ...("...") +
#    ...("...") +
#    theme(text = element_text(size = 20))
### BEGIN SOLUTION
avocado_aggregate_plot <- avocado_aggregate |>
    ggplot(aes(x = wk, y = average_price)) +
    geom_point() +
    xlab("Week") +
    ylab("Average Price (in US Dollars)") +
    theme(text = element_text(size = 20))
### END SOLUTION
avocado_aggregate_plot
```

```r
test_1.7()
```
We can now see that the prices of avocados do indeed fluctuate throughout the year. We could use this information to recommend that restaurants, if they want to maximize profit from menu items that contain avocados, should only offer them on the menu roughly between December and May.
Why might this happen? Perhaps price has something to do with supply? We can also use this data set to get some insight into that question by plotting total avocado volume (y-axis) versus week (x-axis). To do this, we will first have to create a column called total_volume whose value is the sum of the small, large and extra-large-sized avocado volumes. To do this we will have to go back to the original avocado data frame we loaded.
Question 1.8
{points: 1}
Our next step to plotting total_volume per week against week is to use mutate to create a new column in the avocado data frame called total_volume which is equal to the sum of all three volume columns.

Fill in the ... in the cell below. Copy and paste your finished answer and replace the fail().
```r
#... <- ... |>
#    mutate(... = ... + ... + ...)
### BEGIN SOLUTION
avocado <- avocado |>
    mutate(total_volume = small_hass_volume + large_hass_volume + extra_l_hass_volume)
### END SOLUTION
avocado
```

```r
test_1.8()
```
Question 1.9
{points: 1}
Now, create another reduced/aggregated version of the avocado data frame and name it avocado_aggregate_2. To do this you will want to group_by the wk column and then use summarize to calculate the average total volume (name that column total_volume).
```r
#... <- ... |>
#    group_by(...) |>
#    summarize(...)
### BEGIN SOLUTION
avocado_aggregate_2 <- avocado |>
    group_by(wk) |>
    summarize(total_volume = mean(total_volume, na.rm = TRUE))
### END SOLUTION
avocado_aggregate_2
```

```r
test_1.9()
```
Question 1.10
{points: 1}
Now let's take the avocado_aggregate_2 data frame and use it to create a scatter plot where we plot average total_volume (in pounds, lbs) on the y-axis versus wk on the x-axis. Assign your answer to an object called avocado_aggregate_plot_2. Don't forget to create proper English axis labels.

Hint: don't forget to include the units for volume in your data visualization.
```r
#... <- ... |>
#    ggplot(aes(x = ..., y = ...)) +
#    ...() +
#    ...("...") +
#    ...("...") +
#    theme(text = element_text(size = 20))
### BEGIN SOLUTION
avocado_aggregate_plot_2 <- avocado_aggregate_2 |>
    ggplot(aes(x = wk, y = total_volume)) +
    geom_point() +
    xlab("Week") +
    ylab("Average total volume (lbs)") +
    theme(text = element_text(size = 20))
### END SOLUTION
avocado_aggregate_plot_2
```

```r
test_1.10()
```
We can see from the above plot of the average total volume versus the week that there are more avocados sold (and perhaps this reflects what is available for sale) roughly between January and May. This time period of increased volume corresponds with the lower avocado prices. We can hypothesize (but not conclude, of course) that the lower prices may be due to an increased availability of avocados during this time period.
2. Sea Surface Temperatures in Departure Bay
The next data set that we will be looking at contains environmental data from
1914 to 2018. The data was collected by the DFO (Canada's Department of
Fisheries and Oceans) at the Pacific Biological Station (Departure Bay). Daily sea
surface temperature (in degrees Celsius) and salinity (in practical salinity units,
PSU) observations have been carried out at several locations on the coast of
British Columbia. The number of stations reporting at any given time has varied as
sampling has been discontinued at some stations, and started or resumed at
others.
Presently termed the British Columbia Shore Station Oceanographic Program
(BCSOP), there are 12 participating stations; most of these are staffed by Fisheries
and Oceans Canada. You can look at data from other stations at http://www.pac.dfo-mpo.gc.ca/science/oceans/data-donnees/lightstations-phares/index-eng.html
Further information from the Government of Canada's website indicates:
Observations are made daily using seawater collected in a bucket
lowered into the surface water at or near the daytime high tide. This
sampling method was designed long ago by Dr. John P. Tully and has
not been changed in the interests of a homogeneous data set. This
means, for example, that if an observer starts sampling one day at 6
a.m., and continues to sample at the daytime high tide on the
second day the sample will be taken at about 06:50 the next day,
07:40 the day after etc. When the daytime high-tide gets close to 6
p.m. the observer will then begin again to sample early in the
morning, and the cycle continues. Since there is a day/night
variation in the sea surface temperatures the daily time series will
show a signal that varies with the 14-day tidal cycle. This artifact does
not affect the monthly sea surface temperature data.
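The drifting sampling schedule described in the passage can be sketched numerically. Below is a small illustration (not part of the graded worksheet; the 6 a.m. start and the roughly 50-minute daily tidal delay are taken from the passage's own example):

```r
# Successive daytime high-tide sampling times, assuming each day's
# sample is taken about 50 minutes later than the previous day's.
# We only track the time of day here, not the calendar date.
start <- as.POSIXct("2018-06-01 06:00", tz = "UTC")
sample_times <- start + (0:3) * 50 * 60   # 50 minutes = 50 * 60 seconds
format(sample_times, "%H:%M")             # "06:00" "06:50" "07:40" "08:30"
```

This matches the quoted description: a 6 a.m. sample drifts to about 06:50 the next day and 07:40 the day after, until the daytime high tide approaches evening and the observer restarts the cycle in the early morning.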
In this worksheet, we want to see if the sea surface temperature has been
changing over time.
Question 2.1
True or False:
{points: 1}
The sampling of surface water occurs at the same time each day.
Assign your answer to an object called answer2.1. Make sure your answer is lowercase "true" or lowercase "false".

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer2.1 <- "false"
### END SOLUTION
```

```r
test_2.1()
```
Question 2.2
Multiple Choice:
{points: 1}
If high tide occurred at 9am today, what time would the scientist collect data
tomorrow?
A. 11:10 am
B. 9:50 am
C. 10:00 pm
D. Trick question... you skip days when collecting data.
Assign your answer to an object called answer2.2. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer2.2 <- "B"
### END SOLUTION
```

```r
test_2.2()
```
Question 2.3
{points: 1}
To begin working with this data, read the file departure_bay_temperature.csv
using a relative path. Note, this file (just like the avocado data set) is found within
the worksheet_03
directory.
Assign your answer to an object called sea_surface.

```r
### BEGIN SOLUTION
sea_surface <- read_csv("data/departure_bay_temperature.csv", skip = 2)
### END SOLUTION
sea_surface
```

```r
test_2.3()
```
Question 2.3.1
{points: 1}
The data above in Question 2.3 is not tidy. Which of the reasons listed below explain why?
A. There are NA's in the data set
B. The variable temperature is split across more than one column
C. Values for the variable month are stored as column names
D. A and C
E. B and C
F. All of the above
Assign your answer to an object called answer2.3.1.

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer2.3.1 <- "E"
### END SOLUTION
```

```r
test_2.3.1()
```
Question 2.4
{points: 1}
Given that ggplot expects tidy data, we need to convert our data into that format. To do this we will use the pivot_longer() function. We would like our data to end up looking like this:
Year  Month  Temperature
1914  Jan    7.2
1914  Feb    NA
1914  Mar    NA
...   ...    ...
2018  Oct    NA
2018  Nov    NA
2018  Dec    NA
Fill in the ... in the cell below. Copy and paste your finished answer and replace the fail().

Assign your answer to an object called tidy_temp.
```r
#... <- sea_surface |>
#    ...(cols = Jan:Dec,
#        names_to = "...",
#        values_to = "Temperature")
### BEGIN SOLUTION
tidy_temp <- sea_surface |>
    pivot_longer(cols = Jan:Dec,
                 names_to = "Month",
                 values_to = "Temperature")
### END SOLUTION
tidy_temp
```

```r
test_2.4()
```
Question 2.5
{points: 1}
Now that we have our data in a tidy format, we can create our plot that compares the average monthly sea surface temperatures (in degrees Celsius) to the year they were recorded. To make our plots more informative, we should plot each month separately. We can use filter to do this before we pipe our data into the ggplot function. Let's start out by just plotting the data for the month of November. As usual, use proper English to label your axes :)

Assign your answer to an object called nov_temp_plot.

Hint: don't forget to include the units for temperature in your data visualization.
```r
options(repr.plot.width = 12, repr.plot.height = 7)
#... <- ... |>
#    filter(... == ...) |>
#    ggplot(aes(x = ..., y = ...)) +
#    geom_point() +
#    xlab(...) +
#    ylab(...) +
#    theme(text = element_text(size = 20))
### BEGIN SOLUTION
nov_temp_plot <- tidy_temp |>
    filter(Month == "Nov") |>
    ggplot(aes(x = Year, y = Temperature)) +
    geom_point() +
    xlab("Year") +
    ylab("Temperature (Celsius)") +
    theme(text = element_text(size = 20))
### END SOLUTION
nov_temp_plot
```

```r
test_2.5()
```
We can see that there may be fewer cold temperatures in recent years, and/or that temperatures in recent years look less variable compared to years before 1975. What about other months? Let's plot them!
Instead of repeating the code above for the 11 other months, we'll take advantage of a ggplot2 function that we haven't met yet, facet_wrap. This function is used to create many plots side-by-side, wrapped around to new lines if there are too many plots. You tell ggplot2 how to split up the plots by specifying the argument facets = vars(...), where ... represents the variable that is used to split the plots. We will learn more about this function next week; this week we will give you the code for it.
Question 2.6
{points: 1}
Fill in the missing code below to plot the average monthly sea surface temperatures against the year they were recorded for all months. Assign your answer to an object called all_temp_plot.

Hint: don't forget to include the units for temperature in your data visualization.
```r
options(repr.plot.width = 14, repr.plot.height = 8)
#... <- ... |>
#    ggplot(aes(x = ..., y = ...)) +
#    geom_point() +
#    facet_wrap(facets = vars(factor(Month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
#                                                      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))) +
#    xlab(...) +
#    ylab(...) +
#    theme(text = element_text(size = 20))
### BEGIN SOLUTION
all_temp_plot <- tidy_temp |>
    ggplot(aes(x = Year, y = Temperature)) +
    geom_point() +
    facet_wrap(facets = vars(factor(Month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                                      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))) +
    xlab("Year") +
    ylab("Temperature (Celsius)") +
    theme(text = element_text(size = 20))
### END SOLUTION
all_temp_plot
```

```r
test_2.6()
```
We can see above that some months show a small but general increase in temperatures, whereas others don't. Some months show a change in variability and others do not. From this it is clear that if we are trying to understand temperature changes over time, we had best keep data from different months separate.
3. Pollution in Madrid
We're working with a data set from Kaggle once again! This data was collected under the instructions of Madrid's City Council and is publicly available on their website. In recent years, high levels of pollution during certain dry periods have forced the authorities to take measures against the use of cars, and have been used as justification for proposing certain regulations. This data includes daily and hourly measurements of air quality from 2001 to 2008. Pollutants are categorized based on their chemical properties.
There are a number of stations set up around Madrid, and each station's data frame contains all particle measurements that the station registered from 01/2001 to 04/2008. Not every station has the same equipment, therefore each station can measure only a certain subset of particles. The complete list of possible measurements and their explanations are given by the website:
SO_2: sulphur dioxide level measured in μg/m³. High levels can produce irritation of the skin and membranes, and worsen asthma or heart disease in sensitive groups.
CO: carbon monoxide level measured in mg/m³. Carbon monoxide poisoning causes headaches, dizziness and confusion with short exposures, and can result in loss of consciousness, arrhythmias, seizures or even death.
NO_2: nitrogen dioxide level measured in μg/m³. Long-term exposure is a cause of chronic lung disease and is harmful to vegetation.
PM10: particles smaller than 10 μm. Even though they cannot penetrate the alveolus, they can still penetrate the lungs and affect other organs. Long-term exposure can result in lung cancer and cardiovascular complications.
NOx: nitrous oxides level measured in μg/m³. These affect the human respiratory system, worsening asthma and other diseases, and are responsible for the yellowish-brown colour of photochemical smog.
O_3: ozone level measured in μg/m³. High levels can produce asthma, bronchitis or other chronic pulmonary diseases in sensitive groups or outdoor workers.
TOL: toluene (methylbenzene) level measured in μg/m³. Long-term exposure to this substance (present in tobacco smoke as well) can result in kidney complications or permanent brain damage.
BEN: benzene level measured in μg/m³. Benzene is an eye and skin irritant, and long exposures may result in several types of cancer, leukaemia and anaemia. Benzene is considered a group 1 carcinogen in humans.
EBE: ethylbenzene level measured in μg/m³. Long-term exposure can cause hearing or kidney problems, and the IARC has concluded that long-term exposure can produce cancer.
MXY: m-xylene level measured in μg/m³. Xylenes can affect not only air but also water and soil, and long exposure to high levels of xylenes can result in diseases affecting the liver, kidney and nervous system.
PXY: p-xylene level measured in μg/m³. See MXY for xylene exposure effects on health.
OXY: o-xylene level measured in μg/m³. See MXY for xylene exposure effects on health.
TCH: total hydrocarbons level measured in mg/m³. This group of substances can be responsible for different blood, immune system, liver, spleen, kidney or lung diseases.
NMHC: non-methane hydrocarbons (volatile organic compounds) level measured in mg/m³. Long exposure to some of these substances can result in damage to the liver, kidney, and central nervous system. Some of them are suspected to cause cancer in humans.
The goal of this assignment is to see if pollutants are decreasing (is air quality
improving) and also compare which pollutant has decreased the most over the
span of 5 years (2001 - 2006).
1. First do a plot of one of the pollutants (EBE).
2. Next, group it by month and year; calculate the maximum value and plot it (to
see the trend through time).
3. Now we will look at which pollutant decreased the most. Repeat the same
thing for every column - to speed up the process, use the map()
function.
First we will look at pollution in 2001 (get the maximum value for each of the
pollutants). And then do the same for 2006.
Question 3.1
Multiple Choice:
{points: 1}
What big picture question are we trying to answer?
A. Did EBE decrease in Madrid between 2001 and 2006?
B. Of all the pollutants, which decreased the most between 2001 and 2006?
C. Of all the pollutants, which decreased the least between 2001 and 2006?
D. Did EBE increase in Madrid between 2001 and 2006?
Assign your answer to an object called answer3.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

```r
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer3.1 <- "B"
### END SOLUTION
```

```r
test_3.1()
```
Question 3.2
{points: 1}
To begin working with this data, read the file madrid_pollution.csv. Note, this file (just like the avocado and sea surface data sets) is found in the worksheet_wrangling directory.

Assign your answer to an object called madrid.

```r
### BEGIN SOLUTION
madrid <- read_tsv("data/madrid_pollution.csv")
### END SOLUTION
madrid
```

```r
test_3.2()
```
Question 3.3
{points: 1}
Now that the data is loaded in R, create a scatter plot that compares ethylbenzene (EBE) values against the date they were recorded. This graph will showcase the concentration of ethylbenzene in Madrid over time. As usual, label your axes:

x = Date
y = Ethylbenzene (μg/m³)

Assign your answer to an object called EBE_pollution.
```r
options(repr.plot.width = 13, repr.plot.height = 7)
### BEGIN SOLUTION
EBE_pollution <- madrid |>
    ggplot(aes(x = date, y = EBE)) +
    geom_point(alpha = 0.15) +
    xlab("Date") +
    ylab("Ethylbenzene (μg/m³)") +
    theme(text = element_text(size = 20))
### END SOLUTION
EBE_pollution
# Are levels increasing or decreasing?
```

```r
test_3.3()
```
We can see from this plot that over time, there are fewer and fewer high (> 25 μg/m³) EBE values.
Question 3.4
{points: 1}
The question above asked you to write code visualizing all EBE recordings, which are taken every hour of every day. Consequently the graph consists of many points and appears densely plotted. In this question, we are going to clean up the graph and focus on the maximum EBE reading from each month. To further investigate whether this trend is changing over time, we will use group_by and summarize to create a new data set.

Fill in the ... in the cell below. Copy and paste your finished answer and replace the fail().

Assign your answer to an object called madrid_pollution.
```r
# ... <- ... |>
#     group_by(year, ...) |>
#     ...(max_ebe = max(EBE, na.rm = TRUE))
### BEGIN SOLUTION
madrid_pollution <- madrid |>
    group_by(year, mnth) |>
    summarize(max_ebe = max(EBE, na.rm = TRUE))
### END SOLUTION
madrid_pollution
```

```r
test_3.4()
```
Question 3.5
{points: 1}
Plot the new maximum EBE values versus the month they were recorded, split into side-by-side plots for each year. Again, we will use facetting (this time with facet_grid; more on this next week) to plot each year side-by-side. We will also use the theme function to rotate the axis labels to make them more readable (more on this is coming next week too!).

Assign your answer to an object called madrid_plot. Remember to label your axes.
```r
#... <- ... |>
#    ggplot(aes(x = ..., y = ...)) +
#    geom_point() +
#    xlab(...) +
#    ylab(...) +
#    facet_grid(~ year) +
#    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
#    theme(text = element_text(size = 20))
### BEGIN SOLUTION
madrid_plot <- madrid_pollution |>
    ggplot(aes(x = mnth, y = max_ebe)) +
    geom_point() +
    xlab("Month") +
    ylab("Max Ethylbenzene (μg/m³)") +
    facet_grid(~ year) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    theme(text = element_text(size = 20))
### END SOLUTION
madrid_plot
```

```r
test_3.5()
```
Question 3.6
{points: 1}
Now we want to see which of the pollutants has decreased the most. Therefore, we must repeat the same thing that we did in the questions above, but for every pollutant (using the original data set)! This is where purrr's map_* functions can be really helpful!

First we will look at Madrid pollution in 2001 (filter for this year). Next we have to select out the columns that should be excluded (such as the date). Lastly, use the map_df() function to compute max values for all columns.

Fill in the ... in the cell below. Copy and paste your finished answer and replace the fail().

Assign your answer to an object called pollution_2001.
# ... <- madrid |>
#   ...(year == 2001) |>
#   select(-..., -year, -mnth) |>
#   map_df(..., na.rm = TRUE)
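Before filling in the skeleton, it may help to see what map_df() does on its own. Below is a minimal sketch using a made-up toy data frame (the column names and values are invented for illustration, not taken from the madrid data): map_df() applies a function to every column and binds the results into a one-row data frame.

```r
library(purrr)

# toy data frame standing in for the pollutant columns (made-up values)
toy <- data.frame(BEN = c(1.2, 3.4, NA), TOL = c(10, 25, 7))

# apply max() to every column; extra arguments (na.rm = TRUE) are passed along
toy_max <- map_df(toy, max, na.rm = TRUE)
toy_max  # a 1-row data frame: BEN = 3.4, TOL = 25
```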
### BEGIN SOLUTION
pollution_2001 <- madrid |>
    filter(year == 2001) |>
    select(-date, -year, -mnth) |>
    map_df(max, na.rm = TRUE)
### END SOLUTION
pollution_2001
test_3.6()
Question 3.7
{points: 1}
Now repeat what you did for Question 3.6, but filter for 2006 instead.

Assign your answer to an object called pollution_2006.
### BEGIN SOLUTION
pollution_2006 <- madrid |>
    filter(year == 2006) |>
    select(-date, -year, -mnth) |>
    map_df(~ max(., na.rm = TRUE))
### END SOLUTION
pollution_2006
test_3.7()
Question 3.8
{points: 1}
Which pollutant decreased by the greatest magnitude between 2001 and 2006?

Given that the two objects you just created, pollution_2001 and pollution_2006, are data frames with the same columns, you should be able to subtract one from the other to find which pollutant decreased by the greatest magnitude between the two years.

Assign your answer to an object called answer3.8. Make sure to write the answer exactly as it is given in the data set.

Example:
answer3.8 <- "BEN"
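If it's unclear why subtracting two data frames works here, a minimal sketch with made-up one-row data frames (hypothetical values, not the real pollution data) shows that subtraction happens element-wise, column by column:

```r
# two one-row data frames with matching columns (made-up values)
before <- data.frame(BEN = 30, TOL = 90)
after  <- data.frame(BEN = 25, TOL = 40)

# subtraction is element-wise, so each column is subtracted pairwise
diffs <- after - before
diffs  # BEN = -5, TOL = -50
```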
# Replace the fail() with your answer.
### BEGIN SOLUTION
pollution_2006 - pollution_2001
answer3.8 <- "TOL"
### END SOLUTION

test_3.8()
Question 3.9
{points: 1}
Given that there were only 14 columns in the data frame above, you could use your eyes to pick out which pollutant decreased by the greatest magnitude between 2001 and 2006. But what would you do if you had 100 columns? Or 1000 columns? It would take A LONG TIME for your human eyeballs to find the biggest difference. Maybe you could use the min function:

# run this cell
pollution_2006 - pollution_2001
min(pollution_2006 - pollution_2001)
This is a step in the right direction, but you get the value and not the column name... What are we to do? Tidy our data! Our data is not in tidy format, and so it's difficult to access the values for the variable pollutant because they are stuck as column headers. Let's use pivot_longer to tidy our data and make it look like this:

pollutant   value
BEN         -33.04
CO          -6.91
...         ...
To answer this question, fill in the ... in the cell below. Copy and paste your finished answer and replace the fail().

Assign your answer to an object called pollution_diff and ensure it has the same column names as the table pictured above.
pollution_diff <- pollution_2006 - pollution_2001

# pollution_diff <- ... |>
#   pivot_longer(cols = everything(),
#                names_to = ...,
#                values_to = ...)
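As a minimal sketch of how pivot_longer reshapes a wide one-row data frame (made-up values for illustration): the column names become entries of a new pollutant column, and the cell values move into a value column.

```r
library(tidyr)

# toy wide data frame: variable names are stuck in the column headers
wide <- data.frame(BEN = -5, TOL = -50)

long <- wide |>
    pivot_longer(cols = everything(),
                 names_to = "pollutant",
                 values_to = "value")
long  # 2 rows: pollutant = "BEN", "TOL"; value = -5, -50
```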
### BEGIN SOLUTION
pollution_diff <- pollution_diff |>
    pivot_longer(cols = everything(),
                 names_to = "pollutant",
                 values_to = "value")
### END SOLUTION
pollution_diff
test_3.9()
Question 3.10
{points: 1}
Now that you have tidy data, you can use arrange and desc to order the data in descending order. Each element of the value column corresponds to an amount of decrease in a pollutant, so the largest decrease should be the most negative entry, i.e., the last row in the resulting data frame. Therefore, we can take the sorted data frame and pipe it to tail (with the argument n = 1) to return only the last row of the data frame.
(The function tail is just like head, except it returns the last rows of the data frame instead of the first rows.)
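A quick sketch of the difference, on a made-up toy data frame:

```r
# toy data frame (made-up values)
d <- data.frame(x = 1:5)

head(d, n = 2)  # first two rows: x = 1, 2
tail(d, n = 1)  # last row only:  x = 5
```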
To answer this question, fill in the ... in the cell below. Copy and paste your finished answer and replace the fail().

Assign your answer to an object called max_pollution_diff.
# ... <- ... |>
#   arrange(desc(...)) |>
#   tail(n = 1)
### BEGIN SOLUTION
max_pollution_diff <- pollution_diff |>
    arrange(desc(value)) |>
    tail(n = 1)
### END SOLUTION
max_pollution_diff
test_3.10()
At the end of this data wrangling worksheet, we'll leave you with a couple of quotes to ponder:
“Happy families are all alike; every unhappy family is unhappy in its
own way.” –– Leo Tolstoy
“Tidy datasets are all alike, but every messy dataset is messy in its
own way.” –– Hadley Wickham
source: Tidy data chapter from R for Data Science by Garrett Grolemund & Hadley
Wickham
source("cleanup.R")