tutorial_reading

School: University of British Columbia
Course: DSCI100
Subject: Statistics
Date: Feb 20, 2024

Tutorial 2: Introduction to Reading Data

Lecture and Tutorial Learning Goals:

After completing this week's lecture and tutorial work, you will be able to:

- define the following:
    - absolute file path
    - relative file path
    - url
- read data into R using a relative path and a url
- compare and contrast the following functions:
    - read_csv
    - read_tsv
    - read_csv2
    - read_delim
    - read_excel
- match the following tidyverse read_* function arguments to their descriptions:
    - file
    - delim
    - col_names
    - skip
- choose the appropriate tidyverse read_* function and function arguments to load a given plain text tabular data set into R
- use readxl library's read_excel function and arguments to load a sheet from an excel file into R
- connect to a database using the DBI library's dbConnect function
- list the tables in a database using the DBI library's dbListTables function
- create a reference to a database table that is queriable using the tbl from the dbplyr library
- retrieve data from a database query and bring it into R using the collect function from the dbplyr library
- use write_csv to save a data frame to a csv file
- optional: scrape data from the web
    - read/scrape data from an internet URL using the rvest html_nodes and html_text functions
    - compare downloading tabular data from a plain text file (e.g. *.csv) from the web versus scraping data from a .html file

Any place you see ..., you must fill in the function, variable, or data to complete the code. Replace fail() with your completed code and run the cell!

This worksheet covers parts of the Reading chapter of the online textbook. You should read this chapter before attempting the worksheet.
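To make the differences between the plain text read_* functions and the delim, col_names and skip arguments concrete, here is a minimal sketch. It is not part of the graded worksheet. The tiny tables are made up for illustration, and the sketch assumes readr 2.0 or later, where I() marks a string as literal data instead of a file path. read_excel is left out because it needs an actual Excel file.

```r
library(readr)

# The same made-up two-row table, written with four different delimiters.
# I() tells readr to treat the string as literal data, not as a file path.
comma_text     <- I("country,score\nA,0.5\nB,0.75\n")
tab_text       <- I("country\tscore\nA\t0.5\nB\t0.75\n")
semicolon_text <- I("country;score\nA;0,5\nB;0,75\n")  # comma is the decimal mark
pipe_text      <- I("country|score\nA|0.5\nB|0.75\n")

# read_csv expects commas, read_tsv expects tabs, and read_csv2 expects
# semicolons between columns with commas as decimal marks.
from_csv  <- read_csv(comma_text, show_col_types = FALSE)
from_tsv  <- read_tsv(tab_text, show_col_types = FALSE)
from_csv2 <- read_csv2(semicolon_text, show_col_types = FALSE)

# read_delim is the general form: the delimiter is given through delim.
from_delim <- read_delim(pipe_text, delim = "|", show_col_types = FALSE)

# skip drops leading lines (here a made-up metadata line), and col_names
# supplies column names when the data has no header row.
messy_text <- I("exported by a made-up tool\nA,0.5\nB,0.75\n")
from_messy <- read_csv(messy_text,
                       skip = 1,
                       col_names = c("country", "score"),
                       show_col_types = FALSE)
```

All five calls should produce a data frame with the same two columns and two rows. In this sketch only the delimiter, the decimal mark, and the presence of a header line differ between the inputs.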

In [ ]:
    ### Run this cell before continuing.
    library(tidyverse)
    library(repr)
    library(rvest)
    library(stringr)
    library(janitor)
    options(repr.matrix.max.rows = 6)
    source("tests.R")
    source("cleanup.R")

1. Happiness Report

As you might remember from worksheet_reading, we practised loading data from the Sustainable Development Solutions Network's World Happiness Report. That data was the output of their analysis that calculated each country's happiness score and how much each variable contributed to it. In this tutorial, we are going to look at the data at an earlier stage of the study: the aggregated/averaged values (per country and year) for many different social and health aspects that the researchers anticipated might contribute to happiness (Table2.1 from this Excel spreadsheet).

The goal for today is to produce a plot of 2017's positive affect scores against healthy life expectancy at birth, with healthy life expectancy at birth on the x-axis and positive affect on the y-axis. For this study, positive affect was defined as the average of three positive affect measures: happiness, laughter and enjoyment. We would also like to convert the positive affect score from a scale of 0 - 1 to a scale from 0 - 10.

1. use filter to subset the rows where the year is equal to 2017
2. use mutate to convert the "Positive affect" score from a scale of 0 - 1 to a scale from 0 - 10
3. use select to choose the "Healthy life expectancy at birth" column and the scaled "Positive affect" column
4. use ggplot to create our plot of "Healthy life expectancy at birth" (x-axis) and scaled "Positive affect" (y-axis)

Tips for success:

- Try going through all of the steps on your own, but don't forget to discuss with others (classmates, TAs, or an instructor) if you get stuck.
- If something is wrong and you can't spot the issue, be sure to read the error message carefully.
- Since there are a lot of steps involved in working with data and modifying it, feel free to look back at worksheet_reading.

Question 1.1 Multiple Choice: {points: 1}

What is the maximum value for the "Positive affect" score (in the original data file that you read into R)?

A. 100
B. 10
C. 1
D. 0.1
E. 5

Assign your answer to an object called answer1.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

In [ ]:
    # Replace the fail() with your answer.
    ### BEGIN SOLUTION
    answer1.1 <- "C"
    ### END SOLUTION

In [ ]:
    test_1.1()

Question 1.2 Multiple Choice: {points: 1}

Which column's values will be used to filter the data?

A. countries
B. generosity
C. positive affect
D. year

Assign your answer to an object called answer1.2. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").

In [ ]:
    # Replace the fail() with your answer.
    ### BEGIN SOLUTION
    answer1.2 <- "D"
    ### END SOLUTION

In [ ]:
    test_1.2()

Question 1.3.0 {points: 1}

Use the appropriate read_* function to read in the WHR2018Chapter2OnlineData (look in the tutorial_02 directory to ensure you use the correct relative path to read it in). Assign the data frame to an object called happy_df_csv.
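As an illustration of what "relative path" means in a question like this, the sketch below writes a made-up data frame to disk with write_csv and reads it back twice, once by relative path and once by absolute path. It is not the solution to the question: the data directory, the toy.csv file name, and the toy data frame are all hypothetical, and the sketch switches to a temporary working directory only so that it leaves no files behind.

```r
library(readr)

# Work inside a throwaway directory; remember the old one to restore it later.
old_wd <- setwd(tempdir())
dir.create("data", showWarnings = FALSE)

# A made-up data frame, saved to a hypothetical file data/toy.csv.
toy <- data.frame(country = c("A", "B"), score = c(0.5, 0.75))
write_csv(toy, "data/toy.csv")

# A relative path is resolved against the current working directory ...
from_relative <- read_csv("data/toy.csv", show_col_types = FALSE)

# ... while an absolute path spells out the full location from the root.
absolute_path <- file.path(getwd(), "data", "toy.csv")
from_absolute <- read_csv(absolute_path, show_col_types = FALSE)

# Restore the original working directory.
setwd(old_wd)
```

Both reads should return the same two rows. The relative version keeps working when the whole project folder is moved or shared, whereas the absolute version only works on the machine where that exact location exists.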
