IST 687 HW5.knit

School

Syracuse University

Course

687

Subject

Computer Science

Date

Dec 6, 2023

11/29/23, 8:11 PM HW5.knit file:///C:/Users/empet/OneDrive/Desktop/Intro to Data Science/Elyse_Peterson_HW5.knit.html 1/17

Intro to Data Science - HW 5

Copyright Jeffrey Stanton, Jeffrey Saltz, and Jasmina Tacheva

# Enter your name here: Elyse Peterson

Attribution statement: (choose only one and delete the rest)

# 1. I did this homework by myself, with help from the book and the professor.

This module: Data visualization is important because many people can make sense of data more easily when it is presented in graphic form. As a data scientist, you will have to present complex data to decision makers in a form that makes the data interpretable for them. From your experience with Excel and other tools, you know that there are a variety of common data visualizations (e.g., pie charts). How many of them can you name?

The most powerful tool for data visualization in R is called ggplot. Written by computer/data scientist Hadley Wickham, this “graphics grammar” tool builds visualizations in layers. This method provides immense flexibility, but takes a bit of practice to master.

Step 1: Make a copy of the data

A. Read the who dataset from this URL: https://intro-datascience.s3.us-east-2.amazonaws.com/who.csv into a new dataframe called tb.

Your new dataframe, tb, contains a so-called multivariate time series: a sequence of measurements on 23 Tuberculosis-related (TB) variables captured repeatedly over time (1980-2013). Familiarize yourself with the nature of the 23 variables by consulting the dataset’s codebook, which can be found here: https://intro-datascience.s3.us-east-2.amazonaws.com/TB_data_dictionary_2021-02-06.csv
library(readr)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

urlToRead <- "https://intro-datascience.s3.us-east-2.amazonaws.com/who.csv"
tb <- read_csv(url(urlToRead))

## Rows: 5769 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): iso2
## dbl (22): year, new_sp, new_sp_m04, new_sp_m514, new_sp_m014, new_sp_m1524, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(tb)

## spc_tbl_ [5,769 × 23] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ iso2        : chr [1:5769] "AD" "AD" "AD" "AD" ...
##  $ year        : num [1:5769] 1989 1990 1991 1992 1993 ...
##  $ new_sp      : num [1:5769] NA NA NA NA 15 24 8 17 1 4 ...
##  $ new_sp_m04  : num [1:5769] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m514 : num [1:5769] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_m014 : num [1:5769] NA NA NA NA NA NA 0 0 0 0 ...
##  $ new_sp_m1524: num [1:5769] NA NA NA NA NA NA 0 0 0 0 ...
##  $ new_sp_m2534: num [1:5769] NA NA NA NA NA NA 0 1 0 0 ...
##  $ new_sp_m3544: num [1:5769] NA NA NA NA NA NA 4 2 1 1 ...
##  $ new_sp_m4554: num [1:5769] NA NA NA NA NA NA 1 2 0 1 ...
##  $ new_sp_m5564: num [1:5769] NA NA NA NA NA NA 0 1 0 0 ...
##  $ new_sp_m65  : num [1:5769] NA NA NA NA NA NA 0 6 0 0 ...
##  $ new_sp_mu   : num [1:5769] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f04  : num [1:5769] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f514 : num [1:5769] NA NA NA NA NA NA NA NA NA NA ...
##  $ new_sp_f014 : num [1:5769] NA NA NA NA NA NA 0 0 NA 0 ...
##  $ new_sp_f1524: num [1:5769] NA NA NA NA NA NA 1 1 NA 0 ...
##  $ new_sp_f2534: num [1:5769] NA NA NA NA NA NA 1 2 NA 0 ...
##  $ new_sp_f3544: num [1:5769] NA NA NA NA NA NA 0 3 NA 1 ...
##  $ new_sp_f4554: num [1:5769] NA NA NA NA NA NA 0 0 NA 0 ...
##  $ new_sp_f5564: num [1:5769] NA NA NA NA NA NA 1 0 NA 0 ...
##  $ new_sp_f65  : num [1:5769] NA NA NA NA NA NA 0 1 NA 0 ...
##  $ new_sp_fu   : num [1:5769] NA NA NA NA NA NA NA NA NA NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   iso2 = col_character(),
##   ..   year = col_double(),
##   ..   new_sp = col_double(),
##   ..   new_sp_m04 = col_double(),
##   ..   new_sp_m514 = col_double(),
##   ..   new_sp_m014 = col_double(),
##   ..   new_sp_m1524 = col_double(),
##   ..   new_sp_m2534 = col_double(),
##   ..   new_sp_m3544 = col_double(),
##   ..   new_sp_m4554 = col_double(),
##   ..   new_sp_m5564 = col_double(),
##   ..   new_sp_m65 = col_double(),
##   ..   new_sp_mu = col_double(),
##   ..   new_sp_f04 = col_double(),
##   ..   new_sp_f514 = col_double(),
##   ..   new_sp_f014 = col_double(),
##   ..   new_sp_f1524 = col_double(),
##   ..   new_sp_f2534 = col_double(),
##   ..   new_sp_f3544 = col_double(),
##   ..   new_sp_f4554 = col_double(),
##   ..   new_sp_f5564 = col_double(),
##   ..   new_sp_f65 = col_double(),
##   ..   new_sp_fu = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
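The str() output shows that many of the columns are mostly NA, and the module text describes ggplot as building a chart in layers. Both points can be tried out with a minimal sketch on a tiny stand-in table. The rows in csv_text are made-up values shaped like the first few columns of who.csv, not actual records from the dataset, and the names csv_text, tb_demo, na_counts, and p are hypothetical. The sketch assumes readr 2.0 or later, where I() marks a string as literal data.

```r
library(readr)
library(ggplot2)

# Hypothetical stand-in rows (NOT real who.csv records), using a few of its column names
csv_text <- "iso2,year,new_sp,new_sp_m014
AD,1992,NA,NA
AD,1993,15,NA
AD,1994,24,NA
AD,1995,8,0
AD,1996,17,0"

# I() tells read_csv the string is the data itself, not a file path;
# show_col_types = FALSE quiets the column specification message
tb_demo <- read_csv(I(csv_text), show_col_types = FALSE)

# Count the missing values in each column
na_counts <- colSums(is.na(tb_demo))
print(na_counts)

# Build the plot in layers: data and aesthetics first, then one geom per layer
p <- ggplot(tb_demo, aes(x = year, y = new_sp)) +
  geom_line(na.rm = TRUE) +   # layer 1: line connecting the yearly counts
  geom_point(na.rm = TRUE) +  # layer 2: a point for each observed year
  labs(x = "Year", y = "new_sp", title = "Stand-in data for country AD")

print(p)
```

The same two steps should carry over to the full tb dataframe by replacing tb_demo with tb, although plotting every country at once would likely need a filter on iso2 first.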
