Lab04


School

Temple University

Course

1013

Subject

Computer Science

Date

Dec 6, 2023

Uploaded by samzahroun

Lab 04: Functions and Visualization

Elements of Data Science

Welcome to Lab 4! This week, we will focus on functions and visualization. Functions are described in Chapter 8 of the Inferential Thinking text, and visualization is covered in Chapter 7.

First, set up the tests and imports by running the cells below.

In [121]:
# Enter your name as a string
name = "Sam Zahroun"

In [122]:
import numpy as np
from datascience import *
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

# This line loads the tests.
from gofer.ok import check

Let's explore the most recent COVID data from the New York Times

This data is updated and stored on GitHub: https://github.com/nytimes/covid-19-data

US rolling average: https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us.csv
US states rolling average: https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us-states.csv

In [123]:
COVID_data = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us.csv'
COVID = Table.read_table(COVID_data)
COVID = COVID.set_format(0, DateFormatter(format='%Y-%m-%d'))

If the read above does not work, we can use the data-handling package pandas, first discussed in the introduction to Lab 03. To use it, remove the comment markers (#) in front of the lines below.

In [124]:
import pandas as pd
data_db = pd.read_csv(COVID_data)   # Read data with pandas
COVID = Table.from_df(data_db)      # Create a datascience Table object
COVID = COVID.set_format(0, DateFormatter(format='%Y-%m-%d'))

In [125]:
COVID.sort("date", descending=False)  # Oldest first; use descending=True to show the most recent first

Out[125]:
date       | geoid | cases | cases_avg | cases_avg_per_100k | deaths | deaths_avg | deaths_avg_per_100k
2020-01-21 | USA   | 1     | 0.14      | 0                  | 0      | 0          | 0
2020-01-22 | USA   | 0     | 0.14      | 0                  | 0      | 0          | 0
2020-01-23 | USA   | 0     | 0.14      | 0                  | 0      | 0          | 0
2020-01-24 | USA   | 1     | 0.29      | 0                  | 0      | 0          | 0
2020-01-25 | USA   | 1     | 0.43      | 0                  | 0      | 0          | 0
2020-01-26 | USA   | 2     | 0.71      | 0                  | 0      | 0          | 0
2020-01-27 | USA   | 0     | 0.71      | 0                  | 0      | 0          | 0
2020-01-28 | USA   | 0     | 0.71      | 0                  | 0      | 0          | 0
2020-01-29 | USA   | 0     | 0.71      | 0                  | 0      | 0          | 0
2020-01-30 | USA   | 1     | 0.86      | 0                  | 0      | 0          | 0
... (1148 rows omitted)

Use where to select data from November through December 2021

Here are the possible predicates for the where Table method:

Predicate                | Example                   | Result
are.equal_to             | are.equal_to(50)          | Find rows with values equal to 50
are.not_equal_to         | are.not_equal_to(50)      | Find rows with values not equal to 50
are.above                | are.above(50)             | Find rows with values above (and not equal to) 50
are.above_or_equal_to    | are.above_or_equal_to(50) | Find rows with values above or equal to 50
are.below                | are.below(50)             | Find rows with values below (and not equal to) 50
are.between              | are.between(2, 10)        | Find rows with values above or equal to 2 and below 10
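To make the boundary rules in the predicate table concrete, here is a plain-Python sketch of the comparisons involved. It does not use the datascience library: `above`, `below`, `between`, and `where` are hypothetical stand-ins, and the rows are made-up values, assuming only the semantics the table states (for `between`, the lower bound is included and the upper bound is excluded).

```python
# Hypothetical plain-Python stand-ins for datascience's are.* predicates.
# Assumption: between(low, high) keeps values with low <= value < high, as the table states.

def above(threshold):
    """Return a predicate that is True for values strictly greater than threshold."""
    return lambda value: value > threshold

def below(threshold):
    """Return a predicate that is True for values strictly less than threshold."""
    return lambda value: value < threshold

def between(low, high):
    """Return a predicate for the half-open interval [low, high)."""
    return lambda value: low <= value < high

def where(rows, column, predicate):
    """Keep only the rows (dicts) whose value in `column` satisfies `predicate`."""
    return [row for row in rows if predicate(row[column])]

# Made-up rows for illustration only; not taken from the NYT data.
rows = [
    {"day": 1, "deaths": 0},
    {"day": 2, "deaths": 1},
    {"day": 3, "deaths": 2},
]

# The upper bound is excluded, so between(0, 1) keeps only the rows with 0 deaths.
zero_deaths = where(rows, "deaths", between(0, 1))
print([row["day"] for row in zero_deaths])  # [1]
```

The same half-open rule explains the next cell: because `deaths` holds whole-number counts, `are.between(0, 1)` can only match rows where `deaths` is exactly 0.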

In [126]:
COVID.where("deaths", are.between(0, 1))

Out[126]:
date       | geoid | cases | cases_avg | cases_avg_per_100k | deaths | deaths_avg | deaths_avg_per_100k
2020-01-21 | USA   | 1     | 0.14      | 0                  | 0      | 0          | 0
2020-01-22 | USA   | 0     | 0.14      | 0                  | 0      | 0          | 0
2020-01-23 | USA   | 0     | 0.14      | 0                  | 0      | 0          | 0
2020-01-24 | USA   | 1     | 0.29      | 0                  | 0      | 0          | 0
2020-01-25 | USA   | 1     | 0.43      | 0                  | 0      | 0          | 0
2020-01-26 | USA   | 2     | 0.71      | 0                  | 0      | 0          | 0
2020-01-27 | USA   | 0     | 0.71      | 0                  | 0      | 0          | 0
2020-01-28 | USA   | 0     | 0.71      | 0                  | 0      | 0          | 0
2020-01-29 | USA   | 0     | 0.71      | 0                  | 0      | 0          | 0
2020-01-30 | USA   | 1     | 0.86      | 0                  | 0      | 0          | 0
... (41 rows omitted)

Dates produce an error, as the next cell shows. Below, we will see the steps needed to work with dates.

In [127]:
# Dates produce an error, below we will see the steps needed to work with dates
COVID.where("date", are.between("11/01/2021", "12/31/2021"))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[127], line 2
      1 # Dates produce an error, below we will see the steps needed to work with dates
----> 2 COVID.where("date", are.between("11/01/2021", "12/31/2021"))

File /opt/conda/lib/python3.10/site-packages/datascience/tables.py:1415, in
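A likely cause of the TypeError is that the values in the date column and the string bounds are not the same type, and Python refuses to order, say, a float against a string. The sketch below illustrates that failure and one general fix using only the standard `datetime` module: parse both the data's dates and the bounds into `date` objects before comparing. It assumes the dates are ISO strings such as `'2021-11-01'` (the `'%Y-%m-%d'` format set above) while the bounds are written month/day/year as in the cell; the sample dates and the stand-in float are made up, and this is not the procedure the lab goes on to use with the Table.

```python
from datetime import datetime

# Made-up dates in the same ISO format as the CSV (year-month-day).
iso_dates = ["2021-10-31", "2021-11-01", "2021-12-15", "2021-12-31", "2022-01-01"]

# Bounds written the way the cell above writes them (month/day/year), parsed into dates.
start = datetime.strptime("11/01/2021", "%m/%d/%Y").date()
end = datetime.strptime("12/31/2021", "%m/%d/%Y").date()

def parse_iso(text):
    """Convert a 'YYYY-MM-DD' string to a datetime.date."""
    return datetime.strptime(text, "%Y-%m-%d").date()

# Comparing mismatched types fails, much like the cell above:
# an arbitrary float (standing in for a converted date) cannot be ordered against a string.
stand_in_value = 1635724800.0
try:
    stand_in_value >= "11/01/2021"
except TypeError as err:
    print("TypeError:", err)

# Once both sides are date objects, ordinary comparisons work.
# Note: this keeps start <= d <= end (both ends inclusive), unlike the half-open are.between.
selected = [text for text in iso_dates if start <= parse_iso(text) <= end]
print(selected)  # ['2021-11-01', '2021-12-15', '2021-12-31']
```

Whether the Table's date column holds strings or converted numbers after `set_format` depends on the formatter, so it is worth inspecting a value (for example, `COVID.column("date").item(0)`) before deciding how to convert the bounds.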
