lab03_tables
html
keyboard_arrow_up
School
Temple University *
*We aren’t endorsed by this school
Course
1013
Subject
Computer Science
Date
Dec 6, 2023
Type
html
Pages
23
Uploaded by samzahroun
Lab 3: Tables
¶
Welcome to lab 3!
This week, we will focus on manipulating tables. We will import our data sets into tables
and complete the majority of analysis using these tables. Tables are described in
Chapter
6
of the Inferential Thinking text. A related approach in Python programming is to use
what is known as a
pandas dataframe
which we will need to resort to occasionally.
Pandas is a mainstay datascience tool.
First, set up the tests and imports by running the cell below.
In [1]:
import numpy as np
from datascience import * # Brings into Python the datascience Table object
# These lines load the tests.
from gofer.ok import check
In [2]:
# Enter your name as a string
# Example
dogname = "Fido"
# Your name
name = "Sam Zahroun"
1. Introduction
¶
For a collection of things in the world, an array is useful for describing a single attribute
of each thing. For example, among the collection of US States, an array could describe
the land area of each. Tables extend this idea by describing multiple attributes for each
element of a collection.
In most data science applications, we have data about many entities, but we also have
several kinds of data about each entity.
For example, in the cell below we have two arrays. The first one contains the world
population in each year (estimated by the US Census Bureau), and the second contains
the years themselves. These elements are in order, so the year and the world population
for that year have the same index in their corresponding arrays.
In [3]:
population_amounts = Table.read_table("world_population.csv").column("Population")
years = np.arange(1950, 2016,1)
print("Population column:", population_amounts)
print("Years column:", years)
Population column: [2557628654 2594939877 2636772306 2682053389 2730228104 2782098943
2835299673 2891349717 2948137248 3000716593 3043001508 3083966929
3140093217 3209827882 3281201306 3350425793 3420677923 3490333715
3562313822 3637159050 3712697742 3790326948 3866568653 3942096442
4016608813 4089083233 4160185010 4232084578 4304105753 4379013942
4451362735 4534410125 4614566561 4695736743 4774569391 4856462699
4940571232 5027200492 5114557167 5201440110 5288955934 5371585922
5456136278 5538268316 5618682132 5699202985 5779440593 5857972543
5935213248 6012074922 6088571383 6165219247 6242016348 6318590956
6395699509 6473044732 6551263534 6629913759 6709049780 6788214394
6866332358 6944055583 7022349283 7101027895 7178722893 7256490011]
Years column: [1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
2010 2011 2012 2013 2014 2015]
Suppose we want to answer this question:
When did world population cross 6 billion?
You could technically answer this question just from staring at the arrays, but it's a bit
convoluted, since you would have to count the position where the population first
crossed 6 billion, then find the corresponding element in the years array. In cases like
these, it might be easier to put the data into a
Table
, a 2-dimensional type of dataset.
The expression below:
•
creates an empty table using the expression
Table()
,
•
adds two columns to the table by calling
with_columns
with four arguments
(column and data for each),
•
assigns the result to the name
population
, and finally
•
evaluates
population
so that we can see the table.
The strings
"Year"
and
"Population"
are column labels that we have chosen. Ther
names
population_amounts
and
years
were assigned above to two arrays of the same
length. The function
with_columns
(you can find the documentation
here
) takes in
alternating strings (to represent column labels) and arrays (representing the data in
those columns), which are all separated by commas. Tip: Both
population_amounts
and
years
need the same number of data points or an error will be returned on attempting to
construct the table.
In [4]:
population = Table().with_columns(
"Population", population_amounts,
"Year", years
)
population
Out[4]:
Population
Year
2557628654 1950
2594939877 1951
2636772306 1952
2682053389 1953
2730228104 1954
2782098943 1955
2835299673 1956
2891349717 1957
Population
Year
2948137248 1958
3000716593 1959
... (56 rows omitted)
Now the data are all together in a single table! It's much easier to parse this data--if you
need to know what the population was in 1959, for example, you can tell from a single
glance. We'll revisit this table later.
Question 1
From the example in the cell above, identify the variables or data types for each of the
following: which variable contains the table? which variable contains an array? On the
right of the equals sign provide the correct variable name.
In [5]:
table_var = population
array_var = years
In [6]:
check('tests/q1.py')
Out[6]:
All tests passed!
2. Creating Tables
¶
Question 2
In the cell below, we've created 2 arrays. In these examples, we're going to be looking at
the Enviornmental Protection Index which describes the state of sustainability in each
country. More information can be found:
Yale EPI
. Using the steps above, assign
top_10_epi
to a table that has two columns called "Country" and "Score", which hold
top_10_epi_countries
and
top_10_epi_scores
respectively.
In [7]:
top_10_epi_scores = make_array(82.5, 82.3, 81.5, 81.3, 80., 79.6, 78.9, 78.7, 77.7, 77.2)
top_10_epi_countries = make_array(
'Denmark',
'Luxembourg',
'Switzerland',
'United Kingdom',
'France',
'Austria',
'Finland',
'Sweden',
'Norway',
'Germany'
)
top_10_epi = Table().with_columns(
"Country", top_10_epi_countries,
"Score", top_10_epi_scores
)
# We've put this next line here so your table will get printed out when you
# run this cell.
top_10_epi
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Out[7]:
Country
Score
Denmark
82.5
Luxembourg
82.3
Switzerland
81.5
United Kingdom
81.3
France
80
Austria
79.6
Finland
78.9
Sweden
78.7
Norway
77.7
Germany
77.2
In [8]:
check('tests/q2.py')
Out[8]:
All tests passed!
Loading a table from a file
¶
In most cases, we aren't going to go through the trouble of typing in all the data
manually. Instead, we can use our
Table
functions.
Table.read_table
takes one argument, a path to a data file (a string) and returns a
table. There are many formats for data files, but CSV ("comma-separated values") is the
most common.
Question 3
The file
yale_epi.csv
in the current directory contains a table of information about 180
countries with their corresponding Environmental Performance Index (EPI) based on 32
indicators of sustainability. Load it as a table called
epi
using the
Table.read_table
function.
In [9]:
epi = Table.read_table("yale_epi.csv")
epi
Out[9]:
Country
Score
Decade Change
Rank
Afghanistan
25.5
5
178
Angola
29.7
5.3
158
Country
Score
Decade Change
Rank
Albania
49
10.2
62
United Arab Emirates
55.6
11.3
42
Argentina
52.2
5
54
Armenia
52.3
4.5
53
Antigua and Barbuda 48.5
3.3
63
Australia
74.9
5.5
13
Austria
79.6
5.4
6
Azerbaijan
46.5
4
72
... (170 rows omitted)
In [10]:
check('tests/q3.py')
Out[10]:
All tests passed!
Notice the part about "... (170 rows omitted)." This table is big enough that only a few of
its rows are displayed, but the others are still there. 10 are shown, so there are 180
movies total.
Where did
yale_epi.csv
come from? Take a look at
this lab's folder
. You should see a file
called
yale_epi.csv
.
Open up the
yale_epi.csv
file in that folder and look at the format. What do you notice?
The
.csv
filename ending says that this file is in the
CSV (comma-separated value)
format
.
3. Using lists
¶
A
list
is another Python sequence type, similar to an array. It's different than an array
because the values it contains can all have different types. A single list can contain
int
values,
float
values, and strings. Elements in a list can even be other lists! A list is
created by giving a name to the list of values enclosed in square brackets and separated
by commas. For example,
values_with_different_types = ['data', 8, 8.1]
Lists can be useful when working with tables because they can describe the contents of
one row in a table, which often corresponds to a sequence of values with different types.
A list of lists can be used to describe multiple rows.
Each column in a table is a collection of values with the same type (an array). If you
create a table column from a list, it will automatically be converted to an array. A row, on
the ther hand, mixes types.
Here's a table from Chapter 5. (Run the cell below.)
In [11]:
# Run this cell to recreate the table
flowers = Table().with_columns(
'Number of petals', make_array(8, 34, 5),
'Name', make_array('lotus', 'sunflower', 'rose')
)
flowers
Out[11]:
Number of petals
Name
8
lotus
34
sunflower
5
rose
Question 4
Create a list that describes a new fourth row of this table. The details can be whatever
you want, but the list must contain two values: the number of petals (an
int
value) and
the name of the flower (a string). For example, your flower could be "pondweed"! (A
flower with zero petals)
In [12]:
my_flower = [0, "pondweed"]
my_flower
Out[12]:
[0, 'pondweed']
In [13]:
check('tests/q4.py')
Out[13]:
All tests passed!
Question 5
my_flower
fits right in to the table from chapter 5. Complete the cell below to create a
table of seven flowers that includes your flower as the fourth row followed by
other_flowers
. You can use
with_row
to create a new table with one extra row by
passing a list of values and
with_rows
to create a table with multiple extra rows by
passing a list of lists of values.
In [14]:
# Use the method .with_row(...) to create a new table that includes my_flower
four_flowers = flowers.with_row(my_flower)
# Use the method .with_rows(...) to create a table that
# includes four_flowers followed by other_flowers
other_flowers = [[10, 'lavender'], [3, 'birds of paradise'], [6, 'tulip']]
seven_flowers = four_flowers.with_rows(other_flowers)
seven_flowers
Out[14]:
Number of petals
Name
8
lotus
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Number of petals
Name
34
sunflower
5
rose
0
pondweed
10
lavender
3
birds of paradise
6
tulip
In [15]:
check('tests/q5.py')
Out[15]:
All tests passed!
4. Analyzing datasets
¶
With just a few table methods, we can answer some interesting questions about the EPI
dataset.
If we want just the scores of each country, we can get an array that contains the data in
that column:
In [16]:
epi.column("Score")
Out[16]:
array([ 25.5,
29.7,
49. ,
55.6,
52.2,
52.3,
48.5,
74.9,
79.6,
46.5,
27. ,
73.3,
30. ,
38.3,
29. ,
57. ,
51. ,
43.5,
45.4,
53. ,
41.9,
44.3,
51.2,
45.6,
54.8,
39.3,
40.4,
36.9,
71. ,
81.5,
55.3,
37.3,
25.8,
33.6,
36.4,
30.8,
52.9,
32.1,
32.8,
52.5,
48.4,
64.8,
71. ,
77.2,
28.1,
44.6,
82.5,
46.3,
44.8,
51. ,
43.3,
30.4,
74.3,
65.3,
34.4,
78.9,
34.4,
80. ,
33. ,
45.8,
81.3,
41.3,
27.6,
26.4,
27.9,
29.1,
38.1,
69.1,
43.1,
31.8,
35.9,
37.8,
63.1,
27. ,
63.7,
37.8,
27.6,
72.8,
48. ,
39.5,
72.3,
65.8,
71. ,
48.2,
53.4,
75.1,
44.7,
34.7,
39.8,
33.6,
37.7,
66.5,
53.6,
34.8,
45.4,
22.6,
43.1,
39. ,
28. ,
62.9,
82.3,
61.6,
42.3,
44.4,
26.5,
35.6,
52.6,
30.8,
55.4,
29.4,
70.7,
25.1,
46.3,
32.2,
33.9,
27.7,
45.1,
38.3,
47.9,
40.2,
30.8,
31. ,
39.2,
75.3,
77.7,
32.7,
71.3,
38.5,
33.1,
47.3,
44. ,
38.4,
32.4,
60.9,
67. ,
46.4,
37.1,
64.7,
50.5,
33.8,
44. ,
34.8,
30.7,
58.1,
26.7,
25.7,
43.1,
55.2,
37.6,
45.2,
68.3,
72. ,
78.7,
33.8,
58.2,
26.7,
29.5,
45.4,
38.2,
43.9,
35.3,
45.1,
47.5,
46.7,
42.6,
57.2,
31.1,
35.6,
49.5,
49.1,
69.3,
44.3,
48.4,
50.3,
33.4,
28.9,
37.3,
43.1,
34.7,
37. ])
The value of that expression is an array, exactly the same kind of thing you'd get if you
typed in
make_array(25.5, 29.7, 49.0, [etc])
.
Question 6
Find the EPI score of the highest-ranked country in the dataset.
Hint:
Think back to the functions you've learned about for working with arrays of
numbers. Ask for help if you can't remember one that's useful for this.
In [17]:
highest_rating = max(epi.column("Score"))
highest_rating
Out[17]:
82.5
In [18]:
check('tests/q6.py')
Out[18]:
All tests passed!
That's not very useful, though. You'd probably want to know the
name
of the country
whose score you found! To do that, we can sort the entire table by EPI Score, which
ensures that the scores and country will stay together. Note that calling sort creates a
copy of the table and leaves the original table unsorted.
In [19]:
epi.sort("Score")
Out[19]:
Country
Score
Decade Change
Rank
Liberia
22.6
-3.7
180
Myanmar
25.1
-1.2
179
Afghanistan
25.5
5
178
Sierra Leone
25.7
0.7
177
Cote d'Ivoire
25.8
-8.5
176
Guinea
26.4
-4.2
175
Madagascar
26.5
-6.6
174
Solomon Islands
26.7
-2
172
Chad
26.7
-0.9
172
Burundi
27
-11.1
170
... (170 rows omitted)
Well, that actually doesn't help much, either -- we sorted the countries from lowest ->
highest scores. To look at the highest-ranked countries, sort in reverse order:
In [20]:
epi.sort("Score", descending=True)
Out[20]:
Country
Score
Decade Change
Rank
Denmark
82.5
7.3
1
Luxembourg
82.3
11.6
2
Switzerland
81.5
8.6
3
United Kingdom
81.3
9
4
France
80
5.8
5
Austria
79.6
5.4
6
Finland
78.9
6
7
Sweden
78.7
5.3
8
Norway
77.7
7.6
9
Germany
77.2
1.2
10
... (170 rows omitted)
(The
descending=True
bit is called an
optional argument
. It has a default value of
False
,
so when you explicitly tell the function
descending=True
, then the function will sort in
descending order.)
So the country with the highest Environmental Protection Index is Denmark with 82.5.
Some details about sort:
1. The first argument to
sort
is the name of a column to sort by.
2. If the column has strings in it,
sort
will sort alphabetically; if the column has
numbers, it will sort numerically.
3. The value of
epi.sort("Score")
is a
copy of
epi
; the
epi
table doesn't get
modified. For example, if we called
epi.sort("Score")
, then running
epi
by itself
would still return the unsorted table.
4. Rows always stick together when a table is sorted. It wouldn't make sense to sort
just one column and leave the other columns alone. For example, in this case, if we
sorted just the "Score" column, the countries would all end up with the wrong
scores.
Question 7
We also have information about the changes in sustainability scores from 2010 to 2020.
Create a version of
epi
that's sorted by change, with the largest, positive changes first.
Call it
epi_changes
.
In [21]:
epi_changes = epi.sort("Decade Change", descending=True)
epi_changes
Out[21]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Country
Score
Decade Change
Rank
Bahrain
51
17.3
56
Seychelles
58.2
14.8
38
Croatia
63.1
13.4
34
Morocco
42.3
13.3
100
Kuwait
53.6
12.8
47
Luxembourg
82.3
11.6
2
Malta
70.7
11.6
23
United Arab Emirates
55.6
11.3
42
Jordan
53.4
11.2
48
Oman
38.5
11
110
... (170 rows omitted)
In [22]:
check('tests/q7.py')
Out[22]:
All tests passed!
Question 8
What's the name of the country with the largest, positive change in the dataset? You
could just look this up from the output of the previous cell. Instead, write Python code to
find out.
Hint:
Starting with
epi_changes
, extract the country column to get an array, then use
item
to get its first item.
In [23]:
largest_positive_change = epi_changes.column("Country").item(0)
largest_positive_change
Out[23]:
'Bahrain'
In [24]:
check('tests/q8.py')
Out[24]:
All tests passed!
5. Finding pieces of a dataset
¶
Let's take a look at another dataset. In the cell below, we're reading in a movie dataset
which contains columns for Votes on imdb, imdb Rating, Movie Title, the year released,
and the decade the movie was released.
In [25]:
imdb = Table.read_table('imdb.csv')
imdb
Out[25]:
Votes
Rating
Title
Year Decade
88355
8.4
M
1931 1930
132823 8.3
Singin' in the Rain
1952 1950
74178
8.3
All About Eve
1950 1950
635139 8.6
Léon
1994 1990
145514 8.2
The Elephant Man
1980 1980
425461 8.3
Full Metal Jacket
1987 1980
441174 8.1
Gone Girl
2014 2010
850601 8.3
Batman Begins
2005 2000
37664
8.2
Judgment at Nuremberg
1961 1960
46987
8
Relatos salvajes
2014 2010
... (240 rows omitted)
Suppose you're interested in movies from the 1940s. Sorting the table by year doesn't
help you, because the 1940s are in the middle of the dataset.
Instead, we use the table method
where
.
In [26]:
forties = imdb.where('Decade', are.equal_to(1940))
forties
Out[26]:
Votes
Rating
Title
Year Decade
55793
8.1
The Grapes of Wrath
1940 1940
86715
8.3
Double Indemnity
1944 1940
101754 8.1
The Maltese Falcon
1941 1940
71003
8.3
The Treasure of the Sierra Madre
1948 1940
35983
8.1
The Best Years of Our Lives
1946 1940
81887
8.3
Ladri di biciclette
1948 1940
66622
8
Notorious
1946 1940
Votes
Rating
Title
Year Decade
350551 8.5
Casablanca
1942 1940
59578
8
The Big Sleep
1946 1940
78216
8.2
Rebecca
1940 1940
... (4 rows omitted)
Ignore the syntax for the moment. Instead, try to read that line like this:
Assign the name
forties
to a table whose rows are the rows in the
imdb
table
where
the
'Decade'
s
are
equal
to
1940
.
Question 9
Compute the average rating of movies from the 1980s.
Hint:
The function
np.average
computes the average of an array of numbers.
In [27]:
eighties = imdb.where('Decade', are.equal_to(1980))
average_rating_in_eighties = np.average(eighties.column("Rating"))
average_rating_in_eighties
Out[27]:
8.241935483870968
In [28]:
check('tests/q9.py')
Out[28]:
All tests passed!
Now let's dive into the details a bit more.
where
takes 2 arguments:
1. The name of a column.
where
finds rows where that column's values meet some
criterion.
2. Something that describes the criterion that the column needs to meet, called a
predicate.
To create our predicate, we called the function
are.equal_to
with the value we wanted,
1980. We'll see other predicates soon.
where
returns a table that's a copy of the original table, but with only the rows that meet
the given predicate.
Question 10
Create a table called
ninety_nine
containing the movies that came out in the year
1999. Use
where
.
In [29]:
ninety_nine = imdb.where('Year', are.equal_to(1999))
ninety_nine
Out[29]:
Votes
Rating
Title
Year Decade
1177098 8.8
Fight Club
1999 1990
735056
8.4
American Beauty
1999 1990
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Votes
Rating
Title
Year Decade
630994
8.1
The Sixth Sense
1999 1990
1073043 8.7
The Matrix
1999 1990
672878
8.5
The Green Mile
1999 1990
In [30]:
check('tests/q10.py')
Out[30]:
All tests passed!
So far we've only been finding where a column is
exactly
equal to a certain value.
However, there are many other predicates. Here are a few:
Predicate
Example
Result
are.equal_to
are.equal_to(50)
Find rows with values equal to 50
are.not_equal_to
are.not_equal_to(50)
Find rows with values not equal to 50
are.above
are.above(50)
Find rows with values above (and not
equal to) 50
are.above_or_equal
_to
are.above_or_equal_to
(50)
Find rows with values above 50 or
equal to 50
are.below
are.below(50)
Find rows with values below 50
are.between
are.between(2, 10)
Find rows with values above or equal to
2 and below 10
The textbook section on selecting rows has more examples.
Question 11
Using
where
and one of the predicates from the table above, find all the movies with a
rating higher than 8.8. Put their data in a table called
really_highly_rated
.
In [31]:
really_highly_rated = imdb.where('Rating', are.above(8.8))
really_highly_rated
Out[31]:
Votes
Rating
Title
Year Decade
1027398 9.2
The Godfather
1972 1970
1498733 9.2
The Shawshank Redemption
1994 1990
447875
8.9
Il buono, il brutto, il cattivo (1966)
1966 1960
Votes
Rating
Title
Year Decade
1473049 8.9
The Dark Knight
2008 2000
692753
9
The Godfather: Part II
1974 1970
384187
8.9
12 Angry Men
1957 1950
1074146 8.9
The Lord of the Rings: The Return of the
King
2003 2000
761224
8.9
Schindler's List
1993 1990
1166532 8.9
Pulp Fiction
1994 1990
In [32]:
check('tests/q11.py')
Out[32]:
All tests passed!
Question 12
Find the average rating for movies released in the 20th century and the average rating
for movies released in the 21st century for the movies in
imdb
.
Hint
: Think of the steps you need to do (take the average, find the ratings, find movies
released in 20th/21st centuries), and try to put them in an order that makes sense.
In [33]:
twentieth = imdb.where('Year', are.between(1901,2000))
twentysecond = imdb.where('Year', are.above_or_equal_to(2000))
average_20th_century_rating = np.average(twentieth.column("Rating"))
average_21st_century_rating = np.average(twentysecond.column("Rating"))
print("Average 20th century rating:", average_20th_century_rating)
print("Average 21st century rating:", average_21st_century_rating)
Average 20th century rating: 8.2783625731
Average 21st century rating: 8.23797468354
In [34]:
check('tests/q12.py')
Out[34]:
All tests passed!
The property
num_rows
tells you how many rows are in a table. (A "property" is just a
method that doesn't need to be called by adding parentheses.)
In [35]:
num_movies_in_dataset = imdb.num_rows
num_movies_in_dataset
Out[35]:
250
Question 13
Use
num_rows
(and arithmetic) to find the
proportion
of movies in the dataset that were
released in the 20th century, and the proportion from the 21st century.
Hint:
The
proportion
of movies released in the 20th century is the
number
of movies
released in the 20th century, divided by the
total number
of movies.
In [36]:
proportion_in_20th_century = (twentieth.num_rows)/(imdb.num_rows)
proportion_in_21st_century = (twentysecond.num_rows)/(imdb.num_rows)
print("Proportion in 20th century:", proportion_in_20th_century)
print("Proportion in 21st century:", proportion_in_21st_century)
Proportion in 20th century: 0.684
Proportion in 21st century: 0.316
In [37]:
check('tests/q13.py')
Out[37]:
All tests passed!
Question 14
Here's a challenge: Find the number of movies that came out in
odd
years.
Hint:
The operator
%
computes the remainder when dividing by a number. So
5 % 2
is 1
and
6 % 2
is 0. A number is odd if the remainder is 1 when you divide by 2.
Hint 2:
%
can be used on arrays, operating elementwise like
+
or
*
. So
make_array(5, 6,
7) % 2
is
array([1, 0, 1])
.
Hint 3:
Create a column called "Year Remainder" that's the remainder when each movie's
release year is divided by 2. Make a copy of
imdb
that includes that column. (You may
need to add more variables.) Then use
where
to find rows where that new column is
equal to 1. Then use
num_rows
to count the number of such rows.
In [38]:
year_remainder = imdb.column("Year")%2
imdbcopy = Table().with_columns(
"Odd Years", year_remainder,
"Title", imdb.column("Title")
)
variable = imdbcopy.where("Odd Years", are.equal_to(1))
num_odd_year_movies = variable.num_rows
num_odd_year_movies
Out[38]:
123
In [39]:
check('tests/q14.py')
Out[39]:
All tests passed!
6. Miscellanea
¶
There are a few more table methods you'll need to fill out your toolbox. The first 3 have
to do with manipulating the columns in a table.
The table
farmers_markets.csv
contains data on farmers' markets in the United States
(data collected
by the USDA
. Each row represents one such market.
Question 15
Load the dataset into a table. Call it
farmers_markets
.
In [40]:
farmers_markets = Table.read_table("farmers_markets.csv")
farmers_markets
Out[40]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
FMID
MarketNa
me
Website
Facebook
Twitter
Youtu
be
O
10120
63
Caledonia
Farmers
Market
Associatio
n -
Danville
https://sites.google.com/s
ite/caledoniafarmersmark
et/
https://www.faceboo
k.com/Danville.VT.Fa
rmers.Market/
nan
nan
nan
10118
71
Stearns
Homestea
d Farmers'
Market
http://Stearnshomestead.
com
nan
nan
nan
nan
10118
78
100 Mile
Market
http://www.pfcmarkets.co
m
https://www.faceboo
k.com/100MileMarke
t/?fref=ts
nan
nan
https:/
.com/1
10093
64
106 S.
Main
Street
Farmers
Market
http://thetownofsixmile.w
ordpress.com/
nan
nan
nan
nan
10106
91
10th Steet
Communit
y Farmers
Market
nan
nan
nan
nan
http://
m/mo-
grown
type=
10024
54
112st
Madison
Avenue
nan
nan
nan
nan
nan
10111
00
12 South
Farmers
Market
http://www.12southfarme
rsmarket.com
12_South_Farmers_M
arket
@12southfrm
smkt
nan
@12so
10098
45
125th
Street
Fresh
Connect
Farmers'
Market
http://www.125thStreetFa
rmersMarket.com
https://www.faceboo
k.com/125thStreetFa
rmersMarket
https://twitter.
com/FarmMar
ket125th
nan
Instag
125thS
arket
FMID
MarketNa
me
Website
Facebook
Twitter
Youtu
be
O
10055
86
12th &
Brandywin
e Urban
Farm
Market
nan
https://www.faceboo
k.com/pages/12th-
Brandywine-Urban-
Far ...
nan
nan
https:/
com/d
mcoal
10080
71
14&U
Farmers'
Market
nan
https://www.faceboo
k.com/14UFarmersM
arket
https://twitter.
com/14UFarm
ersMkt
nan
nan
... (8536 rows omitted)
In [41]:
check('tests/q15.py')
Out[41]:
All tests passed!
You'll notice that it has a large number of columns in it!
num_columns
¶
Question 16
The table property
num_columns
(example call:
tbl.num_columns
) produces the number
of columns in a table. Use it to find the number of columns in our farmers' markets
dataset.
In [42]:
num_farmers_markets_columns = farmers_markets.num_columns
print("The table has", num_farmers_markets_columns, "columns in it!")
The table has 59 columns in it!
In [43]:
check('tests/q16.py')
Out[43]:
All tests passed!
Most of the columns are about particular products -- whether the market sells tofu, pet
food, etc. If we're not interested in that stuff, it just makes the table difficult to read. This
comes up more than you might think.
select
¶
In such situations, we can use the table method
select
to pare down the columns of a
table. It takes any number of arguments. Each should be the name or index of a column
in the table. It returns a new table with only those columns in it.
For example, the value of
imdb.select("Year", "Decade")
is a table with only the
years and decades of each movie in
imdb
.
Question 17
Use
select
to create a table with only the name, city, state, latitude ('y'), and longitude
('x') of each market. Call that new table
farmers_markets_locations
.
In [44]:
farmers_markets_locations = farmers_markets.select('MarketName', 'city', 'State', 'y',
'x')
farmers_markets_locations
Out[44]:
MarketName
city
State
y
x
Caledonia Farmers Market Association -
Danville
Danville
Vermont
44.411
-72.1403
Stearns Homestead Farmers' Market
Parma
Ohio
41.3751 -81.7286
100 Mile Market
Kalamazoo Michigan
42.296
-85.5749
106 S. Main Street Farmers Market
Six Mile
South Carolina
34.8042 -82.8187
10th Steet Community Farmers Market Lamar
Missouri
37.4956 -94.2746
112st Madison Avenue
New York
New York
40.7939 -73.9493
12 South Farmers Market
Nashville
Tennessee
36.1184 -86.7907
125th Street Fresh Connect Farmers'
Market
New York
New York
40.809
-73.9482
12th & Brandywine Urban Farm Market Wilmington
Delaware
39.7421 -75.5345
14&U Farmers' Market
Washington
District of
Columbia
38.917
-77.0321
... (8536 rows omitted)
In [45]:
check('tests/q17.py')
Out[45]:
All tests passed!
select
is not
column
!
¶
The method
select
is
definitely not
the same as the method
column
.
farmers_markets.column('y')
is an
array
of the latitudes of all the markets.
farmers_markets.select('y')
is a
table
that happens to contain only 1 column, the
latitudes of all the markets.
Question 18
Below, we tried using the function
np.average
to find the average latitude ('y') and
average longitude ('x') of the farmers' markets in the table, but we screwed something
up. Run the cell to see the (somewhat inscrutable) error message that results from
calling
np.average
on a table. Then, fix our code.
In [46]:
average_latitude = np.average(farmers_markets.column('y'))
average_longitude = np.average(farmers_markets.column('x'))
print("The average of US farmers' markets' coordinates is located at (", average_latitude,
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
",", average_longitude, ")")
The average of US farmers' markets' coordinates is located at ( 39.1864645235 ,
-90.9925808129 )
In [47]:
check('tests/q18.py')
Out[47]:
All tests passed!
drop
¶
drop
serves the same purpose as
select
, but it takes away the columns you list instead
of the ones you don't list, leaving all the rest of the columns.
Question 19
Suppose you just didn't want the "Website" or "Location" columns in
farmers_markets
.
Create a table that's a copy of
farmers_markets
but doesn't include those columns. Call
that table
farmers_markets_without_website
.
In [48]:
farmers_markets_without_website = farmers_markets.drop("Website", "Location")
farmers_markets_without_website
Out[48]:
FMID
MarketNa
me
Facebook
Twitter
Youtu
be
OtherMedia
101206
3
Caledonia
Farmers
Market
Association
- Danville
https://www.facebook.
com/Danville.VT.Farme
rs.Market/
nan
nan
nan
na
101187
1
Stearns
Homestead
Farmers'
Market
nan
nan
nan
nan
69
Ro
101187
8
100 Mile
Market
https://www.facebook.
com/100MileMarket/?
fref=ts
nan
nan
https://www.instagram.c
om/100milemarket/
50
Ha
100936
4
106 S. Main
Street
Farmers
Market
nan
nan
nan
nan
10
Ma
101069
1
10th Steet
Community
Farmers
Market
nan
nan
nan
http://agrimissouri.com/
mo-
grown/grodetail.php?
type=mo-g ...
10
an
100245
4
112st
Madison
nan
nan
nan
nan
11
Ma
FMID
MarketNa
me
Facebook
Twitter
Youtu
be
OtherMedia
Avenue
Av
101110
0
12 South
Farmers
Market
12_South_Farmers_Ma
rket
@12southfrms
mkt
nan
@12southfrmsmkt
30
Gr
Wh
100984
5
125th
Street Fresh
Connect
Farmers'
Market
https://www.facebook.
com/125thStreetFarm
ersMarket
https://twitter.c
om/FarmMarke
t125th
nan
Instagram-->
125thStreetFarmersMar
ket
16
12
Str
Ad
Cla
Po
Blv
100558
6
12th &
Brandywine
Urban Farm
Market
https://www.facebook.
com/pages/12th-
Brandywine-Urban-Far
...
nan
nan
https://www.facebook.co
m/delawareurbanfarmco
alition
12
Bra
e S
100807
1
14&U
Farmers'
Market
https://www.facebook.
com/14UFarmersMark
et
https://twitter.c
om/14UFarmer
sMkt
nan
nan
14
Str
... (8536 rows omitted)
In [49]:
check('tests/q19.py')
Out[49]:
All tests passed!
take
¶
Let's find the 5 easternmost farmers' markets in the US. You already know how to sort by
longitude ('x'), but we haven't seen how to get the first 5 rows of a table. That's what
take
is for.
The table method
take
takes as its argument an array of numbers. Each number should
be the index of a row in the table. It returns a new table with only those rows.
Most often you'll want to use
take
in conjunction with
np.arange
to take the first few
rows of a table.
Question 20
Make a table of the 5 easternmost farmers' markets in
farmers_markets_locations
.
Call it
eastern_markets
. (It should include the same columns as
farmers_markets_locations
.
In [55]:
eastern_markets = farmers_markets_locations.sort("x",
descending=True).take(np.arange(0,5,1))
eastern_markets
Out[55]:
MarketName
city
State
y
x
Christian "Shan" Hendricks Vegetable
Market
Saint Croix
Virgin
Islands
17.7449 -64.7043
La Reine Farmers Market
Saint Croix
Virgin
Islands
17.7322 -64.7789
Anne Heyliger Vegetable Market
Saint Croix
Virgin
Islands
17.7099 -64.8799
Rothschild Francis Vegetable Market
St. Thomas
Virgin
Islands
18.3428 -64.9326
Feria Agrícola de Luquillo
Luquillo
Puerto Rico
18.3782 -65.7207
7. Summary
¶
For your reference, here's a table of all the functions and methods we saw in this lab.
Name
Example
Purpose
Table
Table()
Create an empty table, usually
to extend with data
Table.read_ta
ble
Table.read_table("my_data.csv")
Create a table from a data file
with_columns
tbl = Table().with_columns("N",
np.arange(5), "2*N",
np.arange(0, 10, 2))
Create a copy of a table with
more columns
column
tbl.column("N")
Create an array containing the
elements of a column
sort
tbl.sort("N")
Create a copy of a table sorted
by the values in a column
where
tbl.where("N", are.above(2))
Create a copy of a table with
only the rows that match some
predicate
num_rows
tbl.num_rows
Compute the number of rows in
a table
num_columns
tbl.num_columns
Compute the number of columns
in a table
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Name
Example
Purpose
select
tbl.select("N")
Create a copy of a table with
only some of the columns
drop
tbl.drop("2*N")
Create a copy of a table without
some of the columns
take
tbl.take(np.arange(0, 6, 2))
Create a copy of the table with
only the rows whose indices are
in the given array
Congratulations, you're done with lab 3! Be sure to
•
run all the tests and verify that they all pass
(the next cell has a shortcut for
that),
•
Save and Export as, HTML
from the
File
menu,
•
Right click on file name to download noteook
•
Upload and Submit your files to Canvas
under the corresponding assignment .
In [56]:
# For your convenience, you can run this cell to run all the tests at once!
import glob
from gofer.ok import check
correct = 0
for x in range(1, 21):
print('Testing question {}: '.format(str(x)))
g = check('tests/q{}.py'.format(str(x)))
if g.grade == 1.0:
print("Passed")
correct += 1
else:
print('Failed')
display(g)
print('Grade:
{}'.format(str(correct/20)))
Testing question 1:
Passed
Testing question 2:
Passed
Testing question 3:
Passed
Testing question 4:
Passed
Testing question 5:
Passed
Testing question 6:
Passed
Testing question 7:
Passed
Testing question 8:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Passed
Testing question 9:
Passed
Testing question 10:
Passed
Testing question 11:
Passed
Testing question 12:
Passed
Testing question 13:
Passed
Testing question 14:
Passed
Testing question 15:
Passed
Testing question 16:
Passed
Testing question 17:
Passed
Testing question 18:
Passed
Testing question 19:
Passed
Testing question 20:
Passed
Grade:
1.0
In [57]:
print("Nice work ",name)
import time;
localtime = time.asctime( time.localtime(time.time()) )
print("Submitted @ ", localtime)
Nice work
Sam Zahroun
Submitted @
Sat Sep 16 23:50:39 2023
In [ ]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Related Documents
Recommended textbooks for you
Programming Logic & Design Comprehensive
Computer Science
ISBN:9781337669405
Author:FARRELL
Publisher:Cengage
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage
COMPREHENSIVE MICROSOFT OFFICE 365 EXCE
Computer Science
ISBN:9780357392676
Author:FREUND, Steven
Publisher:CENGAGE L

Systems Architecture
Computer Science
ISBN:9781305080195
Author:Stephen D. Burd
Publisher:Cengage Learning

C++ Programming: From Problem Analysis to Program...
Computer Science
ISBN:9781337102087
Author:D. S. Malik
Publisher:Cengage Learning

C++ for Engineers and Scientists
Computer Science
ISBN:9781133187844
Author:Bronson, Gary J.
Publisher:Course Technology Ptr
Recommended textbooks for you
- Programming Logic & Design ComprehensiveComputer ScienceISBN:9781337669405Author:FARRELLPublisher:CengageNp Ms Office 365/Excel 2016 I NtermedComputer ScienceISBN:9781337508841Author:CareyPublisher:CengageCOMPREHENSIVE MICROSOFT OFFICE 365 EXCEComputer ScienceISBN:9780357392676Author:FREUND, StevenPublisher:CENGAGE L
- Systems ArchitectureComputer ScienceISBN:9781305080195Author:Stephen D. BurdPublisher:Cengage LearningC++ Programming: From Problem Analysis to Program...Computer ScienceISBN:9781337102087Author:D. S. MalikPublisher:Cengage LearningC++ for Engineers and ScientistsComputer ScienceISBN:9781133187844Author:Bronson, Gary J.Publisher:Course Technology Ptr
Programming Logic & Design Comprehensive
Computer Science
ISBN:9781337669405
Author:FARRELL
Publisher:Cengage
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage
COMPREHENSIVE MICROSOFT OFFICE 365 EXCE
Computer Science
ISBN:9780357392676
Author:FREUND, Steven
Publisher:CENGAGE L

Systems Architecture
Computer Science
ISBN:9781305080195
Author:Stephen D. Burd
Publisher:Cengage Learning

C++ Programming: From Problem Analysis to Program...
Computer Science
ISBN:9781337102087
Author:D. S. Malik
Publisher:Cengage Learning

C++ for Engineers and Scientists
Computer Science
ISBN:9781133187844
Author:Bronson, Gary J.
Publisher:Course Technology Ptr