Lab02
School
University of North Texas *
*We aren’t endorsed by this school
Course
5502
Subject
Mathematics
Date
Feb 20, 2024
Type
Pages
29
Uploaded by venkatasai1999
9/5/23, 11:20 PM
Lab02
localhost:8946/nbconvert/html/Downloads/Lab02.ipynb?download=false
1/29
Lab 2: Table Operations and Arrays
Welcome to Lab 2! This week, we'll learn how to import a module and practice table operations.
We'll also see how to work with arrays of data, such as all the numbers between 0 and 100 or all
the words in the chapter of a book. Lastly, we'll create tables and practice analyzing them with
our knowledge of table operations.
First, set up the imports by running the cell below.
1. Review: The building blocks of Python code
The two building blocks of Python code are expressions and statements. An expression is a
piece of code that is self-contained, meaning it would make sense to write it on a line by itself,
and usually evaluates to a value.
Here are two expressions that both evaluate to 3:
3
5 - 2
One important type of expression is the call expression. A call expression begins with the name
of a function and is followed by the argument(s) of that function in parentheses. The function
returns some value, based on its arguments. Some important mathematical functions are listed
below.
Function | Description
abs | Returns the absolute value of its argument
max | Returns the maximum of all its arguments
min | Returns the minimum of all its arguments
pow | Raises its first argument to the power of its second argument
round | Rounds its argument to the nearest integer
Here are two call expressions that both evaluate to 3:
In [5]:
# Just run this cell
import numpy as np
from datascience import *
abs(2 - 5)
max(round(2.8), min(pow(2, 10), -1 * pow(2, 10)))
The expression 2 - 5 and the two call expressions given above are examples of compound
expressions, meaning that they are actually combinations of several smaller expressions. 2 - 5
combines the expressions 2 and 5 by subtraction. In this case, 2 and 5 are called
subexpressions because they're expressions that are part of a larger expression.
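One way to see how a compound expression breaks into subexpressions is to evaluate the pieces one at a time. The sketch below does this for the second call expression above; the intermediate names (power, smaller, rounded, result) are made up for illustration and are not part of the lab.

```python
# Evaluate max(round(2.8), min(pow(2, 10), -1 * pow(2, 10))) one piece at a time.
power = pow(2, 10)                 # 2 raised to the 10th power
smaller = min(power, -1 * power)   # the smaller of the power and its negation
rounded = round(2.8)               # 2.8 rounded to the nearest integer
result = max(rounded, smaller)     # the larger of the two subexpression values

print(power, smaller, rounded, result)
```

Each named value is itself the value of a subexpression, so the last line computes the same thing as the original one-line call expression.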
A statement is a whole line of code. Some statements are just expressions. The expressions
listed above are examples.
Other statements make something happen rather than having a value. For example, an
assignment statement assigns a value to a name.
A good way to think about this is that we're evaluating the right-hand side of the equals sign
and assigning it to the left-hand side. Here are some assignment statements:
height = 1.3
the_number_five = abs(-5)
absolute_height_difference = abs(height - 1.688)
An important idea in programming is that large, interesting things can be built by combining
many simple, uninteresting things. The key to understanding a complicated piece of code is
breaking it down into its simple components.
For example, a lot is going on in the last statement above, but it's really just a combination of a
few things. This picture describes what's going on.
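The same breakdown can be sketched in code by giving the inner subexpression its own name. This is only an illustration: difference is a hypothetical intermediate name that the lab itself does not use.

```python
height = 1.3

# Step 1: evaluate the subexpression inside the call.
difference = height - 1.688        # a small negative number

# Step 2: call abs on that value and assign the result to a name.
absolute_height_difference = abs(difference)

print(difference, absolute_height_difference)
```

Splitting the statement this way does not change the final value; it only makes each step visible.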
**Question 1.1.** In the next cell, assign the name new_year to the larger number among the
following two numbers:
1. the absolute value
of , and
2. .
Try to use just one statement (one line of code).
In [7]:
new_year = (5 * 13 * 31) + 5
new_year
Out[7]:
2020
In [8]:
# TEST
new_year == 2020
True
We've asked you to use one line of code in the question above because it only involves
mathematical operations. However, more complicated programming questions will
require more steps. It isn't always a good idea to jam these steps into a single line because it
can make the code harder to read and harder to debug.
Good programming practice involves splitting up your code into smaller steps and using
appropriate names. You'll have plenty of practice in the rest of this course!
2. Importing code
Most programming involves work that is very similar to work that has been done before. Since
writing code is time-consuming, it's good to rely on others' published code when you can.
Out[8]:
Rather than copy-pasting, Python allows us to import modules. A module is a file with Python
code that has defined variables and functions. By importing a module, we are able to use its
code in our own notebook.
Python includes many useful modules that are just an import away. We'll look at the math
module as a first example. The math module is extremely useful in computing mathematical
expressions in Python.
Suppose we want to very accurately compute the area of a circle with a radius of 5 meters. For
that, we need the constant $\pi$, which is roughly 3.14. Conveniently, the math module has pi
defined for us:
78.53981633974483
In the code above, the line import math imports the math module. This statement creates a
module and then assigns the name math to that module. We are now able to access any
variables or functions defined within math by typing the name of the module followed by a
dot, then followed by the name of the variable or function we want.
<module name>.<name>
**Question 2.1.** The module math also provides the name e for the base of the natural
logarithm, which is roughly 2.71. Compute $e^{\pi} - \pi$, giving it the name near_twenty.
Remember: You can access pi from the math module as well!
19.99909997918947
True
In [6]:
import math
radius = 5
area_of_circle = radius ** 2 * math.pi
area_of_circle
Out[6]:
In [9]:
near_twenty = (math.e ** math.pi) - math.pi
near_twenty
Out[9]:
In [10]:
# TEST
round(near_twenty, 8) == 19.99909998
Out[10]:
2.1. Accessing functions
In the question above, you accessed variables within the math module.
Modules also define functions. For example, math provides the name sin for the sine
function. Having imported math already, we can write math.sin(3) to compute the sine of
3. (Note that this sine function considers its argument to be in radians, not degrees. 180
degrees are equivalent to $\pi$ radians.)
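Because math.sin expects radians, an angle given in degrees has to be converted first. The short sketch below uses math.radians from the standard library for the conversion; the variable names are illustrative only.

```python
import math

angle_in_degrees = 45
angle_in_radians = math.radians(angle_in_degrees)  # converts degrees to radians

# math.sin expects radians, so convert before calling it.
sine_from_degrees = math.sin(angle_in_radians)
sine_from_pi = math.sin(math.pi / 4)

print(angle_in_radians, sine_from_degrees, sine_from_pi)
```

Both routes describe the same angle, so the two sines agree up to floating-point rounding.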
**Question 2.1.1.** A $\frac{\pi}{4}$-radian (45-degree) angle forms a right triangle with equal base and
height, pictured below. If the hypotenuse (the radius of the circle in the picture) is 1, then the
height is $\sin(\frac{\pi}{4})$. Compute that value using sin and pi from the math module. Give the
result the name sine_of_pi_over_four.
In [11]:
sine_of_pi_over_four = math.sin(math.pi / 4)
sine_of_pi_over_four
0.7071067811865476
True
There are various ways to import and access code from outside sources. The method we used
above, import <module_name>, imports the entire module and requires that we use
<module_name>.<name> to access its code.
We can also import a specific constant or function instead of the entire module. Notice that you
don't have to use the module name beforehand to reference that particular value. However, you
do have to be careful about reassigning the names of the constants or functions to other
values!
-1.0
1.1447298858494002
Or we can import every function and value from the entire module.
1.1447298858494002
Don't worry too much about which type of import to use. It's often a coding style choice left up
to each programmer. In this course, you'll always import the necessary modules when you run
the setup cell (like the first code cell in this lab).
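The warning about reassigning imported names can be made concrete with a small sketch. It assumes nothing beyond the standard math module; the value 3 is an arbitrary stand-in for an accidental reassignment.

```python
import math
from math import pi

print(pi == math.pi)   # both names refer to the same value at first

# Reassigning the imported name only changes what `pi` means in our own code.
pi = 3
print(pi)              # our name now refers to 3
print(math.pi)         # the module's own value is untouched
```

Accessing the constant as math.pi avoids this kind of accidental shadowing, at the cost of a little extra typing.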
Let's move on to practicing some of the table operations you've learned in lecture!
3. Table operations
The table farmers_markets.csv contains data on farmers' markets in the United States (data
collected by the USDA). Each row represents one such market.
Out[11]:
In [12]:
# TEST
# Hint: You can write the sine of 1.5*pi as:
# math.sin(1.5 * math.pi)
import math
round(sine_of_pi_over_four, 8) == 0.70710678
Out[12]:
In [13]:
# Importing just cos and pi from math.
# We don't have to use `math.` in front of cos or pi
from math import cos, pi
print(cos(pi))
# We do have to use it in front of other functions from math, though
math.log(pi)
Out[13]:
In [14]:
# Lastly, we can import everything from math using the *
# Once again, we don't have to use 'math.' beforehand
from math import *
log(pi)
Out[14]:
Run the next cell to load the farmers_markets table.
Let's examine our table to see what data it contains.
**Question 3.1.** Use the method show to display the first 5 rows of farmers_markets.
Note: The terms "method" and "function" are technically not the same thing, but for the
purposes of this course, we will use them interchangeably.
Hint: tbl.show(3) will show the first 3 rows of tbl. Additionally, make sure not to call
.show() without an argument, as this will crash your kernel!
FMID | MarketName | street | city | County | State | zip | x | y
1012063 | Caledonia Farmers Market Association - Danville | nan | Danville | Caledonia | Vermont | 05828 | -72.1403 | 44.411
1011871 | Stearns Homestead Farmers' Market | 6975 Ridge Road | Parma | Cuyahoga | Ohio | 44130 | -81.7286 | 41.3751
1011878 | 100 Mile Market | 507 Harrison St | Kalamazoo | Kalamazoo | Michigan | 49007 | -85.5749 | 42.296
1009364 | 106 S. Main Street Farmers Market | 106 S. Main Street | Six Mile | nan | South Carolina | 29682 | -82.8187 | 34.8042
1010691 | 10th Steet Community Farmers Market | 10th Street and Poplar | Lamar | Barton | Missouri | 64759 | -94.2746 | 37.4956
(remaining columns truncated)
... (8541 rows omitted)
Notice that some of the values in this table are missing, as denoted by "nan." This means either
that the value is not available (e.g. if we don’t know the market’s street address) or not
applicable (e.g. if the market doesn’t have a street address). You'll also notice that the table has
a large number of columns in it!
num_columns
In [15]:
# Just run this cell
farmers_markets = Table.read_table('farmers_markets.csv')
In [16]:
farmers_markets.show(5)
The table property num_columns returns the number of columns in a table. (A "property" is
just a method that doesn't need to be called by adding parentheses.)
Example call: <tbl>.num_columns
**Question 3.2.** Use num_columns to find the number of columns in our farmers' markets
dataset.
Assign the number of columns to num_farmers_markets_columns.
The table has 59 columns in it!
True
num_rows
Similarly, the property num_rows tells you how many rows are in a table.
The table has 8546 rows in it!
select
Most of the columns are about particular products -- whether the market sells tofu, pet food,
etc. If we're not interested in that information, it just makes the table difficult to read. This
comes up more than you might think, because people who collect and publish data may not
know ahead of time what people will want to do with it.
In such situations, we can use the table method select to choose only the columns that we
want in a particular table. It takes any number of arguments. Each should be the name of a
column in the table. It returns a new table with only those columns in it. The columns are in the
order in which they were listed as arguments.
For example, the value of farmers_markets.select("MarketName", "State") is a table
with only the name and the state of each farmers' market in farmers_markets.
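To make the behavior of select concrete, here is a plain-Python sketch that mimics it on a tiny table stored as a dictionary of columns. This is not how the datascience library implements select; the helper select_columns is hypothetical, and the sample values are a few entries from the farmers' markets data used in this lab.

```python
# A tiny column-oriented "table": each key is a column label.
markets = {
    "MarketName": ["100 Mile Market", "12 South Farmers Market"],
    "city": ["Kalamazoo", "Nashville"],
    "State": ["Michigan", "Tennessee"],
    "x": [-85.5749, -86.7907],
}

def select_columns(table, *labels):
    """Return a new dict with only the requested columns, in the order given."""
    return {label: list(table[label]) for label in labels}

name_and_state = select_columns(markets, "State", "MarketName")
print(list(name_and_state))   # column order follows the arguments
print(len(markets))           # the original table still has all its columns
```

The two points to notice are the ones the text stresses: the result is a new table, and its columns appear in the order in which they were requested.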
**Question 3.3.** Use select to create a table with only the name, city, state, latitude (y),
and longitude (x) of each market. Call that new table farmers_markets_locations.
In [17]:
num_farmers_markets_columns = farmers_markets.num_columns
print("The table has", num_farmers_markets_columns, "columns in it!")
In [18]:
# TEST
num_farmers_markets_columns == 59
Out[18]:
In [19]:
# Just run this cell
num_farmers_markets_rows = farmers_markets.num_rows
print("The table has", num_farmers_markets_rows, "rows in it!")
Hint: Make sure to be exact when using column names with select; double-check
capitalization!
MarketName | city | State | x | y
Caledonia Farmers Market Association - Danville | Danville | Vermont | -72.1403 | 44.411
Stearns Homestead Farmers' Market | Parma | Ohio | -81.7286 | 41.3751
100 Mile Market | Kalamazoo | Michigan | -85.5749 | 42.296
106 S. Main Street Farmers Market | Six Mile | South Carolina | -82.8187 | 34.8042
10th Steet Community Farmers Market | Lamar | Missouri | -94.2746 | 37.4956
112st Madison Avenue | New York | New York | -73.9493 | 40.7939
12 South Farmers Market | Nashville | Tennessee | -86.7907 | 36.1184
125th Street Fresh Connect Farmers' Market | New York | New York | -73.9482 | 40.809
12th & Brandywine Urban Farm Market | Wilmington | Delaware | -75.5345 | 39.7421
14&U Farmers' Market | Washington | District of Columbia | -77.0321 | 38.917
... (8536 rows omitted)
True
True
drop
drop serves the same purpose as select, but it takes away the columns that you provide
rather than the ones that you don't provide. Like select, drop returns a new table.
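As a rough illustration of that behavior, the plain-Python sketch below removes named columns from a dictionary of columns. The helper drop_columns is hypothetical and is not the datascience implementation; the sample values come from the farmers' markets data in this lab.

```python
markets = {
    "FMID": [1011878, 1010691],
    "MarketName": ["100 Mile Market", "10th Steet Community Farmers Market"],
    "State": ["Michigan", "Missouri"],
}

def drop_columns(table, *labels):
    """Return a new dict without the listed columns; other columns keep their order."""
    return {label: list(values) for label, values in table.items() if label not in labels}

without_fmid = drop_columns(markets, "FMID")
print(list(without_fmid))   # every column except the dropped one
print(list(markets))        # the original table is unchanged
```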
**Question 3.4.** Suppose you just didn't want the FMID and updateTime columns in
farmers_markets. Create a table that's a copy of farmers_markets but doesn't include
those columns. Call that table farmers_markets_without_fmid.
In [20]:
farmers_markets_locations = farmers_markets.select('MarketName', 'city', 'State', 'x', 'y')
farmers_markets_locations
Out[20]:
In [21]:
# TEST
sorted(farmers_markets_locations.labels) == ['MarketName', 'State', 'city', 'x', 'y']
Out[21]:
In [22]:
# TEST
farmers_markets_locations.num_rows == 8546
Out[22]:
In [23]:
farmers_markets_without_fmid = farmers_markets.drop('FMID', 'updateTime')
farmers_markets_without_fmid
MarketName | street | city | County | State | zip | x | y
Caledonia Farmers Market Association - Danville | nan | Danville | Caledonia | Vermont | 05828 | -72.1403 | 44.411
Stearns Homestead Farmers' Market | 6975 Ridge Road | Parma | Cuyahoga | Ohio | 44130 | -81.7286 | 41.3751
100 Mile Market | 507 Harrison St | Kalamazoo | Kalamazoo | Michigan | 49007 | -85.5749 | 42.296
106 S. Main Street Farmers Market | 106 S. Main Street | Six Mile | nan | South Carolina | 29682 | -82.8187 | 34.8042
10th Steet Community Farmers Market | 10th Street and Poplar | Lamar | Barton | Missouri | 64759 | -94.2746 | 37.4956
112st Madison Avenue | 112th Madison Avenue | New York | New York | New York | 10029 | -73.9493 | 40.7939
12 South Farmers Market | 3000 Granny White Pike | Nashville | Davidson | Tennessee | 37204 | -86.7907 | 36.1184
125th Street Fresh Connect Farmers' Market | 163 West 125th Street and Adam Clayton Powell, Jr. Blvd. | New York | New York | New York | 10027 | -73.9482 | 40.809
12th & Brandywine Urban Farm Market | 12th & Brandywine Streets | Wilmington | New Castle | Delaware | 19801 | -75.5345 | 39.7421
14&U Farmers' Market | 1400 U Street NW | Washington | District of Columbia | District of Columbia | 20009 | -77.0321 | 38.917
(remaining columns truncated)
... (8536 rows omitted)
Out[23]:
In [24]:
# TEST
farmers_markets_without_fmid.num_columns == 57
True
['Bakedgoods', 'Beans', 'Cheese', 'Coffee', 'County', 'Crafts', 'Credit', 'Eggs', 'Facebook',
 'Flowers', 'Fruits', 'Grains', 'Herbs', 'Honey', 'Jams', 'Juices', 'Location', 'Maple',
 'MarketName', 'Meat', 'Mushrooms', 'Nursery', 'Nuts', 'Organic', 'OtherMedia', 'PetFood',
 'Plants', 'Poultry', 'Prepared', 'SFMNP', 'SNAP', 'Seafood', 'Season1Date', 'Season1Time',
 'Season2Date', 'Season2Time', 'Season3Date', 'Season3Time', 'Season4Date', 'Season4Time',
 'Soap', 'State', 'Tofu', 'Trees', 'Twitter', 'Vegetables', 'WIC', 'WICcash', 'Website',
 'WildHarvested', 'Wine', 'Youtube', 'city', 'street', 'x', 'y', 'zip']
Now, suppose we want to answer some questions about farmers' markets in the US. For
example, which market(s) have the largest longitude (given by the x column)?
To answer this, we'll sort farmers_markets_locations by longitude.
MarketName | city | State | x | y
Trapper Creek Farmers Market | Trapper Creek | Alaska | -166.54 | 53.8748
Kekaha Neighborhood Center (Sunshine Markets) | Kekaha | Hawaii | -159.718 | 21.9704
Hanapepe Park (Sunshine Markets) | Hanapepe | Hawaii | -159.588 | 21.9101
Kalaheo Neighborhood Center (Sunshine Markets) | Kalaheo | Hawaii | -159.527 | 21.9251
Hawaiian Farmers of Hanalei | Hanalei | Hawaii | -159.514 | 22.2033
Hanalei Saturday Farmers Market | Hanalei | Hawaii | -159.492 | 22.2042
Kauai Culinary Market | Koloa | Hawaii | -159.469 | 21.9067
Koloa Ball Park (Knudsen) (Sunshine Markets) | Koloa | Hawaii | -159.465 | 21.9081
West Kauai Agricultural Association | Poipu | Hawaii | -159.435 | 21.8815
Kilauea Neighborhood Center (Sunshine Markets) | Kilauea | Hawaii | -159.406 | 22.2112
... (8536 rows omitted)
Oops, that didn't answer our question because we sorted from smallest to largest longitude. To
look at the largest longitudes, we'll have to sort in reverse order.
Out[24]:
In [25]:
# TEST
print(sorted(farmers_markets_without_fmid.labels))
In [26]:
farmers_markets_locations.sort('x')
Out[26]:
In [27]:
farmers_markets_locations.sort('x', descending=True)
MarketName | city | State | x | y
Christian "Shan" Hendricks Vegetable Market | Saint Croix | Virgin Islands | -64.7043 | 17.7449
La Reine Farmers Market | Saint Croix | Virgin Islands | -64.7789 | 17.7322
Anne Heyliger Vegetable Market | Saint Croix | Virgin Islands | -64.8799 | 17.7099
Rothschild Francis Vegetable Market | St. Thomas | Virgin Islands | -64.9326 | 18.3428
Feria Agrícola de Luquillo | Luquillo | Puerto Rico | -65.7207 | 18.3782
El Mercado Familiar | San Lorenzo | Puerto Rico | -65.9674 | 18.1871
El Mercado Familiar | Gurabo | Puerto Rico | -65.9786 | 18.2526
El Mercado Familiar | Patillas | Puerto Rico | -66.0135 | 18.0069
El Mercado Familiar | Caguas zona urbana | Puerto Rico | -66.039 | 18.2324
El Maercado Familiar | Arroyo zona urbana | Puerto Rico | -66.0617 | 17.9686
... (8536 rows omitted)
(The descending=True bit is called an optional argument. It has a default value of False, so
when you explicitly tell the function descending=True, then the function will sort in
descending order.)
sort
Some details about sort:
1. The first argument to sort is the name of a column to sort by.
2. If the column has text in it, sort will sort alphabetically; if the column has numbers, it will
sort numerically.
3. The value of farmers_markets_locations.sort("x") is a copy of
farmers_markets_locations; the farmers_markets_locations table doesn't get
modified. For example, if we called farmers_markets_locations.sort("x"), then
running farmers_markets_locations by itself would still return the unsorted table.
4. Rows always stick together when a table is sorted. It wouldn't make sense to sort just one
column and leave the other columns alone. For example, in this case, if we sorted just the
x column, the farmers' markets would all end up with the wrong longitudes.
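Points 3 and 4 can be illustrated with a plain-Python sketch in which each row is a tuple, so a market's name and longitude always move together. The helper sort_rows is hypothetical and only imitates the behavior described above; it is not the datascience implementation. The three rows are taken from the outputs shown in this lab.

```python
# Each row keeps a market's name and longitude together.
rows = [
    ("100 Mile Market", -85.5749),
    ("Trapper Creek Farmers Market", -166.54),
    ("La Reine Farmers Market", -64.7789),
]

def sort_rows(rows, column_index, descending=False):
    """Return a new sorted list of rows; the input list is left unchanged."""
    return sorted(rows, key=lambda row: row[column_index], reverse=descending)

west_to_east = sort_rows(rows, 1)
east_to_west = sort_rows(rows, 1, descending=True)

print(west_to_east[0])   # smallest longitude first
print(east_to_west[0])   # largest longitude first
print(rows[0])           # still the first row of the unsorted data
```

Because whole rows are reordered, no market ever ends up paired with another market's longitude.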
**Question 3.5.** Create a version of farmers_markets_locations that's sorted by latitude
(y), with the largest latitudes first. Call it farmers_markets_locations_by_latitude.
Out[27]:
In [28]:
farmers_markets_locations_by_latitude = farmers_markets_locations.sort('y', descending=True)
farmers_markets_locations_by_latitude
MarketName | city | State | x | y
Tanana Valley Farmers Market | Fairbanks | Alaska | -147.781 | 64.8628
Ester Community Market | Ester | Alaska | -148.01 | 64.8459
Fairbanks Downtown Market | Fairbanks | Alaska | -147.72 | 64.8444
Nenana Open Air Market | Nenana | Alaska | -149.096 | 64.5566
Highway's End Farmers' Market | Delta Junction | Alaska | -145.733 | 64.0385
MountainTraders | Talkeetna | Alaska | -150.118 | 62.3231
Talkeetna Farmers Market | Talkeetna | Alaska | -150.118 | 62.3228
Denali Farmers Market | Anchorage | Alaska | -150.234 | 62.3163
Kenny Lake Harvest II | Valdez | Alaska | -145.476 | 62.1079
Copper Valley Community Market | Copper Valley | Alaska | -145.444 | 62.0879
... (8536 rows omitted)
True
True
Now let's say we want a table of all farmers' markets in California. Sorting won't help us much
here because California is closer to the middle of the dataset.
Instead, we use the table method where.
Out[28]:
In [29]:
# TEST
type(farmers_markets_locations_by_latitude) == tables.Table
Out[29]:
In [30]:
# TEST
list(farmers_markets_locations_by_latitude.column('y').take(range(3))) == [64.86275, 6
Out[30]:
In [31]:
california_farmers_markets = farmers_markets_locations.where('State', are.equal_to('California'))
california_farmers_markets
MarketName | city | State | x | y
Adelanto Stadium Farmers Market | Victorville | California | -117.405 | 34.5593
Alameda Farmers' Market | Alameda | California | -122.277 | 37.7742
Alisal Certified Farmers' Market | Salinas | California | -121.634 | 36.6733
Altadena Farmers' Market | Altadena | California | -118.158 | 34.2004
Alum Rock Village Farmers' Market | San Jose | California | -121.833 | 37.3678
Amador Farmers' Market-- Jackson | Jackson | California | -120.774 | 38.3488
Amador Farmers' Market-- Pine Grove | Pine Grove | California | -120.774 | 38.3488
Amador Farmers' Market-- Sutter Creek | Sutter Creek | California | -120.774 | 38.3488
Anderson Happy Valley Farmers Market | Anderson | California | -122.408 | 40.4487
Angels Camp Farmers Market-Fresh Fridays | Angels Camp | California | -120.543 | 38.0722
... (745 rows omitted)
Ignore the syntax for the moment. Instead, try to read that line like this:
Assign the name california_farmers_markets to a table whose rows are the
rows in the farmers_markets_locations table where the 'State's are equal to California.
where
Now let's dive into the details a bit more. where takes 2 arguments:
1. The name of a column. where finds rows where that column's values meet some criterion.
2. A predicate that describes the criterion that the column needs to meet.
The predicate in the example above called the function are.equal_to with the value we
wanted, 'California'. We'll see other predicates soon.
where returns a table that's a copy of the original table, but with only the rows that meet
the given predicate.
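The idea of passing a predicate to where can be sketched in plain Python. In the sketch below, equal_to and where_rows are hypothetical stand-ins that imitate the behavior described above; they are not the datascience library's are.equal_to or where, and the three sample rows come from outputs in this lab.

```python
def equal_to(value):
    """Build a predicate: a function that reports whether its input equals value."""
    return lambda candidate: candidate == value

def where_rows(rows, column, predicate):
    """Return a new list containing only the rows whose column value passes the predicate."""
    return [row for row in rows if predicate(row[column])]

markets = [
    {"MarketName": "Alameda Farmers' Market", "city": "Alameda", "State": "California"},
    {"MarketName": "100 Mile Market", "city": "Kalamazoo", "State": "Michigan"},
    {"MarketName": "Downtown Berkeley Farmers' Market", "city": "Berkeley", "State": "California"},
]

california = where_rows(markets, "State", equal_to("California"))
print([row["MarketName"] for row in california])
print(len(markets))   # the original list of rows is unchanged
```

The key design point is that the predicate is built first, from the value we want, and only then applied to each value in the chosen column.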
**Question 3.6.** Use california_farmers_markets to create a table called
berkeley_markets containing farmers' markets in Berkeley, California.
Out[31]:
In [32]:
berkeley_markets = california_farmers_markets.where('city', are.equal_to('Berkeley'))
berkeley_markets
MarketName | city | State | x | y
Downtown Berkeley Farmers' Market | Berkeley | California | -122.273 | 37.8697
North Berkeley Farmers' Market | Berkeley | California | -122.269 | 37.8802
South Berkeley Farmers' Market | Berkeley | California | -122.272 | 37.8478
True
True
So far we've only been using where with the predicate that requires the values in a
column to be exactly equal to a certain value. However, there are many other predicates. Here
are a few:
Predicate | Example | Result
are.equal_to | are.equal_to(50) | Find rows with values equal to 50
are.not_equal_to | are.not_equal_to(50) | Find rows with values not equal to 50
are.above | are.above(50) | Find rows with values above (and not equal to) 50
are.above_or_equal_to | are.above_or_equal_to(50) | Find rows with values above 50 or equal to 50
are.below | are.below(50) | Find rows with values below 50
are.between | are.between(2, 10) | Find rows with values above or equal to 2 and below 10
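The boundary behavior in the table is easy to misread, so here is a plain-Python sketch of three of the predicates, written to follow the descriptions in the Result column. The functions above, above_or_equal_to, and between are hypothetical stand-ins, not the library's own code, and the list of values is made up.

```python
def above(bound):
    return lambda value: value > bound          # strictly greater than the bound

def above_or_equal_to(bound):
    return lambda value: value >= bound         # the bound itself also passes

def between(low, high):
    return lambda value: low <= value < high    # includes low, excludes high

values = [2, 5, 10, 50, 51]
print([v for v in values if between(2, 10)(v)])
print([v for v in values if above(50)(v)])
print([v for v in values if above_or_equal_to(50)(v)])
```

Note in particular that between keeps the lower bound but leaves out the upper one.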
4. Arrays
Computers are most useful when you can use a small amount of code to do the same action to
many different things.
For example, in the time it takes you to calculate the 18% tip on a restaurant bill, a laptop can
calculate 18% tips for every restaurant bill paid by every human on Earth that day. (That's if
you're pretty fast at doing arithmetic in your head!)
Arrays are how we put many values in one place so that we can operate on them as a group.
For example, if billions_of_numbers is an array of numbers, the expression
.18 * billions_of_numbers
Out[32]:
In [33]:
# TEST
berkeley_markets.num_rows == 3
Out[33]:
In [34]:
# TEST
list(berkeley_markets.column('city')) == ['Berkeley', 'Berkeley', 'Berkeley']
Out[34]:
gives a new array of numbers that contains the result of multiplying each number in
billions_of_numbers by .18. Arrays are not limited to numbers; we can also put all the
words in a book into an array of strings.
Concretely, an array is a collection of values of the same type.
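To spell out what multiplying an array by .18 computes, the sketch below does the same arithmetic one element at a time with a plain Python list. The bill amounts are made up, and the list comprehension is only a stand-in for the array expression, not how NumPy carries it out.

```python
bills = [20.00, 45.50, 8.25]   # made-up restaurant bills

# What `.18 * some_array` computes for an array: one product per element.
tips = [0.18 * bill for bill in bills]

print(tips)
print(len(tips) == len(bills))   # one tip for every bill
```

With a plain list, writing .18 * bills raises a TypeError, which is one reason arrays are the more convenient container for this kind of work.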
4.1. Making arrays
First, let's learn how to manually input values into an array. This typically isn't how programs
work. Normally, we create arrays by loading them from an external source, like a data file.
To create an array by hand, call the function make_array. Each argument you pass to
make_array will be in the array it returns. Run this cell to see an example:
array([ 0.125, 4.75 , -1.3 ])
Each value in an array (in the above case, the numbers 0.125, 4.75, and -1.3) is called an element
of that array.
Arrays themselves are also values, just like numbers and strings. That means you can assign
them to names or use them as arguments to functions. For example, len(<some_array>)
returns the number of elements in some_array.
**Question 4.1.1.** Make an array containing the numbers 0, 1, -1, $\pi$, and $e$, in that order. Name
it interesting_numbers.
Hint: How did you get the values $\pi$ and $e$ earlier in this lab? You can refer to them in exactly
the same way here.
array([ 0. , 1. , -1. , 3.14159265, 2.71828183])
True
True
In [35]:
make_array(0.125, 4.75, -1.3)
Out[35]:
In [52]:
interesting_numbers = make_array(0, 1, -1, math.pi, math.e)
interesting_numbers
Out[52]:
In [53]:
# TEST
type(interesting_numbers) == np.ndarray
Out[53]:
In [54]:
# TEST
len(interesting_numbers) == 5
Out[54]:
In [55]:
# TEST
all(interesting_numbers == np.array([0, 1, -1, math.pi, math.e]))
True
**Question 4.1.2.** Make an array containing the five strings "Hello", ",", " ", "world",
and "!". (The third one is a single space inside quotes.) Name it hello_world_components.
Note: If you evaluate hello_world_components, you'll notice some extra information in
addition to its contents: dtype='<U5'. That's just NumPy's extremely cryptic way of saying
that the elements of the array are strings (here, Unicode strings of at most 5 characters).
array(['Hello', ',', ' ', 'world', '!'],
dtype='<U5')
True
True
True
np.arange
Arrays are provided by a package called NumPy (pronounced "NUM-pie"). The package is called
numpy, but it's standard to rename it np for brevity. You can do that with:
import numpy as np
Very often in data science, we want to work with many numbers that are evenly spaced within
some range. NumPy provides a special function for this called arange. The line of code
np.arange(start, stop, step) evaluates to an array with all the numbers starting at
start and counting up by step, stopping before stop is reached.
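For whole-number arguments, Python's built-in range follows the same start, stop, step rule, so it can serve as a quick sketch of which values such a call to np.arange would contain. The example below uses range only as a stand-in; it produces a list rather than a NumPy array, and the multiples-of-5 case is made up for illustration.

```python
start, stop, step = 1, 6, 2

# For whole-number arguments, range visits the same values np.arange would hold.
values = list(range(start, stop, step))
print(values)

# stop itself is never included, so to end on 20 we stop one step past it.
multiples_of_5 = list(range(0, 20 + 5, 5))
print(multiples_of_5)
```

The second call shows the usual trick for including an endpoint: choose a stop value just beyond the last number you want.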
Run the following cells to see some examples!
array([1, 3, 5])
Out[55]:
In [56]:
hello_world_components = make_array("Hello", ",", " ", "world", "!")
hello_world_components
Out[56]:
In [57]:
# TEST
type(hello_world_components) == np.ndarray
Out[57]:
In [58]:
# TEST
len(hello_world_components) == 5
Out[58]:
In [59]:
# TEST
all(hello_world_components == np.array(["Hello", ",", " ", "world", "!"]))
Out[59]:
In [60]:
# This array starts at 1 and counts up by 2
# and then stops before 6
np.arange(1, 6, 2)
Out[60]:
array([4, 5, 6, 7, 8])
**Question 4.1.3.** Import numpy as np and then use np.arange to create an array with the
multiples of 99 from 0 up to (and including) 9999. (So its elements are 0, 99, 198, 297, etc.)
array([ 0, 99, 198, 297, 396, 495, 594, 693, 792, 891, 990,
1089, 1188, 1287, 1386, 1485, 1584, 1683, 1782, 1881, 1980, 2079,
2178, 2277, 2376, 2475, 2574, 2673, 2772, 2871, 2970, 3069, 3168,
3267, 3366, 3465, 3564, 3663, 3762, 3861, 3960, 4059, 4158, 4257,
4356, 4455, 4554, 4653, 4752, 4851, 4950, 5049, 5148, 5247, 5346,
5445, 5544, 5643, 5742, 5841, 5940, 6039, 6138, 6237, 6336, 6435,
6534, 6633, 6732, 6831, 6930, 7029, 7128, 7227, 7326, 7425, 7524,
7623, 7722, 7821, 7920, 8019, 8118, 8217, 8316, 8415, 8514, 8613,
8712, 8811, 8910, 9009, 9108, 9207, 9306, 9405, 9504, 9603, 9702,
9801, 9900, 9999])
True
True
True
4.2. Working with single elements of arrays ("indexing")
Let's work with a more interesting dataset. The next cell creates an array called
population_amounts that includes estimated world populations in every year from 1950 to
roughly the present. (The estimates come from the US Census Bureau website.)
Rather than type in the data manually, we've loaded them from a file on your computer called
world_population.csv.
In [61]:
# This array doesn't contain 9
# because np.arange stops *before* the stop value is reached
np.arange(4, 9, 1)
Out[61]:
In [62]:
...
multiples_of_99 = np.arange(0, 10000, 99)
multiples_of_99
Out[62]:
In [63]:
# TEST
type(multiples_of_99) == np.ndarray
Out[63]:
In [64]:
# TEST
len(multiples_of_99) == 102
Out[64]:
In [65]:
# TEST
all(multiples_of_99 == np.arange(0, 9999 + 99, 99))
Out[65]:
In [66]:
population_amounts = Table.read_table("world_population.csv").column("Population")
population_amounts
array([2557628654, 2594939877, 2636772306, 2682053389, 2730228104,
2782098943, 2835299673, 2891349717, 2948137248, 3000716593,
3043001508, 3083966929, 3140093217, 3209827882, 3281201306,
3350425793, 3420677923, 3490333715, 3562313822, 3637159050,
3712697742, 3790326948, 3866568653, 3942096442, 4016608813,
4089083233, 4160185010, 4232084578, 4304105753, 4379013942,
4451362735, 4534410125, 4614566561, 4695736743, 4774569391,
4856462699, 4940571232, 5027200492, 5114557167, 5201440110,
5288955934, 5371585922, 5456136278, 5538268316, 5618682132,
5699202985, 5779440593, 5857972543, 5935213248, 6012074922,
6088571383, 6165219247, 6242016348, 6318590956, 6395699509,
6473044732, 6551263534, 6629913759, 6709049780, 6788214394,
6866332358, 6944055583, 7022349283, 7101027895, 7178722893,
7256490011], dtype=int64)
Here's how we get the first element of population_amounts, which is the world population in
the first year in the dataset, 1950.
2557628654
The value of that expression is the number 2557628654 (around 2.5 billion), because that's the
first thing in the array population_amounts.
Notice that we wrote .item(0), not .item(1), to get the first element. This convention is
standard in computer science, though it can look weird at first. 0 is called the index of the
first item: an index is the number of elements that appear before that item. So 3 is the index
of the 4th item.
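As a small sketch of this convention, the example below copies the first five values of the array printed above (1950 through 1954) and converts a year into an index by subtracting the first year. The names first_years, start_year, and population_in are made up for this illustration.

```python
import numpy as np

# First five values of the population array shown above (1950-1954).
first_years = np.array(
    [2557628654, 2594939877, 2636772306, 2682053389, 2730228104],
    dtype=np.int64,
)
start_year = 1950

def population_in(year):
    """Hypothetical helper: look up a year via its zero-based index."""
    index = year - start_year  # number of elements before the one we want
    return first_years.item(index)

print(population_in(1950))  # index 0, the first element
print(population_in(1953))  # index 3, the 4th element
```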
Here are some more examples. In the examples, we've given names to the things we get out of
population_amounts. Read and run each cell.
3140093217
7256490011
Out[66]:
In [67]:
population_amounts.item(0)
Out[67]:
In [68]:
# The 13th element in the array is the population
# in 1962 (which is 1950 + 12).
population_1962 = population_amounts.item(12)
population_1962
Out[68]:
In [69]:
# The 66th element is the population in 2015.
population_2015 = population_amounts.item(65)
population_2015
Out[69]:
In [70]:
# The array has only 66 elements, so this doesn't work.
# (There's no element with 66 other elements before it.)
population_2016 = population_amounts.item(66)
population_2016
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[70], line 3
      1 # The array has only 66 elements, so this doesn't work.
      2 # (There's no element with 66 other elements before it.)
----> 3 population_2016 = population_amounts.item(66)
      4 population_2016

IndexError: index 66 is out of bounds for axis 0 with size 66
Since make_array returns an array, we can call .item(3) on its output to get its 4th
element, just like we "chained" together calls to the method replace earlier.
-2
**Question 4.2.1.** Set population_1973 to the world population in 1973, by getting the
appropriate element from population_amounts using item.
3942096442
True
4.3. Doing something to every element of an array
Arrays are primarily useful for doing the same operation many times, so we don't often have to
use .item and work with single elements.
Logarithms
Here is one simple question we might ask about world population:
How big was the population in orders of magnitude in each year?
Orders of magnitude quantify how big a number is by representing it as the power of another
number (for example, representing 104 as approximately 10^2.017). One way to do this is by using the
logarithm function. The logarithm (base 10) of a number increases by 1 every time we multiply
the number by 10. It's like a measure of how many decimal digits the number has, or how big it
is in orders of magnitude.
We could try to answer our question like this, using the log10 function from the math
module and the item method you just saw:
In [71]:
make_array(-1, -3, 4, -2).item(3)
Out[71]:
In [73]:
population_1973 = population_amounts.item(23)
population_1973
Out[73]:
In [74]:
# TEST
population_1973 == 3942096442
Out[74]:
In [77]:
population_1950_magnitude = math.log10(population_amounts.item(0))
population_1951_magnitude = math.log10(population_amounts.item(1))
Ellipsis
But this is tedious and doesn't really take advantage of the fact that we are using a computer.
Instead, NumPy provides its own version of log10 that takes the logarithm of each element of
an array. It takes a single array of numbers as its argument. It returns an array of the same
length, where the first element of the result is the logarithm of the first element of the
argument, and so on.
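To make the contrast concrete, the sketch below computes logarithms both ways on three values taken from the population array (1950, 1973, and 2015). The names sample, one_at_a_time, and all_at_once are chosen for this illustration only.

```python
import math
import numpy as np

# Three values from the population array: 1950, 1973, and 2015.
sample = np.array([2557628654, 3942096442, 7256490011], dtype=np.int64)

# The tedious way: one math.log10 call per element.
one_at_a_time = [math.log10(sample.item(i)) for i in range(len(sample))]

# The elementwise way: a single call that returns a new array of the same length.
all_at_once = np.log10(sample)

print(all_at_once)
print(np.allclose(one_at_a_time, all_at_once))
```

Both approaches give the same numbers; the second just scales to any array length without extra code.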
**Question 4.3.1.** Use np.log10 to compute the logarithms of the world population in every
year. Give the result (an array of 66 numbers) the name population_magnitudes. Your code
should be very short.
array([ 9.40783749, 9.4141273 , 9.42107263, 9.42846742, 9.43619893,
9.44437257, 9.45259897, 9.46110062, 9.4695477 , 9.47722498,
9.48330217, 9.48910971, 9.49694254, 9.50648175, 9.51603288,
9.5251 , 9.53411218, 9.54286695, 9.55173218, 9.56076229,
9.56968959, 9.57867667, 9.58732573, 9.59572724, 9.60385954,
9.61162595, 9.61911264, 9.62655434, 9.63388293, 9.64137633,
9.64849299, 9.6565208 , 9.66413091, 9.67170374, 9.67893421,
9.68632006, 9.69377717, 9.70132621, 9.70880804, 9.7161236 ,
9.72336995, 9.73010253, 9.73688521, 9.74337399, 9.74963446,
9.75581413, 9.7618858 , 9.76774733, 9.77343633, 9.77902438,
9.7845154 , 9.78994853, 9.7953249 , 9.80062024, 9.80588805,
9.81110861, 9.81632507, 9.82150788, 9.82666101, 9.83175555,
9.83672482, 9.84161319, 9.84648243, 9.85132122, 9.85604719,
9.8607266 ])
True
True
What you just did is called elementwise application of np.log10, since np.log10 operates
separately on each element of the array that it's called on. Here's a picture of what's going on:
The textbook's section on arrays has a useful list of NumPy functions that are designed to work
elementwise, like np.log10.
population_1952_magnitude = math.log10(population_amounts.item(2))
population_1953_magnitude = math.log10(population_amounts.item(3))
...
Out[77]:
In [76]:
population_magnitudes = np.log10(population_amounts)
population_magnitudes
Out[76]:
In [78]:
# TEST
# It looks like you're not making an array. You shouldn't need to
# use .item anywhere in your solution.
type(population_magnitudes) == np.ndarray
Out[78]:
In [79]:
# TEST
# You made an array, but it doesn't have the right numbers in it.
sum(abs(population_magnitudes - np.log10(population_amounts))) < 1e-6
Out[79]:
Arithmetic
Arithmetic also works elementwise on arrays, meaning that if you perform an arithmetic
operation (like subtraction, division, etc) on an array, Python will do the operation to every
element of the array individually and return an array of all of the results. For example, you can
divide all the population numbers by 1 billion to get numbers in billions:
array([ 2.55762865, 2.59493988, 2.63677231, 2.68205339, 2.7302281 ,
2.78209894, 2.83529967, 2.89134972, 2.94813725, 3.00071659,
3.04300151, 3.08396693, 3.14009322, 3.20982788, 3.28120131,
3.35042579, 3.42067792, 3.49033371, 3.56231382, 3.63715905,
3.71269774, 3.79032695, 3.86656865, 3.94209644, 4.01660881,
4.08908323, 4.16018501, 4.23208458, 4.30410575, 4.37901394,
4.45136274, 4.53441012, 4.61456656, 4.69573674, 4.77456939,
4.8564627 , 4.94057123, 5.02720049, 5.11455717, 5.20144011,
5.28895593, 5.37158592, 5.45613628, 5.53826832, 5.61868213,
5.69920299, 5.77944059, 5.85797254, 5.93521325, 6.01207492,
6.08857138, 6.16521925, 6.24201635, 6.31859096, 6.39569951,
6.47304473, 6.55126353, 6.62991376, 6.70904978, 6.78821439,
6.86633236, 6.94405558, 7.02234928, 7.10102789, 7.17872289,
7.25649001])
You can do the same with addition, subtraction, multiplication, and exponentiation (**). For
example, you can calculate a tip on several restaurant bills at once (in this case just 3):
Restaurant bills:
[ 20.12 39.9 31.01]
Tips:
[ 4.024 7.98 6.202]
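As a minimal sketch of elementwise arithmetic, the example below applies each operator to one small array. The array prices and its values are hypothetical, chosen only so the results are easy to check by hand.

```python
import numpy as np

prices = np.array([10.0, 25.0, 40.0])  # hypothetical values for illustration

print(prices + 5)   # adds 5 to every element
print(prices - 5)   # subtracts 5 from every element
print(prices * 2)   # doubles every element
print(prices / 4)   # divides every element by 4
print(prices ** 2)  # squares every element

# Each operation returns a new array; the original is left unchanged.
print(prices)
```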
**Question 4.3.2.** Suppose the total charge at a restaurant is the original bill plus the tip. If the
tip is 20%, that means we can multiply the original bill by 1.2 to get the total charge. Compute
the total charge for each bill in restaurant_bills, and assign the resulting array to
total_charges.
array([ 24.144, 47.88 , 37.212])
In [80]:
population_in_billions = population_amounts / 1000000000
population_in_billions
Out[80]:
In [81]:
restaurant_bills = make_array(20.12, 39.90, 31.01)
print("Restaurant bills:\t", restaurant_bills)
# Array multiplication
tips = .2 * restaurant_bills
print("Tips:\t\t\t", tips)
In [103…
total_charges = restaurant_bills + (restaurant_bills * .2)
total_charges
Out[103]:
In [104…
# TEST
# It looks like you're not making an array. You shouldn't need to
# use .item anywhere in your solution.
type(total_charges) == np.ndarray
True
**Question 4.3.3.** The array more_restaurant_bills contains 100,000 bills! Compute the
total charge for each one. How is your code different?
array([ 20.244, 20.892, 12.216, ..., 19.308, 18.336, 35.664])
True
True
The function sum takes a single array of numbers as its argument. It returns the sum of all the
numbers in that array (so it returns a single number, not an array).
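A quick sketch of sum on an array, reusing the three bill amounts from restaurant_bills. It builds the array with np.array rather than make_array so that the example depends only on NumPy; np.sum is shown alongside as NumPy's own equivalent.

```python
import numpy as np

# Same three amounts as restaurant_bills above.
bills = np.array([20.12, 39.90, 31.01])

total = sum(bills)  # a single number, not an array
print(total)
print(np.isclose(total, np.sum(bills)))  # np.sum gives the same result
```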
**Question 4.3.4.** What was the sum of all the bills in more_restaurant_bills, including
tips?
1795730.0640000193
True
**Question 4.3.5.** The powers of 2 (2^0 = 1, 2^1 = 2, 2^2 = 4, etc.) arise frequently in computer
science. (For example, you may have noticed that storage on smartphones or USB drives comes in
powers of 2, like 16 GB, 32 GB, or 64 GB.) Use np.arange and the exponentiation operator **
to compute the first 30 powers of 2, starting from 2^0.
Hint 1: np.arange(1, 2**30, 1) creates an array with about 2^30 (over a billion) elements and
will crash your kernel.
Hint 2: Part of your solution will involve np.arange, but your array shouldn't have more than
30 elements.
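The idea behind the hints can be sketched with a different base: build a short array of exponents with np.arange, then raise a number to that whole array at once. The example uses powers of 3 and only five exponents; the names exponents, powers_of_3, and naive_length are illustrative.

```python
import numpy as np

exponents = np.arange(5)      # 0, 1, 2, 3, 4
powers_of_3 = 3 ** exponents  # elementwise: 3^0, 3^1, ..., 3^4
print(powers_of_3)

# Size of the array from Hint 1, computed without actually building it.
naive_length = 2**30 - 1
print(naive_length)
```

The array of exponents stays as short as the number of powers wanted, which is why this approach avoids the memory problem described in Hint 1.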
Out[104]:
In [105…
more_restaurant_bills = Table.read_table("more_restaurant_bills.csv").column("Bill")
more_total_charges = more_restaurant_bills + (more_restaurant_bills * .2)
more_total_charges
Out[105]:
In [106…
# TEST
# It looks like you're not making an array. You shouldn't need to
# use .item anywhere in your solution.
type(more_total_charges) == np.ndarray
Out[106]:
In [108…
# TEST
# You made an array, but it doesn't have the right numbers in it.
sum(abs(more_total_charges - 1.2 * more_restaurant_bills)) < 1e-6
Out[108]:
In [110…
sum_of_bills = sum(more_restaurant_bills + (more_restaurant_bills * .2))
sum_of_bills
Out[110]:
In [111…
# TEST
round(sum_of_bills, 2) == 1795730.06
Out[111]:
array([ 1, 2, 4, 8, 16, 32,
64, 128, 256, 512, 1024, 2048,
4096, 8192, 16384, 32768, 65536, 131072,
262144, 524288, 1048576, 2097152, 4194304, 8388608,
16777216, 33554432, 67108864, 134217728, 268435456, 536870912], dtype=int32)
True
5. Creating Tables
An array is useful for describing a single attribute of each element in a collection. For example,
let's say our collection is all US States. Then an array could describe the land area of each state.
Tables extend this idea by containing multiple arrays, each one describing a different attribute
for every element of a collection. In this way, tables allow us to not only store data about many
entities but to also contain several kinds of data about each entity.
For example, in the cell below we have two arrays. The first one, population_amounts, was
defined above in section 4.2 and contains the world population in each year (estimated by the
US Census Bureau). The second array, years, contains the years themselves. These elements
are in order, so the year and the world population for that year have the same index in their
corresponding arrays.
Population column: [2557628654 2594939877 2636772306 2682053389 2730228104 2782098943
2835299673 2891349717 2948137248 3000716593 3043001508 3083966929
3140093217 3209827882 3281201306 3350425793 3420677923 3490333715
3562313822 3637159050 3712697742 3790326948 3866568653 3942096442
4016608813 4089083233 4160185010 4232084578 4304105753 4379013942
4451362735 4534410125 4614566561 4695736743 4774569391 4856462699
4940571232 5027200492 5114557167 5201440110 5288955934 5371585922
5456136278 5538268316 5618682132 5699202985 5779440593 5857972543
5935213248 6012074922 6088571383 6165219247 6242016348 6318590956
6395699509 6473044732 6551263534 6629913759 6709049780 6788214394
6866332358 6944055583 7022349283 7101027895 7178722893 7256490011]
Years column: [1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
2010 2011 2012 2013 2014 2015]
In [114…
powers_of_2 = 2 ** np.arange(30)
powers_of_2
Out[114]:
In [115…
# TEST
all(powers_of_2 == 2 ** np.arange(30))
Out[115]:
In [116…
# Just run this cell
years = np.arange(1950, 2015 + 1)
print("Population column:", population_amounts)
print("Years column:", years)
Suppose we want to answer this question:
In which year did the world's population cross 6 billion?
You could technically answer this question just from staring at the arrays, but it's a bit
convoluted, since you would have to count the position where the population first crossed 6
billion, then find the corresponding element in the years array. In cases like these, it might be
easier to put the data into a Table, a 2-dimensional type of dataset.
The expression below:
- creates an empty table using the expression Table(),
- adds two columns by calling with_columns with four arguments,
- assigns the result to the name population, and finally
- evaluates population so that we can see the table.
The strings "Year" and "Population" are column labels that we have chosen. The names
population_amounts and years were assigned above to two arrays of the same length.
The function with_columns (you can find the documentation here) takes in alternating strings
(to represent column labels) and arrays (representing the data in those columns). The strings
and arrays are separated by commas.
| Population | Year |
|---|---|
| 2557628654 | 1950 |
| 2594939877 | 1951 |
| 2636772306 | 1952 |
| 2682053389 | 1953 |
| 2730228104 | 1954 |
| 2782098943 | 1955 |
| 2835299673 | 1956 |
| 2891349717 | 1957 |
| 2948137248 | 1958 |
| 3000716593 | 1959 |

... (56 rows omitted)
Now the data is combined into a single table! It's much easier to parse this data. If you need to
know what the population was in 1959, for example, you can tell from a single glance.
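The question posed earlier, in which year the population first crossed 6 billion, can also be answered from the two parallel arrays because matching entries share an index. The sketch below uses a four-year excerpt (1997 through 2000) copied from the arrays printed above; the names years_excerpt, population_excerpt, above_6_billion, and first_index are illustrative. Note that np.argmax returns 0 when no element satisfies the condition, so this approach assumes at least one value is above the threshold.

```python
import numpy as np

# Excerpt of the two arrays above, covering 1997 through 2000.
years_excerpt = np.arange(1997, 2000 + 1)
population_excerpt = np.array(
    [5857972543, 5935213248, 6012074922, 6088571383], dtype=np.int64
)

# Elementwise comparison gives an array of True/False values.
above_6_billion = population_excerpt >= 6_000_000_000

# np.argmax returns the index of the first True value.
first_index = int(np.argmax(above_6_billion))

# The same index picks out the matching year.
print(years_excerpt.item(first_index))
```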
In [117…
population = Table().with_columns(
    "Population", population_amounts,
    "Year", years
)
population
Out[117]:
**Question 5.1.** In the cell below, we've created 2 arrays. Using the steps above, assign
top_10_movies to a table that has two columns called "Rating" and "Name", which hold
top_10_movie_ratings and top_10_movie_names respectively.
| Rating | Name |
|---|---|
| 9.2 | The Shawshank Redemption (1994) |
| 9.2 | The Godfather (1972) |
| 9 | The Godfather: Part II (1974) |
| 8.9 | Pulp Fiction (1994) |
| 8.9 | Schindler's List (1993) |
| 8.9 | The Lord of the Rings: The Return of the King (2003) |
| 8.9 | 12 Angry Men (1957) |
| 8.9 | The Dark Knight (2008) |
| 8.9 | Il buono, il brutto, il cattivo (1966) |
| 8.8 | The Lord of the Rings: The Fellowship of the Ring (2001) |
True
In [119…
top_10_movie_ratings = make_array(9.2, 9.2, 9., 8.9, 8.9, 8.9, 8.9, 8.9, 8.9, 8.8)
top_10_movie_names = make_array(
    'The Shawshank Redemption (1994)',
    'The Godfather (1972)',
    'The Godfather: Part II (1974)',
    'Pulp Fiction (1994)',
    "Schindler's List (1993)",
    'The Lord of the Rings: The Return of the King (2003)',
    '12 Angry Men (1957)',
    'The Dark Knight (2008)',
    'Il buono, il brutto, il cattivo (1966)',
    'The Lord of the Rings: The Fellowship of the Ring (2001)'
)
top_10_movies = Table().with_columns('Rating', top_10_movie_ratings, 'Name', top_10_movie
# We've put this next line here
# so your table will get printed out
# when you run this cell.
top_10_movies
Out[119]:
In [120…
# TEST
type(top_10_movies) == tables.Table
Out[120]:
In [121…
# TEST
top_10_movies.select('Rating', 'Name').sort('Name')
| Rating | Name |
|---|---|
| 8.9 | 12 Angry Men (1957) |
| 8.9 | Il buono, il brutto, il cattivo (1966) |
| 8.9 | Pulp Fiction (1994) |
| 8.9 | Schindler's List (1993) |
| 8.9 | The Dark Knight (2008) |
| 9.2 | The Godfather (1972) |
| 9 | The Godfather: Part II (1974) |
| 8.8 | The Lord of the Rings: The Fellowship of the Ring (2001) |
| 8.9 | The Lord of the Rings: The Return of the King (2003) |
| 9.2 | The Shawshank Redemption (1994) |
Loading a table from a file
In most cases, we aren't going to go through the trouble of typing in all the data manually.
Instead, we load them in from an external source, like a data file. There are many formats for
data files, but CSV ("comma-separated values") is the most common.
Table.read_table(...) takes one argument (a path to a data file in string format) and
returns a table.
**Question 5.2.** imdb.csv contains a table of information about the 250 highest-rated
movies on IMDb. Load it as a table called imdb.
| Votes | Rating | Title | Year | Decade |
|---|---|---|---|---|
| 88355 | 8.4 | M | 1931 | 1930 |
| 132823 | 8.3 | Singin' in the Rain | 1952 | 1950 |
| 74178 | 8.3 | All About Eve | 1950 | 1950 |
| 635139 | 8.6 | Léon | 1994 | 1990 |
| 145514 | 8.2 | The Elephant Man | 1980 | 1980 |
| 425461 | 8.3 | Full Metal Jacket | 1987 | 1980 |
| 441174 | 8.1 | Gone Girl | 2014 | 2010 |
| 850601 | 8.3 | Batman Begins | 2005 | 2000 |
| 37664 | 8.2 | Judgment at Nuremberg | 1961 | 1960 |
| 46987 | 8 | Relatos salvajes | 2014 | 2010 |

... (240 rows omitted)
Out[121]:
In [122…
imdb = Table.read_table('imdb.csv')
imdb
Out[122]:
True
True
| Votes | Rating | Title | Year | Decade |
|---|---|---|---|---|
| 31003 | 8.1 | Le salaire de la peur | 1953 | 1950 |
| 32385 | 8 | La battaglia di Algeri | 1966 | 1960 |
| 35983 | 8.1 | The Best Years of Our Lives | 1946 | 1940 |
6. More Table Operations!
Now that you've worked with arrays, let's add a few more methods to the list of table
operations.
column
column takes the column name of a table (in string format) as its argument and returns the
values in that column as an array.
array(['The Shawshank Redemption (1994)', 'The Godfather (1972)',
'The Godfather: Part II (1974)', 'Pulp Fiction (1994)',
"Schindler's List (1993)",
'The Lord of the Rings: The Return of the King (2003)',
'12 Angry Men (1957)', 'The Dark Knight (2008)',
'Il buono, il brutto, il cattivo (1966)',
'The Lord of the Rings: The Fellowship of the Ring (2001)'],
dtype='<U56')
take
The table method take takes as its argument an array of numbers. Each number should be
the index of a row in the table. It returns a new table with only those rows.
You'll usually want to use take in conjunction with np.arange to take the first few rows of a
table.
In [123…
# TEST
type(imdb) == tables.Table
Out[123]:
In [124…
# TEST
imdb.num_rows == 250
Out[124]:
In [125…
# TEST
imdb.select('Votes', 'Rating', 'Title', 'Year', 'Decade').sort(0).take(range(2, 5))
Out[125]:
In [126…
# Returns an array of movie names
top_10_movies.column('Name')
Out[126]:
| Rating | Name |
|---|---|
| 9.2 | The Shawshank Redemption (1994) |
| 9.2 | The Godfather (1972) |
| 9 | The Godfather: Part II (1974) |
| 8.9 | Pulp Fiction (1994) |
| 8.9 | Schindler's List (1993) |
You can find more table operations in the documentation for datascience.tables.
You're done with Lab 2! Don't forget to choose Print to save it as a PDF as well. Submit both the
notebook and the PDF to Canvas.
In [127…
# Take first 5 movies of top_10_movies
top_10_movies.take(np.arange(0, 5, 1))
Out[127]: