hw02_revised
School: University of North Georgia, Dahlonega
Course: MATH-240
Subject: Mathematics
Date: Apr 3, 2024
Pages: 18
hw02_revised
February 4, 2024
1 Homework 2: Arrays, Table Manipulation, and Visualization
[1]: # Don't change this cell; just run it.
     # When you log in, please hit return (not shift + return) after typing in your email.
     import numpy as np
     from datascience import *

     # These lines do some fancy plotting magic.
     import matplotlib
     %matplotlib inline
     import matplotlib.pyplot as plots
     plots.style.use('fivethirtyeight')
Recommended Reading:
* Data Types
* Sequences
* Tables
Please complete this notebook by filling in the cells provided. Throughout this homework and all future ones, please be sure not to re-assign variables throughout the notebook! For example, if you use max_temperature in your answer to one question, do not reassign it later on.

Before continuing the assignment, select “Save and Checkpoint” in the File menu.
1.1 1. Creating Arrays
Question 1. Make an array called weird_numbers containing the following numbers (in the given order):

1. -2
2. the sine of 1.2
3. 3
4. 5 to the power of the cosine of 1.2

Hint: sin and cos are functions in the math module.

Note: Python lists behave differently than NumPy arrays. In Data 8, we use NumPy arrays, so please make an array, not a Python list.
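The distinction matters because arithmetic behaves differently on the two types; a quick illustration in plain Python and NumPy (values are made up):

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

print(py_list * 2)  # list "multiplication" is repetition: [1, 2, 3, 1, 2, 3]
print(np_arr * 2)   # array multiplication is elementwise: [2 4 6]
```

This is why the questions below ask for arrays: elementwise arithmetic is what makes the later table computations work.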
[2]: # Our solution involved one extra line of code before creating
     # weird_numbers.
     import math

     weird_numbers = make_array(-2, math.sin(1.2), 3, 5 ** math.cos(1.2))
     weird_numbers

[2]: array([-2.        ,  0.93203909,  3.        ,  1.79174913])
Question 2. Make an array called numbers_in_order using the np.sort function.

[3]: numbers_in_order = np.sort(weird_numbers)
     numbers_in_order

[3]: array([-2.        ,  0.93203909,  1.79174913,  3.        ])
Question 3. Find the mean and median of weird_numbers using the np.mean and np.median functions.

[4]: weird_mean = np.mean(weird_numbers)
     weird_median = np.median(weird_numbers)

     # These lines are provided just to print out your answers.
     print('weird_mean:', weird_mean)
     print('weird_median:', weird_median)

weird_mean: 0.930947052910613
weird_median: 1.361894105821226
1.2 2. Indexing Arrays

These exercises give you practice accessing individual elements of arrays. In Python (and in many programming languages), elements are accessed by index, so the first element is the element at index 0.

Note: Please don’t use bracket notation when indexing (i.e. arr[0]), as this can yield different data type outputs than what we will be expecting.
Question 1. The cell below creates an array of some numbers. Set third_element to the third element of some_numbers.

[5]: some_numbers = make_array(-1, -3, -6, -10, -15)
     third_element = some_numbers.item(2)
     third_element

[5]: -6
Question 2. The next cell creates a table that displays some information about the elements of some_numbers and their order. Run the cell to see the partially-completed table, then fill in the missing information (the cells that say “Ellipsis”) by assigning blank_a, blank_b, blank_c, and blank_d to the correct elements in the table.
[6]: blank_a = 'third'
     blank_b = 'fourth'
     blank_c = 0
     blank_d = 3
     elements_of_some_numbers = Table().with_columns(
         "English name for position", make_array("first", "second", blank_a, blank_b, "fifth"),
         "Index", make_array(blank_c, 1, 2, blank_d, 4),
         "Element", some_numbers)
     elements_of_some_numbers

[6]: English name for position | Index | Element
     first                     | 0     | -1
     second                    | 1     | -3
     third                     | 2     | -6
     fourth                    | 3     | -10
     fifth                     | 4     | -15
Question 3. You’ll sometimes want to find the last element of an array. Suppose an array has 142 elements. What is the index of its last element?

[7]: index_of_last_element = 141
More often, you don’t know the number of elements in an array, its length. (For example, it might be a large dataset you found on the Internet.) The function len takes a single argument, an array, and returns the length of that array (an integer).
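Combining len with item gives a general way to grab the last element of an array of unknown size; a small sketch with a stand-in array:

```python
import numpy as np

arr = np.arange(142)           # a stand-in array with 142 elements: 0, 1, ..., 141
print(len(arr))                # 142
print(arr.item(len(arr) - 1))  # the last element, at index 141
```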
Question 4. The cell below loads an array called president_birth_years. Calling .column(...) on a table returns an array of the column specified, in this case the Birth Year column of the president_births table. The last element in that array is the most recent birth year of any deceased president. Assign that year to most_recent_birth_year.
[8]: president_birth_years = Table.read_table("president_births.csv").column('Birth Year')
     most_recent_birth_year = president_birth_years.item(len(president_birth_years) - 1)
     most_recent_birth_year

[8]: 1917
Question 5. Finally, assign sum_of_birth_years to the sum of the first, tenth, and last birth year in president_birth_years.

[9]: sum_of_birth_years = (president_birth_years.item(0)
                           + president_birth_years.item(9)
                           + president_birth_years.item(len(president_birth_years) - 1))
     sum_of_birth_years

[9]: 5433
1.3 3. Basic Array Arithmetic

Question 1. Multiply the numbers 42, 4224, 42422424, and -250 by 157. Assign each variable below such that first_product is assigned to the result of 42 * 157, second_product is assigned to the result of 4224 * 157, and so on. For this question, don’t use arrays.
[10]: first_product = 42 * 157
      second_product = 4224 * 157
      third_product = 42422424 * 157
      fourth_product = -250 * 157
      print(first_product, second_product, third_product, fourth_product)

6594 663168 6660320568 -39250
Question 2. Now, do the same calculation, but using an array called numbers and only a single multiplication (*) operator. Store the 4 results in an array named products.

[11]: numbers = make_array(42, 4224, 42422424, -250)
      products = numbers * 157
      products

[11]: array([      6594,     663168, 6660320568,     -39250])
Question 3. Oops, we made a typo! Instead of 157, we wanted to multiply each number by 1577. Compute the correct products in the cell below using array arithmetic. Notice that your job is really easy if you previously defined an array containing the 4 numbers.

[12]: correct_products = numbers * 1577
      correct_products

[12]: array([      66234,     6661248, 66900162648,     -394250])
Question 4. We’ve loaded an array of temperatures in the next cell. Each number is the highest temperature observed on a day at a climate observation station, mostly from the US. Since they’re from the US government agency NOAA, all the temperatures are in Fahrenheit. Convert them all to Celsius by first subtracting 32 from them, then multiplying the results by 5/9. Make sure to ROUND the final result after converting to Celsius to the nearest integer using the np.round function.
[13]: max_temperatures = Table.read_table("temperatures.csv").column("Daily Max Temperature")

      celsius_max_temperatures = np.round((max_temperatures - 32) * 5 / 9)
      celsius_max_temperatures

[13]: array([-4., 31., 32., ..., 17., 23., 16.])
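The order of operations matters here: rounding must happen after the full conversion, not midway through. A quick check with made-up Fahrenheit values:

```python
import numpy as np

fahrenheit = np.array([25.0, 87.0, 89.0])      # hypothetical daily maxima, in Fahrenheit
celsius = np.round((fahrenheit - 32) * 5 / 9)  # convert completely, then round
print(celsius)                                 # [-4. 31. 32.]
```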
Question 5. The cell below loads all the lowest temperatures from each day (in Fahrenheit). Compute the size of the daily temperature range for each day. That is, compute the difference between each daily maximum temperature and the corresponding daily minimum temperature. Pay attention to the units; give your answer in Celsius! Make sure NOT to round your answer for this question!
[14]: min_temperatures = Table.read_table("temperatures.csv").column("Daily Min Temperature")

      celsius_temperature_ranges = (max_temperatures - 32) * 5 / 9 - (min_temperatures - 32) * 5 / 9
      celsius_temperature_ranges
[14]:
array([ 6.66666667, 10.
, 12.22222222, …, 17.22222222,
11.66666667, 11.11111111])
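Because both conversions subtract the same 32, the offsets cancel when you take the difference, so the range only needs the 5/9 scale factor. A sketch with made-up temperatures:

```python
import numpy as np

max_f = np.array([50.0, 86.0])   # made-up daily maxima, Fahrenheit
min_f = np.array([38.0, 68.0])   # made-up daily minima, Fahrenheit

full = (max_f - 32) * 5 / 9 - (min_f - 32) * 5 / 9
short = (max_f - min_f) * 5 / 9  # equivalent: the -32 offsets cancel
print(np.allclose(full, short))  # True
```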
1.4 4. World Population

The cell below loads a table of estimates of the world population for different years, starting in 1950. The estimates come from the US Census Bureau website.
[15]: world = Table.read_table("world_population.csv").select('Year', 'Population')
      world.show(4)

<IPython.core.display.HTML object>
The name population is assigned to an array of population estimates.

[16]: population = world.column(1)
      population

[16]:
array([2557628654, 2594939877, 2636772306, 2682053389, 2730228104,
2782098943, 2835299673, 2891349717, 2948137248, 3000716593,
3043001508, 3083966929, 3140093217, 3209827882, 3281201306,
3350425793, 3420677923, 3490333715, 3562313822, 3637159050,
3712697742, 3790326948, 3866568653, 3942096442, 4016608813,
4089083233, 4160185010, 4232084578, 4304105753, 4379013942,
4451362735, 4534410125, 4614566561, 4695736743, 4774569391,
4856462699, 4940571232, 5027200492, 5114557167, 5201440110,
5288955934, 5371585922, 5456136278, 5538268316, 5618682132,
5699202985, 5779440593, 5857972543, 5935213248, 6012074922,
6088571383, 6165219247, 6242016348, 6318590956, 6395699509,
6473044732, 6551263534, 6629913759, 6709049780, 6788214394,
6866332358, 6944055583, 7022349283, 7101027895, 7178722893,
7256490011])
In this question, you will apply some built-in NumPy functions to this array. NumPy is a module that is often used in data science!

The difference function np.diff subtracts each element in an array from the element after it within the array. As a result, the length of the array np.diff returns will always be one less than the length of the input array.

The cumulative sum function np.cumsum outputs an array of partial sums. For example, the third element in the output array corresponds to the sum of the first, second, and third elements.
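The two functions side by side, on a tiny made-up array:

```python
import numpy as np

a = np.array([2, 5, 11, 20])
print(np.diff(a))    # [3 6 9] -- consecutive differences, one element shorter than a
print(np.cumsum(a))  # [ 2  7 18 38] -- running totals, same length as a
```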
Question 1. Very often in data science, we are interested in understanding how values change with time. Use np.diff and np.max (or just max) to calculate the largest annual change in population between any two consecutive years.
[17]: largest_population_change = np.max(np.diff(population))
      largest_population_change

[17]: 87515824
Question 2. What do the values in the resulting array represent (choose one)?

[18]: np.cumsum(np.diff(population))
[18]: array([  37311223,   79143652,  124424735,  172599450,  224470289,
              277671019,  333721063,  390508594,  443087939,  485372854,
              526338275,  582464563,  652199228,  723572652,  792797139,
              863049269,  932705061, 1004685168, 1079530396, 1155069088,
1232698294, 1308939999, 1384467788, 1458980159, 1531454579,
1602556356, 1674455924, 1746477099, 1821385288, 1893734081,
1976781471, 2056937907, 2138108089, 2216940737, 2298834045,
2382942578, 2469571838, 2556928513, 2643811456, 2731327280,
2813957268, 2898507624, 2980639662, 3061053478, 3141574331,
3221811939, 3300343889, 3377584594, 3454446268, 3530942729,
3607590593, 3684387694, 3760962302, 3838070855, 3915416078,
3993634880, 4072285105, 4151421126, 4230585740, 4308703704,
4386426929, 4464720629, 4543399241, 4621094239, 4698861357])
1) The total population change between consecutive years, starting at 1951.
2) The total population change between 1950 and each later year, starting at 1951.
3) The total population change between 1950 and each later year, starting inclusively at 1950.
[19]: # Assign cumulative_sum_answer to 1, 2, or 3
      cumulative_sum_answer = 2
1.5 5. Old Faithful

Old Faithful is a geyser in Yellowstone that erupts every 44 to 125 minutes (according to Wikipedia). People are often told that the geyser erupts every hour, but in fact the waiting time between eruptions is more variable. Let’s take a look.
Question 1. The first line below assigns waiting_times to an array of 272 consecutive waiting times between eruptions, taken from a classic 1938 dataset. Assign the names shortest, longest, and average so that the print statement is correct.
[20]: waiting_times = Table.read_table('old_faithful.csv').column('waiting')

      shortest = np.min(waiting_times)
      longest = np.max(waiting_times)
      average = np.mean(waiting_times)
      print("Old Faithful erupts every", shortest, "to", longest, "minutes and every", average, "minutes on average.")

Old Faithful erupts every 43 to 96 minutes and every 70.8970588235294 minutes on average.
Question 2. Assign biggest_decrease to the biggest decrease in waiting time between two consecutive eruptions. For example, the third eruption occurred after 74 minutes and the fourth after 62 minutes, so the decrease in waiting time was 74 - 62 = 12 minutes.

Hint 1: You’ll need an array arithmetic function mentioned in the textbook. You have also seen this function earlier in the homework!

Hint 2: We want to return the absolute value of the biggest decrease.
[21]: biggest_decrease = abs(np.min(np.diff(waiting_times)))
      biggest_decrease

[21]: 45
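The same pattern on a toy array (made-up waiting times): the biggest decrease is the most negative consecutive difference, made positive.

```python
import numpy as np

times = np.array([74, 62, 85, 55])   # hypothetical waiting times, in minutes
changes = np.diff(times)             # [-12  23 -30]
biggest_drop = abs(np.min(changes))  # most negative change, as a positive number
print(biggest_drop)                  # 30
```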
Question 3. If you expected Old Faithful to erupt every hour, you would expect to wait a total of 60 * k minutes to see k eruptions. Set difference_from_expected to an array with 272 elements, where the element at index i is the absolute difference between the expected and actual total amount of waiting time to see the first i+1 eruptions.

Hint: You’ll need to compare a cumulative sum to a range. You’ll go through np.arange more thoroughly in Lab 3, but you can read about it in this textbook section.

For example, since the first three waiting times are 79, 54, and 74, the total waiting time for 3 eruptions is 79 + 54 + 74 = 207. The expected waiting time for 3 eruptions is 60 * 3 = 180. Therefore, difference_from_expected.item(2) should be |207 - 180| = 27.
[22]: difference_from_expected = abs(np.cumsum(waiting_times) - 60 * np.arange(1, 273))
      difference_from_expected

[22]: array([  19,   13,   27, ..., 2964, 2950, 2964])
Question 4. Let’s imagine your guess for the next wait time was always just the length of the previous waiting time. If you always guessed the previous waiting time, how big would your error in guessing the waiting times be, on average?

For example, since the first three waiting times are 79, 54, and 74, the average difference between your guess and the actual time for just the second and third eruption would be (|79 - 54| + |54 - 74|) / 2 = 22.5.
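The worked example above, checked in plain NumPy:

```python
import numpy as np

times = np.array([79, 54, 74])  # the first three waiting times from the example
avg_err = np.mean(np.abs(np.diff(times)))
print(avg_err)  # 22.5
```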
[23]: average_error = np.mean(np.abs(np.diff(waiting_times)))
      # average_error
1.6 6. Tables

Question 1. Suppose you have 4 apples, 3 oranges, and 3 pineapples. (Perhaps you’re using Python to solve a high school Algebra problem.) Create a table that contains this information. It should have two columns: fruit name and count. Assign the new table to the variable fruits.

Note: Use lower-case and singular words for the name of each fruit, like "apple".
[24]: # Our solution uses 1 statement split over 3 lines.
      fruit_names = make_array('apple', 'orange', 'pineapple')
      fruit_counts = make_array(4, 3, 3)
      fruits = Table().with_columns(
          'fruit name', fruit_names,
          'count', fruit_counts)
      fruits
      # fruits

[24]: fruit name | count
      apple      | 4
      orange     | 3
      pineapple  | 3
Question 2. The file inventory.csv contains information about the inventory at a fruit stand. Each row represents the contents of one box of fruit. Load it as a table named inventory using the Table.read_table() function. Table.read_table(...) takes one argument (a data file name in string format) and returns a table.
[25]: inventory = Table.read_table('inventory.csv')
      inventory
      # inventory

[25]: box ID | fruit name | count
      53686  | kiwi       | 45
      57181  | strawberry | 123
      25274  | apple      | 20
      48800  | orange     | 35
      26187  | strawberry | 255
      57930  | grape      | 517
      52357  | strawberry | 102
      43566  | peach      | 40
Question 3. Does each box at the fruit stand contain a different fruit? Set all_different to True if each box contains a different fruit or to False if multiple boxes contain the same fruit.

Hint: You don’t have to write code to calculate the True/False value for all_different. Just look at the inventory table and assign all_different to either True or False according to what you can see from the table in answering the question.

[26]: all_different = False
      # all_different
Question 4. The file sales.csv contains the number of fruit sold from each box last Saturday. It has an extra column called “price per fruit ($)” that’s the price per item of fruit for fruit in that box. The rows are in the same order as the inventory table. Load these data into a table called sales.
[27]: sales = Table.read_table('sales.csv')
      sales
      # sales

[27]: box ID | fruit name | count sold | price per fruit ($)
      53686  | kiwi       | 3          | 0.5
      57181  | strawberry | 101        | 0.2
      25274  | apple      | 0          | 0.8
      48800  | orange     | 35         | 0.6
      26187  | strawberry | 25         | 0.15
      57930  | grape      | 355        | 0.06
      52357  | strawberry | 102        | 0.25
      43566  | peach      | 17         | 0.8
Question 5. How many fruits did the store sell in total on that day?

[28]: total_fruits_sold = sum(sales.column(2))
      total_fruits_sold
      # total_fruits_sold

[28]: 638
Question 6. What was the store’s total revenue (the total price of all fruits sold) on that day?

Hint: If you’re stuck, think first about how you would compute the total revenue from just the grape sales.
[29]: count = sales.column(2)
      price = sales.column(3)
      total_revenue = sum(count * price)
      total_revenue

[29]: 106.85
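Elementwise multiply-then-sum is exactly a dot product; a sketch with two hypothetical boxes (counts and prices are made up):

```python
import numpy as np

count = np.array([3, 35])      # hypothetical counts sold
price = np.array([0.5, 0.25])  # hypothetical prices per fruit ($)

print(np.sum(count * price))   # 10.25
print(np.dot(count, price))    # same result, written as a dot product
```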
Question 7. Make a new table called remaining_inventory. It should have the same rows and columns as inventory, except that the amount of fruit sold from each box should be subtracted from that box’s count, so that the “count” is the amount of fruit remaining after Saturday.
[30]: remaining_inventory = Table().with_columns(
          'box ID', inventory.column('box ID'),
          'fruit name', inventory.column('fruit name'),
          'count', inventory.column('count') - sales.column('count sold'))
      remaining_inventory

[30]: box ID | fruit name | count
      53686  | kiwi       | 42
      57181  | strawberry | 22
      25274  | apple      | 20
      48800  | orange     | 0
      26187  | strawberry | 230
      57930  | grape      | 162
      52357  | strawberry | 0
      43566  | peach      | 23
1.7 7. Unemployment

The Federal Reserve Bank of St. Louis publishes data about jobs in the US. Below, we’ve loaded data on unemployment in the United States. There are many ways of defining unemployment, and our dataset includes two notions of the unemployment rate:

1. Among people who are able to work and are looking for a full-time job, the percentage who can’t find a job. This is called the Non-Employment Index, or NEI.
2. Among people who are able to work and are looking for a full-time job, the percentage who can’t find any job or are only working at a part-time job. The latter group is called “Part-Time for Economic Reasons”, so the acronym for this index is NEI-PTER. (Economists are great at marketing.)

The source of the data is here.
Question 1. The data are in a CSV file called unemployment.csv. Load that file into a table called unemployment.
[31]: unemployment = Table.read_table('unemployment.csv')
      unemployment

[31]: Date       | NEI     | NEI-PTER
      1994-01-01 | 10.0974 | 11.172
      1994-04-01 | 9.6239  | 10.7883
      1994-07-01 | 9.3276  | 10.4831
      1994-10-01 | 9.1071  | 10.2361
      1995-01-01 | 8.9693  | 10.1832
      1995-04-01 | 9.0314  | 10.1071
      1995-07-01 | 8.9802  | 10.1084
      1995-10-01 | 8.9932  | 10.1046
      1996-01-01 | 9.0002  | 10.0531
      1996-04-01 | 8.9038  | 9.9782
      … (80 rows omitted)
Question 2. Sort the data in descending order by NEI, naming the sorted table by_nei. Create another table called by_nei_pter that’s sorted in descending order by NEI-PTER instead.
[32]: by_nei = unemployment.sort('NEI', descending=True)
      by_nei_pter = unemployment.sort('NEI-PTER', descending=True)
      by_nei

[32]: Date       | NEI     | NEI-PTER
2009-10-01 | 10.9698 | 12.8557
2010-01-01 | 10.9054 | 12.7311
2009-07-01 | 10.8089 | 12.7404
2009-04-01 | 10.7082 | 12.5497
2010-04-01 | 10.6597 | 12.5664
2010-10-01 | 10.5856 | 12.4329
2010-07-01 | 10.5521 | 12.3897
2011-01-01 | 10.5024 | 12.3017
2011-07-01 | 10.4856 | 12.2507
2011-04-01 | 10.4409 | 12.247
… (80 rows omitted)
Question 3. Use take to make a table containing the data for the 10 quarters when NEI was greatest. Call that table greatest_nei. greatest_nei should be sorted in descending order of NEI. Note that each row of unemployment represents a quarter.
[33]: greatest_nei = by_nei.take(np.arange(10))
      greatest_nei

[33]: Date       | NEI     | NEI-PTER
2009-10-01 | 10.9698 | 12.8557
2010-01-01 | 10.9054 | 12.7311
2009-07-01 | 10.8089 | 12.7404
2009-04-01 | 10.7082 | 12.5497
2010-04-01 | 10.6597 | 12.5664
2010-10-01 | 10.5856 | 12.4329
2010-07-01 | 10.5521 | 12.3897
2011-01-01 | 10.5024 | 12.3017
2011-07-01 | 10.4856 | 12.2507
2011-04-01 | 10.4409 | 12.247
Question 4. It’s believed that many people became PTER (recall: “Part-Time for Economic Reasons”) in the “Great Recession” of 2008-2009. NEI-PTER is the percentage of people who are unemployed (and counted in the NEI) plus the percentage of people who are PTER. Compute an array containing the percentage of people who were PTER in each quarter. (The first element of the array should correspond to the first row of unemployment, and so on.)

Note: Use the original unemployment table for this.
[34]: pter = unemployment.column('NEI-PTER') - unemployment.column('NEI')
      pter
[34]:
array([1.0746, 1.1644, 1.1555, 1.129 , 1.2139, 1.0757, 1.1282, 1.1114,
1.0529, 1.0744, 1.1004, 1.0747, 1.0705, 1.0455, 1.008 , 0.9734,
0.9753, 0.8931, 0.9451, 0.8367, 0.8208, 0.8105, 0.8248, 0.7578,
0.7251, 0.7445, 0.7543, 0.7423, 0.7399, 0.7687, 0.8418, 0.9923,
0.9181, 0.9629, 0.9703, 0.9575, 1.0333, 1.0781, 1.0675, 1.0354,
1.0601, 1.01  , 1.0042, 1.0368, 0.9704, 0.923 , 0.9759, 0.93  ,
0.889 , 0.821 , 0.9409, 0.955 , 0.898 , 0.8948, 0.9523, 0.9579,
1.0149, 1.0762, 1.2873, 1.4335, 1.7446, 1.8415, 1.9315, 1.8859,
1.8257, 1.9067, 1.8376, 1.8473, 1.7993, 1.8061, 1.7651, 1.7927,
1.7286, 1.6387, 1.6808, 1.6805, 1.6629, 1.6253, 1.6477, 1.6298,
1.4796, 1.5131, 1.4866, 1.4345, 1.3675, 1.3097, 1.2319, 1.1735,
1.1844, 1.1746])
Question 5. Add pter as a column to unemployment (named “PTER”) and sort the resulting table by that column in descending order. Call the table by_pter. Try to do this with a single line of code, if you can.
[35]: by_pter = unemployment.with_columns('PTER', pter).sort('PTER', descending=True)
      by_pter

[35]: Date       | NEI     | NEI-PTER | PTER
      2009-07-01 | 10.8089 | 12.7404  | 1.9315
      2010-04-01 | 10.6597 | 12.5664  | 1.9067
      2009-10-01 | 10.9698 | 12.8557  | 1.8859
      2010-10-01 | 10.5856 | 12.4329  | 1.8473
      2009-04-01 | 10.7082 | 12.5497  | 1.8415
      2010-07-01 | 10.5521 | 12.3897  | 1.8376
      2010-01-01 | 10.9054 | 12.7311  | 1.8257
      2011-04-01 | 10.4409 | 12.247   | 1.8061
      2011-01-01 | 10.5024 | 12.3017  | 1.7993
      2011-10-01 | 10.3287 | 12.1214  | 1.7927
      … (80 rows omitted)
Question 6. Create a line plot of the PTER over time.

To do this, create a new table called pter_over_time that adds the year array and the pter array to the unemployment table. Label these columns Year and PTER. Then, generate a line plot using one of the table methods you’ve learned in class.
[36]: year = 1994 + np.arange(unemployment.num_rows) / 4
      pter_over_time = unemployment.with_columns('Year', year, 'PTER', pter)
      pter_over_time.plot('Year', 'PTER')
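The Year axis is built from quarter numbers: each row is one quarter, so dividing np.arange by 4 steps the year in 0.25 increments. A sketch of the first few values (the 1994 start comes from the data above):

```python
import numpy as np

year = 1994 + np.arange(8) / 4  # the first 8 quarters, starting in 1994
print(year)  # [1994.   1994.25 1994.5  1994.75 1995.   1995.25 1995.5  1995.75]
```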
Question 7. Were PTER rates high during the Great Recession (that is to say, were PTER rates particularly high in the years 2008 through 2011)? Assign highPTER to True if you think PTER rates were high in this period, and False if you think they weren’t.
[37]: highPTER = True
1.8 8. Birth Rates

The following table gives census-based population estimates for each state on both July 1, 2015 and July 1, 2016. The last four columns describe the components of the estimated change in population during this time interval. For all questions below, assume that the word “states” refers to all 52 rows including Puerto Rico & the District of Columbia.

The data was taken from here. If you want to read more about the different column descriptions, click here!

The raw data is a bit messy - run the cell below to clean the table and make it easier to work with.
[38]: # Don't change this cell; just run it.
      pop = Table.read_table('nst-est2016-alldata.csv').where('SUMLEV', 40).select([1, 4, 12, 13, 27, 34, 62, 69])
      pop = pop.relabeled('POPESTIMATE2015', '2015').relabeled('POPESTIMATE2016', '2016')
      pop = pop.relabeled('BIRTHS2016', 'BIRTHS').relabeled('DEATHS2016', 'DEATHS')
      pop = pop.relabeled('NETMIG2016', 'MIGRATION').relabeled('RESIDUAL2016', 'OTHER')
      pop = pop.with_columns("REGION", np.array([int(region) if region != "X" else 0
                                                 for region in pop.column("REGION")]))
      pop.set_format([2, 3, 4, 5, 6, 7], NumberFormatter(decimals=0)).show(5)

<IPython.core.display.HTML object>
Question 1. Assign us_birth_rate to the total US annual birth rate during this time interval. The annual birth rate for a year-long period is the total number of births in that period as a proportion of the population size at the start of the time period.

Hint: Which year corresponds to the start of the time period?
[39]: us_birth_rate = sum(pop.column('BIRTHS')) / sum(pop.column('2015'))
      us_birth_rate

[39]: 0.012358536498646102
Question 2. In the next question, you will be creating a visualization to understand the relationship between birth and death rates. The annual death rate for a year-long period is the total number of deaths in that period as a proportion of the population size at the start of the time period.

What visualization is most appropriate to see if there is an association between birth and death rates during a given time interval?

1. Line Graph
2. Scatter Plot
3. Bar Chart

Assign visualization below to the number corresponding to the correct visualization.
[40]: visualization = 2
Question 3. In the code cell below, create a visualization that will help us determine if there is an association between birth rate and death rate during this time interval. It may be helpful to create an intermediate table here. The birth rate for each region is the total number of births in that region as a proportion of the population size at the start of the time period. The death rate for each region is the total number of deaths in that region as a proportion of the population size at the start of the time period.
[41]: # Generate your chart in this cell
      birth_rates = pop.column('BIRTHS') / pop.column('2015')
      death_rates = pop.column('DEATHS') / pop.column('2015')
      rates = Table().with_columns('Birth Rate', birth_rates, 'Death Rate', death_rates)
      rates.scatter('Birth Rate', 'Death Rate')
Question 7. True or False: There is an association between birth rate and death rate during this time interval. Assign assoc to True or False in the cell below.
[42]: assoc = True
1.9 9. Uber

Below we load tables containing 200,000 weekday Uber rides in the Manila, Philippines, and Boston, Massachusetts metropolitan areas from the Uber Movement project. The sourceid and dstid columns contain codes corresponding to start and end locations of each ride. The hod column contains codes corresponding to the hour of the day the ride took place. The ride time column contains the length of the ride, in minutes.
[43]: boston = Table.read_table("boston.csv")
      manila = Table.read_table("manila.csv")
      print("Boston Table")
      boston.show(4)
      print("Manila Table")
      manila.show(4)

Boston Table
<IPython.core.display.HTML object>
Manila Table
<IPython.core.display.HTML object>
Question 1. Produce histograms of all ride times in Boston using the given bins.

[44]: equal_bins = np.arange(0, 120, 5)
      boston.hist('ride time', bins=equal_bins)
Question 2. Now, produce histograms of all ride times in Manila using the given bins.

[45]: equal_bins = np.arange(0, 120, 5)
      manila.hist('ride time', bins=equal_bins)

      # Don't delete the following line!
      plots.ylim(0, 0.05)

[45]: (0.0, 0.05)
Question 3. Assign boston_under_10 and manila_under_10 to the percentage of rides that are less than 10 minutes in their respective metropolitan areas. Use the height variables provided below in order to compute the percentages. Your solution should only use height variables, numbers, and mathematical operations. You should not access the tables boston and manila in any way.
[46]: boston_under_5_height = 1.2
      manila_under_5_height = 0.6
      boston_5_to_under_10_height = 3.2
      manila_5_to_under_10_height = 1.4

      boston_under_10 = (boston_under_5_height + boston_5_to_under_10_height) * 5
      manila_under_10 = (manila_under_5_height + manila_5_to_under_10_height) * 5
      boston_under_10, manila_under_10

[46]: (22.0, 10.0)
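In a density-scaled histogram, the percent of data in a bar is its height (percent per minute) times its width (minutes). A quick check with the two Boston bar heights given above and the 5-minute bins:

```python
import numpy as np

heights = np.array([1.2, 3.2])  # percent per minute, for the two bars under 10 minutes
bin_width = 5                   # each bin spans 5 minutes
percent_under_10 = np.sum(heights * bin_width)
print(percent_under_10)         # about 22 percent
```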
Question 4. Let’s take a closer look at the distribution of ride times in Manila. Assign manila_median_bin to an integer (1, 2, 3, or 4) that corresponds to the bin that contains the median time.

1: 0-15 minutes
2: 15-40 minutes
3: 40-60 minutes
4: 60-80 minutes

Hint: The median of a sorted list has half of the list elements to its left, and half to its right.
[47]: manila.hist('ride time', bins=equal_bins)

      # Don't delete the following line!
      plots.ylim(0, 0.05)

      manila_median_bin = 2
Question 5. What is the main difference between the two histograms? What might be causing this?

Hint: Try thinking about external factors that may be causing the difference!

Boston has a lot more rides in the 15-30 minute range, whereas Manila has many rides well outside of that range.
[ ]:
[ ]: