hw02_revised

February 4, 2024

1 Homework 2: Arrays, Table Manipulation, and Visualization

[1]: # Don't change this cell; just run it.
     # When you log-in please hit return (not shift + return) after typing in your email
     import numpy as np
     from datascience import *

     # These lines do some fancy plotting magic.
     import matplotlib
     %matplotlib inline
     import matplotlib.pyplot as plots
     plots.style.use('fivethirtyeight')

Recommended Reading:
  - Data Types
  - Sequences
  - Tables

Please complete this notebook by filling in the cells provided. Throughout this homework and all future ones, please be sure to not re-assign variables throughout the notebook! For example, if you use max_temperature in your answer to one question, do not reassign it later on.

Before continuing the assignment, select "Save and Checkpoint" in the File menu.

1.1 1. Creating Arrays

Question 1. Make an array called weird_numbers containing the following numbers (in the given order):

1. -2
2. the sine of 1.2
3. 3
4. 5 to the power of the cosine of 1.2

Hint: sin and cos are functions in the math module.

Note: Python lists are different/behave differently than numpy arrays. In Data 8, we use numpy arrays, so please make an array, not a Python list.
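As a quick aside on that note, here is a toy sketch (not part of the assignment) of why the distinction matters: arithmetic on a numpy array is elementwise, while the same operator on a Python list just repeats the list.

    make_array(1, 2, 3) * 2   # array([2, 4, 6]) -- elementwise multiplication
    [1, 2, 3] * 2             # [1, 2, 3, 1, 2, 3] -- list repetition, not arithmetic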
[2]: # Our solution involved one extra line of code before creating
     # weird_numbers.
     import math
     weird_numbers = make_array(-2, math.sin(1.2), 3, 5 ** math.cos(1.2))
     weird_numbers

[2]: array([-2.        ,  0.93203909,  3.        ,  1.79174913])

Question 2. Make an array called numbers_in_order using the np.sort function.

[3]: numbers_in_order = np.sort(weird_numbers)
     numbers_in_order

[3]: array([-2.        ,  0.93203909,  1.79174913,  3.        ])

Question 3. Find the mean and median of weird_numbers using the np.mean and np.median functions.

[4]: weird_mean = np.mean(weird_numbers)
     weird_median = np.median(weird_numbers)

     # These lines are provided just to print out your answers.
     print('weird_mean:', weird_mean)
     print('weird_median:', weird_median)

weird_mean: 0.930947052910613
weird_median: 1.361894105821226

1.2 2. Indexing Arrays

These exercises give you practice accessing individual elements of arrays. In Python (and in many programming languages), elements are accessed by index, so the first element is the element at index 0.

Note: Please don't use bracket notation when indexing (i.e. arr[0]), as this can yield different data type outputs than what we will be expecting.

Question 1. The cell below creates an array of some numbers. Set third_element to the third element of some_numbers.

[5]: some_numbers = make_array(-1, -3, -6, -10, -15)
     third_element = some_numbers.item(2)
     third_element

[5]: -6
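The .item method used above is also what the note about bracket notation is getting at: .item returns a plain Python value, while bracket indexing returns a numpy scalar type. A small sketch (not part of the assignment):

    some_numbers.item(0)   # -1, a plain Python int
    some_numbers[0]        # also -1, but as a numpy integer type (np.int64 on most platforms)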
Question 2. The next cell creates a table that displays some information about the elements of some_numbers and their order. Run the cell to see the partially-completed table, then fill in the missing information (the cells that say "Ellipsis") by assigning blank_a, blank_b, blank_c, and blank_d to the correct elements in the table.

[6]: blank_a = 'third'
     blank_b = 'fourth'
     blank_c = 0
     blank_d = 3
     elements_of_some_numbers = Table().with_columns(
         "English name for position",
         make_array("first", "second", blank_a, blank_b, "fifth"),
         "Index", make_array(blank_c, 1, 2, blank_d, 4),
         "Element", some_numbers)
     elements_of_some_numbers

[6]: English name for position | Index | Element
     first                      | 0     | -1
     second                     | 1     | -3
     third                      | 2     | -6
     fourth                     | 3     | -10
     fifth                      | 4     | -15

Question 3. You'll sometimes want to find the last element of an array. Suppose an array has 142 elements. What is the index of its last element?

[7]: index_of_last_element = 141

More often, you don't know the number of elements in an array, its length. (For example, it might be a large dataset you found on the Internet.) The function len takes a single argument, an array, and returns the length of that array (an integer).

Question 4. The cell below loads an array called president_birth_years. Calling .column(...) on a table returns an array of the column specified, in this case the Birth Year column of the president_births table. The last element in that array is the most recent birth year of any deceased president. Assign that year to most_recent_birth_year.

[8]: president_birth_years = Table.read_table("president_births.csv").column('Birth Year')
     most_recent_birth_year = president_birth_years.item(37)
     most_recent_birth_year

[8]: 1917

Question 5. Finally, assign sum_of_birth_years to the sum of the first, tenth, and last birth year in president_birth_years.

[9]: sum_of_birth_years = (president_birth_years.item(0) + president_birth_years.item(9)
                           + president_birth_years.item(37))
     sum_of_birth_years

[9]: 5433
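Since len was introduced above, here is a sketch (not part of the graded answer) of how the last element could be pulled out without hardcoding index 37, so the same line keeps working if the dataset grows:

    # len gives the number of elements; the last index is one less than that.
    president_birth_years.item(len(president_birth_years) - 1)   # 1917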
1.3 3. Basic Array Arithmetic

Question 1. Multiply the numbers 42, 4224, 42422424, and -250 by 157. Assign each variable below such that first_product is assigned to the result of 42 * 157, second_product is assigned to the result of 4224 * 157, and so on. For this question, don't use arrays.

[10]: first_product = 42 * 157
      second_product = 4224 * 157
      third_product = 42422424 * 157
      fourth_product = -250 * 157
      print(first_product, second_product, third_product, fourth_product)

6594 663168 6660320568 -39250

Question 2. Now, do the same calculation, but using an array called numbers and only a single multiplication (*) operator. Store the 4 results in an array named products.

[11]: numbers = make_array(42, 4224, 42422424, -250)
      products = numbers * 157
      products

[11]: array([      6594,     663168, 6660320568,     -39250])

Question 3. Oops, we made a typo! Instead of 157, we wanted to multiply each number by 1577. Compute the correct products in the cell below using array arithmetic. Notice that your job is really easy if you previously defined an array containing the 4 numbers.

[12]: correct_products = numbers * 1577
      correct_products

[12]: array([      66234,     6661248, 66900162648,     -394250])

Question 4. We've loaded an array of temperatures in the next cell. Each number is the highest temperature observed on a day at a climate observation station, mostly from the US. Since they're from the US government agency NOAA, all the temperatures are in Fahrenheit. Convert them all to Celsius by first subtracting 32 from them, then multiplying the results by 5/9. Make sure to ROUND the final result after converting to Celsius to the nearest integer using the np.round function.

[13]: max_temperatures = Table.read_table("temperatures.csv").column("Daily Max Temperature")
      celsius_max_temperatures = np.round((max_temperatures - 32) * 5 / 9)
      celsius_max_temperatures
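A quick way to sanity-check the conversion above (a sketch, not part of the assignment) is to run it on temperatures whose Celsius values are known, such as the freezing and boiling points of water:

    np.round((make_array(32, 212) - 32) * 5 / 9)   # array([  0., 100.])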
Question 5. The cell below loads all the lowest temperatures from each day (in Fahrenheit). Compute the size of the daily temperature range for each day. That is, compute the difference between each daily maximum temperature and the corresponding daily minimum temperature. Pay attention to the units, give your answer in Celsius! Make sure NOT to round your answer for this question!

[14]: min_temperatures = Table.read_table("temperatures.csv").column("Daily Min Temperature")
      celsius_temperature_ranges = (max_temperatures - 32) * 5/9 - (min_temperatures - 32) * 5/9
      celsius_temperature_ranges

[14]: array([ 6.66666667, 10.        , 12.22222222, …, 17.22222222, 11.66666667,
             11.11111111])

1.4 4. World Population

The cell below loads a table of estimates of the world population for different years, starting in 1950. The estimates come from the US Census Bureau website.

[15]: world = Table.read_table("world_population.csv").select('Year', 'Population')
      world.show(4)

The name population is assigned to an array of population estimates.

[16]: population = world.column(1)
      population

[16]: array([2557628654, 2594939877, 2636772306, 2682053389, 2730228104,
             2782098943, 2835299673, 2891349717, 2948137248, 3000716593,
             3043001508, 3083966929, 3140093217, 3209827882, 3281201306,
             3350425793, 3420677923, 3490333715, 3562313822, 3637159050,
             3712697742, 3790326948, 3866568653, 3942096442, 4016608813,
             4089083233, 4160185010, 4232084578, 4304105753, 4379013942,
             4451362735, 4534410125, 4614566561, 4695736743, 4774569391,
             4856462699, 4940571232, 5027200492, 5114557167, 5201440110,
             5288955934, 5371585922, 5456136278, 5538268316, 5618682132,
             5699202985, 5779440593, 5857972543, 5935213248, 6012074922,
             6088571383, 6165219247, 6242016348, 6318590956, 6395699509,
             6473044732, 6551263534, 6629913759, 6709049780, 6788214394,
             6866332358, 6944055583, 7022349283, 7101027895, 7178722893,
             7256490011])

In this question, you will apply some built-in Numpy functions to this array. Numpy is a module that is often used in Data Science!
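Before they are applied to population below, here is a toy sketch of the two Numpy functions this question relies on (the small array is made up, not part of the dataset):

    toy = make_array(2, 5, 9, 4)
    np.diff(toy)     # array([ 3,  4, -5]) -- change from each element to the next
    np.cumsum(toy)   # array([ 2,  7, 16, 20]) -- running totals of the elements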
The difference function np.diff subtracts each element in an array from the element after it within the array. As a result, the length of the array np.diff returns will always be one less than the length of the input array.

The cumulative sum function np.cumsum outputs an array of partial sums. For example, the third element in the output array corresponds to the sum of the first, second, and third elements.

Question 1. Very often in data science, we are interested in understanding how values change with time. Use np.diff and np.max (or just max) to calculate the largest annual change in population between any two consecutive years.

[17]: largest_population_change = np.max(np.diff(population))
      largest_population_change

[17]: 87515824

Question 2. What do the values in the resulting array represent (choose one)?

[18]: np.cumsum(np.diff(population))

[18]: array([  37311223,   79143652,  124424735,  172599450,  224470289,
              277671019,  333721063,  390508594,  443087939,  485372854,
              526338275,  582464563,  652199228,  723572652,  792797139,
              863049269,  932705061, 1004685168, 1079530396, 1155069088,
             1232698294, 1308939999, 1384467788, 1458980159, 1531454579,
             1602556356, 1674455924, 1746477099, 1821385288, 1893734081,
             1976781471, 2056937907, 2138108089, 2216940737, 2298834045,
             2382942578, 2469571838, 2556928513, 2643811456, 2731327280,
             2813957268, 2898507624, 2980639662, 3061053478, 3141574331,
             3221811939, 3300343889, 3377584594, 3454446268, 3530942729,
             3607590593, 3684387694, 3760962302, 3838070855, 3915416078,
             3993634880, 4072285105, 4151421126, 4230585740, 4308703704,
             4386426929, 4464720629, 4543399241, 4621094239, 4698861357])

1) The total population change between consecutive years, starting at 1951.
2) The total population change between 1950 and each later year, starting at 1951.
3) The total population change between 1950 and each later year, starting inclusively at 1950.

[19]: # Assign cumulative_sum_answer to 1, 2, or 3
      cumulative_sum_answer = 2

1.5 5. Old Faithful

Old Faithful is a geyser in Yellowstone that erupts every 44 to 125 minutes (according to Wikipedia). People are often told that the geyser erupts every hour, but in fact the waiting time between eruptions is more variable. Let's take a look.
Question 1. The first line below assigns waiting_times to an array of 272 consecutive waiting times between eruptions, taken from a classic 1938 dataset. Assign the names shortest, longest, and average so that the print statement is correct.

[20]: waiting_times = Table.read_table('old_faithful.csv').column('waiting')
      shortest = np.min(waiting_times)
      longest = np.max(waiting_times)
      average = np.mean(waiting_times)
      print("Old Faithful erupts every", shortest, "to", longest,
            "minutes and every", average, "minutes on average.")

Question 2. Assign biggest_decrease to the biggest decrease in waiting time between two consecutive eruptions. For example, the third eruption occurred after 74 minutes and the fourth after 62 minutes, so the decrease in waiting time was 74 - 62 = 12 minutes.

Hint 1: You'll need an array arithmetic function mentioned in the textbook. You have also seen this function earlier in the homework!

Hint 2: We want to return the absolute value of the biggest decrease.

[21]: biggest_decrease = abs(np.min(np.diff(waiting_times)))
      biggest_decrease

Question 3. If you expected Old Faithful to erupt every hour, you would expect to wait a total of 60 * k minutes to see k eruptions. Set difference_from_expected to an array with 272 elements, where the element at index i is the absolute difference between the expected and actual total amount of waiting time to see the first i+1 eruptions.

Hint: You'll need to compare a cumulative sum to a range. You'll go through np.arange more thoroughly in Lab 3, but you can read about it in this textbook section.

For example, since the first three waiting times are 79, 54, and 74, the total waiting time for 3 eruptions is 79 + 54 + 74 = 207. The expected waiting time for 3 eruptions is 60 * 3 = 180. Therefore, difference_from_expected.item(2) should be |207 - 180| = 27.

[22]: difference_from_expected = abs(np.cumsum(waiting_times) - 60 * np.arange(1, len(waiting_times) + 1))
      difference_from_expected
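As a quick check of that approach (a sketch, not part of the graded answer), applying it to the three waiting times quoted in the example gives 19, 13, and 27, and the last value matches the |207 - 180| = 27 worked out above:

    # Cumulative waits: 79, 133, 207; expected waits: 60, 120, 180.
    abs(np.cumsum(make_array(79, 54, 74)) - 60 * np.arange(1, 4))   # array([19, 13, 27])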
Question 4. Let's imagine your guess for the next wait time was always just the length of the previous waiting time. If you always guessed the previous waiting time, how big would your error in guessing the waiting times be, on average?

For example, since the first three waiting times are 79, 54, and 74, the average difference between your guess and the actual time for just the second and third eruption would be (|79 - 54| + |54 - 74|) / 2 = 22.5.

[23]: average_error = ...
      # average_error

1.6 6. Tables

Question 1. Suppose you have 4 apples, 3 oranges, and 3 pineapples. (Perhaps you're using Python to solve a high school Algebra problem.) Create a table that contains this information. It should have two columns: fruit name and count. Assign the new table to the variable fruits.

Note: Use lower-case and singular words for the name of each fruit, like "apple".

[24]: # Our solution uses 1 statement split over 3 lines.
      fruit_names = make_array('apple', 'orange', 'pineapple')
      fruit_count = make_array(4, 3, 3)
      fruits = Table().with_columns('fruit name', fruit_names, 'count', fruit_count)
      fruits
      # fruits

[24]: fruit name | count
      apple      | 4
      orange     | 3
      pineapple  | 3

Question 2. The file inventory.csv contains information about the inventory at a fruit stand. Each row represents the contents of one box of fruit. Load it as a table named inventory using the Table.read_table() function. Table.read_table(...) takes one argument (data file name in string format) and returns a table.

[25]: inventory = Table.read_table('inventory.csv')
      inventory
      # inventory
[25]: box ID | fruit name  | count
      53686  | kiwi        | 45
      57181  | strawberry  | 123
      25274  | apple       | 20
      48800  | orange      | 35
      26187  | strawberry  | 255
      57930  | grape       | 517
      52357  | strawberry  | 102
      43566  | peach       | 40

Question 3. Does each box at the fruit stand contain a different fruit? Set all_different to True if each box contains a different fruit or to False if multiple boxes contain the same fruit.

Hint: You don't have to write code to calculate the True/False value for all_different. Just look at the inventory table and assign all_different to either True or False according to what you can see from the table in answering the question.

[26]: all_different = False
      # all_different

Question 4. The file sales.csv contains the number of fruit sold from each box last Saturday. It has an extra column called "price per fruit ($)" that's the price per item of fruit for fruit in that box. The rows are in the same order as the inventory table. Load these data into a table called sales.

[27]: sales = Table.read_table('sales.csv')
      sales
      # sales

[27]: box ID | fruit name  | count sold | price per fruit ($)
      53686  | kiwi        | 3          | 0.5
      57181  | strawberry  | 101        | 0.2
      25274  | apple       | 0          | 0.8
      48800  | orange      | 35         | 0.6
      26187  | strawberry  | 25         | 0.15
      57930  | grape       | 355        | 0.06
      52357  | strawberry  | 102        | 0.25
      43566  | peach       | 17         | 0.8

Question 5. How many fruits did the store sell in total on that day?

[28]: total_fruits_sold = sum(sales.column(2))
      total_fruits_sold
      # total_fruits_sold

[28]: 638
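As a quick spot check against the "count sold" column displayed above (not part of the graded answer), the eight values do add up to the reported total:

    3 + 101 + 0 + 35 + 25 + 355 + 102 + 17   # 638, matching total_fruits_sold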
Question 6. What was the store's total revenue (the total price of all fruits sold) on that day?

Hint: If you're stuck, think first about how you would compute the total revenue from just the grape sales.

[29]: count = sales.column(2)
      price = sales.column(3)
      total_revenue = sum(count * price)
      total_revenue

[29]: 106.85

Question 7. Make a new table called remaining_inventory. It should have the same rows and columns as inventory, except that the amount of fruit sold from each box should be subtracted from that box's count, so that the "count" is the amount of fruit remaining after Saturday.

[30]: remaining_inventory = Table().with_columns(
          'box ID', inventory.column('box ID'),
          'fruit name', inventory.column('fruit name'),
          'count', inventory.column('count') - sales.column('count sold'))
      remaining_inventory

[30]: box ID | fruit name  | count
      53686  | kiwi        | 42
      57181  | strawberry  | 22
      25274  | apple       | 20
      48800  | orange      | 0
      26187  | strawberry  | 230
      57930  | grape       | 162
      52357  | strawberry  | 0
      43566  | peach       | 23

1.7 7. Unemployment

The Federal Reserve Bank of St. Louis publishes data about jobs in the US. Below, we've loaded data on unemployment in the United States. There are many ways of defining unemployment, and our dataset includes two notions of the unemployment rate:

1. Among people who are able to work and are looking for a full-time job, the percentage who can't find a job. This is called the Non-Employment Index, or NEI.
2. Among people who are able to work and are looking for a full-time job, the percentage who can't find any job or are only working at a part-time job. The latter group is called "Part-Time for Economic Reasons", so the acronym for this index is NEI-PTER. (Economists are great at marketing.)

The source of the data is here.

Question 1. The data are in a CSV file called unemployment.csv. Load that file into a table called unemployment.
[31]: unemployment = Table.read_table('unemployment.csv')
      unemployment

[31]: Date       | NEI     | NEI-PTER
      1994-01-01 | 10.0974 | 11.172
      1994-04-01 | 9.6239  | 10.7883
      1994-07-01 | 9.3276  | 10.4831
      1994-10-01 | 9.1071  | 10.2361
      1995-01-01 | 8.9693  | 10.1832
      1995-04-01 | 9.0314  | 10.1071
      1995-07-01 | 8.9802  | 10.1084
      1995-10-01 | 8.9932  | 10.1046
      1996-01-01 | 9.0002  | 10.0531
      1996-04-01 | 8.9038  | 9.9782
      … (80 rows omitted)

Question 2. Sort the data in descending order by NEI, naming the sorted table by_nei. Create another table called by_nei_pter that's sorted in descending order by NEI-PTER instead.

[32]: by_nei = unemployment.sort('NEI', descending=True)
      by_nei_pter = unemployment.sort('NEI-PTER', descending=True)
      by_nei

[32]: Date       | NEI     | NEI-PTER
      2009-10-01 | 10.9698 | 12.8557
      2010-01-01 | 10.9054 | 12.7311
      2009-07-01 | 10.8089 | 12.7404
      2009-04-01 | 10.7082 | 12.5497
      2010-04-01 | 10.6597 | 12.5664
      2010-10-01 | 10.5856 | 12.4329
      2010-07-01 | 10.5521 | 12.3897
      2011-01-01 | 10.5024 | 12.3017
      2011-07-01 | 10.4856 | 12.2507
      2011-04-01 | 10.4409 | 12.247
      … (80 rows omitted)

Question 3. Use take to make a table containing the data for the 10 quarters when NEI was greatest. Call that table greatest_nei. greatest_nei should be sorted in descending order of NEI. Note that each row of unemployment represents a quarter.

[33]: greatest_nei = by_nei.take(np.arange(10))
      greatest_nei

[33]: Date       | NEI     | NEI-PTER
      2009-10-01 | 10.9698 | 12.8557
      2010-01-01 | 10.9054 | 12.7311
      2009-07-01 | 10.8089 | 12.7404
      2009-04-01 | 10.7082 | 12.5497
      2010-04-01 | 10.6597 | 12.5664
      2010-10-01 | 10.5856 | 12.4329
      2010-07-01 | 10.5521 | 12.3897
      2011-01-01 | 10.5024 | 12.3017
      2011-07-01 | 10.4856 | 12.2507
      2011-04-01 | 10.4409 | 12.247

Question 4. It's believed that many people became PTER (recall: "Part-Time for Economic Reasons") in the "Great Recession" of 2008-2009. NEI-PTER is the percentage of people who are unemployed (and counted in the NEI) plus the percentage of people who are PTER. Compute an array containing the percentage of people who were PTER in each quarter. (The first element of the array should correspond to the first row of unemployment, and so on.)

Note: Use the original unemployment table for this.

[34]: pter = unemployment.column('NEI-PTER') - unemployment.column('NEI')
      pter

[34]: array([1.0746, 1.1644, 1.1555, 1.129 , 1.2139, 1.0757, 1.1282, 1.1114,
             1.0529, 1.0744, 1.1004, 1.0747, 1.0705, 1.0455, 1.008 , 0.9734,
             0.9753, 0.8931, 0.9451, 0.8367, 0.8208, 0.8105, 0.8248, 0.7578,
             0.7251, 0.7445, 0.7543, 0.7423, 0.7399, 0.7687, 0.8418, 0.9923,
             0.9181, 0.9629, 0.9703, 0.9575, 1.0333, 1.0781, 1.0675, 1.0354,
             1.0601, 1.01  , 1.0042, 1.0368, 0.9704, 0.923 , 0.9759, 0.93  ,
             0.889 , 0.821 , 0.9409, 0.955 , 0.898 , 0.8948, 0.9523, 0.9579,
             1.0149, 1.0762, 1.2873, 1.4335, 1.7446, 1.8415, 1.9315, 1.8859,
             1.8257, 1.9067, 1.8376, 1.8473, 1.7993, 1.8061, 1.7651, 1.7927,
             1.7286, 1.6387, 1.6808, 1.6805, 1.6629, 1.6253, 1.6477, 1.6298,
             1.4796, 1.5131, 1.4866, 1.4345, 1.3675, 1.3097, 1.2319, 1.1735,
             1.1844, 1.1746])

Question 5. Add pter as a column to unemployment (named "PTER") and sort the resulting table by that column in descending order. Call the table by_pter. Try to do this with a single line of code, if you can.

[35]: by_pter = unemployment.with_columns('PTER', pter).sort('PTER', descending=True)
      by_pter

[35]: Date       | NEI     | NEI-PTER | PTER
      2009-07-01 | 10.8089 | 12.7404  | 1.9315
      2010-04-01 | 10.6597 | 12.5664  | 1.9067
      2009-10-01 | 10.9698 | 12.8557  | 1.8859
      2010-10-01 | 10.5856 | 12.4329  | 1.8473
      2009-04-01 | 10.7082 | 12.5497  | 1.8415
      2010-07-01 | 10.5521 | 12.3897  | 1.8376
      2010-01-01 | 10.9054 | 12.7311  | 1.8257
      2011-04-01 | 10.4409 | 12.247   | 1.8061
      2011-01-01 | 10.5024 | 12.3017  | 1.7993
      2011-10-01 | 10.3287 | 12.1214  | 1.7927
      … (80 rows omitted)

Question 6. Create a line plot of the PTER over time. To do this, create a new table called pter_over_time that adds the year array and the pter array to the unemployment table. Label these columns Year and PTER. Then, generate a line plot using one of the table methods you've learned in class.

[36]: year = 1994 + np.arange(by_pter.num_rows) / 4
      pter_over_time = unemployment.with_columns('Year', year, 'PTER', pter)
      pter_over_time.plot('Year', 'PTER')
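For reference, the Year axis built above is a quarterly grid starting at 1994; a small sketch of its first few values (not part of the graded answer):

    1994 + np.arange(4) / 4   # array([1994.  , 1994.25, 1994.5 , 1994.75])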
Question 7. Were PTER rates high during the Great Recession (that is to say, were PTER rates particularly high in the years 2008 through 2011)? Assign highPTER to True if you think PTER rates were high in this period, and False if you think they weren't.

[37]: highPTER = True
What visualization is most appropriate to see if there is an association between birth and death rates during a given time interval? 1. Line Graph 2. Scatter Plot 3. Bar Chart Assign visualization below to the number corresponding to the correct visualization. [40]: visualization = 3 Question 3. In the code cell below, create a visualization that will help us determine if there is an association between birth rate and death rate during this time interval. It may be helpful to create an inter- mediate table here. The birth rate for each region is the total number of births in that region as a proportion of the population size at the start of the time period. The death rate for each region is the total number of deaths in that region as a proportion of the population size at the start of the time period. [41]: # Generate your chart in this cell birth_rate = us_birth_rate = sum (pop . column( 'BIRTHS' )) / sum (pop . column( '2015' )) death_rate = us_death_rate = sum (pop . column( 'DEATHS' )) / sum (pop . column( '2015' )) rates = Table() . with_columns( 'BR' , birth_rate, 'DR' , death_rate) rates [41]: BR | DR 0.0123585 | 0.00855234 Question 7. True or False : There is an association between birth rate and death rate during this time interval. Assign assoc to True or False in the cell below. [42]: assoc = True 1.9 9. Uber Below we load tables containing 200,000 weekday Uber rides in the Manila, Philippines, and Boston, Massachusetts metropolitan areas from the Uber Movement project. The sourceid and dstid columns contain codes corresponding to start and end locations of each ride. The hod column contains codes corresponding to the hour of the day the ride took place. The ride time column contains the length of the ride, in minutes. [43]: boston = Table . read_table( "boston.csv" ) manila = Table . read_table( "manila.csv" ) print ( "Boston Table" ) boston . show( 4 ) print ( "Manila Table" ) manila . show( 4 ) 15
Question 1. Produce histograms of all ride times in Boston using the given bins.

[44]: equal_bins = np.arange(0, 120, 5)
      boston.hist('ride time', bins=equal_bins)

Question 2. Now, produce histograms of all ride times in Manila using the given bins.

[45]: equal_bins = np.arange(0, 120, 5)
      manila.hist('ride time', bins=equal_bins)

      # Don't delete the following line!
      plots.ylim(0, 0.05)

[45]: (0.0, 0.05)
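The next question relies on the fact that these histograms are drawn on the density scale, where each bar's height is in percent per minute, so the percent of rides in a bar is its height times the bin width. A toy sketch with a made-up height (not a value read off the charts):

    hypothetical_height = 2.0         # percent per minute, made up for illustration
    bin_width = 5                     # minutes, matching equal_bins
    hypothetical_height * bin_width   # 10.0 -- percent of rides in that bar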
Question 3. Assign boston_under_10 and manila_under_10 to the percentage of rides that are less than 10 minutes in their respective metropolitan areas. Use the height variables provided below in order to compute the percentages. Your solution should only use height variables, numbers, and mathematical operations. You should not access the tables boston and manila in any way.

[46]: boston_under_5_height = 1.2
      manila_under_5_height = 0.6
      boston_5_to_under_10_height = 3.2
      manila_5_to_under_10_height = 1.4

      boston_under_10 = (boston_under_5_height + boston_5_to_under_10_height) * 5
      manila_under_10 = (manila_under_5_height + manila_5_to_under_10_height) * 5
      boston_under_10, manila_under_10

Question 4. Let's take a closer look at the distribution of ride times in Manila. Assign manila_median_bin to an integer (1, 2, 3, or 4) that corresponds to the bin that contains the median time.

1: 0-15 minutes
2: 15-40 minutes
3: 40-60 minutes
4: 60-80 minutes

Hint: The median of a sorted list has half of the list elements to its left, and half to its right.
[47]: manila.hist('ride time', bins=equal_bins)

      # Don't delete the following line!
      plots.ylim(0, 0.05)

      manila_median_bin = 2

Question 5. What is the main difference between the two histograms? What might be causing this?

Hint: Try thinking about external factors that may be causing the difference!

Boston has far more rides in the 15-30 minute range, while many of Manila's rides fall well outside that range at longer durations; external factors such as heavier traffic congestion in metropolitan Manila could be causing this difference.