lab03_tables

html

School

Temple University *

*We aren’t endorsed by this school

Course

1013

Subject

Computer Science

Date

Dec 6, 2023

Type

html

Pages

23

Uploaded by samzahroun

Report
Lab 3: Tables Welcome to lab 3! This week, we will focus on manipulating tables. We will import our data sets into tables and complete the majority of analysis using these tables. Tables are described in Chapter 6 of the Inferential Thinking text. A related approach in Python programming is to use what is known as a pandas dataframe which we will need to resort to occasionally. Pandas is a mainstay datascience tool. First, set up the tests and imports by running the cell below. In [1]: import numpy as np from datascience import * # Brings into Python the datascience Table object # These lines load the tests. from gofer.ok import check In [2]: # Enter your name as a string # Example dogname = "Fido" # Your name name = "Sam Zahroun" 1. Introduction For a collection of things in the world, an array is useful for describing a single attribute of each thing. For example, among the collection of US States, an array could describe the land area of each. Tables extend this idea by describing multiple attributes for each element of a collection. In most data science applications, we have data about many entities, but we also have several kinds of data about each entity. For example, in the cell below we have two arrays. The first one contains the world population in each year (estimated by the US Census Bureau), and the second contains the years themselves. These elements are in order, so the year and the world population for that year have the same index in their corresponding arrays. In [3]: population_amounts = Table.read_table("world_population.csv").column("Population") years = np.arange(1950, 2016,1) print("Population column:", population_amounts) print("Years column:", years) Population column: [2557628654 2594939877 2636772306 2682053389 2730228104 2782098943 2835299673 2891349717 2948137248 3000716593 3043001508 3083966929 3140093217 3209827882 3281201306 3350425793 3420677923 3490333715 3562313822 3637159050 3712697742 3790326948 3866568653 3942096442 4016608813 4089083233 4160185010 4232084578 4304105753 4379013942 4451362735 4534410125 4614566561 4695736743 4774569391 4856462699 4940571232 5027200492 5114557167 5201440110 5288955934 5371585922 5456136278 5538268316 5618682132 5699202985 5779440593 5857972543 5935213248 6012074922 6088571383 6165219247 6242016348 6318590956 6395699509 6473044732 6551263534 6629913759 6709049780 6788214394 6866332358 6944055583 7022349283 7101027895 7178722893 7256490011] Years column: [1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015] Suppose we want to answer this question: When did world population cross 6 billion? You could technically answer this question just from staring at the arrays, but it's a bit convoluted, since you would have to count the position where the population first crossed 6 billion, then find the corresponding element in the years array. In cases like these, it might be easier to put the data into a Table , a 2-dimensional type of dataset. The expression below: creates an empty table using the expression Table() , adds two columns to the table by calling with_columns with four arguments (column and data for each), assigns the result to the name population , and finally evaluates population so that we can see the table. The strings "Year" and "Population" are column labels that we have chosen. Ther names population_amounts and years were assigned above to two arrays of the same length. The function with_columns (you can find the documentation here ) takes in alternating strings (to represent column labels) and arrays (representing the data in those columns), which are all separated by commas. Tip: Both population_amounts and years need the same number of data points or an error will be returned on attempting to construct the table. In [4]: population = Table().with_columns( "Population", population_amounts, "Year", years ) population Out[4]: Population Year 2557628654 1950 2594939877 1951 2636772306 1952 2682053389 1953 2730228104 1954 2782098943 1955 2835299673 1956 2891349717 1957
Population Year 2948137248 1958 3000716593 1959 ... (56 rows omitted) Now the data are all together in a single table! It's much easier to parse this data--if you need to know what the population was in 1959, for example, you can tell from a single glance. We'll revisit this table later. Question 1 From the example in the cell above, identify the variables or data types for each of the following: which variable contains the table? which variable contains an array? On the right of the equals sign provide the correct variable name. In [5]: table_var = population array_var = years In [6]: check('tests/q1.py') Out[6]: All tests passed! 2. Creating Tables Question 2 In the cell below, we've created 2 arrays. In these examples, we're going to be looking at the Enviornmental Protection Index which describes the state of sustainability in each country. More information can be found: Yale EPI . Using the steps above, assign top_10_epi to a table that has two columns called "Country" and "Score", which hold top_10_epi_countries and top_10_epi_scores respectively. In [7]: top_10_epi_scores = make_array(82.5, 82.3, 81.5, 81.3, 80., 79.6, 78.9, 78.7, 77.7, 77.2) top_10_epi_countries = make_array( 'Denmark', 'Luxembourg', 'Switzerland', 'United Kingdom', 'France', 'Austria', 'Finland', 'Sweden', 'Norway', 'Germany' ) top_10_epi = Table().with_columns( "Country", top_10_epi_countries, "Score", top_10_epi_scores ) # We've put this next line here so your table will get printed out when you # run this cell. top_10_epi
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Out[7]: Country Score Denmark 82.5 Luxembourg 82.3 Switzerland 81.5 United Kingdom 81.3 France 80 Austria 79.6 Finland 78.9 Sweden 78.7 Norway 77.7 Germany 77.2 In [8]: check('tests/q2.py') Out[8]: All tests passed! Loading a table from a file In most cases, we aren't going to go through the trouble of typing in all the data manually. Instead, we can use our Table functions. Table.read_table takes one argument, a path to a data file (a string) and returns a table. There are many formats for data files, but CSV ("comma-separated values") is the most common. Question 3 The file yale_epi.csv in the current directory contains a table of information about 180 countries with their corresponding Environmental Performance Index (EPI) based on 32 indicators of sustainability. Load it as a table called epi using the Table.read_table function. In [9]: epi = Table.read_table("yale_epi.csv") epi Out[9]: Country Score Decade Change Rank Afghanistan 25.5 5 178 Angola 29.7 5.3 158
Country Score Decade Change Rank Albania 49 10.2 62 United Arab Emirates 55.6 11.3 42 Argentina 52.2 5 54 Armenia 52.3 4.5 53 Antigua and Barbuda 48.5 3.3 63 Australia 74.9 5.5 13 Austria 79.6 5.4 6 Azerbaijan 46.5 4 72 ... (170 rows omitted) In [10]: check('tests/q3.py') Out[10]: All tests passed! Notice the part about "... (170 rows omitted)." This table is big enough that only a few of its rows are displayed, but the others are still there. 10 are shown, so there are 180 movies total. Where did yale_epi.csv come from? Take a look at this lab's folder . You should see a file called yale_epi.csv . Open up the yale_epi.csv file in that folder and look at the format. What do you notice? The .csv filename ending says that this file is in the CSV (comma-separated value) format . 3. Using lists A list is another Python sequence type, similar to an array. It's different than an array because the values it contains can all have different types. A single list can contain int values, float values, and strings. Elements in a list can even be other lists! A list is created by giving a name to the list of values enclosed in square brackets and separated by commas. For example, values_with_different_types = ['data', 8, 8.1] Lists can be useful when working with tables because they can describe the contents of one row in a table, which often corresponds to a sequence of values with different types. A list of lists can be used to describe multiple rows. Each column in a table is a collection of values with the same type (an array). If you create a table column from a list, it will automatically be converted to an array. A row, on the ther hand, mixes types. Here's a table from Chapter 5. (Run the cell below.) In [11]: # Run this cell to recreate the table flowers = Table().with_columns( 'Number of petals', make_array(8, 34, 5),
'Name', make_array('lotus', 'sunflower', 'rose') ) flowers Out[11]: Number of petals Name 8 lotus 34 sunflower 5 rose Question 4 Create a list that describes a new fourth row of this table. The details can be whatever you want, but the list must contain two values: the number of petals (an int value) and the name of the flower (a string). For example, your flower could be "pondweed"! (A flower with zero petals) In [12]: my_flower = [0, "pondweed"] my_flower Out[12]: [0, 'pondweed'] In [13]: check('tests/q4.py') Out[13]: All tests passed! Question 5 my_flower fits right in to the table from chapter 5. Complete the cell below to create a table of seven flowers that includes your flower as the fourth row followed by other_flowers . You can use with_row to create a new table with one extra row by passing a list of values and with_rows to create a table with multiple extra rows by passing a list of lists of values. In [14]: # Use the method .with_row(...) to create a new table that includes my_flower four_flowers = flowers.with_row(my_flower) # Use the method .with_rows(...) to create a table that # includes four_flowers followed by other_flowers other_flowers = [[10, 'lavender'], [3, 'birds of paradise'], [6, 'tulip']] seven_flowers = four_flowers.with_rows(other_flowers) seven_flowers Out[14]: Number of petals Name 8 lotus
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Number of petals Name 34 sunflower 5 rose 0 pondweed 10 lavender 3 birds of paradise 6 tulip In [15]: check('tests/q5.py') Out[15]: All tests passed! 4. Analyzing datasets With just a few table methods, we can answer some interesting questions about the EPI dataset. If we want just the scores of each country, we can get an array that contains the data in that column: In [16]: epi.column("Score") Out[16]: array([ 25.5, 29.7, 49. , 55.6, 52.2, 52.3, 48.5, 74.9, 79.6, 46.5, 27. , 73.3, 30. , 38.3, 29. , 57. , 51. , 43.5, 45.4, 53. , 41.9, 44.3, 51.2, 45.6, 54.8, 39.3, 40.4, 36.9, 71. , 81.5, 55.3, 37.3, 25.8, 33.6, 36.4, 30.8, 52.9, 32.1, 32.8, 52.5, 48.4, 64.8, 71. , 77.2, 28.1, 44.6, 82.5, 46.3, 44.8, 51. , 43.3, 30.4, 74.3, 65.3, 34.4, 78.9, 34.4, 80. , 33. , 45.8, 81.3, 41.3, 27.6, 26.4, 27.9, 29.1, 38.1, 69.1, 43.1, 31.8, 35.9, 37.8, 63.1, 27. , 63.7, 37.8, 27.6, 72.8, 48. , 39.5, 72.3, 65.8, 71. , 48.2, 53.4, 75.1, 44.7, 34.7, 39.8, 33.6, 37.7, 66.5, 53.6, 34.8, 45.4, 22.6, 43.1, 39. , 28. , 62.9, 82.3, 61.6, 42.3, 44.4, 26.5, 35.6, 52.6, 30.8, 55.4, 29.4, 70.7, 25.1, 46.3, 32.2, 33.9, 27.7, 45.1, 38.3, 47.9, 40.2, 30.8, 31. , 39.2, 75.3, 77.7, 32.7, 71.3, 38.5, 33.1, 47.3, 44. , 38.4, 32.4, 60.9, 67. , 46.4, 37.1, 64.7, 50.5, 33.8, 44. , 34.8, 30.7, 58.1, 26.7, 25.7, 43.1, 55.2, 37.6, 45.2, 68.3, 72. , 78.7, 33.8, 58.2, 26.7, 29.5, 45.4, 38.2, 43.9, 35.3, 45.1, 47.5, 46.7, 42.6, 57.2, 31.1, 35.6, 49.5, 49.1, 69.3, 44.3, 48.4, 50.3, 33.4, 28.9, 37.3, 43.1, 34.7, 37. ]) The value of that expression is an array, exactly the same kind of thing you'd get if you typed in make_array(25.5, 29.7, 49.0, [etc]) . Question 6
Find the EPI score of the highest-ranked country in the dataset. Hint: Think back to the functions you've learned about for working with arrays of numbers. Ask for help if you can't remember one that's useful for this. In [17]: highest_rating = max(epi.column("Score")) highest_rating Out[17]: 82.5 In [18]: check('tests/q6.py') Out[18]: All tests passed! That's not very useful, though. You'd probably want to know the name of the country whose score you found! To do that, we can sort the entire table by EPI Score, which ensures that the scores and country will stay together. Note that calling sort creates a copy of the table and leaves the original table unsorted. In [19]: epi.sort("Score") Out[19]: Country Score Decade Change Rank Liberia 22.6 -3.7 180 Myanmar 25.1 -1.2 179 Afghanistan 25.5 5 178 Sierra Leone 25.7 0.7 177 Cote d'Ivoire 25.8 -8.5 176 Guinea 26.4 -4.2 175 Madagascar 26.5 -6.6 174 Solomon Islands 26.7 -2 172 Chad 26.7 -0.9 172 Burundi 27 -11.1 170 ... (170 rows omitted) Well, that actually doesn't help much, either -- we sorted the countries from lowest -> highest scores. To look at the highest-ranked countries, sort in reverse order: In [20]: epi.sort("Score", descending=True) Out[20]:
Country Score Decade Change Rank Denmark 82.5 7.3 1 Luxembourg 82.3 11.6 2 Switzerland 81.5 8.6 3 United Kingdom 81.3 9 4 France 80 5.8 5 Austria 79.6 5.4 6 Finland 78.9 6 7 Sweden 78.7 5.3 8 Norway 77.7 7.6 9 Germany 77.2 1.2 10 ... (170 rows omitted) (The descending=True bit is called an optional argument . It has a default value of False , so when you explicitly tell the function descending=True , then the function will sort in descending order.) So the country with the highest Environmental Protection Index is Denmark with 82.5. Some details about sort: 1. The first argument to sort is the name of a column to sort by. 2. If the column has strings in it, sort will sort alphabetically; if the column has numbers, it will sort numerically. 3. The value of epi.sort("Score") is a copy of epi ; the epi table doesn't get modified. For example, if we called epi.sort("Score") , then running epi by itself would still return the unsorted table. 4. Rows always stick together when a table is sorted. It wouldn't make sense to sort just one column and leave the other columns alone. For example, in this case, if we sorted just the "Score" column, the countries would all end up with the wrong scores. Question 7 We also have information about the changes in sustainability scores from 2010 to 2020. Create a version of epi that's sorted by change, with the largest, positive changes first. Call it epi_changes . In [21]: epi_changes = epi.sort("Decade Change", descending=True) epi_changes Out[21]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Country Score Decade Change Rank Bahrain 51 17.3 56 Seychelles 58.2 14.8 38 Croatia 63.1 13.4 34 Morocco 42.3 13.3 100 Kuwait 53.6 12.8 47 Luxembourg 82.3 11.6 2 Malta 70.7 11.6 23 United Arab Emirates 55.6 11.3 42 Jordan 53.4 11.2 48 Oman 38.5 11 110 ... (170 rows omitted) In [22]: check('tests/q7.py') Out[22]: All tests passed! Question 8 What's the name of the country with the largest, positive change in the dataset? You could just look this up from the output of the previous cell. Instead, write Python code to find out. Hint: Starting with epi_changes , extract the country column to get an array, then use item to get its first item. In [23]: largest_positive_change = epi_changes.column("Country").item(0) largest_positive_change Out[23]: 'Bahrain' In [24]: check('tests/q8.py') Out[24]: All tests passed! 5. Finding pieces of a dataset Let's take a look at another dataset. In the cell below, we're reading in a movie dataset which contains columns for Votes on imdb, imdb Rating, Movie Title, the year released, and the decade the movie was released. In [25]: imdb = Table.read_table('imdb.csv')
imdb Out[25]: Votes Rating Title Year Decade 88355 8.4 M 1931 1930 132823 8.3 Singin' in the Rain 1952 1950 74178 8.3 All About Eve 1950 1950 635139 8.6 Léon 1994 1990 145514 8.2 The Elephant Man 1980 1980 425461 8.3 Full Metal Jacket 1987 1980 441174 8.1 Gone Girl 2014 2010 850601 8.3 Batman Begins 2005 2000 37664 8.2 Judgment at Nuremberg 1961 1960 46987 8 Relatos salvajes 2014 2010 ... (240 rows omitted) Suppose you're interested in movies from the 1940s. Sorting the table by year doesn't help you, because the 1940s are in the middle of the dataset. Instead, we use the table method where . In [26]: forties = imdb.where('Decade', are.equal_to(1940)) forties Out[26]: Votes Rating Title Year Decade 55793 8.1 The Grapes of Wrath 1940 1940 86715 8.3 Double Indemnity 1944 1940 101754 8.1 The Maltese Falcon 1941 1940 71003 8.3 The Treasure of the Sierra Madre 1948 1940 35983 8.1 The Best Years of Our Lives 1946 1940 81887 8.3 Ladri di biciclette 1948 1940 66622 8 Notorious 1946 1940
Votes Rating Title Year Decade 350551 8.5 Casablanca 1942 1940 59578 8 The Big Sleep 1946 1940 78216 8.2 Rebecca 1940 1940 ... (4 rows omitted) Ignore the syntax for the moment. Instead, try to read that line like this: Assign the name forties to a table whose rows are the rows in the imdb table where the 'Decade' s are equal to 1940 . Question 9 Compute the average rating of movies from the 1980s. Hint: The function np.average computes the average of an array of numbers. In [27]: eighties = imdb.where('Decade', are.equal_to(1980)) average_rating_in_eighties = np.average(eighties.column("Rating")) average_rating_in_eighties Out[27]: 8.241935483870968 In [28]: check('tests/q9.py') Out[28]: All tests passed! Now let's dive into the details a bit more. where takes 2 arguments: 1. The name of a column. where finds rows where that column's values meet some criterion. 2. Something that describes the criterion that the column needs to meet, called a predicate. To create our predicate, we called the function are.equal_to with the value we wanted, 1980. We'll see other predicates soon. where returns a table that's a copy of the original table, but with only the rows that meet the given predicate. Question 10 Create a table called ninety_nine containing the movies that came out in the year 1999. Use where . In [29]: ninety_nine = imdb.where('Year', are.equal_to(1999)) ninety_nine Out[29]: Votes Rating Title Year Decade 1177098 8.8 Fight Club 1999 1990 735056 8.4 American Beauty 1999 1990
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Votes Rating Title Year Decade 630994 8.1 The Sixth Sense 1999 1990 1073043 8.7 The Matrix 1999 1990 672878 8.5 The Green Mile 1999 1990 In [30]: check('tests/q10.py') Out[30]: All tests passed! So far we've only been finding where a column is exactly equal to a certain value. However, there are many other predicates. Here are a few: Predicate Example Result are.equal_to are.equal_to(50) Find rows with values equal to 50 are.not_equal_to are.not_equal_to(50) Find rows with values not equal to 50 are.above are.above(50) Find rows with values above (and not equal to) 50 are.above_or_equal _to are.above_or_equal_to (50) Find rows with values above 50 or equal to 50 are.below are.below(50) Find rows with values below 50 are.between are.between(2, 10) Find rows with values above or equal to 2 and below 10 The textbook section on selecting rows has more examples. Question 11 Using where and one of the predicates from the table above, find all the movies with a rating higher than 8.8. Put their data in a table called really_highly_rated . In [31]: really_highly_rated = imdb.where('Rating', are.above(8.8)) really_highly_rated Out[31]: Votes Rating Title Year Decade 1027398 9.2 The Godfather 1972 1970 1498733 9.2 The Shawshank Redemption 1994 1990 447875 8.9 Il buono, il brutto, il cattivo (1966) 1966 1960
Votes Rating Title Year Decade 1473049 8.9 The Dark Knight 2008 2000 692753 9 The Godfather: Part II 1974 1970 384187 8.9 12 Angry Men 1957 1950 1074146 8.9 The Lord of the Rings: The Return of the King 2003 2000 761224 8.9 Schindler's List 1993 1990 1166532 8.9 Pulp Fiction 1994 1990 In [32]: check('tests/q11.py') Out[32]: All tests passed! Question 12 Find the average rating for movies released in the 20th century and the average rating for movies released in the 21st century for the movies in imdb . Hint : Think of the steps you need to do (take the average, find the ratings, find movies released in 20th/21st centuries), and try to put them in an order that makes sense. In [33]: twentieth = imdb.where('Year', are.between(1901,2000)) twentysecond = imdb.where('Year', are.above_or_equal_to(2000)) average_20th_century_rating = np.average(twentieth.column("Rating")) average_21st_century_rating = np.average(twentysecond.column("Rating")) print("Average 20th century rating:", average_20th_century_rating) print("Average 21st century rating:", average_21st_century_rating) Average 20th century rating: 8.2783625731 Average 21st century rating: 8.23797468354 In [34]: check('tests/q12.py') Out[34]: All tests passed! The property num_rows tells you how many rows are in a table. (A "property" is just a method that doesn't need to be called by adding parentheses.) In [35]: num_movies_in_dataset = imdb.num_rows num_movies_in_dataset Out[35]: 250 Question 13 Use num_rows (and arithmetic) to find the proportion of movies in the dataset that were released in the 20th century, and the proportion from the 21st century. Hint: The proportion of movies released in the 20th century is the number of movies released in the 20th century, divided by the total number of movies.
In [36]: proportion_in_20th_century = (twentieth.num_rows)/(imdb.num_rows) proportion_in_21st_century = (twentysecond.num_rows)/(imdb.num_rows) print("Proportion in 20th century:", proportion_in_20th_century) print("Proportion in 21st century:", proportion_in_21st_century) Proportion in 20th century: 0.684 Proportion in 21st century: 0.316 In [37]: check('tests/q13.py') Out[37]: All tests passed! Question 14 Here's a challenge: Find the number of movies that came out in odd years. Hint: The operator % computes the remainder when dividing by a number. So 5 % 2 is 1 and 6 % 2 is 0. A number is odd if the remainder is 1 when you divide by 2. Hint 2: % can be used on arrays, operating elementwise like + or * . So make_array(5, 6, 7) % 2 is array([1, 0, 1]) . Hint 3: Create a column called "Year Remainder" that's the remainder when each movie's release year is divided by 2. Make a copy of imdb that includes that column. (You may need to add more variables.) Then use where to find rows where that new column is equal to 1. Then use num_rows to count the number of such rows. In [38]: year_remainder = imdb.column("Year")%2 imdbcopy = Table().with_columns( "Odd Years", year_remainder, "Title", imdb.column("Title") ) variable = imdbcopy.where("Odd Years", are.equal_to(1)) num_odd_year_movies = variable.num_rows num_odd_year_movies Out[38]: 123 In [39]: check('tests/q14.py') Out[39]: All tests passed! 6. Miscellanea There are a few more table methods you'll need to fill out your toolbox. The first 3 have to do with manipulating the columns in a table. The table farmers_markets.csv contains data on farmers' markets in the United States (data collected by the USDA . Each row represents one such market. Question 15 Load the dataset into a table. Call it farmers_markets . In [40]: farmers_markets = Table.read_table("farmers_markets.csv") farmers_markets Out[40]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
FMID MarketNa me Website Facebook Twitter Youtu be O 10120 63 Caledonia Farmers Market Associatio n - Danville https://sites.google.com/s ite/caledoniafarmersmark et/ https://www.faceboo k.com/Danville.VT.Fa rmers.Market/ nan nan nan 10118 71 Stearns Homestea d Farmers' Market http://Stearnshomestead. com nan nan nan nan 10118 78 100 Mile Market http://www.pfcmarkets.co m https://www.faceboo k.com/100MileMarke t/?fref=ts nan nan https:/ .com/1 10093 64 106 S. Main Street Farmers Market http://thetownofsixmile.w ordpress.com/ nan nan nan nan 10106 91 10th Steet Communit y Farmers Market nan nan nan nan http:// m/mo- grown type= 10024 54 112st Madison Avenue nan nan nan nan nan 10111 00 12 South Farmers Market http://www.12southfarme rsmarket.com 12_South_Farmers_M arket @12southfrm smkt nan @12so 10098 45 125th Street Fresh Connect Farmers' Market http://www.125thStreetFa rmersMarket.com https://www.faceboo k.com/125thStreetFa rmersMarket https://twitter. com/FarmMar ket125th nan Instag 125thS arket
FMID MarketNa me Website Facebook Twitter Youtu be O 10055 86 12th & Brandywin e Urban Farm Market nan https://www.faceboo k.com/pages/12th- Brandywine-Urban- Far ... nan nan https:/ com/d mcoal 10080 71 14&U Farmers' Market nan https://www.faceboo k.com/14UFarmersM arket https://twitter. com/14UFarm ersMkt nan nan ... (8536 rows omitted) In [41]: check('tests/q15.py') Out[41]: All tests passed! You'll notice that it has a large number of columns in it! num_columns Question 16 The table property num_columns (example call: tbl.num_columns ) produces the number of columns in a table. Use it to find the number of columns in our farmers' markets dataset. In [42]: num_farmers_markets_columns = farmers_markets.num_columns print("The table has", num_farmers_markets_columns, "columns in it!") The table has 59 columns in it! In [43]: check('tests/q16.py') Out[43]: All tests passed! Most of the columns are about particular products -- whether the market sells tofu, pet food, etc. If we're not interested in that stuff, it just makes the table difficult to read. This comes up more than you might think. select In such situations, we can use the table method select to pare down the columns of a table. It takes any number of arguments. Each should be the name or index of a column in the table. It returns a new table with only those columns in it. For example, the value of imdb.select("Year", "Decade") is a table with only the years and decades of each movie in imdb . Question 17 Use select to create a table with only the name, city, state, latitude ('y'), and longitude ('x') of each market. Call that new table farmers_markets_locations . In [44]:
farmers_markets_locations = farmers_markets.select('MarketName', 'city', 'State', 'y', 'x') farmers_markets_locations Out[44]: MarketName city State y x Caledonia Farmers Market Association - Danville Danville Vermont 44.411 -72.1403 Stearns Homestead Farmers' Market Parma Ohio 41.3751 -81.7286 100 Mile Market Kalamazoo Michigan 42.296 -85.5749 106 S. Main Street Farmers Market Six Mile South Carolina 34.8042 -82.8187 10th Steet Community Farmers Market Lamar Missouri 37.4956 -94.2746 112st Madison Avenue New York New York 40.7939 -73.9493 12 South Farmers Market Nashville Tennessee 36.1184 -86.7907 125th Street Fresh Connect Farmers' Market New York New York 40.809 -73.9482 12th & Brandywine Urban Farm Market Wilmington Delaware 39.7421 -75.5345 14&U Farmers' Market Washington District of Columbia 38.917 -77.0321 ... (8536 rows omitted) In [45]: check('tests/q17.py') Out[45]: All tests passed! select is not column ! The method select is definitely not the same as the method column . farmers_markets.column('y') is an array of the latitudes of all the markets. farmers_markets.select('y') is a table that happens to contain only 1 column, the latitudes of all the markets. Question 18 Below, we tried using the function np.average to find the average latitude ('y') and average longitude ('x') of the farmers' markets in the table, but we screwed something up. Run the cell to see the (somewhat inscrutable) error message that results from calling np.average on a table. Then, fix our code. In [46]: average_latitude = np.average(farmers_markets.column('y')) average_longitude = np.average(farmers_markets.column('x')) print("The average of US farmers' markets' coordinates is located at (", average_latitude,
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
",", average_longitude, ")") The average of US farmers' markets' coordinates is located at ( 39.1864645235 , -90.9925808129 ) In [47]: check('tests/q18.py') Out[47]: All tests passed! drop drop serves the same purpose as select , but it takes away the columns you list instead of the ones you don't list, leaving all the rest of the columns. Question 19 Suppose you just didn't want the "Website" or "Location" columns in farmers_markets . Create a table that's a copy of farmers_markets but doesn't include those columns. Call that table farmers_markets_without_website . In [48]: farmers_markets_without_website = farmers_markets.drop("Website", "Location") farmers_markets_without_website Out[48]: FMID MarketNa me Facebook Twitter Youtu be OtherMedia 101206 3 Caledonia Farmers Market Association - Danville https://www.facebook. com/Danville.VT.Farme rs.Market/ nan nan nan na 101187 1 Stearns Homestead Farmers' Market nan nan nan nan 69 Ro 101187 8 100 Mile Market https://www.facebook. com/100MileMarket/? fref=ts nan nan https://www.instagram.c om/100milemarket/ 50 Ha 100936 4 106 S. Main Street Farmers Market nan nan nan nan 10 Ma 101069 1 10th Steet Community Farmers Market nan nan nan http://agrimissouri.com/ mo- grown/grodetail.php? type=mo-g ... 10 an 100245 4 112st Madison nan nan nan nan 11 Ma
FMID MarketNa me Facebook Twitter Youtu be OtherMedia Avenue Av 101110 0 12 South Farmers Market 12_South_Farmers_Ma rket @12southfrms mkt nan @12southfrmsmkt 30 Gr Wh 100984 5 125th Street Fresh Connect Farmers' Market https://www.facebook. com/125thStreetFarm ersMarket https://twitter.c om/FarmMarke t125th nan Instagram--> 125thStreetFarmersMar ket 16 12 Str Ad Cla Po Blv 100558 6 12th & Brandywine Urban Farm Market https://www.facebook. com/pages/12th- Brandywine-Urban-Far ... nan nan https://www.facebook.co m/delawareurbanfarmco alition 12 Bra e S 100807 1 14&U Farmers' Market https://www.facebook. com/14UFarmersMark et https://twitter.c om/14UFarmer sMkt nan nan 14 Str ... (8536 rows omitted) In [49]: check('tests/q19.py') Out[49]: All tests passed! take Let's find the 5 easternmost farmers' markets in the US. You already know how to sort by longitude ('x'), but we haven't seen how to get the first 5 rows of a table. That's what take is for. The table method take takes as its argument an array of numbers. Each number should be the index of a row in the table. It returns a new table with only those rows. Most often you'll want to use take in conjunction with np.arange to take the first few rows of a table. Question 20 Make a table of the 5 easternmost farmers' markets in farmers_markets_locations . Call it eastern_markets . (It should include the same columns as farmers_markets_locations . In [55]: eastern_markets = farmers_markets_locations.sort("x", descending=True).take(np.arange(0,5,1)) eastern_markets
Out[55]: MarketName city State y x Christian "Shan" Hendricks Vegetable Market Saint Croix Virgin Islands 17.7449 -64.7043 La Reine Farmers Market Saint Croix Virgin Islands 17.7322 -64.7789 Anne Heyliger Vegetable Market Saint Croix Virgin Islands 17.7099 -64.8799 Rothschild Francis Vegetable Market St. Thomas Virgin Islands 18.3428 -64.9326 Feria Agrícola de Luquillo Luquillo Puerto Rico 18.3782 -65.7207 7. Summary For your reference, here's a table of all the functions and methods we saw in this lab. Name Example Purpose Table Table() Create an empty table, usually to extend with data Table.read_ta ble Table.read_table("my_data.csv") Create a table from a data file with_columns tbl = Table().with_columns("N", np.arange(5), "2*N", np.arange(0, 10, 2)) Create a copy of a table with more columns column tbl.column("N") Create an array containing the elements of a column sort tbl.sort("N") Create a copy of a table sorted by the values in a column where tbl.where("N", are.above(2)) Create a copy of a table with only the rows that match some predicate num_rows tbl.num_rows Compute the number of rows in a table num_columns tbl.num_columns Compute the number of columns in a table
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Name Example Purpose select tbl.select("N") Create a copy of a table with only some of the columns drop tbl.drop("2*N") Create a copy of a table without some of the columns take tbl.take(np.arange(0, 6, 2)) Create a copy of the table with only the rows whose indices are in the given array Congratulations, you're done with lab 3! Be sure to run all the tests and verify that they all pass (the next cell has a shortcut for that), Save and Export as, HTML from the File menu, Right click on file name to download noteook Upload and Submit your files to Canvas under the corresponding assignment . In [56]: # For your convenience, you can run this cell to run all the tests at once! import glob from gofer.ok import check correct = 0 for x in range(1, 21): print('Testing question {}: '.format(str(x))) g = check('tests/q{}.py'.format(str(x))) if g.grade == 1.0: print("Passed") correct += 1 else: print('Failed') display(g) print('Grade: {}'.format(str(correct/20))) Testing question 1: Passed Testing question 2: Passed Testing question 3: Passed Testing question 4: Passed Testing question 5: Passed Testing question 6: Passed Testing question 7: Passed Testing question 8:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Passed Testing question 9: Passed Testing question 10: Passed Testing question 11: Passed Testing question 12: Passed Testing question 13: Passed Testing question 14: Passed Testing question 15: Passed Testing question 16: Passed Testing question 17: Passed Testing question 18: Passed Testing question 19: Passed Testing question 20: Passed Grade: 1.0 In [57]: print("Nice work ",name) import time; localtime = time.asctime( time.localtime(time.time()) ) print("Submitted @ ", localtime) Nice work Sam Zahroun Submitted @ Sat Sep 16 23:50:39 2023 In [ ]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help