categorical_HW


School

University of North Georgia, Dahlonega

Course

MATH-240

Subject

Economics

Date

Apr 3, 2024

Pages

8

Uploaded by SuperHumanKoala4250

February 25, 2024

1 Categorical Visualizations Homework

[1]:
    from datascience import *
    import numpy as np
    import warnings
    warnings.filterwarnings("ignore")
    %matplotlib inline
    import matplotlib.pyplot as plots
    plots.style.use('fivethirtyeight')
    plots.rcParams["patch.force_edgecolor"] = True
    import pandas as pd

1.0.1 The file ‘farmers_markets.csv’ contains data about farmers’ markets across the US.

Q1. Read in the data file and display 5 rows.

[13]:
    myData = Table.read_table('farmers_markets.csv')
    myData.show(5)

Q2. How many rows and columns are in the table?

[14]:
    # read the sizes from the table instead of hard-coding them
    myCols = myData.num_columns
    myRows = myData.num_rows
    # do not change the following line of code
    print('myData contains ', myRows, 'observations of ', myCols, 'variables. ')

myData contains 8546 observations of 59 variables.

Q2. What are the names of the columns?

[16]:
    myData_variables = myData.labels
    # do not change the following line of code
    print('myData column names \n', myData_variables)

myData column names
 ('FMID', 'MarketName', 'street', 'city', 'County', 'State', 'zip', 'x', 'y', 'Website', 'Facebook', 'Twitter', 'Youtube', 'OtherMedia', 'Organic', 'Tofu', 'Bakedgoods', 'Cheese', 'Crafts', 'Flowers', 'Eggs', 'Seafood', 'Herbs', 'Vegetables', 'Honey', 'Jams', 'Maple', 'Meat', 'Nursery', 'Nuts', 'Plants', 'Poultry', 'Prepared', 'Soap', 'Trees', 'Wine', 'Coffee', 'Beans', 'Fruits', 'Grains', 'Juices', 'Mushrooms', 'PetFood', 'WildHarvested', 'updateTime', 'Location', 'Credit', 'WIC', 'WICcash', 'SFMNP', 'SNAP', 'Season1Date', 'Season1Time', 'Season2Date', 'Season2Time', 'Season3Date', 'Season3Time', 'Season4Date', 'Season4Time')

1.0.2 Before we start analyzing the data, we need a better understanding of it. Most of the columns are named after products and hold ‘Y’ if the market sells that product and ‘N’ if it does not.

Q3. Drop the first 3 and the last 15 columns from the table and call the result myDataShort.

[17]:
    # the first 3 columns are indices 0-2 (FMID, MarketName, street);
    # the last 15 are indices 44-58 (updateTime through Season4Time)
    myDataShort = myData.drop(np.arange(44, 59)).drop(0, 1, 2)
    # Do NOT change this code
    print('The new table has', myDataShort.num_columns, 'columns')  # 59 - 18 = 41
    myDataShort.show(5)

1.0.3 Which state has the most farmers’ markets?

Q4. Create a table called num_markets that has 2 columns. The first column, ‘State’, has the name of each state, and the second column, ‘count’, has the number of markets in that state. Sort the table by ‘count’ in descending order.

[26]:
    # keep num_markets as a table; read the top state from its first row
    num_markets = myData.group('State').sort('count', descending=True)
    # do not change this line of code
    print(num_markets.column(0).item(0), "has the most farmer's markets.\n")

California has the most farmer's markets.

Q5. How many rows are in the num_markets table?

[27]:
    num_markets = myData.group('State')
    market_tally = num_markets.num_rows
    # do not change this line of code
    print('There are', market_tally, '"states" in the table.\n')

There are 53 "states" in the table.

Q6. Why aren’t there 50 rows in the table?

Answer: The table also contains territories, not just the 50 states.

Q7. Add a column to the table called ‘rank’ that starts at 1 and counts up by 1 for each row in the table. Your table should look something like the one below.

rank | state      | count
1    | California | 725
2    | New York   | 672
3    | Michigan   | 340
...

[29]:
    # sort by count, then number the rows 1, 2, ..., num_rows
    num_markets_ranked = num_markets.sort('count', descending=True).with_columns(
        'RANK', np.arange(1, num_markets.num_rows + 1))
    num_markets_ranked

[29]:
    State         | count | RANK
    California    | 755   | 1
    New York      | 672   | 2
    Michigan      | 340   | 3
    Ohio          | 326   | 4
    Illinois      | 323   | 5
    Massachusetts | 316   | 6
    Wisconsin     | 307   | 7
    Pennsylvania  | 294   | 8
    Missouri      | 258   | 9
    Virginia      | 253   | 10
    … (43 rows omitted)

Q7. Make a horizontal bar chart of the number of markets for the top 10 states.

[31]:
    # keep only the 10 highest-ranked rows and plot their counts
    num_markets_ranked.take(np.arange(10)).barh('State', 'count')
Q8. What rank is Georgia? Extract the rank for Georgia from the table.

[32]:
    GA_rank = num_markets_ranked.where('State', 'Georgia').column(2).item(0)
    # do not change this line of code
    print('There are', GA_rank - 1, "States that have more farmer's markets than Georgia")

There are 22 States that have more farmer's markets than Georgia

Q9. How many farmers’ markets does Georgia have? Extract the count for Georgia from the table.

[33]:
    GA_count = num_markets_ranked.where('State', 'Georgia').column(1).item(0)
    # do not change this line of code
    print('Georgia has', GA_count, "farmer's markets.")

Georgia has 144 farmer's markets.

Q10. What is the distribution of the number of farmers’ markets per state? Create an appropriate histogram using the bins provided.

[34]:
    # do not change this line of code
    myBins = np.arange(0, 800, 50)
    # insert your histogram code here
    num_markets.hist('count', bins=myBins)
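The bins `np.arange(0, 800, 50)` have edges 0, 50, …, 750, so the last bin ends at 750, while the ranked table reports 755 markets for California. The sketch below, in plain Python with hypothetical per-state counts, assigns values to bins using numpy's convention (every bin is [left, right) except the last, which also includes its right edge) and sets aside values beyond the last edge, which `numpy.histogram` does not count. That `Table.hist` drops such values as well is an assumption here, based on it drawing through matplotlib.

```python
# Bin edges equivalent to np.arange(0, 800, 50): 0, 50, ..., 750 (15 bins)
edges = list(range(0, 800, 50))


def bin_counts(values, edges):
    """Count values per bin using numpy's convention: every bin is [left, right)
    except the last, which is [left, right]. Values outside the edges are skipped."""
    counts = [0] * (len(edges) - 1)
    skipped = []
    for v in values:
        if v < edges[0] or v > edges[-1]:
            skipped.append(v)
            continue
        for i in range(len(edges) - 1):
            left, right = edges[i], edges[i + 1]
            is_last = i == len(edges) - 2
            if left <= v < right or (is_last and v == right):
                counts[i] += 1
                break
    return counts, skipped


# Hypothetical per-state market counts (illustrative only)
values = [12, 48, 60, 95, 144, 340, 755]
counts, skipped = bin_counts(values, edges)
print(edges[-1])   # 750
print(counts[:3])  # [2, 2, 1]
print(skipped)     # [755]
```

If every state should appear in the histogram, the upper edge would need to exceed the largest count (for example `np.arange(0, 850, 50)`); Q10, however, asks for the bins provided.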
Q11. Is the distribution of markets symmetric or skewed? If it is skewed, in which direction?
1. Skewed left
2. Skewed right
3. Not skewed

[35]:
    Q11_answer = 2

Q12. Which of the following is true for the number of markets?
1. The mean is lower than the median.
2. The mean is higher than the median.
3. The mean is the same as the median.

[36]:
    Q12_answer = 2

Q13. What percentage of markets are in the second bar of the histogram (select the closest answer)?
1. About 15%
2. About 20%
3. About 25%
4. About 30%
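Q12 and Q13 rest on two facts about a right-skewed histogram: a long upper tail pulls the mean above the median, and on a density-scale histogram (the vertical axis of `Table.hist` is typically labeled percent per unit) the percentage in a bar equals its height times its width. The plain-Python sketch below checks both on hypothetical per-state counts; the numbers are illustrative only and are not taken from farmers_markets.csv.

```python
from statistics import mean, median

# Hypothetical per-state counts with a long right tail (illustrative only)
counts = [20, 35, 40, 55, 60, 70, 90, 144, 340, 755]

avg = mean(counts)    # 160.9
mid = median(counts)  # 65.0, the average of the two middle values 60 and 70
# The long upper tail pulls the mean above the median (right skew)
print(avg > mid)  # True

# Share of the values in the second bin, [50, 100), as a percentage
width = 50
in_second_bin = sum(1 for c in counts if 50 <= c < 50 + width)
pct = 100 * in_second_bin / len(counts)
print(in_second_bin, pct)  # 4 40.0

# On a density scale the bar height is percent per unit;
# height * width gives back the bar's percentage
height = pct / width
print(height)  # 0.8
```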
[37]:
    Q13_answer = 3

Q14. How many markets are in the second bar of the histogram (select the closest answer)?
1. About 1200
2. About 2500
3. About 1700
4. About 2000

[38]:
    Q14_answer = 3

Q15. Create a table called georgia_markets that has only Georgia farmers’ markets. Drop the column ‘State’ from georgia_markets. Show the first 5 rows of georgia_markets.

[39]:
    georgia_markets = myData.where('State', 'Georgia').drop('State')
    georgia_markets.show(5)

Q16. Do more Georgia markets sell honey or baked goods? Count the number of markets that sell baked goods and the number that sell honey.

[40]:
    num_honey = georgia_markets.where('Honey', are.equal_to('Y')).num_rows
    num_bakedgoods = georgia_markets.where('Bakedgoods', are.equal_to('Y')).num_rows
    # do not change the following lines of code
    print(num_honey, 'Georgia markets sell Honey')
    print(num_bakedgoods, 'Georgia markets sell Baked Goods')

100 Georgia markets sell Honey
99 Georgia markets sell Baked Goods

Q17. What percentage of Georgia farmers’ markets sell poultry? Use group and with_column to create a table that looks like:

Poultry | count | pct
N       | 6148  | 72
Y       | 2398  | 28

[41]:
    poultry_pct = georgia_markets.group('Poultry')
    counts = poultry_pct.column('count')
    # percentage of Georgia markets in each group, rounded to a whole number
    poultry_pct = poultry_pct.with_column('pct', np.round(100 * counts / sum(counts)).astype(int))
    poultry_pct

[41]:
    Poultry | count | pct
    N       | 102   | 71
    Y       | 42    | 29

Q18. Make a pie chart of the poultry percentages.

[45]:
    # take the slice sizes and labels from the table instead of hard-coding them
    sizes = poultry_pct.column('pct')
    labels = poultry_pct.column('Poultry')
    plots.pie(sizes, labels=labels)
    plots.show()

1.0.4 ALL FINISHED! Just submit :)