UGBA88 Lab DX assignment - Jupyter Notebook

pdf

School

University of California, Berkeley *

*We aren’t endorsed by this school

Course

113

Subject

Industrial Engineering

Date

Feb 20, 2024

Type

pdf

Pages

13

Uploaded by MegaSeahorseMaster651

Report
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 1/13 Lab DX: Data Visualization 1 Setup In [8]: 2 Fundraising for a Political Campaign continued 2.1 Business Decision The Director of Fundraising in California for a US presidential candidate is considering where best to focus the campaign's effort during the early period January to September of the year prior to the election. To inform her decision, she would like insight into how fundraising has gone for other candidates in the past. # Import some useful functions from numpy import * from numpy.random import * from datascience import * from statsmodels.formula.api import * # Define some useful functions def correlation (array_1, array_2): return corrcoef(array_1, array_2).item( 1 ) # Customize look of graphics import matplotlib.pyplot as plt plt.style.use( 'fivethirtyeight' ) plt.rcParams[ 'figure.dpi' ] = 60 % matplotlib inline # Force display of all values from IPython.core.interactiveshell import InteractiveShell InteractiveShell.ast_node_interactivity = "all" # Handle some obnoxious warning messages import warnings warnings.filterwarnings( "ignore" )
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 2/13 2.2 Data The donor dataset here is from the US Federal Election Commission (FEC) and itemizes individual contributions from California to federal campaign committees for US presidential candidates received January 2019 to September 2019 (refer: US Federal Election Commission https://www.fec.gov/data/browse-data/?tab=bulk-data (https://www.fec.gov/data/browse-data/? tab=bulk-data) ). Retrieve the donor dataset, filter to include only transactions where amount is between $0 and $2,200, show the number of transactions and the first few transactions. In [9]: 2.3 Analysis Construct a table to show $ donated to Democrats for each city that donated to Democrats. Out[9]: 264119 Out[9]: CAND_NAME PARTY NAME CITY OCCUPATION DATE AMOUNT SANFORD, MARSHALL HON REP CARATAN, PATRICK BAKERSFIELD blank 2019- 09-08 400 SANFORD, MARSHALL HON REP PARROTT, IDA FRESNO RETIRED 2019- 09-09 500 SANFORD, MARSHALL HON REP HUEBSCHER, FRED HERMOSA BEACH CONSULTANT 2019- 09-08 1000 SANFORD, MARSHALL HON REP MURPHY, MIKE LOS ANGELES CONSULTANT 2019- 09-08 1000 DELANEY, JOHN K. DEM ELVERUM, JAMIE CARMICHAEL SALES MANAGER 2019- 07-14 25 DELANEY, JOHN K. DEM FLYNN, GREG SAN FRANCISCO blank 2019- 03-31 1000 DELANEY, JOHN K. DEM SAUNDERS, MARSHALL L CORONADO RETIRED 2019- 07-08 1000 DELANEY, JOHN K. DEM HOLMSTROM, RICK WOODSIDE blank 2019- 06-20 200 DELANEY, JOHN K. DEM COWAN, GEOFFREY LOS ANGELES PROFESSOR 2019- 04-10 20 DELANEY, JOHN K. DEM ELVERUM, JAMIE CARMICHAEL SALES MANAGER 2019- 07-28 25 ... (264109 rows omitted) data = Table.read_table( 'donations.csv' , low_memory = False ) data = data.where( 'AMOUNT' , are.between_or_equal_to( 0 , 2200 )) data.num_rows data
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 3/13 In [11]: Construct a table to show $ donated to Republicans for each city that donated to Republicans. In [12]: Construct a table to compare $ donated to Democrats vs. $ donated to Republicans for each city (from among cities that donated to both Democrats and Republicans). Out[11]: CITY DEM sum ACTON 1602 ADELANTO 159 AGOURA 131 AGOURA HILLS 23251 AGUA DULCE 849 AHWAHNEE 54 ALAMEDA 105673 ALAMO 31812 ALBANY 54959 ALBION 2746 ... (1026 rows omitted) Out[12]: CITY REP sum ACAMPO 250 ACTON 80 ADIN 250 AGOURA HILLS 2055 AHWAHNEE 275 ALAMEDA 1688 ALAMO 4505 ALHAMBRA 748 ALISO VIEJO 2295 ALPINE 1035 ... (680 rows omitted) agg_dem = data.where( 'PARTY' , 'DEM' ).select( 'CITY' , 'AMOUNT' ).group( 'CITY' , su agg_dem = agg_dem.relabel( 'AMOUNT sum' , 'DEM sum' ) agg_dem agg_rep = data.where( 'PARTY' , 'REP' ).select( 'CITY' , 'AMOUNT' ).group( 'CITY' , s agg_rep = agg_rep.relabel( 'AMOUNT sum' , 'REP sum' ) agg_rep
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 4/13 In [13]: Find the 10 cities with the highest $ donated to Democrats (from among cities that donated to both Democrats and Republicans). These are the top-10 Democrat donor cities. In [15]: Use a horizontal bar chart to visualize $ donated to Democrats and $ donated to Republicans for each of the top-10 Democrat donor cities, sorted by $ donated to Democrats. Out[13]: CITY DEM sum REP sum ACTON 1602 80 AGOURA HILLS 23251 2055 AHWAHNEE 54 275 ALAMEDA 105673 1688 ALAMO 31812 4505 ALHAMBRA 13508 748 ALISO VIEJO 24879 2295 ALPINE 1310 1035 ALTA LOMA 1505 285 ALTADENA 56061 1900 ... (612 rows omitted) Out[15]: CITY DEM sum REP sum SAN FRANCISCO 3569632 23528 LOS ANGELES 3178304 41417 OAKLAND 767837 5600 BERKELEY 619856 841 SAN DIEGO 557147 32812 SANTA MONICA 540150 4365 PALO ALTO 532022 3620 SAN JOSE 529012 29433 BEVERLY HILLS 408892 9167 SACRAMENTO 403744 11932 agg = agg_dem.join( 'CITY' ,agg_rep) agg agg_top10dem = agg.sort( 'DEM sum' , descending = True ).take(arange( 10 )) agg_top10dem
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 5/13 In [16]: Find the 10 cities with the highest $ donated to Republicans (from among cities that donated to both Democrats and Republicans). These are the top-10 Republican donor cities. In [17]: Use a horizontal bar chart to visualize $ donated to Democrats and $ donated to Republicans for each of the top-10 Republican donor cities, sorted by $ donated to Republicans. Out[17]: CITY DEM sum REP sum LOS ANGELES 3178304 41417 SAN DIEGO 557147 32812 SAN JOSE 529012 29433 SAN FRANCISCO 3569632 23528 SANTA ANA 51036 21938 HUNTINGTON BEACH 80170 15833 NEWPORT BEACH 78004 15640 IRVINE 151146 15405 BAKERSFIELD 32740 15076 FRESNO 72900 14105 agg_top10dem.barh( 'CITY' ) agg_top10rep = agg.sort( 'REP sum' , descending = True ).take(arange( 10 )) agg_top10rep
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 6/13 In [18]: Calculate the ratio of $ donated to Democrats vs. $ donated to Republicans from the top- 4 Republican donor cities combined. In [19]: Construct a table to show the cumulative $ donated in February 2019, by date. Out[19]: CITY DEM sum REP sum LOS ANGELES 3178304 41417 SAN DIEGO 557147 32812 SAN JOSE 529012 29433 SAN FRANCISCO 3569632 23528 Out[19]: 61.593639437062663 agg_top10rep.barh( 'CITY' ) agg_top4rep = agg_top10rep.take(arange( 4 )) agg_top4rep sum (agg_top4rep.column( 'DEM sum' )) / sum (agg_top4rep.column( 'REP sum' ))
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 7/13 In [20]: Visualize the cumulative $ donated in February 2019, by date, as a line plot. Out[20]: CAND_NAME PARTY NAME CITY OCCUPATION DATE AMOUNT CUMSUM TRUMP, DONALD J. REP MEZHEBITSKIY, SERGEY CITRUS HEIGHTS SINGLE 2019- 02-01 25 25 TRUMP, DONALD J. REP SCHNEIDER, GARY MOUNTAIN VIEW LYFT DRIVER 2019- 02-01 50 75 TRUMP, DONALD J. REP WAGNER, RUEL E. MR. LA MIRADA RETIRED 2019- 02-01 50 125 TRUMP, DONALD J. REP WALKOV, CINDRA VAN NUYS REAL EATATE 2019- 02-01 35 160 TRUMP, DONALD J. REP DIEHL, LINDA EXETER RETIRED 2019- 02-01 25 185 TRUMP, DONALD J. REP HIGHTOWER, TIMOTHY SAN JOSE RETIRED 2019- 02-01 250 435 TRUMP, DONALD J. REP LUONG, WILLIAM T. MR. SAN JOSE RETIRED 2019- 02-01 50 485 TRUMP, DONALD J. REP WALKOV, CINDRA VAN NUYS REAL EATATE 2019- 02-01 35 520 TRUMP, DONALD J. REP NEIGHBOR, JOSEPH RANCHO CUCAMONGA MANAGER 2019- 02-01 100 620 TRUMP, DONALD J. REP WARD, DELMA PASO ROBLES RETIRED 2019- 02-01 225 845 ... (5304 rows omitted) = data.where( 'DATE' , are.between_or_equal_to( "2019-02-01" , "2019-02-28" )) = data_02.sort( 'DATE' ).with_column( 'CUMSUM' , cumsum(data_02.sort( 'DATE' ).col
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 8/13 In [21]: 2.4 Quiz The dataset, filtered to include only transactions where amount is between $0 and $2,200, comprises ____ observations. City of Acton donated $ ____ to Democrat candidates. City of Alpine donated $ ____ to Republican candidates. City of Alta Loma donated $ ____ to Democrat candidates. City of Alta Loma donated $ ____ to Republican candidates. City of ____ donated more to Democrats than any other city did. BERKELEY BEVERLY HILLS LOS ANGELES SAN DIEGO SAN FRANCISCO SAN JOSE City of Fresno donated more to Republicans than city of ____ did. BAKERSFIELD IRVINE SACRAMENTO SANTA ANA SAN FRANCISCO SAN JOSE The top- 4 Republican donor cities donated ____ times as much to Democrats as they did to Republicans. During February 2019, the largest one-day sum of $ donated was on February ____ , 2019. The line plot of cumulative $ donated in February 2019, by date, is shaped like ____ . discrete stair steps because there are typically several donations made each day discrete stair steps because the donation amounts in the dataset are rounded to the nearest dollar data_02.select( 'DATE' , 'CUMSUM' ).plot( 'DATE' )
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 9/13 discrete stair steps because the python .plot() function displays line segments at low resolution a continuous ramp because there are typically several donations made each day a continuous ramp because the donation amounts in the dataset are rounded to the nearest dollar a continuous ramp because the python .plot() function displays line segments at low resolution 3 Airline Flight On-Time Performance 3.1 Business Decision The practice manager of a management consulting firm has been engaged by a consortium of airport operators to investiugate and recommend process changes to improve airline flight on-time performance. His first task is to identify characteristics surrounding very delayed flights. Based on insights gained there, he will decide where to focus his efforts next. 3.2 Data The dataset describes flight performance of a sample of 1,049 flights during August 2015 (refer: US Bureau of Transportation Statistics). Each observation describes one flight. Retrieve the flight performance dataset, show the number of observations and the first few observations.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 10/13 In [22]: 3.3 Analysis Calculate the mean and standard deviation of arrival delays. In [23]: Visualize the distribution of arrival delays as a 10-bin histogram, normed, range -75 to 75. Out[22]: 1049 Out[22]: DayofMonth DayOfWeek Carrier Origin Origin City Dest Dest City Departure Delay (mins) Arrival Delay (min) Schd Elapsed Time 7 5 American JFK New York, NY LAX Los Angeles, CA -3 -19 364 25 2 American SFO San Francisco, CA JFK New York, NY -9 -31 343 23 7 American PDX Portland, OR ORD Chicago, IL -2 -26 243 21 5 American DEN Denver, CO MIA Miami, FL 4 5 236 18 2 American MIA Miami, FL IAH Houston, TX -6 -19 163 15 6 American JFK New York, NY MIA Miami, FL 73 58 213 8 6 American SEA Seattle, WA DFW Dallas/Fort Worth, TX 0 -5 235 27 4 American LAX Los Angeles, CA BOS Boston, MA -2 -20 340 7 5 American ELP El Paso, TX DFW Dallas/Fort Worth, TX -7 -10 108 15 6 American LAS Las Vegas, NV DFW Dallas/Fort Worth, TX -2 -6 165 ... (1039 rows omitted) Out[23]: -1.7988560533841753 Out[23]: 18.738900333975252 data = Table.read_table( 'flight_performance.csv' ) data.num_rows data mean_delay = mean(data.column( 'Arrival Delay (min)' )) mean_delay std_delay = std(data.column( 'Arrival Delay (min)' )) std_delay
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 11/13 In [ ]: Calculate the 90th percentile of arrival delays. 10% of the flights were delayed longer than this value. In [ ]: Calculate the mean arrival delay for each origin airport. In [ ]: Visualize the distribution of mean arrival delays as a 10-bin histogram, normed, range -75 to 75. (Each mean arrival delay corresponds to a unique origin airport.) In [ ]: Calculate the 90th percentile of mean arrival delays. 10% of the origin airports had flights delayed on average longer than this value. In [ ]: Find the 10% of origin airports with the highest mean arrival delays. These are the origin airports that had flights delayed on average longer than the 90th percentile. These are the problem origin airports. In [ ]: Find the flights originating from problem origin airports. Hint: Join the table of problem origin airports with the table of all flights. In [ ]: Calculate the number of flights originating from problem origin airports for each carrier. In [ ]: Calculate the shortest arrival delay originating from problem origin airports for each carrier. (Treat a negative arrival delay as shorter than a positive arrival delay.) data.hist( 'Arrival Delay (min)' ), normed = True , range = make_array( - 75 , 75 ) p = percentile( 90 , data.column( 'Arrival Delay (min)' )) p data_group = data.select( 'Origin' , 'Arrival Delay (min)' ).group( 'Origin' , m data_group data_group.where( 'Arrival Delay (min) mean' , are.between( - 75 / 5 , 0 )).num_rows data_group.hist( 'Arrival Delay (min) mean' , normed = True , range = make_ar p_group = percentile( 90 , data_group.column( 'Arrival Delay (min) mean' )) p_group data_group_worst = data_group.where( 'Arrival Delay (min) mean' , are.above(p date_group_worst.show() data_worst = data.join( 'Origin' , data_group_worst) data_worst.show() data_worst.select( 'Carrier' ).group( 'Carrier' )
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 12/13 In [ ]: Calculate the mean arrival delay originating from problem origin airports for each carrier. (Treat a negative arrival delay as shorter than a positive arrival delay.) In [ ]: Calculate the longest arrival delay originating from problem origin airports for each carrier. (Treat a negative arrival delay as shorter than a positive arrival delay.) In [ ]: Use a horizontal bar chart to visualize shortest arrival delay, mean arrival delay, and longest arrival delay originating from problem origin airports for each carrier. In [ ]: 3.4 Quiz The dataset comprises ____ observations. The mean arrival delay is ____ minutes. (could be negative) The standard deviation of arrival delay is ____ minutes. The histogram of arrival delays includes ____ bars with heights greater than zero. 10% of flights arrived at their destinations delayed by more than ____ minutes. Flights originating from ASE arrived at their destinations delayed by ____ minutes on average. The area of all bars in the histogram of mean arrival delays is ____ . The area of the highest bar in the histogram of mean arrival delays is ____ . 10% of origin airports had flights that arrived at their destinations delayed by more than ____ minutes on average. ____ origin airports had flights that arrived at their destinations delayed by more than 12 minutes on average. The mean arrival delay of flights originating from MLI is ____ minutes. The mean arrival delay of flights originating from FAI is ____ minutes. ____ flights originated from IAD. ____ SkyWest flights originated from problem origin airports. The shortest arrival delay of Alaska flights originating from problem origin airports is ____ minutes. The mean arrival delay of Atlantic SE flights originating from problem origin airports is ____ minutes. The longest arrival delay of UA flights originating from problem origin airports is ____ minutes. The bar chart of shortest, mean, and longest arrival delays includes ____ bars. The length of the longest bar in the bar chart of shortest, mean, and longest arrival delays is ____ . ____ contributes to arrival delays associated with problem origin airports due to having many flights and high mean arrival delay, even though at least one flight arrived ahead of d1 = data_worst.select( 'Carrier' , 'Arrival Delay (min)' ).group( 'Carrier' , mi d1 d2 = data_worst.select( 'Carrier' , 'Arrival Delay (min)' )
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
2/12/24, 5:50 PM UGBA88 Lab DX assignment - Jupyter Notebook https://ugba88.haas.berkeley.edu/user/darrinfan/notebooks/my-work/Lab D Data Visualization/UGBA88 Lab DX assignment.ipynb 13/13 schedule. American Atlantic SE SkyWest Southwest UA Southwest ____ contribute to arrival delays associated with problem origin airports. does does not Document revised September 16, 2023 Copyright (c) Huntsinger Associates, LLC