BA 222 Introduction to Pandas (Oct 11) Class

School: Boston University
Course: BA222
Subject: Computer Science
Date: Jan 9, 2024
Type: py
Pages: 3
Uploaded by: MinisterWolverine2792

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Oct 9 11:20:24 2023

This document cannot be shared with anyone without permission

@author: nobaycik
"""

import pandas as pd

# We can read an Excel file using pd.read_excel. There are other methods to import files.
# Make sure to change the directory
flight_data = pd.read_excel('/Users/nobaycik/Desktop/Lecture 1 Flights Data Set.xlsx')

# Let's check its shape
print("The shape of the flight data set is")
print(flight_data.shape)

print("The first 5 rows of the flight data set are:")
print(flight_data.head())
print("\n")

# Let's see the column titles
print("The column titles in the flight data set are:")
print(flight_data.columns)
print("\n")

# Select the rows where AIRLINE is "AS"
AS_flights = flight_data[flight_data["AIRLINE"] == "AS"]
# The syntax is:
# DataFrame[Condition]
# Where the condition can be something like DataFrame["ColumnName"] == Value

# If you want to check which values AIRLINE takes on
print("Unique airlines in the data set are:", flight_data["AIRLINE"].unique())
print("\n")

# I hope you can see that this is a very flexible set of things you can do!
# You can give it arbitrary conditions and ask arbitrary questions

# Let's make this a bit more interesting:
# What is the length of the shortest flight by distance
# that United Airlines flew that was delayed but not cancelled?

# Let's build this up step by step:
# In pandas, we use the & operator (element-wise AND) to check multiple conditions:
# Notice that Python needs you to wrap each condition in parentheses,
# because & binds more tightly than comparisons such as == and >

# In pandas, you can use the | operator to check OR
# it will check if one or the other are true or both
print("\n")

# Our example: What is the length of the shortest flight by distance
# that United Airlines flew that was delayed but not cancelled?
#print(flight_data.columns)
print("\n")
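The & and | operators described above, and the example question about the shortest delayed but not cancelled United flight, can be sketched on a small made-up table. This is only an illustration under stated assumptions: `toy_flights` and its values are hypothetical, and only the column names AIRLINE, ARRIVAL_DELAY, CANCELLED and DISTANCE are taken from the lecture script.

```python
import pandas as pd

# Made-up toy data (an assumption for illustration); only the column names
# AIRLINE, ARRIVAL_DELAY, CANCELLED and DISTANCE come from the lecture script.
toy_flights = pd.DataFrame({
    "AIRLINE": ["UA", "UA", "AS", "UA", "DL", "UA"],
    "ARRIVAL_DELAY": [12.0, -5.0, 30.0, 45.0, 0.0, 20.0],
    "CANCELLED": [0, 0, 0, 1, 0, 0],
    "DISTANCE": [500, 1200, 300, 200, 950, 740],
})

# AND (&): every condition must hold. Each condition needs its own parentheses.
ua_delayed = toy_flights[
    (toy_flights["AIRLINE"] == "UA") & (toy_flights["ARRIVAL_DELAY"] > 0)
]

# OR (|): at least one condition must hold.
ua_or_as = toy_flights[
    (toy_flights["AIRLINE"] == "UA") | (toy_flights["AIRLINE"] == "AS")
]

# The example question: UA flights that were delayed but not cancelled.
ua_delayed_flown = toy_flights[
    (toy_flights["AIRLINE"] == "UA")
    & (toy_flights["ARRIVAL_DELAY"] > 0)
    & (toy_flights["CANCELLED"] == 0)
]
shortest = ua_delayed_flown["DISTANCE"].min()

# Which flights were they? Select the rows whose distance equals the minimum.
shortest_rows = ua_delayed_flown[ua_delayed_flown["DISTANCE"] == shortest]

print("UA and delayed:", len(ua_delayed))
print("UA or AS:", len(ua_or_as))
print("Shortest delayed, non-cancelled UA distance:", shortest)
print(shortest_rows)
```

On the real data set the same pattern would apply with `flight_data` in place of `toy_flights`, provided those columns exist under these names.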
# We want the min of the flight distance column, selecting on
# AIRLINE == UA & ARRIVAL_DELAY > 0 & CANCELLED == 0

# Let's build this up slowly. First, how do we get all United flights?
#flight_data[flight_data["AIRLINE"] == "UA"]

# Here are all UA flights with positive delays
#flight_data[(flight_data["AIRLINE"] == "UA") & (flight_data["ARRIVAL_DELAY"] > 0)]

# Here are all UA flights with positive delays not cancelled
#print("All UA flights with positive delays not cancelled", UA_flights)

# We can then use all the functions as before (min, max, shape, etc.)
# Let's find the minimum distance of UA flights that got delayed but not cancelled
print("\n")

## Okay but which flights were they?
# Just select on the rows where the distance is the min distance!
# Syntax: DataFrame[Condition]
print("\n")

## Groupby
## Pandas has a syntax called "groupby" that is the equivalent of a pivot table in Excel
## Suppose you want to know the summary statistics, split out by a certain category
## For example, you want mean distance flown per airline
## Or you want count per airline
## You use the following syntax:
## DataFrame[List of Columns you want].groupby(GroupingColumn).agg(List of Statistics)

# For example, here are the count and mean of distance by airline
#print(flight_data[["AIRLINE", "DISTANCE"]].groupby("AIRLINE").agg(['count', 'mean']))

# flight_data[["AIRLINE", "DISTANCE"]]: This part of the code selects the columns
# "AIRLINE" and "DISTANCE" from the flight_data DataFrame.
# .groupby("AIRLINE"): This groups the selected data by the "AIRLINE" column,
# creating groups where each group contains data for a specific airline.
# .agg(['count', 'mean']): This applies two aggregation functions to each group:
# count, which counts the number of entries in each group, and mean,
# which calculates the mean (average) of the distances in each group.
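The groupby syntax described above can be tried on a small made-up table. This is a minimal sketch, assuming a hypothetical `toy_flights` DataFrame with invented values; only the AIRLINE and DISTANCE column names come from the lecture script.

```python
import pandas as pd

# Made-up toy data (an assumption for illustration).
toy_flights = pd.DataFrame({
    "AIRLINE": ["UA", "UA", "AS", "AS", "AS", "DL"],
    "DISTANCE": [500, 1500, 300, 600, 900, 950],
})

# DataFrame[columns].groupby(grouping column).agg(list of statistics)
summary = toy_flights[["AIRLINE", "DISTANCE"]].groupby("AIRLINE").agg(["count", "mean"])
print(summary)

# The result has one row per airline and two column levels:
# ("DISTANCE", "count") and ("DISTANCE", "mean").
ua_mean = summary.loc["UA", ("DISTANCE", "mean")]
print("Mean UA distance:", ua_mean)
```

Because a list of statistics is passed to `agg`, the result columns are labelled by both the original column and the statistic, which is why a single value is looked up with a tuple such as `("DISTANCE", "mean")`.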
print("\n")

# Exercise:
# Find the min, max, median and standard deviation
# of flight distances by airline of flights that
# depart from LAX

# Filter flights that depart from LAX
# Group by airline and calculate the required statistics
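One possible way to approach the exercise is sketched below on made-up data. It is not the course's official solution. The script does not show the name of the departure-airport column, so `ORIGIN_AIRPORT` is an assumption, as are the `toy_flights` table and all of its values.

```python
import pandas as pd

# Made-up toy data. ORIGIN_AIRPORT is an ASSUMED column name; check
# flight_data.columns for the real one before adapting this.
toy_flights = pd.DataFrame({
    "AIRLINE": ["UA", "UA", "AS", "AS", "UA", "DL"],
    "ORIGIN_AIRPORT": ["LAX", "LAX", "LAX", "SEA", "SFO", "LAX"],
    "DISTANCE": [300, 500, 1000, 700, 400, 2000],
})

# Step 1: filter flights that depart from LAX.
lax_flights = toy_flights[toy_flights["ORIGIN_AIRPORT"] == "LAX"]

# Step 2: group by airline and calculate the required statistics.
lax_stats = (
    lax_flights[["AIRLINE", "DISTANCE"]]
    .groupby("AIRLINE")
    .agg(["min", "max", "median", "std"])
)
print(lax_stats)
```

An airline with only one LAX departure in the filtered data gets NaN for the standard deviation, because pandas computes the sample standard deviation by default and that needs at least two observations.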