Lecture Notebooks 26, 27 and 28
Combining Datasets: Concat

What are we going to learn in Note 26
Consider the following function, which we will use to quickly build example DataFrames:

import pandas as pd
import numpy as np

def make_df(cols, ind):
    """Quickly make a DataFrame"""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, ind)

# example DataFrame
make_df('ABC', range(3))

Output:
Simple Concatenation with pd.concat

pd.concat() can be used for a simple concatenation of Series or DataFrame objects, just as np.concatenate can be used for simple concatenations of arrays in NumPy.

Example: Concatenate Series

ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])   # 3-row Series
ser2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])   # 3-row Series
pd.concat([ser1, ser2])                              # 3 + 3 = 6-row Series

Output:
1    A
2    B
3    C
4    D
5    E
6    F
dtype: object
Example: Concatenate DataFrames

df1 = make_df('AB', [1, 2])   # recall that make_df() is the function we defined above
df2 = make_df('AB', [3, 4])
display('df1', 'df2', 'pd.concat([df1, df2])')

Output:
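Note: display(...) as used in these slides, with expression strings as arguments, is not a built-in; it is a small helper for showing several objects at once. A minimal plain-text sketch, assuming each string names an expression visible in the calling scope, might look like this:

import sys

def display(*expressions):
    """Evaluate each expression string in the caller's namespace and print it."""
    frame = sys._getframe(1)   # frame of the caller
    for expr in expressions:
        print(expr)
        print(eval(expr, frame.f_globals, frame.f_locals))
        print()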
Example: Concatenation can also take place along the column axis (i.e., axis=1 or axis='columns').

df3 = make_df('AB', [0, 1])
df4 = make_df('CD', [0, 1])
display('df3', 'df4', "pd.concat([df3, df4], axis='columns')")

Output:
Duplicate indices

Example: pd.concat() preserves indices even when the result ends up with duplicate index values.

x = make_df('AB', [0, 1])
y = make_df('AB', [2, 3])
y.index = x.index   # make duplicate indices! (leave this line out to keep the original, distinct indices)
display('x', 'y', 'pd.concat([x, y])')

Output:
Catching the repeats as an error

Example: Set verify_integrity=True so that pd.concat() raises an error if the indices in the result overlap.

try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as e:
    print("ValueError:", e)

Output:
ValueError: Indexes have overlapping values: Int64Index([0, 1], dtype='int64')
Ignoring the index

Example: Sometimes the index itself does not matter, and you would prefer it simply be ignored. This can be specified with the ignore_index flag; with it set to True, the concatenation creates a new integer index for the result.

display('x', 'y', 'pd.concat([x, y], ignore_index=True)')

Output:
Adding MultiIndex keys

Example: Use the keys option to specify a label for each data source; the result is a hierarchically indexed DataFrame containing the data.

display('x', 'y', "pd.concat([x, y], keys=['x', 'y'])")

Output:
Concatenation with joins

Example: Consider the following DataFrames.

df5 = make_df('ABC', [1, 2])
df6 = make_df('BCD', [3, 4])
display('df5', 'df6', 'pd.concat([df5, df6])')   # by default: axis=0, i.e., row concatenation

Output:
Note: By default, the entries for which no data is available are filled with NA values. To change this, we can specify one of several options for the join parameter of the concatenate function. By default, the join is a union of the input columns (join='outer'), but we can change this to an intersection of the columns using join='inner'.

display('df5', 'df6', "pd.concat([df5, df6], join='inner')")

Output:
Combining Datasets: Merge and Join

What are we going to learn in Note 27
One-to-one joins

Example: Consider the following two DataFrames:

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})
display('df1', 'df2')

Output:
Example: To combine this information into a single DataFrame, we can use the pd.merge() function:

df3 = pd.merge(df1, df2)
df4 = pd.concat([df1, df2])
print(df3)
print(df4)

Output:

Note: The pd.merge() function recognizes that each DataFrame has an "employee" column, and automatically joins using this column as a key. The result of the merge is a new DataFrame that combines the information from the two inputs.
Example (cont.):

x1 = make_df("abc", range(3))
x2 = make_df("bcd", range(3))
display('x1', 'x2', 'pd.merge(x1, x2)')

Output:
Example (cont.): Keep in mind that the merge in general discards the index, except in the special case of merges by index. The indexes are changed!

x1 = make_df("abc", range(3))
x2 = make_df("bcd", range(1, 4))
display('x1', 'x2', 'pd.merge(x1, x2)')

Output:
Many-to-one joins

Example: Many-to-one joins are joins in which one of the two key columns contains duplicate entries. For the many-to-one case, the resulting DataFrame will preserve those duplicate entries as appropriate.

df4 = pd.DataFrame({'group': ['Accounting', 'Engineering', 'HR'],
                    'supervisor': ['Carly', 'Guido', 'Steve']})
display('df3', 'df4', 'pd.merge(df3, df4)')

Output:
Many-to-many joins

Example: If the key column in both the left and right arrays contains duplicates, then the result is a many-to-many merge.

df5 = pd.DataFrame({'group': ['Accounting', 'Accounting', 'Engineering', 'Engineering', 'HR', 'HR'],
                    'skills': ['math', 'spreadsheets', 'coding', 'linux', 'spreadsheets', 'organization']})
display('df1', 'df5', "pd.merge(df1, df5)")

Output:
The on keyword

Example: One can explicitly specify the name of the key column using the on keyword, which takes a column name or a list of column names.

display('df1', 'df2', "pd.merge(df1, df2, on='employee')")

Output:
The left_on and right_on keywords

Example: At times you may wish to merge two datasets with different column names; for example, we may have a dataset in which the employee name is labeled "name" rather than "employee". In this case, we can use the left_on and right_on keywords to specify the two column names.

df3 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'salary': [70000, 80000, 120000, 90000]})
display('df1', 'df3', 'pd.merge(df1, df3, left_on="employee", right_on="name")')

Output:
Example (cont.): Note that the result has a redundant column, which we can drop if desired, for example by using the drop() method of DataFrames.

pd.merge(df1, df3, left_on="employee", right_on="name").drop('name', axis=1)

Output:
The left_index and right_index keywords

Example: You can also merge on an index. Use the index as the key for merging by specifying the left_index and/or right_index flags in pd.merge():

df1a = df1.set_index('employee')   # Note: employee is now the index
df2a = df2.set_index('employee')   # Note: employee is now the index
display('df1a', 'df2a', "pd.merge(df1a, df2a, left_index=True, right_index=True)")

Output:
Example (cont.): Alternatively, DataFrames implement the join() method, which performs a merge that defaults to joining on indices:

display('df1a', 'df2a', 'df1a.join(df2a)')

Output:
Example (cont.): If you'd like to mix indices and columns, you can combine left_index with right_on, or left_on with right_index, to get the desired behavior:

display('df1a', 'df3', "pd.merge(df1a, df3, left_index=True, right_on='name')")

Output:
Specifying Set Arithmetic for Joins

Example: When we merge two datasets that have only a single "name" entry in common, by default the result contains the intersection of the two sets of inputs; this is what is known as an inner join. We can specify this explicitly using the how keyword, which defaults to 'inner'.

df6 = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],
                    'food': ['fish', 'beans', 'bread']},
                   columns=['name', 'food'])
df7 = pd.DataFrame({'name': ['Mary', 'Joseph'],
                    'drink': ['wine', 'beer']},
                   columns=['name', 'drink'])
pd.merge(df6, df7, how='inner')

Output:
Example (cont.): An outer join returns a join over the union of the input columns, and fills in all missing values with NAs.

display('df6', 'df7', "pd.merge(df6, df7, how='outer')")

Output:
Example (cont.): The left join and right join return joins over the left entries and right entries, respectively.

display('df6', 'df7', "pd.merge(df6, df7, how='left')")

Output:
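For symmetry (not shown on the slide), the right join keeps every row of the right-hand input instead; the call mirrors the one above:

display('df6', 'df7', "pd.merge(df6, df7, how='right')")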
Overlapping Column Names: The suffixes Keyword

Example: Consider the following example.

df8 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'rank': [1, 2, 3, 4]})
df9 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'rank': [3, 1, 4, 2]})
display('df8', 'df9', 'pd.merge(df8, df9, on="name")')

Output:
Note: Because the output would otherwise have two conflicting column names, the merge function automatically appends the suffixes _x and _y to make the output columns unique. If these defaults are inappropriate, it is possible to specify custom suffixes using the suffixes keyword:

display('df8', 'df9', 'pd.merge(df8, df9, on="name", suffixes=["_Sem1", "_Sem2"])')

Output:
Example: US States Data (Please read this part after class)
Aggregation and Grouping

What are we going to learn in Note 28
Simple Aggregation in Pandas

Example: Aggregation for a Series

rng = np.random.RandomState(42)
ser = pd.Series(rng.rand(5))
ser

Output:
0    0.374540
1    0.950714
2    0.731994
3    0.598658
4    0.156019
dtype: float64

ser.sum()

Output: 2.811925491708157

ser.mean()

Output: 0.5623850983416314
Example: For a DataFrame, by default the aggregates return results within each column.

df = pd.DataFrame({'A': rng.rand(5),
                   'B': rng.rand(5)})
df.mean()

Output:

By specifying the axis argument, you can instead aggregate within each row:

df.mean(axis='columns')

Output:
Planets Data

Example: We are going to use the Planets dataset, which is available from the Seaborn package.

import seaborn as sns
planets = sns.load_dataset('planets')
planets.shape

Output: (1035, 6)

planets.head(10)

Output:
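If Seaborn is not installed, the same table can be read with pandas alone; the URL below is an assumption about where load_dataset fetches its CSV files (the seaborn-data GitHub repository):

# fallback: load the planets table directly as a CSV (assumed URL)
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/planets.csv'
planets = pd.read_csv(url)
planets.shape   # should match the (1035, 6) shown on the slide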
Planets Data

Example (cont.): The describe() method computes several common aggregates for each column and returns the result.

planets.dropna().describe()   # dropping rows with missing values

Output:
Planets Data

The following table summarizes some other built-in Pandas aggregations:

Aggregation          Description
count()              Total number of items
first(), last()      First and last item
mean(), median()     Mean and median
min(), max()         Minimum and maximum
std(), var()         Standard deviation and variance
mad()                Mean absolute deviation
prod()               Product of all items
sum()                Sum of all items
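As a quick sketch (not on the slides), most of these can be called directly on the ser and df objects defined earlier:

ser.min(), ser.max()    # minimum and maximum
ser.std(), ser.var()    # standard deviation and variance
ser.prod()              # product of all items
df.count()              # number of non-null items in each column
df.median()             # median of each column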
GroupBy: Split, Apply, Combine
The groupby operation accomplishes the following:
The split step involves breaking up and grouping a DataFrame depending on the value of the specified key.
The apply step involves computing some function, usually an aggregate, transformation, or filtering, within the individual groups.
The combine step merges the results of these operations into an output array.
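To make the three steps concrete, here is a rough sketch (not from the slides) that reproduces df.groupby('key').sum() by hand on the small DataFrame used in the next example:

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)}, columns=['key', 'data'])

# split: one sub-frame per distinct key
pieces = {k: df[df['key'] == k] for k in df['key'].unique()}

# apply: aggregate each piece (here, a sum of the data column)
sums = {k: piece['data'].sum() for k, piece in pieces.items()}

# combine: assemble the per-group results into a single output
result = pd.Series(sums, name='data').sort_index()
print(result)   # matches df.groupby('key')['data'].sum()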
Example:

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)}, columns=['key', 'data'])
df
df.groupby('key').sum()

Output:
Example (Planets Data): A particular Series group can be selected from the original DataFrame group by referring to its column name.

planets.groupby('method')['orbital_period'].median()

Output:
Dispatch methods

Example: One can use the describe() method of DataFrames to perform a set of aggregations that describe each group in the data.

planets.groupby('method')['year'].describe()

Output:
Aggregate, filter, transform, apply

Example: We consider the following DataFrame.

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(1, 7),
                   'data2': rng.choice(range(1, 11), 6, replace=False)},
                  columns=['key', 'data1', 'data2'])
df

Output:
Example (cont.): The aggregate() method allows for more flexibility. It can take a string, a function, or a list thereof, and compute all the aggregates at once.

df.groupby('key').aggregate(['min', np.median, max])

Output:
Example (cont.): Another useful pattern is to pass a dictionary mapping column names to the operations to be applied to those columns.

df.groupby('key').aggregate({'data1': 'min', 'data2': 'max'})

Output:
Example (cont.): The filter() operation allows you to drop data based on the group properties. For example, we might want to keep all groups in which the standard deviation is larger than some critical value.

def filter_func(x):
    return x['data2'].std() > 4

display('df', "df.groupby('key').std()", "df.groupby('key').filter(filter_func)")

Output:
Example (cont.): A transformation can return some transformed version of the full data to recombine. For such a transformation, the output has the same shape as the input.

display("df", "df.groupby('key').mean()", "df.groupby('key').transform(lambda x: x - x.mean())")

Output:
Example (cont.): The apply() method lets you apply an arbitrary function to the group results. The function should take a DataFrame and return either a Pandas object (e.g., a DataFrame or Series) or a scalar; the combine operation will be tailored to the type of output returned.

pd.DataFrame(df.groupby('key').data2.sum())

Output:
Example (cont.):

rng = np.random.RandomState(0)

def norm_by_data2(x):
    # x is a DataFrame of group values
    x['data1'] /= x['data2'].sum()
    return x

# pd.DataFrame() is needed here to turn the Series result into a DataFrame
df_group_sums = pd.DataFrame(df.groupby('key')['data2'].sum())

display("df", "df_group_sums", "df.groupby('key').apply(norm_by_data2)")
Output:
Specifying the split key

Example: The key can be any series or list with a length matching that of the DataFrame.

L = [0, 1, 2, 0, 1, 2]   # A=0, B=1, C=2; this reproduces the df.groupby('key') grouping
display('df', 'df.groupby(L).sum()')

or

display('df', "df.groupby(df['key']).sum()")

Output:
A dictionary or series mapping index to group

Example: Another method is to provide a dictionary that maps index values to the group keys.

df2 = df.set_index('key')
mapping = {'A': 'vowel', 'B': 'consonant', 'C': 'consonant'}
display('df2', 'df2.groupby(mapping).sum()')

Output:
Any Python function

Example: One can pass any Python function that takes the index value as input and returns the group.

display('df2', 'df2.groupby(str.lower).mean()')

Output:
A list of valid keys

Example: Any of the preceding key choices can be combined to group on a multi-index.

df2.groupby([str.lower, mapping]).mean()

Output: