Homework7_Aaryan Pimple

School: University of Southern California

Course: 558

Subject: Industrial Engineering

Date: Feb 20, 2024

ISE-558 Data Management for Analytics
Homework 7

For this homework assignment, enter your answers in the blank cells below, either as Python code (in code cells) or as text in Markdown cells. When you are finished, create a PDF version of your solution with the menu command File > Download As HTML, then open the HTML file in a browser and "print" it to a PDF file. Upload this PDF file to Gradescope in the normal way.

import pandas as pd
import numpy as np

Problem 1

You are to use the tables found in “Orders - Data Integration.csv” and “Product Costs - Data Integration.csv” to generate a report of the total profit from each of your customers. Perform the following steps:

1A) Read in the two files and determine if either (or both) of the files are not in Tidy format. Summarize below which files are not in Tidy format and why:

# Load the CSV files
orders_data = pd.read_csv('/content/Orders - Data Integration.csv')
product_costs_data = pd.read_csv('/content/Product Costs - Data Integration.csv')

def check_tidy_format(data, table_name):
    # True when no column name has leading or trailing whitespace
    variables_in_columns = all(data.columns == data.columns.str.strip())
    # Compares the row count with itself, so this is always True
    observations_in_rows = data.shape[0] == len(data)
    # True when every index label is unique
    observational_unit = len(data.index.unique()) == len(data)
    tidy_summary = {
        "Table Name": table_name,
        "Variables in Columns": variables_in_columns,
        "Observations in Rows": observations_in_rows,
        "Observational Unit": observational_unit,
        "Tidy Format": variables_in_columns and observations_in_rows and observational_unit
    }
    return tidy_summary

Module 7 Homework.ipynb - Colaboratory https://colab.research.google.com/drive/1Dbk8_m8vR0d8SMRHmfgo... 1 of 15 05-12-2023, 22:39
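One concrete symptom of a non-tidy table is a value, such as a year, stored in a column header instead of in a column of its own. The following is a minimal sketch of that single check, not a complete tidy-data test. The helper name `year_in_header_columns` and both sample frames are hypothetical stand-ins; their column names are taken from the combined output printed for 1C, since the raw CSV files are not shown.

```python
import re

import pandas as pd


def year_in_header_columns(df):
    """Return the column names that embed a four-digit year.

    A year inside a header suggests a wide layout, where a variable
    (the year) is stored in column names rather than in its own column.
    """
    pattern = re.compile(r"\b(19|20)\d{2}\b")
    return [col for col in df.columns if pattern.search(str(col))]


# Hypothetical stand-ins for the two tables (assumed column names).
orders = pd.DataFrame({
    "Order_Num": [1000],
    "Customer": ["Customer A"],
    "Year Purchased": [2019],
    "Product Code": ["X1388"],
    "Quantity": [20],
    "Unit Price": [684],
})
product_costs = pd.DataFrame({
    "Product Code": ["X1388", "BW243"],
    "2019 Cost": [540.36, 367.08],
    "2020 Cost": [566.36, 380.08],
})

orders_wide_cols = year_in_header_columns(orders)
costs_wide_cols = year_in_header_columns(product_costs)
print("Orders:", orders_wide_cols)
print("Product Costs:", costs_wide_cols)
```

Under these assumed column names, only the cost table is flagged, because the year is a value that belongs in its own column alongside a single cost column.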

orders_tidy_summary = check_tidy_format(orders_data, "Orders")
product_costs_tidy_summary = check_tidy_format(product_costs_data, "Product Costs")
print("Summary for Orders table:")
print(orders_tidy_summary)
print("\nSummary for Product Costs table:")
print(product_costs_tidy_summary)

Summary for Orders table:
{'Table Name': 'Orders', 'Variables in Columns': True, 'Observations in Rows': True,

Summary for Product Costs table:
{'Table Name': 'Product Costs', 'Variables in Columns': True, 'Observations in Rows':

1B) If either table is not in Tidy format, correct it. Also, convert the costs from character to numeric types.

def tidy_and_convert_numeric(data, table_name):
    # Convert to numeric only when a column named exactly 'cost' exists
    if 'cost' in data.columns:
        data['cost'] = pd.to_numeric(data['cost'], errors='coerce')
    # Drop rows with missing values and renumber the index
    data = data.dropna()
    data = data.reset_index(drop=True)
    print(f"{table_name} table has been corrected and costs have been converted to numeric.")
    return data

orders_data_corrected = tidy_and_convert_numeric(orders_data, "Orders")
product_costs_data_corrected = tidy_and_convert_numeric(product_costs_data, "Product Costs")

Orders table has been corrected and costs have been converted to numeric.
Product Costs table has been corrected and costs have been converted to numeric.

1C) Combine the two data frames to add the “cost” information column from the Product Costs - Data Integration data frame to the Orders - Data Integration data frame. Display your resulting combined data frame.

combined_data = pd.merge(orders_data_corrected, product_costs_data_corrected, on='Product
print("Combined Data Frame:")
print(combined_data)

Combined Data Frame:
    Order_Num    Customer  Line Item  Year Purchased  Product Code  Quantity  \
 0       1000  Customer A          1            2019         X1189        65
 1       1000  Customer A          2            2019           A33        63
 2       1000  Customer A          3            2019         BW243        75
 3       1000  Customer A          4            2019         X1388        20

 4       1000  Customer A          5            2019           Y12        82
 5       1001  Customer B          1            2020         X1388        83
 6       1001  Customer B          2            2020         BW243        29
 7       1001  Customer B          3            2020        GG2554        70
 8       1002  Customer C          1            2020         X1388        52
 9       1002  Customer C          2            2020         HC155        73
10       1002  Customer C          3            2020          ZZ52        81
11       1002  Customer C          4            2020          YYS1        73
12       1002  Customer C          5            2020        GG2554        98
13       1002  Customer C          6            2020           Y12        93
14       1002  Customer C          7            2020         HCK15        81
15       1003  Customer D          1            2019         X1388        62

    Unit Price  2019 Cost  2020 Cost
 0         314     244.92     273.92
 1         698     467.66     484.66
 2         483     367.08     380.08
 3         684     540.36     566.36
 4         474     322.32     335.32
 5         684     540.36     566.36
 6         483     367.08     380.08
 7         595     428.40     447.40
 8         684     540.36     566.36
 9         404     282.80     303.80
10         317     196.54     226.54
11         258     157.38     172.38
12         595     428.40     447.40
13         474     322.32     335.32
14         532     335.16     357.16
15         684     540.36     566.36

1D) Create a new column in the joined table that is equal to the total profit for that line item (note: profit equals the price that you charge for an item minus the cost that you have to pay to get that item from your supplier). Display the resulting dataframe.

combined_data['Total Profit'] = (combined_data['Unit Price'] - combined_data['2020 Cost']
print("Combined Data Frame with Total Profit: ")
print(combined_data)

Combined Data Frame with Total Profit:
    Order_Num    Customer  Line Item  Year Purchased  Product Code  Quantity  \
 0       1000  Customer A          1            2019         X1189        65
 1       1000  Customer A          2            2019           A33        63
 2       1000  Customer A          3            2019         BW243        75
 3       1000  Customer A          4            2019         X1388        20
 4       1000  Customer A          5            2019           Y12        82
 5       1001  Customer B          1            2020         X1388        83
 6       1001  Customer B          2            2020         BW243        29
 7       1001  Customer B          3            2020        GG2554        70
 8       1002  Customer C          1            2020         X1388        52
 9       1002  Customer C          2            2020         HC155        73
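The combined frame above carries both a 2019 and a 2020 cost on every row, while each order has a single Year Purchased. One possible way to line these up, sketched here under stated assumptions, is to reshape the cost table to one row per product and year, join on both keys, and then compute profit per line item. The sample rows below are copied from the output above, but the frames themselves, the `Year` and `Cost` column names, the text-typed costs, and the reading of "total profit" as per-unit margin times Quantity are all assumptions, not the original solution.

```python
import pandas as pd

# Hypothetical sample built from rows shown in the combined output.
orders = pd.DataFrame({
    "Order_Num": [1000, 1001, 1003],
    "Customer": ["Customer A", "Customer B", "Customer D"],
    "Year Purchased": [2019, 2020, 2019],
    "Product Code": ["X1388", "X1388", "X1388"],
    "Quantity": [20, 83, 62],
    "Unit Price": [684, 684, 684],
})
costs_wide = pd.DataFrame({
    "Product Code": ["X1388"],
    "2019 Cost": ["540.36"],  # text on purpose, to mimic character-typed costs
    "2020 Cost": ["566.36"],
})

# Reshape to long form: one row per (Product Code, Year).
costs_long = costs_wide.melt(
    id_vars="Product Code", var_name="Year", value_name="Cost"
)
# Pull the four-digit year out of headers such as "2019 Cost".
costs_long["Year"] = (
    costs_long["Year"].str.extract(r"(\d{4})", expand=False).astype(int)
)
# Convert the cost strings to numbers; unparseable values become NaN.
costs_long["Cost"] = pd.to_numeric(costs_long["Cost"], errors="coerce")

# Join each order line to the cost for its product in its purchase year.
combined = orders.merge(
    costs_long,
    how="left",
    left_on=["Product Code", "Year Purchased"],
    right_on=["Product Code", "Year"],
).drop(columns="Year")

# Assumed definition: (price - cost) per unit, times units ordered.
combined["Total Profit"] = (
    combined["Unit Price"] - combined["Cost"]
) * combined["Quantity"]

# Report of total profit per customer.
profit_by_customer = combined.groupby("Customer")["Total Profit"].sum()
print(combined)
print(profit_by_customer.round(2))
```

The left join keeps every order line even when no matching cost exists, so missing costs surface as NaN profits instead of silently dropping rows.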
