1Dbk8_m8vR0d8SMRHmfgo_SbPSPdlq7Bj

pdf

School

University of Southern California *

*We aren’t endorsed by this school

Course

558

Subject

Industrial Engineering

Date

Feb 20, 2024

Type

pdf

Pages

6

Uploaded by CoachMorningYak37

Report
ISE-558 Data Management for Analytics Homework 7 For this homework assignment, enter your answers in the blank cells below, either as Python code (in code cells) or as text in Markdown cells. When you are completed, create a PDF version of your solution with the menu command File > Download As HTML and then open the html le in a browser and "print" it to a PDF le. Upload this PDF le to Gradescope in the normal way. import pandas as pd import numpy as np Problem 1 You are to use the tables found in “Orders - Data Integration.csv” and “Product Costs - Data Integration.csv” to generate a report of the total pro t from each of your customers. Perform the following steps: 1A) Read in the two les and determine if either (or both) of the les are not in Tidy format. Summarize below which les are not in Tidy format and why: import pandas as pd # Load the CSV files orders_path = '/Orders - Data Integration.csv' product_costs_path = '' # Reading the CSV files into DataFrames orders_df = pd.read_csv(orders_path) product_costs_df = pd.read_csv(product_costs_path) # Merging the orders dataframe with the product costs dataframe merged_df = orders_df.merge(product_costs_df, on="Product Code") # Function to calculate the cost for each line item def calculate_cost(row): year = row['Year Purchased'] quantity = row['Quantity'] cost_per_unit = row['2019 Cost'] if year == 2019 else row['2020 Cost'] return cost_per_unit * quantity Module 7 Homework.ipynb - Colaboratory https://colab.research.google.com/drive/1Dbk8_m8vR0d8SMRHmfgo... 1 of 6 05-12-2023, 17:52
--------------------------------------------------------------------------- FileNotFoundError Traceback (most recent call last) <ipython-input-1-d2e39827f55f> in <cell line: 8> () 6 7 # Reading the CSV files into DataFrames ----> 8 orders_df = pd . read_csv ( orders_path ) 9 product_costs_df = pd . read_csv ( product_costs_path ) 10 6 frames /usr/local/lib/python3.10/dist-packages/pandas/io/common.py in get_handle (path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options) 854 if ioargs . encoding and "b" not in ioargs . mode : 855 # Encoding --> 856 handle = open( 857 handle , 858 ioargs . mode , FileNotFoundError : [Errno 2] No such file or directory: '/Orders - Data Integration.csv' # Adding a new column for the total cost merged_df['Total Cost'] = merged_df.apply(calculate_cost, axis=1) # Calculating profit for each line item (Revenue - Cost) merged_df['Profit'] = (merged_df['Unit Price'] * merged_df['Quantity']) - merged_df['Total # Grouping by customer and calculating total profit total_profit_per_customer = merged_df.groupby('Customer')['Profit'].sum().reset_index() # Display the result print(total_profit_per_customer) xxx 1B) If either table is not in Tidy format, correct it. Also, convert the costs from character to numeric types. 1C) Combine the two data frames to add the “cost” information column from the Product Costs - Data Integration data frame to the Orders - Data Integration data frame. Display your resulting combined data frame. Module 7 Homework.ipynb - Colaboratory https://colab.research.google.com/drive/1Dbk8_m8vR0d8SMRHmfgo... 2 of 6 05-12-2023, 17:52
1D) Create a new column in the joined table that is equal to the total pro t for that line item (note: pro t equals the price that you charge for an item minus the cost that you have to pay to get that item from your supplier). Display the resulting dataframe. 1E) Generate and display a data frame that contains the total pro t received from each of your customers (your resulting table will contain four rows - one row for each customer). Problem 2 2A) Read into dataframe the following six star schema les: rentals.csv, addresses.csv, cities.csv, countries.csv, payments.csv, and customers.csv. Then, combine the six les into a single dataframe as your analytical base table. Display the rst few rows of this analytical base table (don’t worry if you can’t display all the columns in your printout). 2B) Write a command to return the total amount of all of the payments by customers who live in the United States 2C) Write a command to return the total payment amounts for every country that has a total payment abount greater than 1000. Sort the list from highest to lowest. 2D) Write a command to return the total payment amounts for the 10 customer IDs with the largest total payment amounts. Sort the list from highest to lowest amounts. Module 7 Homework.ipynb - Colaboratory https://colab.research.google.com/drive/1Dbk8_m8vR0d8SMRHmfgo... 3 of 6 05-12-2023, 17:52
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Problem 3 The les corporations.csv, departments.csv, and teams.csv contain data that you want to use to analyze the effects of various factors on a department’s error rates on the tasks that they perform and track. Read these les into tibbles and combine in them into a single one row per analysis subject tibble to perform this analysis. Problem 4 For this problem, you will be performing initial data cleansing and feature engineering on the dataset "House Price Dataset.csv" that will be used to predict house prices based on the following attributes: House Sale Identi er House Age SqFt (square footage of the house) Num Bathrooms Num Bedrooms Average Income Sales Price Read the csv le into a Python/Pandas dataframe and prepare the dataset for analytics by performing the steps below. 4A) Are there any missing values? If so, decide how to correct the issue and show your code that does so. xxx Module 7 Homework.ipynb - Colaboratory https://colab.research.google.com/drive/1Dbk8_m8vR0d8SMRHmfgo... 4 of 6 05-12-2023, 17:52
4B) Are there any extreme outliers? If so, decide how to correct the issue and show your code that does so. xxx 4C) Are any of the variable signi cantly skewed? If so, correct them. Display any resulting histograms to show that the skew has been reduced. xxx 4D) Does the dataset require scaling? If so, perform that operation. 4E) Display your resulting dataset Module 7 Homework.ipynb - Colaboratory https://colab.research.google.com/drive/1Dbk8_m8vR0d8SMRHmfgo... 5 of 6 05-12-2023, 17:52
Module 7 Homework.ipynb - Colaboratory https://colab.research.google.com/drive/1Dbk8_m8vR0d8SMRHmfgo... 6 of 6 05-12-2023, 17:52
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help