Assignment 6.1 - ETL Project
School: San Francisco State University
Subject: Information Systems
Date: Nov 24, 2024
Pages: 15 (docx)
UMGC Data 620 Assignment 6.1

Your company wants to merge its old product order data into a new data mart to facilitate analysis. Your team has been tasked with writing an ETL (extract, transform, and load) code sequence and executing it on three years' worth of order data. Your team will produce:

- A .csv data file suitable for direct upload to the data mart, matching the data mart format given in the assignment.
- A Microsoft Word memo to the executive team, outlining what you did and what your recommendations are for moving forward. The Appendix of the memo will contain the SQL code you wrote.

Of course, it is possible to perform ETL using a variety of software packages, even Excel. But for this project, please do *all* of your programming in MySQL Workbench. A correct answer obtained by using something other than MySQL will not receive credit.

Rubric:

Element | Possible Points | Notes
.csv file deliverable | 20 | Graded according to correctness of the .csv file data over all years, product lines, and other summary fields. The file should have headers describing the columns. Columns should be sorted per instructions. The .csv data supports the answers in the Executive Memo below.
SQL code | 40 | SQL code is submitted as a separate attachment and labeled as required. Graded according to the SQL rubric. Include comments and easy-to-follow queries. If we cannot generate your .csv file from the input files using your SQL code and following your directions, you will receive a 0 on this part. You are welcome to include screenshots, but please also include SQL code such that we can run it. If your only SQL is a screenshot, you will receive very little credit on this part.
Executive Memo: ERD, ETL documentation, and metadata | 20 | The ERD is clearly documented and contains sentences denoting the cardinality of relationships. The process explanation is clear and in business English, not "technology-speak." Diagrams are encouraged. There is no use of SQL in this part; instead, references are made to SQL code by caption number in the Appendix where needed. Metadata is clear and comprehensive, and would be sufficient for a new programmer to come up to speed quickly.
Question 1 (granularity) | 30 | Complete answer to the question, with examples where needed to support points. Demonstrates understanding of granularity in data marts.
Question 2 (Ramon) | 30 | Complete answer to the question, with examples where needed. Demonstrates understanding of what Ramon would need to answer the query; can run an example with one or two pieces of final data to illustrate.
Question 3 (different format for the data) | 30 | Complete answer to the question. Demonstrates understanding of the advantages and disadvantages of the two data layouts. Identifies any missing ideas and defends the answer.
Question 4 (tidy data) | 30 | Complete answer to the question. Demonstrates understanding of tidy data and applies the concepts to this case study. Critically evaluates the two data layouts in the context of tidy data.
APA formatting | Up to 20-point deduction if incorrect | The memo conforms to the desired formatting: APA formatting for everything except important charts/diagrams. Do not put important charts/diagrams in the Appendix. Instead, put them in line with your text, with a Figure caption or a Table reference using Microsoft Word's References -> Insert Caption and Cross-Reference capability. The Appendix should contain technical SQL code and any secondary charts/diagrams. Few grammatical or spelling errors. Passes a Turnitin plagiarism check. Correct APA formatting for nearly all of the paper is expected.
TOTAL | 200 |

Getting Started:

This assignment starts with the script "week6_bu_grad.sql". This script should create a table called "business_unit" and a table called "Product_BU." Unfortunately, the metadata descriptions have been lost, so you will need to figure out what you can from the SQL script. The only thing you know about the metadata is that the company runs several individual strategic business units, such as "On The Go" and "Snack." Each of these business units is run under an umbrella designation, such as "Growth" or "Decline." The company will run marketing for growth products differently than it would run marketing for products on the decline.

You also have product order files from 2017, 2018, and 2019. They are attached as .csv files titled:

- "2017_product_data_students-final.csv"
- "2018_product_data_students-final.csv"
- "2019_product_data_students-final.csv"

Your job is to use SQL to perform an ETL which will accomplish the following:

1. Extract data from the 2017, 2018, and 2019 order files.
2. Transform the data according to the given rules.
3. Load it into one final table.
4. Export your final output table under the name "GX_output_final.csv". (You may create as many or as few data objects as you like in your work, but the data in the .csv file named "GX_output_final.csv" will be the data evaluated.)

You may write one large SQL script to accomplish the entire process. You may also break your SQL commands into smaller batches. Just like commercial ETL scripts, your entire script should run without human intervention; you should not require the user to use MySQL GUI commands partway through.
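The four steps above can be sketched in MySQL roughly as follows. This is an illustrative skeleton only: the staging table names (orders_2017, orders_2018, orders_2019, combined_orders) and the backtick-quoted column names are assumptions for illustration, not names the assignment prescribes, and the per-year expressions simply anticipate the field definitions given later in this document.

```sql
-- Illustrative skeleton only; all table and column names are assumed.
-- Assumes the three yearly .csv files have already been imported into
-- staging tables orders_2017, orders_2018, and orders_2019.
DROP TABLE IF EXISTS combined_orders;

CREATE TABLE combined_orders AS
SELECT 2017 AS `Year`, `Month`, Region, Product,
       Quantity,
       `Order Total` AS Order_Total
FROM orders_2017
UNION ALL
SELECT 2018, `Month`, Region, Product,
       Quantity_1 + Quantity_2,                       -- total units across both shipments
       `Per-Unit Price` * (Quantity_1 + Quantity_2)   -- 2018 file carries no order-total column
FROM orders_2018
UNION ALL
SELECT 2019, `Month`, Region, Product,
       Quantity,
       `Order Subtotal` - `Quantity Discount`         -- net of the 2019 quantity discount
FROM orders_2019;
```

From here, the transformation and load steps would join this combined table to the business-unit tables, subtotal, sort, and export.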
If as a last resort you do end up doing this, your notes should reflect what you did (for example, in the Appendix
you could say "We created database YYY, and then used the "import" button on the MySQL GUI interface to upload the .csv file into Table Z. Then we ran the script shown in Figure X ...").

Please only use MySQL for this assignment. The only exceptions here are minor edits made using Notepad or Excel, such as putting headers on column names. Document these carefully in your Appendix; if your SQL script doesn't write column headers, but your output file magically has them, we want to know how they got there. You can just say something like "After we did << XXX >> to export the data, we used Notepad to insert Row 1, which contains the header names."

Remember, you have learned how to download and run a .sql script. And in Week 4, we learned how to import and export files. You will need both of these skills this week.

Detailed Instructions:

The very last pages of this assignment document contain a method for export. You are welcome to use this. If you want to use a different export method, verify with your faculty that it is OK.

Extraction: Your extracted data should meet the following criteria for each of the 2017, 2018, and 2019 data sets. Business Unit Designations are for "Growth" and "Mature" only. Among other things, this means you should not choose any orders which are associated with a "Decline" designation.

Transformation: Your output file should follow this format, for loading into the data mart. A sample of some output is given below; note that your data may or may not match these numbers.

1. BU Designation: this is Growth and Mature; subtotal by this field.
2. BU Name: no transformations; subtotal by this field within BU Designation.
3. Product: no transformations; subtotal by this field within BU Name.
4. Region: no transformations; subtotal by this field within Product.
5. Year: no transformations; subtotal by this field within Region.
6. Month: no transformations; subtotal by this field within Year.
7. Sum of Quantity: this reflects the sum of the "Quantity" field in the relevant data. For example, in the data below, done for a previous data warehouse, the first line indicates that for Growth/Energy/Purple Pain/Eastern/2012/April, there was a total of 20 Purple Pain packets sold. This could reflect twenty 1-packet sales, four 5-packet sales, or one sale of 20 packets. (Note this data is for illustration only, and does not reflect your more recent sales data.)
8. Sum of Order Total: this reflects the sum of the "Order total" field in the relevant data. For example, in the data below, from a previous data warehouse, the first line indicates that for Growth/Energy/Purple Pain/Eastern/2012/April, there was a total of 6960 cents in revenue from the 20 Purple Pain packets sold. (This implies a price of 6960/20 = 348 cents, or $3.48, per Purple Pain packet in 2012.) You can assume pricing is stable throughout a calendar year, and any price changes happen instantaneously at midnight on December 31 and apply to the entire next year.

BU Designation | BU Name | Product | Region | Year | Month | Sum of Quantity | Sum of Order Total
Growth | Energy | Purple Pain | Eastern | 2012 | 4 | 20 | 6960
Growth | Energy | Purple Pain | Eastern | 2012 | 8 | 19 | 6612
Growth | Energy | Purple Pain | Western | 2012 | 6 | 0 | 0
Growth | Energy | Red Hot Chili Peppers | Eastern | 2012 | 1 | 33 | 14190
Growth | Energy | Red Hot Chili Peppers | Eastern | 2012 | 8 | 30 | 12900
Growth | Energy | Red Hot Chili Peppers | Midwest | 2012 | 6 | 37 | 15910
Growth | Energy | Red Hot Chili Peppers | Western | 2012 | 2 | 12 | 5160
Growth | Energy | Red Hot Chili Peppers | Western | 2012 | 3 | 33 | 14190
Growth | Snack | Crocodile Tears | Eastern | 2012 | 2 | 26 | 7332
Growth | Snack | Crocodile Tears | Southeast | 2012 | 4 | 4 | 1128
Growth | Snack | Crocodile Tears | Western | 2012 | 3 | 18 | 5076
Mature | Health | Panda Gummies | Eastern | 2012 | 4 | 69 | 10074
Mature | Health | Panda Gummies | Midwest | 2012 | 7 | 16 | 2336
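Taken together, the eight rules above describe a grouped, filtered, and sorted query. A minimal sketch follows, under the assumption that a combined orders table (here called combined_orders) has already been joined to the business_unit and Product_BU tables so that a designation and BU name are available on every row; all names here are illustrative, not prescribed.

```sql
-- Sketch only: combined_orders, Designation, BU_Name, and Order_Total are assumed names.
SELECT Designation      AS `BU Designation`,
       BU_Name          AS `BU Name`,
       Product, Region, `Year`, `Month`,
       SUM(Quantity)    AS `Sum of Quantity`,
       SUM(Order_Total) AS `Sum of Order Total`
FROM combined_orders
WHERE Designation IN ('Growth', 'Mature')   -- drops all "Decline" orders per the extraction rule
GROUP BY Designation, BU_Name, Product, Region, `Year`, `Month`
ORDER BY Designation, BU_Name, Product, Region, `Year`, `Month`;
```

The GROUP BY list mirrors the subtotal hierarchy (fields 1 through 6), and the ORDER BY list mirrors the required left-to-right sort precedence.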
Load: Your deliverable is a single .csv file with the applicable data in it. It should contain only the fields listed above, and should be sorted ascending (alphabetically or numerically) in each field, with the leftmost fields having precedence. For example, you should first sort on BU Designation, and within that, sort on BU_Name. Your one data file should contain the data from all three years (2017, 2018, and 2019). If the field names are not already present upon export, make sure to use your .csv editor (such as Notepad or Excel) to insert them into your .csv file.

Management Memo

You will write a memo to management outlining your answers to the following questions:

1. Create and explain an entity relationship diagram (ERD) to go with this data. Your ERD should describe the business situation in existence as best as you can infer it. Since your input files are not necessarily in the best shape, your ERD should not simply map the input files. Your output file is by definition a flat file with no major database schema, so your ERD shouldn't map that either. As a hint, consider this: based on the data here, what relationship can you infer exists between BU Designation and Product? One to one? One to many? Must-have or may-have? Use the ER tool in MySQL to do your ERD, and incorporate a screenshot of your ERD in the management memo. Note you will not forward or reverse engineer from this file, nor will you use any code it may generate; you will just use the ERD printout as documentation of what "should" be. (You do not need to attach the MySQL entity file.)
2. Document your ETL process. Which functions did you use, and what logic did you follow? This should be at a level that your boss, who has an MBA but not an IT/database background, can follow. Do not use "computer-ese" here; use regular business English.
3. Give metadata for your final deliverable file. The analysts who follow you will thank you.
4. Your boss has a question for you.
"We think this is about the right level of granularity for our data mart. What do you think? Should we extract more detailed information, and if so, what? Or would you recommend going to a coarser level of granularity, and if so, what fields would you recommend we drop?" Give your rationale. Think critically, and demonstrate a good understanding of data management.

5. Your boss wants to know the answer to this business question: "We believe our Growth segment should show at least 10% year-over-year growth in either quantity sold or order total. We also believe our Mature segment should remain pretty much the same in terms of quantity and order totals. If I give the final data file you produced to Ramon (an expert analyst), can he run queries to answer this?" (You may wish to run a query or two as proof of concept.) Tell the boss whether you believe the data, laid out as it is, will easily support Ramon in that sort of analysis. If it will, what about it makes it easy? If it won't, how could it change to support this analysis?
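One possible proof-of-concept shape for such a query is sketched below. It assumes the final deliverable has been loaded back into a table (final_output is an assumed name, as are the column names); it only produces the yearly totals from which year-over-year growth can then be computed.

```sql
-- Sketch only: final_output is an assumed name for the loaded deliverable.
SELECT `BU Designation`,
       `Year`,
       SUM(`Sum of Quantity`)    AS yearly_quantity,
       SUM(`Sum of Order Total`) AS yearly_order_total
FROM final_output
GROUP BY `BU Designation`, `Year`
ORDER BY `BU Designation`, `Year`;
```

Comparing consecutive rows per designation (for example, via a self-join on `Year` = `Year` + 1) would then give the year-over-year percentage change.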
6. Your boss has another question: "Our database folks have suggested we use a different format for the ETL if I'm so interested in growth. It's copied below. It's the exact same data, just a little differently arranged. What do you think of it? Bobby, one of my IT people, thinks a data mart with this layout is a brilliant answer to the growth question. But Susie, another of my IT people, has concerns that this data layout will make it hard to query on any other dimension, such as whether a particular product is doing well or poorly in a given region regardless of year, or monthly seasonal trends. Am I missing anything here? What do you recommend? If we had to go with just one layout of our data mart, which layout should it be?"

Existing layout:

<< Other fields such as Designation here >> | Year | Sum of Quantity | Sum of Order Total
XXX | 2017 | 15 | 150
XXX | 2018 | 16 | 160
XXX | 2019 | 17 | 170

Proposed layout:

<< Other fields such as Designation here >> | Sum of Quantity for 2017 | Sum of Order Total for 2017 | Sum of Quantity for 2018 | Sum of Order Total for 2018 | Sum of Quantity for 2019 | Sum of Order Total for 2019
XXX | 15 | 150 | 16 | 160 | 17 | 170

7. There is one final question. "I hear a lot about 'tidy data,' but I'm not really sure what it is. Can you define tidy data, and then let me know how this concept might apply to the data layouts in Question 6 above?"

A successful memo will meet the following criteria:

- Times New Roman, double spaced, 12-point font, with 1-inch margins.
- Contains a cover page with your number and names on it.
- Contains a bibliography in APA format citing appropriate references (you may need to cite only this classroom and the Reference Manual; if you look up other sources, cite them too).
- Passes a Turnitin check for plagiarism.
- Is in memo form, addressed to your boss, in business English (not computer-ese). Technical talk goes in the Appendix.
- Is of reasonable length. There are no page minimums or maximums, but please be reasonable. Something on the order of 10 pages or less for the written memo should probably suffice; the Appendix may run longer.
- An Appendix with any technical information you want to include. Classic APA formatting calls for all figures, exhibits, and tables to be in the Appendix. I'm changing this requirement here. If a diagram (for example, a flowchart of something) would make more sense in the body of your paper, put it in the body. If it would make more sense in the Appendix, leave it in the Appendix. In the past, papers which have earned 85% or above have used Microsoft Word cross-referencing and captions, and have had at least a few diagrams in the main body of the paper. Papers which have all their figures in the Appendix typically earn less than 85%.

Submit:

- Your memo, labeled "GX_memo.docx" (or .pdf).
- Your final output file, labeled "GX_output_final.csv".
- The SQL code file(s) you used to make this happen. If there is more than one file, label them to make it easy to find and assemble them. If you have 3 files, you can call them "GX_1_extract.sql", "GX_2_transform.sql", "GX_3_load.sql", etc. The 1, 2, 3 are the order in which we should run the scripts, and each word is a summary of what that script does; the words you use are entirely up to you.
2017 Data Notes:

Your 2017 order data is contained in the attached file "2017_product_data_students-final.csv." A sample of this file's type of data is contained below in Table 1, Sample of order data from 2017. (Note your file may or may not have the same data in it.) Your field definitions follow:

- ID: Record ID number.
- Month: integer, corresponds to the month of the sale. For example, 5 = May.
- Country: character, should all be USA. (All data in this exercise should be USA.)
- Region: character, represents the regions within the country.
- State: two characters, USPS state abbreviations. Each state is within one region.
- Product: character. This is the name of a packaged food product.
- Per-unit price: integer. This represents the per-unit price in cents; for example, 466 indicates that Blue Rock Candy sells for $4.66 per package. (For the purposes of this exercise, you may disregard all currency formatting and just use 466 to represent $4.66. If you choose to do this, make sure you note it in your final product.)
- Quantity: integer. This represents how many items were in that particular order. The first order here was for 3 packages of Blue Rock Candy.
- Order Total: integer. This is the per-unit price x the quantity. The first line here indicates that 466 x 3 = 1398 (or $13.98) was the price of the first order.

Table 1. Sample of order data from 2017

Month | Country | Region | State | Product | Per-Unit Price | Quantity | Order Total
5 | USA | Midwest | MN | Blue Rock Candy | 466 | 3 | 1398
5 | USA | Eastern | RI | Pink Bubble Gum | 318 | 15 | 4770
4 | USA | Southeast | MO | Crocodile Tears | 282 | 4 | 1128
1 | USA | Eastern | MD | Yellow Zonkers | 258 | 27 | 6966
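Because Order Total is defined as per-unit price times quantity, one simple sanity check after import is to look for rows that violate that rule. A sketch, assuming the file has been imported into a staging table named orders_2017 (an illustrative name, not one the assignment prescribes):

```sql
-- Sketch only: orders_2017 is an assumed staging-table name.
SELECT *
FROM orders_2017
WHERE `Order Total` <> `Per-Unit Price` * Quantity;   -- expect an empty result set
```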
2018 Data Notes:

Your 2018 order data is contained in the attached file "2018_product_data_students-final.csv." A sample of this file's data is contained below as Table 2, Sample of order data from 2018. (Note your file may or may not have the same data in it.) Your field definitions follow:

- ID: Record ID number.
- Month: integer, corresponds to the month of the sale. For example, 5 = May.
- Region: character, represents the regions within the country.
- Customer_ID: numeric, represents the customer's unique Customer ID number.
- Product: character. This is the name of a packaged food product. The product name is consistent between 2017, 2018, and 2019; for example, if something is called "Orange Creepies" in 2017, those characters refer to the same product in 2018 and 2019.
- Per-unit price: integer. This represents the per-unit price in cents; for example, 293 indicates that Crocodile Tears sells for $2.93 per package. (For the purposes of this exercise, you may disregard all currency formatting and just use 293 to represent $2.93. If you choose to do this, make sure you note it in your final product.)
- Quantity_1: integer. This represents how many items were in the first shipment of that particular order. This year we had shipping problems, and could often not ship the entire order all at once. Orders were split into two shipments where necessary, and Quantity_1 reflects how many units were shipped first. (Assume all shipments were completed in the month listed, and that no shipments had the first shipment in one month and the second shipment in the subsequent month.)
- Quantity_2: integer. This represents how many items were in the second shipment of that particular order. A 0 indicates a second shipment was not necessary. To get the total number of items shipped, you need to add Quantity_1 and Quantity_2.
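Since the 2018 file carries no order-total column, both the total quantity and the order total have to be derived from the other fields. A sketch, assuming a staging table named orders_2018 (an illustrative name):

```sql
-- Sketch only: orders_2018 is an assumed staging-table name.
SELECT `Month`, Region, Product,
       Quantity_1 + Quantity_2                       AS total_quantity,
       `Per-Unit Price` * (Quantity_1 + Quantity_2)  AS order_total
FROM orders_2018;
```

For the first sample Crocodile Tears row, this derivation yields 13 + 1 = 14 units and 293 x 14 = 4102 cents.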
The first line here reflects that Crocodile Tears had a first shipment of 13 units and a second shipment of 1 unit, all within the month of May, for a total of 13 + 1 = 14 units.

Table 2. Sample of order data from 2018

Month | Region | Customer_ID | Product | Per-Unit Price | Quantity_1 | Quantity_2
5 | Southeast | 857 | Crocodile Tears | 293 | 13 | 1
9 | Midwest | 785 | Blue Rock Candy | 489 | 16 | 10
5 | Eastern | 906 | Nap Be Gone | 427 | 24 | 4
2 | Western | 939 | Yellow Zonkers | 253 | 8 | 5
7 | Western | 558 | Pink Bubble Gum | 318 | 26 | 7

2019 Data Notes:
Your 2019 order data is contained in the attached file "2019_product_data_students-final.csv." A sample of this file's data is contained below as Table 3, Sample of order data from 2019. (Note your file may or may not have the same data in it.) Your field definitions follow:

- ID: Record ID number.
- Month: integer, corresponds to the month of the sale. For example, 5 = May.
- Country: character, represents the country of the customer. Should all be USA.
- Region: character, represents the regions within the country.
- State: USPS code for the 50 United States.
- Product: character. Same as in the 2017 and 2018 data.
- Per-unit price: integer. This represents the per-unit price in cents; for example, 425 indicates that Red Hot Chili Peppers sells for $4.25 per package. (For the purposes of this exercise, you may disregard all currency formatting and just use 425 to represent $4.25. If you choose to do this, make sure you note it in your final product.)
- Quantity: This represents how many items were in that particular order. The first order here was for 32 packages of Red Hot Chili Peppers.
- Order Subtotal: This represents the order subtotal, calculated as per-unit price x quantity. For example, the first order here reflects a per-unit price of 425 cents x 32 units, for a subtotal of 13,600 (or $136.00).
- Quantity Discount: This represents the new policy (effective January 1, 2019) that all orders of 91 units and up automatically earn a 10% discount. An order of exactly 91 units earns the discount; an order of 90 units does not. All order discounts have been rounded to the nearest penny, so you can assume this field has no decimals in it. In the data below:
  - The first order, of 32 Red Hot Chili Peppers to New Jersey, did not qualify for the Quantity Discount. Therefore, the Order Total is simply the Order Subtotal.
  - The fourth order, of 95 Green Lightning to Rhode Island, did qualify for the Quantity Discount, which has already been computed as 3705, or 10% of 37050. In this case, the final order total would be 37,050 - 3,705 = 33,345 (or $333.45).
When you subtotal the discounts, the Order Subtotal, Quantity Discount, and Order Total should just add up. It should work like this (below is a very abbreviated data set, designed to show only the treatment of order totals):

Product Name | Order Subtotal | Quantity Discount | Order Total
Product A | 100 | 0 | 100
Product A | 500 | 50 | 450
Product A | 200 | 0 | 200

Rolled up, this data should give:

Product Name | Sum of Order Subtotal | Sum of Quantity Discount | Sum of Order Total
Product A | 800 | 50 | 750

Table 3. Sample of order data from 2019

Month | Country | Region | State | Product | Per-Unit Price | Quantity | Order Subtotal | Quantity Discount
6 | USA | Eastern | NJ | Red Hot Chili Peppers | 425 | 32 | 13600 | 0
5 | USA | Eastern | DE | Giant Gummies | 428 | 34 | 14552 | 0
5 | USA | Southeast | LA | Orange Creepies | 466 | 25 | 11650 | 0
6 | USA | Eastern | RI | Green Lightning | 390 | 95 | 37050 | 3705
10 | USA | Eastern | GA | Giant Gummies | 428 | 90 | 38520 | 0
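For 2019, the order total is therefore the subtotal minus the already-computed discount. A sketch, assuming a staging table named orders_2019 (an illustrative name); the CASE expression is only a cross-check of the 91-unit rule, not a required output field:

```sql
-- Sketch only: orders_2019 is an assumed staging-table name.
SELECT `Month`, State, Product, Quantity,
       `Order Subtotal` - `Quantity Discount` AS order_total,
       CASE WHEN Quantity >= 91
            THEN ROUND(`Order Subtotal` * 0.10)   -- expected 10% discount at 91+ units
            ELSE 0
       END AS expected_discount                   -- should match the Quantity Discount field
FROM orders_2019;
```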
Assignment 6 Azure Instructions

This section is where our FA Deepa Devarakonda kindly demonstrates how to do an Azure export using some 2012 data. Note the 2012 data is not the data you are using for this assignment; for this assignment, you will need to update the files. This section illustrates one method for getting your ETL code export friendly.

Please note the query below is not a query you can use exactly to complete this assignment. Instead, it is simply a baby query to show a sample of SQL for a similar data set and to demonstrate the CLI pipe. It assumes you have imported the 2012 file to a table named 2012_orders. If you wish to use another method for export, please first verify with your faculty that it is OK, and if it is, document it lavishly when you turn it in so we can reproduce what you did.

Write this in your SQL file and name it something. I named mine "assignment6rev1.sql":

    -- The quoted commas are deliberate: they become the field separators in the
    -- piped CSV output. Group by the non-aggregated columns so that each output
    -- row is one Product/Region/Month subtotal.
    SELECT 2012_orders.Product AS Product, ',',
           2012_orders.Region AS Region, ',',
           2012_orders.Month, ',',
           SUM(2012_orders.Quantity) AS 'Sum of Quantity', ',',
           SUM(2012_orders.ordertotal) AS 'Sum of Order Total'
    FROM 2012_orders
    GROUP BY 2012_orders.Product, 2012_orders.Region, 2012_orders.Month
    ORDER BY 2012_orders.Month;

After you have saved it, you will pipe your SQL script using the MySQL CLI interface:

1. Create and save your ETL code in a .sql file. (You just did this above.)
2. Save it to your local drive. For example, I saved my script under "C:\Users\ddevarakonda\Downloads\assignment6rev1.sql". (These instructions were written in part by FA Deepa Devarakonda, so her userid is ddevarakonda. You should substitute your own userid.)
3. Locate the local installation of MySQL. It should be something like C:\Program Files\MySQL\MySQL Workbench 6.3 CE (copy this path); it can also be C:\Program Files\MySQL\MySQL Workbench 8.0 CE, depending on the Workbench version installed.
4. Open a command prompt window and execute the change directory command:

       cd C:\Program Files\MySQL\MySQL Workbench 6.3 CE

   (the path of MySQL from step 3)
5. Copy this command to a notepad and modify everything that is inside < >:
       mysql.exe -h <Server Name> -P 3306 -u <MySQL User> -p --batch --skip-column-names < C:\Users\ddevarakonda\Downloads\assignment6rev1.sql > C:\Users\<user>\Downloads\output6_1.csv

   - <Server Name>: server name from your MySQL connection.
   - <MySQL User>: DB user name from your MySQL connections.
   - C:\Users\ddevarakonda\Downloads\assignment6rev1.sql: this is your SQL file. Replace ddevarakonda with your own userid.
   - C:\Users\<user>\Downloads\output6_1.csv: modify this to your desired file path for your output. Replace <user> with your own userid on your own computer.

6. Copy this formatted text into the command window and run it (press Enter).
7. Enter the DB password when prompted (Pa$$word1, or whatever you set your password to).
8. Go look for it! Locate the CSV file saved in your destination folder.

Note: All query results will be appended to one column as a result of this query execution (see sample below). Also, this CSV file doesn't contain headers. You will need to manually split each attribute out into different columns and then save your file. (It's OK to use the Text to Columns feature in Excel to do this if you need to. Just document what you did.) Include headers in your CSV file before you submit for grading.
9. As written, the command does not include column names. If you want to include the column names, just remove the --skip-column-names option:

       mysql.exe -h <Server Name> -P 3306 -u <MySQL User> -p --batch < C:\Users\<user>\Downloads\assignment6rev1.sql > C:\Users\<user>\Downloads\output6_1.csv

With this, all the information would be separated into its respective columns in the CSV.