D - Detailed Dimensional Modeling

docx

School

Syracuse University *

*We aren’t endorsed by this school

Course

722

Subject

Industrial Engineering

Date

Feb 20, 2024

Type

docx

Pages

10

Uploaded by HighnessKnowledge7427

Report
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling Assignment D: Detailed Dimensional Modeling Part 1: Overview This assignment will introduce the Detailed Dimensional Modeling process. It picks up where the previous process left off with the High Level Dimensional Modeling workbook. In this assignment we will complete the technical designs for our DDS by completing the specifics of the fact and dimension table implementations. Goals Specifically the goals of this assignment are to: Understand the goals of the detailed level dimensional modeling process and practice its steps. Master completing the detailed modeling workbook as a means to document the technical designs for a Kimball (DDS) data warehouse architecture. Create a formal table design for our Northwind DDS, including tables, keys, data types, and indexes so we can create tables and indexes required for our star schemas (ROLAP). Identify data sources of our dimensional model so that we can architect and implement the ETL process in a future phase. Technical Requirements To complete this lab you will need the following: Access to the course externalworld.cent-su.org SQL Server, and specifically the Northwind Traders database. You should connect to this server before starting the assignment. The detailed dimensional modeling Excel Workbook, available in the same place where you got this assignment. Your completed high-level dimensional modeling workbook from the previous assignment. The bus matrix solution from the previous assignment Microsoft Excel for editing the worksheets. Page 1 of 10
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling Part 2: Walk Through In this part of the assignment, we will work together to create a detailed level design for one of the high level business processes: order fulfillment . In turn from part 3 you will complete the remaining dimension and fact tables. Getting Started Connect to your SQL Server using Azure Data Studio and open the Northwind database. Open the Dimensional-Modeling Excel Workbook , you completed in the previous assignment, or use the solution provided from the previous assignment. The Workbook The Dimensional Modeling Workbook we will use is a spin-off of the Workbook from the Kimball Consulting Website http://www.kimballgroup.com . The original workbook was well suited to an RDBMS data warehouse. Modern data warehouses don’t have the same concerns outlined in the workbook. As such much of the workbook is no longer relevant. The updated workbook Prof. Fudge provided is best suited for the Snowflake data warehouse and DBT ETL tool. Goals of the Detailed Modeling Process The purpose of the detailed modeling process is to: 1. Define the columns in our target fact and dimension tables, 2. Explain the source of each column in the fact and dimension tables so that we have a source-to- target map and can query source tables to yield the desired targets. 3. Document the non-trivialities of the dimensional models such as fact table grains, meanings behind the facts and dimension attributes. This documentation goes a long way towards building an analytical data warehouse that everyone can use and interpret correctly. Building the Order Fulfillment dimensional model. Here’s the completed bus matrix from the previous assignment. Reading the bus matrix, the order fulfillment process consists of a dimensional model with: Page 2 of 10
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling One fact able fact_order_fulfillment The fact table uses three dimensions dim_customer , dim_employee , and dim_date , as indicated by the “X” in those columns. The date dimension is a conformed dimension. Every date uses the same dimensional attributes, so we will not need to include a detailed design of this table. If you are inrerested in seeing what the date dimension looks like check out the conformed folder on the Externalworld Minio S3 object storage, here: http://externalworld.cent-su.org:19001/ We are left to build out dim_customer , dim_employee and fact_order_fulfillment as part of part 2. Completing the detailed design for dim_customer In this next step, we will complete the detailed dimensional design for the Customer dimension. The process you’ll follow to design a dimension or fact table is outlined in 5 Steps: 1. Create a new dimension (or fact) worksheet in the workbook. 2. Complete the table definition part of the worksheet. 3. Query the sources. 4. Complete column information, based on the sources. Step 1: Create a new dimension worksheet Let’s document the details of this dimension. DO THIS: Make a copy of the DimensionTemplate worksheet. Right- click on it and select Move or Copy from the menu. When the dialog appears, click (move to end) and check the Create a copy checkbox before clicking OK . You will now have a DimensionTemplate (2) worksheet. Right-click on it and select rename from the menu, then type in dim_customer . You should now have your first dimension: Step 2: Complete the table definition Our next step is to complete the table definition for the customers dimension table. The name should match the tab name and the table type should say Dimension. All we should need to do here is enter a description, “One row per customer” When you are complete your table definition should look like this: Step 3: Query the Sources Next we need to get an idea as to what sources we want in our customer dimension. For this we need to go back to the source of the dimension, as we discovered when completing the bus matrix. Upon Page 3 of 10
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling reviewing the source data, we’d like to include all attributes from the source. Step 4: Complete the column information from the sources. We are now ready to complete this dimension! 1. In the first row, enter the customerkey column - the dimension key. Every dimension should have a PK that is not dependent upon the PK in the source table. That way we do not incur any dependency on the source system. Leave the data type blank and select PK, unique and not_null. The source of the dimension key is a surrogate key. 2. Next let’s add the CustomerID . This is the primary key in the source table. Choose the base the data type from the source type. This column should be unique and not allow null, but its not the PK of the dimension. 3. Next let’s add the CompanyNameEtc. Here’s a completed table of the changes you should make for this dimension: Repeat this process for dim_employee Repeat steps 1-4 above to complete the employee dimension. Here, we’d like to include the names of the direct supervisor of the employee in the dimension to create a natural drill down. Page 4 of 10
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling This query will help you visualize the design of this dimension. with stg_employees as ( select * from [Employees] ), stg_supervisors as ( select * from [Employees] ) select e.employeeid, concat (e.lastname , ', ' , e.firstname) as employeenamelastfirst, concat (e.firstname , ' ' , e.lastname) as employeenamefirstlast, e.title as employeetitle, concat (s.lastname , ', ' , s.firstname) as supervisornamelastfirst, concat (s.firstname , ' ' , s.lastname) as supervisornamefirstlast from stg_employees e left join stg_supervisors s on e.reportsto = s.employeeid Here’s are the dimension columns and their sources: Column Name Description DataType PK uniqu e not_nul l Source employeekey dimension key x x x surrogate key employeid primary key of the source system int x x northwind.employees.employeeid employenamelastfirst employee name last, first varchar northwind.employees.lastname "," firstname employenamefirstlast employee name first + last varchar northwind.employees.firstname " " lastname employetitle employee job title varchar northwind.employees.jobtitle superviornamelastfirs t supervisor name last, first varchar northwind.supervisors.lastname "," firstname superviornamefirstlas t supervisor name first + last varchar northwind.supervisors.firstname " " lastname Finally complete fact_order_fulfillment With the dimension table design complete, its time to create the fact table design. Page 5 of 10
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling Upon reviewing the bus matrix, our fact table should have foreign keys to dim_customer , dim_employee and dim_date (for each date in the fudgemart.orders table). We also have several facts listed in the facts section, and most of them we must calculate from other data within the table. When you model at the detailed level you will need to know exactly how to derive each of those facts. Step 1: Create a new fact worksheet Let’s document the details of this fact table. DO THIS: Make a copy of the FactTemplate worksheet. Right-click on it and select Move or Copy from the menu. When the dialog appears, click (move to end) and check the Create a copy checkbox before clicking OK . You will now have a FactTemplate (2) worksheet. Right-click on it and select rename from the menu, then type in fact_order_fulfillment . You should now have your first fact table: Step 2: Complete the Table Definition Our next step is to complete the table definition for the order fulfillment fact table. This is a very important step for the fact tables as we want clear documentation as to what the table grain is and what the facts themselves mean. Use your description from the bus matrix to help you. When you are complete your table definition should look like this: Step 3: Query the Sources Next we need to get an idea as to where the facts are coming from. For this we need to go back to the source of the fact table fudgemart.orders , as we discovered when completing the bus matrix. The following query will help you to visualize the fact table: with stg_orders as ( select * from [Orders] ), stg_order_details as ( select Page 6 of 10
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling orderid, sum (Quantity) as quantityonorder, sum (Quantity * UnitPrice * ( 1 - Discount)) as totalorderamount from [Order Details] group by orderid ), stg_shippers as ( select * from [Shippers] ) select o. * , od. * , s.companyname as shippercompanyname, od.quantityonorder, od.totalorderamount, DATEDIFF ( day , o.orderdate, o.shippeddate) as daysfromordertoshipped, DATEDIFF ( day , o.orderdate, o.requireddate) as daysfromordertorequired, DATEDIFF ( day , o.shippeddate, o.requireddate) as shippedtorequireddelta, case when DATEDIFF ( day , o.requireddate, o.ShippedDate) <= 0 then 'Y' else 'N' end as shippedontime from stg_orders o join stg_order_details od on o.orderid = od.orderid join stg_shippers s on s.shipperid = o.shipvia NOTE: The queries you wrote to source your fact table are important. Typically, the fact tables have the most amount of data transformations and non-trivialities. It’s a very good idea to save the SQL you used to help you to figure out what to include in the fact table, as ultimately this will help to write the ELT later on. Degenerate dimensions: What do we do with all these non-facts? If you look over the query output there are several non-facts which are part of this business process. Specifically, we have Shipper Company Name, and the Shipping Address (City, Region Postal Code, etc.) Since non-fact attributes associated with the business process of the fact table are called Degenerate Dimensions . The data is useful – for example we might way to map where the shipments are going, or add up freight costs by shipper. We could place them in their own dimension table, making two more dimensions: dim_shipper , and dim_shipment, but since they are the exact same grain of the fact table and also only used by this business process , we gain no advantages by conforming these dimensions. We leave them in the fact table, as degenerate dimensions. Step 4: Complete the column information from the sources. We are now ready to complete this fact table! 1. First, we define the fact table’s key (if there is one). Unlike the dimension key, we want this to be a natural / business key. Why? Using the key from the source system gives us drill-through Page 7 of 10
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling which is the capability of referencing the actual data from the source system. In this case we could drill-through to a PDF of the actual order by its OrderID for example. 2. Next, we should add all the dimension foreign keys. These should be the PK’s from the dimension tables, and not the actual values from the source system. During ETL/ELT we will use the source system data to derive the dimension key as we complete the surrogate key pipeline . 3. Third we add all the facts. When we add the fact we should also be clear to explain the additivity of the fact, especially if its is semi-additive. 4. Finally, we add any degenerate dimensions (if any exist). In this case, we have several: Page 8 of 10
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling Part 3: On Your Own Let’s complete the worksheet for the Sales Orders , Product Inventory Analysis and Sales Coverage business processes. Here’s the specific list of items that must be completed. To help you, I’ve included the SQL query to assist with sourcing the columns for the dimension or fact table. In addition, I’ve progressed the tasks in order from easiest to most difficult. High Level Bus Matrix SQL Query to help you source the dimension or fact table Dim_territory SELECT TerritoryID, TerritoryDescription, RegionDescription FROM Territories t join Region r on t.RegionID = r.RegionID NOTES: - This is a very small dimension. We start easy! Fact_sales_coverage SELECT EmployeeID, TerritoryID, 1 as salescoveragecount FROM EmployeeTerritories NOTES: - This is an example of a factless fact table since there are no facts in the table we must add a count column. The fact in this case is salescoveragecount Dim_supplier SELECT SupplierID, CompanyName as suppliercompanyname ,ContactName as suppliercontactname ,ContactTitle as suppliercontacttitle , Address as supplieraddress ,City as suppliercity, Region as supplierregion ,PostalCode as supplierpostalcode ,Country as suppliercountry ,Phone as supplierphone, Fax as supplierfax ,HomePage as supplierhomepage FROM Suppliers NOTES: - There are a lot of addresses. This is why we prefix these with “supplier” to disambiguate from “customer” or “shipment” Dim_product SELECT ProductID, ProductName, SupplierID ,CategoryName, [Description] as categorydescription FROM Products p join Categories c on p.CategoryID = c.CategoryID Page 9 of 10
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
IST722 – Data Warehouse Homework Assignment D Michael A. Fudge, Jr. Detailed Dimensional Modeling NOTES: - This is an example of a snowflake schema dim_supplier is the outrigger dimension of dim_product . And therefore your this table should have a supplierkey Fact_sales select o.employeeid, o.customerid ,o.orderdate, od.productid ,o.orderid, od.quantity ,od.quantity * od.unitprice as extendedpriceamount ,od.quantity * od.unitprice * od.discount as discountamount ,od.quantity * od.unitprice * ( 1 - od.discount) as soldamount from [Orders] o join [Order Details] od on o.orderid = od.orderid Fact_inventory_anal ysis NOTES: - This is not an SQL table, but a file in Minio S3. http://externalworld.cent-su.org:19001/browser/northwind/ I suggest viewing in a spreadsheet and filtering on a specific product to understand the data. - The column “Day” is the number of days from the initial inventory date. Use 2022-01-01 as the initial inventory date. - The rest of the columns here are self-explanatory. Turning it in: Please turn in the submission template your completed reflection and with your name, netid, and date at the top. Also turn in your completed Dimensional Modeling Workbook with your name, NetID, and date somewhere ON EACH SHEET. Page 10 of 10