C - High Level Dimensional Modeling (1)

docx

School

Syracuse University *

*We aren’t endorsed by this school

Course

722

Subject

Industrial Engineering

Date

Feb 20, 2024

Type

docx

Pages

9

Uploaded by HighnessKnowledge7427

Report
IST722 – Data Warehouse Homework Assignment C Michael A. Fudge, Jr. High Level Dimensional Modeling Assignment C: High Level Dimensional Modeling Part 1: Overview This assignment will introduce you to the High Level Dimensional Modeling process. The goal of this process is to turn functional business requirements into dimensional data warehouse (DDS) specifications based on the Kimball technical architecture. Upon completing this lab activity you learn: Additional for profiling data using the SQL Query language as to identify master data and business processes. The process of high-level dimensional modeling, including: o Create a high-level dimensional model diagram (Kimball: fig. 7-3 pp. 304). o Create an attribute and metrics list (Kimball: fig. 7-2 pp. 294). o Keeping track of issues Goals Specifically the goals of this assignment are to: Understand the goals of the high-level dimensional modeling process and practice its steps. Master the act of profiling data and transforming functional requirements into a technical specifications for a Kimball (DDS) data warehouse architecture. Understand the value of the high level modeling worksheet as a technical documentation tool, which can be later used to determine how to properly build tables in our DDS. Technical Requirements To complete this assignment you will need the following: Access to the course externalworld.cent-su.org SQL Server, and specifically the Northwind Traders database. You should connect to this server before starting the assignment. The Dimensional modeling Excel Workbook, available in the same place where you got this assignment. Microsoft Excel for editing the workbook Page 1 of 9
IST722 – Data Warehouse Homework Assignment C Michael A. Fudge, Jr. High Level Dimensional Modeling Recall: The Northwind Traders Case Study Northwind traders is a fictitious importer and exporter of specialty foods from around the world. It was created by Microsoft as a sample operational database to use with their database products, such as Microsoft Access and SQL Server. In our class, we’ll use this database as a case study for building a data warehouse. Over time you’ll need to get very intimate with the Northwind table design and source data as we complete our build out. The Northwind Data Model Below is a screen shot of the internal model for the Northwind database. Use this diagram as a reference for understanding the structure of the Northwind data and building your dimensional model designs. Page 2 of 9
IST722 – Data Warehouse Homework Assignment C Michael A. Fudge, Jr. High Level Dimensional Modeling Data Warehouse Functional Business Requirements As part of the company’s Business Intelligence initiative, the data warehousing team, after much collaboration with stakeholders AND data profiling to assess feasibility, has established the following 4 functional business requirements: 1. Sales reporting. Senior management would like to be able to track sales by customer, employee, product and supplier, with the goal of establishing which products are the top sellers which employees place the most orders, and who are the best suppliers. 2. Order Fulfillment and Delivery . There is a need to analyze the order fulfillment process to see if the time between when the order is placed and when it is shipped can be improved 3. Product Inventory Analysis. Management requires a means to track inventory, On Order, and Re-Order levels of products by supplier or category. Inventory levels should be snapshotted daily and recorded into the warehouse for analysis. 4. Sales Coverage Analysis. An Analysis of the employees and the sales territories they cover. Part 2: Walk Through In this part of the assignment, we will work together to create a high-level design for the first functional business requirement: sales reporting. Along the way we’ll profile our dimensional data and get a feel for our facts using SQL Queries against the Northwind database. Getting Started Connect to your SQL Server using Azure Data Studio and open a query window for the Northwind database. Open the High-Level-Dimensional-Modeling Excel Workbook , to the Bus Matrix page. You can find this workbook in the same place you got this assignment. Kimball’s 4 Step Modeling process Kimball’s 4 step modeling process walks us through setting the fact grain and identifying the useable dimensions. The Bus Matrix page is designed to walk you through this process. Kimball’s Step 1: Business Process Our business process is sales orders , so we place that in our worksheet, along with the name of our fact table, fact_sales : Kimball’s Step 2: Declare the grain Our next step is to determine the level of grain for the fact table. What does it mean to be a single row in the sales reporting fact table? Well, if you read through the requirements (I know they’re vague) you Page 3 of 9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
IST722 – Data Warehouse Homework Assignment C Michael A. Fudge, Jr. High Level Dimensional Modeling can determine that each row represents the sale of a product , or a line item on an order . This is a transaction type fact, and so we update as follows: At this point you’re thinking: “How do we know this is correct?” To really know the answer to this question you’ll need to query the data and understand the processes by which the data arrives in the table. This is where data profiling comes into play. Let’s take a look. DO THIS: Switch to your SQL Server, and from Azure Data Studio , open a new Query window ( Ctrl + N) and type select * from [Order Details] Then press [F5] to execute. You should see results like this Each row in this query output represents the purchase of a product, which according to the requirements is what we need. Furthermore, many products can be part of one order (for example look at OrderID 10248) and therefore through order we can get back to other dimensions like customer and employee. NOTE: In real life you won’t strike gold so easily. You’ll have to look at several tables before you can get a clear picture of your fact table grain. Kimball’s Step 3: Identify the dimensions. Next, we identify the master data associated with the business process. In the Kimball technical architecture, these will become the dimensions in our dimensional data store. How do find the dimensions? The best way to do this when your source data is in a relational database is to look at the table dependencies for the source of your fact data (foreign keys). For example if you review the database diagram on page 2 of the lab you’ll see that the Order Details table connects directly to the Products table via a foreign key in a many to one relationship. Because it appears on multiple orders, Product fits the candidacy of a dimension. Once again we can verify this dimension works for us and “rolls up” a couple of our known facts by writing some SQL. Page 4 of 9
IST722 – Data Warehouse Homework Assignment C Michael A. Fudge, Jr. High Level Dimensional Modeling DO THIS: Switch to your SQL Server, and from Azure Data Studio , open a new Query window ( Ctrl + N) and type select p . ProductName , count ( od . ProductId ) OrderCount , sum ( od . Quantity ) as TotalQuantity from [Order Details] od join [Products] p on p . ProductID = od . ProductID group by p . ProductName Then press [F5] to execute. You should see results like this What you’re seeing is a list of Products, along with a count of orders for which that product appears, and a total quantity sold for that product. (There’s a lot of details in this SQL statement, so feel free to ask your instructor for an explanation should you need it.) Important Tip: You should always exercise caution when profiling live systems. Executing SQL queries against production data is usually not a wise decision as you may impact performance negatively. It is important to seek the advice of a Database Administrator prior to embarking your data profiling adventure! Outrigger Dimensions and Hierarchies You’ve probably noticed the Products table connects to the Categories and Suppliers tables in a many to one relationship (and thus these two tables are dependent upon the Products table ). This means there’s many products in a single category and several products for a single supplier. Situations like this help you discover hierarchies you can use in your dimension. Here’s the SQL we use to do this: DO THIS: Switch to your SQL Server, and from Azure Data Studio , open a new Query window ( Ctrl + N) and type select c . CategoryName , p . ProductName from Products p join Categories c on p . CategoryID = c . CategoryID order by c . CategoryName , p . ProductName Then press [F5] to execute. You should see results like this Note our use of the order by clause in the SQL statement. This is important as it helps us visually identify the data hierarchy. In this case, if we determine the hierarchy is useful we can consolidate the attributes we need from it into the product dimension. This makes more sense than including a separate dimension for Category. There are cases where some other business process might need Suppliers or Categories and therefore it would make sense to combine them into a single dimension. This is the fundamental idea behind snowflaking . Page 5 of 9
IST722 – Data Warehouse Homework Assignment C Michael A. Fudge, Jr. High Level Dimensional Modeling Once you’ve identified a useful dimension, it’s time to add it to our Bus Matrix like so. In this example we’ve added the Product dimension. Rinse, Lather and Repeat Next we should go back and evaluate the other dependencies among our data once again by looking at the tables connected to our Orders and Order Details tables via foreign key. Look for other dimensions which could be useful in our model, and when in doubt you can always check its roll-up capability with some SQL, like we did with Product . Important Tip: There should always be a many to one relationship between the business process table and the master data which makes up your dimension. One row in the dimension should appear many times in the business process. For example, one product appears many times on different orders. Fast-forward through some more data profiling and here’s a screenshot of the dimensions I’ve discovered so far: Important Tip: The x at the intersection of dimension and business process indicates there will be a foreign key in our DDS connecting the business process to the dimension table. Our goal is to re-use dimensions like Product, Customer, etc… in across other business processes. This is call conforming dimensions. The Date and Time dimensions One key dimension essential to the DDS architecture is the Date (and sometimes Time) dimensions. To identify date and time dimensions, look for dates stored in the tables which sources the business process. In our case if you run an SQL query on the Orders table you’ll see Order Date and Shipped Date . Each date regardless of the role is played (ordering or shipping) is a data. This is why we call it a role-play dimension . Page 6 of 9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
IST722 – Data Warehouse Homework Assignment C Michael A. Fudge, Jr. High Level Dimensional Modeling Important Tip: When you build the fact table each date will be a FK back to the date dimension. Kimball’s Step 4: Identity the facts After you establish your business process, and wrap up your dimensions, it’s time to identify the facts. Facts are the quantifiable values that we measure across attributes in the dimension. I know. It’s a mouthful, so here’s a couple of examples. How many of a specific product category were sold? Category is the attribute of the Product dimension and how many is the measurement, and therefore the fact. Which customers have ordered the most? Customer is the dimension and Sold Amount is the measurement (fact). From merely identifying the fact grain of the model you probably already have a few facts in mind (they can be found in the business process table), but now’s the time to really nail down the facts you need in your model. Like everything else in this step a lot will depend on your requirements. One important this to recognize is not all facts appear among your source data. Some of the facts you’ll need are derived facts we do a little math on some of the source data values. We include the facts we want in the Bus Matrix but explain how they are derived in the Attributes and Metrics worksheet. For now, we’ll add the following facts to our Bus Matrix and complete it. Quantity – (of product sold) Extended Price ( Quantity * Unit Price) Discount amount (Extended Price * discount) Sold Amount (Extended Price – Discount amount) What about the Freight? The Freight value is found in the Orders table, which is not at the same fact table grain as the order details table. Therefore we cannot include it as a fact without making some kind of data governance decision to as to which products account for the freight. Do we: - evenly divide the freight by the number of items in the order? - Factor in weight or volume of the product into which items get more of the freight? Page 7 of 9
IST722 – Data Warehouse Homework Assignment C Michael A. Fudge, Jr. High Level Dimensional Modeling This is not a technical problem, but a data governance problem. This the process is not trivial, the organization must agree how to handle this as to represent the values, if at all. Attributes & Metrics Now that you’ve completed the Bus Matrix for your business process, it’s time to move down a level of detail in the process. In this next step we will circle back through our dimensions and business process (fact tables) and put together a quick list of Attributes and Metrics that we require for our dimensional model this list allows you to get more specific about the needs of your dimensional model. The idea behind the attributes and metrics is to define your facts and outline the important attributes in your dimensions. Part 3: On Your Own After you have finished up the sales reporting, it’s time to move on to the other three business processes: 1. Sales reporting. Senior management would like to be able to track sales by customer, employee, product and supplier, with the goal of establishing which products are the top sellers which employees place the most orders, and who are the best suppliers. 2. Order Fulfillment and Delivery . There is a need to analyze the order fulfillment process to see if the time between when the order is placed and when it is shipped can be improved 3. Product Inventory Analysis. Management requires a means to track inventory, On Order, and Re-Order levels of products by supplier or category. Inventory levels should be snapshotted daily and recorded into the warehouse for analysis. 4. Sales Coverage Analysis. An Analysis of the employees and the sales territories they cover. In this part, you will repeat the process outlined in part 2 of the assignment for the remaining three business processes. When you are finished you should have the following in your Dimensional Modeling Workbook: 1. A completed Bus Matrix with all 4 business processes in it. No dimensions should repeat. To re- use a dimension for another business process, include an X at its intersection. IMPORTANT TIP: Keep in mind you can only model the data you have. If it’s not in your external world source data (in this case, it’s Northwind Traders) then you cannot include it in your data warehouse! Page 8 of 9
IST722 – Data Warehouse Homework Assignment C Michael A. Fudge, Jr. High Level Dimensional Modeling Turning it in: Please turn in the submission template your completed reflection and with your name, netid, and date at the top. Please turn in your completed High Level Dimensional Modeling worksheet and make sure your name, NetID, and date appear somewhere at the top of the Bus Matrix page. Page 9 of 9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help