DWH_Notes
docx
keyboard_arrow_up
School
California State University, East Bay *
*We aren’t endorsed by this school
Course
620
Subject
Information Systems
Date
Oct 30, 2023
Type
docx
Pages
5
Uploaded by AmbassadorBeaver2154
Dimensional Modeling
■
Used to design schemas to support multidimensional analysis
■
Consists of a
Fact
table and
Dimension
tables
Fact table
consists of key measurements of a business process of interest and keys to dimensions:
Ex: Sales process -> Sales figures
Dimensions
– context (who, what, where, when, how) of measurement:
Ex: Time, Location, Product
Dimension attribute
– columns in dimension table:
May have hierarchical relationships
Each dimension table
contains data for one dimension:
May require de-normalization of transaction databases
Building Dimensional Tables
Transaction database :
Location(lid
, code, loc_name, stateid), -,State(stateid,
state_name, countryid) -Country(countryid, country_name)
D
imension table:
Locations(dimid, loc_code, city_name, state_name, country_name)
dim
id
Loc_co
de
City_na
me
State_na
me
1
SJ1
San
Jose
CA
TimeI
D
LocID
ProdID
Sales
1
1
1
454
(Fact)
Contains key measurements of business process:
Sales – monthly sales, Level of detail is granularity or “grain”
Dimensional Modeling
Star schema:
Easy to create, Easy to understand, Extensible
Modeling steps:
Identify process, Identify grain, Identify dimensions, Identify facts
Dimension Table exampleS
:
(i)Retail -- store, products, date, (ii) Telecommunications -- call origin, call destination, (iii) Banking -- customers, accounts, branch, account officer (iv) Insurance --
policy type, insured party
Fact Table examples:
(i) Retail -- number of units sold, sales amount, (ii) Telecommunications -- length of call in minutes, average number of calls, (iii) Banking -- average monthly balance (iv)
Insurance -- claims amount
Extending Start Schema:
(i)DW may also store pre-aggregated data: Separate fact table (monthly, weekly..), Fact constellations – multiple fact tables that share dimensions (ii)DB Server
Issues – Index structures, parallel processing, complex sql queries etc. (Complex queries – nested, aggregates, functions etc.
DW Goals
(i)Accessible – data intuitive and obvious to user, easy to use interface
(ii) Consistency – data should be reliable. Common labels. (iii) Change – inevitable but when new questions are asked,
existing data and apps are not disrupted. Gracefully (iv) Timely – operational decisions, time to clean and validate – users know(v) Secure – IP (vi) Trust – data driven business decisions, trust
data
DW/BI Manager Responsibilities
Understand users – what do they want – job responsibility goals etc, identify champions early adopters, find new users; Quality and rel – choose robust and
actionable data selected from vast. Easy to use ui, trusted data, monitor accuracy cont. adapt to change; Sustain – take credit, update, maintain trust, keep customers happy
Dimensional Modeling Process
Select Business Process; Declare Grain; Identify Dimensions; Identify Facts
Business Process -
What is a business process?
Common characteristics: Action verbs, Supported by IT systems, Input/Output/Metrics/KPI
Examples:
Start with examples – then abstract characteristics (verbs, IT systems, IO, KPI). Not
interested in non IT processes since no data. Retail – sales; Healthcare – billing, treatment?; Education – registration? Teaching?; Insurance – write policy, process claims; Transportation – UPS
– shipment, Uber? – travel; Strategy – Hotel – Good customer exp, Financial strength, Happy Employees and Invest in Emp., Competitive, ; Measure: Checkin Time, How many are booking,
how much training do emp get,
Declare the Grain:
Level of detail associated with fact table measurements; What level of granularity should be used?;
Summarizing
data in DW: (i) Summarizing to improve performance (ii)
Summarize to meet business needs.
Grain (i)
Include lowest grain – lowest level of detail (ii) Lowest level of grain allows flexibility; Lowest level of grain ups – package od vs package segments
Grain: Level of detail associated with fact table measurements.
Examples:
One row per scan of individual product on customer sales transaction; One row per individual boarding pass
scanned at an airport gate; Insurance?; Transportation? Ride Share Service?; Education?; ; Transportation/logistics – container; Retail – line item
Identify Dimensions
Context of measurements; Who, what, why, where, when, how; Data types – fact vs dimensions: Data types – facts – numeric, dimens- categorical
Identify Facts
What is the process measuring?; Key performance metrics; Typical facts are numeric additive figures; KPI determines process; Display product stacking.
Retail Business
Where is data collected?; Many stores?; Profit = high sales price, high quantity, low costs; Decisions – pricing and promotions
(i)Start with most
important process
: Across Org-ROI (ii) “What level
of data detail
should be made available in a dimensional model?” Use most detailed atomic information captured by a
business process: Monday vs. Sunday?; Individual sizes of certain brands; How many shoppers used 50 cents off promotion? (iii)Identify
Dimensions
: Product, Store, Date/Time, Payment
Method, Promotion, Cashier. Add more dimensions if additional dimensions naturally take on only one value under each combination of primary dimensions. (iv) Identify Facts: “Facts must be
true to grain”; Quantity; ; Regular unit price; Discount unit price; Net unit price; Extended discount price; Extended Sales dollar amount; Extended cost dollar amount; Extended gross profit.
Additive
– quantity, ext sales, cost, profit;
Account Balance
semi additive;
Profit margin –
non additive; Regular unit price – not additive, summing unit price is not unit price, average is not
weighted
Sell 1 for $1, 4 for $0.5. , 1 + 0.5 = 1.5 is not unit price. 0.75 is not average.
Modeling Time Dimension
Is date dimension needed?; Time vs Date; Present in every DW; Date and Time of Day dimensions (i) Date -> 20yrs -> 7300 approx. (ii) Time of Day -> 20
x 365 x 24
x 60 = 10,512,000; Date dimension needed – expensive join, direct sql query; Business user not well versed in sql date syntax etc.; Holidays, fiscal etc. roll ups expensive as real calcs on
calendar func; Time – sparsity, large size, needed?
What should date dimension capture?:
Day, Month, Year; Weekday, Weekend ;Day of Week; Day of Month; Calendar vs Fiscal. Sql date stamp; Holiday Indicator – Y/N or Holiday, Non Holiday
– use meaningful values for end user
Product Dimension:
Product dimension contains every single SKU (Stock Keeping Unit); Product<Brand<Subcategory<Category<Department; HG Dark Chocolate < HG < Ice Cream < Frozen
Desserts < Frozen Foods; Normalization of Repeated Values?; List price – fact or dimension? (i) Used for filtering or grouping – dimension (ii) Used for calculations
- fact
(iii) Both - both
Store Dimension:
Describes every store; Primary geographic dimension; Geographic Hierarchy; Internal District Hierarchy
Promotion Dimension
Most interesting dimension, often called causal dimension; Typical promotions: price reduction, end aisle display, news paper ad, coupons; Need to capture each
combination of promotion; Modeled as separate dimensions or combined; ; Promotion cost?; No promotion key; Null values in fact or dimension tables
Multiple payment methods –pay one bill with cash and credit – then? Change grain? Separate fact table per transaction; Null in Fact ok, Null in Dim not ok – say NA or Unknown
Schema Extensibility:
Example: Retailer adds frequent shopper program – how would this be supported?
(i)Add new dimension table, surrogate keys (ii) Modeling at most granular level allows extensibility (iii) Extension schemas by adding (a) New dimensions (b)New dimension attributes (c) New
measured facts Eg:
New measured fact – rfid tags to shopping cart.
Fact and dimension table keys
(i)Use surrogate keys as primary keys: a)Integer keys, sequential
(ii)Advantages vs. natural keys: Buffer from operational rules; Integrate multiple source systems; Improve performance;
Handle null or unknown conditions; Dimension attribute change tracking.
Unknown conditions – no promotion;
Dimension attribute change – leapfrom from electronic toys to educational dept. ; Surrogate keys for degenerate
dimensions – pos no are reused
Snowflake schemas
are a type of data warehousing schema that prioritizes data integrity, minimizes data redundancy,
and offers flexibility for complex hierarchies. They are particularly useful in situations where maintaining the accuracy
and consistency of data is of paramount importance. However, they may result in more complex queries and potentially
slower query performance due to the multiple joins involved.
Centipede fact tables
Large fact table – fact table is larges and too
much space consumed as opposed to dimen tables
Indexing large multipart key is hard
Value Chain:
Done with sales, what about others? Where next?;
Services value chain etc. Insurance value chain?
Retail Value Chain:
Inventory Management:
Inventory Models
Goals: (i)
Sufficient inventory for smooth operations and prevent stockouts (ii) Minimize inventory to reduce carrying costs and increase profit
Inventory Models
(i) Periodic Snapshot (ii) Inventory Transactions (iii) Accumulating Snapshots
Inventory Snapshots:
Dimensional Model: Process: Periodic inventory snapshots; Grain: Snapshot; Dimensions: Product, Date, Store or Warehouse; Fact: Quantity
Inventory Snapshot Schema:
Date dim – same as in retail ; Product DIM and store can have additional details; Product – min reorder quantity; Discuss reorder point – if inventory falls to this
level reorder or will stockout; How much to reorder- Economic reorder point – minimize order cost vs holding cost; Store DIM – storage capacity
Semi Additive FACTS
(i)Inventory Quantity on Hand is Semi-Additive (a) Additive across stores and products (b)Not additive across dates
(ii)Most useful ways of combining is Average (a) SQL AVG Function cannot work (b) OLAP AVG function can avg with specific aggregation rules
SQL AVG Func for Average Daily Inventory:
Sql will avg by dividing across all rows (9) than just date(3); Group by will give 3 averages – need one.
Enhanced Inventory Facts:
Quantity on Hand; Quantity Sold; Inventory Dollar Value at Cost; Inventory Dollar Value at Selling Price; Are the
above additive across date, product and store dimensions?
Inventory Dollar Value at cost or selling price are not additive. (see errata)
Inventory Transactions:
(i)Record every transaction that affects inventory (ii) Inventory transactions at warehouse: Receive product;
Inspect;Return to vendor; Place in product bin; Package/Ship ; Receive from customer/returns; Remove from inventory
Inventory Transaction Models:
Detailed data that mirrors transaction tables, Useful for measuring transaction type frequencies, but
cumbersome and impractical for inventory position.; Fact table fact : dollar amount
Inventory Accumulating Snapshot:
Each fact row tracks receipt through disposition of product; Possible when products received in one
shipment can be distinguished from products received later; Track products by Serial no or lot no
Dimensions
: (i) Sales:
Product;
Date;
Store; Promotion; Payment (ii)
Inventory:
Date;
Product;
Store; Warehouse; Supplier
Enterprise data warehouse bus architecture
“Family of dimensional models that share a comprehensive set of common, conformed dimensions”; “Master suite of
standardized dimensions and facts with uniform interpretation across enterprise”; New dimensional models fit
together like pieces of a puzzle; Bus architecture – overall enterprise wide design but can be built incrementally
Enterprise Data Warehouse Bus Matrix
Development process; Important deliverable of DW/BI; Multiple purposes – architecture planning, db design, data
governance coord, org communication
Replace dim with business function; Requirements determination – which groups?-->
Conformed Dimensions
Aka Master , Common or Reference Dimension; Built once in ETL, replicated logically or physically ; Help combine
measures across business processes; Identical Conformed Dimensions; Shrunken Rollup Conformed Dimensions;
Limited Conformity
Drill across open order, inventory, sales – diff fact tables. Common product dimension; Identical – date dim – same
keys, column names, definitions, values. Small change July JULY, or Month, MonthName
don’t confirm; Shrunken –
rollup, attribute subset – product->brand; Shrunken row subset – specific products (referential integrity issue); Limited – GE – aeroplane engine, appliances
Conformed facts
Revenue, profit, costs, quality measures etc.; Use same label only if definitions are same, same units of measurement.
Different bp’s have revenue – can be added only if
same
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Revenue = sold, or revenue = sold - returns
Data Goveranance:
Conformed Dimensions – need enterprise consensus on attribute names and contents; IT can facilitate but cannot be sole driver – SME lead; Data Governance Objective:
Common data definitions, labels and domain values; Master Data Management : Master file that is a common point of reference
Enterprise Data Warehouse Architecture
(i) Value chain processes (a) Sales, Inventory (b)Support process: Procurement (ii) Procurement process: Critical business process for many industries
including Retail, Manufacturing, Services (iii)Overall procurement goal: Source appropriate materials in most economical manner
Procurement:
(
i) Procurement Activities: Negotiate contracts; Issue Purchase requisitions; Track receipts and authorize payments (ii)Procurement analytics requirements: What is purchased?
From whom? At what cost?; Monitoring organizational purchasing behavior; Monitoring vendor performance
Procurement Processes
: Purchase Requisitions; Purchase Orders;Shipping Notification; Warehouse Receipts; Vendor Invoice; Vendor Payments
Single vs Multiple Fact Tables
: (i)Varying Perspectives on Procurement Process: Purchase orders, receiving, accounts
payable; Different systems, dimensions (ii) Single or Multiple Fact table design considerations: Analytic requirements,
Unique business processes; Source systems; Dimensionality
Procurement Process Bus Matrix
(i)
Bus Matrix (ii) Multiple Fact Tables: Allows richer and more descriptive
dimensionality (iii) Accumulating Fact tables : Procurement Snapshot Table; Pipeline/Process View and Monitoring
Procurement Dimensional Model
Procurement transaction table; Shared common conformed dimensions
Slowly Changing Dimensions: As Dimensions change over time, how should they be handled in DW?
Type 0: Retain Original :
Attribute value never changes; Persistent durable keys, date dimension attributes
Type 1: Overwrite:
Overwrite old attribute value with current value; Loss of history, rebuild aggregates, appropriate when changes are insignificant
Type 2: Add New Row:
Predominant technique to preserve history; Effective and expiration dates, Current row indicator; Pre-existing aggregations do not change; Safest response if not clear
about which rule to apply; Difference between Type 1 and Type2
Type 3: Add New Column for Prior Value:
Appropriate when: Business need is to track both new and prior values; ;Significant change affecting many rows;Example: Product line or sales
district reorganization
Type 4: Add Mini-Dimension:
Appropriate when: Dimensions are large; Many frequent changes; Example: Customer
Demographics
Order Management Processes
:
Ordering, Shipping, Invoicing, Receiving Payments, Returns; Order Management Bus Matrix
Role Playing Dimensions:
Single dimension appears multiple times in the same fact table ; Each role is a separate view of
underlying single physical table; Representation in bus matrix
Junk Dimensions:
(i)Transactional data sources may often include numerous flags and indicators; (ii)Options to consider:
Ignore;Add as fact row; Make each indicator a dimension; Add to header dimension (iii)Alternative: Junk
Dimension:Aggregate low cardinality indicators into one dimension
Ignore, fact rows, each indiv dimension,
Tax-code, prepay, discount,
Hospital – admission type – walk, apt, transfer; level of care – ICU, Med Floor, Ped ICU, Surgery
Ignore flags, fact row, sep dim, junk dim
Junk Dim
Junk Dimension Design
Number of rows in junk dimension depends on cardinality of indicators;Example: If a single junk dimension has 10 two value indicators, number of rows would be 2^ 10 = 1024 ; Rows can be
added incrementally to dimension table or can include all combinations of indicators; Best approach depends on cardinality of indicators and frequency of occurrence of different values of
indicator
Header –Line Pattern Problems:
Design accurately represents header – line relationship but has several shortcomings; Header may be close to 20% of fact table when average orderliness per
header at 5.; Dimension tables should be much smaller than fact tables; Header dimension grows with fact table. Dimension tables should not normally grow at same rate as fact
Multiple Currencies:
Order transactions typically expressed in both local and global currency; Conversion rate could be rate at end of day, at time of capture or another as per company policy;
Above technique allows easy roll up of data
across different currencies; Global currency is fully additive, local currency is only additive for the currency specified; Currency dimension needed
even when geography known; Currency conversion fact table allows more complex analysis.
Supporting Facts of Different Granularity:
Example: Shipping Charges are at Order Level, whereas fact table is at OrderLine level; (i) Two possible approaches: Allocation (preferable if
possible); Separate fact table (ii) Warning: Shouldn’t mix fact granularities at different levels
Header/Line Pattern to Avoid (2):
Avoid joins between fact tables; Fact tables are very large and can take very long to query
Audit dimension Data Quality:
Accuracy, completeness
etc. of data;
Data Provenance:
How was data calculated, what formulas were used?;
Business users may ask
Related Documents
Browse Popular Homework Q&A
Q: Use the bond energies provided to estimate ΔH° rxn for the reaction below.
PCl 3(g) + Cl 2(g) →…
Q: This program requires the main function and a custom value-returning function. In the main function,…
Q: What security measures need a company to do to stop hackers from stealing customer account…
Q: If $20,000 is invested at an interest rate of 2% per year, compounded semiannually, find the value…
Q: The given contingency table lists the cross tabulation of the number of business
partnerships by…
Q: Give the chair flip of each structure and indicate which structure in each pair is more
stable.
a.…
Q: You need to buy some chicken for dinner tonight. You found an ad showing that the store across town…
Q: Evaluate the following double integral over the region R by converting it to an iterated integral.
e…
Q: Leaves contain an array of different light-absorbing molecules to harvest the full spectrum of…
Q: at x:
つ:8ー5x, atx=e
Q: (a)
lim 2²/
I→∞
Q: The size P(t) of a population at time t is modeled by the equation dP
dt
(a) In the long term, how…
Q: What was the difference between the Ku Klux Klan of Reconstruction and the KKK of the 1920s?
A. The…
Q: Let f(x)
2
and g(x)
3
Find lim(fog)(x) and
- I
%3D
x – 1°
|
lim (g • f)(x).
Q: As soon as a traffic light turns green, a car speeds up from rest to 51.0 mi/h with constant…
Q: $HOT!
a. If the disease were X-linked recessive, what would be the probability that a male child…
Q: What is the relationship between the following two molecules?
HO -H
H-OH
HO -H
H- -OH
A) enantiomers…
Q: Mai Lai massacre so devastating to Americans?
Q: Write a program that uses one loop to process the integers from 300 down to 200, inclusive. The…
Q: cos 0
=
6100
9'
tan Ꮎ < 0
Find sin 0.
Q: In your paper, discuss IPv6, the Internet, virtual private networks, and voice over internet…
Q: -x²15-y²15
Consider the function F(x,y) = e
and the point P(-2,2).
a. Find the unit vectors that…
Q: Java create a class
Q: What will be the actual length of a 5” pipeline displayed on a letter size paper if a 1=2,000 scale…
Q: Find the arc length of the graph of the function over the indicated interval. (Round your answer to…