09dwconcepts

pdf

School

University of Wollongong *

*We aren’t endorsed by this school

Course

312

Subject

Computer Science

Date

Nov 24, 2024

Type

pdf

Pages

26

Uploaded by DeathageEX

Report
ISIT312 Big Data Management Data Warehouse Concepts Dr Janusz R. Getta School of Computing and Information Technology - University of Wollongong Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 1 of 26 8/7/23, 8:56 pm
Data Warehouse Concepts Outline OLAP versus OLTP The Multidimensional Model OLAP Operations Data Warehouse Architecture TOP ISIT312/ISIT912 Big Data Management, Spring 2023 2/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 2 of 26 8/7/23, 8:56 pm
OLAP versus OLTP Traditional database systems designed and tuned to support the day-to- day operation: OLTP database characteristics: Data analysis requires a new paradigm: online analytical processing ( OLAP ) Ensure fast, concurrent access to data Transaction processing and concurrency control Focus on online update data consistency Known as operational databases or online transaction processing ( OLTP ) - - - - Detailed data Do not include historical data Highly normalized Poor performance on complex queries including joins an aggregation - - - - Typical OLTP query: pending orders for a customer Typical OLAP query: total sales amount by a product and by a customer - - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 3/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 3 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
OLAP versus OLTP OLAP characteristics The need for a di ff erent database model to support OLAP was clear: led to data warehouses Data warehouse : (usually) large repositories that consolidate data from di ff erent sources (internal and external to the organization), are updated o ffl ine, follow the multidimensional data model , designed and optimized to e ffi ciently support OLAP queries OLTP paradigm focused on transactions, OLAP focused on analytical queries Normalization not good for analytical queries, reconstructing data requires a high number of joins OLAP databases support a heavy query load OLTP indexing techniques not e ffi cient in OLAP : oriented to access few records; OLAP queries typically include aggregation - - - - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 4/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 4 of 26 8/7/23, 8:56 pm
Data Warehouse Concepts Outline OLAP versus OLTP The Multidimensional Model OLAP Operations Data Warehouse Architecture TOP ISIT312/ISIT912 Big Data Management, Spring 2023 5/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 5 of 26 8/7/23, 8:56 pm
The Multidimensional Model A view of data in n-dimensional space: a data cube A data cube is composed of dimensions and facts Dimensions : Perspectives used to analyze the data Attributes describe dimensions Example: A three-dimensional cube for sales data with dimensions Product , Time , and Customer , and a measure Quantity - Product dimension may have attributes ProductNumber and UnitPrice (not shown in the figure) - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 6/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 6 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
The Multidimensional Model The cells or facts of a data cube have associated numeric values called measures Each cell of the data cube represents Quantity of units sold by category , quarter , and customer's city Data granularity : level of detail at which measures are represented for each dimension of the cube Example: sales figures aggregated to granularities Category , Quarter , and City - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 7/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 7 of 26 8/7/23, 8:56 pm
The Multidimensional Model Instances of a dimension are called members A data cube contains several measures, e.g. Amount , indicating the total sales amount (not shown) A data cube may be sparse (typical case) or dense Hierarchies : allow viewing data at several granularities Example: Seafood and Beverages are members of the Product at the granularity Category - Example: not all customers may have ordered products of all categories during all quarters - Define a sequence of mappings relating lower-level, detailed concepts to higher- level ones The lower level is called the child and the higher level is called the parent The hierarchical structure of a dimension is called the dimension schema A dimension instance comprises all members at all levels in a dimension - - - - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 8/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 8 of 26 8/7/23, 8:56 pm
The Multidimensional Model In the previous figure, granularity of each dimension indicated between parentheses: Category for the Product dimension, Quarter for Time , and City for Customer We may want sales figures at a finer granularity ( Month ), or at a coarser granularity ( Country ) Hierarchies of the Product , Time , and Customer dimensions TOP ISIT312/ISIT912 Big Data Management, Spring 2023 9/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 9 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
The Multidimensional Model Members of a hierarchy Product - Category TOP ISIT312/ISIT912 Big Data Management, Spring 2023 10/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 10 of 26 8/7/23, 8:56 pm
The Multidimensional Model: Measures Aggregation of measures changes the abstraction level at which data in a cube are visualized Measures can be: Additive : can be meaningfully summarized along all the dimensions, using addition; The most common type of measures Semiadditive : can be meaningfully summarized using addition along some dimensions; Example: inventory quantities, which cannot be added along the Time dimension Nonadditive measures cannot be meaningfully summarized using addition across any dimension; Example: item price, cost per unit, and exchange rate - - - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 11/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 11 of 26 8/7/23, 8:56 pm
The Multidimensional Model: Measures Another classification of measures: Distributive : defined by an aggregation function that can be computed in a distributed way; Functions count , sum , minimum , and maximum are distributive, distinct count is not; Example: S = {3, 3, 4, 5, 8, 4, 7, 3, 8} partitioned in subsets {3, 3, 4} , {5, 8, 4} , {7, 3, 8} gives a result of 8 , while the answer over the original set is 5 Algebraic measures are defined by an aggregation function that can be expressed as a scalar function of distributive ones; example: average , computed by dividing the sum by the count - - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 12/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 12 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Data Warehouse Concepts Outline OLAP versus OLTP The Multidimensional Model OLAP Operations Data Warehouse Architecture TOP ISIT312/ISIT912 Big Data Management, Spring 2023 13/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 13 of 26 8/7/23, 8:56 pm
OLAP Operations TOP ISIT312/ISIT912 Big Data Management, Spring 2023 14/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 14 of 26 8/7/23, 8:56 pm
OLAP Operations Starting cube: quarterly sales (in thousands) by product category and customer cities for 2012 We first compute the sales quantities by country: a roll-up operation to the Country level along the Customer dimension Sales of category Seafood in France significantly higher in the first quarter To explore alternative visualizations, we sort products by name To see the cube with the Time dimension on the x axis, we rotate the axes of the original cube, without changing granularities pivoting (see next 2 slides) To find out if this occurred during a particular month, we take cube back to City aggregation level, and drill-down along Time to the Month level - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 15/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 15 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
OLAP Operations To visualize the data only for Paris slice operation, results in a 2-dimensional sub-cube, basically a collection of time series (see next slide) To obtain a 3-dimensional sub-cube containing only sales for the first two quarters and for the cities Lyon and Paris, we go back to the original cube and apply a dice operation TOP ISIT312/ISIT912 Big Data Management, Spring 2023 16/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 16 of 26 8/7/23, 8:56 pm
OLAP Operations TOP ISIT312/ISIT912 Big Data Management, Spring 2023 17/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 17 of 26 8/7/23, 8:56 pm
OLAP Operations The operations in the previous slides can be defined using the following algebraic operators. Roll-up : aggregates measures along a dimension hierarchy (using an aggregate function) to obtain measures at a coarser granularity Extended roll-up: similar to rollup, but drops all dimensions not involved in the operation Recursive roll-up: aggregates over a recursive hierarchy (a level rolls-up to itself) ROLLUP ( CubeName , ( Dimension Level )*, AggFunction ( Measure )*) ROLLUP ( Sales , Customer Country , SUM ( Quantity )) OLAP ROLLUP *( CubeName , [( Dimension Level )*], AggFunction ( Measure )*) ROLLUP *( Sales , Time Quarter , SUM ( Quantity )) ROLLUP *( Sales , Time Quarter , COUNT ( Product ) AS ProdCount ) OLAP RECROLLUP ( CubeName , Dimension Level , AggFunction ( Measure )*) OLAP TOP ISIT312/ISIT912 Big Data Management, Spring 2023 18/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 18 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
OLAP Operations Drill-down moves from a more general level to a more detailed level in a hierarchy Sort returns a cube where the members of a dimension have been sorted according to the value of Expression DRILLDOWN ( CubeName , ( Dimension Level )*) DRILLDOWN ( Sales , Time Month ) OLAP NAME is a predefined keyword in the algebra representing the name of a member SORT ( CubeName , Dimension , Expression [ ASC | DESC ]) OLAP SORT ( Sales , Product , NAME ) - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 19/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 19 of 26 8/7/23, 8:56 pm
OLAP Operations Pivot Slice: where the axes are specified as {X, Y, Z, X , Y , Z , ... }. PIVOT ( CubeName , ( Dimension Axis )*) OLAP - 1 1 1 PIVOT ( Sales , Time X, Customer Y, Product Z) OLAP Dimension will be dropped by fixing a single Value in the Level, other dimensions unchanged Slice supposes that the granularity of the cube is at the specified level of the dimension SLICE ( CubeName , Dimension , Level = Value ) OLAP - SLICE ( Sales , Customer , City = 'Paris' ) OLAP - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 20/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 20 of 26 8/7/23, 8:56 pm
OLAP Operations Dice: where ? is a Boolean condition over dimension levels, attributes, and measures. DICE ( CubeName , ? ) OLAP - DICE ( Sales , ( Customer . City = 'Paris' OR Customer . City = 'Lyon' ) AND ( Time . Quarter = 'Q1' OR Time . Quarter = 'Q2' ) ) OLAP TOP ISIT312/ISIT912 Big Data Management, Spring 2023 21/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 21 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Data Warehouse Concepts Outline OLAP versus OLTP The Multidimensional Model OLAP Operations Data Warehouse Architecture TOP ISIT312/ISIT912 Big Data Management, Spring 2023 22/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 22 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Typical Data Warehouse Architecture TOP ISIT312/ISIT912 Big Data Management, Spring 2023 23/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 23 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Data Warehouse Architecture General data warehouse architecture: several tiers Back-end tier composed of: Data warehouse tier composed of: OLAP tier composed of: The extraction , transformation , and loading ( ETL ) tools: Feed data into the data warehouse from operational databases and internal and external data sources The data staging area : An intermediate database where all the data integration and transformation processes are run prior to the loading of the data into the data warehouse - - An enterprise data warehouse and/or several data marts A metadata repository storing information about the data warehouse and its contents - - An OLAP server which provides a multidimensional view of the data, regardless the actual way in which data are stored - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 24/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 24 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Data Warehouse Architecture Front-end tier is used for data analysis and visualization Contains client tools such as OLAP tools , reporting tools , statistical tools , and data-mining tools - TOP ISIT312/ISIT912 Big Data Management, Spring 2023 25/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 25 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
References A. VAISMAN, E. ZIMANYI, Data Warehouse Systems: Design and Implementation, Chapter 3 Data Warehouse Concepts, Springer Verlag, 2014 TOP ISIT312/ISIT912 Big Data Management, Spring 2023 26/26 Data Warehouse Concepts file:///Users/jrg/312-2023/LECTURES/09dwconcepts/09dwconcepts.html#1 26 of 26 8/7/23, 8:56 pm
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help