Academic Article

The percentage cube.
Document Type
Article
Source
Information Systems; Jan2019, Vol. 79, p20-31, 12p
Subject
SQL
Search algorithms
Exponential functions
Generalization
Algebraic geometry
Language
ISSN
03064379
Abstract
Highlights
• It is necessary to adapt SQL query processing to evaluate percentage cubes efficiently.
• A percentage cube is significantly more difficult to compute than a standard cube due to a higher exponential complexity.
• Percentage cubes should be computed at low cube dimensionality with dimensions having low cardinality.
• Selecting top-k percentages across all cuboids is the most difficult analysis, harder than selecting minimum percentages.
• Incremental materialized view algorithms are feasible for one percentage query, but not for the percentage cube.
Abstract
OLAP cubes provide exploratory query capabilities combining joins and aggregations at multiple granularity levels. However, cubes cannot intuitively or directly show the relationship between measures aggregated at different grouping levels. One prominent example is the percentage, which is widely used in most analytical applications. Considering this limitation, we introduce the percentage cube as a generalized data cube that takes percentages as its basic measure. More precisely, a percentage cube shows the fractional relationship in every cuboid between each aggregated measure on several dimensions and its rolled-up measure aggregated by fewer dimensions. We propose the syntax and introduce query optimizations to materialize the percentage cube. We argue that percentage cubes are significantly harder to evaluate than standard data cubes because, in addition to the exponential number of cuboids, there is an additional exponential number of grouping column pairs (grouping columns at the individual level and the total level) on which percentages are computed. We propose alternative methods to prune the cube to identify interesting percentages, including a row count threshold, a percentage threshold, and selecting the top k percentages. We study percentage aggregations within the classification of distributive, algebraic, and holistic functions. Finally, we also consider the problem of incremental computation of the percentage cube. Experiments compare our query optimizations with existing SQL functions, evaluate the impact and speed of lattice pruning methods, and study the effectiveness of the incremental computation. [ABSTRACT FROM AUTHOR]
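The idea described in the abstract can be illustrated with a small in-memory sketch. It is not the authors' SQL syntax or their optimized evaluation; the table contents, the column names (state, quarter, sales), and the functions aggregate, percentage_cube, and top_k_percentages are all hypothetical. The sketch assumes that every proper subset of a cuboid's grouping columns, including the empty set (the grand total), may serve as the total level. Under that assumption, d dimensions yield 3^d - 2^d pairs of total-level and individual-level grouping columns, which is where the extra exponential factor comes from.

```python
from collections import defaultdict
from heapq import nlargest
from itertools import combinations

# Hypothetical toy fact table: two dimensions and one measure.
ROWS = [
    {"state": "CA", "quarter": "Q1", "sales": 30.0},
    {"state": "CA", "quarter": "Q2", "sales": 70.0},
    {"state": "TX", "quarter": "Q1", "sales": 50.0},
    {"state": "TX", "quarter": "Q2", "sales": 50.0},
]
DIMENSIONS = ("state", "quarter")


def aggregate(rows, columns, measure):
    """SUM(measure) grouped by `columns`; keys are tuples of column values."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[c] for c in columns)] += row[measure]
    return dict(totals)


def percentage_cube(rows, dimensions, measure):
    """Map (total_cols, breakdown_cols) -> {group key: fraction of its total}.

    Assumption of this sketch: any proper subset of a cuboid's columns,
    including the empty set, can act as the total level.
    """
    cube = {}
    for size in range(1, len(dimensions) + 1):
        for cuboid in combinations(dimensions, size):
            for t in range(size):  # t < size keeps the subset proper
                for total_cols in combinations(cuboid, t):
                    breakdown_cols = tuple(
                        c for c in cuboid if c not in total_cols
                    )
                    # Individual level groups by total columns first, so the
                    # key prefix identifies the matching total-level group.
                    individual = aggregate(
                        rows, total_cols + breakdown_cols, measure
                    )
                    totals = aggregate(rows, total_cols, measure)
                    n = len(total_cols)
                    cube[(total_cols, breakdown_cols)] = {
                        key: value / totals[key[:n]]
                        for key, value in individual.items()
                        if totals[key[:n]] != 0  # skip undefined percentages
                    }
    return cube


def top_k_percentages(cube, k):
    """Return the k largest (fraction, grouping pair, group key) triples."""
    flat = (
        (pct, pair, key)
        for pair, groups in cube.items()
        for key, pct in groups.items()
    )
    return nlargest(k, flat, key=lambda item: item[0])


CUBE = percentage_cube(ROWS, DIMENSIONS, "sales")
```

In this toy table, CUBE[(("state",), ("quarter",))] holds each quarter's share of its state's sales, while CUBE[((), ("state",))] holds each state's share of the grand total. A helper such as top_k_percentages corresponds loosely to the top-k pruning mentioned in the abstract; a percentage threshold could be applied the same way by filtering on the fraction.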