Scholarly Article

The percentage cube.

Document Type

Article

Author

Zhang, Yiqun; Ordonez, Carlos; García-García, Javier; Bellatreche, Ladjel; Carrillo, Humberto

Source

Information Systems; Jan2019, Vol. 79, p20-31, 12p

Subject

SQL
Search algorithms
Exponential functions
Generalization
Algebraic geometry

Language

ISSN

0306-4379

Abstract

Highlights:
• It is necessary to adapt SQL query processing to evaluate percentage cubes efficiently.
• A percentage cube is significantly more difficult to compute than a standard cube due to a higher exponential complexity.
• Percentage cubes should be computed at low cube dimensionality with dimensions having low cardinality.
• Selecting top-k percentages across all cuboids is the most difficult analysis, harder than selecting minimum percentages.
• Incremental materialized view algorithms are feasible for one percentage query, but not for the percentage cube.

Abstract: OLAP cubes provide exploratory query capabilities combining joins and aggregations at multiple granularity levels. However, cubes cannot intuitively or directly show the relationship between measures aggregated at different grouping levels. One prominent example is the percentage, which is widely used in most analytical applications. Considering this limitation, we introduce percentage cube as a generalized data cube that takes percentages as its basic measure. More precisely, a percentage cube shows the fractional relationship in every cuboid between each aggregated measure on several dimensions and its rolled-up measure aggregated by fewer dimensions. We propose the syntax and introduce query optimizations to materialize the percentage cube. We justify that percentage cubes are significantly harder to evaluate than standard data cubes because in addition to the exponential number of cuboids, there is an additional exponential number of grouping column pairs (grouping columns at the individual level and the total level) on which percentages are computed. We propose alternative methods to prune the cube to identify interesting percentages including a row count threshold, a percentage threshold, and selecting the top k percentages. We study percentage aggregations within the classification of distributive, algebraic, and holistic functions. Finally, we also consider the problem of incremental computation of percentage cube. Experiments compare our query optimizations with existing SQL functions, evaluate the impact and speed of lattice pruning methods and study the effectiveness of the incremental computation. [ABSTRACT FROM AUTHOR]
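Note: the abstract's central idea, a percentage computed for every pair of an individual grouping level and a coarser total level, can be illustrated with a small brute-force sketch. This is not the authors' SQL syntax or their optimized algorithm. The function name `percentage_cube`, the sample rows, and the choice to count the empty grouping (the grand total) as a valid total level are all assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def percentage_cube(rows, dims, measure):
    """Naive illustration (hypothetical helper, not the paper's algorithm).

    For every cuboid (non-empty subset of dims) and every proper subset of
    that cuboid used as the "total" level, return the fraction
    sum(measure at individual level) / sum(measure at total level).
    Assumption: the empty subset (grand total) counts as a total level.
    """
    def aggregate(cols):
        # SUM(measure) GROUP BY cols, computed by a full scan of rows.
        sums = defaultdict(float)
        for row in rows:
            sums[tuple(row[c] for c in cols)] += row[measure]
        return sums

    result = {}
    for k in range(1, len(dims) + 1):
        for cuboid in combinations(dims, k):
            individual = aggregate(cuboid)
            for j in range(k):  # proper subsets only: fewer columns
                for total_cols in combinations(cuboid, j):
                    totals = aggregate(total_cols)
                    for key, value in individual.items():
                        named = dict(zip(cuboid, key))
                        total_key = tuple(named[c] for c in total_cols)
                        denom = totals[total_key]
                        # Leave the percentage undefined when the total is 0.
                        pct = value / denom if denom else None
                        result[(cuboid, total_cols, key)] = pct
    return result


# Made-up sample data, for illustration only.
rows = [
    {"state": "CA", "city": "LA", "sales": 30},
    {"state": "CA", "city": "SF", "sales": 70},
    {"state": "TX", "city": "Austin", "sales": 100},
]
cube = percentage_cube(rows, ("state", "city"), "sales")

# Share of LA within CA: individual level (state, city), total level (state).
la_in_ca = cube[(("state", "city"), ("state",), ("CA", "LA"))]
# Share of CA in the grand total: individual level (state), total level ().
ca_overall = cube[(("state",), (), ("CA",))]
# Distinct (individual level, total level) pairs evaluated.
pairs = {(c, t) for (c, t, _) in cube}
```

Under these assumptions the sketch evaluates 3^d − 2^d grouping column pairs for d dimensions (5 pairs for the two dimensions here), compared with 2^d cuboids in a standard cube, which is consistent with the abstract's point that percentages add a second exponential factor. The sketch rescans the input for every pair, so it is only practical for very small inputs.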
