Academic Paper

Cost-Effective Approximate Aggregation Queries on Geospatial Big Data
Document Type
Conference
Source
2023 IEEE Globecom Workshops (GC Wkshps), pp. 1313-1318, Dec. 2023
Subject
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Data analysis
Conferences
Aggregates
Urban areas
Big Data
Spatial databases
Geospatial analysis
spatial approximate query processing
geospatial group-by
spatial aggregation
Douglas Peucker
line simplification
Language
Abstract
Aggregation queries, including Top-N and geo-statistics such as ‘mean’ and ‘count’, are essential in spatial data analytics. These queries require grouping geospatial objects into pre-defined clusters, typically administrative polygons representing study areas such as cities. Given a large georeferenced dataset on the order of millions of objects and a group of polygons representing a city, the aggregation query must group objects by polygon, which means determining the polygon to which each object belongs. This is a computationally expensive geospatial operation because polygons are typically represented by very large numbers of vertices. In this paper, we present the design and realization of a system, which we term ApproxGeoAgg, for the efficient approximation of costly geospatial aggregate queries that require group-by operations. We have performed extensive testing, and our results show that our system outperforms plain baselines by an order of magnitude in the trade-off between running time and accuracy. Specifically, for Top-N aggregation queries we obtain a tiny loss in accuracy, reaching 0.00038% depending on parameter configuration, with a corresponding gain in running time of about 2.6%, which rises to about 12% as we decrease the number of polygon boundary vertices.
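
The idea in the abstract (fewer boundary vertices make the group-by cheaper at a small cost in accuracy) can be illustrated with a minimal Python sketch. This is not the ApproxGeoAgg implementation, which this record does not describe. It assumes Douglas-Peucker line simplification (listed among the subject terms) is applied to each polygon boundary, followed by a plain ray-casting point-in-polygon test. All function names, the tolerance value, and the toy data are hypothetical.

```python
from collections import Counter
from math import hypot

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Simplify an open polyline, always keeping its two endpoints."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex

def simplify_ring(ring, tolerance):
    """Simplify a closed ring (first vertex not repeated at the end).

    Assumption: the ring is split into two open chains at vertex 0 and at
    the vertex farthest from it, and each chain is simplified separately.
    """
    if len(ring) < 4:
        return list(ring)
    x0, y0 = ring[0]
    far = max(range(1, len(ring)),
              key=lambda i: hypot(ring[i][0] - x0, ring[i][1] - y0))
    first_half = douglas_peucker(ring[: far + 1], tolerance)
    second_half = douglas_peucker(ring[far:] + [ring[0]], tolerance)
    return first_half[:-1] + second_half[:-1]

def contains(ring, point):
    """Ray-casting point-in-polygon test; cost grows with the vertex count."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_by_polygon(polygons, points):
    """Group points by the first polygon that contains them and count."""
    counts = Counter()
    for p in points:
        for name, ring in polygons.items():
            if contains(ring, p):
                counts[name] += 1
                break
    return counts

# Toy data (hypothetical): two adjacent districts and four objects.
polygons = {
    "A": [(0, 0), (1, 0.01), (2, 0), (2, 2), (1, 2.01), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(0.5, 0.5), (1, 1), (3, 1), (5, 5)]

simplified = {name: simplify_ring(ring, 0.05) for name, ring in polygons.items()}
exact_counts = count_by_polygon(polygons, points)
approx_counts = count_by_polygon(simplified, points)
top_1 = approx_counts.most_common(1)
```

In this toy case the simplified polygons give the same counts as the originals while polygon "A" drops from six vertices to four. On real administrative boundaries, points near a simplified edge may be assigned to a different polygon, which is the kind of accuracy loss the abstract quantifies.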