
Cost-Effective Approximate Aggregation Queries on Geospatial Big Data
2023 IEEE Globecom Workshops (GC Wkshps), pp. 1313-1318, Dec. 2023
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Data analysis
Urban areas
Big Data
Spatial databases
Geospatial analysis
spatial approximate query processing
geospatial group-by
spatial aggregation
Douglas Peucker
line simplification
Aggregation queries, including Top-N queries and geo-statistics such as ‘mean’ and ‘count’, are essential in spatial data analytics. These queries require grouping geospatial objects into predefined clusters, typically administrative polygons that represent study areas such as cities. Given a georeferenced dataset on the order of millions of objects and a set of polygons representing a city, the aggregation query must group the objects by polygon, which means determining the polygon to which each object belongs. This is a computationally expensive geospatial operation because polygons are typically represented by a large number of vertices. In this paper, we present the design and implementation of a system, termed ApproxGeoAgg, for efficiently approximating costly geospatial aggregation queries that require group-by operations. Our extensive testing shows that the system outperforms plain baselines by an order of magnitude in balancing running time against accuracy. Specifically, for Top-N aggregation queries, the loss in accuracy is tiny, reaching 0.00038% depending on the parameter configuration, with a corresponding gain in running time of about 2.6%, which rises to about 12% as the number of polygon boundary vertices decreases.
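The abstract does not describe ApproxGeoAgg's internals, so the following is only a minimal sketch of the general idea its keywords point to: simplify each polygon boundary with Douglas-Peucker so it has fewer vertices, then assign points to polygons with a point-in-polygon test and rank polygons by count (a Top-N group-by). It assumes planar coordinates, simple non-overlapping polygons given as vertex lists, and a ray-casting containment test. All function names and the tolerance parameter `eps` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

Point = tuple[float, float]


def _perp_dist(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # degenerate segment: plain Euclidean distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(pts: list[Point], eps: float) -> list[Point]:
    """Recursive Douglas-Peucker simplification of an open polyline."""
    if len(pts) < 3:
        return list(pts)
    dmax, idx = 0.0, 0
    for i in range(1, len(pts) - 1):
        d = _perp_dist(pts[i], pts[0], pts[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax > eps:
        # Keep the farthest vertex and simplify both halves.
        left = douglas_peucker(pts[: idx + 1], eps)
        right = douglas_peucker(pts[idx:], eps)
        return left[:-1] + right
    return [pts[0], pts[-1]]


def simplify_ring(ring: list[Point], eps: float) -> list[Point]:
    """Simplify a polygon ring (first vertex not repeated at the end)."""
    simplified = douglas_peucker(ring + [ring[0]], eps)[:-1]
    # Hypothetical safeguard: fall back to the original if it collapses.
    return simplified if len(simplified) >= 3 else ring


def point_in_polygon(p: Point, ring: list[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray cast toward +x."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def top_n_by_count(points, polygons, n, eps=0.0):
    """Group points by containing polygon and return the n largest groups."""
    rings = {name: simplify_ring(r, eps) if eps > 0 else r
             for name, r in polygons.items()}
    counts = Counter()
    for p in points:
        for name, ring in rings.items():
            if point_in_polygon(p, ring):
                counts[name] += 1
                break  # assumes polygons do not overlap
    return counts.most_common(n)
```

In this sketch, a larger `eps` removes more boundary vertices and makes each containment test cheaper, at the risk of misassigning points that lie close to a polygon edge. This is the kind of accuracy-versus-running-time trade-off that the abstract reports. A production system would more likely rely on a spatial library and an index than on nested loops.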