Academic Paper
GeoFlink: An Efficient and Scalable Spatial Data Stream Management System
Document Type
Periodical
Author
Source
IEEE Access, IEEE. 10:24909-24935, 2022
Subject
Language
ISSN
2169-3536
Abstract
This era is witnessing exponential growth in spatial data due to the increase in GPS-enabled devices. Spatial data can be of great use to commercial businesses, governments, and NGOs if processed in a timely manner. Spatial data is voluminous and is usually generated as a continuous data stream, for instance, vehicle tracking data and mobile location data. To process such huge data streams, highly scalable systems are needed. Apache Spark Streaming, Apache Flink, and Apache Samza are among the state-of-the-art scalable stream processing platforms; however, they lack support for spatial objects, indexes, and queries. Other scalable spatial data processing platforms, including GeoSpark and SpatialHadoop, do not support streaming workloads and can only handle static or batch data. To fill this gap, we present GeoFlink, which extends Apache Flink to support spatial objects, indexes, and continuous queries over spatial data streams. A grid-based index is introduced to support efficient spatial query processing and effective data distribution across distributed cluster nodes. GeoFlink supports spatial range, spatial $k$NN, and spatial join queries on Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon spatial objects. In addition, GeoFlink supports data streams in GeoJSON, WKT, and CSV formats. A detailed experimental study on real and synthetic spatial data streams shows that GeoFlink achieves significantly higher query throughput than existing state-of-the-art streaming platforms.
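The grid-based index mentioned in the abstract can be illustrated with a minimal sketch. This is not GeoFlink's API or implementation: the class `UniformGrid`, its methods, and the fixed cell size are hypothetical names and assumptions chosen only to show the general idea of a uniform grid that prunes candidates for a spatial range query over points.

```python
from collections import defaultdict
from math import floor, hypot


class UniformGrid:
    """Hypothetical uniform grid over 2D points (illustration only)."""

    def __init__(self, min_x, min_y, cell_size):
        self.min_x = min_x
        self.min_y = min_y
        self.cell_size = cell_size
        # Maps a (column, row) cell key to the points stored in that cell.
        self.cells = defaultdict(list)

    def cell_of(self, x, y):
        """Return the (column, row) key of the cell containing (x, y)."""
        return (
            floor((x - self.min_x) / self.cell_size),
            floor((y - self.min_y) / self.cell_size),
        )

    def insert(self, obj_id, x, y):
        """Assign a point to its grid cell."""
        self.cells[self.cell_of(x, y)].append((obj_id, x, y))

    def range_query(self, qx, qy, radius):
        """Return ids of points within `radius` of the query point."""
        # Only cells overlapping the query's bounding box can hold results.
        col_lo, row_lo = self.cell_of(qx - radius, qy - radius)
        col_hi, row_hi = self.cell_of(qx + radius, qy + radius)
        hits = []
        for col in range(col_lo, col_hi + 1):
            for row in range(row_lo, row_hi + 1):
                for obj_id, x, y in self.cells.get((col, row), ()):
                    # Exact distance check on the pruned candidate set.
                    if hypot(x - qx, y - qy) <= radius:
                        hits.append(obj_id)
        return hits
```

In a distributed setting, a cell key of this kind could also serve as a partitioning key so that nearby objects are routed to the same node; how GeoFlink actually does this is described in the paper itself, not in this sketch.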