Academic Paper

GeoFlink: An Efficient and Scalable Spatial Data Stream Management System
Document Type
Periodical
Source
IEEE Access, vol. 10, pp. 24909-24935, 2022
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Spatial databases
Indexes
Real-time systems
Query processing
Cluster computing
Costs
Throughput
GeoFlink
spatial data
GeoSpatial
stream processing
spatial data management system
spatial index
spatial objects
ISSN
2169-3536
Abstract
This era is witnessing exponential growth in spatial data due to the growing number of GPS-enabled devices. If processed in a timely manner, spatial data can be extremely useful to commercial businesses, governments, and NGOs. Spatial data is voluminous and is usually generated as a continuous stream, for instance, vehicle tracking data and mobile location data. Processing such large data streams requires highly scalable systems. Apache Spark Streaming, Apache Flink, and Apache Samza are among the state-of-the-art scalable stream processing platforms; however, they lack support for spatial objects, indexes, and queries. Conversely, scalable spatial data processing platforms such as GeoSpark and SpatialHadoop do not support streaming workloads and can handle only static or batch data. To fill this gap, we present GeoFlink, which extends Apache Flink to support spatial objects, indexes, and continuous queries over spatial data streams. A grid-based index is introduced to support efficient spatial query processing and effective data distribution across distributed cluster nodes. GeoFlink supports spatial range, spatial $k$NN, and spatial join queries on Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon spatial objects. In addition, GeoFlink supports data streams in GeoJSON, WKT, and CSV formats. A detailed experimental study on real and synthetic spatial data streams shows that GeoFlink achieves significantly higher query throughput than existing state-of-the-art streaming platforms.
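The abstract names a grid-based index as the basis for both query pruning and data distribution. The following is a minimal Python sketch of that general idea, not GeoFlink's implementation or API. It assumes a uniform grid over a fixed, known bounding box, point objects only, and a range query defined as "all points within distance r of a query point". The class `UniformGrid`, its parameters, and the policy of clamping out-of-bounds points into border cells are hypothetical choices made for illustration. The query first selects candidate cells that overlap the query's bounding box (filter step), then runs an exact distance check only on points in those cells (refinement step).

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
CellKey = Tuple[int, int]


class UniformGrid:
    """Hypothetical uniform grid index over a fixed bounding box (illustrative only)."""

    def __init__(self, min_x: float, min_y: float, max_x: float, max_y: float,
                 cells_per_side: int) -> None:
        self.min_x, self.min_y = min_x, min_y
        self.n = cells_per_side
        self.cell_w = (max_x - min_x) / cells_per_side
        self.cell_h = (max_y - min_y) / cells_per_side
        # Maps a cell key (column, row) to the points stored in that cell.
        self.cells: Dict[CellKey, List[Point]] = defaultdict(list)

    def cell_of(self, p: Point) -> CellKey:
        # Assumption: points outside the bounding box are clamped into border cells.
        i = math.floor((p[0] - self.min_x) / self.cell_w)
        j = math.floor((p[1] - self.min_y) / self.cell_h)
        return (min(max(i, 0), self.n - 1), min(max(j, 0), self.n - 1))

    def insert(self, p: Point) -> None:
        self.cells[self.cell_of(p)].append(p)

    def range_query(self, q: Point, r: float) -> List[Point]:
        # Filter: only cells overlapping the square [q - r, q + r] can hold matches.
        lo_i, lo_j = self.cell_of((q[0] - r, q[1] - r))
        hi_i, hi_j = self.cell_of((q[0] + r, q[1] + r))
        result: List[Point] = []
        for i in range(lo_i, hi_i + 1):
            for j in range(lo_j, hi_j + 1):
                # .get avoids creating empty entries in the defaultdict.
                for p in self.cells.get((i, j), []):
                    # Refinement: exact Euclidean distance check on candidates.
                    if math.hypot(p[0] - q[0], p[1] - q[1]) <= r:
                        result.append(p)
        return result


grid = UniformGrid(0.0, 0.0, 100.0, 100.0, cells_per_side=10)
for p in [(5.0, 5.0), (12.0, 8.0), (50.0, 50.0), (99.0, 99.0)]:
    grid.insert(p)

# Only cells (0,0), (0,1), (1,0), (1,1) are inspected; (5, 5) is a candidate
# but lies about 7.07 away, so only (12, 8) survives refinement.
print(grid.range_query((10.0, 10.0), 5.0))  # [(12.0, 8.0)]
```

In a distributed stream processor, a cell key like this could plausibly serve as a partitioning key so that spatially close points are routed to the same worker. How GeoFlink actually assigns cells to Flink operators, handles non-point objects that span several cells, and evaluates $k$NN and join queries is not shown in this sketch.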