Academic Paper

Data Imputation for Multivariate Time-series Data
Document Type
Conference
Source
2023 15th International Conference on Knowledge and Systems Engineering (KSE), pp. 1-6, Oct. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Performance evaluation
Wearable Health Monitoring Systems
Machine learning algorithms
Time series analysis
Transportation
Medical services
Machine learning
Multivariate time-series data
Missing values
Imputation model
Wearables
SAITS
Language
ISSN
2694-4804
Abstract
Multivariate time-series data are abundant in many application areas, such as finance, transportation, environment, and healthcare. However, missing data points are a common problem for many reasons, particularly in data collected from wearable devices. Missing values degrade the performance of data analysis and machine learning algorithms. Various statistical and machine-learning methods have been developed to overcome this challenge, primarily by imputation, i.e., filling in the missing values in the data. In this study, we compare widely used classical imputation methods, namely mean imputation, median imputation, Last Observation Carried Forward (LOCF), and K-Nearest Neighbors imputation (KNNI), with recently developed techniques for time-series imputation: Bidirectional Recurrent Imputation for Time Series (BRITS), Transformer, and Self-Attention-based Imputation for Time Series (SAITS). We evaluate these methods on the crowd-sourced Fitbit dataset of activity data collected through wearables. The results suggest that, although KNNI is a classical imputation method, it can be more efficient than some state-of-the-art methods when the missing rate is low to moderate (less than 30%). At higher missing rates (greater than or equal to 30%), SAITS gives the lowest mean absolute error (MAE) with a reasonable execution time.
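The classical baselines named in the abstract, and the MAE computed on artificially masked entries, can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration: the toy matrix, the function names, the back-fill of leading gaps in LOCF, and the distance used by the KNN variant (mean squared difference over jointly observed features) are not taken from the paper, and the numbers say nothing about its results on the Fitbit data.

```python
from statistics import mean, median
from typing import Callable, List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows = time steps, None = missing


def column_fill(data: Matrix, stat: Callable) -> Matrix:
    """Mean/median imputation: fill each column with a statistic of its observed values."""
    fills = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        fills.append(stat(observed) if observed else 0.0)  # 0.0 fallback is an assumption
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in data]


def locf(data: Matrix) -> Matrix:
    """Carry the last observed value forward in each column.

    Leading gaps have nothing to carry forward, so this sketch back-fills
    them with the first observed value (an assumption).
    """
    out = [list(row) for row in data]
    for j in range(len(out[0])):
        last = None
        for row in out:
            if row[j] is None:
                row[j] = last
            else:
                last = row[j]
        first = next((row[j] for row in out if row[j] is not None), 0.0)
        for row in out:
            if row[j] is None:
                row[j] = first
    return out


def knn_impute(data: Matrix, k: int = 2) -> Matrix:
    """Fill each gap with the mean of that feature over the k nearest rows."""
    out = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                # Distance over features observed in both rows only.
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((sum(shared) / len(shared), other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [val for _, val in candidates[:k]]
            if nearest:
                out[i][j] = mean(nearest)
    return column_fill(out, mean)  # fall back to column means for any gap left


def masked_mae(truth: Matrix, imputed: Matrix, mask: List[Tuple[int, int]]) -> float:
    """MAE computed only on the entries that were artificially removed."""
    return sum(abs(truth[i][j] - imputed[i][j]) for i, j in mask) / len(mask)


# Hypothetical toy data: two perfectly correlated features over five time steps.
truth: Matrix = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
mask = [(1, 1), (3, 0)]  # positions hidden from the imputers
observed: Matrix = [[None if (i, j) in mask else v for j, v in enumerate(row)]
                    for i, row in enumerate(truth)]

for name, imputed in [("mean", column_fill(observed, mean)),
                      ("median", column_fill(observed, median)),
                      ("LOCF", locf(observed)),
                      ("KNN", knn_impute(observed, k=2))]:
    print(f"{name}: MAE = {masked_mae(truth, imputed, mask):.3f}")
```

Scoring only the masked positions mirrors the usual evaluation setup for imputation, where known values are hidden at a chosen missing rate and then compared with the imputed ones. The toy features are deliberately correlated, which favors the neighbor-based method; real wearable data would behave differently.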