Academic Paper

A Data Collection Quality Model for Big Data Systems
Document Type
Conference
Source
2023 International Conference on Information Technology (ICIT), pp. 168-172, Aug. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Q-factor
Law
Filtering
Data integrity
Receivers
Medical services
Data models
Big Data
Data Quality
Data Collection
Big Data Quality
Big Data Collection
Language
ISSN
2831-3399
Abstract
Big data applications have gained widespread usage across various fields, including healthcare, business, and education. The effectiveness and accuracy of these applications heavily rely on the availability of a large volume of data. However, the collected and generated data for these applications often suffer from incompleteness, inaccuracy, and lack of structure. Consequently, significant efforts are required to clean and process the vast amount of data collected. In this research, we conduct a comprehensive review of existing data quality models that address data and big data quality in general. Building upon this review, we propose a data collection quality model that incorporates a wide range of quality factors. Our model aims to produce clear and accurate data that can be readily utilized, thereby enhancing the value of data and supporting the performance of big data systems. Additionally, the proposed model contributes to reducing storage space requirements and processing time for data. To validate the effectiveness of the model, a case study is conducted using a predefined dataset. The results indicate that the model significantly streamlines the process of obtaining clean and accurate data. Nonetheless, further investigation is necessary to address additional aspects such as legal and privacy considerations pertaining to the collected data. Overall, this research presents a robust data collection quality model that addresses existing challenges and provides a foundation for improved utilization of big data in various domains.