Academic paper

Reduction of Input Features from Machine Learning Datasets for Water Quality Analysis
Document Type
Conference
Source
2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA), pp. 1-6, Feb. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
General Topics for Engineers
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Support vector machines
Radio frequency
Costs
Water quality
Artificial neural networks
Organizations
Random forests
Dimensionality reduction
machine learning
dataset
data points
water quality prediction
Abstract
Typical water quality testing methods used in water treatment organizations are complex, time-consuming, and expensive, because they require a large number of input features in the datasets. Studies show that machine learning has the potential to help analyze water quality. This study employs machine learning techniques to reduce the number of input features, allowing water to be tested more frequently at a lower cost. First, recursive feature elimination with cross-validation (RFECV), permutation importance (PI), and random forest (RF) techniques are used to identify the most prominent features. Second, an artificial neural network (ANN) and a support vector machine (SVM) are used to verify that the accuracy obtained with the reduced feature set is acceptable. A dataset from Kaggle with nine features and 2011 data points is used in this study. Experimental results show that the dataset with five features produces
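The abstract names permutation importance (PI) as one way to rank features before re-checking accuracy on the reduced set. The sketch below illustrates that two-step idea under stated assumptions. The data are synthetic: nine features and 2011 rows mirror the stated dataset size, but only the first five columns carry signal. A simple nearest-centroid classifier stands in for the study's RF, ANN, and SVM models. The sketch does not reproduce RFECV or the paper's results, and all function names are hypothetical.

```python
import random


def make_data(n_samples, n_features, n_informative, seed=0):
    """Synthetic stand-in for a tabular dataset (assumption): the binary label
    depends only on the first `n_informative` columns; the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        X.append(row)
        y.append(1 if sum(row[:n_informative]) > 0 else 0)
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid classifier: one per-column mean vector per class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, X, features):
    """Assign each row to the closest centroid, using only `features`."""
    def dist(x, c):
        return sum((x[j] - centroids[c][j]) ** 2 for j in features)
    return [min(centroids, key=lambda c: dist(x, c)) for x in X]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(centroids, X, y, features, n_repeats=5, seed=0):
    """Mean accuracy drop when one column is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(centroids, X, features))
    scores = {}
    for j in features:
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(centroids, X_perm, features)))
        scores[j] = sum(drops) / n_repeats
    return scores


def run_demo(k=5):
    # 9 features and 2011 rows mirror the dataset size in the abstract;
    # the values themselves are synthetic.
    X, y = make_data(2011, 9, n_informative=5)
    X_train, y_train = X[:1500], y[:1500]
    X_test, y_test = X[1500:], y[1500:]
    all_features = list(range(9))

    model = fit_centroids(X_train, y_train)
    full_acc = accuracy(y_test, predict(model, X_test, all_features))

    # Step 1: rank features by permutation importance on held-out data, keep top k.
    scores = permutation_importance(model, X_test, y_test, all_features)
    selected = sorted(sorted(scores, key=scores.get, reverse=True)[:k])

    # Step 2: re-evaluate with the reduced feature set. Centroids are per-column
    # means, so restricting `features` equals refitting on those columns only.
    reduced_acc = accuracy(y_test, predict(model, X_test, selected))
    return selected, full_acc, reduced_acc


if __name__ == "__main__":
    selected, full_acc, reduced_acc = run_demo()
    print("selected features:", selected)
    print(f"accuracy, all 9 features: {full_acc:.3f}")
    print(f"accuracy, {len(selected)} features: {reduced_acc:.3f}")
```

Because permutation importance only needs predictions, the same loop can wrap any fitted model. In scikit-learn, `sklearn.inspection.permutation_importance` provides this step, and `sklearn.feature_selection.RFECV` provides the recursive elimination with cross-validation that the study also uses.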