Academic Paper

Data Dimension Reduction makes ML Algorithms efficient
Document Type
Conference
Source
2022 International Conference on Emerging Technologies in Electronics, Computing and Communication (ICETECC), pp. 1-7, Dec. 2022
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Dimensionality reduction
Support vector machines
Representation learning
Supervised learning
Clustering algorithms
Classification algorithms
Decision trees
Dimension Reduction
Supervised Learning
Unsupervised Learning
Principal Component Analysis
autoencoder
clustering
Language
Abstract
Data dimension reduction (DDR) maps data from a high-dimensional space to a low-dimensional one. Various DDR techniques are used for image dimension reduction, such as Random Projections, Principal Component Analysis (PCA), the Variance approach, LSA-Transform, the Combined and Direct approaches, and the New Random Approach. Auto-encoders (AE) are used to learn an end-to-end mapping. In this paper, we demonstrate that DDR as a pre-processing step not only speeds up the algorithms but also improves accuracy in both supervised and unsupervised learning. We first use PCA-based DDR for supervised learning, then explore AE-based DDR for unsupervised learning. For PCA-based DDR, we compare the accuracy and time of supervised learning algorithms before and after applying PCA. Similarly, for AE-based DDR, we compare the accuracy and time of the unsupervised learning algorithm before and after AE representation learning. The supervised learning algorithms are support-vector machines (SVM), Decision Tree with the Gini index, Decision Tree with entropy, and a Stochastic Gradient Descent classifier (SGDC); the unsupervised learning algorithm is K-means clustering. We use two datasets, MNIST and FashionMNIST. Our experiments show a massive improvement in accuracy and a reduction in time after pre-processing in both supervised and unsupervised learning.
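The PCA half of the comparison described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code. It assumes scikit-learn, substitutes the small bundled `load_digits` dataset (64 features) for MNIST so that it runs without a download, and picks `n_components=16` arbitrarily. The helper names `fit_and_score` and `compare_with_and_without_pca` are hypothetical.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def fit_and_score(model, X_train, y_train, X_test, y_test):
    """Train one model; return (test accuracy, training time in seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    return model.score(X_test, y_test), elapsed


def compare_with_and_without_pca(n_components=16, seed=0):
    """Compare each classifier on raw pixels and on PCA-reduced features."""
    # Assumption: load_digits (8x8 images) stands in for MNIST / FashionMNIST.
    X, y = load_digits(return_X_y=True)
    X = X / 16.0  # pixel values in this dataset range from 0 to 16
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # Fit PCA on the training split only, then project both splits.
    pca = PCA(n_components=n_components, random_state=seed).fit(X_train)
    X_train_p, X_test_p = pca.transform(X_train), pca.transform(X_test)

    # The four supervised learners named in the abstract.
    factories = {
        "svm": lambda: SVC(kernel="rbf"),
        "tree_gini": lambda: DecisionTreeClassifier(criterion="gini", random_state=seed),
        "tree_entropy": lambda: DecisionTreeClassifier(criterion="entropy", random_state=seed),
        "sgdc": lambda: SGDClassifier(random_state=seed),
    }

    results = {}
    for name, make in factories.items():
        results[name] = {
            "raw": fit_and_score(make(), X_train, y_train, X_test, y_test),
            "pca": fit_and_score(make(), X_train_p, y_train, X_test_p, y_test),
        }
    return results, X_train.shape[1], X_train_p.shape[1]


if __name__ == "__main__":
    results, d_raw, d_pca = compare_with_and_without_pca()
    print(f"features: {d_raw} -> {d_pca}")
    for name, r in results.items():
        (acc_raw, t_raw), (acc_pca, t_pca) = r["raw"], r["pca"]
        print(f"{name}: raw acc={acc_raw:.3f} t={t_raw:.4f}s | "
              f"pca acc={acc_pca:.3f} t={t_pca:.4f}s")
```

On a dataset this small the timing differences may be negligible or noisy, and accuracy after PCA need not improve. The sketch shows only the shape of the before/after comparison, not the magnitude of the effect reported in the paper.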