학술논문

Parallel out-of-core divide-and-conquer techniques with application to classification trees
Document Type
Conference
Source
Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999 Parallel and distributed processing Parallel Processing, 1999. 13th International and 10th Symposium on Parallel and Distributed Processing, 1999. 1999 IPPS/SPDP. Proceedings. :555-562 1999
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Classification tree analysis
Decision trees
Testing
Clouds
Data mining
Read only memory
Delay
Concatenated codes
Degradation
Language
Abstract
Classification is an important problem in the field of data mining. Construction of good classifiers is computationally intensive and offers plenty of scope for parallelization. Divide-and-conquer paradigm can be used to efficiently construct decision tree classifiers. We discuss in detail various techniques for parallel divide-and-conquer and extend these techniques to handle efficiently disk-resident data. Furthermore, a generic technique for parallel out-of-core divide-and-conquer problems is suggested. We present pCLOUDS, the parallel version of the decision tree classifier algorithm CLOUDS, capable of handling large out-of-core data sets. pCLOUDS exhibits excellent speedup, sizeup and scaleup properties which make it a competitive tool for data mining applications. We evaluate the performance of pCLOUDS for a range of synthetic data sets on the IBM-SP2.