Academic Paper

Statistically Significant Pattern Mining With Ordinal Utility
Document Type
Periodical
Source
IEEE Transactions on Knowledge and Data Engineering (IEEE Trans. Knowl. Data Eng.), 35(9):8770-8783, Sep. 2023
Subject
Computing and Processing
Testing
Data mining
Task analysis
Iterative methods
Medical services
Education
Reliability
High-utility pattern
multiple testing
significant pattern mining
Language
ISSN
1041-4347
1558-2191
2326-3865
Abstract
Statistically significant pattern mining (SSPM), which evaluates each pattern via a hypothesis test, is an essential yet challenging data mining task for knowledge discovery. We introduce a preference relation between patterns and aim to discover the most preferred patterns under the constraint of statistical significance, a setting that existing SSPM problems have never considered. We propose an iterative multiple testing procedure that alternately rejects a hypothesis and safely ignores the hypotheses that are less useful than the rejected one. By filtering out low-utility patterns, the procedure avoids spending the significance budget on rejecting useless (uninteresting) patterns and concentrates that budget on more useful ones, leading to more useful discoveries. We show that the proposed method controls the familywise error rate (FWER) under certain assumptions, which a realistic class of SSPM problems can satisfy. We also show that the proposed method always discovers patterns that are at least as useful as those discovered by Tarone-Bonferroni and Subfamily-wise Multiple Testing (SMT). Finally, we conducted several experiments on both synthetic and real-world data to evaluate the performance of our method. On the real-world datasets, the proposed method discovered many more useful patterns than the existing method in all five tasks.
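The abstract uses Tarone-Bonferroni as a baseline. As a point of reference, the following is a minimal sketch of that baseline only, not the iterative procedure proposed in the paper. It assumes that each hypothesis's minimum attainable p-value (for example, from Fisher's exact test given a pattern's support) has already been computed; the function name `tarone_bonferroni` is hypothetical. Tarone's idea is that hypotheses which could never reach the corrected threshold need not be counted, so the Bonferroni divisor can be much smaller than the total number of patterns.

```python
from collections.abc import Sequence


def tarone_bonferroni(p_values: Sequence[float],
                      min_attainable: Sequence[float],
                      alpha: float = 0.05) -> list[int]:
    """Return indices of hypotheses rejected by Tarone's improved Bonferroni correction.

    p_values[i]       -- observed p-value of hypothesis i
    min_attainable[i] -- smallest p-value hypothesis i could ever attain
                         (assumed to be precomputed by the caller)
    """
    if len(p_values) != len(min_attainable):
        raise ValueError("p_values and min_attainable must have the same length")

    # Find the smallest k >= 1 such that at most k hypotheses are "testable",
    # i.e. have a minimum attainable p-value <= alpha / k.
    # The loop terminates by k = len(p_values) at the latest.
    k = 1
    while sum(1 for psi in min_attainable if psi <= alpha / k) > k:
        k += 1
    threshold = alpha / k

    # Untestable hypotheses (psi > threshold) can never reach the threshold;
    # the psi check is a safeguard against inconsistent inputs.
    return [i for i, (p, psi) in enumerate(zip(p_values, min_attainable))
            if psi <= threshold and p <= threshold]


if __name__ == "__main__":
    # Toy example: only 2 of 4 hypotheses are testable, so the threshold
    # is alpha / 2 = 0.025 instead of plain Bonferroni's alpha / 4 = 0.0125.
    p = [0.001, 0.02, 0.3, 0.5]
    psi = [0.0005, 0.001, 0.2, 0.4]
    print(tarone_bonferroni(p, psi, alpha=0.05))  # → [0, 1]
```

In this toy example, plain Bonferroni over all four hypotheses would reject only index 0. This illustrates why the choice of which hypotheses consume the significance budget matters, which is the budget the proposed method aims to spend on more useful patterns.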