학술논문

Possibilistic Data Cleaning
Document Type
Periodical
Source
IEEE Transactions on Knowledge and Data Engineering IEEE Trans. Knowl. Data Eng. Knowledge and Data Engineering, IEEE Transactions on. 34(12):5939-5950 Dec, 2022
Subject
Computing and Processing
Cleaning
Uncertainty
Maintenance engineering
Possibility theory
Data models
Semantics
Relational databases
Algorithm
constraint
data cleaning
database
fixed-parameter tractable
intractability
possibility theory
vertex cover
Language
ISSN
1041-4347
1558-2191
2326-3865
Abstract
Classical data cleaning performs a minimal set of operations on the data to satisfy the given integrity constraints. Often, this minimization is equivalent to vertex cover, for example when tuples can be removed due to the violation of functional dependencies. Classically, the uncertainty of tuples and constraints is ignored. We propose not to view data as dirty but the uncertainty information about data. Since probabilities are often unavailable and their treatment is limited due to correlations in the data, we investigate a qualitative approach to uncertainty. Tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. Our approach is non-invasive to the data as we lower the possibility degree of tuples as little as possible. The new resulting qualitative version of vertex cover remains NP-hard. We establish an algorithm that is fixed-parameter tractable in the size of the qualitative vertex cover. Experiments with synthetic and real-world data show that our algorithm outperforms the classical algorithm proportionally to the available number of uncertainty degrees. By mining the certainty degrees with which constraints hold, our framework becomes applicable even when uncertainty information is unavailable.