Academic Paper
CaFe DBSCAN: A Density-based Clustering Algorithm for Causal Feature Learning
Document Type
Conference
Source
2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), pp. 1-10, Oct. 2023
Abstract
Causal Feature Learning (CFL) infers macro-level causes (e.g., an aggregation of pixels in a traffic light image) from micro-level data (e.g., the pixels of the image) by clustering the predicted probabilities of effect states (e.g., the state of the traffic light). The current method for CFL uses a two-step procedure: first, a classifier for the effect states is trained; afterwards, the predicted effect-state probabilities are clustered. With CaFe DBSCAN, we present a novel density-based clustering method that conducts CFL directly by estimating conditional probabilities during clustering. To this end, we introduce the notion of clustering regions with similar conditional probabilities of the effect states given their micro-level data points. Our single-step approach has the following benefits: (1) CaFe DBSCAN introduces a comprehensive approach to Causal Feature Learning. Unlike existing methods, CaFe DBSCAN uses a probabilistic framework and does not require separate classification and clustering steps implemented by different algorithms that rely on differing assumptions, parameter settings, and optimization goals. (2) Because no classifier has to be trained and tuned first, the algorithm is more runtime-efficient than the current approach. (3) Due to the properties of density-based clustering algorithms, CaFe DBSCAN is robust against noise and outliers, which leads to purer clusters. (4) Our algorithm automatically infers a reasonable number of clusters, i.e., macro-level causes. We demonstrate the benefits of CaFe DBSCAN on synthetic and real-world data.
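The single-step idea described in the abstract, clustering micro-level points whose estimated conditional effect-state probabilities are similar, can be illustrated with a small DBSCAN-style sketch. This is not the authors' CaFe DBSCAN algorithm. The probability estimator (effect-label frequencies inside an `eps`-ball), the similarity test (total variation distance at most a hypothetical threshold `delta`), the helper names `conditional_dbscan`, `local_effect_distribution` and `total_variation`, and the rule that discards clusters smaller than `min_pts` are all assumptions made for illustration.

```python
import math
from collections import Counter, deque


def local_effect_distribution(X, y, i, eps, labels):
    """Hypothetical estimator: P(effect | region around X[i]) taken as the
    label frequencies of all points within distance eps of X[i]."""
    neighbours = [j for j in range(len(X)) if math.dist(X[i], X[j]) <= eps]
    counts = Counter(y[j] for j in neighbours)
    n = len(neighbours)
    return neighbours, {e: counts[e] / n for e in labels}


def total_variation(p, q):
    """Total variation distance between two distributions over the same labels."""
    return 0.5 * sum(abs(p[e] - q[e]) for e in p)


def conditional_dbscan(X, y, eps, min_pts, delta):
    """Illustrative density-based clustering (assumption, not CaFe DBSCAN):
    a cluster grows only through dense points whose local effect-state
    distributions differ by at most delta. Returns -1 for noise."""
    labels = sorted(set(y))
    n = len(X)
    neigh, dist = [], []
    for i in range(n):
        nb, p = local_effect_distribution(X, y, i, eps, labels)
        neigh.append(nb)
        dist.append(p)

    cluster = [-1] * n  # -1 marks noise / not yet assigned
    cid = 0
    for i in range(n):
        # Start a new cluster only from an unassigned core point.
        if cluster[i] != -1 or len(neigh[i]) < min_pts:
            continue
        cluster[i] = cid
        queue = deque([i])
        while queue:
            k = queue.popleft()
            if len(neigh[k]) < min_pts:
                continue  # border point: joins the cluster but does not expand it
            for j in neigh[k]:
                # Density-reachable AND similar conditional effect probabilities.
                if cluster[j] == -1 and total_variation(dist[k], dist[j]) <= delta:
                    cluster[j] = cid
                    queue.append(j)
        cid += 1

    # Assumed post-processing: clusters smaller than min_pts become noise,
    # remaining cluster ids are renumbered 0, 1, 2, ...
    sizes = Counter(c for c in cluster if c != -1)
    kept = sorted(c for c in sizes if sizes[c] >= min_pts)
    remap = {c: new for new, c in enumerate(kept)}
    return [remap.get(c, -1) for c in cluster]


if __name__ == "__main__":
    # A dense 1-D chain whose effect state switches halfway, plus one outlier.
    X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (0.6,), (0.7,), (5.0,)]
    y = [0, 0, 0, 0, 1, 1, 1, 1, 0]
    print(conditional_dbscan(X, y, eps=0.15, min_pts=2, delta=0.3))
    # prints [0, 0, 0, -1, -1, 1, 1, 1, -1]
```

In this toy example, plain DBSCAN on `X` alone with the same `eps` and `min_pts` would return the whole chain as one cluster. Here, the chain is split where the effect state changes; the two boundary points with mixed neighbourhoods and the isolated point at 5.0 end up as noise. Comparing each newly reached point with the point that reached it mirrors DBSCAN's chained expansion. The trade-off is that gradual drift in the estimated probabilities can still merge regions when `delta` is large.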