학술논문

Shape-Based Measures Improve Scene Categorization
Document Type
Periodical
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence IEEE Trans. Pattern Anal. Mach. Intell. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 46(4):2041-2053 Apr, 2024
Subject
Computing and Processing
Bioengineering
Visualization
Organizations
Convolutional neural networks
Transforms
Feature extraction
Task analysis
Observers
Contour geometry
Gestalt grouping cues
medial axis transform (MAT)
scene categorization
scene perception
shape based measures
Language
ISSN
0162-8828
2160-9292
1939-3539
Abstract
Converging evidence indicates that deep neural network models that are trained on large datasets are biased toward color and texture information. Humans, on the other hand, can easily recognize objects and scenes from images as well as from bounding contours. Mid-level vision is characterized by the recombination and organization of simple primary features into more complex ones by a set of so-called Gestalt grouping rules. While described qualitatively in the human literature, a computational implementation of these perceptual grouping rules is so far missing. In this article, we contribute a novel set of algorithms for the detection of contour-based cues in complex scenes. We use the medial axis transform (MAT) to locally score contours according to these grouping rules. We demonstrate the benefit of these cues for scene categorization in two ways: (i) Both human observers and CNN models categorize scenes most accurately when perceptual grouping information is emphasized. (ii) Weighting the contours with these measures boosts performance of a CNN model significantly compared to the use of unweighted contours. Our work suggests that, even though these measures are computed directly from contours in the image, current CNN models do not appear to extract or utilize these grouping cues.