Journal Article

Probabilistic Topic Model for Context-Driven Visual Attention Understanding
Document Type
Periodical
Source
IEEE Transactions on Circuits and Systems for Video Technology, 30(6):1653-1667, Jun. 2020
Subject
Components, Circuits, Devices and Systems
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Visualization
Task analysis
Adaptation models
Feature extraction
Computational modeling
Probabilistic logic
Context modeling
Top-down visual attention
hierarchical probabilistic framework
context-aware model
latent topic models
Language
English
ISSN
1051-8215
1558-2205
Abstract
Modern computer vision techniques have to deal with vast amounts of visual data, implying a computational effort that often has to be undertaken in broad and challenging scenarios. The interest in efficiently solving these image and video applications has led researchers to develop methods that expertly drive the corresponding processing toward conspicuous regions, which either depend on the context or are based on specific requirements. In this paper, we propose a general hierarchical probabilistic framework, independent of the application scenario and grounded in prominent psychological studies on attention and eye movements, which support the idea that guidance is not based directly on the information provided by early visual processes but on a contextual representation that arises from them. The approach defines the task of context-driven visual attention as a mixture of latent sub-tasks, which are in turn modeled as combinations of specific distributions associated with low-, mid-, and high-level spatio-temporal features. By learning from fixations gathered from human observers, we incorporate an intermediate level between feature extraction and visual attention estimation that makes it possible to obtain comprehensive guiding representations. The experiments show how our proposal successfully learns hierarchical explanations of visual attention adapted to diverse video genres, outperforming several leading models in the literature.
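The abstract describes attention as a mixture of latent sub-tasks, each combining distributions over low-, mid-, and high-level feature maps. The following is a minimal NumPy sketch of that mixture structure only, not the paper's actual model: the feature maps, the number of topics `K`, and the Dirichlet-sampled topic parameters are all illustrative stand-ins for quantities the paper learns from human fixations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-pixel feature maps (F maps of size H x W), standing in for the
# low-, mid-, and high-level spatio-temporal features mentioned above.
H, W, F = 4, 4, 3
features = rng.random((F, H, W))

# Hypothetical learned parameters: K latent sub-tasks ("topics"), each with
# its own mixture over features, plus a prior over the sub-tasks themselves.
K = 2
topic_feature_weights = rng.dirichlet(np.ones(F), size=K)  # shape (K, F)
topic_priors = rng.dirichlet(np.ones(K))                   # shape (K,)

# Saliency as a mixture over latent sub-tasks: each topic combines the
# feature maps with its own weights, then topics are mixed by their priors.
per_topic = np.tensordot(topic_feature_weights, features, axes=([1], [0]))  # (K, H, W)
saliency = np.tensordot(topic_priors, per_topic, axes=([0], [0]))           # (H, W)

# Normalize into a probability map over pixel locations.
saliency /= saliency.sum()
```

In the actual framework, the topic parameters would be fit from observer fixation data rather than drawn at random; the sketch only shows how a per-pixel attention map falls out of the two-level mixture.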