Academic Paper

Generative Augmentation-Driven Prediction of Diverse Visual Scanpaths in Images
Document Type
Periodical
Author
Source
IEEE Transactions on Artificial Intelligence (IEEE Trans. Artif. Intell.), 5(2):940-955, Feb. 2024
Subject
Computing and Processing
Visualization
Hidden Markov models
Predictive models
Computational modeling
Training
Task analysis
Deep learning
Diverse visual scanpath prediction
generative data augmentation
long short-term memory (LSTM)-based prediction
Language
ISSN
2691-4581
Abstract
Visual scanpaths of multiple humans on an image represent the process by which they capture the information in it. State-of-the-art models for predicting visual scanpaths on images learn directly from recorded human visual scanpaths. However, generating multiple visual scanpaths for an image that are as diverse as human visual scanpaths has not been explicitly considered. In this article, we propose a deep network for predicting multiple diverse visual scanpaths on an image. First, image-specific hidden Markov model (HMM)-based generative data augmentation is performed to increase the number of image-visual scanpath training pairs. Given the similarity between this generative data augmentation process and the use of long short-term memory (LSTM) networks for prediction, we propose an LSTM-based visual scanpath predictor. A network that predicts a single visual scanpath on an image is designed first. This network is then modified to predict multiple diverse visual scanpaths representing different viewer varieties by using a parameter that indicates the uniqueness of a viewer. A random vector is also employed to introduce subtle variations among scanpaths of the same viewer variety. Our models are evaluated on three standard datasets using multiple performance measures, and the results demonstrate the superiority of the proposed approach over the state of the art. Empirical studies also indicate the significance of our generative data augmentation method and of our multiple scanpath prediction strategy for producing diverse visual scanpaths.
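The abstract does not spell out how the HMM-based augmentation works. As a rough illustration only, the Python sketch below shows how synthetic image-scanpath pairs could be drawn from an image-specific HMM whose hidden states stand for regions of interest and whose emissions are 2-D Gaussian fixation coordinates. The class `ImageHMM`, the function `augment`, the Gaussian emission model, the fixed scanpath length, and all parameter values are hypothetical assumptions, not the authors' implementation. Fitting the HMM to an image's recorded scanpaths (for example with Baum-Welch) is omitted; the sketch assumes the parameters are already estimated.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Fixation = Tuple[float, float]  # (x, y) in normalized image coordinates [0, 1]


@dataclass
class ImageHMM:
    """Hypothetical per-image HMM (assumption, not the paper's exact model):
    hidden states are regions of interest, each emitting fixations from an
    axis-aligned 2-D Gaussian."""
    start_probs: Sequence[float]            # pi[k] = P(first state is k)
    trans_probs: Sequence[Sequence[float]]  # A[j][k] = P(next state k | state j)
    means: Sequence[Fixation]               # per-state (mu_x, mu_y)
    stds: Sequence[Fixation]                # per-state (sigma_x, sigma_y)

    def sample_scanpath(self, length: int, rng: random.Random) -> List[Fixation]:
        """Sample one synthetic scanpath of `length` fixations."""
        states = range(len(self.start_probs))
        state = rng.choices(states, weights=self.start_probs)[0]
        path: List[Fixation] = []
        for _ in range(length):
            mx, my = self.means[state]
            sx, sy = self.stds[state]
            # Draw a fixation from the current state's Gaussian and clip it
            # so it stays inside the image.
            x = min(max(rng.gauss(mx, sx), 0.0), 1.0)
            y = min(max(rng.gauss(my, sy), 0.0), 1.0)
            path.append((x, y))
            # Move to the next region of interest via the transition matrix.
            state = rng.choices(states, weights=self.trans_probs[state])[0]
        return path


def augment(image_id: str, hmm: ImageHMM, n_samples: int,
            length: int, seed: int = 0) -> List[Tuple[str, List[Fixation]]]:
    """Return `n_samples` synthetic (image_id, scanpath) training pairs."""
    rng = random.Random(seed)
    return [(image_id, hmm.sample_scanpath(length, rng)) for _ in range(n_samples)]


if __name__ == "__main__":
    # Illustrative two-region model; the numbers are made up.
    hmm = ImageHMM(
        start_probs=[0.7, 0.3],
        trans_probs=[[0.6, 0.4], [0.5, 0.5]],
        means=[(0.3, 0.4), (0.7, 0.6)],
        stds=[(0.05, 0.05), (0.08, 0.08)],
    )
    for img, path in augment("img_001", hmm, n_samples=3, length=5, seed=1):
        print(img, [(round(x, 2), round(y, 2)) for x, y in path])
```

Sampling from a generative model of each image's recorded scanpaths, rather than perturbing individual scanpaths, is one plausible way to obtain extra training pairs that remain consistent with how viewers moved over that particular image. The sequential, state-to-state structure of such samples is also the kind of dependency an LSTM predictor is suited to learn.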