Academic Paper

Multimodal Attention Branch Networkに基づく把持命令文の生成 / Sentence Generation for Fetching Instruction based on Multimodal Attention Branch Network
Document Type
Journal Article
Source
Proceedings of the Annual Conference of JSAI. 2020, :1
Subject
Domestic service robot
Multimodal language generation
Language
Japanese
Abstract
Domestic service robots (DSRs) are a promising solution to the shortage of home care workers. Nonetheless, one of the main limitations of DSRs is their inability to interact naturally through language. Recently, data-driven approaches have been shown to be effective for tackling this limitation; however, they often require large-scale datasets, which are costly to build. Against this background, we aim to perform automatic sentence generation for fetching instructions, e.g., "Bring me a green tea bottle on the table." This is particularly challenging because appropriate expressions depend on the target object as well as its surroundings. In this paper, we propose a method that generates sentences from visual inputs. Unlike other approaches, the proposed method has multimodal attention branches that utilize subword-level attention and generate sentences based on subword embeddings. In the experiments, we compared the proposed method with a baseline method using four standard image captioning metrics. The results show that the proposed method outperformed the baseline on these metrics.
