Journal Article

EgoCom: A Multi-Person Multi-Modal Egocentric Communications Dataset
Document Type
Periodical
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):6783-6793, Jun. 2023
Subject
Computing and Processing
Bioengineering
Task analysis
Artificial intelligence
Visualization
Synchronization
Natural languages
Computer vision
Education
Egocentric
multi-modal data
EgoCom
communication
turn-taking
human-centric
embodied intelligence
Language
English
ISSN
Print ISSN: 0162-8828
CD ISSN: 2160-9292
Electronic ISSN: 1939-3539
Abstract
Multi-modal datasets in artificial intelligence (AI) often capture a third-person perspective, but our embodied human intelligence evolved with sensory input from the egocentric, first-person perspective. Towards embodied AI, we introduce the Egocentric Communications (EgoCom) dataset to advance the state-of-the-art in conversational AI, natural language, audio speech analysis, computer vision, and machine learning. EgoCom is a first-of-its-kind natural-conversation dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives. EgoCom includes 38.5 hours of synchronized embodied stereo audio and egocentric video, with 240,000 ground-truth, time-stamped word-level transcriptions and speaker labels from 34 diverse speakers. We study baseline performance on two novel applications that benefit from embodied data: (1) predicting turn-taking in conversations and (2) multi-speaker transcription. For (1), we investigate Bayesian baselines that predict turn-taking to within 5 percent of human performance. For (2), we use simultaneous egocentric capture to combine Google speech-to-text outputs, improving global transcription by 79 percent relative to a single perspective. Both applications exploit EgoCom's synchronous multi-perspective data to augment the performance of embodied AI tasks.
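
The abstract's second application combines speech-to-text output captured simultaneously from several egocentric perspectives. Below is a minimal illustrative sketch of one way such merging could work: keeping, per time bin, the word hypothesis with the highest recognizer confidence across perspectives. The data structure, field names, bin size, and confidence-based selection are assumptions for illustration only, not the paper's actual method.

# Illustrative sketch (assumptions noted above, not the paper's method):
# merge synchronized word-level ASR hypotheses from multiple egocentric
# perspectives by keeping the highest-confidence word per time bin.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WordHypothesis:
    word: str          # recognized word
    start: float       # start time in seconds on a shared clock
    confidence: float  # recognizer confidence in [0, 1]


def merge_perspectives(
    hypotheses_per_view: List[List[WordHypothesis]],
    bin_size: float = 0.5,
) -> List[str]:
    """Keep, for each time bin, the single word hypothesis with the
    highest confidence across all synchronized egocentric recordings."""
    best: Dict[int, WordHypothesis] = {}
    for view in hypotheses_per_view:
        for hyp in view:
            key = int(hyp.start // bin_size)
            if key not in best or hyp.confidence > best[key].confidence:
                best[key] = hyp
    return [best[k].word for k in sorted(best)]


if __name__ == "__main__":
    view_a = [WordHypothesis("hello", 0.1, 0.9), WordHypothesis("word", 0.7, 0.4)]
    view_b = [WordHypothesis("hallow", 0.1, 0.5), WordHypothesis("world", 0.7, 0.8)]
    print(merge_perspectives([view_a, view_b]))  # ['hello', 'world']

The sketch highlights why simultaneous multi-perspective capture helps: each wearer's recording tends to be most reliable for nearby speakers, so selecting across perspectives can recover words that any single perspective misrecognizes.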