Academic Paper

VLAAD: Vision and Language Assistant for Autonomous Driving
Document Type
Conference
Source
2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), pp. 980-987, Jan. 2024
Subject
Bioengineering
Computing and Processing
Engineering Profession
Visualization
Refining
Natural languages
Decision making
Oral communication
Data models
Task analysis
Language
ISSN
2690-621X
Abstract
While interpretable decision-making is pivotal in autonomous driving, research integrating natural language models remains relatively untapped. To address this, we introduce a multi-modal instruction tuning dataset that facilitates language models in learning visual instructions across diverse driving scenarios. The dataset encompasses three primary tasks: conversation, detailed description, and complex reasoning. Capitalizing on this dataset, we present VLAAD, a multi-modal LLM driving assistant. After fine-tuning on our instruction-following dataset, VLAAD demonstrates proficient interpretive capabilities across a spectrum of driving situations. We release our work, dataset, and model to the public on GitHub: https://github.com/sungyeonparkk/vision-assistant-for-driving
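The record does not specify the dataset's schema. Purely as a rough illustration of what one instruction-following sample spanning the three task types might look like, the sketch below shows a hypothetical entry; every field name and value here is an assumption, not the actual VLAAD format.

```python
# Hypothetical illustration of a multi-modal instruction-tuning sample.
# Field names ("task", "video", "instruction", "response") and values are
# assumptions for illustration only, not the VLAAD dataset's real schema.
import json

sample = {
    # one of: "conversation", "detailed_description", "complex_reasoning"
    "task": "complex_reasoning",
    # driving-scene clip the instruction refers to (hypothetical path)
    "video": "clips/front_cam_0042.mp4",
    "instruction": "The ego vehicle is approaching a crosswalk. "
                   "What should it do and why?",
    "response": "A pedestrian is waiting at the crosswalk, so the vehicle "
                "should slow down and yield until the crosswalk is clear.",
}

print(json.dumps(sample, indent=2))
```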