Academic Paper

VLAAD: Vision and Language Assistant for Autonomous Driving
Document Type
Conference
Source
2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), pp. 980-987, Jan. 2024
Subject
Bioengineering
Computing and Processing
Engineering Profession
Visualization
Refining
Natural languages
Decision making
Oral communication
Data models
Task analysis
Language
ISSN
2690-621X
Abstract
While interpretable decision-making is pivotal in autonomous driving, research integrating natural language models remains relatively untapped. To address this, we introduce a multi-modal instruction tuning dataset that facilitates language models in learning visual instructions across diverse driving scenarios. The dataset encompasses three primary tasks: conversation, detailed description, and complex reasoning. Capitalizing on this dataset, we present VLAAD, a multi-modal LLM driving assistant. After fine-tuning on our instruction-following dataset, VLAAD demonstrates proficient interpretive capabilities across a spectrum of driving situations. We release our work, dataset, and model to the public on GitHub: https://github.com/sungyeonparkk/vision-assistant-for-driving
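The record does not specify the dataset's schema. Purely as a rough illustration of what one instruction-following sample spanning the three task types might look like, the sketch below shows a hypothetical entry; every field name and value here is an assumption, not the actual VLAAD format.

```python
# Hypothetical illustration of a multi-modal instruction-tuning sample.
# Field names ("task", "video", "instruction", "response") and values are
# assumptions for illustration only, not the VLAAD dataset's real schema.
import json

sample = {
    # one of: "conversation", "detailed_description", "complex_reasoning"
    "task": "complex_reasoning",
    # driving-scene clip the instruction refers to (hypothetical path)
    "video": "clips/front_cam_0042.mp4",
    "instruction": "The ego vehicle is approaching a crosswalk. "
                   "What should it do and why?",
    "response": "A pedestrian is waiting at the crosswalk, so the vehicle "
                "should slow down and yield until the crosswalk is clear.",
}

print(json.dumps(sample, indent=2))
```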