Holdings
LDR | | | 04429nam 2200445 4500
001 | | | 0100870660▲
005 | | | 20250522173957▲
006 | | | m o d ▲
007 | | | cr#unu||||||||▲
008 | | | 250123s2024 us ||||||||||||||c||eng d▲
020 | | | ▼a9798382790398▲
035 | | | ▼a(MiAaPQ)AAI31328372▲
040 | | | ▼aMiAaPQ▼cMiAaPQ▼d221016▲
082 | 0 | | ▼a621.3▲
100 | 1 | | ▼aWu, Te-Lin.▲
245 | 1 | 0 | ▼aGrounded-Knowledge-Enhanced Instruction Understanding for Multimodal Assistant Applications▼h[electronic resource].▲
260 | | | ▼a[S.l.] : ▼bUniversity of California, Los Angeles, ▼c2024▲
260 | 1 | | ▼aAnn Arbor : ▼bProQuest Dissertations & Theses, ▼c2024▲
300 | | | ▼a1 online resource (150 p.)▲
500 | | | ▼aSource: Dissertations Abstracts International, Volume: 85-12, Section: B.▲
500 | | | ▼aAdvisor: Peng, Nanyun.▲
502 | 1 | | ▼aThesis (Ph.D.)--University of California, Los Angeles, 2024.▲
520 | | | ▼aWith recent advancements in artificial intelligence (AI), researchers are working toward building an AI that can understand humans, collaborate with them, and help or guide them in accomplishing everyday chores. Realizing such an assistant AI poses several challenges, including planning (over certain events), comprehending human instructions, multimodal understanding, and grounded conversational ability. Imagine a scenario in which one wishes to perform a task, such as "making a plate of fried rice" or "purchasing a suitable sofa bed", which can require multiple steps of action and the manipulation of certain objects. How would an assistant AI collaborate with humans to accomplish such tasks? One crucial aspect of the system is understanding how and when to take a certain action, which is often learned by interpreting and following guidance: a resource that encompasses knowledge about accomplishing the task and, potentially, the events that will occur during task completion. The guidance can come from human verbal interactions (e.g., in the form of a conversation or a question) or from static written instructional manuals. In the first part of this thesis, I will decompose the proposed system framework into three foundational components: (1) task-step sequencing/planning, where the AI needs to understand the appropriate sequential procedure for performing each sub-task to accomplish the whole task, especially when the task knowledge is learned from online instructional resources that can be numerous and do not always come consolidated in proper order; (2) action-dependency understanding, where an agent should be able to infer the dependencies of performing an action and the outcomes of executing it, in order to assess the situation and adjust its plan for accomplishing the task; and (3) multimodal grounding and active perception, where we equip the AI with the ability to actively ground the visually perceived surroundings to textual instructions (or verbal interactions) and to reason over multimodal information throughout task completion. In the second part of this thesis, I will introduce two newly curated resources that anticipate the next-phase challenges of building a strong and helpful assistive AI. One resource focuses on counterfactual reasoning, a type of reasoning humans frequently rely on in complex decision-making; the other presents a comprehensive suite of multimodal capabilities an assistive AI needs in order to function in a virtually created world. Combining the two parts, the foundational components and the newly established benchmarks, this thesis aims to provide a comprehensive research roadmap for next-generation (multimodal) AI assistants.▲
590 | | | ▼aSchool code: 0031.▲
650 | | 4 | ▼aComputer engineering.▲
650 | | 4 | ▼aComputer science.▲
653 | | | ▼aComputer vision▲
653 | | | ▼aFoundation models▲
653 | | | ▼aMultimodal grounding▲
653 | | | ▼aNatural language processing▲
690 | | | ▼a0800▲
690 | | | ▼a0984▲
690 | | | ▼a0464▲
710 | 2 | 0 | ▼aUniversity of California, Los Angeles.▼bComputer Science 0201.▲
773 | 0 | | ▼tDissertations Abstracts International▼g85-12B.▲
790 | | | ▼a0031▲
791 | | | ▼aPh.D.▲
792 | | | ▼a2024▲
793 | | | ▼aEnglish▲
856 | 4 | 0 | ▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17162254▼nKERIS▼zThe full text of this resource is provided by KERIS (Korea Education and Research Information Service).▲
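In the record above, ▼ introduces a subfield code and ▲ terminates each field, following the KORMARC-style display convention. As a rough illustration only (a minimal sketch, not the catalog system's actual software; the helper name is hypothetical), the following Python snippet splits one such field into subfield code/value pairs:

    # Hypothetical helper, for illustration: split a MARC/KORMARC display field
    # such as "▼aAnn Arbor : ▼bProQuest Dissertations & Theses, ▼c2024▲"
    # into (subfield code, value) pairs. Control fields (001, 005, 008, ...)
    # carry no subfields and would be read as plain strings instead.
    def parse_subfields(field: str) -> list[tuple[str, str]]:
        data = field.rstrip("▲")                        # drop the field terminator
        chunks = [c for c in data.split("▼") if c]      # split on the subfield delimiter
        return [(c[0], c[1:].strip()) for c in chunks]  # first character is the code

    print(parse_subfields("▼aAnn Arbor : ▼bProQuest Dissertations & Theses, ▼c2024▲"))
    # -> [('a', 'Ann Arbor :'), ('b', 'ProQuest Dissertations & Theses,'), ('c', '2024')]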

Grounded-Knowledge-Enhanced Instruction Understanding for Multimodal Assistant Applications [electronic resource]
Material Type
Foreign eBook
Title / Statement of Responsibility
Grounded-Knowledge-Enhanced Instruction Understanding for Multimodal Assistant Applications [electronic resource].
Personal Author
Wu, Te-Lin.
Publication
[S.l.] : University of California, Los Angeles, 2024; Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description
1 online resource (150 p.)
General Notes
Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
Advisor: Peng, Nanyun.
Dissertation Note
Thesis (Ph.D.)--University of California, Los Angeles, 2024.
Summary
With recent advancements in artificial intelligence (AI), researchers are working toward building an AI that can understand humans, collaborate with them, and help or guide them in accomplishing everyday chores. Realizing such an assistant AI poses several challenges, including planning (over certain events), comprehending human instructions, multimodal understanding, and grounded conversational ability. Imagine a scenario in which one wishes to perform a task, such as "making a plate of fried rice" or "purchasing a suitable sofa bed", which can require multiple steps of action and the manipulation of certain objects. How would an assistant AI collaborate with humans to accomplish such tasks? One crucial aspect of the system is understanding how and when to take a certain action, which is often learned by interpreting and following guidance: a resource that encompasses knowledge about accomplishing the task and, potentially, the events that will occur during task completion. The guidance can come from human verbal interactions (e.g., in the form of a conversation or a question) or from static written instructional manuals. In the first part of this thesis, I will decompose the proposed system framework into three foundational components: (1) task-step sequencing/planning, where the AI needs to understand the appropriate sequential procedure for performing each sub-task to accomplish the whole task, especially when the task knowledge is learned from online instructional resources that can be numerous and do not always come consolidated in proper order; (2) action-dependency understanding, where an agent should be able to infer the dependencies of performing an action and the outcomes of executing it, in order to assess the situation and adjust its plan for accomplishing the task; and (3) multimodal grounding and active perception, where we equip the AI with the ability to actively ground the visually perceived surroundings to textual instructions (or verbal interactions) and to reason over multimodal information throughout task completion. In the second part of this thesis, I will introduce two newly curated resources that anticipate the next-phase challenges of building a strong and helpful assistive AI. One resource focuses on counterfactual reasoning, a type of reasoning humans frequently rely on in complex decision-making; the other presents a comprehensive suite of multimodal capabilities an assistive AI needs in order to function in a virtually created world. Combining the two parts, the foundational components and the newly established benchmarks, this thesis aims to provide a comprehensive research roadmap for next-generation (multimodal) AI assistants.
Subject
Computer engineering; Computer science; Computer vision; Foundation models; Multimodal grounding; Natural language processing
ISBN
9798382790398