Holdings
LDR | | | 04429nam 2200445 4500
001 | | | 0100870660▲
005 | | | 20250522173957▲
006 | | | m o d ▲
007 | | | cr#unu||||||||▲
008 | | | 250123s2024 us ||||||||||||||c||eng d▲
020 | | | ▼a9798382790398▲
035 | | | ▼a(MiAaPQ)AAI31328372▲
040 | | | ▼aMiAaPQ▼cMiAaPQ▼d221016▲
082 | 0 | | ▼a621.3▲
100 | 1 | | ▼aWu, Te-Lin.▲
245 | 1 | 0 | ▼aGrounded-Knowledge-Enhanced Instruction Understanding for Multimodal Assistant Applications▼h[electronic resource].▲
260 | | | ▼a[S.l.] : ▼bUniversity of California, Los Angeles, ▼c2024▲
260 | 1 | | ▼aAnn Arbor : ▼bProQuest Dissertations & Theses, ▼c2024▲
300 | | | ▼a1 online resource (150 p.)▲
500 | | | ▼aSource: Dissertations Abstracts International, Volume: 85-12, Section: B.▲
500 | | | ▼aAdvisor: Peng, Nanyun.▲
502 | 1 | | ▼aThesis (Ph.D.)--University of California, Los Angeles, 2024.▲
520 | | | ▼aWith recent advancements in artificial intelligence (AI), researchers are working toward building an AI that can understand humans, collaborate with them, and help or guide them in accomplishing everyday chores. Realizing such an assistant AI poses several challenges, including planning (over certain events), comprehending human instructions, multimodal understanding, and grounded conversational ability. Imagine a scenario in which one wishes to perform a task, such as "making a plate of fried rice" or "purchasing a suitable sofa bed", which can require multiple steps of action and the manipulation of certain objects. How would an assistant AI collaborate with humans to accomplish such tasks? One crucial aspect of the system is understanding how and when to take a certain action, which is often learned by interpreting and following guidance: a resource that encompasses knowledge about accomplishing the task and, potentially, the events that will occur during task completion. The guidance can come from human verbal interactions (e.g., in the form of a conversation or a question) or from static written instructional manuals. In the first part of this thesis, I will decompose the proposed system framework into three foundational components: (1) task-step sequencing/planning, where the AI needs to understand the appropriate sequential procedure for performing each sub-task to accomplish the whole task, especially when the task knowledge is learned from online instructional resources that can be numerous and do not always come consolidated in proper order; (2) action-dependency understanding, where an agent should be able to infer the dependencies of performing an action and the outcomes of executing it, in order to assess the situation and adjust its plan for accomplishing the task; and (3) multimodal grounding and active perception, where we equip the AI with the ability to actively ground the visually perceived surroundings to textual instructions (or verbal interactions) and to reason over multimodal information throughout task completion. In the second part of this thesis, I will introduce two newly curated resources that anticipate the next-phase challenges of building a strong and helpful assistive AI. One resource focuses on counterfactual reasoning, a type of reasoning humans frequently rely on in complex decision-making; the other presents a comprehensive suite of multimodal capabilities an assistive AI needs in order to function in a virtually created world. Combining the two parts, the foundational components and the newly established benchmarks, this thesis aims to provide a comprehensive research roadmap for next-generation (multimodal) AI assistants.▲
590 | | | ▼aSchool code: 0031.▲
650 | | 4 | ▼aComputer engineering.▲
650 | | 4 | ▼aComputer science.▲
653 | | | ▼aComputer vision▲
653 | | | ▼aFoundation models▲
653 | | | ▼aMultimodal grounding▲
653 | | | ▼aNatural language processing▲
690 | | | ▼a0800▲
690 | | | ▼a0984▲
690 | | | ▼a0464▲
710 | 2 | 0 | ▼aUniversity of California, Los Angeles.▼bComputer Science 0201.▲
773 | 0 | | ▼tDissertations Abstracts International▼g85-12B.▲
790 | | | ▼a0031▲
791 | | | ▼aPh.D.▲
792 | | | ▼a2024▲
793 | | | ▼aEnglish▲
856 | 4 | 0 | ▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17162254▼nKERIS▼zThe full text of this resource is provided by KERIS (Korea Education and Research Information Service).▲
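In the record above, ▼ introduces a subfield code and ▲ terminates each field, following the KORMARC-style display convention. As a rough illustration only (a minimal sketch, not the catalog system's actual software; the helper name is hypothetical), the following Python snippet splits one such field into subfield code/value pairs:

    # Hypothetical helper, for illustration: split a MARC/KORMARC display field
    # such as "▼aAnn Arbor : ▼bProQuest Dissertations & Theses, ▼c2024▲"
    # into (subfield code, value) pairs. Control fields (001, 005, 008, ...)
    # carry no subfields and would be read as plain strings instead.
    def parse_subfields(field: str) -> list[tuple[str, str]]:
        data = field.rstrip("▲")                        # drop the field terminator
        chunks = [c for c in data.split("▼") if c]      # split on the subfield delimiter
        return [(c[0], c[1:].strip()) for c in chunks]  # first character is the code

    print(parse_subfields("▼aAnn Arbor : ▼bProQuest Dissertations & Theses, ▼c2024▲"))
    # -> [('a', 'Ann Arbor :'), ('b', 'ProQuest Dissertations & Theses,'), ('c', '2024')]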

Grounded-Knowledge-Enhanced Instruction Understanding for Multimodal Assistant Applications [electronic resource]
Material Type
Foreign eBook
Title / Statement of Responsibility
Grounded-Knowledge-Enhanced Instruction Understanding for Multimodal Assistant Applications [electronic resource].
Personal Author
Wu, Te-Lin.
Publication
[S.l.] : University of California, Los Angeles, 2024; Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description
1 online resource (150 p.)
General Notes
Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
Advisor: Peng, Nanyun.
Dissertation Note
Thesis (Ph.D.)--University of California, Los Angeles, 2024.
Summary
With recent advancements in artificial intelligence (AI), researchers are working toward building an AI that can understand humans, collaborate with them, and help or guide them in accomplishing everyday chores. Realizing such an assistant AI poses several challenges, including planning (over certain events), comprehending human instructions, multimodal understanding, and grounded conversational ability. Imagine a scenario in which one wishes to perform a task, such as "making a plate of fried rice" or "purchasing a suitable sofa bed", which can require multiple steps of action and the manipulation of certain objects. How would an assistant AI collaborate with humans to accomplish such tasks? One crucial aspect of the system is understanding how and when to take a certain action, which is often learned by interpreting and following guidance: a resource that encompasses knowledge about accomplishing the task and, potentially, the events that will occur during task completion. The guidance can come from human verbal interactions (e.g., in the form of a conversation or a question) or from static written instructional manuals. In the first part of this thesis, I will decompose the proposed system framework into three foundational components: (1) task-step sequencing/planning, where the AI needs to understand the appropriate sequential procedure for performing each sub-task to accomplish the whole task, especially when the task knowledge is learned from online instructional resources that can be numerous and do not always come consolidated in proper order; (2) action-dependency understanding, where an agent should be able to infer the dependencies of performing an action and the outcomes of executing it, in order to assess the situation and adjust its plan for accomplishing the task; and (3) multimodal grounding and active perception, where we equip the AI with the ability to actively ground the visually perceived surroundings to textual instructions (or verbal interactions) and to reason over multimodal information throughout task completion. In the second part of this thesis, I will introduce two newly curated resources that anticipate the next-phase challenges of building a strong and helpful assistive AI. One resource focuses on counterfactual reasoning, a type of reasoning humans frequently rely on in complex decision-making; the other presents a comprehensive suite of multimodal capabilities an assistive AI needs in order to function in a virtually created world. Combining the two parts, the foundational components and the newly established benchmarks, this thesis aims to provide a comprehensive research roadmap for next-generation (multimodal) AI assistants.
Subject
Computer engineering; Computer science; Computer vision; Foundation models; Multimodal grounding; Natural language processing
ISBN
9798382790398