Holdings
| Tag | Ind | Content |
| --- | --- | --- |
| LDR | | 04001nam 2200481 4500 |
| 001 | | 0100870780▲ |
| 005 | | 20250523093927▲ |
| 006 | | m o d ▲ |
| 007 | | cr#unu\|\|\|\|\|\|\|\|▲ |
| 008 | | 250123s2024 us \|\|\|\|\|\|\|\|\|\|\|\|\|\|c\|\|eng d▲ |
| 020 | | ▼a9798384447214▲ |
| 035 | | ▼a(MiAaPQ)AAI31295008▲ |
| 040 | | ▼aMiAaPQ▼cMiAaPQ▼d221016▲ |
| 082 | 0 | ▼a004▲ |
| 100 | 1 | ▼aEpstein, Dave.▲ |
| 245 | 1 0 | ▼aDisentangled Visual Generative Models▼h[electronic resource].▲ |
| 260 | | ▼a[S.l.] : ▼bUniversity of California, Berkeley, ▼c2024▲ |
| 260 | 1 | ▼aAnn Arbor : ▼bProQuest Dissertations & Theses, ▼c2024▲ |
| 300 | | ▼a1 online resource (116 p.)▲ |
| 500 | | ▼aSource: Dissertations Abstracts International, Volume: 86-04, Section: B.▲ |
| 500 | | ▼aAdvisor: Efros, Alexei A.▲ |
| 502 | 1 | ▼aThesis (Ph.D.)--University of California, Berkeley, 2024.▲ |
| 520 | | ▼aGenerative modeling promises an elegant solution to learning about high-dimensional data distributions such as images and videos - but how can we expose and utilize the rich structure these models discover? Rather than just drawing new samples, how can an agent actually harness p(x) as a source of knowledge about how our world works? This thesis explores scalable inductive biases that unlock a generative model's disentangled understanding of visual data, enabling much richer interaction and control as a result. First, I propose a representation of scenes as collections of feature "blobs", where a generative adversarial network (GAN) learns - without any labels - to bind each blob to a different object in the images it creates. This allows GANs to more gracefully model compositional scenes, in contrast to typical unconditional models, which are constrained to highly aligned single-object data. The trained model's representation can easily be modified to counterfactually manipulate objects in both generated and real images. Next, I consider methods that do not impose bottlenecks on architectures during training, facilitating their application to more diverse, uncurated data. I show that the internals of diffusion models can be used to meaningfully guide the generation of new samples, without any further fine-tuning or supervision. Energy functions derived from a small set of primitive properties of denoiser activations can be combined to impose arbitrarily complex conditions on the iterative diffusion sampling procedure. This allows for control over attributes such as the position, shape, size, and appearance of any concept that can be described in text. I also demonstrate that the distribution learned by a text-to-image model can be distilled to generate compositional 3D scenes. Predominant approaches focus on creating 3D objects in isolation rather than scenes with several entities interacting. I propose an architecture that, when optimized so its outputs are on-manifold for the image generator, creates 3D scenes decomposed into the objects they contain. This provides evidence that scale alone suffices for a model to infer the actual 3D structure latent in a world it observes only through 2D images. Finally, I conclude with a perspective on the interplay between emergence, control, interpretability, and scale, and humbly attempt to relate these themes to the pursuit of intelligence.▲ |
| 590 | | ▼aSchool code: 0028.▲ |
| 650 | 4 | ▼aComputer science.▲ |
| 650 | 4 | ▼aEngineering.▲ |
| 650 | 4 | ▼aInformation technology.▲ |
| 653 | | ▼aGenerative adversarial network▲ |
| 653 | | ▼aBottlenecks▲ |
| 653 | | ▼aVisual data▲ |
| 653 | | ▼aText-to-image model▲ |
| 653 | | ▼a3D scenes▲ |
| 690 | | ▼a0984▲ |
| 690 | | ▼a0489▲ |
| 690 | | ▼a0800▲ |
| 690 | | ▼a0537▲ |
| 710 | 2 0 | ▼aUniversity of California, Berkeley.▼bComputer Science.▲ |
| 773 | 0 | ▼tDissertations Abstracts International▼g86-04B.▲ |
| 790 | | ▼a0028▲ |
| 791 | | ▼aPh.D.▲ |
| 792 | | ▼a2024▲ |
| 793 | | ▼aEnglish▲ |
| 856 | 4 0 | ▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17161672▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).▲ |
Disentangled Visual Generative Models [electronic resource]
Material Type
Foreign Monograph
Title/Statement of Responsibility
Disentangled Visual Generative Models [electronic resource].
Personal Author
Epstein, Dave.
Publication Details
[S.l.] : University of California, Berkeley, 2024
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical Description
1 online resource (116 p.)
General Notes
Source: Dissertations Abstracts International, Volume: 86-04, Section: B.
Advisor: Efros, Alexei A.
Dissertation Note
Thesis (Ph.D.)--University of California, Berkeley, 2024.
Abstract
Generative modeling promises an elegant solution to learning about high-dimensional data distributions such as images and videos - but how can we expose and utilize the rich structure these models discover? Rather than just drawing new samples, how can an agent actually harness p(x) as a source of knowledge about how our world works? This thesis explores scalable inductive biases that unlock a generative model's disentangled understanding of visual data, enabling much richer interaction and control as a result.

First, I propose a representation of scenes as collections of feature "blobs", where a generative adversarial network (GAN) learns - without any labels - to bind each blob to a different object in the images it creates. This allows GANs to more gracefully model compositional scenes, in contrast to typical unconditional models, which are constrained to highly aligned single-object data. The trained model's representation can easily be modified to counterfactually manipulate objects in both generated and real images.

Next, I consider methods that do not impose bottlenecks on architectures during training, facilitating their application to more diverse, uncurated data. I show that the internals of diffusion models can be used to meaningfully guide the generation of new samples, without any further fine-tuning or supervision. Energy functions derived from a small set of primitive properties of denoiser activations can be combined to impose arbitrarily complex conditions on the iterative diffusion sampling procedure. This allows for control over attributes such as the position, shape, size, and appearance of any concept that can be described in text.

I also demonstrate that the distribution learned by a text-to-image model can be distilled to generate compositional 3D scenes. Predominant approaches focus on creating 3D objects in isolation rather than scenes with several entities interacting. I propose an architecture that, when optimized so its outputs are on-manifold for the image generator, creates 3D scenes decomposed into the objects they contain. This provides evidence that scale alone suffices for a model to infer the actual 3D structure latent in a world it observes only through 2D images.

Finally, I conclude with a perspective on the interplay between emergence, control, interpretability, and scale, and humbly attempt to relate these themes to the pursuit of intelligence.
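The energy-guidance idea summarized in the abstract can be illustrated with a short sketch. The code below is a minimal, illustrative reading of the generic pattern, not the dissertation's implementation: the toy MLP denoiser, the linear noise schedule, and the mean-targeting energy are all hypothetical stand-ins for the pretrained diffusion model and the activation-derived energies the thesis describes. What it shows is only the mechanism: differentiate an energy of the denoiser's prediction with respect to the noisy sample, and use that gradient to steer each sampling step.

```python
# Illustrative sketch of energy-guided diffusion sampling (not the thesis code).
import torch

torch.manual_seed(0)

class ToyDenoiser(torch.nn.Module):
    """Hypothetical stand-in for a pretrained denoiser eps_theta(x_t, t)."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 64),
            torch.nn.SiLU(),
            torch.nn.Linear(64, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, t.expand(x_t.shape[0], 1)], dim=-1))

def energy(x0_pred: torch.Tensor, target: float = 0.5) -> torch.Tensor:
    # Toy stand-in energy: pull the predicted clean sample's mean toward a
    # target value. The thesis instead derives energies from primitive
    # properties of denoiser activations (position, shape, size, appearance).
    return (x0_pred.mean() - target) ** 2

def guided_sample(model: torch.nn.Module, steps: int = 50, dim: int = 16,
                  scale: float = 1.0) -> torch.Tensor:
    x = torch.randn(1, dim)
    # Assumed toy schedule: cumulative signal level rises as noise is removed.
    alpha_bar = torch.linspace(0.01, 0.99, steps)
    for i, a in enumerate(alpha_bar):
        t = torch.full((1, 1), 1.0 - i / steps)
        x = x.detach().requires_grad_(True)
        eps = model(x, t)
        # DDPM-style estimate of the clean sample from the noise prediction.
        x0_pred = (x - (1.0 - a).sqrt() * eps) / a.sqrt()
        # Gradient of the energy w.r.t. the noisy sample steers the trajectory.
        grad = torch.autograd.grad(energy(x0_pred), x)[0]
        # Simplified deterministic (DDIM-like) update, nudged downhill in energy.
        a_next = alpha_bar[i + 1] if i + 1 < steps else torch.tensor(0.999)
        x = (a_next.sqrt() * x0_pred + (1.0 - a_next).sqrt() * eps).detach()
        x = x - scale * grad
    return x.detach()

print(guided_sample(ToyDenoiser()).shape)  # torch.Size([1, 16])
```

Per the abstract, several such energy terms can be summed to compose arbitrarily complex conditions; only the energy changes, while the sampling loop and the pretrained model stay fixed, which is why no fine-tuning or extra supervision is needed.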
Subjects
Computer science; Engineering; Information technology
ISBN
9798384447214
Full Text and Related Information
http://www.riss.kr/pdu/ddodLink.do?id=T17161672 (KERIS)