Compositional visual generation with energy-based modeling
Liu, Nan
Permalink
https://hdl.handle.net/2142/121419
Description
- Title
- Compositional visual generation with energy-based modeling
- Author(s)
- Liu, Nan
- Issue Date
- 2023-06-21
- Director of Research (if dissertation) or Advisor (if thesis)
- Lazebnik, Svetlana
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- energy-based models
- diffusion models
- Abstract
- Our understanding of the visual world around us is highly compositional in nature: humans can rapidly understand individual concepts in a scene and compose them to describe the world states we encounter. However, machines struggle with complex compositions of concepts, often confusing the attributes of different objects or the relations between them. While a large body of work has explored inferring and understanding objects in a scene, less work has been done on building a composable system that enables the “infinite use of finite means”, i.e., the ability to repeatedly reuse and recombine acquired concepts. This thesis endeavors to construct machine learning systems with such compositional capabilities, particularly in the context of generative modeling. First, existing works primarily compose relations by utilizing a holistic encoder that encodes inputs, in the form of text or graphs, into fixed-size vectors. We instead propose to represent each relation as an unnormalized density (an energy-based model), enabling us to compose separate relations in a factorized manner. We show that such a factorized decomposition allows the model to both generate and edit scenes with multiple sets of relations more faithfully. Second, we extend this work to the composition of various concepts, including objects, relations, and text descriptions. Our alternative structured approach for compositional generation interprets diffusion models as energy-based models, which allows us to explicitly combine data distributions defined by energy functions. The proposed method can generate scenes at test time that are substantially more complex than those seen during training, composing sentence descriptions, object relations, and human facial attributes, and even generalizing to new combinations that are rarely seen.
Third, we consider the inverse problem: given a collection of different images, can we discover the underlying generative concepts that represent each image? We present an approach that decomposes images into a set of different concepts, disentangling art styles in paintings, separating objects and lighting in kitchen scenes, and discovering image classes from ImageNet images. We illustrate how such discovered concepts accurately represent the underlying content of images and how they may further be composed with other concepts to construct new artistic and hybrid images. In summary, the methods proposed in this thesis showcase the potential of compositional modeling to enhance machine learning systems’ ability to generate complex and realistic scenes by intelligently combining learned generative concepts.
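The factorized composition described in the abstract can be made concrete with a toy sketch. This is not code from the thesis: all function names and parameters below are illustrative assumptions. The key idea it demonstrates is that summing the energies of two energy-based models corresponds to taking the product of their unnormalized densities, p(x) ∝ exp(−E₁(x) − E₂(x)), and that samples from the composed distribution can be drawn with Langevin dynamics:

```python
import numpy as np

# Toy 1-D illustration (not from the thesis): two energy functions,
# each defining an unnormalized density exp(-E(x)).

def energy_a(x):
    # energy pulling samples toward x = -1
    return 0.5 * (x + 1.0) ** 2

def energy_b(x):
    # energy pulling samples toward x = +1
    return 0.5 * (x - 1.0) ** 2

def composed_energy(x):
    # factorized composition: summing energies multiplies densities
    return energy_a(x) + energy_b(x)

def grad(f, x, eps=1e-4):
    # central-difference gradient for the Langevin update
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def langevin_sample(f, steps=2000, step_size=0.01, seed=0):
    # unadjusted Langevin dynamics on the density exp(-f(x))
    rng = np.random.default_rng(seed)
    x = rng.normal()
    for _ in range(steps):
        x = x - step_size * grad(f, x) \
            + np.sqrt(2.0 * step_size) * rng.normal()
    return x

samples = [langevin_sample(composed_energy, seed=s) for s in range(200)]
# The composed density here is proportional to exp(-x**2), a Gaussian
# centered at 0, so samples concentrate around x = 0 even though each
# individual energy prefers -1 or +1.
```

The same summation-of-energies view is what licenses composing separately trained relation models at test time: each relation contributes one energy term, and sampling is run on their sum.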
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Nan Liu
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)