Glass onion: Compositional text-to-image generation using diffusion models and LLMs
Sarswat, Shrey
Permalink
https://hdl.handle.net/2142/125625
Description
- Title
- Glass onion: Compositional text-to-image generation using diffusion models and LLMs
- Author(s)
- Sarswat, Shrey
- Issue Date
- 2024-07-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Lazebnik, Svetlana
- Department of Study
- Siebel Computing & Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Text-to-Image Generation
- Diffusion Models
- Large Language Models (LLMs)
- Computer Vision
- Language and Vision
- Abstract
- Text-to-image generation has advanced substantially in recent years, particularly with the advent of diffusion models, which have transformed how images are generated from text prompts. Despite these advances, current methods often struggle with complex prompts: those involving spatial and numerical relationships, unusual attributes, or intricate logical structure. This work introduces an experimental approach designed to address these challenges. Inspired by the layered editing workflow of Adobe Photoshop, we propose an iterative framework that combines Large Language Models (LLMs) with state-of-the-art image generation models to produce accurate and controllable images from detailed textual descriptions. Our method uses LLMs to decompose a text prompt into structured sub-prompts, which are then rendered sequentially into images. To refine this process, we integrate a dynamic feedback mechanism that uses a suite of foundation models to evaluate each generated image for alignment with the original prompt and for visual fidelity. Our findings show notable improvements for specific types of prompts and reveal areas for improvement on others. This research highlights the potential of the method and lays a foundation for future work on automated visual content generation.
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125625
- Copyright and License Information
- Copyright 2024 Shrey Sarswat
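The abstract describes a loop of decomposing a prompt into sub-prompts, rendering them one layer at a time, and checking each result with a feedback step. The sketch below shows only that control flow under stated assumptions; it is not the thesis implementation. All names (`layered_generate`, `toy_decompose`, `toy_render`, `toy_score`) and the `threshold` and `max_attempts` parameters are hypothetical, and the "canvas" is a tuple of strings standing in for an image, so no real LLM, diffusion model, or evaluator is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for an image: the sequence of layers drawn so far.
Canvas = Tuple[str, ...]


@dataclass
class LayerLog:
    sub_prompt: str
    attempts: int = 0
    accepted: bool = False


def layered_generate(
    prompt: str,
    decompose: Callable[[str], List[str]],  # assumed role of the LLM
    render: Callable[[Canvas, str, int], Canvas],  # assumed role of the image model
    score: Callable[[Canvas, str], float],  # assumed role of the evaluator models
    threshold: float = 0.9,  # hypothetical acceptance cutoff
    max_attempts: int = 3,  # hypothetical retry budget per layer
) -> Tuple[Canvas, List[LayerLog]]:
    canvas: Canvas = ()
    logs: List[LayerLog] = []
    for sub in decompose(prompt):
        log = LayerLog(sub)
        best, best_score = canvas, float("-inf")
        # Feedback loop: re-render this layer until the evaluator accepts it
        # or the retry budget runs out.
        while log.attempts < max_attempts:
            log.attempts += 1
            candidate = render(canvas, sub, log.attempts)
            s = score(candidate, sub)
            if s > best_score:
                best, best_score = candidate, s
            if s >= threshold:
                log.accepted = True
                break
        canvas = best  # keep the best-scoring edit even if none passed
        logs.append(log)
    return canvas, logs


# Toy components for illustration only.
def toy_decompose(prompt: str) -> List[str]:
    return [part.strip() for part in prompt.split(" and ")]


def toy_render(canvas: Canvas, sub: str, attempt: int) -> Canvas:
    # Simulate a first attempt that gets a count wrong ("two" drawn as "one").
    drawn = sub.replace("two", "one") if attempt == 1 else sub
    return canvas + (drawn,)


def toy_score(candidate: Canvas, sub: str) -> float:
    # Accept only if the newest layer matches its sub-prompt exactly.
    return 1.0 if candidate and candidate[-1] == sub else 0.0


if __name__ == "__main__":
    image, logs = layered_generate(
        "a red cube and two blue spheres", toy_decompose, toy_render, toy_score
    )
    print(image)
    for log in logs:
        print(log)
```

Running it, the first layer is accepted on its first attempt, while the second ("two blue spheres") fails the toy check once and is accepted on the retry. Keeping the best-scoring candidate when no attempt passes is one possible design choice; a real system could instead stop, re-plan the sub-prompts, or flag the layer.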
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary)
- Graduate Theses and Dissertations at Illinois