Towards holistic scene understanding from monocular video
Hu, Yuan-Ting
Permalink
https://hdl.handle.net/2142/116107
Description
- Title
- Towards holistic scene understanding from monocular video
- Author(s)
- Hu, Yuan-Ting
- Issue Date
- 2022-07-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Schwing, Alexander Gerhard
- Doctoral Committee Chair(s)
- Schwing, Alexander Gerhard
- Committee Member(s)
- Forsyth, David
- Hoiem, Derek
- Patel, Sanjay
- Huang, Jia-Bin
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Scene understanding
- Video understanding
- Video segmentation
- Object segmentation
- Amodal segmentation
- 3D reconstruction
- Abstract
- Humans have the remarkable ability to vividly envision future scenarios because they understand scenes in a holistic manner. They can extrapolate scene information such as object shapes and interactions from the observed scene content and its dynamics. Importantly, they can reason about unseen information, e.g., when objects are only partially observed. In contrast, while computer vision and machine learning systems can successfully explain observations, it remains challenging to develop autonomous agents that can infer the unseen and achieve a holistic understanding of the environment. In this dissertation, we discuss techniques that tackle research problems related to holistic scene understanding from monocular video data. First, we present models for human pose understanding from video. Second, we study the problem of tracking moving objects under challenging conditions such as occlusion and appearance change. Third, we consider the challenging task of amodal understanding of objects in a scene from video, which aims to infer the entirety of objects even when they are only partially observed. To enable data-driven approaches to video amodal perception, we present a large-scale video dataset in which more than 1.8 million objects are annotated with amodal labels. With the proposed dataset, we study and present video algorithms that infer the unseen and recover scene dynamics as well as 3D shapes from partially occluded data. Finally, we present a method showing how geometric cues predicted from 2D can improve 3D understanding of objects in the scene. We conclude by discussing future directions toward holistic video scene understanding.
- Graduation Semester
- 2022-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Yuan-Ting Hu
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)