Some modules of hierarchical video parsing with transformers for activity localization and recognition
Yu, Mengxuan
Permalink
https://hdl.handle.net/2142/121984
Description
Title
Some modules of hierarchical video parsing with transformers for activity localization and recognition
Author(s)
Yu, Mengxuan
Issue Date
2023-12-01
Director of Research (if dissertation) or Advisor (if thesis)
Ahuja, Narendra
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Computer Vision
AI
Video Parsing
Abstract
This thesis presents a set of modules of a method for parsing human activity videos, covering temporal action recognition and localization. Previous work has already achieved very high performance, but much of it focuses on short video clips with a single label. The method described here parses human activity videos that contain a sequence of action labels, a complex environment, and arbitrarily long background clips (the parts of the video in which nothing happens). It applies an encoder combined with an LSTM and a self-attentive Transformer to the sequence of video frame features extracted by the I3D model. It then uses multiple parsing methods, such as CYK parsing and probabilistic inference, to decode the result and build the parse tree efficiently and accurately. The method achieves a significant improvement in accuracy over state-of-the-art (SoTA) methods.
The modules presented in this thesis are:
(1) Video Tree structure and Vocabulary
(2) Video CYK Parsing algorithm
(3) Video Grammar Probability Tree, and
(4) Mean Average Precision testing
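As a rough illustration of the CYK parsing idea named in module (2), the sketch below runs a textbook CYK recognizer over a sequence of per-segment action labels. It is not the thesis's algorithm: the function name `cyk_parse`, the toy grammar, and the labels `background`, `pour`, and `stir` are all hypothetical, chosen only to show how a grammar in Chomsky normal form can absorb arbitrarily long background segments around an ordered pair of actions.

```python
from collections import defaultdict


def cyk_parse(tokens, lexical_rules, binary_rules, start):
    """Textbook CYK recognizer for a grammar in Chomsky normal form.

    tokens        : list of terminal symbols (here: hypothetical action labels)
    lexical_rules : dict mapping a terminal to the set of nonterminals producing it
    binary_rules  : dict mapping (left, right) nonterminal pairs to the set of parents
    start         : start nonterminal
    Returns True if `start` derives the whole token sequence.
    """
    n = len(tokens)
    if n == 0:
        return False

    # table[(i, j)] holds every nonterminal that derives tokens[i:j]
    table = defaultdict(set)
    for i, tok in enumerate(tokens):
        table[(i, i + 1)] = set(lexical_rules.get(tok, ()))

    # Fill longer spans from shorter ones, trying every split point k
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left in table[(i, k)]:
                    for right in table[(k, j)]:
                        table[(i, j)] |= binary_rules.get((left, right), set())

    return start in table[(0, n)]


# Hypothetical toy grammar (an assumption, not taken from the thesis):
#   ACTIVITY -> POUR STIR | BG ACTIVITY | ACTIVITY BG
LEXICAL = {"background": {"BG"}, "pour": {"POUR"}, "stir": {"STIR"}}
BINARY = {
    ("POUR", "STIR"): {"ACTIVITY"},
    ("BG", "ACTIVITY"): {"ACTIVITY"},
    ("ACTIVITY", "BG"): {"ACTIVITY"},
}

labels = ["background", "background", "pour", "stir", "background"]
accepted = cyk_parse(labels, LEXICAL, BINARY, start="ACTIVITY")
```

With this toy grammar, `accepted` is `True` for the sequence above, while a reordered sequence such as `["stir", "pour"]` is rejected. A probabilistic variant would store the best score per nonterminal in each cell instead of a plain set.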