Some modules of hierarchical video parsing with transformers for activity localization and recognition

Yu, Mengxuan

Permalink

https://hdl.handle.net/2142/121984

Description

Title

Some modules of hierarchical video parsing with transformers for activity localization and recognition

Author(s)

Yu, Mengxuan

Issue Date

2023-12-01

Director of Research (if dissertation) or Advisor (if thesis)

Ahuja, Narendra

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Keyword(s)

Computer Vision
AI
Video Parsing

Abstract

This thesis presents a set of modules of a method for parsing human activity videos, covering temporal action recognition and localization. Previous work has achieved very high performance, but much of it focuses on short video clips with a single label. The method described here parses human activity videos that contain a sequence of action labels, complex environments, and arbitrarily long background clips (parts of the video in which nothing happens). It applies an encoder that combines an LSTM with a self-attentive Transformer to the sequence of video frame features extracted by the I3D model. It then uses multiple parsing methods, such as CYK parsing and probabilistic inference, to decode the result and build the parse tree efficiently and accurately. The method significantly improves accuracy over state-of-the-art (SoTA) methods. The modules presented in this thesis are: (1) Video Tree Structure and Vocabulary, (2) Video CYK Parsing algorithm, (3) Video Grammar Probability Tree, and (4) Mean Average Precision testing.
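The abstract names CYK parsing and probabilistic inference without giving details. As a rough illustration only, the following is a minimal sketch of generic probabilistic CYK (Viterbi) parsing over a sequence of action labels. The grammar, the symbol names (`Activity`, `Prep`, `Cook`, and so on), the probabilities, and the function names `cyk_parse` and `build_tree` are all hypothetical and are not taken from the thesis; the thesis's own grammar and decoding procedure may differ.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form. None of these symbols or
# probabilities come from the thesis; they only illustrate the technique.
LEXICAL = {  # action label -> [(nonterminal, probability)]
    "take": [("Take", 1.0)],
    "cut": [("Cut", 1.0)],
    "wash": [("Wash", 1.0)],
    "fry": [("Fry", 1.0)],
    "serve": [("Serve", 1.0)],
}
BINARY = [  # (parent, left child, right child, probability)
    ("Activity", "Prep", "Cook", 1.0),
    ("Prep", "Take", "Cut", 0.6),
    ("Prep", "Take", "Wash", 0.4),
    ("Cook", "Fry", "Serve", 1.0),
]


def build_tree(chart, i, j, sym):
    """Follow backpointers to rebuild the best tree for sym over labels[i:j]."""
    _, back = chart[(i, j)][sym]
    if isinstance(back, str):  # lexical entry: backpointer is the label itself
        return (sym, back)
    k, left, right = back
    return (sym, build_tree(chart, i, k, left), build_tree(chart, k, j, right))


def cyk_parse(labels, start="Activity"):
    """Return (log probability, tree) of the best parse, or None if none exists."""
    n = len(labels)
    # chart[(i, j)] maps nonterminal -> (best log probability, backpointer)
    chart = defaultdict(dict)
    for i, label in enumerate(labels):
        for sym, p in LEXICAL.get(label, []):
            chart[(i, i + 1)][sym] = (math.log(p), label)
    for width in range(2, n + 1):          # span length
        for i in range(0, n - width + 1):  # span start
            j = i + width
            for k in range(i + 1, j):      # split point
                for parent, left, right, p in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        score = (math.log(p)
                                 + chart[(i, k)][left][0]
                                 + chart[(k, j)][right][0])
                        best = chart[(i, j)].get(parent)
                        if best is None or score > best[0]:
                            chart[(i, j)][parent] = (score, (k, left, right))
    if start not in chart[(0, n)]:
        return None
    return chart[(0, n)][start][0], build_tree(chart, 0, n, start)
```

Calling `cyk_parse(["take", "cut", "fry", "serve"])` returns the log probability of the best parse together with a nested-tuple tree, and a sequence the toy grammar cannot derive returns `None`. In a real pipeline the lexical probabilities would presumably come from per-segment classifier scores rather than fixed values, which is an assumption and not something the abstract states.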

Graduation Semester

2023-12

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/121984

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
