Semantic and spatio-temporal understanding for computer vision driven worker safety inspection and risk analysis
Tang, Shuai
Description
- Title
- Semantic and spatio-temporal understanding for computer vision driven worker safety inspection and risk analysis
- Author(s)
- Tang, Shuai
- Issue Date
- 2021-04-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Golparvar-Fard, Mani
- Doctoral Committee Chair(s)
- Golparvar-Fard, Mani
- Committee Member(s)
- El-Rayes, Khaled
- Liu, Liang
- El-Gohary, Nora
- Hoiem, Derek
- Department of Study
- Civil & Environmental Engineering
- Discipline
- Civil Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Construction Management
- Computer Vision
- Construction Safety
- Machine Learning
- Semantic Understanding
- Spatio-Temporal Modeling
- Abstract
- Despite decades of effort, we are still far from eliminating construction safety risks. Recently, computer vision techniques have been applied to construction safety management on real-world residential and commercial projects, showing the potential to fundamentally change safety management practices and safety performance measurement. The most significant breakthroughs in this field have been achieved in safety practice observation, incident and safety performance forecasting, and vision-based construction risk assessment. However, fundamental theoretical and technical challenges must still be addressed to realize the full potential of construction site images and videos for construction safety. This dissertation explores methods for automated semantic and spatio-temporal visual understanding of workers and equipment, and how to use them to improve automated safety inspection and risk analysis.

(1) A new method is developed to improve the breadth and depth of vision-based safety compliance checking by explicitly classifying worker-tool interactions. A detection model is trained on a newly constructed image dataset for construction sites, achieving 52.9% mean average precision over 10 object categories and 89.4% average precision for detecting workers. Using this detector and the new dataset, the proposed human-object interaction recognition model achieves 79.78% precision and 77.64% recall for hard hat checking, and 79.11% precision and 75.29% recall for safety vest checking. The model also verifies hand protection for workers when tools are being used, with 66.2% precision and 64.86% recall. The proposed model outperforms methods that rely on hand-crafted rules to recognize interactions or that reason directly on object detector outputs.

(2) To support systems that proactively prevent accidents, this dissertation presents a path prediction model for workers and equipment. The model leverages video frames to predict upcoming worker and equipment motion trajectories on construction sites. Specifically, it takes 2D tracks of workers and equipment from visual data (obtained with computer-vision detection and tracking methods) and uses a Long Short-Term Memory (LSTM) encoder-decoder followed by a Mixture Density Network (MDN) to predict their locations. A multi-head prediction module is introduced to predict locations at different future times. The method is validated on the existing TrajNet dataset and on a new dataset of 105 high-definition videos recorded over 30 days on a real-world construction site. On TrajNet, the proposed model significantly outperforms Social LSTM. On the new dataset, it outperforms conventional time-series models and achieves average localization errors of 7.30, 12.71, and 24.22 pixels at 10, 20, and 40 future steps, respectively.

(3) A new construction worker safety analysis method is introduced that evaluates worker-level risk from site photos and videos. The method evaluates worker state based on body pose, protective equipment use, interactions with tools and materials, the construction activity being performed, and hazards in the workplace. To estimate worker state, a vision-based Object-Activity-Keypoint (OAK) recognition model is proposed that requires 36.6% less time and 40.1% less memory while maintaining performance comparable to running an individual model for each sub-task. Worker activity recognition is further improved with a spatio-temporal graph model that uses recognized per-frame worker activities, detected bounding boxes of tools and materials, and estimated worker poses. Finally, severity levels are predicted by a classifier trained on a dataset of construction worker images annotated with ground-truth severity levels. On the test dataset, the severity level prediction model achieves 85.7% cross-validation accuracy on a bricklaying task and 86.6% on a plastering task.
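To illustrate the MDN component mentioned in contribution (2), the sketch below shows how a decoder hidden state can be mapped to a Gaussian mixture over a future 2D location. This is a minimal illustration in the standard mixture-density-network style, not the dissertation's implementation: the weight names (`W_pi`, `W_mu`, `W_sigma`), the number of mixture components, and the diagonal-covariance assumption are all illustrative choices.

```python
import numpy as np

def mdn_head(h, W_pi, W_mu, W_sigma):
    """Map a decoder hidden state h to 2D Gaussian mixture parameters.

    Weight matrices are hypothetical; this is a generic MDN output layer,
    not the model described in the dissertation.
    """
    logits = h @ W_pi
    pi = np.exp(logits - logits.max())
    pi = pi / pi.sum()                           # mixture weights, sum to 1
    mu = (h @ W_mu).reshape(-1, 2)               # component means (x, y)
    sigma = np.exp(h @ W_sigma).reshape(-1, 2)   # positive std. deviations
    return pi, mu, sigma

def mixture_nll(point, pi, mu, sigma):
    """Negative log-likelihood of a 2D point under the diagonal mixture."""
    diff = (point - mu) / sigma
    comp = np.exp(-0.5 * (diff ** 2).sum(axis=1)) / (2 * np.pi * sigma.prod(axis=1))
    return -np.log((pi * comp).sum() + 1e-12)
```

During training such a head is fitted by minimizing `mixture_nll` over observed future positions; at inference time, a point prediction can be taken as the mean of the most probable component, while the full mixture expresses uncertainty about where a worker or machine will move next.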
- Graduation Semester
- 2021-05
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/110471
- Copyright and License Information
- Copyright 2021 Shuai Tang
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)