Towards a foundation model for multi-modal and hyperspectral geospatial data
Si, Haozhe
Permalink
https://hdl.handle.net/2142/125729
Description
Title
Towards a foundation model for multi-modal and hyperspectral geospatial data
Author(s)
Si, Haozhe
Issue Date
2024-07-19
Director of Research (if dissertation) or Advisor (if thesis)
Zhao, Han
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Computer Vision, Machine Learning
Abstract
Geospatial imagery data, such as that collected by different satellite-based sensing systems at different times, holds immense potential for enabling a wide range of high-impact applications. This potential comes from the rich, contextualized information geospatial imagery provides across multiple dimensions, channels, and sensing modalities. To unlock the insights in geospatial data, recent work has adapted existing self-supervised learning (SSL) approaches; however, these approaches lack training objectives and model architectures tailored to geospatial data, leading to inflexibility and computational inefficiency, especially as the number of channels and modalities grows. In light of these limitations, we introduce a novel framework consisting of three key components: i) a Multi-Modal Masked Autoencoder (MM-MAE) that fuses features from different modalities; ii) a Masked-Channel Reconstruction objective that exploits inter-channel relationships in hyperspectral data; and iii) a Spatial-Spectral Vision Transformer (S2ViT), incorporating novel Low-Rank Spatial-Spectral Attention Blocks, which flexibly assigns attention across different dimensions. Experimental results demonstrate that our proposed method surpasses current state-of-the-art multi-modal geospatial foundation models, achieving superior performance with less computation and fewer parameters. The flexibility and extensibility of our framework make it a promising solution for future geospatial data analysis tasks involving a wide range of modalities and dimensions. Consequently, our pretrained model can be effectively applied to various downstream tasks, such as land-cover classification, land functionality management, and marine debris detection, ultimately supporting informed decision-making for sustainable development and environmental conservation.
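To give a rough sense of the Masked-Channel Reconstruction idea described in the abstract, the sketch below masks a random subset of spectral channels in a hyperspectral cube and scores a reconstruction only on those masked channels. This is a minimal illustrative stand-in, not the thesis's actual implementation; the function name, masking ratio, and array shapes are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_channel_loss(cube, reconstruction, mask_ratio=0.5, rng=rng):
    """Toy masked-channel reconstruction objective (illustrative only).

    `cube` and `reconstruction` are (channels, height, width) arrays.
    A random subset of spectral channels is treated as masked, and the
    loss is the mean squared error computed only over those channels,
    which is the basic shape of a masked-channel SSL objective.
    """
    c = cube.shape[0]
    n_masked = max(1, int(round(mask_ratio * c)))
    masked = rng.choice(c, size=n_masked, replace=False)
    diff = cube[masked] - reconstruction[masked]
    return float(np.mean(diff ** 2))

# Toy example: a 12-channel "hyperspectral" patch and a noisy reconstruction.
cube = rng.normal(size=(12, 8, 8))
recon = cube + 0.1 * rng.normal(size=cube.shape)
loss = masked_channel_loss(cube, recon)
```

In an actual MAE-style pipeline, the masked channels would be hidden from the encoder and predicted by a decoder; here the "reconstruction" is supplied directly just to show where the loss is computed.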