This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/120438
Description
Title
Operator learning in the overparameterized regime
Author(s)
Shrimali, Bhavesh
Issue Date
2023-05-01
Director of Research (if dissertation) or Advisor (if thesis)
Banerjee, Arindam
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Overparameterization, Optimization, Deep Operator Networks
Abstract
Neural operators that directly learn mappings between function spaces have received considerable recent attention. Deep Operator Networks (DeepONets), a popular recent class of operator networks, have shown promising preliminary results in approximating solution operators of parametric partial differential equations. Despite the universal approximation guarantees, there is as yet no optimization convergence guarantee for DeepONets based on gradient descent (GD). In this thesis, we establish such guarantees and show that overparameterization based on wide layers provably helps. In particular, we present two types of optimization convergence analysis: first, for smooth activations, we bound the spectral norm of the Hessian of DeepONets and use the bound to show geometric convergence of GD based on restricted strong convexity (RSC); and second, for ReLU activations, we show that the neural tangent kernel (NTK) of DeepONets at initialization is positive definite, which can be used with the standard NTK analysis to imply geometric convergence. Further, we present empirical results on three canonical operator learning problems: the antiderivative operator, the diffusion-reaction equation, and Burgers' equation, and show that wider DeepONets lead to lower training loss on all the problems, thereby supporting the theoretical results.
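To make the branch-trunk structure referenced in the abstract concrete, below is a minimal, hypothetical sketch of a DeepONet forward pass. It is not the thesis code: the layer widths, the number of sensor points, the tanh activation, and all names are illustrative assumptions; the output network computes G(u)(y) as the inner product of the branch embedding of the sampled input function u and the trunk embedding of the query location y.

```python
# Minimal DeepONet forward-pass sketch (illustrative assumptions, not the thesis code).
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes):
    # Random Gaussian weights and zero biases for a small MLP.
    return [(rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    # tanh on hidden layers, linear output layer.
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

m, p, width = 100, 40, 256           # sensor points, output features, hidden width (assumed)
branch = mlp_params([m, width, p])   # encodes the input function sampled at m sensors
trunk  = mlp_params([1, width, p])   # encodes the query location y

def deeponet(u_sensors, y):
    # G(u)(y) ~ <branch(u), trunk(y)>
    return np.sum(mlp(branch, u_sensors) * mlp(trunk, y), axis=-1)

# Example query: u(x) = cos(x) sampled at the sensors, evaluated at y = 0.5
# (untrained network, so the output is random).
x_sensors = np.linspace(0.0, 1.0, m)
print(deeponet(np.cos(x_sensors), np.array([0.5])))
```

Widening the hidden layers in a sketch like this corresponds to the overparameterization studied in the thesis, where the convergence guarantees for GD are established.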