Sequential prediction and decision making, joint community detection and phase synchronization, and acceleration methods for convex optimization

Wang, Lingda

Sequential prediction and decision making, joint community detection and phase synchronization, and acceleration methods for convex optimization

Wang, Lingda

Content Files

WANG-DISSERTATION-2024.pdf

Permalink

https://hdl.handle.net/2142/124134

Description

Title

Sequential prediction and decision making, joint community detection and phase synchronization, and acceleration methods for convex optimization

Author(s)

Wang, Lingda

Issue Date

2024-02-01

Director of Research (if dissertation) or Advisor (if thesis)

Zhao, Zhizhen J

Doctoral Committee Chair(s)

Zhao, Zhizhen J

Committee Member(s)

Sriver, Ryan L
Varshney, Lav R
Shomorony, Ilan

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Sequential Prediction
Sequential Decision Making
Community Detection
Phase Synchronization
Convex Optimization

Abstract

In this dissertation, several topics in machine learning are presented, including sequential prediction and decision making (e.g., time series/spatio-temporal sequences prediction and bandit learning), joint community detection and phase synchronization, and acceleration methods for convex optimization. Many applications such as demand/inventory planning, precipitation/climate prediction, online recommendation/advertising, web search, cryo-electron microscopy (cryo-EM) reconstruction, optimal transport, and video colocation have been shown to benefit from topics studied in this dissertation. In Part~\ref{part:I} (Chapters~\ref{chap:convgru}, ~\ref{chap:RobustDF},~\ref{chap:CB}, and~\ref{chap:GCB}), we focus on sequential prediction and decision making problems. Chapters~\ref{chap:convgru} and~\ref{chap:RobustDF} focus on sequential prediction problems, including both time series and spatio-temporal sequences. Chapter~\ref{chap:convgru} focuses on the problem of predicting sea surface temperature (SST) within the El Ni\~no-Southern Oscillation (ENSO) region, which has been extensively studied recently due to its significant influence on global temperature and precipitation patterns. Statistical models such as linear inverse model (LIM), analog forecasting (AF), convolutional neural network (CNN), and recurrent neural network (RNN) have been widely used for ENSO prediction, offering flexibility and relatively low computational expense compared to large dynamic models. However, most of these models have limitations in capturing spatial patterns in SST variability or relying solely on linear dynamics. Chapter~\ref{chap:convgru} presents a modified convolutional gated recurrent unit (ConvGRU) network for the ENSO region spatio-temporal sequence prediction problem, along with the Ni\~no 3.4 index prediction as a down stream task. The proposed ConvGRU network, with an encoder-decoder sequence-to-sequence (Seq2Seq) structure, takes historical SST maps of the Pacific region as inputs and generates future SST maps for the subsequent months within the ENSO region. To evaluate the performance of the ConvGRU network, we train and test it using simulation and reanalysis datasets from multiple climate model ensembles, including a pre-industrial simulation spanning approximately 1300 years from the community climate system model version 4 (CCSM4) and a 30-member historical ensemble during 1921-2100 using the NOAA seamless system for prediction and earth system research (SPEAR) model. We also compare and contrast the prediction skill of the ConvGRU network against SOTA models. The results demonstrate that the ConvGRU network significantly improves the predictability of the Ni\~no 3.4 index compared to existing statistical and deep learning prediction models, including LIM, AF, CNN, and RNN. This improvement is evidenced by extended useful prediction range, higher Pearson correlation (PC), lower root-mean-square error (RMSE), and lower weighted mean absolute percentage error (wMAPE). In Chapter~\ref{chap:RobustDF}, a practical and robust distribution forecast framework that relies on backtest-based bootstrap and adaptive residual selection is proposed. Distribution forecast can quantify forecast uncertainty and provide various forecast scenarios with their corresponding estimated probabilities. Accurate distribution forecast is crucial for demand planning when making production capacity or inventory allocation decisions. The proposed approach is robust to the choice of the underlying forecasting model, which accounts for uncertainty around the input covariates and relaxes the independence between residual and covariate assumptions. It reduces the absolute coverage error (ACE) by more than $63\%$ compared to the classic bootstrap approaches and by $2\%-32\%$ compared to a variety of state-of-the-art (SOTA) deep learning approaches on in-house product sales data and M4-hourly competition data. Chapters~\ref{chap:CB} and~\ref{chap:GCB} present two bandit learning problems. In Chapter~\ref{chap:CB}, we consider a popular bandit model, cascading bandit (CB), for web search and online advertisement, where an agent aims to learn the $K$ most attractive items out of a ground set of size $L$ during the interaction with a user. Meanwhile, we take it a step further by considering CB in the piecewise-stationary environment where the user's preference may change over time. Two efficient algorithms, \texttt{GLRT-CascadeUCB} and \texttt{GLRT-CascadeKL-UCB}, are developed and shown to ensure regret upper bounds of $\mO(\sqrt{NLT\log{T}})$, where $N$ is the number of piecewise-stationary segments, and $T$ is the length of time horizon. In addition, we show that the proposed algorithms are optimal (up to a logarithmic factor) by deriving a minimax lower bound of $\Omega(\sqrt{NLT})$ for the piecewise-stationary CB. The efficiency of the proposed algorithms relative to existing approaches is validated through numerical experiments on both synthetic and real-world datasets. Chapter~\ref{chap:GCB} studies the adversarial graphical contextual bandit problem, a variant of the adversarial multi-armed bandit problem, which leverages two categories of the most common side information: \textit{contexts} and \textit{side observations}. In this setting, an agent repeatedly chooses a set of $L$ actions after being presented with a $d$-dimensional context vector. The agent not only incurs and observes the loss of the chosen action but also observes the losses of its neighboring actions in observation structures which are encoded as a series of feedback graphs. Two algorithms are developed based on \texttt{EXP3}. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, \texttt{EXP3-LGC-U}, achieves the regret of $\mO(\sqrt{(L+\alpha(G)d)T\log{L}})$ where $\alpha(G)$ is the average \emph{independence number} of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, \texttt{EXP3-LGC-IX}, is developed for a special class of problems, for which the regret is reduced to $\mO(\sqrt{\alpha(G)dT\log{L}\log(LT)})$ for both directed and undirected feedback graphs. Part~\ref{part:II} (Chapter~\ref{chap:MFJCDPS}) studies the joint community detection and phase synchronization problem on the \textit{stochastic block model with relative phase} where each node is associated with an unknown phase angle. This problem, with a variety of real-world applications, aims to recover the cluster structure and associated phase angles simultaneously. We show this problem exhibits a \textit{multi-frequency} structure by closely examining its maximum likelihood estimation (MLE) formulation, whereas existing methods are not originated from this perspective. To this end, two simple yet efficient algorithms that leverage the MLE formulation and benefit from the information across multiple frequencies are proposed. The former is a spectral method based on the novel multi-frequency column-pivoted QR factorization. The factorization applied to the top eigenvectors of the observation matrix provides key information about the cluster structure and the associated phase angles. The second approach is an iterative multi-frequency generalized power method where each iteration updates the estimation in a matrix-multiplication-then-projection manner. Numerical experiments show that the proposed algorithms significantly improve the ability of exactly recovering the cluster structure and the accuracy of the estimated phase angles, compared to SOTA algorithms. Part~\ref{part:III} of this dissertation (Chapters~\ref{chap:VR} and~\ref{chap:ExtraFW}) focuses on acceleration methods for convex optimization problems. Chapter~\ref{chap:VR} introduces \textit{almost tune-free} stochastic variance reduced gradient (SVRG) algorithm and stochastic recursive gradient (SARAH) algorithm equipped with i) Barzilai-Borwein (BB) step sizes; ii) averaging; and, iii) the inner loop length adjusted to the BB step sizes. In particular, SVRG, SARAH, and their BB variants are first re-examined through an \textit{estimate sequence} lens to enable new averaging methods that tighten their convergence rates theoretically and improve their performance empirically when the step size or the inner loop length is chosen large. Then a simple yet effective means to adjust the number of iterations per inner loop is developed to enhance the merits of the proposed averaging schemes and BB step sizes. Chapter~\ref{chap:ExtraFW} introduces and analyzes a variant of the Frank Wolfe (FW) algorithm termed ExtraFW that has faster rate $\mO(\sfrac{1}{k^2})$ on a class of machine learning problems where $k$ is the iteration index. Compared with other parameter-free FW variants that have faster rates on the same problems, ExtraFW has improved rates and fine-grained analysis thanks to its prediction-correction update. Numerical experiments on binary classification with different sparsity-promoting constraints demonstrate that the empirical performance of ExtraFW is significantly better than FW and even faster than Nesterov’s accelerated gradient on certain datasets. For matrix completion, ExtraFW enjoys smaller optimality gap and lower rank than FW.

Graduation Semester

2024-05

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/124134

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Sequential prediction and decision making, joint community detection and phase synchronization, and acceleration methods for convex optimization

Wang, Lingda

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Log In