Special Session 125: Analysis, Algorithms, and Applications of Neural Networks

Neural network, dynamical system and formal language

Yongqiang Cai
Beijing Normal University
Peoples Rep of China
Co-Author(s):    
Abstract:
Deep learning has made significant progress in data science and natural science. Some studies have linked deep neural networks to dynamical systems, but in those works the network structure is restricted to residual networks, which are known to admit an interpretation as numerical discretizations of dynamical systems. In this talk, we consider the traditional network structure and prove that vanilla feedforward networks can also serve as numerical discretizations of dynamical systems, where the width of the network equals the input and output dimension. The proof is based on properties of the leaky-ReLU function and the numerical technique of the splitting method for solving differential equations. The results could provide a new perspective for understanding the approximation properties of feedforward neural networks. In particular, the minimum width of neural networks for universal approximation can be derived, and a relationship between mapping compositions and regular languages can be established.
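To make the splitting idea concrete, here is a minimal sketch in my own notation (an illustration of the general technique, not the speaker's exact construction). For an autonomous system
\[
\dot{x} = f(x), \quad x \in \mathbb{R}^d, \qquad
x^{(i)} \;\mapsto\; x^{(i)} + h\, f_i(x), \quad i = 1, \dots, d,
\]
a Lie–Trotter splitting step of size $h$ advances one coordinate at a time. Each sub-step changes only the $i$-th coordinate, by a map that is monotone in $x^{(i)}$ for sufficiently small $h$, and such monotone coordinate-wise maps can be approximated by compositions of width-$d$ affine plus leaky-ReLU layers. Composing the $d$ sub-steps over many time steps therefore yields a vanilla feedforward network whose width equals the dimension $d$.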

Structure-conforming Operator Learning for Geometric Inverse Problems

Ruchi Guo
Sichuan University
Peoples Rep of China
Co-Author(s):    Long Chen, Shuhao Cao, Huayi Wei
Abstract:
The principle of developing structure-conforming numerical algorithms is pervasive in scientific computing. In this work, following this principle, we propose an operator learning method for solving a class of geometric inverse problems. The architecture is inspired by Direct Sampling Methods and is closely related to convolutional networks and the Transformer, the latter being the state-of-the-art architecture for many scientific computing tasks. To obtain the optimal hyperparameters in this method, we propose a joint FEM and operator learning (OpL) training framework and a Learning-Automated FEM package. Numerical examples demonstrate that the proposed architecture outperforms many existing operator learning methods in the literature.

Neural Networks and Operators Based on Convolution and Multigrid Structure

Juncai He
The King Abdullah University of Science and Technology
Saudi Arabia
Co-Author(s):    Jinchao Xu and Xinliang Liu
Abstract:
In this talk, we will present recent results on applying multigrid structures to both neural networks and operators for problems in imaging and numerical PDEs. First, we will illustrate MgNet as a unified framework for convolutional neural networks and multigrid methods, with some preliminary theories and applications. Then, we will discuss some basic background on operator learning, including the problem setup, a uniform framework, and a general universal approximation result. Motivated by the general definition of neural operators and the MgNet structure, we propose MgNO, which utilizes multigrid structures to parameterize the linear operators between neurons, offering a new and concise architecture in operator learning. This approach provides both mathematical rigor and practical expressivity, with many interesting numerical properties and observations.
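As an illustration of how a multigrid structure can parameterize a purely linear operator inside an operator-learning layer, here is a minimal two-grid sketch. The module name, channel sizes, and component choices are my own assumptions for exposition; this is not the MgNO implementation.

```python
# A two-grid linear operator: smooth -> restrict -> coarse correction -> prolongate -> smooth.
# All components are linear (no bias, no activation), so the whole map is linear.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoGridLinearOperator(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Pre- and post-smoothers: local (convolutional) linear operators on the fine grid.
        self.pre_smooth = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.post_smooth = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        # Coarse-grid operator acting on the restricted (half-resolution) field.
        self.coarse_op = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        fine = self.pre_smooth(v)                           # smoothing on the fine grid
        coarse = F.avg_pool2d(fine, kernel_size=2)          # restriction to the coarse grid
        coarse = self.coarse_op(coarse)                     # coarse-grid correction
        up = F.interpolate(coarse, size=fine.shape[-2:],
                           mode="bilinear", align_corners=False)  # prolongation
        return self.post_smooth(fine + up)                  # add correction, post-smooth

# One such linear operator between "neuron functions", followed by a pointwise
# nonlinearity, would form a single hidden layer of the operator network.
layer = TwoGridLinearOperator(channels=8)
u = torch.randn(4, 8, 64, 64)   # batch of 4 discretized input functions
print(layer(u).shape)           # torch.Size([4, 8, 64, 64])
```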

DualFL-CS: an accelerated, inexact, and parallel coordinate descent method for federated learning

Boou Jiang
The King Abdullah University of Science and Technology
Peoples Rep of China
Co-Author(s):    Boou Jiang, Jongho Park, Jinchao Xu
Abstract:
This work presents a novel approach to Federated Learning (FL), a collaborative learning model that leverages data distributed across numerous clients. We establish a duality connection between the widely studied FL problem and the parallel subspace correction problem, leading to the development of our accelerated FL algorithm, DualFL-CS. By employing a novel randomized coordinate descent method, our algorithm effectively incorporates client sampling and allows for the use of inexact local solvers, thereby reducing computational costs in both smooth and non-smooth cases. For smooth FL problems, DualFL-CS achieves optimal linear convergence rates, while for non-smooth problems, it attains accelerated sub-linear convergence rates. Numerical experiments demonstrate the superior performance of our algorithm compared to existing state-of-the-art FL algorithms.
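To illustrate the client-sampling/block structure only (not DualFL-CS itself, which is accelerated, works on the dual problem, and admits inexact local solvers), here is a toy randomized block coordinate descent on a quadratic, where each block is owned by one client and only a sampled subset updates per round. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, block = 5, 4
n = num_clients * block

# A random strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
x = np.zeros(n)
x_star = np.linalg.solve(A, b)

for it in range(200):
    sampled = rng.choice(num_clients, size=2, replace=False)   # sample 2 of 5 clients
    for i in sampled:                                           # sequential updates within a round
        s = slice(i * block, (i + 1) * block)
        # Exact local solve on block i with all other blocks frozen.
        grad_rest = A[s, :] @ x - A[s, s] @ x[s] - b[s]
        x[s] = np.linalg.solve(A[s, s], -grad_rest)

print("distance to minimizer:", np.linalg.norm(x - x_star))
```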

A deformation-based framework for learning solution mappings of PDEs defined on varying domains

Pengzhan Jin
Peking University
Peoples Rep of China
Co-Author(s):    
Abstract:
In this work, we establish a deformation-based framework for learning solution mappings of PDEs defined on varying domains. The union of functions defined on varying domains is identified as a metric space via the deformation; the solution mapping is then regarded as a continuous metric-to-metric mapping, which can in turn be represented by a continuous metric-to-Banach mapping using two different strategies, referred to as the D2D framework and the D2E framework, respectively. We point out that such a metric-to-Banach mapping can be learned by neural operators, so the solution mapping is learned accordingly. Within this framework, a rigorous convergence analysis is built for the problem of learning solution mappings of PDEs on varying domains. Since the theoretical framework rests on several pivotal assumptions that must be verified for each specific problem, we study star domains as a typical example; other situations can be verified similarly. The framework has three important features: (1) The domains under consideration are not required to be diffeomorphic, so a wide range of regions can be covered by one model provided they are homeomorphic. (2) The deformation mapping need not be continuous, so it can be flexibly constructed by combining an identity mapping on the main body with a local deformation mapping; this makes it possible to perform inverse design for large systems in which only local parts of the shape are tuned. (3) If a linearity-preserving neural operator such as MIONet is adopted, the framework preserves the linearity of the surrogate solution mapping with respect to the source term for linear PDEs, so it can be applied to the hybrid iterative method. We finally present several numerical experiments to validate the theoretical results.
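As a small sketch of one ingredient of such a framework, the code below deforms the unit disk onto a star-shaped domain with boundary radius $R(\theta)$ and pulls a function on that domain back to the common reference disk. The choice of $R$, the function names, and the pullback routine are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def R(theta):
    # Boundary radius of one particular star-shaped domain (illustrative choice).
    return 1.0 + 0.3 * np.cos(5 * theta)

def deform(x, y):
    """Map points of the unit disk onto Omega = {rho <= R(theta)} by radial scaling."""
    theta = np.arctan2(y, x)
    return R(theta) * x, R(theta) * y

def pullback(u, x, y):
    """Represent u (defined on Omega) as a function on the reference unit disk."""
    X, Y = deform(x, y)
    return u(X, Y)

# Example: pull back u(x, y) = x**2 + y to reference coordinates.
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
mask = xs**2 + ys**2 <= 1.0                     # keep only reference-disk points
vals = pullback(lambda a, b: a**2 + b, xs, ys)
print(vals[mask][:3])
```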

Entropy-based convergence rates of greedy algorithms

Yuwen Li
Zhejiang University
Peoples Rep of China
Co-Author(s):    Yuwen Li
Abstract:
In this talk, I will present novel convergence estimates of greedy algorithms, including the reduced basis method for parametrized PDEs, the empirical interpolation method for approximating parametric functions, and the orthogonal/Chebyshev greedy algorithms for nonlinear dictionary approximation. The proposed convergence rates are all based on the metric entropy of the underlying compact sets. This talk is partially based on joint work with Jonathan Siegel.
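For readers less familiar with nonlinear dictionary approximation, here is a minimal orthogonal greedy algorithm over a finite dictionary in $\mathbb{R}^m$; the dictionary and target are random toy data, and this sketch is only meant to fix the algorithm being analyzed, not to reproduce the talk's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, steps = 50, 200, 10

D = rng.standard_normal((m, N))
D /= np.linalg.norm(D, axis=0)          # unit-norm dictionary atoms
f = rng.standard_normal(m)              # target element

selected = []
approx = np.zeros(m)
for k in range(steps):
    r = f - approx                                  # current residual
    j = int(np.argmax(np.abs(D.T @ r)))             # greedily pick the best-correlated atom
    if j not in selected:
        selected.append(j)
    # Orthogonal step: project f onto the span of all selected atoms.
    B = D[:, selected]
    coeff, *_ = np.linalg.lstsq(B, f, rcond=None)
    approx = B @ coeff
    print(f"step {k+1}: residual norm = {np.linalg.norm(f - approx):.4f}")
```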

Expressivity and Approximation Properties of Deep Neural Networks with ReLU$^k$ Activation

Tong Mao
King Abdullah University of Science and Technology
Saudi Arabia
Co-Author(s):    J. He, J. Xu
Abstract:
Deep ReLU$^k$ networks have the capability to represent higher-degree polynomials precisely. We provide a comprehensive constructive proof for polynomial representation using deep ReLU$^k$ networks. This allows us to establish upper bounds on both the network size and the number of parameters. Consequently, we are able to demonstrate a suboptimal approximation rate for functions from Sobolev spaces as well as for analytic functions. Additionally, we reveal that deep ReLU$^k$ networks can approximate functions from a range of variation spaces, extending beyond those generated solely by the ReLU$^k$ activation function. This finding demonstrates the adaptability of deep ReLU$^k$ networks in approximating functions within various variation spaces.
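As a small worked illustration of the exact polynomial representation (a standard identity, not the talk's full constructive proof): with $\sigma_k(x) = \max(0, x)^k$,
\[
\sigma_k(x) + (-1)^k\,\sigma_k(-x) = x^k \qquad \text{for all } x \in \mathbb{R},
\]
so two ReLU$^k$ neurons reproduce the monomial $x^k$ exactly. Mixed terms can then be assembled in deeper layers, e.g. $xy = \tfrac14\big((x+y)^2 - (x-y)^2\big)$ for $k = 2$, which is how higher-degree polynomials become exactly representable.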

Adaptive Growing Randomized Neural Networks for Solving Partial Differential Equations

Fei Wang
Xi'an Jiaotong University
Peoples Rep of China
Co-Author(s):    Haoning Dang and Song Jiang
Abstract:
Traditional numerical methods face numerous challenges in handling high-dimensional problems, complex domain partitioning, and error accumulation caused by time iteration. Meanwhile, neural network methods based on optimization training suffer from insufficient accuracy, slow training, and uncontrollable errors due to the lack of efficient optimization algorithms. To combine the advantages of these two approaches and overcome their shortcomings, randomized neural network methods have been proposed. These methods not only leverage the strong approximation capabilities of neural networks to circumvent the limitations of classical numerical methods, but also aim to resolve the accuracy and training-efficiency issues of neural networks. In this talk, by incorporating a posteriori error estimation as feedback, we propose Adaptive Growing Randomized Neural Networks for solving PDEs. This approach adaptively generates network structures, significantly improving the approximation capability.
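To show only the "randomized" part of such methods (the adaptive growing strategy of the talk is not reproduced), here is a minimal random-feature collocation sketch for $-u''(x) = f(x)$ on $(0,1)$ with $u(0) = u(1) = 0$: hidden weights and biases are drawn at random and frozen, and only the linear output coefficients come from a least-squares solve. Feature counts and sampling ranges are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat, n_col = 100, 200

w = rng.uniform(-10, 10, n_feat)        # random (frozen) hidden weights
b = rng.uniform(-10, 10, n_feat)        # random (frozen) hidden biases

def phi(x):                              # features tanh(w x + b), shape (len(x), n_feat)
    return np.tanh(np.outer(x, w) + b)

def phi_xx(x):                           # exact second derivative of each feature
    t = np.tanh(np.outer(x, w) + b)
    return (w**2) * (-2.0 * t * (1.0 - t**2))

# Manufactured solution u(x) = sin(pi x), so f = pi^2 sin(pi x).
x_col = np.linspace(0, 1, n_col)
f = np.pi**2 * np.sin(np.pi * x_col)

# Least-squares system: interior PDE residual rows plus two boundary rows.
A = np.vstack([-phi_xx(x_col), phi(np.array([0.0, 1.0]))])
rhs = np.concatenate([f, [0.0, 0.0]])
c, *_ = np.linalg.lstsq(A, rhs, rcond=None)

x_test = np.linspace(0, 1, 1000)
err = np.max(np.abs(phi(x_test) @ c - np.sin(np.pi * x_test)))
print("max error:", err)
```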

Optimistic Sample Size Estimate for Deep Neural Networks

Yaoyu Zhang
Shanghai Jiao Tong University
Peoples Rep of China
Co-Author(s):    
Abstract:
Estimating the sample size required for a deep neural network (DNN) to accurately fit a target function is a crucial issue in deep learning. In this talk, we introduce a novel sample size estimation method based on the phenomenon of condensation, which we term the optimistic estimate. This method quantitatively characterizes the best possible performance achievable by neural networks through condensation. Our findings suggest that increasing the width and depth of a DNN preserves its sample efficiency. However, increasing the number of unnecessary connections significantly deteriorates sample efficiency. This analysis provides theoretical support for the commonly adopted strategy in practice of expanding network width and depth rather than increasing the number of connections.