Special Session 151: Encounter and Merging of Mesh-based Methods and Meshless Methods in the Era of Machine Learning

Transformer: structure conforming operator learning
Ruchi Guo
Sichuan University
People's Republic of China
Co-Author(s): Shuhao Cao, Long Chen, and Ruchi Guo
Abstract:
The Transformer has emerged as one of the most advanced neural network architectures, with wide applications in large language models (LLMs), AI for Science, and image/video processing. Despite its success, its mathematical foundations remain largely open. This research presents our recent progress toward addressing this gap, structured in two parts. First, we introduce a new perspective based on Petrov-Galerkin projection and Fourier analysis to better interpret the attention mechanism. Building on this framework, we propose a modified Transformer architecture that admits a clearer mathematical interpretation and exhibits a frequency-bootstrapping property. Second, drawing inspiration from direct sampling methods (DSMs) for inverse problems, we develop a novel feature generation approach: data features are constructed by solving PDEs and then incorporated into the attention mechanism. Once a learnable nonlocal kernel is embedded, the DSM is naturally reformulated as this modified attention mechanism. We demonstrate the proposed method on electrical impedance tomography (EIT), a prototypical severely ill-posed nonlinear inverse problem, where it achieves higher accuracy than its predecessors and contemporary operator learners.
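
To make the projection viewpoint concrete, the following is a minimal sketch, not the modified architecture described in the abstract. It assumes a softmax-free attention of the form Q (K^T V) / n, in which the columns of K and V are read as functions sampled at n grid points, so that K^T V / n approximates their pairwise inner products and Q combines the resulting coefficients. The function name, the 1/n scaling, and the random inputs are illustrative assumptions.

```python
import numpy as np

def galerkin_type_attention(Q, K, V):
    """Softmax-free attention Q (K^T V) / n (illustrative assumption).

    Q, K, V have shape (n, d): n sample points, d feature channels.
    """
    n = Q.shape[0]
    # d x d matrix whose (i, j) entry is a quadrature-like approximation
    # of the inner product between column i of K and column j of V.
    gram = K.T @ V / n
    # Each output column is a linear combination of the columns of Q,
    # with coefficients given by the corresponding column of `gram`.
    return Q @ gram

rng = np.random.default_rng(0)
n, d = 64, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

out = galerkin_type_attention(Q, K, V)

# By associativity this equals the "attention matrix first" ordering,
# but it avoids forming the n x n matrix Q K^T.
reference = (Q @ K.T) @ V / n
```

Computing K^T V first costs O(n d^2) operations rather than the O(n^2 d) needed when the n x n matrix Q K^T is formed explicitly, which matters when n is the number of grid points in a discretized PDE domain.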