Special Session 177: Innovations in Data Assimilation: Theory, Algorithms, and Application

DASSH -- Diffusion-Accelerated Smoothing using Score-based Heuristics

Marios Andreou
University of Wisconsin-Madison
USA
Co-Author(s):    Nan Chen, Daniele Venturi
Abstract:
Bayesian data assimilation (BDA) combines partial observations with physics-informed dynamical models to estimate latent system states. Filter-based BDA uses past and present data, enabling online forecasting, while smoothing additionally incorporates future observations for hindcasting and reanalysis. However, smoothing typically requires storing the full filter history and inverting filter statistics (e.g., the covariance matrix), which is prohibitive in high-dimensional settings. In this work, we introduce a framework that leverages the score-based structure of a backward diffusion flow whose marginal law at any time matches the smoother posterior distribution. This backward-in-time stochastic differential equation generates samples of the unobserved states consistent with the smoother distribution. For a broad class of nonlinear systems, this equation admits a closed-form analytical expression, while for more general systems, a Gaussian statistical approximation can be obtained via an ensemble Kalman-Bucy smoother. These formulations enable training a neural network via score matching, alleviating the storage and computational costs of traditional smoother-based BDA. The trained score network can also generalize to unseen states, potentially enabling forward-in-time extrapolation. The methodology is demonstrated on complex high-dimensional systems with strongly nonlinear dynamics and dense covariance structures arising from multiplicative and cross-correlated noise, achieving accurate smoothing while reducing memory and computational costs.

Learning probabilistic filters for data assimilation

Eviatar Bach
University of Reading
England
Co-Author(s):    Ricardo Baptista, Jochen Broecker, Edoardo Calvello, Bohan Chen, Andrew Stuart
Abstract:
Filtering, the problem of estimating the probability distribution of a system's states given partial and noisy observations, is generally intractable for high-dimensional, nonlinear systems. The ensemble Kalman filter (EnKF) approximates the filtering distribution with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. The EnKF is robust, but the Gaussian ansatz limits accuracy. We address this shortcoming by using machine learning to map the forecast distribution and observation to the filtering distribution. We propose cost functions that are minimized uniquely at the true filtering distribution. By time-averaging over long trajectories in ergodic dynamical systems, the map can be learned and subsequently used for future filtering; this is a form of amortized Bayesian inference. We focus on learning ensemble-based filters within a mean field framework. We demonstrate the approach using a set transformer neural architecture, which is invariant to ensemble permutations. The learned filtering algorithms outperform state-of-the-art methods for filtering chaotic systems. They also perform well in challenging highly non-Gaussian and multimodal problems where the EnKF fails. Once learned at a given ensemble size, the learned map can be applied to other ensemble sizes with minimal fine-tuning.
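
For readers unfamiliar with the baseline, the Gaussian ansatz behind the EnKF analysis step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' learned filter: the state is a scalar observed directly, the perturbed-observation variant of the EnKF is used, and the function name `enkf_analysis` and all numerical values are hypothetical.

```python
import random
import statistics


def enkf_analysis(ensemble, y_obs, obs_var, rng):
    """Perturbed-observation EnKF update for a scalar state observed directly.

    Assumption: observation model y = x + noise, with noise variance obs_var.
    """
    prior_var = statistics.variance(ensemble)   # sample variance of the forecast
    gain = prior_var / (prior_var + obs_var)    # scalar Kalman gain
    noise_sd = obs_var ** 0.5
    # Each member is pulled toward its own noisy copy of the observation.
    return [x + gain * (y_obs + rng.gauss(0.0, noise_sd) - x) for x in ensemble]


rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(200)]   # hypothetical forecast ensemble
posterior = enkf_analysis(prior, y_obs=1.0, obs_var=0.25, rng=rng)
```

Because the gain depends only on the first two moments of the ensemble, this update cannot represent multimodal or strongly non-Gaussian posteriors, which is the limitation the abstract sets out to address.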

Error analysis of proper orthogonal decomposition data assimilation schemes for the Navier-Stokes equations with grad-div stabilization

Julia Novo
Universidad Autónoma de Madrid
Spain
Co-Author(s):    Bosco García-Archilla, Samuele Rubino
Abstract:
The error analysis of a proper orthogonal decomposition (POD) data assimilation (DA) scheme for the Navier-Stokes equations is carried out. A grad-div stabilization term is added to the formulation of the POD method. Error bounds with constants independent of inverse powers of the viscosity parameter are derived for the POD algorithm. No upper bounds on the nudging parameter of the data assimilation method are required. Numerical experiments show that, for large values of the nudging parameter, the proposed method rapidly converges to the true solution and greatly improves the overall accuracy of standard POD schemes up to low viscosities over predictive time intervals.

Continuous Data Assimilation with Learned Surrogate Dynamics

Daniel Sanz-Alonso
University of Chicago
USA
Co-Author(s):    
Abstract:
Continuous data assimilation concerns estimating the state of a dynamical system from partial measurements. In many applications, the state dynamics are unknown or too expensive to simulate at the desired resolution, leading to model error. Motivated by this challenge and the increasing use of machine learning surrogates in data assimilation, this talk presents a unified analysis of nudging algorithms that employ learned surrogate models of the dynamics. We first establish general conditions on the dynamics and measurements under which nudging with the true dynamics model achieves accurate tracking, both in the noise-free setting and under noisy measurements. We then show that nudging algorithms that employ surrogate models retain exponential convergence up to an explicit error floor that quantifies the effects of surrogate approximation error and measurement noise. Finally, we analyze surrogate models constructed by learning either the vector field governing the dynamics or the system's solution map over a short time step. Our results quantify the amount of training data required for accurate nudging with these learned surrogate models. Numerical experiments illustrate and support the theory.
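
The qualitative picture in this abstract (exponential convergence with the true model, an error floor with a surrogate) can be illustrated with a minimal Python sketch. Everything here is an assumption chosen for illustration and is not taken from the talk: the truth is a harmonic oscillator whose first component is observed without noise, time stepping is forward Euler, and a hand-perturbed coefficient `model_rate` stands in for a learned surrogate. The function name `nudge` and the parameter values are hypothetical.

```python
import math


def nudge(model_rate, mu, dt=0.01, steps=2000):
    """Nudge a model toward observations of x1 and return the largest
    state error seen over the final quarter of the run.

    Truth:  x1' = x2,  x2' = -x1                (harmonic oscillator)
    Model:  y1' = y2 - mu * (y1 - x1_observed),  y2' = -model_rate * y1
    model_rate = 1.0 reproduces the true dynamics; any other value plays
    the role of surrogate approximation error.
    """
    x = [1.0, 0.0]   # true state
    y = [0.0, 0.0]   # nudged estimate, started from a wrong initial condition
    tail_err = 0.0
    for n in range(steps):
        obs = x[0]   # noise-free observation of the first component
        x = [x[0] + dt * x[1], x[1] - dt * x[0]]
        y = [y[0] + dt * (y[1] - mu * (y[0] - obs)),
             y[1] - dt * model_rate * y[0]]
        if n >= 3 * steps // 4:
            tail_err = max(tail_err, math.hypot(y[0] - x[0], y[1] - x[1]))
    return tail_err


err_true_model = nudge(model_rate=1.0, mu=2.0)   # keeps decaying toward zero
err_surrogate = nudge(model_rate=1.1, mu=2.0)    # settles at a nonzero floor
```

In this toy problem the error with the true model decays exponentially, while the perturbed model tracks the truth only up to a floor whose size scales with the model mismatch, mirroring the structure of the result described above.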

Spectral Viscosity with Continuous Data Assimilation: Model Mismatch

Nicholas White
University of Nebraska-Lincoln
USA
Co-Author(s):    Adam Larios
Abstract:
Continuous Data Assimilation aims to reconstruct the solution of a dissipative system given partial observations of large-scale data. Spectral viscosity, introduced by Tadmor (1989), applies dissipation only on small scales in order to resolve numerical simulations. In this work, we discuss the use of the Continuous Data Assimilation algorithm on the 2D Navier-Stokes equations with only spectral viscosity. We also discuss the consequences of a mismatch in dissipation strength between the observed and simulated solutions.

System Identification via Optimization in Data Assimilation

Jared P Whitehead
Brigham Young University
USA
Co-Author(s):    
Abstract:
Classical system identification techniques suffer from the need for full observations of the system in question. If, instead, we couple the observations with data assimilation and then minimize the observed error over the unknown coefficients or parameters of the proposed model, we are able to reconstruct unknown systems from partial observations. Standard optimization routines require gradients of the system, which can be obtained via adjoint methods. Alternatively, as we demonstrate, when the unknown parameters lie in the observed space of the proposed model, we can use an asymptotic representation of the sensitivities, yielding an efficient method for parameter, and hence system, identification in both the linear and nonlinear settings.