| Abstract: |
| Bayesian data assimilation (BDA) combines partial observations with physics-informed dynamical models to estimate latent system states. Filter-based BDA uses past and present data, enabling online forecasting, while smoothing additionally incorporates future observations for hindcasting and reanalysis. However, smoothing typically requires storing the full filter history and inverting filter statistics (e.g., the covariance matrix), which is prohibitive in high-dimensional settings. In this work, we introduce a framework that leverages the score-based structure of a backward diffusion flow whose marginal law at any time matches the smoother posterior distribution. This backward-in-time stochastic differential equation generates samples of the unobserved states consistent with the smoother distribution. For a broad class of nonlinear systems, this equation admits a closed-form analytical expression, while for more general systems a Gaussian statistical approximation can be obtained via an ensemble Kalman-Bucy smoother. These formulations enable training a neural network via score matching, alleviating the storage and computational costs of traditional smoother-based BDA. The trained score network can also generalize to unseen states, potentially enabling forward-in-time extrapolation. The developed methodology is demonstrated on complex high-dimensional systems with dense covariance structures arising from multiplicative and cross-correlated noise and strongly nonlinear dynamics, achieving accurate smoothing while reducing memory and computational costs. |
|