Special Session 147: From optimal control to large population games: Learning and Applications

A Mean Field Games Perspective on Evolutionary Clustering

Fabio Camilli
Univ. di Chieti Pescara
Italy
Co-Author(s):    Alessio Basti, Adriano Festa
Abstract:
We propose a control-theoretic framework for evolutionary clustering based on Mean Field Games (MFG). Unlike traditional static or heuristic approaches, we recast the problem as a population dynamics game governed by a coupled Hamilton-Jacobi-Bellman and Fokker-Planck system. Driven by a variational cost functional rather than predefined statistical shapes, this continuous-time formulation naturally accommodates non-parametric cluster evolution. To analytically validate our general framework, we analyze the specific setting of time-dependent Gaussian mixtures. We prove that the MFG dynamics explicitly recover the trajectories of the classic Expectation-Maximization algorithm, providing a rigorous generalization that guarantees mass conservation. Furthermore, we introduce time-averaged log-likelihood functionals to smooth short-term fluctuations. Numerical experiments confirm the validity of our approach in parametric contexts and pave the way for fully non-parametric clustering applications where classical EM methods are not applicable.

Vector-valued robust stochastic control with applications to finance

Igor Cialenco
Illinois Institute of Technology
USA
Co-Author(s):    Gabriela Kovacova
Abstract:
We study a dynamic stochastic control problem subject to Knightian uncertainty with multi-objective (vector-valued) criteria. Assuming the preferences across expected multi-loss vectors are represented by a given preorder, we address the model uncertainty by adopting a robust or minimax perspective, minimizing expected loss across the worst-case model. Using a set-valued framework, we derive both a weak and a strong version of the dynamic programming principle (DPP) or Bellman equations for two appropriately chosen value functions: the collection of all worst expected losses across all feasible actions, and its upper image. The weak version of Bellman's principle is proved under minimal assumptions. To establish a stronger version of the DPP, we introduce the rectangularity property with respect to a general preorder. We also show that the weak minimizers obey the time consistency property. Finally, we study the important particular case of component-wise partial order of vectors, and conclude with some illustrative examples motivated by financial problems.

Ranking Quantilized Mean-Field Games with an Application to Early-Stage Venture Investments

Dena Firoozi
University of Toronto
Canada
Co-Author(s):    Rinel Foguen Tchuendom, Michele Breton
Abstract:
Quantilized mean-field game models involve quantiles of the population's distribution. We study a class of such games with a capacity for ranking games, where the performance of each agent is evaluated based on its terminal state relative to the population's \(\alpha\)-quantile value, with \(\alpha \in [0,1]\). This evaluation criterion is designed to select the top \(100(1 - \alpha)\%\) performing agents. We then propose an application to early-stage venture investments, where a venture capital firm supports a group of startups competing over a finite horizon, aiming to identify and fund the top-performing fraction at the end of the period.

A Preconditioned Monotone Method for Price Formation Mean Field Games

Diogo Gomes
KAUST
Saudi Arabia
Co-Author(s):    Yeva Gevorgyan de Mendonca
Abstract:
We develop a monotone-operator numerical method for a time-dependent mean field game model of price formation, coupling Hamilton--Jacobi--Bellman and transport equations with a market-clearing condition. The system is formulated as a monotone inclusion in a weak sense that encodes the forward-backward structure and boundary conditions intrinsically. A cross-assignment structure, inherited from the HJB/Fokker--Planck duality, pairs the transport residual with a spatial Sobolev preconditioner, yielding a grid-independent Lipschitz constant. We prove convergence of a projected extragradient iteration, and validate the method against an explicit benchmark for the quadratic Hamiltonian.

A data-driven approach to time-dependent Hamilton--Jacobi--Bellman PDEs with high-order information

Matias Gomez-Aedo
Imperial College London
England
Co-Author(s):    D. Kalise, R.B. Vinter, P. Bettiol
Abstract:
We propose a data-driven framework for approximating value functions of Hamilton--Jacobi--Bellman (HJB) equations in the time-dependent setting, targeting finite-horizon optimal control problems governed by nonlinear control-affine dynamics. The approach exploits the structural link between the Pontryagin Maximum Principle (PMP) and dynamic programming to generate augmented datasets containing values, gradients, Hessians, and temporal derivatives of the value function. A backward temporal decomposition reduces the problem to a sequence of short-horizon subproblems, on each of which the enriched data is used within a sparse polynomial regression framework based on hyperbolic cross index sets and $\ell^2$ regularization. The inclusion of high-order spatial and temporal information enables accurate approximation of the value function and the associated feedback law across the full time horizon. We assess the methodology on nonlinear control problems of moderate dimension. The numerical results demonstrate that derivative-enriched regression combined with temporal decomposition yields stable and accurate approximations in Sobolev norms.

Learning Mean Field Games via Mean Field Actor Critic Flow

Ruimeng Hu
University of California, Santa Barbara
USA
Co-Author(s):    Mo Zhou, Haosheng Zhou
Abstract:
We introduce the Mean-Field Actor-Critic (MFAC) flow, a continuous-time learning dynamics for solving mean-field games (MFGs), drawing on ideas from reinforcement learning, generative modeling, and optimal transport. The MFAC framework jointly evolves the actor, critic, and distribution through gradient-based updates, with the distribution governed by a novel Optimal Transport Geodesic Picard (OTGP) flow. The OTGP flow drives the distribution toward equilibrium along Wasserstein-2 geodesics. We rigorously analyze the MFAC flow using Lyapunov functionals and establish global exponential convergence under suitable time scales. The analysis highlights the coupled structure of the algorithm and offers practical guidelines for choosing learning rates. Numerical results further support the theory and demonstrate the effectiveness of the proposed approach. This is joint work with Mo Zhou (UCLA) and Haosheng Zhou (UCSB).

Relative Arbitrage in an Extended Mean Field System

Tomoyuki Ichiba
University of California Santa Barbara
USA
Co-Author(s):    Nicole Tianjiao Yang
Abstract:
We consider relative arbitrage opportunities in a market with competitive investors through stochastic differential games in the limit as the number of players tends to infinity. We explore a conditional McKean-Vlasov system to study the market dynamics coupled to the expected trading volume of investors, and characterize optimal arbitrage in terms of the volatilities. Here, the mean-field interaction is considered through a joint distribution of wealth and strategies. In this setting, the optimal relative arbitrage constitutes the strong equilibrium of an extended mean-field game. We provide conditions for the existence and uniqueness of the mean-field equilibrium. We further prove the propagation of chaos result for the finite-player game counterpart, and demonstrate that the Nash equilibrium converges to the mean field equilibrium when the population grows to infinity.

Stability for BSDEs and backward propagation of chaos

Antonis Papapantoleon
TU Delft
Netherlands
Co-Author(s):    Alexandros Saplaouras & Stefanos Theodorakopoulos
Abstract:
Backward SDEs (BSDEs) are objects that arise naturally in the pricing and hedging of financial derivatives, and have excited the interest of the mathematical community because of their deep connections with stochastic optimal control, PDEs, and their many applications, e.g. in game theory or economics. In this talk, we will first motivate BSDEs and then consider BSDEs driven by general semimartingales in a unifying framework that allows us to treat discrete- and continuous-time processes simultaneously. We will discuss general existence and uniqueness results, as well as stability results, i.e. convergence from discrete- to continuous-time BSDEs. Then, we will also consider mean-field and McKean-Vlasov BSDEs, discuss their existence and uniqueness theory, and also provide a novel proof for the backward propagation of chaos, i.e. the convergence of the mean-field BSDEs to the McKean-Vlasov limit.

Inverse Reinforcement Learning for Mean-Field Games

Naci Saldi
Bilkent University
Turkey
Co-Author(s):    Naci Saldi, Berkay Anahtarci, Can Deha Kariksiz
Abstract:
This work studies the Inverse Reinforcement Learning (IRL) problem for infinite-horizon stationary Mean Field Games (MFGs) under the maximum causal entropy principle. The unknown reward function is embedded in a Reproducing Kernel Hilbert Space (RKHS), enabling the inference of rich and nonlinear reward structures directly from expert demonstrations. This approach addresses fundamental limitations of existing IRL methods that rely on linear reward models and finite-horizon settings. A Lagrangian relaxation is introduced to reformulate the IRL objective as an unconstrained log-likelihood maximization, solved via gradient ascent. Theoretical consistency is established by proving the smoothness of the log-likelihood objective through Frechet differentiability of the associated soft Bellman operators. Numerical experiments on a mean-field traffic routing game validate the effectiveness of the method, demonstrating that the learned policies successfully replicate expert behavior.

Initialization-driven neural generation and training for high-dimensional optimal control and first-order mean field games

Francisco Silva
XLIM, Universite de Limoges
France
Co-Author(s):    Mouhcine Assouli, Justina Gianatti, Badr Missaoui
Abstract:
This paper first introduces a method to approximate the value function of high-dimensional optimal control by neural networks. Based on the established relationship between Pontryagin's maximum principle (PMP) and the value function of the optimal control problem, which is characterized as being the unique solution to an associated Hamilton-Jacobi-Bellman (HJB) equation, we propose an approach that begins by using neural networks to provide a first rough estimate of the value function, which serves as initialization for solving the two-point boundary value problem in the PMP and, as a result, generates reliable data. To train the neural network we define a loss function that takes into account this dataset and also penalizes deviations from the HJB equation. In the second part, we address the computation of equilibria in first-order Mean Field Game (MFG) problems by integrating our method with the fictitious play algorithm. These equilibria are characterized by a coupled system of a first-order HJB equation and a continuity equation. To approximate the solution to the continuity equation, we introduce a second neural network that learns the flow map transporting the initial distribution of agents. This network is trained on data generated by solving the underlying ODEs for a batch of initial conditions sampled from the initial distribution of agents. By combining this flow approximation, the previously described method for approximating the value function, and the fictitious play algorithm, we obtain an effective method to tackle high-dimensional deterministic MFGs.

Optimal Matching Strategies in Two-sided Markets: A Mean Field Approach

Ho Man Tai
University of Sydney
Australia
Co-Author(s):    Erhan Bayraktar, Dantong Chu, Bohan Li
Abstract:
This paper develops a mean field game framework for dynamic two-sided matching markets, extending existing matching theory by integrating micro-macro dynamics in two-sided environments. Unlike traditional matching models focusing on static equilibrium or unilateral optimization, our framework simultaneously captures dynamic interactions and strategic behaviors of both market sides, as well as the equilibrium. We model two types of agents who meet each other via Poisson processes and make simultaneous matching decisions to maximize their respective objective functionals, and find the corresponding equilibrium. Our approach formulates the equilibrium as a fully coupled Hamilton-Jacobi-Bellman and Fokker-Planck system with nonlocal structure coupling two distinct populations. The mathematical analysis addresses significant challenges from the dual-layered coupling structure and nonlocal structure. We also provide insights into individual behaviors shaping aggregate patterns in labor markets through numerical experiments.

Vanishing viscosity limit for Hamilton-Jacobi Equations defined in the Wasserstein space

Daniela Tonon
University of Padua
Italy
Co-Author(s):    Giacomo Ceccherini Silberstein
Abstract:
We consider Hamilton-Jacobi equations on the Wasserstein space of probability measures, arising in deterministic and stochastic dynamics. We develop a unified viscosity solution framework for both first-order equations and semilinear equations with idiosyncratic noise, based on an appropriate choice of subdifferential compatible with comparison and stability properties. Within this framework, we establish a vanishing viscosity limit for semilinear Hamilton-Jacobi equations, showing convergence to the corresponding first-order equation as the noise intensity tends to zero, together with an optimal convergence rate. Our results provide a PDE-level description of the zero-noise transition.

Bridging Schrodinger and Bass: A Semimartingale Optimal Transport Problem with Diffusion Control

Nizar Touzi
New York University
USA
Co-Author(s):    P. Henry-Labordere, G. Loeper, O. Mazhar, H. Pham
Abstract:
We study a semimartingale optimal transport problem interpolating between the Schrodinger bridge and the stretched Brownian motion associated with the Bass solution of the Skorokhod embedding problem. The cost combines an entropy term on the drift with a quadratic penalization of the diffusion coefficient, leading to a stochastic control problem over drift and volatility. We establish a complete duality theory for this problem, despite the lack of coercivity in the diffusion component. In particular, we prove strong duality and dual attainment, and derive an equivalent reduced dual formulation in terms of a variational problem over terminal potentials. Optimal solutions are characterized by a coupled Schrodinger-Bass bridge system, involving a backward heat potential and a transport map given by the gradient of a beta-convex function. This system interpolates between the classical Schrodinger system and the Bass martingale transport. Our results furnish a unified framework encompassing entropic and martingale optimal transport, and yield a variational foundation for data-driven diffusion models.

Behavioral patterns, partial observability, and uncertainty in mean-field game epidemiological models

Alexander Vladimirsky
Cornell University
USA
Co-Author(s):    Finnegan Buckley and Carlos Doebeli
Abstract:
Details of human behavior are an important factor in the spread of infectious diseases. While in traditional epidemiological models most types of behavior (e.g., individuals' degree of compliance with social distancing recommendations) are pre-programmed, in Mean-Field Game (MFG) models individuals are assumed to make decisions (e.g., on their individual contact rates) rationally based on their current health status and the evolving epidemiological situation. But in most MFG epidemiological models, all players are assumed (a) to always have full information about their own health status, (b) to know in advance the planning horizon, (c) to be fully rational in planning their behavior, and (d) to be fully consistent in carrying out their plans. In this talk, we will show how each of these unrealistic assumptions can be relaxed, and how the resulting generalized MFG models can be treated numerically by solving a two-point boundary value problem for a system of approximating ODEs. Parts of this presentation will be based on joint papers with F. Buckley and C. Doebeli.

Dual Approaches to Stochastic Control via SPDEs and the Pathwise Hopf Formula

Jiefei Yang
New York University Shanghai
People's Republic of China
Co-Author(s):    Mathieu Laurière and Jiefei Yang
Abstract:
We develop dual approaches for continuous-time stochastic control problems, enabling the computation of robust dual bounds in high-dimensional state and control spaces. Building on the dual formulation proposed in [L. C. G. Rogers, SIAM Journal on Control and Optimization, 46 (2007), pp. 1116--1132], we first formulate the inner optimization problem as a stochastic partial differential equation (SPDE); the expectation of its solution yields the dual bound. Curse-of-dimensionality-free methods are proposed based on the Pontryagin maximum principle and the generalized Hopf formula. In the process, we prove the generalized Hopf formula, first introduced as a conjecture in [Y. T. Chow, J. Darbon, S. Osher, and W. Yin, Journal of Computational Physics 387 (2019), pp. 376--409], under mild conditions. Numerical experiments demonstrate that our dual approaches effectively complement primal methods, including the deep BSDE method for solving high-dimensional PDEs and the deep actor-critic method in reinforcement learning.