Special Session 37: Recent development of stochastic optimal control, applications and deep learning methods

Exponential Convergence of Relative Value Iteration in Ergodic Control Problems in Diffusions

Sumith Reddy Anugu
TU Ilmenau
Germany
Co-Author(s):    Sumith Reddy Anugu (TU Ilmenau), Guodong Pang (Rice University)
Abstract:
The relative value iteration (RVI) algorithm is a well-known method for numerically approximating the value function (and the optimal value) of ergodic cost control problems, and thereby for approximating optimal controls. However, the rate of convergence of such algorithms is less explored for infinite state spaces, as in the case of diffusions, even under strong exponential ergodicity conditions. In this talk, we present recent work establishing that a 'slightly modified version' of the RVI algorithm for diffusions converges exponentially under uniform exponential ergodicity of the diffusion. The proof uses a weighted semi-norm that takes the same value on any two functions differing by an additive constant. Under this semi-norm, the diffusion semigroup becomes a contraction, uniformly over all Markov controls. Another consequence of this choice is that it effectively 'decouples' convergence to the optimal value from convergence to the value function; we then analyze these two convergences separately.
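
The basic (unmodified) RVI iteration can be sketched on a finite controlled Markov chain. The finite state space is an assumption made only for illustration: the talk concerns diffusions and a modified scheme. Each step applies the Bellman operator $T h(x) = \min_a [c(x,a) + \sum_y P(y \mid x,a) h(y)]$ and subtracts its value at a reference state, so the subtracted offset converges to the optimal average cost. The function names are hypothetical, and a brute-force check over all stationary deterministic policies is included.

```python
import itertools
import numpy as np


def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=100_000):
    """Relative value iteration for a finite average-cost MDP (illustrative sketch).

    P: shape (A, S, S), P[a, x, y] = probability of moving from x to y under action a.
    c: shape (S, A), running cost c(x, a).
    Returns (rho, h, policy): estimated optimal average cost, relative value
    function with h[ref_state] == 0, and a greedy stationary policy.
    """
    h = np.zeros(P.shape[1])
    rho = 0.0
    for _ in range(max_iter):
        # Bellman operator: (T h)(x) = min_a [c(x, a) + sum_y P[a, x, y] h(y)]
        Th = (c + np.einsum("axy,y->xa", P, h)).min(axis=1)
        rho = Th[ref_state]      # offset removed at every iteration
        h_new = Th - rho         # relative values, pinned to 0 at ref_state
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    policy = (c + np.einsum("axy,y->xa", P, h)).argmin(axis=1)
    return rho, h, policy


def average_cost_of_policy(P, c, policy):
    """Exact long-run average cost of a stationary deterministic policy (ergodic chain)."""
    S = P.shape[1]
    P_pi = np.array([P[policy[x], x] for x in range(S)])
    c_pi = np.array([c[x, policy[x]] for x in range(S)])
    # Stationary distribution: pi (P_pi - I) = 0 and sum(pi) = 1
    A = np.vstack([P_pi.T - np.eye(S), np.ones(S)])
    b = np.concatenate([np.zeros(S), [1.0]])
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(pi @ c_pi)


rng = np.random.default_rng(0)
P = rng.random((2, 3, 3))
P /= P.sum(axis=2, keepdims=True)   # strictly positive rows: ergodic and aperiodic
c = rng.random((3, 2))
rho, h, policy = relative_value_iteration(P, c)
best = min(average_cost_of_policy(P, c, p) for p in itertools.product(range(2), repeat=3))
```

Subtracting $(Th)(x_0)$ keeps the iterates bounded; in this finite setting the span semi-norm $\max h - \min h$, which also ignores additive constants, plays a role similar to the weighted semi-norm described in the abstract.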

Deep Learning for Energy Market Contracts: Dynkin Game with Doubly RBSDEs

Ihsan Arharas
Linnaeus University
Sweden
Co-Author(s):    Nacira Agram, Giulia Pucci and Jan Rems
Abstract:
We formulate a Contract for Difference (CfD) with early exit options as a two-player zero-sum Dynkin game, capturing the strategic interaction between an electricity producer and a regulatory authority. The payoff structure includes running revenues, early termination penalties, and a terminal settlement, while the underlying electricity prices follow mean-reverting dynamics. The value of the game and the associated optimal stopping rules in feedback form are characterized through a doubly reflected backward stochastic differential equation (DRBSDE). To approximate the solution of the DRBSDE, we propose a learning-based numerical method that combines time discretization with neural network approximations of the backward components along simulated price trajectories. The approach avoids explicit discretization of the state space, accommodates time-dependent barriers, and is applicable in moderately high-dimensional settings. A convergence result is established to justify the link between the continuous-time formulation and its numerical approximation. The proposed Deep DRBSDE solver is illustrated on a CfD model driven by 24-dimensional mean-reverting electricity prices representing multiple European market zones. In addition, a symmetric benchmark Dynkin game in dimension 20 and a mean-field extension are considered to assess the validity of the solver in controlled settings. The numerical results demonstrate stable training behavior and a consistent approximation of the contract value and optimal stopping regions across the considered examples.
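
As a hedged illustration of the backward structure of a discretized DRBSDE, the sketch below replaces the neural networks of the talk with least-squares polynomial regression (a swapped-in technique, in the style of Longstaff-Schwartz) and uses a one-dimensional Ornstein-Uhlenbeck price as a stand-in for the 24-dimensional model. The running payoff, the barriers, the parameters and the function names are all hypothetical. Each step clamps the regression estimate of the continuation value between the lower barrier $L$ and the upper barrier $U$, which is where the two players' stopping decisions enter.

```python
import numpy as np


def simulate_ou(x0, kappa, mean, vol, T, n_steps, n_paths, rng):
    """Euler paths of a mean-reverting price dX = kappa (mean - X) dt + vol dW."""
    dt = T / n_steps
    X = np.empty((n_steps + 1, n_paths))
    X[0] = x0
    for k in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        X[k + 1] = X[k] + kappa * (mean - X[k]) * dt + vol * dW
    return X


def dynkin_value_regression(X, dt, running, lower, upper, terminal, degree=3):
    """Backward scheme Y_k = clip(E[Y_{k+1} | X_k] + f(t_k, X_k) dt, L(t_k, X_k), U(t_k, X_k)).

    The conditional expectation is estimated by polynomial least squares on the
    simulated paths (a stand-in for the neural networks used in the talk).
    Requires lower <= upper pointwise.
    """
    n_steps = X.shape[0] - 1
    Y = terminal(X[-1])
    for k in range(n_steps - 1, -1, -1):
        t, x = k * dt, X[k]
        if x.max() > x.min():
            cont = np.polyval(np.polyfit(x, Y, degree), x)
        else:
            cont = np.full_like(x, Y.mean())   # deterministic start: plain average
        Y = np.clip(cont + running(t, x) * dt, lower(t, x), upper(t, x))
    return float(Y.mean())


rng = np.random.default_rng(0)
X = simulate_ou(1.0, 2.0, 1.0, 0.3, 1.0, 50, 20_000, rng)
value = dynkin_value_regression(
    X, 1.0 / 50,
    running=lambda t, x: x - 1.0,               # hypothetical revenue rate
    lower=lambda t, x: np.full_like(x, -0.2),   # hypothetical lower barrier
    upper=lambda t, x: np.full_like(x, 0.2),    # hypothetical upper barrier
    terminal=lambda x: np.zeros_like(x),
)
```

Regressing on the same paths that are then evaluated introduces an in-sample bias; a separate set of evaluation paths is the usual remedy.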

Pontryagin Maximum Principle for Reflected BSDEs under State Constraints

Hanane Ben-Gherbal
University of Mohamed Khider, Biskra
Algeria
Co-Author(s):    Omar Kebiri
Abstract:
In this talk, we investigate a stochastic optimal control problem in which the performance criterion is described by a reflected backward stochastic differential equation (RBSDE) subject to a lower obstacle. The reflection mechanism introduces intrinsic nonsmoothness, which prevents the direct application of standard variational techniques. To address this difficulty, we employ a penalization method, approximating the RBSDE by a sequence of classical BSDEs augmented with penalization terms. Combining spike variation arguments with duality techniques, we establish a Pontryagin-type maximum principle for the penalized systems and then pass to the limit. In the limiting framework, we derive a novel adjoint equation that involves a singular measure concentrated on the contact set where the reflection constraint is active. This measure can be interpreted as a stochastic counterpart of a Lagrange multiplier, yielding a rigorous Hamiltonian formulation of the maximum principle under state inequality constraints. We also illustrate the theoretical results with a representative example.
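
The penalization step can be illustrated in discrete time, assuming a symmetric random-walk tree in place of Brownian motion and a hypothetical constant obstacle and driver. With $c$ the conditional expectation of the next value plus the driver term, the reflected step takes $Y = \max(L, c)$, while the penalized step solves $Y = c + n\,\Delta t\,(L - Y)^+$ implicitly. As the penalty $n$ grows, the penalized values increase towards the reflected ones. This sketch covers only the approximation of the RBSDE, not the maximum principle; the function name is hypothetical.

```python
import numpy as np


def backward_values(terminal, obstacle, driver, T, n_steps, penalty=None):
    """Backward scheme on a recombining symmetric random-walk tree.

    Node j at step k carries W = (2 j - k) sqrt(dt). With c = E[Y_{k+1} | node] + driver dt:
      reflected step (penalty=None): Y = max(L, c)
      penalized step:                Y solves Y = c + penalty * dt * (L - Y)^+ (implicitly)
    Returns the value at the root.
    """
    dt = T / n_steps
    sq = np.sqrt(dt)
    Y = terminal((2 * np.arange(n_steps + 1) - n_steps) * sq)
    for k in range(n_steps - 1, -1, -1):
        t, w = k * dt, (2 * np.arange(k + 1) - k) * sq
        c = 0.5 * (Y[1:] + Y[:-1]) + driver(t, w) * dt   # up/down moves with prob 1/2
        L = obstacle(t, w)
        if penalty is None:
            Y = np.maximum(L, c)
        else:
            a = penalty * dt
            # closed-form solution of the implicit penalized equation
            Y = np.where(c >= L, c, (c + a * L) / (1.0 + a))
    return float(Y[0])


terminal = lambda w: np.maximum(w, 0.5)          # terminal value satisfies xi >= L_T
obstacle = lambda t, w: np.full_like(w, 0.5)     # hypothetical lower obstacle
driver = lambda t, w: -np.ones_like(w)           # hypothetical constant driver
y_reflected = backward_values(terminal, obstacle, driver, 1.0, 50)
y_penalized = [backward_values(terminal, obstacle, driver, 1.0, 50, penalty=n)
               for n in (10.0, 100.0, 1000.0, 1e7)]
```

The penalized values are non-decreasing in the penalty and stay below the reflected value, mirroring the monotone convergence typically used in penalization arguments.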

Modeling Long-Memory Stochastic Dynamics in a Fractional SIRV$^{3}$S Epidemic System

Khelifa Berkane
University of BTU Cottbus-Senftenberg
Germany
Co-Author(s):    Omar Kebiri ; Abdeldjebbar Kandouci ; Carsten Hartmann ; Mhamed Eddahbi
Abstract:
We investigate a novel epidemic model, the SIRV$^{3}$S model, which incorporates three types of vaccination and introduces stochastic perturbations with a long-memory effect in one of its key parameters. This formulation allows us to express the model as a stochastic differential equation driven by fractional Brownian motion with Hurst parameter $H > \frac{1}{2}$, which we denote as FSDE. Using the Wick-Itô-Skorohod (WIS) integral framework, we establish the existence and uniqueness of a global positive solution by the random Lyapunov function method in conjunction with Itô's formula. In our numerical modelling, we examine an example based on the COVID-19 epidemic, with the objective of determining the most appropriate Hurst parameter for this setting. To this end, we first generate fractional Brownian motion using the fast Fourier transform method and then apply an Euler-type discretisation tailored to the increments of fractional Brownian motion. In these simulations, we compare three distinct Hurst parameters and identify the one that best fits our scenario.
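
The fast Fourier transform step can be sketched with the circulant-embedding (Davies-Harte) method, one common FFT-based generator; whether it is the exact variant used in the talk is an assumption, and the function names are hypothetical. The autocovariance of unit-variance fractional Gaussian noise, $\gamma(k) = \tfrac12(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})$, is embedded in a circulant matrix whose eigenvalues are obtained by one FFT; a second FFT of scaled complex Gaussian noise then yields exact samples. Scaling by $\Delta t^{H}$ and taking cumulative sums gives an fBm path on $[0,T]$.

```python
import numpy as np


def fgn_davies_harte(n, hurst, rng):
    """n samples of unit-variance fractional Gaussian noise via circulant embedding + FFT."""
    k = np.arange(n + 1)
    gamma = 0.5 * ((k + 1.0) ** (2 * hurst) - 2.0 * k ** (2 * hurst)
                   + np.abs(k - 1.0) ** (2 * hurst))
    row = np.concatenate([gamma, gamma[-2:0:-1]])   # first row of the 2n x 2n circulant
    eig = np.fft.fft(row).real                      # its eigenvalues
    if eig.min() < -1e-8:
        raise ValueError("circulant embedding is not non-negative definite")
    eig = np.clip(eig, 0.0, None)
    m = row.size
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    return np.fft.fft(np.sqrt(eig / m) * z).real[:n]


def fbm_path(n, hurst, T, rng):
    """fBm on the grid t_k = k T / n, built from scaled fGn increments."""
    increments = fgn_davies_harte(n, hurst, rng) * (T / n) ** hurst
    return np.concatenate([[0.0], np.cumsum(increments)])


rng = np.random.default_rng(1)
paths = {H: fbm_path(1024, H, 1.0, rng) for H in (0.6, 0.75, 0.9)}   # three Hurst values
```

The increments $\Delta B_k$ produced here are what an Euler-type scheme would consume. Under the Wick-Itô-Skorohod interpretation, a plain Euler step driven by these increments may require correction terms, so this sketch deliberately stops at generating the noise.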

Quadratic BSDEs Subject to Irregular Obstacle Constraints

El Hassan ES-SAKY
Cadi Ayyad University, Polydisciplinary Faculty of Safi
Morocco
Co-Author(s):    E.H. Essaky, M. Hassani, C.E. Rhazlane
Abstract:
This talk establishes the existence of a maximal (and a minimal) solution for a one-dimensional generalized reflected backward stochastic differential equation (RBSDE for short) with irregular barriers and a driver with stochastic quadratic growth. The solution $Y$ is constrained to remain above rcll barriers $L$ and $U$ on $[0, T[$, while its left limit $Y_-$ must stay above predictable barriers $l$ and $u$ on $]0, T]$. This result is obtained without any $\mathbb{P}$-integrability conditions and under weaker assumptions on the input data. In particular, we construct a maximal solution of such an RBSDE when the terminal condition $\xi$ is only $\mathcal{F}_T$-measurable and the driver $f$ is continuous, with general growth in the variable $y$ and stochastic quadratic growth in the variable $z$. Our proof is based on a generalized penalization method. Furthermore, we present a standard, equivalent formulation of the original RBSDE and characterize the solution $Y$ as the generalized Snell envelope of a specific predictable process $l$.

Financial Modeling with Stochastic Volatility: Connections to 2BSDEs and Deep Learning Methods

Omar KEBIRI
BTU Cottbus-Senftenberg
Germany
Co-Author(s):    Zaineb Mezdoud, Carsten Hartmann, Mohamed Riad Remita
Abstract:
In this talk, we present models of financial dynamics under uncertainty based on stochastic volatility frameworks. We introduce several stochastic models, with particular focus on an $\alpha$-hypergeometric model with uncertain volatility (UV), and investigate the corresponding worst-case scenario for option pricing. Our approach relies on the connection between a class of nonlinear Hamilton-Jacobi-Bellman (HJB) type partial differential equations, namely G-HJB equations, which characterize the nonlinear expectation in the UV setting, and second-order backward stochastic differential equations (2BSDEs). This framework offers an alternative that sidesteps the calibration difficulties inherent in uncertain volatility models. Using asymptotic analysis of the G-HJB equation and its equivalent 2BSDE representation, we derive a limiting model that accurately captures the worst-case pricing scenario when the volatility bounds vary slowly. Finally, the theoretical results are supported by numerical experiments that use deep learning methods to approximate the associated 2BSDE, demonstrating the effectiveness of the proposed approach.
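
The talk works with the G-HJB equation of an $\alpha$-hypergeometric model and deep 2BSDE solvers. As a much simpler stand-in, the sketch below solves the classical Black-Scholes-Barenblatt equation, the nonlinear pricing PDE of a lognormal model whose volatility is only known to lie in $[\sigma_{\min}, \sigma_{\max}]$, by explicit finite differences. Assumptions: zero interest rate, Dirichlet boundary values taken from the payoff, hypothetical parameters and function name. The seller's worst case is obtained by using $\sigma_{\max}$ where the discrete gamma is non-negative and $\sigma_{\min}$ elsewhere.

```python
import math
import numpy as np


def uv_worst_case_price(payoff, s_max, n_space, T, sigma_min, sigma_max, safety=0.9):
    """Explicit finite differences for the Black-Scholes-Barenblatt equation
        V_t + 0.5 * sigma(Gamma)^2 * S^2 * Gamma = 0,
        sigma(Gamma) = sigma_max if Gamma >= 0 else sigma_min,
    solved backward from V(T, S) = payoff(S). Boundary values stay at the payoff.
    Returns the spatial grid S and the time-0 values V.
    """
    S = np.linspace(0.0, s_max, n_space + 1)
    dS = S[1] - S[0]
    dt_max = dS ** 2 / (sigma_max ** 2 * s_max ** 2)   # stability bound of the explicit scheme
    n_time = math.ceil(T / (safety * dt_max))
    dt = T / n_time
    V = np.array(payoff(S), dtype=float)
    for _ in range(n_time):
        gamma = (V[2:] - 2.0 * V[1:-1] + V[:-2]) / dS ** 2
        sig2 = np.where(gamma >= 0, sigma_max ** 2, sigma_min ** 2)   # worst case for the seller
        V[1:-1] = V[1:-1] + dt * 0.5 * sig2 * S[1:-1] ** 2 * gamma
    return S, V


S, V_call = uv_worst_case_price(lambda s: np.maximum(s - 1.0, 0.0), 4.0, 200, 0.5, 0.1, 0.3)
butterfly = lambda s: np.maximum(s - 0.8, 0) - 2 * np.maximum(s - 1.0, 0) + np.maximum(s - 1.2, 0)
_, V_fly = uv_worst_case_price(butterfly, 4.0, 200, 0.5, 0.1, 0.3)
```

For a convex payoff such as a call, the worst-case price coincides with the Black-Scholes price at $\sigma_{\max}$; for a butterfly spread the selected volatility switches with the sign of gamma, and the worst-case price dominates the prices at either constant volatility.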

Stochastic optimal control of battery storage with SEI-driven degradation in volatile electricity markets

Wilfried Kenmoe Nzali
Weierstrass Institute for Applied Analysis and Stochastics (WIAS-Berlin)
Germany
Co-Author(s):    Christian Bayer, Doerte Kreher, Manuel Landstorfer
Abstract:
Battery energy storage systems are increasingly used in volatile electricity markets, where aggressive cycling can increase economic returns but also accelerate degradation. One important source of capacity loss is the growth of the solid electrolyte interphase (SEI) layer, which affects long-term performance. This work presents an optimal control framework that incorporates both market price uncertainty and a physics-based model of this degradation process. The model is embedded in a stochastic optimization problem that selects charge and discharge actions while protecting battery health. Numerical tests show how market volatility, the growth of the interphase layer, and operational limits influence the resulting strategies.

Filippov's Theorem for stochastic differential inclusions driven by semimartingales

Mariusz Michta
Institute of Mathematics, University of Zielona Gora
Poland
Co-Author(s):    Mariusz Michta
Abstract:
In the talk, we present a stochastic version of Filippov's Theorem and its application to the qualitative analysis of stochastic differential inclusions with respect to semimartingale integrators. Based on this result, we establish in particular the Lipschitz dependence of the solution set of the considered inclusion on the initial sets, and its continuous dependence on the multivalued operators and the integrators involved. We also provide analogous continuity properties for the attainable sets generated by the solutions of the given inclusion.

Deep gradient flow methods for PDEs and applications in finance

Antonis Papapantoleon
TU Delft
Netherlands
Co-Author(s):    Chenguang Liu; Emmanuil Georgoulis; Jasper Rou; Costas Smaragdakis
Abstract:
We develop a novel deep learning approach for pricing European options in diffusion and jump-diffusion models that can efficiently handle high-dimensional problems arising from Markovian approximations of rough volatility models or from multi-asset options. The option pricing partial differential equation is reformulated as an energy minimization problem, which is approximated in a time-stepping fashion by deep artificial neural networks. The proposed scheme respects the asymptotic behavior of option prices for large levels of moneyness and adheres to a priori known bounds for option prices. The accuracy and efficiency of the proposed method are assessed in a series of numerical examples, with particular focus on the lifted Heston model and the multivariate Merton model. Time permitting, theoretical results on the generalization error of this method will be discussed.
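
The energy-minimization reformulation can be illustrated on the one-dimensional heat equation $u_t = a\,u_{xx}$, used here as a stand-in for the pricing PDE, with zero Dirichlet boundary values as an assumption. One implicit Euler step of size $h$ is the minimizer of $\int \tfrac12 (u - u^{k})^2 + \tfrac{h a}{2} |u_x|^2 \, dx$, since its Euler-Lagrange equation is $u - u^k - h a\, u_{xx} = 0$. In the sketch, the neural network is replaced by the vector of grid values and stochastic optimization by plain gradient descent, so the result can be checked against a direct linear solve. Function names are hypothetical.

```python
import numpy as np


def energy_gradient(u, u_prev, h, a, dx):
    """Gradient of dx * sum[0.5 (u - u_prev)^2 + 0.5 h a (Du)^2] with zero boundary values."""
    padded = np.concatenate([[0.0], u, [0.0]])
    lap = (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx ** 2
    return dx * ((u - u_prev) - h * a * lap)


def gradient_flow_step(u_prev, h, a, dx, n_iter=3000):
    """One time step obtained by minimising the discrete energy with gradient descent."""
    lam_max = dx * (1.0 + 4.0 * h * a / dx ** 2)   # upper bound on the Hessian eigenvalues
    u = u_prev.copy()
    for _ in range(n_iter):
        u -= energy_gradient(u, u_prev, h, a, dx) / lam_max
    return u


def implicit_euler_step(u_prev, h, a, dx):
    """Reference: solve (I - h a Laplacian) u = u_prev directly."""
    n = u_prev.size
    lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1)) / dx ** 2
    return np.linalg.solve(np.eye(n) - h * a * lap, u_prev)


n = 49
dx = 1.0 / (n + 1)
x = dx * np.arange(1, n + 1)
u_gf = u_ie = np.maximum(x - 0.5, 0.0) * (1.0 - x)   # hypothetical initial profile
for _ in range(10):
    u_gf = gradient_flow_step(u_gf, 0.01, 0.5, dx)
    u_ie = implicit_euler_step(u_ie, 0.01, 0.5, dx)
```

The two time-stepped solutions agree to optimization accuracy, which is the property the neural-network version relies on when it minimizes the same energy over a network parametrization instead of grid values.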

Stability analysis of a branching diffusion solver for semilinear heat equations

Nicolas Privault
Nanyang Technological University
Singapore
Co-Author(s):    Qiao Huang and Nicolas Privault
Abstract:
Stochastic branching algorithms provide a useful alternative to grid-based schemes for the numerical solution of partial differential equations, particularly in high-dimensional settings. However, they require a strict control of the integrability of random functionals of branching processes in order to ensure the non-explosion of solutions. In this paper, we study the stability of a functional branching representation of PDE solutions by deriving sufficient criteria for the integrability of the multiplicative weighted progeny of stochastic branching processes. We also prove the uniqueness of mild solutions under uniform integrability assumptions on random functionals.
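
A minimal branching-diffusion estimator, assuming the textbook McKean representation for the semilinear equation $u_t = \tfrac12 u_{xx} + u^2 - u$ with $u(0,\cdot) = g$ rather than the functional representation studied in the talk: particles follow Brownian motions, branch into two at rate 1, and $u(t,x)$ is the expectation of the product of $g$ over all particles alive at time $t$. The function name and the test case are hypothetical.

```python
import numpy as np


def branching_estimate(g, t, x, n_samples, rng):
    """Monte Carlo estimate of u(t, x) for u_t = 0.5 u_xx + u^2 - u, u(0, .) = g.

    Each particle moves as a Brownian motion, lives an Exp(1) time, then splits in two.
    The estimator is the product of g over all particles alive at time t.
    """
    total = 0.0
    for _ in range(n_samples):
        weight = 1.0
        stack = [(x, t)]                       # (position, remaining time) of particles
        while stack:
            pos, remaining = stack.pop()
            life = rng.exponential(1.0)
            if life >= remaining:              # survives to the horizon: evaluate g
                weight *= g(pos + np.sqrt(remaining) * rng.standard_normal())
            else:                              # branches into two offspring
                new_pos = pos + np.sqrt(life) * rng.standard_normal()
                stack.append((new_pos, remaining - life))
                stack.append((new_pos, remaining - life))
        total += weight
    return total / n_samples


rng = np.random.default_rng(0)
estimate = branching_estimate(lambda y: 0.5, 1.0, 0.0, 20_000, rng)
# For constant g = c the PDE reduces to u' = u^2 - u, so u(t) = c e^{-t} / (1 - c + c e^{-t})
exact = 0.5 * np.exp(-1.0) / (1.0 - 0.5 + 0.5 * np.exp(-1.0))
```

When $|g| \le 1$ the product is bounded, but if $|g|$ can exceed 1 the weight may fail to be integrable as the number of particles grows; this is the kind of explosion that integrability criteria on the weighted progeny are meant to rule out.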

Asymptotic behaviors of small perturbation for path-dependent multivalued McKean-Vlasov stochastic differential equations

Huijie Qiao
Southeast University
Peoples Rep of China
Co-Author(s):    Ying Ma
Abstract:
In this talk, I will present the asymptotic behavior of path-dependent multivalued McKean-Vlasov stochastic differential equations perturbed by small noise. Specifically, we first establish a large deviation principle for such equations with non-Lipschitz coefficients via the weak convergence approach. We then introduce an auxiliary equation and use it to derive the moderate deviation principle.

Optimal Energy Management via Extended McKean-Vlasov Stochastic Control: a Lagrange Relaxation Formulation

Riccardo Saporiti
EPFL
Switzerland
Co-Author(s):    Fabio Nobile, Raul Tempone
Abstract:
In this talk, we present a framework for tackling stochastic optimal control problems of extended McKean-Vlasov type, which are characterized by scalar interactions with respect to the laws of the state and control variables. We employ a continuous-time Lagrangian relaxation that transforms the original extended McKean-Vlasov control problem into a saddle-point problem, where the inner minimization is a standard stochastic optimal control problem in the physical state space. We derive a Hamilton-Jacobi-Bellman (HJB) equation on the state space by applying the classical Bellman principle of optimality, which provides optimality conditions for the control variable. To tackle the resulting relaxed problem and the associated HJB equation, we use gradient-based methods together with semi-implicit partial differential equation solvers. As a specific application, we focus on the energy sector, in particular on the optimal scheduling of the day-ahead dispatch plan for an aggregated consumer that manages its smart grid. An innovative neural-network-based Markovian projection is proposed to reduce the dimensionality of the state space by condensing the uncontrolled residual consumption into a single stochastic process. Numerical results validate the proposed framework.
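
The mechanics of the Lagrangian relaxation can be seen in a static, one-period toy problem, with a quadratic cost and a quadratic penalty on the mean control chosen purely for illustration (the talk treats a dynamic problem whose inner step is an HJB equation). The law-dependent term $\tfrac{\kappa}{2}(\mathbb{E}[u])^2$ is decoupled by introducing an auxiliary mean $m$ and a multiplier $\lambda$ for the constraint $\mathbb{E}[u] = m$; the inner minimization is then pointwise in the state, and $\lambda$ is updated by gradient ascent on the dual function. Names and parameters are hypothetical.

```python
import numpy as np


def lagrangian_relaxation(x_samples, kappa, step=0.5, n_iter=200):
    """Dual ascent for  min_u E[0.5 (u - X)^2] + 0.5 kappa (E[u])^2.

    Relaxed problem: min_{u, m} E[0.5 (u - X)^2] + 0.5 kappa m^2 + lam (E[u] - m),
    maximised over the multiplier lam.
    """
    lam = 0.0
    for _ in range(n_iter):
        u = x_samples - lam            # inner problem in u: pointwise in the state X
        m = lam / kappa                # inner problem in the auxiliary mean m
        lam += step * (u.mean() - m)   # dual gradient = constraint violation E[u] - m
    return lam, u, m


rng = np.random.default_rng(0)
x = rng.normal(3.0, 1.0, size=10_000)      # hypothetical samples of the state X
lam, u, m = lagrangian_relaxation(x, kappa=2.0)
# Closed form for this toy problem: lam* = kappa E[X] / (1 + kappa)
```

At the saddle point the relaxed constraint holds, $\mathbb{E}[u] = m$, and the multiplier recovers the closed-form optimum of the original law-dependent problem.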

Generative Market Equilibrium Models with Stable Adversarial Learning via Reinforcement Link

Xiaofei Shi
University of Toronto
Canada
Co-Author(s):    Anastasis Kratsios, Qiang Sun, Zhanhao Zhang
Abstract:
We present a general computational framework for solving continuous-time financial market equilibria under minimal modeling assumptions while incorporating realistic financial frictions, such as trading costs, and supporting multiple interacting agents. Inspired by generative adversarial networks (GANs), our approach employs a novel generative deep reinforcement learning framework with a decoupling feedback system embedded in the adversarial training loop, which we term the reinforcement link. This architecture stabilizes the training dynamics by incorporating feedback from the discriminator. Our theoretically guided feedback mechanism enables the decoupling of the equilibrium system, overcoming challenges that hinder conventional numerical algorithms. Experimentally, our algorithm not only learns the equilibrium but also provides testable predictions on how asset returns and volatilities emerge from the endogenous trading behavior of market participants, where traditional analytical methods fall short. The design of our model is further supported by an approximation guarantee.

A two-step stochastic model of anaerobic digestion and opportunities for biogas production

Calvin Tadmon
University of Dschang
Cameroon
Co-Author(s):    
Abstract:
In this talk, we consider the so-called deterministic anaerobic digestion model number 2 (AM2), derive its stochastic version by perturbing the maximal growth rate of acidogenic microorganisms with white noise, and investigate the global dynamics of the resulting stochastic model. Relying mainly on Itô's formula, combined with other tools from stochastic analysis, we first prove the existence and uniqueness of a global positive strong solution of the stochastic model. We then derive conditions for the persistence and extinction of the microorganisms. Finally, we illustrate the theoretical results with numerical simulations. In some cases, the results clarify the conditions under which biogas can be produced.
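
How a white-noise perturbation of the maximal growth rate propagates through the system can be sketched with an Euler-Maruyama scheme. The AM2-type equations below (Monod kinetics for acidogenesis, Haldane kinetics for methanogenesis) and all parameter values are assumptions for illustration, not the calibrated model of the talk, and the function name is hypothetical. Replacing $m_1\,dt$ by $m_1\,dt + \sigma\,dW_t$ adds a noise term to every equation in which the acidogenic growth rate $\mu_1$ appears.

```python
import numpy as np


def simulate_stochastic_am2(y0, p, sigma, T, n_steps, rng):
    """Euler-Maruyama for an AM2-type model with m1 dt replaced by m1 dt + sigma dW.

    State y = (X1, S1, X2, S2): acidogenic and methanogenic biomass and their substrates.
    """
    dt = T / n_steps
    path = np.empty((n_steps + 1, 4))
    path[0] = y0
    for k in range(n_steps):
        x1, s1, x2, s2 = path[k]
        g1 = s1 / (p["K1"] + s1)                                # Monod shape, mu1 = m1 * g1
        mu1 = p["m1"] * g1
        mu2 = p["m2"] * s2 / (p["K2"] + s2 + s2 ** 2 / p["KI"])  # Haldane kinetics
        drift = np.array([
            (mu1 - p["alpha"] * p["D"]) * x1,
            p["D"] * (p["S1in"] - s1) - p["k1"] * mu1 * x1,
            (mu2 - p["alpha"] * p["D"]) * x2,
            p["D"] * (p["S2in"] - s2) + p["k2"] * mu1 * x1 - p["k3"] * mu2 * x2,
        ])
        # noise enters wherever mu1 appears, with coefficient sigma * g1
        noise = sigma * g1 * np.array([x1, -p["k1"] * x1, 0.0, p["k2"] * x1])
        path[k + 1] = path[k] + drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
    return path


params = dict(m1=1.2, K1=7.1, m2=0.74, K2=9.28, KI=256.0, alpha=0.5, D=0.4,
              k1=42.14, k2=116.5, k3=268.0, S1in=5.0, S2in=80.0)   # illustrative values only
rng = np.random.default_rng(0)
y0 = np.array([0.5, 1.0, 0.5, 50.0])
deterministic = simulate_stochastic_am2(y0, params, 0.0, 20.0, 20_000, rng)
stochastic = simulate_stochastic_am2(y0, params, 0.1, 20.0, 20_000, rng)
```

Unlike the continuous model, for which the talk proves the existence of a global positive solution, an Euler-Maruyama discretization is not guaranteed to stay positive, so small time steps are advisable.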

Optimal Market-Making with Hawkes Process: A Markovian Approximation Approach via Mercer's Expansion

Alex Tse
University College London
England
Co-Author(s):    Nicholas Martin
Abstract:
We study an optimal market-making problem in which the order flow and liquidity level are driven by self-exciting Hawkes processes. To overcome the challenge posed by the non-Markovian structure of the problem, we propose a Markovian lifting approach in which the Hawkes kernels are approximated by their truncated Mercer's expansions. This enables dynamic programming and, in turn, computationally feasible procedures for solving the market-making problem. The theoretical convergence of the approximate solution to the true value function is proven. Our numerical findings show that ignoring persistence in the order flow and liquidity level underestimates adverse selection risk, whereas explicitly modelling them improves the robustness and profitability of the market-making strategies.
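
To illustrate a truncated Mercer expansion, the sketch below computes a Nyström approximation of the eigendecomposition of the symmetrized kernel $k(s,t) = \phi(|t-s|)$ on $[0,T]$, using a hypothetical power-law kernel $\phi(d) = (1+d)^{-1.5}$, which is positive definite because it is convex and decreasing to zero (Pólya's criterion). Since $k(s,t) \approx \sum_{i \le M} \lambda_i e_i(s) e_i(t)$, the excitation $\int_0^t \phi(t-s)\,dN_s$ is approximately $\sum_{i \le M} \lambda_i e_i(t) Y_i(t)$ with $Y_i(t) = \int_0^t e_i(s)\,dN_s$, so $(t, Y_1, \dots, Y_M)$ is a finite-dimensional Markovian state. This is a generic construction; the abstract does not specify how the talk organizes the lifting in detail, and the function name is hypothetical.

```python
import numpy as np


def mercer_truncation(kernel, T, n_grid, n_modes):
    """Nystrom approximation of the Mercer expansion of k(s, t) = kernel(|t - s|) on [0, T].

    Returns grid t, leading eigenvalues lam, eigenfunctions phi (columns, L2-normalised),
    the kernel matrix K and its rank-n_modes reconstruction sum_i lam_i phi_i(s) phi_i(t).
    """
    h = T / n_grid
    t = (np.arange(n_grid) + 0.5) * h                  # midpoint quadrature nodes
    K = kernel(np.abs(t[:, None] - t[None, :]))
    eigval, eigvec = np.linalg.eigh(h * K)            # discretised integral operator
    order = np.argsort(eigval)[::-1]                  # largest eigenvalues first
    lam = eigval[order][:n_modes]
    phi = eigvec[:, order][:, :n_modes] / np.sqrt(h)  # so that h * sum(phi_i^2) = 1
    K_approx = (phi * lam) @ phi.T
    return t, lam, phi, K, K_approx


kernel = lambda d: (1.0 + d) ** -1.5                  # hypothetical power-law Hawkes kernel
t, lam, phi, K, K5 = mercer_truncation(kernel, 5.0, 400, 5)
*_, K20 = mercer_truncation(kernel, 5.0, 400, 20)
```

The reconstruction error decreases as more modes are kept, which is the trade-off between state dimension and accuracy that a truncated lifting has to balance.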

Optimality and Robustness in Path-Planning Under Initial Uncertainty

Alexander Vladimirsky
Cornell University
USA
Co-Author(s):    Dongping Qi and Adam Dhillon
Abstract:
Classical deterministic optimal control problems assume full information about the controlled process. The theory of control for general partially observable processes is powerful, but its methods are computationally expensive and typically address problems with stochastic dynamics and continuous (not directly observed) stochastic perturbations. In this presentation, we focus on path-planning problems that lie in between: deterministic, but with initial uncertainty about either the target or the running cost on parts of the domain. That uncertainty is removed at some later time T, and the goal is to choose an optimal trajectory until then. We address this challenge for three models of information acquisition: a fixed T, a discretely distributed random T, and an exponentially distributed random T. We develop models and numerical methods suitable for multiple notions of optimality: average-case performance, worst-case performance, average performance constrained by the worst case, average performance with probabilistic constraints on bad outcomes, risk-sensitivity, and distributional robustness. We illustrate our approach with examples of pursuing random targets identified at a (possibly random) later time T.
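
A toy version of the fixed-T case shows how the notion of optimality changes the plan. It assumes planar motion at unit speed, a finite set of candidate targets with known probabilities, and a total cost equal to T plus the straight-line distance from the position reached at time T to the revealed target. The only decision is then the position $y$ at time T, restricted to the disc of radius T around the start, and a grid search compares the average-case and worst-case choices. All names and numbers are hypothetical.

```python
import numpy as np


def plan_until_reveal(start, targets, probs, T, n_grid=401):
    """Grid search for the position y occupied at the reveal time T (unit speed, planar).

    Cost of target i: T + |y - target_i|. Returns (y_avg, avg_cost, y_worst, worst_cost).
    """
    g = np.linspace(-T, T, n_grid)
    gx, gy = np.meshgrid(g, g)
    pts = np.column_stack([gx.ravel(), gy.ravel()]) + start
    pts = pts[np.linalg.norm(pts - start, axis=1) <= T + 1e-12]          # reachable by time T
    dist = np.linalg.norm(pts[:, None, :] - targets[None, :, :], axis=2)  # (points, targets)
    avg = T + dist @ probs
    worst = T + dist.max(axis=1)
    return pts[avg.argmin()], avg.min(), pts[worst.argmin()], worst.min()


start = np.array([0.0, 0.0])
targets = np.array([[5.0, 1.0], [5.0, -1.0]])   # hypothetical candidate targets
probs = np.array([0.9, 0.1])
y_avg, c_avg, y_wc, c_wc = plan_until_reveal(start, targets, probs, T=2.0)
```

With two targets, one much more likely than the other, the worst-case plan heads towards the point between them, while the average-case plan leans towards the likely target.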

Time-Inconsistent Stochastic Optimal Control Problems in Infinite Time Horizon

Qingmeng Wei
Northeast Normal University
Peoples Rep of China
Co-Author(s):    Jiongmin Yong
Abstract:
This talk is concerned with a time-inconsistent stochastic optimal control problem in an infinite time horizon with a non-degenerate diffusion in the state equation. A major assumption is that people become rational after a sufficiently large time. Under this condition, the problem is decomposed into two parts: a non-autonomous time-consistent problem in an infinite time horizon and a time-inconsistent problem in an infinite time horizon. An equilibrium strategy is then constructed. Both Bolza-type problems and recursive cost problems are considered.

Continuous-time reinforcement learning for optimal switching over multiple regimes

Xiang Yu
The Hong Kong Polytechnic University
Hong Kong
Co-Author(s):    Yijie Huang, Mengge Li, Zhou Zhou
Abstract:
This paper studies continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We consider an exploratory formulation under entropy regularization in which the agent randomizes both the timing of switches and the selection of regimes through the generator matrix of an associated continuous-time finite-state Markov chain. We establish the well-posedness of the associated system of Hamilton-Jacobi-Bellman (HJB) equations and provide a characterization of the optimal policy. Policy improvement and the convergence of the policy iterations are rigorously established by analyzing the system of equations. We also show that the value function in the exploratory formulation converges to the value function in the classical formulation as the temperature parameter vanishes. Finally, a reinforcement learning algorithm is devised and implemented using policy evaluation based on the martingale characterization. Our numerical examples, implemented with neural networks, illustrate the effectiveness of the proposed RL algorithm.
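
The effect of entropy regularization on switching can be illustrated with a discrete-time, discounted analogue. This is an assumption: the talk works in continuous time and randomizes through the generator matrix of a Markov chain. In regime $i$ the agent collects $r_i$ per step and then chooses the next regime $j$ at cost $c_{ij}$; with temperature $\tau > 0$ the hard maximum in the Bellman equation becomes a log-sum-exp, and the optimal policy is the Gibbs distribution $\pi(j \mid i) \propto \exp((V_j - c_{ij})/\tau)$. Names and parameters are hypothetical.

```python
import numpy as np


def soft_switching_values(r, c, beta, tau, tol=1e-12, max_iter=100_000):
    """Discounted regime switching with entropy regularisation (discrete-time analogue).

    V_i = r_i + beta * softmax_tau_j (V_j - c_ij), where softmax_tau is tau * log-sum-exp(. / tau)
    and tau = 0 gives the classical hard maximum. Returns V and the switching policy pi[i, j].
    """
    V = np.zeros(r.size)
    for _ in range(max_iter):
        A = V[None, :] - c                          # A[i, j] = V_j - c_ij
        a_max = A.max(axis=1)
        if tau > 0:
            soft = a_max + tau * np.log(np.exp((A - a_max[:, None]) / tau).sum(axis=1))
        else:
            soft = a_max
        V_new = r + beta * soft
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    A = V[None, :] - c
    if tau > 0:
        w = np.exp((A - A.max(axis=1, keepdims=True)) / tau)
        policy = w / w.sum(axis=1, keepdims=True)   # Gibbs distribution over next regimes
    else:
        policy = np.eye(r.size)[A.argmax(axis=1)]   # deterministic best switch
    return V, policy


r = np.array([1.0, 0.2, 0.5])     # hypothetical per-step rewards of the regimes
c = 0.3 * (1.0 - np.eye(3))       # hypothetical switching costs, zero for staying
V0, _ = soft_switching_values(r, c, beta=0.9, tau=0.0)
V1, pi1 = soft_switching_values(r, c, beta=0.9, tau=0.1)
```

As $\tau \to 0$ the regularized values approach the classical ones; in this discrete setting $0 \le V^\tau - V^0 \le \beta \tau \log d / (1-\beta)$ with $d$ regimes, a simple counterpart of the vanishing-temperature convergence stated in the abstract.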

Stochastic Differential Games with Random Coefficients and Stochastic Hamilton-Jacobi-Bellman-Isaacs Equations

Jing Zhang
Fudan University
Peoples Rep of China
Co-Author(s):    
Abstract:
We study a class of zero-sum two-player stochastic differential games with controlled stochastic differential equations and payoff/cost functionals of recursive type. In contrast to the pioneering work by Fleming and Souganidis [Indiana Univ. Math. J., 38 (1989), pp. 293-314] and the seminal work by Buckdahn and Li [SIAM J. Control Optim., 47 (2008), pp. 444-475], the coefficients involved may be random, which goes beyond the Markovian framework and leads to random upper and lower value functions. We first prove the dynamic programming principle for the game; then, under standard Lipschitz continuity assumptions on the coefficients, the upper and lower value functions are shown to be viscosity solutions of the upper and lower fully nonlinear stochastic Hamilton-Jacobi-Bellman-Isaacs (HJBI) equations, respectively. A stability property of viscosity solutions is also proved. Under certain additional regularity assumptions on the diffusion coefficient, the uniqueness of the viscosity solution is addressed as well.

Second-order PDEs on Wasserstein Space

Xin Zhang
NYU
USA
Co-Author(s):    Erhan Bayraktar, Ibrahim Ekren, and Xihao He
Abstract:
Mean-field control with common noise and filtering problems naturally lead to second-order PDEs on Wasserstein space. In this talk, we analyze a class of such equations in which the second-order operator is finite-dimensional in nature. We establish comparison principles and apply them to obtain particle convergence rates in mean-field control. The talk is based on joint work with Erhan Bayraktar, Ibrahim Ekren, and Xihao He.

Long-Term Average Impulse Control with Mean Field Interactions

Chao Zhu
University of Wisconsin-Milwaukee
USA
Co-Author(s):    
Abstract:
This paper analyzes and explicitly solves a class of long-term average impulse control problems with a specific mean-field interaction. The underlying process is a general one-dimensional diffusion with appropriate boundary behavior. The model is motivated by applications such as the optimal long-term management of renewable natural resources and financial portfolio management. Each individual agent seeks to maximize her long-term average reward, which consists of a running reward and income from discrete impulses, where the unit intervention price depends on the market through a stationary supply rate, the specific mean-field variable under consideration. In a competitive market setting, we establish the existence of, and explicitly characterize, an equilibrium strategy within a large class of policies under mild conditions. Additionally, we formulate and solve the mean-field control problem, in which agents cooperate with each other, aiming to realize a common maximal long-term average profit. To illustrate the theoretical results, we examine a stochastic logistic growth model and a population growth model in a stochastic environment with impulse control.