Special Session 16: Recent Development of Stochastic Optimal Control and Differential Games

Deep learning methods to solve some stochastic optimal control problems

Omar Kebiri
BTU Cottbus-Senftenberg
Germany
Co-Author(s):    
Abstract:
In this talk, I will present deep learning methods to solve some stochastic optimal control (SOC) problems. The first SOC problem is an application to initial path optimization of mean-field systems with memory, where we consider the problem of finding the optimal initial investment strategy for a system modeled by a linear McKean-Vlasov (mean-field) stochastic differential equation with positive delay, driven by a Brownian motion and a pure-jump Poisson random measure. The problem is to find the optimal initial values for the system during this period, before the system starts at t = 0. Because of the delay in the dynamics, the system will be influenced by these initial investment values after startup. It is known that linear stochastic delay differential equations are equivalent to stochastic Volterra integral equations. By using this equivalence, we can find an implicit expression for the optimal investment, and we use deep learning algorithms to solve some examples explicitly. The second type of dynamics is a second-order backward stochastic differential equation (2BSDE), which represents a fully nonlinear second-order PDE. As an application, we study the alpha-hypergeometric model with uncertain volatility (UV), for which we derive a worst-case scenario for option pricing. The approach is based on the connection between a certain class of nonlinear partial differential equations of HJB type (G-HJB equations), which govern the nonlinear expectation of the UV model and provide an alternative to the difficult model calibration problem of UV models, and 2BSDEs. Using a deep learning-based approximation of the underlying 2BSDE, we can find the solution of our problem.
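The deep BSDE approach underlying both parts of the talk can be illustrated on a toy problem. The sketch below is a minimal illustration with placeholder dynamics, driver, and network sizes (not the speaker's code): it learns the initial value Y_0 and the gradient process Z of a discretized BSDE so that the simulated terminal value matches the terminal condition g(X_T).

```python
# Minimal deep-BSDE sketch (toy example, not the speaker's code).
# Discretized system:  X_{n+1} = X_n + sigma dW_n,
#                      Y_{n+1} = Y_n - f(Y_n, Z_n) dt + Z_n . dW_n,  Y_N ~ g(X_N).
import torch

d, N, T, batch = 10, 20, 1.0, 256        # dimension, time steps, horizon, batch size
dt, sigma = T / N, 1.0

g = lambda x: torch.log(0.5 * (1 + (x ** 2).sum(dim=1, keepdim=True)))  # terminal condition
f = lambda y, z: -0.5 * (z ** 2).sum(dim=1, keepdim=True)               # toy driver

y0 = torch.nn.Parameter(torch.zeros(1))                  # Y_0, learned directly
z_nets = torch.nn.ModuleList(                            # Z at time t_n approximated by net n
    [torch.nn.Sequential(torch.nn.Linear(d, 32), torch.nn.Tanh(),
                         torch.nn.Linear(32, d)) for _ in range(N)])
opt = torch.optim.Adam([y0, *z_nets.parameters()], lr=1e-3)

for step in range(2000):
    x = torch.zeros(batch, d)
    y = y0.expand(batch, 1)
    for n in range(N):
        dw = torch.randn(batch, d) * dt ** 0.5
        z = z_nets[n](x)
        y = y - f(y, z) * dt + (z * dw).sum(dim=1, keepdim=True)
        x = x + sigma * dw
    loss = ((y - g(x)) ** 2).mean()       # enforce the terminal condition in L^2
    opt.zero_grad(); loss.backward(); opt.step()

print("approximate solution u(0, 0):", y0.item())
```

In the 2BSDE/G-HJB setting of the second part, an additional network for the second-order process (the Hessian term) is trained along the same lines.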

Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning

Xun Li
HK PolyU
Hong Kong
Co-Author(s):    Xiangyu Cui, Yun Shi, Si Zhao
Abstract:
This talk studies a discrete-time mean-variance model based on reinforcement learning. Compared with its continuous-time counterpart in Wang and Zhou (2020), the discrete-time model makes more general assumptions about the asset's return distribution. Using entropy to measure the cost of exploration, we derive the optimal investment strategy, whose density function is also of Gaussian type. Additionally, we design the corresponding reinforcement learning algorithm. Both simulation experiments and empirical analysis indicate that our discrete-time model exhibits better applicability when analyzing real-world data than the continuous-time model.
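As a rough illustration of the entropy-regularized exploratory setting, the following sketch samples allocations from a Gaussian policy whose mean is linear in the wealth gap and whose variance is learned, and updates the parameters with REINFORCE-style policy-gradient steps. The parameterization, return model, and objective are my own toy assumptions, not the authors' algorithm.

```python
# Toy entropy-regularized Gaussian policy for discrete-time mean-variance
# investment (illustrative only, not the authors' algorithm).
import numpy as np

rng = np.random.default_rng(0)
T, episodes, lam = 20, 5000, 1.0           # horizon, episodes, entropy weight
phi = np.array([0.0, -0.5])                # params: log-variance, mean slope
w_target, lr = 1.0, 1e-3

for ep in range(episodes):
    w, grad = 0.0, np.zeros(2)
    for t in range(T):
        mean = phi[1] * (w - w_target)     # allocation mean, linear in wealth gap
        std = np.exp(0.5 * phi[0])         # exploration level, learned
        u = rng.normal(mean, std)          # sample an exploratory allocation
        r = rng.normal(0.03, 0.2)          # toy excess return of the risky asset
        w += u * r
        # score-function gradient of the Gaussian log-density at (u | mean, std)
        grad[0] += 0.5 * ((u - mean) ** 2 / std ** 2 - 1.0)
        grad[1] += (u - mean) * (w - u * r - w_target) / std ** 2
    ret = -(w - w_target) ** 2             # mean-variance-type terminal objective
    phi += lr * ret * grad                 # policy-gradient ascent on the return
    phi[0] += lr * lam * 0.5 * T           # exact gradient of the entropy bonus

print("learned parameters:", phi, "exploration std:", np.exp(0.5 * phi[0]))
```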

Deep Backward Schemes for High-Dimensional Portfolio Optimization Problem

Mohamed Mnif
ENIT
Tunisia
Co-Author(s):    Xavier Warin
Abstract:
In this paper, we consider the problem of portfolio choice in high dimension for an investor who wants to maximize the utility of his terminal wealth. To solve the Hamilton-Jacobi-Bellman equation, we relate our problem to a backward stochastic differential equation, which is solved by two algorithms based on deep neural networks. At each time step, the optimal portfolio is first estimated with a neural network. Then we minimize a loss function defined recursively by backward induction, estimating the solution and its gradient separately in the first algorithm and simultaneously in the second. We provide error estimates in terms of the universal approximation property of neural networks, and we compare the numerical results with a direct algorithm, still using neural networks, that estimates the control in a single optimization maximizing the expected utility of terminal wealth.
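A single backward-induction step of the second variant (solution and gradient estimated simultaneously) might look as follows; the dynamics, driver, and network sizes here are toy assumptions for illustration only.

```python
# One backward step of a deep backward scheme (toy illustration).
import torch

d, batch, dt = 5, 512, 0.05
u_next = lambda x: (x ** 2).sum(dim=1, keepdim=True)   # stands in for the step-(n+1) value net
f = lambda y, z: 0.0 * y                               # toy driver

u_net = torch.nn.Sequential(torch.nn.Linear(d, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
z_net = torch.nn.Sequential(torch.nn.Linear(d, 32), torch.nn.Tanh(), torch.nn.Linear(32, d))
opt = torch.optim.Adam([*u_net.parameters(), *z_net.parameters()], lr=1e-3)

for it in range(1000):
    x = torch.randn(batch, d)                 # samples of X at time t_n
    dw = torch.randn(batch, d) * dt ** 0.5
    x_next = x + dw                           # Euler step of the state (unit volatility)
    y, z = u_net(x), z_net(x)
    # L^2 residual of the discretized BSDE over [t_n, t_{n+1}]
    residual = u_next(x_next) - y + f(y, z) * dt - (z * dw).sum(dim=1, keepdim=True)
    loss = (residual ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

In the first variant, z_net is replaced by a separate regression for the gradient; otherwise the backward sweep over time steps is identical.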

Advances in Linear-Quadratic Stochastic Differential Games

Jun Moon
Hanyang University
Korea
Co-Author(s):    Jun Moon
Abstract:
Since the seminal paper of Fleming and Souganidis, stochastic differential games have played a central role in mathematical control theory, as they can be used to model general decision-making processes between interacting players under stochastic uncertainties. Two different types of stochastic differential games can be formulated, depending on the roles of the interacting players. Specifically, when the interaction of the players can be described in a symmetric way, the game is called a Nash differential game. On the other hand, the Stackelberg differential game can be used to formulate the nonsymmetric leader-follower hierarchical decision-making process between the players. This talk consists of two parts, covering various recent results on LQ stochastic Nash and Stackelberg differential games. In the first part, the rigorous mathematical formulation of LQ stochastic Nash and Stackelberg differential games will be covered within several different frameworks, including systems with random coefficients, games of mean-field type, Markov-jump systems, and systems with delay; we will also provide several different notions of Nash and Stackelberg equilibria depending on the underlying information structures. In the second part, we will address the detailed mathematical approaches to, and analyses of, the LQ stochastic differential games formulated in the first part, and present their explicit Nash/Stackelberg equilibrium solutions expressed via Riccati differential equations. Some examples, including the numerical solvability of the corresponding Riccati differential equations, will also be discussed to illustrate the theoretical results of this talk.
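To make the Riccati machinery concrete, here is a minimal backward sweep for the classical single-player LQ Riccati differential equation; the Nash and Stackelberg games of the talk lead to coupled systems of such equations. All data below are toy placeholders.

```python
# Backward sweep of the LQ Riccati differential equation
#   -dP/dt = A'P + PA - P B R^{-1} B' P + Q,   P(T) = Q_T,
# with feedback gain K(t) = R^{-1} B' P(t).  Toy data, illustrative only.
import numpy as np

A = np.array([[0.0, 1.0], [0.0, -0.1]])
B = np.array([[0.0], [1.0]])
Q, R, QT = np.eye(2), np.array([[1.0]]), np.eye(2)
T, N = 5.0, 500
dt = T / N

P, gains = QT.copy(), []
for _ in range(N):                      # explicit Euler, integrating backward in time
    minus_Pdot = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
    P = P + dt * minus_Pdot
    gains.append(np.linalg.solve(R, B.T @ P))

print("P(0) =\n", P, "\nK(0) =", gains[-1])
```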

Strict Dissipativity in Stochastic Optimal and Predictive Control

Jonas Schiessl
University of Bayreuth
Germany
Co-Author(s):    Ruchuan Ou, Michael H. Baumann, Timm Faulwasser, Lars Gruene
Abstract:
Since its introduction by Jan C. Willems, the concept of dissipativity has become a valuable tool for analyzing optimal and model predictive control problems. While dissipativity and its relationship to the so-called turnpike property are well established for deterministic problems, further theoretical development is required to extend these notions to stochastic settings. In this talk, we introduce different forms of dissipativity based on stationarity concepts for distributions and random variables. We show that these notions are suitable for analyzing the distributional and pathwise behavior of stochastic problems and highlight their connection to different stochastic turnpike properties. Furthermore, we demonstrate how the proposed dissipativity and turnpike concepts can be utilized to analyze the performance of stochastic economic model predictive control schemes.
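For orientation, the deterministic discrete-time template that these stochastic notions generalize can be stated as follows (standard Willems-type strict dissipativity; the stochastic versions discussed in the talk replace the steady state by stationary distributions or stationary random variables):

```latex
% Strict dissipativity at a steady state (x^e, u^e) of x^+ = f(x, u):
% there exist a storage function \lambda and a class-K function \alpha with
\lambda(f(x,u)) - \lambda(x) \;\le\; \ell(x,u) - \ell(x^e,u^e) - \alpha(\|x - x^e\|)
\quad \text{for all admissible } (x,u).
```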

Mean field LQG games and teams

Bingchang Wang
Shandong University
Peoples Rep of China
Co-Author(s):    Huanshui Zhang and Ji-Feng Zhang
Abstract:
This work studies linear-quadratic-Gaussian (LQG) mean-field games and teams, where agents are coupled via dynamics and individual costs. We propose an approach for decoupling mean-field FBSDEs (forward-backward stochastic differential equations) and obtain necessary and sufficient conditions for the uniform stabilization of mean-field control systems. In this work, a new approach is developed for mean-field games and control, and the essential difference and connection between the direct method and the fixed-point method are also revealed. We further apply the approach to investigate feedback solutions to mean-field LQG Stackelberg games.
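A generic formulation of this type of model, with illustrative notation of my own choosing rather than the authors', couples N agents through the empirical average of their states:

```latex
% N agents coupled via the mean field x^{(N)} = \frac{1}{N}\sum_{j=1}^{N} x_j:
dx_i(t) = \bigl(A x_i(t) + B u_i(t) + G x^{(N)}(t)\bigr)\,dt + \sigma\,dW_i(t),
\qquad
J_i(u) = \mathbb{E}\int_0^T \Bigl( \bigl\|x_i(t) - \Gamma x^{(N)}(t) - \eta\bigr\|_Q^2
          + \|u_i(t)\|_R^2 \Bigr) dt .
```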

Stochastic maximum principle for weighted mean-field system with application to ambiguity filtering

Jie Xiong
Southern University of Science and Technology
Peoples Rep of China
Co-Author(s):    Yanyan Tang and Jiaqi Zhang
Abstract:
We study the optimal control problem for a weighted mean-field system. A new feature of the control problem is that the coefficients depend on the state process as well as its weighted measure and the control variable. By applying a variational technique, we establish a stochastic maximum principle. We also establish a sufficient condition for optimality. As an application, we investigate the optimal ambiguity filtering problem.
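For readers who want the shape of such a result, the classical (non-mean-field) stochastic maximum principle is recalled below; in the weighted mean-field setting, the adjoint equation acquires additional terms from differentiating with respect to the weighted measure, and sign conventions depend on whether the cost is minimized or maximized.

```latex
% Classical form: Hamiltonian, adjoint BSDE, and maximum condition.
H(t,x,u,p,q) = \langle b(t,x,u), p \rangle + \langle \sigma(t,x,u), q \rangle + f(t,x,u),
\qquad
dp(t) = -\partial_x H(t,\bar x(t),\bar u(t),p(t),q(t))\,dt + q(t)\,dW(t),
\quad p(T) = \partial_x g(\bar x(T)),
\qquad
H(t,\bar x(t),\bar u(t),p(t),q(t)) = \max_{u \in U} H(t,\bar x(t),u,p(t),q(t)).
```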

Computational Nonlinear Filtering Using A Deep Learning Approach

George Yin
University of Connecticut
USA
Co-Author(s):    Hongjian Qian, George Yin, Qing Zhang
Abstract:
Nonlinear filtering is a fundamental problem in signal processing, information theory, communication, control and optimization, and systems theory. Celebrated results on nonlinear filtering were obtained in the 1960s. Nevertheless, the computational issues of nonlinear filtering have remained a long-standing (60-year-old) and challenging problem. In this talk, in lieu of treating the stochastic partial differential equations for obtaining the conditional distribution or conditional measure, we construct finite-dimensional approximations using deep neural networks for the optimal weights. Two recursions are used in the algorithm: one approximates the optimal weights, and the other approximates the optimal learning rate. If time permits, we will also discuss our recent work on system identification.
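The two-recursion structure can be sketched in a toy setting, with a linear-Gaussian signal-observation pair simulated offline; this illustrates the idea only and is not the speakers' algorithm. One recursion updates the network weights, the other adapts the learning rate.

```python
# Toy two-recursion sketch: learn a filter net on simulated (signal, observation)
# pairs, while a second recursion adapts the step size (illustrative only).
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
log_lr = torch.tensor(-4.0)                     # second recursion acts on the log step size
x, prev_loss = torch.zeros(1), None             # hidden signal (toy linear-Gaussian model)

for k in range(5000):
    x = 0.95 * x + 0.1 * torch.randn(1)         # signal dynamics
    y = x + 0.2 * torch.randn(1)                # noisy observation
    est = net(y.unsqueeze(0)).squeeze(0)        # state estimate from the observation
    loss = ((est - x) ** 2).sum()               # surrogate filtering error (offline training)
    net.zero_grad(); loss.backward()
    with torch.no_grad():
        lr = torch.exp(log_lr)
        for p in net.parameters():              # recursion 1: weight update
            p -= lr * p.grad
        if prev_loss is not None:               # recursion 2: learning-rate update
            log_lr += 0.01 if loss < prev_loss else -0.01
        prev_loss = loss.detach()
```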

Linear-Quadratic Optimal Control Problem for Mean-Field SDEs With Certain Random Coefficients

Jiongmin Yong
University of Central Florida
USA
Co-Author(s):    Jiongmin Yong
Abstract:
Motivated by linear-quadratic optimal control problems for mean-field SDEs with regime switching, we formulate an LQ problem governed by a standard Brownian motion, with the coefficients of the state equation and the weighting matrices/vectors being adapted to the filtration generated by a Markov chain independent of the Brownian motion governing the state equation. For such a problem, we approach our LQ problem in the following aspects: (1) Classical completion of squares gives a sufficient condition for the open-loop solvability of the LQ problem. However, this method relies on the optimality system, and therefore the optimal control could be anticipating, which is not practically feasible. This leads to the following question: (2) Does the optimal control admit a non-anticipating representation? Under certain conditions, we find a closed-loop representation of the open-loop optimal control, which is non-anticipating. It is then natural to ask whether such a representation is itself optimal within the class of closed-loop controls. This leads to problem (3): the closed-loop solvability of our LQ problem, for which a characterization is given. Finally, both open-loop and closed-loop solvability are implied by the uniform convexity of the cost functional in the control.
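Schematically, a closed-loop representation in LQ theory takes the form below, with the gain determined by a Riccati equation; in the present regime-switching mean-field setting, the representation involves additional conditional-expectation terms. The display is generic, not the precise formula of the talk.

```latex
% Generic closed-loop representation of an open-loop optimal control:
u^*(t) = \Theta(t)\,X^*(t) + v(t),
\qquad
\Theta(t) = -R(t)^{-1} B(t)^{\!\top} P(t),
```

where P solves the associated Riccati equation and v is obtained from an auxiliary backward equation.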

Optimization Methods Based on Optimal Control

Huanshui Zhang
Shandong University of Science and Technology
Peoples Rep of China
Co-Author(s):    Hongxia Wang, Yeming Xu, Ziyuan Guo
Abstract:
Optimization is critical in applied mathematics and is also a scientific basis for engineering and information fields. The development of optimization theory spans hundreds of years, featuring classic algorithms such as gradient descent, improved gradient descent, Newton's iteration, and enhanced quasi-Newton methods. While these algorithms have acknowledged strengths, they also have limitations: gradient descent is stable but typically suffers from slow convergence, whereas Newton's iteration converges quickly but can easily diverge, with similar issues also present in their improved versions. This talk introduces a novel optimization algorithm that is both fast-converging and stable, with its core idea rooted in optimal control theory. The update size of the iterative algorithm is treated as a control input, designed to minimize the sum of the optimized function and the control energy at future moments. Minimizing the optimized function ensures the fastest convergence, while minimizing the control energy guarantees the algorithm's stability. By applying a Taylor expansion for linearization, the algorithm is further refined into an iterative form, thus avoiding the complexity of solving nonlinear forward-backward difference equations. The new algorithm is rigorously shown to achieve super-linear convergence similar to Newton's iteration, along with the stability characteristic of gradient descent. Moreover, this algorithm can recover gradient descent, Newton's iteration, improved accelerated gradient descent, and regularized Newton methods, providing the first theoretical foundation for the scientific validity of both gradient descent and Newton's iteration.
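The core idea admits a compact sketch: treat the update u_k as a control, penalize the control energy with weight R, and linearize the objective via a second-order Taylor expansion. One receding-horizon step then reduces to u_k = -(H_k + R I)^{-1} g_k, which interpolates between gradient descent (R large) and Newton's iteration (R small). The example below is my own illustration of this reduction, not the authors' exact scheme.

```python
# Control-energy-regularized iteration on a toy ill-conditioned objective
# (illustrative reduction of the optimal-control view, not the authors' scheme).
import numpy as np

D = np.diag([1.0, 100.0])                        # ill-conditioned quadratic part

def f(x):    return 0.5 * x @ D @ x + 0.25 * x[0] ** 4
def grad(x): return D @ x + np.array([x[0] ** 3, 0.0])
def hess(x): return D + np.diag([3.0 * x[0] ** 2, 0.0])

x, R = np.array([2.0, 1.0]), 1.0
for k in range(50):
    g, H = grad(x), hess(x)
    u = -np.linalg.solve(H + R * np.eye(2), g)   # energy-regularized Newton-type step
    x = x + u
    R *= 0.7                                     # shrink the energy weight over the run

print("minimizer estimate:", x, "objective:", f(x))
```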

Pairs Trading: An Optimal Selling Rule with Constraints

Qing Zhang
University of Georgia
USA
Co-Author(s):    Ruyi Liu, Jingzhi Tie, Zhen Wu, Qing Zhang
Abstract:
This talk focuses on pairs-trading selling rules. In pairs trading, a long position is held in one stock and a short position in another. The goal is to determine the optimal time to sell the long position and repurchase the short position in order to close the pair position. This talk presents an optimal pairs-trading selling rule with trading constraints. In particular, the underlying stock prices evolve according to a two-dimensional geometric Brownian motion, and the trading permission process is given in terms of a two-state {trading allowed, trading not allowed} Markov chain. It is shown that the optimal policy can be determined by a threshold curve, which is obtained by solving the associated Hamilton-Jacobi-Bellman (HJB) equations (quasi-variational inequalities). A closed-form solution is obtained, and a verification theorem is provided. Numerical experiments are also reported to demonstrate the optimal policies and value functions.
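As a purely illustrative complement, the Monte-Carlo sketch below simulates the two correlated geometric Brownian motions and applies a heuristic constant threshold to close the pair position. The talk's actual threshold curve comes from the HJB quasi-variational inequalities, and the trading-permission Markov chain is omitted here; all parameters are toy values.

```python
# Toy Monte-Carlo evaluation of a constant-threshold pairs-selling rule.
import numpy as np

rng = np.random.default_rng(1)
mu1, mu2, s1, s2, rho = 0.08, 0.05, 0.3, 0.25, 0.5
T, N, paths = 1.0, 250, 5000
dt = T / N
threshold = 1.15                          # close the pair when X1/X2 exceeds this

payoff = np.zeros(paths)
for p in range(paths):
    x1 = x2 = 1.0
    for n in range(N):
        z1 = rng.standard_normal()
        z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal()
        x1 *= np.exp((mu1 - 0.5 * s1 ** 2) * dt + s1 * np.sqrt(dt) * z1)
        x2 *= np.exp((mu2 - 0.5 * s2 ** 2) * dt + s2 * np.sqrt(dt) * z2)
        if x1 / x2 >= threshold:          # sell the long leg, buy back the short leg
            break
    payoff[p] = x1 - x2                   # liquidation value of (long X1, short X2)

print("mean closing payoff:", payoff.mean())
```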