Special Session 182: Recent developments on mathematical finance, stochastic control and related topics

Distribution constrained optimal stopping: beyond the Root-type solution

Shuoqing Deng
The Hong Kong University of Science and Technology
Hong Kong
Co-Author(s):    Shuoqing Deng
Abstract:
We give an explicit construction of the boundary that solves the distribution-constrained optimal stopping problem when the cost function is not of Root type. The boundary can be characterised both analytically and probabilistically. From the analytical perspective, it is characterised by the viscosity solution of a variational inequality with a Wentzell-type boundary condition. From the probabilistic perspective, it is characterised by the backward optimal stopping of a sticky Brownian motion without distribution constraint.

Pricing and Hedging of SOFR Derivatives

Yining Ding
The University of Sydney
Australia
Co-Author(s):    Matthew Bickersteth, Yining Ding, Marek Rutkowski
Abstract:
The London Interbank Offered Rate (LIBOR) has served since the 1970s as a fundamental measure for floating term rates across multiple currencies and maturities. However, in 2017, the Financial Conduct Authority announced the discontinuation of LIBOR from the end of 2021, and the New York Fed declared the Treasury repo financing rate, called the Secured Overnight Financing Rate (SOFR), as a candidate for a new reference rate for interest rate swaps (IRSs) denominated in U.S. dollars. We examine arbitrage-free pricing and hedging of swaps referencing SOFR with and without collateral backing. As hedging instruments, we take SOFR futures and idiosyncratic funding rates for the hedge and margin account. For simplicity, a one-factor model based on Vasicek's equation is used to specify the joint dynamics of several overnight interest rates, including the SOFR and unsecured funding rate.
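As a rough illustration of the Vasicek dynamics mentioned in the abstract above, the following sketch simulates a single overnight rate following dr = a(b - r)dt + sigma dW and compounds it over an accrual period. It is not the authors' model: the function names, the parameter values, the uniform time grid and the compounding convention are all assumptions made for this example, and the joint dynamics of several rates are not reproduced.

```python
import math
import random


def simulate_vasicek(r0, a, b, sigma, dt, n_steps, rng):
    """Simulate dr = a*(b - r)*dt + sigma*dW on a uniform grid.

    Uses the exact Gaussian transition of the Vasicek model (requires a > 0).
    All parameter names are illustrative assumptions.
    """
    decay = math.exp(-a * dt)
    # Standard deviation of r(t + dt) given r(t).
    step_std = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * a))
    path = [r0]
    for _ in range(n_steps):
        r_next = b + (path[-1] - b) * decay + step_std * rng.gauss(0.0, 1.0)
        path.append(r_next)
    return path


def compounded_rate(path, dt):
    """Annualised simple rate obtained by compounding the overnight rate.

    The rate observed at the start of each step accrues over that step
    (an assumed convention for this sketch).
    """
    growth = 1.0
    for r in path[:-1]:
        growth *= 1.0 + r * dt
    horizon = dt * (len(path) - 1)
    return (growth - 1.0) / horizon


if __name__ == "__main__":
    rng = random.Random(0)
    rates = simulate_vasicek(r0=0.03, a=0.5, b=0.04, sigma=0.01,
                             dt=1.0 / 360.0, n_steps=90, rng=rng)
    print("compounded rate over 90 days:", compounded_rate(rates, 1.0 / 360.0))
```

With sigma set to zero the path decays deterministically from r0 towards the long-run level b, which gives a quick sanity check of the discretisation.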

Dynamic Portfolio Selection under Monotone Additive Statistics in the Heston Model

Xuedong He
Chinese University of Hong Kong
Hong Kong
Co-Author(s):    Zhaoli Jiang, Jianming Xia
Abstract:
The monotone additive statistic is a preference representation satisfying the monotonicity and additivity properties. This statistic has been proven to be represented by a weighted average of certainty equivalents under exponential utility functions with different risk aversion degrees and employed in various financial and economic contexts. We study a dynamic portfolio selection problem in which an agent trades a risk-free asset and a risky stock with stochastic volatility to optimize her investment performance measured by the monotone additive statistic of her terminal wealth or log investment return. We find that the monotone additive statistic, when applied to dynamic decision problems, can lead to time inconsistency. We thus consider equilibrium strategies for our portfolio selection problem and derive these strategies by proving the solvability of two associated systems of ordinary differential equations.
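The representation described in the abstract above can be made concrete with a small sketch. It evaluates a weighted average of exponential-utility certainty equivalents, CE_gamma(X) = -(1/gamma) log E[exp(-gamma X)], for a discrete random variable. The function names, the restriction to finitely many risk-aversion levels and the requirement that the weights sum to one are assumptions made for this illustration, not details taken from the talk.

```python
import math


def certainty_equivalent(values, probs, gamma):
    """Certainty equivalent of a discrete payoff under exponential utility.

    gamma is the risk aversion; gamma == 0 is read as the risk-neutral
    case and returns the mean.
    """
    if gamma == 0.0:
        return sum(p * x for x, p in zip(values, probs))
    exponents = [-gamma * x for x in values]
    shift = max(exponents)  # log-sum-exp shift for numerical stability
    total = sum(p * math.exp(e - shift) for e, p in zip(exponents, probs))
    return -(shift + math.log(total)) / gamma


def monotone_additive_statistic(values, probs, gammas, weights):
    """Weighted average of certainty equivalents (weights assumed to sum to 1)."""
    return sum(w * certainty_equivalent(values, probs, g)
               for g, w in zip(gammas, weights))


if __name__ == "__main__":
    gammas, weights = [0.0, 1.0, 3.0], [0.2, 0.5, 0.3]
    x = monotone_additive_statistic([0.0, 1.0], [0.5, 0.5], gammas, weights)
    y = monotone_additive_statistic([0.0, 2.0], [0.5, 0.5], gammas, weights)
    # Distribution of X + Y when X and Y are independent.
    xy = monotone_additive_statistic([0.0, 1.0, 2.0, 3.0], [0.25] * 4,
                                     gammas, weights)
    print(x + y, xy)  # the two numbers agree up to rounding (additivity)
```

The usage lines check additivity on independent sums, the property that gives the statistic its name.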

Merton's Problem with Recursive Perturbed Utility

Yanwei Jia
The Chinese University of Hong Kong
Hong Kong
Co-Author(s):    Min Dai, Yuchao Dong, Yanwei Jia, Xunyu Zhou
Abstract:
The classical Merton investment problem predicts deterministic, state-dependent portfolio rules; however, laboratory and field evidence suggest that individuals often prefer randomized decisions leading to stochastic and noisy choices. Fudenberg et al. (2015) develop the additive perturbed utility theory to explain the preference for randomization in the static setting, which, however, becomes ill-posed or intractable in the dynamic setting. We introduce the recursive perturbed utility (RPU), a special stochastic differential utility that incorporates an entropy-based preference for randomization into a recursive aggregator. RPU endogenizes the intertemporal trade-off between utilities from randomization and bequest via a discounting term dependent on past accumulated randomization, thereby avoiding excessive randomization and yielding a well-posed problem. In a general Markovian incomplete market with CRRA preferences, we prove that the RPU-optimal portfolio policy (in terms of the risk exposure ratio) is Gaussian and can be expressed in closed form, independent of wealth. Its variance is inversely proportional to risk aversion and stock volatility, while its mean is based on the solution to a partial differential equation. Moreover, the mean is the sum of a myopic term and an intertemporal hedging term (against market incompleteness) that intertwines with policy randomization.


Ruyi Liu
University of New South Wales
Australia
Co-Author(s):    
Abstract:

Risk-sensitive Reinforcement Learning based on Convex Scoring Functions

Yang Liu
The Chinese University of Hong Kong, Shenzhen
Peoples Rep of China
Co-Author(s):    Shanyu Han, Yang Liu, Xiang Yu
Abstract:
We propose a reinforcement learning (RL) framework under a broad class of risk objectives, characterized by convex scoring functions. This class covers many common risk measures, such as variance, Expected Shortfall, entropic Value-at-Risk, and mean-risk utility. To resolve the time-inconsistency issue, we consider an augmented state space and an auxiliary variable and recast the problem as a two-state optimization problem. We propose a customized Actor-Critic algorithm and establish some theoretical approximation guarantees. A key theoretical contribution is that our results do not require the Markov decision process to be continuous. Additionally, we propose an auxiliary variable sampling method inspired by the alternating minimization algorithm, which is convergent under certain conditions. We validate our approach in simulation experiments with a financial application in statistical arbitrage trading, demonstrating the effectiveness of the algorithm.

Inverse Learning the Altruism and Labor Cost Level in Mixed-Individual Mean Field Games

Xiaofei Shi
University of Toronto
Canada
Co-Author(s):    Haoyang Cao, Gokce Dayanikli
Abstract:
Understanding how humans respond to incentives, both individually and collectively, is central to effective policy design. In the context of stochastic differential games, mean field games (MFGs) are usually used to capture interactions among fully non-cooperative (egocentric) players, whereas mean field control (MFC) models are used to study fully cooperative (altruistic) players. To capture the whole spectrum of behaviors, mixed-individual MFGs introduce a parameterized blend of egocentric and altruistic objectives. However, in practical settings policymakers cannot directly observe intrinsic altruism levels or other private cost parameters such as the cost of effort. We address this challenge by developing an inverse learning framework for mixed-individual MFGs. We demonstrate the feasibility and accuracy of the method through numerical experiments, showcasing recovery of latent altruism levels under noisy observations. Our results highlight the potential of inverse MFG techniques to infer behavioral structure in large populations, with implications for incentive design and data-driven policy analysis.

Incentives of Defined-Contribution Pension Managers

Ho Man Tai
University of Sydney
Australia
Co-Author(s):    Paolo Guasoni, Bohan Li, Tak Kwong Wong, Sheung Chi Phillip Yam
Abstract:
This talk will discuss the implications of asset management fee structures for defined-contribution pension funds. We develop a model where a manager with inter-temporal preferences invests pension contributions over members' working lives. Contrary to typical risk-shifting behaviors documented in the literature, we find that managers with the same preferences as plan members take less risk than plan members would choose for themselves. This result is due to the consumption-smoothing motive and the misalignment arising from calculating fees as a proportion of current assets rather than total wealth. Our findings reveal an overlooked aspect of delegated portfolio management and underscore the significance of inter-temporal utility in pension fund management. We establish the well-posedness of the value function and the optimal trading strategy through a fully nonlinear Hamilton-Jacobi-Bellman equation. We also develop an efficient numerical scheme to approximate the solutions, overcoming the challenges associated with its non-linearity and unbounded domain.

Portfolio Optimization under Transaction Costs with Recursive Preferences

Alex Tse
University College London
England
Co-Author(s):    Martin Herdegen, David Hobson
Abstract:
In this talk, I will report some recent progress on the portfolio optimization problem featuring proportional transaction costs under the Epstein-Zin stochastic differential utility preference. A key and novel idea is to parametrise consumption and the value function in terms of the "shadow fraction of wealth", which leads to a simpler first-order free boundary problem. This facilitates the analysis of aspects of the problem that have previously been challenging, such as well-posedness, comparative statics, and cases beyond the small transaction cost regime. An extension to a dividend-paying risky asset will be discussed.

Well-posedness of the equilibrium HJB system for time-inconsistent controls

Zhenhua Wang
Shandong University
Peoples Rep of China
Co-Author(s):    Xiang Yu, Jingjie Zhang, Zhou Zhou
Abstract:
We provide a general framework to solve the equilibrium HJB system for time-inconsistent control problems by entropy regularization. The controlled process is given by an SDE with no control on the diffusion term. We show the existence of a solution/equilibrium with positive entropy weight, then prove the existence of a relaxed equilibrium for the original time-inconsistent control problem by letting the entropy weight vanish. This talk is based on joint work with Xiang Yu, Jingjie Zhang and Zhou Zhou.

Continuous-time q-learning for mean-field control problems with common noise

Xiaoli Wei
Harbin Institute of Technology
Peoples Rep of China
Co-Author(s):    Zhenjie Ren, Xiang Yu, Xun Yu Zhou
Abstract:
This paper investigates the continuous-time counterpart of the Q-function for entropy-regularized mean-field control (MFC) with controlled common noise, coined the q-function by Jia and Zhou (2023) in the single agent's model. We first show that, under discretely sampled actions, the value function in the exploratory formulation converges to the one in the relaxed control formulation as the time grid refines. Leveraging the relaxed control formulation, we derive the exploratory Hamilton-Jacobi-Bellman (HJB) equation, in which the controlled common noise gives rise to an additional nonlinear functional of the policy, rendering the policy iteration intricate. Under a certain concavity condition, we establish the existence and uniqueness of the optimal one-step policy iteration via a first-order condition using the partial linear functional derivative with respect to the policy. The policy improvement at each iteration is verified by relating it to an entropy-regularized optimization problem over the space of policies. In the mean-field setting, we introduce the integrated q-function (Iq-function) defined on the state distribution and the policy, and it is shown that an optimal policy is identified as a two-layer fixed point of the argmax operator of the Iq-function. Finally, we provide an explicit characterization of an optimal policy as a Gaussian distribution in the general linear-quadratic (LQ) setting.

Policy Iteration Achieves Regularized Equilibrium under Time Inconsistency

Xiang Yu
The Hong Kong Polytechnic University
Hong Kong
Co-Author(s):    Yu-Jui Huang, Keyu Zhang
Abstract:
For a general entropy-regularized time-inconsistent stochastic control problem, we propose a policy iteration algorithm (PIA) and establish its convergence to an equilibrium policy with an exponential convergence rate. The design of the PIA is based on a coupled system of non-local partial differential equations, called the exploratory equilibrium Hamilton-Jacobi-Bellman (EEHJB) equation. As opposed to the standard time-consistent case, policy improvement fails in general and the target value function (now an equilibrium value function) is not even known to exist a priori. To overcome these, we prove that the value functions generated by the PIA form a Cauchy sequence in a specialized Banach space, hence admit a limit, and the rate of convergence is exponential, on the strength of the Bismut-Elworthy-Li formula of stochastic representation. The limiting value function is shown to fulfill the EEHJB equation, which induces an equilibrium policy in a Gibbs form. Such convergence in value additionally implies uniform convergence of the generated policies to the equilibrium policy, again with an exponential rate. As a byproduct, the PIA gives a constructive proof of the global existence and uniqueness of a classical solution to our general EEHJB equation, whose well-posedness has not been explored in the literature.

Mean-field games with rough common noise: the compactification approach

Fengyi Yuan
The Chinese University of Hong Kong (Shenzhen)
Peoples Rep of China
Co-Author(s):    Erhan Bayraktar, Xihao He, Xiang Yu, Fengyi Yuan
Abstract:
We study mean-field game (MFG) problems with rough common noise where the representative state dynamics is governed by a controlled rough stochastic differential equation driven by an idiosyncratic Brownian motion and a deterministic rough path noise affecting the whole population. Within this new framework, we introduce a canonical weak formulation based on relaxed controls and rough martingale problems. We prove the existence of a pathwise mean-field equilibrium in this context by developing new technical tools for compactification to accommodate rough integration, which deviate substantially from classical compactification arguments in the literature. Finally, we discuss the relationship between the pathwise problem and the classical MFG problem with randomized Brownian common noise: conditioning yields the pathwise problem almost surely; and conversely, under a suitable causality/measurable-selection requirement, pathwise mean-field equilibria can be aggregated to produce randomized mean-field equilibria in the classical problem.

Major-Minor Mean Field Game of Stopping: An Entropy Regularization Approach

Jiacheng Zhang
The Chinese University of Hong Kong
Hong Kong
Co-Author(s):    Xiang Yu, Keyu Zhang, Zhou Zhou
Abstract:
This paper studies a discrete-time major-minor mean field game of stopping where the major player can choose either an optimal control or stopping time. We look for the relaxed equilibrium as a randomized stopping policy, which is formulated as a fixed point of a set-valued mapping, whose existence is challenging by direct arguments. To overcome the difficulties caused by the presence of a major player, we propose to study an auxiliary problem by considering entropy regularization in the major player's problem while formulating the minor players' optimal stopping problems as linear programming over occupation measures. We first show the existence of regularized equilibria as fixed points of some simplified set-valued operator using the Kakutani-Fan-Glicksberg fixed-point theorem. Next, we prove that the regularized equilibrium converges as the regularization parameter λ tends to 0, and the limit corresponds to a fixed point of the original operator, thereby confirming the existence of a relaxed equilibrium in the original mean field game problem. We also extend this entropy regularization method to the mean-field game problem where the minor players choose optimal controls.

Goal-based Portfolio Selection with Fixed Transaction Costs

Jingjie Zhang
University of International Business and Economics
Peoples Rep of China
Co-Author(s):    Erhan Bayraktar, Bingyan Han
Abstract:
We study a goal-based portfolio selection problem in which an investor aims to meet multiple financial goals, each with a specific deadline and target amount. Trading the stock incurs a strictly positive transaction cost. Using the stochastic Perron's method, we show that the value function is the unique viscosity solution to a system of quasi-variational inequalities. The existence of an optimal trading strategy and goal funding scheme is established. Numerical results reveal complex optimal trading regions and show that the optimal investment strategy differs substantially from the V-shaped strategy observed in the frictionless case.

Optimization of win martingales

Xin Zhang
NYU
USA
Co-Author(s):    Julio Backhoff
Abstract:
A prediction market is a market where people can trade based on the outcomes of future events. It is widely used in sports games, elections, and the pricing of digital options. In mathematical finance, prediction markets can be modeled by the so-called win martingales, which are continuous-time martingales that end up with Bernoulli distributions. In this talk, choosing different divergences as objective functionals, we will solve a class of optimal win martingale problems. In some cases, we will get explicit formulas for the optimizers, and make connections to Schrödinger problems, filtering problems, the Wright-Fisher diffusion, and the problem of identifying the most exciting games.
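For readers unfamiliar with win martingales, one standard example (chosen here for illustration, not taken from the talk) is M_t = P(B_1 > 0 | F_t) = Phi(B_t / sqrt(1 - t)) for a Brownian motion B on [0, 1]: it starts at 1/2, is a continuous martingale, and ends at 0 or 1. The sketch below simulates it on a uniform grid; the function names and grid choice are assumptions made for this example.

```python
import math
import random


def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_win_martingale(n_steps, rng):
    """Simulate M_t = Phi(B_t / sqrt(1 - t)) at t = k / n_steps, k = 0..n_steps.

    At t = 1 the value is the indicator of the event {B_1 > 0}.
    """
    dt = 1.0 / n_steps
    b = 0.0
    path = [0.5]  # M_0 = P(B_1 > 0) = 1/2
    for k in range(1, n_steps + 1):
        b += math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if k < n_steps:
            remaining = (n_steps - k) / n_steps  # time left until t = 1
            path.append(normal_cdf(b / math.sqrt(remaining)))
        else:
            path.append(1.0 if b > 0.0 else 0.0)
    return path


if __name__ == "__main__":
    rng = random.Random(1)
    path = simulate_win_martingale(50, rng)
    print("start:", path[0], "end:", path[-1])
```

Averaging the terminal values over many simulated paths should give a number close to the starting value 1/2, reflecting the martingale property.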

Existence of Equilibria for Time-Inconsistent Games in Discrete Time

Zhou Zhou
University of Sydney
Australia
Co-Author(s):    Zhou Zhou
Abstract:
We investigate Markov relaxed equilibria for time-inconsistent stochastic games in discrete time. A key feature of such equilibria is that they capture the interaction of the current self with both future selves and other players. Our objective is to establish the existence of equilibria when the state space of the underlying controlled process is uncountable. The main difficulty arises from the absence of topologies under which the strategy sets are compact and the associated value functions are continuous. We provide general conditions on the transition kernels of the underlying process under which existence can be established.

DeepPAAC: A New Deep Galerkin Method for Principal-Agent Problems

Zimu Zhu
Hong Kong University of Science and Technology (Guangzhou)
Peoples Rep of China
Co-Author(s):    Michael Ludkovski, Changgen Xie
Abstract:
We consider the numerical resolution of principal-agent (PA) problems in continuous time. We formulate a generic PA model with continuous and lump payments and a multi-dimensional strategy of the agent. To tackle the resulting Hamilton-Jacobi-Bellman equation with an implicit Hamiltonian, we develop a novel deep learning method: the Deep Principal-Agent Actor-Critic (DeepPAAC) algorithm. DeepPAAC is able to handle multi-dimensional states and controls, as well as constraints. We investigate the role of the neural network architecture, training designs, loss functions, etc. on the convergence of the solver, presenting five different case studies. This is joint work with Michael Ludkovski and Changgen Xie.