Special Session 133: New developments on nonlinear expectations

When Does Reinforcement Learning Align with Estimate-Then-Plug-In? Insights from Continuous-Time Portfolio Selection
Min Dai
The Hong Kong Polytechnic University
Hong Kong
Co-Author(s): Min Dai, Yanwei Jia, Zhichao Lu
Abstract:
Traditional dynamic decision-making under uncertainty typically follows an estimate-then-plug-in approach: specifying a model, estimating its parameters from historical data, and solving the resulting stochastic control problem analytically or numerically. In contrast, modern reinforcement learning (RL) learns optimal policies directly from data, without estimating model parameters. Continuous-time RL methods have recently gained traction in financial decision-making, but it remains unclear when, and why, RL outperforms the traditional approach. We investigate these questions through continuous-time portfolio optimization within a relaxed control framework. In the absence of transaction costs, we derive an analytical solution and develop a model-free q-learning algorithm for learning optimal trading strategies. Our theoretical results show that, given continuous stock price data, the RL-learned policy matches the estimate-then-plug-in solution; this equivalence persists when transaction costs are incorporated. Numerical experiments with synthetic data confirm these findings across a variety of models. In empirical studies, however, our RL method significantly outperforms the traditional approach in realistic settings, likely because real-world stock price dynamics exhibit complex features that parametric models do not fully capture.
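
For readers who want the baseline spelled out: the abstract does not fix a market model, but in the classical Merton setting assumed here purely for illustration (a Black-Scholes market dS_t = \mu S_t\,dt + \sigma S_t\,dW_t with CRRA utility of relative risk aversion \gamma), the optimal fraction of wealth in the risky asset and its estimate-then-plug-in counterpart are

\[
\pi^* = \frac{\mu - r}{\gamma \sigma^2},
\qquad
\hat{\pi} = \frac{\hat{\mu} - r}{\gamma \hat{\sigma}^2},
\]

where r is the risk-free rate and \hat{\mu}, \hat{\sigma}^2 are fitted to historical returns, e.g. by the sample mean and the realized variance. A standard observation consistent with the stated equivalence: with continuously observed prices, \sigma^2 is identified exactly by the quadratic variation, so the statistical difficulty in both \hat{\pi} and any policy learned from the same data lies in estimating the drift \mu.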
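
Below is a minimal runnable sketch of the comparison in that assumed market. The update rule is a generic Robbins-Monro stand-in, not the talk's q-learning algorithm (which the abstract does not reproduce), and every parameter value is hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a simulated Black-Scholes market; the talk
# does not publish code, and none of these values come from the abstract.
mu, sigma, r, gamma = 0.08, 0.20, 0.02, 2.0   # drift, volatility, rate, CRRA
dt, years = 1 / 252, 50                       # daily observations
n = int(years / dt)

# One simulated path of simple returns dS/S of the risky asset.
ret = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)

# Estimate-then-plug-in: fit (mu, sigma^2), plug into Merton's fraction.
mu_hat = ret.mean() / dt
sig2_hat = ret.var() / dt                     # realized-variance estimate
pi_plugin = (mu_hat - r) / (gamma * sig2_hat)

# Model-free sketch: Robbins-Monro ascent on the per-period objective
# J(pi) = pi*(mu - r) - 0.5*gamma*pi^2*sigma^2, observed only through
# returns; grad has expectation (mu - r) - gamma*pi*sigma^2.
pi, pi_avg = 0.0, 0.0
for t in range(n):
    grad = (ret[t] - r * dt) / dt - gamma * pi * ret[t] ** 2 / dt
    pi += 10.0 / (t + 100.0) * grad           # decaying Robbins-Monro step
    pi_avg += (pi - pi_avg) / (t + 1)         # Polyak-style iterate average

print(f"true fraction   : {(mu - r) / (gamma * sigma ** 2):.3f}")
print(f"plug-in estimate: {pi_plugin:.3f}")
print(f"learned policy  : {pi_avg:.3f}")

On a single simulated path the two estimates scatter around the true fraction with errors of comparable size, since both inherit the same drift-estimation noise; the talk's theoretical result makes this kind of equivalence precise for its q-learning algorithm.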