Abstract: |
This talk addresses the problem of continuous-time reinforcement learning (RL). When the underlying dynamics are unknown and only discrete-time observations are available, how can we effectively conduct policy evaluation and policy iteration? We first highlight that while model-free RL algorithms are straightforward to implement, they often yield unreliable approximations of the true value function. On the other hand, model-based PDE approaches are more accurate, but the inverse problem of identifying the dynamics is difficult to solve. To bridge this gap, we introduce a new Bellman equation, PhiBE, which integrates discrete-time information into a PDE formulation. PhiBE allows us to bypass identification of the dynamics and directly evaluate the value function from discrete-time data. Additionally, it offers a more accurate approximation of the true value function, especially in scenarios where the underlying dynamics change slowly. Moreover, we extend PhiBE to higher orders, providing increasingly accurate approximations.
|