Inverse Reinforcement Learning for portfolio allocation

| October 1, 2022

Project overview

In the context of our work on automated trading systems, we rely on reinforcement learning (RL) algorithms to build portfolio allocation strategies. These algorithms demand large computational resources, and the resulting strategy is not guaranteed to converge. The current approach learns an action policy from interactions between an agent and the trading environment; the transition and reward functions are therefore learned either implicitly or by a critic trained in parallel with the action policy.

Inverse reinforcement learning (IRL), closely related to imitation learning, consists in learning the reward function and the action policy from expert demonstrations. This approach is doubly interesting in finance, where these quantities are hard to estimate from interactions alone, and where an adapted reward function can be difficult to design by hand, for instance for risk mitigation.
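To give a flavour of what "learning the reward" means, here is a minimal sketch in the spirit of feature-expectation matching (apprenticeship-style IRL). Everything here is illustrative: the linear reward model R(s) = w · φ(s), the synthetic "expert" and "learner" trajectories, and the single max-margin-style step are assumptions for exposition, not part of the project statement.

```python
import numpy as np

# Illustrative sketch: recover linear reward weights w so that expert
# behaviour scores higher than a (here, random) learner policy.
# Assumption: reward is linear in state features, R(s) = w . phi(s).

rng = np.random.default_rng(0)

def feature_expectations(trajectories, gamma=0.99):
    """Discounted average of state features over a batch of trajectories.

    trajectories: array of shape (n_traj, horizon, n_features).
    """
    mu = np.zeros(trajectories.shape[-1])
    for traj in trajectories:
        discounts = gamma ** np.arange(len(traj))
        mu += (discounts[:, None] * traj).sum(axis=0)
    return mu / len(trajectories)

# Synthetic data: 10 trajectories of 20 steps with 3 state features each.
# The "expert" visits higher-feature states on average (loc=1.0 vs 0.0).
expert = rng.normal(loc=1.0, size=(10, 20, 3))
learner = rng.normal(loc=0.0, size=(10, 20, 3))

mu_expert = feature_expectations(expert)
mu_learner = feature_expectations(learner)

# One max-margin-style step: point the reward weights from the learner's
# feature expectations toward the expert's, then normalise.
w = mu_expert - mu_learner
w /= np.linalg.norm(w)

print("reward weights:", w)
print("expert score :", w @ mu_expert)
print("learner score:", w @ mu_learner)
```

By construction the expert's discounted feature expectations score higher under the recovered reward than the learner's; in a full IRL loop this step would alternate with re-solving the RL problem under the current reward estimate.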

The project aims at comparing the existing IRL solutions and at quantifying their advantages over classical RL solutions (DDPG, SAC, PPO). Works [1,2,3,4] gather a number of approaches, as well as the founding theories of the field. We wish to apply this method to online portfolio selection (OLPS); first publications on this application already exist, such as [5].

During this project, you will:

  • review the existing IRL techniques,
  • implement the algorithms with standard libraries (Stable-Baselines3, PyTorch),
  • benchmark your methods.

[1] Schmidhuber. "Reinforcement Learning Upside Down: Don't Predict Rewards – Just Map Them to Actions." arXiv:1912.02875 [cs], (2020)

[2] Xu et al. "Receding Horizon Inverse Reinforcement Learning." arXiv, (2022)

[3] Gerogiannis. "Inverse Reinforcement Learning," (2022)

[4] Chen et al. "Decision Transformer: Reinforcement Learning via Sequence Modeling." arXiv, (2021)

[5] Halperin et al. "Combining Reinforcement Learning and Inverse Reinforcement Learning for Asset Allocation Recommendations." arXiv, (2022)