Open-sourcing TRFL – a library of RL building blocks
DeepMind is open sourcing a new library of useful building blocks for writing reinforcement learning (RL) agents in TensorFlow. Named TRFL (pronounced "truffle"), it is a collection of key algorithmic components that have been used internally for many of DeepMind's most successful agents, such as DQN, DDPG and the Importance Weighted Actor-Learner Architecture.
A typical deep reinforcement learning agent consists of a large number of interacting components: at the very least, these include the environment and some deep network representing values or policies, but they often also include components such as a learned model of the environment, pseudo-reward functions or a replay system.
These components tend to interact in subtle ways, making it difficult to identify bugs in such large computational graphs. A recent blog post by OpenAI highlighted this problem by analysing some of the most popular open-source implementations of reinforcement learning agents, finding that six out of ten "had subtle bugs found by a community member and confirmed by the author".
One approach to tackling this problem, and to helping those in the research community trying to reproduce results from papers, is to open-source complete agent implementations. Large agent codebases can be very useful for reproducing research, but they are also hard to modify and extend. A different and complementary approach is to provide reliable, well-tested implementations of common building blocks that can be used across many different RL agents. Moreover, abstracting these core components into a single library, with a consistent API, makes it easier to combine ideas from many different papers.
The TRFL library includes functions to implement both classical RL algorithms and more cutting-edge techniques. The loss functions and other operations provided here are implemented in pure TensorFlow. They are not complete algorithms, but implementations of the RL-specific mathematical operations needed when building fully-functional reinforcement learning agents.
For value-based reinforcement learning, TRFL provides TensorFlow ops for learning in discrete action spaces, such as TD-learning, Sarsa, Q-learning and their variants, as well as ops for implementing continuous control algorithms, such as DPG. Ops for learning distributional value functions are also included. These ops support batches, and return a loss that can be minimised by feeding it to a TensorFlow optimizer. Some losses operate over batches of transitions (e.g. Sarsa, Q-learning) and others over batches of trajectories (e.g. Q(lambda), Retrace). For policy-based methods, TRFL has utilities that make it easy to implement both online methods such as A2C and off-policy correction techniques such as V-trace.
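To make the transition-based losses concrete, here is a minimal NumPy sketch of the batched Q-learning loss such an op computes. The function name and argument shapes are modelled on TRFL's `qlearning` op, but this is an illustrative reimplementation of the underlying maths, not TRFL's actual code.

```python
import numpy as np

def qlearning_loss(q_tm1, a_tm1, r_t, pcont_t, q_t):
    """Batched Q-learning loss: 0.5 * (r + pcont * max_a q_t - q_tm1[a])^2.

    q_tm1:   [B, num_actions] action values at the previous step
    a_tm1:   [B]              actions taken at the previous step
    r_t:     [B]              rewards received
    pcont_t: [B]              discount factor (0 at episode end)
    q_t:     [B, num_actions] action values at the current step (fixed targets)
    """
    batch = np.arange(q_tm1.shape[0])
    target = r_t + pcont_t * q_t.max(axis=1)    # bootstrapped target
    td_error = target - q_tm1[batch, a_tm1]     # temporal-difference error
    return 0.5 * td_error ** 2                  # per-transition loss

# A batch of two transitions (toy numbers for illustration).
q_tm1 = np.array([[1.0, 2.0], [3.0, 1.0]])
a_tm1 = np.array([0, 1])
r_t = np.array([1.0, 0.0])
pcont_t = np.array([0.9, 0.0])  # second transition ends the episode
q_t = np.array([[2.0, 1.0], [0.0, 0.0]])
loss = qlearning_loss(q_tm1, a_tm1, r_t, pcont_t, q_t)
```

In TRFL itself, the analogous op returns a per-element loss tensor that you would reduce (e.g. by a mean over the batch) and pass to a TensorFlow optimizer.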
The calculation of policy gradients in continuous action spaces is also supported. Finally, TRFL provides an implementation of the auxiliary pseudo-reward functions used by UNREAL, which were found to improve data efficiency in a wide range of domains.
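For the policy-based side, the core quantity behind methods like A2C is the advantage-weighted log-probability of the chosen action. The following NumPy sketch shows that per-element loss for discrete actions; the function name and shapes are my own illustrative choices, not TRFL's API.

```python
import numpy as np

def policy_gradient_loss(logits, actions, advantages):
    """A2C-style policy-gradient loss: -log pi(a|s) * advantage.

    logits:     [B, num_actions] unnormalised policy logits
    actions:    [B]              sampled actions
    advantages: [B]              advantage estimates (treated as constants)
    """
    # Log-softmax computed in a numerically stable way.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    batch = np.arange(logits.shape[0])
    # Minimising this loss increases the probability of actions with
    # positive advantage and decreases it for negative advantage.
    return -log_probs[batch, actions] * advantages

# Toy batch: a uniform two-action policy with one good and one bad action.
logits = np.array([[0.0, 0.0], [1.0, 1.0]])
actions = np.array([0, 1])
advantages = np.array([1.0, -2.0])
loss = policy_gradient_loss(logits, actions, advantages)
```

Off-policy corrections such as V-trace extend this basic loss by reweighting the advantages with truncated importance sampling ratios.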
This is not a one-time release. Because the library is used extensively inside DeepMind, it will continue to be maintained, and new functionality will be added over time.