Reinforcement learning with unsupervised auxiliary tasks
DeepMind’s main mission is to push the boundaries of artificial intelligence, developing programs that can learn to solve any complex problem without needing to be taught how. Reinforcement learning agents have achieved breakthroughs in Atari 2600 games and the game of Go. However, these systems can require a great deal of data and a long time to learn, so researchers are always looking for ways to improve their general learning algorithms.
Our paper on Reinforcement Learning with Unsupervised Auxiliary Tasks presents a method for greatly improving both the speed and the final performance of agents. We do this by augmenting the standard deep reinforcement learning methods with two main additional tasks for the agents to perform during training.
The first task involves the agent learning how to control the pixels on the screen, which emphasises learning how your actions affect what you will see, rather than just predicting it. This is similar to how a baby might learn to control their hands by moving them and watching the movements. By learning to change different parts of the screen, our agent learns features of the visual input that are useful for playing the game and obtaining higher scores.
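To make this concrete, here is a minimal NumPy sketch of one way to compute pseudo-rewards for such a pixel-control task: the average absolute change of intensity within each cell of a grid laid over the observation, which an auxiliary learner can then be trained to maximise. The function name, cell size and frame shape are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def pixel_change_rewards(frame, next_frame, cell=4):
    """Per-cell pseudo-rewards for a pixel-control task: the average
    absolute change in intensity within each cell of a grid laid over
    the observation. (Cell size and frame shape are illustrative.)"""
    # Per-pixel change, averaged over colour channels.
    diff = np.abs(next_frame.astype(np.float32) - frame.astype(np.float32)).mean(axis=2)
    h, w = diff.shape
    diff = diff[: h - h % cell, : w - w % cell]    # crop to whole cells
    grid = diff.reshape(h // cell, cell, w // cell, cell)
    return grid.mean(axis=(1, 3))                  # one pseudo-reward per cell

# Example: two random 84x84 RGB frames give a 21x21 grid of pseudo-rewards
# that an auxiliary learner could be trained to maximise.
f0 = np.random.rand(84, 84, 3)
f1 = np.random.rand(84, 84, 3)
print(pixel_change_rewards(f0, f1).shape)  # (21, 21)
```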
In the second task, the agent is trained to predict the onset of immediate rewards from a short historical context. To better deal with the scenario where rewards are rare, the agent is presented with past rewarding and non-rewarding histories in equal proportion. By learning on rewarding histories much more frequently, the agent can discover visual features predictive of reward much faster.
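A minimal sketch of this skewed sampling, assuming a simple replay structure that files each short history according to whether it ends in a reward; the class and method names are hypothetical rather than taken from any released implementation.

```python
import random
from collections import deque

class SkewedReplay:
    """Sketch of skewed sampling for reward prediction: histories ending in
    a reward and histories ending in no reward are drawn with equal
    probability, so rare rewards are over-represented during training."""

    def __init__(self, history_len=3, capacity=2000):
        self.recent = deque(maxlen=history_len)      # rolling window of (obs, reward)
        self.rewarding = deque(maxlen=capacity)      # histories ending in a reward
        self.non_rewarding = deque(maxlen=capacity)  # histories ending in zero reward

    def add(self, observation, reward):
        """Record one environment step and file the latest short history."""
        self.recent.append((observation, reward))
        if len(self.recent) == self.recent.maxlen:
            history = list(self.recent)
            (self.rewarding if reward != 0 else self.non_rewarding).append(history)

    def sample(self):
        """Draw a rewarding or non-rewarding history with probability 0.5
        each, falling back to whichever pool has data."""
        pools = [p for p in (self.rewarding, self.non_rewarding) if p]
        if not pools:
            return None
        pool = random.choice(pools) if len(pools) == 2 else pools[0]
        return random.choice(pool)
```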
The combination of these auxiliary tasks, together with our previous A3C work, gives the new UNREAL agent (UNsupervised REinforcement and Auxiliary Learning). We evaluated this agent on a suite of 57 Atari games as well as a 3D environment called Labyrinth with 13 levels. In all the games, the same UNREAL agent is trained in the same way, on the raw image output from the game, to produce actions that maximise the agent's score or reward in the game. The behaviour required to obtain game rewards is incredibly varied, from picking up fruit in 3D mazes to playing Breakout, yet the same UNREAL algorithm learns to play these games, often to human level and beyond.
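Conceptually, the combined training objective is simply the base A3C loss plus weighted contributions from the two auxiliary tasks; the sketch below shows this combination, with placeholder weights rather than the coefficients used in the paper.

```python
def unreal_objective(a3c_loss, pixel_control_loss, reward_prediction_loss,
                     pc_weight=0.01, rp_weight=1.0):
    """Combine the base A3C loss with the two auxiliary losses into one
    objective. The weights here are placeholders, not tuned values."""
    return a3c_loss + pc_weight * pixel_control_loss + rp_weight * reward_prediction_loss
```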
In Labyrinth, the result of using the auxiliary tasks (controlling the pixels on the screen and predicting when rewards will occur) is that UNREAL is able to learn over 10x faster than our previous best A3C agent, and achieves far better performance. Our agents can now achieve 87% of expert human performance averaged across the Labyrinth levels we considered, with super-human performance on a number of them. On Atari, the agent now achieves on average 9 times human performance. We hope that this work will allow us to scale up our agents to ever more complex environments.