Prefrontal cortex as a meta-reinforcement learning system
Recently, AI systems have mastered a range of video games, including Atari classics such as Breakout and Pong. But as impressive as this performance is, AI still relies on the equivalent of thousands of hours of gameplay to reach and surpass the level of human video game players. By comparison, we can usually grasp the basics of a video game we have never played before in a matter of minutes.
The question of how the brain manages to do so much more with so much less has given rise to the theory of meta-learning, or ‘learning to learn’. The idea is that we learn on two timescales: in the short term we focus on learning about specific examples, while over longer timescales we learn the abstract skills or rules required to complete a task. It is this combination that is thought to help us learn efficiently and to apply that knowledge rapidly and flexibly to new tasks. Recreating this meta-learning structure in AI systems – an approach known as meta-reinforcement learning – has proved very fruitful in enabling fast, one-shot learning in agents. However, the specific mechanisms that allow this process to take place in the brain remain largely unexplained by neuroscience.
In a new paper published in Nature Neuroscience, we use the meta-reinforcement learning framework developed in AI research to investigate the role of dopamine in the brain in helping us learn. Dopamine – commonly known as the brain’s pleasure signal – has often been thought of as analogous to the reward prediction error signal used in AI reinforcement learning algorithms, which learn to act by trial and error, with reward serving as guidance. We propose that dopamine’s role goes beyond simply using reward to learn the value of past actions, and that it plays an integral role, particularly within the prefrontal cortex, in allowing us to learn efficiently, rapidly, and flexibly on new tasks.
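To make the reward prediction error concrete: it is the gap between the reward an agent receives and the reward it expected. Below is a minimal sketch of the temporal-difference (TD) error that dopamine responses are classically thought to resemble; the function name and the numbers are illustrative, not taken from the paper.

```python
def td_error(reward, value_current, value_next, gamma=0.9):
    """TD reward prediction error: delta = r + gamma * V(s') - V(s).
    Positive when an outcome is better than expected, negative when worse."""
    return reward + gamma * value_next - value_current

# A surprising reward (the state was valued at only 0.2) gives a positive error...
print(td_error(reward=1.0, value_current=0.2, value_next=0.0))  # 0.8
# ...while a fully predicted reward gives an error of zero.
print(td_error(reward=1.0, value_current=1.0, value_next=0.0))  # 0.0
```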
We tested this theory by virtually recreating six meta-learning experiments from the field of neuroscience, each requiring an agent to carry out tasks that share the same underlying principles (or set of skills) but vary along some dimension. We trained a recurrent neural network (representing the prefrontal cortex) using standard deep reinforcement learning techniques (representing the role of dopamine), and then compared the activity dynamics of the recurrent network with real data taken from previous findings in neuroscience experiments. Recurrent networks are a good proxy for meta-learning because they can internalise past actions and observations and then draw on those experiences while being trained on a variety of tasks.
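As an illustration of this setup, here is a minimal PyTorch sketch of the kind of recurrent agent described above; the class and parameter names are our own, not those of the paper. The key detail is that the network receives the previous action and reward as inputs, so past trials can shape its hidden state:

```python
import torch
import torch.nn as nn

class MetaRLAgent(nn.Module):
    """Sketch of a recurrent (prefrontal-like) agent: an LSTM whose input
    includes the previous action and reward, so earlier trials can shape
    the hidden state independently of any weight change."""

    def __init__(self, obs_dim, n_actions, hidden=48):
        super().__init__()
        # Input = observation + one-hot previous action + previous reward.
        self.lstm = nn.LSTM(obs_dim + n_actions + 1, hidden)
        self.policy = nn.Linear(hidden, n_actions)  # action logits
        self.value = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs, prev_action, prev_reward, state=None):
        x = torch.cat([obs, prev_action, prev_reward], dim=-1)
        out, state = self.lstm(x.unsqueeze(0), state)  # add the time dimension
        return self.policy(out[0]), self.value(out[0]), state
```

Because the trial history enters through the inputs, everything the agent learns within an episode can be carried by the hidden state alone, a point that becomes important below.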
One experiment we recreated is known as the Harlow Experiment, a psychology test from the 1940s used to investigate the concept of meta-learning. In the original test, a group of monkeys was shown two unfamiliar objects to choose between, only one of which yielded a food reward. The two objects were presented six times, with the left-right placement randomised each time, so the monkey had to learn which object yielded the reward. The monkeys were then shown two brand-new objects; again, only one would result in a food reward. Over the course of this training, the monkeys developed a strategy for selecting the rewarded object: they learned to choose randomly on the first trial and then, based on the reward feedback, to pick the particular object, rather than the left or right position, from then on. The experiment demonstrates that monkeys could internalise the underlying principle of the task and learn an abstract rule structure – in effect, learning to learn.
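For intuition, a toy version of the task structure might look like the following sketch; this is our own simplification, not the environment used in the paper:

```python
import random

class HarlowTask:
    """Toy Harlow task: each episode presents two novel objects, one of
    which is rewarded, with left/right placement shuffled on every trial."""

    def __init__(self, n_trials=6):
        self.n_trials = n_trials

    def new_episode(self):
        self.objects = random.sample(range(1000), 2)  # two unfamiliar object ids
        self.rewarded = random.choice(self.objects)   # only one yields food
        self.trial = 0

    def observe(self):
        # Placement is re-randomised each trial, so the winning strategy is
        # "choose the object", not "choose the side".
        self.layout = random.sample(self.objects, 2)  # [left, right]
        return self.layout

    def choose(self, side):
        reward = 1.0 if self.layout[side] == self.rewarded else 0.0
        self.trial += 1
        return reward, self.trial >= self.n_trials  # reward, episode done?

task = HarlowTask()
task.new_episode()
layout = task.observe()
reward, done = task.choose(side=0)  # pick the left-hand object
```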
When we simulated a very similar test using a virtual computer screen and randomly selected images, we found that our ‘meta-RL’ agent appeared to learn in a manner analogous to the animals in the Harlow Experiment, even when presented with entirely new images it had never seen before.
In the virtual recreation of the Harlow Experiment, the agent must direct its gaze towards the object it believes is associated with a reward.
In fact, we found that the meta-RL agent could learn to adapt rapidly across a broad range of tasks with differing rules and structures. And because the network learned how to adapt to a variety of tasks, it also learned general principles about how to learn efficiently.
Crucially, we observed that most of the learning took place in the recurrent network, which supports our proposal that dopamine plays a more integral role in the meta-learning process than previously thought. Dopamine is conventionally understood to strengthen synaptic links in the prefrontal system, reinforcing particular behaviours. In AI terms, this means that a dopamine-like reward signal adjusts the artificial synaptic weights in a neural network as it learns the right way to solve a task. However, in our experiments the weights of the neural network were frozen, meaning they could not be adjusted during the learning process, and yet the meta-RL agent was still able to solve and adapt to new tasks. This shows us that dopamine-like reward is not only used to adjust weights; it also conveys and encodes important information about abstract task and rule structure, enabling faster adaptation to new tasks.
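To make the frozen-weights point concrete, here is a sketch continuing the hypothetical MetaRLAgent above: every weight is disabled for learning, yet the hidden state can still accumulate evidence across trials. The inputs here are random stand-ins, purely for illustration:

```python
import torch

agent = MetaRLAgent(obs_dim=8, n_actions=2)  # imagine this is already trained
for p in agent.parameters():
    p.requires_grad_(False)  # freeze every weight: no synaptic change possible

state = None                # the hidden state is the only thing left that can adapt
prev_a = torch.zeros(1, 2)  # one-hot previous action (none yet)
prev_r = torch.zeros(1, 1)  # previous reward (none yet)
with torch.no_grad():
    for trial in range(6):
        obs = torch.randn(1, 8)  # stand-in for the trial's visual input
        logits, value, state = agent(obs, prev_a, prev_r, state)
        action = torch.distributions.Categorical(logits=logits).sample()
        reward = torch.rand(1, 1)  # stand-in for the task's feedback
        # Feeding the outcome back in lets the recurrent dynamics adapt
        # across trials even though not a single weight has changed.
        prev_a = torch.nn.functional.one_hot(action, 2).float()
        prev_r = reward
```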
Neuroscientists have long observed similar patterns of neural activity in the prefrontal cortex, which adapts quickly and flexibly, but have struggled to find an adequate explanation for it. The idea that the prefrontal cortex does not rely on slow synaptic weight changes to learn rule structures, but instead uses abstract model-based information directly encoded in dopamine, offers a more satisfying account of its versatility.
In showing that the key ingredients thought to give rise to meta-reinforcement learning in AI also exist in the brain, we have put forward a theory that not only fits with what is known about both dopamine and the prefrontal cortex, but that also explains a range of mysterious findings from neuroscience and psychology. In particular, the theory sheds new light on how structured, model-based learning emerges in the brain, why dopamine itself carries model-based information, and how neurons in the prefrontal cortex become tuned to learning-related signals. Using insights from AI to explain findings in neuroscience and psychology highlights the value each field can offer the other. Going forward, we expect much benefit to flow in the reverse direction as well, with the specific organisation of brain circuits guiding the design of new models of learning for reinforcement learning agents.