The hippocampus as a predictive map
Consider how you choose a route to work, decide where to move house, or pick your next move in a difficult game like Go. Each of these situations requires you to estimate the likely future reward of your decision. This is hard because the number of possible scenarios explodes the further you look into the future.
Understanding how we do this is a major research question in neuroscience, while building systems that can effectively predict future rewards is a major focus in AI research.
In a paper published in Nature Neuroscience, a neuroscience lens is applied to a long-standing mathematical theory from machine learning to provide new insights into the nature of learning and memory. Specifically, it is proposed that the area of the brain known as the hippocampus offers a unique solution to this problem by compactly summarising future events using a "predictive map".
The hippocampus has traditionally been thought to represent only an animal's current state, particularly in spatial tasks such as navigating a maze. This view gained considerable traction with the discovery of "place cells" in the rodent hippocampus, which fire selectively when the animal occupies particular locations. While this theory accounts for many neurophysiological findings, it does not fully explain why the hippocampus is also involved in other functions, such as memory, relational reasoning, and decision making.
The new theory treats navigation as part of the more general problem of computing plans that maximise future reward. The insights come from reinforcement learning, the subfield of AI research that focuses on systems that learn by trial and error. The key computational idea is that, to estimate future reward, an agent must first estimate how much immediate reward it expects to receive in each state, and then weight this expected reward by how often it expects to visit that state in the future. By summing this weighted reward over all possible states, the agent obtains an estimate of future reward.
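As a rough illustration, this weighted-sum computation can be written down in a few lines. The three-state world, transition matrix, and discount factor below are invented for the sketch; they are not from the paper.

```python
import numpy as np

# Hypothetical 3-state chain: 0 -> 1 -> 2, with state 2 absorbing.
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
r = np.array([0.0, 0.0, 1.0])  # expected immediate reward in each state
gamma = 0.9                    # discount factor

# Discounted expected visit counts: M = I + gamma*T + gamma^2*T^2 + ...
# which has the closed form M = (I - gamma*T)^-1.
M = np.linalg.inv(np.eye(3) - gamma * T)

# The value of each state is the visit-weighted sum of rewards.
V = M @ r
```

Here `M[s, s2]` is how often the agent expects to occupy state `s2` (discounted) after starting from `s`, so `V` is exactly the "reward weighted by expected future visits" described above.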
Likewise, the argument is that the hippocampus represents every situation, or state, in terms of the future states it predicts. For example, if you are leaving work (your current state), your hippocampus might represent this by predicting that you will likely soon be on your commute, picking your kids up from school, or, further in the future, at home. By representing each current state in terms of its anticipated successor states, the hippocampus conveys a compact summary of future events, known formally as the "successor representation". It is suggested that this particular form of predictive map allows the brain to adapt rapidly in environments with changing rewards, without having to run expensive simulations of the future.
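One appealing property of the successor representation is that it can be learned incrementally from experience, without ever building an explicit model. The sketch below is my own construction (a standard temporal-difference update for the successor matrix, applied to the same invented three-state chain), not code from the paper.

```python
import numpy as np

n_states, gamma, alpha = 3, 0.9, 0.1
M = np.zeros((n_states, n_states))  # successor representation, learned online

def sr_td_update(M, s, s_next):
    # After observing s -> s_next, nudge the row for s towards
    # "I am in s now, plus (discounted) whatever s_next predicts".
    target = np.eye(n_states)[s] + gamma * M[s_next]
    M[s] += alpha * (target - M[s])
    return M

# Repeatedly experience the deterministic chain 0 -> 1 -> 2 -> 2.
for _ in range(2000):
    for s, s_next in [(0, 1), (1, 2), (2, 2)]:
        M = sr_td_update(M, s, s_next)
```

After enough experience, the row `M[0]` approximates the discounted expected future visits from state 0, i.e. the compact summary of upcoming states described above.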
This approach combines the strengths of two algorithms that are well known in reinforcement learning and are also thought to be present in humans and rodents. "Model-based" algorithms learn models of the environment that can then be simulated to produce estimates of future reward, while "model-free" algorithms learn future reward estimates directly from experience in the environment. Model-based algorithms are flexible but computationally expensive, whereas model-free algorithms are computationally cheap but inflexible.
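The contrast between the two families can be sketched concretely. Below, a model-based learner iterates a Bellman backup through an explicit model of the world, while a model-free learner (here, TD(0)) updates its value estimates directly from experienced transitions. The three-state chain is my own toy example, not from the article.

```python
import numpy as np

T = np.array([[0.0, 1.0, 0.0],   # hypothetical transition model
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
r = np.array([0.0, 0.0, 1.0])
gamma = 0.9

# Model-based: repeatedly simulate one step ahead through the model (T, r).
V_mb = np.zeros(3)
for _ in range(200):
    V_mb = r + gamma * T @ V_mb

# Model-free: TD(0) learns values directly from sampled transitions,
# never consulting a model of T.
V_mf, alpha = np.zeros(3), 0.1
for _ in range(2000):
    for s, s_next in [(0, 1), (1, 2), (2, 2)]:
        V_mf[s] += alpha * (r[s] + gamma * V_mf[s_next] - V_mf[s])
```

Both converge to the same values here, but if the rewards change, the model-based learner can re-plan immediately from its model, while the model-free learner must re-experience the world to relearn its estimates.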
The algorithm that inspired this theory combines some of the flexibility of model-based algorithms with the efficiency of model-free ones. Because the computation is a simple weighted sum, it is computationally efficient, like a model-free algorithm. At the same time, by separating reward expectations from state expectations (the predictive map), it can adapt quickly to changes in reward by updating only the reward expectations while leaving the state expectations intact.
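This separation is easy to demonstrate. In the sketch below (my own toy example, using an invented three-state chain), the predictive map is computed once; when the reward moves to a different state, only the reward vector is updated and the values follow from one cheap re-weighting, with no re-learning of the map.

```python
import numpy as np

T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
gamma = 0.9

# Predictive map (state expectations), learned once and then reused.
M = np.linalg.inv(np.eye(3) - gamma * T)

r_old = np.array([0.0, 0.0, 1.0])
V_old = M @ r_old              # values under the old rewards

# The reward moves: state 1 now pays off instead of state 2.
r_new = np.array([0.0, 1.0, 0.0])
V_new = M @ r_new              # one cheap weighted sum, M untouched
```

A purely model-free learner would instead have to relearn every value from fresh experience after the change.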
In future work, the intention is to test the theory further. Because the predictive map theory can be translated into a neural network architecture, the aim is to explore the extent to which this learning strategy can support flexible, rapid planning in silico.
More generally, a major future challenge will be to investigate how the brain integrates different kinds of learning. While this model is framed as an alternative to model-based and model-free learning in the brain, a more realistic view is that many kinds of learning are coordinated simultaneously by the brain during learning and planning. Understanding how these learning algorithms are combined is a crucial step towards understanding human and animal brains, and could provide key insights for building similarly complex, multifaceted artificial intelligences.