Meta Learning — A path to Artificial General Intelligence Series Part II — Model-Based Meta Learning

CellStrat
5 min read · Feb 19, 2022

Welcome to the second part of our Meta Learning — A path to Artificial General Intelligence Series. Hope you’re all set with Metric-Based Meta Learning. Now let’s dive into the next type, Model-Based Meta Learning.

Model-based methods are entirely different from metric-based techniques. They make no assumption about maximising the probability of the true class. Instead, the focus is on making parameter updates and adaptation to new data faster by using specific model architectures that generalise well. In this section, we will look at models that are better at generalisation and, to some extent, reasoning too.

Memory Augmented Neural Networks (MANN)

A computer uses three fundamental mechanisms to operate — arithmetic operations, logical flow control and external memory. It has long-term storage as well as short-term storage in the form of working memory (RAM). We humans use memory to retrieve facts from past experiences and combine different facts for inference and reasoning. The machine learning community has largely neglected the use of external memory for solving problems; RNNs do have internal memory, but it is of a different kind. Using an external memory gives a model extra adaptability to new tasks, because it can simply accumulate new information in its working memory. MANNs are a class of models that learn to do exactly that, using attention-based mechanisms to operate on their memory and infer from it. Unlike traditional networks such as RNNs and LSTMs, MANNs do not need to memorise large amounts of data in their weights; they only need to learn how to read from and write to an external memory and draw inferences from it.

Memory Networks (MemNN) and Neural Turing Machines (NTM) are two different types of MANNs. Let’s take a closer look at NTMs now.

A Neural Turing Machine is a neural network coupled with an external memory, with which it interacts using attention mechanisms. It was introduced by Graves et al. at DeepMind in 2014. It is analogous to a Turing machine or the von Neumann architecture, except that it is end-to-end differentiable, allowing it to be trained efficiently by gradient descent. Because they use an external memory, NTMs are good at retaining long sequences and can generalise quite well.

Vanilla NTMs have been trained with supervised methods to learn algorithms such as copy, repeat copy, associative recall and sorting. Successors of the NTM, like the Differentiable Neural Computer (DNC), can perform more sophisticated general tasks such as route planning and answering logical questions.

The NTM architecture contains two basic components — a controller and a memory. The controller is a neural network that interacts with the external world through standard input and output vectors. Additionally, it interacts with the memory bank using selective read and write operations. The network outputs responsible for memory interaction are called heads; this is where the analogy with the Turing machine lies. The heads interact with memory very sparsely, using focused attention mechanisms.

So, what happens in Reading and Writing?

In Reading, memory is defined by a matrix of size N×M, where N is the number of memory locations and M is the length of the vector at each location.

Here Mt is the contents of the memory at time t and wt is the vector of weightings over the N locations. All weights are normalised: each lies between 0 and 1 and they sum to 1. The read head returns a read vector rt of size M, defined as a weighted combination of the row vectors Mt(i).
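Written out (this reconstructs the standard read equation from Graves et al., 2014, which the prose above describes):

```latex
r_t \leftarrow \sum_{i} w_t(i)\, M_t(i),
\qquad \sum_{i} w_t(i) = 1,\quad 0 \le w_t(i) \le 1 .
```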

While in Writing, the write operation is decomposed into two parts — erase and add. In the erase step, given a weighting wt emitted by the write head at time t along with an erase vector et (of length M, with elements between 0 and 1), the memory contents Mt−1(i) from the previous time step are modified as shown in the equations below.

A memory location is reset to zero only where both the weighting wt(i) and the corresponding erase element are 1; if either of them is 0, the previous contents at that location are left unchanged. In the add step, each write head also produces an add vector at, which is added to the memory after the erase step has been applied.
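In equation form (again reconstructing the update from the original NTM paper; the erase multiplication acts element-wise on the memory row):

```latex
\tilde{M}_t(i) \leftarrow M_{t-1}(i)\,\big[\mathbf{1} - w_t(i)\, e_t\big] \quad \text{(erase)}
```

```latex
M_t(i) \leftarrow \tilde{M}_t(i) + w_t(i)\, a_t \quad \text{(add)}
```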

Memory Augmented Neural Networks (MANNs) use external memory coupled with attention to generalise and adapt to different tasks. Memory Networks apply attention over the supporting information (e.g. stories) to answer an input question; using multiple hops lets them reason over several supporting sequences, in a way that resembles recurrent networks. The Neural Turing Machine has a controller network that learns to read from and write to a memory matrix using selective attention in order to solve a task. NTMs tend to learn internal algorithms for the problem and thus generalise well. Memory Networks are mainly used for NLP tasks, but they can be tuned for any sequence-based task; they are shallow and compute-friendly.
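To make the read, erase and add operations above concrete, here is a minimal NumPy sketch. It assumes the weighting wt is already given (in a real NTM it is produced by the controller via content- and location-based addressing), and the function names `read` and `write` are illustrative, not taken from any library.

```python
import numpy as np

def read(memory, w):
    """Read from memory: r_t = sum_i w_t(i) * M_t(i).

    memory: (N, M) matrix, w: (N,) weighting that sums to 1.
    Returns the read vector of length M.
    """
    return w @ memory  # shape (M,)

def write(memory, w, erase, add):
    """Erase then add, as in the NTM write head.

    erase, add: vectors of length M; erase elements lie between 0 and 1.
    """
    memory = memory * (1.0 - np.outer(w, erase))  # erase step
    memory = memory + np.outer(w, add)            # add step
    return memory

# Toy usage: N = 4 memory locations, vectors of length M = 3.
M = np.zeros((4, 3))
w = np.array([0.0, 1.0, 0.0, 0.0])  # sharply focused weighting on location 1
M = write(M, w, erase=np.zeros(3), add=np.array([1.0, 2.0, 3.0]))
print(read(M, w))  # -> [1. 2. 3.]
```

Because every step here is differentiable in w, erase and add, the whole read/write loop can be trained end-to-end with gradient descent, which is exactly what makes the NTM trainable.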

Next up we have the final and most sophisticated type, Optimization-based Meta Learning. Stay tuned for the climax blog of this series, or go ahead and check it out.

