Meta Learning — A path to Artificial General Intelligence Series Part I — Meta Learning and Metric-Based Meta Learning
Imagine a world where a game bot trained on PUBG can start playing Call of Duty as soon as it is introduced to it, or a robot that has learned to walk on flat soil can walk on rough terrain just like that. Got the idea? This is no longer purely imaginary, because we now have meta learning, with its motto of “learning to learn”. It is all about designing models that can learn and adapt to new tasks quickly and efficiently without much fine-tuning. We want models that generalize well to new tasks, data or environments that differ from the ones seen in training.
Meta learning can be approached in different ways: Metric-Based (learn an efficient distance function for similarity), Model-Based (learn to utilize internal/external memory for adapting, as in Memory-Augmented Neural Networks, MANN) and Optimization-Based (optimize the model parameters explicitly for learning quickly).
So let’s look into this more deeply. First, we have Metric-Based meta learning. This kind of meta learning gives us a way to develop generalized systems with relatively little data. There are a couple of notable use cases for meta learning, such as:
● Hyperparameter optimization, i.e. meta learning the optimal hyperparameters. Techniques like Genetic Algorithms can also be used to optimize the neural network hyperparameters (meta).
● Neural Architecture Search
● Few-shot learning and generalization across multiple tasks with little/no fine-tuning.
Now let’s see how this is different from a normal training procedure. In normal ML training, two factors affecting the training process are the optimizer and the initial parameters. Here, we initialize the model parameters, aka model weights (θ). As shown in the figure below, in the forward propagation step the input X is fed to the system and moves forward; the model calculates the activations in each successive layer with the help of the model weights. At the end of the model, the loss (cost) function is calculated. This loss is backpropagated and the weights are adjusted with a given optimization algorithm such as Stochastic Gradient Descent (SGD). The next iteration runs with the updated weights and the cycle repeats.
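To make the loop above concrete, here is a minimal sketch in NumPy. It assumes a toy linear model with a squared-error loss and hand-written gradients; the function name `train_sgd`, the learning rate and the toy data are illustrative choices, not something taken from the figure.

```python
import numpy as np

def train_sgd(X, y, lr=0.05, epochs=200, seed=0):
    """Fit a linear model y ≈ X @ theta with plain per-sample SGD."""
    rng = np.random.default_rng(seed)
    # Initial parameters θ (small random weights; an assumed initialization).
    theta = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            pred = X[i] @ theta      # forward pass
            error = pred - y[i]      # loss is 0.5 * error**2
            grad = error * X[i]      # backward pass: d(loss)/dθ
            theta -= lr * grad       # SGD weight update
    return theta

# Toy data: noiseless targets generated from known weights.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
true_theta = np.array([2.0, -3.0])
y = X @ true_theta

theta = train_sgd(X, y)
print(theta)  # should end up close to true_theta
```

Note that both the initialization and the update rule are fixed by hand here. These are exactly the two pieces that meta learning tries to learn.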
Coming to Meta Learning, we are learning to learn, which means learning the ideal weight initialization as well as the correct optimization logic.
Let’s try to visualize this. Here there is a meta-learner gϕ, which in turn uses an SGD meta-optimizer to learn the primary learner fθ.
When it comes to metric-based meta learning, the core idea is similar to nearest-neighbour algorithms (such as the k-NN classifier and k-means clustering) and kernel density estimation. Essentially, the predicted probability over the labels for a given input is a weighted sum of the one-hot encoded labels of the known samples, where each weight is generated by a kernel function that measures the similarity between two samples.
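As a rough illustration of this weighted sum, the sketch below uses a fixed kernel: a softmax over negative squared Euclidean distances on raw feature vectors. That kernel, the helper names `kernel_weights` and `predict_proba`, and the toy points are all assumptions made for the example. In an actual metric-based method the kernel would typically be learned, for instance by comparing learned embeddings.

```python
import numpy as np

def kernel_weights(query, support_x):
    """Similarity weights: softmax over negative squared distances (assumed kernel)."""
    d2 = np.sum((support_x - query) ** 2, axis=1)
    logits = -d2
    logits = logits - logits.max()   # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()               # weights sum to 1

def predict_proba(query, support_x, support_y, n_classes):
    """Predicted class probabilities = kernel-weighted sum of one-hot labels."""
    one_hot = np.eye(n_classes)[support_y]        # shape (num_support, n_classes)
    return kernel_weights(query, support_x) @ one_hot

# Four labelled samples, two per class, and one query near class 0.
support_x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
support_y = np.array([0, 0, 1, 1])
probs = predict_proba(np.array([0.2, 0.1]), support_x, support_y, n_classes=2)
print(probs)
```

Because the weights sum to one and the labels are one-hot, the output is a valid probability distribution over the classes.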
Let’s try to make sense of this approach through the few-shot classification problem.
Few-shot Classification
This is an instance of meta-learning in a supervised context. Take a simple image classification dataset D of images and their corresponding labels. In the usual setup, each task in D is a single data point with one image and one label, and the goal is for a model trained on the entire dataset to predict the label of a single image. To frame it as a meta-learning problem for generalization, we can transform this dataset into a collection of mini-datasets, each having only a handful of images per class. We train our model on all these mini-datasets. The goal is to learn to classify with less training data.
Each forward pass on one mini-dataset (task) is called an episode. In notational terms, the dataset D is transformed into a collection of mini-datasets B, each containing a Support Set S and a Query Set Q. The Support Set S contains a small number of training examples of each class, and the Query Set Q contains the test examples to classify, given the training examples in S. It is modelled as a k-shot n-way problem, where k is the number of training examples per class and n is the total number of classes in the task/mini-dataset B, so S contains k examples for each of the n classes. Transforming the data this way allows us to train explicitly for learning with less data. During testing, the model is given only a mini-dataset B, so training it on mini-datasets in the same way it will be tested is also called “training in the same way as testing”. In practice, we measure the similarity between the support set images and the query set images, which makes this a metric-based meta learning problem.
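The episode construction described above can be sketched as follows. This is a minimal example that works on label indices only; the function name `sample_episode`, the `n_query` parameter (number of query examples per class) and the toy label array are assumptions made for illustration.

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one n-way k-shot episode; returns (support_idx, query_idx, classes)."""
    # Pick n distinct classes for this mini-dataset B.
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_idx, query_idx = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_idx.extend(idx[:k_shot])                    # k examples -> Support Set S
        query_idx.extend(idx[k_shot:k_shot + n_query])      # disjoint examples -> Query Set Q
    return np.array(support_idx), np.array(query_idx), classes

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)   # toy dataset D: 10 classes, 20 images each
s_idx, q_idx, classes = sample_episode(labels, n_way=5, k_shot=1, n_query=3, rng=rng)
print(len(s_idx), len(q_idx))           # 5 support indices, 15 query indices
```

Calling `sample_episode` repeatedly during training yields a stream of different 5-way 1-shot tasks, which is what lets training mirror the test-time setting.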
The goal is to find the optimal parameters θ∗ for our model that maximise, in expectation over all sub-tasks B in the dataset D, the probability of classifying the correct label for the examples in each sub-task B.
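One way to write this objective down, using the symbols from this section (B is a mini-dataset with Support Set S and Query Set Q), is sketched below. The exact form is an assumption based on how few-shot objectives are commonly stated, not a formula given in this article.

```latex
\theta^{*} = \arg\max_{\theta} \;
  \mathbb{E}_{B \subset D}
  \left[ \sum_{(x,\, y) \in Q} P_{\theta}\left(y \mid x, S\right) \right],
  \qquad B = (S, Q)
```

Here P_θ(y | x, S) is the probability the model assigns to the correct label y of a query image x, given the support set S of the same episode.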
So to sum it up,
- Meta-Learning deals with creating models which either learn and optimise fast or models which generalise and adapt to different tasks easily.
- Few-Shot classification frames image classification as a multiple-task learning problem.
- Metric Learning is about learning to accurately measure similarity in a given support set and generalize to other datasets.
So now we have a fair idea about Metric-based Meta learning. The next in line is Model-Based Meta-Learning, stay tuned for the next part of this series. Or if you can’t wait that long, check our CellStrat Research Archives here, to learn all about Meta-Learning and its types.
USEFUL LINKS :-
- CellStrat research : https://bit.ly/CS-research