How to harness learning curves to diagnose the performance of machine learning models
A learning curve can be defined as a plot of a model's learning performance over experience or over time.
Learning curves are a broadly leveraged diagnostic tool in machine learning for algorithms that learn from a training dataset incrementally. The model can be assessed on the training dataset and on a hold-out validation dataset after every update during training, and plots of the quantified performance can be developed to display learning curves.
Analysis of learning curves of models in the course of training can be leveraged in diagnosing issues with learning, like an underfit or overfit model, in addition to whether the training and validation datasets are suitably representative.
In this blog article by AICoreSpot, you will find out all about learning curves and how they can be leveraged to diagnose the learning and generalization behaviour of machine learning models, with instance plots displaying common learning problems.
After going through this guide, you will be aware of:
- Learning curves are plots that display changes in learning performance over time, in terms of experience.
- Learning curves of model performance on the train and validation datasets can be leveraged to undertake diagnosis of an underfit, overfit, or well-fit model.
- Learning curves of model performance can be leveraged to diagnose if the train or validation datasets are not comparatively representative of the problem domain.
Tutorial Summarization
This guide is subdivided into three portions, which are:
1] Learning Curves
2] Diagnosing Model Behaviour
3] Diagnosing Unrepresentative Datasets
Learning Curves within Machine Learning
Typically, a learning curve is a plot that displays time or experience on the x-axis and learning or enhancement on the y-axis.
Learning curves (LCs) are viewed as effective tools for monitoring the performance of workers exposed to a new activity. LCs furnish a mathematical representation of the learning process that takes place as a task is repeated.
For instance, if you were learning to play a musical instrument, your skill could be evaluated and allotted a numerical score each week over the course of one year. A plot of the scores across the 52 weeks is a learning curve and would display how your learning of the instrument has changed over time.
Learning Curve: Line plot of learning (y-axis) over experience (x-axis)
Learning curves are broadly leveraged within machine learning for algorithms that learn (optimize their internal parameters) incrementally with the passage of time, like deep learning neural networks.
The metric leveraged to assess learning could be maximizing, implying that improved scores (bigger numbers) signify more learning. An instance would be classification accuracy.
It is more typical to leverage a score that is minimizing, like loss or error, whereby improved scores (smaller numbers) signify more learning and a value of 0.0 signifies that the training dataset was learned perfectly and no mistakes were made.
In the training of a machine learning model, the present state of the model at every step of the training algorithm can be assessed. It can be evaluated on the training dataset to provide an idea on how well the model is learning. It can also be assessed on a hold-out validation dataset that is not part of the training dataset. Evaluation on the validation dataset provides an idea of how well the model is “generalizing”.
- Train learning curve: Learning curve quantified from the training dataset that provides an idea of how well the model is learning.
- Validation Learning Curve: Learning curve calculated from a hold-out validation dataset that provides an idea of how well the model is generalizing.
It is typical to develop dual learning curves for a machine learning model in training on both the training and validation datasets.
In some scenarios, it is also typical to develop learning curves for several metrics, such as in the scenario of classification predictive modelling problems, where the model might be optimized according to cross-entropy loss and model performance is assessed leveraging classification accuracy. In this scenario, two plots are created, one for the learning curves of each metric, and each plot can display two learning curves, one for each of the train and validation datasets.
- Optimization learning curves: Learning curves quantified on the metric by which the parameters of the model are being optimized, for example, loss.
- Performance learning curves: Learning curves quantified on the metric through which the model will be assessed and chosen, for example, accuracy.
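The two dataset-level curves can be recorded by evaluating the model on both datasets at every step of training. The sketch below does this for a minimal linear model trained with batch gradient descent; the data, split sizes, and learning rate are illustrative assumptions, not a prescribed recipe.

```python
import numpy as np

# Synthetic regression data drawn for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hold-out split: first 150 samples for training, the rest for validation.
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

w = np.zeros(3)                  # model parameters
train_curve, val_curve = [], []  # the two learning curves
for epoch in range(100):
    # Record mean-squared-error loss on each dataset at the current step.
    train_curve.append(np.mean((X_train @ w - y_train) ** 2))
    val_curve.append(np.mean((X_val @ w - y_val) ** 2))
    # One gradient-descent update computed on the training set only.
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.05 * grad
```

Plotting `train_curve` and `val_curve` against the epoch index yields the train and validation learning curves described above.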
Now that we are acquainted with the use of learning curves in machine learning, let’s look at some typical shapes observed in learning curve plots.
Undertaking diagnosis of model behaviour
The dynamics and shape of a learning curve can be leveraged to diagnose the behaviour of a machine learning model and in turn probably indicate the variant of configuration changes that may be made to enhance learning and/or performance.
There are three typical dynamics that you are probable to observe in learning curves, which are:
- Underfit
- Overfit
- Good fit
We will take a deeper look at each one with instances. The instances will assume that we are looking at a minimizing metric, implying that smaller relative scores on the y-axis signify more or improved learning.
Underfit learning curves
Underfitting is a reference to a model that cannot learn the training dataset.
Underfitting takes place when the model is not capable of obtaining an adequately low error value on the training set.
An underfit model can be detected from the learning curve of the training loss only.
It may display a flat line or noisy values of comparatively high loss, signifying that the model was unable to learn the training dataset to begin with.
An instance of this is furnished below and is typical when the model does not possess a suitable capacity for the intricacy of the dataset.
An underfit model might also be identified through a training loss that is reducing and continues to reduce at the end of the plot.
This signifies that the model is capable of further learning and potential further enhancements, and that the training procedure was halted prematurely.
A plot of learning curves displays underfitting if:
- The training loss stays flat regardless of training.
- Alternatively, the training loss continues to reduce until the conclusion of training.
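The two signatures above can be checked programmatically from the training-loss values alone. The helper below is a rough heuristic sketch; the function name and thresholds are illustrative assumptions, not standards.

```python
import numpy as np

def looks_underfit(train_loss, flat_tol=1e-3, tail_frac=0.2):
    """Heuristic check for the two underfitting signatures:
    a flat curve, or a curve still falling when training ended."""
    loss = np.asarray(train_loss, dtype=float)
    tail = max(2, int(len(loss) * tail_frac))
    # Signature 1: the loss is essentially flat from start to finish.
    flat = abs(loss[0] - loss[-1]) < flat_tol
    # Signature 2: the loss is still falling steadily at the end of
    # training, suggesting training stopped prematurely.
    diffs = loss[-tail:-1] - loss[-tail + 1:]
    still_falling = bool((diffs > 0).all()) and loss[-tail] - loss[-1] > flat_tol
    return bool(flat) or still_falling
```

For example, a flat curve like `[0.9, 0.9, 0.9, 0.9]` or a still-decreasing curve like `[1.0, 0.8, 0.6, 0.4, 0.2]` is flagged, while a curve that plateaus at a low value is not.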
Overfit Learning Curves
Overfitting is a reference to a model that has learned the training dataset too well, which includes the statistical noise or random fluctuations in the training dataset.
Fitting a more flexible model requires estimating a bigger number of parameters. These more complicated models can lead to a phenomenon referred to as overfitting the data, which basically implies they follow the errors, or noise, too closely.
The issue with overfitting is that the more specialized the model becomes to the training data, the less well it is able to generalize to fresh data, with the outcome of an increase in generalization error. This escalation in generalization error can be quantified through the performance of the model on the validation dataset.
This is an instance of overfitting the data, […] it is an undesirable scenario as the fit gathered will not provide precise estimates on the response on new observations that were not part of the original training data set.
This often happens if the model has additional capacity than is needed for the problem, and, in turn, too much flexibility. It can also happen if the model is trained for too long a duration.
A plot of learning curves displays overfitting if:
- The plot of training loss continues to reduce with experience.
- The plot of validation loss reduces to a point and starts increasing again.
The inflection point in validation loss might be the point at which training could be stopped as experience following that point displays the dynamics of overfitting.
The instance plot below displays a case of overfitting.
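The inflection point can be located programmatically as the epoch where the validation loss reaches its minimum. The curve below is synthetic, shaped like the overfitting pattern described above, and is purely illustrative.

```python
import numpy as np

# A synthetic validation-loss curve: it decays at first, then climbs
# again once overfitting sets in (from epoch 20 in this illustration).
epochs = np.arange(50)
val_loss = 0.5 * np.exp(-epochs / 10) + 0.01 * np.maximum(epochs - 20, 0)

# The candidate early-stopping point is the epoch of minimum loss.
best_epoch = int(np.argmin(val_loss))  # 20 for this synthetic curve
```

Training could be stopped at `best_epoch`, since experience following that point displays the dynamics of overfitting.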
Good Fit Learning Curves
A good fit is the objective of the learning algorithm and lies between an overfit and an underfit model.
An ideal fit is detected through a training and validation loss that reduces to a point of stability with a minimum gap between the two final loss values.
The loss of the model will nearly always be lower on the training dataset than on the validation dataset. This implies that we ought to expect some gap between the train and validation loss learning curves. This gap is referred to as the “generalization gap”.
A plot of learning curves displays a good fitting if:
- The plot of a training loss reduces to a point of stability.
- The plot of validation loss reduces to a point of stability and has a minimal gap with the training loss.
Continued training of a good fit will probably lead to an overfit.
The instance plot below illustrates a case of a good fit.
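The generalization gap itself is simple to compute from the two curves. The sketch below uses synthetic curves that both decay to stable plateaus with a small gap; the shapes and numbers are illustrative assumptions.

```python
import numpy as np

def generalization_gap(train_loss, val_loss):
    """Gap between the final validation and training losses; a small,
    stable gap alongside plateauing curves is the good-fit pattern."""
    return float(val_loss[-1] - train_loss[-1])

# Synthetic train/validation curves settling to stable values.
t = np.arange(100)
train = 0.10 + 0.9 * np.exp(-t / 15)
val = 0.15 + 0.9 * np.exp(-t / 15)

gap = generalization_gap(train, val)  # ≈ 0.05 for these curves
```

A small value of `gap`, combined with both curves having flattened out, matches the good-fit criteria listed above.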
Diagnosis of Unrepresentative Datasets
Learning curves can also be leveraged to diagnose attributes of a dataset and whether it is comparatively representative.
An unrepresentative dataset implies a dataset that might not capture the statistical traits of another dataset drawn from the same domain, like between a train and a validation dataset. This can typically happen if the number of samples in a dataset is too small, relative to another dataset.
There are two typical scenarios that could be witnessed, they are:
- Training dataset is comparatively unrepresentative
- Validation dataset is comparatively unrepresentative.
Unrepresentative Train Dataset
An unrepresentative training dataset implies that the training dataset does not furnish adequate data to learn the problem, comparative to the validation dataset leveraged to assess it.
This may happen if the training dataset has too few instances contrasted with the validation dataset.
This scenario can be detected by a learning curve for training loss that displays improvement and likewise a learning curve for validation loss that displays improvement, but with a large gap remaining between both curves.
Unrepresentative Validation Dataset
An unrepresentative validation dataset implies that the validation dataset does not furnish adequate data to assess the capability of the model to generalize.
This might happen if the validation dataset has too few instances as contrasted to the training dataset.
This scenario can be identified through a learning curve for training loss that appears like a good fit (or other fits) and a learning curve for validation loss that displays noisy movement around the training loss.
It might also be detected through a validation loss that is lower than the training loss. In this scenario, it signifies that the validation dataset might be easier for the model to predict than the training dataset.
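Both unrepresentative-dataset patterns can be screened for with simple checks on the final loss values. The helper below is a heuristic sketch; the function name, gap threshold, and warning strings are illustrative assumptions.

```python
import numpy as np

def dataset_warnings(train_loss, val_loss, gap_tol=0.2):
    """Heuristic flags for the two unrepresentative-dataset patterns."""
    train_loss = np.asarray(train_loss, dtype=float)
    val_loss = np.asarray(val_loss, dtype=float)
    warnings = []
    # Both curves improve but remain far apart: the training set may be
    # too small relative to the validation set.
    improving = train_loss[-1] < train_loss[0] and val_loss[-1] < val_loss[0]
    if improving and (val_loss[-1] - train_loss[-1]) > gap_tol:
        warnings.append("train set may be unrepresentative")
    # Validation loss sits below training loss: the validation set may
    # be too small or too easy for the model.
    if val_loss[-1] < train_loss[-1]:
        warnings.append("validation set may be unrepresentative")
    return warnings
```

For example, curves ending far apart trigger the first warning, while a validation loss below the training loss triggers the second, and a small positive gap triggers neither.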
Further Reading
This section furnishes additional resources on the subject if you are seeking to delve deeper.
Books
Deep Learning, 2016
An Introduction to Statistical Learning: With Applications in R, 2013
Papers
Learning curve models and applications: Literature review and research directions, 2011
Posts
How to diagnose overfitting and underfitting of LSTM models
Overfitting and underfitting with machine learning algorithms
Articles
Learning curve, Wikipedia
Overfitting, Wikipedia
Conclusion
In this article, you found out about learning curves and how they can be leveraged to diagnose the learning and generalization behaviour of ML models.
Particularly, you learned:
- Learning curves are plots that display modifications in learning performance over time in terms of experience
- Learning curves of model performance on the train and validation datasets can be leveraged to diagnose an underfit, overfit, or well-fit model
- Learning curves of model performance can be leveraged to undertake diagnosis of whether the train or validation datasets are not relatively representative of the problem domain.