An Introduction to Premature Convergence
Convergence refers to the limit of a process and can be a useful analytical tool when assessing the expected performance of an optimization algorithm.
It can also be a useful empirical tool when exploring the learning dynamics of an optimization algorithm, and of machine learning models trained using an optimization algorithm, such as deep learning neural networks. This motivates the investigation of learning curves and techniques such as early stopping.
If optimization is a process that generates candidate solutions, then convergence represents a stable point at the end of the process when no further changes or improvements are expected. Premature convergence refers to a failure mode of an optimization algorithm in which the process stops at a stable point that does not represent a globally optimal solution.
In this guide, you will discover premature convergence in machine learning.
After completing this guide, you will know:
- Convergence refers to the stable point found at the end of a sequence of solutions via an iterative optimization algorithm.
- Premature convergence refers to a stable point found too soon, perhaps close to the starting point of the search, and with a worse evaluation than expected.
- The greediness of an optimization algorithm provides a control over the rate of convergence of the algorithm.
Tutorial Overview
This guide is divided into three parts; they are:
1] Convergence in Machine Learning
2] Premature Convergence
3] Addressing Premature Convergence
Convergence in Machine Learning
Convergence generally refers to the values of a process that tend toward a stable behaviour over time.
It is a useful idea when working with optimization algorithms.
Optimization refers to a type of problem that requires finding a set of inputs that result in the maximum or minimum value of an objective function. Optimization is an iterative process that produces a sequence of candidate solutions until ultimately arriving at a final solution at the end of the process.
This behaviour or dynamic of the optimization algorithm arriving at a stable final solution is referred to as convergence, e.g. the convergence of the optimization algorithm. In this way, convergence defines the termination of the optimization algorithm.
Local descent involves iteratively choosing a descent direction, taking a step in that direction, and repeating the process until convergence or some termination condition is met.
Convergence: Stop condition for an optimization algorithm where a stable point has been located and further iterations of the algorithm are unlikely to result in further improvement.
We can measure and explore the convergence of an optimization algorithm empirically, such as by using learning curves. We can also explore convergence analytically, such as with a convergence proof or average-case computational complexity.
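For example, the following is a minimal sketch of checking convergence empirically: gradient descent on a toy convex objective f(x) = x^2 records the objective value each iteration as a learning curve and stops once the improvement between iterations falls below a tolerance. The objective, step size, and tolerance are illustrative assumptions, not values from this guide.

```python
# Minimal sketch: empirical convergence check on a toy convex objective.
# The objective, step size, and tolerance are illustrative assumptions.

def objective(x):
    return x ** 2.0

def gradient(x):
    return 2.0 * x

def gradient_descent(start, step_size=0.1, tol=1e-6, max_iter=1000):
    x = start
    curve = [objective(x)]  # learning curve of objective values per iteration
    for i in range(max_iter):
        x = x - step_size * gradient(x)
        curve.append(objective(x))
        # convergence: the last step produced a negligible improvement
        if abs(curve[-2] - curve[-1]) < tol:
            print(f"converged after {i + 1} iterations, f(x) = {curve[-1]:.6f}")
            break
    return x, curve

solution, learning_curve = gradient_descent(start=5.0)
```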
Strong selection pressure results in rapid, but possibly premature, convergence. Weakening the selection pressure slows down the search process.
Optimization, and the convergence of optimization algorithms, is an important concept in machine learning for those algorithms that fit a training dataset via an iterative optimization algorithm, such as logistic regression and artificial neural networks.
As such, we may choose optimization algorithms that result in better convergence behaviour than other algorithms, or spend a lot of time tuning the convergence dynamics (learning dynamics) of an optimization algorithm via its hyperparameters (e.g. the learning rate).
Convergence behaviour can be compared, often in terms of the number of iterations of an algorithm required until convergence, the objective function evaluation of the stable point found at convergence, and combinations of these concerns.
Premature Convergence
Premature convergence refers to the convergence of a process that has occurred too soon.
In optimization, it refers to the algorithm converging upon a stable point with worse performance than expected.
Premature convergence typically affects complex optimization tasks where the objective function is non-convex, meaning the response surface contains many different good solutions (stable points), perhaps with one, or a few, globally optimal solutions.
If we consider the response surface of an objective function under optimization as a geometric landscape and we are seeking a minimum of the function, then premature convergence refers to finding a valley close to the starting point of the search that is shallower than the deepest valley in the problem domain.
For problems that exhibit highly multimodal (rugged) fitness landscapes, or landscapes that change over time, too much exploitation generally results in premature convergence to suboptimal peaks in the space.
In this way, premature convergence is described as finding a locally optimal solution instead of the globally optimal solution for an optimization algorithm. It is a specific failure case for an optimization algorithm.
Premature Convergence: Convergence of an optimization algorithm to a worse-than-optimal stable point that is likely close to the starting point.
Put another way, convergence signifies the end of the search process, e.g. a stable point was located and further iterations of the algorithm are unlikely to improve upon it. Premature convergence refers to reaching this stop condition of an optimization algorithm at a less-than-desirable stationary point.
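The "shallow valley" idea can be sketched with a toy multimodal objective. In the example below, the function has a shallow local minimum near x = 3 and a deeper global minimum near x = -2 (both values are illustrative assumptions); gradient descent started near the shallow valley converges prematurely to it.

```python
# Minimal sketch: premature convergence to a shallow valley of a multimodal
# objective. The specific function and starting points are illustrative.
import math

def objective(x):
    # two inverted Gaussian "valleys": shallow near x = 3, deeper near x = -2
    return -1.0 * math.exp(-(x + 2.0) ** 2) - 0.5 * math.exp(-(x - 3.0) ** 2)

def gradient(x, eps=1e-6):
    # finite-difference approximation of the derivative, for illustration
    return (objective(x + eps) - objective(x - eps)) / (2.0 * eps)

def descend(start, step_size=0.1, n_iter=500):
    x = start
    for _ in range(n_iter):
        x = x - step_size * gradient(x)
    return x, objective(x)

# starting near the shallow valley results in premature convergence
print("start =  2.5 ->", descend(2.5))    # ends near x = 3, the shallower valley
print("start = -1.0 ->", descend(-1.0))   # ends near x = -2, the global minimum
```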
Addressing Premature Convergence
Premature convergence may be a relevant concern on any sufficiently challenging optimization task.
For example, a large proportion of research in the field of evolutionary computation and genetic algorithms involves identifying and overcoming premature convergence of the algorithm on an optimization task.
If selection focuses on the most fit individuals, the selection pressure may cause premature convergence due to reduced diversity of the new populations.
Population-based optimization algorithms, such as evolutionary algorithms and swarm intelligence, often describe their dynamics in terms of the interplay between selective pressure and convergence. For example, strong selective pressure results in faster convergence and likely premature convergence. Weaker selective pressure may result in slower convergence (greater computational cost), although it may locate a better or even globally optimal solution.
An operator with a high selective pressure decreases diversity in the population more rapidly than operators with a low selective pressure, which may lead to premature convergence to suboptimal solutions. A high selective pressure limits the exploration abilities of the population.
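As a rough illustration of selective pressure, the sketch below uses tournament selection on a toy OneMax population: a larger tournament size applies stronger pressure and collapses the diversity of the selected parents faster. The fitness function, population size, and tournament sizes are illustrative assumptions, not from the quote above.

```python
# Minimal sketch: tournament size as a control on selection pressure.
# OneMax fitness, population size, and tournament sizes are illustrative.
import random

def fitness(individual):
    # toy OneMax fitness: count of ones in the bitstring
    return sum(individual)

def tournament_select(population, k, rng):
    # sample k individuals at random and keep the fittest; larger k = greedier
    return max(rng.sample(population, k), key=fitness)

rng = random.Random(1)
population = [[rng.randint(0, 1) for _ in range(20)] for _ in range(50)]

# compare how many distinct parents survive under weak vs strong pressure
for k in (2, 10):
    parents = [tournament_select(population, k, rng) for _ in range(len(population))]
    unique_parents = len({tuple(p) for p in parents})
    print(f"tournament size {k}: {unique_parents} unique parents selected")
```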
This idea of selective pressure is useful more generally in understanding the learning dynamics of optimization algorithms. For example, an optimization algorithm that is configured to be overly greedy, e.g. via hyperparameters such as the step size or learning rate, may fail due to premature convergence, whereas the same algorithm configured to be less greedy may overcome premature convergence and discover a better or globally optimal solution.
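One way to see this greediness effect is with greedy stochastic hill climbing on a multimodal objective, sketched below: a small step size only makes local moves and tends to get stuck in the shallow valley, while a larger step size explores more widely and can escape it. The objective, step sizes, and random seed are illustrative assumptions.

```python
# Minimal sketch: greediness (step size) and premature convergence.
# Objective, step sizes, and seed are illustrative assumptions.
import math
import random

def objective(x):
    # shallow valley near x = 3, deeper (global) valley near x = -2
    return -1.0 * math.exp(-(x + 2.0) ** 2) - 0.5 * math.exp(-(x - 3.0) ** 2)

def hill_climb(start, step_size, n_iter=2000, seed=1):
    rng = random.Random(seed)
    best_x, best_f = start, objective(start)
    for _ in range(n_iter):
        candidate = best_x + rng.gauss(0.0, step_size)
        f = objective(candidate)
        if f < best_f:             # greedy acceptance: keep only improvements
            best_x, best_f = candidate, f
    return best_x, best_f

# greedy (small steps) versus less greedy (large steps), started near the shallow valley
print("step size 0.05:", hill_climb(start=2.5, step_size=0.05))
print("step size 2.00:", hill_climb(start=2.5, step_size=2.0))
```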
Premature convergence may be encountered when using stochastic gradient descent to train a neural network model, signalled by a learning curve that drops exponentially quickly and then stops improving.
The number of updates required to reach convergence usually increases with training set size. However, as m approaches infinity, the model will eventually converge to its best possible test error before SGD has sampled every example in the training set.
The fact that fitting neural networks is subject to premature convergence motivates the use of techniques such as learning curves to monitor and diagnose issues with the convergence of a model on a training dataset, and the use of regularization, such as early stopping, that halts the optimization algorithm before it finds a stable point if continuing would come at the expense of worse performance on a holdout dataset.
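As a rough sketch of the early stopping idea, the loop below halts training once the validation loss has not improved for a fixed number of consecutive epochs (the "patience"). The train_epoch and validation_loss callables are hypothetical placeholders for a real training loop; only the stopping logic is shown.

```python
# Minimal sketch: early stopping with a patience counter.
# train_epoch and validation_loss are hypothetical placeholders.

def fit_with_early_stopping(train_epoch, validation_loss, max_epochs=100, patience=5):
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch()                         # one pass over the training data
        loss = validation_loss()              # evaluate on the holdout dataset
        if loss < best_loss:
            best_loss = loss                  # improvement: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1   # no improvement this epoch
        if epochs_without_improvement >= patience:
            print(f"early stopping at epoch {epoch}, best validation loss {best_loss:.4f}")
            break
    return best_loss
```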
As such, much research into deep learning neural networks is ultimately aimed at overcoming premature convergence.
Empirically, it is often found that 'tanh' activation functions give rise to faster convergence of training algorithms than logistic functions.
This includes techniques such as work on weight initialization, which is important because the initial weights of a neural network define the starting point of the optimization process, and poor initialization can lead to premature convergence.
The initial point can determine whether the algorithm converges at all, with some initial points being so unstable that the algorithm encounters numerical difficulties and fails altogether.
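As an illustration of a principled starting point, the sketch below draws weights from a scaled uniform range in the spirit of Glorot/Xavier initialization, where the range depends on a layer's fan-in and fan-out. The layer sizes are illustrative assumptions.

```python
# Minimal sketch: scaled random weight initialization (Glorot/Xavier style).
# Layer sizes below are illustrative assumptions.
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    rng = rng or np.random.default_rng()
    # the common Glorot choice of limit: sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# initialize a small two-layer network: 10 inputs -> 8 hidden units -> 1 output
w1 = glorot_uniform(10, 8)
w2 = glorot_uniform(8, 1)
print(w1.shape, float(w1.min()), float(w1.max()))
```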
This also includes the large number of variations and extensions of the stochastic gradient descent optimization algorithm, such as the addition of momentum so that the algorithm does not overshoot the optimum (stable point), and Adam, which adds an automatically adapted step size hyperparameter (learning rate) for each parameter being optimized, dramatically speeding up convergence.
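As a rough sketch of the momentum idea, the update below accumulates a velocity as an exponentially decaying sum of past gradients and moves the parameter along that velocity. The toy objective f(x) = x^2, learning rate, and momentum coefficient are illustrative assumptions.

```python
# Minimal sketch: classical momentum update for gradient descent.
# Toy objective, learning rate, and momentum coefficient are illustrative.

def gradient(x):
    return 2.0 * x              # derivative of the toy objective f(x) = x^2

def sgd_with_momentum(start, learning_rate=0.1, momentum=0.9, n_iter=200):
    x, velocity = start, 0.0
    for _ in range(n_iter):
        velocity = momentum * velocity - learning_rate * gradient(x)
        x = x + velocity        # move along the accumulated velocity
    return x

print(sgd_with_momentum(start=5.0))   # approaches the minimum at x = 0
```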
Conclusion
In this guide, you received a brief introduction to premature convergence in machine learning.
Specifically, you learned:
- Convergence refers to the stable point found at the end of a sequence of solutions via an iterative optimization algorithm.
- Premature convergence refers to a stable point found too soon, perhaps close to the starting point of the search, and with a worse evaluation than expected.
- The greediness of an optimization algorithm provides a control over the rate of convergence of the algorithm.