An intro to multiple-model machine learning
Ensemble learning is a strategy in which the predictions from multiple contributing models are brought together.
However, not every technique that uses multiple machine learning models is an ensemble learning algorithm.
It is common to divide a prediction problem into subproblems. For example, some problems naturally subdivide into independent but related subproblems, and a machine learning model can be prepared for each. It is less clear whether these represent examples of ensemble learning, although we might distinguish such methods from ensembles given the inability of any contributing member to produce a solution (however weak) to the overall prediction problem.
In this guide, you will discover multiple-model techniques for machine learning and how they relate to ensemble learning.
After going through this guide, you will know:
- Multiple-model machine learning refers to techniques that use multiple models in a way that closely resembles ensemble learning.
- The use of multiple models for multi-class classification and multi-output regression differs from ensembles in that no single contributing member can solve the problem on its own.
- A mixture of experts might be considered a true ensemble method, whereas hybrid machine learning models are probably not ensemble learning methods.
Tutorial Overview
This guide is divided into five parts; they are:
- Multiple-Model Techniques
- Multiple Models for Multi-Class Classification
- Multiple Models for Multi-Output Regression
- Multiple Expert Models
- Hybrids Built From Multiple Models
Multiple-Model Techniques
Ensemble learning refers to approaches that combine the predictions from two or more models.
We can characterize a model as an ensemble learning technique if it has two properties:
- It comprises two or more models.
- The predictions of those models are combined.
We might also suggest that the goal of an ensemble model is to improve predictions over any contributing member, although a lesser goal might be to improve the stability of the model, e.g., to reduce the variance in the predictions or prediction errors.
Nonetheless, there are models and model architectures that share elements of ensemble learning techniques, yet it is not obvious whether they should be considered ensemble learning or not.
For instance, we might define an ensemble learning technique as one composed of two or more models. The problem is that there are techniques that involve more than two models yet do not combine their predictions, or that combine their predictions in unexpected ways.
There are a few methods that attempt to use multiple learners, yet in a strict sense they cannot be recognized as ensemble combination methods.
For lack of a better name, we will refer to these as “multiple-model techniques” to help differentiate them from ensemble learning methods. However, as we will see, the line that separates these two types of machine learning methods is not that clear-cut.
Multiple-model techniques: Machine learning algorithms that are composed of multiple models and combine them in some way but might not be considered ensemble learning.
As such, it is worthwhile to review and explore multiple-model techniques that sit on the border of ensemble learning, both to better understand ensemble learning itself and to draw upon related ideas that might improve the ensemble models we develop.
There are predictive modelling problems where the structure of the problem itself suggests the use of multiple models.
Usually, these are problems that can be naturally divided into subproblems. This does not mean that dividing a problem into subproblems is the best solution in a given case; it only means that the problem naturally lends itself to decomposition.
Two examples are multi-class classification and multi-output regression.
Multiple Models for Multi-Class Classification
Classification problems involve assigning a class label to input examples.
Binary classification tasks are those with two classes. One decision is made for each example, assigning it to one class or the other. If modelled with probabilities, a single probability of the example belonging to one class is predicted, and its complement is the probability of the second class; this is a binomial probability distribution.
More than two classes can pose a challenge. The techniques developed for two classes can be extended to multiple classes, and sometimes this is straightforward.
Multi-class classification: Assign one among more than two class labels to a given input example.
Alternatively, the problem can be naturally partitioned into multiple binary classification tasks.
There are several ways this can be accomplished.
For example, the classes can be grouped into multiple one-vs-rest prediction problems. A model can then be fit on each subproblem, and typically the same algorithm type is used for every model. When a prediction is required for a new example, the model that responds most strongly can assign the prediction. This is referred to as a one-vs-rest (OvR) or one-vs-all (OvA) approach.
OvR: A technique that splits a multi-class classification into one binary classification problem per class.
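For a concrete illustration, here is a minimal sketch of the OvR approach using scikit-learn's OneVsRestClassifier wrapper; the synthetic dataset and logistic regression base model are illustrative choices only.

```python
# Minimal one-vs-rest sketch: one binary logistic regression is fit per class.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# synthetic 4-class problem (illustrative data)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_classes=4, random_state=1)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)
print(len(ovr.estimators_))  # 4: one underlying binary model per class
print(ovr.predict(X[:5]))
```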
The multi-class classification problem can also be divided into pairings of classes, and a model fit on each pairing. Again, a prediction for a new example can be taken from the model that responds most strongly. This is referred to as one-vs-one (OvO).
OvO: A technique that splits a multi-class classification into one binary classification problem per pairing of classes.
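A similar sketch for the OvO approach, assuming scikit-learn's OneVsOneClassifier and an illustrative support vector machine base model:

```python
# Minimal one-vs-one sketch: one binary model is fit per pairing of classes.
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

# synthetic 4-class problem (illustrative data)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_classes=4, random_state=1)

ovo = OneVsOneClassifier(SVC())
ovo.fit(X, y)
print(len(ovo.estimators_))  # 6 = 4 * 3 / 2 pairwise models
print(ovo.predict(X[:5]))
```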
For more on one-vs-rest and one-vs-one classification, see the related guide.
The approach of partitioning a multi-class classification problem into multiple binary classification problems can be generalized. Each class can be mapped to a unique binary string of arbitrary length. One classifier can then be fit to predict each bit in the bit string, allowing an arbitrary number of classifiers to be used.
The predicted bit string can then be mapped to the class label with the closest match. The extra bits act like error-correcting codes, improving the performance of the approach in some cases over the simpler OvR and OvO methods. This technique is referred to as error-correcting output codes (ECOC).
ECOC: A technique that splits a multi-class classification into an arbitrary number of binary classification problems.
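The sketch below shows the ECOC idea with scikit-learn's OutputCodeClassifier; the code_size value and base model are illustrative assumptions.

```python
# Minimal error-correcting output codes sketch.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

# synthetic 4-class problem (illustrative data)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_classes=4, random_state=1)

# code_size=2.0 means roughly 2 * n_classes binary classification problems
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=2.0, random_state=1)
ecoc.fit(X, y)
print(ecoc.predict(X[:5]))
```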
In each of these cases, multiple models are used, just like an ensemble. The predictions are also combined, as in an ensemble method, although in a winner-take-all fashion rather than by a vote or weighted sum. Technically, this is a combination method, but unlike most conventional ensemble learning techniques.
Unlike ensemble learning, these techniques are designed to exploit the natural decomposition of the prediction problem across the contributing models. With techniques like OvR, OvO, and ECOC, no contributing model can address the prediction problem in isolation, by definition.
Multiple Models for Multi-Output Regression
Regression problems involve predicting a numerical value given an input example.
Usually, a single output value is predicted. Nonetheless, there are regression problems where multiple numeric values must be predicted for each input example. These are referred to as multi-output regression problems.
Models can be developed to predict all target values at once, although a multi-output regression problem is another example of a problem that can be naturally divided into subproblems.
Like binary classification in the previous section, most techniques for regression predictive modelling were designed to predict a single value. Predicting multiple values can pose a problem and requires modifying the technique; some techniques cannot reasonably be modified to produce multiple values.
One approach is to develop a separate regression model to predict each target value in a multi-output regression problem. Typically, the same algorithm type is used for every model. For example, a multi-output regression problem with three target values would involve fitting three models, one for each target.
When a prediction is needed, the same input pattern is provided to every model, each model predicts its particular target, and together the predictions represent the vector output of the method.
Multi-output regression models: A technique where a separate regression model is used for each target in a multi-output regression problem.
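As a rough sketch of this per-target approach, the example below wraps a linear regression with scikit-learn's MultiOutputRegressor; the data and base model are illustrative.

```python
# Minimal sketch: one separate regression model is fit per target value.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# synthetic problem with 3 target values per example (illustrative data)
X, y = make_regression(n_samples=1000, n_features=10, n_targets=3,
                       noise=0.1, random_state=1)

model = MultiOutputRegressor(LinearRegression())
model.fit(X, y)
print(model.predict(X[:2]))  # each prediction is a vector of 3 values
```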
Another related approach is to develop a sequential chain of regression models. The first model in the chain predicts the first target value, but this value is then used as part of the input to the second model in the chain in order to predict the second target value, and so on.
As such, the chain introduces a linear dependence among the regression models, allowing the outputs of models later in the chain to be conditioned on the outputs of earlier models in the chain.
Regression chain: A technique where a sequential chain of regression models is used to predict each target in a multi-output regression problem, where models later in the chain use the values predicted by models earlier in the chain.
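A matching sketch for the chained approach, assuming scikit-learn's RegressorChain; the target ordering and base model are illustrative.

```python
# Minimal sketch: each model in the chain sees the input features plus the
# values predicted by the models earlier in the chain.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

# synthetic problem with 3 target values per example (illustrative data)
X, y = make_regression(n_samples=1000, n_features=10, n_targets=3,
                       noise=0.1, random_state=1)

# order defines which target is predicted first, second, and third
chain = RegressorChain(LinearRegression(), order=[0, 1, 2])
chain.fit(X, y)
print(chain.predict(X[:2]))
```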
In each case, multiple regression models are used, just like an ensemble.
A possible difference from ensembles is that the predictions made by each model are not combined directly. We could stretch the definition of “combining predictions” to cover this approach, however: the predictions are concatenated in the case of separate multi-output regression models, and combined indirectly through the conditional approach in chained regression.
The key distinction from ensemble learning techniques is that no contributing member can solve the prediction problem alone; a solution can only be achieved by bringing together the predictions from all members.
Multiple Expert Models
So far, we have looked at dividing problems into subtasks based on the structure of what is being predicted. There are also problems that can be naturally divided into subproblems based on the input data. This could be as simple as partitioning the input feature space, or something more elaborate, such as dividing an image into foreground and background and developing a model for each.
A more general approach to this from the field of neural networks is referred to as a mixture of experts (MoE).
MoE: A technique that develops an expert model for each subtask and learns how much to trust each expert when making a prediction for a specific input.
Two elements of MoE make the technique distinct. The first is the explicit partitioning of the input feature space, and the second is the use of a gating network or gating model that learns which expert to trust in each case, i.e., for each input example.
Unlike the earlier examples for multi-class classification and multi-output regression that divided the target into subproblems, the contributing members in a mixture of experts can each address the entire problem, at least partially or to some extent. Even though an expert may not be tailored to a specific input, it can still be used to make a prediction on something outside its area of expertise.
Also, unlike the methods reviewed earlier, a mixture of experts combines the predictions of all contributing members using a weighted sum, with the weights determined by the gating network.
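To make the idea concrete, here is a minimal hand-rolled sketch of a gated mixture of two experts; the split on the first input feature, the linear experts, and the logistic regression gate are illustrative assumptions rather than a standard implementation.

```python
# Minimal mixture-of-experts sketch: two experts are fit on different regions
# of the input space and a gating model weights their predictions per example.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

X, y = make_regression(n_samples=1000, n_features=5, noise=0.1, random_state=1)

# hand-crafted partition of the input space: split on the sign of feature 0
region = (X[:, 0] > 0).astype(int)

# one expert model per region of the input space
experts = [LinearRegression().fit(X[region == r], y[region == r]) for r in (0, 1)]

# gating model: learns how much to trust each expert for a given input
gate = LogisticRegression(max_iter=1000).fit(X, region)

# weighted sum of expert predictions, weighted by the gate's probabilities
weights = gate.predict_proba(X)                          # shape (n_samples, 2)
expert_preds = np.column_stack([e.predict(X) for e in experts])
combined = np.sum(weights * expert_preds, axis=1)
print(combined[:5])
```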
As such, it more closely resembles familiar ensemble learning techniques, such as stacked generalization, known as stacking.
Hybrids Built From Multiple Models
Another type of machine learning that involves the use of multiple models and is loosely related to ensemble learning is the hybrid model.
Hybrid models are models that explicitly combine two or more models. As such, the definition of what does and does not constitute a hybrid model can be fuzzy.
Hybrid model: A technique that combines two or more different machine learning models in some way.
For example, an autoencoder neural network that learns how to compress input patterns to a bottleneck layer, the output of which is then fed to another model, such as a support vector machine, would be considered a hybrid machine learning model.
This example has two machine learning models, a neural network and a support vector machine. It just so happens that the models are stacked one after the other in a pipeline and the final model in the pipeline makes the prediction.
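Below is a minimal sketch of this kind of hybrid, assuming TensorFlow/Keras is available for the autoencoder; the layer sizes, training settings, and dataset are illustrative, not a recommended architecture.

```python
# Minimal hybrid sketch: an autoencoder compresses the inputs and a support
# vector machine is fit on the compressed (bottleneck) features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from tensorflow import keras
from tensorflow.keras import layers

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# autoencoder that compresses 20 inputs down to a 5-unit bottleneck
inputs = keras.Input(shape=(20,))
encoded = layers.Dense(5, activation="relu")(inputs)
decoded = layers.Dense(20)(encoded)
autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_train, X_train, epochs=20, batch_size=32, verbose=0)

# the encoder half produces the compressed features fed to the SVM
encoder = keras.Model(inputs, encoded)
svm = SVC().fit(encoder.predict(X_train), y_train)
print(svm.score(encoder.predict(X_test), y_test))
```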
Consider an ensemble learning method that has several contributing members of different types (for example, a logistic regression and a support vector machine) and uses voting to average their predictions. Under the broader definition, this ensemble could also be considered a hybrid machine learning model.
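For comparison, here is a sketch of that kind of heterogeneous voting ensemble using scikit-learn's VotingClassifier; soft voting averages the members' predicted probabilities (the dataset and members are illustrative).

```python
# Minimal heterogeneous voting ensemble: a logistic regression and an SVM
# whose predicted probabilities are averaged (soft voting).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)

ensemble = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('svm', SVC(probability=True))],
    voting='soft')
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```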
Perhaps the key difference between ensemble learning and hybrid machine learning is the requirement in hybrid models to use models of different types, whereas in ensemble learning the contributing members of the ensemble may be of any type.
Further, a hybrid machine learning model is more likely to graft one or more models onto another base model, which is very different from fitting separate models and combining their predictions as we do in ensemble learning.
Further Reading
This section of the post provides additional resources on the topic if you are looking to go deeper.
Books
Pattern Classification using Ensemble Methods, 2010
Ensemble Methods, 2012
Ensemble Machine Learning, 2012
Conclusion
In this guide, you discovered multiple-model techniques for machine learning and their relationship to ensemble learning.
Specifically, you learned:
- Multiple-model machine learning refers to techniques that use multiple models in a way that closely resembles ensemble learning.
- The use of multiple models for multi-class classification and multi-output regression differs from ensembles in that no single contributing member can solve the problem on its own.
- A mixture of experts might be considered a true ensemble method, whereas hybrid machine learning models are probably not ensemble learning methods.