Dynamic Classifier Selection Ensembles in Python
Dynamic classifier selection is a type of ensemble learning algorithm for classification predictive modeling.
The technique involves fitting multiple machine learning models on the training dataset, then selecting the model that is expected to perform best when making a prediction, based on the specific details of the example to be predicted.
This can be achieved using a k-nearest neighbor model to locate examples in the training dataset that are closest to the new example to be predicted, evaluating all models in the pool on this neighborhood, and using the model that performs best on the neighborhood to make a prediction for the new example.
As such, dynamic classifier selection can often perform better than any single model in the pool and provides an alternative to averaging the predictions from multiple models, as is the case in other ensemble algorithms.
In this tutorial, you will discover how to develop dynamic classifier selection ensembles in Python.
After completing this tutorial, you will know:
- Dynamic classifier selection algorithms choose one of many models to make a prediction for each new example.
- How to develop and evaluate dynamic classifier selection models for classification tasks using the scikit-learn API.
- How to explore the effect of dynamic classifier selection model hyperparameters on classification accuracy.
Tutorial Overview
This tutorial is divided into three parts; they are:
1. Dynamic Classifier Selection
2. Dynamic Classifier Selection with Scikit-Learn
- DCS with Overall Local Accuracy (OLA)
- DCS with Local Class Accuracy (LCA)
3. Hyperparameter Tuning for DCS
- Explore k in k-nearest Neighbor
- Explore Algorithms for Classifier Pool
Dynamic Classifier Selection
Multiple classifier systems refers to a field of machine learning algorithms that use multiple models to address classification predictive modeling problems.
This includes familiar techniques such as one-vs-rest, one-vs-one, and error-correcting output codes. It also includes more general techniques that dynamically choose a model to use for each new example that requires a prediction.
Several approaches are currently used to construct an MCS […] One of the most promising MCS approaches is Dynamic Selection (DS), in which the base classifiers are selected on the fly, according to each new sample to be classified.
These techniques are generally known by the name Dynamic Classifier Selection, or DCS for short.
Dynamic Classifier Selection: Algorithms that choose one of many trained models to make a prediction based on the specific details of the input. Given that multiple models are used in DCS, it is considered a type of ensemble learning technique.
Dynamic classifier selection algorithms generally involve partitioning the input feature space in some way and assigning specific models to be responsible for making predictions for each partition. There are a variety of different DCS algorithms, and research efforts are mainly focused on how to evaluate and assign classifiers to specific regions of the input space.
After training multiple individual learners, DCS dynamically selects one learner for each test instance […] DCS makes predictions by using one individual learner.
An early and popular approach involves first fitting a small, diverse set of classification models on the training dataset. When a prediction is required, a k-nearest neighbor (kNN) algorithm is first used to find the k most similar examples from the training dataset to the input. Each previously fit classifier in the pool is then evaluated on the neighborhood of k training examples, and the classifier that performs best on the neighborhood is used to make a prediction for the new example.
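To make this procedure concrete, the selection step can be sketched in a few lines of Python using scikit-learn's NearestNeighbors class. This is a simplified illustration of the idea only, not the implementation used by the DESlib library covered below; the function name and arguments are hypothetical, and numpy arrays are assumed for the training data.

# sketch of the DCS-LA selection step (illustrative only, not the DESlib implementation)
from numpy import argmax
from sklearn.neighbors import NearestNeighbors

def dcs_la_predict(classifiers, X_train, y_train, x_new, k=7):
    # locate the k training examples closest to the new example
    knn = NearestNeighbors(n_neighbors=k).fit(X_train)
    ix = knn.kneighbors([x_new], return_distance=False)[0]
    # if all models in the pool agree, return that prediction directly
    preds = [c.predict([x_new])[0] for c in classifiers]
    if len(set(preds)) == 1:
        return preds[0]
    # otherwise, score each classifier on the local neighborhood
    scores = [c.score(X_train[ix], y_train[ix]) for c in classifiers]
    # use the prediction from the locally most accurate classifier
    return preds[argmax(scores)]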
This approach is referred to as "Dynamic Classifier Selection Local Accuracy," or DCS-LA for short, and was described by Kevin Woods, et al. in their 1997 paper titled "Combination of Multiple Classifiers Using Local Accuracy Estimates."
The basic idea is to estimate each classifier's accuracy in a local region of feature space surrounding an unknown test sample, and then use the decision of the most locally accurate classifier.
The authors describe two approaches for selecting a single classifier model to make a prediction for a given input example; they are:
- Local Accuracy, often referred to as LA or Overall Local Accuracy (OLA).
- Class Accuracy, often referred to as CA or Local Class Accuracy (LCA).
Local Accuracy (OLA) involves evaluating the classification accuracy of each model on the neighborhood of k training examples. The model that performs best in this neighborhood is then selected to make a prediction for the new example.
The OLA of each classifier is computed as the percentage of the correct recognition of the samples in the local region.
Class Accuracy (LCA) involves using each model to make a prediction for the new example and noting the class that was predicted. Then, the accuracy of each model on the neighborhood of k training examples is evaluated, and the model that shows the best skill for the class it predicted for the new example is selected and its prediction returned.
The LCA is estimated for each base classifier as the percentage of correct classifications within the local region, but considering only those examples where the classifier has given the same class as the one it gives for the unknown pattern.
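The difference between the two scoring rules can be sketched as follows. This is an illustrative sketch only, not DESlib code; neighbors_X and neighbors_y stand for the k nearest training examples and their labels, and the helper names are hypothetical.

# illustrative OLA and LCA scoring rules (hypothetical helpers, not DESlib internals)
from numpy import asarray

def ola_score(clf, neighbors_X, neighbors_y):
    # OLA: accuracy over the entire local neighborhood
    return (clf.predict(neighbors_X) == asarray(neighbors_y)).mean()

def lca_score(clf, neighbors_X, neighbors_y, predicted_class):
    # LCA: accuracy over only those neighbors the classifier assigns
    # to the same class it predicted for the new example
    preds = clf.predict(neighbors_X)
    mask = preds == predicted_class
    if not mask.any():
        return 0.0  # the classifier gives no local support for this class
    return (asarray(neighbors_y)[mask] == predicted_class).mean()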
In both cases, if all fit models make the same prediction for a new input example, then the prediction is returned directly.
Now that we are familiar with DCS and the DCS-LA algorithm, let's look at how we can use it on our own classification predictive modeling projects.
Dynamic Classifier Selection with Scikit-Learn
The Dynamic Ensemble Selection Library, or DESlib for short, is an open source Python library that provides an implementation of many different dynamic classifier selection algorithms.
DESlib is an easy-to-use ensemble learning library focused on the implementation of the state-of-the-art techniques for dynamic classifier and ensemble selection.
First, we can install the DESlib library using the pip package manager.
sudo pip install deslib
Once installed, we can confirm that the library was installed correctly and is ready to use by loading it and printing the installed version.
# check deslib version
import deslib
print(deslib.__version__)
Running the script will print the version of the DESlib library you have installed.
Your version should be the same or higher. If not, you must upgrade your version of the DESlib library.
0.3
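If your installed version is older, it can typically be upgraded via pip using the standard upgrade flag, for example:

sudo pip install --upgrade deslib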
DESlib provides an implementation of the DCS-LA algorithm with each classifier selection technique via the OLA and LCA classes respectively.
Each class can be used as a scikit-learn model directly, allowing the full suite of scikit-learn data preparation, modeling pipelines, and model evaluation techniques to be used with it.
Both classes use a k-nearest neighbor algorithm to select the neighborhood, with a default value of k=7.
A bootstrap aggregation (bagging) ensemble of decision trees is used as the pool of classifier models considered for each classification by default, although this can be changed by setting the "pool_classifiers" argument to a list of models.
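For example, both defaults can be overridden when the model is constructed. The snippet below is a sketch using the k and pool_classifiers arguments described above; the specific pool chosen here is arbitrary, and a custom pool must be fit on the training data before the OLA model itself is fit, as demonstrated later in this tutorial.

# sketch: overriding the default neighborhood size and classifier pool
from deslib.dcs.ola import OLA
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
# use a smaller neighborhood and a custom two-model pool
# (the pool models must be fit before calling model.fit())
model = OLA(k=5, pool_classifiers=[LogisticRegression(), GaussianNB()])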
We can use the make_classification() function to create a synthetic binary classification problem with 10,000 examples and 20 input features.
# synthetic binary classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# summarize the dataset
print(X.shape, y.shape)
Running the example creates the dataset and summarizes the shape of the input and output components.
(10000, 20) (10000,)
Now that we are familiar with the DESlib API, let's look at how to use each DCS-LA algorithm.
DCS with Overall Local Accuracy (OLA)
We can evaluate a DCS-LA model using overall local accuracy on the synthetic dataset.
In this case, we will use the default model hyperparameters, including bagged decision trees as the pool of classifier models and k=7 for the selection of the local neighborhood when making a prediction.
We will evaluate the model using repeated stratified k-fold cross-validation with three repeats and 10 folds. We will report the mean and standard deviation of the accuracy of the model across all repeats and folds.
The complete example is listed below.
# evaluate dynamic classifier selection DCS-LA with overall local accuracy
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from deslib.dcs.ola import OLA
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# define the model
model = OLA()
# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))
Running the example reports the mean and standard deviation accuracy of the model.
Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
In this case, we can see that DCS-LA with OLA and default hyperparameters achieves a classification accuracy of about 88.3 percent.
Mean Accuracy: 0.883 (0.012)
We can also use the DCS-LA model with OLA as a final model and make predictions for classification.
First, the model is fit on all available data, then the predict() function can be called to make predictions on new data.
The example below demonstrates this on our binary classification dataset.
# make a prediction with DCS-LA using overall local accuracy
from sklearn.datasets import make_classification
from deslib.dcs.ola import OLA
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# define the model
model = OLA()
# fit the model on the whole dataset
model.fit(X, y)
# make a single prediction
row = [0.2929949,-4.21223056,-1.288332,-2.17849815,-0.64527665,2.58097719,0.28422388,-7.1827928,-1.91211104,2.73729512,0.81395695,3.96973717,-2.66939799,3.34692332,4.19791821,0.99990998,-0.30201875,-4.43170633,-2.82646737,0.44916808]
yhat = model.predict([row])
print('Predicted Class: %d' % yhat[0])
Running the example fits the DCS-LA with OLA model on the entire dataset, which is then used to make a prediction on a new row of data, as we might when using the model in an application.
Predicted Class: 0
Now that we are familiar with using DCS-LA with OLA, let's look at the LCA method.
DCS with Local Class Accuracy (LCA)
We can evaluate a DCS-LA model using local class accuracy on the synthetic dataset.
In this case, we will use the default model hyperparameters, including bagged decision trees as the pool of classifier models and k=7 for the selection of the local neighborhood when making a prediction.
We will evaluate the model using repeated stratified k-fold cross-validation with three repeats and 10 folds. We will report the mean and standard deviation of the accuracy of the model across all repeats and folds.
The complete example is listed below.
# evaluate dynamic classifier selection DCS-LA using local class accuracy
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from deslib.dcs.lca import LCA
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# define the model
model = LCA()
# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))
Running the example reports the mean and standard deviation accuracy of the model.
Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
In this case, we can see that DCS-LA with LCA and default hyperparameters achieves a classification accuracy of about 92.2 percent.
Mean Accuracy: 0.922 (0.007)
We can also use the DCS-LA model with LCA as a final model and make predictions for classification.
First, the model is fit on all available data, then the predict() function can be called to make predictions on new data.
The example below demonstrates this on our binary classification dataset.
# make a prediction with DCS-LA using local class accuracy
from sklearn.datasets import make_classification
from deslib.dcs.lca import LCA
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# define the model
model = LCA()
# fit the model on the whole dataset
model.fit(X, y)
# make a single prediction
row = [0.2929949,-4.21223056,-1.288332,-2.17849815,-0.64527665,2.58097719,0.28422388,-7.1827928,-1.91211104,2.73729512,0.81395695,3.96973717,-2.66939799,3.34692332,4.19791821,0.99990998,-0.30201875,-4.43170633,-2.82646737,0.44916808]
yhat = model.predict([row])
print('Predicted Class: %d' % yhat[0])
Running the example fits the DCS-LA with LCA model on the entire dataset, which is then used to make a prediction on a new row of data, as we might when using the model in an application.
Predicted Class: 0
Now that we are familiar with using the scikit-learn API to evaluate and use DCS-LA models, let's look at model configuration.
Hyperparameter Tuning for DCS
In this section, we will take a closer look at some of the hyperparameters you should consider tuning for the DCS-LA model and their effect on model performance.
There are many hyperparameters we can look at for DCS-LA, although in this case, we will look at the value of k in the k-nearest neighbor model used in the local evaluation of the models, and how to use a custom pool of classifiers.
We will use DCS-LA with OLA as the basis for these experiments, although the choice of the specific method is arbitrary.
Explore k in k-nearest Neighbor
The configuration of the k-nearest neighbor algorithm is critical to the DCS-LA model, as it defines the scope of the neighborhood in which each classifier is considered for selection. The k value controls the size of the neighborhood, and it is important to set it to a value that is appropriate for your dataset, specifically the density of samples in the feature space. A value too small will mean that relevant examples in the training set may be excluded from the neighborhood, whereas a value too large may mean that the signal is washed out by too many examples.
The example below explores the effect on classification accuracy of DCS-LA with OLA with k values from 2 to 21.
# explore k in knn for DCS-LA with overall local accuracy
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from deslib.dcs.ola import OLA
from matplotlib import pyplot

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
    return X, y

# get a list of models to evaluate
def get_models():
    models = dict()
    for n in range(2,22):
        models[str(n)] = OLA(k=n)
    return models

# evaluate a given model using cross-validation
def evaluate_model(model):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    return scores

# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
    scores = evaluate_model(model)
    results.append(scores)
    names.append(name)
    print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()
Running the example first reports the mean accuracy for each configured neighborhood size.
Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
In this case, we can see that accuracy increases with the neighborhood size, perhaps until k=13 or k=14, where it appears to level off.
>2 0.873 (0.009)
>3 0.874 (0.013)
>4 0.880 (0.009)
>5 0.881 (0.009)
>6 0.883 (0.010)
>7 0.883 (0.011)
>8 0.884 (0.012)
>9 0.883 (0.010)
>10 0.886 (0.012)
>11 0.886 (0.011)
>12 0.885 (0.010)
>13 0.888 (0.010)
>14 0.886 (0.009)
>15 0.889 (0.010)
>16 0.885 (0.012)
>17 0.888 (0.009)
>18 0.886 (0.010)
>19 0.889 (0.012)
>20 0.889 (0.011)
>21 0.886 (0.011)
A box and whisker plot is created for the distribution of accuracy scores for each configured neighborhood size. We can see the general trend of model performance improving with the k value before reaching a plateau.
Explore Algorithms for Classifier Pool
The choice of algorithms used in the pool for the DCS-LA is another important hyperparameter.
By default, bagged decision trees are used, as they have proven to be an effective approach on a range of classification tasks. Nevertheless, a custom pool of classifiers can be considered.
This requires first defining a list of classifier models to use and fitting each on the training dataset. Unfortunately, this means that the automatic k-fold cross-validation model evaluation methods in scikit-learn cannot be used in this case. Instead, we will use a train-test split so that we can fit the classifier pool manually on the training dataset.
The list of fit classifiers can then be specified to the OLA (or LCA) class via the "pool_classifiers" argument. In this case, we will use a pool that includes logistic regression, a decision tree, and a naive Bayes classifier.
The complete example of evaluating DCS-LA with OLA and a custom set of classifiers on the synthetic dataset is listed below.
# evaluate DCS-LA using OLA with a custom pool of algorithms
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from deslib.dcs.ola import OLA
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# define classifiers to use in the pool
classifiers = [
    LogisticRegression(),
    DecisionTreeClassifier(),
    GaussianNB()]
# fit each classifier on the training set
for c in classifiers:
    c.fit(X_train, y_train)
# define the DCS-LA model
model = OLA(pool_classifiers=classifiers)
# fit the model
model.fit(X_train, y_train)
# make predictions on the test set
yhat = model.predict(X_test)
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % (score))
Running the example first reports the accuracy of the model with the custom pool of classifiers.
Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
In this case, we can see that the model achieves an accuracy of about 91.3 percent.
Accuracy: 0.913
For the DCS model to be worth adopting, it must perform better than any contributing model; otherwise, we would simply use the better-performing contributing model instead.
We can check this by evaluating the performance of each contributing classifier on the test set.
...
# evaluate contributing models
for c in classifiers:
    yhat = c.predict(X_test)
    score = accuracy_score(y_test, yhat)
    print('>%s: %.3f' % (c.__class__.__name__, score))
The updated example of DCS-LA with a custom pool of classifiers, where each classifier is also evaluated independently, is listed below.
# evaluate DCS-LA using OLA with a custom pool of algorithms
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from deslib.dcs.ola import OLA
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# define classifiers to use in the pool
classifiers = [
    LogisticRegression(),
    DecisionTreeClassifier(),
    GaussianNB()]
# fit each classifier on the training set
for c in classifiers:
    c.fit(X_train, y_train)
# define the DCS-LA model
model = OLA(pool_classifiers=classifiers)
# fit the model
model.fit(X_train, y_train)
# make predictions on the test set
yhat = model.predict(X_test)
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % (score))
# evaluate contributing models
for c in classifiers:
    yhat = c.predict(X_test)
    score = accuracy_score(y_test, yhat)
    print('>%s: %.3f' % (c.__class__.__name__, score))
Running the example first reports the accuracy of the model with the custom pool of classifiers, followed by the accuracy of each contributing model.
Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
In this case, we can see that, again, DCS-LA achieves an accuracy of about 91.3 percent, which is better than any contributing model.
Accuracy: 0.913
>LogisticRegression: 0.878
>DecisionTreeClassifier: 0.884
>GaussianNB: 0.873
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Papers
Combination of Multiple Classifiers Using Local Accuracy Estimates, 1997.
Dynamic Selection of Classifiers – A Comprehensive Review, 2014.
Dynamic Classifier Selection: Recent Advances and Perspectives, 2018.
Books
Ensemble methods: Foundations and Algorithms, 2012.
APIs
Dynamic Selection Library Project, GitHub.
DESlib API Documentation
Conclusion
In this tutorial, you discovered how to develop dynamic classifier selection ensembles in Python.
Specifically, you learned:
- Dynamic classifier selection algorithms choose one of many models to make a prediction for each new example.
- How to develop and evaluate dynamic classifier selection models for classification tasks using the scikit-learn API.
- How to explore the effect of dynamic classifier selection model hyperparameters on classification accuracy.