
How to calculate Feature Importance leveraging Python

Feature importance refers to a group of techniques that allocate a score to input features on the basis of how useful they are at forecasting a target variable. 

There are several types and sources of feature importance scores, although popular examples include statistical correlation scores, coefficients calculated as part of linear models, decision trees, and permutation importance scores. 

Feature importance scores play an important part in a predictive modelling project, including furnishing insight into the data, insight into the model, and the basis for dimensionality reduction and feature selection that can enhance the efficiency and effectiveness of a predictive model on the problem. 

In this blog post by AICoreSpot, which serves as a tutorial, you will find out about feature importance scores for machine learning in python. 

After finishing this tutorial, you will be aware of: 

  • The part of feature importance in a predictive modelling problem 
  • How to calculate and review feature importance from linear models and decision trees 
  • How to calculate and review permutation feature importance scores 

Overview 

This tutorial is demarcated into six portions; they are as follows: 

  • Feature Importance 
  • Preparation 
    • Check Scikit-learn Version 
    • Test Datasets 
  • Coefficients as Feature Importance 
    • Linear Regression Feature Importance 
    • Logistic Regression Feature Importance 
  • Decision Tree Feature Importance 
    • CART Feature Importance 
    • Random Forest Feature Importance 
    • XGBoost Feature Importance 
  • Permutation Feature Importance 
    • Permutation Feature Importance for Classification 
    • Permutation Feature Importance for Regression 
  • Feature Selection with Importance 

Feature Importance 

Feature importance is in reference to a grouping of strategies for allocating scores to input features to a predictive model that indicates the comparative importance of every feature when making a forecast. 

Feature importance scores can be quantified for issues that consist of forecasting a numerical value, referred to as regression, and those issues that consist of forecasting a class label, referred to as classification. 

The scores are useful and can be leveraged in an array of scenarios in a predictive modelling issue, like: 

  • Improved comprehension of the data 
  • Improved understanding of a model 
  • Minimizing the number of input features 

Feature importance scores can furnish insight into the dataset: The comparative scores can highlight which features may be most apt to the target, and the converse, which features don’t hold any relevance. This can be interpreted by a domain specialist and could be leveraged as the foundation for collecting more or differing data. 

Feature importance scores can furnish insight into the model. A majority of importance scores are estimated through a predictive model that has been fit on the dataset. Inspecting the importance score furnishes insight into that particular model and which features are the most critical and least critical to the model when rendering a prediction. This is a variant of model interpretation that can be executed for those models that are compatible with it. 

Feature importance can be leveraged to enhance a predictive model. This can be accomplished by leveraging the importance scores to choose those features to delete (lowest scores) or those features to retain (highest scores). This is a variant of feature selection that can simplify the issue being modelled, speed up the modelling procedure (removing features is referred to as dimensionality reduction), and, in some scenarios, enhance the performance of the model. 

Often, we desire to quantify the strength of the relationship between the predictors and the result. Ranking predictors in this fashion can be very apt when sifting through larger amounts of information. 

Feature importance scores can be input to a wrapper model, like the SelectFromModel class, to execute feature selection. 

There are several ways to calculate feature importance scores and several models that can be leveraged for this reason. 

Probably the easiest way is to calculate simple coefficient statistics between every feature and the target variable, such as correlation coefficients.  
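For instance, a minimal sketch of this simpler approach (an assumption, not part of the original tutorial) computes the Pearson correlation coefficient between each input feature and the target on a synthetic regression dataset like the one defined later in this tutorial: 

# sketch (assumption): correlation between each feature and the target as a crude importance score 
from scipy.stats import pearsonr 
from sklearn.datasets import make_regression 
# define a synthetic regression dataset (mirrors the test dataset used later) 
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 
# report the correlation of each feature with the target 
for i in range(X.shape[1]): 
	corr, _ = pearsonr(X[:, i], y) 
	print('Feature: %d, Correlation: %.5f' % (i, corr)) 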

In this guide, we will observe the three primary variants of more sophisticated feature importance, they are as follows: 

  • Feature importance from model coefficients 
  • Feature importance from decision trees 
  • Feature importance from permutation testing 

Preparation 

Prior to diving in, let’s validate our environment and prep some test datasets. 

Check Scikit-Learn version 

To start with, validate that you have a modern version of the scikit-learn library installed. 

This is critical, as a few of the models we will look into in this guide require a recent version of the library.  

You can verify the version of the library you have installed with the following code instance: 

# check scikit-learn version 

import sklearn 

print(sklearn.__version__) 

 

Running the example will print the version of the library. At the time of writing, this is about version 0.22. 

You are required to be on this version of scikit-learn or higher. 

0.22.1 
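If the printed version is older, the library can typically be upgraded with pip; the exact command below is an assumption about your environment and may differ (for instance, you may not need sudo). 

sudo pip install --upgrade scikit-learn 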

 

Test Datasets 

To follow-up, let’s define a few test datasets that we can leverage as the basis for illustrating and looking into feature importance scores.  

Every test issue has five critical and five unimportant features, and it may be fascinating to observe which methodologies are consistent at identifying or differentiating the features on the basis of their criticality. 

Classification Dataset 

We will leverage the make_classification() function to develop a test binary classification dataset. 

The dataset will possess 1,000 instances, with 10 input features, five of which will be informative and the other five of which will be redundant. We will fix the random number seed to make sure we obtain the same instances every time the code is executed. 

An instance of creating and summarization of the dataset is provided below: 

# test classification dataset 

from sklearn.datasets import make_classification 

# define dataset 

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 

# summarize the dataset 

print(X.shape, y.shape) 

 

Executing the instance develops the dataset and validates the expected number of samples and features. 

(1000, 10) (1000,) 

 

Regression Dataset 

We will leverage the make_regression() function to develop a test regression dataset. 

Like the classification dataset, the regression dataset will possess 1,000 instances, with 10 input features, five of which will be informative and the other five of which will be irrelevant.  

# test regression dataset 

from sklearn.datasets import make_regression 

# define dataset 

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 

# summarize the dataset 

print(X.shape, y.shape) 

 

Executing the instance creates the dataset and validates the expected number of samples and features. 

(1000, 10) (1000,) 

 

Now, let’s take a deeper look at coefficients as importance scores. 

Coefficients as Feature Importance 

Linear machine learning algorithms fit a model where the forecast is the weighted total of the input values.  

Instances consist of linear regression, logistic regression, and extensions that add regularization, like ridge regression and the elastic net.  

Each one of these algorithms identifies a grouping of coefficients to leverage in the weighted total in order to make a forecast. These coefficients can be leveraged directly as a crude variant of feature importance score. 

Let’s delve deeper and look at leveraging coefficients as feature importance for classification and regression. We will fit a model on the dataset to identify the coefficients, then summarize the critical scores for every input feature and ultimately develop a bar chart to obtain an idea of the comparative criticality of the features. 

Linear Regression Feature Importance 

We can fit a linear regression model on the regression dataset and retrieve the coef_ property that consists of the coefficients identified for every input variable. 

These coefficients can furnish the basis for a crude feature importance score. This assumes that the input variables have the same scale or have been scaled prior to fitting the model. 
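As a minimal sketch (an assumption, not part of the original example), the inputs could be standardized with scikit-learn's StandardScaler prior to fitting, so the resulting coefficients are on a comparable scale: 

# sketch (assumption): standardize inputs so linear regression coefficients are comparable 
from sklearn.datasets import make_regression 
from sklearn.preprocessing import StandardScaler 
from sklearn.linear_model import LinearRegression 
# define dataset 
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 
# scale the inputs to zero mean and unit variance 
X_scaled = StandardScaler().fit_transform(X) 
# fit the model on the scaled inputs 
model = LinearRegression() 
model.fit(X_scaled, y) 
# coefficients can now be compared directly as crude importance scores 
print(model.coef_) 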

The complete instance of linear regression coefficients for feature importance is listed below: 

# linear regression feature importance 

from sklearn.datasets import make_regression 

from sklearn.linear_model import LinearRegression 

from matplotlib import pyplot 

# define dataset 

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 

# define the model 

model = LinearRegression() 

# fit the model 

model.fit(X, y) 
# get importance 
importance = model.coef_ 
# summarize feature importance 
for i,v in enumerate(importance): 
	print('Feature: %0d, Score: %.5f' % (i,v)) 
# plot feature importance 
pyplot.bar([x for x in range(len(importance))], importance) 
pyplot.show() 

 

Executing the instance fits the model, then reports the coefficient value for every feature. 

Your results may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

 

The scores indicate that the model identified the five critical features and marked all other features with a zero coefficient, basically deleting them from the model. 

Feature: 0, Score: 0.00000 

Feature: 1, Score: 12.44483 

Feature: 2, Score: -0.00000 

Feature: 3, Score: -0.00000 

Feature: 4, Score: 93.32225 

Feature: 5, Score: 86.50811 

Feature: 6, Score: 26.74607 

Feature: 7, Score: 3.28535 

Feature: 8, Score: -0.00000 

Feature: 9, Score: 0.00000 

A bar chart is then created for the feature importance scores. 

This strategy might also be leveraged with Ridge and ElasticNet models. 
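For instance, a minimal sketch (an assumption, not shown in the original) of the same coefficient-based approach with a Ridge model might look as follows; the alpha value is arbitrary: 

# sketch (assumption): coefficient-based importance with a Ridge model 
from sklearn.datasets import make_regression 
from sklearn.linear_model import Ridge 
# define dataset 
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 
# fit a ridge regression model 
model = Ridge(alpha=1.0) 
model.fit(X, y) 
# report the coefficient for every feature 
for i, v in enumerate(model.coef_): 
	print('Feature: %d, Score: %.5f' % (i, v)) 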

 

Logistic Regression Feature Importance 

We can fit a logistic regression model on the classification dataset and retrieve the coef_ property that consists of the coefficients identified for every input variable. 

The coefficients can furnish the basis for a crude feature importance score. This assumes that the input variables have the same scale or have been scaled prior to fitting a model. 

 

The complete instance of logistic regression coefficients for feature importance is enlisted below: 

# logistic regression for feature importance 

from sklearn.datasets import make_classification 

from sklearn.linear_model import LogisticRegression 

from matplotlib import pyplot 

# define dataset 

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 

# define the model 

model = LogisticRegression() 

# fit the model 

model.fit(X, y) 

# get importance 

importance = model.coef_[0] 

# summarize feature importance 

for i,v in enumerate(importance): 

	print('Feature: %0d, Score: %.5f' % (i,v)) 

# plot feature importance 

pyplot.bar([x for x in range(len(importance))], importance) 

pyplot.show() 

 

Executing the instance fits the model, then reports the coefficient value for every feature. 

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

 

Remember this is a classification issue with classes 0 and 1. Observe that the coefficients are both positive and negative. The positive scores suggest a feature that forecasts class 1, whereas the negative scores suggest a feature that forecasts class 0. 

No overt pattern of critical and non-critical features can be detected from these outcomes, at least from what can be deciphered. 
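If a magnitude-based ordering is desired, one option (an assumption, not part of the original tutorial) is to rank the features by the absolute value of their coefficients, as in the sketch below: 

# sketch (assumption): rank features by the magnitude of the logistic regression coefficients 
import numpy as np 
from sklearn.datasets import make_classification 
from sklearn.linear_model import LogisticRegression 
# define dataset 
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 
# fit the model 
model = LogisticRegression() 
model.fit(X, y) 
# order feature indices from largest to smallest absolute coefficient 
ranking = np.argsort(-np.abs(model.coef_[0])) 
print('Features ranked by |coefficient|:', ranking) 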

 

Feature: 0, Score: 0.16320 

Feature: 1, Score: -0.64301 

Feature: 2, Score: 0.48497 

Feature: 3, Score: -0.46190 

Feature: 4, Score: 0.18432 

Feature: 5, Score: -0.11978 

Feature: 6, Score: -0.40602 

Feature: 7, Score: 0.03772 

Feature: 8, Score: -0.51785 

Feature: 9, Score: 0.26540 

A bar chart is then created for the feature importance scores. 

Now that we have observed the leveraging of coefficients as importance scores, let’s observe the more typical instance of decision-tree based importance scores. 

 

Decision Tree Feature Importance 

Decision Tree Algorithms such as classification and regression trees (CART) provide importance scores on the basis of reduction in the criterion leveraged to choose split points, like Gini or entropy.  

 

The same strategy can be deployed for ensembles of decision trees, like the random forest and stochastic gradient boosting algorithms. 

Let’s observe a worked example of each. 

 

CART Feature Importance 

We can leverage the CART algorithm for feature importance implemented in scikit-learn as the DecisionTreeRegressor and DecisionTreeClassifier classes. 

Upon being fit, the model furnishes a feature_importances_ property which can be accessed to retrieve the relative importance scores for every input feature. 

Let’s observe an instance of this for classification and regression.  

 

CART Regression Feature Importance 

The complete instance of fitting a DecisionTreeRegressor and summarizing the calculated feature importance scores is listed below. 

 

# decision tree for feature importance on a regression problem 

from sklearn.datasets import make_regression 

from sklearn.tree import DecisionTreeRegressor 

from matplotlib import pyplot 

# define dataset 

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 

# define the model 

model = DecisionTreeRegressor() 

# fit the model 

model.fit(X, y) 

# get importance 

importance = model.feature_importances_ 

# summarize feature importance 

for i,v in enumerate(importance): 

	print('Feature: %0d, Score: %.5f' % (i,v)) 

# plot feature importance 

pyplot.bar([x for x in range(len(importance))], importance) 

pyplot.show() 

 

Executing the instance fits the model, then reports the feature importance score for every feature. 

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome.  

 

The outcomes suggest perhaps three of the ten features as being critical to prediction. 

 

 


Feature: 0, Score: 0.00294 

Feature: 1, Score: 0.00502 

Feature: 2, Score: 0.00318 

Feature: 3, Score: 0.00151 

Feature: 4, Score: 0.51648 

Feature: 5, Score: 0.43814 

Feature: 6, Score: 0.02723 

Feature: 7, Score: 0.00200 

Feature: 8, Score: 0.00244 

Feature: 9, Score: 0.00106 

 

A bar chart is then produced for the feature importance scores. 

CART Classification Feature Importance 

The complete instance of fitting a DecisionTreeClassifier and summarizing the calculated feature importance scores is listed below: 

 

 


# decision tree for feature importance on a classification problem 

from sklearn.datasets import make_classification 

from sklearn.tree import DecisionTreeClassifier 

from matplotlib import pyplot 

# define dataset 

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 

# define the model 

model = DecisionTreeClassifier() 

# fit the model 

model.fit(X, y) 

# get importance 

importance = model.feature_importances_ 

# summarize feature importance 

for i,v in enumerate(importance): 

	print('Feature: %0d, Score: %.5f' % (i,v)) 

# plot feature importance 

pyplot.bar([x for x in range(len(importance))], importance) 

pyplot.show() 

 

Executing the instance fits the model, then reports the feature importance score for every feature. Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

 

The outcomes indicate perhaps four of the ten features as being critical to prediction. 

Feature: 0, Score: 0.01486 

Feature: 1, Score: 0.01029 

Feature: 2, Score: 0.18347 

Feature: 3, Score: 0.30295 

Feature: 4, Score: 0.08124 

Feature: 5, Score: 0.00600 

Feature: 6, Score: 0.19646 

Feature: 7, Score: 0.02908 

Feature: 8, Score: 0.12820 

Feature: 9, Score: 0.04745 

 

A bar chart is then developed for the feature importance scores. 

Random Forest Feature Importance 

We can leverage the random forest algorithm for feature importance implemented in scikit-learn as the RandomForestRegressor and RandomForestClassifier classes. Upon being fit, the model furnishes a feature_importances_ property that can be accessed to retrieve the comparative importance scores for every input feature. Let's observe an instance of this for regression and classification. 

Random Forest Regression Feature Importance 

The complete example of fitting a RandomForestRegressor and summarizing the calculated feature importance scores is listed below. 

 

# random forest for feature importance on a regression problem 

from sklearn.datasets import make_regression 

from sklearn.ensemble import RandomForestRegressor 

from matplotlib import pyplot 

# define dataset 

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 

# define the model 

model = RandomForestRegressor() 

# fit the model 

model.fit(X, y) 

# get importance 

importance = model.feature_importances_ 

# summarize feature importance 

for i,v in enumerate(importance): 

	print('Feature: %0d, Score: %.5f' % (i,v)) 

# plot feature importance 

pyplot.bar([x for x in range(len(importance))], importance) 

pyplot.show() 

 

Running the instance fits the model, then reports the feature importance score for every feature. 

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

 

The outcomes indicate perhaps two or three of the ten features as being critical to prediction. 

Feature: 0, Score: 0.00280 

Feature: 1, Score: 0.00545 

Feature: 2, Score: 0.00294 

Feature: 3, Score: 0.00289 

Feature: 4, Score: 0.52992 

Feature: 5, Score: 0.42046 

Feature: 6, Score: 0.02663 

Feature: 7, Score: 0.00304 

Feature: 8, Score: 0.00304 

Feature: 9, Score: 0.00283 

 

A bar chart is then generated for the feature importance scores. 

Random Forest Classification Feature Importance 

The complete example of fitting a RandomForestClassifier and summarizing the calculated feature importance scores is listed below. 

 

# random forest for feature importance on a classification problem 

from sklearn.datasets import make_classification 

from sklearn.ensemble import RandomForestClassifier 

from matplotlib import pyplot 

# define dataset 

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 

# define the model 

model = RandomForestClassifier() 

# fit the model 

model.fit(X, y) 

# get importance 

importance = model.feature_importances_ 

# summarize feature importance 

for i,v in enumerate(importance): 

	print('Feature: %0d, Score: %.5f' % (i,v)) 

# plot feature importance 

pyplot.bar([x for x in range(len(importance))], importance) 

pyplot.show() 

 

Executing the instance fits the model, then reports the feature importance score for every feature. 

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

 

The outcome indicates perhaps two or three of the 10 features as being critical to forecasting. 

 

Feature: 0, Score: 0.06523 

Feature: 1, Score: 0.10737 

Feature: 2, Score: 0.15779 

Feature: 3, Score: 0.20422 

Feature: 4, Score: 0.08709 

Feature: 5, Score: 0.09948 

Feature: 6, Score: 0.10009 

Feature: 7, Score: 0.04551 

Feature: 8, Score: 0.08830 

Feature: 9, Score: 0.04493 

 

A bar chart is subsequently developed for the feature importance scores. 

XGBoost Feature Importance 

XGBoost is a library that furnishes an efficient and effective implementation of the stochastic gradient boosting algorithm.  

This algorithm can be leveraged with scikit-learn through the XGBRegressor and the XGBClassifier classes. 

Upon fitting, the model furnishes a feature_importances_ property that can be accessed to retrieve the comparative importance scores for every input feature.  

This algorithm is also furnished through scikit-learn via the GradientBoostingClassifier and GradientBoostingRegressor classes, and the same strategy to feature importance can be leveraged. 
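As a minimal sketch (an assumption, not part of the original examples), the same feature_importances_ property can be accessed on scikit-learn's GradientBoostingClassifier: 

# sketch (assumption): feature importance from scikit-learn's gradient boosting implementation 
from sklearn.datasets import make_classification 
from sklearn.ensemble import GradientBoostingClassifier 
# define dataset 
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 
# define and fit the model 
model = GradientBoostingClassifier() 
model.fit(X, y) 
# report the importance score for every feature 
for i, v in enumerate(model.feature_importances_): 
	print('Feature: %d, Score: %.5f' % (i, v)) 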

 

To start with, install the XGBoost library, for instance with pip: 

sudo pip install xgboost 

 

Then validate that the library was installed correctly and is working by checking the version number.  

# check xgboost version 

import xgboost 

print(xgboost.__version__) 

 

Executing the instance, you should observe the following version number or higher. 

0.90 

Let’s observe an instance of XGBoost for Feature Importance on regression and classification problems. 

 

XGBoost Regression Feature Importance 

The complete instance of fitting an XGBRegressor and summarizing the calculated feature importance scores is listed below: 

 

# xgboost for feature importance on a regression problem 

from sklearn.datasets import make_regression 

from xgboost import XGBRegressor 

from matplotlib import pyplot 

# define dataset 

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 

# define the model 

model = XGBRegressor() 

# fit the model 

model.fit(X, y) 

# get importance 

importance = model.feature_importances_ 

# summarize feature importance 

for i,v in enumerate(importance): 

	print('Feature: %0d, Score: %.5f' % (i,v)) 

# plot feature importance 

pyplot.bar([x for x in range(len(importance))], importance) 

pyplot.show() 

 

Running the instance fits the model, then reports the feature importance score for every feature. 

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

 

The results indicate perhaps two or three of the ten features as being critical to prediction. 

 

Feature: 0, Score: 0.00060 

Feature: 1, Score: 0.01917 

Feature: 2, Score: 0.00091 

Feature: 3, Score: 0.00118 

Feature: 4, Score: 0.49380 

Feature: 5, Score: 0.42342 

Feature: 6, Score: 0.05057 

Feature: 7, Score: 0.00419 

Feature: 8, Score: 0.00124 

Feature: 9, Score: 0.00491 

 

A bar chart is then developed for the feature importance scores. 

XGBoost Classification Feature Importance 

The complete instance of fitting an XGBClassifier and summarization of the calculated feature importance scores is listed below. 

# xgboost for feature importance on a classification problem 

from sklearn.datasets import make_classification 

from xgboost import XGBClassifier 

from matplotlib import pyplot 

# define dataset 

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 

# define the model 

model = XGBClassifier() 

# fit the model 

model.fit(X, y) 

# get importance 

importance = model.feature_importances_ 

# summarize feature importance 

for i,v in enumerate(importance): 

	print('Feature: %0d, Score: %.5f' % (i,v)) 

# plot feature importance 

pyplot.bar([x for x in range(len(importance))], importance) 

pyplot.show() 

 

Executing the instance fits the model and then reports the feature importance score for every feature. 

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

 

The results indicate perhaps 7/10 features as being critical to prediction. 

 

Feature: 0, Score: 0.02464 

Feature: 1, Score: 0.08153 

Feature: 2, Score: 0.12516 

Feature: 3, Score: 0.28400 

Feature: 4, Score: 0.12694 

Feature: 5, Score: 0.10752 

Feature: 6, Score: 0.08624 

Feature: 7, Score: 0.04820 

Feature: 8, Score: 0.09357 

Feature: 9, Score: 0.02220 

 

A bar chart is then developed for the feature importance scores. 

Permutation Feature Importance 

Permutation feature importance is a technique for calculating relative importance scores that is independent of the model leveraged. To start with, a model is fit on the dataset. Then the values of a single feature (column) are randomly shuffled and the resulting drop in the model's performance is measured; this is repeated for every feature, and typically averaged over several repeats. The strategy needs a performance metric as the basis of the score, like mean squared error for regression and accuracy for classification, and is implemented in scikit-learn via the permutation_importance() function, which takes a fit model, a dataset, and a scoring function. We will demonstrate it with k-nearest neighbours models, which do not offer native feature importance scores. 
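To make the mechanism concrete, the sketch below (an assumption, not part of the original tutorial) calculates a permutation importance score for a single feature by hand, as the increase in mean squared error after shuffling that feature's column: 

# sketch (assumption): permutation importance for one feature, computed by hand 
import numpy as np 
from sklearn.datasets import make_regression 
from sklearn.neighbors import KNeighborsRegressor 
from sklearn.metrics import mean_squared_error 
# define dataset and fit a model with no native importance scores 
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 
model = KNeighborsRegressor() 
model.fit(X, y) 
# baseline error of the fit model 
baseline = mean_squared_error(y, model.predict(X)) 
# shuffle the values of feature 0 only and measure the error again 
X_shuffled = X.copy() 
X_shuffled[:, 0] = np.random.permutation(X_shuffled[:, 0]) 
permuted = mean_squared_error(y, model.predict(X_shuffled)) 
# the increase in error is the importance score for feature 0 
print('Importance of feature 0: %.5f' % (permuted - baseline)) 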

Permutation Feature Importance for Regression 

The complete instance of fitting a KNeighborsRegressor and summarizing the calculated permutation feature importance scores is listed below. 

 

# permutation feature importance with knn for regression 

from sklearn.datasets import make_regression 

from sklearn.neighbors import KNeighborsRegressor 

from sklearn.inspection import permutation_importance 

from matplotlib import pyplot 

# define dataset 

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1) 

# define the model 

model = KNeighborsRegressor() 

# fit the model 

model.fit(X, y) 

# perform permutation importance 

results = permutation_importance(model, X, y, scoring='neg_mean_squared_error') 

# get importance 

importance = results.importances_mean 

# summarize feature importance 

for i,v in enumerate(importance): 

	print('Feature: %0d, Score: %.5f' % (i,v)) 

# plot feature importance 

pyplot.bar([x for x in range(len(importance))], importance) 

pyplot.show() 

 

Running the instance fits the model, then reports the feature importance score for every feature. 

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

 

The results indicate perhaps two or three of the ten features as being critical to forecasting. 

Feature: 0, Score: 175.52007 

Feature: 1, Score: 345.80170 

Feature: 2, Score: 126.60578 

Feature: 3, Score: 95.90081 

Feature: 4, Score: 9666.16446 

Feature: 5, Score: 8036.79033 

Feature: 6, Score: 929.58517 

Feature: 7, Score: 139.67416 

Feature: 8, Score: 132.06246 

Feature: 9, Score: 84.94768 

 

A bar chart is then produced for the feature importance scores. 

Permutation Feature Importance For Classification 

The complete instance of fitting a KNeighborsClassifier and summarizing the calculated permutation feature importance scores is listed below: 

 

# permutation feature importance with knn for classification 

from sklearn.datasets import make_classification 

from sklearn.neighbors import KNeighborsClassifier 

from sklearn.inspection import permutation_importance 

from matplotlib import pyplot 

# define dataset 

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 

# define the model 

model = KNeighborsClassifier() 

# fit the model 

model.fit(X, y) 

# perform permutation importance 

results = permutation_importance(model, X, y, scoring='accuracy') 

# get importance 

importance = results.importances_mean 

# summarize feature importance 

for i,v in enumerate(importance): 

	print('Feature: %0d, Score: %.5f' % (i,v)) 

# plot feature importance 

pyplot.bar([x for x in range(len(importance))], importance) 

pyplot.show() 

 

Running the instance fits the model, then reports the feature importance score for every feature.  

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

 

The outcomes indicate perhaps two or three of the ten features as being critical to forecasting. 

Feature: 0, Score: 0.04760 

Feature: 1, Score: 0.06680 

Feature: 2, Score: 0.05240 

Feature: 3, Score: 0.09300 

Feature: 4, Score: 0.05140 

Feature: 5, Score: 0.05520 

Feature: 6, Score: 0.07920 

Feature: 7, Score: 0.05560 

Feature: 8, Score: 0.05620 

Feature: 9, Score: 0.03080 

 

A bar chart is then generated with regards to the feature importance scores. 

Feature Selection with Importance 

Feature importance scores can be leveraged to help interpret the data, but they can also be leveraged directly to help rank and choose the features that are most critical to a predictive model. 

Remember, our synthetic dataset possesses 1,000 instances, each with 10 input variables, five of which are redundant and five of which are critical to the outcome. We can leverage feature importance scores to assist in choosing the five relevant variables and use only them as inputs to a predictive model. 

To start with, we can split the dataset into train and test sets, train a model on the training set, make forecasts on the test set, and assess the outcome leveraging classification accuracy.  

 

We will leverage a logistic regression model as the predictive model. 

This furnishes a baseline for comparison when we remove some features leveraging feature importance scores. 

The complete instance of assessing a logistic regression model leveraging all features as input on our synthetic dataset is listed below. 

 

# evaluation of a model using all features 

from sklearn.datasets import make_classification 

from sklearn.model_selection import train_test_split 

from sklearn.linear_model import LogisticRegression 

from sklearn.metrics import accuracy_score 

# define the dataset 

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 

# split into train and test sets 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1) 

# fit the model 

model = LogisticRegression(solver='liblinear') 

model.fit(X_train, y_train) 

# evaluate the model 

yhat = model.predict(X_test) 

# evaluate predictions 

accuracy = accuracy_score(y_test, yhat) 

print('Accuracy: %.2f' % (accuracy*100)) 

Running the instance fits the logistic regression model on the training dataset and assesses it on the test set. 

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

In this scenario, we can observe that the model accomplished a classification accuracy of approximately 84.55 percent leveraging all features within the dataset. 

Accuracy: 84.55 

Given how we have developed the dataset, we would expect similar or better outcomes with half the number of input variables. 

We can leverage the SelectFromModel class to define both the model we desire to leverage to calculate importance scores, RandomForestClassifier in this scenario, and the number of features to choose, five in this scenario. 

 

# configure to select a subset of features 

fs = SelectFromModel(RandomForestClassifier(n_estimators=200), max_features=5) 

We can fit the feature selection strategy on the training dataset. 

This will calculate the importance scores that can be leveraged to rank all input features. We can then apply this method as a transform to choose a subset of the five most critical features from the dataset. This transform will be applied to the training dataset and the test set. 

 

# learn relationship from training data 

fs.fit(X_train, y_train) 

# transform train input data 

X_train_fs = fs.transform(X_train) 

# transform test input data 

X_test_fs = fs.transform(X_test) 

 

Tying all of this together, the complete instance of leveraging random forest feature importance for feature selection is listed below: 

# evaluation of a model using 5 features chosen with random forest importance 

from sklearn.datasets import make_classification 

from sklearn.model_selection import train_test_split 

from sklearn.feature_selection import SelectFromModel 

from sklearn.ensemble import RandomForestClassifier 

from sklearn.linear_model import LogisticRegression 

from sklearn.metrics import accuracy_score 

# feature selection 

def select_features(X_train, y_train, X_test): 
	# configure to select a subset of features 
	fs = SelectFromModel(RandomForestClassifier(n_estimators=1000), max_features=5) 
	# learn relationship from training data 
	fs.fit(X_train, y_train) 
	# transform train input data 
	X_train_fs = fs.transform(X_train) 
	# transform test input data 
	X_test_fs = fs.transform(X_test) 
	return X_train_fs, X_test_fs, fs 

 

# define the dataset 

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1) 

# split into train and test sets 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1) 

# feature selection 

X_train_fs, X_test_fs, fs = select_features(X_train, y_train, X_test) 

# fit the model 

model = LogisticRegression(solver='liblinear') 

model.fit(X_train_fs, y_train) 

# evaluate the model 

yhat = model.predict(X_test_fs) 

# evaluate predictions 

accuracy = accuracy_score(y_test, yhat) 

print('Accuracy: %.2f' % (accuracy*100)) 

Running the instance first performs feature selection on the dataset, then fits and assesses the logistic regression model as before. 

Your outcomes may vary, given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider executing the instance a few times and comparing the average outcome. 

In this scenario, we can observe that the model accomplishes the same performance on the dataset, even though with half the number of input features. As expected, the feature importance scores calculated by the random forest enabled the input features to be ranked accurately and those that were not relevant to the target variable to be deleted. 

 

Accuracy: 84.55 

Conclusion 

 

In this article by AICoreSpot, you learned about feature importance scores for machine learning in Python. 

Particularly, you learned: 

  • The part of feature importance in a predictive modelling problem 
  • How to calculate and review feature importance from linear models and decision trees 
  • How to calculate and review permutation feature importance scores.  

 
