Perceptron Algorithm for Classification in Python
The Perceptron is a linear machine learning algorithm for binary classification tasks.
It may be considered one of the first and one of the simplest types of artificial neural networks. It is definitely not “deep” learning, but it is an important building block. Like logistic regression, it can quickly learn a linear separation in feature space for two-class classification tasks, although unlike logistic regression, it learns using the stochastic gradient descent optimization algorithm and does not predict calibrated probabilities.
In this tutorial, you will discover the Perceptron classification machine learning algorithm.
After completing this tutorial, you will know:
- The Perceptron Classifier is a linear algorithm that can be applied to binary classification tasks.
- How to fit, evaluate, and make predictions with the Perceptron model with Scikit-learn.
- How to tune the hyperparameters of the Perceptron algorithm on a given dataset.
Tutorial Overview
This tutorial is divided into three parts; they are:
1. Perceptron Algorithm
2. Perceptron with Scikit-learn
3. Tune Perceptron Hyperparameters
Perceptron Algorithm
The Perceptron Algorithm is a two-class (binary) classification machine learning algorithm.
It is a type of neural network model, perhaps the simplest type of neural network model.
It consists of a single node or neuron that takes a row of data as input and predicts a class label. This is achieved by calculating the weighted sum of the inputs plus a bias (whose input is fixed at 1). The weighted sum of the inputs of the model is called the activation.
Activation = Weights * Inputs + Bias
If the activation is above 0.0, the model will output 1.0; otherwise, it will output 0.0.
Predict 1: If Activation > 0.0
Predict 0: If Activation <=0.0
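As a minimal sketch of this decision rule (illustrative, not the scikit-learn implementation), the activation and the threshold can be written in a few lines of Python, assuming the weights and bias have already been learned:

```python
# minimal sketch of the perceptron decision rule (illustrative names)
def predict(inputs, weights, bias):
    # activation = weighted sum of the inputs plus the bias
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    # predict 1 if the activation is above zero, otherwise 0
    return 1.0 if activation > 0.0 else 0.0
```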
Given that the inputs are multiplied by model coefficients, as in linear regression and logistic regression, it is good practice to normalize or standardize the data before using the model.
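As an illustrative sketch (not part of the worked example later in this tutorial), standardization can be chained with the model in a scikit-learn Pipeline so that the scaling is learned from the training data:

```python
# sketch: standardize inputs before the perceptron using a pipeline
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

# the scaler is fit on the training data and applied before the model
model = make_pipeline(StandardScaler(), Perceptron())
```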
The Perceptron is a linear classification algorithm. This means that it learns a decision boundary that separates two classes using a line (called a hyperplane) in the feature space. As such, it is appropriate for problems where the classes can be separated well by a line or linear model, referred to as linearly separable.
The coefficients of the model are referred to as input weights and are trained using the stochastic gradient descent optimization algorithm.
Examples from the training dataset are shown to the model one at a time, the model makes a prediction, and the error is calculated. The weights of the model are then updated to reduce the error for the example. This is called the Perceptron update rule. This process is repeated for all examples in the training dataset, which is called an epoch. The process of updating the model using examples is then repeated for many epochs.
Model weights are updated with a small proportion of the error on each update, and the proportion is controlled by a hyperparameter called the learning rate, typically set to a small value. This ensures that learning does not happen too quickly, which could result in a lower-skill model, referred to as premature convergence of the optimization (search) process for the model weights.
- weights(t + 1) = weights(t) + learning_rate * (expected_i - predicted_i) * input_i
Training is stopped when the error made by the model falls to a low level or no longer improves, or when a maximum number of epochs has been performed.
The initial values for the model weights are set to small random values. Additionally, the training dataset is shuffled before each training epoch. This is by design to accelerate and improve the model training process. Because of this, the learning algorithm is stochastic and may achieve different results each time it is run. As such, it is good practice to summarize the performance of the algorithm on a dataset using repeated evaluation and reporting the mean classification accuracy.
The learning rate and the number of training epochs are hyperparameters of the algorithm that can be set using heuristics or hyperparameter tuning.
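To make the procedure concrete, below is a minimal from-scratch sketch of a training run; the function and variable names are illustrative, and in practice you would prefer the scikit-learn implementation covered next:

```python
# minimal from-scratch sketch of perceptron training (illustrative, not scikit-learn)
import numpy as np

def train_perceptron(X, y, learning_rate=0.01, n_epochs=100, seed=1):
    rng = np.random.default_rng(seed)
    # initialize weights and bias to small random values
    weights = rng.normal(scale=0.01, size=X.shape[1])
    bias = rng.normal(scale=0.01)
    for _ in range(n_epochs):
        # shuffle the training examples before each epoch
        for i in rng.permutation(len(X)):
            # compute the activation and threshold it at zero
            predicted = 1.0 if np.dot(weights, X[i]) + bias > 0.0 else 0.0
            error = y[i] - predicted
            # perceptron update rule: add a fraction of the error to the weights
            weights += learning_rate * error * X[i]
            bias += learning_rate * error
    return weights, bias
```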
Now that we are familiar with the Perceptron algorithm, let’s explore how we can use the scikit-learn implementation in Python.
Perceptron with Scikit-learn
The Perceptron algorithm is available in the scikit-learn Python machine learning library through the Perceptron class.
The class allows you to configure the learning rate (eta0), which defaults to 1.0.
```python
...
# define model
model = Perceptron(eta0=1.0)
```
The implementation also allows you to configure the total number of training epochs (max_iter), which defaults to 1,000.
```python
...
# define model
model = Perceptron(max_iter=1000)
```
The scikit-learn implementation of the Perceptron algorithm also provides other configuration options that you may want to explore, such as early stopping and the use of a penalty loss; a configuration sketch follows.
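For instance, early stopping and an L2 penalty might be configured as follows; the specific values here are assumptions for illustration, not recommendations:

```python
# sketch: configure early stopping and an L2 penalty (illustrative values)
from sklearn.linear_model import Perceptron

# stop early if the validation score does not improve for several epochs,
# and apply a small amount of L2 regularization
model = Perceptron(penalty='l2', alpha=0.0001,
                   early_stopping=True, validation_fraction=0.1,
                   n_iter_no_change=5)
```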
We can demonstrate the Perceptron classifier with a worked example.
To start with, let’s define a synthetic classification dataset.
We will use the make_classification() function to create a dataset with 1,000 examples, each with 10 input variables.
The example below creates and summarizes the dataset.
```python
# test classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# summarize the dataset
print(X.shape, y.shape)
```
Running the example creates the dataset and confirms the number of rows and columns of the dataset.
(1000, 10) (1000,)
We can fit and evaluate a Perceptron model using repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class. We will use 10 folds and three repeats in the test harness.
We will use the default configuration.
```python
...
# create the model
model = Perceptron()
```
The complete example of evaluating the Perceptron model for the synthetic binary classification task is listed below.
```python
# evaluate a perceptron model on the dataset
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = Perceptron()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# summarize result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
```
Running the example evaluates the Perceptron algorithm on the synthetic dataset and reports the mean accuracy across the three repeats of 10-fold cross-validation.
Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.
In this case, we can see that the model achieved a mean accuracy of about 84.7 percent.
Mean Accuracy: 0.847 (0.052)
We may decide to use the Perceptron classifier as our final model and make predictions on new data.
This can be achieved by fitting the model on all available data and calling the predict() function, passing in a new row of data.
We can demonstrate this with a complete example, listed below.
```python
# make a prediction with a perceptron model on the dataset
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = Perceptron()
# fit model
model.fit(X, y)
# define new data
row = [0.12777556,-3.64400522,-2.23268854,-1.82114386,1.75466361,0.1243966,1.03397657,2.35822076,1.01001752,0.56768485]
# make a prediction
yhat = model.predict([row])
# summarize prediction
print('Predicted Class: %d' % yhat)
```
Running the example fits the model and makes a class label prediction for a new row of data.
Predicted Class: 1
Next, we can look at configuring the model hyperparameters.
Tune Perceptron Hyperparameters
The hyperparameters for the Perceptron algorithm must be configured for your specific dataset.
Perhaps the most important hyperparameter is the learning rate.
A large learning rate can cause the model to learn fast, but perhaps at the cost of lower skill. A smaller learning rate can result in a better-performing model but may take a long time to train.
It is common to test learning rates on a log scale between a small value such as 1e-4 (or smaller) and 1.0. We will test the following values in this case.
```python
...
# define grid
grid = dict()
grid['eta0'] = [0.0001, 0.001, 0.01, 0.1, 1.0]
```
The example below demonstrates this using the GridSearchCV class with the grid of values we have defined.
```python
# grid search learning rate for the perceptron
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = Perceptron()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['eta0'] = [0.0001, 0.001, 0.01, 0.1, 1.0]
# define search
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
# summarize all
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean, param in zip(means, params):
    print(">%.3f with: %r" % (mean, param))
```
Running the example will evaluate each combination of configurations using repeated cross-validation.
Your specific results might vary given the stochastic nature of the learning algorithm. Try running the example a few times.
In this case, we can see that a smaller learning rate than the default results in better performance, with learning rates of 0.0001 and 0.001 both achieving a classification accuracy of about 85.7 percent, compared with the default of 1.0, which achieved an accuracy of about 84.7 percent.
```
Mean Accuracy: 0.857
Config: {'eta0': 0.0001}
>0.857 with: {'eta0': 0.0001}
>0.857 with: {'eta0': 0.001}
>0.853 with: {'eta0': 0.01}
>0.847 with: {'eta0': 0.1}
>0.847 with: {'eta0': 1.0}
```
Another important hyperparameter is the number of epochs used to train the model.
This may depend on the training dataset and could vary greatly. Again, we will explore configuration values on a log scale between 1 and 1e+4.
```python
...
# define grid
grid = dict()
grid['max_iter'] = [1, 10, 100, 1000, 10000]
```
We will use the well-performing learning rate of 0.0001 found in the previous search.
```python
...
# define model
model = Perceptron(eta0=0.0001)
```
The complete example of grid searching the number of training epochs is listed below.
```python
# grid search total epochs for the perceptron
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = Perceptron(eta0=0.0001)
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['max_iter'] = [1, 10, 100, 1000, 10000]
# define search
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
# summarize all
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean, param in zip(means, params):
    print(">%.3f with: %r" % (mean, param))
```
Running the example will evaluate each combination of configurations using repeated cross-validation.
Your specific results might vary given the stochastic nature of the learning algorithm. Try running the example a few times.
In this case, we can see that epochs 10 through 10,000 result in about the same classification accuracy. An interesting extension would be to explore configuring the learning rate and the number of training epochs at the same time to see if better results can be achieved; a sketch of this joint search follows the results below.
```
Mean Accuracy: 0.857
Config: {'max_iter': 10}
>0.850 with: {'max_iter': 1}
>0.857 with: {'max_iter': 10}
>0.857 with: {'max_iter': 100}
>0.857 with: {'max_iter': 1000}
>0.857 with: {'max_iter': 10000}
```
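As a sketch of that extension, both hyperparameters can be placed in one grid and searched jointly; the values below simply mirror the two separate searches above, and the outcome is not reported here:

```python
# sketch: jointly grid search learning rate and number of epochs (illustrative)
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define a grid over both hyperparameters at once
grid = dict()
grid['eta0'] = [0.0001, 0.001, 0.01, 0.1, 1.0]
grid['max_iter'] = [10, 100, 1000]
# define and perform the search
search = GridSearchCV(Perceptron(), grid, scoring='accuracy', cv=cv, n_jobs=-1)
results = search.fit(X, y)
# summarize the best combination found
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
```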
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
Neural Networks for Pattern Recognition, 1995
Pattern Recognition and Machine Learning, 2006
Artificial Intelligence: A Modern Approach, 3rd edition, 2015
APIs
sklearn.linear_model.Perceptron API.
Articles
Perceptron, Wikipedia
Perceptrons (book), Wikipedia
Conclusion
In this tutorial, you discovered the Perceptron classification machine learning algorithm.
Specifically, you learned:
- The Perceptron Classifier is a linear algorithm that can be applied to binary classification tasks.
- How to fit, evaluate, and make predictions with the Perceptron model with Scikit-learn.
- How to tune the hyperparameters of the Perceptron algorithm on a given dataset.