
Perceptron Algorithm for Classification in Python

The Perceptron is a linear machine learning algorithm for binary classification tasks. 

It may be viewed as one of the first, and one of the simplest, types of artificial neural networks. It is most certainly not “deep” learning, but it is an important building block. Like logistic regression, it can quickly learn a linear separation in feature space for two-class classification tasks, although unlike logistic regression, it learns using the stochastic gradient descent optimization algorithm and does not predict calibrated probabilities. 

In this guide, you will discover the Perceptron classification machine learning algorithm. 

After completing this guide, you will know: 

  • The Perceptron Classifier is a linear algorithm that can be applied to binary classification tasks. 
  • How to fit, evaluate, and make predictions with the Perceptron model with Scikit-learn. 
  • How to tune the hyperparameters of the Perceptron algorithm on a given dataset. 

Tutorial Overview 

This guide is divided into three parts; they are: 

1. Perceptron Algorithm 

2. Perceptron with Scikit-learn 

3. Tune Perceptron Hyperparameters 

Perceptron Algorithm 

The Perceptron algorithm is a two-class (binary) classification machine learning algorithm. 

It is a type of neural network model, perhaps the simplest type of neural network model. 

It is made up of a single node or neuron that takes a row of data as input and predicts a class label. This is achieved by calculating the weighted sum of the inputs plus a bias (where the bias input is fixed at 1). The weighted sum of the inputs of the model is called the activation. 

Activation = Weights * Inputs + Bias 

If the activation is above 0.0, the model will output 1.0; otherwise, it will output 0.0. 

Predict 1: If Activation > 0.0 

Predict 0: If Activation <= 0.0 
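
To make the rule concrete, below is a minimal from-scratch sketch of this prediction step; the predict() function and its argument names are illustrative and not part of any library.

# minimal from-scratch sketch of the Perceptron prediction rule
def predict(row, weights, bias):
    # activation is the weighted sum of the inputs plus the bias
    activation = sum(w * x for w, x in zip(weights, row)) + bias
    # step transfer function: predict 1 when the activation exceeds 0.0
    return 1.0 if activation > 0.0 else 0.0

# example: two inputs with known weights
print(predict([1.0, 2.5], weights=[0.2, -0.1], bias=0.1))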

Given that the inputs are multiplied by model coefficients, as in linear regression and logistic regression, it is good practice to normalize or standardize the data before using the model; one way to do this is sketched below. 
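
A minimal sketch, assuming scikit-learn (introduced in the next section): the standardization can be combined with the model in a Pipeline so the scaling is learned from the training data only.

# sketch: standardize inputs before the Perceptron with a Pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

# StandardScaler removes the mean and scales each feature to unit variance
pipeline = Pipeline([('scaler', StandardScaler()), ('model', Perceptron())])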

The Perceptron is a linear classification algorithm. This means that it learns a decision boundary that separates two classes using a line (called a hyperplane) in the feature space. As such, it is appropriate for problems where the classes can be separated well by a line or linear model, referred to as linearly separable.  

The coefficients of the model are referred to as input weights and are trained using the stochastic gradient descent optimization algorithm. 

Examples from the training dataset are shown to the model one at a time; the model makes a prediction, and the error is calculated. The weights of the model are then updated to reduce the error for that example. This is called the Perceptron update rule. This process is repeated for all examples in the training dataset, which is called an epoch. The process of updating the model using examples is then repeated for many epochs. 

Model weights are updated with a small proportion of the error on each update, and the proportion is controlled by a hyperparameter called the learning rate, typically set to a small value. This is to ensure learning does not happen too quickly, which can result in a lower-skill model, referred to as premature convergence of the optimization (search) process for the model weights. 

  • weights(t + 1) = weights(t) + learning_rate * (expected_i - predicted_i) * input_i 

Training is stopped when the error made by the model falls to a low level or no longer improves, or when a maximum number of epochs has been performed. 

The initial values for the model weights are set to small random values. Additionally, the training dataset is shuffled prior to each training epoch. This is by design to accelerate and improve the model training process. Because of this, the learning algorithm is stochastic and may achieve different results each time it is run. As such, it is good practice to summarize the performance of the algorithm on a dataset using repeated evaluation and reporting the mean classification accuracy. A from-scratch sketch of this training loop follows. 
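
Below is a minimal from-scratch sketch of the training procedure described above; the train() function and its names are illustrative rather than taken from any library.

# illustrative from-scratch Perceptron training loop
import random

def train(rows, labels, learning_rate=0.01, n_epochs=100):
    # initialize the weights and bias to small random values
    weights = [random.uniform(-0.01, 0.01) for _ in range(len(rows[0]))]
    bias = random.uniform(-0.01, 0.01)
    for _ in range(n_epochs):
        # shuffle the training dataset before each epoch
        order = list(range(len(rows)))
        random.shuffle(order)
        for i in order:
            # activation is the weighted sum of inputs plus the bias
            activation = sum(w * x for w, x in zip(weights, rows[i])) + bias
            predicted = 1.0 if activation > 0.0 else 0.0
            # Perceptron update rule: adjust by a fraction of the error
            error = labels[i] - predicted
            weights = [w + learning_rate * error * x for w, x in zip(weights, rows[i])]
            bias += learning_rate * error
    return weights, bias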

The learning rate and number of training epochs are hyperparameters of the algorithm that can be set using heuristics or hyperparameter tuning. 

Now that we are familiar with the Perceptron algorithm, let’s explore how we can use the algorithm in Python. 

Perceptron with Scikit-learn 

The Perceptron algorithm is available in the scikit-learn Python machine learning library through the Perceptron class. 

The class allows you to configure the learning rate (eta0), which defaults to 1.0. 

 

# define model 
model = Perceptron(eta0=1.0) 

 

 

The implementation also allows you to configure the total number of training epochs (max_iter), which defaults to 1,000. 

 

# define model 
model = Perceptron(max_iter=1000) 

 

 

The scikit-learn implementation of the Perceptron algorithm also provides other configuration options that you may want to explore, such as early stopping and the use of a penalty loss; a brief sketch of these options follows. 
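
As a hedged sketch, the snippet below turns on these options; the specific values shown are arbitrary choices for illustration, so consult the scikit-learn documentation for guidance.

# sketch: early stopping and a penalty (regularization) on the Perceptron
from sklearn.linear_model import Perceptron

# early_stopping holds out validation_fraction of the training data and stops
# once the validation score has not improved for n_iter_no_change epochs;
# penalty adds a regularization term weighted by alpha
model = Perceptron(early_stopping=True, validation_fraction=0.1,
                   n_iter_no_change=5, penalty='l2', alpha=0.0001)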

We can demonstrate the Perceptron classifier with a worked example. 

To start with, let’s define a synthetic classification dataset.  

We will use the make_classification() function to create a dataset with 1,000 examples, each with ten input variables. 

The example below creates and summarizes the dataset. 

# test classification dataset 
from sklearn.datasets import make_classification 
# define dataset 
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1) 
# summarize the dataset 
print(X.shape, y.shape) 

 

 

Running the example creates the dataset and confirms the number of rows and columns of the dataset. 

(1000, 10) (1000,) 

We can fit and evaluate a Perceptron model using repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class. We will use 10 folds and three repeats in the test harness. 

We will use the default configuration. 

 

# create the model 
model = Perceptron() 

 

 

The complete example of evaluating the Perceptron model for the synthetic binary classification task is listed below. 

 

# evaluate a perceptron model on the dataset 
from numpy import mean 
from numpy import std 
from sklearn.datasets import make_classification 
from sklearn.model_selection import cross_val_score 
from sklearn.model_selection import RepeatedStratifiedKFold 
from sklearn.linear_model import Perceptron 
# define dataset 
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1) 
# define model 
model = Perceptron() 
# define model evaluation method 
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1) 
# evaluate model 
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1) 
# summarize result 
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores))) 

 

 

Running the example evaluates the Perceptron algorithm on the synthetic dataset and reports the mean accuracy across the three repeats of 10-fold cross-validation. 

Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times. 

In this case, we can see that the model achieved a mean accuracy of about 84.7 percent. 

Mean Accuracy: 0.847 (0.052) 

We may decide to use the Perceptron classifier as our final model and make predictions on new data. 

This can be achieved by fitting the model on all available data and calling the predict() function, passing in a new row of data. 

We can demonstrate this with a complete example, listed below. 

 

# make a prediction with a perceptron model on the dataset 
from sklearn.datasets import make_classification 
from sklearn.linear_model import Perceptron 
# define dataset 
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1) 
# define model 
model = Perceptron() 
# fit model 
model.fit(X, y) 
# define new data 
row = [0.12777556,-3.64400522,-2.23268854,-1.82114386,1.75466361,0.1243966,1.03397657,2.35822076,1.01001752,0.56768485] 
# make a prediction 
yhat = model.predict([row]) 
# summarize prediction 
print('Predicted Class: %d' % yhat) 

 

 

Running the example fits the model and makes a class label prediction for a new row of data. 

Predicted Class: 1 

Next, we can look at configuring the model hyperparameters. 

Tune Perceptron Hyperparameters 

The hyperparameters for the Perceptron algorithm must be configured for your specific dataset. 

Perhaps the most important hyperparameter is the learning rate. 

A large learning rate can cause the model to learn quickly, but perhaps at the cost of lower skill. A smaller learning rate can result in a better-performing model but may take a long time to train. 

It is common to test learning rates on a log scale between a small value such as 1e-4 (or smaller) and 1.0. We will test the following values in this case. 

 

# define grid 
grid = dict() 
grid['eta0'] = [0.0001, 0.001, 0.01, 0.1, 1.0] 

 

 

The example below demonstrates this using the GridSearchCV class with the grid of values we have defined. 

 

# grid search learning rate for the perceptron 
from sklearn.datasets import make_classification 
from sklearn.model_selection import GridSearchCV 
from sklearn.model_selection import RepeatedStratifiedKFold 
from sklearn.linear_model import Perceptron 
# define dataset 
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1) 
# define model 
model = Perceptron() 
# define model evaluation method 
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1) 
# define grid 
grid = dict() 
grid['eta0'] = [0.0001, 0.001, 0.01, 0.1, 1.0] 
# define search 
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1) 
# perform the search 
results = search.fit(X, y) 
# summarize 
print('Mean Accuracy: %.3f' % results.best_score_) 
print('Config: %s' % results.best_params_) 
# summarize all 
means = results.cv_results_['mean_test_score'] 
params = results.cv_results_['params'] 
for mean, param in zip(means, params): 
    print(">%.3f with: %r" % (mean, param)) 

 

 

Running the example will evaluate each combination of configurations using repeated cross-validation. 

Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times. 

In this case, we can see that a smaller learning rate than the default results in better performance, with learning rates 0.0001 and 0.001 both achieving a classification accuracy of about 85.7 percent, compared with the default of 1.0, which achieved an accuracy of about 84.7 percent. 

 

Mean Accuracy: 0.857 
Config: {'eta0': 0.0001} 
>0.857 with: {'eta0': 0.0001} 
>0.857 with: {'eta0': 0.001} 
>0.853 with: {'eta0': 0.01} 
>0.847 with: {'eta0': 0.1} 
>0.847 with: {'eta0': 1.0} 

 

 

Another important hyperparameter is how many epochs are used to train the model. 

This may depend on the training dataset and could vary greatly. Again, we will explore configuration values on a log scale between 1 and 1e+4. 

 

# define grid 
grid = dict() 
grid['max_iter'] = [1, 10, 100, 1000, 10000] 

 

We will use the well-performing learning rate of 0.0001 found in the previous search. 

# define model 
model = Perceptron(eta0=0.0001) 

 

The complete example of grid searching the number of training epochs is listed below. 

# grid search total epochs for the perceptron 
from sklearn.datasets import make_classification 
from sklearn.model_selection import GridSearchCV 
from sklearn.model_selection import RepeatedStratifiedKFold 
from sklearn.linear_model import Perceptron 
# define dataset 
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1) 
# define model 
model = Perceptron(eta0=0.0001) 
# define model evaluation method 
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1) 
# define grid 
grid = dict() 
grid['max_iter'] = [1, 10, 100, 1000, 10000] 
# define search 
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1) 
# perform the search 
results = search.fit(X, y) 
# summarize 
print('Mean Accuracy: %.3f' % results.best_score_) 
print('Config: %s' % results.best_params_) 
# summarize all 
means = results.cv_results_['mean_test_score'] 
params = results.cv_results_['params'] 
for mean, param in zip(means, params): 
    print(">%.3f with: %r" % (mean, param)) 

 

Running the example will evaluate each combination of configurations using repeated cross-validation. 

Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times. 

In this case, we can see that epochs 10 through 10,000 result in about the same classification accuracy. An interesting extension would be to explore configuring the learning rate and the number of training epochs at the same time to see if better results can be achieved (a sketch of such a combined search follows the results below). 

Mean Accuracy: 0.857 
Config: {'max_iter': 10} 
>0.850 with: {'max_iter': 1} 
>0.857 with: {'max_iter': 10} 
>0.857 with: {'max_iter': 100} 
>0.857 with: {'max_iter': 1000} 
>0.857 with: {'max_iter': 10000} 
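
A hedged sketch of the combined search mentioned above, reusing the dataset and test harness from the earlier examples; the grid values are illustrative choices, not recommendations.

# sketch: grid search learning rate and epochs together
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# same dataset and evaluation method as the examples above
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# grid over both hyperparameters; every combination is evaluated
grid = dict()
grid['eta0'] = [0.0001, 0.001, 0.01, 0.1, 1.0]
grid['max_iter'] = [10, 100, 1000]
search = GridSearchCV(Perceptron(), grid, scoring='accuracy', cv=cv, n_jobs=-1)
results = search.fit(X, y)
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)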

 

Further Reading 

This section provides additional resources on the topic if you are looking to go deeper. 

Books 

Neural Networks for Pattern Recognition, 1995 

Pattern Recognition and Machine Learning, 2006 

Artificial Intelligence: A Modern Approach, 3rd edition, 2015 

APIs 

sklearn.linear_model.Perceptron API. 

Articles 

Perceptron, Wikipedia 

Perceptrons (book), Wikipedia 

Conclusion 

In this guide, you discovered the Perceptron classification machine learning algorithm. 

Specifically, you learned: 

  • The Perceptron Classifier is a linear algorithm that can be applied to binary classification tasks. 
  • How to fit, evaluate, and make predictions with the Perceptron model with Scikit-learn. 
  • How to tune the hyperparameters of the Perceptron algorithm on a given dataset. 