How to Use Regression Machine Learning Algorithms in Weka
Weka has a large number of regression algorithms available on the platform.
The large number of machine learning algorithms supported by Weka is one of the biggest benefits of using the platform.
In this blog article you will discover how to use top regression machine learning algorithms in Weka.
After reading this post, you will know:
- The top 5 regression algorithms supported by Weka.
- How to use regression machine learning algorithms for predictive modelling in Weka.
- The key configuration options of regression algorithms in Weka.
Regression Algorithms Overview
We are going to take a look at the leading five regression algorithms in Weka.
Each algorithm that we cover will be briefly described in terms of how it works, key algorithm parameters will be highlighted, and the algorithm will be demonstrated in the Weka Explorer interface.
The five algorithms that we will review are:
- Linear Regression
- k-Nearest Neighbours
- Decision Trees
- Support Vector Regression
- Multi-Layer Perceptron
These are five algorithms that you can try on your regression problem as a starting point.
A standard machine learning regression problem will be used to demonstrate each algorithm.
Specifically, the Boston House Price dataset. Each instance describes the properties of a Boston suburb and the task is to predict the house prices in thousands of dollars. There are 13 numerical input variables with varying scales describing the properties of the suburbs. You can learn more about this dataset on the UCI Machine Learning Repository.
Start the Weka Explorer:
- Open the Weka GUI Chooser.
- Click the “Explorer” button to open the Weka explorer.
- Load the Boston house price dataset from the housing.arff file.
- Click “Classify” to open the Classify tab.
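If you prefer to script these steps instead of clicking through the Explorer, the dataset can be loaded with the Weka Java API. Here is a minimal sketch, assuming housing.arff is in the working directory and the house price attribute is the last column (the class name LoadHousing is just illustrative):

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadHousing {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file (adjust the path to your local copy of housing.arff)
        Instances data = new DataSource("housing.arff").getDataSet();
        // Tell Weka which attribute is the prediction target; the price is assumed to be last
        if (data.classIndex() == -1) {
            data.setClassIndex(data.numAttributes() - 1);
        }
        // Print a summary of the attributes, similar to what the Preprocess tab shows
        System.out.println(data.toSummaryString());
    }
}
```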
Let’s start by looking at the linear regression algorithm.
Linear Regression
Linear regression only supports regression type problems.
It works by estimating coefficients for a line or hyperplane that best fits the training data. It is a very simple regression algorithm, fast to train, and it can have great performance if the output variable for your data is a linear combination of your inputs.
It is a good idea to evaluate linear regression on your problem before moving on to more complex algorithms, in case it performs well.
Select the linear regression algorithm:
- Click the “Choose” button and choose “LinearRegression” under the “functions” group.
- Click on the name of the algorithm to review the algorithm configuration.
The performance of linear regression can be reduced if your training data has input attributes that are highly correlated. Weka can detect and remove highly correlated input attributes automatically by setting eliminateColinearAttributes to True, which is the default.
Additionally, attributes that are unrelated to the output variable can also negatively affect performance. Weka can automatically perform feature selection to pick only the relevant attributes by setting the attributeSelectionMethod parameter. This is enabled by default and can be disabled.
Finally, the Weka implementation uses a ridge regularization technique in order to reduce the complexity of the learned model. It does this by penalizing the sum of the squared coefficients (controlled by the ridge parameter), which prevents any specific coefficient from becoming too large (a sign of complexity in regression models).
- Select “OK” to close the algorithm configuration.
- Select the “Start” button to execute the algorithm on the Boston house price dataset.
You can see that with the default configuration linear regression achieves an RMSE of 4.9.
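The same experiment can be reproduced from code. The sketch below cross-validates LinearRegression with the Weka Java API; it assumes housing.arff is available locally, and the parameter values shown are simply Weka’s defaults made explicit:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LinearRegressionHousing {
    public static void main(String[] args) throws Exception {
        // Load the dataset and mark the last attribute (house price) as the target
        Instances data = new DataSource("housing.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        LinearRegression lr = new LinearRegression();
        lr.setEliminateColinearAttributes(true); // drop highly correlated inputs (default)
        lr.setRidge(1.0e-8);                     // small ridge penalty on the coefficients

        // 10-fold cross-validation, mirroring the Explorer's default test option
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(lr, data, 10, new Random(1));
        System.out.printf("Linear Regression RMSE: %.2f%n", eval.rootMeanSquaredError());
    }
}
```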
k-Nearest Neighbours
The k-nearest neighbours algorithm supports both classification and regression. It is also called kNN for short. It works by storing the entire training dataset and querying it to locate the k most similar training patterns when making a prediction.
As such, there is no model other than the raw training dataset, and the only computation performed is the querying of the training dataset when a prediction is requested.
It is a simple algorithm, but one that does not assume much about the problem other than that the distance between data instances is meaningful in making predictions. As such, it often achieves very good performance.
When making predictions on regression problems, kNN will take the mean of the k most similar instances in the training dataset. Select the kNN algorithm:
- Click the “Choose” button and select “IBk” under the “lazy” group.
- Click on the name of the algorithm to review the algorithm configuration.
In Weka, kNN is called IBk, which stands for Instance Based k.
The size of the neighbourhood is controlled by the k parameter. For example, if set to 1, then predictions are made using the single most similar training instance to a given new pattern for which a prediction is requested. Common values for k are 3, 7, 11, and 21, larger for larger dataset sizes. Weka can automatically discover a good value for k using cross-validation inside the algorithm by setting the crossValidate parameter to true.
Another key parameter is the distance measure used. This is configured with the nearestNeighbourSearchAlgorithm parameter, which controls the way the training data is stored and searched. The default is LinearNNSearch. Clicking the name of this search algorithm opens another configuration window where you can choose a distanceFunction parameter. By default, Euclidean distance is used to calculate the distance between instances, which is good for numerical data with the same scale. Manhattan distance is good to use if your attributes differ in measure or type.
It is a good idea to try a suite of different k values and distance measures on your problem and see what works best.
- Click “OK” to close the algorithm configuration.
- Click the “Start” button to run the algorithm on the Boston house price dataset.
You can see that with the default configuration the kNN algorithm achieves an RMSE of 4.6.
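A minimal sketch of an equivalent IBk configuration through the Weka Java API is shown below. It assumes the same housing.arff file; the chosen k, the cross-validated k search, and the Manhattan distance swap are illustrative rather than recommended settings:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.ManhattanDistance;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.neighboursearch.LinearNNSearch;

public class KnnHousing {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("housing.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        IBk knn = new IBk();
        knn.setKNN(21);             // neighbourhood size (upper bound when crossValidate is on)
        knn.setCrossValidate(true); // let Weka pick a good k between 1 and the value above
        // Swap the default Euclidean distance for Manhattan distance
        LinearNNSearch search = new LinearNNSearch();
        search.setDistanceFunction(new ManhattanDistance());
        knn.setNearestNeighbourSearchAlgorithm(search);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(knn, data, 10, new Random(1));
        System.out.printf("kNN RMSE: %.2f%n", eval.rootMeanSquaredError());
    }
}
```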
Decision Trees
Decision trees support classification and regression problems.
Decision trees are more recently referred to as Classification and Regression Trees, or CART. They work by building a tree to evaluate an instance of data, starting at the root of the tree and moving down to the leaves (the tree is drawn upside down, with the root at the top) until a prediction can be made. The process of building a decision tree works by greedily selecting the best split point in order to make predictions and repeating the process until the tree reaches a fixed depth.
After the tree is constructed, it is pruned in order to improve the model’s ability to generalize to new data.
Select the decision tree algorithm:
- Select the “Choose” button and choose “REPTree” under the “trees” group.
- Click on the name of the algorithm to review the algorithm configuration.
The depth of the tree is defined automatically, but you can specify a depth in the maxDepth attribute.
You can also choose to turn off pruning by setting the noPruning parameter to true, although this may result in worse performance.
The minNum parameter defines the minimum number of instances supported by the tree in a leaf node when constructing the tree from the training data.
- Select “OK” to close the algorithm configuration.
- Click the “Start” button to execute the algorithm on the Boston house price dataset.
You can see that with the default configuration the decision tree algorithm achieves an RMSE of 4.8.
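For completeness, here is a minimal REPTree sketch with the Weka Java API, again assuming the housing.arff file; the values passed to the setters are Weka’s defaults, spelled out so you can see where to change them:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.REPTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DecisionTreeHousing {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("housing.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        REPTree tree = new REPTree();
        tree.setMaxDepth(-1);     // -1 means the depth is not capped (default)
        tree.setNoPruning(false); // keep reduced-error pruning switched on
        tree.setMinNum(2.0);      // minimum number (total weight) of instances per leaf

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.printf("REPTree RMSE: %.2f%n", eval.rootMeanSquaredError());
    }
}
```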
Support Vector Regression
Support Vector Machines were developed for binary classification problems, although extensions to the technique have been made to support multi-class classification and regression problems. The adaptation of SVM for regression is called Support Vector Regression, or SVR for short.
SVM was developed for numerical input variables, although it will automatically convert nominal values to numerical values. Input data is also normalized before being used.
Unlike SVM, which finds a line that best separates the training data into classes, SVR works by finding a line of best fit that minimizes the error of a cost function. This is done using an optimization process that only considers those data instances in the training dataset that are closest to the line with the minimum cost. These instances are called support vectors, hence the name of the technique.
In almost all problems of interest, a line cannot be drawn to perfectly fit the data, so a margin is added around the line to relax the constraint, allowing some bad predictions to be tolerated in exchange for a better result overall.
Finally, few datasets can be fit with just a straight line. Sometimes a line with curves or even polygonal regions needs to be marked out. This is achieved by projecting the data into a higher-dimensional space in order to draw the lines and make predictions. Different kernels can be used to control the projection and the amount of flexibility.
Select the SVR algorithm:
- Click the “Choose” button and select “SMOreg” under the “functions” group.
- Click on the name of the algorithm to review the algorithm configuration.
The C parameter, called the complexity parameter in Weka, controls how flexible the process for drawing the line to fit the data can be. A value of 0 allows no violations of the margin, whereas the default is 1.
A key parameter in SVM is the type of kernel to use. The simplest kernel is a Linear Kernel that separates data with a straight line or hyperplane. The default in Weka is a Polynomial Kernel that fits the data using a curved or wiggly line; the higher the exponent of the polynomial, the more wiggly the line.
The Polynomial Kernel has a default exponent of 1, which makes it equivalent to a linear kernel. A popular and powerful kernel is the RBF Kernel, or Radial Basis Function Kernel, which is capable of learning closed polygons and complex shapes to fit the training data.
It is a good idea to try a suite of different kernels and C (complexity) values on your problem and see what works best.
- Select “OK” to close the algorithm configuration.
- Click the “Start” button to execute the algorithm on the Boston house price dataset.
You can see that with the default configuration the SVR algorithm achieves an RMSE of 5.1.
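Below is a minimal SMOreg sketch using the Weka Java API, assuming the same housing.arff file. It also shows how a kernel can be swapped in code; choosing the RBF kernel here is illustrative, not a recommendation:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMOreg;
import weka.classifiers.functions.supportVector.RBFKernel;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SvrHousing {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("housing.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        SMOreg svr = new SMOreg();
        svr.setC(1.0);                  // complexity parameter (Weka's default)
        svr.setKernel(new RBFKernel()); // swap the default PolyKernel for an RBF kernel

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(svr, data, 10, new Random(1));
        System.out.printf("SMOreg RMSE: %.2f%n", eval.rootMeanSquaredError());
    }
}
```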
Multi-Layer Perceptron
The Multi-Layer Perceptron algorithm supports both regression and classification problems.
It is also referred to as an artificial neural network, or simply a neural network for short.
Neural networks are a complex algorithm to use for predictive modelling because there are so many configuration parameters that can only be tuned effectively through intuition and a lot of trial and error.
It is an algorithm inspired by a model of biological neural networks in the brain, where small processing units called neurons are organized into layers that, if configured well, are capable of approximating any function. In classification we are interested in approximating the underlying function to best discriminate between classes. In regression problems we are interested in approximating a function that best fits the real-valued output.
Select the Multi-Layer Perceptron algorithm:
- Click the “Choose” button and select “MultilayerPerceptron” under the “functions” group.
- Click on the name of the algorithm to review the algorithm configuration.
You can manually specify the structure of the neural network that is used by the model, but this is not recommended for beginners.
The default will automatically design the network and train it on your dataset. The default will create a single hidden layer network. You can specify the number of hidden layers in the hiddenLayers parameter, which is set to automatic (“a”) by default.
You can also use a GUI to design the network structure. This can be fun, but it is recommended that you use the GUI with a simple train/test split of your data, otherwise you will be asked to design a network for each of the 10 folds of cross-validation.
You can configure the learning process by specifying how much to update the model each epoch via the learning rate. Common values are small, such as between 0.3 (the default) and 0.1.
The learning process can be further tuned with a momentum (set to 0.2 by default) to continue updating the weights even when no changes need to be made, and a decay (set decay to True) which reduces the learning rate over time, performing more learning at the beginning of training and less at the end.
- Select “OK” to close the algorithm configuration.
- Click the “Start” button to execute the algorithm on the Boston house price dataset.
You can see that with the default configuration the Multi-Layer Perceptron algorithm achieves an RMSE of 4.7.
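Finally, here is a minimal MultilayerPerceptron sketch with the Weka Java API, again assuming housing.arff is at hand; the learning rate, momentum, and epoch count shown are Weka’s defaults, with decay switched on as an example of the tuning discussed above:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MlpHousing {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("housing.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("a"); // "a" = automatic single hidden layer sizing
        mlp.setLearningRate(0.3); // how much to update the weights each epoch
        mlp.setMomentum(0.2);     // keep pushing the weights in the previous direction
        mlp.setDecay(true);       // reduce the learning rate over time
        mlp.setTrainingTime(500); // number of training epochs

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(mlp, data, 10, new Random(1));
        System.out.printf("Multi-Layer Perceptron RMSE: %.2f%n", eval.rootMeanSquaredError());
    }
}
```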
Conclusion
In this blog article, you discovered how to use regression algorithms in Weka.
Specifically, you learned:
- The top 5 regression algorithms you can use for predictive modelling.
- How to run regression algorithms in Weka.
- The key configuration options for regression algorithms in Weka.