Design and run your first experiment in Weka
Weka is an ideal platform for learning machine learning. It provides a graphical user interface for exploring and experimenting with machine learning algorithms on datasets, without requiring you to worry about the mathematics or the programming.
A powerful feature of Weka is the Weka Experimenter interface. Unlike the Weka Explorer, which is for filtering data and trying out different algorithms, the Experimenter is for designing and running experiments. The experimental results it produces are robust enough to be published, provided you know what you are doing.
In an upcoming post, you will learn how to run your first classifier in the Weka Explorer.
In this blog article by AICoreSpot, you will discover the power of the Weka Experimenter. If you follow the step-by-step instructions, you will design and run your first machine learning experiment in under five minutes.
1] Get and install Weka
Go to the Weka download page and locate a version of Weka suitable for your operating system, whether Windows, Linux, or Mac.
Weka requires Java. You may already have Java installed; if not, there are versions of Weka listed on the download page (for Windows) that include Java and will install it for you. For Mac users, as with most things on Mac, Weka works right out of the box.
If you are interested in machine learning, then it follows that you can figure out how to download and install software on your own machine.
2] Start Weka
Start Weka. This may involve finding it in your program launcher or double-clicking the weka.jar file. This will open the Weka GUI Chooser.
The Weka GUI Chooser lets you choose among the Explorer, Experimenter, KnowledgeFlow, and the Simple CLI (command line interface).
Click the "Experimenter" button to launch the Weka Experimenter.
The Weka Experimenter enables you to design your own experiments that run algorithms on datasets, execute the experiments, and analyze the results. It's a very powerful tool.
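As an aside, if you prefer working programmatically, the same GUI Chooser can be started from Java code. This is a minimal sketch, assuming weka.jar is on your classpath; the class name LaunchWeka is just an illustration.

```java
// Minimal sketch: launch the Weka GUI Chooser from Java.
// Assumes weka.jar is on the classpath.
public class LaunchWeka {
    public static void main(String[] args) throws Exception {
        // Opens the same chooser window you get by double-clicking weka.jar.
        weka.gui.GUIChooser.main(new String[0]);
    }
}
```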
3] Design Experiment
Click the "New" button to create a new experiment configuration.
Test Options
The Experimenter sets up the test options for you with sensible defaults. The experiment is configured to use cross-validation with 10 folds. It is a "Classification" type problem, and each algorithm + dataset combination is run 10 times (iteration control).
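For intuition, these defaults amount to 10 repetitions of 10-fold cross-validation for each algorithm/dataset pair. A minimal sketch of the same scheme using Weka's Java API, illustrated with the iris.arff file we will select in the next step (the file path and seeds are assumptions for illustration):

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RepeatedCrossValidation {
    public static void main(String[] args) throws Exception {
        // Load the dataset (path is an assumption; point it at your copy).
        Instances data = DataSource.read("data/iris.arff");
        data.setClassIndex(data.numAttributes() - 1); // class is the last attribute

        // 10 iterations of 10-fold cross-validation, as in the Experimenter defaults.
        for (int run = 1; run <= 10; run++) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new ZeroR(), data, 10, new Random(run));
            System.out.printf("Run %2d: %.2f%% correct%n", run, eval.pctCorrect());
        }
    }
}
```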
Iris Flower Dataset
Let’s begin by choosing the dataset.
- In the "Datasets" section, click the "Add new…" button.
- Open the “data” directory and select the “iris.arff” dataset.
The Iris Flower dataset is a well-known dataset from statistics that is heavily used by researchers in machine learning. It contains 150 instances (rows), four attributes (columns), and a class attribute for the species of iris flower (one of setosa, versicolor, or virginica). You can learn more about the Iris Flower dataset on Wikipedia.
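If you want to peek at the dataset outside the GUI, here is a minimal sketch using Weka's Java API (the file path is an assumption; use the iris.arff from your Weka installation's "data" directory):

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InspectIris {
    public static void main(String[] args) throws Exception {
        // Path is an assumption; adjust to your copy of iris.arff.
        Instances data = DataSource.read("data/iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        System.out.println("Instances:  " + data.numInstances());   // expect 150
        System.out.println("Attributes: " + data.numAttributes());  // expect 5 (4 + class)
        System.out.println("Classes:    " + data.numClasses());     // expect 3 species
    }
}
```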
Let's select three algorithms to run on our dataset.
ZeroR
1] Click “Add new…” in the “Algorithms” section.
2] Click the “Choose” button.
3] Click “ZeroR” under the “rules” section.
ZeroR is the simplest algorithm we can run. It picks the class value that is in the majority in the dataset and gives that for all predictions. Because all three class values have an equal share (50 instances each), it picks the first class value, "setosa", and gives that as the answer for all predictions. We already know, off the top of our heads, that the best result ZeroR can give us is 33.33% (50/150). This is a useful baseline that we require other algorithms to outperform.
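To see ZeroR's behavior concretely, here is a minimal sketch that trains it on iris and prints the learned rule and accuracy (file path assumed as before):

```java
import weka.classifiers.Evaluation;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ZeroRBaseline {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff"); // path assumed
        data.setClassIndex(data.numAttributes() - 1);

        ZeroR zeroR = new ZeroR();
        zeroR.buildClassifier(data);
        System.out.println(zeroR); // prints the single majority-class rule

        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(zeroR, data);
        System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect()); // expect 33.33%
    }
}
```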
OneR
- Click "Add new…" in the "Algorithms" section.
- Click the "Choose" button.
- Click "OneR" under the "rules" section.
OneR is perhaps our second simplest algorithm. It picks the one attribute that best correlates with the class value and splits it to achieve the best prediction accuracy it can. Like ZeroR, the algorithm is so simple that you could implement it by hand, and we would expect more sophisticated algorithms to outperform it.
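A minimal sketch that trains OneR on iris and prints the one-attribute rule it finds (file path assumed):

```java
import weka.classifiers.rules.OneR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OneRRule {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff"); // path assumed
        data.setClassIndex(data.numAttributes() - 1);

        OneR oneR = new OneR();
        oneR.buildClassifier(data);
        // Prints the chosen attribute and the value ranges it maps to each species.
        System.out.println(oneR);
    }
}
```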
J48
- Click "Add new…" in the "Algorithms" section.
- Click the "Choose" button.
- Click "J48" under the "trees" section.
J48 is a decision tree algorithm. It is an implementation of the C4.8 algorithm in Java ("J" for Java and "48" for C4.8). The C4.8 algorithm is a minor extension of the famous C4.5 algorithm and is a very capable prediction algorithm.
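A minimal sketch that builds J48 on iris and prints the decision tree it learns (file path assumed; default options):

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Tree {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/iris.arff"); // path assumed
        data.setClassIndex(data.numAttributes() - 1);

        J48 j48 = new J48(); // defaults: confidence factor 0.25, min 2 instances per leaf
        j48.buildClassifier(data);
        System.out.println(j48); // prints a text rendering of the learned tree
    }
}
```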
4] Run Experiment
Click the "Run" tab at the top of the screen.
This tab is the control panel for running the currently configured experiment.
Click the big "Start" button to begin the experiment, and watch the "Log" and "Status" sections to keep an eye on how it is doing.
Given that the dataset is small and the algorithms are fast, the experiment should finish in a matter of seconds.
5] Review results
Click the "Analyse" tab at the top of the screen.
This will open up the Experiment results analysis panel.
Click the "Experiment" button in the "Source" section to load the results from the current experiment.
Algorithm Rank
The first thing we want to know is which algorithm performed best. We can do that by ranking the algorithms by the number of times each one beat the other algorithms.
1] Click the "Select" button for the "Test base" and select "Ranking".
2] Now click the "Perform test" button.
The ranking table shows the number of statistically significant wins each algorithm has had against all the other algorithms on the dataset. A win means an accuracy that is better than another algorithm's accuracy, where the difference is statistically significant.
We can see that both J48 and OneR have one win each, and that ZeroR has two losses. This is good: it means that OneR and J48 are both potential contenders that outperform our ZeroR baseline.
Algorithm Accuracy
Next, we want to know what accuracy scores the algorithms achieved.
1] Click the "Select" button for the "Test base", select the "ZeroR" algorithm in the list, and click the "Select" button.
2] Check the box next to "Show std. deviations".
3] Now click the "Perform test" button.
In the "Test output" we can see a table with the results for the three algorithms. Each algorithm was run 10 times on the dataset, and the accuracy reported is the mean of those 10 runs, with the standard deviation in brackets.
We can see that both the OneR and J48 algorithms have a little "v" next to their results. This means that the difference in accuracy between these algorithms and ZeroR is statistically significant. We can also see that the accuracy of these algorithms is much higher than ZeroR's, so we can say that these two algorithms achieved statistically significantly better results than the ZeroR baseline.
The score for J48 is higher than the score for OneR, so next we want to see whether the difference between these two accuracy scores is significant.
1] Click the "Select" button for the "Test base", select the "J48" algorithm in the list, and click the "Select" button.
2] Now click the "Perform test" button.
We can see that ZeroR has an "*" next to its results, signifying that its results are statistically different from those of J48. But we already knew this. We do not see an "*" next to the results for the OneR algorithm. This tells us that although the mean accuracies of J48 and OneR differ, the difference is not statistically significant.
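Under the hood, the Experimenter decides these wins and losses with a paired t-test on the per-run accuracies (by default a corrected variant suited to repeated cross-validation). A minimal sketch of a plain, uncorrected paired t-test conveys the idea; the accuracy arrays here are made-up numbers for illustration only:

```java
public class PairedTTest {
    // Plain paired t-test on per-run accuracies. The Experimenter uses a
    // corrected variant; this uncorrected version is for intuition only.
    static double tStatistic(double[] a, double[] b) {
        int n = a.length;
        double meanDiff = 0;
        for (int i = 0; i < n; i++) meanDiff += (a[i] - b[i]) / n;
        double var = 0;
        for (int i = 0; i < n; i++) {
            double d = (a[i] - b[i]) - meanDiff;
            var += d * d / (n - 1);
        }
        return meanDiff / Math.sqrt(var / n);
    }

    public static void main(String[] args) {
        // Made-up per-run accuracies for two algorithms over 10 runs.
        double[] j48  = {95.3, 93.3, 96.0, 94.7, 92.7, 96.7, 94.0, 95.3, 93.3, 96.0};
        double[] oneR = {94.7, 94.0, 93.3, 95.3, 90.7, 94.7, 94.7, 92.7, 94.0, 94.0};
        double t = tStatistic(j48, oneR);
        // Compare |t| against the critical value for n-1 = 9 degrees of freedom
        // (about 2.26 at the 5% level). Here t is roughly 1.96, below the
        // threshold, so the difference would not be judged significant.
        System.out.printf("t = %.3f%n", t);
    }
}
```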
All else being equal, we would choose the OneR algorithm to make predictions on this problem, as it is the simpler of the two.
If we wanted to report the results, we would say that the OneR algorithm achieved a classification accuracy of 92.53% (+/- 5.47%), which is statistically significantly better than ZeroR at 33.33% (+/- 5.47%).
Conclusion
You have learned how to configure a machine learning experiment with one dataset and three algorithms in Weka. You have also learned how to analyse the results of an experiment and the importance of statistical significance when interpreting them.
You are now able to design and run experiments with any algorithms provided by Weka on a dataset of your choice, and to report the results you achieve meaningfully and with confidence.