An introduction to the Weka Machine Learning Workbench
Machine learning is an iterative process rather than a linear one: each step may need to be revisited as you learn more about the problem under investigation. This iterative process can require several different tools, programs, and scripts for each step.
A machine learning workbench is a platform or environment that supports a wide range of machine learning activities, reducing or eliminating the need for multiple tools.
Some statistical and machine learning workbenches, such as R, provide very powerful tools but require a lot of manual configuration in the form of scripts and programming. These tools also tend to be fragile, written by and for academics rather than built to be robust and used in production environments.
What is Weka
The Weka Machine Learning Workbench is a comprehensive platform for applied machine learning. Weka is an acronym for Waikato Environment for Knowledge Analysis. It is also named after the weka, a flightless bird found only in New Zealand.
Five features of Weka worth highlighting are:
- Open source: It is released as open source software under the GNU General Public License. It is dual licensed, and Pentaho Corporation holds an exclusive license to use the platform for business intelligence in its own proprietary product.
- Graphical interface: It has a graphical user interface (GUI), which lets you complete your machine learning projects without programming.
- Command line interface: All features of the software can be used from the command line, which is very useful for scripting large jobs.
- Java API: It is written in Java and provides a well-documented API that makes it easy to integrate into your own applications. Note that the GNU General Public License means that your software would, in turn, also have to be released under the GPL.
- Documentation: There are books, wikis, MOOCs, and manuals that can get you up to speed on using the platform effectively.
The main reason Weka is recommended here is that a beginner can work through the process of applied machine learning using the graphical interface, without doing any programming. This is a big deal, because understanding the process, handling data, and experimenting with algorithms is what a beginner should be learning, not yet another scripting language.
Intro to the Weka GUI
Now we want to give you a brief look at the graphical user interface and encourage you to download Weka and experiment with it. The workbench provides three main ways to work on your problem: the Explorer for playing around and trying things out, the Experimenter for controlled experiments, and the Knowledge Flow for graphically designing a pipeline for your problem.
Weka Explorer
The Explorer is where you play around with your data and think about which transforms to apply to it and which algorithms you want to run in experiments.
The Explorer interface is divided into six tabs:
1] Preprocess: Load a dataset and manipulate the data into the form you want to work with.
2] Classify: Select and run classification and regression algorithms on your data.
3] Cluster: Select and run clustering algorithms on your dataset.
4] Associate: Run association rule algorithms to extract insights from your data.
5] Select attributes: Run attribute selection algorithms on your data to choose the attributes that are most relevant to the feature you want to predict.
6] Visualize: Visualize the relationships between attributes.
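To make the Classify tab a little more concrete: out of the box it offers the ZeroR classifier, which simply predicts the most common class, and it estimates performance with 10-fold cross-validation. The plain-Java sketch below illustrates that combination without using Weka at all. The class `ZeroRCrossValidation` and its methods are hypothetical names chosen for this example, and unlike Weka the sketch assigns instances to folds round-robin (instance i goes to fold i mod k) with no shuffling or stratification, so its numbers will not match Weka's exactly.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not Weka code: k-fold cross-validation of a
// majority-class baseline, in the spirit of Weka's ZeroR classifier.
class ZeroRCrossValidation {

    // "Training" the baseline = counting the labels outside the test fold and
    // returning the most frequent one (ties go to the alphabetically first label).
    static String majorityLabel(String[] labels, int k, int testFold) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            if (i % k != testFold) {
                counts.merge(labels[i], 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Instance i is placed in fold i % k (no shuffling, no stratification);
    // accuracy is pooled over all k held-out folds.
    static double crossValidate(String[] labels, int k) {
        if (k < 2 || k > labels.length) {
            throw new IllegalArgumentException("need 2 <= k <= number of instances");
        }
        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            String prediction = majorityLabel(labels, k, fold);
            for (int i = 0; i < labels.length; i++) {
                if (i % k == fold && labels[i].equals(prediction)) {
                    correct++;
                }
            }
        }
        return (double) correct / labels.length;
    }

    public static void main(String[] args) {
        // Made-up class labels, purely for illustration.
        String[] labels = {"yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"};
        System.out.printf("10-fold accuracy of the baseline: %.2f%n", crossValidate(labels, 10));
    }
}
```

On those ten made-up labels (seven "yes", three "no") the sketch reports an accuracy of 0.70, because every held-out "no" is outvoted by the remaining "yes" labels. A majority-class baseline like this is a useful floor: any real classifier you try in the Classify tab should beat it.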
Weka Experimenter
This interface is for designing experiments with your choice of algorithms and datasets, running those experiments, and analyzing the results.
The tools for analyzing results are very powerful, allowing you to compare results across multiple runs and see which differences are statistically significant.
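The significance testing behind the Experimenter is worth a quick look. Its default tester, "Paired T-Tester (corrected)", is to our understanding based on Nadeau and Bengio's corrected resampled t-test, which enlarges the variance term of an ordinary paired t-test because repeated cross-validation runs share training data. The sketch below computes that statistic in plain Java. `CorrectedPairedTTest` and its parameter names are hypothetical, and the sketch stops at the t value rather than computing a p-value (that step would use a Student's t distribution with k − 1 degrees of freedom).

```java
// Hypothetical sketch, not Weka's source code: the corrected resampled t statistic
// (Nadeau and Bengio) for comparing two algorithms over k paired train/test runs.
class CorrectedPairedTTest {

    // diffs[j]       = score of algorithm A minus score of algorithm B on run j
    // testTrainRatio = n_test / n_train of each split, e.g. 1.0 / 9 for 10-fold CV
    // Returns t = mean(d) / sqrt((1/k + n_test/n_train) * s^2), where s^2 is the
    // sample variance of the differences. If all differences are identical,
    // s^2 = 0 and the result is infinite or NaN.
    static double tStatistic(double[] diffs, double testTrainRatio) {
        int k = diffs.length;
        if (k < 2) {
            throw new IllegalArgumentException("need at least two paired runs");
        }
        double mean = 0.0;
        for (double d : diffs) {
            mean += d;
        }
        mean /= k;
        double sumSq = 0.0;
        for (double d : diffs) {
            sumSq += (d - mean) * (d - mean);
        }
        double variance = sumSq / (k - 1);
        // An ordinary paired t-test uses (1/k) * variance here; the extra
        // testTrainRatio term accounts for training sets overlapping across runs.
        return mean / Math.sqrt((1.0 / k + testTrainRatio) * variance);
    }

    public static void main(String[] args) {
        // Made-up accuracy differences from ten runs, purely for illustration.
        double[] diffs = {0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.02, 0.03, 0.01, 0.02};
        System.out.printf("uncorrected t = %.3f%n", tStatistic(diffs, 0.0));
        System.out.printf("corrected t   = %.3f%n", tStatistic(diffs, 1.0 / 9));
    }
}
```

Setting `testTrainRatio` to 0 removes the correction and gives the ordinary paired t statistic, which makes the effect easy to see. For 10-fold cross-validation the ratio is 1/9, and once there are many runs that term dominates 1/k, so the corrected t is noticeably smaller and a difference has to be larger or more consistent before it is reported as significant.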
Knowledge Flow
Applied machine learning is a process, and the Knowledge Flow interface lets you design that process graphically and run the designs you create. This includes loading and transforming input data, running algorithms, and presenting the results.
It is a powerful interface and metaphor for solving complex problems graphically.
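The Knowledge Flow idea of wiring steps together can be sketched in plain Java by treating each step as a function and the flow as their composition. The example below does not use Weka's Knowledge Flow API: a hard-coded "loader" feeds a min-max normalization step (similar in spirit to Weka's Normalize filter, which by default rescales numeric attributes to [0, 1]), then a stand-in "algorithm" that computes column means, then a presentation step that formats the result. `TinyFlow` and all of its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch, not Weka's Knowledge Flow API: each processing step is a
// function, and a "flow" is the composition load -> transform -> algorithm -> present.
class TinyFlow {

    // Load step: a stand-in for reading a file; returns made-up numeric rows.
    static final Supplier<List<double[]>> LOAD = () -> Arrays.asList(
            new double[] {2.0, 100.0},
            new double[] {4.0, 300.0},
            new double[] {6.0, 200.0});

    // Transform step: min-max normalization of every column to [0, 1].
    static List<double[]> normalize(List<double[]> rows) {
        int cols = rows.get(0).length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : rows) {
            for (int c = 0; c < cols; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }
        List<double[]> scaled = new ArrayList<>();
        for (double[] row : rows) {
            double[] out = new double[cols];
            for (int c = 0; c < cols; c++) {
                double range = max[c] - min[c];
                // A constant column has no range; map it to 0 instead of dividing by zero.
                out[c] = range == 0 ? 0.0 : (row[c] - min[c]) / range;
            }
            scaled.add(out);
        }
        return scaled;
    }

    // "Algorithm" step: a stand-in that computes the mean of each column.
    static double[] columnMeans(List<double[]> rows) {
        double[] means = new double[rows.get(0).length];
        for (double[] row : rows) {
            for (int c = 0; c < means.length; c++) {
                means[c] += row[c];
            }
        }
        for (int c = 0; c < means.length; c++) {
            means[c] /= rows.size();
        }
        return means;
    }

    // Present step: format the result for display.
    static String present(double[] means) {
        return Arrays.toString(means);
    }

    // Wire the steps together; swapping or adding a step only changes this method.
    static Function<List<double[]>, String> buildFlow() {
        Function<List<double[]>, List<double[]>> transform = TinyFlow::normalize;
        return transform.andThen(TinyFlow::columnMeans).andThen(TinyFlow::present);
    }

    public static void main(String[] args) {
        System.out.println(buildFlow().apply(LOAD.get()));
    }
}
```

Because every step has a plain function type, replacing the transform or appending another step is a one-line change to `buildFlow`, which is roughly what dragging and reconnecting components on the Knowledge Flow canvas gives you graphically.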
Tips to get started
Set up Weka right away. It supports the three main platforms: Windows, OS X, and Linux. Find the distribution for your platform, download it, install it, and start it. You may need to install Java first. The installation includes many standard example datasets (in the data directory) that you can load and practice on.
- Download and set up Weka
- Download and set up Java
Read the Weka Documentation
The download includes a PDF manual (WekaManual.pdf) that can get you up to speed very quickly. It is detailed and comprehensive, with screenshots. There is also plenty of additional documentation online.
Extensions and plugins for Weka
There are many plugin algorithms, extensions, and even platforms that build on top of Weka:
- More datasets
- More packages
Online courses on Weka
There are two main online courses that teach data mining with Weka:
- Data Mining with Weka. You can watch all the videos for this course for free on YouTube.
- More Data Mining with Weka.
Rushdi Shams has an excellent YouTube channel with videos showing how to do many specific tasks in Weka.