Improving machine learning results
Having one or two algorithms that perform adequately on a problem is a good starting point, but sometimes you will want to get the best result you can given the time and resources at your disposal.
In this blog post by AICorespot, you will go through methods and strategies you can use to squeeze out additional performance and improve the results you are getting from machine learning algorithms.
When tuning algorithms you must have a high degree of confidence in the results reported by your test harness. This means you should be using techniques that reduce the variance of the performance measure you use to evaluate algorithm runs. Cross validation with a reasonably high number of folds is suggested (the exact number depends on your dataset).
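As a concrete illustration, here is a minimal sketch of such a test harness using scikit-learn; the dataset, model and number of folds are illustrative assumptions, not recommendations from this post.

```python
# A minimal cross-validation test harness (sketch, assuming scikit-learn).
# The dataset, model and fold count below are illustrative choices only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=42)

# More folds generally gives a lower-variance estimate of performance,
# at the cost of more training runs.
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print("accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```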
The three techniques you will learn about in this blog post are:
- Algorithm tuning
- Ensembles
- Extreme Feature Engineering
Algorithm Tuning
The best place to begin is to get improved results from algorithms that you already know perform well on your problem. You can do this by exploring and fine-tuning the configuration of those algorithms.
Machine learning algorithms are parameterized, and changing those parameters can influence the outcome of the learning process. Think of each algorithm parameter as a dimension on a graph, with the values of a given parameter as points along the axis. Three parameters would form a cube of possible configurations for the algorithm, and n parameters would form an n-dimensional hypercube of possible configurations.
The goal of algorithm tuning is to find the best point or points in that hypercube for your problem. You will be optimizing against your test harness, so once again, you cannot overstate the importance of spending the time to build a trusted test harness.
You can tackle this search problem with automated strategies that impose a grid on the possibility space and sample where good algorithm configurations might lie. You can then use those points in an optimization algorithm to zoom in on the best performance.
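Here is a minimal sketch of that grid-based search, assuming scikit-learn's GridSearchCV; the random forest and the coarse parameter grid are hypothetical choices for illustration.

```python
# Grid-based algorithm tuning (sketch, assuming scikit-learn's GridSearchCV).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Each parameter is one dimension of the search hypercube; each list of
# candidate values is a set of points sampled along that axis.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=10,                # evaluate every configuration on the same test harness
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The best configurations found by a coarse grid like this can then seed a finer search around the most promising region of the hypercube.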
You can repeat this process with a number of well-performing algorithms and explore the best you can achieve with each. It is strongly recommended that the process be automated and reasonably coarse-grained, as you can quickly reach a point of diminishing returns (fractional-percentage performance gains) that may not translate to the production system.
The more tuned the parameters of an algorithm are, the more biased the algorithm will be toward the training data and test harness. This technique can be effective, but it can also produce more fragile models that overfit your test harness and do not perform as well in practice.
Ensembles
Ensemble methods are concerned with combining the results of multiple models in order to get improved results. Ensemble methods work well when you have multiple "good enough" models that specialize in different aspects of the problem.
This can be accomplished in several ways. Three ensemble strategies you can explore are listed below, with a code sketch after the list:
- Bagging: Known more formally as Bootstrap Aggregation, where the same algorithm gains differing perspectives on the problem by being trained on different subsets of the training data.
- Boosting: Models are trained in sequence on the same training data, with each new model concentrating on the examples that the earlier models got wrong.
- Blending: Known more formally as Stacked Generalization or Stacking, where a variety of models make predictions that are taken as inputs to a new model, which learns how to combine them into an overall prediction.
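Here is one way the three strategies can be set up, sketched with scikit-learn; the base models, meta-model and settings are illustrative assumptions rather than prescriptions.

```python
# Bagging, boosting and stacking (sketch, assuming scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

ensembles = {
    # Bagging: the same algorithm trained on bootstrapped subsets of the data.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42),
    # Boosting: models trained in sequence, each concentrating on earlier mistakes.
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=42),
    # Blending/stacking: a meta-model learns to combine base-model predictions.
    "blending": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(random_state=42)),
            ("lr", LogisticRegression(max_iter=2000)),
        ],
        final_estimator=LogisticRegression(max_iter=2000),
    ),
}

for name, model in ensembles.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print("%s: %.3f" % (name, scores.mean()))
```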
It is good practice to turn to ensemble methods only after you have exhausted more conventional strategies. There are two good reasons for this: ensembles are typically more complicated than conventional methods, and the conventional methods give you a good baseline from which you can build and develop your ensembles.
Extreme Feature Engineering
The previous two techniques looked at getting more from machine learning algorithms. This technique is about exposing more structure in the problem for the algorithms to learn from. In data preparation you learned about feature decomposition and aggregation as ways to better normalize the data for machine learning algorithms. In this technique, we push that idea to its limits. It is called extreme feature engineering, although the term "feature engineering" would really be adequate.
Think of your data as having complex multi-dimensional structures embedded in it that machine learning algorithms know how to find and exploit when making decisions. You want to expose those structures to the algorithms as clearly as possible so that they can perform at their best. One complication is that some of those structures may be too dense or too complex for the algorithms to identify without help. You may also have some knowledge of these structures from your domain expertise.
Take attributes and decompose them broadly into multiple features. Technically, what you are doing with this technique is breaking down dependencies and non-linear relationships into simpler, independent linear relationships.
This might be a foreign concept, so here are two examples, followed by a short code sketch:
- Categorical: You have a categorical attribute with the values [red, green, blue]. You could split that into three binary attributes, red, green and blue, and give each instance a 1 or 0 value for each.
- Real: You have a real-valued quantity with values ranging from 0 to 1000. You could create 10 binary attributes, each indicating a bin of values (0-99 for bin 1, 100-199 for bin 2, and so on), and assign each instance a binary value (1/0) for each bin.
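A minimal sketch of both decompositions, assuming pandas and scikit-learn; the column names and values are hypothetical.

```python
# Decomposing attributes into simpler binary features (sketch, assuming
# pandas and scikit-learn; the "colour" and "amount" columns are hypothetical).
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

df = pd.DataFrame({
    "colour": ["red", "green", "blue", "green"],
    "amount": [42.0, 310.0, 965.0, 120.0],  # real values in the 0-1000 range
})

# Categorical: split "colour" into three binary attributes (red, green, blue).
one_hot = pd.get_dummies(df["colour"], prefix="colour")

# Real: decompose "amount" into 10 equal-width bins, then encode bin
# membership so that each bin becomes a binary attribute.
binner = KBinsDiscretizer(n_bins=10, encode="onehot-dense", strategy="uniform")
bins = binner.fit_transform(df[["amount"]])
bin_cols = pd.DataFrame(bins, columns=[f"amount_bin_{i}" for i in range(10)])

engineered = pd.concat([one_hot, bin_cols], axis=1)
print(engineered)
```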
It is recommended that you carry out this process one step at a time, creating a new train/test dataset for each change you make and then evaluating algorithms on it. This will begin to give you an intuition for which attributes and features in the dataset expose more or less information to the algorithms, and for their impact on the performance measure. You can use these results to guide further extreme decompositions or aggregations.
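One way to structure that step-at-a-time evaluation, again sketched with scikit-learn on an illustrative dataset, is to score each feature-engineered variant of the data against the same model and harness.

```python
# Evaluating each feature-engineering step against the same test harness
# (sketch; the dataset and the single "binned" variant are illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler

X, y = load_breast_cancer(return_X_y=True)

variants = {
    "baseline": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # One decomposition step: bin every feature into binary indicators.
    "binned": make_pipeline(
        KBinsDiscretizer(n_bins=10, encode="onehot-dense", strategy="uniform"),
        LogisticRegression(max_iter=1000),
    ),
}

for name, pipeline in variants.items():
    scores = cross_val_score(pipeline, X, y, cv=10, scoring="accuracy")
    print("%s: %.3f" % (name, scores.mean()))
```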
Conclusion
In this blog post by AICorespot, you learned about three techniques for getting improved results from machine learning algorithms on your problem:
- Algorithm Tuning, where finding the best models is treated as a search problem through model parameter space.
- Ensembles, where the predictions made by multiple models are combined.
- Extreme Feature Engineering, where the attribute decomposition and aggregation seen in data preparation is pushed to its limits.
Resources
If you would like to delve deeper into this topic, the following resources may be helpful.
- Machine Learning for Hackers, Chapter 12: Model Comparison
- Data Mining: Practical Machine Learning Tools and Techniques, Chapter 7: Transformations: Engineering the Input and Output
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Chapter 16: Ensemble Learning