Selecting machine learning algorithms – Lessons from Microsoft Azure
Microsoft offers machine learning support on its Azure cloud computing platform.
Buried in the technical documentation for the platform are a few resources that you might find useful when thinking about which machine learning algorithm to use in different scenarios.
In this blog post, we take a look at Microsoft's recommendations for machine learning algorithms and the lessons we can apply when working through machine learning problems on any platform.
Machine Learning Algorithm Cheatsheet
Microsoft put out a PDF cheatsheet of which machine learning algorithms to use, and when:
The one-pager groups several problem types and lists the algorithms supported by Azure in each group.
These groups are:
- Regression: for predicting values
- Anomaly detection: for finding unusual data points
- Clustering: for discovering structure
- Two-class classification: for predicting two categories
- Multi-class classification: for predicting three or more categories
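To make these groups concrete, here is a minimal sketch mapping each one onto a standard open-source algorithm, using scikit-learn. The specific estimator choices are ours, not Azure's, and are purely illustrative:

```python
# Illustrative mapping from the cheatsheet's problem groups to
# standard scikit-learn estimators (our picks, not Azure's names).
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans

problem_groups = {
    "regression": LinearRegression(),           # predicting values
    "anomaly detection": IsolationForest(),     # finding unusual data points
    "clustering": KMeans(n_clusters=3),         # discovering structure
    "two-class classification": LogisticRegression(),
    "multi-class classification": LogisticRegression(),  # also handles >2 classes
}

for group, estimator in problem_groups.items():
    print(f"{group}: {estimator.__class__.__name__}")
```

Any of these could be swapped for dozens of alternatives, which is rather the point of the post that follows.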
The first issue with this approach is that the algorithm names appear to map onto the Azure API documentation and are not standard. A few familiar names crop up, but others are simply conventional algorithms given a new spin for simplicity (or, we suspect, to avoid some kind of naming conflict).
Alongside the algorithm names are a few words on why you might select a given algorithm. It's a nice idea, and given that it's a cheatsheet, it's brief and to the point.
How to select machine learning algorithms
The objective of the cheatsheet is to help you quickly choose an algorithm for your problem.
Does it? Probably not.
The reason is that you should never analytically choose a single algorithm for your problem up front. You should spot-check a number of algorithms and evaluate them against whatever requirements you have for the problem.
We believe the cheatsheet is best used to get an idea of which algorithms to throw into your spot check, viewed through the lens of your problem requirements.
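The spot-check idea can be sketched in a few lines. This is a minimal illustration using scikit-learn on synthetic data; the candidate algorithms and the accuracy metric are our own assumptions, not a recommendation from the cheatsheet:

```python
# A minimal spot check: evaluate several candidate algorithms on the
# same data with cross-validation, rather than picking one up front.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class problem standing in for your real data.
X, y = make_classification(n_samples=500, n_features=20, random_state=1)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=1),
    "knn": KNeighborsClassifier(),
    "forest": RandomForestClassifier(random_state=1),
}

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f}")
```

On your own problem you would swap in your data, your metric, and whatever candidates the cheatsheet (or your requirements) suggest.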
A sister post in the same Azure documentation, titled "How to select algorithms for Microsoft Azure Machine Learning", provides additional context that aligns with these ideas.
The post opens with the question "What machine learning algorithm should I use?" and gives the correct answer: "it depends". The authors comment:
"Even the most experienced data scientist can't tell which algorithm will perform best before trying them."
The worthwhile takeaway from this post is the set of considerations it offers for thinking about algorithm selection in the context of your requirements. These considerations are:
- Accuracy: whether getting the best possible score is the objective, or whether an approximate ("good enough") solution will do, trading off against overfitting.
- Training time: the amount of time available to train the model (and, we would guess, to validate and tune it as well).
- Linearity: an aspect of model complexity in terms of how the problem is modelled; linear models are contrasted with non-linear models, which are often harder to understand and tune.
- Number of parameters: another aspect of model complexity, affecting the time and expertise required for tuning, as well as sensitivity to parameter settings.
- Number of features: really the problem of having more features than instances, the p >> n problem. This typically requires specialized handling or specialized techniques.
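As a concrete sketch of the p >> n case (our own example, not from the Azure post), a regularized linear model is one standard way to cope with many more features than instances, because the L1 penalty drives most coefficients to zero:

```python
# Sketch of the p >> n problem: far more features than instances.
# An L1-regularized model is one standard coping strategy; the data
# and parameter choices here are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 50 instances, 1000 features: p >> n
X, y = make_classification(n_samples=50, n_features=1000,
                           n_informative=10, random_state=1)

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(X, y)

# The L1 penalty zeroes out most coefficients, effectively
# selecting a small subset of the 1000 features.
n_used = int(np.sum(model.coef_ != 0))
print(f"features used: {n_used} of {X.shape[1]}")
```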
The post also provides a handy table mapping the algorithms supported by Azure onto some of the considerations described above.
We suspect such a table is very expensive to develop (it requires a specialist), does not scale to the hundreds of machine learning algorithms available, and would need constant updating as new and more powerful algorithms are developed and released.
How Do We Select Algorithms Efficiently?
Typically, the objective of predictive modelling is to develop the most accurate models possible given reasonable time and resources.
Concerns about algorithm complexity, in terms of the linearity of the model and the number of parameters, typically only matter if the model is intended for descriptive purposes rather than for actually making predictions.
With a well-developed test harness for your problem, the choice of which algorithm to use and which parameter values to set becomes a combinatorial problem for the computer to work out, not the data scientist. In fact, much like intuition in A/B testing, hand-picking an algorithm is biased and likely hurting performance.
This is the spot-checking approach to machine learning, and it is only feasible because of the large number of algorithms that have already been implemented, because of powerful systematic evaluation techniques (like cross-validation), and because computation is cheap and plentiful.
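Treating algorithm and parameter choice as one combinatorial search can be sketched with scikit-learn's GridSearchCV over a pipeline, where the model itself is one of the searched parameters. The grid and data below are illustrative assumptions:

```python
# Let the computer search over both the algorithm and its parameters,
# as described above. The grid and data here are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# The "model" step is itself a searchable parameter of the pipeline.
pipe = Pipeline([("model", LogisticRegression(max_iter=1000))])
grid = [
    {"model": [LogisticRegression(max_iter=1000)],
     "model__C": [0.1, 1.0, 10.0]},
    {"model": [RandomForestClassifier(random_state=1)],
     "model__n_estimators": [50, 100]},
]

search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)
print(f"best accuracy: {search.best_score_:.3f}")
```

The data scientist's job shifts from picking the winner to defining the search space and the evaluation harness.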
What is the probability that your favourite machine learning algorithm will perform well on a problem you have not worked on before? Not very good.
The point we are trying to make is that we can study machine learning algorithms and get a feel for how they work and what they are suited to, but we would argue that this level of choice comes later. It comes when you are trying to select between three or four high-performing models. It comes when you have good results and need to dig deeper to get better ones.
Check out the cheatsheet, jot down some ideas, and think about how you can use them in your own process.