Applied Machine Learning as a Search Problem
Applied machine learning is challenging because designing a perfect learning system for a given problem is intractable. There is no best training data or best algorithm for your problem, only the best that you can discover. The application of machine learning is best thought of as a search problem: a search for the best mapping of inputs to outputs, given the knowledge and resources available to you for a given project.
In this blog post by AICoreSpot, you will discover the conceptualization of applied machine learning as a search problem.
After reading this post, you will know:
- That applied machine learning is the problem of approximating an unknown underlying mapping function from inputs to outputs.
- That design decisions, such as the choice of data and the choice of algorithm, narrow the scope of possible mapping functions that you may ultimately choose.
- That the conceptualization of machine learning as a search helps to rationalize the use of ensembles, the spot checking of algorithms, and the understanding of what is happening when algorithms learn.
This blog post is divided into five parts:
- Problem of Function Approximation
- Function Approximation as Search
- Choice of Data
- Choice of Algorithm
- Implications of Machine Learning as Search
Problem of Function Approximation
Applied machine learning is the development of a learning system to address a specific learning problem.
The learning problem is characterized by observations comprised of input data and output data, and some unknown but coherent relationship between the two.
The goal of the learning system is to learn a generalized mapping between input and output data such that skilful predictions can be made for new instances drawn from the domain where the output variable is unknown.
In statistical learning, a statistical perspective on machine learning, the problem is framed as the learning of a mapping function (f) given input data (X) and associated output data (y).
y = f(X)
We have a sample of X and y and do our best to come up with a function that approximates f, for example fprime, such that we can make predictions (yhat) given new examples (Xhat) in the future.
yhat = fprime(Xhat)
As such, applied machine learning can be thought of as the problem of function approximation.
The learned mapping will be imperfect. The problem of developing a learning system is the problem of learning a useful approximation of the unknown underlying function that maps the input variables to the output variables.
We do not know the form of the function, because if we did, we would not need a learning system; we could specify the solution directly.
Because we do not know the true underlying function, we must approximate it, meaning we do not know and may never know how close an approximation the learning system is to the true mapping.
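The notation above can be made concrete with a minimal sketch. It assumes a made-up underlying function f (a straight line plus noise) purely so that observations can be simulated; in a real project f is never visible. The names slope, intercept and the sample sizes are illustrative choices, and least squares is just one of many ways fprime could be obtained.

```python
import random

random.seed(0)

# Hypothetical "unknown" underlying function f. In practice we never see it;
# it is written out here only so that observations can be simulated.
def f(x):
    return 3.0 * x + 2.0

# A sample of inputs X and noisy outputs y drawn from the domain.
X = [random.uniform(0.0, 10.0) for _ in range(50)]
y = [f(x) + random.gauss(0.0, 1.0) for x in X]

# Fit fprime as a straight line using ordinary least squares.
mean_x = sum(X) / len(X)
mean_y = sum(y) / len(y)
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(X, y))
         / sum((a - mean_x) ** 2 for a in X))
intercept = mean_y - slope * mean_x

def fprime(x):
    return slope * x + intercept

# Make predictions (yhat) for new inputs (Xhat) whose outputs are not known.
Xhat = [1.5, 4.0, 8.25]
yhat = [fprime(x) for x in Xhat]
print(yhat)
```

Because the sample is noisy and finite, slope and intercept only come close to the values hidden inside f, which is the sense in which the learned mapping is an approximation.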
Function Approximation as Search
We must search for an approximation of the true underlying function that is good enough for our purposes.
There are many sources of noise that introduce error into the learning process, making it more challenging and, in turn, resulting in a less useful mapping. For example:
- The choice of the framing of the learning problem.
- The choice of the observations used to train the system.
- The choice of how the training data is prepared.
- The choice of the representational form for the predictive model.
- The choice of the learning algorithm to fit the model on the training data.
- The choice of the performance measure by which to assess predictive skill.
And many more.
We can see that there are many decision points in the development of a learning system, and none of the answers are known beforehand.
You can think of all possible learning systems for a learning problem as a huge search space, where each decision point narrows the search.
For example, if the learning problem was to predict the species of flowers, one of millions of possible learning systems could be narrowed down as follows:
- Choose to frame the problem as predicting a species class label, i.e. classification.
- Choose measurements of the flowers of a given species and its related subspecies.
- Choose flowers in one specific nursery to measure in order to collect training data.
- Choose a decision tree model representation so that predictions can be explained to stakeholders.
- Choose the CART algorithm to fit the decision tree model.
- Choose classification accuracy to evaluate the skill of models.
And so on.
You can also see that there may be a natural hierarchy for many of the decisions involved in developing a learning system, each of which further narrows the space of possible learning systems that we could build.
This narrowing introduces a useful bias that intentionally selects one subset of possible learning systems over another, with the goal of getting closer to a useful mapping that we can use in practice. This biasing applies both at the top level, in the framing of the problem, and at low levels, such as the choice of machine learning algorithm or algorithm configuration.
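The idea of decision points narrowing a search space can be sketched by enumerating combinations of choices. The option lists below are hypothetical and far smaller than in any real project, and not every combination they generate would be sensible in practice; the point is only to show how committing to a choice shrinks the set of candidate learning systems.

```python
from itertools import product

# Hypothetical options at each decision point; a real project has many more.
decisions = {
    "framing": ["classification", "regression"],
    "data": ["one nursery", "several nurseries"],
    "representation": ["decision tree", "linear model", "neural network"],
    "algorithm": ["CART", "gradient descent", "genetic search"],
    "metric": ["accuracy", "log loss"],
}

def enumerate_systems(options):
    """Yield every combination of choices as a dict (one candidate system)."""
    names = list(options)
    for combo in product(*(options[name] for name in names)):
        yield dict(zip(names, combo))

all_systems = list(enumerate_systems(decisions))
print(len(all_systems))  # 2 * 2 * 3 * 3 * 2 = 72

# Committing to a choice at a decision point narrows the search.
narrowed = dict(decisions,
                framing=["classification"],
                representation=["decision tree"])
remaining = list(enumerate_systems(narrowed))
print(len(remaining))  # 1 * 2 * 1 * 3 * 2 = 12
```

Fixing the framing and the representation removes five sixths of the candidates here, which is the kind of intentional bias described above.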
Choice of Data
The chosen framing of the learning problem and the data used to train the system are a big point of leverage in the development of your learning system.
You do not have access to all data, that is, all pairs of inputs and outputs. If you did, you would not need a predictive model in order to make output predictions for new input observations.
You do have some historical input-output pairs. If you did not, you would not have any data with which to train a predictive model.
But maybe you have a lot of data and need to select only some of it for training. Or maybe you have the freedom to generate data at will and are challenged by what and how much data to generate or collect.
The data that you choose to model your learning system on must sufficiently capture the relationship between the input and output data, both for the data that you have available and for the data that the model will be expected to make predictions on in the future.
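A small sketch can illustrate why the training data needs to cover the inputs the model will face later. It assumes a made-up underlying relationship (y = x squared) and a deliberately simple straight-line model; the ranges 0 to 1 and 0 to 3 are arbitrary illustrative choices.

```python
# Hypothetical underlying relationship, used only to simulate data.
def f(x):
    return x * x

def fit_line(X, y):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(X, y))
             / sum((a - mean_x) ** 2 for a in X))
    return slope, mean_y - slope * mean_x

def mse(model, X, y):
    slope, intercept = model
    return sum((slope * a + intercept - b) ** 2 for a, b in zip(X, y)) / len(X)

# Inputs the model will be asked about in the future span 0 to 3.
future_X = [i * 0.1 for i in range(31)]
future_y = [f(x) for x in future_X]

# One training sample covers only 0 to 1; the other covers the full 0 to 3.
narrow_X = [i * 0.05 for i in range(21)]
wide_X = [i * 0.15 for i in range(21)]

narrow_model = fit_line(narrow_X, [f(x) for x in narrow_X])
wide_model = fit_line(wide_X, [f(x) for x in wide_X])

narrow_error = mse(narrow_model, future_X, future_y)
wide_error = mse(wide_model, future_X, future_y)
print(narrow_error, wide_error)
```

Both samples have the same size and the same model is fit to each, yet the sample drawn from the narrow range yields a mapping that is much worse on the future inputs.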
Choice of Algorithm
You must choose the representation of the model and the algorithm used to fit the model on the training data. This, again, is another big point of leverage in the development of your learning system.
Often this decision is simplified to the selection of an algorithm. It is nonetheless common for project stakeholders to impose constraints on the project, such as the model being able to explain its predictions, which in turn constrains the form of the final model representation and, in turn, the scope of mappings that you can search.
Implications of Machine Learning as Search
This conceptualization of developing learning systems as a search problem helps to make clear many related concerns in applied machine learning.
This section looks at a few of these:
Algorithms That Learn Iteratively
The algorithm used to learn the mapping will impose further constraints, and it, along with the chosen algorithm configuration, will control how the space of possible candidate mappings is navigated as the model is fit (e.g. for machine learning algorithms that learn iteratively).
Here, we can see that the act of learning from training data by a machine learning algorithm is in effect navigating the space of possible mappings for the learning system, hopefully moving from poor mappings to better mappings (e.g. hill climbing).
This provides a conceptual rationale for the role of optimization algorithms at the heart of machine learning algorithms: to get the most out of the model representation for the specific training data.
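Learning as navigation of the space of candidate mappings can be sketched with a simple stochastic hill climber. This is an illustration only: the toy data, the straight-line representation, the step size of 0.1 and the 5,000 iterations are all assumptions, and practical algorithms typically use more efficient optimizers.

```python
import random

random.seed(1)

# Toy training data from a hypothetical relationship y = 2x + 1 (no noise).
X = [float(i) for i in range(10)]
y = [2.0 * x + 1.0 for x in X]

def mse(params):
    """Training error of the candidate mapping y = slope * x + intercept."""
    slope, intercept = params
    return sum((slope * a + intercept - b) ** 2 for a, b in zip(X, y)) / len(X)

# Start from a poor candidate mapping.
current = (0.0, 0.0)
current_error = mse(current)
start_error = current_error

# Take small random steps, keeping a step only when it lowers the error.
for _ in range(5000):
    candidate = (current[0] + random.gauss(0.0, 0.1),
                 current[1] + random.gauss(0.0, 0.1))
    candidate_error = mse(candidate)
    if candidate_error < current_error:
        current, current_error = candidate, candidate_error

print(start_error, current_error, current)
```

Each accepted step moves from one candidate mapping to a better one, so the sequence of values taken by current traces a path through the space of straight-line mappings.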
Rationale for Ensembles
We can also see that different model representations will occupy quite different locations in the space of all possible function mappings, and in turn have quite different behaviour when making predictions (e.g. uncorrelated prediction errors).
This provides a conceptual rationale for the role of ensemble methods that combine the predictions from different but skilful predictive models.
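The benefit of combining models with uncorrelated errors can be shown with a small simulation. No real models are trained here: each "model" is simulated as the true value plus its own independent random noise, which is an idealized assumption, and the number of models and noise level are arbitrary.

```python
import random

random.seed(2)

# Hypothetical true outputs for 1,000 examples.
truth = [random.uniform(0.0, 10.0) for _ in range(1000)]

# Simulate five skilful models whose errors are independent of one another:
# each prediction is the truth plus that model's own random noise.
n_models = 5
predictions = [[t + random.gauss(0.0, 1.0) for t in truth]
               for _ in range(n_models)]

def mse(pred):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

individual_errors = [mse(p) for p in predictions]

# The ensemble averages the member predictions for each example.
ensemble = [sum(column) / n_models for column in zip(*predictions)]
ensemble_error = mse(ensemble)

print(individual_errors, ensemble_error)
```

Because the simulated errors are independent, they partly cancel when averaged. With correlated errors the gain would be smaller, which is why ensembles favour members that differ from one another.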
Rationale for Spot Checking
Different algorithms with different representations may start in different positions in the space of possible function mappings, and will navigate the space differently.
If the constrained space that these algorithms are navigating is well specified by an appropriate framing and good data, then most algorithms will likely discover good and similar mapping functions.
We can also see how a good framing and careful selection of training data can open up a space of candidate mappings that may be found by a suite of modern, powerful machine learning algorithms.
This provides a rationale for spot checking a suite of algorithms on a given machine learning problem and doubling down on the one that shows the most promise, or selecting the most parsimonious solution.
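Spot checking can be sketched as fitting several candidate algorithms on the same training data and comparing them on held-out data. The dataset, the three hand-written candidates and the single train/test split below are all illustrative assumptions; a real project would more likely use library implementations and a more robust evaluation such as cross-validation.

```python
import random

random.seed(3)

# Toy dataset from a hypothetical relationship: y = 2x + 1 plus noise.
def make_row():
    x = random.uniform(0.0, 10.0)
    return x, 2.0 * x + 1.0 + random.gauss(0.0, 0.5)

data = [make_row() for _ in range(400)]
train, test = data[:300], data[300:]

# Each fit function takes training rows and returns a prediction function.
def fit_mean(rows):
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y

def fit_line(rows):
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest_neighbour(rows):
    # Predict the output of the closest training input.
    return lambda x: min(rows, key=lambda row: abs(row[0] - x))[1]

candidates = {
    "mean baseline": fit_mean,
    "linear regression": fit_line,
    "1-nearest neighbour": fit_nearest_neighbour,
}

def mean_squared_error(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

# Fit every candidate on the same training data; score on held-out data.
scores = {name: mean_squared_error(fit(train), test)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3f}")
print("most promising:", best)
```

The mean baseline gives a reference point: any candidate that cannot beat it has not learned a useful mapping, and the candidates that do beat it are the ones worth further effort.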