The Lost Art of Decile Analysis
Classification is one of the primary and most widely used applications of machine learning algorithms. However, unless we carefully analyze the subtleties in the output of even a seemingly straightforward binary classifier, the deeper meaning of its predictions may be obscured.
Binary Logistic Regression is used as a classification algorithm when the response variable is dichotomous (churned/not churned, pass/fail, spam/not spam, etc.).
Typically, we turn logistic regression into a classification algorithm by choosing a suitable probability cut-off, or threshold.
The problem with classifying using a threshold value
Choosing the probability threshold is essentially a business decision, not a statistical one.
Frank Harrell, on his website, makes the point “classification is a forced choice.”
Now, consider this example. You choose a threshold value of 0.5. The ML algorithm outputs the probability of default (1 = default, 0 = no default) for four customers as 0.51, 0.49, 0.23, and 0.92. Based on this threshold, 2 customers are classified as ‘default’ and 2 as ‘no default’. But ask yourself this: aren't the customers with probabilities of 0.51 and 0.49 too close to call? 0.51 is certainly closer to 0.49 (classified as no default) than it is to 0.92 (classified as default).
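A minimal Python sketch of the example above, assuming the 0.5 cut-off is applied as `p >= threshold` (the text does not say which side a probability of exactly 0.5 falls on). The names `probs`, `threshold`, and `labels` are illustrative only.

```python
# Predicted probabilities of default for the four customers (1 = default, 0 = no default).
probs = [0.51, 0.49, 0.23, 0.92]
threshold = 0.5  # chosen by the business, not by the model

# Hard classification: everything at or above the cut-off becomes "default".
labels = [1 if p >= threshold else 0 for p in probs]
print(labels)  # → [1, 0, 0, 1]

# The hard labels hide how close the first two customers are to each other.
gap_same_label = round(0.92 - 0.51, 2)   # both labelled "default"
gap_diff_label = round(0.51 - 0.49, 2)   # labelled differently
print(gap_same_label, gap_diff_label)    # → 0.41 0.02
```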
Some machine learning packages and low-code tools do not show the predicted probabilities to the user explicitly. The user is therefore unaware of the predicted probabilities and only gets the decision: default or no default (1 or 0). With probabilities of 0.49 and 0.51, the user would confidently conclude that the first customer will not default and the second will. But a look at the predicted probabilities reveals that it was too close to call!
The other problem with thresholds is that when we use an improper scoring rule such as classification accuracy, the metric can be gamed. For instance, suppose that out of 100 people, 95 default on a loan and 5 pay on time. A classifier that predicts that everybody will default would achieve an accuracy of 95%!
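To make the accuracy trap concrete, here is a small Python sketch of the 95/5 loan example. The "classifier" is a hypothetical constant predictor that ignores its inputs entirely, yet it still scores 95% accuracy.

```python
# 100 borrowers from the example: 95 defaulters (1) and 5 who repay on time (0).
y_true = [1] * 95 + [0] * 5

# A hypothetical "classifier" that predicts default for everyone.
y_pred = [1] * len(y_true)

# Accuracy = fraction of predictions that match the truth.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # → 0.95

# Yet it never identifies a single borrower who repays.
repayers_found = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
print(repayers_found)  # → 0
```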
Is there a better way to use Logistic Regression?
Domains like Finance and Marketing use Logistic Regression in a more relevant manner for credit risk modelling and marketing campaign targeting, respectively.
A real use case
Let’s assume you are a CMO responsible for the sales and marketing of a particular product in your enterprise. You intend to launch a marketing campaign to increase sales of that product, and you have been given a fixed budget for it. You want the highest possible return on investment: spend the fixed budget, or even less, and obtain the maximum achievable sales. Here is what you have:
You have data for 10,000 clients who either did or did not buy a similar offering in the past.
You want to understand which clients should be targeted to increase the likelihood of a purchase this time around.
Because the campaign budget is fixed, you want to target the clients who are most likely to purchase the product. How do you go about this?
The solution is Decile Analysis.
So, what is a decile analysis?
Decile analysis was once a very popular technique. However, the convention of teaching machine learning problems by bucketing them into either ‘classification’ or ‘regression’ led people to forget about analyses such as decile analysis.
Many veteran data scientists will remember it by the name ‘Gains Chart’. Decile analysis is used to rank a dataset from the highest to the lowest values, or the other way round (based on predicted probabilities).
As the name suggests, the analysis divides the dataset into ten equal-sized groups. Every group should contain the same number of observations/clients.
It ranks clients in order from most likely to respond to least likely to respond.
The following are the steps:
- Build a Logistic Regression model. In this scenario, the dependent variable is whether the client purchased the product (1 = purchased, 0 = not purchased), and appropriate independent variables are chosen.
- Obtain the predicted probabilities from the Logistic Regression model and sort them in descending order.
- Divide the entire dataset into 10 groups. Every group should contain an equal number of observations; with 10,000 records, every group would contain 1,000 records/clients.
- Compute the percentage of responders (share of all responders) for every decile.
- Compute the response rate (share of that decile's clients who responded) for every decile.
- Compute the lift for every decile.
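The steps above can be sketched in Python with pandas and NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: `decile_table` is a hypothetical helper, step 1 (fitting the model) is assumed to be done already (for instance with scikit-learn's `LogisticRegression.predict_proba`), and the lift column is the cumulative lift used later in this article. The data at the bottom are simulated, not the campaign data behind Table 1.

```python
import numpy as np
import pandas as pd


def decile_table(y_true, y_prob, n_groups=10):
    """Hypothetical helper: build a decile (gains) table.

    y_true: observed outcome, 1 = purchased/responded, 0 = did not.
    y_prob: predicted probability of responding from an already-fitted model.
    """
    df = pd.DataFrame({"y": np.asarray(y_true), "p": np.asarray(y_prob, dtype=float)})

    # Step 2: sort clients by predicted probability, highest first (stable sort).
    df = df.sort_values("p", ascending=False, kind="mergesort").reset_index(drop=True)

    # Step 3: cut the sorted list into n_groups equal-sized groups (group 1 = top).
    df["decile"] = np.arange(len(df)) * n_groups // len(df) + 1

    table = df.groupby("decile").agg(
        clients=("y", "size"),
        responders=("y", "sum"),
        p_min=("p", "min"),
        p_max=("p", "max"),
    )

    # Step 4: share of all responders in each decile, plus the running total (gain).
    table["pct_responders"] = table["responders"] / table["responders"].sum()
    table["cum_pct_responders"] = table["pct_responders"].cumsum()

    # Step 5: response rate within each decile.
    table["response_rate"] = table["responders"] / table["clients"]

    # Step 6: cumulative lift = cumulative % of responders / cumulative % of clients.
    cum_pct_clients = table["clients"].cumsum() / table["clients"].sum()
    table["lift"] = table["cum_pct_responders"] / cum_pct_clients
    return table


# Synthetic example only: 10,000 simulated clients with made-up probabilities.
rng = np.random.default_rng(0)
score = rng.normal(size=10_000)
prob = 1 / (1 + np.exp(-(score - 2.5)))  # stand-in for model output
bought = rng.binomial(1, prob)           # simulated purchases
print(decile_table(bought, prob).round(3))
```

Splitting by position after sorting, rather than by fixed probability cut-offs, is what guarantees equal-sized groups; ties in the probabilities are broken by the stable sort order.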
The top decile contains the clients who are most likely to respond, followed by decile 2, which contains the clients who are next most likely to respond, and so on.
One major benefit of decile analysis is that the probabilities and probability range act as their own error measures. For example, if the top decile has the probability range 0.75-0.81, then the probability that a client in it does not purchase the product or service, i.e. the error, lies between 1 - 0.81 = 0.19 and 1 - 0.75 = 0.25 (both endpoints included).
Table 1 below shows a typical decile analysis output.
As mentioned earlier, every decile contains an equal number of clients (1,000 in every decile).
% of responders in a decile = (no. of responders in that decile) / (total no. of responders across all 10 deciles)
From Table 1:
- Percentage of responders for Decile 1 = 224/984 = 22.8%
- 984 is the total number of responders across all 10 deciles
- Likewise, percentage of responders for Decile 2 = 162/984 = 16.5%
- And the cumulative percentage of responders for the top 2 deciles = (224 + 162)/984 = 39.2%
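The percentages above can be checked with a few lines of Python; the only inputs are the counts quoted from Table 1, and the variable names are illustrative.

```python
# Counts quoted from Table 1.
total_responders = 984          # responders across all 10 deciles
responders = {1: 224, 2: 162}   # responders in deciles 1 and 2

# Share of all responders that falls in each decile.
pct = {d: n / total_responders for d, n in responders.items()}
# Cumulative share for the top two deciles, computed from the raw counts.
cum_top2 = (responders[1] + responders[2]) / total_responders

print(f"Decile 1: {pct[1]:.1%}")         # → Decile 1: 22.8%
print(f"Decile 2: {pct[2]:.1%}")         # → Decile 2: 16.5%
print(f"Top 2 deciles: {cum_top2:.1%}")  # → Top 2 deciles: 39.2%
```

Computing the cumulative share from the raw counts (386/984) rather than adding the rounded percentages (22.8% + 16.5% = 39.3%) is why the figure comes out as 39.2%.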
Gains and Gain Chart
From Table 1, Decile 1 consists of the top 10% of clients, those most likely to purchase. Decile 1 has the largest number of responders (224).
Therefore, 22.8% of all responders fall within Decile 1: targeting just 10% of the client base captures 22.8% of the responses.
Likewise, targeting one-fifth of the client base (Deciles 1 and 2) captures 39.2% of the responses.
The Gain Chart below illustrates this:
A Gain Chart can be used to evaluate what cumulative percentage of all responders is captured as each successive decile is targeted. Therefore, instead of targeting clients from the lower deciles, clients can be selected from the top deciles only.
The baseline shows the percentage of responders we would capture if we targeted clients at random, with no model: targeting 10% of the clients would capture about 10% of the responders, 20% would capture about 20%, and so on.
Response Rate
The Response Rate tells us what percentage of the clients in each decile responded. The response rate is highest in Decile 1, followed by Decile 2, and so on.
Response rate for a decile = (number of responders in that decile) / (number of clients in that decile)
From Table 1:
Response rate for Decile 1 = 224/1000 = 22.4%.
The takeaway from comparing response rates: the chart below shows the response rate of every decile. The mean response rate across all deciles is 9.8% (984/10,000). Therefore, clients in Deciles 1 to 4, whose response rates are above the mean, should be targeted for the campaign.
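A short Python check of the response-rate figures, using only the counts quoted in the text. `deciles_above_mean` is a hypothetical helper that applies the "above the mean response rate" rule to a list of per-decile responder counts; because all deciles have the same size, the mean of the ten decile rates equals the overall rate (984/10,000).

```python
# Figures from Table 1: 10 deciles of 1,000 clients, 984 responders in total.
clients_per_decile = 1000
total_responders = 984

rate_decile_1 = 224 / clients_per_decile
# Equal-sized deciles, so the overall rate is also the mean of the 10 decile rates.
mean_rate = total_responders / (10 * clients_per_decile)
print(f"Decile 1 response rate: {rate_decile_1:.1%}")  # → Decile 1 response rate: 22.4%
print(f"Mean response rate: {mean_rate:.2%}")          # → Mean response rate: 9.84%


def deciles_above_mean(responders_per_decile, clients_per_decile):
    """Hypothetical helper: 1-based deciles whose response rate beats the overall mean."""
    n_deciles = len(responders_per_decile)
    overall = sum(responders_per_decile) / (n_deciles * clients_per_decile)
    return [
        d
        for d, n in enumerate(responders_per_decile, start=1)
        if n / clients_per_decile > overall
    ]
```

The text quotes responder counts only for Deciles 1 and 2, so the full Table 1 counts are not reproduced here; per the text, passing all ten of them to `deciles_above_mean` should select Deciles 1 to 4.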
Lift and Lift Curve
Lift at a decile = cumulative percentage of responders up to that decile / cumulative percentage of clients targeted up to that decile.
From Table 1:
- Lift for Decile 1 = 22.8%/10% = 2.28
- Lift for Decile 2 = 39.2%/20% = 1.96
Interpretation: If we target the top two deciles, we target one-fifth (20%) of the clients. Those deciles contain 39.2% of all responders. Therefore, the lift is 39.2%/20% = 1.96.
A lift of 1 indicates no gain compared with targeting the same number of clients at random. A lift greater than 1 means the model-based strategy is better than choosing clients at random.
The takeaway from the lift chart: it can be used to identify the deciles with a higher lift.
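The lift figures can be reproduced in Python. `cumulative_lift` is a hypothetical helper that divides the cumulative share of responders by the cumulative share of clients targeted, matching the definition above; computing from raw counts gives about 2.276 and 1.961, which round to the 2.28 and 1.96 in the text.

```python
total_responders = 984  # from Table 1


def cumulative_lift(cum_responders, cum_share_of_clients, total_responders):
    """Hypothetical helper: cumulative % of responders / cumulative % of clients targeted."""
    return (cum_responders / total_responders) / cum_share_of_clients


lift_1 = cumulative_lift(224, 0.10, total_responders)        # top decile
lift_2 = cumulative_lift(224 + 162, 0.20, total_responders)  # top two deciles
print(f"{lift_1:.2f}")  # → 2.28
print(f"{lift_2:.2f}")  # → 1.96

# Targeting everyone captures every responder: lift of exactly 1 (no gain over random).
print(cumulative_lift(total_responders, 1.0, total_responders))  # → 1.0
```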
How to use Decile Analysis in business decision making
Now that we have built the decile analysis, the next important question is how to use it to make effective business decisions:
Let us go back to our Decile Analysis table:
Using the results above as a guide, we decide to target clients in the top 4 deciles, as they have higher odds of purchasing the product.
From a business perspective, the return on investment from targeting the top 4 deciles is higher. As we move down the deciles, the return on investment falls quickly, and these clients are not worth pursuing.
Important note: In this blog article, we concentrate on decile analysis; however, the analysis can be generalized further by using intervals smaller than 10%, for instance 5% or 1%. For simplicity of explanation, we restrict ourselves to 10% steps (decile analysis), but the generalization to smaller steps is straightforward.
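Moving from deciles to finer groups only changes the number of bins. A minimal pandas sketch, assuming the predicted probabilities are held in a Series: `assign_groups` is a hypothetical helper, and the probabilities are random numbers standing in for model output. Ranking first (with `method="first"` to break ties) keeps `pd.qcut` from failing on duplicate bin edges.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for predicted probabilities of 10,000 clients (illustration only).
rng = np.random.default_rng(42)
probs = pd.Series(rng.uniform(size=10_000))


def assign_groups(probs, n_groups):
    """Hypothetical helper: group 1 = highest probabilities, n_groups = lowest."""
    # Rank 1 = highest probability; "first" breaks ties so every rank is unique.
    ranks = probs.rank(method="first", ascending=False)
    labels = list(range(1, n_groups + 1))
    return pd.qcut(ranks, q=n_groups, labels=labels).astype(int)


deciles = assign_groups(probs, 10)       # 10% steps: 1,000 clients per group
ventiles = assign_groups(probs, 20)      # 5% steps: 500 clients per group
percentiles = assign_groups(probs, 100)  # 1% steps: 100 clients per group
print(deciles.value_counts().sort_index().head(3))
```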
This brings us to the conclusion of this piece on decile analysis.