The lift chart is synonymous with evaluating data mining model performance and the predictive power of one model against another. Often, in presentations and training sessions it is suggested that the chart is indicative of the models ability to accurately predict within a training population. For example, the following explanation is provided;

*“the lift chart shows that this model is good because it only needs to evaluate 30% of data in order to correctly predict all the target outcomes”
*

This type of statement is simply not true – it is INCORRECT, WRONG, MISLEADING and shows a lack of understanding about what the chart represents. This post looks at explaining the chart by examining how it is created – seeking to remove some of the misconceptions about the use of the chart.

Consider the following example. In this chart it would be argued that an ideal model (red line) would only need ~ 55% of the population in order to predict all the target states. Without a model, we would need to use the entire population and so our model (being somewhat useful) lies between the best model and a random guess. These values can be determined by the intercepts each model with the X axis (note that at 55% of the population, the best model achieves 100% accuracy).

Another common question arising from the interpretation of this chart occurs when we know that the target (predicted) state is found in only 55% of the population. The question is “why do we show 100% accuracy when only 55% of the population can exist in the predicted state and therefore the Y axis should have a maximum of 55%”.

For my own analysis, I shall ask a question of the reader so that the construction of the chart can better be understood. The question is simple.

**If my model does not predict the correct result 100% of the time how could my accuracy ever achieve 100%? Let’s be realistic, it would have to be a pretty impressive model to never be wrong – and this is what the chart always shows à 100% accuracy!
**

**Now let’s look at construction
**

In order to create a lift chart (also referred to as an accumulative gain) the data mining model needs to be able to predict the probability of its prediction. For example, we predict a state and the probability of that state. Within SSAS, this is achieved with the PredictProbability

function.

Now, since we can include the probability of our predictions, we would naturally order training data by the predicted probability in suggesting the likelihood of a test case being the predicted target state. Or perhaps put another way, if I only had 10 choices for choosing which test case (that is a an observation from the testing data) would be the predicted value, I would choose the top 10 testing cases based on their predicted probability – after all the model is suggesting that these cases have the highest probability of being the predicted state.

As we move through the testing data (and the predicted probability decreases), it is natural to expect the model to become less accurate – will make more false predictions. So let’s summarise this (training) data. For convenience, I have group my training data into 10 bins and each bin has ~ 320 cases (the red line below). Working with the assumption that the predictive power of my model decreases with probability, the number of predictions also decreases as we move through more of the training data. This is clearly visible in the chart and data below – the first bin has a high predictive power (275 correct predictions) while the last bin has only 96 correct predictions.

If I focus on the models ability to correctly predict values, I will notice that it can predict 1,764 correct results – but now let’s turn our look to the accumulative power of the model. If, from the set of my sample data I could only choose 322 cases (coincidently this is the number of cases in bin 1), I would choose all cases from Bin 1 and get 275 correct (or 16% of the possible correct values). If I had to choose 645 cases, I would choose the cases from bin 1 & 2 and get 531 correct (30% of correct possibilities). This continues with the more predictions that I make and is summarised in the following table.

This data is transposed onto the lift chart – the Bin on the X axis (representing the % of population) and the Percent Running Correct on the Y axis (representing the number of correct predictions). As we can see, the data is indicative of the models ability to quickly make accurate predictions rather than its overall predictive ability.

**Best and Random Cases
**

The chart also includes best and random cases as guidelines for prediction – let’s focus on these. These lines are theoretical – really ‘what if’ type of scenarios.

Suppose that I had an ideal model. If this was the case my model would predict 322 in bin 1, 323 in bin 2 and so on – it must because we have ordered the data by PredictProbability and in a perfect world we would get them all correct! However, the model can only predict 1,764 correct values -we know this from the actual results. Because of this we would only need up to bin 6 to get all our correct values (see column ‘Running Correct Best Case’ in the following table. Just as we did for the model prediction we can convert this to a percent of total correct (the population) and chart it with the model.

Now for the random guess – again this is theoretical. I know that I can only predict 1,764 correct values so, if these were evenly distributed amongst my bins, I would have ~176 correct predictions in each bin. This is then added to the chart.

**What is the Problem?**

Now we can see that the chart is essentially just a view of how quickly the model makes accurate predictions. Perhaps there is nothing wrong with that but what happens when we compare models? Well, in this case, the comparison is relative. Those steps are reproduced for each chart and what you essentially see is relative comparative performance. Thus, the comparison of two models in the charts gives NO indication of performance accuracy – after all how could they since they each plot relative percent accuracy for their own states.

For this reason, relying on this chart as a sole measure of accuracy is just dangerous and really shows very little about the total accuracy of the model.

**Conclusion
**

Understanding how the lift chart has been constructed can help in understanding how to interpret it. We can see that it indicates the accumulative power of the model to give predictions – or perhaps more correctly the accumulative power of the model to give **its **correct prediction.

If you really want to compare the accuracy of models why not compare # of true and false predictions?

I did not understand your statement saying that you can not use a lift chart to compare different models. If the target is exactly the same for your different models (say RF, Boosting, etc.), couldn’t you use a lift chart to compare them (and even reduce your lift chart to a single metric such as AUC)?

Hi Tanguy,

Thi issue that i have with relying on the lift for validation is that there’s no magnitude to it so, you need a confusion matrix to effectively validate the output. Why choose a model that better predicts its results (which is what the lift chart is showing) but is not right very often? I’d much rather go with a model that predicts more accuratley.