Metrics applied in propensity models



Today, we live in a world where precise decisions can help companies increase their sales dramatically or run more focused marketing campaigns. For example, a company may want to win back customers who no longer buy, or who are about to stop buying; or it may need to know which customers to contact in order to close a sale. In these cases, it is important to know which customers are most likely to respond positively to the company's efforts.

A propensity model is a tool that can provide valuable information about the customer behaviors a company cares about most. It estimates the probability that someone will perform a certain action. Below are some examples of where propensity models are used:

  • Churn prediction: Accurate modeling allows you to detect early signs of churn and take action before it occurs.
  • Personalization of offers and products: Companies can increase sales and improve loyalty by offering the products and services most relevant to a particular customer.
  • Purchase likelihood: Companies often use these models to estimate which customers are most likely to buy a given product.
  • Identifying brand/product recommenders: By knowing the Net Promoter Score (NPS) of some of their customers, companies can predict the likelihood that new or other customers will become promoters of their brand. This improves service and recovery, reduces survey volume and associated costs, and helps companies prioritize their customer experience initiatives.

Gradually, propensity models have established themselves as an important tool for companies in the world of Data Science. To build these models, data is essential: the history of customer subscriptions, of customers who have stopped using (or still use) a product, of customers who have accepted certain offers. Machine Learning algorithms are then needed to automate the search for patterns within that data, increasing predictive power and producing good results. Several algorithms can be used to build a propensity model, such as LightGBM and CatBoost, among others.

We will focus on a use case involving car quote requesters and buyers, and on the metrics commonly used in data science to visualize and compare propensity models.


Before propensity models were used, the customer contact strategy was based on the business knowledge and know-how of salespeople, supported by a system for accessing historical customer information. In other words, the decision of which profiles to contact rested solely on the knowledge of marketing and sales executives.

If the goal is to maximize sales, how do you quickly decide who should be contacted first? Would you be ignoring potential buyers with a higher probability of purchase? What would happen if a customer profile arrived that the executives had never seen before? A priori, there is no reliable information about how customers will behave, and we do not know with certainty who will buy and who will not. Nowadays, the amount of data entering companies has become unmanageable for a salesperson: the list goes from 10-20 customers to thousands. That is why it is important to automate the process and provide the salesperson with a short list of customers to contact.

If we take a group of car buyers from the list shown in Table 1, we do not know how they will behave, nor do we know who has a better chance of making a purchase than the rest. At this point, a sales executive might decide, based on experience and business knowledge, that bidders from Roma looking for black cars have the best chance of making a purchase compared to other profiles, and will therefore prioritize contacting them.

Table 1: List of clients interested in buying cars. The table shows a limited number of variables describing the customers. The more variables, the better the model.

The solution to this problem is provided by the propensity model. Assuming that the company has historical data on which customers bought certain vehicles, the model will provide the probability that other customers will buy.

Table 2: List of customers ordered by the percentage of propensity to buy delivered by the propensity model. It is important to note that the propensity percentage is obtained based on the history of those customers who have already bought a car.

To obtain a better model (given that we already have a preprocessed dataset), it is necessary to run several experiments, testing and optimizing different model configurations (different parameters, different datasets, different columns, etc.). For this optimization we need to define metrics that indicate how well the model makes predictions; these are described below.

ROC curve

Once the model has been trained and tested, its predictive capacity is analyzed by evaluating its characteristic curves, based on the main metrics used in binary classification models: the true positive rate (TPR, also called recall), the false positive rate (FPR) and precision. The usefulness of these performance metrics lies in their ability to compare different models and their performance. Their formulas are:

TPR (recall) = TP / (TP + FN)
FPR = FP / (FP + TN)
Precision = TP / (TP + FP)

where:

  • TP = True Positive = number of positive cases correctly identified.
  • FP = False Positive = number of negative cases incorrectly identified as positive.
  • TN = True Negative = number of negative cases correctly identified.
  • FN = False Negative = number of positive cases incorrectly identified as negative.

Positive cases are understood as those cases in the validation set that correspond to an event that did occur. For example, in the case of customers prone to churn, positive cases are those in which the churn did occur. In the case of people who have received an offer, the positive cases are those in which a purchase does take place after the offer.

Given the context of propensity to buy, we will consider a model to be optimal if it reduces missed sales opportunities. It is therefore very important to adjust and optimize the model for good recall, minimizing the loss of real sales opportunities.
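As a minimal illustration (the labels and predictions below are made up, not taken from any real model), these metrics can be computed directly from the four counts defined above:

```python
# Minimal sketch: computing recall (TPR), FPR and precision from
# binary predictions. Labels and predictions are illustrative only.

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn)      # TPR: real buyers we caught
    fpr = fp / (fp + tn)         # non-buyers flagged as buyers
    precision = tp / (tp + fp)   # flagged buyers who really bought
    return {"recall": recall, "fpr": fpr, "precision": precision}

# 1 = bought, 0 = did not buy
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

In a library such as scikit-learn, the same quantities are available as ready-made scoring functions; the point here is only how each metric follows from the four counts.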

On the other hand, the Receiver Operating Characteristic (ROC) curve shows the different possible operating points delivered by the model, reflecting the trade-off between the false positive rate (quote requesters who do not buy but are classified as buyers) and the true positive rate (quote requesters who buy and are correctly identified). Figure 1 compares the ROC curves of two models, where the second presents a 5% improvement in its AUC (area under the curve) with respect to the first. As a rule, the closer the area under the curve is to 1, the better the predictive power of the model.

Figure 1: The ROC curve of model 1 (in red) presents an AUC = 0.847 and model 2 (in blue) an AUC = 0.895.
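As an illustrative sketch (the scores below are invented, not taken from the models in Figure 1), the AUC can be computed directly from its probabilistic interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
# Minimal sketch: AUC via the Mann-Whitney interpretation.
# Counts the fraction of (positive, negative) pairs where the
# positive case outranks the negative one; ties count as half.

def auc(y_true, scores):
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.3, 0.8, 0.4]
print(auc(y_true, scores))  # 1.0: every buyer outranks every non-buyer
```

The quadratic pairwise loop is fine for an illustration; for real datasets one would use a rank-based implementation such as scikit-learn's `roc_auc_score`.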


Lift and Gain curves

A key question in purchase propensity models is: how can we quantify the value of the model in identifying customers with a propensity to buy? To answer this question, we introduce the concept of the Gain curve. This curve allows us to quickly infer the benefit of using a propensity model to identify buyer profiles within the pool of potential customers. In particular, it shows what percentage of all buyer profiles we will find in a given top percentage of our list (Table 2).

Additionally, this curve allows us to make a comparison between models. If we have two propensity models and we would like to quantify the value that each one brings us, we can simply compare their Gain curves on the same graph.

Lift and Gain curves are widely used in propensity models because they allow us to answer the following questions:

  • What is the effectiveness of the propensity model compared to chance (not using a model)?
  • What percentage of the quote requesters should I contact to obtain a certain percentage of total purchases?
  • How much contact effort is needed to find the 80% of quote requesters who have a real intention to buy?

It is worth mentioning that interpreting the Lift and Gain curves has the advantage of leaving aside the 'binary classification' view of the model: instead of a list of customers with a "buys" or "does not buy" result, you have a 0 to 100% probability that each customer will buy.

The Lift curve shows how much better the model is than random. Figure 2 shows that the model curve (in orange) is always above the baseline curve, indicating that the model is preferable to 'flipping a coin' for predicting purchases.

Figure 2: The Lift curve (in orange) shows the ratio between the recovery of positive cases (purchases) using the model's score-ordered deciles and chance. It is observed that for the first 20% of the sample data, the model is approximately 4 times more effective than random at ranking the purchase cases.

In particular, the model recovers the purchase cases better: in the first two deciles by score, it is 4 times more effective than a coin toss.
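A minimal sketch of how lift at a given fraction of the ranked list can be computed (the labels and scores below are synthetic, chosen so that the top 20% reproduces a lift of 4, as in Figure 2):

```python
# Minimal sketch: lift for the top `fraction` of a score-ordered list.
# Lift = (purchase rate in the top fraction) / (overall purchase rate).

def lift_at(y_true, scores, fraction):
    # Sort labels by descending model score
    ranked = [t for _, t in sorted(zip(scores, y_true), reverse=True)]
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(ranked[:k]) / k          # purchase rate among the top k
    base_rate = sum(y_true) / len(y_true)   # purchase rate overall
    return top_rate / base_rate

# 20 quote requesters, 5 of whom bought (base rate = 0.25)
labels = [1, 1, 1, 1, 0, 1] + [0] * 14
scores = [round(1 - 0.04 * i, 2) for i in range(20)]  # descending scores
print(lift_at(labels, scores, 0.2))  # 4.0: top 20% buys 4x the base rate
```

A lift of 1.0 at every fraction would mean the model's ranking is no better than contacting customers at random.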

However, the score, equivalent to the probability the model assigns to each quote, is easier to interpret in a business context through the Gain curve.

The Gain curve shows the portion of favorable cases recovered by taking the top X deciles of the data set, ordered by the score provided by the model. For example, the graph in Figure 3 shows that if we take the top 20% of quote requesters according to the model score, we recover about 80% of the purchase cases. Moreover, if we take the top 40% of the same set, we obtain 90% of the purchase cases.

Figure 3: The Gain curve (in orange) shows the cumulative portion of positive cases (purchase) that are recovered as the highest scoring deciles of the sample are incorporated. It is observed that for the first 20% (percentage of sample = 0.2), 80% of the sample purchases are recovered.
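The cumulative gain at a given contact fraction can be sketched in the same way (again with synthetic labels and scores, chosen so that the top 20% captures 80% of buyers, mirroring Figure 3):

```python
# Minimal sketch: cumulative gain at a given fraction of the
# score-ordered list, i.e. the share of all buyers captured when
# we contact only that top fraction.

def gain_at(y_true, scores, fraction):
    # Sort labels by descending model score
    ranked = [t for _, t in sorted(zip(scores, y_true), reverse=True)]
    k = int(len(ranked) * fraction)
    return sum(ranked[:k]) / sum(y_true)  # buyers captured / total buyers

# 20 quote requesters, 5 buyers; the model ranks 4 of them in the top 20%
labels = [1, 1, 1, 1, 0, 1] + [0] * 14
scores = [round(1 - 0.04 * i, 2) for i in range(20)]
print(gain_at(labels, scores, 0.2))  # 0.8: top 20% captures 4 of 5 buyers
```

Plotting `gain_at` for fractions from 0 to 1 yields the orange curve of Figure 3; the diagonal baseline corresponds to contacting customers at random.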

This is the model's substantial contribution: it supports decision making about quote requesters, identifying who is more likely and who is less likely to buy, and with this information the marketing strategy can be refined and the effectiveness of sales management improved.


The development and implementation of a propensity model are highly complex tasks, requiring a detailed review of the available data for the correct training and validation of the models. To obtain a model that provides valuable information, it is essential to define the metrics that fit the business case to be addressed.

In this blog we showed metrics used to compare models in the context of propensity modeling, such as recall, ROC curves, Gain and Lift. The last two have a friendly, easy-to-understand approach, showing graphically the advantage of using a predictive model to choose which customers to contact. The Lift curve shows how much more likely we are to reach buyers than if we contacted a random sample of customers. For example, if we contact only 10% of customers based on the predictive model, we will reach three times more respondents than without a model. The Gain curve, on the other hand, lets us understand the incremental gains of increased effort, e.g. the percentage of positive responses received as we increase the percentage of customers contacted, allowing us to better understand our optimal effort/impact trade-off and to focus on the Pareto principle.