Cohort analysis and CLV study



Nowadays, with the amount of data available to companies, it is possible to generate multiple analyses to improve decision making and deepen knowledge of customer behavior. There are several ways to achieve this goal; in this case, we will focus on analyzing customer behavior using a data analysis technique based on segmentation.

This technique is known as cohort analysis. It is relatively simple and provides a way to find insights about customer/user behavior by separating customers into segments that share a certain common trait; each such segment is called a cohort.

This type of analysis is used regularly across Brain Food projects and is a very useful tool in digital marketing when you want to analyze both the impact of a campaign and how customers respond to certain offers, and to see whether these have an impact on customer churn.

What is a cohort?

When we talk about a cohort, we are talking about a group of people who have something in common, for example, the month of their first purchase, geographic location, age, etc. In this case, we will focus on customers whose first purchase in the transaction database falls in the same month.

This type of segmentation differs from techniques based on customer behavior, where unsupervised segmentation algorithms (which work on unlabeled data) can be used. An example of that type of segmentation is presented in our blog post “Introduction to segmentation”.

Study metrics

Before starting a cohort analysis, it is necessary to select a study metric that makes sense for the business model: what is being sought must be defined correctly up front and stay aligned with the company’s objectives.

Some of the metrics commonly studied with cohort analysis are:

  • Retention: measures the proportion of users who continue to use a product or service over time.
  • Conversions: measures the percentage of users who perform a desired action (such as buying a product).
  • Monetization: measures the monetary value that users generate for a company through their purchases or actions.
  • Engagement: measures the level of user participation and engagement with a product or service.

The rest of this article applies cohort analysis to the study of customer retention.

Business case

In this section, the behavior of customers after making their first purchase in a store will be analyzed. For each group or cohort, an analysis can be made of their behavior and patterns, or changes in them throughout their life cycle. Also, you can study the actions they take and how their behavior differs from that of other cohorts.

Combined with additional business information, you can pose questions that make sense for the purpose of the analysis, such as:

  • What proportion of customers return to purchase the following month?
  • Which of the groups (or cohorts) has better retention?
  • What is the effect of adding a new product to the catalog?
  • How do users behave with respect to a certain offer?
  • Does any cohort have an increase/decrease in purchases after a certain number of periods?


The data used in the analysis is a transactional dataset covering purchases made between December 1, 2010 and December 9, 2011 at a UK-based, registered online retailer [1]. The following table shows some rows randomly extracted from the dataset.

The columns are described below:

  • InvoiceNo: Transaction number
  • StockCode: SKU code
  • Description: SKU description
  • Quantity: Quantity sold
  • InvoiceDate: Date of the transaction
  • UnitPrice: Unit price of the SKU
  • CustomerID: Customer ID
  • Country: Country of purchase
  • TotalSum: Total of sale

To perform a cohort analysis based on the month of purchase, we need to create certain variables from the transactional data:

  • cohort: indicates the month in which the customer made their first purchase.
  • order_month: is the month in which the customer made a purchase.
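As a minimal sketch of how these two variables could be derived with pandas, the example below uses a handful of made-up transactions. The column names CustomerID and InvoiceDate come from the dataset description; the extra period column (months elapsed since the first purchase) and the toy values are our own assumptions for illustration.

```python
import pandas as pd

# Toy transactions (made-up values); column names follow the dataset description.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "InvoiceDate": pd.to_datetime([
        "2011-01-15", "2011-03-02", "2011-02-10", "2011-02-25", "2011-03-30",
    ]),
})

# order_month: month in which each purchase took place.
tx["order_month"] = tx["InvoiceDate"].dt.to_period("M")

# cohort: month of each customer's first purchase in the data.
first_purchase = tx.groupby("CustomerID")["InvoiceDate"].transform("min")
tx["cohort"] = first_purchase.dt.to_period("M")

# period (assumed helper column): whole months between cohort and order_month.
tx["period"] = (
    (tx["order_month"].dt.year - tx["cohort"].dt.year) * 12
    + (tx["order_month"].dt.month - tx["cohort"].dt.month)
)

print(tx)
```

Customer 1, for example, lands in the 2011-01 cohort, and their March purchase falls in period 2.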

After calculating these variables the table would look like this:

Then, it is necessary to group the data by cohort, thus obtaining the information for each one of them. Each row represents a cohort and the columns group the information over a certain period of time, in this case by month, as shown below:
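One possible way to build such a cohort matrix with pandas is sketched below. It assumes the cohort and period (months since first purchase) columns have already been computed; the rows are made-up values, not the retailer's data.

```python
import pandas as pd

# Toy purchases with cohort and period already assigned (made-up values).
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3, 4],
    "cohort": ["2011-01", "2011-01", "2011-01", "2011-01", "2011-01",
               "2011-02", "2011-02"],
    "period": [0, 0, 1, 0, 2, 0, 0],
})

# Count distinct customers per (cohort, period), then spread periods into columns.
cohort_counts = (
    tx.groupby(["cohort", "period"])["CustomerID"]
    .nunique()
    .unstack("period")
)

print(cohort_counts)
```

Counting distinct customers rather than rows keeps a customer who buys twice in the same month from being counted twice. Cells with no purchases stay empty (NaN), which is expected for periods a cohort has not reached yet.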

For better visualization, we place the cohorts (the month of first purchase) on the y-axis and the period of study on the x-axis; the values within the matrix represent the number of customers. We generated a heat map to show the number of customers per cohort and time period:

For each period and cohort, this image shows the number of customers on a color scale, which gives a better idea of how the group sizes are distributed. For example, for the 2011-03 cohort, the number of customers decreases by approximately 85% from period 0 to period 1, going from a deep blue to an orangey red. It then shows some fluctuations and ends with another drop in period 9, where we see an even darker red.

Finally, what we are really looking for is to obtain the retention matrix for each cohort and its evolution through the different periods. To do this, we divide each value by the first value of its respective row, which represents period 0.
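The division described above can be sketched in pandas as follows. The cohort_counts matrix here is a small made-up example (rows are cohorts, columns are periods), not the matrix from the study.

```python
import pandas as pd

# Made-up cohort matrix: number of active customers per cohort and period.
cohort_counts = pd.DataFrame(
    {0: [100.0, 80.0], 1: [25.0, 20.0], 2: [30.0, float("nan")]},
    index=pd.Index(["2011-01", "2011-02"], name="cohort"),
)

# Cohort size is the number of customers in period 0.
cohort_sizes = cohort_counts[0]

# Divide every row by its own period-0 value to get retention as a fraction.
retention = cohort_counts.divide(cohort_sizes, axis=0)

# Average share of customers who come back in the month after their first purchase.
avg_next_month_retention = retention[1].mean()

print(retention.round(3))
print(avg_next_month_retention)
```

By construction, period 0 is always 1.0 (100%), so the informative part of the matrix starts at period 1.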

Then, we plot this matrix in the same way as the previous one, obtaining the retention of clients by cohorts for the different periods.

The retention matrix provides information about the behavior of customers after their first purchase. On the y-axis we find the cohorts and on the x-axis the period. Each entry indicates the percentage of customers who were active in a given period, counted from the date of their first purchase:

This matrix clearly shows a drop in the number of customers returning to purchase the following month (from period 0 to 1). On average, 20.6% of customers return to purchase the following month; the 2010-12 cohort has the highest retention, with 37.7% of its customers purchasing again the following month.

Other conclusions can also be drawn from retention, although they generally need to be backed by an understanding of the business. For example, if we look again at the first cohort, it ends up with a surprisingly high retention in period 11 compared to the penultimate period of the other cohorts. This could be because the first customers are receiving some particular benefit or offer. On its own, this observation does not support a firm conclusion, but it does point to a different behavior for that group of customers. It would then be interesting to deepen the analysis to understand what happened in that particular period.

Finally, certain fluctuations in retention over time are observed. This could be due to particular characteristics of the business where customers do not necessarily generate purchases periodically and where periods of inactivity are not uncommon.

Customer Lifetime Value (CLV)

In conjunction with cohort analysis, the study of customer lifetime value (CLV) provides an even more comprehensive view of the relationship between a company and its customers. CLV refers to the amount of money a customer generates for a company over the course of their entire business relationship; when computed ahead of time, it is a prediction of the net profit attributed to the future of that relationship.


By analyzing the cohort data, we were able to segment customers into groups based on the date of their first purchase, and analyze how those groups differ in terms of purchase frequency and total spending. By measuring CLV, the company could determine which are its most valuable customers and focus its marketing strategy on attracting and retaining those customers.

To do this, we first calculate certain variables:

  • Average customer revenue: average amount of money a customer spends with the company.
  • Customer lifetime: length of time (in months or years) a customer is expected to remain with the company.
  • Retention rate: measures the share of customers or users that remain active or loyal to a company or product over time (retention = active_customers_period / active_customers_previous_period).
  • Customer churn: measures the rate of customer loss over a specific period of time (churn = 1 - retention).
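To make the retention and churn formulas above concrete, here is a minimal sketch with made-up counts of active customers per period. The numbers and variable names are assumptions for illustration only.

```python
import pandas as pd

# Made-up number of active customers in three consecutive periods.
active_customers = pd.Series(
    [200, 50, 40], index=pd.Index([0, 1, 2], name="period")
)

# retention = active_customers_period / active_customers_previous_period
retention = active_customers / active_customers.shift(1)

# churn = 1 - retention
churn = 1 - retention

print(pd.DataFrame({"retention": retention, "churn": churn}))
```

The first period has no previous period to compare against, so its retention and churn are undefined (NaN).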

In this study, the calculation of CLV was carried out using three different methods, specifically:

Basic method: this is based on the idea of multiplying the average revenue per period by the length of time a customer is expected to remain with the company. This method is easy to understand and calculate, but has some limitations, as it does not take into account customer acquisition costs, customer retention costs, or changes in revenue over time.

Granular method: this method is more detailed than the basic method and is based on calculating the CLV at a finer level, taking into account variables such as the customer’s repeat rate, the average order value and the customer’s lifetime.

Traditional method: this method multiplies the average revenue generated by the customer by a factor built from the retention rate and the churn rate (commonly the ratio retention / churn). It is the most popular method and takes into account customer loyalty.

Taking an average customer lifetime of 3 months, the results were as follows:
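The three methods can be sketched with a few lines of Python. The formulas below are common textbook forms that we assume here for illustration; they are not necessarily the exact calculation used in this study, and every input value is made up.

```python
# Hypothetical inputs (made-up values, not the results of this study).
avg_monthly_revenue = 100.0  # average revenue per customer per month
lifespan_months = 3          # expected customer lifetime, in months
avg_order_value = 50.0       # average revenue per purchase
orders_per_month = 2.0       # average number of purchases per customer per month
retention_rate = 0.25        # average retention rate
churn_rate = 1 - retention_rate

# Basic method: average revenue per period times expected lifetime.
clv_basic = avg_monthly_revenue * lifespan_months

# Granular method (assumed form): order value x purchase frequency x lifetime.
clv_granular = avg_order_value * orders_per_month * lifespan_months

# Traditional method (assumed form): average revenue times retention / churn.
clv_traditional = avg_monthly_revenue * retention_rate / churn_rate

print(f"Basic CLV:       {clv_basic:.2f}")
print(f"Granular CLV:    {clv_granular:.2f}")
print(f"Traditional CLV: {clv_traditional:.2f}")
```

In this toy setup the basic and granular values coincide because order value times purchase frequency equals the monthly revenue. The traditional value is much lower because a retention rate of 0.25 gives a multiplier of 0.25 / 0.75, well below the 3-month lifetime used by the other two methods.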

  • Basic CLV = 1960
  • Granular CLV = 1960
  • Traditional CLV = 239

The analysis showed that the value of the traditional CLV is significantly lower than that of the basic and granular CLV. This could be due to the presence of outliers that distort the average customer lifetime. In addition, the traditional method takes into account a greater number of factors, which makes it more accurate, but also more susceptible to variations in customer lifetime.


In this post, we have explored cohort analysis and how it is used to obtain detailed information on customer lifetime value. We have seen how this tool can be of great use when making strategic decisions about how to allocate resources and direct marketing efforts.

In particular, cohort analysis allows us to see how CLV varies between different customer groups, which helps us to better understand the needs and preferences of each group.

Importantly, cohort analysis is not limited to a particular industry or type of company, and can be adapted to use different metrics depending on the needs of the business. However, it is important to keep in mind that in order to obtain accurate conclusions it is essential to have a deep understanding of the business and to have a wide variety of information.

In summary, cohort analysis is a valuable tool to better understand customer behavior and help make strategic decisions regarding marketing, advertising and pricing, among others.