2024
@phdthesis{nachtegael2024active,
title = {Active learning for biomedical relation extraction, the oligogenic use case},
author = {Nachtegael, Charlotte},
url = {https://difusion.ulb.ac.be/vufind/Record/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/375304/Holdings},
year = {2024},
date = {2024-06-28},
abstract = {In a context where technological advancements have enabled increased availability of genetic data through high-throughput sequencing technologies, the complexity of genetic diseases has become increasingly apparent. Oligogenic diseases, characterised by a combination of genetic variants in two or more genes, have emerged as a crucial research area, challenging the traditional model of "one genotype, one phenotype". Understanding the underlying mechanisms and genetic interactions of oligogenic diseases has thus become a major priority in biomedical research, underlining the importance of developing dedicated tools to study these complex diseases. Our first major contribution, OLIDA, is an innovative database designed to collect data on variant combinations responsible for these diseases, filling significant gaps in current knowledge, which until now has focused on digenic diseases. This resource, accessible via a web platform, adheres to FAIR principles and represents a significant advance over its predecessor, DIDA, in terms of data curation and quality assessment. Furthermore, to support the biocuration of oligogenic diseases, we used active learning to construct DUVEL, a biomedical corpus focused on digenic variant combinations. To achieve this, we first investigated how to optimise these methods across numerous biomedical relation extraction datasets and developed a web-based platform, ALAMBIC, for text annotation using active learning. Our results and the quality of the corpus obtained demonstrate the effectiveness of active learning methods in biomedical relation annotation tasks. By establishing a curation pipeline for oligogenic diseases, as well as standards for integrating active learning methods into biocuration, our work represents a significant advance in the field of biomedical natural language processing and the understanding of oligogenic diseases.
},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
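The active-learning annotation loop described in the abstract can be illustrated with a minimal pool-based uncertainty-sampling sketch. This is a generic illustration assuming scikit-learn and synthetic data, not the thesis's actual ALAMBIC or DUVEL pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Minimal pool-based active learning with uncertainty sampling.
# Generic sketch: data and model choices are illustrative assumptions.
X, y = make_classification(n_samples=500, random_state=0)

# Small seed set with both classes represented; the rest is the unlabeled pool.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):  # annotation rounds
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])[:, 1]
    # Query the pool instance the model is least certain about (p closest to 0.5).
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(query)  # "annotate" it (labels are known in this toy setting)
    pool.remove(query)

print(len(labeled), len(pool))  # 30 labeled examples, 470 left in the pool
```

Each round sends the single most informative example to the annotator; the abstract's finding is that such targeted querying produces high-quality corpora with far fewer labels than exhaustive annotation.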
@phdthesis{versbraegen2024discovering,
title = {Discovering multivariant pathogenic patterns among patients with rare diseases},
author = {Versbraegen, Nassim},
url = {https://difusion.ulb.ac.be/vufind/Record/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/375378/Holdings},
year = {2024},
date = {2024-06-24},
abstract = {Increasing evidence points to the complex interplay of multiple genetic variants as a major contributing factor in many human diseases. Oligogenic diseases, in which a small set of genes collaborate to cause a pathology, present a compelling example of this phenomenon and necessitate a shift away from traditional single-gene inheritance models. Our work aimed to develop robust methods for pinpointing pathogenic combinations of genetic variants across patient cohorts, ultimately improving disease understanding and potentially guiding future diagnostic approaches. We began by developing a novel machine learning framework that integrates explainable AI (XAI) techniques and game-theoretic concepts. This framework allows us to classify and characterise different types of oligogenic effects, providing insights into the specific mechanisms by which multiple genes interact to drive disease. Next, we focused on refining existing computational methods used to predict the pathogenicity of variant combinations. Our emphasis was two-fold: improving computational efficiency for handling the expansive datasets associated with cohort analysis, and critically, reducing false-positive rates to ensure the reliability of our results. With these tools in hand, we developed a specialised cohort analysis approach tailored to investigating diseases with complex genetic origins. To demonstrate the capabilities of our methodology, we delved into a Marfan syndrome cohort. Marfan syndrome is a hereditary condition affecting the body's connective tissue. Our analysis successfully uncovered potential modifier mutations that appear to interact with the primary disease-causing variant, offering new clues about the intricate genetic landscape of this condition.
},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
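The "game-theoretic concepts" mentioned alongside explainable AI commonly refer to Shapley-value attribution, which fairly distributes a joint effect over the contributing players (here, genes). A self-contained sketch with a hypothetical characteristic function, not the thesis's actual framework:

```python
from itertools import permutations

# Toy characteristic function (hypothetical numbers): v(S) is the
# "pathogenicity evidence" carried jointly by a set of genes.
# Genes A and B interact synergistically; gene C adds nothing.
v = {
    frozenset(): 0.0,
    frozenset("A"): 0.1, frozenset("B"): 0.1, frozenset("C"): 0.0,
    frozenset("AB"): 0.6, frozenset("AC"): 0.1, frozenset("BC"): 0.1,
    frozenset("ABC"): 0.6,
}

def shapley(players, v):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            phi[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    return {p: val / len(perms) for p, val in phi.items()}

print(shapley("ABC", v))  # A and B share the synergy equally; C gets ~0
```

The attribution exposes the interaction structure: A and B each receive 0.3 because the 0.6 effect only appears when they co-occur, which is the kind of mechanism-level explanation the abstract describes.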
@phdthesis{abels2024resolving,
title = {Resolving Knowledge Limitations for Improved Collective Intelligence: A novel online machine learning approach},
author = {Abels, Axel},
url = {https://difusion.ulb.ac.be/vufind/Record/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/373334/Holdings},
year = {2024},
date = {2024-04-23},
urldate = {2024-04-23},
abstract = {One of the reasons human groups struggle to make the best decisions is that they are inherently biased in their beliefs. In essence, our perception of what is true is often distorted by individual and social biases, including stereotypes. When individuals deliberate about a decision, they tend to transmit these beliefs to others, thereby steering the entire group away from the best decision. For example, a senior doctor could spread a misinterpretation of symptoms to junior doctors, resulting in inappropriate treatments. The primary objective of this thesis is to mitigate the impact of such biases on group decision-making in domains such as medical diagnostics, policy-making, and crowdsourced fact-checking. We propose to achieve this by having humans interact through a collective decision-making platform in charge of handling the aggregation of group knowledge. The key hypothesis here is that by carefully managing the collectivization of knowledge through this platform, it will be substantially harder for humans to impose their biases on the final decision. The core of our work involves the development and analysis of algorithms for decision-making systems. These algorithms are designed to effectively aggregate diverse expertise while addressing biases. We thus focus on aggregation methods that use online learning to foster collective intelligence more effectively. In doing so, we take into account the nuances of individual expertise and the impact of biases, aiming to filter out noise and enhance the reliability of collective decisions. Our theoretical analysis of the proposed algorithms is complemented by rigorous testing in both simulated and online experimental environments to validate the system’s effectiveness. Our results demonstrate a significant improvement in performance and reduction in bias influence. 
These findings not only highlight the potential of technology-assisted decision-making but also underscore the value of addressing human biases in collaborative environments.
},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
@phdthesis{verhelst2024causal,
title = {Causal and predictive modeling of customer churn - Lessons learned from empirical and theoretical research},
author = {Theo Verhelst},
url = {https://difusion.ulb.ac.be/vufind/Record/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/368384/Holdings},
year = {2024},
date = {2024-01-29},
urldate = {2024-01-29},
abstract = {Customer churn is an important concern for large companies, especially in the
telecommunications sector. Customer retention campaigns are often used to mitigate
churn, but targeting the right customers based on their historical profiles
presents an important challenge. Companies usually have recourse to two data-driven
approaches: churn prediction and uplift modeling. In churn prediction,
customers are selected on the basis of their propensity to churn in the near future.
In uplift modeling, only customers who react positively to the campaign
are considered. Uplift modeling is used in various other domains, such as marketing,
healthcare, and finance. Despite the theoretical appeal of uplift modeling, its
added value with respect to conventional machine learning approaches has rarely
been quantified in the literature.
This doctoral thesis is the result of a collaborative research project between
the Machine Learning Group (ULB) and Orange Belgium, funded by Innoviris.
This collaboration offers a unique research opportunity to assess the added value
of causal-oriented strategies to address customer churn in the telecommunications
sector. Following the introduction, we give the necessary background in probability
theory, causality theory, and machine learning, and we describe the state of
the art in uplift modeling and counterfactual identification. Then, we present the
contributions of this thesis:
• An empirical comparison of various predictive and causal models for selecting
customers in churn prevention campaigns. We perform several benchmarks
of different state-of-the-art approaches on real-world datasets and in
live campaigns with our industrial partner, we propose a new approach that
exploits domain knowledge to improve predictions, and we make available
the first public churn dataset for uplift modeling, whose unique characteristics
make it more challenging than the few other public uplift datasets.
• Counterfactual identification allows one to classify the different behaviors
of customers in response to a marketing incentive. This can be used to establish
profiles of customers sensitive to the campaign, and subsequently
improve marketing operations. We derive novel bounds and point estimators
on the probability of counterfactual statements based on uplift models.
• A comprehensive comparison of predictive and uplift modeling, starting
from firm theoretical foundations and highlighting the parameters that influence
the performance of both approaches. In particular, we provide a new
formulation of the measure of profit, a formal proof of the convergence of
the uplift curve to the measure of profit, and an illustration, through simulations,
of the conditions under which predictive approaches still outperform
uplift modeling.
Our theoretical and empirical assessments of uplift modeling suggest that it often
fails to deliver the anticipated advantages over predictive modeling, especially in
scenarios such as customer churn within the telecom sector, characterized by class
imbalance, limited separability, and cost-benefit considerations. These results are
broadly aligned with the practical experience of our industrial partner and with
the existing scientific literature. Our counterfactual probability estimators allow
us to characterize customers at a level inaccessible to conventional predictive modeling,
revealing new insights on the behavior and preferences of customers.},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
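The distinction the abstract draws between churn prediction (modelling the propensity to churn) and uplift modelling (modelling the effect of the campaign) can be sketched with the standard "two-model" uplift baseline. Scikit-learn and the synthetic data below are illustrative assumptions, not the thesis's datasets or its proposed approach:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic example: X = customer features, t = received campaign (0/1),
# y = churned (0/1). Purely illustrative data, not the Orange dataset.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
t = rng.integers(0, 2, size=n)
# The campaign only helps customers with a positive first feature.
p = 0.3 - 0.15 * t * (X[:, 0] > 0)
y = (rng.random(n) < p).astype(int)

# Churn prediction: a single model of P(churn | x).
churn_model = RandomForestClassifier(random_state=0).fit(X, y)

# Two-model uplift: fit treated and control groups separately and subtract,
# estimating P(churn | x, treated) - P(churn | x, control).
m_t = RandomForestClassifier(random_state=0).fit(X[t == 1], y[t == 1])
m_c = RandomForestClassifier(random_state=0).fit(X[t == 0], y[t == 0])
uplift = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]

# Rank customers: most negative uplift = campaign most reduces churn.
targets = np.argsort(uplift)[:100]
print(uplift.shape)
```

Churn prediction targets the likeliest churners regardless of whether the campaign changes their behaviour; uplift modelling targets those whose churn probability the campaign actually lowers, which is the added value the thesis sets out to quantify.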
2021
@phdthesis{info:hdl:2013/334819,
title = {On-Board-Unit big data analytics: from data architecture to traffic forecasting},
author = {Giovanni Buroni},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/334819},
year = {2021},
date = {2021-01-01},
note = {Funder: Université Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
2020
@phdthesis{info:hdl:2013/314684,
title = {Statistical biophysics of hematopoiesis and growing cell populations},
author = {Nathaniel Mon Père},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/314684},
year = {2020},
date = {2020-01-01},
note = {Funder: Université Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
@phdthesis{info:hdl:2013/312576,
title = {Towards multivariant pathogenicity predictions: Using machine-learning to directly predict and explore disease-causing oligogenic variant combinations},
author = {Sofia Papadimitriou},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/312576},
year = {2020},
date = {2020-01-01},
note = {Funder: Université Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
2019
@phdthesis{info:hdl:2013/287368,
title = {The role of dynamics in emergent protein properties},
author = {Gabriele Orlando},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/287368},
year = {2019},
date = {2019-01-01},
note = {Funder: Université Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
@phdthesis{info:hdl:2013/285389,
title = {Weneya'a – "quien habla con los cerros". Memoria, mántica y paisaje sagrado en la Sierra Norte de Oaxaca},
author = {Caroll Isabelle Davila},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/285389},
year = {2019},
date = {2019-01-01},
note = {Funder: Université Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
2018
@phdthesis{info:hdl:2013/272617,
title = {Beyond monogenic diseases: a first collection and analysis of digenic diseases},
author = {Andrea Gazzo},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/272617},
year = {2018},
date = {2018-01-01},
note = {Funder: Université Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
@phdthesis{info:hdl:2013/271782,
title = {Some Domain Decomposition and Convex Optimization Algorithms with Applications to Inverse Problems},
author = {Jixin Chen},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/271782},
year = {2018},
date = {2018-01-01},
note = {Funder: Université Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
@phdthesis{info:hdl:2013/272119,
title = {Beyond Supervised Learning in Credit Card Fraud Detection: A Dive into Semi-supervised and Distributed Learning},
author = {Fabrizio Carcillo},
url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/272119/5/ContratDiCarcillo.pdf},
year = {2018},
date = {2018-01-01},
abstract = {The expansion of electronic commerce, as well as the increasing confidence of customers in electronic payments, makes fraud detection a critical issue. The design of a prompt and accurate Fraud Detection System is a priority for many organizations in the credit card business. In this thesis we present a series of studies to increase the precision and the speed of fraud detection systems. The thesis has three main contributions. The first concerns the integration of unsupervised techniques and supervised classifiers. We proposed several approaches to integrate outlier scores in the detection process and found that the accuracy of a conventional classifier may be improved when information about the input distribution is used to augment the training set. The second contribution concerns the role of active learning in fraud detection. We extensively compared several state-of-the-art techniques and found that Stochastic Semi-supervised Learning is a convenient approach to tackle the selection bias problem in the active learning process. The third contribution of the thesis is the design, implementation and assessment of SCARFF, an original framework for near real-time streaming fraud detection. This framework integrates Big Data technology (notably tools like Kafka, Spark and Cassandra) with a machine learning approach to deal with imbalance, non-stationarity and feedback latency in a scalable manner. Experimental results on a massive dataset of real credit card transactions have shown that our framework is scalable, efficient and accurate over a large stream of transactions.},
note = {Funder: Université Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
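The first contribution (augmenting a supervised classifier with unsupervised outlier scores) can be sketched generically with an isolation forest. Scikit-learn and the imbalanced synthetic data are assumptions made for illustration; this is not the thesis's actual pipeline, nor SCARFF itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Imbalanced toy "fraud" data: roughly 2% positives (illustrative only).
X, y = make_classification(n_samples=4000, weights=[0.98], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Unsupervised step: outlier scores capture the input distribution
# without using any labels.
iso = IsolationForest(random_state=0).fit(X_tr)
score_tr = iso.score_samples(X_tr).reshape(-1, 1)
score_te = iso.score_samples(X_te).reshape(-1, 1)

# Supervised step: augment the feature set with the outlier score,
# then train a conventional classifier on the enriched inputs.
clf = RandomForestClassifier(random_state=0)
clf.fit(np.hstack([X_tr, score_tr]), y_tr)
proba = clf.predict_proba(np.hstack([X_te, score_te]))[:, 1]
print(proba.shape)  # one fraud score per test transaction
```

The design choice mirrors the abstract's finding: the outlier score injects distributional information the labeled set alone may not convey, which can lift the accuracy of an otherwise conventional classifier.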
@phdthesis{info:hdl:2013/265092,
title = {Bioinformatic inference of a prognostic epigenetic signature of immunity in breast cancers},
author = {Martin Bizet},
url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/265092/7/ContratDiBizet.pdf},
year = {2018},
date = {2018-01-01},
abstract = {The alteration of epigenetic marks is increasingly recognised as a fundamental characteristic of cancers. In this thesis, we used DNA methylation profiles to improve the classification of breast cancer patients through a machine learning-based approach. The long-term objective is the development of clinical tools for personalised medicine. The DNA methylation data were acquired with a methylation-dedicated DNA microarray called Infinium. This technology is recent compared, for example, to gene expression microarrays, and its preprocessing is not yet standardised. The first part of this thesis was therefore devoted to evaluating normalisation methods by comparing the normalised data with other technologies (pyrosequencing and RRBS) for the two most recent Infinium technologies (450k and 850k). We also evaluated the coverage of biologically relevant regions (promoters and enhancers) by the two technologies. We then used the (properly preprocessed) Infinium data to develop a score, called the MeTIL score, which has prognostic and predictive value in breast cancers. We took advantage of the ability of DNA methylation to reflect cellular composition to extract a methylation signature (that is, a set of DNA positions where methylation varies) reflecting the presence of lymphocytes in the tumour sample. After selecting sites with lymphocyte-specific methylation, we developed a machine learning-based approach to obtain a signature reduced to an optimal size of five sites, potentially allowing clinical use. After converting this signature into a score, we showed its specificity for lymphocytes using external data and computer simulations. We then showed the ability of the MeTIL score to predict the response to chemotherapy, as well as its prognostic power in independent breast cancer cohorts and even in other cancers.},
note = {Funder: Université Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
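The abstract above describes reducing a lymphocyte-specific methylation signature to five CpG sites and converting it into a per-sample score. A minimal sketch of the score step, assuming a simple mean-of-beta-values convention; the site names and beta values below are invented for illustration and are not the actual MeTIL sites defined in the thesis:

```python
# Toy sketch: turning a small methylation signature into a per-sample score.
# Site IDs and beta values are hypothetical, not the real MeTIL signature.

SIGNATURE_SITES = ["cg_A", "cg_B", "cg_C", "cg_D", "cg_E"]  # invented IDs

def methylation_score(sample_betas):
    """Average methylation (beta value, 0..1) over the signature sites.

    In this toy convention, lower methylation at lymphocyte-specific sites
    would indicate stronger lymphocyte infiltration in the tumour sample.
    """
    values = [sample_betas[site] for site in SIGNATURE_SITES]
    return sum(values) / len(values)

tumour_sample = {"cg_A": 0.2, "cg_B": 0.3, "cg_C": 0.25, "cg_D": 0.15, "cg_E": 0.1}
print(round(methylation_score(tumour_sample), 3))  # prints 0.2
```

A real score would also need the site-selection step (machine learning on lymphocyte versus tumour methylomes) that the abstract describes.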
Reggiani, Claudio Bioinformatic discovery of novel exons expressed in human brain and their association with neurodevelopmental disorders PhD Thesis 2018, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/270994,
title = {Bioinformatic discovery of novel exons expressed in human brain and their association with neurodevelopmental disorders},
author = {Claudio Reggiani},
url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/270994/5/ContratDiReggiani.pdf},
year = {2018},
date = {2018-01-01},
abstract = {An important quest in genomics since the publication of the first complete human genome in 2003 has been its functional annotation. DNA holds the instructions for the production of the components necessary for the life of cells and organisms. A complete functional catalog of genomic regions will help the understanding of the cell body and its dynamics, thus creating links between genotype and phenotypic traits. The need for annotations prompted the development of several bioinformatic methods. In the context of promoter and first-exon predictors, the majority of models rely principally on structural and chemical properties of the DNA sequence. Some of them integrate information from epigenomic and transcriptomic data as secondary features. Current genomic research asserts that reference genomes are far from fully annotated (the human genome included). Physicians rely on reference genome annotations and functional databases to understand disorders with a genetic basis, and missing annotations may lead to unresolved cases. Because of their complexity, neurodevelopmental disorders are under study to identify all the genetic regions involved. Besides functional validation in model organisms, the search for genotype-phenotype associations is supported by statistical analysis, which is typically biased towards known functional regions. This thesis addresses the use of an in-silico integrative analysis to improve reference genome annotations and discover novel functional regions associated with neurodevelopmental disorders. The contributions outlined in this document have practical applications in clinical settings. The presented bioinformatic method is based on epigenomic and transcriptomic data, thus excluding features derived from the DNA sequence.
Such an integrative approach applied to brain data allowed the discovery of two novel promoters and coding first exons in the human DLG2 gene, which were also found to be statistically associated with neurodevelopmental disorders, and with intellectual disability in particular. The application of the same methodology to the whole genome resulted in the discovery of other novel exons expressed in the brain. Concerning the in-silico method itself, the research demanded a large number of functional and clinical datasets to properly support and validate our discoveries. This work describes a bioinformatic method for genome annotation, in the specific area of promoters and first exons. So far the method has been applied to brain data, and the extension to whole-body data would be a logical by-product. We will leverage distributed frameworks to tackle the even higher amount of data to analyse, a task that has already begun. Another interesting research direction that came out of this work is the temporal enrichment analysis of epigenomic data across different developmental stages, in which changes of epigenomic enrichment suggest time-specific and tissue-specific regulation of genes and gene isoforms.},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
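As a rough illustration of the integrative filtering the abstract describes (retaining a candidate novel promoter/first exon only when it is supported by both an epigenomic promoter mark and transcription evidence), here is a toy sketch; the interval coordinates and the two evidence sets are invented, and real analyses would use ChIP-seq peaks and RNA-seq coverage on genomic coordinates:

```python
# Toy sketch of integrative candidate filtering: a region is kept only if it
# overlaps both an H3K4me3-like promoter mark and an expressed region.
# All intervals are invented; real data would come from ChIP-seq / RNA-seq.

def overlaps(a, b):
    """True if half-open genomic intervals (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def supported_candidates(candidates, promoter_peaks, expressed_regions):
    """Candidates overlapping both a promoter mark and expression evidence."""
    return [c for c in candidates
            if any(overlaps(c, p) for p in promoter_peaks)
            and any(overlaps(c, e) for e in expressed_regions)]

cands = [(100, 200), (500, 600), (900, 950)]
peaks = [(150, 300), (480, 520)]   # promoter-mark intervals
expr  = [(90, 210), (940, 1000)]   # transcription-evidence intervals
print(supported_candidates(cands, peaks, expr))  # prints [(100, 200)]
```

The point of the sketch is the conjunction of two independent data types, which is what distinguishes this approach from sequence-only predictors.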
2017
|
Amghar, Mohamed Multiscale local polynomial transforms in smoothing and density estimation PhD Thesis 2017, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/262040,
title = {Multiscale local polynomial transforms in smoothing and density estimation},
author = {Mohamed Amghar},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/262040},
year = {2017},
date = {2017-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
Raimondi, Daniele The effect of genome variation on human proteins: understanding variants and improving their deleteriousness prediction through extensive contextualisation PhD Thesis 2017, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/251313,
title = {The effect of genome variation on human proteins: understanding variants and improving their deleteriousness prediction through extensive contextualisation},
author = {Daniele Raimondi},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/251313},
year = {2017},
date = {2017-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
2016
|
Zisis, Ioannis The Effect of Group Formation on Behaviour: An Experimental and Evolutionary Analysis PhD Thesis 2016, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/231974,
title = {The Effect of Group Formation on Behaviour: An Experimental and Evolutionary Analysis},
author = {Ioannis Zisis},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/231974},
year = {2016},
date = {2016-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
2015
|
Lopes, Miguel Inference of gene networks from time series expression data and application to type 1 Diabetes PhD Thesis 2015, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/216729b,
title = {Inference of gene networks from time series expression data and application to type 1 Diabetes},
author = {Miguel Lopes},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/216729},
year = {2015},
date = {2015-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
Hajingabo, Leon Analyzing molecular network perturbations in human cancer: application to mutated genes and gene fusions involved in acute lymphoblastic leukemia PhD Thesis 2015, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/209126b,
title = {Analyzing molecular network perturbations in human cancer: application to mutated genes and gene fusions involved in acute lymphoblastic leukemia},
author = {Leon Hajingabo},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209126},
year = {2015},
date = {2015-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
Dal Pozzolo, Andrea Adaptive Machine Learning for Credit Card Fraud Detection PhD Thesis 2015, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/221654,
title = {Adaptive Machine Learning for Credit Card Fraud Detection},
author = {Andrea Dal Pozzolo},
url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/221654/5/contratDalPozzolo.pdf},
year = {2015},
date = {2015-01-01},
abstract = {Billions of dollars are lost every year to fraudulent credit card transactions. The design of efficient fraud detection algorithms is key to reducing these losses, and more and more algorithms rely on advanced machine learning techniques to assist fraud investigators. The design of fraud detection algorithms is, however, particularly challenging due to the non-stationary distribution of the data, the highly unbalanced class distributions and the availability of few transactions labeled by fraud investigators. At the same time, public data are scarcely available due to confidentiality issues, leaving many questions about the best strategy unanswered. In this thesis we aim to provide some answers by focusing on crucial issues such as: i) why and how undersampling is useful in the presence of class imbalance (i.e. frauds are a small percentage of the transactions), ii) how to deal with unbalanced and evolving data streams (non-stationarity due to fraud evolution and changes in spending behavior), iii) how to assess performance in a way which is relevant for detection and iv) how to use the feedback provided by investigators on the generated fraud alerts. Finally, we design and assess a prototype of a Fraud Detection System able to meet real-world working conditions and to integrate investigators' feedback to generate accurate alerts.},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
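Point i) of the abstract concerns undersampling under class imbalance. A minimal, self-contained sketch of random undersampling on synthetic data, not the thesis's actual pipeline: all minority ("fraud") samples are kept and an equal-sized random subset of the majority class is drawn before training a classifier.

```python
# Sketch of random undersampling for class imbalance on synthetic data.
# Labels, features and proportions are invented for illustration.

import random

def undersample(X, y, majority_label=0, seed=42):
    """Keep all minority samples plus an equal-size random majority subset."""
    rng = random.Random(seed)
    minority = [(x, l) for x, l in zip(X, y) if l != majority_label]
    majority = [(x, l) for x, l in zip(X, y) if l == majority_label]
    kept = rng.sample(majority, k=len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    xs, ls = zip(*balanced)
    return list(xs), list(ls)

X = [[i] for i in range(100)]
y = [1 if i < 5 else 0 for i in range(100)]   # 5% "fraud"
Xb, yb = undersample(X, y)
print(sum(yb), len(yb))  # prints: 5 10
```

Undersampling also biases the posterior fraud probability of the trained model, which is one reason the "why and how" of the question above is not trivial.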
Lerman, Liran A machine learning approach for automatic and generic side-channel attacks PhD Thesis 2015, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/209070,
title = {A machine learning approach for automatic and generic side-channel attacks},
author = {Liran Lerman},
url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/209070/2/be487c5b-7b94-414c-bf2e-96847aa98284.txt},
year = {2015},
date = {2015-01-01},
abstract = {L'omniprésence de dispositifs interconnectés amène à un intérêt massif pour la sécurité informatique fournie entre autres par le domaine de la cryptographie. Pendant des décennies, les spécialistes en cryptographie estimaient le niveau de sécurité d'un algorithme cryptographique indépendamment de son implantation dans un dispositif. Cependant, depuis la publication des attaques d'implantation en 1996, les attaques physiques sont devenues un domaine de recherche actif en considérant les propriétés physiques de dispositifs cryptographiques. Dans notre dissertation, nous nous concentrons sur les attaques profilées. Traditionnellement, les attaques profilées appliquent des méthodes paramétriques dans lesquelles une information a priori sur les propriétés physiques est supposée. Le domaine de l'apprentissage automatique produit des modèles automatiques et génériques ne nécessitant pas une information a priori sur le phénomène étudié.
Cette dissertation apporte un éclairage nouveau sur les capacités des méthodes d'apprentissage automatique. Nous démontrons d'abord que les attaques profilées paramétriques surpassent les méthodes d'apprentissage automatique lorsqu'il n'y a pas d'erreur d'estimation ni d'hypothèse. En revanche, les attaques fondées sur l'apprentissage automatique sont avantageuses dans des scénarios réalistes où le nombre de données lors de l'étape d'apprentissage est faible. Par la suite, nous proposons une nouvelle métrique formelle d'évaluation qui permet (1) de comparer des attaques paramétriques et non-paramétriques et (2) d'interpréter les résultats de chaque méthode. La nouvelle mesure fournit les causes d'un taux de réussite élevé ou faible d'une attaque et, par conséquent, donne des pistes pour améliorer l'évaluation d'une implantation. Enfin, nous présentons des résultats expérimentaux sur des appareils non protégés et protégés.
La première étude montre que l'apprentissage automatique a un taux de réussite plus élevé qu'une méthode paramétrique lorsque seules quelques données sont disponibles. La deuxième expérience démontre qu'un dispositif protégé est attaquable avec une approche appartenant à l'apprentissage automatique. La stratégie basée sur l'apprentissage automatique nécessite le même nombre de données lors de la phase d'apprentissage que lorsque celle-ci attaque un produit non protégé. Nous montrons également que des méthodes paramétriques surestiment ou sous-estiment le niveau de sécurité fourni par l'appareil alors que l'approche basée sur l'apprentissage automatique améliore cette estimation.
En résumé, notre thèse est que les attaques basées sur l'apprentissage automatique sont avantageuses par rapport aux techniques classiques lorsque la quantité d'information a priori sur l'appareil cible et le nombre de données lors de la phase d'apprentissage sont faibles.},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
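The profiled attacks discussed above can be cast as supervised learning: fit a model on labelled leakages from a profiling device, then classify the leakage of a target device. A toy sketch using a nearest-centroid rule and a synthetic Hamming-weight leakage model; the noise level and all measurements are invented stand-ins for real power traces, and labelling by Hamming weight of an intermediate value is an assumption, not the thesis's exact setup:

```python
# Toy profiled attack: learn per-label leakage centroids ("templates"),
# then classify a target leakage. Synthetic Hamming-weight + noise model.

import random

def hamming_weight(v):
    return bin(v).count("1")

def profile(traces):
    """Nearest-centroid model: mean leakage per label from labelled traces."""
    sums, counts = {}, {}
    for leak, label in traces:
        sums[label] = sums.get(label, 0.0) + leak
        counts[label] = counts.get(label, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def attack(centroids, leak):
    """Predict the label whose centroid is closest to the observed leakage."""
    return min(centroids, key=lambda k: abs(centroids[k] - leak))

rng = random.Random(0)
# Label = Hamming weight of the processed byte (a common target in profiling).
training = [(hw + rng.gauss(0, 0.1), hw) for hw in range(9) for _ in range(50)]
centroids = profile(training)
print(attack(centroids, hamming_weight(0xB6) + 0.03))  # prints 5
```

The abstract's contrast between parametric templates and machine learning amounts to replacing this centroid rule with either a Gaussian model or a generic learned classifier.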
2014
|
Kidzinski, Lukasz Inference for stationary functional time series: dimension reduction and regression PhD Thesis 2014, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/209226,
title = {Inference for stationary functional time series: dimension reduction and regression},
author = {Lukasz Kidzinski},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209226},
year = {2014},
date = {2014-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
Taieb, Souhaib Ben Machine learning strategies for multi-step-ahead time series forecasting PhD Thesis 2014, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/209234,
title = {Machine learning strategies for multi-step-ahead time series forecasting},
author = {Souhaib Ben Taieb},
url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/209234/4/2c5e8bfe-3eab-4c2a-acb0-843504ddfcbd.txt},
year = {2014},
date = {2014-01-01},
abstract = {How much electricity is going to be consumed for the next 24 hours? What will be the temperature for the next three days? What will be the number of sales of a certain product for the next few months? Answering these questions often requires forecasting several future observations from a given sequence of historical observations, called a time series.
Historically, time series forecasting has been mainly studied in econometrics and statistics. In the last two decades, machine learning, a field that is concerned with the development of algorithms that can automatically learn from data, has become one of the most active areas of predictive modeling research. This success is largely due to the superior performance of machine learning prediction algorithms in many different applications as diverse as natural language processing, speech recognition and spam detection. However, there has been very little research at the intersection of time series forecasting and machine learning.
The goal of this dissertation is to narrow this gap by addressing the problem of multi-step-ahead time series forecasting from the perspective of machine learning. To that end, we propose a series of forecasting strategies based on machine learning algorithms.
Multi-step-ahead forecasts can be produced recursively by iterating a one-step-ahead model, or directly using a specific model for each horizon. As a first contribution, we conduct an in-depth study to compare recursive and direct forecasts generated with different learning algorithms for different data generating processes. More precisely, we decompose the multi-step mean squared forecast errors into the bias and variance components, and analyze their behavior over the forecast horizon for different time series lengths.
The results and observations made in this study then guide us for the development of new forecasting strategies.
In particular, we find that choosing between recursive and direct forecasts is not an easy task since it involves a trade-off between bias and estimation variance that depends on many interacting factors, including the learning model, the underlying data generating process, the time series length and the forecast horizon. As a second contribution, we develop multi-stage forecasting strategies that do not treat the recursive and direct strategies as competitors, but seek to combine their best properties. More precisely, the multi-stage strategies generate recursive linear forecasts, and then adjust these forecasts by modeling the multi-step forecast residuals with direct nonlinear models at each horizon, called rectification models. We propose a first multi-stage strategy, that we called the rectify strategy, which estimates the rectification models using the nearest neighbors model. However, because recursive linear forecasts often need small adjustments with real-world time series, we also consider a second multi-stage strategy, called the boost strategy, that estimates the rectification models using gradient boosting algorithms that use so-called weak learners.
Generating multi-step forecasts using a different model at each horizon provides a large modeling flexibility. However, selecting these models independently can lead to irregularities in the forecasts that can contribute to increase the forecast variance. The problem is exacerbated with nonlinear machine learning models estimated from short time series. To address this issue, and as a third contribution, we introduce and analyze multi-horizon forecasting strategies that exploit the information contained in other horizons when learning the model for each horizon.
In particular, to select the lag order and the hyperparameters of each model, multi-horizon strategies minimize forecast errors over multiple horizons rather than just the horizon of interest.
We compare all the proposed strategies with both the recursive and direct strategies. We first apply a bias and variance study, then we evaluate the different strategies using real-world time series from two past forecasting competitions. For the rectify strategy, in addition to avoiding the choice between recursive and direct forecasts, the results demonstrate that it has better, or at least has close performance to, the best of the recursive and direct forecasts in different settings. For the multi-horizon strategies, the results emphasize the decrease in variance compared to single-horizon strategies, especially with linear or weakly nonlinear data generating processes. Overall, we found that the accuracy of multi-step-ahead forecasts based on machine learning algorithms can be significantly improved if an appropriate forecasting strategy is used to select the model parameters and to generate the forecasts.
Lastly, as a fourth contribution, we have participated in the Load Forecasting track of the Global Energy Forecasting Competition 2012. The competition involved a hierarchical load forecasting problem where we were required to backcast and forecast hourly loads for a US utility with twenty geographical zones. Our team, TinTin, ranked fifth out of 105 participating teams, and we have been awarded an IEEE Power & Energy Society award.},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
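The recursive/direct distinction at the heart of the abstract can be sketched with a trivial linear rule standing in for a learned model (coefficients invented): the recursive strategy iterates a single one-step model, while the direct strategy fits a dedicated model per horizon.

```python
# Sketch of recursive vs direct multi-step forecasting with an AR(1)-style
# rule in place of a learned model. The coefficients are invented; in the
# thesis the models are estimated by machine learning algorithms.

def recursive_forecast(last, phi, horizon):
    """Iterate one one-step model: x_{t+h} = phi * x_{t+h-1}."""
    preds, x = [], last
    for _ in range(horizon):
        x = phi * x
        preds.append(x)
    return preds

def direct_forecast(last, phis_per_horizon):
    """One dedicated model per horizon h: x_{t+h} = phi_h * x_t."""
    return [phi_h * last for phi_h in phis_per_horizon]

print(recursive_forecast(10.0, 0.5, 3))           # prints [5.0, 2.5, 1.25]
print(direct_forecast(10.0, [0.5, 0.25, 0.125]))  # prints [5.0, 2.5, 1.25]
```

With a correctly specified linear process the two coincide, as here; the bias/variance trade-off the abstract analyzes arises once the models are estimated from finite, possibly nonlinear data.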
2013
|
Olsen, Catharina Causal inference and prior integration in bioinformatics using information theory PhD Thesis 2013, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/209401b,
title = {Causal inference and prior integration in bioinformatics using information theory},
author = {Catharina Olsen},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209401},
year = {2013},
date = {2013-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
2011
|
Miranda, Abhilash Alexander Spectral factor model for time series learning PhD Thesis 2011, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/209812b,
title = {Spectral factor model for time series learning},
author = {Abhilash Alexander Miranda},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209812},
year = {2011},
date = {2011-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
2009
|
Caelen, Olivier Sélection séquentielle en environnement aléatoire appliquée à l'apprentissage supervisé PhD Thesis 2009, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/210265b,
title = {Sélection séquentielle en environnement aléatoire appliquée à l'apprentissage supervisé},
author = {Olivier Caelen},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210265},
year = {2009},
date = {2009-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
Borgne, Yann-Aël Le Learning in wireless sensor networks for energy-efficient environmental monitoring PhD Thesis 2009, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/210334b,
title = {Learning in wireless sensor networks for energy-efficient environmental monitoring},
author = {Yann-Aël Le Borgne},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210334},
year = {2009},
date = {2009-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
Haibe-Kains, Benjamin Identification and assessment of gene signatures in human breast cancer PhD Thesis 2009, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/210348b,
title = {Identification and assessment of gene signatures in human breast cancer},
author = {Benjamin Haibe-Kains},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210348},
year = {2009},
date = {2009-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
Kontos, Kevin Gaussian graphical model selection for gene regulatory network reverse engineering and function prediction PhD Thesis 2009, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/210301,
title = {Gaussian graphical model selection for gene regulatory network reverse engineering and function prediction},
author = {Kevin Kontos},
url = {https://dipot.ulb.ac.be/dspace/bitstream/2013/210301/1/453ad8e7-667f-4c22-ab95-dc953d05b89d.txt},
year = {2009},
date = {2009-01-01},
abstract = {One of the most important and challenging "knowledge extraction" tasks in bioinformatics is the reverse engineering of gene regulatory networks (GRNs) from DNA microarray gene expression data. Indeed, as a result of the development of high-throughput data-collection techniques, biology is experiencing a data flood phenomenon that pushes biologists toward a new view of biology, systems biology, which aims at system-level understanding of biological systems.

Unfortunately, even for small model organisms such as the yeast Saccharomyces cerevisiae, the number p of genes is much larger than the number n of expression data samples. The dimensionality issue induced by this "small n, large p" data setting renders standard statistical learning methods inadequate. Restricting the complexity of the models makes it possible to deal with this serious impediment. Indeed, by introducing (a priori undesirable) bias in the model selection procedure, one reduces the variance of the selected model, thereby increasing its accuracy.

Gaussian graphical models (GGMs) have proven to be a very powerful formalism for inferring GRNs from expression data. Standard GGM selection techniques can unfortunately not be used in the "small n, large p" data setting. One way to overcome this issue is to resort to regularization. In particular, shrinkage estimators of the covariance matrix, which is required to infer GGMs, have proven to be very effective. Our first contribution is a new shrinkage estimator that improves upon existing ones through the use of a Monte Carlo (parametric bootstrap) procedure.

Another approach to GGM selection in the "small n, large p" data setting consists of reverse engineering limited-order partial correlation graphs (q-partial correlation graphs) to approximate GGMs. Our second contribution is an inference algorithm, the q-nested procedure, that builds a sequence of nested q-partial correlation graphs, exploiting the topology of the smaller-order graphs to infer the higher-order ones. This significantly speeds up the inference of such graphs and avoids problems related to multiple testing. Consequently, we are able to consider higher-order graphs, thereby increasing the accuracy of the inferred graphs.

Another important challenge in bioinformatics is the prediction of gene function. An example of such a prediction task is the identification of genes that are targets of the nitrogen catabolite repression (NCR) selection mechanism in the yeast Saccharomyces cerevisiae. The study of model organisms such as Saccharomyces cerevisiae is indispensable for the understanding of more complex organisms. Our third contribution extends the standard two-class classification approach by enriching the set of variables and comparing several feature selection techniques and classification algorithms.

Finally, our fourth contribution formulates the prediction of NCR target genes as a network inference task. We use GGM selection to infer multivariate dependencies between genes and, starting from a set of genes known to be sensitive to NCR, we classify the remaining genes. We hence avoid problems related to the choice of a negative training set and take advantage of the robustness of GGM selection techniques in the "small n, large p" data setting.},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
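The regularization idea in the abstract above, shrinking the sample covariance so that a GGM can be estimated when n < p, can be sketched in a few lines. This is an illustrative toy using a simple diagonal shrinkage target and a fixed shrinkage weight, not the thesis's Monte Carlo (parametric bootstrap) estimator:

```python
import numpy as np

def shrink_covariance(X, lam):
    """Convex combination of the sample covariance and a diagonal target.

    X: (n, p) data matrix; lam in (0, 1] is the shrinkage intensity.
    Even when the sample covariance is singular (n < p), the shrunk
    estimate is positive definite and can therefore be inverted.
    """
    S = np.cov(X, rowvar=False)      # sample covariance, singular when n < p
    T = np.diag(np.diag(S))          # shrinkage target: diagonal of S
    return (1 - lam) * S + lam * T

def partial_correlations(sigma):
    """Partial correlations from the precision matrix.

    Nonzero off-diagonal entries correspond to edges of the GGM.
    """
    K = np.linalg.inv(sigma)
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

# Usage: with n = 10 samples of p = 20 variables, np.cov alone is not
# invertible, but the shrunk estimate supports GGM-style inference.
```

Choosing lam from the data (the point of the thesis's bootstrap procedure) is what distinguishes a principled shrinkage estimator from the fixed weight used here.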
2000
|
Bontempi, Gianluca Local learning techniques for modeling, prediction and control PhD Thesis 2000, (Funder: Universite Libre de Bruxelles). @phdthesis{info:hdl:2013/211823b,
title = {Local learning techniques for modeling, prediction and control},
author = {Gianluca Bontempi},
url = {http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/211823},
year = {2000},
date = {2000-01-01},
note = {Funder: Universite Libre de Bruxelles},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|