Academic journal article Informatica Economica

Predicting Customers Churn in a Relational Database

1 Introduction

Predictive analytics has become one of today's common buzzwords. Its methods and concepts are scientific, grounded in computer programming, mathematics and statistics, yet interest in the topic has spread well beyond academia and research, first to the business and corporate world and, more recently, to the general public. In 2012, Nate Silver, editor in chief of the FiveThirtyEight blog and author of the book The Signal and the Noise, became famous for correctly predicting the winner of the United States presidential election in all 50 states and the District of Columbia. In the same year, Harvard Business Review published a frequently cited article describing the data scientist as "the sexiest job of the 21st century" [1]. In 2014, Goldman Sachs, one of the most prestigious financial institutions in the world, published a substantial paper presenting its predictions for the Football World Cup held in Brazil [2]. Nate Silver and his team at FiveThirtyEight also tried their hand and offered alternative predictions for the main football event of the year. In the same year, Warren Buffett, the well-known investor and one of the richest people on Earth, pushed the prediction challenge further by announcing his willingness to pay one billion US dollars to anyone who correctly predicted the outcome of all 63 games in that year's NCAA men's college basketball tournament [3].

In most cases, however, predictive analytics is applied to more "serious" problems, such as identifying customers with a propensity to churn, detecting fraud or online spam, assessing the risk of an investment, forecasting sales or predicting a medical diagnosis.

Many data science libraries are continuously being developed around open-source programming languages such as R and Python; they are free and quickly incorporate new algorithms. At the same time, the main commercial database systems (SQL Server, Oracle, etc.) have introduced their own data mining modules, which are easy for analysts to use and make it possible to build data models and integrate them with the relational database. Starting from data stored in such a relational database, the objective of this paper is to present and explain several models for predicting which customers have a high propensity to churn once their products expire. Identifying these cases is highly important for every business built on the subscription model (telecom, antivirus software, cable television, etc.), because reducing the churn rate improves customer retention and can positively impact profitability. According to The Chartered Institute of Marketing [4], in some cases preventing a customer from churning costs 5 to 20 times less than acquiring a new customer.
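To make the link between churn and retention concrete, the sketch below uses one common definition. This is an assumption, since the excerpt does not state the paper's definition. Under it, the churn rate for a period is the number of customers who did not renew divided by the number of customers at the start of the period, and the retention rate is its complement. The function names and the customer counts are hypothetical.

```python
def churn_rate(customers_at_start: int, churned: int) -> float:
    """Fraction of the customers present at the start of a period who churned."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= churned <= customers_at_start:
        raise ValueError("churned must be between 0 and customers_at_start")
    return churned / customers_at_start


def retention_rate(customers_at_start: int, churned: int) -> float:
    """Fraction of those customers who stayed: the complement of the churn rate."""
    return 1.0 - churn_rate(customers_at_start, churned)


if __name__ == "__main__":
    # Hypothetical counts: 1,000 subscriptions expire during the period.
    before = retention_rate(1000, 200)  # 200 customers churn
    after = retention_rate(1000, 150)   # 50 of them are kept by a retention campaign
    print(f"retention {before:.2f} -> {after:.2f}")  # prints "retention 0.80 -> 0.85"
```

Under this definition, every churner who is successfully retained raises the retention rate by exactly one over the starting customer count.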

2 The Data in the Relational Database

The starting point of this project is a relational database consisting of 12 tables, designed according to a snowflake schema and built with Microsoft SQL Server 2012. The domain is the software industry, where a company sells its products online using the subscription model mentioned above. The data cover online transactions, products, customer details, offers, licenses, customer care activities involving employees and customers, online incidents, regions, etc. For most customers, the existing data show whether they are still customers or not. However, there remain a few thousand customers (both new and renewals) whose behavior we would like to predict, so that the company can apply a commercial "treatment" to those with a high probability of churning.
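A snowflake schema differs from a star schema in that the dimension tables are themselves normalized into further tables. The sketch below illustrates the idea with Python's built-in `sqlite3` module rather than SQL Server. The four tables (`sale`, `customer`, `product`, `region`) and their columns are hypothetical simplifications, not the paper's actual 12-table design.

```python
import sqlite3

# Hypothetical, heavily simplified snowflake schema. The fact table (sale)
# references two dimensions (customer, product); the customer dimension is
# normalized one level further into a separate region table.
SCHEMA = """
CREATE TABLE region   (region_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, gender TEXT,
                       region_id   INTEGER NOT NULL REFERENCES region(region_id));
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sale     (sale_id     INTEGER PRIMARY KEY,
                       customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
                       product_id  INTEGER NOT NULL REFERENCES product(product_id),
                       amount      REAL NOT NULL);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO region VALUES (?, ?)", [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(1, "F", 1), (2, "M", 2)])
    conn.execute("INSERT INTO product VALUES (1, 'Antivirus')")
    conn.executemany(
        "INSERT INTO sale VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 30.0), (2, 1, 1, 30.0), (3, 2, 1, 45.0)],
    )
    return conn


def revenue_by_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Walk fact -> customer -> region; a star schema would need only one join."""
    return conn.execute(
        """
        SELECT r.name, SUM(s.amount)
        FROM sale AS s
        JOIN customer AS c ON c.customer_id = s.customer_id
        JOIN region   AS r ON r.region_id   = c.region_id
        GROUP BY r.name
        ORDER BY r.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_region(build_demo_db()))  # [('North', 60.0), ('South', 45.0)]
```

The extra join is the usual trade-off of a snowflake design. Normalizing the dimensions avoids repeating region data for every customer, but queries must traverse more tables.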

To build a complete profile of every customer (with as many relevant variables as possible), we created two SQL views that bring together:

* the data describing the "customers", mainly attributes such as gender, marital status, income, etc. …
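The sketch below gives a rough idea of what such a view can look like. It uses `sqlite3` instead of SQL Server, with hypothetical `customer` and `purchase` tables and a hypothetical `customer_profile` view, since the paper's actual view definitions are not shown in this excerpt. The view combines each customer's descriptive attributes with aggregated purchase behavior into one row per customer, which is the shape most modelling tools expect as input.

```python
import sqlite3

# Hypothetical tables and view; column names are illustrative only.
DDL = """
CREATE TABLE customer (
    customer_id    INTEGER PRIMARY KEY,
    gender         TEXT,
    marital_status TEXT,
    income         REAL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
-- One row per customer: descriptive attributes plus aggregated behavior.
-- LEFT JOIN keeps customers without purchases (count 0, total 0.0).
CREATE VIEW customer_profile AS
SELECT c.customer_id,
       c.gender,
       c.marital_status,
       c.income,
       COUNT(p.purchase_id)         AS n_purchases,
       COALESCE(SUM(p.amount), 0.0) AS total_spent
FROM customer AS c
LEFT JOIN purchase AS p ON p.customer_id = c.customer_id
GROUP BY c.customer_id, c.gender, c.marital_status, c.income;
"""


def load_profiles() -> list[tuple]:
    """Build a small in-memory database and return the rows of the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?, ?)",
        [(1, "F", "married", 52000.0), (2, "M", "single", 38000.0)],
    )
    conn.executemany(
        "INSERT INTO purchase VALUES (?, ?, ?)",
        [(1, 1, 30.0), (2, 1, 25.0)],  # customer 2 has no purchases yet
    )
    rows = conn.execute("SELECT * FROM customer_profile ORDER BY customer_id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in load_profiles():
        print(row)
```

Using a `LEFT JOIN` rather than an inner join matters here. Customers with no recorded purchases still need a row in the profile, otherwise they would silently drop out of the set whose churn is to be predicted.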
