This is Part 4 of the case study Boost Debt Collections and Recoveries using Machine Learning (MLBR): a machine learning predictive model that enhances the existing recovery system by creating focus groups for the business to boost debt collection.
Disclaimer: This case study is solely an educational exercise, and the information it contains is to be used only as a teaching example. This hypothetical case study is provided for illustrative purposes only and does not represent an actual client or an actual client's experience. All of the data, contents, and information presented here have been altered and edited to protect the confidentiality and privacy of the company.
In response to readers' demand, we've published a thorough and insightful real-world case study on Gumroad. It is an end-to-end machine learning case study that will help you increase debt recovery by improving the traditional debt collection system of your credit company.
This document provides everything you need to build a successful machine learning model: from data collection (variables) to data preparation, from feature engineering to attribute importance, from model development to model evaluation, and from cross-validation to production deployment and results.
As you may be aware, data is at the heart of every data science project. Over 85 percent of data science projects fail because the team does not understand the data and how it should be used in the model. If the variables are properly constructed, fitting a model to the data is only 15% of the effort, and any developer can accomplish it by following the guidelines and the model described in this document.
Data is the backbone of a machine learning model. This document describes all of the data elements (variables) that were gathered and used in the model, along with the method used to determine attribute importance and reduce dimensionality. It describes the whole data pipeline design, including every transformation that was performed, in a step-by-step, easy-to-understand manner that anyone who can program in R or Python can follow. It also discusses how historical data was labeled to create the classifier's training dataset.
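To make the attribute-importance idea concrete, here is a minimal filter-style sketch in Python: candidate variables are ranked by their absolute correlation with the recovery label, and only the top k are kept. The variable names and toy data are hypothetical, and the actual project may well have used a different importance technique; this only illustrates the general approach of scoring and pruning features before modeling.

```python
# Hypothetical sketch: rank candidate variables by absolute correlation with
# the (binary) recovery label and keep the top k -- a simple filter method
# for attribute importance and dimensionality reduction.
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def top_k_features(features, label, k=2):
    """features: dict of name -> list of values; label: list of 0/1."""
    scored = {name: abs(pearson(vals, label)) for name, vals in features.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Toy data: days_past_due and balance track non-recovery; zip_digit is noise.
features = {
    "days_past_due": [10, 95, 120, 5, 60, 150],
    "balance":       [500, 900, 1200, 300, 700, 1500],
    "zip_digit":     [3, 1, 4, 1, 5, 9],
}
recovered = [1, 0, 0, 1, 1, 0]

print(top_k_features(features, recovered))
```

In a real pipeline the same scoring-and-pruning step would typically be done with a library (for example, model-based importances from a tree ensemble), but the filter shape stays the same: score every variable against the label, then drop the weakest.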
The role of Subject Matter Experts (SMEs) is vital; this document therefore also describes how, with their aid, several types of scoring models were developed that proved extremely useful in a production environment.
Model evaluation is a critical part of the project and must be carried out with care. The document discusses the difficulties that textbook cross-validation can cause in a production environment, and why a different, more pragmatic cross-validation approach was used in this project. It also explains which models were created, how they were created, and the data used to train and test them.
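The text does not spell out the pragmatic validation approach here, but one common production-oriented alternative to random k-fold splits is out-of-time validation: train on older accounts and evaluate on newer ones, mimicking how the model actually meets data in production. The sketch below is only an illustration of that idea, with hypothetical field names and dates; it is not presented as the project's exact method.

```python
# Hypothetical sketch of out-of-time validation: split accounts at a time
# cutoff so the model is trained on the past and evaluated on the "future",
# rather than on a random shuffle that can leak temporal information.
from datetime import date

def out_of_time_split(records, cutoff):
    """Split records (each with a 'placed_on' date) at a time cutoff."""
    train = [r for r in records if r["placed_on"] < cutoff]
    test = [r for r in records if r["placed_on"] >= cutoff]
    return train, test

accounts = [
    {"id": 1, "placed_on": date(2020, 1, 15)},
    {"id": 2, "placed_on": date(2020, 3, 2)},
    {"id": 3, "placed_on": date(2020, 6, 20)},
    {"id": 4, "placed_on": date(2020, 8, 9)},
]
train, test = out_of_time_split(accounts, date(2020, 6, 1))
print([r["id"] for r in train], [r["id"] for r in test])
```

Repeating this split over several rolling cutoffs gives a temporal analogue of k-fold cross-validation, which tends to give a more honest picture of how a collections model will behave once deployed.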
Finally, the document details the gains achieved once the model was put into production. This is not a tutorial for learning how to code; it is aimed at data scientists who work in a production setting and want to create a successful machine learning model.