Explainability applied to fraud detection

October 1, 2021

Project overview

The project explores the application of supervised deep learning techniques to fraud detection in credit card payments, a critical issue in the banking industry, with financial losses estimated at $25 billion annually. Traditional fraud detection methods are costly and limited, creating a growing demand for more efficient solutions based on machine learning (ML).

Fraud detection poses unique challenges in statistical learning. Fraudulent transactions represent only a small fraction of total transactions, leading to imbalanced datasets that can negatively impact model performance. Additionally, fraudsters continuously adapt their tactics, causing concept drift, where test data distributions differ from training data, further complicating detection.
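One standard mitigation for the imbalance described above is to reweight the training loss by inverse class frequency, so that errors on the rare fraud class cost more. A minimal sketch; the 0.5% fraud rate and the helper name are illustrative assumptions, not figures from the project:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights: the rare (fraud) class receives a
    proportionally larger weight, so misclassifying it costs more in training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Illustrative dataset with a 0.5% fraud rate: 995 legitimate (0), 5 fraudulent (1)
labels = [0] * 995 + [1] * 5
weights = class_weights(labels)
# weights[1] / weights[0] == 995 / 5, i.e. errors on fraud weigh 199x more
```

These weights can be passed to most classifiers (e.g. as per-class loss weights), and the same idea underlies resampling strategies such as oversampling the minority class.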

Recent studies challenge the dominance of Gradient Boosted Trees (e.g., XGBoost) in tabular data classification, suggesting that deep learning (DL) models may offer superior performance [1] [2] [3]. While deep learning has demonstrated success in image and text processing, its effectiveness on tabular data remains an open research question.

The project aims to apply and evaluate various deep learning architectures for fraud detection, initially using synthetic datasets and later real-world data from LUSIS. Key objectives include assessing the impact of data imbalance on model performance, comparing deep learning models with traditional approaches like XGBoost and CatBoost, and implementing advanced deep learning-based fraud detection techniques.
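One way to assess the impact of data imbalance "across different fraud rates" is to rebuild the training set at each target rate by downsampling the fraud class against a fixed pool of legitimate transactions. A minimal sketch; the function name and sampling scheme are assumptions, not the project's actual protocol:

```python
import random

def subsample_to_fraud_rate(legit, fraud, rate, seed=0):
    """Return a dataset with approximately the target fraud rate, built from a
    fixed pool of legitimate transactions plus a random sample of frauds."""
    rng = random.Random(seed)
    n_fraud = round(rate / (1 - rate) * len(legit))
    if n_fraud > len(fraud):
        raise ValueError("fraud pool too small for the requested rate")
    return legit + rng.sample(fraud, n_fraud)

# Illustrative pools: 10,000 legitimate rows, 500 fraudulent rows
legit_pool = [("legit", i) for i in range(10_000)]
fraud_pool = [("fraud", i) for i in range(500)]
data = subsample_to_fraud_rate(legit_pool, fraud_pool, rate=0.01)
# len(data) == 10_101, of which 101 rows are fraud (~1%)
```

Holding the legitimate pool fixed while varying only the fraud rate keeps the comparisons between models attributable to the imbalance level rather than to a change in the majority-class sample.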

The study involves implementing baseline models, including shallow models (XGBoost, CatBoost) and deep models (ResNet, MLP with Linear-ReLU-Dropout layers), followed by testing at least one advanced deep learning approach from recent literature. Hyperparameter optimization and performance comparisons will be conducted across different fraud rates.
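The MLP baseline described above can be sketched in PyTorch (assumed here; the layer widths, dropout rate, and 30-feature input are illustrative choices, not the project's actual configuration):

```python
import torch
import torch.nn as nn

def make_mlp(n_features, hidden=64, dropout=0.1):
    """Stack of Linear-ReLU-Dropout blocks ending in a single logit
    for binary (fraud vs. legitimate) classification."""
    return nn.Sequential(
        nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(hidden, 1),  # raw logit; pair with nn.BCEWithLogitsLoss
    )

model = make_mlp(n_features=30)   # e.g. 30 engineered transaction features
x = torch.randn(8, 30)            # a batch of 8 transactions
logits = model(x)                 # shape (8, 1)
```

Training such a baseline would typically use `nn.BCEWithLogitsLoss`, whose `pos_weight` argument offers another way to compensate for the class imbalance.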

A final scientific article summarizing the findings will be prepared for potential publication on arXiv, along with a GitLab repository to facilitate code reuse.

References:

[1] A. Kadra, M. Lindauer, F. Hutter, and J. Grabocka. Regularization is all you need: Simple neural nets can excel on tabular data, 2021.

[2] Y. Gorishniy, I. Rubachev, V. Khrulkov, and A. Babenko. Revisiting deep learning models for tabular data, 2021.

[3] G. Somepalli, M. Goldblum, A. Schwarzschild, C. B. Bruss, and T. Goldstein. SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training, 2021.