October 1, 2023
Project overview
This study is conducted within the Lusis - CentraleSupélec - CNRS Chair (https://chaire-lusis.centralesupelec.fr/), in collaboration with Lusis AI (https://www.lusisai.com/), which is also the project’s final client. Fraud detection in credit card payments is a major challenge: global financial losses were estimated at $28.58 billion in 2021, roughly the GDP of Iceland. As a result, the banking and retail sectors have a strong demand for more efficient automated fraud detection methods.
Fraud detection relies on two primary approaches:
- Rule-based systems, which apply expert-defined conditions (e.g., IF (Amount > x) AND (Currency == $) → Fraud). These rules are costly to acquire and maintain.
- Machine Learning (ML) and Deep Learning (DL) models, which learn fraud patterns from labeled transaction datasets.
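To make the rule-based approach concrete, here is a minimal Python sketch of the example rule above. The `Transaction` fields, the threshold of 1000.0, and the "USD" currency code are illustrative assumptions, not values from the project.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Transaction:
    # Hypothetical fields; a real payment record has many more attributes.
    amount: float
    currency: str


Rule = Callable[[Transaction], bool]


def amount_currency_rule(threshold: float, currency: str) -> Rule:
    """Build the rule IF (Amount > threshold) AND (Currency == currency)."""
    def rule(tx: Transaction) -> bool:
        return tx.amount > threshold and tx.currency == currency
    return rule


def is_flagged(tx: Transaction, rules: List[Rule]) -> bool:
    """A transaction is flagged as soon as any expert rule fires."""
    return any(rule(tx) for rule in rules)


# Assumed threshold and currency, for illustration only.
rules = [amount_currency_rule(1000.0, "USD")]
examples = [
    Transaction(1500.0, "USD"),
    Transaction(1500.0, "EUR"),
    Transaction(20.0, "USD"),
]
flags = [is_flagged(tx, rules) for tx in examples]
print(flags)  # [True, False, False]
```

Each new fraud pattern requires an expert to write and tune another rule of this kind, which is where the acquisition and maintenance cost comes from.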
While ML methods can outperform rule-based approaches, implementing them effectively is challenging due to two key characteristics of fraud detection:
- Highly imbalanced data – Non-fraudulent transactions vastly outnumber fraudulent ones, so a model can score high accuracy while missing most fraud.
- Concept drift – Fraud patterns change over time, requiring adaptive models to remain effective.
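The two characteristics above can be illustrated with a small sketch. It assumes a toy label vector with a 0.5% fraud rate (not the project's actual rate) and shows why accuracy is misleading under imbalance, how inverse-frequency class weights are commonly derived, and how a chronological train/test split lets later data expose drift. All function names are hypothetical helpers.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of fraudulent transactions (label 1) that were caught."""
    positives = sum(1 for t in y_true if t == 1)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / positives if positives else 0.0


def balanced_class_weights(y_true):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(y_true)
    n = len(y_true)
    return {label: n / (len(counts) * count) for label, count in counts.items()}


def chronological_split(records, train_fraction):
    """Train on the earliest records and test on the latest ones.

    Each record is assumed to start with a sortable timestamp.
    """
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy data: 5 frauds among 1000 transactions (assumed rate).
y_true = [1] * 5 + [0] * 995
y_pred = [0] * 1000  # a "model" that never predicts fraud

baseline_accuracy = accuracy(y_true, y_pred)  # 0.995
baseline_recall = recall(y_true, y_pred)      # 0.0
weights = balanced_class_weights(y_true)      # fraud class weighted 100.0

records = [(3, "c"), (1, "a"), (4, "d"), (2, "b")]
train, test = chronological_split(records, 0.75)
```

A model that never predicts fraud reaches 99.5% accuracy here while catching no fraud at all, which is why recall, precision, or similar metrics are usually preferred in this setting. A random split would mix past and future transactions and could hide the effect of drift.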
Each payment transaction links a cardholder to a merchant, creating a relational structure that is crucial for fraud detection. The project aims to develop a hybrid model combining graph-based learning and time-series modeling:
- Graph representation of transactions, where embeddings will be computed to capture relational fraud patterns.
- Time-series analysis, using LSTMs, GRUs, or Transformers to model the longitudinal evolution of transactions.
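As a rough illustration of the two views of the data, the sketch below builds a bipartite cardholder-merchant graph and per-card chronological sequences from the same transactions. The tuple layout and identifiers are assumptions made for the example; the embedding and sequence models themselves (LSTMs, GRUs, Transformers) are out of scope here.

```python
from collections import defaultdict

# Assumed layout: (timestamp, card_id, merchant_id, amount).
transactions = [
    (3, "card_A", "shop_1", 20.0),
    (1, "card_A", "shop_2", 35.0),
    (2, "card_B", "shop_1", 12.5),
    (4, "card_B", "shop_1", 99.0),
]


def build_bipartite_graph(txs):
    """Adjacency sets for both sides of the cardholder-merchant graph."""
    card_to_merchants = defaultdict(set)
    merchant_to_cards = defaultdict(set)
    for _, card, merchant, _ in txs:
        card_to_merchants[card].add(merchant)
        merchant_to_cards[merchant].add(card)
    return card_to_merchants, merchant_to_cards


def card_sequences(txs):
    """Group transactions by card, ordered by timestamp."""
    sequences = defaultdict(list)
    for ts, card, merchant, amount in sorted(txs, key=lambda t: t[0]):
        sequences[card].append((ts, merchant, amount))
    return sequences


card_to_merchants, merchant_to_cards = build_bipartite_graph(transactions)
sequences = card_sequences(transactions)

# Node degree is one of the simplest relational features.
merchant_degree = {m: len(cards) for m, cards in merchant_to_cards.items()}
```

In a hybrid model of the kind described, the graph view would feed an embedding method and the per-card sequences would feed the time-series model; how the two are combined is a design decision this sketch does not address.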
The final goal is to predict whether an upcoming transaction is fraudulent while minimizing false positives: mistakenly rejecting legitimate payments is highly detrimental to banks and can lead to customer loss.
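One common way to encode the false-positive constraint is to choose the decision threshold on the model's fraud scores so that the false positive rate stays under a budget. The sketch below assumes scores in [0, 1], labels where 1 means fraud, and a 25% budget chosen only to keep the toy example small; none of these come from the project.

```python
def false_positive_rate(y_true, scores, threshold):
    """Share of legitimate transactions (label 0) that would be rejected."""
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not negatives:
        return 0.0
    return sum(1 for s in negatives if s >= threshold) / len(negatives)


def recall_at(y_true, scores, threshold):
    """Share of fraudulent transactions (label 1) that would be caught."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    if not positives:
        return 0.0
    return sum(1 for s in positives if s >= threshold) / len(positives)


def pick_threshold(y_true, scores, max_fpr):
    """Smallest observed score whose false positive rate fits the budget.

    The false positive rate can only decrease as the threshold rises, so
    the first threshold that satisfies the budget also gives the highest
    recall among the thresholds that satisfy it.
    """
    for threshold in sorted(set(scores)):
        if false_positive_rate(y_true, scores, threshold) <= max_fpr:
            return threshold
    return float("inf")  # no threshold fits: flag nothing


# Toy validation data (assumed).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.4, 0.7, 0.6, 0.9]

threshold = pick_threshold(y_true, scores, max_fpr=0.25)
print(threshold)                             # 0.6
print(recall_at(y_true, scores, threshold))  # 1.0
```

In practice the budget would be far tighter than 25% and the threshold would be selected on held-out data, then re-checked as fraud patterns shift.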
Deliverables:
- Dataset analysis (600 million transactions), including filtering based on geographic, commercial, and card-related criteria.
- Development of a hybrid fraud detection model combining graph learning and time-series models.
- Testing and benchmarking against state-of-the-art models, particularly Gradient Boosting Trees.
- Code implementation and an experimental report summarizing findings and performance evaluations.