2020-2021 Projects

by Fabrice Popineau

Here are the topics offered to third-year CentraleSupélec students for 2020-2021.

Subject 1 - Probabilistic machine learning

Machine learning techniques make it possible to build efficient classification or regression models from a large volume of data. Unfortunately, many of these models “don’t know what they don’t know”: whatever the input, they will make a prediction, but no indication of certainty comes with that prediction. One might think, particularly for neural networks, that the magnitude of a softmax output is an indicator of this certainty, but it turns out to be a very poor one. Even a high softmax value can hide an “I don’t know”. Various approaches have been proposed to augment classical learning models such as random forests or artificial neural networks with the ability to estimate the probability that the model result is correct. In addition, this can be studied on both noisy and noise-free datasets. The goal of the project is to identify these techniques and to evaluate their application to the two themes addressed by the chair: fraud detection and automatic trading.
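
As an illustration of the softmax problem, here is a minimal Python sketch. It assumes a hypothetical ensemble of three tiny linear classifiers (the weights, the `NEAR` and `FAR` inputs, and all function names are invented for the example). Averaging the members and measuring the entropy of the result is only one of the possible approaches, shown here because it is easy to read, not because the project prescribes it.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; 0 means a fully confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def linear_logits(weights, x):
    """Logits of a linear classifier with one weight vector per class."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]

def ensemble_prediction(ensemble, x):
    """Average the softmax outputs of all ensemble members."""
    member_probs = [softmax(linear_logits(w, x)) for w in ensemble]
    n_classes = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / len(member_probs)
            for k in range(n_classes)]

# Hypothetical ensemble: the members agree on the first feature but not on
# the second one, which is assumed to have stayed close to 0 during training.
ENSEMBLE = [
    [[1.0, 0.2], [-1.0, -0.2]],
    [[1.0, -0.3], [-1.0, 0.3]],
    [[1.0, 0.0], [-1.0, 0.0]],
]
NEAR = (2.0, 0.0)   # resembles the assumed training data
FAR = (1.0, 50.0)   # far from anything the models are assumed to have seen

single_far = softmax(linear_logits(ENSEMBLE[0], FAR))
print("single model, far input, max softmax:", max(single_far))
print("ensemble entropy, near input:", entropy(ensemble_prediction(ENSEMBLE, NEAR)))
print("ensemble entropy, far input: ", entropy(ensemble_prediction(ENSEMBLE, FAR)))
```

In this toy setting, a single member is almost fully confident on the far input, while the disagreement between members shows up as a higher entropy of the averaged prediction.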

Subject 2 - Predictive models based on autoencoders

Many of the discretionary and automatic strategies highlighted in the trading literature are based on the notion of patterns. They bring into play both raw price sequences and combinations of indicators. Most of the time, these patterns codified by technical analysts rest mainly on perception bias. Those based on technical indicators do not withstand backtesting, and those based on purely technical forms are too subjective to be effectively quantified.

Machine learning-based approaches make it possible to build predictive models that are free of any perception bias. The hypothesis of this project is that there are many patterns in markets, of which only a fraction has predictive capability, and that these patterns are probably made up of microstructures too complex to be identified by a human or by simple algorithms.

Some deep learning approaches, such as autoencoders, make it possible to extract latent representations of complex structures. Only the features that best characterize the variance of the data remain. These latent representations can then be used as input to other models such as classifiers.
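
To make the idea of a latent representation concrete, here is a minimal sketch of a linear autoencoder in plain Python. It is a deliberately simplified stand-in for a deep autoencoder: the 2-D toy data, the one-dimensional latent code, the learning rate, and all names are assumptions made for the example.

```python
def encode(x, enc):
    """Project a 2-D point onto the 1-D latent code z."""
    return enc[0] * x[0] + enc[1] * x[1]

def reconstruction_error(data, enc, dec):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        z = encode(x, enc)
        total += sum((dec[j] * z - x[j]) ** 2 for j in range(2))
    return total / len(data)

def train_linear_autoencoder(data, lr=0.05, steps=500):
    """Fit a 2-D -> 1-D -> 2-D linear autoencoder by full-batch gradient descent."""
    enc = [0.1, 0.1]  # encoder weights: z = enc . x
    dec = [0.1, 0.1]  # decoder weights: x_hat = dec * z
    n = len(data)
    for _ in range(steps):
        g_enc = [0.0, 0.0]
        g_dec = [0.0, 0.0]
        for x in data:
            z = encode(x, enc)
            resid = [dec[j] * z - x[j] for j in range(2)]
            # Gradient of the squared error with respect to the latent code.
            dz = 2.0 * (resid[0] * dec[0] + resid[1] * dec[1])
            for j in range(2):
                g_dec[j] += 2.0 * resid[j] * z / n
                g_enc[j] += dz * x[j] / n
        enc = [enc[j] - lr * g_enc[j] for j in range(2)]
        dec = [dec[j] - lr * g_dec[j] for j in range(2)]
    return enc, dec

# Toy data lying on a line: one latent dimension is enough to describe it.
DATA = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

initial_error = reconstruction_error(DATA, [0.1, 0.1], [0.1, 0.1])
enc, dec = train_linear_autoencoder(DATA)
final_error = reconstruction_error(DATA, enc, dec)
print("error before training:", initial_error)
print("error after training: ", final_error)
print("latent codes:", [round(encode(x, enc), 3) for x in DATA])
```

The latent codes printed at the end are the kind of compact representation that could then be passed to a classifier or a clustering step.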

This project consists in combining deep learning approaches based on variational autoencoders with clustering, in order to extract groups of similar patterns from time series and to identify those among them that can predict the direction of the market with sufficiently high accuracy.
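
The last step, identifying which groups of patterns have predictive ability, could be sketched as follows in Python. The sketch assumes that cluster labels have already been produced by some encoder and clustering step (not shown). The sample data, the thresholds `min_accuracy` and `min_count`, and the function names are hypothetical.

```python
from collections import defaultdict

def directional_accuracy_by_cluster(cluster_ids, next_returns):
    """For each cluster, return (majority direction, hit rate, sample count).

    A return of exactly 0 is counted as "down" in this sketch.
    """
    ups = defaultdict(int)
    counts = defaultdict(int)
    for cid, ret in zip(cluster_ids, next_returns):
        counts[cid] += 1
        if ret > 0:
            ups[cid] += 1
    stats = {}
    for cid, n in counts.items():
        up_rate = ups[cid] / n
        direction = "up" if up_rate >= 0.5 else "down"
        stats[cid] = (direction, max(up_rate, 1.0 - up_rate), n)
    return stats

def select_predictive_clusters(stats, min_accuracy=0.6, min_count=3):
    """Keep clusters that are both accurate enough and large enough."""
    return sorted(cid for cid, (_, acc, n) in stats.items()
                  if acc >= min_accuracy and n >= min_count)

# Hypothetical cluster label of each pattern and the return that followed it.
CLUSTER_IDS = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
NEXT_RETURNS = [0.4, 0.1, 0.3, -0.2, 0.2, -0.1, -0.3, 0.1, -0.5, -0.2]

STATS = directional_accuracy_by_cluster(CLUSTER_IDS, NEXT_RETURNS)
SELECTED = select_predictive_clusters(STATS)
print(STATS)
print("predictive clusters:", SELECTED)
```

The minimum count guards against tiny clusters that look perfect by chance. Any such selection would still have to be validated on data that was not used to form the clusters.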

Subject 3 - FraudMemory

Article [1] presents a very elaborate fraud detection architecture that addresses several aspects of the problem, in particular concept drift, i.e. the change in customers' consumption habits. This machine learning architecture is complex: it encodes data in the form of graphs, and it uses attention mechanisms and memory networks [2], which are a special class of learning models. The results presented in [1] are very good, and we would like to confirm them by reimplementing this model and testing it on real data.

[1] Yang, K., & Xu, W. (2019, January 8). FraudMemory: Explainable Memory-Enhanced Sequential Neural Networks for Financial Fraud Detection. https://doi.org/10.24251/HICSS.2019.126

[2] Weston, J., Chopra, S., & Bordes, A. (2014). Memory Networks. ArXiv:1410.3916 [Cs, Stat]. http://arxiv.org/abs/1410.3916

Subject 4 - From machine learning model to rules

Payment fraud detection systems were first implemented by manually writing rules. Today, powerful machine learning methods let us build more sophisticated detection systems. For example, models based on random forests perform very well. However, these learned models are also very opaque. Bank operators would like to keep using rule-based systems for their human readability while benefiting from the contributions of machine learning.

The goal of the project will therefore be to evaluate, for different machine learning methods, the ability to translate one of their models back into the form of rules. As far as approaches based on neural networks are concerned, this places the subject in the field of neuro-symbolic research. We will of course also be interested in the possible loss of accuracy of the model, since the size of the resulting set of rules will have to remain within reasonable limits.
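
For tree-based models, the simplest form of this translation is to read one rule off each root-to-leaf path. Here is a minimal Python sketch of that idea. The dictionary representation of the tree, the feature names `amount` and `foreign_merchant`, and the thresholds are all hypothetical.

```python
def tree_to_rules(node, conditions=()):
    """Flatten a decision tree into one (conditions, label) pair per leaf."""
    if "label" in node:
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    left = tree_to_rules(node["left"],
                         conditions + (f"{feature} <= {threshold}",))
    right = tree_to_rules(node["right"],
                          conditions + (f"{feature} > {threshold}",))
    return left + right

def format_rule(conditions, label):
    """Render a rule in a human-readable IF ... THEN ... form."""
    premise = " AND ".join(conditions) if conditions else "TRUE"
    return f"IF {premise} THEN {label}"

# Hypothetical tree: internal nodes test a feature against a threshold,
# leaves carry a class label.
TREE = {
    "feature": "amount", "threshold": 1000,
    "left": {"label": "legit"},
    "right": {
        "feature": "foreign_merchant", "threshold": 0,
        "left": {"label": "legit"},
        "right": {"label": "fraud"},
    },
}

RULES = [format_rule(conds, label) for conds, label in tree_to_rules(TREE)]
for rule in RULES:
    print(rule)
```

A single tree yields one rule per leaf, so a forest of deep trees quickly produces a rule set too large to read. This is where the trade-off between accuracy and the size of the rule set comes in.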

Subject 5 - Quantum computing for the banking domain

Quantum algorithms make use of the quantum properties of matter. In some cases, this yields a gain in complexity over classical methods for solving conventional problems. This theoretical gain in complexity does not necessarily translate into an effective practical gain on a particular instance.

This project focuses on the problems arising from the banking field that Lusis encounters: fraud detection and automatic trading, which are the themes of the chair, but also other problems such as recommendation algorithms. The aim of the project is to understand to what extent these problems lend themselves to the use of quantum techniques, from both a theoretical and a practical point of view.

Subject 6 - Temporal point processes and machine learning

Recent work seeks to exploit the mathematical models of temporal point processes in automatic trading. These models have been studied for many years in a probabilistic framework, but they remain little studied in combination with deep learning [1, 2]. Hawkes processes form a special class of temporal point processes in which past events contribute to the current intensity of the process. The appeal of Hawkes processes is that they provide a very general underlying model, applicable to a wide variety of phenomena that go far beyond the evolution of stock prices (natural phenomena, epidemiological phenomena, etc.).
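
To show how past events enter the intensity, here is a minimal Python sketch of a univariate Hawkes process. It assumes an exponential kernel, a common choice but only one among several. The parameter values `mu`, `alpha`, `beta` and the event times are hypothetical.

```python
import math

def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity with an exponential kernel.

    lambda(t) = mu + sum over past events t_i < t of
                alpha * exp(-beta * (t - t_i))
    mu is the baseline rate, alpha the jump caused by each event,
    and beta the speed at which that excitation decays.
    """
    return mu + sum(alpha * math.exp(-beta * (t - t_i))
                    for t_i in event_times if t_i < t)

# Hypothetical event times, e.g. timestamps of trades.
EVENTS = [1.0, 1.3, 4.0]

for t in (0.5, 1.1, 1.4, 3.0, 4.1):
    print(f"lambda({t}) = {hawkes_intensity(t, EVENTS):.3f}")
```

Each event raises the intensity by `alpha`, and the effect then fades at rate `beta`, so clustered events reinforce one another. With this kernel, the ratio `alpha / beta` has to stay below 1 for the process not to explode.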

[1] Yan, J., Xu, H., & Li, L. (2019). Modeling and Applications for Temporal Point Processes. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 3227-3228. https://doi.org/10.1145/3292500.3332298

[2] Xu, H. (n.d.). Modeling and Applications for Temporal Point Processes: Part I.