Master theses

Master Theses Topics – 2023/24

MLG proposes the following MA thesis topics for this academic year.

NB: The number of topics is limited. If you are interested, please contact the supervisor as soon as possible.

Communication methods in Multi Agent Reinforcement Learning (T. Lenaerts, Y. Molinghen)

In human interactions, it is extremely common to communicate information with each other to achieve a task. The nature of this information can be diverse, from advertising intents to communicating knowledge or information about how our environment works.
Yet in the field of Multi-Agent Reinforcement Learning, agents that must accomplish complex cooperative tasks are rarely given any means of communicating with each other.
The objective of this proposal is to investigate the existing communication methods (see references) and to extend value-based algorithms (e.g. Deep Q-learning, VDN, QMIX) with communication.
Your scientific contribution will be the comparison and discussion of the relevance of using certain combinations of algorithms and communication methods. Your contribution can be further extended by proposing new communication methods and comparing them to existing ones.
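As a rough illustration of the value-decomposition idea behind VDN, the sketch below sums per-agent Q-values into a joint value and lets each agent pick its greedy action independently. The function names and the Q-values are hypothetical and chosen only for this example; no communication is modelled here.

```python
from itertools import product


def joint_q(per_agent_q, joint_action):
    """VDN-style joint value: the sum of each agent's Q-value for its own action."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_joint_action(per_agent_q):
    """Each agent maximises its own Q-values. Because the joint value is a
    plain sum, this decentralised choice also maximises the joint value."""
    return tuple(max(range(len(q)), key=lambda a: q[a]) for q in per_agent_q)


# Hypothetical Q-values for two agents with three actions each.
q_values = [[0.1, 0.7, 0.2], [0.5, 0.4, 0.9]]
best = greedy_joint_action(q_values)

# Brute-force check over all joint actions, feasible only for tiny problems.
brute_force = max(product(range(3), repeat=2),
                  key=lambda joint_action: joint_q(q_values, joint_action))
```

A communication method would typically change what each agent's Q-function receives as input (for example its observation together with received messages); that part is deliberately left out of this sketch.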

– Vanneste, Simon, Astrid Vanneste, Stig Bosmans, Siegfried Mercelis, and Peter Hellinckx. Learning to Communicate with Multi-agent Reinforcement Learning Using Value-Decomposition Network. _International Conference on P2P, Parallel, Grid, Cloud and Internet Computing_.
– Sunehag, Peter, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, et al. Value-Decomposition Networks for Cooperative Multi-Agent Learning Based on Team Reward. _Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS_ 3 (2018): 2085–87.
– Tampuu, Ardi, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. Multiagent Cooperation and Competition with Deep Reinforcement Learning. _PLoS ONE_ 12, no. 4 (2017): 115.
– Kajić, Ivana, Eser Aygün, and Doina Precup. Learning to Cooperate: Emergent Communication in Multi-Agent Navigation 1, no. 1969 (2020).
– Peng, Zhaoqing, Libo Zhang, and Tiejian Luo. Learning to Communicate via Supervised Attentional Message Processing. _ACM International Conference Proceeding Series_, no. Nips (2018): 11–16.



Yannick Molinghen, Tom Lenaerts

Comparing value-based methods to policy gradient algorithms in Multi Agent Reinforcement Learning

There exist two main families of reinforcement learning algorithms: value-based methods such as Q-learning, which learn a value for each action, and policy gradient methods such as Actor-Critic, which directly optimise the action probability distribution.
While both families are widely studied in single-agent settings, policy gradient methods are less studied in multi-agent ones.
During this master thesis, you will first have to become familiar with single-agent policy gradient algorithms and be able to apply them to a specific environment.
From that point, your scientific contribution is manifold. First, you will compare the performance of your agents with that of existing value-based algorithms and discuss the results in light of the literature. Then you will discuss how relevant this comparison is, given the different natures of these algorithms (value-based vs. policy gradient) and of their training processes. Moreover, you will contribute to the development of an existing framework by adding new algorithms to it. Finally, you can propose extensions of existing algorithms and compare your results with existing ones.
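To make the difference between the two families concrete, here is a minimal single-agent sketch: a tabular Q-learning update next to a REINFORCE-style update for a softmax policy over action preferences. The function names, learning rates and data layout are assumptions made for this illustration, not part of the framework mentioned above.

```python
import math


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Value-based: move Q(s, a) towards the bootstrapped target
    reward + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    highest = max(prefs)
    exps = [math.exp(p - highest) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_update(prefs, action, ret, lr=0.1):
    """Policy gradient (REINFORCE-style): ascend ret * grad log pi(action).
    For a softmax policy, d log pi(action) / d prefs[b] = 1[b == action] - pi(b)."""
    probs = softmax(prefs)
    for b in range(len(prefs)):
        grad_log = (1.0 if b == action else 0.0) - probs[b]
        prefs[b] += lr * ret * grad_log
```

The value-based update changes an estimate of how good an action is, while the policy gradient update changes the action probabilities directly, which is one reason why comparing the two requires care.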

– Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal Policy Optimization Algorithms. _ArXiv_, 2017, 1–12.
– Mnih, Volodymyr, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous Methods for Deep Reinforcement Learning. _ArXiv_, 16 June 2016.
– Lowe, Ryan, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. _Advances in Neural Information Processing Systems_ 2017-Decem (2017): 6380 9
– Sunehag, Peter, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, et al. Value-Decomposition Networks for Cooperative Multi-Agent Learning Based on Team Reward. _Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS_ 3 (2018): 2085–87.



Yannick Molinghen, Tom Lenaerts

Assessment of neuromorphic computing for predictive tasks (Gianluca Bontempi)


Though neuromorphic computing is expected to significantly reduce the energy requirements of learning in AI, it does not yet meet expectations in terms of accuracy. This MA thesis will focus on assessing neuromorphic computing for predictive tasks. In particular, the student should assess the trade-off between energy consumption and accuracy compared to conventional machine learning techniques. The student should be particularly proficient in Python programming, be registered for the MA module on computational intelligence, and have a passion for interdisciplinary applied research.


Fast variable selection without shrinkage (Maarten Jansen)

The selection of an optimal model from a broad spectrum of non-nested models can be driven by a criterion that balances a good prediction of the training set against the complexity of the model, that is, the number of selected variables. Optimization over the number of variables, or even comparison of models with a given number of variables, is a problem of combinatorial complexity, and thus not feasible in the context of high-dimensional data. Part of the problem can be well approximated by replacing the number of selected variables in the criterion with the sum of absolute values of the estimators of these variables within the selected model. The counting measure is thus replaced by a sum of magnitudes, turning a combinatorial problem into a convex quadratic programming problem. This problem can be solved by a wide range of algorithms, including direct methods, such as least angle regression, and iterative methods, such as iterative thresholding or gradient projection. Moreover, for a fixed value of model complexity, the relaxed problem selects approximately the same model as the original combinatorial one. This is no longer the case when the model complexity is part of the optimization problem, but a correction for the divergence between the combinatorial and quadratic problems can be established. The thesis is about the application of this variable selection to sparse inverse problems, or to deblurring and denoising images, using gradient projection or iterative thresholding.
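As an illustration of the iterative thresholding mentioned above, the following sketch applies iterative soft-thresholding (ISTA) to the relaxed problem, written here as minimising 0.5 * ||y - X b||^2 + lam * sum |b_j|. The function names, the step-size choice and the pure-Python data layout are assumptions made for this example; the correction for the divergence between the combinatorial and relaxed problems is not included.

```python
def soft_threshold(value, threshold):
    """Shrink a coefficient towards zero; values within the threshold become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(X, y, lam, n_iter=500):
    """Iterative soft-thresholding for 0.5 * ||y - X b||^2 + lam * ||b||_1.
    X is a list of rows, y a list of observations."""
    n, p = len(X), len(X[0])
    # The squared Frobenius norm bounds the largest eigenvalue of X^T X,
    # so its inverse is a safe (conservative) step size.
    step = 1.0 / sum(v * v for row in X for v in row)
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Gradient of the quadratic term: -X^T (y - X beta).
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam)
                for j in range(p)]
    return beta
```

With an orthonormal design the result reduces to soft-thresholding of the observations, which shows both the selection (small coefficients set exactly to zero) and the shrinkage (large coefficients reduced by lam) that the relaxed criterion introduces.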

Causal inference in temporal data (Gianluca Bontempi and Gianmarco Paldino)

The thesis will focus on the design and implementation of computational data-driven methods to infer causal directionality from multivariate time series.
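One classical, purely predictive notion of directionality is Granger-style causality: x is said to help predict y if adding the past of x to a model of y reduces the prediction error. The sketch below is only an illustration under strong assumptions (lag 1, linear models, zero-mean series, no significance test), and is not necessarily the approach pursued in this thesis. All function names and the synthetic data are hypothetical.

```python
import random


def rss(predictors, target):
    """Residual sum of squares of a least-squares fit without intercept
    (the series are assumed to be zero-mean)."""
    k = len(predictors)
    # Normal equations A w = b, with A = P^T P and b = P^T target.
    A = [[sum(u * v for u, v in zip(predictors[i], predictors[j]))
          for j in range(k)] for i in range(k)]
    b = [sum(u * t for u, t in zip(predictors[i], target)) for i in range(k)]
    # Plain Gaussian elimination; adequate for these tiny systems.
    for i in range(k):
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            A[j] = [ajc - f * aic for ajc, aic in zip(A[j], A[i])]
            b[j] -= f * b[i]
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    fitted = [sum(w[i] * predictors[i][t] for i in range(k))
              for t in range(len(target))]
    return sum((obs - fit) ** 2 for obs, fit in zip(target, fitted))


def predictive_gain(cause, effect):
    """Ratio RSS(effect from its own past) / RSS(effect from both pasts).
    Values well above 1 suggest that the past of `cause` helps."""
    past_effect, past_cause, now = effect[:-1], cause[:-1], effect[1:]
    return rss([past_effect], now) / rss([past_effect, past_cause], now)


# Synthetic example: x drives y with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 500)]
gain_x_to_y = predictive_gain(x, y)
gain_y_to_x = predictive_gain(y, x)
```

On this synthetic pair the gain from x to y should be much larger than the gain from y to x. Real multivariate series raise issues that this sketch ignores, such as confounders, longer lags and nonlinear dependencies.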

The student should be an expert in R and Python programming, be registered in the MA module on computational intelligence, be proficient in Machine Learning and have a passion for interdisciplinary applied research.


Simulation and forecasting of traffic data (G. Bontempi)

This MA thesis will focus on:

  1. the simulation of traffic data by means of the SUMO simulator;
  2. the use of the simulated data to design spatio-temporal techniques for interpolation, data imputation and forecasting.

The student should be particularly proficient in Python programming, be registered for the MA module on computational intelligence and have a passion for interdisciplinary research. Collaboration with a company (and an internship) is possible.


Methods for scRNA-seq data clustering (Matthieu Defrance)

Single-cell RNA sequencing (scRNA-seq) has emerged as a main strategy to study transcriptional activity at the cellular level. Clustering analysis is routinely performed on scRNA-seq data to explore, recognize or discover underlying cell identities. The high dimensionality of scRNA-seq data and its significant sparsity, accentuated by frequent dropout events that introduce false zero-count observations, make the clustering analysis computationally challenging. The objective of this project is to study state-of-the-art techniques used to perform scRNA-seq data clustering, with an emphasis on techniques that use neural networks to compute an initial embedding of the data.


Methods for classification of rare diseases using omics data (Matthieu Defrance)

High-throughput sequencing and genome-wide analyses have profoundly impacted the genetic diagnosis of rare diseases. Besides classical genetic variant calling, which targets alterations of the DNA sequence itself, a new family of methods based on epigenetic (at the DNA level) or transcriptomic (at the RNA level) alterations has emerged. The objective of the project is to develop and evaluate supervised classification methods applied to the classification of rare diseases.

Reference: Erfan Aref-Eshghi et al. Evaluation of DNA Methylation Episignatures for Diagnosis and Phenotype Correlations in 42 Mendelian Neurodevelopmental Disorders. The American Journal of Human Genetics, Volume 106, Issue 3, 2020.

Trade-offs in decision-making under uncertainty (Tom Lenaerts, Axel Abels)

Solving real-world decision-making problems typically requires a careful trade-off between multiple, possibly conflicting, objectives. For example, essential concerns such as interpretability, fairness, and execution speed often conflict with the primary performance metric, such as classification accuracy. The objective of this project is to evaluate algorithms for decision-making under uncertainty (i.e., multi-armed bandits) in terms of these secondary objectives. If time permits, an extension into procedural fairness and interpretability in contextual bandits can be considered. As contextual bandits involve decisions made based on a set of features, it is crucial to ensure that these decisions are interpretable and made fairly with regard to a set of sensitive features (e.g., gender).
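For readers new to multi-armed bandits, the sketch below shows UCB1, a standard algorithm that optimises the primary objective (cumulative reward) only. The function name, the `pull` callback and the reward values are assumptions made for this illustration. The per-arm pull counts it returns are the kind of quantity a secondary objective, such as a fairness constraint on arms, would inspect.

```python
import math


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds. `pull(arm)` returns a reward in [0, 1].
    Returns the number of pulls and the empirical mean reward of each arm."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once before using the index
        else:
            # Optimism in the face of uncertainty: empirical mean plus a
            # bonus that shrinks as an arm is pulled more often.
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


# Hypothetical two-armed problem with fixed rewards, for illustration only.
counts, means = ucb1(lambda arm: [0.2, 0.8][arm], n_arms=2, horizon=200)
```

In this toy run almost all pulls go to the better arm, which is good for reward but shows why an algorithm tuned for the primary metric can score poorly on a secondary one such as an even treatment of arms.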
Patil, Vishakha, et al. “Achieving fairness in the stochastic multi-armed bandit problem.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020.
Turgay, Eralp, Doruk Oner, and Cem Tekin. “Multi-objective contextual bandit problem with similarity information.” International Conference on Artificial Intelligence and Statistics. PMLR, 2018.
Lattimore, Tor, and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020.

Transformer-based data augmentation for genetic data (Tom Lenaerts, Robin Petit, Nassim Versbraegen)

Studying genetic diseases with machine learning or AI techniques often requires a lot of data. Especially in the realm of rare diseases, this is a huge problem. A solution may be to produce models that can generate synthetic data on which predictive systems can be trained. Variational autoencoders (VAE) may be able to help in this context.

The goal of this thesis proposal is to first study the formalism and state of the art of VAE and the applications in which they have been used. Once this context is defined, the ambition is then to explore how VAE can be used to capture variant information in genetic data collected for a panel of genes or a Mendeliome, and how this can be used to improve predictions.


  • Kingma, D. P., & Welling, M. (2019). An Introduction to Variational Autoencoders. Foundations and Trends® in Machine Learning, 12(4), 307-392.
  • Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
  • Hawkins-Hooker, A., Depardieu, F., Baur, S., Couairon, G., Chen, A., & Bikard, D. (2021). Generating functional protein variants with variational autoencoders. PLoS Computational Biology, 17(2), e1008736.
  • Battey, C. J., Coffing, G. C., & Kern, A. D. (2021). Visualizing population structure with variational autoencoders. G3, 11(1), 1-11.

Deep learning on knowledge graphs (Tom Lenaerts, Robin Petit, Alexandre Renaux)


This master thesis proposal aims to examine how deep learning can be used to learn from the information in knowledge graphs. As a first part, you will get acquainted with both topics. Once this is established, you will define a learning problem (e.g. link prediction, node classification) within the context of a knowledge graph that captures the state of the art on digenic diseases, and provide an algorithmic solution for it.


Bacciu, D., Errica, F., Micheli, A., & Podda, M. (2020). A gentle introduction to deep learning for graphs. Neural Networks, 129, 203-221.

Hogan, A., et al. (2021). Knowledge graphs. Synthesis Lectures on Data, Semantics, and Knowledge, 12(2), 1-257.