Master theses

Master Theses Topics – 2022/23

MLG proposes the following MA thesis topics for this academic year.

NB: The number of topics is limited. If you are interested, please contact the supervisor as soon as possible.

eXplainable AI to support marketing decisions (in collaboration with Worldline) (Gianluca Bontempi, Olivier Caelen)

This MA thesis will focus on the study and design of eXplainable AI solutions for supporting marketing decision making, in the context of a collaboration between MLG and Worldline Brussels (Dr. Olivier Caelen).

The student should be highly proficient in Python programming, be registered for the MA module on computational intelligence and have a passion for interdisciplinary applied research.

PS. For interested students, there is also the opportunity for an internship at the same company.

AutoML solutions for multi-variate time series forecasting (G. Bontempi and GM Paldino)

This MA thesis will focus on the comparison, assessment and design of AutoML solutions for multi-variate time series forecasting. The first part of the thesis (MA1) will be devoted to studying and comparing existing solutions from an experimental perspective. The MA2 part will be devoted to the design of an original solution combining prediction, model/feature selection and feature engineering.

The student should be highly proficient in Python programming, be registered for the MA module on computational intelligence and have a passion for interdisciplinary research.

Machine learning for analysis of EEG signals in neuroscience (Gianluca Bontempi, Cedric Simar)

This MA thesis will take place in the context of a collaboration between MLG and the Laboratory of Neurophysiology and Movement Biomechanics (LNMB). An electroencephalogram (EEG) uses multiple electrodes to measure the electrical activity generated by post-synaptic potentials of cortical neurons in specific parts of the brain. LNMB is composed of several researchers who have developed solid expertise in EEG signal acquisition and analysis. Over the years they have acquired a large amount of EEG data from different domains (NASA astronauts on the ISS, players from the Belgian national hockey team, tennis players from the Justine Henin Academy, children and adults with hyperactivity disorder…) and for various applications (brain-computer interfaces, human performance enhancement, diagnostic tools…).

The objective of the MA thesis is to work with cutting-edge technology and use state-of-the-art signal processing and Machine Learning techniques on EEG data.

The work will focus on i) exploring different EEG datasets, ii) extracting relevant features of the brain state that may not be directly visible with standard EEG analysis, and iii) deploying different classification models to reach or improve on state-of-the-art results.

The student should be proficient in Python programming, be registered for the MA module on computational intelligence, have a passion for interdisciplinary research and be available to visit the Erasme lab frequently.

References:

– MNE: A. Gramfort, M. Luessi, E. Larson, D. Engemann, D. Strohmeier, C. Brodbeck, L. Parkkonen, M. Hämäläinen, MNE software for processing MEG and EEG data, NeuroImage, Volume 86, 1 February 2014, Pages 446-460, ISSN 1053-8119
https://martinos.org/mne/stable/index.html
– EEGLAB: A. Delorme & S. Makeig (2004), EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics, Journal of Neuroscience Methods 134:9-21
https://sccn.ucsd.edu/eeglab/index.php

PS. For interested students, LNMB also offers the possibility of an internship in the context of the MA2 TRAN-F-501 course.

Fast variable selection without shrinkage (Maarten Jansen)

The selection of an optimal model from a broad spectrum of non-nested models can be driven by a criterion that balances good prediction of the training set against the complexity of the model, that is, the number of selected variables. Optimization over the number of variables, or even comparison of models with a given number of variables, is a problem of combinatorial complexity, and thus not feasible in the context of high-dimensional data. Part of the problem can be well approximated by replacing the number of selected variables in the criterion with the sum of absolute values of the estimators of these variables within the selected model. The counting measure is thus replaced by a sum of magnitudes, which turns a combinatorial problem into a convex quadratic programming problem. This problem can be solved by a wide range of algorithms, including direct methods, such as least angle regression, or iterative methods, such as iterative thresholding or gradient projection. Moreover, for a fixed value of model complexity, the relaxed problem selects approximately the same model as the original combinatorial one. This is no longer the case when the model complexity is part of the optimization problem, but a correction for the divergence between the combinatorial and quadratic problems can be established. The thesis is about the application of this variable selection in sparse inverse problems, or in deblurring and denoising images, using gradient projection or iterative thresholding.
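
As an illustration of the iterative thresholding mentioned above, here is a minimal sketch in plain Python. It assumes the relaxed criterion takes the usual form 0.5*||y - Xb||^2 + lam*||b||_1. The function names, the toy data and the choice of step size are assumptions made for this example, not part of the thesis proposal, and a real application would rely on a numerical library rather than nested lists.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(X, y, lam, step, n_iter=200):
    """Iterative soft thresholding for 0.5*||y - X b||^2 + lam*||b||_1.

    X is a list of rows, y a list of observations. The step size is assumed
    to be at most 1/L, with L the largest eigenvalue of X^T X.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        # Residual of the current fit.
        residual = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        # Gradient of the quadratic term: -X^T (y - X b).
        grad = [-sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by soft thresholding.
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b


# Toy denoising problem with an identity design (assumed data).
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [3.0, 0.5, -2.0]
estimate = ista(X, y, lam=1.0, step=1.0)
selected = [j for j, bj in enumerate(estimate) if bj != 0.0]
print(estimate)  # [2.0, 0.0, -1.0]
print(selected)  # [0, 2]
```

In this toy case the small observation is set to zero, so the selected model contains variables 0 and 2, while the retained coefficients are pulled towards zero by lam (2 and -1 instead of 3 and -2). This is the shrinkage effect of the relaxed problem.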

Achieving “First Time Right” in industrial processes by means of machine learning and causal inference (Gianluca Bontempi, Gianmarco Paldino)

“First Time Right” refers to the ambitious goal of obtaining a valid product at the end of an industrial process without an expensive initial tuning phase.

This is of particular interest for production processes that can be negatively affected by small variations in environmental measurements (such as humidity) or in machine parameters (such as pressure).

The MA thesis will focus on the estimation of the causal effect of input manipulations in a real-life industrial process on the basis of historical data. The goal is to detect the causes leading to anomalies and problems during production.

The topic is related to a joint research project between MLG and Sirris in the context of the TRAIL project. It combines applied and theoretical aspects and requires strong skills in statistics, machine learning, and graphical models (e.g. Bayesian Networks). For this reason, only students with good programming skills and an interest in the statistical aspects of Machine Learning will be considered.

Useful links:

Papers in https://www.researchgate.net/project/Causal-discovery-from-observation-data

PS. For interested students, Sirris also offers a related internship in the context of the MA2 TRAN-F-501 course.

Methods for scRNA-seq data clustering (Matthieu Defrance)

Single-cell RNA sequencing (scRNA-seq) has emerged as a main strategy to study transcriptional activity at the cellular level. Clustering analysis is routinely performed on scRNA-seq data to explore, recognize or discover underlying cell identities. The high dimensionality of scRNA-seq data and its significant sparsity, accentuated by frequent dropout events that introduce false zero-count observations, make the clustering analysis computationally challenging. The objective of this project is to study state-of-the-art techniques used to perform scRNA-seq data clustering, with an emphasis on techniques involving neural networks to perform an initial embedding of the data.

Reference: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04210-8

Methods for classification of rare diseases using omics data (Matthieu Defrance)

High-throughput sequencing and genome-wide analyses have profoundly impacted the genetic diagnosis of rare diseases. Besides classical genetic variant calling, which targets alterations of the DNA sequence itself, a new field of methods based on epigenetic (at the DNA level) or transcriptomic (at the RNA level) alterations has emerged. The objective of the project is to develop and evaluate supervised classification methods applied to rare disease classification.

Reference: Erfan Aref-Eshghi et al. Evaluation of DNA Methylation Episignatures for Diagnosis and Phenotype Correlations in 42 Mendelian Neurodevelopmental Disorders. The American Journal of Human Genetics, Volume 106, Issue 3, 2020.

Trade-offs in decision-making under uncertainty (Tom Lenaerts, Axel Abels)

Solving real-world decision-making problems typically requires a careful trade-off between multiple, possibly conflicting, objectives. For example, essential concerns such as interpretability, fairness, and execution speed often conflict with the primary performance metric, such as classification accuracy. The objective of this project is to evaluate algorithms for decision-making under uncertainty (i.e., multi-armed bandits) in terms of these secondary objectives. If time permits, an extension to procedural fairness and interpretability in contextual bandits can be considered. As contextual bandits involve decisions made based on a set of features, it is crucial to ensure that these decisions are interpretable and made fairly with regard to a set of sensitive features (e.g., gender).
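
To make the idea of evaluating a bandit algorithm on a secondary objective concrete, here is a minimal sketch using the standard UCB1 strategy on simulated Bernoulli arms. The arm means, the horizon and the use of the smallest pull fraction as a crude fairness proxy are assumptions chosen for illustration. They are not the evaluation protocol of the project.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return pull counts and reward sums.

    arm_means are the (hypothetical) success probabilities of the arms;
    horizon is assumed to be at least the number of arms.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initial round: play every arm once
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


def pull_fractions(counts):
    """Fraction of rounds in which each arm was played."""
    total = sum(counts)
    return [c / total for c in counts]


counts, sums = ucb1([0.2, 0.8], horizon=2000)
fractions = pull_fractions(counts)
print("average reward (primary objective):", sum(sums) / sum(counts))
print("smallest pull fraction (fairness proxy):", min(fractions))
```

An algorithm tuned only for reward tends to drive the smallest pull fraction towards zero, which is the kind of tension between primary and secondary objectives described above.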
References:
Patil, Vishakha, et al. “Achieving fairness in the stochastic multi-armed bandit problem.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020. https://ojs.aaai.org/index.php/AAAI/article/view/5986/5842
Turgay, Eralp, Doruk Oner, and Cem Tekin. “Multi-objective contextual bandit problem with similarity information.” International Conference on Artificial Intelligence and Statistics. PMLR, 2018. http://proceedings.mlr.press/v84/turgay18a/turgay18a.pdf
Lattimore, Tor, and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020. https://tor-lattimore.com/downloads/book/book.pdf

Transformer-based data augmentation for genetic data (Tom Lenaerts, Robin Petit, Nassim Versbraegen)

Studying genetic diseases with machine learning or AI techniques often requires a lot of data. Especially in the realm of rare diseases, this is a huge problem. A solution may be to produce models that can generate synthetic data on which predictive systems can be trained. Variational autoencoders (VAEs) may be able to help in this context.

The goal of this thesis proposal is first to study the formalism and the state of the art of VAEs and the applications in which they have been used. Once this context is defined, the ambition is to explore how VAEs can be used to capture variant information in genetic data collected for a panel of genes or a Mendeliome, and how this can be used to improve predictions.

References:

  • Kingma, D. P., & Welling, M. (2019). An Introduction to Variational Autoencoders. Foundations and Trends® in Machine Learning, 12(4), 307-392.
  • Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
  • Hawkins-Hooker, A., Depardieu, F., Baur, S., Couairon, G., Chen, A., & Bikard, D. (2021). Generating functional protein variants with variational autoencoders. PLoS Computational Biology, 17(2), e1008736.
  • Battey, C. J., Coffing, G. C., & Kern, A. D. (2021). Visualizing population structure with variational autoencoders. G3, 11(1), 1-11.

Deep learning on knowledge graphs (Tom Lenaerts, Robin Petit, Alexandre Renaux)

This master thesis proposal aims to examine how deep learning can be used to learn from the information stored in knowledge graphs. In the first part, you will get acquainted with both topics. Once this is established, you will define a learning problem (e.g. link prediction, node classification) within the context of a knowledge graph that captures the state of the art on digenic diseases, and provide an algorithmic solution for it.

References:

Bacciu, D., Errica, F., Micheli, A., & Podda, M. (2020). A gentle introduction to deep learning for graphs. Neural Networks, 129, 203-221.

Hogan, A., et al. (2021). Knowledge graphs. Synthesis Lectures on Data, Semantics, and Knowledge, 12(2), 1-257.