Master theses

Master Theses Topics – 2023/24

MLG proposes the following MA thesis topics for this academic year.

NB: The number of topics is limited. If you are interested, please contact the supervisor as soon as possible.

Exploration methods for model-based multi-agent reinforcement learning (Yannick Molinghen, Tom Lenaerts)

Reinforcement Learning often comes in two different flavours: model-based and model-free. Because the assumption of owning a perfect representation of the model is too strong in many cases, some reinforcement learning algorithms learn a model of the environment [1] and then use it to make predictions about their future without having to actually take steps in this environment, which might be costly.
Similarly, some single-agent exploration methods based on intrinsic curiosity [2] also build an internal model of the world and check how accurate it is [3] to compute the intrinsic reward added to the reward signal from the environment.
The suggested objective of this master thesis proposal is to investigate how model-based multi-agent reinforcement learning (MARL) can leverage the internal model of the environment to improve exploration, and to compare that to model-free MARL algorithms [4]. The Laser Learning Environment (LLE) is the suggested environment for this topic: https://github.com/yamoling/lle.
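
As a rough illustration of the curiosity idea described above, the following minimal Python sketch adds a prediction-error bonus to the environment reward. The `TabularForwardModel` class, the `shaped_reward` helper and the weight `beta` are hypothetical names chosen for this example; they are not part of LLE or of the cited papers, and methods such as [3] rely on a learned neural model rather than a lookup table.

```python
class TabularForwardModel:
    """Hypothetical stand-in for a learned world model.

    It remembers the last observed next state for each (state, action) pair.
    """

    def __init__(self):
        self.predictions = {}

    def predict(self, state, action):
        # Returns None for transitions that were never observed.
        return self.predictions.get((state, action))

    def update(self, state, action, next_state):
        self.predictions[(state, action)] = next_state


def intrinsic_reward(model, state, action, next_state):
    """Return 1.0 when the model mispredicts the transition, else 0.0."""
    return 0.0 if model.predict(state, action) == next_state else 1.0


def shaped_reward(model, state, action, next_state, extrinsic, beta=0.1):
    """Add the scaled curiosity bonus to the environment reward,
    then train the model on the observed transition."""
    bonus = intrinsic_reward(model, state, action, next_state)
    model.update(state, action, next_state)
    return extrinsic + beta * bonus


model = TabularForwardModel()
# The first visit is surprising, so it earns a bonus; the second is not.
first = shaped_reward(model, state=0, action="right", next_state=1, extrinsic=0.0)
second = shaped_reward(model, state=0, action="right", next_state=1, extrinsic=0.0)
```

In this toy setting the bonus vanishes as soon as a transition has been seen once, which is the behaviour that pushes an agent towards unexplored parts of the environment.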

 

References:

1. “Mastering Atari with Discrete World Models”, Danijar Hafner and Timothy Lillicrap and Mohammad Norouzi and Jimmy Ba, 2022, https://arxiv.org/pdf/2010.02193.pdf
2. “A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers”, Jürgen Schmidhuber. In From Animals to Animats, edited by Jean-Arcady Meyer, International Conference on Simulation of Adaptive Behavior, 222–227. The MIT Press, 1991. https://doi.org/10.7551/mitpress/3115.003.0030
3. “Curiosity-driven Exploration by Self-supervised Prediction”, Deepak Pathak et al., 2017. https://arxiv.org/pdf/1705.05363.pdf
4. “Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration”, Lulu Zheng, Jiarui Chen, et al. https://arxiv.org/pdf/2111.11032.pdf

Comparison and Assessment of techniques for synthetic data generation in machine learning (Gianluca Bontempi)

 

The generation of synthetic data is increasingly important in machine learning. It can be used for data augmentation, dataset anonymisation, oversampling, and classifier training. This MA thesis will assess existing techniques for synthetic tabular data generation in big data analytics (notably fraud and churn detection). Conventional and generative AI techniques will be studied and evaluated. The student should be highly proficient in Python programming, be registered for the MA module on computational intelligence, and have a passion for interdisciplinary applied research.

Fast variable selection without shrinkage (Maarten Jansen)

The selection of an optimal model from a broad spectrum of non-nested models can be driven by a criterion that balances a good prediction of the training set against the complexity of the model, that is, the number of selected variables. Optimization over the number of variables, or even comparison of models with a given number of variables, is a problem of combinatorial complexity, and thus not feasible in the context of high-dimensional data. Part of the problem can be well approximated by replacing the number of selected variables in the criterion with the sum of absolute values of the estimators of these variables within the selected model. The counting measure is replaced by a sum of magnitudes, thus changing a combinatorial problem into a convex quadratic programming problem. This problem can be solved by a wide range of algorithms, including direct methods, such as least angle regression, or iterative methods, such as iterative thresholding or gradient projection. Moreover, for a fixed value of model complexity, the relaxed problem selects approximately the same model as the original combinatorial one. This is no longer the case when the model complexity is part of the optimization problem, but a correction for the divergence between the combinatorial and quadratic problems can be established. The thesis is about the application of this variable selection to sparse inverse problems, or to deblurring and denoising images, using gradient projection or iterative thresholding.
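
To make the iterative thresholding step more concrete, here is a minimal Python sketch on a toy problem. It assumes the relaxed criterion takes the usual lasso form, 0.5 * ||y - X b||^2 + lam * ||b||_1, which the description above does not state explicitly. The function names, the toy data and the values of `lam` and `step` are illustrative choices, and the correction for the divergence between the combinatorial and relaxed problems is not implemented.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(X, y, lam, step, n_iter=500):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by iterative soft thresholding.

    The step size must not exceed 1 / (largest eigenvalue of X^T X).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residual r = y - X beta
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p))
                    for i in range(n)]
        # Gradient step on the quadratic part: beta + step * X^T r
        moved = [beta[j] + step * sum(X[i][j] * residual[i] for i in range(n))
                 for j in range(p)]
        # Proximal step for the l1 penalty
        beta = [soft_threshold(b, step * lam) for b in moved]
    return beta


# Toy data (assumed for illustration): y depends only on the first column.
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
y = [2.0, 0.0, 0.0, 2.0]

beta = ista(X, y, lam=0.5, step=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

On this example only the first variable is selected, and its estimate is 1.75 rather than the least-squares value 2: the penalty shrinks the coefficients it keeps, which is the effect the title "without shrinkage" refers to.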

Causal inference in temporal data (Gianluca Bontempi and Gianmarco Paldino)

The thesis will focus on the design and implementation of computational data-driven methods to infer causal directionality from multivariate time series.

The student should be an expert in R and Python programming, be registered for the MA module on computational intelligence, be proficient in machine learning, and have a passion for interdisciplinary applied research.

Simulation, modelling and forecasting of mobility data in collaboration with Macq Electronics (D. Guastella, G. Bontempi)

Several MA theses in collaboration with Macq Electronics are possible. More details here.

The student should be highly proficient in Python programming, be registered for the MA module on computational intelligence, and have a passion for interdisciplinary research. An internship on related topics is possible.

Machine Learning for cool stars (Matthieu Defrance and Thibault Merle)

Stellar physics is entering an era of data-driven discoveries, mainly due to modern data mining and machine learning (ML) techniques enabling powerful new ways to extract information from observational material. During the last decades, large ground-based sky surveys and all-sky space missions have harvested hundreds of terabytes of data corresponding to hundreds of millions of astronomical sources. In the Gaia era, we expect that the data volume will move towards the petabyte domain, requiring the handling of billions of sources. Astrophysics has evolved from a data-poor science into a data-intensive discipline. Current stellar survey datasets present novel algorithmic, computational, and statistical challenges. In this context, the study of stellar properties in single and binary stars can greatly benefit from the application of such ML techniques. We propose, in co-supervision between the Machine Learning Group and the Institut d’Astronomie et d’Astrophysique, to investigate such data mining and ML algorithms with an application to large and homogeneous samples of stellar spectra (e.g. from the Gaia-ESO survey, RAVE, GALAH, or the Melchiors database). This project does not require a specific background in theoretical physics.

Contact: Matthieu Defrance (matthieu.defrance@ulb.be)

Methods for omics data clustering (Matthieu Defrance)

Clustering analysis is routinely performed on omics data (data produced by DNA or RNA sequencing) to explore, recognize or discover underlying cell identities. The high dimensionality of omics data and its significant sparsity, accentuated by frequent dropout events that introduce false zero-count observations, make the clustering analysis computationally challenging. The objective of this project is to study state-of-the-art techniques used to perform omics data clustering, with an emphasis on techniques that use neural networks to perform an initial embedding of the data.

Reference: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04210-8

Contact: Matthieu Defrance (matthieu.defrance@ulb.be)

Methods for classification of rare diseases using omics data (Matthieu Defrance)

High-throughput sequencing and genome-wide analyses have profoundly impacted the genetic diagnosis of rare diseases. Besides classical genetic variant calling, which targets alterations of the DNA sequence itself, a new field of methods based on epigenetic (at the DNA level) or transcriptomic (at the RNA level) alterations has emerged. The objective of the project is to develop and evaluate supervised classification methods applied to rare disease classification.

Reference: Erfan Aref-Eshghi et al. Evaluation of DNA Methylation Episignatures for Diagnosis and Phenotype Correlations in 42 Mendelian Neurodevelopmental Disorders. The American Journal of Human Genetics, Volume 106, Issue 3, 2020.

Contact: Matthieu Defrance (matthieu.defrance@ulb.be)

Trade-offs in decision-making under uncertainty (Tom Lenaerts, Axel Abels)

Solving real-world decision-making problems typically requires a careful trade-off between multiple, possibly conflicting, objectives. For example, essential concerns such as interpretability, fairness, and execution speed often conflict with the primary performance metric, such as classification accuracy. The objective of this project is to evaluate algorithms for decision-making under uncertainty (i.e., multi-armed bandits) in terms of these secondary objectives. If time permits, an extension into procedural fairness and interpretability in contextual bandits can be considered. As contextual bandits involve decisions made based on a set of features, it is crucial to ensure that these decisions are interpretable and made fairly with regard to a set of sensitive features (e.g., gender).
References:
Patil, Vishakha, et al. “Achieving fairness in the stochastic multi-armed bandit problem.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020. https://ojs.aaai.org/index.php/AAAI/article/view/5986/5842
Turgay, Eralp, Doruk Oner, and Cem Tekin. “Multi-objective contextual bandit problem with similarity information.” International Conference on Artificial Intelligence and Statistics. PMLR, 2018. http://proceedings.mlr.press/v84/turgay18a/turgay18a.pdf
Lattimore, Tor, and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020. https://tor-lattimore.com/downloads/book/book.pdf

Learning correlated equilibria (Tom Lenaerts)

You will examine how learning (and evolution) may find correlated equilibria, an extension of the notion of Nash equilibria in games. The references below will be examined for the thesis preparation and a state-of-the-art review will be formulated. For the thesis, a selection of the suggested approaches will be implemented and tested on learning problems to see to what extent they are useful.

 

Aumann, R.J. (1987). Correlated equilibrium as an expression of Bayesian rationality. Econometrica, 1-18. https://doi.org/10.2307/1911154.

Milgrom, P., and Roberts, J. (1991). Adaptive and sophisticated learning in normal form games. Games and Economic Behavior 3, 82-100. https://doi.org/10.1016/0899-8256(91)90006-Z.

Foster, D.P., and Vohra, R.V. (1997). Calibrated learning and correlated equilibrium. Games and Economic Behavior 21, 40-55. https://doi.org/10.1006/game.1997.0595.

Hart, S., and Mas‐Colell, A. (2000). A simple adaptive procedure leading to correlated equilibrium. Econometrica 68, 1127-1150. http://www.jstor.org/stable/2999445.

Cripps, M. (1991). Correlated equilibria and evolutionary stability. Journal of Economic Theory 55, 428-434. https://doi.org/10.1016/0022-0531(91)90048-9.

Metzger, L.P. (2018). Evolution and correlated equilibrium. Journal of Evolutionary Economics 28, 333-346. https://doi.org/10.1007/s00191-017-0539-z.

Arifovic, J., Boitnott, J.F., and Duffy, J. (2019). Learning correlated equilibria: An evolutionary approach. Journal of Economic Behavior & Organization 157, 171-190.

Knowledge graphs and drug repurposing (Tom Lenaerts, Inas Bosch and Nassim Versbraegen)

In this thesis we will explore the potential of associating drugs with diseases based on knowledge graphs (KG) and KG embeddings (KGE). Several methods have been proposed to perform drug-disease association, and those based on biomedical KGs have shown potential. One drug repurposing case was published by Himmelstein et al. using a meta-path approach on the KG called Hetionet; others exist. Your preparatory work for the thesis will first identify the most relevant contributions that have been made in this context. Based on this knowledge, we will then focus in the thesis on one or two approaches to see if the results in the scientific works can be confirmed. Finally, we will examine whether these methods are useful for rare diseases and whether they can also be used in contexts where more than one mutant plays a role in the disease. Some relevant publications are:

  1. Himmelstein, D. S., Lizee, A., Hessler, C., Brueggeman, L., Chen, S. L., Hadley, D., … & Baranzini, S. E. (2017). Systematic integration of biomedical knowledge prioritizes drugs for repurposing. Elife, 6, e26726.
  2. Roessler, H. I., Knoers, N. V., van Haelst, M. M., & van Haaften, G. (2021). Drug repurposing for rare diseases. Trends in pharmacological sciences, 42(4), 255-267.
  3. Bang, D., Lim, S., Lee, S., & Kim, S. (2023). Biomedical knowledge graph learning for drug repurposing by extending guilt-by-association to multiple layers. Nature Communications, 14(1), 3570.
  4. Johnson, R., Li, M. M., Noori, A., Queen, O., & Zitnik, M. (2024). Graph Artificial Intelligence in Medicine. Annual Review of Biomedical Data Science, 7.
  5. Perdomo-Quinteiro, P., & Belmonte-Hernández, A. (2024). Knowledge Graphs for drug repurposing: a review of databases and methods. Briefings in Bioinformatics, 25(6), bbae461.
  6. Wang, Q., Mao, Z., Wang, B., & Guo, L. (2017). Knowledge graph embedding: A survey of approaches and applications. IEEE transactions on knowledge and data engineering, 29(12), 2724-2743.

Sentiment analysis (Tom Lenaerts and Yannick Molinghen)

This topic requires the student to speak both French and English!

In this master thesis proposal, the student will use Natural Language Processing (NLP) techniques to identify insulting or offensive sentences. The objective is to develop software that can be plugged into the “évaluation des enseignements” (evalens) system in order to flag offensive or insulting comments that the “commission pédagogique” may want to hide.

A particularity of this topic is that comments on evalens can be written in multiple languages (mainly French and English). That specificity must be accounted for.

In the first year, the student is expected to explore the state of the art in the field of sentiment analysis.
In the second year, the student is expected to work on three different aspects.
1. Academic research: since this is a master thesis, some scientific contribution is expected.
2. Software development: the student has to develop software that performs sentiment analysis and can be plugged into the “évaluation des enseignements” system.
3. Technical: the student has to identify one or more technical solutions for this problem, considering the system in place today. This also includes discussions with the team in charge of the “évaluation des enseignements” platform.

Useful links:
– VUB NLP course: https://ai.vub.ac.be/course/natural-language-processing-2/
– NLTK: https://www.nltk.org/
– Évaluation des enseignements: https://evalens.ulb.ac.be