This MA thesis is part of a collaboration between MLG and CENAERO. It concerns the design and application of machine learning techniques for prediction in the monitoring of building energy consumption.
The objective of the MA thesis is to work with cutting-edge technology and to apply state-of-the-art signal processing and machine learning techniques to energy time-series data.
The student should be proficient in Python programming, be registered in the MA module on computational intelligence, have a passion for interdisciplinary research and be available to visit CENAERO periodically.
PS. For interested students, CENAERO also offers the possibility of an internship compatible with the TRAN-F-501 course of the MA Computer Science curriculum.
This MA thesis is part of a collaboration between MLG and the Laboratory of Neurophysiology and Movement Biomechanics (LNMB). An electroencephalogram (EEG) uses multiple electrodes to measure the electrical activity generated by the post-synaptic potentials of cortical neurons in specific regions of the brain. LNMB is composed of several researchers who have developed solid expertise in EEG signal acquisition and analysis. Over the years they have acquired a large amount of EEG data from different domains (NASA astronauts on the ISS, hockey players from the Belgian national hockey team, tennis players from the Justine Henin Academy, children and adults with hyperactivity disorder…) and for various applications (brain-computer interfaces, human performance enhancement, diagnostic tools…).
The objective of the MA thesis is to work with cutting-edge technology and to apply state-of-the-art signal processing and machine learning techniques to EEG data.
The work will focus on i) exploring different EEG datasets, ii) extracting relevant features of the brain state that may not be directly visible with standard EEG analysis, and iii) deploying different classification models to match or improve on state-of-the-art results.
The student should be proficient in Python programming, be registered in the MA module on computational intelligence, have a passion for interdisciplinary research and be available to visit the Erasme lab frequently.
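As a starting point for the feature extraction step, a baseline spectral feature can be sketched as follows. This is only an illustration under stated assumptions: the synthetic single-channel signal, the 128 Hz sampling rate, the band limits and the helper names `make_signal` and `band_power` are all hypothetical, and a real analysis would rely on a dedicated toolbox rather than this plain discrete Fourier transform.

```python
import cmath
import math
import random


def make_signal(rng, fs=128, seconds=2):
    """Hypothetical one-channel signal: strong 10 Hz rhythm, weak 20 Hz rhythm, noise."""
    n = fs * seconds
    return [
        math.sin(2 * math.pi * 10 * i / fs)
        + 0.3 * math.sin(2 * math.pi * 20 * i / fs)
        + rng.gauss(0, 0.1)
        for i in range(n)
    ]


def band_power(signal, fs, f_low, f_high):
    """Sum of squared DFT magnitudes (scaled by n^2) for f_low <= f < f_high."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n  # frequency of DFT bin k in Hz
        if f_low <= freq < f_high:
            coeff = sum(
                s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(signal)
            )
            power += abs(coeff) ** 2 / n ** 2
    return power


signal = make_signal(random.Random(0))
# Band limits below are common conventions, used here as assumptions.
features = {
    "alpha_8_13_hz": band_power(signal, 128, 8, 13),
    "beta_13_30_hz": band_power(signal, 128, 13, 30),
}
print(features)
```

Such band powers could then serve as inputs to the classification models, alongside the less conventional features the thesis aims to explore.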
– MNE: A. Gramfort, M. Luessi, E. Larson, D. Engemann, D. Strohmeier, C. Brodbeck, L. Parkkonen, M. Hämäläinen, MNE software for processing MEG and EEG data, NeuroImage, Volume 86, 1 February 2014, Pages 446-460, ISSN 1053-8119
– EEGLAB: A. Delorme & S. Makeig (2004), EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics, Journal of Neuroscience Methods 134:9-21
PS. For interested students, the LNMB also offers the possibility of an internship compatible with the TRAN-F-501 course of the MA Computer Science curriculum.
The selection of an optimal model from a broad spectrum of non-nested models can be driven by a criterion that balances a good fit to the training set against the complexity of the model, that is, the number of selected variables. Optimizing over the number of variables, or even comparing models with a given number of variables, is a problem of combinatorial complexity and is therefore not feasible for high-dimensional data. Part of the problem can be well approximated by replacing the number of selected variables in the criterion with the sum of the absolute values of the estimated coefficients of these variables within the selected model. The counting measure is thus replaced by a sum of magnitudes, which turns the combinatorial problem into a convex quadratic programming problem. This problem can be solved by a wide range of algorithms, including direct methods, such as least angle regression, and iterative methods, such as iterative thresholding or gradient projection. Moreover, for a fixed value of model complexity, the relaxed problem selects approximately the same model as the original combinatorial one. This is no longer the case when the model complexity is part of the optimization problem, but a correction for the divergence between the combinatorial and the quadratic problem can be established. The thesis concerns the application of this variable selection approach to sparse inverse problems, or to image deblurring and denoising, using gradient projection or iterative thresholding.
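The relaxed problem described above can be illustrated with a minimal sketch of iterative soft thresholding, assuming the criterion takes the form 0.5 * ||A x - y||^2 + lam * ||x||_1. The function names, the small synthetic regression problem, the penalty value and the conservative step size are assumptions made for illustration; they are not taken from the thesis description.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(A, y, lam, n_iter=1000):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative soft thresholding."""
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient, so this step size is conservative but safe.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - y[i]
                    for i in range(n_rows)]
        gradient = [sum(A[i][j] * residual[i] for i in range(n_rows))
                    for j in range(n_cols)]
        # Gradient step on the quadratic term, then shrinkage for the L1 term.
        x = [soft_threshold(x[j] - step * gradient[j], step * lam)
             for j in range(n_cols)]
    return x


def make_problem(seed=0, n_rows=40, n_cols=8, noise=0.05):
    """Hypothetical test problem: only variables 0 and 3 are truly active."""
    rng = random.Random(seed)
    x_true = [0.0] * n_cols
    x_true[0], x_true[3] = 3.0, -2.0
    A = [[rng.gauss(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]
    y = [sum(a * x for a, x in zip(row, x_true)) + rng.gauss(0, noise)
         for row in A]
    return A, y, x_true


A, y, x_true = make_problem()
x_hat = ista(A, y, lam=1.0)
print([round(v, 2) for v in x_hat])
```

The shrinkage step is what drives small coefficients to zero, so variable selection happens as a by-product of the optimization rather than through an explicit search over subsets.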
Accurate prediction is the main goal of any supervised learning approach. However, machine learning models estimate the target value given an observed input, not the expected value given a manipulated input. Using associative models (like the ones returned by machine learning) to predict the causal effect of an input perturbation (or manipulation) can therefore yield highly biased, and dangerous, predictions. Recent research has focused on asymmetric properties of the data distribution to infer the direction of causal relationships. This thesis will use the same properties to estimate the bias of an associative model in predicting the causal effect. The MA thesis will focus on:
⋅ A review and comparative assessment of existing causal feature selection algorithms, including the methods developed at MLG
⋅ The design of a strategy to estimate the causal effect of input manipulation and its assessment on synthetic data (i.e. data for which the underlying causal model is available)
The topic is theoretical and requires strong skills in statistics, machine learning, and graphical models (e.g. Bayesian Networks). For this reason, only students strongly interested in the statistical aspects of machine learning will be considered.
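The gap between association and manipulation can be made concrete on synthetic data with a known causal model. The sketch below is an illustration only, not the estimation strategy to be designed in the thesis: the linear model, the hidden confounder `z`, the coefficient values and the function names are all assumptions.

```python
import random


def slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n, intervene, rng, beta=1.0, gamma=2.0):
    """Draw (x, y) pairs from y = beta * x + gamma * z + noise.

    Without intervention, the hidden variable z also drives x (confounding).
    With intervention, x is set independently of z, as in a manipulation.
    """
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = rng.gauss(0, 1) if intervene else z + rng.gauss(0, 1)
        y = beta * x + gamma * z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(0)
associative = slope(*simulate(20000, intervene=False, rng=rng))
causal = slope(*simulate(20000, intervene=True, rng=rng))
print(f"slope from observational data: {associative:.2f}")
print(f"slope under manipulation:      {causal:.2f}")
print(f"bias of the associative model: {associative - causal:.2f}")
```

With these assumed coefficients, the observational slope converges to beta + gamma / 2 = 2 while the effect of manipulating x is beta = 1, so an associative model would overstate the causal effect by a factor of two.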
Biomarkers (e.g. epigenetic, expression) can be used to monitor alterations occurring at the cellular level in a given organism. One challenging task is to identify a restricted set of markers (e.g. genes) that allows an accurate estimation of the monitored properties. The main objective of this project is to evaluate the influence of noise and missing measurements on prediction accuracy. To that end, next-generation sequencing data (RNA-seq, RRBS) will be used to explore real-case settings.
High-throughput sequencing and genome-wide analyses have profoundly impacted the genetic diagnosis of rare diseases. Besides classical genetic variant calling, which targets alterations of the DNA sequence itself, a new family of methods based on epigenetic (at the DNA level) or transcriptomic (at the RNA level) alterations has emerged. The objective of the project is to develop and evaluate supervised classification methods for the classification of rare diseases.
Reference: Erfan Aref-Eshghi et al. Evaluation of DNA Methylation Episignatures for Diagnosis and Phenotype Correlations in 42 Mendelian Neurodevelopmental Disorders. The American Journal of Human Genetics, Volume 106, Issue 3, 2020.
Solving real-world decision-making problems typically requires a careful trade-off between multiple, possibly conflicting, objectives. For example, essential concerns such as interpretability, fairness, and execution speed often conflict with the primary performance metric, such as classification accuracy. The objective of this project is to evaluate algorithms for decision-making under uncertainty (i.e., multi-armed bandits) in terms of these secondary objectives. If time permits, an extension to procedural fairness and interpretability in contextual bandits can be considered. As contextual bandits make decisions based on a set of features, it is crucial to ensure that these decisions are interpretable and made fairly with regard to a set of sensitive features (e.g., gender).
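To make the evaluation setting concrete, the following minimal sketch runs the standard UCB1 index policy on a Bernoulli bandit and reports a primary metric (pseudo-regret) next to a secondary one. The arm means, the horizon and the use of the smallest pull share as a crude fairness proxy are assumptions chosen for illustration; they are not the definitions used in the references below.

```python
import math
import random


def ucb1(means, horizon, rng):
    """Run UCB1 on Bernoulli arms; return the total reward and pulls per arm."""
    n_arms = len(means)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    reward_sum = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialise the estimates
        else:
            # Empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    return reward_sum, pulls


means = [0.2, 0.5, 0.8]  # hypothetical success probabilities
horizon = 5000
total_reward, pulls = ucb1(means, horizon, random.Random(0))

# Primary objective: pseudo-regret with respect to always playing the best arm.
pseudo_regret = sum((max(means) - m) * p for m, p in zip(means, pulls))
# Secondary objective (assumed proxy): share of pulls given to the least-played arm.
min_share = min(pulls) / horizon

print("pulls per arm:", pulls)
print("pseudo-regret:", round(pseudo_regret, 1))
print("smallest pull share:", round(min_share, 3))
```

A policy that minimises regret concentrates its pulls on the best arm, which is exactly what drives the smallest pull share down; comparing algorithms on both numbers exposes the trade-off.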
Patil, Vishakha, et al. “Achieving fairness in the stochastic multi-armed bandit problem.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020. https://ojs.aaai.org/index.php/AAAI/article/view/5986/5842
Turgay, Eralp, Doruk Oner, and Cem Tekin. “Multi-objective contextual bandit problem with similarity information.” International Conference on Artificial Intelligence and Statistics. PMLR, 2018. http://proceedings.mlr.press/v84/turgay18a/turgay18a.pdf
Lattimore, Tor, and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020. https://tor-lattimore.com/downloads/book/book.pdf