For the 2012/2013 academic year, the Machine Learning Group offers about ten topics for master students. Application domains include high-performance computing, bioinformatics, sensor networks, artificial evolution, computer-assisted medicine, artificial proteins and network dynamics.
NB: The number of topics is limited. Interested students are asked to make themselves known as early as possible.
The collection of gigantic datasets in several domains (e.g. social networks, finance, the internet), and the need to extract useful information from them, call for new and effective techniques to store and mine very large data structures. The Master thesis will focus on methods to scale up and parallelize machine learning algorithms so that they deal effectively with very large, distributed data stores (e.g. Hadoop). The objective of the thesis is to design and set up a running distributed system (based on existing open-source solutions) to store and analyze huge datasets.
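Frameworks such as Hadoop follow the map/reduce pattern: each shard of the data is summarized where it is stored (map), and the partial summaries are then merged (reduce). The sketch below illustrates the pattern on a single machine for one simple statistic, the mean and variance of a numeric column. The function names are hypothetical, no Hadoop API is used, and the merge step relies on the standard pairwise update for (count, mean, sum of squared deviations).

```python
from functools import reduce
from typing import List, Sequence, Tuple

# Partial summary of one shard: (count, mean, M2), where M2 is the sum of
# squared deviations from the shard mean.
Stats = Tuple[int, float, float]


def map_shard(shard: Sequence[float]) -> Stats:
    """Map step: summarize one shard locally with Welford's online update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in shard:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: merge two partial summaries (pairwise mean/variance update)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def distributed_mean_var(shards: List[List[float]]) -> Tuple[float, float]:
    """Mean and population variance of all shards, without pooling the raw data."""
    n, mean, m2 = reduce(combine, map(map_shard, shards), (0, 0.0, 0.0))
    return mean, m2 / n


shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
print(distributed_mean_var(shards))
```

Because `combine` is associative (up to floating-point rounding), partial summaries from many nodes can be merged in any order or tree shape, which is what lets the reduce step scale out.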
Required skills: machine learning, computational statistics, programming.
The treatment of muscle spasticity can nowadays take advantage of a large amount of clinical data. This is made possible by new sensor technologies (e.g. cameras, magnetic sensors, wearable inertial sensors such as gyroscopes and accelerometers) and their integration into daily-life monitoring systems. This opens the way to data-driven approaches for modelling and detecting human movement, with the purpose of improving the diagnosis of patients (e.g. those affected by Parkinson's disease), improving the medication process and recognizing movement patterns (e.g. in biometrics). The Master thesis will focus on automated approaches based on statistical machine learning and data mining that emulate the innate human capability of recognizing, disambiguating and classifying types of movement, and that support clinicians in diagnosis and decision making.
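As a rough illustration of the data-driven approach (not the project's actual pipeline, which is not described here), the sketch below summarizes fixed-length windows of tri-axial accelerometer samples by two hypothetical features, the mean and standard deviation of the acceleration magnitude, and assigns a movement class with a nearest-centroid rule. The features, class names and sample values are all invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One window of tri-axial accelerometer samples (ax, ay, az); units are assumed.
Window = Sequence[Tuple[float, float, float]]


def features(window: Window) -> List[float]:
    """Hypothetical features: mean and std of the acceleration magnitude."""
    mags = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in window]
    mean = sum(mags) / len(mags)
    std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    return [mean, std]


def fit_centroids(windows: Sequence[Window], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average feature vector of each movement class in the training windows."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for window, label in zip(windows, labels):
        f = features(window)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def predict(centroids: Dict[str, List[float]], window: Window) -> str:
    """Label of the closest class centroid (Euclidean distance in feature space)."""
    f = features(window)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))


# Invented training data: a still limb versus an oscillating (tremor-like) one.
rest = [(0.0, 0.0, 1.0)] * 4
tremor = [(0.0, 0.0, 1.5), (0.0, 0.0, 0.5)] * 2
centroids = fit_centroids([rest, tremor], ["rest", "tremor"])
print(predict(centroids, [(0.0, 0.0, 1.4), (0.0, 0.0, 0.6)] * 2))  # tremor
```

Real studies would use richer features (frequency content, gyroscope channels) and stronger classifiers; the nearest-centroid rule is chosen here only because it is short and transparent.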
The thesis will be carried out in the context of the ICT4REHAB project funded by the Brussels Region.
Required skills: statistical analysis, numerical computing, machine learning, passion for interdisciplinary research
One of the main research tracks in the group concerns the structure and function of proteins. Machine learning methods can assist in answering these questions. Below is a short list of topics we want to investigate:
1. Mining relevant features that drive protein function: Understanding how proteins behave is a complex task. Mutants of the same protein provide an invaluable source of data for the application of statistical learning techniques. Mutation data can be analyzed to find the patterns of amino acids that are most relevant for the manifestation of a certain protein behavior. This thesis proposal aims to learn relevant features and rules that explain a specific protein behavior. The techniques used to attain this goal will draw on game theory (for feature selection) and/or inductive logic programming.
2. Preference handling as an approach to analyze and understand protein binding preferences: Proteins, and especially their domains, have finely tuned preferences for particular peptides. One of the main interests of bioinformaticians is to describe these preferences so that potential binding partners can be searched for in the database of known proteins. This project proposes to investigate the relevance of preference handling methods to solve this problem.
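For the first topic above, one common way to bring game theory into feature selection is the Shapley value: features are treated as players, and the payoff of a coalition is, for instance, the predictive accuracy of a model restricted to those features. Whether the project uses exactly this formulation is not stated; the sketch below is a minimal exact computation under that assumption, with a hypothetical value function and invented mutation positions (`P12`, `P45`, `P78`) rather than real data.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> List[float]:
    """Exact Shapley value of each player (feature); cost grows as 2**len(players)."""
    n = len(players)
    result = []
    for p in players:
        others = [q for q in players if q != p]
        phi = 0.0
        for k in range(n):  # k = size of a coalition S that excludes p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi += weight * (value(s | {p}) - value(s))  # marginal contribution of p
        result.append(phi)
    return result


def toy_value(features: FrozenSet[str]) -> float:
    """Invented payoff, e.g. accuracy gain of a model restricted to these positions."""
    v = 0.0
    if "P12" in features:
        v += 0.4  # acts alone
    if "P45" in features and "P78" in features:
        v += 0.3  # acts only jointly
    return v


print(shapley_values(["P12", "P45", "P78"], toy_value))
```

The result is approximately `[0.4, 0.15, 0.15]`: position P12 receives its full stand-alone contribution, while the interacting pair splits its joint payoff, and the values always sum to the payoff of the full feature set. With a real value function (e.g. cross-validated accuracy on mutant data), exact enumeration is only feasible for a handful of features, and sampling-based approximations are commonly used instead.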
1. Multi-agent learning is a field of increasing importance, both for its direct technological impact on our society and for its role in modeling complex real-world phenomena in economics and the social sciences. The social aspect of learning is often neglected in the design and analysis of adaptive multi-agent systems. We propose to study the synergy of social and individual learning in the light of recent advances in evolutionary anthropology (cultural evolution in particular), evolutionary game theory and learning theory.
The aim of this project is to investigate, through a model of the hematopoietic system and chronic myeloid leukemia (CML), the emergence and dynamics of therapy-resistant clones, and the relation between patient treatment response, survival and the diagnostic risk groups. Patients diagnosed with early-phase CML may relapse during treatment due to the appearance of cancer cells resistant to first-line treatment compounds such as Imatinib. Understanding how treatment affects the dynamics of these resistant cells is therefore important, and the resulting insights will help medical practitioners set up treatment protocols for individual patients. In addition, each patient responds differently to Imatinib. Using our model and available serial Q-RT-PCR patient data, we can determine the severity of the disease and the quality of the initial treatment response. Together, these will allow patients to be reclassified with respect to their survival chances. The model will also shed light on the correlation between the risk groups identified at diagnosis and treatment response, which is not yet clear.
Interested? Contact Tom Lenaerts for more details.
7. Analysis of the physical implementation security of SHA3 competition finalists using machine learning (master thesis supervised by QualSec and MLG (Machine Learning Group))
The NIST hash function competition (SHA3) is a worldwide competition (scheduled to conclude during this year, 2012) whose goal is to select a new cryptographic hash function as a standard for the United States. Used with a secret key (i.e. as a Message Authentication Code, MAC), such a hash function makes it possible, among other things, to guarantee the integrity and authenticity of transmitted or stored information. Such a MAC must resist known cryptanalytic attacks. One effective family of attacks targets physical implementations through side channels (side-channel attacks). These attacks focus on the behaviour of physical devices (power consumption or computation time) to determine whether secret information can be inferred from it. However, these known and already implemented techniques could apparently be improved by combining them with machine learning techniques. The work will focus on this novel combination, with the aim of testing SHA3 finalists.
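A classic non-profiled baseline that machine-learning-based side-channel analysis aims to improve on is correlation power analysis (CPA). The sketch below runs CPA on simulated traces, assuming a Hamming-weight leakage model with Gaussian noise and a toy 4-bit S-box borrowed from the PRESENT block cipher. It does not model any SHA3 finalist or real device, and it involves no learning yet; a machine-learning variant would, for instance, replace the fixed Hamming-weight model with a leakage model learned from profiling traces.

```python
import math
import random
from typing import List, Sequence, Tuple

# 4-bit S-box of the PRESENT block cipher, used here only as a toy target.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def simulate_traces(key: int, n: int, noise: float, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Assumed leakage model: Hamming weight of SBOX[p ^ key] plus Gaussian noise."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(16) for _ in range(n)]
    traces = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, noise) for p in plaintexts]
    return plaintexts, traces


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cpa_recover_key(plaintexts: Sequence[int], traces: Sequence[float]) -> int:
    """Keep the key guess whose predicted leakage correlates most strongly
    (in absolute value) with the measured traces."""
    def score(guess: int) -> float:
        predicted = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        return abs(pearson(predicted, traces))
    return max(range(16), key=score)


plaintexts, traces = simulate_traces(key=0xA, n=1000, noise=1.0)
print(hex(cpa_recover_key(plaintexts, traces)))
```

The noise level and number of traces are arbitrary; lowering the number of traces or raising the noise shows how quickly the attack degrades, which is the regime where better (e.g. learned) leakage models matter.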
Directa Sim, an Italian online trading broker, is organizing a trading challenge with real money for European master students.
The goal of the project is to use machine learning/statistical techniques to build a trading model. Interested students are expected to form a group and choose a leader who will manage the trading operations.
For more information: http://www.universiadideltrading.it/index_fr.html or contact Andrea Dal Pozzolo at email@example.com
Required skills: Statistical analysis, machine learning.
"Classification is a basic task in data analysis and pattern recognition that requires the construction of a classifier, that is, a function that assigns a class label to instances described by a set of attributes. The induction of classifiers from data sets of preclassified instances is a central problem in machine learning. Numerous approaches to this problem are based on various functional representations such as decision trees, decision lists, neural networks, decision graphs, and rules.
Course prerequisite: INFO-F-422 (Statistical Methods of Machine Learning) or some equivalent courses.
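The quoted definition treats a classifier as a function from attribute values to a class label, induced from preclassified instances. As a minimal illustration of that induction step, the sketch below learns a decision stump (a one-level decision tree) on categorical attributes by picking the attribute with the fewest training errors. The attributes, values and labels are invented toy data, and the stump is only the simplest member of the decision-tree family mentioned in the quotation.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Sequence

Instance = Dict[str, str]  # attribute name -> categorical value


def learn_stump(data: Sequence[Instance], labels: Sequence[str]) -> Callable[[Instance], str]:
    """Induce a one-level decision tree: split on the attribute with fewest training errors."""
    default = Counter(labels).most_common(1)[0][0]  # overall majority class
    best_attr, best_rule, best_errors = None, {}, len(labels) + 1
    for attr in data[0]:
        counts: Dict[str, Counter] = defaultdict(Counter)
        for x, y in zip(data, labels):
            counts[x[attr]][y] += 1
        # Each attribute value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if errors < best_errors:
            best_attr, best_rule, best_errors = attr, rule, errors

    def classify(x: Instance) -> str:
        # Attribute values never seen in training fall back to the majority class.
        return best_rule.get(x.get(best_attr), default)

    return classify


data = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay", "play"]
clf = learn_stump(data, labels)
print(clf({"outlook": "rainy", "windy": "no"}))  # stay
```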
Multiscale or multiresolution analysis is a technique for the analysis and processing of data in a telescopic way. This means that the data are decomposed into a representation that separates global, large-scale features from small-scale details, with a broad spectrum in between. In that sense, multiscale analysis is related to frequency (Fourier) analysis (with slowly and rapidly oscillating components), but, unlike a Fourier transform, a multiscale analysis keeps information on the location in the original time or space domain.
The best-known example of a multiscale analysis is a wavelet decomposition. Wavelets are particularly popular in image processing, for instance in the JPEG 2000 compression standard. This thesis investigates the use of another multiresolution algorithm, known as the Laplacian pyramid. The Laplacian pyramid is a slightly overcomplete transform, meaning that it maps n data points onto roughly 2n coefficients in the multiscale representation. It can be implemented as an overcomplete version of a lifting scheme, which is a fast implementation of the wavelet transform.
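A minimal one-dimensional sketch of a Laplacian pyramid, assuming pairwise averaging as the coarsening step and piecewise-constant repetition as the prediction step (both are illustrative choices, not the filters the thesis would use). Each level stores a full-length detail signal, so a signal of length n yields n + n/2 + ... + 1 = 2n - 1 coefficients, which is the slight overcompleteness described above. Reconstruction is exact by construction, because each detail is defined as the residual of the prediction.

```python
from typing import List, Tuple


def reduce_(x: List[float]) -> List[float]:
    """Coarse approximation: average neighbouring pairs (length halves)."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def expand(c: List[float]) -> List[float]:
    """Predict the finer level from the coarse one by repeating each value."""
    return [v for v in c for _ in range(2)]


def laplacian_pyramid(x: List[float]) -> Tuple[List[List[float]], List[float]]:
    """Return (detail levels, coarsest approximation); len(x) must be a power of 2."""
    details = []
    while len(x) > 1:
        coarse = reduce_(x)
        details.append([a - b for a, b in zip(x, expand(coarse))])  # full-length residual
        x = coarse
    return details, x


def reconstruct(details: List[List[float]], coarse: List[float]) -> List[float]:
    """Invert the pyramid: predict each finer level and add back its detail."""
    x = coarse
    for d in reversed(details):
        x = [p + e for p, e in zip(expand(x), d)]
    return x


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
details, coarse = laplacian_pyramid(signal)
print(sum(len(d) for d in details) + len(coarse))  # 15
print(reconstruct(details, coarse) == signal)  # True
```

With this particular averaging/repetition pair the details reduce to a redundant Haar transform; the interest of the pyramid lies in the freedom to choose a better prediction step.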
In this thesis, the Laplacian pyramid is equipped with a local polynomial smoothing technique, popular in statistics. The objective is to investigate the properties of a Laplacian pyramid with local polynomial smoothing in image processing applications (denoising, compression).
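Local polynomial smoothing fits, around each point, a low-degree polynomial by kernel-weighted least squares and keeps its value at that point. A minimal degree-1 (local linear) sketch on one-dimensional data, assuming a Gaussian kernel and a hand-picked bandwidth `h` (both assumptions); for images the same idea is applied with a two-dimensional neighbourhood. How such a smoother is plugged into the pyramid is the subject of the thesis and is not shown here.

```python
import math
from typing import List, Sequence


def local_linear(xs: Sequence[float], ys: Sequence[float], x0: float, h: float) -> float:
    """Local linear estimate at x0 with a Gaussian kernel of bandwidth h."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        u = x - x0
        w = math.exp(-0.5 * (u / h) ** 2)
        s0 += w
        s1 += w * u
        s2 += w * u * u
        t0 += w * y
        t1 += w * u * y
    # Solve the 2x2 weighted normal equations [s0 s1; s1 s2][a; b] = [t0; t1]
    # and return the intercept a, i.e. the fitted line evaluated at x0.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def smooth(xs: Sequence[float], ys: Sequence[float], h: float) -> List[float]:
    return [local_linear(xs, ys, x0, h) for x0 in xs]


xs = [float(i) for i in range(10)]
noisy = [2.0 * x + 1.0 + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]
print([round(v, 2) for v in smooth(xs, noisy, h=1.5)])
```

A useful property of the degree-1 fit is that it reproduces straight lines exactly, including at the boundaries, where a plain kernel average would be biased.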
The selection of an optimal model from a broad spectrum of non-nested models can be driven by a criterion that balances a good fit to the training set against the complexity of the model, that is, the number of selected variables. Optimizing over the number of variables, or even comparing models with a given number of variables, is a problem of combinatorial complexity and thus not feasible for high-dimensional data. Part of the problem can be well approximated by replacing the number of selected variables in the criterion with the sum of the absolute values of the estimated coefficients of these variables within the selected model. The counting measure is thus replaced by a sum of magnitudes, turning a combinatorial problem into a convex quadratic programming problem. This problem can be solved by a wide range of algorithms, including direct methods, such as least angle regression, and iterative methods, such as iterative thresholding or gradient projection. Moreover, for a fixed value of model complexity, the relaxed problem selects approximately the same model as the original combinatorial one. This is no longer the case when the model complexity is part of the optimization problem, but a correction for the divergence between the combinatorial and the quadratic problem can be established. The thesis is about the application of variable selection in sparse inverse problems, or in deblurring and denoising images, using gradient projection or iterative thresholding.
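The relaxed problem described above is the l1-penalized least-squares problem (the lasso). A minimal sketch of iterative soft thresholding (ISTA), one of the iterative methods named in the text, assuming the penalized form 0.5·||y − Xb||² + lam·||b||₁ with a fixed, user-chosen `lam`. The step size 1/||X||_F² is a conservative assumption that guarantees convergence; practical codes use 1/||X||₂² or a line search.

```python
from typing import List

Matrix = List[List[float]]


def soft_threshold(v: float, tau: float) -> float:
    """Proximal operator of tau*|.|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def ista_lasso(X: Matrix, y: List[float], lam: float, iters: int = 500) -> List[float]:
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by iterative soft thresholding."""
    n, p = len(X), len(X[0])
    # ||X||_F^2 >= ||X||_2^2, so this step never exceeds 1/L (L = Lipschitz constant).
    step = 1.0 / sum(v * v for row in X for v in row)
    b = [0.0] * p
    for _ in range(iters):
        residual = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        grad = [-sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        # Gradient step on the quadratic term, then shrinkage for the l1 term.
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(ista_lasso(identity, [3.0, -0.5, 1.5], lam=1.0))
```

With an orthonormal design the lasso solution is simply the soft-thresholded data, so the output is approximately `[2.0, 0.0, 0.5]`: the small coefficient is set exactly to zero, which is the variable-selection effect of the l1 relaxation.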
The study and modelling of human behaviour is a key aspect in the development of Ambient Intelligence (AmI) systems, since their intrinsic goal is user assistance. Nowadays, advances in technology, especially sensor networks and the miniaturization of electronic devices, allow us to monitor user activity at any time and place with the aim of improving quality of life. However, real-time recognition of these activities becomes a challenge when the sensor signals provide multivariate data. We need fast and efficient methods not only to learn a behaviour from the training data, but also to recognize the learned activities online. This thesis will focus on the design and development of machine learning methods to solve this problem, applied to Human-Computer Interaction.
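Online recognition requires each new sensor sample to be processed in constant time. A minimal sketch, assuming a single scalar sensor channel (e.g. acceleration magnitude), a fixed window length and class centroids (window mean and standard deviation) learned offline; the class name `OnlineActivityRecognizer`, the labels and the numbers are hypothetical. Running sums make each update O(1) regardless of the window length.

```python
import math
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class OnlineActivityRecognizer:
    """Label each incoming sample from the last `window` values, in O(1) per sample."""

    def __init__(self, centroids: Dict[str, Tuple[float, float]], window: int) -> None:
        self.centroids = centroids  # label -> (mean, std), assumed learned offline
        self.window = window
        self.buf: Deque[float] = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, value: float) -> Optional[str]:
        # Add the new sample and drop the oldest one once the window is full.
        self.buf.append(value)
        self.total += value
        self.total_sq += value * value
        if len(self.buf) > self.window:
            old = self.buf.popleft()
            self.total -= old
            self.total_sq -= old * old
        if len(self.buf) < self.window:
            return None  # not enough data yet
        mean = self.total / self.window
        var = max(self.total_sq / self.window - mean * mean, 0.0)  # clamp rounding error
        feat = (mean, math.sqrt(var))
        return min(self.centroids, key=lambda y: math.dist(feat, self.centroids[y]))


recognizer = OnlineActivityRecognizer({"still": (1.0, 0.0), "shaking": (1.0, 0.5)}, window=4)
stream = [1.0, 1.0, 1.0, 1.0, 1.5, 0.5, 1.5, 0.5]
print([recognizer.update(v) for v in stream])
```

The running sum of squares is the simplest O(1) choice but can lose precision on long streams with large values; a windowed Welford-style update is a more robust alternative.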
Required skills: Machine learning, statistical analysis, programming skills, passion for interdisciplinary research