About – Machine Learning Group

The ULB Machine Learning Group (MLG) is a research unit of the Computer Science Department of the Faculty of Sciences, co-headed by Prof. Gianluca Bontempi and Prof. Tom Lenaerts. MLG was founded in 2004 by Gianluca Bontempi. The activity of the group covers the areas of machine learning, computational modelling and statistics and their applications in data mining, simulation and time series prediction. In October 2008, Tom Lenaerts joined MLG as a new academic, extending the group’s expertise towards computational biology, evolutionary dynamics and complex systems research.

Currently we focus on:

Big data mining, modeling and prediction

This research concerns the use of machine learning techniques for extracting relevant information from massive real-world datasets. Particular attention is devoted to feature selection, causal inference, model selection and validation, and long-term prediction.

1. Long-term prediction of time series: reliable and accurate prediction of time series over large future horizons has become the new frontier of the forecasting discipline. Current approaches to long-term time series forecasting rely on either iterated predictors or direct predictors. We proposed a multi-output extension of our previous work on Lazy Learning and showed that this prediction strategy can be particularly effective in multiple-step-ahead tasks.

2. Modelling and distributed compression of wireless sensor data: wireless sensor networks form an emerging class of computing devices capable of observing the world with an unprecedented resolution, and promise to provide a revolutionary instrument for environmental monitoring. In environmental monitoring studies, many applications are expected to run unattended for months or years. Sensor nodes are, however, constrained by limited resources, particularly in terms of energy. We proposed a machine learning approach which combines time series prediction and model selection to reduce the amount of communication. The rationale of this approach, called adaptive model selection, is to let the sensors determine in an automated manner a prediction model that not only fits their measurements but also reduces the amount of transmitted data. We also designed a distributed approach for modeling sensed data, based on principal component analysis (PCA).

3. Application of data mining to several domains: motion analysis, anesthesiology, indoor localization and tracking, cryptography (side-channel attack), fraud detection, power systems and remote sensing.
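The difference between the iterated and direct strategies mentioned under long-term prediction can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain linear least-squares autoregressive model as a stand-in learner (not Lazy Learning), and the function names `make_windows`, `iterated_forecast`, `direct_forecast` and `multi_output_forecast` are hypothetical.

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Rows of X hold n_lags past values; rows of Y hold the next `horizon` values."""
    X, Y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        Y.append(series[t:t + horizon])
    return np.array(X), np.array(Y)

def fit_linear(X, Y):
    """Least-squares fit with an intercept; Y may have one or several columns."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef

def predict_linear(coef, x):
    return np.append(x, 1.0) @ coef

def iterated_forecast(series, n_lags, horizon):
    """Fit ONE one-step-ahead model and feed its predictions back as inputs."""
    X, Y = make_windows(series, n_lags, 1)
    coef = fit_linear(X, Y[:, 0])
    window = list(series[-n_lags:])
    out = []
    for _ in range(horizon):
        y = float(predict_linear(coef, np.array(window)))
        out.append(y)
        window = window[1:] + [y]
    return np.array(out)

def direct_forecast(series, n_lags, horizon):
    """Fit a SEPARATE model for each step ahead, all using observed inputs only."""
    X, Y = make_windows(series, n_lags, horizon)
    last = np.asarray(series[-n_lags:])
    return np.array([float(predict_linear(fit_linear(X, Y[:, h]), last))
                     for h in range(horizon)])

def multi_output_forecast(series, n_lags, horizon):
    """Fit one model whose output is the whole future trajectory."""
    X, Y = make_windows(series, n_lags, horizon)
    return predict_linear(fit_linear(X, Y), np.asarray(series[-n_lags:]))
```

With ordinary least squares the multi-output fit decouples column by column, so it coincides with the direct strategy. The distinction only becomes meaningful for learners whose structure or hyperparameters are shared across the horizons, such as local (nearest-neighbour) models.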

Bioinformatics and Computational Biology

This research addresses the use of computational techniques for the modelling, simulation and prediction of complex biological systems.

In this context, the bioinformatics research of Gianluca Bontempi’s team focuses on the use of machine learning techniques for the classification of microarray data in breast cancer and diabetes and the inference of genomic networks. In particular, the team obtained significant results in the context of the following projects:

Prognostication of breast cancer patients using gene expression profiling (in collaboration with the group of Dr. C. Sotiriou at Bordet Hospital and the group of Prof. J. Quackenbush at Harvard Dana-Farber).

The research presented an original methodology dealing specifically with the analysis of microarray and survival data in order to build prognostic models and provide an honest estimation of their performance. The approach used for signature extraction consists of a set of original methods for feature transformation, feature selection and prediction model building. A novel statistical framework was presented for performance assessment and comparison of risk prediction models. These interdisciplinary contributions have brought new insights into biological processes critical to a patient’s clinical outcome and have been published both in top bioinformatics journals such as Bioinformatics, Genome Biol, PNAS and BMC Genomics, and in clinical journals such as Nature Medicine, Lancet Oncology, J Natl Cancer Inst, J Clin Oncology, Clin Cancer Res and Breast Cancer Research.

Inference of complex networks from expression data (in collaboration with Bordet Hospital, Erasme and CSAIL MIT):

We developed computationally efficient and theoretically founded techniques for inferring large networks of dependencies from expression measurements. We proposed a set of information-theoretic approaches that estimate mutual information and conditional mutual information from data in order to measure the statistical dependence between gene expression levels. These techniques have been published in bioinformatics journals such as BMC Bioinformatics and EURASIP, and were adopted by the Drosophila Model Organism Encyclopedia of DNA Elements (modENCODE) consortium, leading to a Science publication on which P.E. Meyer appears as co-first author. Packages for feature selection and network inference have also been made available in the R/Bioconductor framework.
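As a rough illustration of how pairwise mutual information can be estimated from expression data, the sketch below uses a simple histogram-based plug-in estimator with equal-width bins. This is an assumption made for the example, not the estimator used in the group's R/Bioconductor packages, and the names `mutual_information` and `mi_matrix` are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X;Y) in bits from a 2-D equal-width histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()              # joint distribution
    px = pxy.sum(axis=1, keepdims=True)      # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of Y, shape (1, bins)
    nz = pxy > 0                             # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def mi_matrix(expr, bins=8):
    """expr: samples x genes array. Returns a symmetric genes x genes MI matrix."""
    n_genes = expr.shape[1]
    mim = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            mim[i, j] = mim[j, i] = mutual_information(expr[:, i], expr[:, j], bins)
    return mim
```

Thresholding such a matrix gives the simplest kind of dependency network. Methods based on conditional mutual information go further by trying to remove indirect links, which this sketch does not attempt.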

The computational biology research of Tom Lenaerts’ team focuses on:

1. Identifying and analyzing the information processing capacity of proteins:

An initial contribution in the first area was to show that Shannon’s information theory can be used to identify and quantify the allosteric/cooperative changes in the structural dynamics of proteins and their domains [1-3].

This approach quantifies the allosteric/cooperative effects induced by peptide binding, which are essential for protein function and are increasingly considered to be a general property of every protein inside the cell. In simple terms, the developed method is able to identify how information flows through the structures of proteins. The insights obtained from this work have both theoretical and medical impact. Currently, the computational predictions are being validated using NMR relaxation experiments, biophysical analysis and in vivo experimentation.

[1] T. Lenaerts, J. Ferkinghoff-Borg, J. Schymkowitz, and F. Rousseau. Information theoretical quantification of cooperativity in signalling complexes. BMC Syst Biol, 3:9, 2009.

[2] T. Lenaerts, J. Ferkinghoff-Borg, F. Stricher, L. Serrano, J. Schymkowitz, and F. Rousseau. Quantifying information transfer by protein domains: Analysis of the Fyn SH2 domain structure. BMC Structural Biology, 8:43, 2008.

[3] T. Lenaerts, J. Schymkowitz, and F. Rousseau. Protein domains as information processing units. Curr Protein Pept Sci, 10(2):133–145, 2009.

2. Understanding the evolutionary dynamics of chronic myeloid leukaemia (CML):

For the second topic, the research focuses on the analysis of diseases like CML through the use of a mathematical model of the hematopoietic system. Together with an international team, we examine the response dynamics of CML to treatment with tyrosine kinase inhibitors (TKIs) like Imatinib and Nilotinib, providing in this way insight into clinical data [4-6]. One recent important contribution showed that stochastic effects at the level of the stem cell pool and early progenitors lead to the loss of the original cancer cell that drives the disease [4]. This result has raised the interest of clinicians since it implies that TKIs might be capable of curing a patient, which until now has been considered impossible.

[4] T. Lenaerts, J.M. Pacheco, A. Traulsen, and D. Dingli. Tyrosine kinase inhibitor therapy can cure chronic myeloid leukemia without hitting leukemic stem cells. Haematologica, 95(6):900–907, 2009.

[5] T. Lenaerts, F. Castagnetti, A. Traulsen, J.M. Pacheco, G. Rosti, and D. Dingli. Explaining the in vivo and in vitro differences in leukemia therapy. Cell Cycle, 10:1540–1544, 2011.

[6] D. Dingli, A. Traulsen, T. Lenaerts, and J.M. Pacheco. Evolutionary dynamics of chronic myeloid leukemia. Genes & Cancer, 1(4):309–315, 2010.

Dynamics of cooperation and competition

This research focuses on the mathematical and computational modelling of populations of learning individuals that try to collectively achieve some goal like cooperation, communication or coordination. In this context, Tom Lenaerts’ team has examined the effects of network structure on the evolution and learning of cooperative behavior in social networks using evolutionary game dynamics (ED).

One of the fundamental assumptions of early ED research is that individuals interact in a well-mixed population, meaning that every individual can interact with everyone else and that each has the same number of interactions. The analysis of technological, social and biological networks has clearly shown that this basic assumption does not hold when one wants to understand the dynamics of real populations.
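To make the well-mixed baseline concrete, the sketch below integrates the standard two-strategy replicator equation for a Prisoner's Dilemma, where x is the fraction of cooperators. The payoff values and the function names `replicator_step` and `simulate` are assumptions chosen for illustration; this is a textbook model, not the network-based models used in the work described here.

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of dx/dt = x (1 - x) (f_C - f_D) in a well-mixed population."""
    (R, S), (T, P) = payoff
    f_c = R * x + S * (1 - x)   # average payoff of a cooperator
    f_d = T * x + P * (1 - x)   # average payoff of a defector
    return x + dt * x * (1 - x) * (f_c - f_d)

def simulate(x0, payoff, steps=5000, dt=0.01):
    """Return the trajectory of the cooperator fraction starting from x0."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
        trajectory.append(x)
    return trajectory

# Illustrative Prisoner's Dilemma payoffs with T > R > P > S (assumed values).
PRISONERS_DILEMMA = ((3.0, 0.0), (5.0, 1.0))

trajectory = simulate(0.9, PRISONERS_DILEMMA)
```

Under these assumptions defectors always earn more than cooperators, so the cooperator fraction declines towards zero from any interior starting point. This is the baseline outcome against which the effect of population structure is measured.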

We showed the relevance of network topology to evolutionary dynamics by studying the evolution of cooperation in social dilemmas on networks with different degrees of heterogeneity [1].

We also showed that when individuals can decide to change their interaction partners, then cooperation grows even faster [2]. This is an essential feature since social and technological networks are not static. Links change continuously, adapting in this way the topology to the spread of cooperation in these particular games. Extensive analysis of the different parameters that make up these problems has been performed and published [3-6].

Recently our interest has shifted to group-based dynamics (modeled through N-player games), individual learning (as compared to social learning) and harvesting problems (described as public goods games).

[1] F.C. Santos, J.M. Pacheco, and T. Lenaerts. Evolutionary dynamics of social dilemmas in structured heterogeneous populations. Proc Natl Acad Sci U S A, 103(9):3490–3494, 2006.

[2] F.C. Santos, J.M. Pacheco, and T. Lenaerts. Cooperation prevails when individuals adjust their social ties. PLoS Comput Biol, 2(10):e140, 2006.

[3] S. Van Segbroeck, F.C. Santos, A. Nowé, J.M. Pacheco, and T. Lenaerts. The evolution of prompt reactions to adverse ties. BMC Evolutionary Biology, 8:287, 2008.

[4] S. Van Segbroeck, F.C. Santos, T. Lenaerts, and J.M. Pacheco. Reacting differently to adverse ties promotes cooperation in social networks. Phys Rev Lett, 102(5):058105, 2009.

[5] S. Van Segbroeck, S. de Jong, A. Nowé, F.C. Santos, and T. Lenaerts. Learning to coordinate in complex networks. Adaptive Behavior, 18(5):416–427, 2010.

[6] S. Van Segbroeck, F.C. Santos, J.M. Pacheco, and T. Lenaerts. Coevolution of cooperation, response to adverse social ties and network structure. Games, 1(3):317–337, 2010.